BIND 9.19 Planning: Discuss new features
Matthijs Mekking edited this page 2021-11-02 14:19:22 +00:00

Back to agenda: https://gitlab.isc.org/isc-projects/bind9/-/wikis/BIND-9.19-Plan

Session 1

Attendees: Ondrej, Michał K., Michal N., Petr, Aram, Artem, Evan, Matthijs, Mark, Greg, Cathy, Peter.

Configuration system overhaul

There are two things: the configuration file (named.conf) and the configuration state (database), which is the current state of all options.

Support has trouble figuring out what the current state is. See these issues: https://gitlab.isc.org/isc-projects/bind9/-/issues/1326 (named-checkconf option to include defaults in output), https://gitlab.isc.org/isc-projects/bind9/-/issues/2798 (named-checkconf -p does not print effective values), https://gitlab.isc.org/isc-projects/bind9/-/issues/1075 (print entire running configuration to a file for troubleshooting, diagnostics, updating).

Other quotes:

  • Turn minimal named.conf into maximal named.conf.
  • Preserve configuration context.
  • Show what currently is running.
  • Show what options can be changed with rndc.

Generally a "reinvent sysrepo" task, i.e. implement a configuration API and tools on top of it to query (or even modify?) configuration values at run-time (discussed in 10/13 BIND support as a pre-condition for reporting the currently effective values of various options).

Feedback from DHCP team about sysrepo: https://mattermost.isc.org/isc/pl/ce7wzpkgrfrxxpxj5rzwxpno3h

pspacek: If we decide to redesign the configuration system, we should think really hard about making it usable. In practice this means applying usability principles to it: https://www.nngroup.com/articles/ten-usability-heuristics/

Greg: From an ex-user's point of view I think there are three possible sets of config, basing this loosely on the Cisco model:

  1. what the user wrote (that's just named.conf)

  2. what BIND actually did at first startup (similar to Cisco's "show start")

  3. what BIND is actually doing right now (similar to Cisco's "show run")

Option 2 is the important one.

Option 3 might not be very different from it, but certain things can change as named is running.

Option 2 can be achieved by having a separate configuration context that we don't destroy, or by reusing the existing one and adding code so we can query the context.

If we store the configuration, it would be easy to query. But it might be tricky to update because of the inheritance property.

This needs more research, but it sounds like we are leaning towards adding a configuration context (or reusing the existing one) and giving it the capability to be queried. This would at least solve option 2 from the list above, and it could perhaps later be extended so that it can be queried or subscribed to, rather than storing the options in structures directly.
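A minimal sketch of the idea, in Python for brevity (BIND itself is written in C). Every class and method name here is hypothetical and none is an existing BIND API, and the default values are placeholders for illustration. The context keeps what the user wrote, freezes a startup snapshot that is never destroyed, and tracks run-time changes separately so the two can be compared:

```python
from types import MappingProxyType

# Hypothetical built-in defaults; placeholder values for illustration only.
DEFAULTS = {"recursion": "yes", "max-cache-size": "90%"}


class ConfigContext:
    """Keeps the three views of the configuration from Greg's list."""

    def __init__(self, written):
        # 1. What the user wrote (named.conf), without defaults.
        self.written = dict(written)
        # 2. Effective values at startup, defaults included; read-only.
        self.startup = MappingProxyType({**DEFAULTS, **self.written})
        # 3. Effective values right now; starts as a copy of (2).
        self.running = dict(self.startup)

    def set_runtime(self, option, value):
        """Record a run-time change, however it was made."""
        self.running[option] = value

    def changed_since_startup(self):
        """Return {option: (startup value, running value)} for each difference."""
        return {
            name: (self.startup.get(name), value)
            for name, value in self.running.items()
            if self.startup.get(name) != value
        }


ctx = ConfigContext({"recursion": "no"})
ctx.set_runtime("max-cache-size", "50%")
print(ctx.startup["max-cache-size"])   # implicit value, still queryable
print(ctx.changed_since_startup())
```

The sketch ignores inheritance between zone, view and global options, which is the part noted above as tricky to update.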

Later comment from Vicky: it is important to send clear responses back to whatever system is sending the configuration messages, showing that:

  1. the message was received (and perhaps parsed?). This is so that if the config change is unsuccessful, the system knows not to just keep resending it.
  2. the configuration change was successful (perhaps echoing the change implemented?).
  3. the configuration change was unsuccessful because the thing you are trying to change is not there (e.g. updating a non-existent zone or view or similar), or because the change is otherwise illegal or inconsistent. It is very useful to have meaningful error codes here that a machine might be able to interpret usefully.
  4. in case of an authorization/authentication error, we should consider carefully what response to send, so that it is helpful but does not give away too much to an attacker.
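The four kinds of response above could be modelled roughly as follows. This is only a sketch under assumptions: the status names, the `Response` type and the `handle` function are hypothetical, not an existing BIND or rndc interface, and the validity check is a stand-in for real option validation.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    RECEIVED = "received"    # 1. message arrived and was parsed
    APPLIED = "applied"      # 2. change succeeded
    NOT_FOUND = "not-found"  # 3. target zone or view does not exist
    INVALID = "invalid"      # 3. change is illegal or inconsistent
    DENIED = "denied"        # 4. auth failure; deliberately terse


@dataclass
class Response:
    status: Status
    detail: str = ""


def handle(zones, zone, option, value, authorized):
    """Try one option change on a dict of zones; return the responses sent."""
    if not authorized:
        # Give no hint about which zones exist or why the check failed.
        return [Response(Status.DENIED)]
    responses = [Response(Status.RECEIVED)]
    if zone not in zones:
        responses.append(Response(Status.NOT_FOUND, f"no such zone: {zone}"))
    elif not isinstance(value, str) or not value:
        responses.append(Response(Status.INVALID, f"bad value for {option}"))
    else:
        zones[zone][option] = value
        # Echo the change that was implemented.
        responses.append(Response(Status.APPLIED, f"{zone}: {option} = {value}"))
    return responses


zones = {"example.com": {}}
for reply in handle(zones, "example.com", "notify", "yes", authorized=True):
    print(reply.status.value, reply.detail)
```

Machine-readable status values let an automated sender decide whether to retry, fix the request, or stop.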

In general, although you all may be thinking of doing this manually, in production most larger operators are going to employ some sort of automated process and they will need an interface that is automation-friendly.

Zone Templates

Does this conflict with the configuration context work described above? Probably best to do one thing before the other.

Look at NSD and Knot for examples. For the user it is just another configuration clause (like dnssec-policy for example).

Inheritance rule changes: check the zone options, then the template, then the view, then the global options.
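The lookup order can be sketched as a first-match search from the most specific level to the least specific. The function name and the dict-based representation are assumptions made for illustration; this is not how named stores options today.

```python
def effective_value(option, zone, template, view, global_options):
    """Return (value, source) from the most specific level that sets option."""
    levels = (
        ("zone", zone),
        ("template", template),
        ("view", view),
        ("global", global_options),
    )
    for source, options in levels:
        if option in options:
            return options[option], source
    raise KeyError(option)


# Example: the zone sets nothing itself, so the template wins over the view.
value, source = effective_value(
    "notify",
    zone={},
    template={"notify": "explicit"},
    view={"notify": "yes"},
    global_options={"notify": "no"},
)
print(value, source)
```

Returning the source alongside the value would also help with reporting where an effective value came from.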

Assessment: Doable.

Catalog Zones

Aram started working on it.

We should do this in order to be competitive with other vendors.

We should also stay compatible with the old implementation.

HAProxy

A customer basically wants this as well, in order to put BIND behind a dnsdist instance (despite our trying to find other ways to solve the problem).

Artem: I believe that they want PROXYv2 protocol: https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
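For orientation, a minimal sketch of what receiving a PROXYv2 header involves, based on a reading of the binary header layout in the specification linked above. It handles only the PROXY command with IPv4 addresses, skips TLVs, and should be checked against the specification before being relied on. The function name is hypothetical and this is not BIND code.

```python
import ipaddress
import struct

# 12-byte signature that starts every PROXY protocol v2 header.
SIGNATURE = b"\r\n\r\n\x00\r\nQUIT\n"


def parse_proxy_v2_ipv4(data):
    """Parse a v2 header carrying IPv4 addresses.

    Returns (source, destination, bytes_consumed), where each endpoint is an
    (address, port) tuple. Other address families and TLVs are not handled.
    """
    if len(data) < 16 or data[:12] != SIGNATURE:
        raise ValueError("not a PROXY v2 header")
    # Byte 13: version and command; byte 14: family and transport;
    # bytes 15-16: length of the rest of the header, network byte order.
    ver_cmd, fam_proto, length = struct.unpack("!BBH", data[12:16])
    if ver_cmd >> 4 != 2:
        raise ValueError("unsupported version")
    if ver_cmd & 0x0F != 1:
        raise ValueError("only the PROXY command is handled here")
    if fam_proto >> 4 != 1:
        raise ValueError("only AF_INET is handled here")
    if length < 12 or len(data) < 16 + length:
        raise ValueError("truncated address block")
    src, dst, sport, dport = struct.unpack("!4s4sHH", data[16:28])
    return (
        (str(ipaddress.IPv4Address(src)), sport),
        (str(ipaddress.IPv4Address(dst)), dport),
        16 + length,
    )
```

The DNS message starts at offset `bytes_consumed`, and the source endpoint is the original client address that a proxy in front of BIND would otherwise hide.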

Session 2

Attendees: Ondrej, Michał K., Michal N., Petr, Aram, Artem, Matthijs, Mark, Greg, Cathy, Vicky.

Improvements to dnssec-policy

We could add these features to dnssec-policy:

  1. Offline KSK
  2. key sharing
  3. RFC 5011
  4. parent-child synchronization (submit DS)

How important are they?

Cathy - request for 2 parallel signers (Shumon's RFC and testing to look at ... https://datatracker.ietf.org/doc/html/rfc8901)

RIPE NCC has something like this; someone should ask Anand.

Matthijs - It should work, but it isn't tested or documented. There are some known bug reports.

ACTION: GitLab issue to document multisigner models that are known to work and that we would recommend ...

Also have tests.

Can we think of anything else we could do that would make it easier for the unsigned to start signing? Perhaps some way to do a test, or some confidence-builder?

Mark - problem is resolvers that don't properly handle DNSSEC failure modes (?)

Matthijs - We can hardly make it easier in the code, except perhaps for "submit DS" parent-child synchronization. Knot has something like this.

Matthijs - possibly more documentation (Vicky thinks this might be more scary than helpful)

Vicky - more documentation, or tools (similar to "ednscomp") to verify people's setup.

But folks already use "DNSViz".

Mark - Perhaps scan for CDS and do DNS updates with TSIG (a possible solution for enterprise).

Ondrej - logging: "I found this CDS, you could add it by doing X".

Matthijs - but how useful is this feature? It is quite some work and will only be useful for some edge-case enterprise scenarios. TLDs won't use this because of the high number of child zones.

Ondrej - .cz had key sharing, but when CDS was deployed it ruined this solution. Maybe someone could write a KB article explaining why CDS is better than key sharing.

Petr - had an idea related to sharing keys stored in an HSM, but walked it back because nobody wants to do more work with HSMs.

Ondrej - the use case for offline KSKs is TLDs (we have a lot of TLDs running BIND, but aren't aware of any of them asking for this).

In other words, most of these features don't have a high priority and should only be done upon request, with the exception of RFC 5011.

Feature requests from Support

Cathy - statistics and instrumentation around rbtdb and adb. We think there are issues with cache cleaning and locks, but we don't have any statistics or hard data. Like net mgr vs task manager conflicts with 9.16.

Vicky says: we should do the RBTDB refactoring first and make adequate instrumentation a requirement of it. Sadly, this will not help in the next 2 years; we may be stuck with what we have now in the current releases. Perhaps there is some analysis tool that SWENG can provide?

Easy pickings: See - https://gitlab.isc.org/isc-projects/bind9/-/issues?label_name=Customer&milestone_title=BIND+9.19.x&state=all

SWENG tries to pick one or two of these each month.

The main thing Sales would like to see is a more flexible way of using ECS bits, or something like that, for traffic steering and segmentation. This does not necessarily have anything to do with subnets, and the answers don't have to be cached, but there are loads of customers trying to do "service differentiation" by giving different answers based on who is asking. It is all fine to say that this is evil, but there seems to be a business imperative for a lot of users.

Priorities (from Support)

RBTDB refactoring, with better instrumentation around it, to make it easier to find out where performance and memory issues are.

Store the "startup configuration" and add the ability to query startup values, including implicit values.