Attendees: Ondrej, Vicky, Michał K., Michal N., Petr, Aram, Artem, Evan, Matthijs
Back to agenda: https://gitlab.isc.org/isc-projects/bind9/-/wikis/BIND-9.19-Plan
What did we learn from the netmgr/dispatch planning, design and execution, what should we do differently 'next time'?
Ondrej - we need more people engaged in such a big project. One person working alone can get stuck. Evan +1: it was good ping-ponging with Witold, but after he left ... Ondrej is much busier. (So we need at least 2 FTE people, not including Ondrej.)
Evan - It is really nice to hand something off at the end of the day and find it has progressed when you pick it back up again the next day.
It was a surprise to Support and Vicky that only half of the netmgr refactoring had been done at the end of 2020. It is hard to tell what % of the project is done. If there is any way to tell...
Use Witold's complexity factor; it can determine whether the code has actually become less complex.
Have structure, milestones. Evan - we had these things. But the hard stuff was left at the end.
If possible, it might be good to have someone thinking about tests, or something like tests, along the way. This might not be an actual test, but side tasks such as checking whether xyz will still integrate with the WIP.
- Ondrej - use cmocka interface more
- Artem - try to write down the state machine, which facilitates writing tests later
Some sort of design document, even if it is only a list of things to do, and a list of things to be worried about or careful of, would enable more people to review.
Takeaways
lessons from dispatch work
- more than 2 people working together
- create comprehensive tests in parallel (or before start!)
- simplify current code before refactoring
There was some discussion about the utility of state machines and of thinking in terms of state machines. Artem described a kind of black-box test setup that sounded like it might be a helpful test harness.
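One way to picture Artem's suggestion of writing down the state machine is as an explicit transition table that tests can be run against. The sketch below is illustrative only: the states, events, and function names (`step`, `run`, `unreachable_states`) are hypothetical and are not taken from netmgr or dispatch.

```python
# Hypothetical states and events for a connection-like object.
# These names are assumptions for illustration, not BIND identifiers.
STATES = {"idle", "connecting", "open", "closed"}

TRANSITIONS = {
    ("idle", "connect"): "connecting",
    ("connecting", "connected"): "open",
    ("connecting", "timeout"): "closed",
    ("open", "read"): "open",
    ("open", "close"): "closed",
}


def step(state, event):
    """Return the next state, or raise ValueError for an illegal transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"illegal transition: {event!r} in state {state!r}"
        ) from None


def run(events, state="idle"):
    """Feed a sequence of events through the table and return the final state."""
    for event in events:
        state = step(state, event)
    return state


def unreachable_states(start="idle"):
    """Return the states that no sequence of events from `start` can reach."""
    seen, frontier = {start}, [start]
    while frontier:
        current = frontier.pop()
        for (src, _event), dst in TRANSITIONS.items():
            if src == current and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return STATES - seen
```

With the table written down, a black-box test reduces to feeding event sequences through `run` and checking the final state, and illegal sequences show up as a `ValueError` rather than as undefined behaviour.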
Discussing RBTDB
The meeting then switched tracks to discussing RBTDB refactoring:
Vicky - I really love the idea of parallel testing to see if the same question gets the same answer on the old rbtdb vs the new db. As I understand it, setting this kind of thing up can be a pretty big task.
How can we tell if there is a significant performance impact before it is done? Is that important to look at?
Why refactor RBTDB first? RBTDB is complex, old, hard to understand.
Tony Finch to help. He has much experience with search-optimized tree structures, particularly qp-trie.
Vicky - should we also work on an alternative to Tony's thing in parallel?
Artem - that is too big a project, unfeasible
Evan - we need a good unit test suite that validates all the inputs and outputs to the db. This will be useful in itself, but will also allow us to do some validation on the new db even before it is plumbed in.
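The parallel-testing idea (same question, same answer from the old and the new db) can be sketched as a small differential test. Everything here is a stand-in under stated assumptions: `ReferenceDB` and `CandidateDB` are hypothetical toy stores, not the RBTDB or qp-trie APIs, and the add/delete/lookup interface is invented for illustration.

```python
import bisect
import random


class ReferenceDB:
    """Stand-in for the existing database: a plain dict (assumption)."""

    def __init__(self):
        self._data = {}

    def add(self, name, rdata):
        self._data[name] = rdata

    def delete(self, name):
        self._data.pop(name, None)

    def lookup(self, name):
        return self._data.get(name)


class CandidateDB:
    """Stand-in for the replacement: sorted parallel lists searched with bisect."""

    def __init__(self):
        self._names = []
        self._rdata = []

    def _find(self, name):
        i = bisect.bisect_left(self._names, name)
        found = i < len(self._names) and self._names[i] == name
        return i, found

    def add(self, name, rdata):
        i, found = self._find(name)
        if found:
            self._rdata[i] = rdata
        else:
            self._names.insert(i, name)
            self._rdata.insert(i, rdata)

    def delete(self, name):
        i, found = self._find(name)
        if found:
            del self._names[i]
            del self._rdata[i]

    def lookup(self, name):
        i, found = self._find(name)
        return self._rdata[i] if found else None


def differential_test(old, new, rounds=2000, seed=1):
    """Apply the same random operations to both stores; return any mismatches."""
    rng = random.Random(seed)
    names = [f"host{n}.example." for n in range(50)]
    mismatches = []
    for step in range(rounds):
        name = rng.choice(names)
        op = rng.choice(["add", "delete", "lookup"])
        if op == "add":
            rdata = f"192.0.2.{rng.randrange(256)}"
            old.add(name, rdata)
            new.add(name, rdata)
        elif op == "delete":
            old.delete(name)
            new.delete(name)
        # After every operation, the same question must get the same answer.
        if old.lookup(name) != new.lookup(name):
            mismatches.append((step, op, name))
    return mismatches
```

Calling `differential_test(ReferenceDB(), CandidateDB())` returns an empty list when the two stores agree. Because the harness only talks to the add/delete/lookup interface, a new implementation could be checked this way before it is plumbed into the server. A real harness would also need to cover wildcards, ENTs, NSEC/NSEC3, and concurrent updates, which this sketch does not attempt.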
Petr - Tony's new db was already implemented into NSD and the benchmarks are pretty good. https://dotat.at/prog/qp/README.html
Artem - have we considered any new tooling around the DNS db?
Ondrej - it will need a code style update (camelcased) but otherwise, won't need a lot of work to integrate. Look at wildcards, ENTs, ECS, Aggressive NSEC, NSEC3, Views.
Evan - all of the ECS code is a block that is ported from one branch to another. Should we port ECS support into the main db (and just keep the config in -S), to minimize the difference between -S and main?
Ondrej - fully on board. Vicky ++
Michal - would be good to minimize effort required for rebasing, improve long term maintainability.
Vicky - would like us to attack this early on in the new branch, rather than leaving it until later.
Evan - there are two databases, the zone db and the cache db. We should split these into two more clearly separate things. (a goal)
Vicky - we could use a system test suite à la the resolver test suite that Petr did, which was so helpful for the netmgr refactoring and DoH: something that exercises all the things that are updating and querying the db at once.
Evan - writing a unit test suite that validates correctness is a significant task, but doable. Making sure that it is still correct while cache cleaning is going on is harder.
Ondrej - there is a lot of research on concurrency, but much of it is impractical.
Michal - we can live with a performance problem but not with a correctness problem.