random technical thoughts from the Nominet technical team

Nameservers and very large zones

Posted by jay on Jun 2nd, 2008

There comes a point in a zone’s life when it gets too big to be held in memory. Among TLDs, only .com has really reached this scale, but ENUM zones with this problem are numerous.
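
As a very rough back-of-envelope (all of the figures below are assumptions for illustration, not measurements), a zone on the scale of .com soon runs into tens of gigabytes once per-record overheads are counted:

```go
// Rough scale check: how much memory a very large zone needs if every
// record set is held in RAM. Counts and per-entry overheads are
// assumptions for illustration only (order-of-magnitude, 2008-era).
package main

import "fmt"

func main() {
	const (
		delegations    = 80e6 // assumption: roughly the number of names in .com
		recordsPerName = 3.0  // assumption: NS records plus glue, on average
		bytesPerRecord = 150  // assumption: name, rdata, index and pointer overhead
	)
	total := delegations * recordsPerName * bytesPerRecord
	fmt.Printf("~%.0f GB of RAM just to hold the zone\n", total/1e9)
	// ~36 GB with these guesses: possible on 64-bit hardware with plenty of
	// memory, but far beyond what a 32-bit process can address.
}
```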

So if you want to run an authoritative nameserver for these zones you basically have these options:

  1. Use a DB plugin to an existing nameserver such as BIND. However, the performance of this is so poor that it is not a reasonable option in most scenarios.
  2. Buy an off-the-shelf nameserver such as ANS from Nominum. However, this is only a real option for the fantastically rich, or for those with such a controlled network that they can make do with just two nameserver instances.
  3. Use a service provider that runs the zones for you and has its own database back end technology. We use UltraDNS, who have this functionality.
  4. Write your own. Of course this is all Nominum and UltraDNS have done, and more recently CommunityDNS, so how hard can it be?

Common misconceptions about a database back end

So how exactly do you go about designing the database back end for a nameserver? Well, in my opinion most people start back to front and continue that way.

Three things get most people excited about using databases as a back end, each of which is quite wrong and can be dismissed in turn.

Use the main registration database

If you are using databases anyway, why not have the nameserver run off the main registration database? That is already a fault-tolerant cluster, and it would mean that the nameserver was always instantly up to date.

This is madness for several reasons. First, no database has 100% uptime, but a nameserver cluster should. To be clear, that does not mean every nameserver is available 100% of the time, but that at least one nameserver can be reached 100% of the time.
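
To put a number on that distinction (the availability figures below are deliberately rough assumptions, and the calculation assumes failures are independent), the point is that the cluster only fails when every instance fails at once:

```go
// Back-of-envelope: why a cluster can approach 100% availability even
// though no single box does. Figures are assumptions for illustration.
package main

import (
	"fmt"
	"math"
)

func main() {
	const (
		perServer = 0.99 // assumption: each instance reachable 99% of the time (pessimistic)
		servers   = 6.0  // assumption: independently hosted nameserver instances
	)
	atLeastOne := 1 - math.Pow(1-perServer, servers)
	fmt.Printf("P(at least one nameserver reachable) = %.12f\n", atLeastOne)
	// With these guesses the whole service is down only when all six
	// instances are down at once; no single database can be engineered
	// to a comparable figure anywhere near as cheaply.
}
```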

Then there is the issue of zone file serial numbers. Are you going to update the serial number for every single update of the database (yes you have to)? What happens when things go wrong and need to be unwound? What happens if you need to restore from a backup?
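
To make the serial number problem concrete, here is a minimal sketch of RFC 1982 serial number arithmetic, which is what secondaries use to decide whether a zone is newer (the values are illustrative). Serials only move forward, so rolling the database back, or restoring from a backup with an older serial, leaves the secondaries convinced they already hold the latest data.

```go
// Sketch of why serials cannot simply be unwound: comparison is 32-bit
// wrap-around arithmetic (RFC 1982), and a secondary only transfers when
// the primary's serial looks newer than its own.
package main

import "fmt"

// serialNewer reports whether s2 is newer than s1 under RFC 1982
// serial number arithmetic.
func serialNewer(s1, s2 uint32) bool {
	return s1 != s2 && int32(s2-s1) > 0
}

func main() {
	fmt.Println(serialNewer(2008060199, 2008060200)) // true: a normal bump
	fmt.Println(serialNewer(2008060200, 2008060150)) // false: a restored backup
	// with an older serial looks stale, and secondaries quietly ignore it
}
```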

There are also performance issues arising from the way the data is stored: it is likely to be optimised for the registration system, not for the nameserver that has to pull data off it.

Finally, this limits you to nameservers that are connected to the database by a reliable, fast and secure channel. That may be possible within a single enterprise, but it should not be tried across the public Internet.

In essence, running it this way is just too brittle and should not be considered. To be honest, I don’t think anyone experienced would consider it, but I wanted to make sure I covered all the options.

Multi-headed nameserver

The second cause for excitement is the possibility of a multi-headed nameserver: lots of front ends all dealing with the same database back end (which we assume is separate from the source database). The reasoning is that databases are large, expensive beasts optimised for handling data requests from many clients, whereas the front ends are much lighter, simpler beasts optimised around network processing, so pairing the two seems natural.

It also means that updates are processed by only one machine, the database, without directly impacting the others. Fewer updates need to be sent, and there is less chance of the data becoming inconsistent.

In order to see why this is wrong (in most cases) we need to think about what kind of database we really need for a nameserver. It turns out that we don’t actually need most of the features found in a modern RDBMS: we don’t need views, stored procedures, pluggable indexes and so on. All we need is a simple, fast and efficient database, one that doesn’t require hardware of a different nature to the front end.

The next point is to examine where the performance bottlenecks are in that setup. The most obvious one is whenever the front end has to ask the database for data rather than use its cache: first the network transfer of the query, then contention for the lookup, possibly a trip to disk, and then the network transfer of the result back to the front end. I would contend that in a multi-headed setup these bottlenecks are much worse than in a simple 1-1 configuration of one front end and one database on the same machine.
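
A crude way to see this is to put some numbers on the two paths. The figures in this sketch are illustrative guesses rather than measurements, but they show how quickly the remote round trip dominates once the cache misses:

```go
// Back-of-envelope: effective per-query cost when a cache miss is served
// by an in-process database versus a remote one. All figures are assumed
// for illustration, not measured.
package main

import "fmt"

func main() {
	const (
		cacheHitRate = 0.90  // assumption: how often the pre-compiled cache answers
		cacheCost    = 0.002 // ms, assumption: pre-compiled in-memory answer
		localMiss    = 0.05  // ms, assumption: in-process embedded database lookup
		remoteMiss   = 0.5   // ms, assumption: network hop plus lock contention, no disk
	)
	local := cacheHitRate*cacheCost + (1-cacheHitRate)*localMiss
	remote := cacheHitRate*cacheCost + (1-cacheHitRate)*remoteMiss
	fmt.Printf("local database:  %.4f ms per query\n", local)  // ~0.0068 ms
	fmt.Printf("remote database: %.4f ms per query\n", remote) // ~0.0518 ms, roughly 7x worse
}
```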

Note that processing updates is not one of the bottlenecks. The time spent processing updates is about one thousandth of the time spent handling requests (or less), so it has no impact on overall performance unless it is very badly implemented. Having more updates to send as a result of the 1-1 configuration is not going to outweigh the benefits.

Database replication

The third cause for excitement is the possibility of using database replication. It conjures up a vision of nameservers magically staying up to date across a whole cluster without anyone doing anything. After all, databases are supposed to be good at replication.

This is the easiest point to rebut. Database replication is general purpose and proprietary, whereas DNS already has an open, tried-and-tested, DNS-specific replication mechanism: AXFR and IXFR. Yes, I hear you say, but using database replication means that you don’t need to serialise the data into DNS and back again, translating between the DNS packet structure and the database structure.

Well, for a start, the database structure should be very close to the DNS packet structure; there is not much point in having it any other way. And even database replication has to do that serialisation at some point or another, so is anything actually gained by it?
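
By way of illustration only (the layout and names here are just one possible choice, not a finished design), “very close to the DNS packet structure” can mean keying each RRset by owner name and type and storing the records as bytes that are already in wire format, so that serving an answer is a lookup plus a copy, with no per-query serialisation:

```go
// Minimal sketch of a table that mirrors the DNS packet structure: each
// RRset is stored pre-packed in wire format, packed once at load or
// update time. Names and fields are illustrative assumptions.
package main

import "fmt"

type rrsetKey struct {
	owner string // canonical, lower-cased owner name
	rtype uint16 // e.g. 1 = A, 2 = NS
}

type rrset struct {
	ttl  uint32
	wire []byte // the records, already in DNS wire format
}

var table = map[rrsetKey]rrset{}

func store(owner string, rtype uint16, ttl uint32, wire []byte) {
	table[rrsetKey{owner, rtype}] = rrset{ttl, wire}
}

func lookup(owner string, rtype uint16) ([]byte, bool) {
	v, ok := table[rrsetKey{owner, rtype}]
	return v.wire, ok
}

func main() {
	// The wire bytes would be produced once, when the update arrives.
	store("example.com.", 2, 172800, []byte{ /* packed NS RRset */ })
	if wire, ok := lookup("example.com.", 2); ok {
		fmt.Printf("answer payload: %d bytes, copied into the response verbatim\n", len(wire))
	}
}
```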

The real killer, though, is the impact on caches. A sensible nameserver implementation will have multiple levels of data store: a small, fast, pre-compiled cache first in line to be checked, then the database, which will aim for an in-memory hit first and fall back to disk if all else fails. If you allow data to enter the nameserver directly at the database level then you need a mechanism for the database to signal to the cache that it may have invalid data, and then have the cache refresh itself.

In my view this is simply too complex and will just slow down performance. It is far easier to do away with it and have all updates come in through the front door, where the nameserver knows about them and can update its cache accordingly.
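
As a small illustration of the point (the types and names here are only a sketch, not a real implementation), when the front end is the only writer it can change the cache and the backing store in the same step, so no separate invalidation channel from the database is ever needed:

```go
// Sketch: the front end is the sole writer, so the pre-compiled cache and
// the embedded database can never disagree. Names are illustrative.
package main

import (
	"fmt"
	"sync"
)

type answer []byte

type store struct {
	mu    sync.RWMutex
	cache map[string]answer // small, fast, pre-compiled answers
	db    map[string]answer // stands in for the embedded database
}

// apply is only ever called from the front end (IXFR, AXFR, DDNS or the
// control process), never round the back.
func (s *store) apply(key string, a answer) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.db[key] = a
	delete(s.cache, key) // or re-compile the cached answer here; either way the writer does it
}

func (s *store) query(key string) (answer, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if a, ok := s.cache[key]; ok {
		return a, true
	}
	a, ok := s.db[key] // a real server would promote this hit into the cache
	return a, ok
}

func main() {
	s := &store{cache: map[string]answer{}, db: map[string]answer{}}
	s.apply("example.com./NS", answer("packed RRset bytes"))
	if _, ok := s.query("example.com./NS"); ok {
		fmt.Println("answered without any back-channel cache invalidation")
	}
}
```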

Ideal nameserver design

So putting this all together gives me an ideal design for the database back end in a nameserver. As you’ve probably guessed, the big emphasis for me has been on performance, but that’s because adding a database is bound to be slower than an in-memory representation, so everything possible needs to be done to compensate for that.

In this design, then, the database is a lightweight embedded database inside every single instance of the nameserver. It is stripped of all unnecessary features, and the data tables and indexes are optimised solely for DNS performance.

All access to this database is through the front end: IXFR, AXFR, DDNS or a control process. No round-the-back access to the database is possible; it is hermetically sealed.

You may even be lucky and find that a database used like this obviates the need for a separate cache.
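
Sketched as an interface (the names are mine and purely illustrative), the shape argued for above is a read surface that does nothing but answer lookups, and a write surface driven exclusively by the protocols the front end already speaks:

```go
// Illustrative interfaces only: every nameserver instance embeds its own
// stripped-down store, and data can only change via the front-door paths.
package main

// ZoneStore is the read surface of the embedded, hermetically sealed
// database: no network listener of its own, no SQL, just what query
// answering needs.
type ZoneStore interface {
	Lookup(owner string, rtype uint16) (wire []byte, found bool)
	Serial() uint32
}

// ZoneUpdater is the only write surface, driven solely by the front end.
type ZoneUpdater interface {
	ApplyAXFR(fullZone [][]byte) error                 // complete zone replace
	ApplyIXFR(deltas [][]byte, newSerial uint32) error // incremental changes
	ApplyDDNS(update []byte) error                     // dynamic update
}

func main() {} // interfaces only; a real server implements both behind one front end
```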

3 Responses

  1. jason Says:

    Hi Jay,

    Really interesting posting there. I wonder, though: at what size of zone does it become more performant to do some sort of partitioning of the data?

    What I mean is, if a nameserver actually becomes a constellation of nameservers, each serving a portion of the HUGE zone, then the whole zone can once more reside in memory.

    Of course, you pay a penalty in terms of having to properly direct the traffic, but this partitioning trick is what large web presences do to scale, so it may prove more performant than even your preferred method – you will still suffer disk I/O for some requests, while partitioning could eliminate that completely.

  2. davidb Says:

    Just FYI, .com still fits into memory. You just need more memory. And 64 bits. Actually, it is a long way from needing really expensive memory, at least these days.

  3. Prune Says:

    Hi,

    So what is your final solution / implementation?

    For now, I have a replicated LDAP server with everything inside, including domain ownership, modification rights and so on, which is why a “database” (or a directory) is a good back end for data storage.
    Then a script on each DNS node generates the appropriate files.
    The data is in the “database”, giving ease of management, and production is in flat files, giving reliability on each DNS server node.
    One thing is that every node can be a “master”, so the DNS servers do not need to transfer data to each other. They just get updated once in a while over a secured link.

    Of course, this does not solve the problem of a single zone file that is too big…
