random technical thoughts from the Nominet technical team

Stub resolver server selection algorithms

1 Star2 Stars3 Stars4 Stars5 Stars (2 votes, average: 4 out of 5)
Loading ... Loading ...
Posted by alexd on Jul 20th, 2007

Having got the basic querying functionality and configuration sorted in dnsruby, I’m now implementing the high level lookup. I had hoped there would be an RFC to guide me here, but the closest I can find is in RFC1034, section 5.3.3, which is really for nameservers rather than stub resolvers. It says :

1. See if the answer is in local information, and if so return it to the client.

2. Find the best servers to ask.

3. Send them queries until one returns a response.

4. Analyze the response, either:

a. if the response answers the question or contains a name error, cache the data as well as returning it back to the client.

b. if the response contains a better delegation to other servers, cache the delegation information, and go to step 2.

c. if the response shows a CNAME and that is not the answer itself, cache the CNAME, change the SNAME to the canonical name in the CNAME RR and go to step 1.

d. if the response shows a servers failure or other bizarre contents, delete the server from the SLIST and go back to step 3.

In order to figure out the exact algorithm for dnsruby, I thought I’d look at some current open source stub resolver implementations, in dnsjava and Net::DNS / pnet-dns. I also thought I’d check out Ruby’s native resolv.rb (incomplete) DNS implementation.

dnsjava

dnsjava has several layers which perform lookups. The SimpleResolver targets a single nameserver and sends queries to it. No retries are performed in the case of timeouts or other errors.

The ExtendedResolver class has a clutch of SingleResolvers which it uses to fire queries at multiple nameservers. The class implements the retry mechanism in the case of failure. The basic algorithm is as follows :

  • Send the query to the first nameserver (this could be first in a list, or a randomly selected nameserver from the list)
  • If the response is good, or an NXDOMAIN is returned, then return it to the client.
  • If it is a timeout, then retry that server, but also, if it is the first timeout for that server, send the first query to the next server on the list. If it’s not the first timeout, then simply try that nameserver again up to the maximum number of retries
  • If an error is returned as the last response from the last nameserver, then send that to the client

The Lookup class adds another layer of complexity. This class implements a results cache, and performs lookups on names with the search list applied. When a query is made, the cache is checked first. If no records are found, then the ExtendedResolver which belongs to the Lookup is used to query all configured nameservers for the name (with the previous algorithm). The results are then added to the cache. CNAMEs are followed by the Lookup class.

Dnsjava can therefore be seen to follow what specification there is in the RFC.

In the case of totally unresponsive servers, this algorithm may take a total time of (retries*timeout + num_nameservers * timeout). However, if using the dnsjnio asynchronous extension, multiple retrying queries can be run concurrently in a single thread (plus a worker thread to do the I/O).

Net::DNS

The Net::DNS project has a different model. The Resolver class implements both the SingleResolver and ExtendedResolver functionality of dnsjava. It does not implement any form of result cache, and query forms exist to either append or not append the search list to query candidates.

The algorithm, the implementation of which is horrendously long and complicated, is as follows :

  • If the search list is to be applied, then for each element in the list, that element is appended to the query name, and the remaining steps are run for each result. Otherwise, the remaining steps are only run for the queried name
  • retry Retries of the following steps are attempted, and for each round of retries, retrans is doubled
  • If udp_timeout is more than 0, and udp_timeout seconds have elapsed since the query was started, then the query times out to the client
  • The first nameserver is queried, and the response awaited for retrans/num_nameservers seconds.
  • If NXDomain or the result is returned, then that is passed back to the caller
  • If the server returns any other error, then it is removed from the list of nameservers for this query
  • If the query timed out, then the next nameserver is tried.
  • Once all nameservers have been tried once, and all timed out, then this process is repeated up to retry times (see above)
  • If still no response after all of this, then the query times out to the client

If udp_timeout is set, then this is the maximum time that the query can take.

If udp_timeout is set to 0, then the retrans and retry parameters control the query time. In this case, the query could take up to ((2^retry) * retrans) seconds. [It could take an additional num_nameservers * retrans seconds if the nameserver is totally duff]

Ruby resolv.rb

The original Ruby DNS implemenation, resolv.rb, has a slightly different algorithm. This implementation only performs retried lookups, and will apply the searchlist. The algorithm is as follows :

  • All nameservers are tried sequentially. If a valid response is returned then that it is returned to the client. If an error is returned, it is returned to the user and the search stopped
  • Once all nameservers have been tried, and all timed out, then they are all tried again with double the timeout value. This is tried three times (so all nameservers can be tried up to 4 times with increasing timeouts)
  • If this is not successful, then the whole process is repeated with the next “candidate name”, which is formed by appending the next element of the search list to the domain

There is no way to set a per-client-query timeout.
This algorithm is probably the worst of the three - at least with dnsjava, the retries are concurrent once the first round of queries have timed out. However, Net::DNS has the nicest algorithm; splitting the timeout per retry between the number of nameservers limits the worst case scenario, and gives all nameservers a chance to respond. However, if the initial retry interval is set too short by the client, then more queries could be sent than are actually required (especially if several nameservers are targetted).

I think I’m going to go with a combination of the Net::DNS and dnsjava approach (as with so much in this project!), and have the option of :

  • A total timeout for the query
  • A retransmission system that targets the namervers concurrently once the first query round is complete, but in which the total time per query round is split between the number of nameservers targetted for the first round. and total time for query round is doubled for each query round

This system should limit the total number of queries sent in the best case, but allow good response from a poor or worst case.

Of course, unlike Net::DNS, all these queries will be done in the background by a worker thread.

Any comments?

Leave a Comment

Please note: Comment moderation is enabled and may delay your comment. There is no need to resubmit your comment.

Recent Posts

Highest Rated

Categories

Archives

Meta: