random technical thoughts from the Nominet technical team

IPv4 Heat Maps

1 Star2 Stars3 Stars4 Stars5 Stars (5 votes, average: 5 out of 5)
Loading ... Loading ...
Posted by roy on Apr 1st, 2008

Last year, Duane Wessels published some maps of the IPv4 address space. These were static maps containing one dimensional IPv4 addresses plotted along a 12th order Hilbert curve, resulting in a two dimensional “heatmap”.

In these maps, consecutive networks are grouped nicely together. However, trying to orientate in the map is hard. This is due to the stringent locality preserving properties of hilbert curves. Since we only need to group networks together that share the same prefix, I looked for a curve that has better orientation at the cost of less locality preserving properties. A Morton curve, or Z-order curve does exactly that.

Every pixel represents a /24 network. To cover the entire IPv4 address space, the size of a map is 4096 x 4096 pixes. If this is plotted in three dimensions, the map would be 256 x 256 x 256 pixels.

I’ve created a proof of concept tool to view these three dimensional IPv4 maps. The input is a file with IPv4 addresses, one per line. There are many controls to orientate the cube, select address blocks, highlight problem parts, change focus, zooming, etc.

A few examples:
example 1
The 3d image above represents open resolvers on the internet. The more open resolvers per /24, the hotter the color.

example 2
The 3d image above shows a small detail of the internet. It shows the open resolvers in 64.0.0.0/8.

Though the data source for the proof of concept was a list of open resolvers, the uses for this vizualizing technique are numerous. The software is available here. Let me know if you’ve created any nice imagery with it.

I gave a presentation about this at the 70th IETF

Stub resolver server selection algorithms

1 Star2 Stars3 Stars4 Stars5 Stars (2 votes, average: 4 out of 5)
Loading ... Loading ...
Posted by alexd on Jul 20th, 2007

Having got the basic querying functionality and configuration sorted in dnsruby, I’m now implementing the high level lookup. I had hoped there would be an RFC to guide me here, but the closest I can find is in RFC1034, section 5.3.3, which is really for nameservers rather than stub resolvers. It says :

1. See if the answer is in local information, and if so return it to the client.

2. Find the best servers to ask.

3. Send them queries until one returns a response.

4. Analyze the response, either:

a. if the response answers the question or contains a name error, cache the data as well as returning it back to the client.

b. if the response contains a better delegation to other servers, cache the delegation information, and go to step 2.

c. if the response shows a CNAME and that is not the answer itself, cache the CNAME, change the SNAME to the canonical name in the CNAME RR and go to step 1.

d. if the response shows a servers failure or other bizarre contents, delete the server from the SLIST and go back to step 3.

In order to figure out the exact algorithm for dnsruby, I thought I’d look at some current open source stub resolver implementations, in dnsjava and Net::DNS / pnet-dns. I also thought I’d check out Ruby’s native resolv.rb (incomplete) DNS implementation.

dnsjava

dnsjava has several layers which perform lookups. The SimpleResolver targets a single nameserver and sends queries to it. No retries are performed in the case of timeouts or other errors.

The ExtendedResolver class has a clutch of SingleResolvers which it uses to fire queries at multiple nameservers. The class implements the retry mechanism in the case of failure. The basic algorithm is as follows :

  • Send the query to the first nameserver (this could be first in a list, or a randomly selected nameserver from the list)
  • If the response is good, or an NXDOMAIN is returned, then return it to the client.
  • If it is a timeout, then retry that server, but also, if it is the first timeout for that server, send the first query to the next server on the list. If it’s not the first timeout, then simply try that nameserver again up to the maximum number of retries
  • If an error is returned as the last response from the last nameserver, then send that to the client

The Lookup class adds another layer of complexity. This class implements a results cache, and performs lookups on names with the search list applied. When a query is made, the cache is checked first. If no records are found, then the ExtendedResolver which belongs to the Lookup is used to query all configured nameservers for the name (with the previous algorithm). The results are then added to the cache. CNAMEs are followed by the Lookup class.

Dnsjava can therefore be seen to follow what specification there is in the RFC.

In the case of totally unresponsive servers, this algorithm may take a total time of (retries*timeout + num_nameservers * timeout). However, if using the dnsjnio asynchronous extension, multiple retrying queries can be run concurrently in a single thread (plus a worker thread to do the I/O).

Net::DNS

The Net::DNS project has a different model. The Resolver class implements both the SingleResolver and ExtendedResolver functionality of dnsjava. It does not implement any form of result cache, and query forms exist to either append or not append the search list to query candidates.

The algorithm, the implementation of which is horrendously long and complicated, is as follows :

  • If the search list is to be applied, then for each element in the list, that element is appended to the query name, and the remaining steps are run for each result. Otherwise, the remaining steps are only run for the queried name
  • retry Retries of the following steps are attempted, and for each round of retries, retrans is doubled
  • If udp_timeout is more than 0, and udp_timeout seconds have elapsed since the query was started, then the query times out to the client
  • The first nameserver is queried, and the response awaited for retrans/num_nameservers seconds.
  • If NXDomain or the result is returned, then that is passed back to the caller
  • If the server returns any other error, then it is removed from the list of nameservers for this query
  • If the query timed out, then the next nameserver is tried.
  • Once all nameservers have been tried once, and all timed out, then this process is repeated up to retry times (see above)
  • If still no response after all of this, then the query times out to the client

If udp_timeout is set, then this is the maximum time that the query can take.

If udp_timeout is set to 0, then the retrans and retry parameters control the query time. In this case, the query could take up to ((2^retry) * retrans) seconds. [It could take an additional num_nameservers * retrans seconds if the nameserver is totally duff]

Ruby resolv.rb

The original Ruby DNS implemenation, resolv.rb, has a slightly different algorithm. This implementation only performs retried lookups, and will apply the searchlist. The algorithm is as follows :

  • All nameservers are tried sequentially. If a valid response is returned then that it is returned to the client. If an error is returned, it is returned to the user and the search stopped
  • Once all nameservers have been tried, and all timed out, then they are all tried again with double the timeout value. This is tried three times (so all nameservers can be tried up to 4 times with increasing timeouts)
  • If this is not successful, then the whole process is repeated with the next “candidate name”, which is formed by appending the next element of the search list to the domain

There is no way to set a per-client-query timeout.
This algorithm is probably the worst of the three - at least with dnsjava, the retries are concurrent once the first round of queries have timed out. However, Net::DNS has the nicest algorithm; splitting the timeout per retry between the number of nameservers limits the worst case scenario, and gives all nameservers a chance to respond. However, if the initial retry interval is set too short by the client, then more queries could be sent than are actually required (especially if several nameservers are targetted).

I think I’m going to go with a combination of the Net::DNS and dnsjava approach (as with so much in this project!), and have the option of :

  • A total timeout for the query
  • A retransmission system that targets the namervers concurrently once the first query round is complete, but in which the total time per query round is split between the number of nameservers targetted for the first round. and total time for query round is doubled for each query round

This system should limit the total number of queries sent in the best case, but allow good response from a poor or worst case.

Of course, unlike Net::DNS, all these queries will be done in the background by a worker thread.

Any comments?

Ruby select

1 Star2 Stars3 Stars4 Stars5 Stars (2 votes, average: 4.5 out of 5)
Loading ... Loading ...
Posted by alexd on Jul 13th, 2007

I’ve been having some issues with Ruby IO#select() recently, which I think I now have my head around.

My Java non-blocking DNS project, dnsjnio, has a select loop in one thread, which processes all of the IO for all DNS queries. The client thread can wait on a blocking queue for more responses from the select thread. I had hoped to follow this model for Dnsruby. However, there are some differences between the select implementation in Ruby and Java which make things a little more tricky.

In Java, there is a Selector.wakeup() method which a client thread can call to rouse the select thread from its slumber. This means that you can send something (or ask the select thread to send something), and then wake the select thread up to deal with it. Otherwise, nothing would happen until the select() call timed out, or something happened on one of the descriptors it was monitoring - which could be a long time on a quiet system.

Ruby doesn’t have this facililty. So once a thread calls select(), it’s not going to do anything until either the timeout or a descriptor event occurs.

I read a comment by Dave Thomas in which he suggested waking the select thread up with an Exception. Well, I tried that, and it does work, but it doesn’t seem happy under JRuby. OK, it is possible to set up simple test code to get this working in JRuby. However, when you start hitting it hard it just doesn’t seem to work right, and I haven’t had the time to dig around in JRuby to figure out why.

UPDATE - I think this probably has something to do with it. Sounds like JRuby is having problems with Thread.critical=, which was getting used heavily in my multi-threaded stress testing code!

It would be nice to try to get the exception signalling working again in the future - the code is all still there. However, it is necessary to use excessive synchronisation between the threads in order to ensure that the exception is only fired at the time the select call is blocking. Basically, all the processing of the results of select has to be done in a synchronized block, guarded by the same Mutex that guards the exception raising.

Another possibility is to use pipes to signal the select that it is time to wakeup. Whilst this is a hack, it was still worth trying. The problem with it is that (once again) the behaviour or Ruby varies across platforms. Pipes aren’t implemented in Windows MRI, so we can’t use them.

This leaves the only option being to run select with a short timeout. Of course, this leaves us with a thread effectively in polling mode, which is not good. So, once the thread has not serviced any queries for a defined period of time (currently 1 second), it shuts itself down, to be restarted the next time a client query is sent. This leaves us with a worst case not much worse than the standard resolv.rb, and a best case significantly better!

It would be nice if there a non-hacky way to do this. If anyone knows of one, I’d love to hear it!!

Flow control with token bucket algorithm

1 Star2 Stars3 Stars4 Stars5 Stars (4 votes, average: 4 out of 5)
Loading ... Loading ...
Posted by jay on May 27th, 2005

We have a number of applications where we have to limit the access of users, otherwise the systems would collapse under the strain of excessive use. For that we rely heavily on the excellent token bucket algorithm. This has three different ways (at least) in which it can be used.

Rate limiting

The most basic use of token bucket is to rate limit a client to a number of operations within a specific time period. For example you could use it to limit the number of whois lookups to 60 per 60 seconds for a particular client. The way to do this is to assign a bucket of 60 tokens to each client when it makes its initial request. Each time it makes a request a token is removed from the bucket. If there are no tokens left then the request is not carried out and so the client is effectively blocked. Every 60 seconds the bucket is refilled to a maximum of 60 tokens and so request can once again be processed.

Rolling window

There are times when you might not be worried about the rate but you do want to limit a client to a maximum number of operations within a rolling time window. For this example lets say you want to limit a client to 10,000 operations per rolling 24 hours and you are going to recalculate every hour. You start by giving each client the maximum 10,000 tokens when it first connects. Once again a token is removed each time a request is serviced and no tokens means no service. Every hour the total number of requests over the last hour is calculated and stored.

The algorithm to refill the bucket is a bit more complex. The total the number of requests received over the last 23 hours (the 23 is the key) is calculated and this total is then subtracted from 10,000, which leaves the maximum remaining requests that could be made over the next hour. So the number of tokens in the bucket is then increased (or decreased) to that maximum.

Just to make sure you get this, assume you want to limit a client to Q queries over a rolling window of W time and you are going to recalculate every P time units. Then every P you refill the bucket to be equal to Q - (Total queries for previous period of (W - P)).

Rolling window and rate limiting

Combining the two is pretty simple if you want to limit a client to a certain number of queries within a rolling window time period plus limiting the maximum rate at which they can carry out operations. Say for example you wanted to allow a maximum of 10,000 operations per rolling 24 hours, with a maximum rate of 1,000 per hour and you are going to recalculate every hour. This time you give each new client a bucket of 1,000 tokens. Once again a token is removed each time a request is serviced and no tokens means no service. Again, every hour the total number of requests over the last hour is calculated and stored.

The algorithm to refill the bucket works like this. Again the total number of requests over the last 23 hours is calculated and again this is substracted from 10,000. This time though the bucket is filled to either this figure if it is less than 1,000 or to 1,000 otherwise. This effectively limits the rate to a maximum of 1,000 per hour.

Of course in the last two examples if someone does exceed their limit then they could be blocked for up to an hour, which may be far too long for your application. The processing requirements obviously increase as you make the recalculation window shorter.

Recent Posts

Highest Rated

Categories

Archives

Meta: