Those who monitor the availability of .uk nameservers through RIPE NCC’s DNSMON service will have noticed that ns6.nic.uk was offline for over 24 hours last week. This was a much longer outage than I would have liked, but it has highlighted some interesting issues.
We manage seven of our nameservers directly. Since a rationalisation effort in 2002 they have all been hosted in other organisations’ networks, usually IXPs or ISPs, and are subject to contracts with specific SLAs. (Prior to this, .uk nameservers were hosted by volunteers on a ‘best effort’ basis.) One nameserver, ns6.nic.uk, is different. It is hosted by NikHef in Amsterdam. When we were offered the chance to have a nameserver in Amsterdam we were very keen to take it. The hosting arrangement did not include an SLA and was not subject to contract. It was in effect a gentleman’s agreement. We paid a modest sum for rack space and connectivity, but we had no claim over NikHef regarding the availability of the service.
The base level of DNS traffic to the .uk nameservers is pretty low, at about 15 Mb/s in total across all seven that we manage and monitor. However, we do see an increasing number of traffic spikes, with peaks of 400-500% of this base level. During these spikes we see a very large number of queries for non-existent MX records coming from a great number of machines. My assumption is that this is backscatter from a spam storm, with the originating IPs being part of a botnet. I have no proof of this, however.
There was a major spike on Thursday 25 September, with total traffic levels peaking at over 70 Mb/s. Though six of the nameservers dealt with this surge with no obvious problems, it had a large impact on ns6.nic.uk. Because NikHef is not a supplier of network services, the .uk nameserver is hosted within their own network. The surge overwhelmed their gateway router, and within an hour they had taken down our connection. I cannot blame them for this, and I would probably sacrifice a ‘guest’ service in similar circumstances. They tried to bring it back with severe rate limiting, but it was still largely unreachable. I began to think we would have a very long outage, and even that we would have to relocate the whole nameserver system.
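To put those figures side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above (a base of about 15 Mb/s, typical peaks of 400-500% of base, and a spike of over 70 Mb/s); the variable and function names are made up for illustration, and 70 Mb/s is treated as a lower bound on the real peak.

```python
# Figures quoted in the post; all values are approximate.
BASE_MBPS = 15.0    # base load across the seven nameservers
SPIKE_MBPS = 70.0   # 25 September peak ("over 70 Mb/s", so a lower bound)


def percent_of_base(rate_mbps, base_mbps=BASE_MBPS):
    """Express a traffic rate as a percentage of the base level."""
    return 100.0 * rate_mbps / base_mbps


# "Peaks of 400-500% of this base level", converted to Mb/s.
typical_peak_range = (4.0 * BASE_MBPS, 5.0 * BASE_MBPS)

# The 25 September spike as a percentage of base.
spike_percent = percent_of_base(SPIKE_MBPS)

print("Typical peaks: %.0f-%.0f Mb/s" % typical_peak_range)
print("25 September spike: at least %.0f%% of base" % spike_percent)
```

On these numbers the spike sat towards the top of the range we already see, rather than far outside it, which suggests the limiting factor was the capacity in front of ns6.nic.uk rather than an unprecedented load.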
I moaned about the situation to anyone who would listen, and it was my friend Will Hargrave who suggested I buy alternative transit from Goscomb Technologies. They are handily placed as Dan Goscomb is presently based in Amsterdam and has a router in the same datacentre. Dan was able to get ns6.nic.uk back online within 3-4 hours of me speaking to him. The biggest effort required on our side was deleting and recreating the RIPE route-object, as this involved several different maintainers working in concert.
So, we now have all our nameservers back online. The main lesson from this is that we should ensure that all services we take are properly protected by contract. This has been the way we work for many years, but we need to review the arrangements we have, as this one clearly fell through the net. As it stands, we still have no contract with NikHef regarding the rack space we occupy. Arrangements like this are just not good enough for a ccTLD registry.