random technical thoughts from the Nominet technical team

Name Server Control Protocol

Posted by stephen on Jun 27th, 2008

Background
In many ways, name servers are standardised: the format of queries and responses is defined by standards, as are the ways of transferring zone information into and out of them (zone files, dynamic updates, AXFR, IXFR). This is not the case for the commands and files used to configure and control them, which are specific to each type of server.

Having a common means of interacting with servers would stimulate the development of a common management client, so simplifying operations. Although particularly benefiting users of multiple name servers, good management software should make it easier for occasional users to securely configure and manage their systems.

We have been investigating the idea of a common management interface, as have the IETF, who set up the DCOMA (DNS Configuration Management) committee to consider the problem. We have contributed to the DCOMA discussions, the result of which has been the publication of an Internet Draft containing the requirements for the system.

Implementation
Our approach has been to define a protocol (NSCP - Name Server Control Protocol) layered on top of NETCONF, an XML-based protocol for the configuration and control of network devices. In NETCONF, a data model is defined for a network device, with configuration commands being framed in terms of it. NSCP defines a generic object model for a name server, and extends the NETCONF command set with name server-specific ones.

Although the long-term aim is to get NSCP understood by name server implementations, a more pragmatic approach is to put the control into server-specific wrappers, so avoiding the need for changes to the server software. The wrapper accepts NSCP commands and, on the basis of them, modifies the configuration file (and zone files), and causes the server to reload the data. The way this operates is shown in the figure below:

[Figure: NSCP Message Processing]

The first step is to create an XML version of the server configuration file. Name server configuration files tend to map into XML quite well as they usually have a hierarchical structure. In our tests, this was accomplished for BIND and NSD by modifying the parser module to emit XML as the configuration file was processed. Although this required a modification to the server software, the modification is a small, localised, change and does not otherwise affect its operation.

Next, an XSL transformation is used to convert the server-specific XML into NSCP, a process that involves mapping server-specific objects and attributes into NSCP objects and attributes. A two-step approach is used to keep changes to the name server software to a minimum: the parser module only needs to create XML that is isomorphic to the configuration file, something that is relatively simple to do. The intelligence needed to convert that representation into NSCP is held in an (external) XSLT.

Once the NSCP representation of the configuration file is obtained, it can be manipulated with NETCONF commands. “Listing” commands (such as “get-config”) extract the relevant part of the configuration from the representation and send it back to the client. “Modification” commands (e.g. “edit-config”) are applied to the NSCP representation of the configuration file (again using an XSLT) to obtain NSCP describing the updated configuration: this is converted back into the configuration file format. As before, the conversion is a two-stage process to separate out the logic of the object model mapping from the mechanical process of creating the configuration file.

Once the configuration file has been updated, the wrapper forces the server to reload it to apply the changes.
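To make the round trip concrete, here is a minimal sketch of the two-stage conversion in Python. It is an illustration under stated assumptions, not the wrapper itself: the XML tag names, the generic model and the output format are all invented, and a plain lookup table stands in for the external XSLT because the Python standard library has no XSLT processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML emitted by a modified config parser: it mirrors the
# server's own configuration file one-to-one (all names are invented).
SERVER_XML = """
<config>
  <zone name="example.org">
    <type>master</type>
    <file>example.org.db</file>
  </zone>
</config>
"""

# Hypothetical mapping from server-specific tags to generic model tags.
# In the design described above this knowledge lives in an external XSLT.
TO_GENERIC = {"config": "nameserver", "zone": "zone",
              "type": "role", "file": "source"}
TO_SERVER = {generic: server for server, generic in TO_GENERIC.items()}


def remap(element, table):
    """Return a copy of the tree with every tag renamed through `table`."""
    copy = ET.Element(table[element.tag], dict(element.attrib))
    copy.text = element.text
    for child in element:
        copy.append(remap(child, table))
    return copy


def edit_zone_source(model, zone_name, new_source):
    """Stand-in for an edit-config style change applied to the generic model."""
    for zone in model.findall("zone"):
        if zone.get("name") == zone_name:
            zone.find("source").text = new_source


def render(server_tree):
    """Turn server-specific XML back into an invented BIND-like config text."""
    lines = []
    for zone in server_tree.findall("zone"):
        lines.append('zone "%s" {' % zone.get("name"))
        for option in zone:
            lines.append("    %s %s;" % (option.tag, option.text.strip()))
        lines.append("};")
    return "\n".join(lines)


# config XML -> generic model -> edit -> config XML -> config file text
generic = remap(ET.fromstring(SERVER_XML), TO_GENERIC)
edit_zone_source(generic, "example.org", "example.org.signed")
new_config = render(remap(generic, TO_SERVER))
print(new_config)
```

Keeping the renaming table separate from the rendering step mirrors the split described above: the object model mapping can change without touching the mechanical code that writes the file.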

Current Work
A small proof of concept project has shown that this approach is both feasible and practical. Effort is now being put into a pilot implementation.

40K signatures / second on FIPS 140-2 level 3 hardware.

Posted by roy on Jun 2nd, 2008

Vendors use different terminology to specify the performance of their Hardware Security Modules (HSMs). Common terms are transactions, exponentiations, encryptions or signatures per second, or microseconds per transaction, exponentiation, etc. Performance statistics that use different units cannot be compared. We’re trying to overcome that by using a common unit. This post elaborates further on a small application for performance measurement.

Performance depends on the algorithm and the size of the key. Mostly, 1024-bit RSA private key operations are used, but that is often not specified. Using units like “encryption” or “verification” is biased as well, as both are public key operations (and thus use small exponents), which are much faster than “decrypting/signing”. “Exponentiations” are sometimes used to amplify the statistics. For example, a 1024 bit RSA key implies 512 exponentiations for a single “transaction” (the performance numbers are blown up by a factor of 2^9 …. on paper).

Performance is only comparable when using the same standard measurement unit. Since most vendors use 1024 bit RSA key signatures per second (sig/sec), let’s use that for a performance specification conformance test (or… let’s check the marketing on the box).

For this test we’re using a Sun Fire T2000 with 3 SCA6000 cards. The technical specification promises “Up to 13,000 RSA operations per second with 1,024-bit keys”. All three combined should get a nice performance of about 39,000 RSA signatures/second…. in theory.

An often used method to measure performance is the OpenSSL speed test. However, it is not possible to specify keys that are located on the HSM. Also, an engine is needed to let OpenSSL use the PKCS11 interface. The well-known OpenSC PKCS11 engine assumes that keys are on the HSM, while the RSA speed test generates its own key, causing the speed test to fail. Sun’s PKCS11 engine is fully supported (thanks to Darren J. Moffat for pointing that out, see his comment below), but the patches for OpenSSL are not supported by Sun. Lastly, the OpenSSL speed test uses fork/wait/pipe (using the undocumented -multi and -elapsed options for proper timing), where we want to use threads (less overhead, no IPC). So it was time to write a small performance test application that uses native PKCS11 calls.

The result of that speed test is a whopping 39353 sig/sec for a 1024-bit RSA private key. This was verified independently by the unix time utility (for elapsed time) and the Solaris kstat utility (for actual hardware transactions).

Or….. signing 7 million records in less than 3 minutes.
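The arithmetic behind those two claims is easy to check. The few lines below only restate the figures quoted in this post (three cards, 13,000 sig/sec per card on the box, 39353 sig/sec measured, 7 million records); the variable names are mine.

```python
CARDS = 3
SPEC_PER_CARD = 13_000   # "up to" figure from the card's specification
MEASURED = 39_353        # sig/sec measured with all three cards
RECORDS = 7_000_000

theoretical = CARDS * SPEC_PER_CARD   # what the marketing promises
seconds = RECORDS / MEASURED          # time to sign every record once

print(f"theoretical: {theoretical} sig/sec")
print(f"7 million signatures: {seconds:.0f} s ({seconds / 60:.2f} min)")
```

That works out to just under 178 seconds, so comfortably inside the three minutes.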

hsm-speed implementation notes

Download the hsm-speed package.

Simply creating a loop in which data is signed might not get the desired performance. A single loop performed at about 1600 sig/sec, while the specification promised 13000 sig/sec per card. A single loop (one process thread) did not get enough exposure to fill the bus fast enough. Creating multiple processing threads seems the obvious answer, especially since the T2000 uses an UltraSPARC T1 processor with 32 simultaneous processing threads. The speed test was therefore made multithreaded (using pthreads for portability, not the Solaris native threads), and gets about 13200 sig/sec on a single card. Note that there is also the option to fork processes, which effectively causes multithreading per forked process. Since forking has more overhead than threading, and threading has more overhead than looping, a straightforward way to maximize performance is to increase the loop iterations until that adds no more speed, then increase the threads until that adds no more, then increase the forks.
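The tuning order described above (loops first, then threads) can be sketched as follows. This is a structural illustration only, not hsm-speed: the real tool uses pthreads and native PKCS11 calls, whereas here a SHA-256 hash stands in for the signing call, the function names and the 5% gain threshold are my own assumptions, and Python threads will not scale the way HSM-bound threads do. The fork level is left out.

```python
import hashlib
import threading
import time


def fake_sign(data):
    """Stand-in for a PKCS11 signing call; it only hashes the data."""
    return hashlib.sha256(data).digest()


def run(threads, iterations):
    """Run `threads` workers, each signing `iterations` times; return ops/sec."""
    def worker():
        for _ in range(iterations):
            fake_sign(b"record")

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    start = time.perf_counter()
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    elapsed = time.perf_counter() - start
    return threads * iterations / elapsed


def tune(measure, start=1, limit=16, gain=1.05):
    """Double a parameter until throughput stops improving by factor `gain`."""
    best_value, best_rate = start, measure(start)
    value = start * 2
    while value <= limit:
        rate = measure(value)
        if rate < best_rate * gain:
            break                      # no meaningful speed-up: stop here
        best_value, best_rate = value, rate
        value *= 2
    return best_value, best_rate


# Cheapest knob first: loop iterations, then threads.
iterations, _ = tune(lambda n: run(1, n), start=1000, limit=16000)
threads, rate = tune(lambda n: run(n, iterations))
print(f"{threads} threads x {iterations} iterations: {rate:.0f} ops/sec")
```

The point of `tune` is simply that each knob is turned until it stops paying for itself before the next, more expensive, knob is touched.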

Solaris Cryptographic Framework notes

The Solaris cryptographic framework allows different slot configurations. The “Metaslot” serves as a single virtual slot with the combined capabilities of all the tokens and slots that have been installed. The “Keystore” slot groups only the crypto hardware together. The order in which multiple calls to C_FindObjects return objects from the metaslot is the reverse of that of the keystore slot. Hence, a search for a key without specifying the object class will return the private key first on the metaslot, and the public key first on the keystore slot. In practice, when using the keystore slot, a C_SignInit that returns the error “CKR_KEY_TYPE_INCONSISTENT” might be the result of not having specified CKO_PRIVATE_KEY in the search template for C_FindObjectsInit().

Another problem encountered with the Metaslot configuration is that it has a bug in meta_release_slot_session, used by C_CloseAllSessions, causing a nasty segmentation fault when trying to close a certain number of idle sessions. This is circumvented by closing individual sessions one by one, though that is a tiny bit detrimental to the overall performance.

It is essential that the cards have the same firmware. Exporting the keystore information to another card requires the same firmware on both cards.

The PIN is a combination of the username and password, separated by a colon. When the password requirement for the SCA6000 is set to high, the password must be at least 8 characters long. However, the Solaris getpass() call (from stdlib.h) only returns the first 8 characters, so it leaves no room for the username to be specified. The GNU getpass() (libc) does not have this limitation. To circumvent this issue, use getpassphrase() on Solaris. Note that this function is not portable.

Notes on PKCS11

Threads that share a single session might interfere with each other between a C_SignInit and a C_Sign call, with unpredictable results. A thread-safe way of sharing sessions is to use mutex locks, but this significantly reduces the benefit of using threads. One way to avoid interference without mutex locks is to create one session per thread. Since separate sessions can safely interleave without interfering with each other, this is a very effective way to guarantee thread safety without locking.
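The one-session-per-thread pattern looks roughly like this. Again this is a sketch under assumptions rather than real PKCS11 code: `FakeSession` is a hypothetical stand-in whose `sign_init` and `sign` methods play the roles of C_SignInit and C_Sign, and the key names are invented.

```python
import threading


class FakeSession:
    """Hypothetical stand-in for a PKCS11 session with two-step signing."""

    def __init__(self):
        self.pending = None

    def sign_init(self, key):
        self.pending = key                      # plays the role of C_SignInit

    def sign(self, data):
        key, self.pending = self.pending, None  # plays the role of C_Sign
        if key is None:
            raise RuntimeError("sign called without sign_init")
        return (key, data)


local = threading.local()


def session_for_this_thread():
    """Open a session lazily, once per thread; it is never shared."""
    if not hasattr(local, "session"):
        local.session = FakeSession()
    return local.session


def worker(key, count, out):
    session = session_for_this_thread()
    for i in range(count):
        session.sign_init(key)   # no lock needed: nobody else has this session
        out.append(session.sign(i))


# One result list per thread, so the workers share no mutable state at all.
results = [[] for _ in range(4)]
threads = [threading.Thread(target=worker, args=("key-%d" % n, 1000, results[n]))
           for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print([len(r) for r in results])
```

Because the init and sign steps always happen on the calling thread's own session, no other thread can slip in between them, which is exactly the interference the shared-session case suffers from.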

Nameservers and very large zones

Posted by jay on Jun 2nd, 2008

There comes a point in a zone’s life when it gets too big to be held in memory. For TLDs we really only have .com that has reached this scale, but ENUM zones with this problem are numerous.

So if you want to run an authoritative nameserver for these zones you basically have these options:

  1. Use a DB plugin to an existing nameserver like BIND. However the performance from this is so poor that this is not a reasonable option in most scenarios.
  2. Buy an off-the-shelf nameserver like ANS from Nominum. However this is only a real option for the fantastically rich or those with such a controlled network they can make do with just two nameserver instances.
  3. Use a service provider that runs the zones for you and has their own database back end technology. We use UltraDNS who have this functionality.
  4. Write your own. Of course this is all that Nominum and UltraDNS have done, and more recently CommunityDNS, so how hard can it be?

Common misconceptions about a database back end

So how exactly do you go about designing the database back end for a nameserver? Well, in my opinion most people start back to front and continue that way.

Three things get most people excited about using a database as a back end. Each of them is quite wrong and can be dismissed in turn.

Use the main registration database

If you start using databases then why not have the nameserver run off the main registration database. That is already a fault tolerant cluster and it would mean that the nameserver was always instantly up to date.

This is madness for several reasons. First, no database has 100% uptime but a nameserver cluster should. To be clear, that does not mean every nameserver is available 100% of the time, but that at least one nameserver can be accessed 100% of the time.

Then there is the issue of zone file serial numbers. Are you going to update the serial number for every single update of the database (yes you have to)? What happens when things go wrong and need to be unwound? What happens if you need to restore from a backup?

There are also the performance issues from the way the data is stored. It is likely to be optimised for the registration system, not for the nameserver system that pulls data from it.

Finally this limits you to nameservers that are connected to the database by a reliable, fast and secure channel. Maybe possible in a single enterprise but not to be tried across the public Internet.

In essence running it this way is just too brittle and should not be considered. To be honest I don’t think any experienced people would think of that, but I wanted to make sure I covered all the options.

Multi-headed nameserver

The second cause for excitement is the possibility of a multi-headed nameserver. In other words, lots of front ends all dealing with the same database back end (which we assume is separate from the source database). The reasoning for this is that databases are large, expensive beasts, optimised for handling data requests from many clients, whereas the front ends are much lighter, simpler beasts that are optimised around network processing. So putting the two together seems to be a natural fit.

It also means that updates are only processed by one machine, the database, not directly impacting the others. Fewer updates need to be sent and there is less chance of inconsistency of the data.

In order to see why this is wrong (in most cases) we need to think about what kind of database we really need for a nameserver. It turns out that we don’t actually need most of the features found in a modern RDBMS. For example we don’t need views, stored procedures, pluggable indexes etc. All we need is a simple, fast and efficient database, which doesn’t require hardware of a different nature to the front end.

The next point is to examine where the bottlenecks of performance are in that setup. The most obvious one is whenever the front end needs to ask the database for the data rather than use its cache. It starts with the network transfer then the contention for the lookup, it may have to go to disk, then the network transfer between the front end and back end. I would contend that with a multi-headed setup these bottlenecks are much worse than a simple 1-1 configuration of one front end and one database both on the same machine.

Note that the processing of updates is not one of the bottlenecks. The time spent processing updates is about one thousandth of the time spent handling requests (or less) and so has no impact on the overall performance unless it is very badly implemented. Having more updates as a result of the 1-1 config is not going to outweigh the benefits.

Database replication

The third cause for excitement is the possibility of using database replication. It brings up a vision of nameservers magically being up to date across a whole cluster without anyone doing anything. After all databases are supposed to be good at replication.

This is the easiest point to rebut. Database replication is general purpose and proprietary, whereas DNS already has an open standard, tried and tested, DNS-specific database replication mechanism - AXFR and IXFR. Yes, I hear you say, but using database replication means that you don’t need to serialise the data into DNS and back again and translate between DNS packet structure and the database structure.

Well for a start, the database structure should be very close to the DNS packet structure; there is not much point in having it any other way. Then of course even database replication has to do the serialisation at some point or another, so is there anything actually gained by it?

The real killer though is the impact on caches. A sensible nameserver implementation will have multiple levels of data store, with a small, fast, pre-compiled cache first in line to check, then the database next which will aim for an in-memory hit first and finally a disk hit if all else fails. If you allow data to enter the nameserver directly at the database level then you need a mechanism for the database to signal the cache that it may have invalid data, and then have the cache refresh itself.

In my view this is simply too complex and will just slow down performance. Far easier to do away with it and have all updates come in the front door, where the nameserver knows about them and so can update its cache accordingly.

Ideal nameserver design

So putting this all together gives me an ideal design for the database back end in a nameserver. As you’ve probably guessed, the big emphasis for me has been on performance, but that’s because adding a database is bound to be slower than an in-memory representation and so everything possible needs to be done to compensate for that.

In this design then the database is a lightweight embedded database in every single instance of the nameserver. It is stripped of all unnecessary features and the data tables and indexes are optimised solely for great DNS performance.

All access to this database is through the front end - either IXFR, AXFR, DDNS or a control process. No round-the-back access to the database is possible; it is hermetically sealed.

You may even be lucky and find a database used like this obviates the need for a separate cache.
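As a rough sketch of the "front door only" idea, the class below keeps an embedded store and a small cache behind a single write path. Everything here is an assumption for illustration: the class and method names are invented, a dict stands in for the embedded database, and real zone transfer or DDNS handling is reduced to one `apply_update` call.

```python
class SealedStore:
    """Embedded store whose only write path is the nameserver front end."""

    def __init__(self):
        self._db = {}      # stand-in for the embedded on-disk database
        self._cache = {}   # small, fast cache of recently answered names
        self.serial = 0

    def apply_update(self, name, rrtype, rdata):
        """Front-door write: IXFR, AXFR, DDNS or a control process ends up here."""
        key = (name.lower(), rrtype)
        if rdata is None:
            self._db.pop(key, None)     # a deletion
        else:
            self._db[key] = rdata
        self._cache.pop(key, None)      # the writer knows exactly what changed
        self.serial += 1

    def lookup(self, name, rrtype):
        """Answer from the cache if possible, otherwise from the database."""
        key = (name.lower(), rrtype)
        if key not in self._cache:
            self._cache[key] = self._db.get(key)
        return self._cache[key]


store = SealedStore()
store.apply_update("example.uk", "NS", "ns1.example.net.")
first = store.lookup("EXAMPLE.uk", "NS")
store.apply_update("example.uk", "NS", "ns2.example.net.")
second = store.lookup("example.uk", "NS")
print(first, second, store.serial)
```

Because there is no other way in, the cache never needs a signal from the database that its contents might be stale: the same code that writes the record also evicts it.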

Leopard and FileVault won’t work well with Time Machine

Posted by Al on Oct 31st, 2007

I was chatting to a colleague this morning, and it looks like Leopard’s Time Machine just won’t work with FileVault, judging by what happened when he tried it on his laptop. As Apple state on their Time Machine page in their marketing blurb: “Time Machine: a giant leap backward” …when working with FileVault.

Time Machine monitors your disk drive by checking for changed files on the hour and backing these up incrementally. FileVault works by encrypting and storing the entire contents of your Home folder in an encrypted disk image, then reading and writing to that, encrypting and decrypting on the fly.

Because of this, your home directory is essentially a single file as seen by Time Machine, so every time you make a change to your FileVault protected home directory, Time Machine tries to back up this whole disk image.

Now as a business user, I can see why FileVault would be used to protect sensitive business data on a laptop in a business environment, but really businesses should have a more robust backup solution in place already, rather than depending on a consumer grade solution such as Time Machine as their sole backup. Time Machine won’t work reliably across a network (unless to another Mac) anyway, which is what a lot of businesses will be doing backup-wise.

However as a home user, on my machine at home, I can see the benefits of Time Machine, and really running FileVault on my home directory would be pointless, as the amount of RAW image processing I do would seriously be hampered by encrypting/decrypting on the fly, and I have absolutely no need to encrypt my MP3s! At home, most of the document processing I do now is web based anyway, and short of a few applications and music/photos, I have precious little on my home hard drive that really needs encryption, but it would benefit from something like Time Machine for occasional file recovery or in case of component failure. At work, I use FileVault on my laptop, our source code repository for storing code and Lotus Notes for storing project related info, so I have no need for Time Machine, but FileVault on the other hand is very useful.

Now obviously my particular computer usage will work well with this situation, but for those who store more sensitive documents and want both encryption and Time Machine, another solution might well be needed.

The only workaround I can think of is to use the Disk Utility to create an encrypted AES-128 disk image. This is the same technology Apple uses for FileVault. Then, when you need it, mount the image, write files to it, and close it when done. Time Machine will back this up as usual, but as it is storing just the files you want encrypted, it should be a lot snappier, due to the much smaller file size. It’s not an ideal situation, but if someone had to use both encryption and Time Machine it might help.

_nicname SRV record

Posted by jay on Apr 17th, 2007

If you want to find out the WHOIS server for a particular TLD then in many cases you can do it with a simple DNS lookup. Just query for an SRV record for the domain _nicname._tcp.tld, like this:

~ jay$ dig +short _nicname._tcp.uk srv
0 0 43 whois.nic.uk.

The answer tells you that the WHOIS server for .uk is on port 43 (as it should be) of the server whois.nic.uk.
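For developers who want to build this in, handling the answer is straightforward. The sketch below only builds the query name and parses one line of SRV answer text as printed by dig above; it deliberately does no network lookup, and the function and class names are my own.

```python
from typing import NamedTuple


class Srv(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def srv_owner(tld):
    """Build the name to query for a TLD's WHOIS server."""
    return "_nicname._tcp.%s" % tld.strip(".").lower()


def parse_srv(line):
    """Parse one SRV answer in 'priority weight port target' form."""
    priority, weight, port, target = line.split()
    return Srv(int(priority), int(weight), int(port), target.rstrip("."))


record = parse_srv("0 0 43 whois.nic.uk.")
print(srv_owner(".uk"), "->", record.target, "port", record.port)
```

A real client would send the query with its resolver library of choice and fall back to another source when a TLD publishes no such record.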

Many other TLDs follow this convention including .au .at .dk .fr .de .hu .ie .li .lu .nl .no .re .si .se and .ch. This list has now expanded to include .us and .biz and other registries are actively considering it. Of course for gTLDs with distributed WHOIS services there may be some problems to be overcome.

However, I hope developers pick up on this and start building it into their code. At the moment most developers tend to use a service like whois-servers.net, but this mechanism means they can get the WHOIS server address directly from the registry and so users should get a better experience.

doc:// - is that too much to ask?

Posted by jay on Nov 21st, 2006

I’ve just invented a new protocol called docdav. Except it only exists as a vision of a future I’d like to see today and I haven’t actually got as far as the technology. Despite this obvious flaw I think it still has much to offer, so here goes:

The internet hasn’t really affected documents in the way it has many other things. We still think of documents as things we author locally, store on file systems in islands of information and share by email or, shudder, ftp. To be clear, what I mean by a document in this context is:

  • A file in a domain-specific/application-specific format, such as a spreadsheet or a mindmap; or
  • A file in a format that gives us full control over the layout of the information as well as the content, such as a word processing document.

But I don’t mean anything to do with the web as we currently use it. HTML/CSS is truly awful for layout and is not going to reach even word processor levels of sophistication for years. The one difference HTML/CSS does have is interactivity, but that’s nothing to do with documents, that’s GUI functionality.

So here’s what I want docdav to do:

  • I want browsers, embedded browsers and anything that currently understands http:// to be able to understand doc:// in the same way. For browsers that means retrieving and then rendering the document at that location.
  • I want a docdav server that holds a set of documents and provides the following functionality:
    • Cataloguing, so that I can see an index of all the documents on the server and details of them
    • Categorisation for all documents, with multiple levels and multiple categories
    • Indexing, so that I can search through all the documents on the server
    • Versioning so that I can retrieve old copies and see changes
    • Access control, so that I can specify exactly who can do what on this server
    • And one I’m not sure about - embedding. Some document formats do this, some don’t but this may not be the best place to try and fix that.
  • For all this functionality I want a simple set of conventions that determine how to use it. For example:
    • doc://domain.tld/ brings up the catalogue
    • doc://domain.tld/mycategory/ brings up a catalogue for that category
    • doc://domain.tld/?term=myterm for searching
    • doc://domain.tld/mydocument.doc retrieves a named MS Word document
    • doc://domain.tld/mydocument.doc?version=1.1 for a specific version of the document
    • and so on, with appropriate i18n considerations of course
  • Finally, I want the server to have a simple verb based protocol like HTTP, which deals with the basic CRUD operations and access control so that I can easily interact with the server
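Since docdav doesn’t exist yet, the following is nothing more than a sketch of how a server might route the example URIs in the list above. The `classify` function and the labels it returns are invented for illustration; only the URI shapes come from the list.

```python
from urllib.parse import parse_qs, urlsplit


def classify(uri):
    """Map a doc:// URI onto one of the conventions listed above."""
    parts = urlsplit(uri)
    if parts.scheme != "doc":
        raise ValueError("not a doc:// URI: %r" % uri)
    query = parse_qs(parts.query)
    if "term" in query:
        return ("search", query["term"][0])
    if parts.path in ("", "/"):
        return ("catalogue", None)
    if parts.path.endswith("/"):
        return ("category", parts.path.strip("/"))
    version = query.get("version", [None])[0]   # None means "latest"
    return ("document", (parts.path.lstrip("/"), version))


for uri in ("doc://domain.tld/",
            "doc://domain.tld/mycategory/",
            "doc://domain.tld/?term=myterm",
            "doc://domain.tld/mydocument.doc",
            "doc://domain.tld/mydocument.doc?version=1.1"):
    print(uri, "->", classify(uri))
```

The standard URL parser copes with the made-up scheme as it stands, which is a small hint that the conventions don’t need any new syntax.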

When I’ve got it then I’m going to abolish file systems for end users. We will finally be able to move from network document storage to Internet document storage.

Hopefully docdav will also make Lotus Notes and Sharepoint redundant (or they become docdav compliant). If I think about it, with a combination of email, iCal (caldav), docdav and something like XForms there is nothing I can’t do that I can do with proprietary solutions. Instead of sending someone an email with a document attached, I send a link to a doc:// URI. Okay, so I haven’t worked the XForms bit through properly.

In case you haven’t picked up on it, I’ve called it docdav because WebDAV is almost there. It could probably be the foundation for docdav in the same way that it is for caldav. So maybe this vision isn’t that far away.
