random technical thoughts from the Nominet technical team

Improving copy speeds between servers

Posted by andyh on Sep 30th, 2008

As an SA I frequently have to copy large files between servers. We have a good network here at Nominet and files normally get copied relatively quickly. When I am copying to remote sites, though, copy times start to increase (perfectly normal, as we have less bandwidth to remote sites). This isn’t normally a problem, but it has been bugging me recently. There is an answer to this - use the “-C” flag of scp, which compresses the data in transit. This won’t improve copy times for data that is already highly compressed, but it will improve things for uncompressed data.
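A rough way to see why “-C” only helps with uncompressed data is to run the same gzip-style compression over two payloads by hand. This is only a sketch: the file contents are stand-ins, and the host and paths in the final comment are made up.

```shell
# compressed_size FILE: print the size in bytes of FILE after gzip.
compressed_size() {
    gzip -c "$1" | wc -c | tr -d ' '
}

tmp=$(mktemp -d)
# 1 MB of zeros stands in for highly compressible data (logs, text dumps).
head -c 1000000 /dev/zero > "$tmp/compressible"
# 1 MB of random bytes stands in for data that is already compressed.
head -c 1000000 /dev/urandom > "$tmp/incompressible"

echo "compressible:   $(compressed_size "$tmp/compressible") bytes after gzip"
echo "incompressible: $(compressed_size "$tmp/incompressible") bytes after gzip"
rm -rf "$tmp"

# The copy itself would then look like this (hypothetical host and paths):
#   scp -C /data/big.dump user@remote.example.org:/data/
```

The first figure should come out tiny and the second roughly the size of the input, which is the difference you would see on the wire.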

Testing a Sun X4150 server

Posted by andyh on Aug 20th, 2008

We have a Sun X4150 server on trial here to see whether it performs as well as the equivalent HP server. The Sun servers claim to be more energy efficient, and as we have a CSR policy at work we really should be trying to reduce the power we use in our server room. The Sun server arrived with Solaris pre-configured, but we wanted to test running CentOS, so a quick PXE boot and kickstart later we had an OS installed. Puppet then configured the server to our standard build. All was going well so far - as you would expect, as they are just standard servers inside.

Configuring the lights out management network interface proved a bit trickier. This is done from the BIOS, but the LOM interface is not easy to spot. It is in the server menu and is called AST200 LAN Configuration - not obvious at all, but once you know where it is it’s not a problem. The eLOM can now be upgraded to iLOM, which is good, but why not just install iLOM on all new servers? Maybe this is an old loan server, but you would think that they would make sure it had the latest version of LOM installed.

On our HP servers we use hardware RAID, so our test server came with a StorageTek (Adaptec) RAID controller so that we were doing a like-for-like comparison. Initial volume configuration was easy, but monitoring the array proved a bit trickier. I was initially pointed to a GUI to do this. Whilst this looks very good and has the ability to send email alerts, it does not fit in very well with our monitoring system, which uses scripts run on the remote server. The answer came from the X4150 server downloads page. The download contains a linux.zip file which has a StorMan rpm inside it. Installing this gives access to the /usr/StorMan/arcconf command, which can be used to get the array status. (Initially this failed to run as it was looking for the file libstdc++.so.5. This is provided by the StorMan rpm in /usr/StorMan, but I just installed the compat-libstdc++-33 rpm instead.) A wrapper script around this command means we can monitor the array remotely.
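A wrapper of that kind might look roughly like the sketch below. It is not the script we run: the arcconf subcommand in the comment and the “Status of logical device” line it matches are assumptions about the tool's output, and the exit codes simply follow the usual Nagios-style convention.

```shell
# check_array: read arcconf output on stdin and report Nagios-style:
# 0 = OK, 2 = CRITICAL, 3 = UNKNOWN (no status line found).
check_array() {
    # Assumed line format: "Status of logical device : Optimal"
    status=$(awk -F': *' '/Status of logical device/ { print $2; exit }')
    case "$status" in
        Optimal) echo "OK: array is $status"; return 0 ;;
        "")      echo "UNKNOWN: no logical device status found"; return 3 ;;
        *)       echo "CRITICAL: array is $status"; return 2 ;;
    esac
}

# On the server this would be fed from the real tool (subcommand assumed):
#   /usr/StorMan/arcconf getconfig 1 ld | check_array
# A canned sample is used here so the sketch runs anywhere.
printf 'Logical device number 0\n   Status of logical device   : Optimal\n' | check_array
```

Keeping the parsing separate from the call to arcconf makes it easy to test the logic with canned output before pointing it at a live controller.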

This server does everything that our current servers do and the hardware can be monitored correctly.

Avocent Mergepoint - creating a new SSL Certificate and allowing SSH public key logins

Posted by andyh on Jun 25th, 2008

We have just purchased a 40 port Avocent Mergepoint device for remote server console access management. This saves us using up valuable switch ports and separates these devices from the main network, which results in a much more secure console access LAN. This device can be managed through Avocent’s DSView3 software, but we are currently running it as a standalone device for testing. The DSView software will also manage their ACS console servers, giving us a single solution for console access whether a server’s console is reached over the network or over serial.

It is basically a switch that can run DHCP on its ports, with (Linux, flash-based) software to access and configure everything. It all sounded great, so we deployed it out into the field for further testing at a site that was running out of switch ports. At under £3k it is probably cheaper than an enterprise level switch to manage these devices. Using it we connected up 21 remote servers and freed up 20 valuable switch ports. It has dual power, redundant network connections and a serial port for when all that fails.

All well and good so far. Next thing was to configure the web interface and create a new SSL certificate signed by our Nominet CA. This is where it all started to go wrong. The manual linked to on the Avocent website is wrong in so many ways. Firstly the web interface is completely different. Our Mergepoint came with firmware version 4, but the manuals (linked to from the product page) seem to be a previous version. I can cope with a different GUI, but the instructions for creating the certificate used the command line - and were wrong. They said to use

openssl req -new -nodes -keyout private.key -out public.csr

but of course you also need a config file, so the command should be

openssl req -new -nodes -keyout private.key -out public.csr -config /path/to/openssl.conf

with openssl.conf containing (for example) this:

[ req ]
default_bits            = 1024
default_keyfile         = privkey.pem
distinguished_name     = req_distinguished_name

[ req_distinguished_name ]
countryName                     = Country Name (2 letter code)
countryName_default             = GB

stateOrProvinceName             = State or Province Name (full name)
stateOrProvinceName_default     = Oxfordshire

localityName                    = Locality Name (eg, city)
localityName_default            = Oxford

0.organizationName              = Organization Name (eg, company)
0.organizationName_default      = Nominet

organizationalUnitName          = Organizational Unit Name (eg, section)
organizationalUnitName_default  = Tech

emailAddress                    = Email Address
emailAddress_default            = example@nominet.org.uk

commonName                      = Common Name (eg, YOUR name)
commonName_default              = MergePointDeviceName
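
With a config like the one above the request can also be generated without answering any prompts, and inspected before it goes to the CA. The following is a sketch rather than what we ran on the device: it works in a scratch directory, uses a cut-down copy of the config, and asks for a 2048-bit key (my choice, since 1024-bit keys are considered weak these days).

```shell
# Work in a scratch directory so nothing real is overwritten.
work=$(mktemp -d)
cat > "$work/openssl.conf" <<'EOF'
[ req ]
default_bits       = 2048
distinguished_name = req_distinguished_name

[ req_distinguished_name ]
countryName                = Country Name (2 letter code)
countryName_default        = GB
0.organizationName         = Organization Name (eg, company)
0.organizationName_default = Nominet
commonName                 = Common Name (eg, YOUR name)
commonName_default         = MergePointDeviceName
EOF

# -batch accepts every *_default value instead of prompting.
openssl req -new -nodes -batch \
    -keyout "$work/private.key" -out "$work/public.csr" \
    -config "$work/openssl.conf" 2>/dev/null

# Check the request's self-signature and show the subject it carries.
openssl req -in "$work/public.csr" -noout -verify -subject
```

If the subject printed at the end is not what you expect, it is much easier to fix the config now than after the CA has signed the request.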

After that you can use the CSR to create a new SSL certificate. The manual says the certificate should go into /etc/httpd/conf/ssl.key (it actually says to use the command “cat cert.cert-/etc/httpd/conf/ssl.crt/server.crt” to do this. Does anyone ever proofread manuals these days?). This is wrong: the certificate and private key should actually go into /etc/httpd as server.crt and server.key.

Next you should restart apache. Again the manual is wrong and says to use “daemon.sh restart APACHE”. Wrong - that’s the command that you would have used on an ACS console server. The Mergepoint is much more like standard unix here and a simple

/etc/init.d/apache2 restart

or

apachectl restart

is all that is required. All well and good and your new certificate is now in place and working. However, this is a flash based linux so you’ll need to ensure that these new files get saved to flash or they will be lost at the next reboot. There’s the handy manual that tells you to use the saveconf command (correct for once), but it is incorrect in telling you that all files listed in /etc/config_files get backed up. There is no /etc/config_files file (there is one on an ACS console server which is obviously what the manual was based on). The actual file to edit is backup_list.txt. Add these lines to the end:

/etc/openssl.conf
/etc/httpd

Finally, if you want to add users to this device and allow ssh access via public key, then add /home to the /backup_list.txt file. The users must be added through the web interface, as this also updates a database allowing access to the web interface. Then add the users’ keys, update backup_list.txt and run saveconf. Optionally edit /etc/ssh/sshd_config - we remove root access and password access as we use non-root key based logins only.
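
Since the backup list gets edited more than once, it helps to append entries in a way that never produces duplicates. A minimal sketch follows; the function name is made up, and it is demonstrated against a scratch file rather than the real list on the device.

```shell
# add_to_backup_list LISTFILE PATH...
# Append each PATH to LISTFILE unless an identical line is already there.
add_to_backup_list() {
    listfile=$1
    shift
    for entry in "$@"; do
        grep -qxF -- "$entry" "$listfile" 2>/dev/null \
            || printf '%s\n' "$entry" >> "$listfile"
    done
}

# Demonstration on a scratch file. On the device LISTFILE would be the real
# backup_list.txt, and the edit would be followed by a run of saveconf.
scratch=$(mktemp)
add_to_backup_list "$scratch" /etc/openssl.conf /etc/httpd /home
add_to_backup_list "$scratch" /etc/httpd    # already present, so nothing is added
cat "$scratch"
```

Whether the grep on the device's flash-based Linux supports these options is something I have not checked, so treat this as something to try on a workstation copy of the file first.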

I have voiced my concerns about the poor quality manuals to Avocent so that no-one else has to try and reverse engineer things. They originally said that creating a new CSR was impossible, but have since provided a draft of how to do it which was still missing some of the points above (specifically about getting the files saved to flash). A new firmware version is due out in July and hopefully the manuals will be better this time. They still maintain that public key ssh access is impossible without using the DSView software.

It seems I have done something I have been trying to do for years and achieved the impossible.

Ruby segmentation fault on Solaris Sparc

Posted by jason on Jun 6th, 2008

Recently I’ve been running puppet a *lot* on Solaris 10 (Sparc). I would say consistently around 50% of the time the following happens:

bash-3.00# /usr/local/bin/puppetd -t
notice: Ignoring cache
/opt/ruby/lib/ruby/1.8/timeout.rb:52: [BUG] Segmentation fault
ruby 1.8.6 (2007-09-24) [sparc-solaris2.10]
Abort (core dumped)

Though this has been causing me to get very frustrated with puppet, and indeed has made the adoption of puppet within the organisation more problematic than it should have been, I realise this is unfair on puppet, as the problem seems to be with the ruby interpreter. We have not seen this type of behaviour with puppet on Linux, or indeed with the Solaris x86 port. Maybe ruby can’t handle the 32 hardware threads of our T2000.

I’m hoping with the release of a later version, either 1.8.7 or the developer release of 1.8.9 that these problems disappear. So a question:

Has anyone else experienced this type of segmentation fault with ruby?

40K signatures / second on fips 140-2 level 3 hardware.

Posted by roy on Jun 2nd, 2008

Vendors use different terminology to specify the performance of their Hardware Security Modules (HSMs). Regular terms are transactions, exponentiations, encryptions or signatures per second, or microseconds per transaction, exponentiation, etc. Performance statistics that use different units cannot be compared. We’re trying to overcome that by using a common unit. This post elaborates further on a small application for performance measurement.

Performance depends on the algorithm and the size of the key. Mostly, 1024-bit RSA private key operations are used, but that is often not specified. Using units like “encryption” or “verification” is biased as well, as both encryption and verification are public key operations (and thus use small exponents), which are much faster than decrypting or signing. “Exponentiations” is sometimes used as the unit to inflate the statistics. For example, a 1024 bit RSA key implies 512 exponentiations for a single “transaction” (the performance numbers are blown up by a factor of 2^9 …. on paper).

Performance is only comparable when using the same standard measurement unit. Since most vendors use 1024 bit RSA key signatures per second (sig/sec), let’s use that for a performance specification conformance test (or… let’s check the marketing on the box).

For this test we’re using a Sun Fire T2000 with 3 SCA6000 cards. The technical specification promises “Up to 13,000 RSA operations per second with 1,024-bit keys”. All three combined should get a nice performance of about 39,000 RSA signatures/second…. in theory.

An often used method to measure performance is the OpenSSL speed test. However, it is not possible to specify keys that are located on the HSM. Also, an engine is needed to let OpenSSL use the PKCS11 interface. The well known OpenSC PKCS11 engine assumes that keys are on the HSM, while the RSA speed test generates its own key, causing the speed test to fail. Sun’s PKCS11 engine is fully supported (thanks to Darren J. Moffat for pointing that out, see his comment below), but the patches for OpenSSL are not supported by Sun. Lastly, the OpenSSL speed test uses fork/wait/pipe (using the undocumented -multi and -elapsed for proper timing), where we want to use threads (less overhead, no IPC). So it was time to write a small performance test application that uses native PKCS11 calls.

The result of that speed test is a whopping 39353 sig/sec for a 1024-bit RSA private key. This was verified independently by the unix time utility (for elapsed time) and the Solaris kstat utility (for actual hardware transactions).

Or….. signing 7 million records in less than 3 minutes.
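
The arithmetic behind those two claims is easy to check. The figures below are the ones quoted above; only the variable names are mine.

```shell
cards=3
per_card=13000      # vendor figure: RSA operations/second per card
measured=39353      # sig/sec reported by the speed test
records=7000000

echo "theoretical ceiling: $((cards * per_card)) sig/sec"
seconds=$((records / measured))     # integer division, rounds down
echo "7 million records: about ${seconds}s ($((seconds / 60))m $((seconds % 60))s)"
```

That works out to 39000 sig/sec on paper and about 177 seconds (2m 57s) for 7 million signatures at the measured rate, so just under the 3 minutes.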

hsm-speed implementation notes

Download the hsm-speed package.

Simply creating a loop in which data is signed might not get the desired performance. A single loop performed at about 1600 sig/sec, while the specification promised 13000 sig/sec per card. A single loop (one processing thread) could not submit work fast enough to keep the bus full. Creating multiple processing threads seems the obvious answer, especially since the T2000 uses an UltraSPARC T1 processor with 32 simultaneous processing threads. The speed test was therefore made multithreaded (using pthreads for portability, not the Solaris native threads), and gets about 13200 sig/sec on a single card. Note that there is also the option to fork processes, which effectively gives multithreading per forked process. Since forking has more overhead than threading, and threading has more overhead than looping, a straightforward way to maximize performance is to increase the loop iterations until that adds no more speed, then increase the threads until that adds no more, then increase the forks.
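
The “increase it until it adds no more speed” step can be sketched as a small search over one setting at a time. Everything here is illustrative: measure is a made-up stand-in for a real benchmark run (its throughput curve is invented, not measured), and the 5% cut-off is an arbitrary choice.

```shell
# Hypothetical stand-in for a real benchmark run: prints sig/sec for N threads.
# The curve is invented: linear up to 8 threads, flat afterwards.
measure() {
    n=$1
    if [ "$n" -gt 8 ]; then
        n=8
    fi
    echo $((n * 1650))
}

# tune_up: keep doubling the setting until throughput gains less than 5%.
# Prints the last setting that still helped and the rate it achieved.
tune_up() {
    setting=1
    best=$(measure "$setting")
    while :; do
        next=$((setting * 2))
        rate=$(measure "$next")
        # Stop once the new rate is less than 105% of the best so far.
        if [ $((rate * 100)) -lt $((best * 105)) ]; then
            break
        fi
        setting=$next
        best=$rate
    done
    echo "$setting $best"
}

tune_up
```

The same search would be run three times in order of increasing overhead: first over loop iterations, then over threads, then over forks, holding the earlier settings at the values already found.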

Solaris Cryptographic Framework notes

The Solaris cryptographic framework allows different slot configurations. The “Metaslot” serves as a single virtual slot with the combined capabilities of all the tokens and slots that have been installed. The “Keystore” slot groups only the crypto hardware together. The order in which repeated calls to C_FindObjects return objects from the metaslot is the reverse of that of the keystore slot. Hence a search for a key that does not specify the object class will return the private key first on the metaslot, and the public key first on the keystore slot. Effectively, when using the keystore slot, a C_SignInit that returns the error “CKR_KEY_TYPE_INCONSISTENT” might be the result of not having specified CKO_PRIVATE_KEY in the search template for C_FindObjectsInit().

Another problem encountered with the Metaslot configuration is a bug in meta_release_slot_session, used by C_CloseAllSessions, which causes a nasty segmentation fault when trying to close a certain number of idle sessions. This is circumvented by closing individual sessions one by one, though that is a tiny bit detrimental to the overall performance.

It is essential that the cards have the same firmware. Exporting the keystore information to another card requires the same firmware on both cards.

The PIN is a combination of the username and password, separated by a colon. When the password requirement for the SCA6000 is set to high, the password must be at least 8 characters long. However, the Solaris getpass() call (from stdlib.h) only returns the first 8 characters, so it leaves no room for the username to be specified. The GNU getpass() (libc) does not have this limitation. To circumvent this issue, use getpassphrase() on Solaris. Note that this function is not portable.
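
To make the problem concrete, here is what an 8-character limit does to a PIN in that format. The username and password are invented for the example, and printf is only imitating the truncation, not calling getpass().

```shell
user=admin          # hypothetical username
pass=s3cretpw       # hypothetical password, 8 characters (the minimum at "high")
pin="${user}:${pass}"

# Imitate a password routine that hands back only the first 8 characters.
truncated=$(printf '%.8s' "$pin")

echo "full PIN:  $pin (${#pin} characters)"
echo "truncated: $truncated"
```

Because the password alone already needs 8 characters, any “username:password” PIN is longer than 8, so a truncating read always loses part of the password.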

Notes on PKCS11

Threads that share a single session might interfere with each other between a C_SignInit and a C_Sign call, which leads to unpredictable behavior. A thread safe way of sharing sessions is to use mutex locks, but this significantly reduces the benefit of using threads. One way to avoid interference without having to use mutex locks is to create one session per thread. Since separate sessions can safely interleave without interfering with each other, this is a very effective way to guarantee thread safety without locking.

Testing and Debugging your puppet configuration

Posted by jason on May 2nd, 2008

We have been using puppet for all new server installs here at Nominet for a few months now. The idea of course is to simplify system administration, making us more agile and able to install a particular server with a particular specification far quicker.

It is also designed to give us repeatability, meaning each server “type” we install should be configured in an identical way, with the only differences being the uniquely identifying configuration files, which are also controlled via puppet.

It is a paradigm shift I think, and it takes some while to get up to speed with administering a system via puppet rather than traditional methods. However, there are two techniques for testing and debugging your configuration that I have found invaluable. We are using subversion to provide a repository for all our puppet configuration.

Once I have made a change to the configuration and before I have checked this back into subversion I have found the following very useful to run:

puppetmasterd --parseonly --confdir=/var/home/jason/trunk --debug

This parses the configuration that I have checked out to trunk, and if it encounters a syntax error in any of the files it gives me a file name and line number so I can go and debug the issue.

Next, when I run the config on the server, it is useful to use the following:

puppetd --debug --test

Somewhat counter-intuitively given the name of the option, this will actually apply the configuration to the system, but it runs puppetd just once (rather than every 30 minutes) and provides copious quantities of output so you can spot whether your configuration has actually managed to accomplish what you had intended.

I think when you are first getting to grips with puppet, these options can be really useful.

NVidia drivers on Ubuntu 7.10

Posted by ray on Apr 1st, 2008

One of my colleagues has already blogged about getting IRIS Explorer running on 64-bit Ubuntu 7.10.

A rather trickier problem that had to be solved first was getting X11 working reliably with the official NVidia drivers for his Quadro FX 3500 graphics card.

On first downloading and installing the drivers as per the NVidia instructions everything worked as expected. However on starting the machine up next morning X11 would only start in 800×600 resolution and wouldn’t run in Twinview (aka “Dual Head”) mode. At this point I was called in to help.

Installing the drivers again got X11 working again, but only until the next reboot. Looking at the list of loaded modules (with lsmod) I could see that an NVidia driver was loaded, but there was no explanation for why it wouldn’t work. Eventually, having run the driver installation process yet again, I noticed that according to lsmod the working driver appeared to be taking ~12MB of memory, whereas the non-working driver only used 8MB. Could it be that there were two different drivers?

I then verified (by looking at dmesg output) that there did indeed appear to be two different drivers - the non-working driver was actually being loaded into the kernel at boot-time, before any of the rest of the OS had been started. It seems that the boot image for starting the system has an NVidia driver built-in which is being used to support Ubuntu’s flashy boot screens!

I found that simply unloading the boot-time driver and loading the official driver (rmmod nvidia; modprobe nvidia) was sufficient to get X11 working again.

For now then, the workaround is simple - in /etc/rc.local we’ve just put in:
/sbin/rmmod nvidia

This ensures that the boot-time driver is unloaded before any user programs are started. When X11 starts the module loader then automatically loads the correct driver and everything works as expected!
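
The size comparison that gave the game away is easy to repeat. This is a small sketch that pulls the Size column for one module out of lsmod-style output; the function name is made up, and the canned sample (including its sizes) is illustrative rather than copied from the machine in question.

```shell
# module_size NAME: read lsmod-style output on stdin and print the
# Size column for the module called NAME (nothing if it is not loaded).
module_size() {
    awk -v mod="$1" '$1 == mod { print $2 }'
}

# On the affected machine this would be:  lsmod | module_size nvidia
# A canned sample is used here so the sketch runs anywhere.
printf 'Module Size Used by\nnvidia 8105072 0\ni2c_core 29440 1 nvidia\n' \
    | module_size nvidia
```

Running it once straight after boot and once after reinstalling the driver should show the two different sizes if both drivers are in play.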

Apache & Shared Memory

Posted by jason on Mar 26th, 2008

We recently had an issue whereby one of our apache servers crashed, failing to clean up a shared memory segment. We were using shared memory for SSL session caching:

SSLSessionCache         shm:/var/log/httpd2/gcache(512000)

Which uses a shared memory segment:

-bash-2.05b$ sudo ipcs -a
Password:
IPC status from <running system> as of Tue Mar 25 16:26:29 GMT 2008
T ID KEY MODE OWNER GROUP CREATOR CGROUP CBYTES QNUM QBYTES LSPID LRPID STIME  RTIME CTIME
Message Queues:
T ID KEY MODE OWNER GROUP CREATOR CGROUP NATTCH SEGSZ CPID LPID ATIME DTIME CTIME
Shared Memory:
m 2304 0x13022 --rw------- root other root other 32 512000 17975 9314 9:01:41 9:26:26 9:01:41
T    ID  KEY  MODE OWNER GROUP CREATOR CGROUP NSEMS OTIME  CTIME
Semaphores:
s 65536 0x1cbd       --ra-------     root     root     root     root   129 no-entry 13:47:45
s       1  0x1001cbd  --ra-------     root     root     root     root   128 no-entry 13:47:45

After Apache had crashed, when we attempted a startup we received the following error:

[Tue Mar 25 08:43:48 2008] [error] Cannot allocate shared memory: (17)File exists

It was a pretty straightforward fix, all we had to do was the following (as root):

ipcrm -m 2304 

This removed the shared memory segment, and Apache could be started quite happily once again.
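
Picking the right ID out of the ipcs listing by eye is fine once, but it can be scripted. The sketch below matches on segment size, since we know the cache was configured as 512000 bytes. It assumes the Solaris column layout shown in the listing above (ID in field 2, SEGSZ in field 10), and it only prints the ipcrm command rather than running it.

```shell
# stale_shm SIZE: read Solaris-style `ipcs -a` output on stdin and print an
# ipcrm command for each shared memory segment whose SEGSZ equals SIZE.
# Other platforms lay the columns out differently, so check before reuse.
stale_shm() {
    awk -v size="$1" '$1 == "m" && $10 == size { print "ipcrm -m " $2 }'
}

# Canned line copied from the listing above; live use would be
#   ipcs -a | stale_shm 512000
echo 'm 2304 0x13022 --rw------- root other root other 32 512000 17975 9314 9:01:41 9:26:26 9:01:41' \
    | stale_shm 512000
```

Printing the command instead of executing it leaves a human in the loop, which seems wise when the alternative is removing a segment that something else is still using.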

installing 32-bit libs on 64-bit Ubuntu

Posted by andyh on Mar 14th, 2008

I’ve been running 64-bit Ubuntu now for years and have occasionally had problems when I wanted to install a 32-bit application. I could use the --force-all option to dpkg to force the installation, but there were often missing 32-bit library files. The install of the Notes 8 client was a good example of this. There is now a solution, thanks to a post on ubuntuforums. There’s a getlibs package that will do exactly what you need it to do - install the 32-bit libs without installing anything else. I came across this when installing skype at home and I must say it should make life much easier when installing the new Lotus Notes 8.0.1 client that I see is now available!

compiling ruby with ssl support

Posted by andyh on Mar 14th, 2008

I’ve been making packages for Solaris i386 architecture recently and have been compiling ruby as it’s required by puppet - our excellent configuration management software.

I had no trouble compiling it and making a Solaris package, but when I came to use it to install puppet I ran into this:

-bash-3.00$ ruby install.rb DESTDIR=${DESTDIR}
Could not load openssl; cannot install

It seems that ruby hadn’t included the openssl files in its build.

I found a post that said that openssl wasn’t included in ubuntu by default and it could be installed manually, so I tried it and found out that it could not find a header file:

-bash-3.00$ cd ruby-1.8.6-p114/ext/openssl/
-bash-3.00$ ruby extconf.rb
=== OpenSSL for Ruby configurator ===
=== Checking for system dependent stuff... ===
checking for t_open() in -lnsl... yes
checking for socket() in -lsocket... yes
checking for assert.h... yes
=== Checking for required stuff... ===
checking for openssl/ssl.h... no
=== Checking for required stuff failed. ===
Makefile wasn't created. Fix the errors above.

A quick change to the CFLAGS variable to add “-I”, a re-run of configure/make, and ruby had openssl support built in.
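
If you are not sure which directory to point the “-I” at, a small search for the header that extconf.rb complained about can help. This is a sketch: the function name is made up and the candidate prefixes are guesses on my part, not a record of where the headers lived on our build box.

```shell
# find_openssl_prefix DIR...: print the first DIR that contains
# include/openssl/ssl.h, the header extconf.rb was looking for.
find_openssl_prefix() {
    for prefix in "$@"; do
        if [ -f "$prefix/include/openssl/ssl.h" ]; then
            echo "$prefix"
            return 0
        fi
    done
    return 1
}

# Candidate prefixes are guesses; adjust them for the system at hand.
if found=$(find_openssl_prefix /usr/sfw /usr/local/ssl /usr/local /usr); then
    echo "add to CFLAGS: -I$found/include"
else
    echo "openssl/ssl.h not found in the candidate prefixes"
fi
```

Once the header is found, re-running extconf.rb in ext/openssl should get past the “checking for openssl/ssl.h” step.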
