random technical thoughts from the Nominet technical team

Quick ZFS performance numbers

Posted by jason on Oct 15th, 2007

I have been playing around a little with our new Sun X4500 box. I’ve already discussed elsewhere how compelling the price/GB of this box is. I have now had the chance to get some out-of-the-box performance numbers for running ZFS on the X4500.

First off, I created a ZFS pool using a mirror-stripe combination:

zpool create -f testpool mirror c0t0d0 c1t0d0 mirror c4t0d0 c6t0d0 \
         mirror c0t1d0 c1t1d0 mirror c4t1d0 c5t1d0 mirror c6t1d0 c7t1d0 mirror c0t2d0 c1t2d0 \
         mirror c4t2d0 c5t2d0 mirror c6t2d0 c7t2d0 mirror c0t3d0 c1t3d0 mirror c4t3d0 c5t3d0 \
         mirror c6t3d0 c7t3d0 mirror c0t4d0 c1t4d0 mirror c4t4d0 c6t4d0 mirror c0t5d0 c1t5d0 \
         mirror c4t5d0 c5t5d0 mirror c6t5d0 c7t5d0 mirror c0t6d0 c1t6d0 mirror c4t6d0 c5t6d0 \
         mirror c6t6d0 c7t6d0 mirror c0t7d0 c1t7d0 mirror c4t7d0 c5t7d0 mirror c6t7d0 c7t7d0 \
         mirror c7t0d0 c7t4d0

I then created an 8GB test file with the following:

time dd if=/dev/zero of=/testpool/test.dbf bs=8k count=1048576
1048576+0 records in
1048576+0 records out

real    0m15.330s
user    0m0.375s
sys     0m14.941s

This gives a sustained write rate of 523MB/s. I also looked at read speed:

time dd if=/testpool/test.dbf of=/dev/null bs=8k
1048576+0 records in
1048576+0 records out

real    0m7.007s
user    0m0.313s
sys     0m6.694s

This gives a sustained read rate of 1145MB/s.
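
For anyone wanting to redo the arithmetic, here is a minimal sketch of how a rate can be derived from the dd output: bytes written (block size times block count) divided by the elapsed "real" time. The function name throughput is just an illustrative helper, not part of any tool. It assumes 1MB = 2^20 bytes unless told otherwise, so the results can differ by a few percent from the figures quoted in this post depending on which convention you use.

```shell
#!/bin/sh
# Hypothetical helper: throughput BLOCK_SIZE_BYTES BLOCK_COUNT ELAPSED_SECONDS [UNIT_BYTES]
# Prints bytes transferred per second, expressed in units of UNIT_BYTES
# (default 1048576, i.e. 2^20 bytes).
throughput() {
    awk -v bs="$1" -v count="$2" -v secs="$3" -v unit="${4:-1048576}" \
        'BEGIN { printf "%.1f\n", (bs * count) / unit / secs }'
}

# bs=8k, count=1048576, using the "real" times reported above
throughput 8192 1048576 15.330   # mirrored pool, write
throughput 8192 1048576 7.007    # mirrored pool, read
```

Passing 1000000 as the fourth argument gives the decimal-megabyte figure instead.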
As a simple comparison I created a RAID-Z pool as well:

zpool create -f testpool \
raidz c0t0d0 c1t0d0 c4t0d0 c6t0d0 c7t0d0 \
raidz c1t1d0 c4t1d0 c5t1d0 c6t1d0 c7t1d0 \
raidz c0t2d0 c4t2d0 c5t2d0 c6t2d0 c7t2d0 \
raidz c0t3d0 c1t3d0 c5t3d0 c6t3d0 c7t3d0 \
raidz c0t4d0 c1t4d0 c4t4d0 c6t4d0 c7t4d0 \
raidz c0t5d0 c1t5d0 c4t5d0 c5t5d0 c7t5d0 \
raidz c0t6d0 c1t6d0 c4t6d0 c5t6d0 c6t6d0 \
raidz c0t7d0 c1t7d0 c4t7d0 c6t7d0 c7t7d0 \
raidz c0t1d0 c1t2d0 c4t3d0 c6t5d0 c7t6d0

I also tested read and write performance on this pool:

time dd if=/dev/zero of=/testpool/test.dbf bs=8k count=1048576
1048576+0 records in
1048576+0 records out

real    0m15.107s
user    0m0.381s
sys     0m14.637s

This gives a sustained data write rate of 531MB/s, very similar to the RAID10 performance. The read performance was as follows:

time dd if=/testpool/test.dbf of=/dev/null bs=8k
1048576+0 records in
1048576+0 records out

real    0m6.715s
user    0m0.311s
sys     0m6.404s

This gives a sustained read rate of 1194MB/s, again very similar to the rate achieved with RAID10.

No one is saying these tests in any way model a real-world situation; however, I would argue they are pretty indicative of the maximum possible sustained data transfer rate. It’s interesting to me that RAID-Z and RAID10 performed pretty much identically, which is not quite what I would have expected. Perhaps the write penalty associated with parity calculations would be more apparent with multiple random I/Os.

The other really interesting thing is the comparison of maximum transfer rate with Fibre Channel. We use a lot of fibre here at Nominet for connecting databases to storage. The theoretical maximum transfer rate of 2Gb/s fibre is only around 250MB/s, so even a pair of fibres ain’t touching the X4500. You’d really need to go to dual-connected 4Gb/s fibre to start competing on a transfer rate basis. Of course, as I said at the start, the X4500 will still win in the price/performance department hands down.
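
The fibre comparison is simple arithmetic, sketched below. It assumes the naive conversion of link speed in gigabits per second to megabytes per second (divide by 8), and ignores encoding and protocol overhead, so real payload rates will be lower. The function name fc_mb_per_s is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: fc_mb_per_s LINK_SPEED_GBITS NUMBER_OF_LINKS
# Naive upper bound in MB/s: gigabits -> megabits -> megabytes, times link count.
# Assumption: no allowance for line encoding or protocol overhead.
fc_mb_per_s() {
    awk -v gbits="$1" -v links="$2" \
        'BEGIN { printf "%d\n", gbits * 1000 / 8 * links }'
}

fc_mb_per_s 2 1   # single 2Gb/s link
fc_mb_per_s 2 2   # pair of 2Gb/s links
fc_mb_per_s 4 2   # dual-connected 4Gb/s links
```

On these assumptions a pair of 2Gb/s links tops out around the write rate measured above, and only dual 4Gb/s links get close to the read rate.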

4 Responses

  1. Chris Says:

    Most likely, you just determined the max throughput of the dd command (a single-process app) running on the X4500 CPU and not the limit of the disk subsystem. Your file is 8GB and the Thumper has 16GB of cache by default.

    I’ve seen sustained rates about 50% higher than above on a raidz pool and in some I/O conditions as high as 1.3GB/sec.

    Try iozone for testing filesystem throughput. Or for fun, repeat your test with compression turned on, since the data from /dev/zero compresses very well. :)

  2. jason arneil Says:

    Hi Chris,

    I fear you may be correct.

    I am really sold on the X4500, I think the price/performance on them is really fantastic, and whatever numbers you have, it is beating the theoretical maximum throughput on our SAN by a good way.

    I have looked at iozone. I’m not sure the 3D plots were making much sense though, but I may revisit it.

    Also I have run swingbench: http://www.dominicgiles.com/swingbench.html
    on both the X4500 and a fibre-connected HP box; the X4500 (admittedly with a LOT more spindles) won out by a factor of around 7.

    Thanks for the comment, I think you show you need to think carefully when assessing the performance capability of a system.

  3. Buchan Milne Says:

    You can’t compare local disks to a SAN, really. Unless you’re going to access all 24TB raw on the X4500 locally. If you aren’t going to, you should be comparing the data rates you achieve from the host writing the data in each case.

    I am quite sure any real SAN storage array will beat the pants off the X4500 when writing/reading locally. Adding more HBAs to your SAN-attached HP box with decent path-management software will also improve performance significantly (e.g. Powerpath scales almost linearly with number of HBAs).

    In any tests of throughput, you *must* test with at least an order of magnitude more data than the cache available to the storage device, so you should test at least 160GB, otherwise you are just testing the OS’s disk cache performance.

    No matter what filesystem, the 240 15k SCSI disks we have in our EMC SAN will always outperform the 48 10k (?) SATAs in the X4500. More spindles always wins …

  4. jason Says:

    Hi Buchan,

    Thanks for reading, and great comment. We too have a large infrastructure investment in EMC storage. I have been impressed with the reliability of this.

    I think the X4500 fits in to a very good niche space. What I mean is, if you have requirements of a large datastore and can live with the cpu limitations of the X4500, then the price/performance comparison with EMC products is extremely compelling.

    A good example is backups. We have one of these as our backup server, and we can keep backups on disk for a goodly while.

    It’s interesting what you say about PowerPath scaling linearly, but of course you only have limited ports on the actual array itself. For example, we use Clariions and these are limited in the number of fibre ports available to plug into.

    Of course, if you have multiple servers attached via a switched fabric, then they are all going to be competing for the bandwidth.

    I’m sure 4Gb/s connectivity helps, but there is a fundamental limitation there.

    Agreed on the testing, this was not advanced benchmarking!

    240 EMC drives is a LOT of money, not every shop has access to those funds.

    jason.
