Quick ZFS performance numbers
I have been doing a little bit of playing around with our new Sun X4500 box. I’ve already discussed elsewhere how compelling the price/GB of this box is. I have now had the chance to get some out-of-the-box performance numbers for running ZFS on the X4500.
First off, I created a ZFS pool using a mirror-stripe combination:
zpool create -f testpool mirror c0t0d0 c1t0d0 mirror c4t0d0 c6t0d0 \
mirror c0t1d0 c1t1d0 mirror c4t1d0 c5t1d0 mirror c6t1d0 c7t1d0 mirror c0t2d0 c1t2d0 \
mirror c4t2d0 c5t2d0 mirror c6t2d0 c7t2d0 mirror c0t3d0 c1t3d0 mirror c4t3d0 c5t3d0 \
mirror c6t3d0 c7t3d0 mirror c0t4d0 c1t4d0 mirror c4t4d0 c6t4d0 mirror c0t5d0 c1t5d0 \
mirror c4t5d0 c5t5d0 mirror c6t5d0 c7t5d0 mirror c0t6d0 c1t6d0 mirror c4t6d0 c5t6d0 \
mirror c6t6d0 c7t6d0 mirror c0t7d0 c1t7d0 mirror c4t7d0 c5t7d0 mirror c6t7d0 c7t7d0 \
mirror c7t0d0 c7t4d0
I then created an 8GB test file with the following:
time dd if=/dev/zero of=/testpool/test.dbf bs=8k count=1048576
1048576+0 records in
1048576+0 records out
real 0m15.330s
user 0m0.375s
sys 0m14.941s
This gives a sustained write rate of 523MB/s. I also looked at read speed:
time dd if=/testpool/test.dbf of=/dev/null bs=8k
1048576+0 records in
1048576+0 records out
real 0m7.007s
user 0m0.313s
sys 0m6.694s
This gives a sustained read rate of 1145MB/s.
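For reference, rates like these are just the bytes transferred divided by the elapsed `real` time. Here is a minimal sketch of that arithmetic, assuming `awk` is available. The helper name `mib_per_sec` is made up for illustration and is not part of any tool. Counting in MiB (2^20 bytes) rather than MB (10^6 bytes) moves the answer by a few percent, which may be why the rounded figures in the text differ slightly from what this prints.

```shell
#!/bin/sh
# Hypothetical helper: throughput in MiB/s from the dd block size in bytes,
# the block count, and the elapsed "real" seconds reported by time.
mib_per_sec() {
    awk -v bs="$1" -v count="$2" -v secs="$3" \
        'BEGIN { printf "%.0f\n", (bs * count) / (1024 * 1024) / secs }'
}

# Timings taken from the mirrored-pool runs above (bs=8k, count=1048576).
mib_per_sec 8192 1048576 15.330   # write
mib_per_sec 8192 1048576 7.007    # read
```

The same helper can be fed the timings from any other dd run with the same block size and count.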
As a simple comparison I created a RAID-Z pool as well:
zpool create -f testpool raidz c0t0d0 c1t0d0 c4t0d0 c6t0d0 c7t0d0 \
raidz c1t1d0 c4t1d0 c5t1d0 c6t1d0 c7t1d0 \
raidz c0t2d0 c4t2d0 c5t2d0 c6t2d0 c7t2d0 \
raidz c0t3d0 c1t3d0 c5t3d0 c6t3d0 c7t3d0 \
raidz c0t4d0 c1t4d0 c4t4d0 c6t4d0 c7t4d0 \
raidz c0t5d0 c1t5d0 c4t5d0 c5t5d0 c7t5d0 \
raidz c0t6d0 c1t6d0 c4t6d0 c5t6d0 c6t6d0 \
raidz c0t7d0 c1t7d0 c4t7d0 c6t7d0 c7t7d0 \
raidz c0t1d0 c1t2d0 c4t3d0 c6t5d0 c7t6d0
I also tested read and write performance on this pool:
time dd if=/dev/zero of=/testpool/test.dbf bs=8k count=1048576
1048576+0 records in
1048576+0 records out
real 0m15.107s
user 0m0.381s
sys 0m14.637s
This gives a sustained data write rate of 531MB/s, very similar to the RAID10 performance. The read performance was as follows:
time dd if=/testpool/test.dbf of=/dev/null bs=8k
1048576+0 records in
1048576+0 records out
real 0m6.715s
user 0m0.311s
sys 0m6.404s
This gives a read rate of 1194MB/s, again pretty similar to that achieved with RAID10.
No one is saying these tests in any way model a real-world situation; however, I would argue they are pretty indicative of the maximum possible sustained data transfer rate. It’s interesting to me that RAID-Z and RAID10 performed pretty much identically, which is not quite what I would have expected. Perhaps the write penalty associated with parity calculations would be more apparent with multiple random I/Os.
The other really interesting thing is the comparison of maximum transfer rate with Fibre Channel. We use a lot of fibre here at Nominet for connecting databases to storage. The theoretical maximum transfer rate of 2Gb/s fibre is only around 250MB/s, so even a pair of fibres ain’t touching the X4500. You’d really need to go to dual-connected 4Gb/s fibre to start competing on a transfer rate basis. Of course, as I said at the start, the X4500 will still win in the price/performance department hands down.
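To put rough numbers on the fibre comparison, here is a small sketch that counts how many links would be needed to carry a given rate. It assumes the per-link figures used in this post (250MB/s for 2Gb/s fibre, and by the same arithmetic 500MB/s for 4Gb/s), which ignore encoding and protocol overhead, so real links deliver somewhat less. The helper name `links_needed` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: number of links needed to carry a target rate.
# $1 = target rate in MB/s, $2 = assumed per-link rate in MB/s.
links_needed() {
    awk -v rate="$1" -v link="$2" \
        'BEGIN { n = int(rate / link); if (n * link < rate) n++; print n }'
}

# Links needed to match the 1145MB/s read rate measured on the mirrored pool.
links_needed 1145 250   # 2Gb/s links (assumed 250MB/s each)
links_needed 1145 500   # 4Gb/s links (assumed 500MB/s each)
```

Given how the dd figures were obtained, this is best read as a back-of-the-envelope comparison rather than a sizing exercise.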


October 31st, 2007 at 10:43 pm
Most likely, you just determined the max throughput of the dd command (a single-process app) running on the X4500 CPU and not the limit of the disk subsystem. Your file is 8GB and the Thumper has 16GB of cache by default.
I’ve seen sustained rates about 50% higher than above on a raidz pool and in some I/O conditions as high as 1.3GB/sec.
Try iozone for testing filesystem throughput. Or for fun, repeat your test with compression turned on since the data from /dev/null and /dev/zero compresses very well. :)
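One quick sanity check on this point needs only the timings already printed in the post. If user plus sys time accounts for nearly all of the real time, the dd process was busy on the CPU for the whole run rather than waiting on disks. A rough sketch of that check follows, assuming `awk` is available; `cpu_busy_pct` is a name made up here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: percentage of elapsed time the process spent on CPU.
# $1 = real seconds, $2 = user seconds, $3 = sys seconds (from time output).
cpu_busy_pct() {
    awk -v real="$1" -v user="$2" -v sys="$3" \
        'BEGIN { printf "%.1f\n", 100 * (user + sys) / real }'
}

# Timings from the mirrored-pool runs in the post.
cpu_busy_pct 15.330 0.375 14.941   # write
cpu_busy_pct 7.007 0.313 6.694     # read
```

Values close to 100 suggest the single dd process, not the disks, set the ceiling in those runs.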
November 1st, 2007 at 9:03 am
Hi Chris,
I fear you may be correct.
I am really sold on the X4500, I think the price/performance on them is really fantastic, and whatever numbers you have, it is beating the theoretical maximum throughput on our SAN by a good way.
I have looked at iozone. I’m not sure the 3D plots were making much sense, but I may revisit it.
I have also run swingbench (http://www.dominicgiles.com/swingbench.html) on both the X4500 and a fibre-connected HP box. The X4500 (admittedly with a LOT more spindles) won out by a factor of around 7.
Thanks for the comment. I think it shows that you need to think carefully when assessing the performance capability of a system.
February 29th, 2008 at 5:36 pm
You can’t compare local disks to a SAN, really. Unless you’re going to access all 24TB raw on the X4500 locally. If you aren’t going to, you should be comparing the data rates you achieve from the host writing the data in each case.
I am quite sure any real SAN storage array will beat the pants off the X4500 when writing/reading locally. Adding more HBAs to your SAN-attached HP box with decent path-management software will also improve performance significantly (e.g. PowerPath scales almost linearly with the number of HBAs).
In any tests of throughput, you *must* test with at least an order of magnitude more data than the cache available to the storage device, so you should test at least 160GB, otherwise you are just testing the OS’s disk cache performance.
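As a sketch of what that means for the dd test in the post, the block count for a file ten times the size of a 16GB cache at an 8k block size can be worked out like this. It assumes `awk` is available and treats GB as 2^30 bytes; `dd_count` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: dd block count for a test file that is a given
# multiple of the cache size.
# $1 = cache size in GiB, $2 = multiple, $3 = dd block size in bytes.
dd_count() {
    awk -v cache="$1" -v factor="$2" -v bs="$3" \
        'BEGIN { printf "%d\n", cache * factor * 1024 * 1024 * 1024 / bs }'
}

# 16GB of cache, ten times over, written in 8k blocks.
dd_count 16 10 8192
```

The result would stand in for count=1048576 in the original dd command, giving a 160GB test file.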
No matter what filesystem, the 240 15k SCSI disks we have in our EMC SAN will always outperform the 48 10k (?) SATAs in the X4500. More spindles always wins …
March 3rd, 2008 at 9:00 am
Hi Bucahn,
Thanks for reading, and great comment. We too have a large infrastructure investment in EMC storage. I have been impressed with the reliability of this.
I think the X4500 fits into a very good niche. What I mean is, if you have requirements for a large datastore and can live with the CPU limitations of the X4500, then the price/performance comparison with EMC products is extremely compelling.
A good example is backups. We have one of these as our backup server, and we can keep backups on disk for a goodly while.
It’s interesting what you say about PowerPath scaling linearly, but of course you only have limited ports on the actual array itself. For example, we use CLARiiONs and these are limited in the number of fibre ports available to plug into.
Of course, if you have multiple servers attached via a switched fabric, then they are all going to be competing for the bandwidth.
I’m sure 4Gb/s connectivity helps, but there is a fundamental limitation there.
Agreed on the testing, this was not advanced benchmarking!
240 EMC drives is a LOT of money, not every shop has access to those funds.
jason.