Using NexentaStor ZFS storage appliance with vSphere

NexentaStor is a storage appliance based on the OpenSolaris kernel and its advanced ZFS file system. The latest free Community Edition comes with goodies like online deduplication, transparent compression and snapshots, to name a few. NexentaStor can share storage over NFS, CIFS, iSCSI, FTP, RSYNC, WebDAV and even FC. In this article I concentrate on using NexentaStor NFS shares with VMware vSphere.

NexentaStor is available as a commercially supported Enterprise Edition and as a community-supported, free-to-use Community Edition. The Community Edition is limited to 12 terabytes of used storage, and it also lacks some optional modules available in the Enterprise Edition.

I downloaded NexentaStor Community Edition 3.0.3 and installed it on a PC I built from some leftover components I had at hand. My finished storage appliance consists of:

  • AMD Athlon 4850e dual core CPU
  • Asus M3A78 Pro motherboard
  • 4 GB DDR2 RAM
  • 1 x 80GB Seagate 7200.10 ATA disk as boot drive
  • 5 x 500GB Seagate 7200.11 SATA disks as ZFS pool
  • Intel X25-M G2 as flash cache
  • Intel PRO/1000 PT Quad Port Ethernet adapter

The above system fits the NexentaStor hardware recommendations like a glove: a 64-bit CPU, 4 GB of RAM, dedicated spindles for the ZFS pool and a dedicated boot drive.

Although the Community Edition is free, it requires a registration key that is unique to each machine it is installed on. You get the registration key by email after submitting a machine signature to Nexenta; the signature is generated and displayed during installation. I got my registration key by email within a minute of submitting the signature, which was nice.

NexentaStor recognized all of my hardware components correctly, and installation finished 15 minutes later. Once registration was active, I was asked for the management IP address and WebGUI port settings. With those set, I launched a web browser and pointed it at my new storage appliance.

Configuration wizard

The first step in the WebGUI is to set attributes such as host and domain name, NTP server and localization.

The second step is network configuration, including iSCSI target settings.

In the third and fourth steps you create your ZFS volume. You may also configure any available SSDs as a secondary read cache (L2ARC) or as a ZFS intent log (ZIL) device, which is used for synchronous writes.
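For readers who prefer the command line, the wizard's volume and cache choices correspond roughly to standard ZFS pool commands. This is a minimal sketch, assuming a pool named `tank` and hypothetical Solaris-style device names (`c1t1d0` and so on); the NexentaStor wizard may do things differently internally.

```shell
# Hypothetical device names; list the real ones with `format` on the appliance.
# Create a striped (non-redundant, RAID-0-like) pool from five data disks.
zpool create tank c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0

# Add an SSD as a secondary read cache (L2ARC).
zpool add tank cache c2t0d0

# Optionally add a second SSD as a separate intent log (ZIL) device.
zpool add tank log c2t1d0

# Show the resulting layout, including the "cache" and "logs" sections.
zpool status tank
```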

SSD as secondary cache

Employing fast SSDs as a secondary disk cache is a really neat feature. SSDs are much larger and cheaper per gigabyte than RAM, yet they still deliver an incredible number of IOPS: even the consumer-grade Intel X25-M is rated at up to 35,000 IOPS for 4K random reads. You would need approximately 175 15K RPM spindles to deliver that many IOPS!
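The spindle count follows from simple division. As a rough check, assuming about 200 random IOPS per 15K RPM disk (the per-spindle rate that the figure of 175 implies; real drives vary with workload):

```latex
N_{\text{spindles}} \approx \frac{35\,000\ \text{IOPS}}{200\ \text{IOPS per spindle}} = 175
```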

Folders

Now that your ZFS volume is created, you are asked to create folders to be used as network shares. I created a single folder named “nfs”. You can also set ZFS attributes such as deduplication and compression here.
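A NexentaStor folder is a ZFS filesystem, so the same folder could be created from a shell. A sketch, assuming a hypothetical pool named `tank`, example property values, and a pool version that supports the `dedup` property (pool version 21 or later):

```shell
# Create the "nfs" folder as a ZFS filesystem; property values are examples only.
zfs create -o compression=on -o dedup=off tank/nfs

# Confirm the properties that were set on the new filesystem.
zfs get compression,dedup tank/nfs
```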

NFS share settings

Once you have reviewed and saved your configuration, the configuration wizard is finished. Now select “Data Management” from the top menu and choose “Shares” from the drop-down list.

If you have not yet enabled NFS sharing, you may do it now. Then click “Configure” in the “NFS Server” box on the left side of the screen.

Set Server and Client Version to 3 and click Save. Version 3 is the NFS version vSphere uses, and Nexenta recommends limiting NFS to version 3 when serving vSphere.
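On OpenSolaris-based systems, these limits are NFS service properties managed with `sharectl`. The following is only an illustrative sketch, assuming shell access to the appliance; the WebGUI may manage these values itself, so prefer the GUI setting on NexentaStor.

```shell
# Limit both the NFS server and the NFS client to protocol version 3.
sharectl set -p server_versmax=3 nfs
sharectl set -p client_versmax=3 nfs

# Display the current NFS service properties to verify the change.
sharectl get nfs
```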

Go back to the Shares view and click the folder link.

This view shows the mount point at which the folder is available to NFS clients; make a note of it.

Further down the view there are options such as deduplication, checksum, compression and access time. Checksum is the ZFS feature for detecting data corruption; ZFS can automatically repair the corruption it detects only when the pool has redundancy (RAID-Z or ZFS mirroring), so if you use hardware RAID you may choose to disable it, since it has a small CPU overhead. You might disable access time as well: it is not needed for vSphere datastores and it has a small disk I/O overhead. You can enable or disable deduplication and compression at any time; a new setting affects only data written afterwards.
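These folder options map to ZFS filesystem properties. A minimal command-line sketch, assuming a hypothetical pool `tank` containing the folder `nfs`:

```shell
# Show the mount point to use as the folder path when mounting in vSphere.
zfs get mountpoint tank/nfs

# Turn off access-time updates, which vSphere datastores do not need.
zfs set atime=off tank/nfs

# Review the current checksum, compression, dedup and atime settings.
zfs get checksum,compression,dedup,atime tank/nfs
```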

Mounting NFS share to vSphere

I assume you already have the necessary VMkernel port configured on ESX, so I won’t go into detail on that. Open the vSphere Client and add a NAS datastore; the folder path is the mount point you noted in the folder settings view.
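On classic ESX 4.0, the same mount can also be made from the service console with `esxcfg-nas`. A sketch in which the appliance address `192.168.1.10`, the share path `/volumes/tank/nfs` and the datastore label `nexenta-nfs` are hypothetical placeholders; use your appliance's address and the mount point you noted.

```shell
# Add an NFS datastore: -o is the NFS server, -s the exported share path.
esxcfg-nas -a -o 192.168.1.10 -s /volumes/tank/nfs nexenta-nfs

# List the configured NAS datastores to confirm the mount.
esxcfg-nas -l
```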

And now you have the NexentaStor NFS share mounted in vSphere.

NexentaStor and ESX 4.0 both support jumbo frames, so you can improve NFS performance by configuring an MTU of up to 9000 bytes, or whatever maximum your NICs and switches support.
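A sketch of the ESX 4.0 side from the service console, assuming a hypothetical vSwitch `vSwitch1`, a new port group `NFS` and a placeholder IP address. In ESX 4.0 the VMkernel interface's MTU is set when the interface is created, so an existing interface may need to be removed and re-added. The appliance's interface and the physical switch ports must be configured for the same MTU.

```shell
# Raise the MTU on the vSwitch that carries NFS traffic.
esxcfg-vswitch -m 9000 vSwitch1

# Add a port group for NFS and create a VMkernel interface with a 9000-byte MTU.
esxcfg-vswitch -A NFS vSwitch1
esxcfg-vmknic -a -i 192.168.1.20 -n 255.255.255.0 -m 9000 NFS

# Check the MTU columns in both listings.
esxcfg-vswitch -l
esxcfg-vmknic -l
```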

Performance

I configured my ZFS volume as a striped (RAID-0) pool, since this is purely experimental and no valuable data is stored on it. The ZIL is enabled and kept on the ZFS pool itself, and the Intel X25-M SSD serves as the L2ARC (flash cache) device. Compression and deduplication are off.

The raw throughput of the NexentaStor NAS is quite good. Below are the results of a large-file write with 8k blocks from within a CentOS Linux VM:

# dd if=/dev/zero of=/sdb/test bs=8k count=838860
838860+0 records in
838860+0 records out
6871941120 bytes (6.9 GB) copied, 58.3875 seconds, 118 MB/s

and the same file read back in 8k blocks:

# dd if=/sdb/test of=/dev/null bs=8k
838860+0 records in
838860+0 records out
6871941120 bytes (6.9 GB) copied, 79.2254 seconds, 86.7 MB/s
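The MB/s figures that dd prints are decimal megabytes (10^6 bytes) per second. The small hypothetical `throughput_mb` helper below, using `awk` for the floating-point division, reproduces them from the byte count and elapsed time of the two runs:

```shell
#!/bin/sh
# throughput_mb BYTES SECONDS -> decimal MB/s, printed with one decimal place.
throughput_mb() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1000000 }'
}

# 838860 blocks of 8k (8192 bytes) each, as in the dd runs above.
bytes=$((838860 * 8192))
echo "bytes: $bytes"
echo "write: $(throughput_mb "$bytes" 58.3875) MB/s"
echo "read:  $(throughput_mb "$bytes" 79.2254) MB/s"
```

This gives 117.7 MB/s for the write and 86.7 MB/s for the read; dd appears to round to three significant digits, which is why it reports the write as 118 MB/s.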

Judging by the read performance, there might be something wrong with my L2ARC setup, as reads are slower than writes; running iostat on the storage appliance also reveals that the L2ARC device is not read much. Write performance could be improved quite a lot (at least to the point of GigE saturation) by assigning a second SSD as a dedicated ZIL device, or I could simply disable the ZIL altogether at the cost of data integrity. Well, I think I’ll investigate that another day.
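To check whether the L2ARC is actually serving reads, the following commands could be run on the appliance. This is a sketch assuming shell access and a hypothetical pool named `tank`:

```shell
# Per-device I/O every 5 seconds; the "cache" section shows L2ARC activity.
zpool iostat -v tank 5

# L2ARC hit/miss counters and current size from the ARC kernel statistics.
kstat -p zfs:0:arcstats:l2_hits zfs:0:arcstats:l2_misses zfs:0:arcstats:l2_size
```

Keep in mind that the L2ARC fills gradually. In addition, by default ZFS does not feed prefetched (sequential streaming) reads into the L2ARC, which may explain low hit counts in a single sequential dd test.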

7 comments to Using NexentaStor ZFS storage appliance with vSphere

  • Thanks a lot, thakala, for this great post. I was looking for a strong storage solution that supports deduplication, since neither FreeNAS nor Openfiler supports it.

  • Matt Van Mater

    I have wanted to experiment with exactly the same NexentaStor/ESXi technology and use case, but unfortunately I don’t have an SSD to use. Thanks for saving me some work!

    One thing I wanted to mention concerns your closing comments about performance. My understanding is that it takes a while for the ZIL and L2ARC to populate, so it makes sense that you would not see an immediate performance benefit. After all, they are performing a cache function, and a cache is only useful the second time you try to transfer data (or on subsequent tries). I have seen several mentions of letting the cache “warm up”, but am not sure how quickly that occurs.

    With that in mind, have you had a chance to experiment with more performance testing scenarios that mimic real-world usage? For example, booting multiple nearly identical VMs, once using the SSD-based L2ARC/ZIL and once without? The same goes for file reads/writes like those in your example, but over a longer time frame and with and without the SSDs as cache.

    One thing I have done in similar circumstances is to use an Ubuntu-based VM and read/write data between the /dev/shm mount point and the virtual hard disk. This should help you isolate the bottlenecks of the Nexenta-based storage system.

    FYI, /dev/shm is an on-demand allocated ramdisk equal to half the size of the RAM you allocate to the virtual machine. Similarly, you can use tmpfs to create a ramdisk (allocated manually, not on demand) that has even better performance than /dev/shm. Check out this link I found for a good writeup: http://kevin.vanzonneveld.net/techblog/article/create_turbocharged_storage_using_tmpfs/

    I’ve tried the NexentaStor Community Edition and I think it needs a little polishing for use cases like this. I’m a big fan of FreeNAS, and I have my fingers crossed that the upcoming v0.8 of FreeNAS will support the newer versions of ZFS that allow use of L2ARC, ZIL, dedup, etc.

  • Matt’s correct about the L2ARC – it takes a while to populate. The comment about the ZIL, on the other hand, is way off. The ZIL – ZFS intent log – is a synchronous commit “cache” that does just what you’d think it does: make sure committed writes are actually on non-volatile storage.

    In the case of the L2ARC, the “warm-up” is not that rapid (about 18 MB/sec). MFU and MRU items in the ARC (main memory cache) are committed to the L2ARC as ARC demands dictate. Since this is a block-based cache, block frequency and recency are the deciding factors.

    With the ZIL – which ALWAYS exists unless you explicitly (and dangerously) disable it – all writes that are synchronous get committed there immediately. In the absence of a synchronous logging device (i.e. SLOG or ZIL accelerator), these are first committed to the pool to conform with non-volatile requirements (hard commit) and then re-committed to the pool along with the non-synchronous components of the associated transaction groups.

    It is a common misunderstanding, but the ZIL never gets “read” (from disk or cache); data is always read from the ARC. In case of a checksum failure on ARC data (oops!), the ZIL data would be re-read from disk and reconstructed (if possible, as necessary) to then commit with the TX group. If that is happening, your architecture is flawed :)

    A note on disabling ZFS checksum: don’t. For that matter, don’t use hardware RAID unless you have a LUN aggregation issue that compels it. The ZFS checksum is a key factor in data integrity and protects you from silent data corruption. Yes, there is overhead, but your data is usually worth it. Plus it’s better than having to fall back to application-based data integrity…

  • Turner Foster

    I’m late to this post, but a big fan of Nexenta. Wanted to point out that turning dedup off on a populated volume is an incredibly expensive operation. We made the mistake of doing this on a 2TB volume and effectively shut our entire SAN down for 4 days while it processed all the transactions needed to unwind the duplication. Much better to simply destroy the volume and create a new one with dedup turned off rather than undo. Those 4 days were terrible, all the other LUNs dropped offline and the system was useless until it finished.

  • Paul Traue

    While being exceptionally late to this party as well: the write performance is approaching the theoretical limit of 1 Gb/s Ethernet. The speed given is 118 MB/s (megabytes per second). This translates to 944 Mb/s (megabits per second), which is exceptional in and of itself.

  • Chakravarthi PS

    Excellent post… I recently started using Nexenta and see the benefit. My setup is similar to the configuration above, but without a cache drive.

    Processor: Intel Core2Duo
    Memory: 4 GB
    SCSI controller: LSI 8888ELP with 512 MB cache and BBU
    HDD: WD5000AAKS (500 GB) x 4, as two hardware RAID0 arrays of 2 disks each (two data volumes in total)
    HDD: WD 80 GB for boot
    NIC: Intel MT 1000 PCI NIC

    I am using iSCSI and created two VOLs on the first RAID0. If I start or shut down more than 3 VMs at the same time, performance is not so good. I configured MTU 9000 on the vSwitch, the VMkernel port and in Nexenta.

    Would adding an SSD as a cache improve performance?

    Which is the better option, iSCSI or NFS?
