I’m all over the place this week. Tonight I’m at a cool little coffee shop in Providence, RI, enjoying an excellent latte and some gingerbread while trying to ignore the woman on the other side of the room with the really annoying laugh. Yesterday Joe put up another great post about VMware and NFS. Articles like these always start some really good discussion amongst the engineers. The topic often comes up about which storage connectivity to use for ESX servers: FC, iSCSI, or NFS. Like almost everything else in IT, the answer is the dreaded “it depends”. Different factors come into play. How important is performance? How do you want to manage your VMs and file systems? How complex is your network configuration? How much or how little do you want to involve your network team?
Performance is always a key driver when working through this decision. But performance at one level may not scale to another level. It’s important to assess what level of performance is required and what type of applications are pushing it. In many cases any of the three choices will perform as needed. Many people have seen the “Comparison of Storage Protocol Performance” paper by VMware. The results shouldn’t be too surprising. FC came out in the lead, with NFS and iSCSI finishing in a tight race for second. In my opinion, the best information in this paper is the results of the response time tests. They show all three protocols finishing very well and close to each other on small I/O. This is why virtualizing Exchange works so well over iSCSI. It’s a typical transaction-based application doing small 4K or 8K I/Os. It’s not hitting the throughput limits of a single GigE connection, and iSCSI has very good response time, assuming the disk back-end can keep up with the required IOPS. NFS is no different. It’s only when we talk about throughput that we really run into the significant differences.
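To put some rough numbers on that, here’s a quick back-of-the-envelope check. The IOPS figure is a made-up transactional workload for illustration, not something out of the VMware paper; the point is simply how far small-block traffic sits below what one GigE link can carry.

```python
# Back-of-the-envelope check: how close does a small-block transactional
# workload come to saturating a single GigE link? The IOPS number below is
# a hypothetical example for illustration, not a measured figure.

iops = 2000               # hypothetical Exchange-style workload
io_size_kb = 8            # typical small transactional I/O (4K or 8K)

workload_mb_s = iops * io_size_kb / 1000   # ~16 MB/s generated by the workload
gige_usable_mb_s = 125 * 0.9               # ~112 MB/s after protocol overhead

print(f"Workload throughput:   {workload_mb_s:.0f} MB/s")
print(f"Usable GigE bandwidth: ~{gige_usable_mb_s:.0f} MB/s")
print(f"Link utilization:      {workload_mb_s / gige_usable_mb_s:.0%}")
```

Even doubling or quadrupling that workload leaves plenty of headroom, which is why response time, not bandwidth, is the number that matters for this class of application.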
It’s pretty standard to recommend FC when throughput needs exceed that of a single Gb Ethernet connection. Why is that? Simply put, the IP-based storage protocols just do not scale well without a much more complex implementation, and even then it is questionable. This isn’t a knock on iSCSI or NFS; the problem lies with some inherent weaknesses in the lower level Ethernet protocols. You increase throughput on GigE by aggregating connections together, but as anyone who has worked with it knows, that doesn’t work for some storage needs. In the usual deployment a single ESX server is talking to a single NAS device. That single conversation between devices cannot be distributed across multiple GigE connections in an aggregate. It’s only when we start adding complexity through multiple IPs on a single datastore, or multiple datastores each with their own IPs, that we start to see some statistical load balancing. Even then we can’t give a single VM doing disk access more than one connection’s worth of bandwidth without splitting up disks across multiple connections. Who wants to manage something like that? In situations like these I think we are still looking at FC until 10Gb Ethernet and 20Gb InfiniBand are more widely adopted. I have personally done testing with ESX running over 20Gb InfiniBand and the results are impressive.
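To illustrate why that single conversation is stuck on one link, here’s a minimal sketch of the kind of IP-hash selection static link aggregation does. The addresses and the two-NIC team are invented for the example; real teaming policies differ in the details, but the principle is the same.

```python
# Minimal sketch of why a single ESX-to-NAS conversation stays on one link:
# IP-hash link aggregation picks an uplink by hashing the source and
# destination IPs, so the same pair of endpoints always lands on the same
# physical NIC. Addresses and the two-NIC team are made up for illustration.

import zlib

uplinks = ["vmnic0", "vmnic1"]  # two GigE NICs in the team

def pick_uplink(src_ip: str, dst_ip: str) -> str:
    """IP-hash style selection: one conversation, one uplink."""
    h = zlib.crc32(f"{src_ip}->{dst_ip}".encode())
    return uplinks[h % len(uplinks)]

# One ESX VMkernel IP talking to one datastore IP: every packet of that
# conversation rides the same NIC, no matter how many NICs are in the team.
print(pick_uplink("10.0.0.10", "10.0.1.50"))

# Only by adding more target IPs (more datastores or aliases) do the
# conversations spread out, and even then only statistically.
for dst in ["10.0.1.50", "10.0.1.51", "10.0.1.52"]:
    print(dst, "->", pick_uplink("10.0.0.10", dst))
```

Spread the flows across more IPs and the aggregate fills up, but any one flow still tops out at a single GigE’s worth of bandwidth.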
What if you’re a smaller shop or consolidating systems with lower performance requirements? Do you already have a Fibre Channel switching fabric? If so, it’s hard to argue against continuing to use it unless there are other mitigating factors. If you do not have a fibre fabric then most likely you’re looking at NFS or iSCSI. We’ve seen that both offer excellent response time for most applications where throughput isn’t a big consideration. NFS allows you to manage your VM files just like any other objects on the file system. To a lot of people that is a big benefit. To me, the big benefit of iSCSI is the ability for ESX to control the file system and use VMFS. It just seems to me that in the future we’ll see new features appear that require VMFS. VMware is limited in what they can do with NFS, as they do not control that specification. With VMFS they can do anything they want. We’ve already seen hints of this in that Site Recovery Manager did not support NFS at launch. With the exciting new features coming in the next versions of the VI suite, I just think you have a safer bet with iSCSI (or FC).
Having said all that… there are still other questions. How does iSCSI to a file-system-based NAS device, like a NetApp, compare to iSCSI on a block-based SAN like a CLARiiON? There is some performance hit when going through a file system layer that you don’t see on a block device, but I don’t have those numbers. Then you have the different types of file systems underneath, such as NetApp’s WAFL technology. Everyone makes compromises when designing file systems. The key is deciding which one made the compromises that match your requirements and use cases.