There has been tremendous innovation in the data storage industry over the past few years. Proprietary, monolithic SAN and NAS solutions are beginning to give way to open-system solutions and distributed architectures. Traditional storage interfaces such as parallel SCSI and Fibre Channel are being challenged by iSCSI (SCSI over TCP/IP), SATA (serial ATA), SAS (serial attached SCSI), and even Infiniband. New filesystem designs and alternatives to NFS and CIFS are enabling high-performance file sharing measured in gigabytes (yes, bytes, not bits) per second. New spindle management techniques are enabling higher-performance and lower-cost disk storage. Meanwhile, a whole new set of efficiency technologies is allowing storage protocols to flow over the WAN with unprecedented performance. This tutorial is a survey of the latest storage networking technologies, with commentary on where and when these technologies are most suitably deployed.
intro
When trying to figure out how fast your storage array is, you must account for I/O processing.
Jacob goes back and forth on whether NAS is a good idea. His general rule of thumb: if you have a non-trivial file-serving challenge, and someone wraps up and abstracts everything you need in a nice, shiny device, go for it. NAS set backups back 10 years; NDMP moved them forward 5 years.
SNIA is trying to be vendor-neutral, but this can lead to a least-common-denominator approach. Jacob calls this "vendor-neuter."
Fibre Channel is just SCSI in a star topology instead of a bus topology.
Storage virtualization is simply a layer of indirection that maps virtual addresses onto physical addresses. Arguably, almost all storage is virtualized (RAID, for example, presents several disks as one virtual disk), so Jacob reserves the term "virtualized storage" for techniques that are novel or clever.
SCSI is a channel protocol; it assumes the wire takes care of reliable transmission. When the wire drops the ball, SCSI breaks.
The real motivation for SATA was to make it easy for big PC vendors to plug hard drives into PC motherboards. All of the other features came later, after the PC vendors had gotten the ball rolling.
Fibre Channel is a pain to scale, but for now, if you're building on your own, stick with it. Don't go with a SAS-based solution unless it's inside an encapsulated system supplied and supported by a big-name vendor, and you test the bejeezus out of it.
The difference between enterprise and workstation hard drives is blurring, as near-line storage drives show. According to Jacob, when your data sits on a collection of SATA drives for long periods of time, you'll find that small corruptions creep in over time, because of differences in error rates between drive classes.
Tolerance for rotational vibration is also an important factor. When Seagate calculates tolerance for rotational vibration, a drive that is within tolerance 80% of the time is considered to be within spec.
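The error-rate point is easy to underestimate, so here is a back-of-the-envelope calculation. It assumes spec-sheet unrecoverable read error rates of 1 per 10^14 bits for desktop-class SATA drives and 1 per 10^15 bits for enterprise drives. These are typical published figures, but they are my assumptions, not numbers from the talk. It also models only detected read errors, not the silent corruption Jacob describes.

```python
import math

# ASSUMED spec-sheet-style unrecoverable read error rates (errors per bit read);
# check your own drives' data sheets for real figures.
ASSUMED_RATES = {
    "desktop-class SATA": 1e-14,
    "enterprise-class": 1e-15,
}


def expected_errors(terabytes_read, bit_error_rate):
    """Expected number of unrecoverable read errors for a given volume of reads."""
    bits = terabytes_read * 8e12  # decimal terabytes, as drive vendors count them
    return bits * bit_error_rate


def prob_at_least_one(terabytes_read, bit_error_rate):
    """Chance of at least one error, treating errors as independent (Poisson)."""
    return 1.0 - math.exp(-expected_errors(terabytes_read, bit_error_rate))


for label, rate in ASSUMED_RATES.items():
    print(f"{label}: reading 10 TB -> "
          f"{expected_errors(10, rate):.2f} expected errors, "
          f"{prob_at_least_one(10, rate):.1%} chance of at least one")
```

With these assumed rates, reading 10 TB yields about 0.8 expected errors (roughly a 55% chance of at least one) at the desktop-class rate. At the enterprise-class rate, it yields about 0.08 (under 8%).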
Jacob finds this alarming, because it means a drive could spend up to 20% of its time reseeking and still be considered within spec. 7,200 RPM drives are built significantly differently from 15K RPM drives.
Most people say they'll just run iSCSI in software on their fileservers but buy SNIC cards for their big database servers. In reality, this is backwards: database servers generally don't push that much actual data over the wire, while people tend to underestimate how much data their fileservers push over the network.
Jacob likes to describe Infiniband as a Vulcan mind meld for your PC. He expounds: "Right now, I'm vibrating air with my vocal cords, which is picked up by little bones in your ear, translated into electrical impulses, and then interpreted by your brain. If you think about it, this communication protocol has a lot of overhead. If I could just core dump directly into your brain, we could've been out of here hours ago."
Jacob thinks iSCSI is a win over Fibre Channel because you can take the money you would have spent on Fibre Channel devices and buy more disk spindles instead. Latency in iSCSI typically comes from two places: smaller MTUs and software I/O processing. But the real latency you see is in the hard drives; the latency of the wire tends to pale in comparison (unless your software I/O processing really sucks).
One of Jacob's customers is heavily invested in a SAN solution from a three-letter vendor. They told him they'd love to look at simpler SAN technology, but they've invested $100K in training their staff to understand their SAN. Jacob replied that that's exactly their problem: they have a SAN that required $100K of training just to figure out how to use it.
People *will* use Microsoft SharePoint as a fileserver. You can go from megabytes to gigabytes to terabytes with frightening ease.
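To put numbers on "the real latency is in the hard drives," here is a rough per-I/O latency budget. The 8.5 ms average seek, the 4 KiB I/O size, and the gigabit Ethernet link are my assumptions for illustration. The sketch also ignores propagation delay, per-frame overhead from smaller MTUs, and software processing time.

```python
# Rough per-I/O latency budget: spinning disk vs. the wire.
# ASSUMPTIONS (not from the talk): 8.5 ms average seek on a 7,200 RPM drive,
# a 4 KiB I/O, gigabit Ethernet; protocol and software overhead ignored.

ASSUMED_SEEK_MS = 8.5


def rotational_latency_ms(rpm):
    """Average rotational latency is half a revolution."""
    return 0.5 * 60_000 / rpm


def wire_time_ms(nbytes, link_gbps):
    """Time to serialize nbytes onto a link of the given speed."""
    return nbytes * 8 / (link_gbps * 1e9) * 1_000


disk_ms = ASSUMED_SEEK_MS + rotational_latency_ms(7200)
wire_ms = wire_time_ms(4096, link_gbps=1.0)
print(f"disk: {disk_ms:.2f} ms  wire: {wire_ms:.3f} ms  "
      f"ratio: about {disk_ms / wire_ms:.0f}x")
```

Under these assumptions the disk costs roughly 12.7 ms per random I/O, while putting the block on the wire costs about 0.03 ms. Software I/O processing would have to be very slow before it rivals the spindle, which is the point of Jacob's caveat.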
NetApp was the company that came up with the alternative to copy-on-write for snapshots (the redirect-on-write scheme behind its WAFL filesystem), and several other SAN vendors now implement it. Snapshots are also handy when many hosts need mostly identical volumes: build the volume once, then make a snapshot for each host.
Dell does not think highly of PowerVaults (at least, PowerVaults as of a few years ago).
All of the small vendors that were pushing the envelope with storage virtualization were bought up by top-tier companies. Either neat things are on the way, or some VPs made some bad acquisitions...
Jacob believes that, at least for now, you need to thoroughly understand the virtual storage system you're building in order to run it properly, which is why he favors buying the software and assembling the hardware yourself over buying a black box. Jacob normally argues that two cheap systems provide better redundancy than a single fault-tolerant system. He doesn't like block-level virtualization in the switch: switches are already too complicated, and virtualization offers more advantages when the logic resides closer to the disks.
The latest compression appliances can reach throughput on the order of 500MB/s, not 200MB/s. WAN accelerators work because WAN throughput is constrained more by latency than by bandwidth. WAFS gateways tend to be simpler (usually just caching). Montillio makes the NFS/CIFS Hardware Accelerator Card. Every time Jacob turns around, someone is innovating in the storage acceleration space.
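To make the snapshot remark concrete, here is a toy model comparing classic copy-on-write with redirect-on-write, which is my assumed name for the NetApp-style alternative. Everything here (`Store`, `Volume`, the block map) is hypothetical. The model treats a volume as a map from logical blocks to physical blocks and a snapshot as a frozen copy of that map. It counts only the data I/Os triggered by the first write to a block after a snapshot.

```python
"""Toy model (hypothetical): I/O cost of the first write after a snapshot.
Only data I/Os are counted; metadata updates and space reclamation are ignored."""


class Store:
    """Physical block store that counts data reads and writes."""

    def __init__(self):
        self.blocks = {}
        self.next_id = 0
        self.reads = 0
        self.writes = 0

    def alloc_write(self, data):
        """Write data to a fresh physical block and return its id."""
        pid = self.next_id
        self.next_id += 1
        self.blocks[pid] = data
        self.writes += 1
        return pid

    def overwrite(self, pid, data):
        """Overwrite an existing physical block in place."""
        self.blocks[pid] = data
        self.writes += 1

    def read(self, pid):
        self.reads += 1
        return self.blocks[pid]


class Volume:
    def __init__(self, store, nblocks):
        self.store = store
        self.map = [store.alloc_write(b"\0") for _ in range(nblocks)]
        self.snaps = []

    def snapshot(self):
        # A snapshot starts out sharing every physical block with the volume.
        snap = list(self.map)
        self.snaps.append(snap)
        return snap

    def read(self, lbn):
        return self.store.read(self.map[lbn])


class CowVolume(Volume):
    def write(self, lbn, data):
        pid = self.map[lbn]
        sharers = [s for s in self.snaps if s[lbn] == pid]
        if sharers:
            # Copy-on-write: first preserve the old data in a new block...
            copy = self.store.alloc_write(self.store.read(pid))
            for s in sharers:
                s[lbn] = copy
        # ...then overwrite the live block in place.
        self.store.overwrite(pid, data)


class RowVolume(Volume):
    def write(self, lbn, data):
        # Redirect-on-write: leave the old block to the snapshot and
        # point the live map at a newly written block.
        self.map[lbn] = self.store.alloc_write(data)


def first_write_cost(cls):
    store = Store()
    vol = cls(store, nblocks=4)
    snap = vol.snapshot()
    store.reads = store.writes = 0          # count only I/O caused by the write
    vol.write(2, b"new")
    cost = (store.reads, store.writes)
    assert store.blocks[snap[2]] == b"\0"   # snapshot keeps the old contents
    assert vol.read(2) == b"new"            # live volume sees the new data
    return cost


print("copy-on-write     (reads, writes):", first_write_cost(CowVolume))  # (1, 2)
print("redirect-on-write (reads, writes):", first_write_cost(RowVolume))  # (0, 1)
```

In this model, taking a snapshot costs only a copy of the map. A writable per-host snapshot built the same way would consume new blocks only where that host writes.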
Gear 6 makes an accelerator that uses solid-state storage and is massively optimized for I/O. It accelerates only reads, not writes, but it boosts read performance by insane amounts.
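Why does a read-only accelerator leave writes alone? A toy read cache shows the asymmetry. This is not Gear 6's design, since the talk gave no internals. It is a hypothetical least-recently-used cache built on Python's `OrderedDict`. It serves repeat reads from fast memory and passes every write straight through to a slow backend, dropping any stale cached copy.

```python
from collections import OrderedDict


class Backend:
    """Stand-in for slow disk storage; counts every I/O that reaches it."""

    def __init__(self):
        self.data = {}
        self.reads = 0
        self.writes = 0

    def read(self, key):
        self.reads += 1
        return self.data.get(key)

    def write(self, key, value):
        self.writes += 1
        self.data[key] = value


class ReadCache:
    """Hypothetical LRU read cache; writes are passed straight through."""

    def __init__(self, backend, capacity):
        self.backend = backend
        self.capacity = capacity
        self.cache = OrderedDict()

    def read(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)       # hit: mark as most recently used
            return self.cache[key]
        value = self.backend.read(key)        # miss: go to slow storage
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict least recently used entry
        return value

    def write(self, key, value):
        self.backend.write(key, value)        # every write still hits the disks
        self.cache.pop(key, None)             # drop stale copy so reads stay correct


backend = Backend()
cache = ReadCache(backend, capacity=100)
for i in range(10):
    cache.write(i, f"block-{i}")
for _ in range(1000):                         # 10,000 reads of a hot working set
    for i in range(10):
        cache.read(i)
print(backend.reads, backend.writes)
```

Here 10,000 reads reach the backend only 10 times, while all 10 writes still go to the backend. The design trades write acceleration for simplicity: with no dirty data held in the cache, there is nothing to lose if the cache fails.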
Check out their paper Server Virtualization: Avoiding The I/O Trap.
There were only a few questions, mostly related to specific projects some of the attendees were working on.
Alas, I really don't know enough about our SAN project to pick Jacob's brain for advice.
You can go to the index of my Usenix notes.