Let's face it: distributed systems design is tough. And scale-out storage is no exception. Look "under the hood" of any clustered storage system
these days and you'll find some very interesting design choices.
But does cluster design matter?
Is it best to create a shared pool or virtualize storage? How does data layout impact multiple workloads? Can data parity enable both availability and efficiency? Let's explore these questions.

ABSTRACTION
Just as the server world has gone through virtualization, NetApp has introduced virtualization for storage. This is accomplished through clustered Data ONTAP Storage Virtual Machines (formerly called "Vservers").
A Storage Virtual Machine (SVM) is a secure, virtualized storage container that includes its own security, IP addresses, and namespace. It can include volumes residing on any node in the cluster. Furthermore, clustered Data ONTAP supports one to hundreds of SVMs in a single cluster.
Storage admins also need to restrict who has access to what data; the good news here is that multi-tenancy is natively built into clustered Data ONTAP with SVMs.
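As an illustration of the idea (a toy model, not NetApp's actual implementation -- all class and tenant names below are hypothetical), an SVM can be thought of as a container bundling its own network identities, namespace, and volumes, with volumes free to live on any node:

```python
from dataclasses import dataclass, field

@dataclass
class Volume:
    name: str
    node: str  # a volume may reside on any node in the cluster

@dataclass
class StorageVirtualMachine:
    """Hypothetical model of an SVM: its own IPs, namespace, and volumes."""
    name: str
    ip_addresses: list = field(default_factory=list)
    volumes: list = field(default_factory=list)
    # junction path -> volume, forming this SVM's private namespace
    namespace: dict = field(default_factory=dict)

    def mount(self, volume: Volume, junction_path: str):
        self.volumes.append(volume)
        self.namespace[junction_path] = volume

# Two tenants share the same cluster but each sees only its own namespace.
svm_a = StorageVirtualMachine("tenant_a", ip_addresses=["10.0.0.10"])
svm_a.mount(Volume("vol1", node="node1"), "/data")

svm_b = StorageVirtualMachine("tenant_b", ip_addresses=["10.0.0.20"])
svm_b.mount(Volume("vol1", node="node3"), "/data")  # same path, isolated namespace

assert svm_a.namespace["/data"].node == "node1"
assert svm_b.namespace["/data"].node == "node3"
```

The key point of the sketch is the isolation: both tenants mount a volume at "/data", yet neither can see the other's, which is the multi-tenancy property described above.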
So what's wrong with a single shared pool of storage?
In such a shared pool, a file is forced to be "owned" by several nodes (versus a "single-brain" approach like NetApp's). The side effect of this design is that random I/O can become problematic and may require extremely large amounts of bandwidth. This is not an optimal design.

DATA LAYOUT
Another NetApp innovation is WAFL (Write Anywhere File Layout), which abstracts the data access layer from the storage layer.
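The value of that abstraction can be seen with a simple block-indirection table (a toy model, not WAFL itself): clients address virtual block numbers, and the file system is free to write the data anywhere on disk and simply update the map.

```python
class IndirectionMap:
    """Toy virtual-to-physical block map: the layout engine can place
    data at any free physical block without the client noticing."""

    def __init__(self):
        self.vbn_to_pbn = {}   # virtual block number -> physical block number
        self.next_free = 0     # trivially allocate the next free physical block

    def write(self, vbn: int) -> int:
        # "Write anywhere": pick any free physical location for this block.
        self.vbn_to_pbn[vbn] = self.next_free
        self.next_free += 1
        return self.vbn_to_pbn[vbn]

    def read(self, vbn: int) -> int:
        return self.vbn_to_pbn[vbn]

m = IndirectionMap()
m.write(100)   # virtual block 100 lands at physical block 0
m.write(7)     # virtual block 7 lands at physical block 1
m.write(100)   # rewrite: virtual block 100 now points at physical block 2
assert m.read(100) == 2
assert m.read(7) == 1
```

Because rewrites go to a new physical location and only the map changes, the data access layer never needs to know where the bytes physically live.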
Small files are also stored very efficiently on clustered Data ONTAP. In fact, a file smaller than 64 bytes doesn't even occupy a data block, because all of it is stored in the inode itself. Coupled with deduplication and compression, clustered Data ONTAP can use 50% less storage than traditional storage.
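The small-file optimization can be sketched with a toy allocator (a simplification for illustration, not ONTAP's on-disk format): data under the 64-byte threshold is kept inline in the inode, so no separate data block is ever allocated for it.

```python
BLOCK_SIZE = 4096          # typical file-system block size
INLINE_THRESHOLD = 64      # files under this size fit in the inode itself

class Inode:
    def __init__(self, data: bytes):
        if len(data) < INLINE_THRESHOLD:
            self.inline_data = data   # stored directly in the inode
            self.blocks = []          # no data blocks consumed
        else:
            self.inline_data = None
            # allocate enough whole blocks to hold the data
            nblocks = -(-len(data) // BLOCK_SIZE)  # ceiling division
            self.blocks = list(range(nblocks))

tiny = Inode(b"hello")        # 5 bytes: lives inline, zero data blocks
big = Inode(b"x" * 10000)     # 10 KB: needs 3 data blocks

assert tiny.blocks == [] and tiny.inline_data == b"hello"
assert len(big.blocks) == 3
```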
Without virtual addressing, you're forced to physically map everything. Wouldn't a cluster designed this way become extremely busy, constantly writing and finding data?

DATA PARITY
For a decade now, NetApp has offered RAID-DP -- a Storage Networking Industry Association (SNIA) standard RAID implementation -- on FAS and V-Series storage systems. And it's highly efficient: when a disk fails, rebuilds are easy and short -- even when the system is more than half full.
With RAID Groups, all blocks are related to parity. In other words, RAID-DP is built on implied relationships that are mathematically provable.
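That provable relationship can be seen in miniature with single-parity XOR (a deliberate simplification -- RAID-DP adds a second, diagonal parity to survive two concurrent failures): any one lost disk's block is mathematically recoverable from the remaining blocks.

```python
from functools import reduce

def xor_parity(blocks):
    """Row parity: byte-wise XOR across all data blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def rebuild(surviving_blocks, parity):
    """A lost block is the XOR of the parity with every surviving block."""
    return xor_parity(surviving_blocks + [parity])

# Three data disks in a RAID group, each holding one 4-byte block.
disks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\x0a\x0b\x0c\x0d"]
parity = xor_parity(disks)

# Disk 1 fails; rebuild its block from the survivors plus parity.
recovered = rebuild([disks[0], disks[2]], parity)
assert recovered == disks[1]
```

Because every data block participates in the parity equation, the rebuild is deterministic and bounded: read the survivors, XOR, done -- no metadata hunting required.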
If you simply leverage metadata without RAID groups, you could be forced to sift through tons of metadata before you can do anything. That could also mean involving every node in the process, slowing down the entire cluster, with no clear estimate of how long a rebuild will take. Again, this is not optimal.
Clustered Data ONTAP is a more flexible approach to scale-out storage. Virtualized, abstracted, and protected -- without compromises.
Clearly, design matters.