Growing arrays need bigger pipes
Be generous with bandwidth
Storage array and disk drive vendors have excelled themselves and delivered the high-capacity goods. But some of us are still not happy, because although we have big fat data vaults they are being held back by anorexic pipes.
What is going on here? Business is generating more data than ever before and wants to store and access it as fast as it is accustomed to. The array vendors have done their bit – but we still need bigger network pipes to provide a good ratio of bandwidth to storage array capacity.
It is not just array suppliers who want to increase capacity. Indeed, they are helped by the hard disk drive makers, who are in a constant race to increase capacity.
We saw a rash of 2TB disk drives introduced in 2009 – four-platter units with 500GB per platter. We saw 1TB drives turn up in 2006. Four-terabyte drives have just been announced by Seagate and Hitachi GST.
In other words, we have seen a quadrupling of disk drive capacity in five years. Just by replacing the 1TB drives in storage arrays by 4TB ones, the array's capacity increases fourfold.
The array manufacturers haven't stopped either. An EMC Clariion CX3 array in 2006 could hold up to 230TB of data; the 2011 VNX array, in 7500 guise, holds 2PB using about 1,000 drives.
Just increasing the number of drives, the spindle count, increases an array's responsiveness. Having 1,000 drives instead of 500 means twice the number of I/Os can come flooding out of an array to servers.
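As a rough sketch of that scaling (the figure of 150 random IOPS per spindle is an assumed nominal value for a 7,200rpm drive, purely for illustration):

```python
# Illustrative: aggregate random I/O scales linearly with spindle count.
# 150 IOPS per spindle is an assumed figure for a 7,200rpm hard drive.
IOPS_PER_SPINDLE = 150

def array_iops(spindles: int) -> int:
    """Aggregate random IOPS for an array of identical drives."""
    return spindles * IOPS_PER_SPINDLE

print(array_iops(500))   # 75000
print(array_iops(1000))  # 150000 - double the spindles, double the flood
```

Double the spindle count and the potential I/O flood heading for the network links doubles with it.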
Another boost to arrays’ responsiveness has been the use of solid state drives to hold the most frequently accessed data. These can deliver many more I/Os per second than a hard disk drive, and they can saturate an array's network links to its accessing servers if those links are too few or too thin.
Don’t be deduped
A storage array's network links have a certain bandwidth: the larger that bandwidth and the more numerous the links, the more data can be served by the array to servers.
But building bigger and bigger arrays means that the array network links can become bottlenecks. Can clever software get us out of this trap?
On the face of it deduplication – the removal of duplicated blocks of data in data records and files – could help sort out the problem. If the amount of data flowing between servers and the array is halved, then the network pipe carries only half the data and the bottleneck is eased.
But the continual increase in data means that the array fills up and the data burden on its network pipes climbs back to where it was before deduplication. Deduping data gives only a one-time boost to an array's use of network bandwidth.
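The mechanism can be sketched in a few lines: hash fixed-size blocks and store each unique block once. The 4KB block size and in-memory dictionary are illustrative simplifications, not how any particular array does it:

```python
import hashlib
import os

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration

def dedupe(data: bytes) -> dict:
    """Store each unique block once, keyed by its SHA-256 digest."""
    store = {}
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        store[hashlib.sha256(block).hexdigest()] = block
    return store

# Two identical 4KB blocks dedupe down to one stored block...
store = dedupe(b"A" * BLOCK_SIZE * 2)
print(len(store))  # 1

# ...but fresh, unique data gains nothing: the one-time boost is spent.
store = dedupe(os.urandom(BLOCK_SIZE * 2))
print(len(store))  # 2
```

Once the incoming data is mostly unique, deduplication has no more duplicates to remove and the pipes fill up again.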
Thick and thin
Another piece of clever software is thin provisioning. Can it solve the problem better? The blunt answer is no. What it can do is mitigate the data growth rate by slowing the allocation of disk space to volumes.
When a server application needs block storage it is allocated a volume, say 300TB, which is estimated on the basis of the data the app will need to store.
Actually, it is not going to write 300TB of data all at once. Instead it writes, say, 5TB a week, so taking more than a year to reach the 300TB figure. For much of that year most of the allocated 300TB is sitting empty and could be used for data written by other applications.
What thin provisioning does is “thinning” a volume to little more than the written data size plus a buffer. If the buffer is encroached upon, then more physical space is given to the volume. Thin provisioning does not affect the amount of data flowing between the array and its accessing servers.
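The scheme above can be sketched as a toy model. The 10 per cent buffer and the small initial allocation are assumed figures for illustration, not any vendor's defaults:

```python
class ThinVolume:
    """Toy thin-provisioned volume: physical space tracks written data,
    not the logical size the application was promised."""

    BUFFER = 0.10  # assumed headroom fraction

    def __init__(self, logical_tb: float):
        self.logical_tb = logical_tb   # what the app thinks it has
        self.written_tb = 0.0
        self.physical_tb = 1.0         # assumed small initial allocation

    def write(self, tb: float) -> None:
        self.written_tb += tb
        # Grow the physical allocation only when the buffer is encroached upon.
        needed = self.written_tb * (1 + self.BUFFER)
        if needed > self.physical_tb:
            self.physical_tb = min(needed, self.logical_tb)

vol = ThinVolume(300)   # app is promised 300TB
for _ in range(4):
    vol.write(5)        # 5TB a week
print(vol.written_tb, round(vol.physical_tb))  # 20.0 22
```

After a month the application has written 20TB and holds only about 22TB of physical space, not the 300TB it was promised – but the 20TB still has to cross the same network pipes.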
WAN optimisation software and hardware can reduce this data amount by reducing the network overhead needed to move it and, possibly, deduplicating it.
We could see such technology effectively turning a 1Gbit Ethernet pipe between an array and its servers into, say, a 1.5Gbit pipe. That will provide temporary and much needed relief, and is still beneficial if the Ethernet wire is uprated to a 10Gbit/s one.
Go forth and multiply
Fibre Channel SAN storage arrays have responded to this problem by uprating the size and number of their Fibre Channel ports.
Where two 2Gbit/s Fibre Channel ports sufficed once, there may now be four 8Gbit/s ports, an eightfold increase in network bandwidth. With 16Gbit/s ports and fabric switches forthcoming, a further doubling in bandwidth can take place.
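The port arithmetic is straightforward:

```python
def total_bandwidth(ports: int, gbit_per_port: int) -> int:
    """Aggregate link bandwidth in Gbit/s for a set of identical ports."""
    return ports * gbit_per_port

old = total_bandwidth(2, 2)   # 4 Gbit/s: two 2Gbit/s ports
new = total_bandwidth(4, 8)   # 32 Gbit/s: four 8Gbit/s ports
print(new // old)             # 8 - the eightfold increase

# Moving the same four ports to 16Gbit/s doubles the bandwidth again.
print(total_bandwidth(4, 16) // old)  # 16
```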
For servers accessing network-attached storage (NAS) across Ethernet links, and for iSCSI SAN access, it is necessary to keep adding 1Gbit/s Ethernet links to maintain a satisfactory ratio of network bandwidth to array capacity – or, preferably, to move to 10Gbit/s Ethernet as well.
The use of scale-out storage – adding more arrays and connecting them together – helps solve a network bottleneck issue at the individual array level, but not at the overall storage networking level. Having a federation of 20 arrays, each using four 1Gbit/s Ethernet wires, means your Ethernet infrastructure has 80 Ethernet cables linking to the accessing servers.
Just changing to 10Gbit/s Ethernet cuts that down to eight cables, which is much more manageable.
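The cable-count arithmetic for that 20-array federation works out as follows:

```python
import math

def cables_needed(arrays: int, gbit_per_array: int, link_gbit: int) -> int:
    """Cables required to carry a federation's bandwidth at a given link speed."""
    total_gbit = arrays * gbit_per_array
    return math.ceil(total_gbit / link_gbit)

print(cables_needed(20, 4, 1))   # 80 cables at 1Gbit/s
print(cables_needed(20, 4, 10))  # 8 cables at 10Gbit/s
```

The same 80Gbit/s of aggregate bandwidth, a tenth of the cabling.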
The moral is: a bathplug is fine for draining a puddle but it is no use if you are trying to drain a lake.
Keep networking bandwidth and storage array capacity in step. If the network starts hobbling array access, then it needs speeding up. ®