Lightning crashes: How to manage server flash caches
Step change in complexity
What does it mean for a storage array to manage server flash drives as will happen with EMC's Project Lightning?
Let's look at an EMC VNX doing FAST VP (Fully-Automated Storage Tiering for Virtual Pools) tiering of data on the one hand and a VNX supporting connected server flash drives on the other.
With in-array tiering, the VNX FAST VP software has to track the access rate on sub-LUN chunks of data, 1GB slices, in the array. A LUN can be made up of many slices. EMC documentation states: "The relative activity level of each slice is used to determine which slices should be promoted to higher tiers of storage." The VNX could have three storage tiers, such as a slow capacity tier of SATA drives, a performance tier of 10K SAS drives and an extreme performance tier of solid state drives (SSDs). The VNX software has to know about these three tiers, know which slices are in which tier, know the activity rate on those slices, and know the activity rate on the other slices of data being accessed.
Each slice has to have activity metadata associated with it, including a unique ID, its activity level over a time period and its tier location. For a simplistic example, slice (510,666, 6, 2) means slice number 510,666 has been accessed six times in the last five minutes and is located in the second (SAS performance disk) tier.
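EMC doesn't publish its internal data structures, but the per-slice metadata described above might be sketched like this (the field names and tier numbering are illustrative, following the article's example):

```python
from dataclasses import dataclass

@dataclass
class Slice:
    slice_id: int  # unique identifier for this 1GB slice
    activity: int  # access count over the tracking window
    tier: int      # which storage tier currently holds the slice

# The example from the text: slice 510,666, accessed six times in the
# last five minutes, sitting in the second (SAS performance disk) tier.
s = Slice(slice_id=510_666, activity=6, tier=2)
```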
In fact, FAST VP maintains a cumulative I/O count and "weights" each I/O by how recently it arrived. The weighting decays over time: new I/Os are given full weight, I/Os about a day old carry half weight, and week-old I/Os carry very little weight. Recent activity counts for more than old activity.
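EMC does not spell out the exact decay function, but a half-life of roughly one day reproduces the behaviour described: full weight for new I/Os, half weight after a day, almost nothing after a week. A minimal sketch, with the half-life as an assumption:

```python
HALF_LIFE_HOURS = 24.0  # assumption: an I/O's weight halves roughly once a day

def io_weight(age_hours: float) -> float:
    """Weight given to a single I/O that arrived age_hours ago."""
    return 0.5 ** (age_hours / HALF_LIFE_HOURS)

print(io_weight(0))       # 1.0    - brand-new I/O, full weight
print(io_weight(24))      # 0.5    - day-old I/O, half weight
print(io_weight(24 * 7))  # ~0.008 - week-old I/O, nearly nothing
```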
FAST VP operates by periodically – once each hour – relocating the most active data up to the highest available tiers, typically the extreme performance or performance tier. Less active data is relocated to lower tiers. So this is a fairly simple algorithm: know the capacity of the three storage tiers in slice terms, track the location within those tiers of the thousands of active slices, and track their activity levels. At each relocation interval the most active slices move to fill the extreme performance tier, the next most active move to fill the performance tier, and the rest go into the capacity tier.
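That relocation pass amounts to a greedy sort-and-fill. A sketch, with invented tier numbers (1 = extreme performance) and capacities purely for illustration:

```python
def relocate(slices, tier_capacity):
    """Greedy relocation sketch: rank slices by activity, hottest first,
    fill the fastest tier to capacity, then the next, and so on.

    slices: list of (slice_id, activity) pairs
    tier_capacity: dict of tier number (1 = fastest) -> capacity in slices
    Returns a dict of slice_id -> assigned tier.
    """
    ranked = sorted(slices, key=lambda s: s[1], reverse=True)
    placement, start = {}, 0
    for tier in sorted(tier_capacity):
        for sid, _ in ranked[start:start + tier_capacity[tier]]:
            placement[sid] = tier
        start += tier_capacity[tier]
    return placement

# Two-slice extreme tier, two-slice performance tier, rest to capacity.
plan = relocate([(1, 9), (2, 3), (3, 7), (4, 1), (5, 5)],
                {1: 2, 2: 2, 3: 10})
# The two hottest slices (1 and 3) land on tier 1, slices 5 and 2 on
# tier 2, and slice 4 drops to the capacity tier.
```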
We'll put aside user manipulation of tiering policies to keep things simple.
Now let's add server flash drives as a fourth "tier" to this scheme. How can FAST VP move data in and out of server flash drives?
Tiering outside the array
First of all, the fourth tier has to be sub-divided into mini-tiers, one for each supported server. Let's suppose there are 50 attached servers. Now FAST VP has to collect statistics on the activity level of the slices in the server caches. We could simplistically imagine slice (666,804, 5, 4, 25) is slice number 666,804, which has been accessed five times in the last five minutes and is located in the fourth, server flash, tier in server number 25. Before Project Lightning, FAST VP didn't care about the source or destination of I/Os; now it very much has to.
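The slice record would then need a server field as well. Again, the structure and field names are purely illustrative:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Slice:
    slice_id: int
    activity: int                    # accesses over the tracking window
    tier: int                        # 4 would be the new server-flash "tier"
    server_id: Optional[int] = None  # set only when the slice sits in a server's flash

# The example from the text: slice 666,804, five recent accesses,
# held in the fourth (server flash) tier on server number 25.
s = Slice(slice_id=666_804, activity=5, tier=4, server_id=25)
```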
The amount of data collected by the VNX FAST VP software increases accordingly. A 300GB server flash drive in each of 50 servers means 15,000 extra 1GB slices have to be tracked. One hundred connected servers means tracking 30,000 slices. Two hundred servers each with 600GB of flash means 120,000 slices are tracked.
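At the stated 1GB slice size, the extra bookkeeping is simply the number of servers multiplied by each server's flash capacity in gigabytes:

```python
SLICE_GB = 1  # FAST VP tracks 1GB slices

def extra_slices(servers: int, flash_gb_per_server: int) -> int:
    """Extra slices FAST VP must track for the server-flash tier."""
    return servers * flash_gb_per_server // SLICE_GB

print(extra_slices(50, 300))   # 15000
print(extra_slices(100, 300))  # 30000
print(extra_slices(200, 600))  # 120000
```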
Also, each server flash drive has to be filled, and filled only, with slices relevant to that server. That means, in effect, that a tier 4 shared by 50 servers is actually 50 individual tiers commingled.
Once FAST VP has done this it can deal with tiers 3, 2 and 1 as before.
Obviously the processing burden on the VNX or VMAX CPUs is going to increase.
FAST VP can treat the server flash locations as a storage tier, storing data that is not held in the array, or as caches only, in which case the data in them also stays in the extreme performance tier of the connected VNX array. The cache-only approach is safer since, if a server crashes, its flash cache data is unavailable and may be lost for ever. If it is a read cache only, this is the simple case. If it is a write cache, then written data has to be shipped back to the VNX quickly, again for safety.
The in-array tiering uses hourly relocation intervals. Tiering to server flash caches could well need to happen at shorter intervals to make the system more responsive to sudden changes in a server's I/O activity levels. That would mean FAST VP running separate relocation analysis sessions for the server flash caches. With a 300GB server flash cache, it means tracking the top 300 slices requested by that server and putting them in its cache from whichever other tier they happen to be in.
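Picking a server's cache contents is then a top-N selection per server. A sketch, assuming per-server activity counters have already been collected:

```python
import heapq

def cache_candidates(server_slices, cache_slices=300):
    """Pick the N hottest slices a server has touched for its local
    flash cache; server_slices maps slice_id -> activity count."""
    return heapq.nlargest(cache_slices, server_slices,
                          key=server_slices.get)

hot = cache_candidates({101: 9, 102: 2, 103: 7, 104: 5}, cache_slices=2)
# The two hottest slices, 101 and 103, would go into this server's cache.
```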
That could also mean a slice in the capacity tier also being in a server flash cache and not being moved up the in-array tiers until the next hourly in-array FAST VP analysis run. Does this matter? Do server flash caches replace the extreme performance SSD tier in the array? You could argue that, yes, they do but only for connected servers with flash caches and not for non-flash servers. So server flash caching can be layered onto in-array tiering without too much trouble.
Life does start to get complicated, doesn't it – and that's without considering that, in virtualised servers, perhaps we should cache slices for individual virtual machines (VMs), sub-dividing a server's flash cache into sections for each VM. Let's not go there.
This look at server flash caching and how FAST VP might manage it tells us that EMC's Project Lightning VFCache is going to have to be accompanied by updates to any supporting array's operating system. ®