Original URL: http://www.theregister.co.uk/2012/05/29/large_array_migration_impossibility/

Are you handcuffed to the rails of your disk array's sinking ship?

How your data could end up tied down to a supplier

By Chris Mellor

Posted in Storage, 29th May 2012 07:02 GMT

Blocks and Files Are your storage arrays now so big, you can't easily migrate your data off them? If so, you've handcuffed yourself to your supplier, open interfaces or not.

There are good reasons to need a new storage array, such as the existing one running out of gas or lacking features you now need. Allow us to demonstrate the problems you'd face in shifting your data onto new kit.

We'll suppose you're going to move your data off, say, a fully loaded VMAX 40K array. How long will it take? The array still has to do its normal work while the move is going on, so you can't use all of its 10GbitE links. Let's make some wild assumptions: a 10GbitE link theoretically moves up to ten billion bits a second at top speed, but in reality hardware and software are not that fast. So, if we pick a generous practical average transfer rate of 700MB/s, that's 41GB per minute. Over an hour that will be 2.4TB and it will be 57TB in a day.
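As a sanity check, the per-link arithmetic can be run through a short script. The 700MB/s figure is the assumption above, and binary units (1GB = 1024MB) are used, which is how 700MB/s works out to 41GB per minute; the daily figure lands at 57.7TB, rounded down to 57TB above:

```python
# Per-link throughput arithmetic for an assumed 700MB/s sustained rate
# on a 10GbitE link; binary units (1GB = 1024MB) match the article's figures.
rate_mb_s = 700  # assumed practical average transfer rate per link

per_minute_gb = rate_mb_s * 60 / 1024        # ~41GB per minute
per_hour_tb   = rate_mb_s * 3600 / 1024**2   # ~2.4TB per hour
per_day_tb    = per_hour_tb * 24             # ~57.7TB per day

print(f"{per_minute_gb:.0f}GB/min, {per_hour_tb:.1f}TB/hr, {per_day_tb:.1f}TB/day")
```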

We'll use, say, 32 of the available 64 links and that gives us 1.8PB per day. It will take 2.22 days to move 4PB off the storage vault onto the destination array, with the array's controllers' normal workload slowed down by all the data movement.

That's quite a long time and, in reality, we couldn't use half the array's links to do the job. Instead we'd have to use fewer, ten say, so as not to affect the array's normal workload too much. In that case the data transfer rate would be 0.55PB per day and the transfer would need 7.27 days at best - and probably a couple of weeks in real life.

If we use 40GbitE instead, it would take four or so days in real life; not too horrible.
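Putting the link counts together, the migration-time estimates above can be sketched as a back-of-envelope model, again assuming 700MB/s per 10GbitE link and linear scaling with link speed. These are best-case numbers; real life roughly doubles them or worse, and small differences from the quoted 7.27 days come down to rounding:

```python
# Back-of-envelope migration time: data volume divided by aggregate link
# bandwidth, assuming a 700MB/s sustained rate per 10GbitE link.
PER_LINK_TB_DAY = 700 * 3600 * 24 / 1024**2  # ~57.7TB/day per 10GbitE link
DATA_TB = 4 * 1024                           # 4PB to migrate

def days_to_move(data_tb, link_count, gbit=10):
    """Best-case days to move data_tb over link_count links of gbit speed."""
    per_link = PER_LINK_TB_DAY * gbit / 10   # scale linearly with link speed
    return data_tb / (link_count * per_link)

half_ports = days_to_move(DATA_TB, 32)           # ~2.2 days: half the 64 ports
ten_ports  = days_to_move(DATA_TB, 10)           # ~7.1 days at best
ten_40gbe  = days_to_move(DATA_TB, 10, gbit=40)  # ~1.8 days; "four or so" in real life

print(f"32x10GbitE: {half_ports:.2f} days")
print(f"10x10GbitE: {ten_ports:.2f} days")
print(f"10x40GbitE: {ten_40gbe:.2f} days")
```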

If the incoming array can virtualise third-party arrays, including the one you are migrating from, then things get better. The old array becomes a data tub behind the new one, with all requests being satisfied through the new array's controllers. Over time the contents of the old array are drip-fed into the new one until the old array is empty and can be moved out. Thus only buying new arrays that can virtualise your existing array looks like a good idea, and that means using EMC's VMAX, HDS' VSP, IBM's SVC as a virtualising front-end, or NetApp's V-Series.

Admittedly the new array's controllers will be quite occupied with virtualising the old array and receiving its incoming data for as long as the drip-feed data migration is under way. You won't get full controller performance for user data requests from it while this is going on.

Wheel out the big guns

What about the idea of using 100GbitE? The El Reg storage desk understands that using a single big link is more expensive than using multiple smaller links that, collectively, equal the big fat link's bandwidth. Thus ten 10GbitE links are cheaper than one 100GbitE link. It looks like you can't get away from using multiple ports and thus compromising the array's server IO request bandwidth.

Another problem comes when the source array has a replication arrangement with one or more other arrays, which will make the migration planning even more complicated.

Now let's make things worse and migrate a 144-node Isilon array holding 15PB of data. Data growth is around 50 per cent per year, so in three years that will be a touch over 50PB. The Isilon scale-out filer doesn't virtualise third-party arrays and may not be virtualisable by them. Migration means a straight data transfer, up to 50PB of it off 144 nodes. This looks to be a potentially horrendous task.
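The growth arithmetic, and what it means per node, can be sketched quickly. The 50 per cent annual growth is the assumption above; the per-node drain time reuses the earlier assumed 700MB/s (about 57.7TB/day) link rate:

```python
# Compound 50 per cent annual growth on 15PB over three years,
# then see what each of the 144 Isilon nodes would have to shift.
pb = 15.0
for _ in range(3):
    pb *= 1.5                      # 50 per cent growth per year

per_node_tb = pb * 1024 / 144      # each node's share of the data
drain_days  = per_node_tb / 57.7   # at the assumed ~57.7TB/day per 10GbitE link

print(f"after three years: {pb:.1f}PB")
print(f"per node: {per_node_tb:.0f}TB, ~{drain_days:.1f} days flat-out per node")
```

Even with every node draining over its own dedicated link in parallel, flat-out, that's the better part of a week; at real-life rates it stretches into weeks.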

The net result of this particular puzzle is that your big data scale-out filer supplier may effectively be your supplier for life. You will be joined at the hip and unable, practically speaking, to throw them off.

The prospect here is that massive data tubs could become a permanent fixture in your data centres because it is effectively impossible to move the data off them onto a new array. Open systems? Not really.

Perhaps the long-term answer is to move the data into the cloud, moving the apps that need low network latency there too, and hand the problem over to the service provider du jour.

Big data tubs could become boat anchors holding you stuck fast in the mud. You may need to start thinking about alternatives. ®

Bootnote

Don't forget to post your thoughts in the El Reg storage forum.