
How do you copy 60m files?

Apart from telling someone else to do it, that is


Sysadmin blog

Recently I copied 60 million files from one Windows file server to another. Tools for moving files from one system to another are built into every operating system, and there are third-party options too. The tasks they perform are so common that we tend to ignore their limits.

Many systems administrators are guilty of not knowing exactly where the limits of their file management tools lie - at least until they run up against them.

I did know, though, that using Windows Explorer on Windows XP/Server 2003 would be lunacy. The first permissions, filename or path-length problem, and the copy grinds to a halt.

If I were moving files from one server to another, a halt would be only a minor annoyance: everything moved so far is already on the destination server. If the transfer halts, restart it with the faulting file omitted. Irritating and time-consuming, but not difficult.

Copying is another matter entirely, because the source is left untouched and gives no clue about what has already gone across. The file manager must handle exceptions. If it doesn't, you have to do a lot of checking to find out what has been copied and what hasn't.
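To make "handle exceptions" concrete, here is a minimal Python sketch of a copy that records each failure and keeps going, so you end up with an exact list of what didn't make it instead of having to audit the destination. The function name `copy_tree_logged` is hypothetical and not taken from any tool mentioned here. It assumes both trees are reachable as ordinary paths, for example as mounted shares.

```python
import os
import shutil


def copy_tree_logged(src: str, dst: str) -> tuple[int, list[tuple[str, str]]]:
    """Copy every file under src to the same relative path under dst.

    Errors on individual files are recorded and the copy carries on, so the
    returned failure list says exactly what still needs attention.
    """
    copied = 0
    failures: list[tuple[str, str]] = []
    for root, dirs, files in os.walk(src):
        dirs.sort()  # visit subdirectories in a stable, predictable order
        rel = os.path.relpath(root, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        try:
            os.makedirs(target_dir, exist_ok=True)
        except OSError as exc:
            # Log the whole directory as failed and move on to the next one
            failures.append((root, str(exc)))
            continue
        for name in sorted(files):
            source_path = os.path.join(root, name)
            try:
                shutil.copy2(source_path, os.path.join(target_dir, name))
                copied += 1
            except OSError as exc:
                failures.append((source_path, str(exc)))
    return copied, failures
```

The function returns a count of copied files plus a list of `(path, error)` pairs. Writing that list to a file gives you the "what hasn't been copied" answer directly.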

FTP is one of my favourite ways to handle this problem. A good FTP client is designed with all sorts of abnormal situations in mind. It has bandwidth and threading controls, a transfer queue and the ability to resume failed transfers, and it reconnects to the target server if the connection is lost.
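Resuming at the file level mostly comes down to deciding which files are already done. The Python sketch below judges a destination file "finished" if its size matches the source and its modification time is within two seconds. That slack is an assumption meant to tolerate file systems such as FAT that store timestamps coarsely. This heuristic does not compare contents, and it is not how any particular FTP client works. `needs_copy` and `resume_copy` are hypothetical names.

```python
import os
import shutil

MTIME_SLACK_SECONDS = 2  # assumption: tolerate coarse timestamp storage


def needs_copy(source_path: str, target_path: str) -> bool:
    """Return True unless target already looks like a finished copy of source."""
    s = os.stat(source_path)  # a missing source should raise, not be skipped
    try:
        t = os.stat(target_path)
    except FileNotFoundError:
        return True
    same_size = s.st_size == t.st_size
    same_time = abs(s.st_mtime - t.st_mtime) <= MTIME_SLACK_SECONDS
    return not (same_size and same_time)


def resume_copy(pairs: list[tuple[str, str]]) -> int:
    """Copy only the (source, target) pairs that are missing or incomplete.

    Returns how many files were actually copied on this run.
    """
    done = 0
    for source_path, target_path in pairs:
        if needs_copy(source_path, target_path):
            os.makedirs(os.path.dirname(target_path) or ".", exist_ok=True)
            shutil.copy2(source_path, target_path)  # copy2 preserves mtime
            done += 1
    return done
```

Running `resume_copy` a second time over the same list copies nothing, which is the behaviour you want after a dropped connection.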

Decades after it was created, FTP remains one of the best ways to move files. But no graphical FTP client I could find would cope with 60 million files. FileZilla blew up somewhere around one million. WS-FTP managed a few more. None of them could handle more than about four million files.

My next attempt was to package them into an archive on the originating server and unpack them on the destination. No good. None of Windows Server 2003's native zip utilities, WinZip, 7-Zip or WinRAR was up to it. Somewhere between four and ten million files, every one of them threw an exception and died.

Knowing that Windows Explorer in Windows 7/Server 2008 R2 is more advanced than the Server 2003 version, I tried using a third server, C (running Server 2008 R2), to move the files from A to B. It's much better at handling exceptions, but it also fell apart at four million files.

Consulting my flash keyfob, I started trying my sysadmin tools. XXCopy, FastCopy, TeraCopy and Beyond Compare all made valiant, but ultimately ineffective, attempts. Of the GUI tools I tried, only Richcopy was able to handle the load. Richcopy is a free multi-threaded file management application written by Ken Tamaru at Microsoft. Increasingly it is my fallback for handling odd or exceptional file transfer scenarios.

I wanted to give several command-line tools a go as well. XCopy and Robocopy would most likely have been able to handle the file volume, but like Windows Explorer they are bound by the fact that NTFS can store files with longer names and deeper paths than CMD can handle. I tried ever more complicated batch files, with various loops in them, to deal with the path-depth issues. I failed.
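Before you pick a tool, it helps to know how many paths are actually too long. The article does not name the limit. The sketch below assumes it is the Win32 `MAX_PATH` of 260 characters, which includes the terminating NUL, so 259 usable characters. It lists every path under a root that exceeds that. `paths_over_limit` is a hypothetical helper. Run it from a machine that can itself walk the deep paths, such as a Linux box with the share mounted.

```python
import os

# Assumption: the limit in question is Win32 MAX_PATH (260 characters,
# counting the terminating NUL), leaving 259 characters for the path itself.
MAX_PATH = 260


def paths_over_limit(root: str, limit: int = MAX_PATH - 1) -> list[str]:
    """Return every directory or file path under root longer than `limit`."""
    too_long = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if len(full) > limit:
                too_long.append(full)
    return too_long
```

The length depends on how the tree is addressed. The same file is longer as `\\server\share\...` than as a local drive path, so measure from the prefix the copy tool will actually use.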

What worked brilliantly was using a Linux virtual machine. A simple default CentOS 5.5 install was able to mount SMB shares on both the originator and destination machines. From there, the command-line tool cp was able to succeed where every tool except Richcopy failed.

Although cp is slower than Richcopy, it copies linearly: one file at a time, in sequence, which in my case left the destination server's file system unfragmented.

Richcopy can handle large quantities of files, and because it can multi-thread the copy, it is several hours faster than using a Linux server as an intermediary. The disadvantage is that the resulting file system on the destination server is heavily fragmented. You could restrict Richcopy to a single thread, but then it is no faster than cp.
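The trade-off between one thread and many comes down to how many files are being written at once. This Python sketch is a generic thread-pool copy, not Richcopy's code. With `workers=1`, files are written strictly one after another, in the same sequential manner the article describes for cp. With more workers, several files are in flight at once. That is where the speed-up comes from, and presumably also the fragmentation the article observed. The function names are hypothetical.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def list_files(src: str) -> list[str]:
    """Relative paths of every file under src, in a stable sorted order."""
    rels = []
    for root, dirs, files in os.walk(src):
        dirs.sort()
        for name in sorted(files):
            rels.append(os.path.relpath(os.path.join(root, name), src))
    return rels


def copy_one(src: str, dst: str, rel: str) -> None:
    """Copy a single file, creating its parent directory if needed."""
    target = os.path.join(dst, rel)
    os.makedirs(os.path.dirname(target), exist_ok=True)  # safe under races
    shutil.copy2(os.path.join(src, rel), target)


def copy_tree(src: str, dst: str, workers: int = 1) -> int:
    """Copy src into dst using `workers` threads; return the file count."""
    rels = list_files(src)
    if workers == 1:
        # Strictly sequential: each file is finished before the next starts
        for rel in rels:
            copy_one(src, dst, rel)
    else:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # list() waits for every task and re-raises the first error
            list(pool.map(lambda rel: copy_one(src, dst, rel), rels))
    return len(rels)
```

Both modes produce the same tree. Only the order and overlap of the writes differ, which is exactly the variable the article says matters for fragmentation.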

Richcopy is not so much faster that the time it saves would be enough to defragment an NTFS partition holding 60 million files before cp would have finished. So the best way to move 60 million files from one Windows server to another turns out to be: use Linux.
