How do you copy 60m files?

Apart from telling someone else to do it, that is

Sysadmin blog Recently I copied 60 million files from one Windows file server to another. Tools for moving files from one system to another are built into every operating system, and there are third-party options too. The tasks they perform are so common that we tend to ignore their limits.

Many systems administrators are guilty of not knowing exactly where the limits of their file management tools lie - at least until they run up against them.

But I know that using Windows Explorer in Windows XP/Server 2003 would be lunacy. At the first permissions, filename or path length problem, the copy grinds to a halt.

If I were moving files from one server to another, a halt would just be a minor annoyance. The moved files are on the destination server. If the transfer halts, restart with the faulting file omitted. Irritating and time consuming, but not difficult.

Copying is another matter entirely. The file manager must handle exceptions. If it doesn't, then you have to do a lot of checking to find out what has been copied and what hasn’t.
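The checking described above can be sketched in a few lines of Python. This is only an illustration, not something used in the original job: the function names `relative_files` and `find_uncopied` are hypothetical, and comparing file sizes is an assumed stand-in for a proper content check.

```python
import os

def relative_files(root):
    """Yield (relative path, size in bytes) for every file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            yield os.path.relpath(full, root), os.path.getsize(full)

def find_uncopied(source_root, dest_root):
    """Return sorted relative paths that are missing from the destination
    or whose size differs from the source (assumed sign of a partial copy)."""
    uncopied = []
    for rel, size in relative_files(source_root):
        target = os.path.join(dest_root, rel)
        try:
            if os.path.getsize(target) != size:
                uncopied.append(rel)
        except OSError:
            # The destination file is absent or unreadable.
            uncopied.append(rel)
    return sorted(uncopied)
```

Calling `find_uncopied(source, destination)` after a halted copy gives the list of files that still need attention. With tens of millions of files the walk itself takes a long time, so treat it as a starting point.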

FTP is one of my favourite ways to handle this problem. A good FTP client is designed with all sorts of abnormal situations in mind. It has bandwidth and threading controls, a transfer queue, the ability to resume failed transfers, and it reconnects to the target server if the connection is lost.

Decades after it was created, FTP remains one of the best ways to move files. But no graphical FTP client I could find would cope with 60 million files. FileZilla blew up somewhere around one million. WS_FTP managed a few more. None of them was capable of more than about four million files.

My next attempt was to package them into a ball on the originating server and unpack them on the destination. No good. None of Windows Server 2003’s native zip utilities, WinZip, 7-Zip or WinRAR was up to it. Somewhere between four and ten million files, each of them threw an exception and died.

Knowing that Windows 7/Server 2008 R2’s Windows Explorer is more advanced than the Server 2003 version, I tried using a third server to move the files from A to B via C (which was Server 2008 R2). It's much better at handling exceptions, but it too fell apart at four million files.

Consulting my flash keyfob, I started trying my sysadmin tools. XXCopy, FastCopy, TeraCopy and Beyond Compare all made valiant, but ultimately ineffective, attempts. Of the GUI tools I tried, only Richcopy was able to handle the load. Richcopy is a free multi-threaded file management application written by Ken Tamaru at Microsoft. Increasingly it is my fallback for handling odd or exceptional file transfer scenarios.

I wanted to give several command-line tools a go as well. XCopy and Robocopy most likely would have been able to handle the file volume but, like Windows Explorer, they are bound by the fact that NTFS can store files with longer names and deeper paths than CMD can handle. I tried ever more complicated batch files, with various loops in them, in an attempt to deal with the path depth issues. I failed.
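One way to see how much of a tree is affected by path length before picking a tool is to scan for over-long paths first. The Python sketch below is illustrative only: `find_long_paths` is a hypothetical helper, and the default of 260 characters is the classic Windows API `MAX_PATH` value, assumed here to be the relevant limit. The real limit depends on the tool doing the copy.

```python
import os

# Classic Windows API path limit, counting the terminating null character.
# Assumed to be the limit that matters; adjust for the tool in use.
MAX_PATH = 260

def find_long_paths(root, limit=MAX_PATH):
    """Yield every file or directory path under root whose length
    is at or above limit characters."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if len(full) >= limit:
                yield full
```

Because the function is a generator, it does not need to hold millions of paths in memory; the results can be written straight to a log file for later review.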

What worked brilliantly was using a Linux virtual machine. A simple default CentOS 5.5 install was able to mount SMB shares on both the originator and destination machines. From there, the command-line tool cp was able to succeed where every tool except Richcopy failed.

The Linux command line tool cp, while slower than Richcopy, copies in a linear fashion. It copies each file in sequence, leading to no fragmentation on the destination server.
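The same one-file-at-a-time approach, with the exception handling a long copy needs, might look roughly like this in Python. It is a minimal sketch and not how cp is implemented: `copy_tree_sequential` is a hypothetical name, and recording failures in a list is an assumed design choice.

```python
import os
import shutil

def copy_tree_sequential(source_root, dest_root):
    """Copy files one at a time in a stable order.

    Returns a list of (source path, error message) for anything that
    could not be copied, instead of stopping at the first problem."""
    failures = []
    for dirpath, dirnames, filenames in os.walk(source_root):
        dirnames.sort()  # visit subdirectories in a predictable order
        rel_dir = os.path.relpath(dirpath, source_root)
        target_dir = os.path.normpath(os.path.join(dest_root, rel_dir))
        try:
            os.makedirs(target_dir, exist_ok=True)
        except OSError as exc:
            failures.append((dirpath, str(exc)))
            dirnames[:] = []  # cannot create this directory, so skip its children
            continue
        for name in sorted(filenames):
            src = os.path.join(dirpath, name)
            try:
                shutil.copy2(src, os.path.join(target_dir, name))
            except OSError as exc:
                # Record the failure and carry on with the next file.
                failures.append((src, str(exc)))
    return failures
```

The returned list is the record of what did not copy, so a failed file costs one log entry and not a restart.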

Richcopy can handle large quantities of files and can multi-thread the copy, so it is several hours faster than using a Linux server as an intermediary. The disadvantage is that the resulting file system on the destination server is heavily fragmented. You could restrict Richcopy to a single thread, but then it is no faster than cp.

Richcopy is not so much faster that there is time to defragment an NTFS partition holding 60 million files before cp would have finished. So the best way to move 60 million files from one Windows server to another turns out to be: use Linux.
