How do you copy 60m files?

Apart from telling someone else to do it, that is


Sysadmin blog Recently I copied 60 million files from one Windows file server to another. Tools for moving files from one system to another are built into every operating system, and there are third-party options too. The tasks they perform are so common that we tend to ignore their limits.

Many systems administrators are guilty of not knowing exactly where the limits of their file management tools lie - at least until they run up against them.

But I do know that using Windows Explorer in Windows XP/Server 2003 would be lunacy. At the first permissions, filename or path length problem, the copy grinds to a halt.

If I were moving files from one server to another, a halt would just be a minor annoyance. The moved files are on the destination server. If the transfer halts, restart with the faulting file omitted. Irritating and time consuming, but not difficult.

Copying is another matter entirely, because the source is left intact and gives no clue as to what has already been transferred. The file manager must handle exceptions. If it doesn't, then you have to do a lot of checking to find out what has been copied and what hasn’t.
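To make the point about exception handling concrete, here is a minimal Python sketch of a copy loop that records each failure and carries on instead of halting. It is not one of the tools discussed here; the function name `copy_tree_logging_failures` and its return shape are made up for illustration.

```python
import os
import shutil


def copy_tree_logging_failures(src_root, dst_root):
    """Copy every file under src_root into dst_root.

    Returns (copied_count, failed), where failed is a list of
    (path, error message) pairs. A failure never stops the run.
    """
    copied_count = 0
    failed = []

    def on_walk_error(exc):
        # Directories that cannot be listed are recorded, not fatal.
        failed.append((exc.filename, str(exc)))

    for dirpath, dirnames, filenames in os.walk(src_root, onerror=on_walk_error):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        try:
            os.makedirs(target_dir, exist_ok=True)
        except OSError as exc:
            failed.append((dirpath, str(exc)))
            dirnames[:] = []  # do not descend into a tree we cannot recreate
            continue
        for name in sorted(filenames):
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            try:
                shutil.copy2(src, dst)  # copies data plus timestamps
                copied_count += 1
            except OSError as exc:
                failed.append((src, str(exc)))
    return copied_count, failed
```

The `failed` list is the useful part: after the run it tells you exactly which paths need attention, so there is no need to compare source and destination by hand.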

FTP is one of my favourite ways to handle this problem. A good FTP client is designed with all sorts of abnormal situations in mind. It has bandwidth and threading controls, a transfer queue, the ability to resume failed transfers, and it reconnects to the target server if the connection is lost.

Decades after it was created, FTP still remains one of the best ways to move files. But no graphical FTP client I could find would cope with 60 million files. Filezilla blew up somewhere around one million. WS-FTP managed a few more. None of them were capable of more than about four million files.

My next attempt was to package them into a ball on the originating server and unpack them on the destination. No good. None of Windows Server 2003’s native zip utilities, WinZip, 7Zip or WinRAR was up to it. Somewhere between four and ten million files, all of them threw an exception and died.

Knowing that Windows 7/Server 2008 R2’s Windows Explorer is more advanced than the Server 2003 version, I tried using a third server to move the files from A to B via C (which was Server 2008 R2). It's much better at handling exceptions, but it too fell apart at four million files.

Consulting my flash keyfob, I started trying my sysadmin tools. XXCopy, FastCopy, TeraCopy and Beyond Compare all made valiant, but ultimately ineffective, attempts. Of the GUI tools I tried, only Richcopy was able to handle the load. Richcopy is a free multi-threaded file management application written by Ken Tamaru at Microsoft. Increasingly it is my fallback for handling odd or exceptional file transfer scenarios.

I wanted to give several command-line tools a go as well. XCopy and Robocopy would most likely have been able to handle the file volume but, like Windows Explorer, they are bound by the fact that NTFS can store files with longer names and greater path depth than CMD can handle. I tried ever more complicated batch files, with various loops in them, in an attempt to deal with the path depth issues. I failed.
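One way to find out in advance how many files will trip a path-length limit is to scan the tree before copying. The sketch below assumes the threshold that matters is the classic Windows MAX_PATH of 260 characters (259 usable); that figure and the name `find_long_paths` are my assumptions, so adjust `limit` to whatever your tool actually tolerates.

```python
import os


def find_long_paths(root, limit=259):
    """Yield (length, path) for every file or directory under root
    whose full path is longer than `limit` characters.

    limit=259 is an assumed default based on the classic Windows
    MAX_PATH of 260 characters including the terminator.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if len(full) > limit:
                yield len(full), full
```

Run from a Linux box with the share mounted, a scan like this is not itself subject to the limit it is looking for. Run on Windows, it may miss entries that the operating system refuses to list.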

What worked brilliantly was using a Linux virtual machine. A simple default CentOS 5.5 install was able to mount SMB shares on both the originator and destination machines. From there, the command-line tool cp was able to succeed where every tool except Richcopy failed.

The Linux command line tool cp, while slower than Richcopy, copies in a linear fashion. It copies each file in sequence, leading to no fragmentation on the destination server.

Richcopy can handle large quantities of files and can multi-thread the copy, and so is several hours faster than using a Linux server as an intermediary. The disadvantage is that the resulting file system on the destination server is heavily fragmented. You could restrict Richcopy to a single thread, but then it is no faster than cp.
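The trade-off between one thread and many can be illustrated with a small Python sketch in which the number of workers is a parameter. This is not how Richcopy or cp is implemented; `copy_files` is a hypothetical helper, and it says nothing about fragmentation, which depends on the destination file system.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def copy_files(pairs, workers=1):
    """Copy each (src, dst) pair and return the destinations in input order.

    workers=1 copies strictly one file after another, as cp does;
    workers>1 has several copies in flight at once.
    """
    def copy_one(pair):
        src, dst = pair
        parent = os.path.dirname(dst)
        if parent:
            os.makedirs(parent, exist_ok=True)
        shutil.copy2(src, dst)
        return dst

    if workers == 1:
        return [copy_one(pair) for pair in pairs]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order even though copies overlap in time.
        return list(pool.map(copy_one, pairs))
```

With `workers=1` the files are written to the destination in sequence. Raising `workers` lets writes interleave, which is where the speed comes from and, on the account above, where the fragmentation comes from as well.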

Richcopy is not so fast that there is time to defragment an NTFS partition with 60 million files on it before cp would have finished. So the best way to move 60 million files from one Windows server to another turns out to be: use Linux.
