But it said so in the manual

NTFS limits provoke scoffs, RAM purchases


Sysadmin blog Working with 60 million files pushes the boundaries of any storage system. Windows underpins most of my storage, so the number and size of files that NTFS and Distributed File System Replication (DFSR) can handle, and the gap between their theoretical and practical limits, matter a great deal in my life.

The theoretical limit of NTFS is more than 4 billion files per volume, but practical experience has taught me that beyond 4 million files on a single NTFS volume the system starts to slow down. If you're low on RAM you will notice the degradation sooner, because you run out of room to cache the Master File Table (MFT). Pay attention to the size of your MFT: if you cannot fit the whole thing into RAM, random file accesses take longer.
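
If you want to see how big the MFT on a volume actually is, the built-in fsutil tool will report it. The few lines of Python below are a minimal sketch that shells out to fsutil fsinfo ntfsinfo and pulls out the "Mft Valid Data Length" figure; the exact label and number format differ between Windows versions, so treat the parsing as illustrative rather than gospel.

# Sketch: report the current MFT size on a volume so it can be compared with RAM.
# Assumes Windows and the built-in fsutil tool. fsutil's output format varies
# between Windows versions (hex bytes, plain bytes, or "2.66 GB"), so the
# parsing below is illustrative, not definitive.
import re
import subprocess

UNITS = {"KB": 2**10, "MB": 2**20, "GB": 2**30, "TB": 2**40}

def mft_size_bytes(volume="C:"):
    """Return the 'Mft Valid Data Length' of the given volume, in bytes."""
    out = subprocess.run(["fsutil", "fsinfo", "ntfsinfo", volume],
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        if "mft valid data length" not in line.lower():
            continue
        value = line.split(":", 1)[1].strip()
        if value.lower().startswith("0x"):              # older releases print hex bytes
            return int(value, 16)
        m = re.match(r"([\d.,]+)\s*([KMGT]B)?", value, re.IGNORECASE)
        number = float(m.group(1).replace(",", ""))
        unit = (m.group(2) or "").upper()                # newer releases may append a unit
        return int(number * UNITS.get(unit, 1))
    raise RuntimeError("MFT size not found in fsutil output")

if __name__ == "__main__":
    print("MFT is roughly %.1f GB; that is how much RAM it takes to cache it fully"
          % (mft_size_bytes("C:") / 2**30))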

The more files you have, the larger your MFT will be; and the more of those files you access regularly, the more RAM you need to act as a cache. File fragmentation also inflates the MFT, because heavily fragmented files need more space to record where all their pieces live.

One million files broken into a (not unheard of) 4.5 million fragments equals 1GB of MFT. The numbers I live by: for every million files on your server you should have a gigabyte of RAM, plus at least 2GB for the host operating system.
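
That rule is easy enough to turn into a few lines of Python. This is a minimal sketch of my reading of it; the helper name and its defaults are mine, not anything official.

# Minimal sketch of the rule of thumb above: 1GB of RAM per million files on
# the volume, plus at least 2GB for the host operating system. The helper name
# and defaults are my own framing of the rule, not an official formula.
def ntfs_ram_gb(file_count, gb_per_million=1.0, os_overhead_gb=2):
    """RAM, in GB, suggested for an NTFS file server holding file_count files."""
    return (file_count / 1_000_000) * gb_per_million + os_overhead_gb

print(ntfs_ram_gb(4_000_000))    # 4 million files -> 6.0 GB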

DFSR has completely different limitations. The maximum file size it will replicate is 64GB. It will replicate ten terabytes, but only eight million files per volume. In real life DFSR also slows down noticeably after about 4 million files per replication set. For any server running DFSR you need the RAM you set aside under the MFT rule of thumb, plus half again: for every million files on a server running DFSR, budget 1.5GB of RAM.
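
The DFSR version, expressed the same way (again my own framing; whether the 2GB operating-system allowance stacks on top of the DFSR figure is my reading of the rule, so it stays a parameter):

# The same sketch with the DFSR premium: the NTFS figure plus half again,
# i.e. 1.5GB of RAM per million replicated files. Stacking the 2GB OS
# allowance on top is an assumption, hence the parameter.
def dfsr_ram_gb(file_count, os_overhead_gb=2):
    return (file_count / 1_000_000) * 1.5 + os_overhead_gb

print(dfsr_ram_gb(4_000_000))    # 4 million replicated files -> 8.0 GB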

I have extracted these numbers from practical use, and your mileage may vary. I have a server with 60 million files, but fewer than a tenth of them are accessed in any given month. So, while I should have 96GB of RAM to ensure optimal responsiveness, in day-to-day operations I squeak by with 8GB.

You notice the difference when the backups start running. Backups touch every file on the system, and the lag caused by being unable to hold the MFT in RAM makes them take longer than the after-hours backup window allows. As the number of files grows, I am forced to add a more appropriate amount of RAM.

Breaking up my file storage into multiple partitions has also greatly increased the speed and responsiveness of the system. Neither NTFS nor DFSR likes more than 4 million files per volume, and DFSR certainly won't replicate all 60 million files in a single replication group.
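
The volume-planning arithmetic behind that is simple enough to write down. The four-million cap is the practical figure above; the helper itself is only an illustration.

# Sketch of the volume-planning arithmetic: keep each NTFS/DFSR volume under
# roughly four million files. The cap is the practical figure from the text;
# the helper name is my own.
import math

def volumes_needed(total_files, practical_cap=4_000_000):
    return math.ceil(total_files / practical_cap)

print(volumes_needed(60_000_000))    # the 60-million-file server needs at least 15 volumes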

Of course, it is more complicated than the maths would suggest. Windows is capable of operating without loading the whole MFT into RAM: if you don't access a file frequently, there is no need for its record to be cached. Windows also caches frequently used files in their entirety, while only caching the bits of the MFT you actually use.

But I stick by my rule of thumb: one gig per million files, and one and a half for DFSR.
