
But it said so in the manual

NTFS limits provoke scoffs, RAM purchases

Sysadmin blog Working with 60 million files pushes the boundaries of any storage system. Windows underpins most of my storage, so the limits of NTFS and Distributed File System Replication (DFSR), and in particular the gap between their theoretical and practical limits on the number and size of files they can handle, loom large in my life.

The theoretical limit of NTFS is more than 4 billion files per volume, but practical experience has taught me that beyond 4 million files on a single NTFS volume the system starts to slow down. If you're low on RAM you will notice the degradation sooner, because you run out of room to cache the Master File Table (MFT). Pay attention to the size of your MFT: if the whole thing cannot fit into RAM, access times for randomly chosen files climb.
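A quick way to check where you stand: fsutil reports the MFT's on-disk size for a volume. The sketch below is a minimal, hedged example of pulling that figure out with Python; it assumes the output of `fsutil fsinfo ntfsinfo` contains a line labelled "Mft Valid Data Length", and the exact label and number format can vary between Windows versions.

```python
# Minimal sketch: read the MFT size for a volume from fsutil's output.
# Assumes Windows, an elevated prompt, and that fsutil prints a line
# labelled "Mft Valid Data Length" (formatting varies between versions).
import re
import subprocess

def mft_bytes(drive: str = "C:") -> int:
    out = subprocess.run(
        ["fsutil", "fsinfo", "ntfsinfo", drive],
        capture_output=True, text=True, check=True,
    ).stdout
    match = re.search(r"Mft Valid Data Length\s*:\s*(0x[0-9A-Fa-f]+|\d+)", out)
    if not match:
        raise RuntimeError("could not find the MFT size in fsutil output")
    value = match.group(1)
    return int(value, 16) if value.lower().startswith("0x") else int(value)

if __name__ == "__main__":
    print(f"MFT is roughly {mft_bytes('C:') / 2**30:.1f}GB; if that will not fit "
          "in RAM alongside everything else, random file access will crawl.")
```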

The more files you have, the larger your MFT will be. Moreover, the more of those files you access on a regular basis, the more RAM you need to act as a cache. File fragmentation also inflates the MFT: heavily fragmented files need more space to describe where all their pieces live.

One million files broken into a (not unheard of) 4.5 million fragments equals 1GB of MFT. The numbers I live by: for every million files on your server you should have a gigabyte of RAM, plus at least 2GB for the host operating system.
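To make the arithmetic concrete, here is a minimal sketch of that rule of thumb in Python. The function name and the default 2GB operating-system reserve are my own framing of the rule, not anything Microsoft publishes.

```python
# Rule-of-thumb RAM sizing for an NTFS file server:
# 1GB of RAM per million files, plus at least 2GB for the host OS.
def ntfs_ram_gb(millions_of_files: float, os_reserve_gb: float = 2.0) -> float:
    return millions_of_files * 1.0 + os_reserve_gb

# Example: a volume holding 4 million files.
print(ntfs_ram_gb(4))  # 6.0 -> 4GB to cache the MFT plus 2GB for Windows
```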

DFSR has completely different limitations. The maximum file size it will replicate is 64GB. It will replicate ten terabytes, but only handle eight million files per volume. In real life, DFSR also slows down noticeably after about 4 million files per replication set. Any server running DFSR needs the RAM you set aside under the MFT rule of thumb, plus half again: for every million files on a server running DFSR, you need 1.5GB of RAM.
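Extending the same hypothetical helper to DFSR just raises the multiplier by half; carrying over the 2GB operating-system reserve is my assumption, not part of the DFSR figure itself.

```python
# DFSR variant of the rule of thumb: 1.5GB of RAM per million files
# (the MFT allowance plus half again), plus the host OS reserve.
def dfsr_ram_gb(millions_of_files: float, os_reserve_gb: float = 2.0) -> float:
    return millions_of_files * 1.5 + os_reserve_gb

# Example: a replication set of 4 million files.
print(dfsr_ram_gb(4))  # 8.0 -> 6GB for MFT and DFSR overhead plus 2GB for Windows
```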

I have extracted these numbers from practical use, and your mileage may vary. I have a server with 60 million files, but fewer than a tenth of them are accessed in any given month. So, while I should have 96GB of RAM to ensure optimal responsiveness, in day-to-day operation I squeak by with 8GB.

You notice the difference when the backups start running. Backups touch every file on the system, and the lag caused by not being able to hold the MFT in RAM pushes them past the after-hours backup window. As the number of files increases, I am forced to add a more appropriate amount of RAM.

Breaking up my file storage into multiple partitions has also greatly increased the speed and responsiveness of this system. Neither NTFS nor DFSR likes more than 4 million files per volume, and DFSR certainly won't replicate all 60 million files in a single replication group.
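As a worked example, sticking to that 4-million-file comfort zone means my 60 million files need at least 15 volumes; a trivial sketch:

```python
import math

# How many volumes (or replication groups) are needed to stay under a
# comfort limit of files per volume; 4 million is the practical limit above.
def volumes_needed(total_files: int, files_per_volume: int = 4_000_000) -> int:
    return math.ceil(total_files / files_per_volume)

print(volumes_needed(60_000_000))  # 15
```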

Of course, it is more complicated than the maths would suggest. Windows is capable of operating without loading the MFT into RAM. If you don’t access a file frequently then there is no need for its record to be cached. Windows also caches frequently-used files in their entirety, while only caching the bits of the MFT that you actually use.

But I stick by my rule of thumb: one gig per million files, and one and a half for DFSR.
