But it said so in the manual

NTFS limits provoke scoffs, RAM purchases

Sysadmin blog Working with 60 million files pushes the boundaries of any storage. Windows underpins most of my storage, so the practical limitations of NTFS and Distributed File System Replication (DFSR), and how far they fall short of the theoretical limits on the number and size of files they handle, are important in my life.

The theoretical limit of NTFS is more than 4 billion files, but practical experience has taught me that beyond 4 million files on a single NTFS volume, the system starts to slow down. If you're low on RAM, you will notice performance degradation sooner, when you run out of room in RAM to cache the Master File Table (MFT). Pay attention to the size of your MFT: if you cannot fit your entire MFT into RAM, then accessing random files leads to increased access times.

The more files you have, the larger your MFT will be. Moreover, the more of those files you access on a regular basis, the more RAM you need to act as a cache. File fragmentation also affects the size of your MFT: heavily fragmented files require more MFT space to describe all the locations where their pieces live.

One million files broken into a (not unheard of) 4.5 million fragments equals 1GB of MFT. The numbers I live by: for every million files on your server, you should have a gigabyte of RAM, plus at least 2GB for the host operating system.

DFSR has completely different limitations. The maximum file size it will replicate is 64GB. It will replicate ten terabytes - but only handle eight million files per volume. In real life, DFSR also noticeably slows down after about 4 million files per replication set. For any server running DFSR, you need the RAM you would set aside under the MFT rule of thumb, plus half again: for every million files on a server running DFSR, you need 1.5GB of RAM.
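As a back-of-the-envelope check, both rules of thumb can be rolled into a short calculation. The Python sketch below is just my figures restated as code, not any official sizing formula; the function name is made up, and the assumption that the 2GB operating-system allowance also sits on top of the DFSR figure is mine.

def recommended_ram_gb(total_files: int, runs_dfsr: bool = False) -> float:
    """Rough RAM recommendation, in GB, for an NTFS file server."""
    millions_of_files = total_files / 1_000_000
    gb_per_million = 1.5 if runs_dfsr else 1.0   # DFSR: the MFT allowance plus half again
    os_overhead_gb = 2.0                         # headroom for the host operating system
    return millions_of_files * gb_per_million + os_overhead_gb

# Example: a volume holding 4 million files, with and without DFSR.
print(recommended_ram_gb(4_000_000))                  # 6.0
print(recommended_ram_gb(4_000_000, runs_dfsr=True))  # 8.0

On those numbers, a 4-million-file volume wants about 6GB of RAM on its own, and about 8GB once DFSR is replicating it.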

I have extracted these numbers from practical use, and your mileage may vary. I have a server with 60 million files, but less than a tenth of these files are accessed in any month. So, while I should have 96GB of RAM to ensure optimal responsiveness, in day-to-day operations I squeak by with 8GB.

You notice the difference when the backups start running. Backups touch every file on the system, and the lag caused by the system being unable to hold the whole MFT in RAM makes them take longer than the after-hours backup window allows. As the number of files increases, I am forced to add a more appropriate amount of RAM.

Breaking up my file storage into multiple partitions has also greatly increased the speed and responsiveness of this system. Neither NTFS nor DFSR likes more than 4 million files per volume, and DFSR certainly won’t replicate all 60 million files in a single replication group.

Of course, it is more complicated than the maths would suggest. Windows is capable of operating without loading the MFT into RAM. If you don’t access a file frequently, there is no need for its record to be cached. Windows also caches frequently used files in their entirety, while only caching the bits of the MFT that you actually use.

But I stick by my rule of thumb: one gig per million files, and one and a half for DFSR.
