But it said so in the manual

NTFS limits provoke scoffs, RAM purchases


Sysadmin blog Working with 60 million files pushes the boundaries of any storage. Windows underpins most of my storage, so the gap between the theoretical and practical limits of NTFS and Distributed File System Replication (DFSR), in both the number and the size of files they handle, is important in my life.

The theoretical limit of NTFS is more than 4 billion files, but practical experience has taught me that beyond 4 million files on a single NTFS volume, the system starts to slow down. If you're low on RAM, you will notice performance degradation sooner, when you run out of room in RAM to cache the Master File Table (MFT). Pay attention to the size of your MFT: if you cannot fit your entire MFT into RAM, then accessing random files leads to increased access times.

The more files you have, the larger your MFT will be. Moreover, the more of those files you access on a regular basis, the more RAM you need to act as a cache. File fragmentation also affects the size of your MFT: a heavily fragmented file needs more space in the MFT to describe where each of its fragments sits on disk.

One million files broken into a (not unheard of) 4.5 million fragments equals 1GB of MFT. The numbers I live by: for every million files on your server you should have a gigabyte of RAM, plus at least 2GB for the host operating system.

DFSR has completely different limitations. The maximum file size that it will replicate is 64GB. It will replicate ten terabytes, but it will only handle eight million files per volume. In real life, DFSR also slows down noticeably after about 4 million files per replication set. For any server running DFSR, you need the RAM you have set aside using your MFT rule of thumb, plus half again. So for every million files on your server running DFSR, you need 1.5GB of RAM.
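The rule of thumb above can be written down as a few lines of Python. This is only a sketch of the arithmetic: the function name and parameters are made up for illustration, and it assumes the 2GB operating system allowance is added once and is not scaled by the DFSR factor, which the rule as stated leaves open.

```python
def recommended_ram_gb(files_millions, dfsr=False, os_overhead_gb=2.0):
    """Estimate RAM in GB from the article's rule of thumb.

    files_millions: number of files on the server, in millions.
    dfsr:           True if the server also runs DFSR (1.5GB per million
                    files instead of 1GB).
    os_overhead_gb: allowance for the host OS; assumed to be added once
                    and not multiplied by the DFSR factor.
    """
    if files_millions < 0:
        raise ValueError("files_millions must be non-negative")
    gb_per_million = 1.5 if dfsr else 1.0
    return files_millions * gb_per_million + os_overhead_gb


if __name__ == "__main__":
    # A volume at the practical 4 million file mark.
    print(recommended_ram_gb(4))             # plain NTFS file server
    print(recommended_ram_gb(4, dfsr=True))  # same server running DFSR
```

Treat the result as a starting point for sizing, not a guarantee: how much of the MFT actually needs to be cached depends on how many of those files get touched.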

I have extracted these numbers from practical use, and your mileage may vary. I have a server with 60 million files, but less than a tenth of these files are accessed in any month. So, while I should have 96GB of RAM to ensure optimal responsiveness, in day-to-day operations I squeak by with 8GB.

You notice the difference when the backups start running. Backups touch every file on the system. As the number of files increases, I am forced to add a more appropriate amount of RAM. The lag caused by the system being unable to load the MFT into RAM causes the backups to take longer than the after-hours backup period allows.

Breaking up my file storage into multiple volumes has also greatly increased the speed and responsiveness of this system. Neither NTFS nor DFSR likes more than 4 million files per volume, and DFSR certainly won't replicate all 60 million files in a single replication group.
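As a rough illustration of what that split means in practice, the sketch below works out the minimum number of volumes needed to stay under a per-volume file ceiling. The function name is hypothetical, and the 4 million default is my practical threshold from above, not a hard NTFS or DFSR limit.

```python
def volumes_needed(total_files, files_per_volume=4_000_000):
    """Return the minimum number of volumes that keeps every volume
    at or below files_per_volume files (assumes files can be spread
    evenly, which real directory trees rarely allow)."""
    if total_files < 0:
        raise ValueError("total_files must be non-negative")
    if files_per_volume <= 0:
        raise ValueError("files_per_volume must be positive")
    # Integer ceiling division, avoiding floating point.
    return -(-total_files // files_per_volume)


if __name__ == "__main__":
    print(volumes_needed(60_000_000))  # 15
```

In real life you would leave headroom for growth rather than fill each volume to the threshold.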

Of course, it is more complicated than the maths would suggest. Windows is capable of operating without loading the MFT into RAM. If you don’t access a file frequently then there is no need for its record to be cached. Windows also caches frequently-used files in their entirety, while only caching the bits of the MFT that you actually use.

But I stick by my rule of thumb: one gig per million files, and one and a half for DFSR.
