But it said so in the manual

NTFS limits provoke scoffs, RAM purchases


Sysadmin blog Working with 60 million files pushes the boundaries of any storage. Windows underpins most of my storage, so the limits that NTFS and Distributed File System Replication (DFSR) place on the number and size of files, and the gap between their theoretical and practical limits, are important in my life.

The theoretical limit of NTFS is more than 4 billion files, but practical experience has taught me that beyond 4 million files on a single NTFS volume, the system starts to slow down. If you're low on RAM, you will notice performance degradation sooner, when you run out of room in RAM to cache the Master File Table (MFT). Pay attention to the size of your MFT: if you cannot fit your entire MFT into RAM, then accessing random files leads to increased access times.

The more files you have, the larger your MFT will be. Moreover, the more of those files you access on a regular basis, the more RAM you need to act as a cache. File fragmentation also increases the size of your MFT: a heavily fragmented file needs more space to describe the locations of all its fragments.

One million files broken into a (not unheard of) 4.5 million fragments equals 1GB of MFT. The numbers I live by: for every million files on your server you should have a gigabyte of RAM, plus at least 2GB for the host operating system.
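That rule of thumb can be sketched as a small calculation. This is only an illustration of the arithmetic: the function and parameter names are made up for this example and do not correspond to any Windows tool or API.

```python
def ntfs_ram_estimate_gb(file_count, gb_per_million_files=1.0, os_overhead_gb=2.0):
    """Rule-of-thumb RAM estimate for an NTFS file server.

    Assumes one gigabyte of RAM per million files, plus a fixed
    allowance for the host operating system (the article's figures).
    """
    return file_count / 1_000_000 * gb_per_million_files + os_overhead_gb


# Example: a volume holding four million files
print(ntfs_ram_estimate_gb(4_000_000))
```

Treat the result as a starting point for sizing rather than a measurement; the real requirement depends on how many of the files are actually touched.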

DFSR has completely different limitations. The maximum file size that it will replicate is 64GB. It will replicate ten terabytes of data, but handles only eight million files per volume. In real life, after about 4 million files per replication set, DFSR also noticeably slows down. For any server running DFSR, you need the RAM you set aside using the MFT rule of thumb, plus half again. So for every million files on a server running DFSR you need 1.5GB of RAM.
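As a hedged sketch, the DFSR figures above can be turned into a quick sanity check for a planned replication set. The constants are the numbers quoted in this article, not values read from DFSR itself, and every name here is hypothetical.

```python
# Figures quoted in the article; the names are illustrative only.
MAX_FILE_SIZE_GB = 64
MAX_FILES_PER_VOLUME = 8_000_000
PRACTICAL_FILES_PER_SET = 4_000_000
GB_RAM_PER_MILLION_FILES = 1.5


def dfsr_ram_estimate_gb(file_count):
    """Rule-of-thumb RAM for a DFSR server: 1.5GB per million files."""
    return file_count / 1_000_000 * GB_RAM_PER_MILLION_FILES


def dfsr_warnings(file_count, largest_file_gb):
    """Return a list of warnings for a planned replication set."""
    warnings = []
    if largest_file_gb > MAX_FILE_SIZE_GB:
        warnings.append("largest file exceeds the 64GB replication limit")
    if file_count > MAX_FILES_PER_VOLUME:
        warnings.append("file count exceeds eight million per volume")
    elif file_count > PRACTICAL_FILES_PER_SET:
        warnings.append("file count is past the point where DFSR slows down")
    return warnings


print(dfsr_ram_estimate_gb(4_000_000))
print(dfsr_warnings(5_000_000, 10))
```

The check deliberately separates the hard ceiling from the practical slowdown point, since the two call for different responses.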

I have extracted these numbers from practical use, and your mileage may vary. I have a server with 60 million files, but less than a tenth of these files are accessed in any month. So, while I should have 96GB of RAM to ensure optimal responsiveness, in day-to-day operations I squeak by with 8GB.

You notice the difference when the backups start running. Backups touch every file on the system. As the number of files increases, I am forced to add a more appropriate amount of RAM. The lag caused by the system being unable to load the MFT into RAM causes the backups to take longer than the after-hours backup period allows.

Breaking up my file storage into multiple partitions has also greatly increased the speed and responsiveness of this system. Neither NTFS nor DFSR likes more than 4 million files per volume, and DFSR certainly won’t replicate all 60 million files in a single replication group.
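For a rough idea of how many volumes that implies, the split can be worked out as below. This is a minimal sketch that assumes files are spread evenly and uses the 4 million figure as the per-volume ceiling; the function name is invented for the example.

```python
def volumes_needed(total_files, max_files_per_volume=4_000_000):
    """Smallest number of volumes that keeps each one at or under the ceiling.

    Assumes files can be distributed evenly across volumes.
    """
    if total_files <= 0:
        return 0
    # Ceiling division without floating point
    return -(-total_files // max_files_per_volume)


print(volumes_needed(60_000_000))  # 60 million files at 4 million per volume
```

In practice files rarely divide evenly, so it is worth leaving headroom on each volume for growth.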

Of course, it is more complicated than the maths would suggest. Windows is capable of operating without loading the MFT into RAM. If you don’t access a file frequently then there is no need for its record to be cached. Windows also caches frequently-used files in their entirety, while only caching the bits of the MFT that you actually use.

But I stick by my rule of thumb: one gig per million files, and one and a half for DFSR.
