From server to end user: What's coming up for NFS?
Big changes: Hole-punching and application data blocks
NFS (Network File System) is one of the two most successful file protocols in the history of computing. From NFSv2 in the 1980s, through the widely deployed NFSv3 in the 1990s, to today’s NFSv4.1 standard – and if you don’t know about NFSv4.1 and pNFS (parallel NFS), you should – the protocol has been developed to keep pace with user requirements and the changing nature of data access and processing.
Growth of storage has exploded. In 2010, according to IDC, nearly an exabyte (1,000 petabytes) of open systems storage was sold worldwide. There was more storage sold in one quarter of 2011 than was sold in the whole of 2007. That’s an astonishing growth in the amount of data, so what’s driving it? Certainly, one of the biggest drivers of storage consumption we’ve seen in recent years is the rise of virtualisation and cloud computing. Both have made the management and processing of larger quantities of data much simpler.
These kinds of requirements are driving the next set of proposed changes to the NFS standard: NFSv4.2. NFSv4.1 and pNFS provided the foundations of improved security, scalability and much improved manageability over NFSv3. The latest proposals for NFSv4.2 promise many features that end users have been requesting above and beyond those in NFSv4.1 – features that make NFS more relevant as not only an “everyday” protocol, but one that has application as a preferred distributed file protocol in and beyond the virtualised data centre.
Server side copy
Virtualisation means compute mobility. No longer tied to physical hardware in specific locations, operating systems and applications can seamlessly move from one server to another. But each virtual machine represents data and may refer to ancillary data too, and all that data needs moving. Today, a copy requires potentially costly and unnecessary moves, as a client has to request data from the source data server simply to re-write it to a target data server, a three-way interconnection.
Server-Side Copy (SSC) removes one leg of such a copy operation entirely. Instead of reading entire files or even directories of files from one data server out to the requesting client, and then having the client write them back out to another data server, SSC permits the target and source servers to communicate directly. The client manages the copy process, but isn’t involved in moving the data. Because data is moved directly between data servers, SSC removes the need to maintain costly, high-bandwidth server-to-client-to-server connections, and reduces the potential for congestion during copy operations.
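The difference between the two data paths can be sketched with a toy model. The `Server` class and both copy functions are hypothetical illustrations, not an NFS client API; they only count how many bytes of file data cross the client's network link in each case.

```python
class Server:
    """Hypothetical in-memory data server holding named files."""

    def __init__(self, name):
        self.name = name
        self.files = {}

    def read(self, path):
        return self.files[path]

    def write(self, path, data):
        self.files[path] = data


def client_copy(src, dst, path):
    """Traditional copy: every byte travels server -> client -> server."""
    data = src.read(path)   # source server to client
    dst.write(path, data)   # client to target server
    return 2 * len(data)    # file bytes that crossed the client's link


def server_side_copy(src, dst, path):
    """SSC-style copy: the client only requests and monitors the copy."""
    dst.write(path, src.read(path))  # data moves server to server
    return 0                         # no file data crossed the client's link


source, target = Server("source"), Server("target")
source.write("/vm/disk.img", b"\0" * 1_000_000)

via_client = client_copy(source, target, "/vm/disk.img")
via_servers = server_side_copy(source, target, "/vm/disk.img")
print(via_client, via_servers)
```

In this model a 1MB file costs the client 2MB of traffic the traditional way and none with the server-to-server path; a real implementation would still exchange small control messages with the client.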
Guaranteed Space Reservation
There’s a limit to the demands that can be met by simply piling on more disks to meet our potential data needs. Every disk costs money to buy and to run, and more to manage. Many storage system administrators are keenly aware that users tend to overestimate their storage requirements, sometimes by orders of magnitude. Over the years, various efficiency techniques have been employed to give the appearance of a large virtual pool of storage on much smaller real storage systems.
One of those techniques, thin provisioning, gives the appearance of large amounts of available space. Although now commonplace, it can be problematic to manage in fast-growing environments. For example, if two users each request more than 50 per cent of the available free space, both can’t have it.
A guaranteed space reservation feature in NFSv4.2 will ensure that, regardless of the thin provisioning policies, individual files will always have space available for their maximum extent.
While desirable for specific types of data, and a reassurance for the end-user who needs the space to be available, such guarantees can defeat the best efforts of storage administrators to efficiently utilise disk.
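The tension between thin provisioning and guaranteed space can be sketched as a small accounting model. `ThinPool` is a hypothetical class working in arbitrary units; NFSv4.2 reservations apply to individual files, so treat this as an illustration of the bookkeeping only, not of the protocol.

```python
class ThinPool:
    """Hypothetical thin-provisioned pool; sizes are in arbitrary units."""

    def __init__(self, physical):
        self.physical = physical  # real capacity backing the pool
        self.used = 0             # units actually written
        self.reserved = 0         # units guaranteed but not yet written

    def free(self):
        return self.physical - self.used - self.reserved

    def reserve(self, amount):
        """Guarantee space up front; fail now rather than at write time."""
        if amount > self.free():
            return False
        self.reserved += amount
        return True

    def write(self, amount, from_reservation=False):
        """Consume space from a prior reservation or from free space."""
        if from_reservation:
            if amount > self.reserved:
                return False
            self.reserved -= amount
        elif amount > self.free():
            return False
        self.used += amount
        return True


pool = ThinPool(physical=100)
first = pool.reserve(60)    # user A asks for 60 per cent of the free space
second = pool.reserve(60)   # user B asks for the same and is refused
later = pool.write(60, from_reservation=True)  # A's writes cannot run out of space
print(first, second, later, pool.free())
```

The second request fails immediately instead of at some later write, which is the reassurance a reservation provides; the cost is that the reserved units are unavailable to everyone else even while they sit unused.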
For example, when a hypervisor creates a virtual disk file, it often tries to pre-allocate the space for the file so that there are no future allocation-related errors during the operation of the virtual machine. Applications like this typically zero the entire file, which is inefficient in I/O, and inefficient in storage used.
In support of better storage efficiencies, NFSv4.2 introduces support for sparse files. With the technique commonly called “hole-punching”, deleted and unused parts of files are returned to the storage system’s free space pool (see figure 1).
Figure 1: Thin provisioned and hole-punched data
Thin provisioning removes the need for reserving real storage for expansion that may never happen, and the real free space can be shared amongst many users. NFSv4.2’s hole punching takes that one step further, by recognising that files themselves very often contain holes that reserve space, but that contain no useful data. The client’s view is unchanged: NFSv4.2 provides hole punching transparently.
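A minimal sketch of the idea, assuming a hypothetical `SparseFile` model with a fixed 4,096-byte block: only blocks that hold data consume storage, and a punched hole reads back as zeros. It is not how any particular server stores files.

```python
BLOCK = 4096  # hypothetical block size in bytes


class SparseFile:
    """Toy model of a file in which only written blocks consume storage."""

    def __init__(self, nblocks):
        self.nblocks = nblocks
        self.blocks = {}  # block index -> data; a missing index is a hole

    def _check(self, index):
        if not 0 <= index < self.nblocks:
            raise IndexError("block index outside the file")

    def write_block(self, index, data):
        self._check(index)
        if len(data) != BLOCK:
            raise ValueError("data must be exactly one block")
        self.blocks[index] = data

    def punch_hole(self, index):
        """Return a block's storage to the free pool."""
        self._check(index)
        self.blocks.pop(index, None)

    def read_block(self, index):
        """Holes read back as zeros; the file's length never changes."""
        self._check(index)
        return self.blocks.get(index, bytes(BLOCK))

    def logical_size(self):
        return self.nblocks * BLOCK

    def allocated_size(self):
        return len(self.blocks) * BLOCK


disk = SparseFile(nblocks=1024)
disk.write_block(0, b"\x01" * BLOCK)
disk.write_block(10, b"\x02" * BLOCK)
disk.punch_hole(10)  # the application no longer needs block 10

print(disk.logical_size(), disk.allocated_size())
```

The file still presents its full 4MB logical size, while only the single remaining data block is charged against real storage.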
Application Data Blocks (ADB)
Application Data Blocks (ADB) extends hole-punching for zero-filled blocks to support applications that write blocks that contain patterns, for example a hypervisor or database application that provides sophisticated data corruption checks by writing guard patterns to a VM image or a database. ADB allows such applications to define the format of these blocks in a file, and NFSv4.2 can then store a much-reduced representation of the block in a map, thus saving space. It’s a form of deduplication in a way.
Here’s a simple example. Your application might initialise its file by writing blocks with the hex string 0xDEADBEEF at offset zero. Ordinarily, that would require writing every block initialised this way, for the whole file. But with ADB the storage system can be “taught” about this representation, note in its map that a block is an initialisation block, and simply not write the block.
When a read request is made for the block, the storage system can inspect its map for this file, find that it is an initialisation block, and return a constructed 0xDEADBEEF block back to the application. The application is satisfied but none the wiser that the storage system has made significant savings in space and IO.
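The map-and-reconstruct behaviour described above can be sketched as follows. `AdbStore` and its methods are hypothetical names chosen for illustration, and the 4,096-byte block size is an assumption; the sketch shows the idea, not the NFSv4.2 wire operations.

```python
class AdbStore:
    """Toy store that records initialisation blocks in a map instead of writing them."""

    def __init__(self, block_size, pattern, pattern_offset=0):
        self.block_size = block_size
        self.pattern = pattern
        self.pattern_offset = pattern_offset
        self.runs = []    # the "map": ranges of blocks known to be initialisation blocks
        self.stored = {}  # block index -> real data written by the application

    def _template(self):
        """Construct an initialisation block: zeros with the pattern at its offset."""
        block = bytearray(self.block_size)
        end = self.pattern_offset + len(self.pattern)
        block[self.pattern_offset:end] = self.pattern
        return bytes(block)

    def initialise(self, first, count):
        """Mark a run of blocks as initialised without storing any of them."""
        run = range(first, first + count)
        self.stored = {i: b for i, b in self.stored.items() if i not in run}
        self.runs.append(run)

    def write(self, index, data):
        self.stored[index] = data  # real data takes precedence over the map

    def read(self, index):
        if index in self.stored:
            return self.stored[index]
        if any(index in run for run in self.runs):
            return self._template()
        return bytes(self.block_size)

    def bytes_stored(self):
        return sum(len(b) for b in self.stored.values())


store = AdbStore(block_size=4096, pattern=bytes.fromhex("DEADBEEF"))
store.initialise(first=0, count=1_000_000)  # one call covers the whole file
block = store.read(123_456)
print(block[:4].hex(), store.bytes_stored())
```

A million blocks are initialised with a single map entry, and any of them reads back with the guard pattern in place while no block data has been stored.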
With this feature, NFSv4.2 will be able to do rapid and space-reduced initialisation of data stores; a large database or a VM image on the server can be created with a single operation.
Application I/O hints
With increasing amounts of I/O as data volumes rise, and with the availability of tiered storage systems that employ cache or SSDs to provide a buffer between fast DRAM and much slower traditional disks, NFSv4.2 provides facilities for applications to communicate data access patterns to the underlying storage system. For instance, data will be read sequentially, so consider read ahead. Or data will be read and written multiple times, so consider caching the data. Or data will be written but not read, so avoid polluting the cache.
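The three examples above amount to a mapping from an application's declared access pattern to a storage-side policy. The sketch below assumes hypothetical names (`AccessHint`, `StoragePlan`, `plan_for`) and a deliberately simple two-flag policy; a real server is free to ignore a hint or act on it differently.

```python
from dataclasses import dataclass
from enum import Enum


class AccessHint(Enum):
    """Hypothetical hint names mirroring the three examples in the text."""
    SEQUENTIAL_READ = "read sequentially"
    REPEATED_READ_WRITE = "read and written multiple times"
    WRITE_ONCE = "written but not read"


@dataclass(frozen=True)
class StoragePlan:
    read_ahead: bool     # prefetch the blocks that follow each read
    keep_in_cache: bool  # hold the data in the cache or SSD tier


PLANS = {
    AccessHint.SEQUENTIAL_READ: StoragePlan(read_ahead=True, keep_in_cache=False),
    AccessHint.REPEATED_READ_WRITE: StoragePlan(read_ahead=False, keep_in_cache=True),
    AccessHint.WRITE_ONCE: StoragePlan(read_ahead=False, keep_in_cache=False),
}


def plan_for(hint):
    """Hints are advisory: this model simply looks up a suggested policy."""
    return PLANS[hint]


for hint in AccessHint:
    print(hint.value, "->", plan_for(hint))
```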
So when is this available?
The NFSv4.2 specification is likely to be ratified in March 2012. How long it takes client and server providers to implement the features is dependent on the demand from end users (that’s you) to support them.
Which raises the question: are you still on NFSv3? If so, not only are you losing out on the advanced features of NFSv4, NFSv4.1 and pNFS, but you won’t be able to take advantage of NFSv4.2 once it is implemented either, since only one feature – improved security – has been “retrofitted” from NFSv4 to NFSv3.
Even though NFSv4.2 will take time to be available, it’s time to plan NFSv4 for your next project. ®
This article was written by Alex McDonald, the SNIA ESF NFS Co-Chair. He works for NetApp.
About the SNIA
The Storage Networking Industry Association (SNIA) is a not-for-profit global organisation, made up of some 400 member companies spanning virtually the entire storage industry. SNIA's mission is to lead the storage industry worldwide in developing and promoting standards, technologies, and educational services to empower organisations in the management of information. To this end, the SNIA is uniquely committed to delivering standards, education, and services that will propel open storage networking solutions into the broader market.
About SNIA Europe
The Storage Networking Industry Association (SNIA) Europe is dedicated to educating the market on the evolution and application of storage infrastructure solutions for the data centre by providing thought leadership and industry education focused on storage technologies and business value. For more information visit: www.snia-europe.org. For more information about SNIA’s ESF NFS SIG, visit this webpage.