The secret sauce in SanDisk's ExtremeFFS
Shortcuts the read-modify-write cycle
SanDisk's ExtremeFFS technology can speed up random writes to flash memory by up to 100 times, but does nothing for sequential writes. How does it work?
The company says ExtremeFFS (Extreme Flash File System) "operates on a page-based algorithm, which means there is no fixed coupling between physical and logical location. When a sector of data is written, the SSD puts it where it is most convenient and efficient. The result is an improvement in random write performance – by up to 100 times – as well as in overall endurance".
With Windows, particularly Vista, there are a large number of random writes to the flash solid state drive (SSD), and there is a mismatch between the size of these writes and the SSD block size. So two things are going on here: a large number of random writes, and writes that are smaller than the SSD's block size, which hurts the drive's overall write endurance.
A file allocation table or the registry is used thousands of times a day. SanDisk's previous (M-Systems) TrueFFS, introduced in 1994 and incorporated in Windows 95, moved the FAT and registry around an SSD instead of keeping them in one place, to avoid exhausting the write life of a static FAT location. The logical location of the FAT was separated from its physical location, but the logical addresses were tied to physical block addresses. What TrueFFS did was move these physical block addresses around to level out the wear.
ExtremeFFS is said by SanDisk to have this in common with TrueFFS - no fixed coupling between physical and logical location. So what's new?
The classic method of matching a fast input stream with a slow write device is to buffer it. But buffering is not used here.
According to Don Barnetson, SanDisk's senior marketing director for SSD, flash is organised into 512KB blocks, or sectors, which contain pages (8KB in SanDisk's case), and the page is the minimum flash read or write unit. Generally, a write to a flash page or block requires that page or block's contents be erased first: a read-modify-write operation. This is like putting a new book on an already full bookshelf: an old book has to be taken out first so the new one can go in. Such a read-modify-write operation can take 100 milliseconds.
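To put numbers on that geometry: a 512KB block divided into 8KB pages gives 64 pages per block. A minimal sketch (Python, using the article's indicative figures; the names are ours, not SanDisk's):

```python
# Flash geometry as quoted above (SanDisk's figures).
BLOCK_SIZE = 512 * 1024    # bytes per erase block ("sector")
PAGE_SIZE = 8 * 1024       # bytes per page, the minimum read/write unit

PAGES_PER_BLOCK = BLOCK_SIZE // PAGE_SIZE
print(PAGES_PER_BLOCK)     # 64 pages per block

# Indicative cost of erasing an occupied block before rewriting it.
READ_MODIFY_WRITE_MS = 100
```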
If there is already empty flash then writing to it takes one millisecond, a hundred times faster. Suppose a registry entry or File Allocation Table record changes and a new value comes into the flash. The drive has to write the new data and dispose of the old, either by overwriting it or by marking it for delete.
If you can write the new value to empty flash and simply mark the old data for delete, you separate these two operations and avoid the full read-modify-write.
SanDisk and other flash product suppliers generally add extra capacity beyond the stated flash capacity so that there is empty space to begin with. ExtremeFFS also runs a garbage collection process whereby pages marked for erase are erased, making them empty and available again. The controller maintains a look-up table recording each page's status: full, marked for erase, or empty.
When a random write comes in, on average needing a page or a small number of pages, the controller looks for that number of empty pages and writes the data without having to go through a full read-modify-write operation. It marks any existing data that has been superseded for delete.
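As a rough illustration of that write path (a toy model, not SanDisk's firmware; the `FlashController` class and its names are invented for the sketch):

```python
from enum import Enum

class PageState(Enum):
    EMPTY = 0        # erased and ready to accept a write (~1ms to program)
    FULL = 1         # holds current data
    FOR_ERASE = 2    # holds superseded data, awaiting garbage collection

class FlashController:
    """Toy page-based allocator: new data always lands in empty pages,
    and superseded pages are only marked, never erased inline."""

    def __init__(self, num_pages):
        self.state = [PageState.EMPTY] * num_pages
        self.data = {}                  # physical page -> contents
        self.map = {}                   # logical page -> physical page

    def write(self, logical_page, contents):
        # No fixed coupling between logical and physical location:
        # any empty page will do.
        try:
            phys = self.state.index(PageState.EMPTY)
        except ValueError:
            raise RuntimeError("garbage collection has fallen behind")
        self.state[phys] = PageState.FULL    # ~1ms program, no erase needed
        self.data[phys] = contents
        old = self.map.get(logical_page)
        if old is not None:
            # Mark the stale copy for delete instead of paying the
            # ~100ms read-modify-write to erase it now.
            self.state[old] = PageState.FOR_ERASE
        self.map[logical_page] = phys
```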
The flash is multi-threaded, incorporating a non-blocking architecture, so that the 4 to 10 channels from the controller to the NAND chips can be doing different things simultaneously. They can be used for reading, writing or garbage collection.
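One way to picture that non-blocking arrangement (again a sketch under our own assumptions, not SanDisk's design) is a worker per channel, each draining its own queue of commands:

```python
import queue
import threading

NUM_CHANNELS = 8   # the article quotes 4 to 10 controller-to-NAND channels

def channel_worker(ops):
    # Each channel services its own stream of commands, so a slow erase
    # on one channel never stalls reads or writes on the others.
    while True:
        op = ops.get()
        if op is None:               # shutdown sentinel
            return
        kind, page = op
        # ... issue the actual NAND command for this channel here ...

channels = [queue.Queue() for _ in range(NUM_CHANNELS)]
workers = [threading.Thread(target=channel_worker, args=(q,)) for q in channels]
for w in workers:
    w.start()

channels[0].put(("read", 42))     # a read, a write and a garbage-collection
channels[1].put(("write", 7))     # erase all proceeding at the same time
channels[2].put(("erase", 99))    # on their own channels
for q in channels:
    q.put(None)
for w in workers:
    w.join()
```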
Think of garbage collection as akin to checking the bookshelf and removing unwanted books, thus creating empty space. Or think of the Wallace and Gromit cartoon where new railway track is continuously laid down in front of a speeding locomotive. That's what the garbage collection does: it creates fresh track for the speeding (writing) locomotive by pulling up used track from behind the train.
So there are always, or should always be, empty pages for small random writes to be written into quickly, without an erase before the write; that has already been done in the background. Without this page-based algorithm, every random write would need a block-level operation and take 100 milliseconds instead of, for example, one.
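Continuing the toy model above, the background collector just sweeps marked pages back to empty, keeping fresh track laid for the write path:

```python
def garbage_collect(controller):
    # Runs in the background, e.g. on an otherwise idle channel. A real
    # collector works at block granularity and must first copy any
    # still-live pages out of a block before erasing it; this sketch
    # skips that step.
    for phys, state in enumerate(controller.state):
        if state == PageState.FOR_ERASE:
            # The ~100ms erase happens here, off the write path.
            controller.state[phys] = PageState.EMPTY
            controller.data.pop(phys, None)
```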
In effect, SanDisk has accelerated random writes by avoiding the need to erase flash for the write data before writing it.
Sequential writes are not accelerated because they already operate at block level, the files typically being large (images from digital cameras, for example).
SanDisk has also added a process whereby pages that are often accessed sequentially are placed contiguously so that their access is speeded up.
The net effect of ExtremeFFS, according to SanDisk, is that SSDs are now much more suitable for Windows use, because the thousands upon thousands of small, sub-block-size writes that Windows makes are accelerated by the new page-level algorithm. This should show up as greatly improved benchmarks for Windows systems using SSDs.
More on TrueFFS here (PDF). ®
Thanks for that, it was really interesting to see an explanation of how the different types work in practice (NAND/NOR).
Even 133x isn't that fast. Theoretically you get about 20MBps, and the fastest CF around (300x) should give up to 45MBps. However, that's theoretical, and cheap CF cards are often asymmetric in performance: you get far faster reads than writes.
However, the real problem with CF NAND flash is that random write speed will be atrocious. You could get as few as 10 writes per second, against the 80 or more random writes per second you would get from the most modest of laptop drives.
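For reference, CF "x" ratings are multiples of the original 150KB/s CD-ROM transfer rate, and the writes-per-second figures convert directly into per-operation latency. A quick back-of-the-envelope check (assuming that convention):

```python
BASE_KBPS = 150                # 1x = 150KB/s, the CD-ROM convention

print(133 * BASE_KBPS / 1000)  # 133x -> ~20MBps
print(300 * BASE_KBPS / 1000)  # 300x -> 45MBps

# The random-write figures above, expressed as latency per operation:
print(1000 / 10)               # 10 writes/s -> 100ms each (cheap CF)
print(1000 / 80)               # 80 writes/s -> 12.5ms each (laptop drive)
```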
So for writing large sequential files, such as a digital camera might produce, (moderately) cheap flash might be acceptable, but as a general-purpose disk replacement it would be dreadful. The SSD drives have extra things to speed this up somewhat, but prices for now are high. However, flash prices are dropping through the floor, so hold on.
For the most part, the best way of speeding up an old PC is to add more memory. Putting in a faster disk has more limited benefits and cheap flash would be awful.
The reason for the slower operation (and reasoning for ExtremeFFS) is...
... The same as the reason for the reduced price of SSDs these days - MLC Flash. MLC is very cheap compared to SLC, but it is an order of magnitude slower. The SSDs you get these days are often MLC Flash-based, and what ExtremeFFS does for SanDisk is speed up MLC Flash operations to the point where buying one of their ExtremeFFS-driven multi-channel SSDs becomes almost preferable to an SLC Flash-based version.
Look at it: 40% of the channels used purely to "hunt down" pages to be erased, while the other 60% are used to read or write. This is definitely a performance improvement, and because of the intrinsic difference in timings, it randomises wear-leveling. I like it.
random/sequential writes and ssd prices
Back when SCSI interfaces could do 1MB/s, you could reformat a SCSI disk with a new sector size. Microsoft could only handle 512 bytes/sector, so over time support for larger sectors disappeared. (Larger sectors mean a higher capacity because there are fewer inter-sector gaps, but they waste time or space when many files are smaller than a sector.)
NAND flash typically has pages from 2KB to 8KB. Last time I used NOR flash, the page size was 64KB. You can change any single 1 to a 0 in NOR flash, but it takes time. It is more efficient to write as many bytes at once as the chip allows (32 on that 16MB chip with 64KB pages). The only way to change a zero to a one is to erase the entire page, setting all of its bits back to one. It used to be possible to change a few ones to zeroes in NAND flash; modern devices cannot do this. It is only possible to write or erase an entire page.
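In other words, programming NOR flash is effectively a bitwise AND with whatever is already in the cells; only a full-page erase can bring bits back to one. A small sketch of that constraint, using the 64KB-page, 32-byte-write figures above (the function names are ours):

```python
PAGE_BYTES = 64 * 1024    # NOR page size quoted above
WRITE_CHUNK = 32          # max bytes programmable in one go on that chip

def nor_program(page, offset, data):
    # Programming can only clear bits: each cell ends up old AND new.
    assert len(data) <= WRITE_CHUNK
    for i, byte in enumerate(data):
        page[offset + i] &= byte

def nor_erase(page):
    # The only way to turn any 0 back into a 1: reset the page to all ones.
    for i in range(len(page)):
        page[i] = 0xFF

page = bytearray([0xFF] * PAGE_BYTES)    # freshly erased state
nor_program(page, 0, b"\x0f")            # 0xFF AND 0x0F -> 0x0F
nor_program(page, 0, b"\xf0")            # 0x0F AND 0xF0 -> 0x00, not 0xF0
```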
When NAND flash is packaged up to pretend to be a hard disk, the operating system will issue 512-byte writes all over the place. The wrong thing for the disk emulation layer to do is to read an entire 8K page, change 512 bytes of it, erase the page and write the data back. SanDisk have finally caught up with JFFS2.
JFFS2 is a Linux file system designed for NOR flash that is not hidden behind a hardware disk emulation layer. All writes go sequentially into a single page until it is full, then the next erased page is used. This makes some of the data on previous pages obsolete whenever a more modern version is written. When there is only one erased page left (or when there is a lull in disk activity), a full page is selected, its useful data is copied to the erased page, and the selected page is erased. This leaves one erased page and one partially written page, so further writes can go into that partially written page.
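A simplified model of that log-structured behaviour (loosely in the spirit of JFFS2, not its actual code; the `LogFlash` class is invented for the sketch, and it assumes the device is never completely full of live data):

```python
class LogFlash:
    """Toy log-structured store: sequential appends, lazy reclaim."""

    def __init__(self, num_pages, entries_per_page):
        self.entries_per_page = entries_per_page
        self.pages = [[] for _ in range(num_pages)]   # entries per page
        self.erased = set(range(1, num_pages))        # page 0 is current
        self.current = 0
        self.latest = {}     # key -> (page, index) of the newest version

    def write(self, key, value):
        if len(self.pages[self.current]) == self.entries_per_page:
            if len(self.erased) > 1:
                self.current = self.erased.pop()   # next erased page
            else:
                self._reclaim()    # keep one erased page in reserve
        page = self.pages[self.current]
        page.append((key, value))
        self.latest[key] = (self.current, len(page) - 1)

    def _reclaim(self):
        # Pick the page with the most stale entries, copy only its live
        # entries into the reserve page, then erase it. Writes continue
        # into the partially filled copy, as described above.
        def stale(i):
            return sum(1 for idx, (k, _) in enumerate(self.pages[i])
                       if self.latest.get(k) != (i, idx))
        victim = max((i for i in range(len(self.pages))
                      if i not in self.erased), key=stale)
        dest = self.erased.pop()
        for idx, (key, value) in enumerate(self.pages[victim]):
            if self.latest.get(key) == (victim, idx):
                self.pages[dest].append((key, value))
                self.latest[key] = (dest, len(self.pages[dest]) - 1)
        self.pages[victim] = []       # the erase: stale versions vanish
        self.erased.add(victim)       # one erased page back in reserve
        self.current = dest           # log continues in the partial page
```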
JFFS2 is old tech. It takes a long time to mount large filesystems because the kernel has to read the entire device to map out where the most modern version of all the data is. That's OK for my ancient 16MB chip, but not so good for 1GB - which is a bit small by modern standards. There are newer, shinier flash filesystems in Linux. Unfortunately I rarely get to play with them because most flash is hidden behind a defective disk emulator.
NAND flash comes in two flavours: ordinary (single-level cell), which is fast and costly per gigabyte, and multi-level cell, which is slow and cheap. You can make a fast SSD out of ordinary NAND flash, or by writing to multiple channels of multi-level flash simultaneously. The most profitable solution is to use multi-level flash with a single-channel controller and sell it at a high price to people who do not check whether the sustained transfer rate is tolerable.
Big SSDs are expensive because people will pay a lot of money for the reduced latency. If you want these things at a good price, wait a bit.