# Floating point numbers - what else can be done?

## Avoiding errors


So what does using base-10 floating point actually mean? Well, as there effectively isn't a standard governing decimal floats, we can't say exactly, but to give us an idea let's stick with the structure of the IEEE base-2 float.

The representation for a 32-bit decimal float is of the form 10^{exp} * 1.n where n is a series 1 * 10^{-1} + 1 * 10^{-2} + … + 1 * 10^{-5}. You may remember that the equivalent binary series continued to 2^{-23}; the difference is because each decimal term takes 4 bits, so five terms use 20 bits.
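As a rough sketch of that layout, the Python below packs five decimal digits into a 20-bit BCD field, four bits per digit. The names `pack_bcd` and `unpack_bcd` are made up for illustration, and the layout is the hypothetical one described here rather than any standardized decimal format.

```python
def pack_bcd(digits):
    """Pack a sequence of decimal digits into an integer, 4 bits per digit."""
    value = 0
    for d in digits:
        if not 0 <= d <= 9:
            raise ValueError("each digit must be in 0..9")
        value = (value << 4) | d
    return value


def unpack_bcd(value, count):
    """Recover `count` decimal digits from a BCD-packed integer."""
    return [(value >> (4 * i)) & 0xF for i in reversed(range(count))]


# Fraction digits of 1.41421 in the hypothetical 10^exp * 1.n layout.
fraction = [4, 1, 4, 2, 1]
packed = pack_bcd(fraction)

print(f"{packed:020b}")        # 20 bits, one nibble per decimal digit
print(unpack_bcd(packed, 5))   # round-trips back to the original digits
```

Each nibble can hold sixteen values but only ten are ever used, which is where the storage inefficiency of BCD comes from.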

The IEEE representation allows 23 bits for the mantissa and, although there are three bits left over, these aren't enough to represent another BCD digit, so we don't have a 10^{-6} term and we can represent six-digit numbers in our decimal float. This compares to roughly seven useful decimal digits when we use the 32-bit binary float representation, so the difference in storage efficiency is easily seen.
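The arithmetic behind that comparison can be checked in a few lines. This is only a back-of-the-envelope sketch for the hypothetical BCD layout discussed here; the constant names are illustrative, and the binary figure is the usual estimate of precision bits times log10(2), which comes out a little over seven digits.

```python
import math

MANTISSA_BITS = 23        # stored fraction bits in an IEEE 32-bit float
BITS_PER_BCD_DIGIT = 4    # one decimal digit per nibble

# BCD: whole digits that fit in the fraction field, plus the leading "1."
bcd_fraction_digits = MANTISSA_BITS // BITS_PER_BCD_DIGIT
leftover_bits = MANTISSA_BITS % BITS_PER_BCD_DIGIT
bcd_total_digits = bcd_fraction_digits + 1

# Binary: 23 stored bits plus the implicit leading bit give 24 bits of precision.
binary_digits = (MANTISSA_BITS + 1) * math.log10(2)

print(bcd_fraction_digits, leftover_bits, bcd_total_digits)  # 5 3 6
print(round(binary_digits, 2))                               # 7.22
```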

Six digits may seem too small to be useful - it's only five decimal places after all - but two things should be taken into consideration. Firstly, this is for a 32-bit float; most of the time 64-bit floats are available, which would allow for a more respectable 12-digit number.

Secondly, the format for decimal floating point numbers is probably going to change as part of the revision to the floating point standard. Currently, if you want to start using decimal arithmetic and you're a Java programmer, you're in luck: there's the `java.math.BigDecimal` class in the library.
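To give a feel for what a software decimal type buys you, here is a minimal sketch. It uses Python's standard `decimal` module rather than Java or C++, purely for brevity; the behaviour shown is the general point about decimal arithmetic, not a claim about how any particular Java or C++ library behaves.

```python
from decimal import Decimal

# Binary floats: 0.1 and 0.2 have no exact base-2 representation,
# so their sum is not exactly the float nearest to 0.3.
binary_sum = 0.1 + 0.2
print(binary_sum == 0.3)               # False

# Decimal floats: built from strings, the values are held exactly.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))   # True
print(decimal_sum)                     # 0.3
```

Note that the decimal values are constructed from strings; building them from binary floats would carry the base-2 conversion error along with them.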

If you're a C++ programmer there are plenty of libraries out there. The IBM one is in the early stages of development and contains known bugs, but as the IBM people are heavily involved in the associated extensions to C and C++, this library is probably going to resemble what will eventually be seen in the C++ language that bit more than the others.

This isn't all good news, however. Firstly, decimal floating point calculations are going to be mostly done in software until the new floating point standard is ratified and the hardware catches up. This means that for a while using decimal arithmetic is going to imply a performance cost and, depending on the constraints you're working to, that may or may not be acceptable.

Secondly, converting to and from base-2 isn't the only source of error in floating point calculations. It is, however, the one that we've all seen a bit too often and, although we're still going to need to understand about floats and numerical analysis to do serious things with floats, at least it won't be as easy to shoot yourself in the foot with the basics.
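A short sketch of the errors that remain, again using Python's `decimal` module as a stand-in and a six-digit precision chosen to match the 32-bit example earlier (both are assumptions made for illustration):

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 6  # six significant digits, as in the 32-bit example

    # Rounding: 1/3 cannot be held exactly in any finite number of decimal digits.
    third = Decimal(1) / Decimal(3)
    product = third * 3
    print(third)          # 0.333333
    print(product)        # 0.999999
    print(product == 1)   # False

    # Absorption: a small addend vanishes next to a large one.
    big = Decimal("100000")
    small = Decimal("0.1")
    total = big + small
    print(total == big)   # True
```

Neither result involves a base-2 conversion; they come purely from working with a fixed number of digits, which is why the numerical analysis doesn't go away.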

So let's be thankful that we no longer work in the dark days when storage was at a premium and that we can do things that were unthinkable in the past, such as using four digits to store the year and sacrificing a few bits to make life that bit easier for the poor misunderstood man in the trench writing the code that keeps our satellites in orbit. ®

### References

http://www2.hursley.ibm.com/decimal/

<a href="http://www.petebecker.com/js%