Floating point numbers - what else can be done?

Avoiding errors

Rational numbers

How about rational numbers? Most of the numbers we deal with can be expressed as a fraction or ratio, so why not store these numbers as a numerator and a denominator so that we represent them accurately.

We're all familiar with the math for manipulating fractions; for addition and subtraction you rewrite both sides to have a common denominator and for multiplication and division you multiply nominators and denominators. There are some drawbacks with this approach, however. Firstly, as numerator and denominator are stored separately they are calculated separately. This means there will be twice as many operations per calculations as there are with floats.

Secondly, numerators and denominators can get big very quickly as calculations are performed. This means there needs to be an overflow protection that will factorise the numerator and denominator to make their values smaller as necessary. As this factorisation is not always possible, rational numbers can overflow.

Thirdly, any expressions involving addition, subtraction or comparison are going to have to determine lowest common denominators.

A final point - it's my impression that most interfaces use real numbers - when was the last time the store had a can of soda at \$37/100? So at least in the presentation there's going to be extra conversions going from the rational format back to the real number. So this is an approach that works and for the languages that don't incorporate rational number types and there are almost certainly libraries available that are easy to understand.

In terms of performance, rational numbers may not compare very well to floating points and in terms of storage they'll be twice the size for a similar range. Rational numbers are also only suitable for fractions and not every number can be represented this way (e.g. √2). That said, if you want to see more, for C++ there is a boost implementation of rational numbers available here.

Base-10 floating point numbers

Which brings us to base-10 floating point numbers. If we remember from the previous article, the problem of large errors came from the small approximation errors that arose when base-10 real numbers were converted to the from x/2y. Therefore, it seems that if we could instead represent our number as x/10y then there would be no conversion to a base-2 format and consequently no approximation error. And because there's no conversion back to a base-10 number there's no large errors arising as described in the last article.

The IEEE floating point standard currently undergoing revision allows for a base-10 to be used, so this idea has been around for a long time. That said, current floating point hardware tends not to support the base-10 mode. The reason is that using base-10 implies using binary coded decimal, a number representation format about 20 per cent less storage efficient than base-2. This is because in general binary coded decimal uses four bits per decimal digit; in base-2 these four bits can represent 16 distinct values whereas in the same space BCD can represent only, well 10 distinct values. There are schemes that reduce the amount of redundant space, but not to the efficiency of base-2 and these schemes also render calculations more computationally expensive.

Sponsored: The Nuts and Bolts of Ransomware in 2016

Next page: References