Go home now Header Background Image
Submission Procedure
share: |
Follow us
Volume 1 / Issue 7 / Abstract

available in:   PDF (130 kB) PS (33 kB)
Similar Docs BibTeX   Write a comment
Links into Future
DOI:   10.3217/jucs-001-07-0560

On Formally Correct Implementation of IEEE Computer Arithmetic

Evgenija D. Popova
(Bulgarian Academy of Science, Bulgaria

Abstract: IEEE floating-point arithmetic standards 754 and 854 reflect the present state of the art in designing and implementing floating-point arithmetic units. A formalism applied to a standard non-trapping mode floating-point system shows incorrectness of some numeric and non-numeric results. A software emulation of decimal floating-point computer arithmetic supporting an enhanced set of exception symbols is reported. Some implementation details, discussion of some open questions about utility and consistency of the implemented arithmetic with the IEEE Standards are provided. The potential benefit for computations with infinite symbolic elements is outlined.

Key Words: computer arithmetic, implementation, IEEE standards, exception handling

Category: G.1.0

1 Introduction

At the beginning of the computer age arithmetic was defined and implemented by computer manufacturers. The main interest at that time was to optimize the speed of the operations and minimize the circuitry needed to implement them. Accuracy was only considered as a side effect. Numerical scientists suffered from these deficiencies and had to spend much effort to overcome the difficulties. Furthermore many existing floating-point units exhibit machine-dependent irregularities in behaviour which complicate the problem of writing floating-point programs that are portable, in the sense of offering equivalent numerical behaviour on different machines. Upcoming in the early eighties, the IEEE arithmetic standard 754 [ANSI/IEEE 1985], further generalized by the IEEE Std. 854 [ANSI/IEEE 1987] to remove the dependencies on radix and wordlength, changed the situation. The IEEE standards provide direct support of:

- uniform floating-point formats including constrains on parameters defining values of basic and extended floating-point numbers,

- well-defined computer arithmetic operations performed with maximum accuracy,

- four different rounding modes including directed roundings,

- execution-time diagnostics of anomalies and smoother handling of exceptions.

The IEEE arithmetic standards enhance the capabilities and safety available to programmers and facilitate the movement of programs between the diverse computers adhering these standards. Between the main achievements of the standards is the provision of arithmetic operations with directed roundings

Page 560

which allows implementation of numerical algorithms with automatic result verification (see e. g. [Kaucher et al. 1992], [Kulisch and Miranker 1983] for such algorithms).

The IEEE standard has been widely adopted to most hardware platforms (chips: Intel 8087, Motorola 6839, etc.) and software implementations ([Falco Korn et al. 1992], [Klatte et al. 1992], [Klatte et al. 1993], [Metzger and Walter 1990], etc.). So the IEEE floating-point arithmetic, as in- tended, is rapidly becoming a de facto computer industry standard for the design of floating-point arithmetic units.

The IEEE standards reflect the present state of the art and they are subject to comments, revision, reaffirmation or change. A surprising number of details and variety of different reasons for their selection must be settled in the design of a practical floating-point unit. Any proposal concerning the Standard should be judged by its consistency, utility and ease of implementation. Amongst the other comments on the standards Lynch and Swartzlander [Lynch and Swartzlander 1992] applied a formalism for specifying the num ber systems to the IEEE Std. 754 and showed that the standard conforming systems exhibit an inconsistency. Having the opportunity to develop a soft ware emulation of decimal floating-point arithmetic according to IEEE Std. 854 [ANSI/IEEE 1987], which was intended to replace the binary arithmetic in the programming language PASCAL-XSC [Klatte et al. 1992], we designed and im plemented another version of the decimal arithmetic routines which support an enhanced set of exception symbols as proposed in [Lynch and Swartzlander 1992].

Section [2 IEEE Standard and its error algebra] outlines the formalization of the IEEE non-trapping floating-point system considered in [Lynch and Swartzlander 1992] and the proposed modified system. In [Section 3] we report some implementation details and discuss the cost and the consistency with the IEEE floating-point Standards of this implementation. All considerations below are valid for both standards 754 and 854, so we shall not refer to the standard number except whenever it is especially necessary.

2 IEEE Standard and its Error Algebra

The IEEE floating-point scheme uses its formats to represent valid floating-point numbers (normalized or denormalized), called also representable numbers; some specially distinguished values as zero and infinity; and a set of special values called NaNs (Not a Number). A formalism considered in [Lynch and Swartzlander 1992] attaches a logical proposition to each element in the operation domain which defines its meaning and specifies the accuracy of the numeric results and the scope of symbolic results. In accordance with the supported floating-point formats the real numbers can be divided into six categories: (1) zero; (2) numbers too small to represent, called underflow numbers; (3) numbers which are too small to be approximated in the normalized format, called denormalized numbers; (4) numbers which may be approximated in the normalized format, called normal numbers; (5) numbers which are too large to represent, called overflow numbers; and (6) infinity. An input real value or the exact result of a floating-point arithmetic operation are approximated by some floating-point number in the supported format. Addition, multiplication and division are defined to include operation on numeric, non-numeric or mixed

Page 561

operands. Four rounding modes corresponding to different type of approximation are supplied by the Standard: rounding to the nearest representable floating- point number as a default mode, rounding to zero, to minus infinity and to plus infinity.

A number of exceptional situations such as Invalid Operation, Overflow, Underflow, Division by Zero and Inexact Result may arise during numerical computations in a floating-point environment conforming IEEE standards. Every exception, when it occurs must raise a flag that a program may subsequently sense and/or take a trap engineered to pass control to some code to handle the detected exceptional condition. The set of special values called NaNs are used for communicating results of Invalid Operation exceptions, attempt to extract the square root of a negative number etc. There are two types of NaNs: quiet NaN which propagate through the arithmetic operations without precipitating exceptions and signaling NaN which precipitate an Invalid Operation exception whenever an attempt is made to use one as arithmetic operand. The IEEE standards require that the default response to the exceptional situations is not to trap on them, but to compute and deliver to the destination a default result, specified in a reasonable way if not universally acceptable, for each possible exception. [Tab. 1] gives the error algebra defined by the IEEE standard for calculations performed in non-trapping mode and positive sign of the operands (valid floating-point numbers are denoted by R).

Table 1: The IEEE Std Non-, Trapping Mode Substitutions.

As it was mentioned above a standard conforming computer can represent the fact that a result is indeterminate or not real with the symbol NaN. But there is no representation for prerounded results known to be in the underflow region (category 2) or the overflow region (category 5). By the following example Lynch and Swartzlander [Lynch and Swartzlander 1992] showed that the IEEE standard conforming system exhibits an inconsistency. [Fig. 1] shows an example program, the theoretically correct and the computed result in the default rounding mode.

Page 562

Figure 1: Example Program, its Theoretical and Computer Execution. (LNN stands for the largest normalized number)

Execution of this example is according to the rules of IEEE Standard. Each result satisfies the statement of the Standard "every operation is performed as if it first produced an intermediate result correct to infinite precision and with unbounded range and then rounded accordingly". But incorrect results are evident in lines 3 and 5. In line 3 the IEEE result is infinity, which corresponds to the proposition "the rounded theoretically correct value is greater than the maximum representable, LNN", although the theoretically correct result is 2. In line 5 the standard result is -2, but the theoretically correct result is infinity. This result is especially dangerous, because it appears to be reasonable.

In [Lynch and Swartzlander 1992] the following enhanced set of elements is proposed to be used in IEEE floating-point computations to circumvent the difficulties arising in the above example:

INF:                the theoretically correct
                    value is infinite
OV:                 the theoretically correct
                    value is greater than the 
                    maximum representable
 representables: the theoretically correct 
                    value is approximately x 
UN:                 the theoretically correct 
                    result is smaller than the 
                    minimum representable
0:                  the theoretically correct 
                    result is zero 
INDET:              the theoretically correct
                    result is indeterminable.

The repeated calculations of the example based on this alternative set of exception symbols and the corresponding rules [Lynch and Swartzlander 1992] give IND.ET as the most specific correct result (B = OV, C = INDET since an OV divided by LNN may be an OV or normal number).

The point is how to implement a floating-point arithmetic system in order to take the advantage of the above enhanced set of exception symbols.

3 Implementation and Consistency

An unfortunate weakness of the Standard is that so far no common programming language neither allows access to the IEEE floating-point operations with directed roundings nor provide suitable interface for testing and handling the exceptions. Only a few prototype languages [Falco Korn et al. 1992], [Klatte et al. 1992], [Klatte et al. 1993], [Metzger and Walter 1990] provide software emulatedfloating- point arithmetic conforming IEEE Std 754.

For the developed floating-point arithmetic supporting an enhanced set of exception symbols the compliance with the PASCAL-XSC design is determined by the use of the following language features:

Page 563

- Both PASCAL-XSC compiler [Alendoerfer and Shiriaev 1992] and the runtime system [Cordes 1991] are written in ANSI C, which ensures running PASCAL-XSC on nearly every computer.

- The runtime routines simulate the decimal IEEE "double" format and the IEEE operations in software and, thus, are independent of the actually used hardware and the floating-point formats of the C compiler in use.

- The PASCAL-XSC runtime system provides a set of routines which allow a flexible monitoring and handling of the exceptions. An individual condition code and a default exception routine is defined for each exception. An unique interface is given by the trap handler for all exception routines.

The IEEE Standard require for each format representation of "at least one signaling NaN" and "at least one quiet NaN". The Standard "does not specify the... interpretation of the sign and significand fields of NaNs". Representation of a decimal IEEE "double" format number r is given on [Fig. 2] [Bohlender et al. 1991]. According to this representation - 2 encodings are used for the NaNs. By default the PASCAL-XSC runtime system assumes that a signaling NaN is identified by bit 51 of this representation being set. A quiet NaN is identified by bit 51 of the representation of the floating-point number being not set. We take advantage of the freedom given by the Standard and use part of the encodings of the quiet NaNs for the representation of the additional exception symbols UN and OV. The INDET value is carried by the qNaN itself. Thus the structure of a quiet NaN in the corresponding floating-point system supporting the extended set of exception symbols becomes as that presented on [Fig. 3]. To be more specific an INDET value is represented by d = 7, OV by d = 3 and UN by d = 1 [see Fig. 2].

Figure 2: Representation of a double precision decimal number

In [Tab. 2] we give the detailed non-trapping mode substitutions for addition according to the sign of the operands and according to the rounding modes.

The Standard does not interpret the sign of a NaN but says that "an implementation may find it helpful to provide additional information about a variable that is a NaN through an algebraic sign". For the realization of a correct addition operation involving UN or/and OV arguments their signs have to be properly determined. So the rules of the Standard concerning the algebraic sign of a result

Page 564

have to be applied even when operands or results are zero, infinite, UN or OV. The signs of the other NaNs remain not interpretable.

Figure 3: Structure of a quiet NaN

Table 2: Non-Trapping Mode Substitutions for Addition where

The following Proposition shows that a valid floating-point number results an addition/subtraction operation on any representable number and an underflow value in any rounding mode.

Page 565

Proposition For any nonzero normal or subnormal number R of the supported floating-point format and a positive underflow value UN, is the smallest machine interval containing R + UN and is the smallest machine interval containing R - UN.

Proof follows from the inequalities 0 < UN < ulp(R) valid for any positive representable number R.

A result R+ ulp(R) will signal an overflow floating-point exception when R is the largest normal number and a result R - ulp(R) will signal an Underflow floating-point exception when R is the smallest denormalized number of the supported floating-point format.

[Tab. 3] and [Tab. 4] give the non-trapping mode substitutions for multiplication and division operations of positive operands. For negative or mixed sign operands the corresponding IEEE Standard rules for the algebraic sign of the result have to be additionally applied.

Table 3: Non-Trapping Mode Substitutions for Multiplication

All operations involving UN or OV argument will signal no exceptions, except for the special cases of addition mentioned above after the Proposition. Trapped overflow on decimal string to floating-point conversion when the result lies too far outside the range of the exponent to be adjusted will deliver to the trap handler an appropriately signed OV. Similar rule is applied to the trapped underflow.

In order to preserve the main purpose of the directed roundings to provide an enclosure of the theoretically correct result we have implemented the non- trapping substitutions given in [Tab. 2]-[Tab. 4] only for the arithmetic operations in round to nearest mode. The enhanced exception set requires the following modification of the IEEE definition of default result on Overflow/Underflow. Round to nearest carries all overflows to OV with the sign of the intermediate result. The delivered default result in round to nearest mode when Underflow have been detected and the corresponding trap is not enabled shall be UN with

Page 566

Table 4: Non-Trapping Mode Substitutions for Division

the sign of the intermediate result. Remarkably, the proposed implementation scheme for a floating-point arithmetic supporting an enhanced set of exception symbols fits quite well in the frames prescribed by the IEEE floating-point Standard. This is ensured by the decision UN and OV elements to be implemented as belonging to the set of quiet NaNs and except for the substitutions from [Tab. 2]-[Tab. 4] to apply the corresponding rules for quiet NaNs prescribed by the Standard. Non-trapping substitutions from [Tab. 2]-[Tab. 4] do not contradict the Standard. Some minor inconsistencies of no practical importance can be met. The new implementation does not conform the IEEE requirement "Every operation involving one or two input NaNs, none of them signaling, shall signal no exception but, if a floating- point result is to be delivered, shall deliver as its result a quiet NaN, which should be one ofthe inputs NaNs".

For example, R/UN = and

neither OV nor qNaN is "one ofthe input NaNs" (UN). According to the substitution tables the result of the example of [Fig. 1] will be qNaN and the execution of line 3: C = OV/LNN = qNaN will show the same inconsistency.

It should be mentioned that the difficulties connected with overflowed results in IEEE non-trapping mode computations, as those of the example of [Fig. 1], can be overcomed also at an user level by the following substitutions:

In order to ensure correct behaviour of the non-trapping floating-point computations, always when Overflow exception arises one can switch to predefined

Page 567

arithmetic operations which will check for the special cases (1) and will provide more correct results. This can be done for all rounding modes. Ofcourse, checking special cases (1) will be much more time consuming than using a floating-point arithmetic supporting the enhanced exception set.

The major advantage of the enhanced set of exception symbols proposed in [Lynch and Swartzlander 1992] concerns those applications dealing with infinite input elements. According to the proposed substitutions infinity as a result of a floating-point operation can be obtained only when at least one of the operands is infinity or when a nonzero number is divided by zero. Thus infinite large or infinite small in magnitude values obtained as a result of roundoff errors can be clearly distinguished from the operations involving infinities. Thus the implemented floating-point system provides more functionality and safety for only a small additional implementation cost.

4 Conclusion

A considerable amount of manpower is required for the practical implementation of any proposal concerning IEEE floating-point arithmetic. The presented implementation of floating-point arithmetic supporting an enhanced set of exception symbols comes to answer some open questions about its utility and consistency with the IEEE floating-point Standards. The new expanded computational capability is gained at no additional cost. This implementation does not implicate performance penalty. Moreover, with the enhanced capability, the computer can be used to appraise the quality and the reliability of the computed results over a wide range of applications.


This work was made possible due to the contract "Decimal Arithmetic" between the Bulgarian Academy of Sciences and the University of Karlsruhe, Germany.


[Alendoerfer and Shiriaev 1992] Alendoerfer, U., Shiriaev, D.: "PASCAL-XSC to C, A Portable PASCAL-XSC Compiler"; In [Kaucher et al. 1992], 91-104.

[ANSI/IEEE 1985] American National Standards Institute/Institute of Electrical and Electronics Engineers: "IEEE Standard for Binary Floating-Point Arith- metic"; ANSI/IEEE Std 754-1985, New York (1985).

[ANSI/IEEE 1987] American National Standards Institute/Institute of Electrical and Electronics Engineers: "IEEE Standard for Radix-lndependent Floating- Point Arithmetic"; ANSI/IEEE Std 854-1987, New York (1987).

[Bohlender et al. 1991] Bohlender, G., Cordes, D., Klatte, R., Kraemer, W.: "Technical Specifications for a Decimal Arithmetic"; lnstitut fuer Angewandte Mathematik, Universitaet Karlsruhe, Karlsruhe (1991).

[Cordes 1991] Cordes, D.: "Runtime System for a PASCAL-SC Compiler"; In [Kaucher et al. 1992], 151-160.

[Falco Korn et al. 1992] Falco Korn, C., Gutzwiller, S., Koenig, S., Ullrich, Ch.: "Modula-SC. Motivation, Language Definition and lmplementation"; ln [Kaucher et al. 1992], 161-181.

Page 568

[Kaucher et al. 1992] Kaucher, E.,Markov, S. M.,Mayer, G. (Eds.): "Computer Arithmetic, Scientific Computation and Mathematical Modelling"; IMACS Annals on Computing and Appl. Math., 12, J. C. Balzer, Basel (1992).

[Klatte et al. 1993] Klatte, R., Kulisch, U., Lawo, C., Rauch, M., Wiethoff, A.: "CXSC A C++ Class Library for Extended Scientific Computation"; Springer, Berlin (1993).

[Klatte et al. 1992] Klatte, R., Kulisch, U., Neaga, M., Ratz, D., Ullrich, Ch.: "PASCAL-XSC Language Reference with Examples"; Springer, Berlin (1992).

[Kulisch and Miranker 1981] Kulisch, U., Miranker, W. L.: "Computer Arithmetic in Theory and Practice"; Academic Press, New York (1981).

[Kulisch and Miranker 1983] Kulisch, U., Miranker, W. L. (Eds.): "A New Approach to Scientific Computation"; Academic Press, New York (1983).

[Lynch and Swartzlander 1992] Lynch, T. W., Swartzlander E. E.: "A Formalization for Computer Arithmetic"; In Atanassova, L., Herzberger, J. (Eds.): "Computer Arithmetic and Enclosure Methods" Elsevier Sci. Publishers B. V. (1992), 137-145.

[Metzger and Walter 1990] Metzger, M., Walter, W. V.: "FORTRAN-SC: A Programming Language for Engineering/Scientific Computation" In Ullrich, Ch. (Ed.): "Contribution to Computer Arithmetic and Self-Validating Numerical Methods"; IMACS Annals on Computing and Appl. Math., 7, J. C. Balzer, Basel (1990), 427-441.

Page 569