Modular Range Reduction: A New Algorithm for Fast and Accurate
Computation of the Elementary Functions
Marc Daumas (Lab. LIP, École Normale Supérieure de Lyon, Marc.Daumas@lip.ens-lyon.fr)
Christophe Mazenc (Lab. LIP, École Normale Supérieure de Lyon, Christophe.Mazenc@lip.ens-lyon.fr)
Xavier Merrheim (Lab. LIP, École Normale Supérieure de Lyon, Xavier.Merrheim@lip.ens-lyon.fr)
Jean-Michel Muller (CNRS, Lab. LIP, École Normale Supérieure de Lyon, Jean-Michel.Muller@lip.ens-lyon.fr)
Abstract: A new range reduction algorithm, called Modular Range Reduction (MRR), briefly introduced by the authors in [Daumas et al. 1994], is analyzed in depth. It is used to reduce the arguments of exponential and trigonometric function algorithms to within the small range for which those algorithms are valid. MRR reduces the arguments quickly and accurately. A fast hardwired implementation of MRR operates in time O(log(n)), where n is the number of bits of the binary input value. For example, with MRR it becomes possible to compute the sine and cosine of a very large number accurately. We propose two possible architectures implementing this algorithm.
Key Words: Computer Arithmetic, Elementary Functions, Range Reduction
Category: B.2, G.1.0
1 Introduction
The algorithms used for evaluating the elementary functions (polynomial or rational approximations [Cody and Waite 1980], [Remes 1934], Taylor expansions, shift-and-add algorithms (see [Ercegovac 1973], [DeLugish 1970], [Walther 1971] and [Asada et al. 1987]), table lookup methods, ...) only give a correct result if the argument is within a given bounded interval. In order to evaluate an elementary function f(x) for any x, one must find some 'transformation' that makes it possible to deduce f(x) from some value g(x*), where:
- x* is deduced from x; it is called the reduced argument;
- x* belongs to the convergence domain of the algorithm implemented for the evaluation of g.
In practice, there are two different kinds of reduction:
1. Additive reduction. x* is equal to x − kC, where k is an integer and C a constant (for instance, for the trigonometric functions, C is a multiple of π/4).
2. Multiplicative reduction. x* is equal to x/C^k, where k is an integer and C a constant (for instance, for the logarithm function, a convenient choice for C is the radix of the number system).

Example 1. Computation of the cosine function. Assume that we want to evaluate cos(x), and that the convergence domain of the algorithm used to evaluate the sine and cosine of the reduced argument contains [−π/4, +π/4]. We choose C = π/2, and the computation of cos(x) is decomposed in three steps:
- compute x* and k such that x* ∈ [−π/4, +π/4] and x* = x − kπ/2;
- compute g(x*, k), equal to cos(x*) if k mod 4 = 0, to −sin(x*) if k mod 4 = 1, to −cos(x*) if k mod 4 = 2, and to sin(x*) if k mod 4 = 3;
- obtain cos(x) = g(x*, k).
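The three-step decomposition above can be sketched in a few lines of Python (the function names are ours, and the naive computation of x − kC shown here is exactly what the accurate reduction discussed later replaces):

```python
import math

def reduce_cos(x):
    """Additive range reduction with C = pi/2 (illustrative helper).

    Returns the reduced argument x* and k mod 4."""
    C = math.pi / 2
    k = round(x / C)          # nearest integer, so the reduction is symmetrical
    return x - k * C, k % 4

def cos_via_reduction(x):
    """Reconstruct cos(x) = g(x*, k), selecting g according to k mod 4."""
    x_star, k4 = reduce_cos(x)
    return (math.cos(x_star), -math.sin(x_star),
            -math.cos(x_star), math.sin(x_star))[k4]
```

Note that only k mod 4 is needed to pick the right function, which is why the full quotient k never has to be computed.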
The previous reduction mechanism is an additive reduction. Let us examine another example of additive reduction.

Example 2. Computation of the exponential function. Assume that we want to evaluate e^x in a radix-2 number system, and that the convergence domain of the algorithm used to evaluate the exponential of the reduced argument contains [0, ln(2)]. We choose C = ln(2), and the computation of e^x is decomposed in three steps:
- compute x* and k such that x* ∈ [0, ln(2)) and x* = x − k ln(2);
- compute g(x*) = e^{x*};
- compute e^x = 2^k g(x*).
The radix-2 number system makes the final multiplication by 2^k straightforward. Another way of performing the range reduction for the exponential
function (with an algorithm whose convergence domain is ) is to
choose , Then, and can either be evaluated by performing a few multiplications -
since k is an integer - or by table-lookup. Usually, this latter
method is preferable, since there is no loss of accuracy when
computing x*. With the range reduction algorithm we give in the
following, the former choices for x*, g, and k become interesting,
for several reasons: - we will be able to compute x* very accurately, - the required convergence interval is smaller, which means that to reach the same accuracy, we need a smaller number of coefficients for a polynomial or rational approximation, - there will not be any error when deducing from g(x*). Anyway, range reduction is more a problem for trigonometric functions than for exponentials, since, in practice, we never have to deal with exponentials of very large numbers: they merely are overflows! Page 163
Example 3. Computation of the logarithm function. Assume that we want to evaluate ln(x), x > 0, in a radix-2 number system, and that the convergence domain of the algorithm used to compute the logarithm of the reduced argument contains [1, 2]. We choose C = 2, and the computation of ln(x) is decomposed in three steps:
- compute x* and k such that x = x* 2^k, with x* ∈ [1, 2) (if x is a normalized radix-2 floating-point number, x* is its mantissa, while k is its exponent);
- compute g(x*, k) = ln(x*);
- compute ln(x) = g(x*, k) + k ln(2).
The previous mechanism is a multiplicative reduction. In practice, multiplicative reduction is not a problem: when computing the usual mathematical functions, it only occurs with logarithms and n-th roots. With these functions, as in the example above, C can be chosen equal to a power of the radix of the number system. This makes the computation of x* and k straightforward. Therefore, in the following, we concentrate on the problem of additive range reduction only.
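A Python sketch of this multiplicative reduction follows; `frexp` normalizes the mantissa to [0.5, 1) rather than the [1, 2) of the example, which only shifts k by one and changes nothing essential:

```python
import math

def ln_via_reduction(x):
    """Multiplicative reduction for ln(x) with C equal to the radix (sketch)."""
    m, k = math.frexp(x)                  # x = m * 2**k, with 0.5 <= m < 1
    return math.log(m) + k * math.log(2.0)
```

Because C is a power of the radix, extracting x* and k costs nothing: they are read directly off the floating-point representation.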
2 The Modular Range Reduction Algorithm
We focus on the following problem: we assume that we have an algorithm able to compute the function g in an interval I of the form [−C/2 − ε, +C/2 + ε] (we call this case 'symmetrical reduction') or [−ε, C + ε] (we call this case 'positive reduction'), with ε ≥ 0. We want to compute x* ∈ I and an integer k such that:

x* = x − kC    (2)

If ε > 0, then x* and k are not uniquely defined by Eq. 2. In such a case, the problem of deducing these values from x will be called 'redundant range reduction'. For example, if C = π/2 and x = 2.5, then k = 1 and x* = 0.929203... or k = 2 and x* = −0.641592... are possible values. If ε = 0, this problem is called 'non-redundant range reduction'. As in many fields of computer arithmetic, redundancy will allow faster algorithms. Table 1 sums up the different possible cases.

Table 1. The different cases in additive range reduction
It is worth noticing that:
1. In some practical cases, it is not necessary to fully compute k. For instance, for the trigonometric functions, if C = π/2 then one just needs to know k mod 4. If C = 2π, there is no need for any information about k.
2. With the usual algorithms for evaluating the elementary functions, one can assume that the length of the convergence domain is greater than C, i.e. that we can perform a redundant range reduction. For instance, with the CORDIC algorithm, when performing rotations (see [Walther 1971]), the convergence domain is [−1.743..., +1.743...], which is much larger than π/2. With polynomial or rational approximations, the convergence domain can be enlarged by adding one coefficient to the approximation.
Let us define ⌊x⌉ as the nearest integer to x. Usually, the range reduction is done by:
- computing k = ⌊x/C⌋ (in the case of a positive reduction) or k = ⌊x/C⌉ (in the case of a symmetrical reduction) by means of a multiplication or a division;
- computing the reduced argument x* = x − kC by means of a multiplication by k (or a table lookup if the values kC are precomputed and stored), followed by a subtraction.
The above process may be rather inaccurate: for large values of x, the final subtraction step leads to a catastrophic cancellation, although cunning tricks have been proposed to limit this cancellation [Cody and Waite 1980]. In [Daumas et al. 1994] we briefly proposed a new algorithm, called the modular reduction algorithm (MRR), that performs the range reduction quickly and accurately. In the following we analyze that algorithm in depth, in terms of speed and accuracy, and we propose architectures implementing it.
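The cancellation in the usual process is easy to exhibit. In the sketch below (our own illustration, with C = π/2 and a 50-digit reference computation standing in for exact arithmetic), the naive double-precision reduction of x = 10^22 returns a multiple of 2^21, i.e. a value with no correct fractional information at all:

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 50
# pi to 50 significant digits (a well-known constant)
PI = Decimal("3.1415926535897932384626433832795028841971693993751")

def naive_reduce(x):
    """Naive reduction x* = x - k*C with C = pi/2, entirely in doubles."""
    C = math.pi / 2
    k = round(x / C)
    return x - k * C          # k*C is rounded to a double, then cancelled

def accurate_reduce(x):
    """Reference reduction carried out in 50-digit decimal arithmetic."""
    C = PI / 2
    k = (Decimal(x) / C).to_integral_value()
    return Decimal(x) - k * C
```

For x = 10^22, consecutive doubles near k·C are 2^21 apart, so `naive_reduce` can only return a multiple of 2^21, whereas the exact reduced argument lies in [−π/4, π/4].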
2.1 Fixed-point reduction
First of all, we assume that the input operands are fixed-point radix-2 numbers, less than 2^N. These numbers have an N-bit integer part and a p-bit fractional part. So the digit chain:

x_{N−1} x_{N−2} ... x_1 x_0 . x_{−1} x_{−2} ... x_{−p}

represents the number x = Σ_{i=−p}^{N−1} x_i 2^i. We assume that we should perform a redundant range reduction, and we call ν the integer such that 2^ν ≤ C < 2^{ν+1}. Let us define, for i ≥ ν, the number c_i belonging to [−C/2, +C/2) such that 2^i − c_i is a multiple of C (in the following, we will write c_i = 2^i mod C). The Modular Range Reduction (MRR) algorithm consists in performing the two following steps:
First reduction. We compute the number(*):

r = (Σ_{i=ν}^{N−1} x_i c_i) + x_{ν−1} x_{ν−2} ... x_0 . x_{−1} x_{−2} ... x_{−p}

(*) This formula is correct only for positive values of x; it would be more correct, although maybe less clear, to handle the sign of x explicitly.

Since the x_i are equal to 0 or 1, this computation is reduced to the sum of at most N − ν + 1 terms. The result r of this first reduction is between −(N − ν)C/2 − C and +(N − ν)C/2 + C. This is a consequence of the fact that all the c_i have an absolute value less than C/2, and that x_{ν−1} ... x_0 . x_{−1} ... x_{−p} has an absolute value less than 2^ν, which is at most C.

Second reduction. Define the r_j as the digits of the result of the first reduction:

r = Σ_j r_j 2^j

Let us define r̂ as the number obtained by truncating the binary representation of r after the bit of weight 2^ℓ, where ℓ is chosen such that 2^ℓ ≤ ε, that is:

r̂ = Σ_{j ≥ ℓ} r_j 2^j

r̂ is an m-digit number, where m is a very small number in all practical cases (see the example below). If we define k as the integer nearest to r̂/C (resp. as ⌊r̂/C⌋), then x* = r − kC will be the correct result of the symmetrical (resp. positive) range reduction.
Proof. By construction of the c_i, r differs from x by a multiple of C, so x* = r − kC is of the form x − KC for some integer K. Moreover, x* = (r̂ − kC) + (r − r̂); by definition of k, r̂ − kC belongs to [−C/2, +C/2) (resp. [0, C)), and 0 ≤ r − r̂ < 2^ℓ ≤ ε. Hence x* belongs to the convergence interval.

Since k can be deduced from r̂, this second reduction step will be implemented by looking up the value kC in a 2^m-entry table, at the address constituted by the bits of r̂. Fig. 1 sums up the different steps of MRR.

During this reduction process, we perform the addition of N − ν + 2 terms. If the stored terms (namely the c_i and the value kC of the second reduction step) are represented in fixed-point with q fractional bits (i.e. the error on each of these terms is bounded by 2^{−q−1}), then the difference between the result of the computation and the exact reduced argument is bounded by (N − ν + 1)2^{−q−1}. In order to obtain the reduced argument x* with the same absolute accuracy as the input argument x (i.e. p significant fixed-point fractional digits), one needs to store the c_i and the values kC with p + ⌈log2(N − ν + 1)⌉ fractional bits.

Fig. 1. The Modular Reduction Algorithm

Example 4. Assume we need to compute sines of angles less than 2^{20}, and that the algorithm used with the reduced arguments is CORDIC [Volder 1959], [Walther 1971]. The convergence interval is [−1.743..., +1.743...]; therefore (since 1.743 > π/2) we can perform a symmetrical redundant range reduction, with C = π. We immediately get the following parameters:
- N = 20 and ν = 1;
- the first range reduction consists of the addition of 19 terms;
- the second reduction step requires a table with 8 address bits;
- to obtain the reduced argument with p significant fractional bits, one needs to store the c_i and the values kC with p + 5 bits.
Assume we compute the sine of 355. The binary representation of 355 is 101100011. Therefore, during the first reduction, we have to compute r = c_8 + c_6 + c_5 + c_1 + 1, where:
c_1 = 2 − π = −1.1415926535...
c_5 = 32 − 10π = 0.5840734641...
c_6 = 64 − 20π = 1.1681469282...
c_8 = 256 − 81π = 1.5309950592...
We get r = 3.1416227979... The second reduction consists in subtracting π from that result, which gives x* = 0.0000301443..., the sine of which is 0.0000301443... Since the corresponding k is odd, sin(355) = −sin(x*). Therefore, sin(355) = −0.000030144353359488449...
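The fixed-point MRR of this example can be sketched in Python as follows. The names N, ν (nu) and c_i follow the text; doubles stand in for fixed-point numbers, the kC table of the second reduction is replaced by a direct computation, and x is assumed nonnegative:

```python
import math

def mrr_reduce(x, C=math.pi, N=20, nu=1):
    """Sketch of the MRR fixed-point reduction (x >= 0, x < 2**N)."""
    # precomputed constants c_i = 2**i mod C, centered in [-C/2, C/2)
    c = {}
    for i in range(nu, N):
        ci = math.fmod(2.0 ** i, C)
        if ci >= C / 2:
            ci -= C
        c[i] = ci
    ip = int(x)                      # integer part of the input
    low = x - float(ip >> nu << nu)  # bits below nu, plus the fraction
    # first reduction: one c_i per set bit of weight >= nu
    r = low + sum(c[i] for i in range(nu, N) if (ip >> i) & 1)
    # second reduction: subtract the nearest multiple of C
    # (a table lookup in the hardwired version)
    return r - round(r / C) * C
```

On the worked example, `mrr_reduce(355.0)` reproduces the reduced argument 355 − 113π ≈ 0.0000301443.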
2.2 Floating-point reduction
Now, assume that the input value x is a radix-2 floating-point number:

x = x_0 . x_1 x_2 ... x_{m−1} × 2^E

The range reduction can be performed exactly as in the fixed-point case. During the first reduction, we replace the addition of the terms x_i c_i by the addition of the terms x_i (2^{E−i} mod C). As previously, 2^{E−i} mod C is the number belonging to [−C/2, +C/2) that differs from 2^{E−i} by a multiple of C. The main difference between this reduction method and the other ones is that during the reduction process, we just add numbers (the 2^{E−i} mod C) of the same order of magnitude, represented in fixed-point. This makes the reduction very accurate. One can easily show that if the 2^{E−i} mod C and the terms kC of the second reduction are represented with q fractional bits, then the absolute error on the reduced argument is bounded by (m + 1)2^{−q−1}. Thus, for instance, it is possible to compute with good accuracy the sine and cosine of a huge floating-point number. In floating-point number systems, one would prefer information on the relative error: this will be discussed later.
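The floating-point variant can be sketched as below (our own illustration: one residue 2^e mod C per mantissa bit, accumulated in high-precision decimal arithmetic, which stands in for the fixed-point constants stored in hardware):

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 60
# pi to 50 significant digits (a well-known constant)
PI = Decimal("3.1415926535897932384626433832795028841971693993751")

def mrr_float_reduce(x, C):
    """Floating-point MRR sketch: reduce x modulo the Decimal constant C."""
    m, e = math.frexp(x)             # x = m * 2**e with 0.5 <= m < 1
    bits = int(m * 2.0 ** 53)        # 53-bit integer mantissa of x
    r = Decimal(0)
    for i in range(53):
        if (bits >> i) & 1:
            r += Decimal(2) ** (e - 53 + i) % C   # residue of this bit
    k = (r / C).to_integral_value()  # second reduction: nearest multiple
    return r - k * C
```

All the added residues have absolute value below C, whatever the magnitude of x, which is what makes the method accurate even for huge inputs.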
3 Architectures for Modular Reduction
The first reduction consists of adding N − ν + 1 numbers. This addition can be performed in a redundant number system (carry-save or signed-digit) in order to benefit from the carry-free ability of such a system, and/or in a tree-like way.
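The carry-save idea can be modeled at the bit level with Python integers (an illustrative software model, not the paper's hardware): each 3-to-2 level preserves the sum without propagating carries, and a tree of such levels has logarithmic depth.

```python
def csa(a, b, c):
    """One carry-save level: three addends in, two out, with no carry
    propagation; the invariant is a + b + c == s + t."""
    s = a ^ b ^ c                            # digitwise sum
    t = ((a & b) | (a & c) | (b & c)) << 1   # saved carries
    return s, t

def csa_tree_sum(terms):
    """Reduce many addends to two with 3-to-2 levels (as a Wallace tree
    does), then perform one final carry-propagate addition."""
    terms = list(terms)
    while len(terms) > 2:
        a, b, c = terms.pop(), terms.pop(), terms.pop()
        terms += list(csa(a, b, c))
    return sum(terms)
```

Only the last addition propagates carries; all earlier levels are constant-time per bit.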
This problem is obviously closely related to the problem of multiplying two numbers: multiplying y by the (q + 1)-bit number x = Σ_{i=0}^{q} x_i 2^i reduces to computing the sum of the q + 1 terms x_i y 2^i. Therefore, almost all the classical architectures proposed in the multiplication literature (see for instance [Braun 1963], [Dadda 1965], [Harata et al. 1987], [Nakamura 1986], [Takagi et al. 1985], [Wallace 1964]) can be slightly modified in order to be used for range reduction. For instance, the architecture shown in Fig. 2 is obtained from Braun's cellular array multiplier [Braun 1963], while the logarithmic-time architecture shown in Fig. 4 is a Wallace tree [Wallace 1964]. In Fig. 3, c_{i,j} denotes the digit of weight 2^j of c_i. This similarity between the Modular Range Reduction algorithm and multiplication makes it possible to perform both operations with the same hardware, which can save some silicon area on a circuit.
The similarity between range reduction and multiplication leads us to another idea: in order to accelerate the first reduction, one can perform a Booth recoding [Booth 1951], or merely a modified Booth recoding [Hwang 1979], of x. This would give a signed-digit representation of x (with digits −1, 0 and 1) with at least half of the digits equal to zero. Then the number of terms to be added during the first reduction would be halved.
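One such recoding is the non-adjacent form, shown below as an illustrative Python sketch (a Booth-style recoding; the paper's hardware would of course not use software loops). No two adjacent digits are nonzero, so at least half of the digits are zero:

```python
def naf(n):
    """Non-adjacent form of n >= 0: signed digits in {-1, 0, 1},
    least-significant digit first, no two adjacent nonzero digits."""
    digits = []
    while n:
        if n & 1:
            d = 2 - (n & 3)      # +1 if n = 1 (mod 4), -1 if n = 3 (mod 4)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits
```

Each nonzero digit of the recoded input costs one added or subtracted term c_i in the first reduction, so halving the nonzero digits halves the work.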
4 Accuracy of MRR with Floating-point Inputs
As pointed out in Section 2.2, if the input value x is an m-mantissa-bit radix-2 floating-point number, and if the terms 2^{E−i} mod C of the first reduction and the terms kC of the second reduction are represented with q fractional bits, then the absolute error on the reduced argument is bounded by (m + 1)2^{−q−1}. This makes it possible to evaluate sines and cosines of huge floating-point numbers with good absolute accuracy. However, in floating-point, one is more concerned with the relative accuracy: what could be done if the result of MRR is zero or a number close to zero? This would indicate that the exact reduced argument is very small, but the computed value would have only a few (maybe no) significant digits. In the remainder of this paper, we deal with that problem.
4.1 How could MRR results close to zero be avoided?

As in Section 2.2, the input number x is the radix-2 floating-point number:

x = x_0 . x_1 x_2 ... x_{m−1} × 2^E = M × 2^{E−m+1}

where M is an integer. Assume that the result of the range reduction is very small, say less than a very small real number η. This means that there exists an integer k such that:

|M × 2^{E−m+1} − kC| < η    (4)

This implies:

|(M × 2^{E−m+1}) / k − C| < η / k

This means that (M × 2^{E−m+1}) / k is a very good rational approximation of C. To see to what extent relation (4) is likely to happen, we can compare this approximation to the sequence of the 'best' possible rational approximations of C, i.e. the sequence of its continued fraction approximations. Let us call (p_j / q_j) the sequence of the continued fraction approximations of C. For instance, if C = π, then:

p_0/q_0 = 3, p_1/q_1 = 22/7, p_2/q_2 = 333/106, p_3/q_3 = 355/113, p_4/q_4 = 103993/33102, ...

Let us define j as the integer such that q_j ≤ k_max < q_{j+1}, where k_max is the largest value of k that can occur for the admissible inputs. Since (from the theory of continued fractions), for each rational number p/q such that q < q_{j+1}, we have:

|p − qC| ≥ |p_j − q_j C|

the exact reduced argument cannot be smaller than a bound easily deduced from |p_j − q_j C|. So, if one wants to guarantee a given relative error for all admissible input values, the way to proceed is the following one: compute, from the continued fraction expansion of C, the smallest possible value of the exact reduced argument, and then choose the number q of fractional bits of the stored constants so that the absolute error bound (m + 1)2^{−q−1} remains small compared with that smallest value.
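The convergents of C are easy to generate from its partial quotients with the standard recurrence (an illustrative sketch; the partial quotients of π below are hard-coded from its well-known expansion [3; 7, 15, 1, 292, ...]). Note that 355/113, which made sin(355) so small in Example 4, is one of them:

```python
def convergents(a):
    """Convergents p_j/q_j from the partial quotients a = [a0, a1, ...],
    via p_j = a_j*p_(j-1) + p_(j-2) and the same recurrence for q_j."""
    p0, q0, p1, q1 = 1, 0, a[0], 1
    out = [(p1, q1)]
    for ai in a[1:]:
        p0, q0, p1, q1 = p1, q1, ai * p1 + p0, ai * q1 + q0
        out.append((p1, q1))
    return out

# first partial quotients of pi
pi_convergents = convergents([3, 7, 15, 1, 292])
```

The large partial quotient 292 after 355/113 is what makes 355/113 such an exceptionally good approximation of π, and hence 355 a worst-case input for trigonometric range reduction at this magnitude.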
5 Conclusion
We have proposed a fast and accurate algorithm to perform the range
reduction for computing the elementary functions. This algorithm can
be implemented using a slightly modified multiplier, so that range
reduction and multiplication can be performed using the same
hardware, which can save some silicon area on a circuit. Using this
algorithm, accurately and quickly computing the sine of a very large number becomes possible.

Fig. 2. A cellular array for range reduction

Fig. 3. A black cell of the cellular array

Fig. 4. A logarithmic-time range-reduction tree
References
[Asada et al. 1987]
T. Asada, N. Takagi, and S. Yajima. A hardware
algorithm for computing sine and cosine using redundant binary
representation. Systems and Computers in Japan, 18(8), 1987.
[Booth 1951]
A.D. Booth. A signed binary multiplication technique. Quarterly Journal of Mechanics and Applied Mathematics, 4(2):236-240,
1951. Reprinted in E.E. Swartzlander, Computer Arithmetic, Vol. 1,
IEEE Computer Society Press Tutorial, 1990.
[Braun 1963]
E.L. Braun. Digital Computer Design. Academic Press, New York, 1963.
[Cody and Waite 1980]
W. Cody and W. Waite. Software Manual for the Elementary
Functions. Prentice-Hall Inc, 1980.
[Dadda 1965]
L. Dadda. Some schemes for parallel multipliers. Alta Frequenza, 34:349-356, March 1965. Reprinted in E.E. Swartzlander, Computer Arithmetic, Vol. 1, IEEE Computer Society Press Tutorial, 1990.
[Daumas et al. 1994]
M. Daumas, C. Mazenc, X. Merrheim, and J.M. Muller. Fast and
Accurate Range Reduction for the Computation of Elementary
Functions. In 14th IMACS World Congress on Computational and
Applied Mathematics, Atlanta, Georgia, 1994.
[Ercegovac 1973]
M.D. Ercegovac. Radix 16 evaluation of certain elementary functions. IEEE
Transactions on Computers, C-22(6), June 1973. Reprinted in E.E.
Swartzlander, Computer Arithmetic, Vol. 1, IEEE Computer Society
Press Tutorial, 1990.
[Harata et al. 1987]
Y. Harata, Y. Nakamura, H. Nagase, M. Takigawa, and N. Takagi. A high-speed multiplier using a redundant binary adder tree. IEEE Journal of Solid-State Circuits,
SC-22(1):28-34, February 1987. Reprinted in E.E. Swartzlander,
Computer Arithmetic, Vol. 2, IEEE Computer Society Press Tutorial,
1990.
[Hwang 1979]
K. Hwang. Computer Arithmetic: Principles, Architecture, and Design. John Wiley & Sons, 1979.
[DeLugish 1970]
B. De Lugish. A Class of Algorithms for Automatic Evaluation of
Functions and Computations in a Digital Computer. PhD thesis, Dept.
of Computer Science, University of Illinois, Urbana, 1970.
[Nakamura 1986]
S. Nakamura. Algorithms for iterative array multiplication.
IEEE Transactions on Computers, C-35(8), August 1986.
[Remes 1934]
E. Remes. Sur un procédé convergent d'approximations successives pour déterminer les polynômes d'approximation. C.R. Acad. Sci. Paris, 198, 1934.
[Takagi et al. 1985]
N. Takagi, H. Yasukura, and S. Yajima.
High speed multiplication algorithm with a redundant binary
addition tree. IEEE Transactions on Computers, C-34(9), September
1985.
[Volder 1959]
J. Volder. The CORDIC computing technique. IRE
Transactions on Electronic Computers, 1959. Reprinted in E.E. Swartzlander, Computer Arithmetic, Vol. 1, IEEE Computer Society
Press Tutorial, 1990.
[Wallace 1964]
C.S. Wallace. A suggestion for a
fast multiplier. IEEE Transactions on Electronic Computers, pages
14-17, February 1964. Reprinted in E.E. Swartzlander, Computer
Arithmetic, Vol. 1, IEEE Computer Society Press Tutorial, 1990.
[Walther 1971]
J. Walther. A unified algorithm for elementary
functions. In Spring Joint Computer Conference Proceedings, 1971. Reprinted in
E.E. Swartzlander, Computer Arithmetic, Vol. 1, IEEE Computer Society Press Tutorial, 1990.