Fig 1 - uploaded by Jay P. Lim
(a) The bit-string of a 32-bit float. (b) The bit-string of a 5-bit FP representation with 2 exponent bits and 2 mantissa bits. (c) The bit-string of a 4-bit FP representation with 2 exponent bits and 1 mantissa bit.

Source publication
Preprint
Mainstream math libraries for floating point (FP) do not produce correctly rounded results for all inputs. In contrast, CR-LIBM and RLIBM provide correctly rounded implementations for a specific FP representation with one rounding mode. Using such libraries for a representation with a new rounding mode or with different precision will result in wro...

Contexts in source publication

Context 1
... FP bit-string consists of a sign bit, |E| bits to represent the exponent, and n − 1 − |E| bits to represent the mantissa (F). Figure 1 shows the bit-string for a standard 32-bit float and the custom 5-bit and 4-bit FP representations. If the sign bit is 0, then the value is positive. ...
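As a minimal sketch of the layout described in this excerpt, the following Python function decodes an n-bit pattern made of one sign bit, |E| exponent bits, and n − 1 − |E| mantissa bits. The name `decode_fp`, the IEEE-754-style bias, the subnormal handling, and the reading of an all-ones exponent as infinity/NaN are assumptions made for illustration; the excerpt itself only states the field widths and the meaning of the sign bit.

```python
def decode_fp(bits: int, n: int, e_bits: int) -> float:
    """Decode an n-bit pattern: 1 sign bit, e_bits exponent bits, n-1-e_bits mantissa bits."""
    m_bits = n - 1 - e_bits
    sign = (bits >> (n - 1)) & 1
    exp = (bits >> m_bits) & ((1 << e_bits) - 1)
    frac = bits & ((1 << m_bits) - 1)
    bias = (1 << (e_bits - 1)) - 1          # assumption: IEEE-754-style bias

    if exp == (1 << e_bits) - 1:            # assumption: all-ones exponent is inf/NaN
        value = float("inf") if frac == 0 else float("nan")
    elif exp == 0:                          # assumption: subnormals, no implicit leading 1
        value = frac / (1 << m_bits) * 2.0 ** (1 - bias)
    else:                                   # normal values, implicit leading 1
        value = (1 + frac / (1 << m_bits)) * 2.0 ** (exp - bias)
    return -value if sign else value


print(decode_fp(0b00101, n=5, e_bits=2))    # FP5 pattern 0 01 01
print(decode_fp(0b0011, n=4, e_bits=2))     # FP4 pattern 0 01 1
```

Under these assumptions the FP5 pattern 00101 decodes to 1.25 and the FP4 pattern 0011 decodes to 1.5.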
Context 2
... describe our entire approach with an end-to-end example for creating a polynomial approximation for ln(x) that produces correctly rounded results for a 5-bit FP representation with 2 exponent bits (FP5) and a 4-bit FP with 2 exponent bits (FP4) for all standard rounding modes (i.e., rn, ra, rz, ru, and rd). Figure 1(b) and Figure 1(c) show the bit-string of FP5 and FP4, respectively. Although we illustrate our approach with FP5 and FP4 for ease of exposition, it is beneficial in practice to create table-lookups for FP5 and FP4 because there are only 32 and 16 distinct bit-patterns, respectively. ...
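The table-lookup remark can be illustrated by brute force, since FP5 has only 32 bit patterns. The sketch below builds such a table for ln(x) under round-to-nearest only. It assumes an IEEE-style decoding (bias 1, all-ones exponent reserved for infinity/NaN) and uses `math.log` in double precision as a stand-in for a real oracle; `decode_fp5` and `ln_table` are hypothetical names, not taken from the paper.

```python
import math


def decode_fp5(bits: int):
    """Decode a 5-bit pattern (1 sign, 2 exponent, 2 mantissa bits); None for inf/NaN."""
    sign, exp, frac = bits >> 4, (bits >> 2) & 3, bits & 3
    if exp == 3:                      # assumption: reserved for infinity/NaN
        return None
    # assumption: bias 1, so subnormals (exp == 0) are frac/4 and normals scale by 2**(exp-1)
    mag = frac / 4 if exp == 0 else (1 + frac / 4) * 2.0 ** (exp - 1)
    return -mag if sign else mag


# All finite FP5 values, used as rounding candidates.
finite = sorted({v for v in map(decode_fp5, range(32)) if v is not None})

# Map each positive finite FP5 input pattern to the nearest finite FP5 value of ln(x).
ln_table = {}
for bits in range(32):
    x = decode_fp5(bits)
    if x is None or x <= 0:
        continue                      # ln is undefined here; special cases are not modelled
    y = math.log(x)                   # stand-in oracle, not a correctly rounded one
    ln_table[bits] = min(finite, key=lambda v: abs(v - y))

for bits, result in sorted(ln_table.items()):
    print(f"{bits:05b} -> ln({decode_fp5(bits)}) ~ {result}")
```

A table like this covers a single rounding mode; the excerpt's approach instead targets one polynomial that serves all standard rounding modes.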
Context 4
... general, we use mathematical properties of the elementary function for larger data types to effectively handle such ... that solves for the coefficients of P(x). We create a system of linear inequalities that specify the constraints of P(x) (Figure 10(a)) and use an LP solver to solve for the coefficients of P(x) that satisfies all the constraints. If the LP solver produces a solution, then the resulting polynomial P(x) created using the coefficients (Figure 10(b)) satisfies all the constraints specified by the generic intervals. ...
Context 5
... create a system of linear inequalities that specify the constraints of P(x) (Figure 10(a)) and use an LP solver to solve for the coefficients of P(x) that satisfies all the constraints. If the LP solver produces a solution, then the resulting polynomial P(x) created using the coefficients (Figure 10(b)) satisfies all the constraints specified by the generic intervals. Figure 10(c) pictorially shows that the polynomial we generate does indeed produce a value in the generic interval for each input. ...
Context 6
... the LP solver produces a solution, then the resulting polynomial P(x) created using the coefficients (Figure 10(b)) satisfies all the constraints specified by the generic intervals. Figure 10(c) pictorially shows that the polynomial we generate does indeed produce a value in the generic interval for each input. Thus, the result of the polynomial will round to the correctly rounded result of f(x) when rounded to FP5 or FP4 representation using any rounding mode. ...
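The constraint system in these excerpts is linear in the unknown coefficients: each input x with interval (l, h) contributes l < c0 + c1·x + c2·x² + ... < h. The sketch below only builds those rows and checks candidate coefficient vectors against them with exact rational arithmetic; it does not call an LP solver. The intervals are hypothetical values loosely modelled on ln(x) at four FP5 inputs, and the names `constraint_rows` and `satisfies` are illustrative.

```python
from fractions import Fraction as Fr


def constraint_rows(intervals, degree):
    """One row per input: ([x**0, ..., x**degree], lo, hi)."""
    return [([x ** j for j in range(degree + 1)], lo, hi) for x, lo, hi in intervals]


def satisfies(coeffs, rows):
    """True if lo < P(x) < hi holds strictly for every row."""
    for powers, lo, hi in rows:
        value = sum(c * p for c, p in zip(coeffs, powers))
        if not (lo < value < hi):
            return False
    return True


# Hypothetical (x, lo, hi) triples; an LP solver would search for coeffs meeting all of them.
intervals = [
    (Fr(1, 2), Fr(-3, 4), Fr(-5, 8)),
    (Fr(3, 4), Fr(-3, 8), Fr(-1, 4)),
    (Fr(5, 4), Fr(1, 8), Fr(1, 4)),
    (Fr(3, 2), Fr(3, 8), Fr(1, 2)),
]
rows = constraint_rows(intervals, degree=2)

candidate = [Fr(-8, 5), Fr(21, 10), Fr(-1, 2)]   # hand-picked c0, c1, c2
taylor = [Fr(-3, 2), Fr(2), Fr(-1, 2)]           # degree-2 Taylor expansion of ln(x) at x = 1

print(satisfies(candidate, rows))
print(satisfies(taylor, rows))
```

With these particular intervals the hand-picked candidate passes, while the Taylor polynomial lands exactly on an interval endpoint at x = 0.5 and is rejected. This shows that a good real-valued approximation does not automatically meet the interval constraints.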
Context 7
... < P(0.25) < −1.375; −0.75 < P(0.5) < −0.625; −0.375 < P(0.75) < −0.25; 0.125 < P(1.25) < 0.25; 0.375 < P(1.5) < 0. ... that solves for the coefficients of P(x). We create a system of linear inequalities that specify the constraints of P(x) (Figure 10(a)) and use an LP solver to solve for the coefficients of P(x) that satisfies all the constraints. If the LP solver produces a solution, then the resulting polynomial P(x) created using the coefficients (Figure 10(b)) satisfies all the constraints specified by the generic intervals. ...
Context 10
... example to show why the round-to-odd result avoids double rounding errors. We provide intuition on how rounding with the round-to-odd mode avoids double rounding errors in Figure 10. Any value that is representable in T_n is also representable in T_{n+2}. ...
Context 11
... the real value is exactly equal to w_0, then the round-to-odd mode with T_{n+2} also rounds to w_0 (similarly w_2 and w_4 with T_{n+2}). Figure 10 illustrates the task of rounding the real value directly to T_n with the rn mode (solid arrow) and the result produced from double rounding the ro result from T_{n+2} to T_n using the rn mode. ...
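The double-rounding argument can be checked numerically on a simplified model. In the sketch below, T_n is modelled as the integers and T_{n+2} as the grid of multiples of 1/4 (two extra bits). This is a fixed-point simplification assumed for illustration, not the paper's floating-point setting. The helper names `rne` and `round_to_odd` are likewise illustrative.

```python
from fractions import Fraction
import math


def rne(x: Fraction) -> int:
    """Round to the nearest integer, ties to even."""
    lo = math.floor(x)
    diff = x - lo
    if diff > Fraction(1, 2) or (diff == Fraction(1, 2) and lo % 2 == 1):
        return lo + 1
    return lo


def round_to_odd(x: Fraction) -> int:
    """Keep exact integers; otherwise pick the neighbouring integer that is odd."""
    lo = math.floor(x)
    return lo if x == lo else lo | 1


def via_round_to_odd(x: Fraction) -> int:
    """Round to the quarter grid with round-to-odd, then to the integer grid with rne."""
    y = Fraction(round_to_odd(4 * x), 4)
    return rne(y)


def via_nearest_twice(x: Fraction) -> int:
    """Naive double rounding: rne to the quarter grid, then rne to the integer grid."""
    y = Fraction(rne(4 * x), 4)
    return rne(y)


x = Fraction(29, 20)                      # 1.45
print(rne(x), via_nearest_twice(x), via_round_to_odd(x))

# Exhaustive check on a finer grid: the round-to-odd path always agrees with direct rounding.
assert all(via_round_to_odd(Fraction(k, 64)) == rne(Fraction(k, 64))
           for k in range(-512, 513))
```

For 1.45, rounding to nearest twice yields 2 (1.45 becomes 1.5, and the tie then goes to the even value), whereas direct rounding and the round-to-odd path both yield 1.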
Context 12
... an elementary function f(x) and a list of inputs X in the T_n representation (i.e., X ⊆ T_n), the first step is to compute the correctly rounded result in representation T_{n+2} using the round-to-odd mode (i.e., y_ro for each input x ∈ X). Figure 11 shows our algorithm to compute the round-to-odd result y_ro for each input using the real value from the oracle. ...
Context 13
... we compute the odd interval of each result y_ro such that any real value in the odd interval rounds to y_ro. Figure 11 also provides our algorithm to compute the odd interval. The odd intervals of some inputs can be a singleton (i.e., only one value in the odd interval), which we handle separately (Section 4.3). ...
Context 14
... resulting polynomial when used with range reduction (RR_H) and output compensation (OC_H) produces correctly rounded results for all inputs x ∈ X with all representations T_k for all standard rounding modes. CalcResultsInRO computes the round-to-odd result using an oracle (see Figure 11). CalcOddIntervals computes the set of odd intervals (L) and set (S) of singleton odd intervals (see Figure 11). ...
Context 15
... computes the round-to-odd result using an oracle (see Figure 11). CalcOddIntervals computes the set of odd intervals (L) and set (S) of singleton odd intervals (see Figure 11). Once we have the odd intervals and singletons, we use RLibm's polynomial generation procedure (RLibmPolyGen) to obtain the generic polynomial. ...
Context 16
... first step in our approach is to identify the correctly rounded result y_ro for input x. Figure 11 provides the steps to compute the round-to-odd result in T_{n+2} given a real value of f(x). We compute the real value y = f(x) for each input x using an oracle (e.g., MPFR library). ...
Context 17
... we determine the correctly rounded result y_ro of f(x) in representation T_{n+2} using the round-to-odd mode, the next step is to compute the interval of values in representation H, which is used for polynomial evaluation and range reduction, such that producing any value in the interval rounds to y_ro, which we call the odd interval. The function CalcOddIntervals in Figure 11 describes the steps to compute the odd interval. If the correctly rounded result y_ro in T_{n+2} is even, then the odd interval is a singleton. ...
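The split into singleton and non-singleton odd intervals can be sketched on an integer grid, where each round-to-odd result is an integer index into T_{n+2}. This is a simplified model: the paper's CalcOddIntervals works with endpoints in representation H, whereas the sketch returns the open real interval between the two even neighbours. The input labels and real values are hypothetical.

```python
from fractions import Fraction
import math


def round_to_odd(x: Fraction) -> int:
    """Keep exact grid points; otherwise pick the neighbouring grid index that is odd."""
    lo = math.floor(x)
    return lo if x == lo else lo | 1


def calc_odd_intervals(results):
    """Split (x, y_ro) pairs into open odd intervals and singletons."""
    odd_intervals, singletons = [], []
    for x, y_ro in results:
        if y_ro % 2 == 0:
            # Only an exactly representable value rounds to an even result.
            singletons.append((x, y_ro))
        else:
            # Every real strictly between the even neighbours rounds to y_ro.
            odd_intervals.append((x, y_ro - 1, y_ro + 1))
    return odd_intervals, singletons


# Hypothetical inputs paired with hypothetical real values of f(x), in grid units.
real_values = {"x0": Fraction(21, 4), "x1": Fraction(4), "x2": Fraction(-7, 3)}
results = [(x, round_to_odd(v)) for x, v in real_values.items()]
L, S = calc_odd_intervals(results)
print(L)
print(S)
```

Here `L` plays the role of the non-singleton intervals handed to the polynomial generator, and `S` holds the inputs that would need separate handling.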
Context 18
... deduce the odd interval for each input. In Figure 11, L represents the set of non-singleton odd intervals for all inputs, which is given to the polynomial generator. ...
Context 19
... sticky bit, sticky_1 is the bitwise-OR of all bits starting from the (k + 2)th bit of v_R. Figure 12 pictorially shows the rounding components v^-_1, rb_1, and sticky_1 while rounding v_R to T_k. ...
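The three rounding components named here (truncated value, rounding bit, sticky bit) can be sketched on a plain non-negative integer bit-string. The function names and the `drop` parameter (the number of low-order bits discarded, at least 1) are assumptions for illustration and do not follow the paper's bit indexing. The round-to-odd rule shown is the usual definition in terms of these components.

```python
def rounding_components(bits: int, drop: int):
    """Return (truncated, rb, sticky) when discarding the `drop` lowest bits of `bits`."""
    truncated = bits >> drop                                # value with low bits cut off
    rb = (bits >> (drop - 1)) & 1                           # first discarded bit
    sticky = int((bits & ((1 << (drop - 1)) - 1)) != 0)     # OR of the remaining discarded bits
    return truncated, rb, sticky


def round_nearest_even(bits: int, drop: int) -> int:
    t, rb, sticky = rounding_components(bits, drop)
    # Round up when above the midpoint, or exactly at it with an odd truncated value.
    return t + (rb & (sticky | (t & 1)))


def round_to_odd(bits: int, drop: int) -> int:
    t, rb, sticky = rounding_components(bits, drop)
    # Exact values are kept; inexact ones get their last bit forced to 1.
    return t | (rb | sticky)


print(rounding_components(0b101101, 3))
print(round_nearest_even(0b101100, 3), round_nearest_even(0b100100, 3))
print(round_to_odd(0b100100, 3), round_to_odd(0b100000, 3))
```

Carry out of the truncated value (and hence exponent adjustment) is not modelled, since the sketch works on bare integers.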
Context 20
... rb_2 = b_{k+1}, sticky_2 = b_{k+2} | b_{k+3} | ··· | b_{n+1} | t. Figure 12 shows these components while rounding v_rno to T_k. Now, we compare the rounding components when we directly round v_R to T_k with the rounding components when we round v_rno to T_k. ...
Context 21
... contrast to RLibm-All, glibc's libm, Intel's libm, and CR-LIBM do not produce correct results for all inputs when used for a 32-bit float type. Figure 13(d) presents the speedup of RLibm-All's FP functions over RLibm-32's functions in producing 32-bit float values rounded with the rn rounding mode. On average, RLibm-All is almost as fast as RLibm-32 (i.e., 2% slower than RLibm-32). ...
Context 22
... describe our entire approach with an end-to-end example for creating a polynomial approximation for ln(x) that produces correctly rounded results for a 5-bit FP representation with 2 exponent bits (FP5) and a 4-bit FP with 2 exponent bits (FP4) for all standard rounding modes (i.e., rne, rna, rnz, rnp, and rnn). Figure 1(b) and Figure 1(c) show the bit-string of FP5 and FP4, respectively. We illustrate our approach with FP5 and FP4 for exposition. ...
Context 24
... solves for the coefficients of P(x). We create a system of linear inequalities that specify the constraints of P(x) (Figure 10(a)) and use an LP solver to solve for the coefficients of P(x) that satisfies all the constraints. If the LP solver produces a solution, then the resulting polynomial P(x) created using the coefficients (Figure 10(b)) satisfies all the constraints specified by the generic intervals. ...
Context 30
... values in this region round to w_1. All values in this region round to w_3. Fig. 10. The rno rounding mode. We show the rounding of v_R with the rno mode. Here, w_0, w_1, w_2, w_3, and w_4 are values representable in representation T. If v_R is exactly representable in T, then v_R rounds to that value. Otherwise, v_R rounds to the nearest ...
Context 31
... v_R rounds to the nearest odd value in the target representation. Figure 10 illustrates the rno rounding mode. Using the rounding components (s, v^-, rb, sticky) from Section 2.2, the rno mode can be defined as follows: ...
Context 32
... resulting polynomial when used with range reduction (RR_H) and output compensation (OC_H) produces correctly rounded results for all inputs x ∈ X with all representations T_k for all standard rounding modes. CalcResultsInRNO computes the rno result using an oracle (see Figure 12). CalcOddIntervals computes the set of odd intervals (L) and set (S) of singleton odd intervals (see Figure 12). ...
Context 33
... computes the rno result using an oracle (see Figure 12). CalcOddIntervals computes the set of odd intervals (L) and set (S) of singleton odd intervals (see Figure 12). Once we have the odd intervals and singletons, we use RLibm's polynomial generation procedure (RLibmPolyGen) to obtain the generic polynomial. ...
Context 34
... example to show why the rno result avoids double rounding errors. We provide intuition on how rounding with the rno mode avoids double rounding errors in Figure 11. Any value that is representable in T_n is also representable in T_{n+2}. ...
Context 35
... the real value is exactly equal to w_0, then the rno mode in T_{n+2} rounds to w_0 (similarly w_2 and w_4 in T_{n+2}). Figure 11 illustrates the task of rounding the real value directly to T_n with the rne mode (solid arrow) and the result produced from double rounding the rno result from T_{n+2} to T_n using the rne mode. In summary, the rno result in T_{n+2} maintains sufficient information about the real value so that when the rno result is (double) rounded to T_k with any rounding mode, it produces the correctly rounded result for T_k. ...
Context 36
... an elementary function f(x) and a list of inputs X in the T_n representation (i.e., X ⊆ T_n), the first step is to compute the correctly rounded result in representation T_{n+2} using the rno rounding mode (i.e., y_rno for each input x ∈ X). [Figure 12, algorithm listing (fragment): Function CalcResultsInRNO(f, T_{n+2}, X): ... return O. Function CalcOddIntervals(O, T_{n+2}, H): foreach (x, y_rno) ∈ O do ...] Figure 12 shows our algorithm to compute the rno result y_rno for each input using the real value from the oracle. Next, we compute the odd interval of each result y_rno such that any real value in the odd interval rounds to y_rno. ...
Context 37
... we compute the odd interval of each result y_rno such that any real value in the odd interval rounds to y_rno. Figure 12 also provides our algorithm to compute the odd interval. The odd intervals of some inputs can be a singleton (i.e., only one value in the odd interval). ...
Context 38
... first step in our approach is to identify the correctly rounded result y_rno for input x. Figure 12 provides the steps to compute the rno result in T_{n+2} given a real value of f(x). We compute the real value y = f(x) for each input x using an oracle (e.g., MPFR library). ...
Context 39
... we determine the correctly rounded result y_rno of f(x) in representation T_{n+2} using the rno mode, the next step is to compute the interval of values in representation H, which is used for polynomial evaluation and range reduction, such that producing any value in the interval rounds to y_rno, which we call the odd interval. The function CalcOddIntervals in Figure 12 describes the steps to compute the odd interval. If the correctly rounded result y_rno in T_{n+2} is even, then the odd interval is a singleton. ...
Context 40
... deduce the odd interval for each input. In Figure 12, L represents the set of non-singleton odd intervals for all inputs, which is given to the polynomial generator. Piecewise polynomial generation using the odd intervals. ...
Context 41
... 6.12. Let v_R be a real value. Define two FP representations T_k and T_{n+2} as ... Fig. 13. Let v_rno be the rounded result of v_R in T_{n+2} using the rno rounding mode. (a) shows the bit-string representation of |v_R| in the infinite extended precision representation. B_1, rb_1, and sticky_1 show three rounding components for rounding |v_R| to T_n: the bit-string representation of the truncated value, the rounding bit, and the sticky ...
Context 42
... We then truncate B_{|v_R|} to n bits to get the bit-string representation of the truncated value v^-_1 and identify the rounding bit and the sticky bit: ... Figure 13(a) pictorially shows B_1, rb_1, and sticky_1 extracted from B_{|v_R|}. The value represented by the bit-string B_1 in T_n is the truncated value, v^-_1. ...
Context 45
... 13. (a) Rounding components while rounding v_R and v_rno to T_n. (b) Rounding components while rounding v_R and v_rno to T_k. We show the bit-string of v_R in extended infinite precision representation. Note v_rno is a value in T_{n+2}. precision ...

Similar publications

Preprint
This paper presents a novel method for generating a single polynomial approximation that produces correctly rounded results for all inputs of an elementary function for multiple representations. The generated polynomial approximation has the nice property that the first few lower degree terms produce correctly rounded results for specific represent...