(a) The bit-string of a 32-bit float. (b) The bit-string of a 5-bit FP representation with 2 exponent bits and 2 mantissa bits. (c) The bit-string of a 4-bit FP representation with 2 exponent bits and 1 mantissa bit.

Source publication

RLIBM-ALL: A Novel Polynomial Approximation Method to Produce Correctly Rounded Results for Multiple Representations and Rounding Modes

Preprint

Aug 2021

Mainstream math libraries for floating point (FP) do not produce correctly rounded results for all inputs. In contrast, CR-LIBM and RLIBM provide correctly rounded implementations for a specific FP representation with one rounding mode. Using such libraries for a representation with a new rounding mode or with different precision will result in wro...

Context 1

... FP bit-string consists of a sign bit, |E| bits to represent the exponent, and n − 1 − |E| bits to represent the mantissa (F). Figure 1 shows the bit-string for a standard 32-bit float and the custom 5-bit and 4-bit FP representations. If the sign bit is 0, then the value is positive. ...
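
The layout described in Context 1 (sign bit, |E| exponent bits, n − 1 − |E| mantissa bits) can be sketched as a small decoder. This is only an illustration: it assumes IEEE-754-style conventions (bias of 2^(|E|−1) − 1, subnormals for a zero exponent field, an all-ones exponent reserved for infinity and NaN), which the excerpt does not spell out, and the name `decode_fp` is hypothetical.

```python
from fractions import Fraction

def decode_fp(bits: int, n: int, e_bits: int):
    """Decode an n-bit pattern: 1 sign bit, e_bits exponent bits, n-1-e_bits mantissa bits.

    Assumption: IEEE-754-style bias, subnormals, and inf/NaN encoding.
    The sign of zero is not preserved by this sketch.
    """
    m_bits = n - 1 - e_bits
    sign = -1 if (bits >> (n - 1)) & 1 else 1
    exp = (bits >> m_bits) & ((1 << e_bits) - 1)
    frac = bits & ((1 << m_bits) - 1)
    bias = (1 << (e_bits - 1)) - 1
    if exp == (1 << e_bits) - 1:      # all-ones exponent: infinity or NaN
        return sign * float("inf") if frac == 0 else float("nan")
    if exp == 0:                      # subnormal: no implicit leading 1
        return sign * Fraction(frac, 1 << m_bits) * Fraction(2) ** (1 - bias)
    return sign * (1 + Fraction(frac, 1 << m_bits)) * Fraction(2) ** (exp - bias)

# Enumerate every pattern of the 5-bit format (2 exponent bits, 2 mantissa bits).
fp5_values = [decode_fp(b, 5, 2) for b in range(32)]
```

Because the 5-bit and 4-bit formats have only 32 and 16 bit patterns, an exhaustive enumeration like `fp5_values` is cheap.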

Context 2

... describe our entire approach with an end-to-end example for creating a polynomial approximation for ln(x) that produces correctly rounded results for a 5-bit FP representation with 2 exponent bits (FP5) and a 4-bit FP with 2 exponent bits (FP4) for all standard rounding modes (i.e., rn, ra, rz, ru, and rd). Figure 1(b) and Figure 1(c) show the bit-string of FP5 and FP4, respectively. Although we illustrate our approach with FP5 and FP4 for ease of exposition, it is beneficial in practice to create table-lookups for FP5 and FP4 because there are only 32 and 16 distinct bit-patterns, respectively. ...

Context 4

... general, we use mathematical properties of the elementary function for larger data types to effectively handle such ... that solves for the coefficients of P(x). We create a system of linear inequalities that specifies the constraints of P(x) (Figure 10(a)) and use an LP solver to solve for the coefficients of P(x) that satisfy all the constraints. If the LP solver produces a solution, then the resulting polynomial P(x) created using the coefficients (Figure 10(b)) satisfies all the constraints specified by the generic intervals. ...

Context 5

... create a system of linear inequalities that specifies the constraints of P(x) (Figure 10(a)) and use an LP solver to solve for the coefficients of P(x) that satisfy all the constraints. If the LP solver produces a solution, then the resulting polynomial P(x) created using the coefficients (Figure 10(b)) satisfies all the constraints specified by the generic intervals. Figure 10(c) pictorially shows that the polynomial we generate does indeed produce a value in the generic interval for each input. ...

Context 6

... the LP solver produces a solution, then the resulting polynomial P(x) created using the coefficients (Figure 10(b)) satisfies all the constraints specified by the generic intervals. Figure 10(c) pictorially shows that the polynomial we generate does indeed produce a value in the generic interval for each input. Thus, the result of the polynomial will round to the correctly rounded result of f(x) when rounded to FP5 or FP4 representation using any rounding mode. ...
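
Each interval constraint lo ≤ P(x) ≤ hi is linear in the coefficients of P, which is why an LP solver applies. The sketch below does not solve the LP; it only builds the coefficient row for one input and checks a candidate coefficient vector against a list of constraints. The helper names and the two sample constraints are hypothetical and are not taken from Figure 10.

```python
from fractions import Fraction

def constraint_row(x, degree):
    """Row [1, x, x^2, ...]: P(x) is the dot product of this row with the coefficients."""
    return [x ** k for k in range(degree + 1)]

def satisfies(coeffs, constraints):
    """True if lo <= P(x) <= hi holds for every (x, lo, hi) constraint."""
    degree = len(coeffs) - 1
    for x, lo, hi in constraints:
        value = sum(c * r for c, r in zip(coeffs, constraint_row(x, degree)))
        if not (lo <= value <= hi):
            return False
    return True

# Hypothetical intervals for two inputs (illustrative values only).
constraints = [
    (Fraction(1), Fraction(-1, 8), Fraction(1, 8)),
    (Fraction(2), Fraction(5, 8), Fraction(3, 4)),
]
candidate = [Fraction(-7, 10), Fraction(7, 10)]   # P(x) = -0.7 + 0.7x
ok = satisfies(candidate, constraints)
```

Exact rationals are used so that the check is not affected by floating-point evaluation error in the checker itself.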

Context 7

... < P(0.25) < -1.375, -0.75 < P(0.5) < -0.625, -0.375 < P(0.75) < -0.25, 0.125 < P(1.25) < 0.25, 0.375 < P(1.5) < 0. ... that solves for the coefficients of P(x). We create a system of linear inequalities that specifies the constraints of P(x) (Figure 10(a)) and use an LP solver to solve for the coefficients of P(x) that satisfy all the constraints. If the LP solver produces a solution, then the resulting polynomial P(x) created using the coefficients (Figure 10(b)) satisfies all the constraints specified by the generic intervals. ...

Context 10

... example to show why the round-to-odd result avoids double rounding errors. We provide intuition on how rounding with the round-to-odd mode avoids double rounding errors in Figure 10. Any value that is representable in T_n is also representable in T_{n+2}. ...

Context 11

... the real value is exactly equal to w_0, then the round-to-odd mode with T_{n+2} also rounds to w_0 (similarly w_2 and w_4 with T_{n+2}). Figure 10 illustrates the task of rounding the real value directly to T_n with the rn mode (solid arrow) and the result produced from double rounding the ro result from T_{n+2} to T_n using the rn mode. ...
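
The double-rounding argument can be tried out on a simplified model. The sketch below is an assumption-laden illustration, not the paper's algorithm: it replaces the FP representations by fixed-point grids (multiples of 2^-bits), which behaves like a single binade of an FP format, and the function names are hypothetical. It compares direct round-to-nearest-even against double rounding through a grid with two extra bits.

```python
from fractions import Fraction
import math

def round_to_odd(v: Fraction, frac_bits: int) -> Fraction:
    """Round v to a multiple of 2**-frac_bits; inexact values go to the odd neighbour."""
    scaled = v * (1 << frac_bits)
    lo = math.floor(scaled)
    if scaled == lo:                      # exactly representable: unchanged
        return Fraction(lo, 1 << frac_bits)
    odd = lo if lo % 2 == 1 else lo + 1   # exactly one of the two neighbours is odd
    return Fraction(odd, 1 << frac_bits)

def round_nearest_even(v: Fraction, frac_bits: int) -> Fraction:
    """Round v to the nearest multiple of 2**-frac_bits, ties to the even multiple."""
    scaled = v * (1 << frac_bits)
    lo = math.floor(scaled)
    diff = scaled - lo
    if diff > Fraction(1, 2) or (diff == Fraction(1, 2) and lo % 2 == 1):
        lo += 1
    return Fraction(lo, 1 << frac_bits)

v = Fraction(313, 500)                                            # 0.626, just above the tie 5/8
direct = round_nearest_even(v, 2)                                 # 3/4
naive_double = round_nearest_even(round_nearest_even(v, 4), 2)    # 1/2 (double rounding error)
via_odd = round_nearest_even(round_to_odd(v, 4), 2)               # 3/4
```

In this model the intermediate round-to-nearest step collapses 0.626 onto the tie point 5/8 and loses the information that the value was above it, while round-to-odd keeps that information in the last bit.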

Context 12

... an elementary function f(x) and a list of inputs X in the T_n representation (i.e., X ⊆ T_n), the first step is to compute the correctly rounded result in representation T_{n+2} using the round-to-odd mode (i.e., y_ro for each input x ∈ X). Figure 11 shows our algorithm to compute the round-to-odd result y_ro for each input using the real value from the oracle. ...

Context 13

... we compute the odd interval of each result y_ro such that any real value in the odd interval rounds to y_ro. Figure 11 also provides our algorithm to compute the odd interval. The odd intervals of some inputs can be a singleton (i.e., only one value in the odd interval), which we handle separately (Section 4.3). ...

Context 14

... resulting polynomial when used with range reduction (RR_H) and output compensation (OC_H) produces correctly rounded results for all inputs x ∈ X with all representations T_k for all standard rounding modes. CalcResultsInRO computes the round-to-odd result using an oracle (see Figure 11). CalcOddIntervals computes the set of odd intervals (L) and the set of singleton odd intervals (see Figure 11). ...

Context 15

... computes the round-to-odd result using an oracle (see Figure 11). CalcOddIntervals computes the set of odd intervals (L) and the set of singleton odd intervals (see Figure 11). Once we have the odd intervals and singletons, we use RLibm's polynomial generation procedure (RLibmPolyGen) to obtain the generic polynomial. ...

Context 16

... first step in our approach is to identify the correctly rounded result y_ro for input x. Figure 11 provides the steps to compute the round-to-odd result in T_{n+2} given a real value of f(x). We compute the real value y = f(x) for each input x using an oracle (e.g., MPFR library). ...

Context 17

... we determine the correctly rounded result y_ro of f(x) in representation T_{n+2} using the round-to-odd mode, the next step is to compute the interval of values in representation H, which is used for polynomial evaluation and range reduction, such that producing any value in the interval rounds to y_ro, which we call the odd interval. The function CalcOddIntervals in Figure 11 describes the steps to compute the odd interval. If the correctly rounded result y_ro in T_{n+2} is even, then the odd interval is a singleton. ...
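
The singleton case can be made concrete with a rough sketch. It is not the CalcOddIntervals function from the paper: it assumes the T_{n+2} values are modelled as integer multiples k of 2^-frac_bits, that H is Python's double, and that the odd interval of an odd k is every double strictly between the two neighbouring even multiples. The name `odd_interval` is hypothetical, and `math.nextafter` requires Python 3.9 or later.

```python
import math

def odd_interval(k: int, frac_bits: int):
    """Closed interval [lo, hi] of doubles that round-to-odd to k * 2**-frac_bits.

    Assumption: fixed-point stand-in for T_{n+2}; doubles stand in for H.
    """
    step = 2.0 ** -frac_bits
    y = k * step
    if k % 2 == 0:
        # An even result is reached only when the real value equals it exactly.
        return (y, y)
    # An odd result absorbs everything strictly between its even neighbours.
    lo = math.nextafter((k - 1) * step, math.inf)
    hi = math.nextafter((k + 1) * step, -math.inf)
    return (lo, hi)

singleton = odd_interval(10, 4)   # even multiple 10/16: a single point
wide = odd_interval(11, 4)        # odd multiple 11/16: open gap between 10/16 and 12/16
```

Singleton intervals leave the polynomial no freedom at that input, which is consistent with the excerpts treating them separately from the intervals passed to the polynomial generator.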

Context 18

... deduce the odd interval for each input. In Figure 11, L represents the set of non-singleton odd intervals for all inputs, which is given to the polynomial generator. ...

Context 19

... sticky bit, sticky_1 is the bitwise-OR of all bits starting from the (n + 2)th bit of v_R. Figure 12 pictorially shows the rounding components v^-_1, rb_1, and sticky_1 while rounding v_R to T_n. ...

Context 20

... rb_2 = b_{n+1}, sticky_2 = b_{n+2} | b_{n+3} | ··· . Figure 12 shows these components while rounding v_rno to T_n. Now, we compare the rounding components when we directly round v_R to T_n with the rounding components when we round v_rno to T_n. ...
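
The comparison of rounding components can be illustrated on a simplified model. The sketch below is an assumption, not the paper's proof: it uses non-negative values on fixed-point grids instead of FP bit-strings, and the function names are hypothetical. It extracts the truncated value, rounding bit, and sticky bit, and checks that they are the same whether computed from the real value or from its round-to-odd result with two extra bits.

```python
from fractions import Fraction
import math

def rounding_components(v: Fraction, frac_bits: int):
    """Split non-negative v into (truncated value, rounding bit, sticky bit)."""
    scaled = v * (1 << (frac_bits + 1))   # keep one extra bit: the rounding bit
    kept = math.floor(scaled)
    truncated = Fraction(kept >> 1, 1 << frac_bits)
    rounding_bit = kept & 1
    sticky = 1 if scaled != kept else 0   # OR of every bit below the rounding bit
    return truncated, rounding_bit, sticky

def round_to_odd(v: Fraction, frac_bits: int) -> Fraction:
    """Round v to a multiple of 2**-frac_bits; inexact values go to the odd neighbour."""
    scaled = v * (1 << frac_bits)
    lo = math.floor(scaled)
    if scaled == lo:
        return Fraction(lo, 1 << frac_bits)
    return Fraction(lo if lo % 2 == 1 else lo + 1, 1 << frac_bits)

v_real = Fraction(313, 500)                          # 0.626
from_real = rounding_components(v_real, 2)
from_odd = rounding_components(round_to_odd(v_real, 4), 2)
same = from_real == from_odd
```

Since every standard rounding mode decides from these three components (plus the sign), matching components imply matching rounded results in this model.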

Context 21

... contrast to RLibm-All, glibc's libm, Intel's libm, and CR-LIBM do not produce correct results for all inputs when used for a 32-bit float type. Figure 13(d) presents the speedup of RLibm-All's FP functions over RLibm-32's functions in producing 32-bit float values rounded with the rn rounding mode. On average, RLibm-All is almost as fast as RLibm-32 (i.e., 2% slower than RLibm-32). ...

Context 22

... describe our entire approach with an end-to-end example for creating a polynomial approximation for ln(x) that produces correctly rounded results for a 5-bit FP representation with 2 exponent bits (FP5) and a 4-bit FP with 2 exponent bits (FP4) for all standard rounding modes (i.e., rne, rna, rnz, rnp, and rnn). Figure 1(b) and Figure 1(c) show the bit-string of FP5 and FP4, respectively. We illustrate our approach with FP5 and FP4 for exposition. ...

Context 30

... [Figure labels: All values in this region round to w_1. All values in this region round to w_3.] Fig. 10. The rno rounding mode. We show the rounding of v_R with the rno mode. Here, w_0, w_1, w_2, w_3, and w_4 are values representable in representation T. If v_R is exactly representable in T, then v_R rounds to that value. Otherwise, v_R rounds to the nearest ...

Context 31

... v_R rounds to the nearest odd value in the target representation. Figure 10 illustrates the rno rounding mode. Using the rounding components (s, v^-, rb, sticky) from Section 2.2, the rno mode can be defined as follows: ...

Context 32

... resulting polynomial when used with range reduction (RR_H) and output compensation (OC_H) produces correctly rounded results for all inputs x ∈ X with all representations T_k for all standard rounding modes. CalcResultsInRNO computes the rno result using an oracle (see Figure 12). CalcOddIntervals computes the set of odd intervals (L) and the set of singleton odd intervals (see Figure 12). ...

Context 33

... computes the rno result using an oracle (see Figure 12). CalcOddIntervals computes the set of odd intervals (L) and the set of singleton odd intervals (see Figure 12). Once we have the odd intervals and singletons, we use RLibm's polynomial generation procedure (RLibmPolyGen) to obtain the generic polynomial. ...

Context 34

... example to show why the rno result avoids double rounding errors. We provide intuition on how rounding with the rno mode avoids double rounding errors in Figure 11. Any value that is representable in T_n is also representable in T_{n+2}. ...

Context 35

... the real value is exactly equal to w_0, then the rno mode in T_{n+2} rounds to w_0 (similarly w_2 and w_4 in T_{n+2}). Figure 11 illustrates the task of rounding the real value directly to T_n with the rne mode (solid arrow) and the result produced from double rounding the rno result from T_{n+2} to T_n using the rne mode. In summary, the rno result in T_{n+2} maintains sufficient information about the real value so that when the rno result is (double) rounded to T_n with any rounding mode, it produces the correctly rounded result for T_n. ...

Context 36

... an elementary function f(x) and a list of inputs X in the T_n representation (i.e., X ⊆ T_n), the first step is to compute the correctly rounded result in representation T_{n+2} using the rno rounding mode (i.e., y_rno for each input x ∈ X). [Figure 12 listing: functions CalcResultsInRNO and CalcOddIntervals.] Figure 12 shows our algorithm to compute the rno result y_rno for each input using the real value from the oracle. Next, we compute the odd interval of each result y_rno such that any real value in the odd interval rounds to y_rno. ...

Context 37

... we compute the odd interval of each result y_rno such that any real value in the odd interval rounds to y_rno. Figure 12 also provides our algorithm to compute the odd interval. The odd intervals of some inputs can be a singleton (i.e., only one value in the odd interval). ...

Context 38

... first step in our approach is to identify the correctly rounded result y_rno for input x. Figure 12 provides the steps to compute the rno result in T_{n+2} given a real value of f(x). We compute the real value y = f(x) for each input x using an oracle (e.g., MPFR library). ...

Context 39

... we determine the correctly rounded result y_rno of f(x) in representation T_{n+2} using the rno mode, the next step is to compute the interval of values in representation H, which is used for polynomial evaluation and range reduction, such that producing any value in the interval rounds to y_rno, which we call the odd interval. The function CalcOddIntervals in Figure 12 describes the steps to compute the odd interval. If the correctly rounded result y_rno in T_{n+2} is even, then the odd interval is a singleton. ...

Context 40

... deduce the odd interval for each input. In Figure 12, L represents the set of non-singleton odd intervals for all inputs, which is given to the polynomial generator. Piecewise polynomial generation using the odd intervals. ...

Context 41

... 6.12. Let v_R be a real value. Define two FP representations T_k and T_{n+2} as ... Fig. 13. Let v_rno be the rounded result of v_R in T_{n+2} using the rno rounding mode. (a) shows the bit-string representation of |v_R| in the infinite extended precision representation. B_1, rb_1, and sticky_1 show the three rounding components for rounding |v_R| to T_n: the bit-string representation of the truncated value, the rounding bit, and the sticky ...

Context 42

... We then truncate B_{|v_R|} to n bits to get the bit-string representation of the truncated value v^-_1 and identify the rounding bit and the sticky bit: Figure 13(a) pictorially shows B_1, rb_1, and sticky_1 extracted from B_{|v_R|}. The value represented by the bit-string B_1 in T_n is the truncated value, v^-_1. ...

Context 45

... 13. (a) Rounding components while rounding v_R and v_rno to T_n. (b) Rounding components while rounding v_R and v_rno to T_n. We show the bit-string of v_R in extended infinite precision representation. Note v_rno is a value in T_{n+2}. precision ...

RLibm-Prog: Progressive Polynomial Approximations for Fast Correctly Rounded Math Libraries

Preprint

Nov 2021

This paper presents a novel method for generating a single polynomial approximation that produces correctly rounded results for all inputs of an elementary function for multiple representations. The generated polynomial approximation has the nice property that the first few lower degree terms produce correctly rounded results for specific represent...