Figure - uploaded by Mauro Olivieri
Resources @ 40MHz clock with DSP disabled

Source publication
Chapter
This paper presents the design and implementation of a fully combinatorial floating point unit (FPU). The FPU can be reconfigured at implementation time to use an arbitrary number of bits for the mantissa and exponent, and it can be synthesized to support not only all IEEE-754 compliant FP formats but also non-standard FP formats, expl...
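To make the idea of a format with an arbitrary number of exponent and mantissa bits concrete, here is a minimal Python sketch. It is not the paper's FPU: the `FPFormat` class and its property names are hypothetical, and it assumes IEEE-754-style conventions (one sign bit, a hidden leading one, bias 2^(E-1) - 1, top exponent code reserved for infinities and NaNs), which the non-standard formats in the paper may or may not follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FPFormat:
    """Hypothetical descriptor of a binary FP format (IEEE-754-style assumptions)."""
    exp_bits: int  # width of the exponent field
    man_bits: int  # width of the stored mantissa (fraction) field

    @property
    def width(self) -> int:
        # One sign bit plus the exponent and mantissa fields.
        return 1 + self.exp_bits + self.man_bits

    @property
    def bias(self) -> int:
        # IEEE-754-style exponent bias.
        return (1 << (self.exp_bits - 1)) - 1

    @property
    def max_finite(self) -> float:
        # Largest normal value: all-ones mantissa, largest non-reserved exponent.
        return (2.0 - 2.0 ** -self.man_bits) * 2.0 ** self.bias

    @property
    def min_normal(self) -> float:
        # Smallest positive normal value.
        return 2.0 ** (1 - self.bias)

    @property
    def epsilon(self) -> float:
        # Gap between 1.0 and the next representable value.
        return 2.0 ** -self.man_bits


half = FPFormat(exp_bits=5, man_bits=10)     # IEEE-754 binary16
single = FPFormat(exp_bits=8, man_bits=23)   # IEEE-754 binary32
custom = FPFormat(exp_bits=6, man_bits=9)    # an arbitrary 16-bit split, for illustration only
```

Under these assumptions, moving one bit from the mantissa to the exponent (as in `custom` versus `half`) keeps the total width at 16 bits while trading precision for range, which is the kind of trade-off an implementation-time configurable unit exposes.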

Context in source publication

Context 1
... approach was required in order to compare the area occupied by different FP formats, which otherwise would have been implemented using DSP blocks with fixed FP formats. Table 2 shows the results gathered from the Utilization report. This report was produced after the implementation of the FPU core at a 40 MHz clock speed (25 ns is the minimum period obtained for the single precision FP format), and it collects data on the number of LUTs and slices used in the unit. ...
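The relation between the 40 MHz clock and the 25 ns minimum period quoted in the excerpt is a plain reciprocal. As a small check, the following sketch converts between the two; the function names are illustrative and do not come from the publication.

```python
def clock_period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    # 1 MHz corresponds to a 1000 ns period, so the period is 1000 / f.
    return 1000.0 / freq_mhz


def max_frequency_mhz(min_period_ns: float) -> float:
    """Highest clock frequency in MHz that a given minimum period allows."""
    return 1000.0 / min_period_ns


period = clock_period_ns(40.0)      # 25.0 ns
f_max = max_frequency_mhz(25.0)     # 40.0 MHz
```

For a fully combinatorial unit, the minimum period is set by the longest path through the logic, so a 25 ns critical path for single precision caps the clock at 40 MHz.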

Citations

Article
Time series analysis (TSA) comprises methods for extracting information in domains as diverse as medicine, seismology, speech recognition and economics. Matrix Profile (MP) is the state-of-the-art TSA technique, which provides the most similar neighbor to each subsequence of the time series. However, this computation requires a huge amount of floating-point (FP) operations, which are a major contributor (≈ 50%) to the energy consumption in modern computing platforms. In this sense, Transprecision Computing has recently emerged as a promising approach to improve energy efficiency and performance by using fewer bits in FP operations while providing accurate results. In this work, we present TraTSA, the first transprecision framework for efficient time series analysis based on MP. TraTSA allows the user to deploy a high-performance and energy-efficient computing solution with the exact precision required by the TSA application. To this end, we first propose implementations of TraTSA for both commodity CPU and FPGA platforms. Second, we propose an accuracy metric to compare the results with the double-precision MP. Third, we study MP’s accuracy when using a transprecision approach. Finally, our evaluation shows that, while obtaining results accurate enough, the FPGA transprecision MP (i) is 22.75× faster than a 72-core server, and (ii) the energy consumption is up to 3.3× lower than the double-precision executions.
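As a rough illustration of what a reduced-precision Matrix Profile computation involves, here is a brute-force Python sketch. It is not TraTSA and does not reproduce its algorithm or its accuracy metric: the function names, the exclusion zone of `m // 2`, and the choice to emulate narrower formats by rounding through `struct` are all assumptions made for this example. The `struct` format codes `'e'`, `'f'` and `'d'` stand for IEEE-754 binary16, binary32 and binary64.

```python
import math
import struct


def quantize(x: float, fmt: str) -> float:
    """Round x to the nearest value of the format 'e', 'f' or 'd'.

    struct raises OverflowError if x is out of range for 'e' or 'f'.
    """
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def znorm(window, fmt):
    """Z-normalize one subsequence, rounding intermediate results to fmt."""
    n = len(window)
    mean = quantize(sum(window) / n, fmt)
    var = quantize(sum((v - mean) ** 2 for v in window) / n, fmt)
    std = math.sqrt(var) or 1.0  # avoid dividing by zero on constant windows
    return [quantize((v - mean) / std, fmt) for v in window]


def matrix_profile(series, m, fmt="d"):
    """Brute-force matrix profile of `series` for subsequence length m.

    Returns (profile, index): for each subsequence, the z-normalized
    Euclidean distance to its nearest neighbour and that neighbour's position.
    """
    subs = [znorm(series[i:i + m], fmt) for i in range(len(series) - m + 1)]
    excl = m // 2  # assumed exclusion zone that skips trivial self-matches
    profile, index = [], []
    for i, a in enumerate(subs):
        best, best_j = math.inf, -1
        for j, b in enumerate(subs):
            if abs(i - j) <= excl:
                continue
            acc = 0.0
            for x, y in zip(a, b):
                # Round the running sum after every accumulation step.
                acc = quantize(acc + (x - y) ** 2, fmt)
            d = math.sqrt(acc)
            if d < best:
                best, best_j = d, j
        profile.append(best)
        index.append(best_j)
    return profile, index


series = [0.0, 1.0, 2.0, 1.0] * 3  # a pattern that repeats every 4 samples
mp_double, idx_double = matrix_profile(series, m=4, fmt="d")
mp_half, idx_half = matrix_profile(series, m=4, fmt="e")
```

This quadratic version is only meant to show where precision enters the computation (normalization and distance accumulation). Comparing `idx_half` against `idx_double` on real data is one simple way to see how much a narrower format changes the reported neighbours.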