Figure 6 - uploaded by Hannu Tenhunen
Relative simulation times for reception of 150 000 bits for floating point (FoP), overloaded FxP (FxP1), dedicated FxP m-code (FxP2m) and compiled dedicated m-code (FxP2mex-slowest!) implementations.

Source publication
Article
An object-oriented fixed-point library for Matlab has been developed. We present a design flow for DSP ASIC applications in which this library is used for floating-point to fixed-point refinement. Matlab was chosen for its strength and popularity in system design and modeling. The library allows a system designer to model e.g. a receiver architecture with...

Context in source publication

Context 1
... simulation execution times, the MR is rated better than the SR strategy since 1) the code/block diagrams will be more application specific, and 2) the code can usually be sped up with special compilation. However, in an experiment where we implemented the AUT (figure 2) as a dedicated FxP function (FxP2m) in Matlab, the simulation times decreased by only 20% compared to the overloaded diffRake (FxP1), see figure 6. Also interesting is that the compiled version (FxP2mex) resulted in the slowest simulation. ...
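The overloaded approach (FxP1) redefines the arithmetic operators so that every intermediate result is re-quantized to the chosen word length, which is why a dedicated FxP rewrite gains relatively little. A minimal Python sketch of this operator-overloading idea; the class name, default word-length split, and rounding/saturation policy are illustrative assumptions, not the actual Matlab library:

```python
# Illustrative sketch, NOT the actual Matlab FxP library: a fixed-point
# value class that re-quantizes after every overloaded arithmetic operation.
class Fxp:
    def __init__(self, value, int_bits=4, frac_bits=11):
        self.int_bits = int_bits    # integer bits (sign excluded)
        self.frac_bits = frac_bits  # fractional bits
        self.value = self._quantize(float(value))

    def _quantize(self, x):
        # Round to the nearest representable step, then saturate.
        step = 2.0 ** -self.frac_bits
        q = round(x / step) * step
        hi = 2.0 ** self.int_bits - step
        lo = -2.0 ** self.int_bits
        return min(hi, max(lo, q))

    def __add__(self, other):
        # Every single operation pays the quantization overhead; this
        # per-op cost is what keeps overloaded simulation slow.
        return Fxp(self.value + float(other), self.int_bits, self.frac_bits)

    def __mul__(self, other):
        return Fxp(self.value * float(other), self.int_bits, self.frac_bits)

    def __float__(self):
        return self.value
```

With such a class, a floating-point model can be refined by changing only how variables are constructed, at the price of the per-operation quantization overhead during simulation.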

Citations

... The type refinement technique that we propose is similar to [10], which requires two execution passes. In the first pass, the data types of all variables in the specification are recorded. ...
Article
We present a simulation-based technique to estimate the area and latency of an FPGA implementation of a Matlab specification. During simulation of the Matlab model, a trace is generated that can be used for multiple estimations. For estimation, the user provides design constraints such as the rate and bit width of data streams. In our experience, the runtime of the estimator is only about 1/10 of the simulation time, which is typically fast enough to generate dozens of estimates within a few hours and to build cost-performance trade-off curves for a particular algorithm and input data. In addition, the estimator reports on the scheduling and resource binding used for estimation. This information can be used not only to assess the estimation quality but also as a first starting point for the final implementation.
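The first pass of the two-pass type-refinement flow mentioned in the quoted snippet can be sketched as follows. The names `TypeRecorder` and `record` are hypothetical, but the mechanism matches the description: run the floating-point model once while logging each variable's observed range, then derive bit widths from that log in the second pass.

```python
import math

# Hypothetical sketch of the first pass: record the observed value range
# of each variable while the floating-point specification executes.
class TypeRecorder:
    def __init__(self):
        self.ranges = {}  # variable name -> (min seen, max seen)

    def record(self, name, value):
        lo, hi = self.ranges.get(name, (value, value))
        self.ranges[name] = (min(lo, value), max(hi, value))
        return value  # transparent: the model's dataflow is unchanged

    def int_bits(self, name):
        # Second pass: integer bits (sign included) roughly covering the
        # recorded range; a real tool would also handle exact powers of
        # two and other rounding corner cases.
        lo, hi = self.ranges[name]
        mag = max(abs(lo), abs(hi))
        return (1 + max(0, math.ceil(math.log2(mag)))) if mag >= 1 else 1

rec = TypeRecorder()
for sample in [0.5, -3.2, 7.9]:
    acc = rec.record("acc", 2.0 * sample)  # instrumented assignment
```

Because `record` returns its input unchanged, the instrumentation can wrap existing assignments without altering the simulation's numerical results.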
Article
The key to enabling widespread use of FPGAs for algorithm acceleration is to allow programmers to create efficient designs without the time-consuming hardware design process. Programmers are used to developing scientific and mathematical algorithms in high-level languages (C/C++) using floating-point data types. Although easy to implement, the dynamic range provided by floating point is not necessary in many applications; more efficient implementations can be realized using fixed-point arithmetic. While this topic has been studied previously [Han et al. 2006; Olson et al. 1999; Gaffar et al. 2004; Aamodt and Chow 1999], full automation has always been lacking. We present a novel design flow for cases where FPGAs are used to offload computations from a microprocessor. Our LLVM-based algorithm inserts value-profiling code into an unmodified C/C++ application to guide its automatic conversion to fixed point. This allows for fast and accurate design space exploration on a host microprocessor before any accelerators are mapped to the FPGA. Through experimental results, we demonstrate that fixed-point conversion can yield resource savings of up to 2x--3x. Embedded RAM usage is minimized, and 13%--22% higher Fmax than the original floating-point implementation is observed. In a case study, we show that a 17% reduction in logic and a 24% reduction in register usage can be realized by using our algorithm in conjunction with a High-Level Synthesis (HLS) tool.
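The value-profiling step described in this abstract can be illustrated with a small sketch. The function name `q_format` and the 16-bit word length are assumptions for illustration, not the cited tool's API: profile the magnitudes a variable actually takes, then split a fixed word into integer and fraction bits so the observed range fits and the remaining bits go to precision.

```python
import math

# Illustrative only: derive a fixed-point format from profiled values,
# in the spirit of the value-profiling flow described above.
def q_format(samples, total_bits=16):
    """Split total_bits into (integer_bits, fraction_bits), sign included."""
    mag = max(abs(s) for s in samples)
    # Integer bits (sign included) that roughly cover the largest observed
    # magnitude; a production tool would treat corner cases more carefully.
    int_bits = (1 + max(0, math.ceil(math.log2(mag)))) if mag >= 1 else 1
    return int_bits, total_bits - int_bits

# Values observed while profiling the floating-point run:
ib, fb = q_format([0.03, -1.7, 2.9])
```

Values that never exceed a small magnitude thus receive few integer bits and many fraction bits, which is exactly the resource-saving trade-off the abstract reports.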