Article

AltiVec Technology: A Second Generation SIMD Microprocessor Architecture

... While SIMD extensions have been in mainframes since the early 80s, they moved to desktop processors in the late 90s with the goal of improving the performance of data-parallel applications. Popular examples of these extensions are Intel® SSE [24], AltiVec™ technology [25], and 3DNow!™ [26] from AMD. ...
Conference Paper
Full-text available
The Intel® Xeon Phi™ coprocessor has software prefetching instructions to hide memory latencies and special store instructions to save bandwidth on streaming non-temporal store operations. In this work, we provide details on compiler-based generation of these instructions and evaluate their impact on the performance of the Intel® Xeon Phi™ coprocessor using a wide range of parallel applications with different characteristics. Our results show that the Intel® Composer XE 2013 compiler can make effective use of these mechanisms to achieve significant performance improvements.
... Media data delivery and processing – such as telecommunications, networking, video processing, speech recognition and 3D graphics – is increasing in importance and will soon dominate the processing cycles consumed in computer-based systems [1]. SIMD extensions to existing processor architectures [2] [3] that support DSP-type operations are essentially narrow vector designs without support for vector memory operations. They have limited scalability because each instruction specifies a fixed number of operations. ...
Article
Full-text available
Over the past few years, technology drivers for processor designs have changed significantly. Media data delivery and processing -- such as telecommunications, networking, video processing, speech recognition and 3D graphics -- is increasing in importance and will soon dominate the processing cycles consumed in computer-based systems. This paper describes a processor, called Linedancer, that provides high media performance with low energy consumption by integrating associative SIMD parallel processing with embedded microprocessor technology. The major innovation in Linedancer is the integration on a single chip of thousands of processing units that are capable of supporting software-programmable high-performance mathematical functions as well as abstract data processing. In addition to 4096 processing units, Linedancer integrates on a single chip a RISC controller that is an implementation of the SPARC architecture, 128 Kbytes of Data Memory, and I/O interfaces. The SIMD processing in Linedancer implements the ASProCore architecture, a proprietary implementation of SIMD processing, and operates at 266 MHz with program instructions issued by the RISC controller. The device also integrates a 64-bit synchronous main memory interface operating at 133 MHz (double data rate, DDR), and a 64-bit 66 MHz PCI interface.
... Many of them do not target the embedded market, as their price and power settings are not suitable for mass-produced consumer electronic devices. However, the PowerPC with Altivec extension surely deserves mention here [6]. The PowerPC is a superscalar design, with an additional new execution pipeline with a dedicated register file for media processing, having a 128-bit wide datapath supporting 3-argument operations. ...
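A small illustration may help with the phrase "3-argument operations" in the snippet above. The sketch below models one such operation in plain Python: a byte permute that takes two 16-byte source vectors plus a 16-byte control vector, in the style of AltiVec's `vperm`. Choosing `vperm` as the example is an assumption, since the snippet does not name a specific instruction, and the function name and byte-string representation are illustrative only.

```python
def vperm(a: bytes, b: bytes, c: bytes) -> bytes:
    """Software model of a three-operand byte permute on 128-bit vectors.

    a, b: the two 16-byte source vectors.
    c:    the 16-byte control vector; the low 5 bits of each control byte
          select one byte from the 32-byte concatenation a + b.
    """
    if not (len(a) == len(b) == len(c) == 16):
        raise ValueError("all three operands must be 16 bytes (128 bits)")
    table = bytes(a) + bytes(b)  # bytes 0-15 come from a, 16-31 from b
    return bytes(table[sel & 0x1F] for sel in c)


# Example control vectors (hypothetical data, for illustration only).
REVERSE = bytes(range(15, -1, -1))  # reverse the bytes of a
INTERLEAVE_LOW = bytes(i // 2 + 16 * (i % 2) for i in range(16))  # a0 b0 a1 b1 ...
```

Because the third operand is data rather than an immediate, the same operation can express reversals, interleaves, and table lookups without separate instructions for each.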
Article
Full-text available
We present a new VLIW core as a successor to the TriMedia TM1000. The processor is targeted for embedded use in media-processing devices like DTVs and set-top boxes. Intended as a core, its design must be supplemented with on-chip co-processors to obtain a cost-effective system. Good performance is obtained through a uniform 64-bit 5 issue-slot VLIW design, supporting subword parallelism with an extensive instruction set optimized with respect to media-processing. Multi-slot 'super-ops' allow powerful multi-argument and multi-result operations. As an example, an IDCT algorithm shows a very low instruction count in comparison with other processors. To achieve good performance, critical sections in the application program source code need to be rewritten with vector data types and function calls for media operations. Benchmarking with several media applications was used to tune the instruction set and study cache behavior. This resulted in a VLIW architecture with wide data paths and relatively simple CPU control.
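The "subword parallelism" mentioned in this abstract can be sketched in software. The example below treats one 64-bit word as four independent 16-bit lanes and adds two such words lane by lane, blocking carries at lane boundaries. It is a generic illustration of the idea, not the TriMedia instruction set; the lane width, function names, and packing order are assumptions chosen for the example.

```python
LANES = 4
LANE_BITS = 16
WORD_MASK = (1 << (LANES * LANE_BITS)) - 1  # 64-bit word

# Mask with the top bit of every 16-bit lane set (0x8000800080008000).
HIGH_BITS = sum(0x8000 << (LANE_BITS * i) for i in range(LANES))
# Mask with the low 15 bits of every lane set.
LOW_BITS = WORD_MASK ^ HIGH_BITS


def pack(lanes):
    """Pack four unsigned 16-bit values into one word, lane 0 in the low bits."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & 0xFFFF) << (LANE_BITS * i)
    return word


def unpack(word):
    """Split a 64-bit word back into its four 16-bit lanes."""
    return [(word >> (LANE_BITS * i)) & 0xFFFF for i in range(LANES)]


def add16x4(a, b):
    """Lane-wise wrap-around add of four 16-bit subwords in one 64-bit word."""
    # Add the low 15 bits of each lane; the sum fits in 16 bits, so no
    # carry can cross into the neighbouring lane.
    partial = (a & LOW_BITS) + (b & LOW_BITS)
    # Fold in the top bit of each lane with XOR, discarding its carry-out.
    return (partial ^ ((a ^ b) & HIGH_BITS)) & WORD_MASK
```

A hardware subword-parallel unit gets the same effect by cutting the carry chain at lane boundaries, which is why a single wide adder can serve several narrow operands at once.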
Conference Paper
Full-text available
We present a new VLIW core as a successor to the TriMedia TM1000. The processor is targeted for embedded use in media-processing devices like DTVs and set-top boxes. Intended as a core, its design must be supplemented with on-chip co-processors to obtain a cost-effective system. Good performance is obtained through a uniform 64-bit 5 issue-slot VLIW design, supporting subword parallelism with an extensive instruction set optimized with respect to media-processing. Multi-slot 'super-ops' allow powerful multi-argument and multi-result operations. As an example, the IDCT algorithm shows a very low instruction count in comparison with other processors. To achieve good performance, critical sections in the application program source code need to be rewritten with vector data types and function calls for media operations. Benchmarking with several media applications was used to tune the instruction set and study cache behaviour. This resulted in a VLIW architecture with wide data paths and relatively simple CPU control.