Figure 3 - uploaded by Martin Snelgrove
Source publication
In this paper, a Computational Random Access Memory (C*RAM) implementation of the MPEG-2 video compression standard is presented. This implementation has the advantage of processing image/video data in parallel and directly in the frame buffers. Therefore, savings in execution time and I/O bandwidth due to massively parallel on-chip computation and red...
Context in source publication
Context 1
... a group of pictures (GOP) approach is used instead of frame-by-frame coding. A GOP is typically a combination of one or two intra-coded frames (I), some predictive-coded frames (P), and, for the rest, bidirectional predictive-coded frames (B). The I frames are also used as a reference for P frames. The block diagram of the MPEG video encoder is shown in Fig. ...
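The GOP structure described in this excerpt can be illustrated with a minimal sketch. The function name `gop_pattern` and the parameters `n` (GOP length) and `m` (spacing between reference frames) are assumptions made for illustration, following a common MPEG convention; they are not taken from the source publication. The sketch lists frame types in display order and places a single I frame at the start of each GOP.

```python
def gop_pattern(n: int = 12, m: int = 3) -> str:
    """Return frame types of one GOP in display order.

    n: number of frames in the GOP (assumed parameter).
    m: distance between consecutive reference (I or P) frames (assumed parameter).
    """
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive")
    frames = []
    for i in range(n):
        if i == 0:
            frames.append("I")   # intra-coded: no reference to other frames
        elif i % m == 0:
            frames.append("P")   # predicted from the previous reference frame
        else:
            frames.append("B")   # predicted from references on both sides
    return "".join(frames)


print(gop_pattern(12, 3))  # IBBPBBPBBPBB
```

With `m = 1` the pattern contains no B frames, which corresponds to an I/P-only stream.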
Similar publications
For decoding high-resolution image stored in external memory after lossless compression, it is inevitable necessary to allow variable-length data types and this may decreases the efficiency in real time read of arbitrary compressed macro block without burst-mode in memory. This paper proposes a cache architecture for a video decoder that has a burs...
Citations
... Computational RAM [1] [2] is a SIMD-memory hybrid architecture, with 1-bit processing elements (PE) integrated at the sense amplifiers of a standard DRAM/SRAM (Fig. 1a). This architecture improves the performance of highly-parallel and computation-intensive applications [3] [4] [5] [6] [7] by utilizing the high bandwidth at the memory sense amplifiers. By making the PEs 1-bit, many of them can be integrated in the pitch of a few sense amplifiers, thus increasing the degree of parallelism. ...
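To make the idea of 1-bit SIMD processing elements concrete, the following is a minimal software sketch of bit-serial addition, in which every PE handles one bit of its own operands per step while keeping a 1-bit carry. The function name `bit_serial_add` and the data layout are assumptions for illustration only; this is not the CRAM instruction set or the authors' simulator.

```python
def bit_serial_add(a, b, width):
    """Add a[pe] + b[pe] for every PE, one bit position per step.

    a, b:  lists of non-negative integers, one element per PE (assumed layout).
    width: number of bits processed; results wrap modulo 2**width.
    """
    if len(a) != len(b):
        raise ValueError("operand vectors must have the same length")
    n = len(a)
    carry = [0] * n    # one 1-bit carry register per PE
    result = [0] * n
    for bit in range(width):        # sequential over bit positions
        for pe in range(n):         # conceptually simultaneous across PEs
            x = (a[pe] >> bit) & 1
            y = (b[pe] >> bit) & 1
            s = x ^ y ^ carry[pe]                        # 1-bit full-adder sum
            carry[pe] = (x & y) | (carry[pe] & (x ^ y))  # 1-bit full-adder carry
            result[pe] |= s << bit
    return result


print(bit_serial_add([1, 2, 255], [1, 3, 1], 8))  # [2, 5, 0]
```

The number of steps depends on the operand width, not on the number of PEs, which is why packing more 1-bit PEs along the sense amplifiers increases throughput.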
The architecture and implementation of a Computational RAM (CRAM) controller are presented. The design interfaces the CRAM to the PCI bus. CRAM macroinstructions are issued by the host processor through the PCI bus onto the CRAM controller instruction FIFO. These macroinstructions act as address pointers to the CRAM control store, which contains the lowest-level CRAM microinstruction routines. A read/write buffer and an instruction FIFO increase performance by reducing bus transaction delays and avoiding the need to synchronize PCI bus transactions to CRAM operations. The controller programming model is also described. A prototype CRAM controller has been implemented in a Xilinx XC4013E-2 FPGA, and is currently being used in building a prototype CRAM system in a PC environment. 1. Introduction Computational RAM [1-2] is a SIMD-memory hybrid architecture, with 1-bit processing elements (PE) integrated at the sense amplifiers of a standard DRAM/SRAM (Fig. 1a). This architecture improve...
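The controller flow described in this abstract (host-issued macroinstructions queued in a FIFO, each pointing into a control store of microinstruction routines) can be sketched roughly as follows. The class name `Controller`, the FIFO depth, the control-store addresses, and the microinstruction mnemonics are all hypothetical placeholders; the actual encoding used in the prototype is not given in this excerpt.

```python
from collections import deque

# Hypothetical control store: address -> microinstruction routine.
CONTROL_STORE = {
    0x00: ["LOAD_X", "LOAD_Y", "ADD_BIT", "STORE"],
    0x10: ["LOAD_X", "INVERT", "STORE"],
}


class Controller:
    def __init__(self, control_store, fifo_depth=4):
        self.control_store = control_store
        self.fifo = deque()
        self.fifo_depth = fifo_depth  # assumed depth, for illustration

    def issue(self, macro_addr):
        """Host side: enqueue a macroinstruction; False if the FIFO is full."""
        if len(self.fifo) >= self.fifo_depth:
            return False
        self.fifo.append(macro_addr)
        return True

    def run(self):
        """Controller side: expand queued macroinstructions into microinstructions."""
        executed = []
        while self.fifo:
            addr = self.fifo.popleft()
            executed.extend(self.control_store[addr])
        return executed


ctrl = Controller(CONTROL_STORE)
ctrl.issue(0x00)
ctrl.issue(0x10)
print(ctrl.run())
```

Because the host only writes short pointers into the FIFO, it can return to other work while the controller expands them, which is the decoupling between bus transactions and CRAM operations that the abstract refers to.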
... Appendix D lists the CRAM and uniprocessor C++ code for these applications. Other applications not described here, but whose CRAM algorithms have been developed by other people, include fault simulation, data mining, satisfiability problem, and FIR filters [4], as well as discrete cosine transform, adaptable and scalable vector quantization, and other MPEG-2 algorithms [65], [66], [67], [68]. Because of the small sizes and limited number of the prototypes implemented so far, the performance analysis work is carried out using the CRAM C++ Simulator. ...
Integrating several 1-bit processing elements at the sense amplifiers of a standard RAM improves the performance of massively-parallel applications because of the inherent parallelism and high data bandwidth inside the memory chip. However, implementing such a logic-in-memory system on a host computer poses several challenges because of the small bandwidth at the host system buses, and the different data formats used on the two systems. In this thesis, solutions to these system design issues, including control of the processing elements, interface to the host, data transposition, and application programming, are considered. A minimal-hardware controller provides high utilization of processing elements while using a simple and general-purpose architecture. A buffer-based host interface unit enhances external data transfers, and minimizes the effect of the host on the performance of the logic-in-memory system. A parallel array-based corner-turning scheme reduces the time to convert data...
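The data transposition ("corner turning") mentioned in this abstract converts between the host's word-per-element format and a bit-per-row format in which each PE sees one bit of its own element at a time. The sketch below is a plain software transposition written only to show the two layouts; the function names are assumptions, and it does not reproduce the parallel array-based scheme proposed in the thesis.

```python
def corner_turn(words, width):
    """Host words -> bit rows: rows[bit][pe] is bit `bit` of words[pe]."""
    return [[(w >> bit) & 1 for w in words] for bit in range(width)]


def corner_unturn(rows):
    """Bit rows -> host words (inverse of corner_turn)."""
    if not rows:
        return []
    n = len(rows[0])
    return [
        sum(rows[bit][pe] << bit for bit in range(len(rows)))
        for pe in range(n)
    ]


rows = corner_turn([5, 3], 3)
print(rows)                 # [[1, 1], [0, 1], [1, 0]]
print(corner_unturn(rows))  # [5, 3]
```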
... The primary constraint on C*RAM performance as an MPEG-2 co-processor is the I/O bandwidth which is constrained by the fact that the single input and output pin connections are limited to the top end of organic transistor operation. A single 576x720 based shift register as proposed in [31]. Parallel loading using DMA would decrease this value to a reasonable level. ...
... The actual MPEG-2 encoding performance is summarized in Table 4-1 and is well within the real-time encoding requirement of 30 frames per second, which is the minimum acceptable for our application [31]. ...
The gap between processor speed and memory access time limits the
performance of memory-intensive applications such as volume rendering.
In this paper we compare the performance of stages of the splatting
volume rendering algorithm on a workstation and on the Computational RAM
(C·RAM) simulator. C·RAM is a Processor-in-Memory
architecture, which integrates SIMD processing elements into the memory
array. These processing elements exploit the highest bandwidth available
in the memory chip, at the sense amplifiers. Each stage executes faster
on C·RAM
This paper describes the system design techniques that have been
employed to minimize the effect of the host bus on the performance of a
Computational RAM (CRAM) logic-in-memory parallel processing system.
Specifically, we describe how the architectural features of the CRAM
controller affect instruction execution, utilization of processing
elements, time to initialize parallel variables from the host computer,
and execution time of scalar operations. Finally, we show that because
of the performance-enhancement features of the controller, the transfer
characteristics of the host bus have very little effect on the
performance of a CRAM system. This means that a CRAM system can be
implemented on a wide variety of platforms, including those with slow
external buses such as ISA-based computers and embedded systems that use
slow microcontrollers