Figure 3 - uploaded by Martin Snelgrove
Block diagram of MPEG-2 video encoder.

Source publication
Article
In this paper, a Computational Random Access Memory (C*RAM) implementation of MPEG-2 video compression standard is presented. This implementation has the advantage of processing image/video data in parallel and directly in the frame buffers. Therefore, savings in execution time and I/O bandwidth due to massively parallel on-chip computation and red...

Context in source publication

Context 1
... a group of pictures (GOP) approach is used instead of frame-by-frame coding. A GOP is typically a combination of one or two intra-coded frames (I), some predictive-coded frames (P), and, for the rest, bidirectionally predictive-coded frames (B). The I frames are also used as references for P frames. The block diagram of the MPEG-2 video encoder is shown in Fig. ...
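To make the GOP structure concrete, here is a minimal Python sketch. It assumes a single closed GOP in which frame 0 is I, every `p_distance`-th frame after it is P, and the rest are B. The function names and the `n_frames`/`p_distance` parameters are illustrative and do not come from the source. The sketch also shows why B frames force a reordering: a B frame is predicted from the reference frames on both sides of it, so it can only be coded after the later of the two.

```python
def gop_display_order(n_frames: int, p_distance: int) -> list[str]:
    """Frame types in display order for one GOP (hypothetical parameters).

    Frame 0 is I, every p_distance-th frame after it is P, the rest are B.
    """
    types = []
    for i in range(n_frames):
        if i == 0:
            types.append("I")
        elif i % p_distance == 0:
            types.append("P")
        else:
            types.append("B")
    return types


def coded_order(display: list[str]) -> list[str]:
    """Reorder frames so every B frame follows both of its reference frames."""
    out, pending_b = [], []
    for idx, ftype in enumerate(display):
        label = f"{ftype}{idx}"
        if ftype == "B":
            pending_b.append(label)  # wait for the next anchor (I or P)
        else:
            out.append(label)        # the anchor is coded first...
            out.extend(pending_b)    # ...then the B frames that precede it in display order
            pending_b = []
    # Trailing B frames would normally reference the next GOP's I frame
    # (open GOP); this single-GOP sketch simply appends them.
    out.extend(pending_b)
    return out
```

For `n_frames=10, p_distance=3`, the display order is `IBBPBBPBBP` and the coded order is `I0 P3 B1 B2 P6 B4 B5 P9 B7 B8`.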

Similar publications

Article
For decoding high-resolution images stored in external memory after lossless compression, it is inevitably necessary to allow variable-length data types, and this may decrease the efficiency of real-time reads of arbitrary compressed macroblocks without burst mode in memory. This paper proposes a cache architecture for a video decoder that has a burs...

Citations

... Computational RAM [1] [2] is a SIMD-memory hybrid architecture, with 1-bit processing elements (PEs) integrated at the sense amplifiers of a standard DRAM/SRAM (Fig. 1a). This architecture improves the performance of highly parallel, computation-intensive applications [3] [4] [5] [6] [7] by utilizing the high bandwidth available at the memory sense amplifiers. Because the PEs are 1-bit, many of them can be integrated in the pitch of a few sense amplifiers, increasing the degree of parallelism. ...
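The following Python sketch models why 1-bit PEs still deliver high throughput. Operands are stored bit-sliced down each memory column, so row k holds bit k of every PE's value. On each cycle a single row access lets every PE advance its addition by one bit. An n-bit add therefore takes about n cycles no matter how many PEs there are. This is a simplified full-adder model with hypothetical helper names, not the actual C*RAM PE logic.

```python
def to_bit_slices(values, width):
    """Store one value per PE, bit-sliced: row k holds bit k of every PE's value."""
    return [[(v >> k) & 1 for v in values] for k in range(width)]


def from_bit_slices(rows):
    """Reassemble per-PE integers from bit-sliced rows (row 0 = LSB)."""
    n_pe = len(rows[0])
    return [sum(rows[k][pe] << k for k in range(len(rows))) for pe in range(n_pe)]


def bit_serial_add(a_rows, b_rows):
    """Add two bit-sliced operands on an array of 1-bit PEs.

    Each outer iteration models one memory row access: every PE reads one bit
    of A and one of B, produces a full-adder sum bit, and keeps its carry in a
    1-bit register. The result is truncated to the operand width.
    """
    n_pe = len(a_rows[0])
    carry = [0] * n_pe                            # one carry register per PE
    out_rows = []
    for a_bits, b_bits in zip(a_rows, b_rows):    # LSB first: `width` cycles in total
        sum_row = []
        for pe in range(n_pe):                    # conceptually all PEs in lockstep
            a, b, c = a_bits[pe], b_bits[pe], carry[pe]
            sum_row.append(a ^ b ^ c)
            carry[pe] = (a & b) | (a & c) | (b & c)
        out_rows.append(sum_row)
    return out_rows
```

For example, adding `[3, 100, 255, 17]` and `[5, 27, 1, 200]` as 8-bit values on four PEs yields `[8, 127, 0, 217]`, where 255 + 1 wraps to 0.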
Article
The architecture and implementation of a Computational RAM (CRAM) controller is presented. The design interfaces the CRAM to the PCI bus. CRAM macroinstructions are issued by the host processor through the PCI bus onto the CRAM controller instruction FIFO. These macroinstructions act as address pointers to the CRAM control store, which contains the lowest-level CRAM microinstruction routines. A read/write buffer and an instruction FIFO increase performance by reducing bus transaction delays and avoiding the need to synchronize PCI bus transactions to CRAM operations. The controller programming model is also described. A prototype CRAM controller has been implemented in a Xilinx XC4013E-2 FPGA, and is currently being used in building a prototype CRAM system in a PC environment.

1. Introduction

Computational RAM [1-2] is a SIMD-memory hybrid architecture, with 1-bit processing elements (PEs) integrated at the sense amplifiers of a standard DRAM/SRAM (Fig. 1a). This architecture improve...
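The macroinstruction-to-microroutine scheme described above can be sketched in Python as follows. A dictionary stands in for the control store and a `deque` for the instruction FIFO. The opcodes, microinstruction names, and class are hypothetical. The point is that the host only enqueues short pointers and has to wait only when the FIFO is full, while the controller expands each pointer into its stored routine at its own pace.

```python
from collections import deque

# Hypothetical control store: each macroinstruction opcode acts as an address
# (pointer) to a stored microinstruction routine. Names are illustrative only.
CONTROL_STORE = {
    0x01: ["load_a", "load_b", "add_bit", "store_sum"],
    0x02: ["load_a", "invert", "store_a"],
}


class CramControllerSketch:
    def __init__(self, fifo_depth=16):
        self.fifo = deque()          # instruction FIFO between host bus and controller
        self.fifo_depth = fifo_depth
        self.executed = []           # trace of microinstructions issued to the PE array

    def host_write(self, macro_op):
        """Host side: enqueue a macroinstruction; returns False if the FIFO is full."""
        if len(self.fifo) >= self.fifo_depth:
            return False
        self.fifo.append(macro_op)
        return True

    def step(self):
        """Controller side: dequeue one macroinstruction and run its routine."""
        if not self.fifo:
            return False
        op = self.fifo.popleft()
        self.executed.extend(CONTROL_STORE[op])  # expand the pointer into microcode
        return True
```

With `fifo_depth=2`, the host can post opcodes `0x01` and `0x02` back to back, and a third write is refused until the controller drains an entry with `step()`.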
... Appendix D lists the CRAM and uniprocessor C++ code for these applications. Other applications not described here, but whose CRAM algorithms have been developed by other people, include fault simulation, data mining, the satisfiability problem, and FIR filters [4], as well as the discrete cosine transform, adaptable and scalable vector quantization, and other MPEG-2 algorithms [65], [66], [67], [68]. Because of the small sizes and limited number of the prototypes implemented so far, the performance analysis work is carried out using the CRAM C++ Simulator. ...
Article
Integrating several 1-bit processing elements at the sense amplifiers of a standard RAM improves the performance of massively-parallel applications because of the inherent parallelism and high data bandwidth inside the memory chip. However, implementing such a logic-in-memory system on a host computer poses several challenges because of the small bandwidth at the host system buses, and the different data formats used on the two systems. In this thesis, solutions to these system design issues, including control of the processing elements, interface to the host, data transposition, and application programming, are considered. A minimal-hardware controller provides high utilization of processing elements while using a simple and general-purpose architecture. A buffer-based host interface unit enhances external data transfers, and minimizes the effect of the host on the performance of the logic-in-memory system. A parallel array-based corner-turning scheme reduces the time to convert data...
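Data transposition ("corner turning") converts the host's word-per-value layout into the bit-plane layout that 1-bit PEs consume. The thesis's parallel array-based scheme is not reproduced here. As a stand-in, this Python sketch uses the well-known mask-and-shift transpose of an 8x8 bit matrix, which likewise moves many bits per step instead of one at a time. The function names and the 8-value, 8-bit sizing are assumptions made for illustration.

```python
def transpose8x8(x: int) -> int:
    """Transpose an 8x8 bit matrix packed in a 64-bit int.

    Bit 8*r + c holds row r, column c. Three mask-and-shift rounds swap
    1x1, then 2x2, then 4x4 off-diagonal sub-blocks, each round acting on
    all 64 bits at once.
    """
    t = (x ^ (x >> 7)) & 0x00AA00AA00AA00AA
    x ^= t ^ (t << 7)
    t = (x ^ (x >> 14)) & 0x0000CCCC0000CCCC
    x ^= t ^ (t << 14)
    t = (x ^ (x >> 28)) & 0x00000000F0F0F0F0
    x ^= t ^ (t << 28)
    return x


def corner_turn(values: list[int]) -> list[int]:
    """Turn 8 byte-wide host values into 8 bit-planes for 8 one-bit PEs.

    Returns planes, where bit `pe` of planes[k] is bit k of values[pe].
    """
    assert len(values) == 8 and all(0 <= v < 256 for v in values)
    x = 0
    for r, v in enumerate(values):
        x |= v << (8 * r)            # row r of the bit matrix = value for PE r
    y = transpose8x8(x)
    return [(y >> (8 * k)) & 0xFF for k in range(8)]
```

A bit-by-bit loop would need 64 single-bit moves per 8x8 block, whereas the mask-and-shift version finishes in three word-wide rounds.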
... The primary constraint on C*RAM performance as an MPEG-2 co-processor is the I/O bandwidth, which is limited because the single input and output pin connections can run no faster than the top end of organic transistor operation. Frame data is shifted in serially through a single 576x720-based shift register, as proposed in [31]. Parallel loading using DMA would decrease this value to a reasonable level. ...
... The actual MPEG-2 encoding performance is summarized in Table 4-1 and is well within the real-time encoding requirement of 30 frames per second, which is the minimum acceptable for our application [31]. ...
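A rough calculation shows why a single serial pin becomes the bottleneck. This Python sketch computes the bit rate that one input lane must sustain to stream 576x720 frames at the 30 frames per second cited above. Several parameters are assumptions: 8-bit samples, a 4:2:0 chroma factor of 1.5, and a `lanes` parameter that models parallel (e.g. DMA) loading. Only input frames are counted.

```python
def required_bit_rate(width=720, height=576, bits_per_sample=8,
                      chroma_factor=1.5, fps=30, lanes=1):
    """Bits per second each input lane must carry to stream frames in real time.

    chroma_factor=1.5 assumes 4:2:0 sampling (luma plus two quarter-size
    chroma planes); lanes > 1 models loading over a wider parallel path
    instead of one serial pin. Output bitstream traffic is ignored.
    """
    samples_per_frame = width * height * chroma_factor
    return samples_per_frame * bits_per_sample * fps / lanes
```

With the defaults this gives 149,299,200 bit/s (about 149 Mbit/s) through one pin. Spreading the same traffic over 32 lanes brings it to about 4.7 Mbit/s per lane.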
Conference Paper
The gap between processor speed and memory access time limits the performance of memory-intensive applications such as volume rendering. In this paper we compare the performance of stages of the splatting volume rendering algorithm on a workstation and on the Computational RAM (C·RAM) simulator. C·RAM is a Processor-in-Memory architecture, which integrates SIMD processing elements into the memory array. These processing elements exploit the highest bandwidth available in the memory chip, at the sense amplifiers. Each stage executes faster on C·RAM
Conference Paper
This paper describes the system design techniques that have been employed to minimize the effect of the host bus on the performance of a Computational RAM (CRAM) logic-in-memory parallel processing system. Specifically, we describe how the architectural features of the CRAM controller affect instruction execution, utilization of processing elements, time to initialize parallel variables from the host computer, and execution time of scalar operations. Finally, we show that because of the performance-enhancement features of the controller, the transfer characteristics of the host bus have very little effect on the performance of a CRAM system. This means that a CRAM system can be implemented on a wide variety of platforms, including those with slow external buses such as ISA-based computers and embedded systems that use slow microcontrollers.