
Implementation of the JPEG Algorithm Compression on FPGA

Mohamed CHAABANE
Mohamed KAMOUN
Yassine KOUBAA
Ahmed TOUMI
Academic Publication Center
Tunis, Tunisia
ISBN :
Eleventh International conference on
Sciences and Techniques of Automatic Control
& computer engineering
STA'2010
Organized by
Research Unit of Automatic Control UCA of ENIS
and
Research Unit of Industrial Processes Control UCPI of ENIS
Supported by
Ministry of Higher Education,
Scientific Research in Tunisia
University of Sfax
International Journal on Sciences and Techniques
of Automatic control & computer engineering - IJ-STA
Institut Français de Coopération - IFC
Tunisian Association of Numeric Techniques and Automatic - ATTNA
Implementation of the JPEG
Algorithm Compression on FPGA
Mohamed Nidhal Krifa2, Bilel Chmissi1, Abdessalem Ben Abdelali2, Abdellatif Mtibaa1,2
1 National Engineering School of Monastir, University of Monastir, Tunisia
2 Laboratory EµE, Faculty of Sciences of Monastir, University of Monastir, Tunisia
chmissi.bilel@gmail.com, Abdessalem.BenAbdelali@enim.rnu.tn, kmnidhal@yahoo.fr, Abdellatif.mtibaa@enim.rnu.tn
Abstract.
In this paper, we implement a JPEG encoder on an architecture composed of a microprocessor and an FPGA. We start from the standard JPEG algorithm, which is analyzed in order to extract the functions that can most profitably be implemented in an FPGA: quantization, the DCT and Huffman coding. Once identified, these functions are first implemented in software. Configuring the target platform, adapting the program to that platform and interfacing the FPGA with the microprocessor are also considered. We build a JPEG encoder on a mono-processor system on a Xilinx Virtex-II Pro FPGA. The design can compress a BMP image into a JPG image at high speed.
Keywords.
FPGA, JPEG, DCT, Huffman coding, implementation, hardware and software.
1. Introduction
The JPEG image compression standard was developed by the JPEG (Joint Photographic Experts Group) committee for use in compressing digital images, and in particular full-color photographic images. Implementations of this standard have received a great deal of attention due to the widespread adoption of the JPEG image format; it is one of the primary formats used for exchanging pictures on the World Wide Web, and it is commonly used in digital cameras as the storage format.
With the increasing use of multimedia technologies, image compression requires ever higher performance. To address the needs and requirements of multimedia and internet applications, many efficient image compression techniques, with considerably different features, have been developed. Image compression techniques exploit a common characteristic of most images: neighboring picture elements, or pixels, are highly correlated [1]. This means that a typical still image contains a large amount of spatial redundancy in plain areas, where adjacent pixels have almost the same values. In addition, still images contain subjective redundancy, which is determined by the properties of the Human Visual System (HVS). The HVS tolerates some distortion, depending on the image and the viewing conditions. Consequently, pixels need not always be reproduced exactly as in the original; the HVS will still not detect the difference between the original image and the reproduced image [2].
11th International conference on Sciences and Techniques
of Automatic control & computer engineering
December 19-21, 2010, Monastir, Tunisia
STA'2010-ECS-1093, pages 1-12
Academic Publication Center of Tunis, Tunisia
STA’2010 Embedded Systems pages 2 to 12
The basic measures of the performance of a compression system are picture quality and the compression ratio, defined as the ratio between the original data size and the compressed data size.
The paper is organized as follows. Section 2 gives a brief description of the technology related to still image compression. Section 3 presents the JPEG encoder on a soft processor platform. Section 4 presents the software design. Section 5 presents the results of the FPGA implementation. The last section concludes the paper.
2. Image Compression Algorithm
The block diagram in Fig. 1 depicts the steps involved in compressing an image using the JPEG standard [3]: the RGB image undergoes color space conversion to YCbCr data, down sampling, grouping into 8x8 blocks, the 2-D DCT, quantization of the 8x8 DCT coefficients, entropy coding, and finalizing of the data stream.

Fig.1. Block diagram for compressing an image using the JPEG standard.
2.1. Color Space Conversion
If the input image is a full color image (with each pixel in the image represented by
a 24-bit value, with eight bits each of red, green, and blue information)[4], the first
step is to map the image to an alternate color space. (If the image is grayscale, this step
and the next are skipped.) The RGB information representing each pixel is mapped to
an equivalent luminance-chrominance representation (the YCbCr color space). Note
that this mapping is lossless (i.e. fully reversible), except for round-off error.
The initial color space conversion requires a matrix multiplication. A number of different coefficient sets are in use; the formulas typically used for JPEG are as follows:

Y  =  0.2989 R + 0.5866 G + 0.1145 B
Cb = -0.1687 R - 0.3312 G + 0.5000 B + 128        (1)
Cr =  0.5000 R - 0.4183 G - 0.0816 B + 128

RGB values are normally on a scale of 0 to 1 or, since they are stored as unsigned single bytes, 0 to 255. The resulting luminance value is also on the scale of 0 to 255; the chrominance values need an offset of 128 added to them so that they can be stored in an unsigned byte.
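The conversion in Eq. (1) can be sketched in C as follows (a minimal illustration; the function names and the rounding/clamping policy are our own assumptions, not taken from the paper):

```c
#include <stdint.h>

/* Round to nearest and clamp to the 0..255 range of an unsigned byte. */
static uint8_t clamp255(double v) {
    if (v < 0.0)   return 0;
    if (v > 255.0) return 255;
    return (uint8_t)(v + 0.5);
}

/* RGB -> YCbCr conversion per Eq. (1); coefficients as given in the paper. */
void rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b,
                  uint8_t *y, uint8_t *cb, uint8_t *cr) {
    *y  = clamp255( 0.2989 * r + 0.5866 * g + 0.1145 * b);
    *cb = clamp255(-0.1687 * r - 0.3312 * g + 0.5000 * b + 128.0);
    *cr = clamp255( 0.5000 * r - 0.4183 * g - 0.0816 * b + 128.0);
}
```

Note that a mid-gray pixel maps to (128, 128, 128), i.e. zero chrominance after the offset, which matches the reversibility claim above up to round-off.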
2.2. Down Sampling
If the image has been converted to the YCbCr color space, the Cb and Cr components can be down sampled; that is, blocks of adjacent pixels in the Cb and Cr components can be replaced with their average value. This technique can be applied because the human eye is much more sensitive to changes in intensity than to changes in color. Thus, down sampling can yield a significant amount of compression with little visual effect on the image. There are several alternatives for down sampling: the pixels are typically reduced 2:1 horizontally and either 2:1 or 1:1 (unchanged) vertically (in JPEG terminology, these are referred to as 2h2v or 4:1:1 and 2h1v or 4:2:2, respectively).
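The 2:1 horizontal and vertical case can be sketched as a 2x2 averaging pass over one chroma plane (a hedged sketch; the function name, even-dimension assumption and rounding rule are ours):

```c
#include <stdint.h>

/* 2h2v chroma downsampling sketch: each 2x2 block of Cb (or Cr) samples is
   replaced by its rounded average. Assumes w and h are even. */
void downsample_2x2(const uint8_t *src, int w, int h, uint8_t *dst) {
    for (int y = 0; y < h; y += 2)
        for (int x = 0; x < w; x += 2) {
            int sum = src[y * w + x]       + src[y * w + x + 1]
                    + src[(y + 1) * w + x] + src[(y + 1) * w + x + 1];
            dst[(y / 2) * (w / 2) + (x / 2)] = (uint8_t)((sum + 2) / 4);
        }
}
```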
2.3. 2-D DCT
Once the pixel data has been pre-processed and separated into three components, the pixels in each component are grouped into 8x8 blocks. Each block is transformed using the Discrete Cosine Transform (DCT) [5]. This transform is similar to the Discrete Fourier Transform, and it transforms the block into the spatial frequency domain.
This step does not actually produce any compression. The DCT is reversible, except for round-off error in the mathematical operations. It is performed because the higher frequency data can subsequently be discarded without significant visual effect on the image; again, the eye is much less sensitive to high-frequency changes in intensity or color than to low-frequency changes.
Each 8x8 block is transformed to the frequency domain using the Discrete Cosine Transform, a Fourier-related transform. The one-dimensional transform is defined as:
t(k) = c(k) · Σ_{n=0}^{N-1} s(n) cos[ (2n+1)kπ / 2N ]        (2)
where s is the array of N original values, t is the array of N transformed values, and the coefficients c are given by:

c(0) = √(1/N),   c(k) = √(2/N) for 1 ≤ k ≤ N-1        (3)
In two dimensions, this becomes:

t(i,j) = c(i,j) · Σ_{m=0}^{N-1} Σ_{n=0}^{N-1} s(m,n) cos[ (2m+1)iπ / 2N ] cos[ (2n+1)jπ / 2N ]        (4)
where N, s, and t are analogous to the one-dimensional case and c(i,j) = c(i)·c(j), with c as given in (3); explicitly:

c(0,0) = 1/N,   c(0,j) = c(i,0) = √2/N,   c(i,j) = 2/N for 1 ≤ i ≤ N-1, 1 ≤ j ≤ N-1        (5)
Fig.2. The 2-D Discrete Cosine Transform.
2.4. Quantization
Once the data has been transformed to the frequency domain, each of the 64 frequency components is divided by a separate quantization coefficient (figure 3) and the result is rounded to an integer. As mentioned previously, more of the high-frequency information is discarded in this step, by using larger coefficients for those elements. Also, the luminance component is quantized more accurately than the chrominance components, by using separate quantization tables, for the reasons discussed previously.
If the user can tolerate a lower-quality resulting image in exchange for a smaller compressed image, the quantization coefficients can be increased. There are a number of different ways to tune the quantization tables; in addition, this step can be influenced by a user-selected "quality" setting. Many implementations do simple linear scaling of the example coefficient tables from the JPEG standard, based on the quality setting; this is an area of active research. This is the step that introduces significant loss of data and thus generates significant compression. There are a number of algorithms which can be used to simplify this operation so that a full divider is not required; a pipelined quantizer that requires five adders and five pipeline registers is described in a paper by Sun and Lee [6].
[Figure: two 8x8 grids with quantization values rising from 10 to 60, marking the low-frequency region in the upper-left corner of the block and the high-frequency region in the lower-right corner.]

Fig.3. The frequency domain.
2.5. Entropy Coder
Once the frequency data has been quantized, it is encoded using an entropy encoder. This can use either Huffman or arithmetic coding (specifically Q coding); since arithmetic coding is covered by several patents held by IBM and others, only Huffman coding is required by the standard. The goal is to represent the data using a minimal number of bits; in practice, Q coding brings only marginal (5% to 10%) gains in coding efficiency, so Huffman coding [7] is an effective choice.
In order to encode the data, the elements are traversed in a zigzag order; this places the low-frequency elements at the start of the data stream and the high-frequency elements later. Since the high-frequency elements are more likely to be zero, this results in longer strings of zeroes, which can be more efficiently encoded. The data is run-length encoded to compress the zeroes, and then run through the Huffman coder.
As mentioned, the elements of the quantized block are processed in a zigzag order,
as indicated by this sequence:
(6)
The upper-left element is the DC coefficient; its value corresponds to the average of all 64 input samples. This coefficient is treated separately from the remaining AC coefficients: since the average value of successive blocks in the image is likely to change slowly, it is differentially encoded as the difference between the DC value of the current block and that of the preceding block.
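The zigzag traversal can be generated programmatically by walking the anti-diagonals of the block; the sketch below is one way to do it (the function name is hypothetical, and production encoders typically use a precomputed 64-entry table instead):

```c
/* Fill order[0..63] with row-major indices of an 8x8 block in zigzag
   order (Section 2.5). Index 0 is the DC coefficient; the direction of
   travel alternates on each anti-diagonal i + j = d. */
void zigzag_order(int order[64]) {
    int idx = 0;
    for (int d = 0; d < 15; d++)                /* anti-diagonals i+j = 0..14 */
        for (int k = 0; k <= d; k++) {
            int i = (d % 2 == 0) ? d - k : k;   /* alternate direction */
            int j = d - i;
            if (i < 8 && j < 8)                 /* clip corners of the block */
                order[idx++] = i * 8 + j;
        }
}
```

The first few indices produced (0, 1, 8, 16, 9, 2, ...) match the standard JPEG scan, and the last is 63, the highest-frequency corner.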
2.6. Huffman Coding
Huffman coding is a technique which assigns a variable-length codeword to each input data item: a smaller codeword is assigned to an input that occurs more frequently. It is very similar to Morse code, which assigned shorter pulse combinations to letters that occurred more frequently. Huffman coding is variable-length coding, where characters are not coded to a fixed number of bits [8].
This is the last step in the encoding process. It organizes the data stream into a smaller number of output data packets by assigning unique codewords that can later, during decompression, be reconstructed without loss. For the JPEG process, each combination of run length and size category from the run-length coder is assigned a Huffman codeword.
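The "size category" mentioned above is just the number of bits needed to represent a coefficient's magnitude; a minimal sketch of that helper (the name is ours) is:

```c
/* Size-category helper (Section 2.6): JPEG Huffman-codes the bit length
   ("category") of each coefficient, then appends that many magnitude bits.
   Returns the number of bits needed to represent |v|; 0 for v == 0. */
int size_category(int v) {
    int mag = v < 0 ? -v : v;
    int bits = 0;
    while (mag) { bits++; mag >>= 1; }
    return bits;
}
```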
2.7. Finalizing Data Stream
The output of the entropy coder must be assembled into bytes, since the output of the Huffman coder for each input is a variable number of bits.
Other operations are typically performed, including padding to byte boundaries at the end of blocks and prepending headers to the result; these headers contain the information required to reverse the process, including the quantization tables and Huffman coding tables. If the encoding and decoding are done in a closed system, this information can be omitted.
The remainder of this discussion will only be concerned with the first part of this
process, assembling the output into bytes.
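Assembling variable-length codes into bytes can be sketched as a small MSB-first bit accumulator (the type and interface are our own assumptions; a real JPEG encoder must additionally byte-stuff a 0x00 after any emitted 0xFF, which is omitted here):

```c
#include <stdint.h>

/* Bit packer sketch (Section 2.7): codewords are accumulated MSB-first
   and emitted as full bytes; any remaining bits stay pending. */
typedef struct {
    uint32_t acc;        /* pending bits, right-aligned   */
    int nbits;           /* number of pending bits        */
    uint8_t out[1024];   /* output buffer (fixed, sketch) */
    int len;             /* bytes emitted so far          */
} bitbuf;

void put_bits(bitbuf *b, uint32_t code, int nbits) {
    b->acc = (b->acc << nbits) | (code & ((1u << nbits) - 1));
    b->nbits += nbits;
    while (b->nbits >= 8) {              /* flush every complete byte */
        b->nbits -= 8;
        b->out[b->len++] = (uint8_t)(b->acc >> b->nbits);
    }
}
```

For example, a 3-bit code 101 followed by a 5-bit code 11111 packs into the single byte 0xBF.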
The basic requirements of multimedia transmission systems are high compression ratio, high quality and high speed. Before going any further, the following question has to be raised: if digital storage is becoming so cheap and so widespread, and the available transmission channel bandwidth is increasing due to the deployment of cable, fiber optics and ADSL modems, why is there a need to provide more powerful compression schemes? The answer is, without doubt, mobile video transmission channels, which mainly require high quality and high speed.
3. JPEG Encoder on a Soft Processor Platform
3.1. Microblaze Soft Processor
Microblaze is a soft 32-bit RISC processor designed by Xilinx for its FPGAs. Compared to other general purpose processors, it is quite flexible, with several configurable parts, and it can be extended with customized co-processors. A number of on-chip communication strategies are available, including a variety of memory interfaces. Fig. 4 shows the core block diagram of the Microblaze processor [9].
Similar to most RISC processors, the Microblaze processor has an instruction decoding unit, a 32x32-bit general purpose register file, an arithmetic unit and special purpose registers. In addition, it has an instruction pre-fetch buffer. The arithmetic unit is configurable, as shown in the core block diagram; the Barrel Shifter, Multiplier, Divider and FPU are optional features. The Microblaze processor has a three-stage pipeline: fetch, decode and execute. There is no branch prediction logic; branches with a delay slot are supported to reduce the branch penalty. Microblaze is a Harvard-architecture processor, with both a 32-bit I-bus and a 32-bit D-bus. A cache is also an optional feature. Three types of buses, FSL, LMB and OPB, are available.
The FSL bus is a fast co-processor interface. The LMB is a one-clock-cycle, on-chip memory bus, while the OPB is a general bus with arbitration.
A typical Microblaze system is shown in Fig. 5; a JPEG encoder has been mapped onto it. A cache can be placed between the processor and the external SDRAM. It is not shown in the diagram because the cache is considered part of the Microblaze processor component in EDK.
Fig. 4. Microblaze processor core block diagram.
[Figure: the MicroBlaze core with local instruction and data memories, an SDRAM controller to external SDRAM, a CF card interface and a UART.]

Fig. 5. Typical single-core Microblaze system.
3.2. Soft Mono-processor System on Xilinx FPGA
The JPEG encoder implementation was realized on a Xilinx Virtex-II Pro 2VP30 FPGA with the Xilinx Embedded Development Kit (EDK). For the entire system, including I/O, we use the Xilinx XUP2Pro board, with a Compact Flash (CF) card interface and external memory [10].
The 2VP30 FPGA consists of 13,696 slices, 2,448 Kbits of on-chip Block RAM (BRAM), 136 hardware multipliers and two PowerPC 405 cores.
The Microblaze soft core takes around 450 slices (3.2% of the 2VP30 area) [11]. Nevertheless, one Microblaze processor typically needs at least 8 KBytes of on-chip BRAM as data and instruction memory, plus a few memory controllers, which take additional slices and BRAMs. Due to the project schedule, the IBM PowerPC cores are not used in this design.
3.3. JPEG Encoder Application
We implement a baseline JPEG encoder application with color conversion and sub-sampling on the mono-processor platform.
Apart from file I/O and bootstrap, the JPEG encoder algorithm includes BMP and JPG header parsing, color conversion, DCT, zigzag scan, quantization and variable-length encoding [12].
3.4. Microblaze Hardware Acceleration
The Xilinx Embedded Development Kit (EDK) allows entire embedded processor designs to be created in one environment. The EDK includes a license for Xilinx's 32-bit MicroBlaze soft-core processor. The MicroBlaze can be connected to a wide variety of IP peripherals, such as UARTs and GPIOs, through the On-chip Peripheral Bus (OPB) provided with the EDK. Another option is to connect hardware peripherals directly to the MicroBlaze using up to eight Fast Simplex Link (FSL) I/O channels [13]. The FSL provides a fast uni-directional communication channel connected to the MicroBlaze, as shown in Fig. 6, which does not suffer the communication overhead associated with the OPB. The Xilinx software library has various C functions that allow the programmer to pass values to a customized IP core and to store values sent back from an IP core.
Fig.6. A custom user IP core is directly connected to the MicroBlaze's internal registers through the use of an FSL bus.
Using the FSL to connect to customized IP can greatly accelerate complex calculations by taking advantage of hardware parallelism; in effect, the customized IP core acts much like a co-processor. For example, an 8x8 FDCT IP core connected to the MicroBlaze can reduce the number of cycles needed from over one thousand to less than one hundred. Since the FDCT is the most computationally intensive part of the JPEG encoding algorithm, significant gains in performance can be obtained. Integrating FSL co-processor functions into software is relatively easy due to the software drivers that are available for the FSL.
The benefit of this methodology is that the entire JPEG encoder can be written and verified in C, and then, piece by piece, the areas that take significant time to process can be off-loaded to co-processors connected via FSLs (Fast Simplex Links), which provide a fast non-arbitrated streaming communication mechanism.
4. Hardware and Software Design Flow
Design tools and flow are an important factor with respect to design cost and time. Most of the work is done with the Xilinx EDK and ISE tools. EDK supports high-level, component-based design, and the design flow is straightforward. There is little dependence between the hardware flow and the software flow, so they can be designed and iterated independently.
4.1. System Design Flow
The system design flow is shown in figure 7. On the hardware side (left), designers specify all needed hardware components, including components provided by Xilinx, such as the processor and memory, and the customized hardware components of this project. For customized hardware, designers need to provide source code or a netlist. Within EDK, all these components are synthesized, and ISE is invoked afterwards to implement the design and generate a bitstream. Nevertheless, this is not the bitstream downloaded to the FPGA, because it contains the hardware only. At the same time, on the software side, all needed software components, such as drivers or an operating system, need to be specified as well. Based on these definitions and the hardware component definitions, EDK generates libraries for the system, which are later linked with the object files compiled from the application code; the result is an ELF file. The detailed hardware and software flows are described in the following sections. The last step is to integrate software and hardware: Xilinx provides a tool called data2mem which inserts the binary software code in the ELF file into the bitstream generated by the hardware flow. The location and insertion method are already extracted during the hardware flow. The resulting bitstream contains both hardware and software; it can therefore be downloaded into the FPGA to run and debug.
[Figure: the hardware flow (HDL/netlist plus MHS file -> define hardware components -> synthesize -> implement and generate bitstream) and the software flow (MSS file -> define software components -> generate software libraries -> compile to binary ELF code) converge: the code is inserted into the bitstream, which is then downloaded and debugged.]

Fig.7. System Design Flow.
4.2. Hardware Design Flow
The hardware system is defined at the component level with a Xilinx proprietary language in an .MHS file. Basically, it lists all components of the system, their parameters and their interconnections. A component can be a processor, a bus, a memory controller, a memory block, a peripheral or a custom hardware component. In EDK, Xilinx provides libraries for the Microblaze processor as well as a rich set of buses, memories and peripherals; in most cases, these are enough to build a system. Most of them are provided as a netlist with a wrapper. Connections can be defined at both the bus level and the port level. At the bus level, a group of signals is connected together; this is always preferable when possible. At the port level, signals are connected one by one; every connection is called a port and is given a port name. For all memory components and memory-mapped peripherals, it is necessary to specify an address range. The next step is synthesis: all components, both Xilinx-provided and customized, are synthesized together to generate a netlist for the whole system. Afterwards, the designer can run the implementation and generate a bitstream consisting of the hardware configuration. A few more files are generated after synthesis, for instance a memory mapping file; they are used later in the software flow and the system flow.
It is also practical to extend EDK with customized hardware components. To define a new component, the designer needs to specify the interface as well as the component entity; EDK provides a tool to generate the component template and the bus interface. Besides editing the MHS file manually, there is a GUI, called "Base System Generator", to generate the MHS file for a simple system.
4.3. Software Design Flow
The software is defined in a similar way. At the top level, components are specified. Designers can also specify a bootstrap, operating system, file system, network stack, drivers and a board support package if necessary. If some components are not provided by Xilinx, it is the designer's responsibility to write them; normally they are no longer written as components, as in the hardware flow, but can be part of the application code.
In the EDK package, Xilinx provides an alternative way to develop software with Eclipse, initiated by IBM. Eclipse is becoming more and more popular and is, in some respects, an industry-standard development environment. The Eclipse tool in EDK has already been customized for the Microblaze processor and the PowerPC and is ready to use. The compiler and linker in EDK are a customized version of the gcc tool chain; all gcc tools are available with an mb- prefix. In some cases, especially in a multiprocessor system, it is necessary to specify linker scripts to define the heap and stack sizes and the mapping of the different components.
4.4. Debugging
After downloading the bitstream to the FPGA board, debugging starts. It is important and usually takes most of the design time. There are three ways of debugging: hardware debugging, software debugging and co-debugging.
For software debugging, Xilinx provides a customized tool based on GNU gdb. To debug, simply start XMD, a backend server for gdb; after it connects to the on-chip processor via JTAG, start gdb. You then have full control of the processor. A customized version of Insight, a graphical shell for gdb, is also available; however, the mechanism is the same. To use gdb, it is necessary to enable the hardware debug module of the Microblaze processor. The debug module is connected to the JTAG interface of the FPGA and, finally, to XMD.
5. Result of FPGA Implementation
The software JPEG encoder is designed around the Xilinx Microblaze processor with customized hardware accelerators. It is expected to achieve high flexibility and low complexity at little cost in size and performance.
Table 1. Device Utilization (Virtex-II Pro 30) for the software JPEG encoder

Logic Utilization          Used    Available   Utilization
Number of 4-input LUTs     2,049   27,392      7%
Number of Block RAMs       64      136         47%
Number of MULT18X18s       3       136         2%
6. Conclusion
In this paper, we presented an implementation of the JPEG image compression algorithm on an FPGA [14]. The architecture of the application uses the hardware resources of the Virtex-II Pro FPGA. The primary concerns when implementing this logic in an FPGA are the availability of logic blocks and the availability of multipliers. The available embedded multipliers can be used in several of the logic blocks; this requires that the arithmetic be converted to fixed point. Also, while custom logic blocks would be the optimal implementation, the availability of embedded microprocessors provides a number of alternatives.
In future work, there are multiple levels of parallelism which can be exploited; the
individual tasks that make up the overall algorithm can be executed in a pipelined
fashion, and several tasks can also be pipelined internally (the DCT, the quantization,
the down sampler, and the color space converter could all be enhanced to some degree
by parallel computation). The limiting factor is the speed at which the DCT transform
can be performed. Given sufficient hardware resources, this could be optimized by
using multiple transform processors executing in parallel.
References
1. Rafael C. Gonzalez, Richard E. Woods: Digital Image Processing. Pearson Education, 2nd Edition, 2004.
2. N. Jayant, J. Johnston, and R. Safranek: Signal Compression Based on Models of Human Perception. Proceedings of the IEEE, pp. 1385-1422, 2002.
3. Pennebaker and Mitchell. JPEG Still Image Data Compression Standard. Copyright by Van
Nostrand Reinhold, 1993
4. Paul Bourke: YCC Color Space and Image Compression. 2000.
5. N. Ahmed, T. Natarajan, and K. R. Rao: Discrete Cosine Transform. IEEE Transactions on Computers, vol. C-23, Jan. 1974.
6. Sun, Sung-Hsien, and Shie-Jue Lee: A JPEG Chip for Image Compression and Decompression. Journal of VLSI Signal Processing, Vol. 35, pp. 43-60, 2003.
7. Wallace, Gregory K.: The JPEG Still Picture Compression Standard. Communications of the ACM, Vol. 34, No. 4, pp. 31-44, April 1991.
8. Effelsberg, W., Steinmetz, R.: Video Compression Techniques. dpunkt Verlag für digitale Technologie GmbH, 1998.
9. Xilinx Inc.: Microblaze Microcontroller Reference Design User Guide. Sep 2007.
10. Xilinx Inc.: Xilinx XUP Virtex-II Pro Development System. Hardware Reference Manual,
March, 2007.
11. Xilinx Inc.: Embedded System Tools. Reference User Guide. Sep 2007.
12. Joris van Emden, Marcel Lauwerijssen, Sun Wei, Cristina Tena: Embedded JPEG Codec Library. 2007.
13. H-P. Rosinger, “Connecting Customized IP to the MicroBlaze Soft Processor Using the
Fast Simplex Link (FSL) Channel,” XAPP592.
14. Bilel Chmissi, Mohamed Nidhal Krifa, Abdessalem Ben Abdelali, Abdellatif Mtibaa: "Rapport Interne" (internal report), National Engineering School of Monastir, CSR group, University of Monastir, 2010.