Using JPEG quantization tables to identify imagery
processed by software
Jesse D. Kornblum
Defense Cyber Crime Institute, United States
Keywords:
JPEG
Quantization
Digital ballistics
Calvin
Image authentication
abstract
The quantization tables used for JPEG compression can also be used to help separate images that have been processed by software from those that have not. This loose classification is sufficient to greatly reduce the number of images an examiner must consider during an investigation. As illicit imagery prosecutions depend on the authenticity of the images involved, this capability is an advantage for forensic examiners. This paper explains how quantization tables work, how they can be used for image source identification, and the implications for computer forensics.
© 2008 Digital Forensic Research Workshop. Published by Elsevier Ltd. All rights reserved.
1. Introduction
Illicit imagery cases became more difficult to prosecute in the
United States in the wake of the Ashcroft v. Free Speech Coalition
Supreme Court decision in 2002 (Supreme Court of the United
States, 2002). In the court’s opinion, the prosecution must
prove that a real child was harmed by the illicit imagery in
order to get a conviction. As such, forensic examiners have
been increasingly asked to prove that suspected illicit imagery
contains real victims and not computer generated people. On
the other hand, the prosecution still only needs to introduce
a handful of such images in order to secure a conviction.
One of the more compelling arguments for proving the
authenticity of a pictured individual is to show that the image
came from a camera and has not been edited. Although a camera could conceivably be used to capture an image of an
artificial person (e.g. photographing a computer screen),
such an image would hopefully be obviously identifiable.
The science of performing digital ballistics, or matching an
image to the individual device that created it, would be an ideal
method for finding real pictures to use in legal proceedings. Unfortunately such identifications are not easy. Instead this paper demonstrates how an examiner can match an image back to the
type of device that last modified it, either hardware or software.
This paper gives a brief overview of JPEG compression and
pays particular attention to the quantization tables used in
that process. Those tables control how much information is
lost during the compression process. The author has categorized the types of tables, and the implications for digital ballistics are discussed. In particular, by eliminating those images
that were most likely last processed by a computer program,
the examiner is left with fewer images to consider for the
remainder of the investigation.
2. JPEG compression
This paper focuses exclusively on images stored in the JPEG File Interchange Format (JFIF) (Wallace, 1992), a method for storing data compressed with the JPEG standard (Joint Photographic Experts Group, 1991; Wallace, 1991). The JFIF is the
most commonly used format for JPEG data. Throughout the
paper, any reference to a JPEG or JPEG file refers to JFIF encoded
data.
A JPEG compressed image takes up considerably less space
than an uncompressed image. Whereas an uncompressed 640 × 480 pixel 24-bit color image would require 900 kB, a JPEG version of the same image can be compressed to
a mere 150 kB. Converting an image into a JPEG is a six-step process that, as a whole, is beyond the scope of this paper (Joint Photographic Experts Group, 1991; Wallace, 1991). First the image is converted from the RGB color space into the YCbCr space, which separates the luminance (brightness) of each pixel from its chrominance (color). Next the image is downsampled, split into blocks of 8 × 8 pixels, and a discrete cosine transform is applied. The next stage is quantization, where the lossy compression occurs and where this paper is focused. Finally an entropy coding (lossless compression) is applied and the image is said to be JPEG compressed.
In the quantization stage, the image creation device must use a table of values known as the quantization table. Each table has 64 values that range from 0 to 65,535 (in practice these values are usually between 0 and 255; some programmers chose to represent them as 8-bit values, but to be correct 16-bit values should be used). A lower number means that less data will be discarded in the compression and a higher quality image should result.
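To make the table's role concrete, the short sketch below (Python, with made-up coefficient values) quantizes a single 8 × 8 block of DCT coefficients the way a baseline JPEG encoder does: each coefficient is divided by the corresponding table entry and rounded, which is the step where information is permanently discarded.

```python
# Illustrative sketch of the quantization step for one 8x8 block of DCT
# coefficients. The coefficient values and the flat table below are made up;
# only the divide-and-round operation itself reflects the JPEG standard.

def quantize_block(dct_block, table):
    """Quantize 64 DCT coefficients with a 64-entry quantization table."""
    return [round(c / q) for c, q in zip(dct_block, table)]

def dequantize_block(quantized, table):
    """Decoder-side inverse; the rounding error is never recovered."""
    return [c * q for c, q in zip(quantized, table)]

if __name__ == "__main__":
    dct = [-415, -33, -58, 35, 58, -51, -15, -12] + [0] * 56   # made-up block
    table = [16] * 64                                          # flat table, for illustration only
    q = quantize_block(dct, table)
    print(q[:8])                             # [-26, -2, -4, 2, 4, -3, -1, -1]
    print(dequantize_block(q, table)[:8])    # coefficients snapped to multiples of 16
```

Larger table entries coarsen the rounding, which is why lower values mean less data is discarded.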
Each image has between one and four quantization tables.
The most commonly used quantization tables are those published by the Independent JPEG Group (IJG) in 1998 and shown in Fig. 1 (libjpeg, 1998). These tables can be scaled to a quality
factor Q. The quality factor allows the image creation device to
choose between larger, higher quality images and smaller,
lower quality images.
The value of Q can range between 0 and 100 and is used to compute the scaling factor, S, as shown in Eq. (1). Each element i in the scaled table T_s is computed using the ith element in the base table T_b as shown in Eq. (2). All of these computations are done in integer math; there are no decimals (hence the floor function in the equation). Any value of T_s that computes to zero is set to one.
For example, we can scale the IJG standard table using Q = 80 by applying Eq. (2) to each element in the table. The resulting values are the scaled quantization tables and are shown in Fig. 2. Note that the numbers in this table are lower than in the standard table, indicating an image compressed with these tables will be of higher quality than ones compressed with the standard table. It should be noted that scaling with Q = 50 does not change the table.
S = (Q < 50) ? 5000 / Q : 200 − 2Q                           (1)

T_s[i] = floor( (S × T_b[i] + 50) / 100 )                    (2)
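The sketch below renders Eqs. (1) and (2) in Python, using the IJG luminance and chrominance base tables that ship with libjpeg. It is an illustrative reimplementation rather than the libjpeg source; running it reproduces the Q = 80 scaling discussed above and confirms that Q = 50 leaves the tables unchanged.

```python
# Sketch of the IJG quality scaling described by Eqs. (1) and (2).
# The two base tables are the standard luminance and chrominance tables
# published with libjpeg; everything else here is illustrative.

IJG_LUMA = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]

IJG_CHROMA = [
    17, 18, 24, 47, 99, 99, 99, 99,
    18, 21, 26, 66, 99, 99, 99, 99,
    24, 26, 56, 99, 99, 99, 99, 99,
    47, 66, 99, 99, 99, 99, 99, 99,
    99, 99, 99, 99, 99, 99, 99, 99,
    99, 99, 99, 99, 99, 99, 99, 99,
    99, 99, 99, 99, 99, 99, 99, 99,
    99, 99, 99, 99, 99, 99, 99, 99,
]

def scaling_factor(quality):
    """Eq. (1): map a quality factor Q (1-99) to the scaling factor S."""
    return 5000 // quality if quality < 50 else 200 - 2 * quality

def scale_table(base, quality):
    """Eq. (2): integer-scale each base table entry; zero entries become one."""
    s = scaling_factor(quality)
    return [max(1, (s * t + 50) // 100) for t in base]

if __name__ == "__main__":
    # Q = 80 gives S = 40, so every entry shrinks to roughly 40% of its base
    # value; Q = 50 gives S = 100, which leaves the table unchanged.
    print(scale_table(IJG_LUMA, 80)[:8])   # [6, 4, 4, 6, 10, 16, 20, 24]
    assert scale_table(IJG_LUMA, 50) == IJG_LUMA
```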
3. Related work
Using quantization tables to identify the origin of digital images is only one method of conducting digital ballistics. A
good summary of other methods can be found in Sencar and
Memon (2008). The idea of using JPEG quantization tables for
digital ballistics was first proposed by Farid (2006). In that report he showed that the quantization tables from 204 images,
one per camera at the device’s highest quality setting, were for
the most part different from each other. When overlaps did
occur, they were generally among cameras from the same
manufacturer. He also demonstrated that the tables used by
the digital cameras were different from those used by Adobe
Photoshop.
Chandra and Ellis (1999) did some work to determine the
base quantization table used to compute the scaled tables found in an existing JPEG image, but they focused on determining an equivalence to the IJG tables, not divining the true
tables used. Several papers have been written on recovering
the quantization table from previously compressed images
(Fan and de Queiroz, 2000, 2003; Neelamani et al., 2006) or
doubly compressed images (Lukas and Fridrich, 2003). Many
papers and patents have been written regarding quantization
table design such as Beretta et al. (1999), Costa and Veiga
(2005), Onnasch and Ploger (1994), Wang et al. (2001), and
Watson (1993).
4. Classifying JPEGs
For this paper the author examined several thousand images
from a wide variety of image creation devices and programs.
These devices include a Motorola KRZR K1m, a Canon PowerShot 540, a FujiFilm Finepix A200, a Konica Minolta Dimage Xg, and a Nikon Coolpix 7900. The author also examined the images produced by a number of software programs such as libjpeg (libjpeg, 1998), Microsoft Paint, the Gimp (GIMP Team, 2007), Adobe Photoshop (Adobe Systems Incorporated, 2007), and Irfanview (Irfan, 2007). The author also studied images from the camera review web site Digital Photography Review (Askey.Net Consulting Limited, 2007).
Fig. 1. Standard JPEG quantization tables.
Fig. 2. Standard JPEG quantization tables scaled with Q = 80.
The author immediately noted that although some devices
always used the same quantization tables, the majority of
them used a different set of quantization tables in each image.
A further examination of the images allowed the author to
classify images into categories: standard tables, extended
standard tables, custom fixed tables, and custom adaptive
tables.
4.1. Standard tables
Images in this category use scaled versions of the quantization tables published in the Independent JPEG Group (IJG) standard (libjpeg, 1998), shown in Fig. 1. Because many cameras and programs use these tables, there is no way to determine, based on tables alone, which program or camera created an
image.
The base tables shown in Fig. 1 can be scaled using Q = {1, 2, …, 99} to create 99 separate tables. Scaling with Q = 0 would produce grossly unusable images. Scaling with Q = 100 would produce a quantization table filled with all ones. Such a table would be indistinguishable from any other base table scaled with Q = 100.
Any image using one of the 99 tables defined above is said to be using the standard tables. It is certainly possible that another method could be used to generate one of these tables, but there is no way to distinguish that from the image alone. For example, the author found that some images from particular devices matched an IJG table but others did not. It cannot be determined if the devices are using the IJG tables for some images and not others, or whether the devices are using another method that occasionally produces the same tables as the IJG method.
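Testing for standard tables therefore reduces to comparing an image's table pair against the 99 possible scalings. The sketch below assumes the IJG base tables (IJG_LUMA and IJG_CHROMA in the earlier sketch) are supplied as arguments; the function names are illustrative.

```python
# Sketch of the "standard tables" test: a luminance/chrominance pair is
# standard if some Q in 1..99 scales the IJG base tables to exactly that pair.

def scaling_factor(quality):
    """Eq. (1)."""
    return 5000 // quality if quality < 50 else 200 - 2 * quality

def scale_table(base, quality):
    """Eq. (2), with zero entries clamped to one."""
    s = scaling_factor(quality)
    return [max(1, (s * t + 50) // 100) for t in base]

def standard_quality(luma, chroma, base_luma, base_chroma):
    """Return the quality factor whose scaling matches the observed pair,
    or None if the pair does not match any of the 99 standard tables."""
    for q in range(1, 100):
        if luma == scale_table(base_luma, q) and chroma == scale_table(base_chroma, q):
            return q
    return None
```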
4.2. Extended standard tables
These images are a special case of the standard tables. They
use scaled versions of the IJG tables, but have three tables instead of the two in the standard. The third table is a duplicate
of the second. The same methodology used to identify the
standard tables can be used to identify extended standard
tables.
4.3. Custom fixed tables
Some programs have their own non-IJG quantization tables
that do not depend on the image being processed. For example, when Adobe Photoshop saves an image as a JPEG it allows
the user to select one of 12 quality settings (different settings
are used when saving images "for the web"). The quality
setting is used to select one of 12 sets of quantization tables
(Adobe Systems Incorporated, 2007).
Some devices use their own custom base quantization table with the IJG scaling method. Regardless, these tables depend only on a user-selected quality factor and the base quantization table. Unlike the images described in the next section, the image itself is not part of the equation.
One of the more frustrating aspects of these devices is that
there is no provable method for finding the original tables or
scaling method used by an image creation device. The examiner could reverse engineer the device in question, but doing
so is not often practical for an illicit imagery investigation.
Given an image and its quantization tables, it is possible to compute a base table, T'_b, for any assumed value of Q between 1 and 99 and the IJG scaling method. The equation for doing so is shown in Eq. (3). Note that the floor operation used in generating T_s in Eq. (2) means that the values obtained for T'_b may not be equal to the true values of T_b.

T'_b[i] = (100 × T_s[i] − 50) / S                            (3)
The examiner could check each value of T'_b by attempting to use it to scale up to the current value of T_s and the values of T_s for other images made by the same image creation device. If the computed value of T'_b is too small to generate the correct value of T_s, then T'_b must be manually increased until it is sufficient to create T_s when scaled. In the event that T'_b is too large for another image, the image was not created using the IJG scaling method and must be considered as a custom adaptive image described in Section 4.4.
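The estimation and checking procedure can be sketched as follows, assuming IJG-style scaling with a guessed quality factor; the helper names are illustrative. Because Eq. (2) floors its result, the value recovered by Eq. (3) is only a lower bound on the true base entry, which is why the adjustment loop only ever raises it.

```python
# Sketch of Eq. (3) plus the scale-up-and-adjust check described above.

def scaling_factor(quality):
    """Eq. (1)."""
    return 5000 // quality if quality < 50 else 200 - 2 * quality

def scale_entry(value, quality):
    """Eq. (2) applied to a single table entry."""
    return max(1, (scaling_factor(quality) * value + 50) // 100)

def candidate_base(scaled, quality):
    """Eq. (3): T'_b[i] = (100 * T_s[i] - 50) / S, in integer math."""
    s = scaling_factor(quality)
    return [(100 * t - 50) // s for t in scaled]

def fit_base(scaled, quality):
    """Raise each candidate entry until it reproduces the observed table.
    Returns None if an entry overshoots, i.e. the observed table is not
    consistent with IJG-style scaling from any base at this quality."""
    base = candidate_base(scaled, quality)
    for i, target in enumerate(scaled):
        while scale_entry(base[i], quality) < target:
            base[i] += 1
        if scale_entry(base[i], quality) != target:
            return None
    return base
```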
The problem with all of the above calculations, however, is that the examiner does not know the true value of Q used for any of the images from the device. The method described in Chandra and Ellis (1999) can be used to estimate a value for Q', or the quality factor used to scale the IJG standard table to the closest possible value of T_s. That estimate of Q provides a good starting point, but in the end we have a system with many possible solutions. The examiner cannot determine which solution is correct.
The author constructed a program that, given an initial image, accepted a value for Q' from the user. This value was used to compute the T'_b tables, which were then scaled to the values of Q from 1 to 99 and compared to other images generated by the same device. After each image, the tables were adjusted to fit the current image. If the tables no longer fit all of the images, the images were re-categorized as having custom adaptive tables as described in Section 4.4. In general the program was deemed impractical as it did not identify any plausible base tables.
4.4. Custom adaptive tables
These images do not conform to the IJG standard. In addition,
they may change, either in part or as a whole, between images
created by the same device using the same settings. They may
also have constants in the tables: values that do not change regardless of the quality setting or image being processed.
For example, the author examined 21 pictures captured
with a Fuji Finepix A200 camera. Of these pictures, eight
images had identical quantization tables; four pictures shared
one set of tables but had differences in the other two tables.
The remaining 13 images all had unique quantization tables.
What struck the author as odd, however, was that for all 21
images the first value in each of the three quantization tables
was four. The other values in the tables ranged from 1 to 24, a wide range. The author could not devise a set of base quantization tables that would include such a wide variation of values but keep one member of the tables constant. The author has hypothesized that this camera uses one constant value in each table but scales the remainder of them.
It should be noted that the camera's manufacturer, the Fuji Xerox Company Limited, holds several patents regarding 'image creation apparatuses.' These include at least one that describes creating custom quantization tables based on the image being processed (Yokose, 2005). In that particular patent, a base quantization table is modified depending not only on the standard scaling method, but also on the resolution of the original uncompressed image.
5. Using quantization tables for ballistics
An examiner can encounter hundreds of thousands of images
in the course of a single investigation. As noted above, it may
be difficult to prosecute an offender using images that have
been retouched by a computer. Given the large volume of images, it would benefit an examiner to only consider those images that could be used for prosecution, and thus only
consider images that have not been altered.
JPEG quantization tables can be used for digital ballistics to
identify and eliminate from consideration those images that
most likely have been altered by a computer. That is, those images whose quantization tables are the most likely to have
been generated by software can be eliminated from the
investigation.
This method may have some false positives, or images that
were not modified by a computer but are still eliminated from
the investigation. But given the scale of such investigations
and how few images are needed for a successful prosecution,
a few false positives are acceptable.
There will be some special cases, however, where the
quantization tables indicate the image was last modified by
software but it has other indicators that it originally came
from a camera. For example, the image could contain a complete set of EXIF data from a known camera or color signatures of real skin. In this case the examiner could use the quantization tables as part of a larger system to evaluate images.
Ideally, the examiner could use the JPEG quantization
tables to determine exactly what kind of device created each
image and categorize the images accordingly. The program
JPEGsnoop aims to do exactly this (Hass, 2008). The program comes with a database of tables that can be compared against input files. Unfortunately, however, JPEGsnoop assumes that each camera can use only one quantization table. The use of custom adaptive tables means that programs like JPEGsnoop would need to hold an unwieldy number of tables to be practical. Worse, some tables may be used by several devices, including both cameras and software programs, rendering the database inaccurate when attempting to determine an image's origin.
6. Calvin
The author has developed a software library called Calvin to help programmers use quantization tables for digital ballistics. The goal of Calvin is to identify those images that cannot be guaranteed to have been created by a real camera. For our purposes this means any image that could have been last processed by software. The program was named in honor of Dr. Calvin Goddard, the inventor of forensic ballistics (Federal Bureau of Investigation, 2003).
The Calvin library is able to display the quantization tables
from existing images and determine if a new table is in a set of
known tables. By default the library contains the standard tables, extended standard tables, and the tables used by Adobe Photoshop. Additional tables can be loaded from a configuration file. The library can be used to generate these configuration files from existing images.
6.1. Display mode
The user may wish to add more quantization tables to the set
used by Calvin. The program can extract and display the
quantization table for any image. For ease of use, the output
is presented in the library’s configuration file format. The
standard tables scaled with Q = 80 (shown earlier in Fig. 2) are shown as a configuration file entry in Fig. 3.
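Calvin's code and configuration format are not reproduced here, but the extraction step itself is easy to sketch independently: in a JFIF file the quantization tables live in DQT (0xFFDB) marker segments, which can be read straight out of the file. The stand-alone parser below is illustrative and is not Calvin's implementation or its output format.

```python
import struct
import sys

def extract_dqt(path):
    """Return {table_id: 64 values in zig-zag order} from a JFIF file by
    walking its marker segments and decoding every DQT (0xFFDB) segment."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:2] != b"\xff\xd8":                        # SOI marker
        raise ValueError("not a JPEG/JFIF file")
    tables = {}
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:                            # lost marker sync; give up
            break
        marker = data[i + 1]
        if marker == 0xFF:                             # fill byte
            i += 1
            continue
        if marker in (0xD9, 0xDA):                     # EOI or SOS: tables come before this
            break
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:   # stand-alone markers, no payload
            i += 2
            continue
        (length,) = struct.unpack(">H", data[i + 2:i + 4])
        if marker == 0xDB:                             # DQT segment
            seg = data[i + 4:i + 2 + length]
            j = 0
            while j < len(seg):
                precision, table_id = seg[j] >> 4, seg[j] & 0x0F
                j += 1
                if precision == 0:                     # 8-bit entries
                    values = list(seg[j:j + 64])
                    j += 64
                else:                                  # 16-bit entries
                    values = list(struct.unpack(">64H", seg[j:j + 128]))
                    j += 128
                tables[table_id] = values
        i += 2 + length
    return tables

if __name__ == "__main__":
    for table_id, values in sorted(extract_dqt(sys.argv[1]).items()):
        print("quantization table", table_id)
        for row in range(8):                           # printed in zig-zag (storage) order
            print(" ".join(f"{v:4d}" for v in values[8 * row:8 * row + 8]))
```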
6.2. Comparison mode
An examiner can also use the Calvin library to compare the
quantization table from an unknown image to the set of
known tables. The user presents the library with an unknown
file and is told whether or not the quantization table is
contained in the known set. Presuming that the set of known
signatures contains only the tables used by software programs, a negative response means that the image in question
was possibly created by a hardware device. It could have been
created by a program that uses tables not in the set of knowns.
Fig. 3 Sample Calvin configuration file entry.
Conversely, a positive response from the library means that
the image was most likely last modified by a program. It could
have been created by a hardware device that uses the same
tables as a known software program.
It is also possible that a real picture that has been processed by a software package, for example cropped, would get a positive response from Calvin. Such a picture would most likely have been recompressed with the quantization tables used by the program, not the original quantization tables that the hardware device wrote into the image.
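The comparison itself amounts to a set-membership test followed by careful interpretation of the result, as the caveats above describe. The sketch below is not Calvin's API: `tables` stands for the mapping produced by a DQT parser such as the one sketched in Section 6.1, and `known_software_tables` is a hypothetical collection of table signatures the examiner has gathered from software encoders (for example, the 99 scaled IJG tables plus Photoshop's fixed tables).

```python
# Rough sketch of the comparison-mode decision; names are illustrative.

def classify(tables, known_software_tables):
    """Triage an image by its quantization tables.

    tables: {table_id: 64 values} as extracted from the file.
    known_software_tables: set of flattened signatures from software encoders.
    """
    signature = tuple(tuple(tables[k]) for k in sorted(tables))
    if signature in known_software_tables:
        # Most likely last written by a program -- or by a hardware device
        # that happens to share its tables with one.
        return "likely processed by software"
    # Not in the known set: possibly straight from a camera, but it could be
    # a program whose tables are simply missing from the known set.
    return "possibly unmodified camera output"
```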
7. Conclusion
The author has demonstrated how JPEG quantization tables
can be used for digital ballistics to eliminate images that could
not be used in a prosecution for illicit imagery. The methodology is not perfect, but given the large number of available
images and the small number needed in court, it should be
sufficient. A more elegant solution, however, would be to
combine this kind of digital ballistic information with other
metadata from an image. Other factors, such as the presence
or absence of EXIF data, signatures of known programs, and
color signatures of real skin, could reduce the examiner’s
workload even more. In the meantime, however, using JPEG
quantization tables for digital ballistics is a big step forward
for examiners and should improve their productivity and
success.
Acknowledgments
The author would like to thank the people who provided both
pictures and support during this research: Rik Farrow, Joe
Lewthwaite, Brian Martin, Jennifer Reichwein, and Peiter
"Mudge" Zatko. Invaluable technical support was provided
by Robert J. Hansen. Extra special thanks to S–.
references
Adobe Systems Incorporated. Adobe photoshop. CS3 ed.; 2007.
Askey.Net Consulting Limited. Digital photography review,
<http://www.dpreview.com/>; June 2007.
Beretta Giordano, Bhaskaran Vasudev, Konstantinides
Konstantinos, Natarajan Balas K. US Patent 5,883,979: method
for selecting JPEG quantization tables for low bandwidth
applications; 1999.
Chandra Surendar, Ellis Carla Schlatter. JPEG compression metric
as a quality aware image transcoding. In: Proceedings of the
2nd USENIX symposium on internet technologies & systems;
1999. p. 81–92.
Costa LF, Veiga ACP. A design of JPEG quantization table using
genetic algorithms. In: Proceedings of the ACIT signal and
image processing; 2005.
Fan Zhigang, de Queiroz Ricardo. Maximum likelihood estimation
of JPEG quantization table in the identification of bitmap
compression history. IEEE Transactions on Image Processing
2000;1:948–51.
Fan Zhigang, de Queiroz Ricardo. Identification of bitmap
compression history: JPEG detection and quantizer estimation.
IEEE Transactions on Image Processing 2003;12(2).
Farid Hany. Digital image ballistics from JPEG quantization.
Technical Report TR2006-583, Department of Computer
Science, Dartmouth College; 2006.
Federal Bureau of Investigation. The birth of the FBI’s technical
laboratory, <http://www.fbi.gov/hq/lab/labdedication/
labstory.htm>; 2003.
The GIMP Team. GNU image manipulation program. 2.2.15 ed.,
<http://gimp.org/>; 2007.
Hass Calvin. JPEGsnoop. 1.2.0 ed., <http://www.
impulseadventure.com/photo/jpeg-snoop.html>; 2008.
Irfan Skiljan. IrfanView. 4.0 ed.; 2007.
Joint Photographic Experts Group. Information technology – digital compression and coding of continuous-tone still images: requirements and guidelines. ISO/IEC 10918-1:1994; 1991.
JPEG Group. libjpeg. 6b ed., <http://www.ijg.org/>; 1998.
Lukáš Jan, Fridrich Jessica. Estimation of primary quantization matrix in double compressed JPEG images. In: Proceedings of the 2003 digital forensic research workshop, SUNY Binghamton; 2003.
Microsoft Corporation. Microsoft Paint overview.
Neelamani Ramesh (Neelsh), de Queiroz Ricardo, Fan Zhigang, Dash Sanjeeb, Baraniuk Richard G. JPEG compression history estimation for color images. IEEE Transactions on Image Processing June 2006;15(6).
Onnasch Prause, Ploger. Quantization table design for JPEG
compression of angiocardiographic images. Computers in
Cardiology 1994.
Sencar Husrev T, Memon Nasir. Overview of state-of-the-art in
digital image forensics. Part of Indian Statistical Institute
Platinum Jubilee Monograph series ‘Statistical Science and
Interdisciplinary Research’; 2008.
Supreme Court of the United States. Ashcroft v. Free Speech Coalition; 2002. Case 00-795.
Wallace Gregory K. The JPEG still picture compression standard.
IEEE Transactions on Consumer Electronics 1991;38(1):18–34.
Wallace Gregory K. JPEG file interchange format. C-Cube
Microsystems September 1992.
Wang Ching-Yang, Lee Shiuh-Ming, Chang Long-Wen. Designing
JPEG quantization tables based on human visual system.
Signal Processing: Image Communication 2001;16(5):501–6.
Watson Andrew B. DCT quantization matrices visually optimized
for individual images. In: Proceedings of the society for optical
engineering; 1993. p. 202–16.
Yokose Taro. US Patent 6,968,090: image coding apparatus and
method; 2005.
Jesse D. Kornblum is a Research and Development Engineer
for the Defense Cyber Crime Institute. A contractor with the
ManTech International Corporation, his research focuses on
computer forensics and computer security. He has authored
and maintains a number of computer forensics tools including
foremost, md5deep and ssdeep. When choosing sodas,
Mr. Kornblum prefers cane sugar to high fructose corn syrup.