Illustration of the inverse intensity-chromaticity space (blue color channel). Left: synthetic image (violet and green balls). Right: specular pixels converge towards the blue portion of the illuminant color (recovered at the -axis intercept). Highly specular pixels are shown in red.

Source publication

Exposing Digital Image Forgeries by Illumination Color Classification

Article

Full-text available

Jul 2013

For decades, photographs have been used to document space-time events and they have often served as evidence in courts. Although photographers are able to create composites of analog pictures, this process is very time consuming and requires expert knowledge. Today, however, powerful digital image editing software makes image modifications straight...

Context 1

... the integral is computed over all pixels in the image, where denotes a particular position (pixel coordinate). Furthermore, denotes a scaling factor, the absolute value, the differential operator, and the observed intensities at position , smoothed with a Gaussian kernel . Note that can be computed separately for each color channel. Compared to the original gray world algorithm, the derivative operator increases the robustness against homogeneously colored regions of varying sizes. Additionally, the Minkowski norm emphasizes strong derivatives over weaker derivatives, so that specular edges are better exploited [27]. 2) Inverse Intensity-Chromaticity Estimates: The second illuminant estimator we consider in this paper is the so-called inverse intensity-chromaticity (IIC) space. It was originally proposed by Tan et al. [28]. In contrast to the previous approach, the observed image intensities are assumed to exhibit a mixture of diffuse and specular re fl ectance. Pure specularities are assumed to consist of only the color of the illuminant. Let (as above) be a column vector of the observed RGB colors of a pixel. Then, using the same notation as for the generalized gray world model, is modelled as (3) Let be the intensity and be the chromaticity (i.e., normalized RGB-value) of a color channel at position , respectively. In addition, let be the chromaticity of the illuminant in channel . Then, after a somewhat laborious calculation, Tan et al. [28] derived a linear relationship between , and by showing that (4) Here, mainly captures geometric in fl uences, i.e., light position, surface orientation and camera position. Although can not be analytically computed, an approximate solution is feasible. More importantly, the only aspect of interest in illuminant color estimation is the -intercept . This can be directly estimated by analyzing the distribution of pixels in IIC space. The IIC space is a per-channel 2-D space, where the horizontal axis is the inverse of the sum of the chromaticities per pixel, , and the vertical axis is the pixel chromaticity for that particular channel. Per color channel , the pixels within a superpixel are projected onto inverse intensity-chromaticity (IIC) space. Fig. 5 depicts an exemplary IIC diagram for the blue channel. A synthetic image is rendered (left) and projected onto IIC space (right). Pixels from the green and purple balls form two clusters. The clusters have spikes that point towards the same location at the -axis. Considering only such spikes from each cluster, the illuminant chromaticity is estimated from the joint -axis intercept of all spikes in IIC space [28]. In natural images, noise dominates the IIC diagrams. Riess and Angelopoulou [2] proposed to compute these estimates over a large number of small image patches. The fi nal illuminant estimate is computed by a majority vote of these estimates. Prior to the voting, two constraints are imposed on a patch to improve noise resilience. If a patch does not satisfy these constraints, it is excluded from voting. In practice, these constraints are straightforward to compute. The pixel colors of a patch are projected onto IIC space. Prin- cipal component analysis on the distribution of the patch-pixels in IIC space yields two eigenvalues , and their associated eigenvectors and . Let be the larger eigenvalue. Then, is the principal axis of the pixel distribution in IIC space. In the two-dimensional IIC-space, the principal axis can be in- terpreted as a line whose slope can be directly computed from . Additionally, and can be used to compute the eccentricity as a metric for the shape of the distribution. Both constraints are associated with this eigenanalysis 4 . The fi rst constraint is that the slope must exceed a minimum of 0.003. The second constraint is that the eccentricity has to exceed a minimum of 0.2. We require bounding boxes around all faces in an image that should be part of the investigation. For obtaining the bounding boxes, we could in principle use an automated algorithm, e.g., the one by Schwartz et al. [30]. However, we prefer a human operator for this task for two main reasons: a) this minimizes false detections or missed faces; b) scene context is important when judging the lighting situation. For instance, consider an image where all persons of interest are illuminated by fl ashlight. The illuminants are expected to agree with one another. Conversely, assume that a person in the foreground is illuminated by fl ashlight, and a person in the background is illuminated by ambient light. Then, a difference in the color of the illuminants is expected. Such differences are hard to distinguish in a fully-auto- mated manner, but can be easily excluded in manual annotation. We illustrate this setup in Fig. 6. The faces in Fig. 6(a) can be assumed to be exposed to the same illuminant. As Fig. 6(b) shows, the corresponding gray world illuminant map for these two faces also has similar values. We use the Statistical Analysis of Structural Information (SASI) descriptor by Carkacioglu and Yarman-Vural [31] to extract texture information from illuminant maps. Recently, Penatti et al. [32] pointed out that SASI performs remarkably well. For our application, the most important advantage of SASI is its capability of capturing small granularities and discontinuities in texture patterns. Distinct illuminant colors in- teract differently with the underlying surfaces, thus generating distinct illumination “texture”. This can be a very fi ne texture, whose subtleties are best captured by SASI. SASI is a generic descriptor that measures the structural properties of textures. It is based on the autocorrelation of horizontal, vertical and diagonal pixel lines over an image at different scales. Instead of computing the autocorrelation for every possible shift, only a small number of shifts is considered. One autocorrelation is computed using a speci fi c fi xed orientation, scale, and shift. Computing the mean and standard deviation of all such pixel values yields two feature dimensions. Repeating this computation for varying orientations, scales and shifts yields a 128-dimensional feature vector. As a fi nal step, this vector is normalized by subtracting its mean value, and dividing it by its standard deviation. For details, please refer to [31]. Differing illuminant estimates in neighboring segments can lead to discontinuities in the illuminant map. Dissimilar illuminant estimates can occur for a number of reasons: changing geometry, changing material, noise, retouching or changes in the incident light. Thus, one can interpret an illuminant estimate as a low-level descriptor of the underlying image statistics. We observed that the edges, e.g., computed by a Canny edge detector, detect in several cases a combination of the segment borders and isophotes (i.e., areas of similar incident light in the image). When an image is spliced, the statistics of these edges is likely to differ from original images. To characterize such edge discontinuities, we propose a new feature descriptor called HOGedge . It is based on the well-known HOG-descriptor, and computes visual dictionaries of gradient intensities in edge points. The full algorithm is described in the remainder of this section. Fig. 7 shows an algorithmic overview of the method. We fi rst extract approximately equally distributed candidate points on the edges of illuminant maps. At these points, HOG descriptors are computed. These descriptors are summarized in a visual words dictionary. Each of these steps is presented in greater detail in the next subsections. Extraction of Edge Points: Given a face region from an illuminant map, we fi rst extract edge points using the Canny edge detector [33]. This yields a large number of spatially close edge points. To reduce the number of points, we fi lter the Canny output using the following rule: starting from a seed point, we eliminate all other edge pixels in a region of interest (ROI) cen- tered around the seed point. The edge points that are closest to the ROI (but outside of it) are chosen as seed points for the next iteration. By iterating this process over the entire image, we reduce the number of points but still ensure that every face has a comparable density of points. Fig. 8 depicts an example of the resulting points. Point Description: We compute Histograms of Oriented Gradients (HOG) [34] to describe the distribution of the selected edge points. HOG is based on normalized local histograms of image gradient orientations in a dense grid. The HOG descriptor is constructed around each of the edge points. The neighbor- hood of such an edge point is called a cell. Each cell provides a local 1-D histogram of quantized gradient directions using all cell pixels. To construct the feature vector, the histograms of all cells within a spatially larger region are combined and con- trast-normalized. We use the HOG output as a feature vector for the subsequent steps. Visual Vocabulary: The number of extracted HOG vectors varies depending on the size and structure of the face under examination. We use visual dictionaries [35] to obtain feature vectors of fi xed length. Visual dictionaries constitute a robust representation, where each face is treated as a set of region descriptors. The spatial location of each region is discarded [36]. To construct our visual dictionary, we subdivide the training data into feature vectors from original and doctored images. Each group is clustered in clusters using the -means algorithm [37]. Then, a visual dictionary with visual words is constructed, where each word is represented by a cluster center. Thus, the visual dictionary summarizes the most representative feature vectors of the training set. Algorithm 1 shows the pseudocode for the dictionary ...

View in full-text

Securing Visual Integrity: Machine learning approaches for forged image detection

Article

Full-text available

May 2024

Image forgery detection is a critical area of digital forensics, attempting to discover manipulated regions within images to assure their authenticity and integrity. This study investigates the use of machine learning techniques, particularly the Convolutional Neural Networks for image fraud detection. The suggested method involves training classifier to distinguish between original and counterfeit images using extracted features or patches. An image dataset is divided into training and testing sets in this study to facilitate CNN training on patches corresponding to original images. The accuracy of the trained model in identifying phony regions is then evaluated using an additional test set. To measure the effectiveness of the CNN-based forgery detection system, evaluation criteria such as accuracy, precision and recall are used. Proposed system achieves 99.15% accuracy with VGG16 network with tuned parameters.

CFA-Based Splicing Forgery Localization Method via Statistical Analysis

Article

Full-text available

Apr 2024
IET SIGNAL PROCESS

The color filter array of the camera is an effective fingerprint for digital forensics. Most previous color filter array (CFA)-based forgery localization methods perform under the assumption that the interpolation algorithm is linear. However, interpolation algorithms commonly used in digital cameras are nonlinear, and their coefficients vary with content to enhance edge information. To avoid the impact of this impractical assumption, a CFA-based forgery localization method independent of linear assumption is proposed. The probability of an interpolated pixel value falling within the range of its neighboring acquired pixel values is computed. This probability serves as a means of discerning the presence and absence of CFA artifacts, as well as distinguishing between various interpolation techniques. Subsequently, curvature is employed in the analysis to select suitable features for generating the tampering probability map. Experimental results on the Columbia and Korus datasets indicate that the proposed method outperforms the state-of-the-art methods and is also more robust to various attacks, such as noise addition, Gaussian filtering, and JPEG compression with a quality factor of 90.

Effective Image Tampering Localization Via Enhanced Transformer and Co-Attention Fusion

Conference Paper

Full-text available

Mar 2024

Powerful manipulation techniques have made digital image forgeries be easily created and widespread without leaving visual anomalies. The blind localization of tampered regions becomes quite significant for image forensics. In this paper, we propose an effective image tampering localization network (EITLNet) based on a two-branch enhanced transformer encoder with attention-based feature fusion. Specifically, a feature enhancement module is deployed to enhance the feature representation ability of the transformer encoder. The features extracted from RGB and noise streams are fused effectively by the coordinate attention-based fusion module at multiple scales. Extensive experimental results verify that the proposed scheme achieves the state-of-the-art generalization ability and robustness in various benchmark datasets. Code is public at https://github.com/multimediaFor/EITLNet.

UFCC: A Unified Forensic Approach to Locating Tampered Areas in Still Images and Detecting Deepfake Videos by Evaluating Content Consistency

Article

Full-text available

Feb 2024

Image inpainting and Deepfake techniques have the potential to drastically alter the meaning of visual content, posing a serious threat to the integrity of both images and videos. Addressing this challenge requires the development of effective methods to verify the authenticity of investigated visual data. This research introduces UFCC (Unified Forensic Scheme by Content Consistency), a novel forensic approach based on deep learning. UFCC can identify tampered areas in images and detect Deepfake videos by examining content consistency, assuming that manipulations can create dissimilarity between tampered and intact portions of visual data. The term “Unified” signifies that the same methodology is applicable to both still images and videos. Recognizing the challenge of collecting a diverse dataset for supervised learning due to various tampering methods, we overcome this limitation by incorporating information from original or unaltered content in the training process rather than relying solely on tampered data. A neural network for feature extraction is trained to classify imagery patches, and a Siamese network measures the similarity between pairs of patches. For still images, tampered areas are identified as patches that deviate from the majority of the investigated image. In the case of Deepfake video detection, the proposed scheme involves locating facial regions and determining authenticity by comparing facial region similarity across consecutive frames. Extensive testing is conducted on publicly available image forensic datasets and Deepfake datasets with various manipulation operations. The experimental results highlight the superior accuracy and stability of the UFCC scheme compared to existing methods.

RIFD-Net: A Robust Image Forgery Detection Network

Article

Full-text available

Jan 2024

Image splicing forensic technologies reveal manipulations that add or remove objects from images. However, the performance of existing splicing forensic methods is fatally degraded when detecting noisy images, as they often ignore the influence of image noise. In this paper, we propose a new forgery detection network called the robust image forgery detection network (RIFD-Net) based on convolutional neural networks (CNNs). With the help of multi-classifiers and a denoising network, RIFD-Net can effectively filter out multiple types of image noise before forgery detection. To determine the extent of tampering, we follow the Siamese network to calculate the similarity between two image patches, without prior knowledge of forensic traces. Results from extensive experiments on benchmark datasets indicate that our method outperforms existing image splicing forensic methods, achieving a substantial improvement of over 20% in the mean average precision (mAP) for forgery detection. Furthermore, RIFD-Net accurately locates splice areas, even in the presence of noise.

High-Performance Image Splicing Detection utilizing Image Augmentation and Deep Learning

Conference Paper

Dec 2023

Nowadays, the hassle-free availability of innumerable, easy-to-use image editing software makes image manipulation widespread, and these forged images can be used for various malicious activities. One of the most common methods of digital image deception is image splicing, where multiple portions from different images are combined to make the spliced image. Image splicing can be detected based on machine learning or deep learning techniques. Deep learning-based methods usually provide better performance but with extremely large training data and training time needed, which increases the cost of the model because of its structural complexity. Hence, in this research, we present a Deep Convolution Neural network-based model where a pre-trained network, ResNet50, replaces the initial convolution layers, followed by adding and modifying multiple layers. We have implemented image augmentation to make the dataset diverse for appropriate training of our model. Results from experiments show that our simplified approach, which does not require large training data and time, can accurately differentiate between the spliced images and the authentic images with the best accuracy of 1 and average accuracy of 0.99 for the DSO-1 dataset and best accuracy of 0.96 and average accuracy of 0.95 for the Columbia dataset.

End-to-end image splicing localization based on multi-scale features and residual refinement module

Article

Nov 2023
J ELECTRON IMAGING

Progressive Feedback-Enhanced Transformer for Image Forgery Localization

Preprint

Full-text available

Nov 2023

Blind detection of the forged regions in digital images is an effective authentication means to counter the malicious use of local image editing techniques. Existing encoder-decoder forensic networks overlook the fact that detecting complex and subtle tampered regions typically requires more feedback information. In this paper, we propose a Progressive FeedbACk-enhanced Transformer (ProFact) network to achieve coarse-to-fine image forgery localization. Specifically, the coarse localization map generated by an initial branch network is adaptively fed back to the early transformer encoder layers for enhancing the representation of positive features while suppressing interference factors. The cascaded transformer network, combined with a contextual spatial pyramid module, is designed to refine discrim-inative forensic features for improving the forgery localization accuracy and reliability. Furthermore, we present an effective strategy to automatically generate large-scale forged image samples close to real-world forensic scenarios, especially in realistic and coherent processing. Leveraging on such samples, a progressive and cost-effective two-stage training protocol is applied to the ProFact network. The extensive experimental results on nine public forensic datasets show that our proposed localizer greatly outperforms the state-of-the-art on the generalization ability and robustness of image forgery localization. Code will be publicly available at https://github.com/multimediaFor/ProFact.

Effective Image Tampering Localization with Multi-Scale ConvNeXt Feature Fusion

Preprint

Full-text available

Nov 2023
J VIS COMMUN IMAGE R

With the widespread use of powerful image editing tools, image tampering becomes easy and realistic. Existing image forensic methods still face challenges of low generalization performance and robustness. In this letter, we propose an effective image tampering localization scheme based on ConvNeXt encoder and multi-scale Feature Fusion (ConvNeXtFF). Stacked ConvNeXt blocks are utilized as an encoder to capture hierarchical multi-scale features, which are then fused in decoder for locating tampered pixels accurately. Combined loss function and effective data augmentation strategies are adopted to further improve the model performance. Extensive experimental results show that both localization accuracy and robustness of the ConvNeXtFF scheme outperform other state-of-the-art ones. The source code is available at https://github.com/ multimediaFor/ConvNeXtFF.

An object-based splicing forgery detection using multiple noise features

Article

Full-text available

Sep 2023
MULTIMED TOOLS APPL

In our modern age, everything is accessible from anywhere to share thoughts and monuments with loved ones via social networking. On the other hand, different photo editing tools manipulate images and videos and allow an incredible opportunity to challenge the intended audience. When altered images go viral on social media, people may lose confidence, faith and integrity on the shared images. Thus necessitating a digital, trustworthy forensic technique to authenticate such images. This paper presents a novel feature extraction approach for detecting a tampered region. Individual objects are retrieved from the spliced image, and noise standard deviation is evaluated for each object in three different domains. The noise deviation features are then obtained based on pair-wise deviation using cosine similarity between individual objects. These features are fused using logistic regression to obtain a fake regression score that reveals the tampering region of a spliced image. The experimental findings suggest that the features and approach are superior and robust to state-of-the-art methods in detecting the tampered region.

Illustration of the inverse intensity-chromaticity space (blue color channel). Left: synthetic image (violet and green balls). Right: specular pixels converge towards the blue portion of the illuminant color (recovered at the -axis intercept). Highly specular pixels are shown in red.

Context in source publication

Citations