FIGURE 3 - uploaded by Farhan Ullah
Schematic diagram of the rotation consistency between 2D and 3D. The X, Y, and Z axes of the 3D camera coordinate system correspond to pitch, yaw, and roll, respectively. The 2D x-y image plane is parallel to the X-Y plane of the camera coordinate system. An image obtained by rotating δ around the Z axis followed by a projection operation is equivalent to one obtained by rotating δ directly on the 2D image.
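The equivalence stated in the caption can be checked numerically. The sketch below assumes an orthographic projection for simplicity; with a perspective camera the same identity holds, since a rotation about the Z axis leaves depth unchanged.

```python
import numpy as np

# Numerical check of the rotation consistency in the figure: rotating 3D
# points by delta around the camera Z axis and then projecting onto the
# image plane gives the same 2D points as projecting first and rotating
# by delta in the image plane (orthographic projection assumed).

delta = np.deg2rad(25.0)
c, s = np.cos(delta), np.sin(delta)

Rz = np.array([[c, -s, 0.0],
               [s,  c, 0.0],
               [0.0, 0.0, 1.0]])   # 3D roll: rotation about the Z axis
R2 = np.array([[c, -s],
               [s,  c]])           # the same rotation applied in 2D

P = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])    # orthographic projection onto x-y

X = np.random.default_rng(0).normal(size=(3, 10))  # random 3D points

path_a = P @ (Rz @ X)   # rotate in 3D, then project
path_b = R2 @ (P @ X)   # project, then rotate in 2D

assert np.allclose(path_a, path_b)
```

The identity follows because the projection discards only the Z coordinate, which a rotation about Z never touches.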


Source publication
Article
Full-text available
Image-based Head Pose Estimation (HPE) from an arbitrary view is still challenging due to complex imaging conditions as well as the intrinsic and extrinsic properties of faces. Different from existing HPE methods that combine additional cues or tasks, this paper solves the HPE problem by relieving problem complexity. Our method integrates the de...

Context in source publication

Context 1
... bins is used to represent the final estimation. Regarding loss functions, we argue that existing methods usually formulate loss functions independently of images. In this paper, we observe that the Euler angles of an input image pair are strongly related. Details of the proposed pairwise pose loss are described in Sec. III-D. Fig. 3 shows the schematic diagram of the rotation consistency between the 2D image space and the 3D camera space, indicated by x-y and X-Y-Z respectively. Note that the image plane x-y is parallel to the X-Y plane in camera coordinates and perpendicular to the Z axis. The orange path shows the 2D rotation pipeline, where the 2D ...

Citations

... In recent years, Convolutional Neural Networks (CNNs) have become one of the most widely utilized techniques for computer vision problems [28][29][30][31]. A CNN mainly comprises two parts: feature extraction and classification [32,33]. ...
Article
Over the last several years, olive cultivation has grown throughout the Mediterranean countries. Among them, Spain is the world's leading producer of olives. Due to its high economic significance, it is in the best interest of these countries to maintain the crop spread and its yield. Manual enumeration of trees over such extensive fields is impractical and humanly infeasible. Several methods are presented in the existing literature; nonetheless, selecting the optimal method is of great significance. In this paper, we propose an automated method of olive tree detection as well as crop estimation. The proposed approach is a two-step procedure that includes a deep learning-based classification model followed by regression-based crop estimation. During the classification phase, the foreground tree information is extracted using an enhanced segmentation approach, specifically the K-means clustering technique, followed by the synthesis of a super-feature vector comprised of statistical and geometric features. Subsequently, these extracted features are utilized to estimate the expected crop yield. Furthermore, the suggested method is validated using satellite images of olive fields obtained from Google Maps. In comparison with existing methods, the proposed method contributes in terms of novelty and accuracy, outperforming the rest with an overall classification accuracy of 98.1% and a yield estimate with a root mean squared error of 0.185.
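The segmentation step of the classification phase can be illustrated with a toy example. The sketch below is not the paper's pipeline: it runs a plain K-means (deterministically initialized) on the pixel colors of a synthetic 4x4 image and keeps the darker cluster as foreground; all names and values are illustrative.

```python
import numpy as np

# Illustrative sketch: separating "tree" pixels from background with
# K-means on pixel colors, the clustering step the classification
# phase builds on.

def kmeans(points, k, iters=20):
    # Deterministic init from the first k unique rows; assumes the data
    # has at least k distinct points (fine for this toy example).
    centers = np.unique(points, axis=0)[:k]
    for _ in range(iters):
        labels = np.argmin(((points[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([points[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

# Toy 4x4 "image": dark-green foreground vs light background colors.
img = np.array([[[30, 120, 40]] * 2 + [[200, 200, 190]] * 2] * 4, float)
pixels = img.reshape(-1, 3)
labels, centers = kmeans(pixels, k=2)

# Take the darker cluster as foreground (olive canopy).
fg = labels == np.argmin(centers.sum(axis=1))
print(fg.sum())  # 8 foreground pixels
```

Feature synthesis and yield regression would then operate on the foreground mask; those stages are omitted here.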
... Over the last three decades, methods to estimate the pose of the head have received increasing attention [3] because of their application in various image analysis tasks. A variety of implementations can be named, ranging from, at the coarsest level, the identification of a head pose from a finite set of orientations [4] (e.g., frontal vs. lateral view) to the estimation of continuous angular measurements [5,6]. ...
Article
The aim of this work is to present an automatic solution to control a surveillance camera merely by the movements of the operator's head. The method uses convolutional neural networks that work in a coarse-to-fine manner to estimate head orientation in image data. First, the image frame of the operator's head is acquired from the camera on the operator's side of the system. The exact position of the head, given by its bounding box, is estimated by a Multitask Cascaded Convolutional Network. Second, a network customized for the given scenario is used to classify the orientation of the head in image data. In particular, a dedicated image dataset was collected for training purposes, with a discrete set of possible orientations in the vertical and horizontal planes. The accuracy of the estimators is higher than 80%, at an average of 4.12 fps of validation time. Finally, the current head orientation data are converted into a control signal for a two-degree-of-freedom surveillance camera mount. The feedback response time is 1.5 s, which is sufficient for most real-life surveillance applications.
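The final conversion from a discrete orientation class to a mount command might look like the following sketch; the class names, step size, and function are hypothetical, not the paper's actual interface.

```python
# Hypothetical sketch: mapping a discrete head-orientation class to a
# pan/tilt command for a two-degree-of-freedom camera mount. The class
# names and step size are assumptions for illustration only.

PAN = {"left": -1, "center": 0, "right": +1}
TILT = {"down": -1, "center": 0, "up": +1}

def control_signal(h_class, v_class, step_deg=5.0):
    """Return (pan, tilt) increments in degrees for the mount."""
    return PAN[h_class] * step_deg, TILT[v_class] * step_deg

print(control_signal("right", "up"))  # (5.0, 5.0)
```

Issuing one such increment per classified frame, with the 1.5 s feedback loop described above, keeps the camera tracking the operator's head.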
... This could be an interesting area to be explored in the future.

[154] | DL | gender and expression | IMDB
Park et al. [155] | DL | age and gender | Mega Asian
Benkaddour et al. [156] | DL | age and gender | Adience
Micheala et al. [157] | DL | age and gender | FERET
Kale et al. [158] | DL | race, age, and gender | Face
2020:
Li et al. [159] | DL | age and gender | AFLW
Barra et al. [160] | geometric | age and gender | AFLW
Lim et al. [161] | DL | age and gender | IMDB
[162] | regression | age and expression | AFLW
HyperFace [109] | DL | appearance and DL | AFLW
Sergio et al. [50] | IBM | gender, age, and expression | FEI, FERET
Hsu et al. [117] | regression | HPE and expression | AFLW
Khan et al. [52] | IBM | HPE, age, and gender | FEI, FERET
Thomaz et al. [147] | IBM | gender and expression | FEI, FERET
Zhou et al. [134] | DL | gender and age | Adience, VGGFace2
2018:
Gupta et al. [163] | regression | expressions and gender | AFLW
Ruiz et al. [116] | DL | age and gender | AFLW
Smith et al. [22] | DL | gender and age | VGGFace
Acien et al. [144] | DL | gender and race | VGGFace
Baltrusaitis et al. [103] | DL | HPE and expression | ICT-3DHP
Das et al. [145] | DL | gender, age, race | UTKFace
Mane et al. [146] | appearance | gender and expression | IMDB
2017:
Derkach et al. [164] | regression | appearance | SASE
Duan et al. [115] | hybrid | gender and age | Aidence
Dehghan et al. [142] | DL | age, gender, expression | Face+
Ranjan et al. [109] | DL | age, race, gender, expression | AFW
Shin et al. [149] | DL | age and gender | AFW
2016:
Baltrusaitis et al. [109] | 3D morphable | HPE and gender | Multi-Pie+BU
Ranjan et al. [109] | DL | head pose and gender | AFLW and CelebA
Xia et al. [165] | geometric | race, gender, and age | FRGCv2
Lapuschkin et al. [136] | appearance | age and gender | Adience
2015:
Afifa et al. [166] | appearance | age and gender | FacePix, CMU PIE, BU
Tulyakov et al. [64] | tracking | HPE and expr. | Dali3DHP
Peng et al. [167] | manifold embedded | gender and expr. ...
Article
Human face image analysis using machine learning is an important element in computer vision. The human face image conveys information such as age, gender, identity, emotion, race, and attractiveness to both human and computer systems. Over the last ten years, face analysis methods using machine learning have received immense attention due to their diverse applications in various tasks. Although several methods have been reported in the last ten years, face image analysis still represents a complicated challenge, particularly for images obtained from 'in the wild' conditions. This survey paper presents a comprehensive review focusing on methods in both controlled and uncontrolled conditions. Our work illustrates both merits and demerits of each method previously proposed, starting from seminal works on face image analysis and ending with the latest ideas exploiting deep learning frameworks. We show a comparison of the performance of the previous methods on standard datasets and also present some promising future directions on the topic.
... Furthermore, during deep 3D face modeling, facial landmarks can be obtained. The head pose estimation problem is solved in [23] by reducing the problem's complexity. A deep task reduction-guided image regularization module is integrated with an anchor-guided pose estimation module, and the HPE problem is formulated as a unified end-to-end learning framework. ...
Article
Head pose estimation (HPE) is a key step in the computation and quantification of 3D facial features and has a significant impact on the precision and accuracy of measurements. High-precision HPE is the basis for standardized facial data collection and analysis. The Camper's plane is the standard (baseline) plane commonly used by anthropologists for head and face research, but there is no research on automatic positioning of the Camper's plane using color and depth cameras. This paper presents a high-accuracy method for Camper's plane localization and HPE based on multi-view RGBD depth sensors. The 3D facial point clouds acquired by the multi-view RGBD depth sensors are aligned to obtain a complete 3D face. Keypoint RCNN is used for facial keypoint detection to obtain facial landmarks. A method is proposed to build a general face datum model based on a self-built dataset. The head pose is estimated by applying a rigid body transformation between an individual 3D face and the general 3D face model. To verify the accuracy of Camper's plane localization and HPE, 102 cases of 3D facial data were collected and experiments conducted. The tragus and nasal alar points are localized to within 7 pixels (about 0.83 cm), and the average accuracies of the three dimensions of the identified Camper's plane are 0.87°, 0.64° and 0.47° respectively. The average accuracies of the three dimensions of HPE were 1.17°, 0.90° and 0.97°. The experimental results demonstrate the effectiveness of the method for Camper's plane localization and HPE.
... Head pose estimation is the task of determining a head's orientation in 3D space, relative to the camera, from an image or a video sequence [43]. The main goal of head pose estimation is to obtain the three-dimensional Euler angles, namely the pitch, roll, and yaw angles. ...
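As a sketch of the Euler-angle target mentioned above: the snippet below builds a rotation matrix from (yaw, pitch, roll) and recovers the angles back. Papers differ in axis conventions; the Z-Y-X factorization here is one common choice, not necessarily the one used in [43].

```python
import numpy as np

# Minimal round trip between Euler angles and a rotation matrix using the
# Z-Y-X convention (yaw about Z, pitch about Y, roll about X).

def euler_to_matrix(yaw, pitch, roll):
    cz, sz = np.cos(yaw), np.sin(yaw)
    cy, sy = np.cos(pitch), np.sin(pitch)
    cx, sx = np.cos(roll), np.sin(roll)
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    return Rz @ Ry @ Rx

def matrix_to_euler(R):
    # Valid away from the gimbal-lock case |pitch| = 90 degrees.
    pitch = -np.arcsin(R[2, 0])
    yaw = np.arctan2(R[1, 0], R[0, 0])
    roll = np.arctan2(R[2, 1], R[2, 2])
    return yaw, pitch, roll

R = euler_to_matrix(0.3, 0.2, 0.1)
assert np.allclose(matrix_to_euler(R), (0.3, 0.2, 0.1))
```

HPE methods that regress Euler angles directly implicitly fix one such convention; comparisons across papers must account for it.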
Article
Automated human pose estimation is evolving as an exciting research area in human activity detection. It includes sophisticated applications such as malpractice detection in examinations, distracted driving, gesture detection, etc., and requires robust and reliable pose estimation techniques. These applications help to map the attention of the user with head pose estimation (HPE) metrics supported by emotion and gaze analysis. This paper solves the problem of attention score estimation with HPE. The proposed method ensures ease of implementation while addressing head pose estimation using 68 facial features. Further, to attain reliability and precision, head pose estimation has been implemented as a regression task. The coordinate pair angle method (CPAM) with deep neural network (DNN) regression and elastic net regression is carried out. The use of a DNN ensures precision on low-lighting, distorted, or occluded images. The CPAM methodology leverages facial landmark detection and angular differences to estimate head pose. Experimental results showed that the proposed model could handle large datasets, real-time data processing, significant pose variations, partial occlusions, and diverse facial expressions with a mean absolute error (MAE) of 3° or less. The proposed system was evaluated on three standard databases: the 300W across large poses (300W-LP) dataset, the annotated facial landmarks in the wild (AFLW2000) dataset, and the national institute of mental health child emotional faces picture set (NIMH-ChEFS) dataset. The results achieved are on par with recent state-of-the-art methodologies such as anisotropic angle distribution learning (AADL), the joint head pose estimation and face alignment algorithm (JFA), and the rotation axis focused attention network (RAFA-Net), which report MAEs of up to 6°. The paper achieves remarkable results for attention span prediction using head pose estimation, with many possible future applications.
... In this section, we review some existing methods for estimating the 6-DoF pose from a single RGB image. From our observations, the basic framework is to build a neural network that outputs the pose parameters directly [1], [3], [11], [14]- [21] or uses the outputs to detect a set of control points (corners or keypoints) [7], [10], [22]- [27] for inferring the 6-DoF parameters later. Methods using pose parameters (rotation and translation matrices) as the end targets usually execute faster but with lower accuracy. ...
... Assuming the availability of each point's coordinates (x, y, z) and its normal (n_x, n_y, n_z) (provided by most synthetic point cloud datasets), we can solve the coefficients [a-g] via local fitting from (1) to (3). The principal/normal curvature for each object point i, denoted as C_i, can then be derived by computing the 2×2 Weingarten matrix W, which is composed of the elements a, b, and c above [39]. ...
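The curvature computation sketched above can be illustrated on a synthetic patch. The snippet below fits only the three quadratic coefficients (the cited method fits the full [a-g] set) and assumes the point lies at the origin of its tangent frame:

```python
import numpy as np

# Locally fit a quadratic patch z = a*x^2 + b*x*y + c*y^2 around a surface
# point, form the 2x2 Weingarten (shape) matrix from a, b, c, and take its
# eigenvalues as the principal curvatures.

k1, k2 = 2.0, 0.5                      # ground-truth principal curvatures
xs, ys = np.meshgrid(np.linspace(-0.1, 0.1, 9), np.linspace(-0.1, 0.1, 9))
x, y = xs.ravel(), ys.ravel()
z = 0.5 * (k1 * x**2 + k2 * y**2)      # synthetic local surface patch

A = np.column_stack([x**2, x * y, y**2])
a, b, c = np.linalg.lstsq(A, z, rcond=None)[0]

W = np.array([[2 * a, b], [b, 2 * c]])  # Weingarten matrix at the origin
curvatures = np.sort(np.linalg.eigvalsh(W))[::-1]
assert np.allclose(curvatures, [k1, k2])
```

On real point clouds, the neighborhood must first be rotated into the tangent frame defined by the point's normal before the fit; that step is omitted here.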
Article
Estimating the 6-DoF (Degree of Freedom) object pose from a single RGB image is one of the challenging tasks in the field of computer vision. Before the pose, which is defined by the translation and rotation parameters, can be derived by the traditional PnP algorithm, the 2D image projections of a set of 3D object keypoints must be accurately detected. In this paper, we present techniques for defining 3D object surface keypoints and predicting their corresponding 2D counterparts via deep-learning network architectures. The main technique to designate 3D object keypoints is to employ a quadratic fitting scheme to calculate the principal surface curvatures as weights, and then select from all surface points those with larger curvatures that are most widely distributed, so as to describe the object shape as well as possible. However, the 2D projected keypoints are not directly regressed from the network, but encoded as unit vector fields pointing to them, so that a voting scheme can be performed to recover those 2D keypoints. Moreover, an effective loss function with a regularization term is adopted in training ResNet to predict the image projections of object keypoints by focusing on small-scale errors. Experimental results show that our proposed technique outperforms state-of-the-art approaches in both "2D projection" and "3D transformation" metrics.
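The vector-field voting idea can be sketched as a least-squares intersection of rays: each pixel contributes a unit vector toward the keypoint, and the keypoint is the point minimizing the squared distances to all rays. This mirrors PVNet-style voting in spirit; the network-side encoding is not reproduced here.

```python
import numpy as np

# Recover a 2D keypoint from per-pixel unit vectors pointing at it, as the
# least-squares intersection of the implied rays.

def vote_keypoint(pixels, dirs):
    """pixels: (N,2) pixel positions; dirs: (N,2) unit vectors toward the keypoint."""
    A = np.zeros((2, 2))
    b = np.zeros(2)
    for p, d in zip(pixels, dirs):
        M = np.eye(2) - np.outer(d, d)   # projector onto the ray's normal
        A += M
        b += M @ p
    return np.linalg.solve(A, b)

rng = np.random.default_rng(1)
keypoint = np.array([37.0, 52.0])
pixels = rng.uniform(0, 100, size=(50, 2))
dirs = keypoint - pixels
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

assert np.allclose(vote_keypoint(pixels, dirs), keypoint)
```

With noisy predicted vectors, a RANSAC-style variant of this vote is typically used instead of a single global least-squares solve.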
... In the above article [1], the affiliation for the corresponding author Jiang Wang needs to be corrected. ...
Article
In the above article [1], the affiliation for the corresponding author Jiang Wang needs to be corrected.
... The basic approaches for face pose estimation were comprehensively summarized in the literature [14], and many novel methods have been proposed recently. Wang et al. proposed a novel tree-based neural network architecture which embeds the relationship of continuity between pose intervals [15]; Li et al. combined a task-simplification mechanism and an anchor-guided estimation method into one unified learning framework to estimate face poses [16]; Lee et al. proposed a fast and accurate estimation algorithm based on the convolutional random projection forest [17], etc. Although these methods achieve great results, their performance degrades when the subject wears a face mask, because much facial information is lost. ...
Article
Aiming at the new requirement of masked face pose classification during the epidemic outbreak, this paper proposes an efficient transfer learning approach combined with a skip-connected structure to improve the accuracy of masked face pose classification in the absence of masked face pose data. We have worked on the following two aspects: 1) According to the feature transition of convolutional neural networks, we propose an efficient transfer learning approach and opt for a more appropriate source domain to solve the problem that the specificity of features in pre-trained deep networks damages performance when transferring to the target domain. First, a semi-synthetic masked face pose dataset is constructed to replace ImageNet as the source domain, which can reduce the span of the transfer and improve the pertinence of transfer learning. Second, the shallow layers, which contain general features, are frozen, while the deep layers, which contain specific features, are retrained, and the entire network is fine-tuned afterwards. This optimizes the specific features of the source domain during transfer and promotes transfer learning more effectively; 2) To further improve the overall accuracy by improving the accuracy on masked face pose classes with subtle differences, a skip-connected structure is proposed to fuse general features containing rich detailed information from the shallow layers into the classifier. Experiments on AlexNet and VGG16 show that the proposed method has certain advantages, with overall accuracies finally reaching 96.43% and 99.29%, respectively.
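The freeze-retrain-fine-tune schedule described in aspect 1) can be sketched schematically; the layer names and cutoff below are illustrative, not the actual AlexNet/VGG16 split used in the paper.

```python
# Schematic sketch of the transfer-learning schedule: freeze the shallow
# (general-feature) layers, retrain the deep (specific-feature) layers,
# then unfreeze everything for the final fine-tuning stage.

def set_trainable(layers, cutoff, fine_tune=False):
    """Return {layer: trainable?}: layers before `cutoff` stay frozen
    unless we are in the final fine-tuning stage."""
    return {name: fine_tune or i >= cutoff for i, name in enumerate(layers)}

layers = ["conv1", "conv2", "conv3", "conv4", "conv5", "fc6", "fc7", "fc8"]

stage1 = set_trainable(layers, cutoff=5)                  # retrain deep layers only
stage2 = set_trainable(layers, cutoff=5, fine_tune=True)  # fine-tune all layers

assert stage1["conv1"] is False and stage1["fc6"] is True
assert all(stage2.values())
```

In a deep-learning framework the same policy amounts to toggling gradient updates per layer (e.g. a parameter's trainable flag) according to this schedule.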
Article
The capacity to create "fake" videos has recently raised concerns about the reliability of multimedia content. Distinguishing between true and false information is a critical step toward resolving this problem. On this issue, several algorithms utilizing deep learning and facial landmarks have yielded intriguing results. Facial landmarks are traits that are solely tied to the subject's head pose. Based on this observation, we study how Head Pose Estimation (HPE) patterns may be utilized to detect deepfakes in this work. The HPE patterns studied are based on FSA-Net, SynergyNet, and WSM, which are among the most performant approaches in the state of the art. Finally, using a machine learning technique based on K-Nearest Neighbors and Dynamic Time Warping, their temporal patterns are categorized as authentic or false. We also offer a set of experiments examining the feasibility of using deep learning techniques on such patterns. The findings reveal that the ability to recognize a deepfake video utilizing an HPE pattern depends on the HPE methodology. On the contrary, performance is less dependent on the performance of the utilized HPE technique. Experiments are carried out on the FaceForensics++ dataset, which contains both identity-swap and expression-swap examples. The findings show that FSA-Net is an effective feature extraction method for determining whether a pattern belongs to a deepfake or not. The approach is also robust against deepfake videos created using various methods or for different goals. On average, the method obtains 86% accuracy on the identity-swap task and 86.5% on the expression-swap task. These findings open up various possibilities and future directions for solving the deepfake detection problem using specialized HPE approaches, which are also known to be fast and reliable.
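The classification step (K-Nearest Neighbors over Dynamic Time Warping distances) can be sketched on toy pose sequences; the sequences below merely stand in for HPE patterns extracted by a method such as FSA-Net.

```python
import numpy as np

# Compare temporal head-pose (yaw/pitch/roll) patterns with Dynamic Time
# Warping and label a query sequence with a nearest-neighbor vote.

def dtw(a, b):
    """DTW distance between two sequences of pose vectors."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def knn_label(query, references, k=1):
    """references: list of (sequence, label); returns the majority label of the k nearest."""
    dists = sorted((dtw(query, seq), label) for seq, label in references)
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)

# Toy data: a smooth head motion vs a jittery (deepfake-like) version of it.
t = np.linspace(0, 2 * np.pi, 30)
real = np.stack([np.sin(t), np.cos(t), 0.1 * t], axis=1)
fake = real + np.random.default_rng(2).normal(0, 0.5, real.shape)

refs = [(real, "authentic"), (fake, "deepfake")]
query = np.stack([np.sin(t + 0.2), np.cos(t + 0.2), 0.1 * t], axis=1)
print(knn_label(query, refs))  # "authentic": DTW absorbs the small time shift
```

DTW is what makes this comparison robust to the varying speed of head motion across videos, which a frame-by-frame Euclidean distance would not tolerate.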