Fig 2 - uploaded by Mohamed Daoudi
A pictorial representation of the positive semidefinite cone S_+(d, n). Viewing matrices G_1 and G_2 as ellipsoids in R^n, their closeness consists of two contributions: d_G^2 (the squared Grassmann distance between the associated subspaces) and d_{P_d}^2 (the squared distance on the positive-definite cone P_d).


Source publication
Article
Full-text available
In this paper, we propose a novel space-time geometric representation of human landmark configurations and derive tools for comparison and classification. We model the temporal evolution of landmarks as parametrized trajectories on the Riemannian manifold of positive semidefinite matrices of fixed-rank. Our representation has the benefit to bring n...

Contexts in source publication

Context 1
... is the span of U_i and Θ is a d×d diagonal matrix formed by the principal angles between U_1 and U_2. The closeness d_{S+} consists of two independent contributions: the square of the distance d_G(span(U_1), span(U_2)) between the two associated subspaces, and the square of the distance d_{P_d}(R_1^2, R_2^2) on the positive-definite cone P_d (Fig. 2). Note that C_{G_1→G_2} is not necessarily a geodesic; therefore, the closeness d_{S+} is not a true Riemannian distance. From the viewpoint of the landmark configurations Z_1 and Z_2, ...
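To make the decomposition concrete, here is a minimal numerical sketch (assuming G_i = Y_i Y_i^T with an n×d factor Y_i, a QR factorization to obtain U_i and R_i, the affine-invariant metric on P_d, and an illustrative weighting parameter k; the function names are ours, not the paper's):

```python
import numpy as np

def grassmann_dist(U1, U2):
    """Distance between the subspaces spanned by orthonormal U1, U2:
    the norm of the vector of principal angles."""
    s = np.clip(np.linalg.svd(U1.T @ U2, compute_uv=False), -1.0, 1.0)
    return np.linalg.norm(np.arccos(s))

def spd_dist(A, B):
    """Affine-invariant distance on the positive-definite cone P_d."""
    w, V = np.linalg.eigh(A)
    Ai = V @ np.diag(w ** -0.5) @ V.T          # A^{-1/2}
    lam = np.linalg.eigvalsh(Ai @ B @ Ai)
    return np.linalg.norm(np.log(lam))

def closeness_psd(Y1, Y2, k=1.0):
    """d_{S+}^2 = d_G^2 + k * d_{P_d}^2 for G_i = Y_i Y_i^T of rank d."""
    U1, R1 = np.linalg.qr(Y1)                  # U_i orthonormal (n x d)
    U2, R2 = np.linalg.qr(Y2)
    # Align the two factors (orthogonal matrices from the SVD of U1^T U2)
    # before comparing the P_d parts.
    O1, _, O2t = np.linalg.svd(U1.T @ U2)
    A = O1.T @ (R1 @ R1.T) @ O1
    B = O2t @ (R2 @ R2.T) @ O2t.T
    return grassmann_dist(U1, U2) ** 2 + k * spd_dist(A, B) ** 2
```

With k = 1 this is the unweighted sum of the two squared contributions; the closeness is symmetric but, as the context notes, not a true Riemannian distance.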

Similar publications

Article
Full-text available
Transient beam loading compensation schemes, such as One-Turn-FeedBack (OTFB), require beam synchronous processing (BSP). Swept clocks derived from the RF, and therefore harmonic to the revolution frequency, are widely used in CERN synchrotrons; this simplifies implementation with energy ramping, where the revolution frequency changes. It is howeve...
Article
Full-text available
Background: Centrifugation is a time-consuming step that increases the turnaround time (TAT) in laboratories; likewise, a hemolyzed sample that requires re-sampling can delay patient management. Recently, it has been postulated that the BD Barricor™ tube could decrease the centrifugation time and prevent hemolysis, two key f...
Preprint
Full-text available
This Resolution Team 3 (RT-3) report contains the results of the investigation of key aspects of the proposed Sample Clock Frequency Offset (SCFO) scheme for the Mid SKA1 telescope. This is a scheme, first proposed by one author (Carlson) at a meeting at the SKAO 25-Jan-2013, to digitize the analog signal at each antenna at a slightly different sam...
Article
Full-text available
Aim: The study aims to develop an advanced non-destructive method to estimate the plant growth rate of tissue culture propagated banana plantlets during primary hardening phase inside the greenhouse using Bootstrapped Artificial Neural Network (BANN). Methodology: Both non-destructive growth parameters like plant height, girth, number of leaves,...
Article
Full-text available
Spatial downscaling of rainfall fields is a challenging mathematical problem for which many different types of methods have been proposed. One popular solution consists in redistributing rainfall amounts over smaller and smaller scales by means of a discrete multiplicative random cascade (DMRC). This works well for slowly varying, homogeneous rainf...

Citations

... Without utilizing the specialized e2eET multistream CNN architecture, this performance is only 4.34% below the SOTA, highlighting the effectiveness of our data-level fusion design for transforming temporal dynamic data from various domains for image classification. Accuracy comparison (%): Song et al. [43] 91.50; Liu et al. [44] 93.50; Kacem et al. [45] 93.70; Proposed Framework 93.96; Ke et al. [46] 94.17; Liu et al. [47] 94.90; Maghoumi et al. [48] 95.70; Zhang et al. [49] 98.30 ...
Preprint
Full-text available
This study focuses on Hand Gesture Recognition (HGR), which is vital for perceptual computing across various real-world contexts. The primary challenge in the HGR domain lies in dealing with the individual variations inherent in human hand morphology. To tackle this challenge, we introduce an innovative HGR framework that combines data-level fusion and an Ensemble Tuner Multi-stream CNN architecture. This approach effectively encodes spatiotemporal gesture information from the skeleton modality into RGB images, thereby minimizing noise while improving semantic gesture comprehension. Our framework operates in real-time, significantly reducing hardware requirements and computational complexity while maintaining competitive performance on benchmark datasets such as SHREC2017, DHG1428, FPHA, LMDHG and CNR. This improvement in HGR demonstrates robustness and paves the way for practical, real-time applications that leverage resource-limited devices for human-machine interaction and ambient intelligence.
... Another successful group of methods, known as hand-designed feature descriptor-based methods (Hussein et al., 2013; Vemulapalli et al., 2014), utilizes handcrafted 3D/4D features extracted from mesh data. Skeleton-based methods (Devanne et al., 2014; Shi et al., 2019; Liu et al., 2020; Zhang et al., 2020; Chen et al., 2021; Kacem et al., 2018) represent human actions using skeletal joint positions and rotations extracted from depth or motion sensors. Point cloud-based methods (Fan et al., 2022), Fig. 1: Overview of the proposed model for 3D human action recognition. ...
Article
Recent technological advancements have significantly expanded the potential of human action recognition by harnessing the power of 3D data. This data provides a richer understanding of actions, including depth information that enables more accurate analysis of spatial and temporal characteristics. In this context, we study the challenge of 3D human action recognition. Unlike prior methods that rely on sampling 2D depth images, skeleton points, or point clouds, often leading to substantial memory requirements and the ability to handle only short sequences, we introduce a novel approach for 3D human action recognition, denoted SpATr (Spiral Auto-encoder and Transformer Network), specifically designed for fixed-topology mesh sequences. The SpATr model disentangles space and time in the mesh sequences. A lightweight auto-encoder, based on spiral convolutions, is employed to extract spatial geometrical features from each 3D mesh. These convolutions are lightweight and specifically designed for fixed-topology mesh data. Subsequently, a temporal transformer, based on self-attention, captures the temporal context within the feature sequence. The self-attention mechanism enables the capture of long-range dependencies and parallel processing, providing scalability to long sequences. The proposed method is evaluated on three prominent 3D human action datasets: Babel, MoVi, and BMLrub, from the Archive of Motion Capture As Surface Shapes (AMASS). Our analysis of the results demonstrates the competitive performance of the SpATr model in 3D human action recognition while maintaining efficient memory usage. The code and the training results are publicly available at https://github.com/h-bouzid/spatr.
... Video understanding belongs to the basic task of computer vision, and behavior recognition is one of its challenging and practical research directions [14]. As a hot research direction of video understanding, behavior recognition has broad application potential in intelligent video surveillance [18], automatic scoring for sports [16], and gait-based (1) Multi-frame 3D-like convolution. ...
Article
Full-text available
Video-based behavior detection is an important research direction in computer vision, with great application potential in intelligent video surveillance, sports behavior evaluation, gait recognition, and so on. However, due to the complexity of video content and backgrounds, video behavior detection and evaluation face many challenges and are still in their early stages. This paper proposes a novel multi-frame MobileNet model, which describes the internal differences of similar behaviors by introducing multiple continuous frames of the behaviors to be detected, and realizes fine-grained behavior detection and evaluation. First, using energy trend images (ETIs) of behaviors as features, multiple continuous frames of the target video are fed into the proposed network to explore the relationship between adjacent frames. Then, in the weighted point-wise convolution stage, a fade-in factor is added to the timeline to give each involved frame a different weight, making better use of the progressive relationship between behavior frames at different times. Finally, the effectiveness of the proposed method is verified by comparative experiments on multiple video datasets, including UCF101, HMDB51, and CASIA-B.
... where v_x = concat(G(x), FC_e(E_e(x))), and Gr(·) represents the Gram matrix [40] for a more abstract feature characterization. ∥·∥_F denotes the Frobenius norm. ...
Preprint
Full-text available
Unpaired Medical Image Enhancement (UMIE) aims to transform a low-quality (LQ) medical image into a high-quality (HQ) one without relying on paired images for training. While most existing approaches are based on Pix2Pix/CycleGAN and are effective to some extent, they fail to explicitly use HQ information to guide the enhancement process, which can lead to undesired artifacts and structural distortions. In this paper, we propose a novel UMIE approach that avoids the above limitation of existing methods by directly encoding HQ cues into the LQ enhancement process in a variational fashion, thus modeling the UMIE task under the joint distribution between the LQ and HQ domains. Specifically, we extract features from an HQ image and explicitly insert the features, which are expected to encode HQ cues, into the enhancement network to guide the LQ enhancement with the variational normalization module. We train the enhancement network adversarially with a discriminator to ensure the generated HQ image falls into the HQ domain. We further propose a content-aware loss to guide the enhancement process with wavelet-based pixel-level and multi-encoder-based feature-level constraints. Additionally, as a key motivation for performing image enhancement is to make the enhanced images serve better for downstream tasks, we propose a bi-level learning scheme to optimize the UMIE task and downstream tasks cooperatively, helping generate HQ images that are both visually appealing and favorable for downstream tasks. Experiments on three medical datasets, including two newly collected datasets, verify that the proposed method outperforms existing techniques in terms of both enhancement quality and downstream task performance. We will make the code and the newly collected datasets publicly available for community study.
... Other works, referred to as hand-designed descriptor-based methods, use handcrafted 3D/4D features extracted from mesh data (Hussein et al., 2013;Vemulapalli et al., 2014). Another category of methods, known as skeleton-based methods (Wang et al., 2013;Zhang et al., 2020;Devanne et al., 2014;Kacem et al., 2018), represent human actions using skeletal joint positions and rotations sequences extracted from depth or motion sensors. Point cloud-based methods (Li et al., 2021;Fan et al., 2022), on the other hand, utilize 3D point cloud data to model the human body as a collection of points, thereby capturing spatial and motion information for action recognition. ...
Preprint
Full-text available
Recent advancements in technology have expanded the possibilities of human action recognition by leveraging 3D data, which offers a richer representation of actions through the inclusion of depth information, enabling more accurate analysis of spatial and temporal characteristics. However, 3D human action recognition is a challenging task due to the irregularity and disarrangement of the data points in action sequences. In this context, we present our novel model for human action recognition from fixed-topology mesh sequences, based on a Spiral Auto-encoder and Transformer Network, namely SpATr. The proposed method first disentangles space and time in the mesh sequences. Then, an auto-encoder is utilized to extract spatial geometrical features, and a tiny transformer is used to capture the temporal evolution of the sequence. Previous methods either use 2D depth images or sampled skeleton points, or they require a huge amount of memory, limiting them to processing short sequences only. In this work, we show a competitive recognition rate and high memory efficiency by building our auto-encoder on spiral convolutions, which are lightweight convolutions applied directly to mesh data with fixed topologies, and by modeling temporal evolution using an attention mechanism that can handle long sequences. The proposed method is evaluated on two 3D human action datasets: MoVi and BMLrub, from the Archive of Motion Capture As Surface Shapes (AMASS). The results analysis shows the effectiveness of our method in 3D human action recognition while maintaining high memory efficiency. The code will soon be made publicly available.
... Curves in homogeneous spaces where used for animation purposes by Celledoni et al. [1]. We improve the implementations of temporal alignment procedures introduced in previous research [10,7,4,2,3]. Here our main concern is to align motions in order to be able to display them in a synchronized manner. ...
... Temporal alignment using Gram matrices. In this method, we compute the Gram matrices [4,7] associated with the joint positions of each skeleton and align them in the space of positive semi-definite matrices using dynamic programming. In our experiments, we used 10 active joints, namely "Ankle left", "Ankle right", "Hip left", "Hip right", "Knee left", "Knee right", "Spine low", "Spine high", "Racket hand", "Racket top". ...
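A minimal sketch of this kind of dynamic-programming alignment over per-frame Gram matrices (for simplicity, the cost below is the plain Frobenius distance between Gram matrices; the cited work aligns them with a closeness adapted to the PSD manifold, so treat this cost as a placeholder):

```python
import numpy as np

def gram(joints):
    """Gram matrix of an n x 3 matrix of joint positions (one skeleton)."""
    return joints @ joints.T

def dtw_align(seq1, seq2, dist):
    """Classic O(N*M) dynamic-programming alignment cost between two
    sequences of per-frame features (here: Gram matrices)."""
    N, M = len(seq1), len(seq2)
    D = np.full((N + 1, M + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            c = dist(seq1[i - 1], seq2[j - 1])
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[N, M]

# Placeholder cost: Frobenius distance between Gram matrices.
frobenius = lambda A, B: np.linalg.norm(A - B)
```

Backtracking through D recovers the frame correspondence; keyframe constraints can be enforced by restricting the (i, j) loops to bands between matched keyframes.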
Preprint
Temporal alignment is an inherent task in most applications dealing with videos: action recognition, motion transfer, virtual trainers, rehabilitation, etc. In this paper we dive into the understanding of this task from a geometric point of view: in particular, we show that the basic properties that are expected from a temporal alignment procedure imply that the set of motions aligned to a template forms a slice of a principal fiber bundle for the group of temporal reparameterizations. A temporal alignment procedure provides a reparameterization-invariant projection onto this particular slice. This geometric presentation allows us to elaborate a consistency check for testing the accuracy of any temporal alignment procedure. We give examples of alignment procedures from the literature applied to motions of tennis players. Most of them use dynamic programming to compute the best correspondence between two motions relative to a given cost function. This step is computationally expensive (of complexity $O(NM)$ where $N$ and $M$ are the numbers of frames). Moreover, most methods use features that are invariant by translations and rotations in $\mathbb{R}^3$, whereas most actions are only invariant by translation along and rotation around the vertical axis, where the vertical axis is aligned with the gravitational field. The discarded information contained in the vertical direction is crucial for accurate synchronization of motions. We propose to incorporate keyframe correspondences into the dynamic programming algorithm based on coarse information extracted from the vertical variations, in our case from the elevation of the arm holding the racket. The temporal alignment procedures produced are not only more accurate, but also computationally more efficient.
Preprint
Full-text available
Temporal alignment is an inherent task in most applications dealing with videos: action recognition, motion transfer, virtual trainers, rehabilitation, etc. In this paper we dive into the understanding of this task from a geometric point of view, in the spirit of Geometric Green Learning: in particular, we show that the basic properties that are expected from a temporal alignment procedure imply that the set of motions aligned to a template forms a slice of a principal fiber bundle for the group of temporal reparameterizations. A temporal alignment procedure provides a reparameterization-invariant projection onto this particular slice. This geometric presentation allows us to elaborate a consistency check for testing the accuracy of any temporal alignment procedure. We give examples of alignment procedures from the literature applied to motions of tennis players. Most of them use dynamic programming to compute the best correspondence between two motions relative to a given cost function. This step is computationally expensive (of complexity O(NM), where N and M are the numbers of frames). Moreover, most methods use features that are invariant by translations and rotations in R^3, whereas most actions are only invariant by translation along and rotation around the vertical axis, where the vertical axis is aligned with the gravitational field. The discarded information contained in the vertical direction is crucial for accurate synchronization of motions. We propose to incorporate keyframe correspondences into the dynamic programming algorithm based on coarse information extracted from the vertical variations, in our case from the elevation of the arm holding the racket. The temporal alignment procedures produced are not only more accurate, but also computationally more efficient.
... For example, Daoudi et al. [13] represented 3D skeleton data with posture covariance matrices and exploited the Riemannian centre of mass and the log-Euclidean Riemannian metric to classify five emotions. Kacem et al. [27] proposed a geometric measure to process the posture covariance matrix for emotion recognition from full-body skeleton sequences. However, with significant advances in the study of optimization strategies and activation functions for Riemannian networks [28]-[30], an increasing number of researchers have focused on the potential of Riemannian networks for processing posture covariance matrices. ...
Article
Full-text available
Body motion is an important channel for human communication and plays a crucial role in automatic emotion recognition. This work proposes a multiscale spatio-temporal network, which captures the coarse-grained and fine-grained affective information conveyed by full-body motion and decodes the complex mapping between emotion and body movement. The proposed method consists of three main components. First, a scale selection algorithm based on the pseudo-energy model is presented, which guides our network to focus not only on long-term macroscopic body expressions, but also on short-term subtle posture changes. Second, we propose a hierarchical spatio-temporal network that can jointly process posture covariance matrices and 3D posture images with different time scales, and then hierarchically fuse them in a coarse-to-fine manner. Finally, a spatio-temporal iterative (ST-ITE) fusion algorithm is developed to jointly optimize the proposed network. The proposed approach is evaluated on five public datasets. The experimental results show that the introduction of the energy-based scale selection algorithm significantly enhances the learning capability of the network. The proposed ST-ITE fusion algorithm improves the generalization and convergence of our model. The average classification results of the proposed method exceed 86% on all datasets and outperform the state-of-the-art methods.
... Modeling Human Motions as Trajectories on a Riemannian Manifold: While our present work is the first that explores the benefit of manifold-valued trajectories for human motion prediction, representing 3D human poses and their temporal evolution as trajectories on a manifold was adopted in many recent works for action recognition. Different manifolds were considered in different studies [17], [18], [19]. More related to our work, in [20], a human action is interpreted as a parametrized curve and is seen as a single point on the sphere by computing its Square Root Velocity Function (SRVF). ...
Article
In this work we propose a novel solution for 3D skeleton-based human motion prediction. The objective of this task is to forecast future human poses from a prior skeleton pose sequence. This involves solving two main challenges still present in the recent literature: (1) discontinuity of the predicted motion, which results in unrealistic motions, and (2) performance deterioration over long-term horizons resulting from error accumulation across time. We tackle these issues by using a compact manifold-valued representation of 3D human skeleton motion. Specifically, we model the temporal evolution of the 3D poses as a trajectory, which allows us to map human motions to single points on a sphere manifold. Using such a compact representation avoids error accumulation and provides a robust representation for long-term prediction while ensuring the smoothness and coherence of the whole motion. To learn these non-Euclidean representations, we build a manifold-aware Wasserstein generative adversarial model that captures the temporal and spatial dependencies of human motion through different losses. Experiments conducted on the CMU MoCap and Human 3.6M datasets demonstrate the superiority of our approach over the state of the art in both short- and long-term horizons. The smoothness of the generated motion is highlighted in the qualitative results.
... To capture changes in the dynamics of facial movement relevant to pain expression, we propose an original framework based on the temporal evolution of facial landmarks modeled as a trajectory on a Riemannian manifold. This formulation has shown promising results in action recognition [13], [14], [15], [16] and in facial expression recognition [16], [17]. In our case, Gram matrices are computed from facial landmarks at each video frame and their temporal evolution is modeled as a trajectory on the Riemannian manifold of symmetric positive semi-definite (PSD) matrices [16]. ...
... The dynamics of the landmarks in each region is modeled as a trajectory on the manifold of positive semi-definite matrices of fixed rank. Modeling the temporal evolution of landmarks as a trajectory on a Riemannian manifold has shown promising results in action recognition [13], [14], [16] and in facial expression recognition [17]. Motivated by these results, we propose a geometric approach for VAS pain intensity estimation based on the representation of facial landmarks and their dynamics as Gram matrices of fixed rank. ...
... The velocity of each landmark is computed after this normalization. Similar to [15], [16], [17], we propose the Gram matrix G as a representation of landmarks and velocities. The Gram matrix is defined by: ...
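The snippet cuts off before the definition; as a rough illustration only (the exact stacking of positions and velocities is our assumption, not the authors' formula), a per-frame Gram representation of landmarks and their velocities could be built as:

```python
import numpy as np

def gram_with_velocity(frames):
    """frames: T x n x dim landmark coordinates over T video frames.
    Builds, per frame, the Gram matrix of landmarks stacked with their
    finite-difference velocities (stacking convention is an assumption)."""
    V = np.gradient(frames, axis=0)                  # per-landmark velocities
    feats = np.concatenate([frames, V], axis=2)      # n x (2*dim) per frame
    return np.einsum('tnd,tmd->tnm', feats, feats)   # G_t = A_t @ A_t.T
```

Each G_t is then an n×n positive semi-definite matrix of rank at most 2·dim, matching the fixed-rank setting described in the citing works.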
Article
We propose an automatic method to estimate self-reported pain intensity based on facial landmarks extracted from videos. For each video sequence, we decompose the face into four different regions and pain intensity is measured by modeling the dynamics of facial movement using the landmarks of these regions. A formulation based on Gram matrices is used to represent the trajectory of facial landmarks on the Riemannian manifold of symmetric positive semi-definite matrices of fixed rank. A curve fitting algorithm is then used to smooth the trajectories and a temporal alignment is performed to compute the similarity between the trajectories on the manifold. A Support Vector Regression classifier is then trained to encode the extracted trajectories into pain intensity levels consistent with the self-reported pain intensity measurement. Finally, a late fusion of the estimation for each region is performed to obtain the final predicted pain intensity level. The proposed approach is evaluated on two publicly available databases, the UNBCMcMaster Shoulder Pain Archive and the Biovid Heat Pain database. We compared our method to the state-of-the-art on both databases using different testing protocols, showing the competitiveness of the proposed approach.