Publications

Conference Papers

AG-MAE: Anatomically Guided Spatio-Temporal Masked Auto-Encoder for Online Hand Gesture Recognition

Authors: Omar Ikne, Benjamin Allaert, Hazem Wannous

Conference: International Conference on 3D Vision (3DV), 2025

Read Paper

Hand gesture recognition plays a crucial role in human-computer interaction, enabling intuitive and touch-free communication. While offline methods have shown strong performance, real-world applications require online and continuous recognition. Skeleton-based approaches face challenges due to the complexity of hand anatomy and diverse 3D motions. This paper introduces AG-MAE, an anatomically guided spatio-temporal masked autoencoder designed for self-supervised learning of 3D hand keypoint representations. By integrating anatomical constraints into the masking and reconstruction process, AG-MAE learns more discriminative features for hand poses and movements, significantly improving online gesture recognition performance on standard benchmarks.
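
As a rough illustration of the anatomically guided masking idea, the sketch below masks whole fingers of a 21-joint hand sequence and trains a tiny masked autoencoder to reconstruct them. The FINGERS grouping, the masking ratio, and the TinyHandMAE encoder/decoder are illustrative assumptions, not the architecture or settings used in the paper.

```python
import torch
import torch.nn as nn

# Illustrative 21-joint hand layout grouped by finger; the grouping, masking
# ratio, and tiny network below are assumptions, not the paper's configuration.
FINGERS = {
    "thumb":  [1, 2, 3, 4],
    "index":  [5, 6, 7, 8],
    "middle": [9, 10, 11, 12],
    "ring":   [13, 14, 15, 16],
    "pinky":  [17, 18, 19, 20],
}

def anatomical_mask(batch, frames, n_joints=21, fingers_masked=2):
    """Mask every joint of randomly chosen fingers for all frames of a sample."""
    mask = torch.zeros(batch, frames, n_joints, dtype=torch.bool)
    names = list(FINGERS)
    for b in range(batch):
        for i in torch.randperm(len(names))[:fingers_masked].tolist():
            mask[b, :, FINGERS[names[i]]] = True
    return mask  # True = masked, to be reconstructed

class TinyHandMAE(nn.Module):
    """Toy masked autoencoder over (frame, joint) tokens of 3D coordinates."""
    def __init__(self, dim=64):
        super().__init__()
        self.embed = nn.Linear(3, dim)
        self.mask_token = nn.Parameter(torch.zeros(dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decode = nn.Linear(dim, 3)

    def forward(self, x, mask):
        # x: (B, T, J, 3) hand keypoints, mask: (B, T, J) boolean
        B, T, J, _ = x.shape
        tok = self.embed(x).reshape(B, T * J, -1)
        tok = torch.where(mask.reshape(B, T * J, 1),
                          self.mask_token.expand(B, T * J, -1), tok)
        return self.decode(self.encoder(tok)).reshape(B, T, J, 3)

x = torch.randn(4, 16, 21, 3)            # batch of 3D hand keypoint sequences
mask = anatomical_mask(4, 16)
model = TinyHandMAE()
recon = model(x, mask)
loss = (recon - x)[mask].pow(2).mean()    # reconstruct only the masked joints
loss.backward()
```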

Skeleton-based Self-Supervised Feature Extraction for Improved Dynamic Hand Gesture Recognition

Authors: Omar Ikne, Benjamin Allaert, Hazem Wannous

Conference: The 18th IEEE International Conference on Automatic Face and Gesture Recognition, 2024

Read Paper

Human-computer interaction has become an essential part of our lives, particularly with the rise of digital environments. Nevertheless, hand gesture recognition remains challenging, given the complexities associated with factors like pose variation and occlusions. In this paper, we propose an innovative approach to improve skeleton-based hand gesture recognition by integrating self-supervised learning, a promising technique for acquiring distinctive representations directly from unlabeled data. The proposed method takes advantage of prior knowledge of hand topology, combining topology-aware self-supervised learning with a customized skeleton-based architecture to derive meaningful representations from skeleton data under different hand poses. We introduce customized masking strategies for skeletal hand data and design a model architecture that incorporates spatial connectivity information, improving the model's understanding of the interrelationships between hand joints. Extensive experiments demonstrate the effectiveness of the approach, with state-of-the-art performance on benchmark datasets. We also explore how the learned representations generalize across datasets and study the impact of fine-tuning with limited labeled data, highlighting the adaptability and robustness of the proposed approach. Code and trained models are available at: https://github.com/o-ikne/SkelMAE
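
A minimal sketch of what a topology-aware masking strategy can look like: the hand skeleton is encoded as an adjacency matrix over 21 joints, and masked regions are grown from seed joints along skeletal connectivity, so the model must infer whole local hand regions rather than isolated joints. The BONES edge list, the 1-hop radius, and the number of seeds are assumptions for illustration, not the paper's exact masking scheme.

```python
import numpy as np

# Illustrative 21-joint hand skeleton (wrist = 0, four joints per finger).
# The edge list and the 1-hop masking radius are assumptions for illustration.
BONES = [(0, 1), (1, 2), (2, 3), (3, 4),         # thumb
         (0, 5), (5, 6), (6, 7), (7, 8),         # index
         (0, 9), (9, 10), (10, 11), (11, 12),    # middle
         (0, 13), (13, 14), (14, 15), (15, 16),  # ring
         (0, 17), (17, 18), (18, 19), (19, 20)]  # pinky

def adjacency(n_joints=21):
    """Symmetric adjacency with self-loops, usable by a skeleton-aware model."""
    A = np.eye(n_joints)
    for i, j in BONES:
        A[i, j] = A[j, i] = 1.0
    return A

def topology_aware_mask(n_joints=21, n_seeds=3, rng=None):
    """Mask each random seed joint together with its skeletal neighbours."""
    if rng is None:
        rng = np.random.default_rng()
    A = adjacency(n_joints)
    mask = np.zeros(n_joints, dtype=bool)
    for seed in rng.choice(n_joints, size=n_seeds, replace=False):
        mask |= A[seed] > 0          # seed joint and its 1-hop neighbours
    return mask

mask = topology_aware_mask()
print("masked joints:", np.flatnonzero(mask))
```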

Spatio-Temporal Sparse Graph Convolution Network for Hand Gesture Recognition

Authors: Omar Ikne, Rim Slama, Hichem Saoudi, Hazem Wannous

Conference: The 18th IEEE International Conference on Automatic Face and Gesture Recognition, 2024

Read Paper

Unlike whole-body action recognition, hand gestures involve spatially closely distributed joints, promoting stronger collaboration between them. This needs to be taken into account in order to capture complex spatial and temporal features. In response to these challenges, this paper presents a Spatio-Temporal Sparse Graph Convolution Network (ST-SGCN) for dynamic recognition of hand gestures. Based on decoupled spatio-temporal processing, the ST-SGCN combines Graph Convolutional Networks, attention mechanisms and asymmetric convolutions to capture the nuanced movements of hand joints. The key novelty is the introduction of sparse spatio-temporal directed interactions, overcoming the limitations associated with dense, undirected methods. The sparse aspect selectively models the essential interactions between hand joints, improving computational efficiency and interpretability. Directed interactions capture asymmetrical dependencies between hand joints, improving discernment of joint influences. Experimental evaluations on three benchmark datasets, Briareo, SHREC'17 and IPN Hand, demonstrate ST-SGCN's state-of-the-art performance for dynamic hand gesture recognition. Code is available at: https://github.com/HichemSaoudi/ST-SGCN
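
The sketch below illustrates the general idea of sparse, directed joint interactions: attention-like scores between joints are kept only for the top-k strongest incoming connections per joint, yielding a learned, non-symmetric interaction graph used to propagate features. The SparseDirectedJointAttention layer, its dimensions, and the value of k are illustrative assumptions rather than the ST-SGCN architecture itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseDirectedJointAttention(nn.Module):
    """Toy layer keeping only the top-k strongest directed joint-to-joint
    interactions per joint; k, the feature size, and the scoring function
    are illustrative choices, not the paper's exact design."""
    def __init__(self, dim=64, k=4):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.kproj = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.k = k

    def forward(self, x):
        # x: (B, J, C) per-frame joint features
        scores = self.q(x) @ self.kproj(x).transpose(1, 2)   # (B, J, J), not symmetric
        topk = scores.topk(self.k, dim=-1).indices
        keep = torch.zeros_like(scores, dtype=torch.bool).scatter_(-1, topk, True)
        scores = scores.masked_fill(~keep, float("-inf"))     # sparsify interactions
        attn = F.softmax(scores / x.shape[-1] ** 0.5, dim=-1)
        return attn @ self.v(x), attn                          # features + learned graph

x = torch.randn(2, 21, 64)                 # 2 samples, 21 hand joints, 64 features
layer = SparseDirectedJointAttention()
out, attn = layer(x)
print(out.shape, attn.shape)               # (2, 21, 64) (2, 21, 21)
```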

Automatic Modeling of Dynamical Interactions Within Marine Ecosystems

Authors: Omar Ikne, Maxime Folschette, Tony Ribeiro

Conference: The 1st International Joint Conference on Learning & Reasoning, 2021

Read Paper

Marine ecology models are used to study and anticipate population variations of plankton and microalgae species. These variations can have an impact on ecological niches, the economy or the climate. Our objective is to automate the creation of such models. Learning From Interpretation Transition (LFIT) is a framework that learns the dynamics of a system by observing its state transitions. LFIT provides explainable predictions in the form of logical rules. In this paper, we introduce a method to extract an influence graph from an LFIT model. We also propose a heuristic that improves the model's robustness to noise in the data.
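
To make the influence-graph extraction concrete, the toy snippet below treats each learned rule as a head atom with a body of condition atoms and emits a directed edge from every body variable to the head variable. The rule encoding and the example rules are made up for illustration and do not come from the paper or the LFIT implementation.

```python
# Toy illustration: each rule "head(t+1) :- body(t)" contributes directed
# edges body variable -> head variable. Rules below are invented examples.
RULES = [
    ("plankton_up",    ["nutrients_up", "predators_down"]),
    ("predators_up",   ["plankton_up"]),
    ("nutrients_down", ["plankton_up"]),
]

def influence_graph(rules):
    """Collect directed influences (source variable -> target variable)."""
    edges = set()
    for head, body in rules:
        target = head.rsplit("_", 1)[0]        # strip the value suffix
        for atom in body:
            source = atom.rsplit("_", 1)[0]
            edges.add((source, target))
    return sorted(edges)

for src, dst in influence_graph(RULES):
    print(f"{src} -> {dst}")
```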

Journal Papers

SHREC 2024: Recognition Of Dynamic Hand Motions Molding Clay

Authors: Ben Veldhuijzen, Remco C. Veltkamp, Omar Ikne, Benjamin Allaert, Hazem Wannous, Marco Emporio, Andrea Giachetti, Joseph J. LaViola Jr., Ruiwen He, Halim Benhabiles, et al.

Journal: Computers & Graphics

Read Paper

Gesture recognition is a tool to enable novel interactions with different techniques and applications, like Mixed Reality and Virtual Reality environments. Despite recent advancements in gesture recognition from skeletal data, it is still unclear how well state-of-the-art techniques perform in a scenario involving precise motions with two hands. This paper presents the results of the SHREC 2024 contest, organized to evaluate methods for their recognition of highly similar hand motions using the skeletal spatial coordinate data of both hands. The task is the recognition of 7 motion classes from frame-by-frame spatial coordinates. The skeletal data was captured using a Vicon system and pre-processed into a coordinate system using Blender and Vicon Shogun Post. We created a small, novel dataset with a high variety of durations in frames. The results of the contest are reported, describing the techniques created by the 5 research groups on this challenging task and comparing them to our baseline method.

eMotion-GAN: A Motion-based GAN for Photorealistic and Facial Expression Preserving Frontal View Synthesis

Authors: Omar Ikne, Benjamin Allaert, Ioan Marius Bilasco, Hazem Wannous

Journal: Computer Vision and Image Understanding, 2025

Publication date: November 12, 2025

Code

Many existing facial expression recognition (FER) systems suffer from performance degradation under head pose variations. While frontalization methods attempt to mitigate this issue, they often introduce distortions that harm expression analysis. We propose eMotion-GAN, a novel GAN-based approach for frontal face synthesis that preserves facial expressions in the motion domain. By modeling head-pose-induced motion as noise and expression motion as relevant information, our method filters unwanted motion and maps expressive dynamics onto a neutral frontal face. Extensive evaluations on multiple dynamic FER datasets demonstrate significant performance gains, with improvements of up to +5% for small pose variations and up to +20% for large pose variations.
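
As a loose illustration of working in the motion domain, the sketch below passes an input motion field through a toy generator and uses the resulting field to warp a neutral frontal face via bilinear sampling. ToyMotionGenerator, the image size, and the random inputs are placeholders; the actual eMotion-GAN generator, losses, and training procedure are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMotionGenerator(nn.Module):
    """Toy stand-in for the generator: maps an input motion field (2, H, W)
    to an output motion field of the same shape."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1))

    def forward(self, flow):
        return self.net(flow)

def warp_with_flow(image, flow):
    """Warp an image by a dense motion field using bilinear sampling."""
    B, _, H, W = image.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).float()            # (H, W, 2) in pixels
    grid = grid.unsqueeze(0) + flow.permute(0, 2, 3, 1)     # displaced positions
    gx = 2 * grid[..., 0] / (W - 1) - 1                     # normalise to [-1, 1]
    gy = 2 * grid[..., 1] / (H - 1) - 1
    return F.grid_sample(image, torch.stack((gx, gy), dim=-1), align_corners=True)

neutral_frontal = torch.rand(1, 3, 112, 112)   # neutral frontal face (dummy data)
input_motion = torch.randn(1, 2, 112, 112)     # non-frontal expression motion (dummy)
gen = ToyMotionGenerator()
frontal_motion = gen(input_motion)             # "filtered" frontal expression motion
animated = warp_with_flow(neutral_frontal, frontal_motion)
print(animated.shape)                          # torch.Size([1, 3, 112, 112])
```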