Publications

Conference Papers

Skeleton-based Self-Supervised Feature Extraction for Improved Dynamic Hand Gesture Recognition

Authors: Omar Ikne, Benjamin Allaert, Hazem Wannous

Conference: The 18th IEEE International Conference on Automatic Face and Gesture Recognition, 2024

Read Paper

Human-computer interaction has become an essential part of our lives, particularly with the rise of digital environments. Nevertheless, interpreting hand gestures remains a challenge, given the complexities associated with factors like pose variation and occlusions. In this paper, we propose an innovative approach to improve skeleton-based hand gesture recognition by integrating self-supervised learning, a promising technique for acquiring distinctive representations directly from unlabeled data. The proposed method takes advantage of prior knowledge of hand topology, combining topology-aware self-supervised learning with a customized skeleton-based architecture to derive meaningful representations from skeleton data under different hand poses. We introduce customized masking strategies for skeletal hand data and design a model architecture that incorporates spatial connectivity information, improving the model’s understanding of the interrelationships between hand joints. Extensive experiments demonstrate the effectiveness of the approach, with state-of-the-art performance on benchmark datasets. An exploration of the generalization of learned representations across datasets and a study of the impact of fine-tuning with limited labeled data are conducted, highlighting the adaptability and robustness of the proposed approach. Code and trained models are available at: https://github.com/o-ikne/SkelMAE
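For readers curious how topology-aware masking of hand-skeleton data might look in practice, below is a minimal, hypothetical sketch: the 21-joint hand layout, edge list, and masking ratio are illustrative assumptions, not the configuration used in the paper or the SkelMAE repository.

```python
# Hypothetical sketch of a topology-aware masking strategy for hand-skeleton
# self-supervised pre-training. Joint indices, the edge list, and the masking
# ratio are illustrative assumptions, not the authors' exact configuration.
import numpy as np

# Simplified 21-joint hand model: wrist (0) plus 4 joints per finger (assumed).
HAND_EDGES = [
    (0, 1), (1, 2), (2, 3), (3, 4),        # thumb
    (0, 5), (5, 6), (6, 7), (7, 8),        # index
    (0, 9), (9, 10), (10, 11), (11, 12),   # middle
    (0, 13), (13, 14), (14, 15), (15, 16), # ring
    (0, 17), (17, 18), (18, 19), (19, 20), # little
]

def neighbors(num_joints, edges):
    """Adjacency lists derived from the hand topology."""
    adj = {j: set() for j in range(num_joints)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def topology_aware_mask(num_joints=21, ratio=0.4, edges=HAND_EDGES, rng=None):
    """Mask a random seed joint together with its topological neighbors until
    roughly `ratio` of the joints are hidden, so the encoder must reconstruct
    whole hand regions rather than isolated points."""
    rng = rng or np.random.default_rng()
    adj = neighbors(num_joints, edges)
    target = int(round(ratio * num_joints))
    masked = set()
    while len(masked) < target:
        seed = int(rng.integers(num_joints))
        masked.add(seed)
        masked.update(adj[seed])
    mask = np.zeros(num_joints, dtype=bool)
    mask[list(masked)[:target]] = True
    return mask  # True = joint is masked and must be reconstructed

# Example: mask a single frame of 3D joint coordinates.
frame = np.random.randn(21, 3)   # (joints, xyz)
mask = topology_aware_mask()
visible = frame[~mask]           # visible tokens fed to the encoder
```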

Spatio-Temporal Sparse Graph Convolution Network for Hand Gesture Recognition

Authors: Omar Ikne, Rim Slama, Hichem Saoudi, Hazem Wannous

Conference: The 18th IEEE International Conference on Automatic Face and Gesture Recognition, 2024

Read Paper

Unlike whole-body action recognition, hand gestures involve joints that are spatially close together, promoting stronger collaboration between them. This needs to be taken into account in order to capture complex spatial and temporal features. In response to these challenges, this paper presents a Spatio-Temporal Sparse Graph Convolution Network (ST-SGCN) for dynamic recognition of hand gestures. Based on decoupled spatio-temporal processing, the ST-SGCN incorporates Graph Convolutional Networks, attention mechanisms, and asymmetric convolutions to capture the nuanced movements of hand joints. The key novelty is the introduction of sparse spatio-temporal directed interactions, overcoming the limitations associated with dense, undirected methods. The sparse aspect selectively models essential interactions between hand joints, improving computational efficiency and interpretability. Directed interactions capture asymmetrical dependencies between hand joints, improving discernment of joint influences. Experimental evaluations on three benchmark datasets, including Briareo, SHREC'17 and IPN Hand, demonstrate ST-SGCN's state-of-the-art performance for dynamic hand gesture recognition. Code is available at: https://github.com/HichemSaoudi/ST-SGCN
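As a rough illustration of the sparse directed interaction idea (not the released ST-SGCN code), the following sketch learns a directed joint-to-joint attention matrix and keeps only the top-k strongest incoming interactions per joint before aggregating features; the tensor shapes and top-k value are assumptions.

```python
# Minimal sketch of a sparse, directed graph convolution over hand joints.
# This is an illustrative approximation of the idea, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseDirectedGraphConv(nn.Module):
    def __init__(self, in_channels, out_channels, topk=4):
        super().__init__()
        self.query = nn.Linear(in_channels, out_channels)
        self.key = nn.Linear(in_channels, out_channels)
        self.value = nn.Linear(in_channels, out_channels)
        self.topk = topk

    def forward(self, x):
        # x: (batch, joints, channels) for one frame of the gesture sequence.
        q, k, v = self.query(x), self.key(x), self.value(x)
        # Directed interaction scores: attn[b, i, j] != attn[b, j, i].
        attn = torch.matmul(q, k.transpose(1, 2)) / k.shape[-1] ** 0.5
        # Sparsify: keep only the top-k incoming interactions per joint.
        topk_vals, _ = attn.topk(self.topk, dim=-1)
        threshold = topk_vals[..., -1:].expand_as(attn)
        attn = attn.masked_fill(attn < threshold, float("-inf"))
        attn = F.softmax(attn, dim=-1)
        return torch.matmul(attn, v)  # aggregated joint features

# Example: 8 gestures, 21 hand joints, 64-dimensional features per joint.
layer = SparseDirectedGraphConv(64, 128)
out = layer(torch.randn(8, 21, 64))   # -> (8, 21, 128)
```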

Automatic Modeling of Dynamical Interactions Within Marine Ecosystems

Authors: Omar Ikne, Maxime Folschette, Tony Ribeiro

Conference: The 1st International Joint Conference on Learning & Reasoning, 2021

Read Paper

Marine ecology models are used to study and anticipate population variations of plankton and microalgae species. These variations can have an impact on ecological niches, the economy or the climate. Our objective is to automate the creation of such models. Learning From Interpretation Transition (LFIT) is a framework that aims at learning the dynamics of a system by observing its state transitions. LFIT provides explainable predictions in the form of logical rules. In this paper, we introduce a method that extracts an influence graph from an LFIT model. We also propose a heuristic to improve the model's robustness to noise in the data.
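The extraction step can be illustrated with a toy example: for rules of the form head(value) :- conditions, every variable appearing in a rule body is taken to influence the head variable. The rule encoding below is a simplified assumption for illustration, not LFIT's actual data structures.

```python
# Illustrative sketch of extracting an influence graph from LFIT-style rules.
# Each rule reads: the head variable takes `head_val` at t+1 if all body
# conditions hold at t. Example rule: a(1) :- b(1), c(0).
rules = [
    ("a", 1, [("b", 1), ("c", 0)]),
    ("b", 0, [("a", 1)]),
    ("c", 1, [("a", 0), ("b", 1)]),
]

def influence_graph(rules):
    """Collect one edge (body_var -> head_var) per condition occurring in a
    rule, so the graph summarizes which species can influence which."""
    edges = set()
    for head_var, _head_val, body in rules:
        for body_var, _body_val in body:
            edges.add((body_var, head_var))
    return sorted(edges)

for src, dst in influence_graph(rules):
    print(f"{src} -> {dst}")
```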

Journal Papers

SHREC 2024: Recognition Of Dynamic Hand Motions Molding Clay

Authors: Ben Veldhuijzen, Remco C. Veltkamp, Omar Ikne, Benjamin Allaert, Hazem Wannous, Marco Emporio, Andrea Giachetti, Joseph J. LaViola Jr., Ruiwen He, Halim Benhabiles, et al.

Journal: Computers & Graphics, 2024

Read Paper

Gesture recognition is a tool to enable novel interactions with different techniques and applications, like Mixed Reality and Virtual Reality environments. Despite all the recent advancements in gesture recognition from skeletal data, it is still unclear how well state-of-the-art techniques perform in a scenario using precise motions with two hands. This paper presents the results of the SHREC 2024 contest organized to evaluate methods for their recognition of highly similar hand motions using the skeletal spatial coordinate data of both hands. The task is the recognition of 7 motion classes given their frame-by-frame spatial coordinates. The skeletal data was captured using a Vicon system and pre-processed into a coordinate system using Blender and Vicon Shogun Post. We created a small, novel dataset with a high variety of durations in frames. This paper reports the results of the contest, describing the techniques created by the 5 research groups on this challenging task and comparing them to our baseline method.

eMotion-GAN: A Motion-based GAN for Photorealistic and Facial Expression Preserving Frontal View Synthesis

Authors: Omar Ikne, Benjamin Allaert, Ioan Marius Bilasco, Hazem Wannous

Journal: arXiv preprint arXiv:2404.09940, 2024

Read Paper

Many existing facial expression recognition (FER) systems encounter substantial performance degradation when faced with variations in head pose. Numerous frontalization methods have been proposed to enhance these systems' performance under such conditions. However, they often introduce undesirable deformations, rendering them less suitable for precise facial expression analysis. In this paper, we present eMotion-GAN, a novel deep learning approach designed for frontal view synthesis while preserving facial expressions within the motion domain. Considering the motion induced by head variation as noise and the motion induced by facial expression as the relevant information, our model is trained to filter out the noisy motion in order to retain only the motion related to facial expression. The filtered motion is then mapped onto a neutral frontal face to generate the corresponding expressive frontal face. We conducted extensive evaluations using several widely recognized dynamic FER datasets, which encompass sequences exhibiting various degrees of head pose variations in both intensity and orientation. Our results demonstrate the effectiveness of our approach in significantly reducing the FER performance gap between frontal and non-frontal faces. Specifically, we achieved a FER improvement of up to +5% for small pose variations and up to +20% for larger pose variations. Code and trained models are available at: https://github.com/o-ikne/eMotion-GAN
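At a high level, the inference pipeline described in the abstract could be sketched as follows; the MotionFilter generator, its architecture, and all tensor shapes are hypothetical placeholders for illustration, not the released eMotion-GAN model.

```python
# High-level sketch of the pipeline described in the abstract: estimate facial
# motion, let a generator filter out pose-induced motion, then warp a neutral
# frontal face with the remaining expression motion. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionFilter(nn.Module):
    """Stand-in for a trained generator mapping noisy (pose + expression)
    flow to expression-only frontal flow. Architecture is illustrative."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),
        )

    def forward(self, flow):  # flow: (B, 2, H, W)
        return self.net(flow)

def warp(image, flow):
    """Warp a neutral frontal face with a dense flow field via grid_sample."""
    b, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
    # Convert pixel displacements to normalized grid offsets.
    offset = torch.stack((flow[:, 0] * 2 / w, flow[:, 1] * 2 / h), dim=-1)
    return F.grid_sample(image, base + offset, align_corners=True)

# Toy example with random tensors standing in for flow and a neutral face.
noisy_flow = torch.randn(1, 2, 112, 112)    # pose + expression motion
neutral_face = torch.rand(1, 3, 112, 112)   # neutral frontal face image
expr_flow = MotionFilter()(noisy_flow)      # filtered expression motion
expressive_face = warp(neutral_face, expr_flow)
```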