Evaluating real-life performance of the state-of-the-art in facial expression recognition using a novel YouTube-based datasets

2017 ◽  
Vol 77 (1) ◽  
pp. 917-937 ◽  
Author(s):  
Muhammad Hameed Siddiqi ◽  
Maqbool Ali ◽  
Mohamed Elsayed Abdelrahman Eldib ◽  
Asfandyar Khan ◽  
Oresti Banos ◽  
...  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Olalekan Agbolade ◽  
Azree Nazri ◽  
Razali Yaakob ◽  
Abdul Azim Ghani ◽  
Yoke Kqueen Cheah

Abstract Background Expression in H-sapiens plays a remarkable role when it comes to social communication. The identification of this expression by human beings is relatively easy and accurate. However, achieving the same result in 3D by machine remains a challenge in computer vision. This is due to the current challenges facing facial data acquisition in 3D; such as lack of homology and complex mathematical analysis for facial point digitization. This study proposes facial expression recognition in human with the application of Multi-points Warping for 3D facial landmark by building a template mesh as a reference object. This template mesh is thereby applied to each of the target mesh on Stirling/ESRC and Bosphorus datasets. The semi-landmarks are allowed to slide along tangents to the curves and surfaces until the bending energy between a template and a target form is minimal and localization error is assessed using Procrustes ANOVA. By using Principal Component Analysis (PCA) for feature selection, classification is done using Linear Discriminant Analysis (LDA). Result The localization error is validated on the two datasets with superior performance over the state-of-the-art methods and variation in the expression is visualized using Principal Components (PCs). The deformations show various expression regions in the faces. The results indicate that Sad expression has the lowest recognition accuracy on both datasets. The classifier achieved a recognition accuracy of 99.58 and 99.32% on Stirling/ESRC and Bosphorus, respectively. Conclusion The results demonstrate that the method is robust and in agreement with the state-of-the-art results.


Symmetry ◽  
2019 ◽  
Vol 11 (1) ◽  
pp. 52 ◽  
Author(s):  
Xianzhang Pan ◽  
Wenping Guo ◽  
Xiaoying Guo ◽  
Wenshu Li ◽  
Junjie Xu ◽  
...  

The proposed method has 30 streams, i.e., 15 spatial streams and 15 temporal streams. Each spatial stream corresponds to each temporal stream. Therefore, this work correlates with the symmetry concept. It is a difficult task to classify video-based facial expression owing to the gap between the visual descriptors and the emotions. In order to bridge the gap, a new video descriptor for facial expression recognition is presented to aggregate spatial and temporal convolutional features across the entire extent of a video. The designed framework integrates a state-of-the-art 30 stream and has a trainable spatial–temporal feature aggregation layer. This framework is end-to-end trainable for video-based facial expression recognition. Thus, this framework can effectively avoid overfitting to the limited emotional video datasets, and the trainable strategy can learn to better represent an entire video. The different schemas for pooling spatial–temporal features are investigated, and the spatial and temporal streams are best aggregated by utilizing the proposed method. The extensive experiments on two public databases, BAUM-1s and eNTERFACE05, show that this framework has promising performance and outperforms the state-of-the-art strategies.


Algorithms ◽  
2019 ◽  
Vol 12 (11) ◽  
pp. 227 ◽  
Author(s):  
Yingying Wang ◽  
Yibin Li ◽  
Yong Song ◽  
Xuewen Rong

In recent years, with the development of artificial intelligence and human–computer interaction, more attention has been paid to the recognition and analysis of facial expressions. Despite much great success, there are a lot of unsatisfying problems, because facial expressions are subtle and complex. Hence, facial expression recognition is still a challenging problem. In most papers, the entire face image is often chosen as the input information. In our daily life, people can perceive other’s current emotions only by several facial components (such as eye, mouth and nose), and other areas of the face (such as hair, skin tone, ears, etc.) play a smaller role in determining one’s emotion. If the entire face image is used as the only input information, the system will produce some unnecessary information and miss some important information in the process of feature extraction. To solve the above problem, this paper proposes a method that combines multiple sub-regions and the entire face image by weighting, which can capture more important feature information that is conducive to improving the recognition accuracy. Our proposed method was evaluated based on four well-known publicly available facial expression databases: JAFFE, CK+, FER2013 and SFEW. The new method showed better performance than most state-of-the-art methods.


Research ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Meiqi Zhuang ◽  
Lang Yin ◽  
Youhua Wang ◽  
Yunzhao Bai ◽  
Jian Zhan ◽  
...  

The facial expressions are a mirror of the elusive emotion hidden in the mind, and thus, capturing expressions is a crucial way of merging the inward world and virtual world. However, typical facial expression recognition (FER) systems are restricted by environments where faces must be clearly seen for computer vision, or rigid devices that are not suitable for the time-dynamic, curvilinear faces. Here, we present a robust, highly wearable FER system that is based on deep-learning-assisted, soft epidermal electronics. The epidermal electronics that can fully conform on faces enable high-fidelity biosignal acquisition without hindering spontaneous facial expressions, releasing the constraint of movement, space, and light. The deep learning method can significantly enhance the recognition accuracy of facial expression types and intensities based on a small sample. The proposed wearable FER system is superior for wide applicability and high accuracy. The FER system is suitable for the individual and shows essential robustness to different light, occlusion, and various face poses. It is totally different from but complementary to the computer vision technology that is merely suitable for simultaneous FER of multiple individuals in a specific place. This wearable FER system is successfully applied to human-avatar emotion interaction and verbal communication disambiguation in a real-life environment, enabling promising human-computer interaction applications.


Author(s):  
Weicheng Xie ◽  
Linlin Shen ◽  
Meng Yang ◽  
Zhihui Lai

Facial expression has lots of applications in human-computer interaction. Although feature extraction and selection have been well studied, the specificity of each expression variation is not fully explored in state-of-the-art works. In this work, the problem of multiclass expression recognition is converted into triplet-wise expression recognition. For each expression triplet, a new feature optimization model based on Action Unit (AU) weighting and patch weight optimization is proposed to represent the specificity of the expression triplet. Sparse representation based approach is then proposed to detect the active AUs of testing sample for better generalization. The algorithm achieved competitive accuracies of 89.67% and 94.09% for Jaffe and CK+ databases, respectively. Better cross-database performance has also been observed.


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5184
Author(s):  
Min Kyu Lee ◽  
Dae Ha Kim ◽  
Byung Cheol Song

Facial expression recognition (FER) technology has made considerable progress with the rapid development of deep learning. However, conventional FER techniques are mainly designed and trained for videos that are artificially acquired in a limited environment, so they may not operate robustly on videos acquired in a wild environment suffering from varying illuminations and head poses. In order to solve this problem and improve the ultimate performance of FER, this paper proposes a new architecture that extends a state-of-the-art FER scheme and a multi-modal neural network that can effectively fuse image and landmark information. To this end, we propose three methods. To maximize the performance of the recurrent neural network (RNN) in the previous scheme, we first propose a frame substitution module that replaces the latent features of less important frames with those of important frames based on inter-frame correlation. Second, we propose a method for extracting facial landmark features based on the correlation between frames. Third, we propose a new multi-modal fusion method that effectively fuses video and facial landmark information at the feature level. By applying attention based on the characteristics of each modality to the features of the modality, novel fusion is achieved. Experimental results show that the proposed method provides remarkable performance, with 51.4% accuracy for the wild AFEW dataset, 98.5% accuracy for the CK+ dataset and 81.9% accuracy for the MMI dataset, outperforming the state-of-the-art networks.


2020 ◽  
Vol 1 (6) ◽  
Author(s):  
Pablo Barros ◽  
Nikhil Churamani ◽  
Alessandra Sciutti

AbstractCurrent state-of-the-art models for automatic facial expression recognition (FER) are based on very deep neural networks that are effective but rather expensive to train. Given the dynamic conditions of FER, this characteristic hinders such models of been used as a general affect recognition. In this paper, we address this problem by formalizing the FaceChannel, a light-weight neural network that has much fewer parameters than common deep neural networks. We introduce an inhibitory layer that helps to shape the learning of facial features in the last layer of the network and, thus, improving performance while reducing the number of trainable parameters. To evaluate our model, we perform a series of experiments on different benchmark datasets and demonstrate how the FaceChannel achieves a comparable, if not better, performance to the current state-of-the-art in FER. Our experiments include cross-dataset analysis, to estimate how our model behaves on different affective recognition conditions. We conclude our paper with an analysis of how FaceChannel learns and adapts the learned facial features towards the different datasets.


Electronics ◽  
2019 ◽  
Vol 8 (4) ◽  
pp. 385 ◽  
Author(s):  
Ying Chen ◽  
Zhihao Zhang ◽  
Lei Zhong ◽  
Tong Chen ◽  
Juxiang Chen ◽  
...  

Near-infrared (NIR) facial expression recognition is resistant to illumination change. In this paper, we propose a three-stream three-dimensional convolution neural network with a squeeze-and-excitation (SE) block for NIR facial expression recognition. We fed each stream with different local regions, namely the eyes, nose, and mouth. By using an SE block, the network automatically allocated weights to different local features to further improve recognition accuracy. The experimental results on the Oulu-CASIA NIR facial expression database showed that the proposed method has a higher recognition rate than some state-of-the-art algorithms.


2021 ◽  
Vol 1963 (1) ◽  
pp. 012010
Author(s):  
Sharmeen M. Saleem ◽  
Subhi R. M. Zeebaree ◽  
Maiwan B. Abdulrazzaq

Sign in / Sign up

Export Citation Format

Share Document