Abstract
Artificial intelligence is substantially changing the world, influencing technologies, machines, and everyday objects in many encouraging ways; emotion recognition is one of them. This paper describes the significant contributions of conventional and deep learning methodologies to emotion recognition, focusing on their limitations and open challenges. It also presents a comparative study of recently applied machine learning and deep learning algorithms that achieve the best accuracy rates in recognizing emotions. The comparative study covers the feature extraction methods, classifier models, and datasets used to recognize emotions from facial images, speech, and non-verbal communication, and describes their features and principles for future research. We also briefly explain how hybrid classification techniques balance accuracy and efficiency in speech emotion recognition. This review should benefit automated decision-making services in customer-facing industries and patient monitoring in the health care sector, as well as public-sector, private-sector, and production organizations.
Introduction
Technology that can learn, think, and act like human intelligence is now prevalent; the branch concerned with emotions is referred to as artificial emotional intelligence. Artificial emotional intelligence is a subset of artificial intelligence that recollects, recognises, and reacts to human emotions (Erol et al., 2020). It is a human–machine interaction-based computing technology that detects facial expressions and automatically recognises emotions from speech and audio-visual data (Amorim et al., 2019). Artificial emotional intelligence applies various techniques and approaches to collected and analysed data to detect emotions through multiple pathways, as humans generally do (Kumar et al., 2018). Human emotions can be identified efficiently through computer vision, pattern recognition, virtual reality, and augmented reality (Cohn & La Torre, 2018). Generally, people use non-verbal language, body language, and gestures to express their feelings. Seven expressions are mainly detected: happiness, anger, sadness, surprise, disgust, fear, and neutral (Kumar et al., 2018). Emotion recognition systems are helpful in running businesses, advertising campaigns, social surveys, analysing customers' reactions to products, intelligent health monitoring, and several other human–machine interaction applications (Kumar et al., 2018), such as e-learning programs, identification-driven social robots, cyber security, fraud detection, driving assistance, human resources, patient counselling, workplace design, and IoT-integrated gaming applications.
To detect changes in emotions, researchers have implemented various algorithms and techniques, whose accuracy rates vary between conventional emotion recognition (Kim, 2016) and advanced deep learning algorithms (Spiers, 2016), such as neural network and natural language processing algorithms (Sekhon & Agarwal, 2015). A face in an image captured by cameras, sensors, signals, or any other electrophysical device can be detected from its shape, landmarks, and facial features (Kumar et al., 2018), which are technically positioned in a pixel coordinate system (Meynet, 2003), using static or dynamic approaches (Kumar et al., 2018). Emotion detection from advanced signals such as electromyography and electrocardiography (Happy & Routray, 2011) may also lead to practical real-time applications. To detect the micro-expressions and emotions of a human face, the system must first detect the different features, variations, and movements beneath the facial skin (Alkawaz et al., 2015). Feature extraction is therefore a vital step in emotion recognition, and we discuss several conventional and advanced feature extraction methods used in recent years.
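As a minimal sketch of landmark-based feature extraction, the snippet below derives geometric features (normalized inter-point distances) from hypothetical 2-D landmark coordinates; real systems would obtain these coordinates from a landmark detector, and the point names and values here are purely illustrative.

```python
from math import dist

# Hypothetical 2-D landmark positions (pixel coordinates) for one face;
# a real system obtains these from a facial landmark detector.
landmarks = {
    "left_eye":  (120.0, 150.0),
    "right_eye": (180.0, 150.0),
    "nose_tip":  (150.0, 200.0),
    "mouth_left":  (130.0, 240.0),
    "mouth_right": (170.0, 240.0),
}

def geometric_features(pts):
    """Distances between key points, normalized by inter-ocular distance."""
    iod = dist(pts["left_eye"], pts["right_eye"])  # scale reference
    pairs = [("left_eye", "nose_tip"), ("right_eye", "nose_tip"),
             ("nose_tip", "mouth_left"), ("mouth_left", "mouth_right")]
    return [dist(pts[a], pts[b]) / iod for a, b in pairs]

features = geometric_features(landmarks)
print([round(f, 3) for f in features])
```

Normalizing by the inter-ocular distance makes the features invariant to the scale of the face in the image, which is one reason distance ratios are popular hand-crafted features.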
Literature survey
Although some work has been done in past years to review emotion recognition techniques, those surveys focused on particular expressions, eye motions (Hickson et al., 2019), pre-processing and feature extraction techniques such as LBP (Adegun & Vadapalli, 2020) and Gabor wavelets (Mehta & Jadhav, 2016), existing classifiers, datasets, or specific conventional and deep learning feature extraction. Anagnostopoulos and Iliou (2015) presented a survey study on existing classifiers and feature extraction methods.
The conventional architecture of the facial emotion recognition
Fig. 1 depicts the traditional architecture of a facial expression recognition system (Revina & Emmanuel, 2018). It shows how emotions are recognized from a facial image that passes through the preprocessing, feature extraction, and classification stages after irrelevant information is reduced. In Fig. 1, six basic emotions are shown as the facial recognition output: surprise, smile, sadness, anger, fear, and disgust (Revina & Emmanuel, 2018).
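The three-stage pipeline just described can be sketched as composed functions. Everything below is a toy stand-in, not the methods from the surveyed papers: the "image", the two features, and the per-emotion weights are all fabricated for illustration.

```python
# Minimal sketch of the conventional FER pipeline in Fig. 1:
# preprocessing -> feature extraction -> classification.

EMOTIONS = ["surprise", "smile", "sadness", "anger", "fear", "disgust"]

def preprocess(image):
    """Normalize pixel intensities to [0, 1] (stand-in for cropping, resizing, etc.)."""
    return [[px / 255.0 for px in row] for row in image]

def extract_features(image):
    """Toy features: mean intensity and global contrast (a real system
    would use LBP, Gabor wavelets, geometric distances, etc.)."""
    flat = [px for row in image for px in row]
    return [sum(flat) / len(flat), max(flat) - min(flat)]

def classify(features, weights):
    """Linear scorer over the six basic emotions; the weights are hypothetical."""
    scores = [sum(w * f for w, f in zip(ws, features)) for ws in weights]
    return EMOTIONS[scores.index(max(scores))]

image = [[0, 128], [255, 64]]            # 2x2 toy "face"
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5],
           [-1.0, 1.0], [1.0, -1.0], [0.2, 0.2]]
label = classify(extract_features(preprocess(image)), weights)
print(label)
```

The point is only the data flow: each stage consumes the previous stage's output, so swapping in a better feature extractor or classifier does not disturb the rest of the pipeline.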
The conventional approach
Limitation of the conventional approach
The conventional approach has limitations in feature extraction, classification, and the amount of data it can process. Deep learning-based techniques, in contrast, reduce the dependence on image pre-processing and hand-crafted feature extraction. They are highly robust to domain parameters such as illumination and occlusion, indicating that they can perform better than conventional approaches. In addition, deep learning techniques can handle high-volume data, whereas conventional techniques can deal only with limited volumes.
Deep learning-based facial emotion recognition (FER)
The deep learning-based facial emotion recognition approach has become the most exciting area among researchers due to its high accuracy and capacity for automatic recognition. Various techniques have been implemented to recognize facial emotions from images and videos. Fig. 3 (Chul Ko, 2018) illustrates that, after preprocessing of the input data, feature extraction and classification of expressions can both be performed by the same deep network.
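The core operation that lets a deep network learn its own features is convolution. The sketch below shows a single hand-written filter pass; in a trained CNN the kernel values are learned, whereas the edge kernel and toy patch here are assumptions chosen purely to make the effect visible.

```python
# Illustrative sketch of the core CNN operation behind deep-learning FER:
# a filter slides over the image, producing a feature map that later
# layers classify. The filter values here are fixed, not learned.

def conv2d(image, kernel):
    """'Valid' 2-D convolution (no padding, stride 1) followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(max(0, s))  # ReLU non-linearity
        out.append(row)
    return out

# Vertical-edge filter applied to a toy 4x4 grayscale patch.
patch = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
edge_kernel = [[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]]
feature_map = conv2d(patch, edge_kernel)
print(feature_map)
```

Because the patch contains a vertical intensity edge, every output position responds strongly; a facial expression network stacks many such learned filters to respond to wrinkles, mouth corners, and other expression cues.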
Speech based emotion detection techniques
Automatic audio/speech detection schemes analyze the emotions in an audio clip. Detecting emotion in the human voice is a crucial research area at present.
The speech emotion detection system is categorized into three stages:

1. Feature extraction
2. Feature dimension reduction
3. Classification engine
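The three stages above can be sketched end-to-end on a synthetic waveform. The features (short-time energy, zero-crossing rate), the mean-vector "reduction", and the per-emotion centroids below are all illustrative assumptions, not methods from the surveyed systems.

```python
import math

# Sketch of the three speech-emotion stages on a synthetic waveform.

def frame_features(signal, frame_len=40):
    """Stage 1 - feature extraction: short-time energy and zero-crossing
    rate per frame (real systems add MFCCs, pitch, formants, ...)."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / frame_len
        feats.append((energy, zcr))
    return feats

def reduce_dims(feats):
    """Stage 2 - dimension reduction: collapse the frame sequence to a
    single mean vector (a stand-in for PCA/LDA)."""
    n = len(feats)
    return [sum(f[i] for f in feats) / n for i in range(2)]

def classify(vec, centroids):
    """Stage 3 - classification engine: nearest centroid over
    hypothetical per-emotion centroids."""
    return min(centroids, key=lambda c: math.dist(vec, centroids[c]))

# 200-sample synthetic "utterance": a steady oscillation.
signal = [math.sin(0.5 * i) for i in range(200)]
centroids = {"calm": [0.1, 0.05], "excited": [0.5, 0.16]}
label = classify(reduce_dims(frame_features(signal)), centroids)
print(label)
```

A loud, rapidly oscillating signal yields high energy and a high zero-crossing rate, which is why it lands near the "excited" centroid in this toy setup.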
Classifiers for emotion recognition
Classifiers perform the approximate mapping between the input and output units of a model, specifying a correlation that varies with the application and database. Classification and regression trees (a machine learning algorithm), multilayer feed-forward neural networks (Greche et al., 2017), and back-propagation algorithms (Sekhon & Agarwal, 2015) can also be combined with Bayesian algorithms to achieve better accuracy rates (Han et al., 2019). The different types of prominent classifiers are as follows.
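To make the input-to-output mapping concrete, here is a minimal k-nearest-neighbour classifier over toy 2-D feature vectors; the training points and emotion labels are fabricated for illustration and are not taken from any dataset discussed in the text.

```python
from collections import Counter
from math import dist

# Fabricated training set: (feature vector, emotion label) pairs.
train = [([0.1, 0.2], "happy"), ([0.2, 0.1], "happy"),
         ([0.8, 0.9], "angry"), ([0.9, 0.8], "angry"),
         ([0.85, 0.95], "angry")]

def knn_predict(x, train, k=3):
    """Vote among the k training points closest to x."""
    neighbours = sorted(train, key=lambda item: dist(x, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

print(knn_predict([0.15, 0.15], train))
print(knn_predict([0.9, 0.9], train))
```

The same interface (feature vector in, label out) is what a tree, neural network, or Bayesian classifier would expose, which is why classifiers are interchangeable in the pipelines surveyed here.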
Computational efforts and burden
The computational burden involves the initialization and updating of the weights and the feature values of face characteristics such as the distances between the eyes, from the eyes to the nose tip, from the eyes to the right corner of the lips, from the nose tip to the right corner of the lips, and from the nose tip to the lips. The following parameters are considered when determining the significance and application of the proposed methods: layers, learning rate, weights, error values, and so on (Sekhon & Agarwal, 2015). Table 5 shows the computational effort and burden.
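The parameters named above (weights, learning rate, error values) interact in a single gradient-descent update, sketched below for a one-unit linear model with squared error; the data point, target, and learning rate are hypothetical choices, not values from any surveyed method.

```python
# Toy illustration of weights, learning rate, and error value
# interacting in one gradient-descent training step.

def step(weights, x, target, lr):
    """One gradient-descent update for a linear unit with squared error."""
    pred = sum(w * xi for w, xi in zip(weights, x))
    error = pred - target                      # error value
    grad = [2 * error * xi for xi in x]        # dE/dw for E = error**2
    return [w - lr * g for w, g in zip(weights, grad)], error ** 2

weights = [0.0, 0.0]                # initialized weights
x, target = [1.0, 2.0], 1.0         # hypothetical sample
losses = []
for _ in range(20):
    weights, loss = step(weights, x, target, lr=0.05)
    losses.append(loss)
print(round(losses[0], 4), round(losses[-1], 6))
```

With this learning rate the error halves at every step, so the loss decays geometrically; too large a rate would instead make the error grow, which is why the learning rate appears among the parameters that determine a method's practical cost.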
Challenges and issues in deep learning approach
Although we have studied effective deep learning techniques for emotion recognition from facial-image and speech-based input data, which also resolve the limitations of the conventional approach, the deep learning approach still has some issues that demand attention for further research and enhancement.
- The deep learning approach deals with large and complex input data; therefore, it requires a large dataset for training and testing the model.
Conclusion
This research paper reviewed various proposed conventional and deep learning approaches and models that have obtained the highest accuracy in their implementations. We discussed (1) the conventional approach and (2) the deep learning-based approach for facial and speech-based emotion recognition, along with their steps and highest accuracy rates. In the conventional approach, recognizing human emotions involves four steps: input data, feature extraction techniques, classifiers, and
Credit authorship contribution statement
Himanshu Kumar: Conceptualization, Methodology, Writing – original draft. A. Martin: Visualization, Investigation, Supervision, Writing – review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.