How are speech sentiment and emotion measured?

Speech sentiment and emotion analyses are increasingly adopted by companies for a range of purposes, one of the most common being a better understanding of customers. If you are curious how machine-learning techniques can be used to understand people better, this article walks you through the technical side of the question.

Understanding speech analysis

Speech can be viewed as sound waves generated by humans, whose physical parameters determine either a sentiment or an emotion, depending on the scope of research. Such an approach differs from the techniques adopted in NLP, since it omits the actual linguistic meaning of the speech.

To derive this meaning, NLP first transcribes speech to text and then analyses it with classic rule-based algorithms or with automated machine-learning (ML) methods. Sometimes both types of analysis are used together.

Rule-based techniques

The rule-based approach often relies on special lexicons of words and expressions used for identifying sentiments or emotions, combined with text-manipulation algorithms. Some of the most popular ones are tokenisation, stemming, lemmatisation, part-of-speech tagging and categorisation of words by their grammatical function.

For example, with such techniques the prevailing sentiment or emotion of a text can be measured by how frequently certain words and phrases occur in it.
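As a rough illustration, here is a minimal Python sketch of this lexicon-based idea. The tiny positive and negative word lists and the regular-expression tokeniser are purely illustrative assumptions, not a real sentiment lexicon.

```python
# A minimal, rule-based sketch of lexicon scoring; the tiny lexicon and the
# regex tokeniser are illustrative assumptions, not production resources.
import re
from collections import Counter

POSITIVE = {"good", "great", "happy", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "sad", "hate", "poor"}

def score_sentiment(text: str) -> str:
    # Tokenise: lowercase and split into words (a real pipeline would also
    # apply stemming/lemmatisation and part-of-speech tagging here).
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    if pos == neg:
        return "neutral"
    return "positive" if pos > neg else "negative"

print(score_sentiment("The support was great and the agent seemed happy to help."))
```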

Machine-learning methods

Sentiment and emotion analyses rely heavily on classification algorithms trained on categorised datasets. An algorithm looks for similarities between data points and uses this knowledge to categorise new data. The same ML techniques can be applied to both sentiment and emotion recognition, although the latter requires more complex models with a greater number of variables or categories.

Sentiment and emotion analyses frequently use the Naive Bayes, Maximum Entropy and SVM algorithms; however, the range of available classification techniques covers many other options, including Logistic Regression, k-Nearest Neighbour and neural networks.

Naive Bayes is built on Bayes’ Theorem and probability theory. After training such an algorithm on a dataset of text fragments tagged with emotions or sentiments, we can calculate the probability that a new text fits each of the existing categories. Its major difference from the Maximum Entropy (ME) classifier lies in its initial assumption that the significance of a word for the analysis is proportional to the frequency of its occurrence in the text, whereas ME also pays attention to correlations between words.
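To make this concrete, below is a minimal sketch of a Naive Bayes text classifier built with scikit-learn. The handful of labelled examples is invented purely for illustration; a real system would train on a far larger tagged dataset.

```python
# A minimal Naive Bayes sketch, assuming scikit-learn is installed.
# The tiny labelled dataset is an illustrative assumption.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["I love this product", "absolutely great support",
         "this is terrible", "I hate the long waiting time"]
labels = ["positive", "positive", "negative", "negative"]

# Turn each text into a vector of word counts
vectoriser = CountVectorizer()
X = vectoriser.fit_transform(texts)

# Fit the classifier and estimate category probabilities for a new text
clf = MultinomialNB()
clf.fit(X, labels)

new = vectoriser.transform(["the support team was great"])
print(dict(zip(clf.classes_, clf.predict_proba(new)[0])))
```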

SVM, or Support Vector Machine, takes a different approach: it looks for a hyperplane that separates the pre-categorised data points of the training dataset. It can be visualised as a two-dimensional plane containing all the data points, in which the algorithm finds a separator with the maximum possible distance between the tagged categories.
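The hyperplane idea can be sketched with scikit-learn's SVC on a handful of two-dimensional points; the points, labels and the choice of a linear kernel are assumptions made purely for illustration.

```python
# A sketch of a linear SVM separating two tagged groups of 2-D points;
# the toy coordinates and labels are illustrative assumptions.
from sklearn.svm import SVC

# Toy 2-D feature vectors (e.g. two numeric features extracted from text)
X = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]]
y = ["negative", "negative", "positive", "positive"]

clf = SVC(kernel="linear")
clf.fit(X, y)

# For a linear kernel, the separating hyperplane is w . x + b = 0
print("w:", clf.coef_, "b:", clf.intercept_)
print(clf.predict([[2.0, 3.0]]))
```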

All of these ML techniques involve pre-processing the data, splitting it into training and testing sets, training the algorithm and testing it on the test set. Once the model is evaluated positively, it can be applied to other datasets.
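A hedged sketch of that workflow might look as follows, again assuming scikit-learn; the toy dataset, the split ratio and the choice of a linear SVM are all illustrative, not a prescribed setup.

```python
# An illustrative sketch of the train/test workflow, assuming scikit-learn;
# the dataset, split ratio and classifier choice are assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

texts = ["great service", "awful wait", "very helpful", "totally useless",
         "love the app", "hate the app", "really friendly", "so rude"]
labels = ["pos", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]

# Split the labelled data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42)

# Pre-process (vectorise) and train the classifier in one pipeline
model = Pipeline([("vectorise", CountVectorizer()),
                  ("classify", LinearSVC())])
model.fit(X_train, y_train)

# Evaluate on the held-out test set
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```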