Sentiment analysis

Introduction

Our research asks whether the polarity of sentiment in the word usage of novels written between 1920 and 1960 changes over time. To answer this question, we use sentiment analysis. On this page we briefly explain what sentiment analysis is, focusing on its uses and on one of its components, sentiment lexicons. We also discuss three papers that apply sentiment analysis in practice and briefly introduce the sentiment lexicon we used in our own research.

Short comment on word usage and translation

As we based a lot of our research on sentiment analysis of Japanese sources, we sometimes had to improvise a translation when no established translation could be found in dictionaries. The categories of sentiment lexicons that appear in this text are categories we created ourselves, based on our observation of existing sentiment lexicons in different languages, and so the names are also our own. We would also like to mention that we used no artificial intelligence or translation engines for reading and translating the Japanese sources. What we did use:

KanjiTomo, an OCR-based pop-up dictionary. We are not sure how it was created, but it does not use AI to run and does not require an internet connection. It can be used outside the browser (in PDF files and Zotero, for example) and, because it is OCR-based, it also works on images.
Jisho, a free online Japanese-English two-way dictionary
A Japanese electronic dictionary (XD-series from Casio)

Definition of sentiment analysis

Sentiment analysis is an important component of Natural Language Processing (NLP). It is often defined as the process of identifying and analyzing subjective information in user-written text, enabling the categorization of sentiments as "positive," "negative," or "neutral." Limiting the analysis of sentiments to only those three categories, however, has been deemed deficient, especially for more complex research, as it ignores the complexity of human emotions. The "positive," "negative," or "neutral" aspect can therefore be omitted from the definition. The Cambridge Dictionary defines sentiment analysis as follows: "the process of using computer software to find out people's opinions or feelings about something from things that have been written, especially comments that people have posted on social media." Sentiment analysis is indeed often used to analyze comments on social media, but as we will discuss in the section "Examples of research based on sentiment analysis", it can be used in many more ways. Sentiment analysis often relies on machine learning, but it is entirely possible to work without artificial intelligence by combining a programming language (such as Python, R, PowerShell…), a morphological analyzer (such as MeCab) and sentiment/emotion lexicons.

Sentiment/emotion lexicons

Sentiment lexicons are repositories of words/phrases labelled on a scale from positive to negative. Emotion lexicons are repositories of words/phrases labelled with emotions.

Different types of sentiment and emotion lexicons

As mentioned earlier, limiting the analysis of emotions to only three categories has been deemed deficient, especially for more complex research. There is therefore a lot of active discussion and ongoing research on how to create sentiment and emotion lexicons without disregarding the complexity of human emotions, which has led to different methods. Below you can find some examples of different types of sentiment and emotion lexicons. Each type has its own strengths and weaknesses. Simplifying emotions too much can lead to a limited understanding of them, but more nuanced lexicons are often harder to use.

Semantic orientation lexicons

A lexicon that classifies words as positive, negative or neutral. It is considered the simplest way of classification and is often used for the analysis of comments on social media or customer feedback.
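To make this concrete, here is a minimal Python sketch of how such a three-category lexicon might be applied to a short English text. The lexicon entries, the whitespace tokenization and the majority rule are all assumptions made for illustration; they are not taken from any existing lexicon or tool.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> category (illustration only).
ORIENTATION_LEXICON = {
    "great": "positive",
    "love": "positive",
    "slow": "negative",
    "broken": "negative",
    "package": "neutral",
}


def classify_text(text, lexicon=ORIENTATION_LEXICON):
    """Label a text by comparing the number of positive and negative words."""
    # Naive tokenization: lowercase and split on whitespace.
    words = text.lower().split()
    # Count the categories of all words that appear in the lexicon.
    counts = Counter(lexicon[word] for word in words if word in lexicon)
    if counts["positive"] > counts["negative"]:
        return "positive"
    if counts["negative"] > counts["positive"]:
        return "negative"
    # Ties and texts without any known polar word count as neutral.
    return "neutral"
```

For Japanese text, the whitespace split would have to be replaced by a morphological analyzer, since Japanese is written without spaces between words.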

Semantic orientation score lexicons

A lexicon that assigns a semantic orientation score to words. The score usually ranges from -1 (very negative) to 1 (very positive), with 0 being a neutral score. Depending on the lexicon, scores can be very detailed, with many digits after the decimal separator, or relatively simple, with only one or two digits after the decimal separator. This approach allows for more nuance than using only the three categories positive, negative and neutral.
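As a rough sketch of how scores like these could be used, the Python snippet below averages the scores of all words in a text that appear in the lexicon. The words and scores are invented for illustration, and taking the mean is only one of several possible ways to aggregate scores.

```python
# Hypothetical scores between -1 and 1 (illustration only).
SCORE_LEXICON = {
    "wonderful": 0.9,
    "pleasant": 0.5,
    "ordinary": 0.0,
    "gloomy": -0.6,
    "dreadful": -0.9,
}


def mean_orientation(words, lexicon=SCORE_LEXICON):
    """Return the mean score of the known words, or None if none is known."""
    scores = [lexicon[word] for word in words if word in lexicon]
    if not scores:
        return None
    return sum(scores) / len(scores)
```

Returning None when no word is found keeps "no information" apart from a genuine average of 0.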

Emotion classification lexicons

A lexicon that classifies words based on a set of basic emotions. As there is no universal agreement on what the basic human emotions are, the classification can differ from lexicon to lexicon, but most are based on the division proposed by either psychologist Paul Ekman or psychologist Robert Plutchik. Ekman lists 6 basic emotions: joy, sadness, anger, fear, disgust, and surprise, while Plutchik lists 8, adding trust and anticipation. Further nuance can be added by splitting the emotions further (for example: anger into frustration, anger and rage) to give an idea of the strength of the emotion.
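The following Python sketch shows one way such a lexicon could be used: every known word contributes one count to its emotion, which gives an emotion profile of the text. It uses the six emotions attributed to Ekman above; the word-to-emotion assignments are hypothetical.

```python
from collections import Counter

EKMAN_EMOTIONS = ("joy", "sadness", "anger", "fear", "disgust", "surprise")

# Hypothetical toy lexicon: each word carries exactly one emotion label.
EMOTION_LEXICON = {
    "laugh": "joy",
    "weep": "sadness",
    "furious": "anger",
    "tremble": "fear",
    "rotten": "disgust",
    "sudden": "surprise",
}


def emotion_profile(words, lexicon=EMOTION_LEXICON):
    """Count how often each basic emotion occurs among the given words."""
    counts = Counter(lexicon[word] for word in words if word in lexicon)
    # Always report every emotion, including those with a count of zero.
    return {emotion: counts[emotion] for emotion in EKMAN_EMOTIONS}
```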

Multiple emotion orientation score lexicons

As with emotion classification lexicons, a set of basic emotions is used, but in this case each word is assigned a score for each emotion, often ranging from 0 (not present at all) to 1 (extremely present). This method takes into account that a word usually carries more than one basic emotion. For example, the word ‘hatred’ could be classified in the category ‘anger’ when working with an emotion classification lexicon, but it can be argued that ‘hatred’ contains just as much ‘disgust’ and arguably also a bit of ‘fear’ and ‘sadness.’ A multiple emotion orientation score lexicon can show all these basic emotions present in ‘hatred.’
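A small Python sketch may help to show the difference with a single-label lexicon. Each word maps to a score per emotion, and a text is summarized by averaging those scores. The emotion set is shortened to four emotions and all scores, including those for ‘hatred’, are invented for illustration.

```python
EMOTIONS = ("anger", "disgust", "fear", "sadness")

# Hypothetical scores from 0 (absent) to 1 (extremely present).
MULTI_EMOTION_LEXICON = {
    "hatred": {"anger": 0.8, "disgust": 0.8, "fear": 0.3, "sadness": 0.2},
    "grief": {"anger": 0.1, "disgust": 0.0, "fear": 0.2, "sadness": 0.9},
}


def average_emotions(words, lexicon=MULTI_EMOTION_LEXICON):
    """Average the per-emotion scores of all words found in the lexicon."""
    known = [lexicon[word] for word in words if word in lexicon]
    if not known:
        return {emotion: 0.0 for emotion in EMOTIONS}
    return {
        emotion: sum(entry[emotion] for entry in known) / len(known)
        for emotion in EMOTIONS
    }
```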

Color association lexicons

Each word is assigned to a color it is associated with. This kind of lexicon is often impossible to translate directly as color association often depends on culture.

Frequent expression semantic orientation score lexicons

These lexicons work in the same way as semantic orientation score lexicons, but here expressions are given a score depending on the words they are combined with. For example, the expression ‘ことです’ would be given a positive score if it is used together with the word ‘美しい’, but a negative score if it is used in combination with the word ‘悲しい.’ It is arguable whether this still counts as ‘a lexicon.’
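One possible reading of this idea is sketched below in Python: an expression that has no fixed score of its own receives the mean score of the scored words it appears with. This is only our interpretation for the sake of illustration, not the mechanism of any particular lexicon; the scores are hypothetical and the sentence is assumed to be tokenized already.

```python
# Hypothetical word scores (illustration only).
WORD_SCORES = {"美しい": 0.9, "悲しい": -0.9}

# Expressions whose score depends on the words around them.
CONTEXT_EXPRESSIONS = {"ことです"}


def score_expression(expression, tokens, word_scores=WORD_SCORES):
    """Score a context-dependent expression from the scored words around it."""
    if expression not in CONTEXT_EXPRESSIONS or expression not in tokens:
        return None
    neighbours = [word_scores[token] for token in tokens if token in word_scores]
    if not neighbours:
        # No scored word nearby: treat the expression as neutral.
        return 0.0
    return sum(neighbours) / len(neighbours)
```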

The sentiment lexicon used in our research project

The sentiment lexicon we use in our research project is called ‘単語感情極性対応表’, which roughly translates as ‘Semantic Word-Orientation Lexicon.’ It was created and made available to the general public for research purposes by three researchers from the Language and Information Research Team of the Artificial Intelligence Research Center (AIRC) in Tokyo: Takamura Hiroya (高村大也), Inui Takashi (乾孝司), and Okumura Manabu (奥村学).
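As a sketch of how a lexicon of this kind could be loaded in Python, the function below parses entries of the form word:reading:part of speech:score. That line format is our assumption and should be checked against the downloaded file, whose text encoding also needs to be verified. The sample entries and their scores are invented.

```python
def parse_lexicon_lines(lines):
    """Parse 'word:reading:part-of-speech:score' lines into a dictionary.

    The four-field, colon-separated format is an assumption.
    """
    lexicon = {}
    for line in lines:
        parts = line.strip().split(":")
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        word, _reading, part_of_speech, score = parts
        # Key on word and part of speech, which a tokenizer usually provides.
        lexicon[(word, part_of_speech)] = float(score)
    return lexicon


# Invented sample entries, not copied from the real lexicon.
SAMPLE_LINES = [
    "優れる:すぐれる:動詞:1.0",
    "悲しい:かなしい:形容詞:-0.9",
    "",
]
SAMPLE_LEXICON = parse_lexicon_lines(SAMPLE_LINES)
```

Keying on word and part of speech means that two entries with the same spelling and part of speech but different readings would overwrite each other, so a fuller implementation might also keep the reading.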

The sentiment scores are assigned to words through a very complicated automated process. If you want to know the exact theory behind it, you can refer to the paper written by the developers (高村大也 Takamura Hiroya et al. 2006), but we will try to explain the very basics here. Please take into account that this lies far outside our field of expertise.

The idea is that every word lies at a certain distance from every other word. Two words can be very close to each other in meaning (synonyms), very far from each other (antonyms), or anything in between. Taking the distance from each word to all other words into consideration yields a lexical network in which the words are the nodes. This network is then converted into a one-dimensional model in which the two farthest nodes are the most positive word (receiving 1 as its score) and the most negative word (receiving -1 as its score), so that all words can be given a score based on their position. To determine the distance between words, the so-called Ising spin model is used. This model was originally developed as a mathematical model of ferromagnetism in physics, but here it is applied to language by comparing words with electrons. The positive or negative semantic orientation of a word is compared to the up or down spin of an electron, and the tendency of words to be defined by other words is compared to the tendency of nearby electron spins to point in the same direction. The actual principle behind this is very complex and we cannot go into further detail here, but as mentioned earlier, you can read the original research if you are interested. Our apologies for any mistakes in the above explanation.
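To give a feeling for the spin analogy, here is a toy Python sketch. A few seed words are fixed at +1 or -1, and every other word repeatedly takes the hyperbolic tangent of the weighted sum of its neighbours' values, so that orientation spreads through the network. This is a strongly simplified mean-field-style update chosen by us for illustration; it may differ from the procedure in the original paper, and the words, link weights and the parameter beta are all hypothetical.

```python
import math


def mean_field_orientations(links, seeds, beta=1.0, iterations=50):
    """Spread orientation from seed words through a weighted word network.

    links: {(word_a, word_b): weight}; a positive weight pulls two words
           toward the same orientation, a negative weight pushes them apart.
    seeds: {word: +1 or -1}; these values are kept fixed.
    """
    words = sorted({word for pair in links for word in pair} | set(seeds))
    spins = {word: float(seeds.get(word, 0.0)) for word in words}

    # Build a symmetric neighbour list from the link dictionary.
    neighbours = {word: [] for word in words}
    for (word_a, word_b), weight in links.items():
        neighbours[word_a].append((word_b, weight))
        neighbours[word_b].append((word_a, weight))

    for _ in range(iterations):
        for word in words:
            if word in seeds:
                continue  # seed words keep their fixed orientation
            field = sum(weight * spins[other] for other, weight in neighbours[word])
            spins[word] = math.tanh(beta * field)  # value stays within (-1, 1)
    return spins


# Hypothetical network: "good" and "bad" are seeds with opposite orientation.
EXAMPLE_LINKS = {
    ("good", "nice"): 1.0,
    ("nice", "pleasant"): 1.0,
    ("bad", "awful"): 1.0,
    ("good", "bad"): -1.0,
}
EXAMPLE_SPINS = mean_field_orientations(EXAMPLE_LINKS, {"good": 1, "bad": -1})
```

In this toy network, "nice" is linked directly to the positive seed and "pleasant" only indirectly, so "pleasant" ends up positive but with a smaller value than "nice".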

Examples of research based on sentiment analysis

In this segment we will take a closer look at three research papers that use sentiment analysis.

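An Attempt to Use Sentiment Analysis to Extract the Main Stress Factor in the Environment Surrounding the University out of Social Media (SNS感情分析を用いた、大学を取り巻く環境でのストレス要因抽出の試み)(足立 Adachi et al. 2023)

In this study, dating from the end of 2022, sentiment analysis was used to extract the main stress factors in the environment surrounding the university from social media.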

Tweets concerning the environment surrounding Kobe University that had been posted on Twitter in the preceding 5 months were collected. After sentiment analysis was used to split them into negative and positive tweets, the negative tweets were taken as the target of analysis. Tweets that were deemed to have been sent automatically were removed. Keywords were extracted, and those that appeared 10 or more times were classified into categories: entrance exam, pursuit of knowledge, region, health, social conditions, club activities, part-time job, traffic, Twitter, and human relations.
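The frequency step described here can be sketched in a few lines of Python. The sketch assumes that keywords have already been extracted for each negative tweet; the function names and the keyword-to-category mapping are hypothetical and do not come from the study.

```python
from collections import Counter


def frequent_keywords(keywords_per_tweet, min_count=10):
    """Keep only the keywords that occur at least min_count times overall."""
    counts = Counter(
        keyword for keywords in keywords_per_tweet for keyword in keywords
    )
    return {keyword: count for keyword, count in counts.items() if count >= min_count}


def group_by_category(keyword_counts, categories):
    """Add up keyword counts per category; unmapped keywords are set apart."""
    grouped = {}
    for keyword, count in keyword_counts.items():
        category = categories.get(keyword, "uncategorized")
        grouped[category] = grouped.get(category, 0) + count
    return grouped
```

In the study the keywords were assigned to categories by the researchers, so the mapping passed to group_by_category would have to be drawn up by hand.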

The research had the following limitations. As the target of analysis consists of tweets, all data is restricted to Twitter users, and it is unclear whether this group can be taken to represent all students. Furthermore, it is uncertain whether all the tweets that were analyzed were truly written by university students. Lastly, it is also unclear to what extent the extracted stress factors are a psychological burden on the students.

Despite these limitations, the method seems very promising, especially as it gave largely the same results as earlier large-scale surveys and also succeeded in extracting some stress factors that had newly appeared due to the changing environment (such as the coronavirus).

Sentiment Extraction of Characters in Stories Based on a Sentiment Lexicon (感情語辞書に基づく物語の登場人物の感情抽出)(逢坂 Osaka et al. 2020)

In this study an emotion lexicon is created to study the emotional state of characters from a novel and how they interact with different story factors.

The novel that was analyzed is “Kino's Journey: The Beautiful World (キノの旅)”, a story in which two friends travel together and visit different countries. Each chapter features a different country and can be read on its own.

The emotion lexicon used was created by extracting the vocabulary from existing lexicons and manually adding labels to the words. Originally 12 basic emotions were defined as labels, but a few of them were assigned to only a very small number of words, and so the labels were reduced to 8 basic emotions by merging some with others. The eventual categorization was the following: 好 (affection), 怖 (fear), 驚/昂 (surprise/excitement), 厭/恥/諦 (discomfort/shame/giving up), 怒 (anger), 喜 (joy), 哀/無 (sorrow/feeling of emptiness), 安 (trust).

For each chapter, emotion words were extracted separately for each of the 2 main characters and for all other characters, who were treated as one entity.

Factor analysis was used to divide the chapters into groups based on which emotions were most present and to define story factors that influence the emotional state of the characters. As a result, the chapters were divided into 3 groups. After analysing the contents of each group, it could be observed that chapters from the same group shared similar themes. Based on this observation, the groups were given the following names. Group 1: 皮肉な物語の因子 (ironic story factors), group 2: 意外性の強い物語の因子 (surprising story factors), group 3: 悲喜劇的な物語の因子 (tragicomic story factors).

As the emotion lexicon for this research was created by one person manually adding the labels to words, it is in need of further testing. The fact that some labels had to be merged because they produced only negligible output could mean that the lexicon is still missing valuable words and needs to be expanded. Even so, the fact that the groups, which were created through sentiment analysis with this lexicon, contained chapters featuring similar themes suggests that the lexicon has potential.

Effects of MBCT-CP on Expression of Pain and Meditative Experiences (MBCT-CPが痛みと瞑想経験の表現に与える影響)(阿部 Abe et al. 2022)

In this study, 5 participants living with chronic pain were asked to keep a diary in which they wrote about their experience with pain for a period of 4 weeks. After those 4 weeks they received MBCT-CP (Mindfulness-Based Cognitive Therapy for Chronic Pain) for another 4 weeks, during which they were asked to write about their experience with the meditation sessions and to continue describing their pain in their diary. After that period the participants continued to write about their experience with pain for an additional 4 weeks.

The diaries of the participants were analysed, and graphs were made to visualize the changes in the amount of positive and negative word usage. The following could be observed: by the end of the experiment, positive words appeared more often than negative words. During the MBCT-CP treatment period, however, negative words at first increased relatively sharply, then decreased again towards the end of the treatment, and decreased even further during the last 4 weeks after the MBCT-CP treatment.
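A count like the one behind such graphs could be sketched in Python as follows. The sketch assumes that each diary entry is available as a week number (1 to 12) with a list of words; the word lists, phase names and function names are hypothetical and do not come from the study.

```python
# Hypothetical word lists (illustration only).
POSITIVE_WORDS = {"calm", "relief"}
NEGATIVE_WORDS = {"ache", "sharp"}

PHASES = ("before", "during", "after")


def phase_of(week):
    """Map a week number to one of the three 4-week periods."""
    if 1 <= week <= 4:
        return "before"
    if 5 <= week <= 8:
        return "during"
    if 9 <= week <= 12:
        return "after"
    raise ValueError(f"week out of range: {week}")


def count_by_phase(entries):
    """Count positive and negative words per phase.

    entries: iterable of (week, words) pairs, one pair per diary entry.
    """
    totals = {phase: {"positive": 0, "negative": 0} for phase in PHASES}
    for week, words in entries:
        phase = phase_of(week)
        totals[phase]["positive"] += sum(word in POSITIVE_WORDS for word in words)
        totals[phase]["negative"] += sum(word in NEGATIVE_WORDS for word in words)
    return totals
```

Counting per week instead of per phase would give the finer curve needed to see a rise and fall within the treatment period.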

Although this is a very small-scale study based on only 5 participants, it shows the potential of sentiment analysis as a way to evaluate the effects of therapy.

Conclusion

As the examples above show, sentiment analysis is very versatile and can be used for various purposes. Because human emotions are very complex, it still has limitations. Even so, with an appropriate research object and methodology it can be used to aid research on subjective data. It has been used, for example, to extract the main stress factors among university students through an analysis of tweets on social media. Another example we discussed is a study on the effects of MBCT-CP on expressions of pain, in which sentiment analysis was applied to the diaries of the participants. So while sentiment analysis does have its limitations with regard to the complexity of the human mind, it also offers a way to bring structure to this complexity, allowing for faster and larger studies and for new approaches to solving problems that arise in society.

Sources

Abe, Tetsuri 阿部哲理, Muta Toshimizu 牟田季純, Ishikawa Haruyuki 石川遥至, Nozomi Imajo 今城希望, Ito Etsuro 伊藤悦朗, and Koshikawa Fusako 越川房子. "Mbct-Cp ga itami to meisoukeiken no hyougen ni ataeru eikyou" [Effects of MBCT-CP on Expression of Pain and Meditative Experiences]. Nihon shinrigaku taikai happyou ronbun-shuu 日本心理学会大会発表論文集 86: 3PM-033-PD (2022). https://doi.org/10.4992/pacjpa.86.0_3PM-033-PD.

Adachi, Sho 足立祥, Mouri Kentaro 毛利健太朗, and Yasuji Yamamoto 山本泰司. "SNS kanjou bunseki wo mochiita, daigaku wo torimaku kankyou de no sutoresu genin chuushutsu no kokoromi" SNS感情分析を用いた、大学を取り巻く環境でのストレス要因抽出の試み [An attempt to use sentiment analysis to extract the main stress factor in the environment surrounding the university out of social media]. Daigaku no mentaru herusu 大学のメンタルヘルス 7: 76-78 (2023). https://doi.org/10.60198/jjcmh.7.0_76.

Adachi, Yoshiro 安達由洋, Kondo Tomohiro 近藤友啓, Kobayashi Takamitsu 小林孝允, Etani Nao 惠谷菜央, and Ishii Kaito 石井解人. “Kanjougojisho wo mochiita nihongobun no kanjoubunseki” 感情語辞書を用いた日本語文の感情分析 [Emotion Analysis of Japanese Sentences Using an Emotion-word Dictionary]. Kashika jouhou gakkaishi 可視化情報学会誌 41, no. 161 (2021): 21-27. https://doi.org/10.3154/jvs.41.161_21.

Cambria Erik, Dāsa Dīpaṅkara, Bandyopadhyay Sivaji, and Feraco Antonio. 2017. A Practical Guide to Sentiment Analysis. Chapter 5, Sentiment Resources: Lexicons and Datasets. Socio-Affective Computing 5. Springer International Publishing. https://doi.org/10.1007/978-3-319-55394-8.

Ichimura, Mai 市村真衣, and Hisano Masaki 久野雅樹. "Kanjou kategori wo kouryo shita tango kyokusei no suitei" 感情カテゴリを考慮した単語極性の推定 [The estimation of sentiment polarity taking emotion categories into account]. Gengo shori-gakkai dai29-kai nenji taikai happyou ronbun-shuu 言語処理学会第29回年次大会発表論文集 (2023). https://www.anlp.jp/proceedings/annual_meeting/2023/pdf_dir/P7-1.pdf.

Nagasawa, Naomu 長澤尚武, and Hagiwara Masafumi 萩原将文. "Bunmyaku o kouryo shita tango no kanjousuitei" 文脈を考慮した単語の感情推定 [Context-sensitive Emotion Estimation for Words]. Nihon kansei kougakkai ronbunshi 日本感性工学会論文誌 23, no. 2: 87-96 (2023). https://doi.org/10.5057/jjske.TJSKE-D-23-00027.

Osaka, Shunya 逢坂駿也, and Murai Hajime 村井源. "Kanjou-go jisho ni motodzuku monogatari no toujou jinbutsu no kanjou chuushutsu" [Emotion Extraction of Characters in Stories Based on Emotion-Word Dictionary]. Jouhou chishiki gakkaishi 情報知識学会誌 30, no. 2: 283-88 (2020). https://doi.org/10.2964/jsik_2020_031.

Takamura, Hiroya 高村大也, Inui Takashi 乾孝司, and Okumura Manabu 奥村学. "Supin moderu ni yoru tango no kanjou kyokusei chuushutsu" スピンモデルによる単語の感情極性抽出 [Extracting Semantic Orientations Using Spin Model]. Jouhou shori gakkai ronbunshi 情報処理学会論文誌 47, no. 2: 627-37 (2006).

Takeuchi, Tatsuya 武内達哉, and Hagiwara Masafumi 萩原将文. "Tango no motsu kanjou suitei-hou no teian to tango kanjou jisho no kouchiku" 単語の持つ感情推定法の提案と単語感情辞書の構築 [A proposal for a method to estimate the emotions words hold and its application to construct a word-emotion dictionary]. Nihon kansei kougakkai ronbunshi 日本感性工学会論文誌 18, no. 4 (2019): 273-78. https://doi.org/10.5057/jjske.TJSKE-D-18-00104.

Touhoku-Daigaku inui・okazaki kenkyuushitsu. 東北大学乾・岡崎研究室. "Nihongo hyouka kyokusei-jisho" 日本語評価極性辞書 [Japanese sentiment polarity dictionary]. https://www.cl.ecei.tohoku.ac.jp/Open_Resources-Japanese_Sentiment_Polarity_Dictionary.html. (Accessed 16 May 2026).