Sentiment analysis
Introduction
Our research asks whether the polarity of sentiment in the word usage of novels written between 1920 and 1960 changed over time. To answer this we use sentiment analysis, so on this page we briefly explain what it is, focusing on how it is used and on one of its components, sentiment lexicons. We also give three examples of practical use by examining three research papers that apply sentiment analysis, and we briefly describe the sentiment lexicon we used for our own research.
Short comment on word usage and translation
Because much of our reading on sentiment analysis was based on Japanese sources, we sometimes had to improvise a translation when a term could not be found in any dictionary. The categories of sentiment lexicons that appear in this text are categories we created ourselves, based on observation of existing sentiment lexicons in different languages, and so the names are ours as well. We would also like to mention that we used no artificial intelligence or translation engines for reading and translating the Japanese sources. What we did use:
- [KanjiTomo](https://www.kanjitomo.net/), an OCR-based pop-up dictionary. We are not sure how it was created, but it does not use AI to run, does not require an internet connection, can be used outside the browser (in PDF files and Zotero, for example) and, being OCR-based, also works on pictures.
- [Jisho](https://jisho.org/), a free online Japanese-English two-way dictionary
- A Japanese electronic dictionary (XD series from Casio)
Definition of sentiment analysis
Sentiment analysis is an important component of Natural Language Processing (NLP). It is often defined as the process of identifying and analyzing subjective information in user-written text, enabling the categorization of sentiment as "positive," "negative," or "neutral." However, limiting the analysis to only those three categories has been judged insufficient, especially for more complex research, because it ignores the complexity of human emotions. The three categories can therefore be left out of the definition. The Cambridge Dictionary defines sentiment analysis as follows: “the process of using computer software to find out people's opinions or feelings about something from things that have been written, especially comments that people have posted on social media”. Sentiment analysis is indeed often applied to comments on social media, but as the section “Examples of research based on sentiment analysis” shows, it can be used in many other ways. Sentiment analysis often relies on machine learning, but it is entirely possible to work without artificial intelligence by combining a programming language (such as Python, R, PowerShell…), a language processor (such as MeCab) and sentiment/emotion lexicons.
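As a minimal sketch of the lexicon-based approach described above (no machine learning involved), the following Python snippet averages the polarity scores of the words in a text that has already been split into tokens. The tiny lexicon, its scores and the function name `average_polarity` are invented for illustration. A real project would load a published lexicon and, for Japanese, would first tokenize the text with a language processor such as MeCab.

```python
# Hypothetical toy lexicon: word -> polarity score between -1 and 1.
# The words and scores are invented for this example.
TOY_LEXICON = {
    "happy": 0.9,
    "beautiful": 0.8,
    "sad": -0.7,
    "terrible": -0.9,
}


def average_polarity(tokens, lexicon):
    """Return the mean polarity of the tokens found in the lexicon.

    Tokens that are not in the lexicon are ignored. Returns None when
    no token could be scored, so that "no information" is not confused
    with a neutral score of 0.
    """
    scores = [lexicon[token] for token in tokens if token in lexicon]
    if not scores:
        return None
    return sum(scores) / len(scores)


# The text is assumed to be tokenized already (here: a simple split).
tokens = "the garden was beautiful but the news was sad".split()
print(average_polarity(tokens, TOY_LEXICON))
```

Averaging is only one possible way to combine word scores. Summing them, or counting positive and negative words separately, are equally simple alternatives.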
Sentiment/emotion lexicons
Sentiment lexicons are repositories of words/phrases labelled on a scale from positive to negative. Emotion lexicons are repositories of words/phrases labelled with emotions.
Different types of sentiment and emotion lexicons
As mentioned earlier, limiting the analysis of emotions to only three categories has been judged insufficient, especially for more complex research. There is therefore a lot of active discussion and ongoing research on how to create sentiment and emotion lexicons without disregarding the complexity of human emotions, which has led to different methods. Below you can find some examples of different types of sentiment and emotion lexicons. Each type has its own strengths and weaknesses: simplifying emotions too much limits our understanding of them, but more nuanced lexicons are often harder to use.
- Semantic orientation lexicons
A lexicon that classifies words as positive, negative or neutral. It is considered the simplest way of classification and is often used for the analysis of comments on social media or customer feedback.
- Semantic orientation score lexicons
A lexicon that assigns a semantic orientation score to words. The score usually ranges from -1 (very negative) to 1 (very positive), with 0 being a neutral score. Depending on the lexicon, scores can be very detailed, with many digits after the decimal separator, or relatively simple, with only one or two. This approach allows for more nuance than using only the three categories positive, negative and neutral.
- Emotion classification lexicons
A lexicon that classifies words according to a set of basic emotions. As there is no universal agreement on what the basic human emotions are, the classification can differ between lexicons, but most are based on the division proposed by either psychologist Paul Ekman or psychologist Robert Plutchik. Ekman lists 6 basic emotions: joy, sadness, anger, fear, disgust and surprise. Plutchik lists 8, adding trust and anticipation. Further nuance can be added by splitting the emotions further (for example, anger into frustration, anger and rage) to give an idea of the strength of the emotion.
- Multiple emotion orientation score lexicons
Similar to an emotion classification lexicon, a set of basic emotions is used, but here each word is assigned a score for each emotion, often ranging from 0 (not present at all) to 1 (extremely present). This method takes into account that a word usually carries more than one basic emotion. For example, the word ‘hatred’ could be placed in the category ‘anger’ in an emotion classification lexicon, but one can argue that ‘hatred’ contains just as much ‘disgust’ and arguably also a bit of ‘fear’ and ‘sadness’. A multiple emotion orientation score lexicon can show all these basic emotions present in ‘hatred’.
- Colour association lexicons
Each word is assigned the colour it is associated with. This kind of lexicon is often impossible to translate directly, as colour association often depends on culture.
- Frequent expression semantic orientation score lexicons
These lexicons work the same way as semantic orientation score lexicons, but here expressions are given a score that depends on the words they are combined with. For example, the expression ‘ことです’ would be given a positive score when used together with the word ‘美しい’ (beautiful), and a negative score when used together with the word ‘悲しい’ (sad). It is debatable whether this still counts as ‘a lexicon’.
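To make the difference between an emotion classification lexicon and a multiple emotion orientation score lexicon more concrete, here is a small Python sketch built around the ‘hatred’ example above. All entries, scores and names (`EMOTION_LABELS`, `EMOTION_SCORES`, `emotion_profile`) are invented for illustration and are not taken from any existing lexicon.

```python
from collections import Counter

# Emotion classification lexicon: exactly one label per word.
# (Hypothetical entries.)
EMOTION_LABELS = {"hatred": "anger", "gift": "joy"}

# Multiple emotion orientation score lexicon: one score per emotion,
# from 0 (not present at all) to 1 (extremely present).
# (Hypothetical scores.)
EMOTION_SCORES = {
    "hatred": {"anger": 0.8, "disgust": 0.8, "fear": 0.3, "sadness": 0.2},
    "gift": {"joy": 0.7, "surprise": 0.5, "trust": 0.2},
}


def emotion_profile(tokens, lexicon):
    """Sum the per-emotion scores of all tokens found in the lexicon."""
    profile = Counter()
    for token in tokens:
        for emotion, score in lexicon.get(token, {}).items():
            profile[emotion] += score
    return dict(profile)


tokens = ["hatred", "gift", "hatred"]

# With the classification lexicon every word contributes a single label.
labels = [EMOTION_LABELS[t] for t in tokens if t in EMOTION_LABELS]
print(labels)

# With the score lexicon the secondary emotions of 'hatred' stay visible.
print(emotion_profile(tokens, EMOTION_SCORES))
```

The first lexicon is easier to build and to read, while the second keeps the secondary emotions of a word, which reflects the trade-off between simplicity and nuance mentioned above.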
The sentiment lexicon used in our research project
The sentiment lexicon we use in our research project is called ‘単語感情極性対応表’, which roughly translates as ‘Semantic Word-Orientation Lexicon’. It was created and made available to the general public for research purposes by three researchers from the [Language and Information Research Team](https://www.airc.aist.go.jp/en/kirt) of the Artificial Intelligence Research Center (AIRC) in Tokyo: Takamura Hiroya (高村大也), Inui Takashi (乾孝司) and Okumura Manabu (奥村学).
The sentiment scores are assigned to words through a very complicated automated process. If you want to know the exact theory behind it, you can refer to the paper written by the developers (高村大也 Takamura Hiroya et al. 2006). Here we will only try to explain the very basics. Please bear in mind that this lies far outside our field of expertise.
The idea is that every word lies at a certain distance from every other word. Two words can be very close in meaning (synonyms), very far apart (antonyms), or anything in between. Taking the distance from each word to all other words into account yields a lexical network in which the words are the nodes. That model is then reduced to a one-dimensional model in which the two farthest nodes are the most positive word (score 1) and the most negative word (score -1), so that every word can be given a score based on its position. To determine the distances between words, the so-called Ising spin model is used. This is originally a mathematical model of ferromagnetism in physics, but here it is applied to language by treating words like electrons. The positive or negative semantic orientation of a word is compared to the up or down spin of an electron, and the tendency of a word to share the orientation of the words that define it is compared to the tendency of neighbouring electron spins to point in the same direction. The actual principle behind this is very complex and we cannot go into further detail here, but as mentioned earlier you can read the original research if you are interested. We apologize for any mistakes in the above explanation.
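The Python sketch below is a heavily simplified illustration of the spin analogy. It is not the procedure of Takamura et al. Each word gets an "average spin" between -1 and 1 that is repeatedly updated from the spins of its neighbours with the standard mean-field rule for Ising-type models, `spin = tanh(beta * sum of weighted neighbour spins)`. The toy network, the link weights, the value of `beta`, the number of sweeps and the choice to keep two seed words fixed at 1 and -1 are all assumptions made for this example.

```python
import math

# Hypothetical toy network. A weight of +1 links two words expected to
# share an orientation (e.g. one appears in the definition of the other).
# A weight of -1 links two words expected to be opposites (e.g. antonyms).
EDGES = {
    ("good", "nice"): 1.0,
    ("nice", "pleasant"): 1.0,
    ("bad", "awful"): 1.0,
    ("awful", "dreadful"): 1.0,
    ("pleasant", "dreadful"): -1.0,
}

# Seed words whose orientation is assumed to be known in advance.
SEEDS = {"good": 1.0, "bad": -1.0}


def mean_field_orientations(edges, seeds, beta=1.0, sweeps=50):
    """Estimate an average spin in [-1, 1] for every word in the network."""
    # Build a symmetric neighbour table from the edge list.
    neighbours = {}
    for (a, b), weight in edges.items():
        neighbours.setdefault(a, {})[b] = weight
        neighbours.setdefault(b, {})[a] = weight

    # Seed words start at their known value, all other words at 0.
    spins = {word: seeds.get(word, 0.0) for word in neighbours}

    for _ in range(sweeps):
        for word, links in neighbours.items():
            if word in seeds:
                continue  # seed words stay fixed in this sketch
            local_field = sum(w * spins[other] for other, w in links.items())
            spins[word] = math.tanh(beta * local_field)
    return spins


for word, spin in mean_field_orientations(EDGES, SEEDS).items():
    print(f"{word}: {spin:+.2f}")
```

In this toy network the words connected to ‘good’ end up with positive values and the words connected to ‘bad’ with negative values, which mirrors how a score between -1 and 1 can be read off for every word.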
Examples of research based on sentiment analysis
In this segment we will take a closer look at three research papers that use sentiment analysis.
An attempt to use sentiment analysis to extract the main stress factor in the environment surrounding the university out of social media (SNS感情分析を用いた、大学を取り巻く環境でのストレス要因抽出の試み)
In this study dating from the end of 2022 sentiment analysis was used to extract the main stress factor in the environment surrounding the university out of social media.
Tweets about the environment surrounding Kobe University that had been posted on Twitter in the preceding 5 months were collected. Sentiment analysis was used to split them into negative and positive tweets, and the negative tweets were taken as the target of analysis. Tweets that were judged to have been sent automatically were removed. Keywords were then extracted, and those that appeared 10 or more times were classified into the following categories: entrance exam, pursuit of knowledge, region, health, social conditions, club activities, part-time job, traffic, Twitter, human relations.
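The keyword step described above can be sketched in Python as follows. Only the threshold of 10 occurrences comes from the study. The sample keywords, the input format (one list of already extracted keywords per negative tweet) and the function name `frequent_keywords` are assumptions made for this example, and the sketch counts every occurrence of a keyword, which may differ from how the authors counted.

```python
from collections import Counter

MIN_COUNT = 10  # keywords appearing 10 or more times are kept


def frequent_keywords(keyword_lists, min_count=MIN_COUNT):
    """Count keywords over all tweets and keep the frequent ones."""
    counts = Counter()
    for keywords in keyword_lists:
        counts.update(keywords)
    return {kw: n for kw, n in counts.items() if n >= min_count}


# Hypothetical input: keywords extracted from negative tweets.
negative_tweets = (
    [["exam", "train"]] * 6   # 6 tweets mentioning both keywords
    + [["exam"]] * 5          # 5 tweets mentioning only "exam"
    + [["rain"]] * 3          # 3 tweets mentioning only "rain"
)

print(frequent_keywords(negative_tweets))
```

The surviving keywords would then be assigned to categories such as "entrance exam" or "traffic", a step that is not automated in this sketch.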
The research faced the following limitations. As the target of analysis consists of tweets, all data is restricted to Twitter users, and it is unclear whether this group can be taken to represent all students. Furthermore, it is uncertain whether all analyzed tweets were truly written by university students. Lastly, it is also unclear to what extent the extracted stress factors are a psychological burden on the students.
Despite these limitations the method seems very promising, especially as it gave largely the same results as earlier large-scale surveys and also succeeded in extracting some stress factors that had newly appeared due to a changing environment (such as the coronavirus).
Sentiment Extraction of Characters in Stories Based on a Sentiment Lexicon (感情語辞書に基づく物語の登場人物の感情抽出)
In this study an emotion lexicon is created to study the emotional state of characters from a novel and how they interact with different story factors.
The novel that was analyzed is Kino's Journey: the Beautiful World (キノの旅), a story in which two friends travel together and visit different countries. Each chapter features a different country and can be read on its own.
The emotion lexicon was created by extracting the vocabulary from existing lexicons and manually adding labels to it. Originally 12 basic emotions were defined as labels, but a few of them were assigned to only a very small number of words, so the labels were reduced to 8 basic emotions by merging some with others. The final categorization was the following: 好 (affection), 怖 (fear), 驚/昂 (surprise/excitement), 厭/恥/諦 (discomfort/shame/giving up), 怒 (anger), 喜 (joy), 哀/無 (sorrow/feeling of emptiness), 安 (trust).
For each chapter emotion words were extracted for the 2 main characters separately and for all other characters who are treated as one entity.
Factor analysis was used to divide the chapters into groups based on which emotions were most present and to define story factors that influence the emotional state of the characters. As a result the chapters were divided into 3 groups. Analysis of the contents of each group showed that chapters from the same group shared similar themes. Based on this observation, the groups were given the following names. Group 1: 皮肉な物語の因子 (ironic story factors), group 2: 意外性の強い物語の因子 (surprising story factors), group 3: 悲喜劇的な物語の因子 (tragicomic story factors).
As the emotion lexicon for this research was created by one person manually labelling the words, it needs further testing. The fact that some labels had to be merged because they produced only negligible output could mean that the lexicon is still missing valuable words and needs to be expanded. Even so, the fact that the groups formed through sentiment analysis with this lexicon contained chapters with similar themes suggests that the lexicon has potential.
Effects of MBCT-CP on Expression of Pain and Meditative Experiences (MBCT-CPが痛みと瞑想経験の表現に与える影響)
In this study, 5 participants living with chronic pain were asked to keep a diary about their experience with pain for 4 weeks. After those 4 weeks they received MBCT-CP (Mindfulness-Based Cognitive Therapy for Chronic Pain) for another 4 weeks, during which they were asked to write about their experience with the meditation sessions and to continue describing their pain in their diary. After that period the participants continued to write about their experience with pain for an additional 4 weeks.
The diaries of the participants were analyzed, and graphs were made to visualize the changes in the number of positive and negative words used. The following could be observed: by the end of the experiment positive words appeared more often than negative words. During the MBCT-CP treatment period, however, negative words at first increased relatively sharply, then decreased again towards the end of the treatment, and decreased even further during the last 4 weeks after the treatment.
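Counting positive and negative words per phase of such an experiment could look roughly like the Python sketch below. The three phases of 4 weeks each follow the description above. The word lists, the diary entries and the names `phase_of` and `count_by_phase` are invented for illustration and do not reflect the data or tools of the study.

```python
from collections import defaultdict

# Hypothetical word lists standing in for a sentiment lexicon.
POSITIVE = {"calm", "better", "relief"}
NEGATIVE = {"pain", "ache", "worse"}


def phase_of(week):
    """Map a week number (1-12) to one of the three 4-week phases."""
    if week <= 4:
        return "before"
    if week <= 8:
        return "during MBCT-CP"
    return "after"


def count_by_phase(entries):
    """Count positive and negative words per phase.

    `entries` is a list of (week, tokens) pairs, one per diary entry.
    """
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for week, tokens in entries:
        phase = phase_of(week)
        for token in tokens:
            if token in POSITIVE:
                counts[phase]["positive"] += 1
            elif token in NEGATIVE:
                counts[phase]["negative"] += 1
    return dict(counts)


# Hypothetical, already tokenized diary entries.
diary = [
    (1, ["pain", "worse", "today"]),
    (6, ["pain", "calm", "session"]),
    (11, ["better", "relief", "ache"]),
]
print(count_by_phase(diary))
```

Plotting these counts per week instead of per phase would give the kind of graph described above.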
Although this is a very small-scale study based on only 5 participants, it shows the potential of sentiment analysis for evaluating the effects of therapy.
Conclusion
As the examples above show, sentiment analysis is very versatile and can be used for various purposes. Because human emotions are very complex, it still has limitations, but with an appropriate research object and methodology it can aid research on subjective data. It has been used, for example, to extract the main stress factors among university students through an analysis of tweets. Another example we discussed is a study on the effects of MBCT-CP on expressions of pain, in which sentiment analysis was applied to the diaries of the participants. So while sentiment analysis has its limitations with regard to the complexity of the human mind, seen from the other side it offers a way to bring structure to this complexity, allowing studies to be carried out faster and on larger amounts of data and opening up new approaches to solving problems that arise in society.
Sources
Cambria Erik, Dāsa Dīpaṅkara, Bandyopadhyay Sivaji, and Feraco Antonio. 2017. A Practical Guide to Sentiment Analysis. Chapter 5, Sentiment Resources: Lexicons and Datasets. Socio-Affective Computing 5. Springer International Publishing. https://doi.org/10.1007/978-3-319-55394-8.
安達由洋 Adachi Yoshihiro, 近藤友啓 Kondo Tomohiro, 小林孝充 Kobayashi Takamitsu, 惠谷菜央 Etani Nao, 石井解人 Ishii Kaito. 2021. 感情語辞書を用いた日本語文の感情分析 [Kanjougojisho wo mochiita nihongobun no kanjoubunseki] [Emotion Analysis of Japanese Sentences Using an Emotion-word Dictionary]. 可視化情報学会誌 41 (161): 21–27. https://doi.org/10.3154/jvs.41.161_21.
東北大学 乾・岡崎研究室. Tohoku University Inui-Okazaki Laboratory. 日本語評価極性辞書 [nihongohyoukakyokuseijisho] [Japanese sentiment polarity dictionary]. Accessed 16 May 2026. https://www.cl.ecei.tohoku.ac.jp/Open_Resources-Japanese_Sentiment_Polarity_Dictionary.html.
武内達哉 Takeuchi Tatsuya, 萩原将文 Hagiwara Masafumi. 2019. 単語の持つ感情推定法の提案と単語感情辞書の構築 [tango no motsu kanjousuiteihou no teian to tangokanjoujisho no kouchiku] [A proposal for a method to estimate the emotions words hold and its application to construct a word-emotion dictionary]. 日本感性工学会論文誌 18 (4): 273–78. https://doi.org/10.5057/jjske.TJSKE-D-18-00104.
市村真衣 Ichimura Mai, 久野雅樹 Hisano Masaki. 2023. 感情カテゴリを考慮した単語極性の推定 [kanjou kategori wo kouryoshita tangokyokusei no suitei] [The estimation of sentiment polarity taking emotion categories into account]. 言語処理学会 第29回年次大会 発表論文集. https://www.anlp.jp/proceedings/annual_meeting/2023/pdf_dir/P7-1.pdf.
足立祥 Adachi Sho, 毛利健太朗 Mouri Kentarou, 山本泰司 Yamamoto Yasuji. 2023. SNS感情分析を用いた、大学を取り巻く環境でのストレス要因抽出の試み [SNS kanjoubunseki wo mochiita, daigaku wo torimaku kankyou de no sutoresu youinchuushutsu no kokoromi] [An attempt to use sentiment analysis to extract the main stress factor in the environment surrounding the university out of social media]. 大学のメンタルヘルス 7: 76–78. https://doi.org/10.60198/jjcmh.7.0_76.
逢坂駿也 Osaka Shunya, 村井源 Murai Hajime. 2020. 感情語辞書に基づく物語の登場人物の感情抽出 [Kanjou-go jisho ni motodzuku monogatari no toujou jinbutsu no kanjou chuushutsu] [Emotion Extraction of Characters in Stories Based on Emotion-Word Dictionary]. 情報知識学会誌 30 (2): 283–88. https://doi.org/10.2964/jsik_2020_031.
長澤尚武 Nagasawa Naomu, 萩原将文 Hagiwara Masafumi. 2024. 文脈を考慮した単語の感情推定 [Bunmyaku o kouryo shita tango no kanjousuitei] [Context-sensitive Emotion Estimation for Words]. 日本感性工学会論文誌 23 (2): 87–96. https://doi.org/10.5057/jjske.TJSKE-D-23-00027.
阿部哲理 Abe Tetsuri, 牟田季純 Muta Toshimizu, 石川遥至 Ishikawa Haruyuki, 今城希望 Imajo Nozomi, 伊藤悦朗 Ito Etsuro, 越川房子 Koshikawa Fusako. 2022. MBCT-CPが痛みと瞑想経験の表現に与える影響 [MBCT-CP ga itami to meisoukeiken no hyougen ni ataeru eikyou] [Effects of MBCT-CP on Expression of Pain and Meditative Experiences]. 日本心理学会大会発表論文集 86: 3PM-033-PD. https://doi.org/10.4992/pacjpa.86.0_3PM-033-PD.
高村大也 Takamura Hiroya, 乾孝司 Inui Takashi, 奥村学 Okumura Manabu. 2006. スピンモデルによる単語の感情極性抽出 [supinmoderu ni yoru tango no kanjoukyokuseichuushutsu] [Extracting Semantic Orientations Using Spin Model]. 情報処理学会論文誌 47 (2): 627–37.