Use of English in Japanese pop songs due to the influence of globalization

To investigate whether the use of English in popular Japanese songs is increasing, which could indicate a globalization of Japanese music, we scraped several annual charts and examined the ratio of English to Japanese in the lyrics of these songs.

Billboard Japan Hot 100

The first chart we scraped is the Billboard Japan Hot 100, which ranks the most streamed, most sold, and most broadcast songs in Japan. We started with the earliest available list, from 2008, and then scraped each year up to 2023.

Import

from bs4 import BeautifulSoup
import requests
import pandas as pd
from urllib.parse import quote
import re
import csv

These are the modules we use in our code.

Function

def remove_escape_characters(input_string):
    return input_string.replace("\n", "").replace("\t", "").replace("\r", "")

We use this to remove escape characters from texts that we retrieve using BeautifulSoup.

def remove_special_symbols(string):
    # Define a regular expression pattern to match special symbols
    pattern = r'[^\w\sあ-オー]'
    # Use the sub() function to replace all matches with an empty string
    filtered_string = re.sub(pattern, '', string)
    return filtered_string

To obtain more reliable figures, we use this to remove all characters from the lyrics that are neither Japanese nor English.
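As a quick sanity check, the two helpers behave as follows on mixed Japanese/English input (the song title and artist strings here are arbitrary examples of ours). Note that Python 3's Unicode-aware `\w` already matches kanji and kana, so the extra あ-オー range mainly makes the kana explicit:

```python
import re

def remove_escape_characters(input_string):
    return input_string.replace("\n", "").replace("\t", "").replace("\r", "")

def remove_special_symbols(string):
    # Keep word characters (which include kana and kanji under Unicode \w),
    # whitespace, the kana range あ-オ, and the long-vowel mark ー
    return re.sub(r'[^\w\sあ-オー]', '', string)

print(remove_escape_characters("Lemon\n\t米津玄師"))      # -> Lemon米津玄師
print(remove_special_symbols("夜に駆ける / YOASOBI!!"))   # -> 夜に駆ける  YOASOBI
```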

Class

To store the songs scraped from uta-net.com and work with them easily, we create a class that keeps the data organized per song throughout the code.

class Lied:

    def __init__(self, link):
        self.__link = link
        self.__titel = self.lied_titel_op_site()
        self.__artiest = self.lied_artiest_op_site()
        self.__reeks = 0
        self.__lyrics = self.lyrics_op_site()
        self.__verhouding = self.verhouding()

In this part of the code, you can see which data we store for each song. A song is created with a provided link to uta-net.com. For each song, we keep track of the following data: the link to uta-net.com, the song title, the artist, the series (the year the song appears on the hit list), the lyrics, and the ratio of Japanese characters to all characters.

    def set_reeks(self, jaar):
        self.__reeks = jaar

    def get_link(self):
        return self.__link

    def get_titel(self):
        return self.__titel

    def get_artiest(self):
        return self.__artiest

    def get_reeks(self):
        return self.__reeks

    def get_lyrics(self):
        return self.__lyrics

    def get_verhouding(self):
        return self.__verhouding

These are the getters and a setter, which make it easy to retrieve and update the data.

    def lied_titel_op_site(self):
        r = requests.get(self.__link)
        soup = BeautifulSoup(r.text, 'lxml')
        song_titel = soup.find("h2", class_="ms-2 ms-md-3").text
        return song_titel

This function retrieves the title of the song from uta-net.com.

    def lied_artiest_op_site(self):
        r = requests.get(self.__link)
        soup = BeautifulSoup(r.text, 'lxml')
        lied_artiest = soup.find('h3', class_='ms-2 ms-md-3').text
        if lied_artiest[0:1] == "\n":
            lied_artiest = lied_artiest[1:]
        return lied_artiest

This function retrieves the artist of the song from uta-net.com.

    def lyrics_op_site(self):
        r = requests.get(self.__link)
        soup = BeautifulSoup(r.text, 'lxml')
        lyric = soup.find('div', itemprop='text')
        lyric = lyric.text
        lyric = remove_special_symbols(lyric)
        lyric = lyric.replace(" ", "")  # remove ASCII spaces
        lyric = lyric.replace("\u3000", "")  # remove full-width (ideographic) spaces
        return lyric

This function retrieves the lyrics of the song from uta-net.com. Then, the lyrics are converted to a format that is consistent for processing (special characters and spaces are removed).

    def verhouding(self):
        # Guard against empty lyrics to avoid division by zero
        if not self.__lyrics:
            return 0
        count_western = 0
        for char in self.__lyrics:
            # Count ASCII letters as "Western" characters
            if 'a' <= char <= 'z' or 'A' <= char <= 'Z':
                count_western += 1
        return (len(self.__lyrics) - count_western) / len(self.__lyrics)

This function calculates the proportion of Japanese text in a given song. It counts ASCII letters as Western characters and treats everything else as Japanese, then divides the non-Western count by the total number of characters. The result is a fraction between 0 and 1.
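The same calculation can be illustrated outside the class on a toy lyric (the standalone function name `japanese_ratio` and the sample strings are ours):

```python
def japanese_ratio(lyrics):
    # Count ASCII letters as "Western"; everything else counts as Japanese,
    # mirroring the logic of Lied.verhouding()
    count_western = sum(1 for ch in lyrics if 'a' <= ch <= 'z' or 'A' <= ch <= 'Z')
    return (len(lyrics) - count_western) / len(lyrics)

print(japanese_ratio("abcあいう"))  # 3 Western letters out of 6 characters -> 0.5
print(japanese_ratio("あいう"))     # no Western letters -> 1.0
```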

Executing the scraping

def main():  
    # Build the yearly top 100 lists; set startjaar to 2008 to cover 2008 through 2023  
    startjaar = 2022  
    jaar = startjaar  
    liederen = []  
    with open("output_dict.csv", mode='a', newline='', encoding='utf-8') as file:  
        writer = csv.writer(file)  
        writer.writerow(["JAAR", "ARTIEST", "TITEL", "GEVONDEN ARTIEST", "GEVONDEN TITEL", "PERCENTAGE", "LINK",  
                         "LYRICS"])  # Add header 

In the while loop below, the program scrapes the Billboard Japan Hot 100 hit list for each year. It retrieves the artist name and song title and stores them in a list. In a later phase of the code, this data is used to search for the link to the song on uta-net.com.

    while jaar < 2024:  
        data = []  
        basis_url = 'https://www.billboard-japan.com/charts/detail?a=hot100_year&year='  
        jaar_url = basis_url + str(jaar)  
        top_100_songs_current = []  

        response = requests.get(jaar_url)  
        soup = BeautifulSoup(response.text, 'html.parser')  

        for row in soup.find_all('td', class_='name_td'):  
            song_title = row.find('p', class_='musuc_title').text  
            artist_name = row.find('p', class_='artist_name').text  
            song_title = remove_escape_characters(song_title)  
            song_title = song_title.strip()  
            top_100_songs_current.append((song_title, artist_name))  
        data.append(top_100_songs_current)  
        print("Chart list created")  

Above, the link to the page with the hit list is constructed and scraped, and the extracted information is stored in the data list. Some websites, like this one, are consistent enough that the link can be constructed per year.
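Constructing the per-year links amounts to simple string concatenation; a minimal sketch:

```python
basis_url = 'https://www.billboard-japan.com/charts/detail?a=hot100_year&year='

# One chart URL per year, 2008 through 2023
jaar_urls = [basis_url + str(jaar) for jaar in range(2008, 2024)]
print(jaar_urls[0])    # ends with year=2008
print(len(jaar_urls))  # 16
```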

        with open("output_dict.csv", mode='a', newline='', encoding='utf-8') as file:  
            writer = csv.writer(file)  
            writer.writerow(["GEM", ""])  
            writer.writerow([jaar, jaar, jaar, jaar, jaar, jaar, jaar, jaar])  # Mark the start of a new year  

        for sub_array in data:  
            for song_title, artist_name in sub_array:  
                print("next", song_title, artist_name, sep=" ")  
                search_url = 'https://search.yahoo.com/search;?p=' + quote(  
                    song_title + ' ' + artist_name + " \"uta-net\"")  
                # search_url = "https://search.yahoo.com/search;?p=ドライフラワー \"優里\" \"uta-net\""  
                print(search_url)  
                r = requests.get(search_url)  
                soup = BeautifulSoup(r.text, 'lxml')  
                first_result = soup.find('h3', class_='title')  
                # Extract the link from the first search result  

                try:  
                    first_result_link = first_result.find('a')['href']  
                    huidig_lied = Lied(first_result_link)  
                    huidig_lied.set_reeks(jaar)  
                    liederen.append(huidig_lied)  
                    data = [jaar, artist_name, song_title, liederen[-1].get_artiest(), liederen[-1].get_titel(),  
                            liederen[-1].get_verhouding(),  
                            liederen[-1].get_link(), liederen[-1].get_lyrics()]  

                    with open("output_dict.csv", mode='a', newline='', encoding='utf-8') as file:  
                        writer = csv.writer(file)  
                        writer.writerow(data)  
                except (AttributeError, TypeError):  
                    try:  
                        print("retry")  
                        search_url = 'https://search.yahoo.com/search;?p=' + quote(  
                            song_title + ' \"' + artist_name + "\" uta-net.com")  
                        print(search_url)  
                        r = requests.get(search_url)  
                        soup = BeautifulSoup(r.text, 'lxml')  
                        all_results = soup.find_all('h3', class_='title')  
                        found = False  
                        for result in all_results:  
                            result_link = result.find('a')  
                            if result_link:  
                                result_link = result_link['href']  
                                print("Title:", result.text.strip())  
                                print("Link:", result_link)  
                                if result_link.startswith("https://www.uta-net.com/song/"):  
                                    first_result_link = result_link  
                                    found = True  
                                    print("successfully recovered on retry")  
                                    huidig_lied = Lied(first_result_link)  
                                    huidig_lied.set_reeks(jaar)  
                                    liederen.append(huidig_lied)  
                                    data = [jaar, artist_name, song_title, liederen[-1].get_artiest(),  
                                            liederen[-1].get_titel(),  
                                            liederen[-1].get_verhouding(),  
                                            liederen[-1].get_link(), liederen[-1].get_lyrics()]  
                                    with open("output_dict.csv", mode='a', newline='', encoding='utf-8') as file:  
                                        writer = csv.writer(file)  
                                        writer.writerow(data)  
                                    break  
                        if not found:  
                            print("No link found for this result")  
                            print("second fail")  
                            data = [jaar, artist_name, song_title,  
                                    "NIETJAPANS",  # sentinel: not Japanese  
                                    "N.V.T", "N.V.T"]  # N.V.T = not applicable  
                            with open("output_dict.csv", mode='a', newline='', encoding='utf-8') as file:  
                                writer = csv.writer(file)  
                                writer.writerow(data)  
                    finally:  
                        print("done")  


main()

In the second phase of the code, the program uses the artist name and song title to search for a valid link on uta-net.com. This is done via the Yahoo! search engine, as it was the only search engine whose results our scraper could retrieve reliably. Initially, the song is searched for using a fixed combination of search terms, such as:

'https://search.yahoo.com/search;?p=' + quote(  
    song_title + ' ' + artist_name + " \"uta-net\"")  

If this search fails, the program retries with a modified combination of search terms and scans all the results for a possible candidate. If that also fails, we assume the song is not Japanese and therefore not applicable to our research.
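The core of the fallback is the prefix check on candidate links. Isolated from the scraping, the selection logic looks like this (the helper name `pick_uta_net_link` and the example hrefs are ours):

```python
def pick_uta_net_link(hrefs):
    # Return the first search-result link that points at a uta-net song page,
    # or None if no candidate qualifies
    for href in hrefs:
        if href.startswith("https://www.uta-net.com/song/"):
            return href
    return None

print(pick_uta_net_link([
    "https://example.com/lyrics/123",
    "https://www.uta-net.com/song/12345/",
]))  # -> https://www.uta-net.com/song/12345/
```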

Uta-net.com

For the second dataset, we used a hit list from Uta-net.com, the same website we use to retrieve song lyrics. This list contains the 30 most searched-for songs for each year from 2008 to 2023. Because the site is structured differently from the Billboard Japan Hot 100, we had to adjust our code accordingly.

The differences


def main():  
    # Build the yearly top 30 lists; set startjaar to 2008 to cover 2008 through 2023  
    startjaar = 2023  
    jaar = startjaar  
    liederen = []  
    with open("output_dict.csv", mode='a', newline='', encoding='utf-8') as file:  
        writer = csv.writer(file)  
        writer.writerow(["JAAR", "ARTIEST", "TITEL", "GEVONDEN ARTIEST", "GEVONDEN TITEL", "PERCENTAGE", "LINK",  
                         "LYRICS"])  # Add header  
    while jaar < 2024:  
        data = []  
        # Target link format up to and including 2017: https://www.uta-net.com/user/ranking/XXXXranking/XXXXranking2.html  
        # Target link format for 2018: https://www.uta-net.com/user/ranking/XXXXranking/index.html  
        # Target link format from 2018 to 2023: https://www.uta-net.com/close_up/XXXX_ranking  
        stam_url = "https://www.uta-net.com/close_up/"  
        achtervoegsel_url = str(jaar) + "_ranking"  
        jaar_url = stam_url + achtervoegsel_url  
        top_songs_current = []  

The main difference here is that accessing uta-net.com required accepting its GDPR policy: when our program attempted to scrape the hit list, it was redirected to the GDPR consent page. We resolved this by using Selenium WebDriver, a browser-automation library for Python that can interact with pages as a user would.

        # Set up Selenium WebDriver (this example uses Chrome)  
        # Requires: from selenium import webdriver; from selenium.webdriver.chrome.service import Service;  
        # from selenium.webdriver.chrome.options import Options; from selenium.webdriver.common.by import By; import time  
        options = Options()  
        options.headless = True  # Run in headless mode (no GUI), set to False to see the browser actions  
        service = Service('chromedriver.exe')  # Update with the path to your WebDriver  
        driver = webdriver.Chrome(service=service, options=options)  

        chromeDriverLocation = driver.service.path  

        print(chromeDriverLocation)  

        # URL of the GDPR page and the target page  
        target_url = jaar_url  

        # Open the GDPR page  
        driver.get(jaar_url)  
        # Wait a bit for the page to process the acceptance  
        time.sleep(3)  
        # Wait for the GDPR acceptance element to be visible and interact with it  
        # The specifics here depend on the actual implementation of the GDPR notice  
        try:  
            # Example: Find and click the GDPR accept button  
            accept_button = driver.find_element(By.CLASS_NAME, "fc-primary-button")  
            accept_button.click()  
        except Exception as e:  
            print("Could not find or click the GDPR accept button:", e)  
        time.sleep(3)  
        try:  
            # Example: Find and click the GDPR accept button  
            accept_button = driver.find_element(By.ID, "not-from-eu")  
            accept_button.click()  
        except Exception as e:  
            print("Could not find or click the GDPR accept button:", e)  

        # Wait a bit for the page to process the acceptance  
        time.sleep(3)  

        # Open the target page  
        driver.get(target_url)  
        soup = BeautifulSoup(driver.page_source, 'html.parser')  
Another difference from previous iterations of our code was that the websites hosting the hit lists for each year used inconsistent formats. This required us to change the format of our constructed links three times to accommodate all the years.
        # table = soup.find('table', {'border': '0', 'cellpadding': '2', 'cellspacing': '2'})  
        # 2005: width = 502  
        table = soup.find('table', class_="song_ranking")  

        # Iterate through the table rows, skipping the header row  
        for row in table.find_all("tr")[1:]:  
            cells = row.find_all('td')  
            # len(cells) must be 4 for years up to 2018  
            if len(cells) == 3:  
                # up to and including 2018  
                # song_title = remove_escape_characters(cells[1].text.strip())  
                # artist = remove_special_symbols(cells[2].text.strip())  
                song_title = remove_escape_characters(cells[0].text.strip())  
                artist = remove_special_symbols(cells[1].text.strip())  
                top_songs_current.append((song_title, artist))  
        data.append(top_songs_current)  

        # Print the rankings array  
        for rank in data:  
            print(rank)  
        # everything is stored in the data list  

        with open("output_dict.csv", mode='a', newline='', encoding='utf-8') as file:  
            writer = csv.writer(file)  
            writer.writerow(["GEM", ""])  
            writer.writerow([jaar, jaar, jaar, jaar, jaar, jaar, jaar, jaar])  # Mark the start of a new year  

        for sub_array in data:  
            for song_title, artist_name in sub_array:  
                print("next", song_title, artist_name, sep=" ")  
                search_url = 'https://search.yahoo.com/search;?p=' + quote(  
                    song_title + ' ' + artist_name + " 歌詞 - 歌ネット")  
                # search_url = "https://search.yahoo.com/search;?p=ドライフラワー \"優里\" \"uta-net\""  
                print(search_url)  
                r = requests.get(search_url)  
                soup = BeautifulSoup(r.text, 'lxml')  
                first_result = soup.find('h3', class_='title')  
                # Extract the link from the first search result  

Here, we slightly modified the search prompt for the Yahoo! search, which resulted in more hits and improved overall accuracy.

Oricon

For the third dataset, we used a hit list provided by Oricon, which contains the top 30 songs by CD sales for each year from 1968 to 2010. Fortunately, this dataset required only a few minor tweaks to our code.

while jaar < 2011:  
    data = []  

    stam_url = "https://amigo.lovepop.jp/yearly/ranking.cgi?year="  
    achtervoegsel_url = str(jaar)  
    jaar_url = stam_url + achtervoegsel_url  
    top_songs_current = []  

    response = requests.get(jaar_url)  
    response.encoding = 'shift_jis'  

    soup = BeautifulSoup(response.text, 'html.parser')  

    table = soup.find('table', class_="ta2")

The website is encoded in Shift_JIS, which we had to account for in order to decode the scraped data correctly.
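A small round-trip illustrates why setting the encoding matters; decoding the same bytes with the default UTF-8 codec fails outright (the sample string is ours):

```python
raw_bytes = "歌詞".encode('shift_jis')   # bytes as the server sends them
print(raw_bytes.decode('shift_jis'))     # -> 歌詞

# Decoding with UTF-8 instead raises an error
try:
    raw_bytes.decode('utf-8')
except UnicodeDecodeError:
    print("UTF-8 cannot decode these Shift_JIS bytes")
```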

Results

Tableau link that shows all individual graphs

Conclusion

The analysis of the change over time in the proportion of Japanese text in J-Pop, based on data from Oricon, the Billboard Japan Hot 100, and Uta-net, reveals several key trends:

Oricon (1968-2010)

  • Strong Early Preference (1968-1977): Japanese lyrics were dominant, with percentages consistently above 90%, indicating an era where Japanese music was primarily in the native language.
  • 1980s Decline: There is a significant drop in the proportion of Japanese lyrics during the 1980s, reaching a low of 73.87% in 1986. This decline may reflect the growing influence of Western music styles during this period.
  • 1990s Recovery: The 1990s show a resurgence in the use of Japanese lyrics, with percentages generally above 80% and peaking at 93.00% in 1993. This might suggest a cultural reaffirmation of Japanese language in music.
  • 21st Century Variability: From 2000 onwards, there is notable variability but with consistently high percentages of Japanese lyrics. Post-2008 data aligns closely with other sources, indicating a strong presence of Japanese lyrics.

Billboard Japan Hot 100 (2002-2023)

  • High Japanese Lyric Content: Starting from 2002, the Billboard data consistently shows high percentages of Japanese lyrics, typically ranging from the high 80s to mid-90s. This reflects a continued preference for Japanese language in popular music despite increasing globalization.
  • Recent Strengthening: In recent years (2019-2023), there is a marked increase in the proportion of Japanese lyrics, with some of the highest percentages recorded, such as 98.81% in 2019 and 97.49% in 2020, indicating a strong reaffirmation of Japanese lyrics in contemporary popular music.

Uta-net (2008-2023)

  • High Japanese Lyric Percentage: The Uta-net data, starting from 2008, shows a high proportion of Japanese lyrics, often aligning with trends observed in the Billboard data. For example, in 2009, the percentage is 92.00%, closely matching Billboard’s 93.73%.
  • Search Trends: The data reflects the lyrics that users are actively searching for, indicating a strong interest in Japanese lyrics. Despite some variability, the overall trend maintains high percentages of Japanese lyrics.
Overall trends

  • Consistency Across Sources: Despite some fluctuations, all three data sources show a high proportion of Japanese lyrics in J-Pop over time, with the 21st century witnessing a reaffirmation of the Japanese language in music.
  • Cultural Reaffirmation: The data indicates periods of cultural reaffirmation where the use of Japanese lyrics in popular music increases, particularly noticeable in the 1990s and the recent years.
  • Impact of Globalization: While the 1980s show a decline possibly due to Western influences, the overall trend suggests that Japanese lyrics have remained a strong and integral part of J-Pop, adapting and reaffirming cultural identity through different eras.