Use of English in Japanese pop songs due to the influence of globalization
To investigate whether the use of English in popular Japanese songs is increasing, which could indicate a globalization of Japanese music, we scraped several annual charts and examined the ratio of English to Japanese in the lyrics of the listed songs.
Billboard Japan Hot 100
The first chart we scraped is the Billboard Japan Hot 100, which ranks the most streamed, best-selling, and most broadcast songs in Japan. We started with the earliest available year-end list, from 2008, and scraped each year up to 2023.
Import
from bs4 import BeautifulSoup
import requests
import pandas as pd
from urllib.parse import quote
import re
import csv
These are the modules we use in our code.
Function
def remove_escape_characters(input_string):
return input_string.replace("\n", "").replace("\t", "").replace("\r", "")
We use this to remove escape characters from the text we retrieve with BeautifulSoup.
def remove_special_symbols(string):
# Define a regular expression pattern to match special symbols
pattern = r'[^\w\sあ-オー]'
# Use the sub() function to replace all matches with an empty string
filtered_string = re.sub(pattern, '', string)
return filtered_string
To obtain more reliable figures, we use this to remove all characters from the lyrics that are neither Japanese nor English.
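As a quick illustration of what the filter keeps and discards, here is a call on a made-up string (the example string is ours, not taken from the dataset), using the function defined above:
```Python
# Made-up example string (not from the dataset): punctuation and symbols are removed,
# while kana, kanji, Latin letters and whitespace are kept.
sample = "夜に駆ける！(Yoru ni Kakeru) ー feat. YOASOBI♪"
print(remove_special_symbols(sample))
# -> 夜に駆けるYoru ni Kakeru ー feat YOASOBI
```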
Class
To store the songs scraped from uta-net.com in an organized manner and to work with them more easily, we create a class that keeps the data together per song throughout the code.
class Lied:
def __init__(self, link):
self.__link = link
self.__titel = self.lied_titel_op_site()
self.__artiest = self.lied_artiest_op_site()
self.__reeks = 0
self.__lyrics = self.lyrics_op_site()
self.__verhouding = self.verhouding()
In this part of the code, you can see which data we store for each song. A song object is created from a link to uta-net.com. For each song we keep track of: the link to uta-net.com, the song title, the artist, the chart year (reeks), the lyrics, and the ratio of Japanese characters to all characters.
def set_reeks(self, jaar):
self.__reeks = jaar
def get_link(self):
return self.__link
def get_titel(self):
return self.__titel
def get_artiest(self):
return self.__artiest
def get_reeks(self):
return self.__reeks
def get_lyrics(self):
return self.__lyrics
def get_verhouding(self):
return self.__verhouding
These are the getters and a setter, which make it easy to retrieve and update a song's data.
def lied_titel_op_site(self):
r = requests.get(self.__link)
soup = BeautifulSoup(r.text, 'lxml')
song_titel = soup.find("h2", class_="ms-2 ms-md-3").text
return song_titel
This function retrieves the title of the song from uta-net.com.
def lied_artiest_op_site(self):
r = requests.get(self.__link)
soup = BeautifulSoup(r.text, 'lxml')
lied_artiest = soup.find('h3', class_='ms-2 ms-md-3').text
if lied_artiest[0:1] == "\n":
lied_artiest = lied_artiest[1:]
return lied_artiest
This function retrieves the artist of the song from uta-net.com.
def lyrics_op_site(self):
r = requests.get(self.__link)
soup = BeautifulSoup(r.text, 'lxml')
lyric = soup.find('div', itemprop='text')
lyric = lyric.text
lyric = remove_special_symbols(lyric)
lyric = lyric.replace(" ", "")
lyric = lyric.replace(" ", "")
return lyric
This function retrieves the lyrics of the song from uta-net.com. The lyrics are then normalized into a consistent format for processing: special characters and spaces are removed.
def verhouding(self):
count_western = 0
for char in self.__lyrics:
if char.isalpha():
if 'a' <= char <= 'z' or 'A' <= char <= 'Z':
count_western += 1
pct = (len(self.__lyrics) - count_western) / len(self.__lyrics)
return pct
This function calculates the proportion of Japanese text in a given song. It counts the Western (Latin-alphabet) characters in the lyrics and divides the remaining character count by the total number of characters. The result is a proportion between 0 and 1.
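A small, made-up example shows the arithmetic (the string below is ours and not a real lyric):
```Python
# Stand-alone version of the same calculation on a made-up string (not a real lyric).
lyrics = "君はLoveSong"  # 2 Japanese characters and 8 Latin letters, 10 characters in total
count_western = 0
for char in lyrics:
    if 'a' <= char <= 'z' or 'A' <= char <= 'Z':
        count_western += 1
pct = (len(lyrics) - count_western) / len(lyrics)
print(pct)  # (10 - 8) / 10 = 0.2, i.e. 20% of the characters are Japanese
```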
Executing the scraper
def main():
# Create a list of the top 100 songs from 2008 until 2023
startjaar = 2008
jaar = startjaar
liederen = []
with open("output_dict.csv", mode='a', newline='', encoding='utf-8') as file:
writer = csv.writer(file)
writer.writerow(["JAAR", "ARTIEST", "TITEL", "GEVONDEN ARTIEST", "GEVONDEN TITEL", "PERCENTAGE", "LINK",
"LYRICS"]) # Add header
In the while loop below, the program scrapes the Billboard Japan Hot 100 chart for each year. It retrieves the artist name and song title and stores them in a list. In the next phase of the code, this data is used to look up the link to the song's lyrics page on uta-net.com.
while jaar < 2024:
data = []
basis_url = 'https://www.billboard-japan.com/charts/detail?a=hot100_year&year='
jaar_url = basis_url + str(jaar)
top_100_songs_current = []
response = requests.get(jaar_url)
soup = BeautifulSoup(response.text, 'html.parser')
for row in soup.find_all('td', class_='name_td'):
song_title = row.find('p', class_='musuc_title').text
artist_name = row.find('p', class_='artist_name').text
song_title = remove_escape_characters(song_title)
song_title = song_title.strip()
top_100_songs_current.append((song_title, artist_name))
data.append(top_100_songs_current)
print("Hitlijst gemaakt")
Above, the URL of the page with the chart is constructed for each year. That page is then scraped, and the information is stored in the data list. Some websites are consistent enough that the URL can simply be built from the year.
with open("output_dict.csv", mode='a', newline='', encoding='utf-8') as file:
writer = csv.writer(file)
writer.writerow(["GEM", ""])
writer.writerow([jaar, jaar, jaar, jaar, jaar, jaar, jaar, jaar])  # year marker row
for sub_array in data:
for song_title, artist_name in sub_array:
print("next", song_title, artist_name, sep=" ")
search_url = 'https://search.yahoo.com/search;?p=' + quote(
song_title + ' ' + artist_name + " \"uta-net\"")
# search_url = "https://search.yahoo.com/search;?p=ドライフラワー \"優里\" \"uta-net\""
print(search_url)
r = requests.get(search_url)
soup = BeautifulSoup(r.text, 'lxml')
first_result = soup.find('h3', class_='title')
# Extract the link from the first search result
try:
first_result_link = first_result.find('a')['href']
huidig_lied = Lied(first_result_link)
huidig_lied.set_reeks(jaar)
liederen.append(huidig_lied)
data = [jaar, artist_name, song_title, liederen[-1].get_artiest(), liederen[-1].get_titel(),
liederen[-1].get_verhouding(),
liederen[-1].get_link(), liederen[-1].get_lyrics()]
with open("output_dict.csv", mode='a', newline='', encoding='utf-8') as file:
writer = csv.writer(file)
writer.writerow(data)
except (AttributeError, TypeError):
try:
print("retry")
search_url = 'https://search.yahoo.com/search;?p=' + quote(
song_title + ' \"' + artist_name + "\" uta-net.com")
print(search_url)
r = requests.get(search_url)
soup = BeautifulSoup(r.text, 'lxml')
all_results = soup.find_all('h3', class_='title')
found = False
for result in all_results:
result_link = result.find('a')
if result_link:
result_link = result_link['href']
print("Title:", result.text.strip())
print("Link:", result_link)
if result_link.startswith("https://www.uta-net.com/song/"):
first_result_link = result_link
found = True
print("successfully recovered on window of opportunity")
huidig_lied = Lied(first_result_link)
huidig_lied.set_reeks(jaar)
liederen.append(huidig_lied)
data = [jaar, artist_name, song_title, liederen[-1].get_artiest(),
liederen[-1].get_titel(),
liederen[-1].get_verhouding(),
liederen[-1].get_link(), liederen[-1].get_lyrics()]
with open("output_dict.csv", mode='a', newline='', encoding='utf-8') as file:
writer = csv.writer(file)
writer.writerow(data)
break
if not found:
print("No link found for this result")
print("second fail")
data = [jaar, artist_name, song_title,
"NIETJAPANS",
"N.V.T", "N.V.T"]
with open("output_dict.csv", mode='a', newline='', encoding='utf-8') as file:
writer = csv.writer(file)
writer.writerow(data)
finally:
print("done")
# At the end of the while loop, move on to the next year (otherwise the loop would never advance past the start year)
jaar += 1
main()
In the second phase of the code, the program uses the artist name and song title to search for a valid link on the website uta-net.com. This is done via the Yahoo! search engine, as it was the only search engine that reliably returned results for our automated queries. Initially, the song is searched for with a fixed combination of search terms, such as:
'https://search.yahoo.com/search;?p=' + quote(
song_title + ' ' + artist_name + " \"uta-net\"")
If this query fails, the program retries with a modified combination of search terms and goes through all the results looking for a link to a uta-net.com song page. If that also fails, we assume the song is not Japanese and therefore not applicable to our research.
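The two-step logic can be condensed into a single helper. The sketch below is our own summary of the code above rather than a drop-in replacement: the helper name find_utanet_link is ours, it relies on the modules imported at the top, and unlike the original first attempt it only accepts results that point directly at a uta-net.com song page.
```Python
def find_utanet_link(song_title, artist_name):
    """Condensed sketch of the two-step Yahoo! search described above (helper name is ours)."""
    queries = [
        song_title + ' ' + artist_name + ' "uta-net"',        # first attempt
        song_title + ' "' + artist_name + '" uta-net.com',    # stricter retry
    ]
    for query in queries:
        r = requests.get('https://search.yahoo.com/search;?p=' + quote(query))
        soup = BeautifulSoup(r.text, 'lxml')
        for result in soup.find_all('h3', class_='title'):
            link = result.find('a')
            # Only accept results that point directly at a lyrics page on uta-net.com
            if link and link.get('href', '').startswith("https://www.uta-net.com/song/"):
                return link['href']
    return None  # no lyrics page found: the song is treated as not Japanese and skipped
```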
Uta-net.com
For the second dataset, we used a ranking from uta-net.com, the same website from which we retrieve the song lyrics. This list contains the 30 most-searched-for songs for each year from 2008 to 2023. Because the site is structured differently from the Billboard Japan Hot 100 pages, we had to adjust our code accordingly.
The differences
def main():
# Create a list of the top 30 songs from 2008 until 2023
startjaar = 2008
jaar = startjaar
liederen = []
with open("output_dict.csv", mode='a', newline='', encoding='utf-8') as file:
writer = csv.writer(file)
writer.writerow(["JAAR", "ARTIEST", "TITEL", "GEVONDEN ARTIEST", "GEVONDEN TITEL", "PERCENTAGE", "LINK",
"LYRICS"]) # Add header
while jaar < 2024:
data = []
# target link format up to and including 2017: https://www.uta-net.com/user/ranking/XXXXranking/XXXXranking2.html
# target link format for 2018: https://www.uta-net.com/user/ranking/XXXXranking/index.html
# target link format from 2018 to 2023: https://www.uta-net.com/close_up/XXXX_ranking
stam_url = "https://www.uta-net.com/close_up/"
achtervoegsel_url = str(jaar) + "_ranking"
jaar_url = stam_url + achtervoegsel_url
top_songs_current = []
The main difference here is that accessing uta-net.com required accepting a GDPR consent notice. When our program tried to scrape the chart, it was redirected to the GDPR consent page instead. We resolved this by using Selenium WebDriver, a Python library that can interact with web pages in a browser.
# Selenium imports used below (in addition to the modules imported earlier)
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

# Set up Selenium WebDriver (this example uses Chrome)
options = Options()
options.headless = True # Run in headless mode (no GUI), set to False to see the browser actions
service = Service('chromedriver.exe') # Update with the path to your WebDriver
driver = webdriver.Chrome(service=service, options=options)
chromeDriverLocation = driver.service.path
print(chromeDriverLocation)
# URL of the GDPR page and the target page
target_url = jaar_url
# Open the GDPR page
driver.get(jaar_url)
# Wait a bit for the page to process the acceptance
time.sleep(3)
# Wait for the GDPR acceptance element to be visible and interact with it
# The specifics here depend on the actual implementation of the GDPR notice
try:
# Example: Find and click the GDPR accept button
accept_button = driver.find_element(By.CLASS_NAME, "fc-primary-button")
accept_button.click()
except Exception as e:
print("Could not find or click the GDPR accept button:", e)
time.sleep(3)
try:
# Example: Find and click the GDPR accept button
accept_button = driver.find_element(By.ID, "not-from-eu")
accept_button.click()
except Exception as e:
print("Could not find or click the GDPR accept button:", e)
# Wait a bit for the page to process the acceptance
time.sleep(3)
# Open the target page
driver.get(target_url)
soup = BeautifulSoup(driver.page_source, 'html.parser')
Another difference from the previous version of our code is that the ranking pages for the different years do not share a single URL format. We had to change the format of the constructed links three times to cover all the years, as sketched below.
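Based on the commented-out link formats shown earlier, the per-year URL could be chosen along the lines of the sketch below. The helper name and the exact cut-off around 2018 are our own reconstruction of those comments, not code from the scraper:
```Python
def uta_net_ranking_url(jaar):
    # Our own reconstruction of the three link formats noted in the comments above;
    # the exact boundary around 2018 is an assumption.
    if jaar <= 2017:
        return f"https://www.uta-net.com/user/ranking/{jaar}ranking/{jaar}ranking2.html"
    elif jaar == 2018:
        return f"https://www.uta-net.com/user/ranking/{jaar}ranking/index.html"
    else:  # 2019 up to 2023
        return f"https://www.uta-net.com/close_up/{jaar}_ranking"
```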
# table = soup.find('table', {'border': '0', 'cellpadding': '2', 'cellspacing': '2'})  # 2005: width = 502
table = soup.find('table', class_="song_ranking")
# Iterate through the table rows, skipping the header row
for row in table.find_all("tr")[1:]:
cells = row.find_all('td')
# must be 4 until 2018
if len(cells) == 3:
# up to and including 2018
# song_title = remove_escape_characters(cells[1].text.strip())
# artist = remove_special_symbols(cells[2].text.strip())
song_title = remove_escape_characters(cells[0].text.strip())
artist = remove_special_symbols(cells[1].text.strip())
top_songs_current.append((song_title, artist))
data.append(top_songs_current)
# Print the rankings array
for rank in data:
print(rank)
# everything is now stored in the data list
with open("output_dict.csv", mode='a', newline='', encoding='utf-8') as file:
writer = csv.writer(file)
writer.writerow(["GEM", ""])
writer.writerow([jaar, jaar, jaar, jaar, jaar, jaar, jaar, jaar])  # year marker row
for sub_array in data:
for song_title, artist_name in sub_array:
print("next", song_title, artist_name, sep=" ")
search_url = 'https://search.yahoo.com/search;?p=' + quote(
song_title + ' ' + artist_name + " 歌詞 - 歌ネット")
# search_url = "https://search.yahoo.com/search;?p=ドライフラワー \"優里\" \"uta-net\""
print(search_url)
r = requests.get(search_url)
soup = BeautifulSoup(r.text, 'lxml')
first_result = soup.find('h3', class_='title')
# Extract the link from the first search result
Here, we slightly modified the Yahoo! search query (appending 歌詞 - 歌ネット, i.e. "lyrics - Uta-Net"), which resulted in more hits and improved the overall accuracy.
Oricon
For the third dataset, we used a chart provided by Oricon, which lists the top 30 songs by CD sales for each year from 1968 to 2010. Apart from a few minor tweaks, we did not need to adjust our code for this dataset.
while jaar < 2011:
data = []
stam_url = "https://amigo.lovepop.jp/yearly/ranking.cgi?year="
achtervoegsel_url = str(jaar)
jaar_url = stam_url + achtervoegsel_url
top_songs_current = []
response = requests.get(jaar_url)
response.encoding = 'shift_jis'
soup = BeautifulSoup(response.text, 'html.parser')
table = soup.find('table', class_="ta2")
The website is encoded in Shift_JIS, which we had to account for in order to decode the text correctly.
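The effect is easy to check in isolation. A minimal sketch (assuming the page still serves Shift_JIS; the year is just an example within the dataset's range):
```Python
import requests
from bs4 import BeautifulSoup

r = requests.get("https://amigo.lovepop.jp/yearly/ranking.cgi?year=2005")  # example year
print(r.encoding)            # encoding guessed from the HTTP headers
print(r.apparent_encoding)   # encoding detected from the page content itself
r.encoding = 'shift_jis'     # decode explicitly, as in the loop above
soup = BeautifulSoup(r.text, 'html.parser')
# Without the explicit encoding, the Japanese titles in the table come out as mojibake.
```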
Results
Tableau link that shows all individual graphs
Conclusion
The analysis of the change over time in the proportion of Japanese text in J-Pop, based on data from Oricon, the Billboard Japan Hot 100, and Uta-net, reveals several key trends:
Oricon (1968-2010)
- Strong Early Preference (1968-1977): Japanese lyrics were dominant, with percentages consistently above 90%, indicating an era where Japanese music was primarily in the native language.
- 1980s Decline: There is a significant drop in the proportion of Japanese lyrics during the 1980s, reaching a low of 73.87% in 1986. This decline may reflect the growing influence of Western music styles during this period.
- 1990s Recovery: The 1990s show a resurgence in the use of Japanese lyrics, with percentages generally above 80% and peaking at 93.00% in 1993. This might suggest a cultural reaffirmation of Japanese language in music.
- 21st Century Variability: From 2000 onwards, there is notable variability but with consistently high percentages of Japanese lyrics. Post-2008 data aligns closely with other sources, indicating a strong presence of Japanese lyrics.
Billboard Japan Hot 100 (2008-2023)
- High Japanese Lyric Content: Starting from 2008, the Billboard data consistently shows high percentages of Japanese lyrics, typically ranging from the high 80s to the mid-90s. This reflects a continued preference for the Japanese language in popular music despite increasing globalization.
- Recent Strengthening: In recent years (2019-2023), there is a marked increase in the proportion of Japanese lyrics, with some of the highest percentages recorded, such as 98.81% in 2019 and 97.49% in 2020, indicating a strong reaffirmation of Japanese lyrics in contemporary popular music.
Uta-net (2008-2023)
- High Japanese Lyric Percentage: The Uta-net data, starting from 2008, shows a high proportion of Japanese lyrics, often aligning with trends observed in the Billboard data. For example, in 2009, the percentage is 92.00%, closely matching Billboard’s 93.73%.
- Search Trends: The data reflects the lyrics that users are actively searching for, indicating a strong interest in Japanese lyrics. Despite some variability, the overall trend maintains high percentages of Japanese lyrics.
Overall Trends
- Consistency Across Sources: Despite some fluctuations, all three data sources show a high proportion of Japanese lyrics in J-Pop over time, with the 21st century witnessing a reaffirmation of Japanese language in music.
- Cultural Reaffirmation: The data indicates periods of cultural reaffirmation where the use of Japanese lyrics in popular music increases, particularly noticeable in the 1990s and the recent years.
- Impact of Globalization: While the 1980s show a decline possibly due to Western influences, the overall trend suggests that Japanese lyrics have remained a strong and integral part of J-Pop, adapting and reaffirming cultural identity through different eras.