Introduction - Quantitative Research Documentation
For the quantitative section of our group’s research, our main objective was to find and visualise relevant data in order to illustrate the international rise of Japanese popular culture.
To begin with, we went looking for quantitative research sources or datasets on Japanese popular culture internationally.
Anime Conventions Worldwide
The first interesting source we found was animecons.com, a website that provides information about every convention around the world, from 1975 until 2026. We decided to only use information up until 2021, since the number of conventions listed for the years after that is still bound to change over time and is therefore not representative for our research. We worked together as a group to import the information from the site into a Google spreadsheet using the function ‘ImportHTML()’. 1
After that we split up the work: Tara continued with the main project of the conventions, while Luka and Anna went looking for other useful data or sources concerning international outlets of Japanese popular culture.
Process
As a group, we had imported the data from the website for each year (1975-2026) into a separate sheet in Google Sheets.
However, when I wanted to download the data I realised that it could not be used in programs such as OpenRefine or Tableau Public, because it was spread over multiple sheets and therefore formed multiple datasets. So I had to combine all the sheets into one single sheet, which was cumbersome work. My options were to re-import each year consecutively into one sheet using the ‘ImportHTML()’ function again, or to copy the data over from the other sheets. I couldn’t think of any other methods, and ended up using copy-paste. 1
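One possible alternative to copy-pasting, sketched here in Python under stated assumptions: if each year's sheet were exported as its own CSV text, the rows could be appended into one table with an added ‘Year’ column. The function names `merge_year_tables` and `to_csv` and the sample column names are hypothetical and do not reflect the actual layout of the imported animecons.com tables.

```python
import csv
import io


def merge_year_tables(tables):
    """Combine per-year CSV texts into one list of row dicts.

    `tables` maps a year (int) to the CSV text exported for that year.
    Every row gets an extra 'Year' key so its origin sheet is not lost.
    """
    merged = []
    for year in sorted(tables):
        reader = csv.DictReader(io.StringIO(tables[year]))
        for row in reader:
            row["Year"] = str(year)
            merged.append(row)
    return merged


def to_csv(rows):
    """Write the merged rows back out as a single CSV text."""
    # Collect column names in first-seen order, in case sheets differ.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical sample data standing in for two exported sheets.
sample = {
    1976: 'Name,Location\nExample Con B,"Dallas, TX"\n',
    1975: 'Name,Location\nExample Con A,"Toronto, ON, Canada"\n',
}
rows = merge_year_tables(sample)
```

Because the year is stored in its own column during the merge, it would not have to be recovered from the date text afterwards.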
Once the entire dataset was located on a single sheet, I could finally download it as a CSV file. For some reason, though, this resulted in the data being completely messed up. So I downloaded the file as an Excel document instead, and finally managed to open the dataset with Tableau Public.
Immediately, I realised that the dataset was “messy” and therefore not usable in Tableau Public:
- The dates were all in different formats
- The locations / addresses were also in different formats
- Year-numbers were in the names of the conventions
- The word “Cancelled” or “Postponed” could be found right next to the name of the convention (on the website those events were crossed out, mostly due to Covid in 2019/2020)
(dataset before cleaning, OpenRefine)
OpenRefine
The next logical step, therefore, was to clean up the dataset using OpenRefine. A general overview of the steps I took:
- Trimmed leading & trailing whitespace
- Separated ‘Country’ into a new column, from the ‘Location’ column.
  - This was a very arduous process (around 100 steps in OpenRefine!) which involved using the filter:
    split into several columns by separator ‘ , ’
    (making sure to keep the original ‘Location’ column for reference!)
  - then faceting the resulting (5) columns by text to find the country values, and meanwhile deleting the non-country values in the other columns
    - Eg. filter:
      location5 / facet / text facet
      Click on 1 option in the left bar, such as ‘Canada’
      For columns 1-4 containing other location data, apply:
      edit cells / common transforms / to empty string
      Repeat for all choices in the location5 facet; once done, move on to facet location4
  - And finally, joining the columns back into 1 column named ‘Country’
  - Note: US-based locations only had a state-code (eg. TX), so I had to replace it with ‘USA’ by using GREL. For this, however, I first needed to ensure that there were no other countries’ state-codes left, eg. Canada’s ‘ON’.
    value.replace(/\b\w\w\b/,"USA")
- Separated ‘Year’ into a new column, from the ‘Dates’ column:
  - split column ‘Dates’ into several columns by separator ‘CET’ (remembering to keep the original column!) and, of the 2 new columns, deleted the 1st one
  - repeated with separator ‘CEST’ (after deleting the unnecessary column, there were 2 new columns remaining next to ‘Dates’, only containing the year number)
  - repeated with separator ‘,’ (again deleted the 1st new column)
    This produced 3 new columns instead of 2, because some cells contained a range spanning two years, in the format: December 31, 2002 - January 1, 2003
    However, it was easy to facet the column by name, select the outliers (there were only 14) and then apply:
    edit cells / common transforms / to empty string
  - finally there were 4 columns containing year numbers next to the original ‘Dates’ column. I checked that there were no incorrect cell values using facet / by name.
  - then I combined the 4 columns using:
    edit columns / join columns
  - just in case, I faceted the resulting column and fixed the (2) blanks.
- Removed the year from the ‘Name’ column using the GREL function:
  value.replace(/\s(\d\d\d\d)/,"")
- Removed the words “Cancelled” and “Postponed” from the ‘Name’ column, and placed them in a separate column, ‘Details’.
  - At first I tried to achieve this in a very convoluted way, but then I found a fast solution using GREL:
    value.replace("Cancelled","%Cancelled")
    then ‘split column by separator %’ (I had to find a unique symbol that wasn’t used anywhere in the ‘Name’ column)
  - Repeated the steps with ‘Postponed’
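The country, name and status steps above can be sketched in Python as follows. This is a minimal illustration and not the OpenRefine workflow itself: it assumes the country (or a bare US state-code) is always the last comma-separated part of the location, and the set `US_STATE_CODES` is deliberately incomplete. All function names are hypothetical.

```python
import re

# Deliberately incomplete set of US state codes, for illustration only.
US_STATE_CODES = {"TX", "CA", "NY", "WA", "FL"}

STATUS_WORDS = ("Cancelled", "Postponed")


def extract_country(location):
    """Take the last comma-separated part of a location as the country.

    A bare US state code (e.g. 'TX') is mapped to 'USA'. Codes that are
    not in US_STATE_CODES (e.g. Canada's 'ON') are returned unchanged,
    so they can still be spotted and fixed by hand.
    """
    last = location.split(",")[-1].strip()
    return "USA" if last in US_STATE_CODES else last


def split_status(name):
    """Return (name_without_status, status); status is '' if absent."""
    for word in STATUS_WORDS:
        if word in name:
            return name.replace(word, "").strip(), word
    return name.strip(), ""


def strip_year(name):
    """Remove a four-digit number preceded by whitespace from a name."""
    return re.sub(r"\s\d{4}\b", "", name).strip()
```

Checking against an explicit set of state-codes, rather than replacing any two-letter word, avoids turning a Canadian province code such as ‘ON’ into ‘USA’ by accident.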
After this I also attempted to change the ‘Dates’ column to a single format, but this proved difficult, so I stopped.
I figured that it would be best to first go and see what I could already achieve in Tableau Public with the ‘Year’ column.
After exporting the project as a CSV file, I opened it with Tableau Public. Right away I found a few mistakes in the data that I had not noticed when cleaning, so I reopened the OpenRefine project.
- I had missed some empty cells in the column ‘Country’, and filled in the correct data
- Once again I attempted to fix the ‘Dates’ column:
  value.replace("Sun ","")
  value.replace("Sat ","")
  value.replace(" 00:00:00 CET",",")
  value.replace(" 00:00:00 CEST",",")
  - This went quite well, and the only remaining problem was the cells containing multiple dates
    - Eg. July 28-29, 1979 -> Tableau Public would only keep the end-date
    - Eg. September 30 - October 02 2003 -> in this case Tableau Public didn’t recognise the cell value as a date, and assigned it ‘null’ instead
  - To solve this issue I wondered if I would have to make 2 columns for the dates instead of 1, for example ‘Start_Date’ and ‘End_Date’.
  - However, since this issue was taking up too much time I decided to leave it, and simply use the ‘Year’ column, reasoning that the exact dates would not necessarily improve the data visualisation in Tableau Public.
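For reference, the idea of two date columns could look roughly like the Python sketch below. It only covers the three range formats quoted above plus a plain single date, it assumes English month names, and the function names `parse_part` and `split_date_range` are hypothetical. It was not part of the actual cleaning.

```python
import re
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}


def parse_part(part):
    """Return (month, day, year); anything not present stays None."""
    month = day = year = None
    for token in part.replace(",", " ").split():
        if token in MONTHS:
            month = MONTHS[token]
        elif token.isdigit() and len(token) == 4:
            year = int(token)
        elif token.isdigit():
            day = int(token)
        else:
            raise ValueError(f"unrecognised token: {token!r}")
    return month, day, year


def split_date_range(text):
    """Split a date cell into (start_date, end_date).

    A single date yields the same date twice.
    """
    parts = re.split(r"\s*-\s*", text.strip())
    if len(parts) > 2:
        raise ValueError(f"more than one range separator: {text!r}")
    s_month, s_day, s_year = parse_part(parts[0])
    e_month, e_day, e_year = parse_part(parts[-1])
    # 'July 28-29, 1979': the end part has no month, so reuse the start's.
    if e_month is None:
        e_month = s_month
    # 'September 30 - October 02 2003': the start has no year, reuse the end's.
    if s_year is None:
        s_year = e_year
    if None in (s_month, s_day, s_year, e_month, e_day, e_year):
        raise ValueError(f"incomplete date: {text!r}")
    return date(s_year, s_month, s_day), date(e_year, e_month, e_day)
```

Cells in any other format raise a `ValueError` instead of being guessed at, so they can be collected and fixed by hand.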
(dataset after cleaning, OpenRefine)
Tableau Public
Back in Tableau Public, I started experimenting with different types of graphs that could represent different aspects of the dataset, eg. map, line-graph, area-chart, bar-chart, pie-chart, …
After a week I presented what I had done so far to the rest of my group, and received some more suggestions which I promptly tried out.
I had also wondered if it would be useful to add another column to the dataset with the different continents, but this was ultimately rejected by the group, since we already had a wide range of useful visuals.
Another suggestion was to put a disclaimer regarding the ‘Covid years’ since a lot of conventions were cancelled or postponed. This made me realise that I had completely forgotten about the dataset column ‘Details’, which contained the cancelled and postponed values. Luckily, I managed to integrate it into the dashboard, significantly improving the accuracy of the data.
Instead of writing out exactly what I did in Tableau Public, it would make more sense to illustrate via screenshots. This way readers can also try to replicate what I have done by looking at the Filters, Marks, etc.
Treemap & Pie-charts
Though the treemap is visually appealing, it wasn’t very useful to the research so I removed the sheet from Tableau Public, as I noticed that too many sheets caused the Interactive Dashboard to load more slowly.
In the case of the pie-chart, I had first filtered the countries by colour, as you can see below. However, the chart proved to be superfluous to the data already available in the dashboard.
So at the very end, as an update, I changed the filter to ‘Details’ (which I renamed to ‘Status’ to better reflect the postponed/cancelled information). I then also added the pie-chart to the dashboard, to better display the Status data.
Area-chart
A very useful graph containing the timeline. It shows the exact same outline as the line-graph below, and if you highlight a certain country, it changes to show only that country’s evolution in the number of conventions.
Line-graphs
This simple graph shows the same outline as in the area-chart above.
The data in this second line-graph is once again better visualised through the area chart, which is why it will also be removed from Tableau Public. When clicking on a country it will give the exact same outline as the area-chart would, but as a whole the data is not clear (as you can see below).
Bar-charts
The bar-chart proved to be an invaluable graph for the dashboard, acting as a filter for all the other visuals.
The stacked bar-chart, on the other hand, was very confusing to read (the area-chart is a better visual replacement), so I removed this sheet from Tableau Public too.
Table
Last minute I also tried making a new sheet with a table of the Names (which I renamed to ‘Convention Names’) and their count. I tried adding it into the dashboard, but upon playing around with the filters I realised that there was still a lot of data-cleaning I had missed in that column.
Eg. Belgium’s ‘FACTS’ was also written as ‘F.A.C.T.S’, for which I should have used ‘Cluster & edit’.
After cleaning the ‘Convention Name’ column in OpenRefine some more, I updated the data source in Tableau Public and added the table into the dashboard, along with a search-bar.
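To illustrate why clustering catches variants such as ‘FACTS’ and ‘F.A.C.T.S’, here is a simplified Python imitation of a key-collision ‘fingerprint’ approach, one of the method families OpenRefine offers under ‘Cluster & edit’. It is a sketch only and does not reproduce OpenRefine’s exact implementation. The function names and the sample names are hypothetical.

```python
import string
from collections import defaultdict


def fingerprint(name):
    """Simplified key: lowercase, drop punctuation, sort unique tokens."""
    cleaned = name.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster_names(names):
    """Group names sharing a fingerprint; keep groups with >1 variant."""
    groups = defaultdict(set)
    for name in names:
        groups[fingerprint(name)].add(name)
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


clusters = cluster_names(["FACTS", "F.A.C.T.S", "Example Con"])
```

Both spellings reduce to the same key, so they end up in one group that can then be merged under a single name.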
Dashboard
They say that less is more, and that also holds true for this project. Rather than cramming all the different charts and graphs into the dashboard, I tried to curate a selection which would interact nicely together, and help one gain a broad understanding of the subject material.
The intention was to create visual representations that differed from those already available on the website, such as its map. However, in the end we included a map in the dashboard after all, as it helps to better visualise the other graphs and charts. 2
The new version of the dashboard includes the revised pie-chart, and the colour legend for ‘Country’ has been removed, as that information is already available in the bar-chart.
Final version of the dashboard, containing the table with Convention Names and their Count.
Link to the Interactive Dashboard
A Visual Overview of Anime Conventions Worldwide
Manga licensed in English
While searching for other useful data, we came across this list of manga licensed in English. Even though there is no licensing date, most manga do show in which regions they were licensed. Here you can see which regions have the most licensed manga. 3
A problem was that not all of the manga have a region listed, and there were also manga which had more than one region.
First draft of the graph
- No data on dates
Newest version of the graph
Process
Make Google Spreadsheet
- Import tables from Wikipedia with the function =importhtml()
  =importhtml("https://en.wikipedia.org/wiki/List_of_manga_licensed_in_English","table",1)
  =importhtml("https://en.wikipedia.org/wiki/List_of_manga_licensed_in_English","table",n+1)
Download in Excel (.xlsx) then generate in OpenRefine
- Clean data
  - Search for blanks and useless cells and information
  - Star these rows and filter them out
  - Text transform column Title GREL:
    value.replace("*","")
  - Create new columns based on regions GREL:
    value.contains("(region code)")
  - Replace ‘false’ with blank and replace ‘true’ with the number 1 to create an easy common value:
    GREL: value.replace("false","")
    GREL: value.replace("true","1")
Download in Excel (.xlsx) then open in Tableau Public
- Use data interpreter
- Add sheet1
  - Use ‘Measure Values’ and ‘Measure Names’ to create a horizontal bar graph
  - Change color
- Add sheet2
  - Use ‘Count’ of the List to create text table
- Make Dashboard1
  - Add sheet1 and sheet2
  - Rearrange
  - Change titles/Axis titles
  - Add Text
Save to Tableau Public, remake in Tableau Public
- Change color
Save to Tableau Public
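The region-column step above (one column per region, then a count per column) can be sketched in Python as follows. The region codes in `REGION_CODES` and the sample entries are hypothetical placeholders, and the sketch uses 0 instead of a blank for ‘false’ so that the flags can be summed directly.

```python
from collections import Counter

# Hypothetical region codes; the actual codes come from the Wikipedia list.
REGION_CODES = ("NA", "UK", "AU")


def region_flags(entry):
    """Return {code: 1 or 0} depending on whether '(code)' occurs in entry."""
    return {code: (1 if f"({code})" in entry else 0) for code in REGION_CODES}


def count_regions(entries):
    """Sum the flags per region; multi-region titles count once per region."""
    totals = Counter()
    for entry in entries:
        for code, flag in region_flags(entry).items():
            totals[code] += flag
    return totals


# Hypothetical sample entries, including one without any region listed.
totals = count_regions(["Title A (NA)", "Title B (NA) (UK)", "Title C"])
```

A title with more than one region adds to every region it lists, and a title without a region adds to none, which matches the two problems noted for this dataset.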
Anime distributed in USA
Another useful source we found was a Wikipedia list of the anime distributed in the USA, grouped per 10 years. The number of distributed anime shows a clear growth over time. A problem here was that the tables and lists on the page differed in format, which made it harder to compile one clear list. 4
First draft of the graph
- Only dates of the first 20 years
- This graph only shows anime distributed in the USA
Newest version of the graph
Process
Make Google Spreadsheet
- Import tables from Wikipedia with the function =importhtml()
  =importhtml("https://en.wikipedia.org/wiki/List_of_anime_distributed_in_the_United_States","table",6)
  =importhtml("https://en.wikipedia.org/wiki/List_of_anime_distributed_in_the_United_States","table",7)
  =importhtml("https://en.wikipedia.org/wiki/List_of_anime_distributed_in_the_United_States","table",8)
- Import lists from Wikipedia with the function =importhtml()
  =importhtml("https://en.wikipedia.org/wiki/List_of_anime_distributed_in_the_United_States","list",2)
  =importhtml("https://en.wikipedia.org/wiki/List_of_anime_distributed_in_the_United_States","list",n+1) (x3)
- Add the years in which they were distributed in a new column and copy to the other rows
- Make changes in order/clean title
Download in Excel (.xlsx)
- Delete useless title rows
Generate in OpenRefine
- Clean data
  - Search for blanks and useless cells and information
  - Star these rows and filter them out
  - Move columns for better order
  - Add data which accidentally wasn’t added
  - Text transform column Title GREL:
    value.replace("*","")
  - Rename columns
Download and open in Excel (.xlsx)
- Rearrange columns
Save in Excel (.xlsx), then open in Tableau Public
- Add sheet1
  - Use ‘Count’ of Anime and ‘Years of release’ to create a horizontal bar graph
  - Add color
- Add sheet2
  - Use ‘Count’ to create text table
- Make Dashboard1
  - Add sheet1 and sheet2
  - Rearrange
  - Change titles/Axis titles
Save to Tableau Public then remake in Tableau Public
- Add changing colors to sheet1
- Add sheet3
  - Use ‘Years of release’ to create text table
  - Add changing colors
- Make changes to Dashboard1
  - Add sheet3
  - Rearrange
  - Change titles/Axis titles
Save to Tableau Public
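The count behind the bar graph amounts to grouping titles per 10 years. A minimal Python sketch of that grouping, with hypothetical function names and made-up sample rows rather than the real list:

```python
from collections import Counter


def decade_of(year):
    """Return a decade label such as '1960s' for a given year."""
    return f"{year // 10 * 10}s"


def count_per_decade(rows):
    """Count titles per decade; `rows` holds (title, year) pairs."""
    return Counter(decade_of(year) for _title, year in rows)


# Made-up sample rows, for illustration only.
counts = count_per_decade([("Title A", 1963), ("Title B", 1969), ("Title C", 1985)])
```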
Japanese movies
Whilst the other two were busy with the convention site data and manga research, I went looking for some more quantitative sources. We wanted to find information or datasets regarding Japanese-style gardens in other countries, or Japanese movies exhibited internationally.
Even after searching the internet for a while I couldn’t find anything regarding the Japanese gardens (only about Japanese garden tools). However, I did find a data site that provided research from UNESCO that covered the “Japan total number of all foreign feature films exhibited”. 5
Table
The column “value” shows the total number of exhibited Japanese movies overseas. Putting the numbers and years together in a line-graph, we get this:
Line-graph
The problem I encountered was that only the total numbers per year were available for free; to see the rest of the research I would have had to pay a subscription fee to the data site.
After that I went looking for some sort of site or dataset we might have been able to use for the manga dataset, but couldn’t find anything in the end.
Sources
1. AnimeCons.com. “AnimeCons.Com - Anime Conventions.” Accessed May 10, 2021. https://animecons.com/.
2. AnimeCons.com. “Anime Convention Map.” Accessed May 10, 2021. https://animecons.com/events/map/.
3. “List of Manga Licensed in English.” In Wikipedia, March 18, 2021. https://en.wikipedia.org/w/index.php?title=List_of_manga_licensed_in_English&oldid=1012760288.
4. “List of Anime Distributed in the United States.” In Wikipedia, April 24, 2021. https://en.wikipedia.org/w/index.php?title=List_of_anime_distributed_in_the_United_States&oldid=1019706677.
5. Knoema. “UNESCO: Culture Statistics Data - Knoema.Com.” Accessed April 26, 2021. https://knoema.com//CEMP_DS_CUL_DS_CTRD_DS/unesco-culture-statistics-data?tsId=1251070.