Our Conclusion

To conclude our OCR research on dialects in manga, we can state the following. There isn’t, in fact, a whole lot of manga in which dialects are used. Manga is a form of recreation that should be accessible to everyone, and writing a work entirely in a dialect that isn’t widely understood throughout Japan would be a sure way to achieve the opposite. There is a world of difference between, for example, Okinawan and Standard Japanese, so it is a given that not everyone outside of Okinawa will understand it. It should also be taken into account that what we call ‘a’ dialect is often a group of varying regional dialects gathered under one name. This means that even if “one dialect” has many speakers, nuances can still be misunderstood because of those variants.

This leads us to our second observation: when dialects are used, they are often limited to one, maybe two characters in a series, and those characters are usually not the main character. They tend to fill a supporting role, sometimes serving only as comic relief through their dialect. There are other cases of course, such as Lovely Complex and Barakamon, that are entirely in a dialect, but those manga have a list at the back of each volume explaining certain intricacies of the dialect in question. So, while there are instances of entire manga in one dialect, the general rule of thumb is that dialect is used in small doses, to ensure that everyone can still understand and, as a result, enjoy a manga without needing extensive knowledge of one of Japan’s dialects.

Technical Conclusion

The tools we used each gave us a different experience and different results. Though they all provided us with a means to OCR the text out of a manga (some better than others), they were by no means error-free. These tests have taught us a few things about the technical side of OCRing manga. The tools are inefficient, and using them is as time-consuming as typing out the text, if not more so. Although some provide a nice way to automate parts of the process, manual action is still required, be it correcting the OCR'ed text or selecting the text yourself. Tesseract, for example, OCR'ed only parts of the text in the manga, and also ended up finding non-existent text in the drawings. Capture2Text, by contrast, gave us a much better result, but requires manual selection. OCRing manga is not an easy task, and we have yet to come across an open-source tool that can fully automate it.
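
One way to put a number on how far from error-free an OCR result is would be the character error rate: the edit distance between the OCR output and a hand-typed transcription, divided by the length of the transcription. We did not compute this figure in our tests; the sketch below only illustrates the idea. The function names and the sample strings are hypothetical and are not taken from any of the tools or manga discussed above.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions turning one string into the other."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from the OCR output
                current[j - 1] + 1,      # extra character in the OCR output
                previous[j - 1] + cost,  # character misread (or matched)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the length of the manual transcription."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: a Kansai-dialect line typed by hand, and an
# imagined OCR result in which one kana lost its voicing marks.
manual_text = "なんでやねん"
ocr_text = "なんてやねん"
print(f"{character_error_rate(manual_text, ocr_text):.2f}")  # prints 0.17
```

A rate of 0 would mean the OCR output matches the manual transcription exactly. Text that a tool "finds" in the drawings counts as insertions, so the rate can exceed 1 when the output is much longer than the reference.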