GitHub repo here: jet-ocr-discord-bot
Over the past year or two, in my free time, I have taken an interest in Japanese culture through various forms of entertainment: variety shows, games, books, and the like. Having become quite fascinated with the culture as a whole and wanting to learn more, I decided to take a dive into learning the language (albeit quite slowly).
A Slight... Problem
However, there is a slight problem: when trying to study individual kanji or characters that appear in media, I cannot exactly copy and paste what I see when I want to translate it for my own understanding, nor can I just write the character down with whatever stroke order I guess it to be (stroke order is important in the language).
OCR to the Rescue?!
Pondering this problem, I remembered something I came across some time ago while surfing the internet: Optical Character Recognition (OCR)!
OCR is the process of electronically converting printed text in images into digital text. It generally involves a combination of machine learning, computer vision, and pattern recognition in order to identify the printed text and correctly transcribe it as digital text.
With this in mind, OCR is definitely the right technology to use to solve this problem!
Why Discord?
The reason why I chose to make a Discord bot is quite simple:
if I wanted to make something that would come in handy for me, I should make it easily accessible for my friends as well
For the most part, I use Discord quite often when talking with my close friends, and they are interested in Japanese culture as much as I am, so it was simply a matter of killing two birds with one stone.
I have named my bot Jet (after a friend), but you can name yours whatever you like!
Development Process
With that being all said and done, let's get Pythonic!
Note: this is not a tutorial but a reflection on my overall development process, a couple of days of work that resulted in some silliness.
Functionality Expectations
In its simplest form, I wanted a bot to which any user within a server can send an image as an attachment, and once the bot receives it, it returns the text that it finds within the image. With this in mind, here are some potential user stories I came up with:
- As a user within the server, I want to be able to call the bot in order to interact with it
- As a user within the server, I want to be able to call a certain command to see the list of everything the bot can do
- As a user within the server, I want to be able to call a command alongside an image attachment, and expect to get text returned back to me
These user stories form a baseline for what I should code the bot to do, as well as set expectations for my friends in the server and for myself.
Project Stack
For this project, I'll be combining the following to create the bot:
- Python (Ver. 3.8.10)
- Discord.py
- Python-tesseract (Pytesseract)
Python is one of those languages that stays convenient and fun no matter how long it has been since you last used it. Its flexibility is what made me choose it for this project.
Discord.py is a modern Python wrapper that lets you call various Discord methods through its feature-rich async API. This is where bot features such as reading attachments and sending messages in Discord come from.
The meat of the project comes from Pytesseract, a Python wrapper for Google's Tesseract OCR engine. When the bot receives the image, a couple of Pytesseract functions are called:
- image_to_osd(): used to learn more about the image, such as the detected script/language
- image_to_string(): used to extract the text from the received image as an unmodified string
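To give a concrete idea of what those two calls look like on their own, here is a minimal sketch; the file name and language code are just placeholders, and Tesseract itself needs to be installed on the system alongside the Python packages:

```python
# A minimal sketch of the two Pytesseract calls on their own; the file name
# and language code are placeholders. Tesseract itself (plus its Japanese
# and OSD traineddata) must be installed on the system in addition to
# `pip install pytesseract pillow`.
import pytesseract
from PIL import Image

image = Image.open("screenshot.png")  # hypothetical input image

# Orientation and script detection: returns a block of text describing the
# rotation, orientation confidence, and detected script (e.g. "Japanese").
print(pytesseract.image_to_osd(image))

# Plain text extraction; lang="jpn" points Tesseract at its Japanese data.
print(pytesseract.image_to_string(image, lang="jpn"))
```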
How it works
- Jet will actively listen to the $text command. When this command is used, Jet will grab the image attachment sent alongside it.
- Once Jet has successfully fetched the image attachment, Jet will reply with a sentence stating that the image has been successfully fetched, and the image is passed to Pytesseract, where image_to_osd() is called to fetch information about the image (mainly the language detected). Afterwards, image_to_string() will be called with the correct detected language to start extracting the text from the image.
- When Jet has finished extracting the text, Jet will then edit his previous message and change it into the extracted text!
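To make that flow a little more concrete, here is a rough sketch of how those steps could be wired together with Discord.py and Pytesseract. To be clear, this is not Jet's actual source: it assumes discord.py 2.x with the message content intent enabled, the bot token is a placeholder, and the OSD-to-language mapping is simplified.

```python
# A rough sketch of the flow above, not Jet's actual source code.
# Assumes discord.py 2.x (message content intent required) and a system
# install of Tesseract with Japanese traineddata; the token is a placeholder.
import io

import discord
import pytesseract
from discord.ext import commands
from PIL import Image

intents = discord.Intents.default()
intents.message_content = True
bot = commands.Bot(command_prefix="$", intents=intents)

@bot.command(name="text")
async def extract_text(ctx: commands.Context):
    if not ctx.message.attachments:
        await ctx.send("Please attach an image with the $text command.")
        return

    # Acknowledge the fetch first, then edit this same message later.
    reply = await ctx.send("Image received! Extracting text...")

    data = await ctx.message.attachments[0].read()
    image = Image.open(io.BytesIO(data))

    # Ask Tesseract what script it thinks it is looking at, then run the
    # actual extraction with the matching language pack (simplified mapping).
    osd = pytesseract.image_to_osd(image)
    lang = "jpn" if "Japanese" in osd else "eng"
    extracted = pytesseract.image_to_string(image, lang=lang)

    # Discord messages cap at 2000 characters.
    await reply.edit(content=extracted[:2000] or "No text found.")

bot.run("YOUR_BOT_TOKEN")  # placeholder token
```

In a real bot the blocking OCR calls would ideally be moved off the event loop, but the sketch keeps things simple.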
Hosting
At the time of writing, I am hosting my bot on a Raspberry Pi 4. For small-scale hobby projects like this, where RAM and CPU consumption are not a concern, it's the perfect partner for the job.
This allows me to keep the bot up and running 24/7 without needing to keep my main workstation on.
Results!
After putting the bot all together, I put Jet to the test by inviting it into a server that I share with my friends; after all, the best way to actually get results would be to have my friends try it too!
For the most part, Jet will give surprisingly accurate results.
On the other hand, Jet will put out some hilarious results.
These results go to show that there is definitely room for improvement. For the most part, Jet's output has been quite sufficient for my own needs, but if I wanted to make it more accurate, I would definitely add image preprocessing via OpenCV, as images sent to Jet are currently not preprocessed at all.
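As a rough idea of what that could look like, here is a small sketch that runs a grayscale conversion and Otsu threshold with OpenCV before handing the image to Pytesseract; again, this is not something Jet currently does, and the file name is just a placeholder.

```python
# A small sketch of the kind of preprocessing that could be added before OCR.
# This is not something Jet does today; it just shows a common cleanup pass
# (grayscale + Otsu thresholding) with OpenCV. The file name is a placeholder.
import cv2
import pytesseract

image = cv2.imread("screenshot.png")  # hypothetical input image

# Convert to grayscale and binarize; Otsu's method picks the threshold
# automatically, which tends to help Tesseract with noisy or low-contrast shots.
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Pytesseract accepts NumPy arrays directly, so the cleaned image can be
# passed straight in.
print(pytesseract.image_to_string(binary, lang="jpn"))
```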
Final Thoughts
Overall, this entire project has been quite fun. At the end of the day, Jet has given me sufficient results, and that's fine by me.
Sure, it's not exactly perfect by any means, but for my own use case and my friends', I would say the results are good enough as they are.
It's always hilarious when Jet outputs the wildest results, so that's a plus for me.