With fast internet connections, video is growing quickly as a multimedia entertainment platform. Users no longer only consume video; they are also becoming content creators. But not all users create content with good intent. Some create content for spamming.
Definitions of spam vary widely, so I will give my own. Spam is content meant to drive other users to visit the spammer's site or page.
I will share one simple way to detect spam using only:
First, let's create the Prober. This class detects the video duration and grabs sample frames from the video.
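To make the idea concrete, here is a minimal sketch of what such a class could look like. It is not the original code: it assumes the `ffprobe` and `ffmpeg` command-line tools are installed and on the PATH, and the method names (`duration_command`, `sample_times`, `frame_command`, `sample`) are hypothetical names chosen for illustration.

```python
import subprocess


class Prober:
    """Illustrative prober that shells out to ffprobe/ffmpeg (assumed installed)."""

    def __init__(self, path):
        self.path = path

    def duration_command(self):
        # Ask ffprobe to print only the container duration, in seconds.
        return [
            "ffprobe", "-v", "error",
            "-show_entries", "format=duration",
            "-of", "default=noprint_wrappers=1:nokey=1",
            self.path,
        ]

    def duration(self):
        out = subprocess.check_output(self.duration_command())
        return float(out.decode().strip())

    @staticmethod
    def sample_times(duration, count):
        # Evenly spaced timestamps that skip the very start and end of the video.
        step = duration / (count + 1)
        return [round(step * (i + 1), 3) for i in range(count)]

    def frame_command(self, second, out_path):
        # -ss before -i seeks to the timestamp; -frames:v 1 writes one picture.
        return ["ffmpeg", "-y", "-ss", str(second), "-i", self.path,
                "-frames:v", "1", out_path]

    def sample(self, count=3, prefix="sample"):
        # Writes `count` pictures and returns their paths.
        paths = []
        for i, second in enumerate(self.sample_times(self.duration(), count)):
            out_path = "{}_{}.png".format(prefix, i)
            subprocess.check_call(self.frame_command(second, out_path))
            paths.append(out_path)
        return paths
```

Calling `Prober("spam.mkv").sample()` would then write a few PNG files next to the script and return their paths. Building the commands in separate methods keeps them easy to inspect without running ffmpeg.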
Then we create a helper function that converts the duration from seconds to HH:MM:SS format, and a helper method that deletes the sample pictures generated by the prober.
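Both helpers are small. A possible version is sketched below; the names `to_hhmmss` and `delete_samples` are my own placeholders, and the sketch assumes the sample pictures are passed in as a list of file paths.

```python
import os


def to_hhmmss(seconds):
    # Drop any fractional part, then split into hours, minutes and seconds.
    seconds = int(seconds)
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return "{:02d}:{:02d}:{:02d}".format(hours, minutes, secs)


def delete_samples(paths):
    # Remove each sample picture that still exists; return how many were removed.
    removed = 0
    for path in paths:
        if os.path.exists(path):
            os.remove(path)
            removed += 1
    return removed
```

For example, `to_hhmmss(3725)` gives `"01:02:05"`.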
We create the OCR file, using pytesseract (a Python wrapper for Tesseract) to detect the characters in a picture. We also desaturate the picture first using OpenCV, because that can improve character detection.
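The OCR step might look roughly like the sketch below. It assumes the `opencv-python` and `pytesseract` packages plus the Tesseract binary are installed, and that the installed pytesseract accepts a NumPy array (recent releases do). The function names `read_text` and `collapse_whitespace` are hypothetical.

```python
def collapse_whitespace(text):
    # OCR output is full of line breaks and stray spaces; flatten it to one line.
    return " ".join(text.split())


def read_text(image_path):
    # Imported here so the module still loads where the OCR stack is missing.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:
        raise ValueError("cannot read image: {}".format(image_path))
    # Desaturate: convert the BGR picture to grayscale before running OCR.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return collapse_whitespace(pytesseract.image_to_string(gray))
```

Whether grayscale alone is enough depends on the video; thresholding or resizing may help on noisy frames, but that is beyond this simple approach.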
After that, we create the spam text detector. Before detecting, we clean out all non-alphabetical words. Then we check whether the text contains banned words, a link, or a phone number.
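A rough sketch of such a detector follows. The banned word list and both regular expressions are placeholders I made up for illustration, not the original ones. Note one design choice in this sketch: links and phone numbers are searched in the raw text, because cleaning out the non-alphabetical parts first would destroy them; only the banned-word check uses the cleaned words.

```python
import re

# Hypothetical list; a real one would be tuned to the platform.
BANNED_WORDS = {"subscribe", "whatsapp", "promo"}

# Deliberately loose patterns, good enough for a first pass.
LINK_RE = re.compile(
    r"(?:https?://|www\.)\S+|\b[a-z0-9-]+\.(?:com|net|org|id)\b",
    re.IGNORECASE,
)
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def alphabetic_words(text):
    # Keep only runs of letters, lowercased; digits and punctuation are dropped.
    return re.findall(r"[a-z]+", text.lower())


def is_spam(text):
    # Links and phone numbers are checked on the raw text.
    if LINK_RE.search(text) or PHONE_RE.search(text):
        return True
    # Banned words are checked on the cleaned words.
    return any(word in BANNED_WORDS for word in alphabetic_words(text))
```

With these placeholder rules, `is_spam("Visit www.example.com now")` and `is_spam("PROMO today")` are both true, while `is_spam("A quiet walk in the park")` is false.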
Finally, we integrate everything in main.py and run it:
env python3 main.py -f spam.mkv
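The wiring in main.py could be sketched as follows. Only the `-f` flag comes from the command above; the long `--file` option, the function names, and the idea of passing each step in as a callable are assumptions made so the sketch stays self-contained.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Flag a video as spam based on text found in its frames."
    )
    # -f matches the command line shown above; --file is an assumed alias.
    parser.add_argument("-f", "--file", required=True, help="path to the video")
    return parser.parse_args(argv)


def check_video(path, sample_frames, read_text, is_spam, cleanup):
    # sample_frames: path -> list of picture paths
    # read_text:     picture path -> text found by OCR
    # is_spam:       text -> bool
    # cleanup:       list of picture paths -> deletes them
    frames = sample_frames(path)
    try:
        # Stop at the first frame whose text looks like spam.
        return any(is_spam(read_text(frame)) for frame in frames)
    finally:
        # Always delete the sample pictures, even if OCR fails.
        cleanup(frames)
```

In a real main.py, the script would call `parse_args()`, pass `args.file` to `check_video` together with the prober, OCR, detector, and cleanup helpers, and print the result.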
Created using Python 3.5.2 on Linux Mint 18.3 Sylvia.