Bored of writing plugins, workflows, integrations and web pages and want to try something interesting? Try artificial intelligence.
It is so interesting and powerful that once you get into it you will never look back. Drones are in the air and driverless cars are being trialled. All such smart machines share one key requirement: visual recognition.
That is the ability to understand what a frame contains: what is in that image, what is in that video?
It is quite fascinating to think about how a program can interpret an image.
If that is something you like, then read on.
How a program understands an image
Images are matrices of pixel values. Think of an image as a 3D array where the first dimension is the width of the image, the second dimension is along the height and the third dimension is the color channel, i.e. RGB.
For example, an array value of 157 can mean that the Red channel of the pixel at the 10th row and 5th column is 157,
and the Green channel value of that same pixel may be 34.
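To make the indexing concrete, here is a minimal sketch using plain Python lists. The `image[row][column][channel]` index order, the 32x32 size and the constant names are assumptions made for illustration; real image libraries may order the dimensions differently.

```python
# Assumption: the image is stored as image[row][column][channel],
# with channels ordered Red, Green, Blue. Other layouts exist.
RED, GREEN, BLUE = 0, 1, 2

height, width = 32, 32

# A black image: every pixel starts as [0, 0, 0].
image = [[[0, 0, 0] for _ in range(width)] for _ in range(height)]

# Set one pixel, using row index 10 and column index 5.
image[10][5][RED] = 157
image[10][5][GREEN] = 34

# Read the same pixel back, one channel at a time.
red_value = image[10][5][RED]
green_value = image[10][5][GREEN]
print(red_value, green_value)
```

Every pixel is just three small integers, so the whole image is nothing more than a block of numbers that a program can compare and do arithmetic on.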
So at a very basic level, image interpretation is all about applying machine learning to these matrices.
How to write a basic Image classifier
In this blog, I will show how you can write a very basic image classifier. It will not be state of the art, but it can give you an understanding of the basics. There is a great resource available that can help you train your image classifier: the CIFAR dataset gives you around 50K classified images in matrix form that your program can train on, and an additional 10K images that you can use to test the accuracy of your program. At the end of this blog I will leave you with a link to the full source code of a working classifier.
In the training phase you load all these images into an array and store their categories in a parallel array.
The loading code for this step reads the CIFAR dataset and converts it into two arrays. The array flattenedMatrix contains the image pixels and the array classNames contains what each image actually shows, e.g. a boat, horse, car, etc.
So the entry of flattenedMatrix for the 400th example gives us its pixel values, and the matching entry of classNames gives us its category, e.g. a car.
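As a rough sketch of this loading step, the code below reads one batch file in the format used by the Python version of CIFAR-10: a pickled dictionary with a `data` entry (one row of 3072 pixel values per image) and a `labels` entry. The function names `load_batch` and `load_label_names` are hypothetical, and the snake_case variables only mirror the flattenedMatrix and classNames arrays described above. The demo writes a tiny fake batch so it runs without downloading anything; unpickling the real files also needs NumPy installed.

```python
import os
import pickle
import tempfile


def load_batch(path):
    """Read one CIFAR-10 style batch file into (flattened_matrix, class_ids)."""
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")
    # Convert every pixel to a plain int so that later subtraction
    # cannot wrap around the way unsigned 8-bit values would.
    flattened_matrix = [[int(v) for v in row] for row in batch[b"data"]]
    class_ids = [int(label) for label in batch[b"labels"]]
    return flattened_matrix, class_ids


def load_label_names(meta_path):
    """Read the human-readable category names from the meta file."""
    with open(meta_path, "rb") as f:
        meta = pickle.load(f, encoding="bytes")
    return [name.decode("utf-8") for name in meta[b"label_names"]]


# Demo with a fake two-image batch (made-up pixel values, not real CIFAR data).
with tempfile.TemporaryDirectory() as tmp:
    batch_path = os.path.join(tmp, "fake_batch")
    meta_path = os.path.join(tmp, "fake_meta")
    with open(batch_path, "wb") as f:
        pickle.dump({b"data": [[0] * 3072, [255] * 3072], b"labels": [1, 7]}, f)
    with open(meta_path, "wb") as f:
        names = [b"airplane", b"automobile", b"bird", b"cat", b"deer",
                 b"dog", b"frog", b"horse", b"ship", b"truck"]
        pickle.dump({b"label_names": names}, f)

    flattened_matrix, class_ids = load_batch(batch_path)
    label_names = load_label_names(meta_path)

# Parallel array: class_names[i] is the category of flattened_matrix[i].
class_names = [label_names[i] for i in class_ids]
print(len(flattened_matrix), len(flattened_matrix[0]), class_names)
```

The two lists stay aligned by position, which is what lets the classifier look up the category of any stored image from its index alone.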
That way the program can relate which pixel values correspond to which objects and build patterns that it can match against during prediction.
Being a very simple classifier, it uses a simple prediction algorithm called kNN, i.e. k Nearest Neighbours. Prediction works by finding the closest neighbours among the images the program already knows.
For example, if k=5, then for an input image X the program finds the 5 images whose pixel values are most similar to X. The class of X is then decided by majority vote, e.g. if 3 of those images are of category horse, then X is also most likely a horse.
Below is some code that shows how this computation occurs
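A minimal sketch of the kNN prediction step, in plain Python. It assumes that "closeness" is measured with the L1 distance (the sum of absolute pixel differences), which is one common choice and not necessarily the one used in the full source code. The function names and the tiny four-value "images" are made up for illustration.

```python
from collections import Counter


def l1_distance(a, b):
    """Sum of absolute differences between two flattened images."""
    return sum(abs(x - y) for x, y in zip(a, b))


def predict_knn(train_vectors, train_labels, query, k=5):
    """Return the majority label among the k training images closest to query."""
    distances = [
        (l1_distance(vector, query), label)
        for vector, label in zip(train_vectors, train_labels)
    ]
    # Smallest distance first, then keep only the k nearest neighbours.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Majority vote over the neighbours' labels.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy training set: each "image" has only 4 pixel values (illustrative data).
train_vectors = [
    [10, 10, 10, 10],
    [12, 11, 9, 10],
    [11, 10, 10, 12],
    [200, 210, 205, 199],
    [198, 202, 207, 201],
    [90, 95, 100, 92],
]
train_labels = ["horse", "horse", "horse", "car", "car", "boat"]

prediction = predict_knn(train_vectors, train_labels, [13, 12, 11, 10], k=5)
print(prediction)
```

With k=5, three of the five nearest neighbours of the query are horses, so the vote comes out as horse, just like the example above. On the full dataset this pure-Python loop would be slow; a numeric library would be the usual choice for real runs.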
As mentioned above, if you want to try this yourself, the full source code is available on my GitHub page.