How developers can move to the next level

Bored of writing plugins, workflows, integrations and web pages, and want to try something interesting? Try artificial intelligence.

It is so interesting and powerful that once you are into it you will never look back. Drones are in the air and driverless cars are being trialled. All such smart machines share one key requirement: visual recognition.

That is the ability to understand what a frame contains: what is in that image, what is in that video?

It is quite fascinating to think about how a program can interpret an image.

If that is something you like, then read on.


How a program understands an image

Images are matrices of pixel values. Think of an image as a 3D array where the first dimension runs along the height of the image (the rows), the second dimension runs along the width (the columns) and the third dimension is the color channel, i.e. RGB.

For example, an array value of [10][5][0] = 157 means that the Red channel of the pixel at row 10 and column 5 has the value 157, and its Green channel value may be 34, i.e. [10][5][1] = 34.
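The indexing can be tried out on a tiny array. This is only a minimal sketch, assuming NumPy and a made-up 32 x 32 image that starts out black; the values 157 and 34 are the ones from the example above.

```python
import numpy as np

# Hypothetical 32 x 32 RGB image laid out as rows x columns x channels, all black to start with.
image = np.zeros((32, 32, 3), dtype=np.uint8)

# Set the Red (index 0) and Green (index 1) channels of the pixel at row 10, column 5.
image[10][5][0] = 157
image[10][5][1] = 34

pixel = image[10, 5]      # the three channel values of that pixel
print(image.shape)        # (32, 32, 3)
print(pixel.tolist())     # [157, 34, 0]
```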




So, at a very basic level, image interpretation is all about applying machine learning to these matrices.


How to write a basic image classifier

In this blog, I will highlight how you can write a very basic image classifier. It will not be state of the art, but it can give you an understanding of the basics. There is a great source available that can help you train your image classifier: the CIFAR-10 dataset gives you 50K classified images in matrix form that your program can train on, and an additional 10K images that you can use to test the accuracy of your program. At the end of this blog I will leave you with the link to the full source code of a working classifier.


Training Phase

In the training phase you load all these images into an array and store their categories in a parallel array. Let me show you some code:

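import numpy as np

# Get the raw images (each row holds the 3072 pixel values of one image).
rawImages = unpickledFile[b'data']
# Get the class number for each image. Convert to a NumPy array.
classNames = np.array(unpickledFile[b'labels'])
# Assumed intermediate step (missing from this snippet): scale to 0..1 and arrange as N x 32 x 32 x 3.
matrixImages = (np.array(rawImages, dtype=float) / 255.0).reshape(-1, self.TOTAL_CHANNELS, self.NUMBER_OF_PIXELS, self.NUMBER_OF_PIXELS).transpose(0, 2, 3, 1)
# Reshape each 32 * 32 * 3 (3D) image into a 3072 (1D) vector
flattenedMatrix = np.reshape(matrixImages, (self.NUM_EXAMPLES, self.NUMBER_OF_PIXELS * self.NUMBER_OF_PIXELS * self.TOTAL_CHANNELS))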


In the above code we are loading the CIFAR dataset and converting it into two arrays. The array flattenedMatrix contains the image pixels and the array classNames contains the category of each image, i.e. what the image actually shows: a boat, a horse, a car, etc.
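Where unpickledFile comes from is not shown in the snippet. A minimal sketch of reading one batch file follows, assuming the Python version of CIFAR-10, whose batch files are pickled dictionaries with b'data' and b'labels' keys. The helper name load_batch and the file name are hypothetical, and to stay runnable without the download the sketch first writes a tiny fake batch.

```python
import os
import pickle
import tempfile

import numpy as np


def load_batch(path):
    """Unpickle one batch file and return (pixel rows, label array)."""
    with open(path, "rb") as f:
        # encoding="bytes" keeps the dictionary keys as bytes, e.g. b"data"
        unpickledFile = pickle.load(f, encoding="bytes")
    rawImages = np.array(unpickledFile[b"data"])
    classNames = np.array(unpickledFile[b"labels"])
    return rawImages, classNames


# Stand-in for a real batch: 4 fake images of 3072 values each, with made-up labels.
fake_batch = {
    b"data": np.zeros((4, 3072), dtype=np.uint8),
    b"labels": [3, 8, 8, 0],
}
path = os.path.join(tempfile.mkdtemp(), "fake_batch")
with open(path, "wb") as f:
    pickle.dump(fake_batch, f)

rawImages, classNames = load_batch(path)
print(rawImages.shape, classNames.tolist())
```

With the real dataset, the same helper would be pointed at one of the downloaded batch files instead of the fake one.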

So flattenedMatrix[400] will give us the pixel values of the example at index 400 and classNames[400] will give us its category, e.g. a car.

That way the program can relate which pixel values correspond to which objects, and build patterns that it can match against during prediction.
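The pairing of the two arrays can be sketched with stand-in data. This is an illustration only: the images and labels here are random, and CATEGORY_NAMES is a helper list introduced for this example (it follows the CIFAR-10 label order).

```python
import numpy as np

# CIFAR-10 category names in label order (0 to 9).
CATEGORY_NAMES = ["airplane", "automobile", "bird", "cat", "deer",
                  "dog", "frog", "horse", "ship", "truck"]

rng = np.random.default_rng(0)
# Stand-in data: 500 random 32 x 32 RGB images and random labels.
images = rng.integers(0, 256, size=(500, 32, 32, 3), dtype=np.uint8)
classNames = rng.integers(0, 10, size=500)

# Scale to 0..1 and flatten each image into one row of 3072 values.
flattenedMatrix = images.reshape(500, 32 * 32 * 3) / 255.0

row = flattenedMatrix[400]                    # pixel values of the example at index 400
category = CATEGORY_NAMES[classNames[400]]    # the name of its category
print(row.shape, category)
```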


Being a very simple classifier, it uses a simple prediction algorithm called kNN, i.e. k Nearest Neighbours. Prediction occurs by finding the closest neighbours among the images the program already knows.

For example, if k=5, then for an input image X the program finds the 5 closest images, i.e. those whose pixel values are most similar to X. The class of X is then decided by majority vote, e.g. if 3 of those images are of category horse, then X is also most likely to be a horse.

Below is some code that shows how this computation occurs:
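The vote itself is a small computation. A minimal sketch, assuming the labels of the 5 nearest images are already known (the values are made up, with 7 standing for horse, 4 for deer and 5 for dog):

```python
import numpy as np

# Hypothetical labels of the 5 nearest training images.
labels_of_neighbours = np.array([7, 4, 7, 5, 7])

# Count how often each label occurs, then pick the most frequent one.
votes = np.bincount(labels_of_neighbours, minlength=10)
predicted_label = int(votes.argmax())
print(votes.tolist())     # [0, 0, 0, 0, 1, 1, 0, 3, 0, 0]
print(predicted_label)    # 7
```

Note that argmax breaks ties in favour of the smaller label number, which is one simple choice among several.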

def Predict(self, testData, predictedImages=False):
    # testData is the N x 3072 array where each row is a 3072-D vector of pixel values between 0 and 1
    totalTestRows = testData.shape[0]
    # A zero vector with N elements; each element will hold the predicted class, i.e. 0 to 9
    Ypred = np.zeros(totalTestRows, dtype=self.trainingLabels.dtype)
    # An N x 3072 array that will hold the matched training image for each test row
    Ipred = np.zeros_like(testData)

    # Iterate over each row in the test set
    for i in range(totalTestRows):
        # This uses NumPy broadcasting. Below is what is happening:
        # testData[i,:] is a test row of 3072 values
        # self.trainingExamples - testData[i,:] gives you a difference matrix of size 50000 x 3072 where each element is the difference value
        # np.sum(..., axis=1) sums along each row, e.g. [2 4 9] sums to 15
        # distances holds 50000 values, each the distance (sum of absolute differences over all 3072 columns) from test row i
        distances = np.sum(np.abs(self.trainingExamples - testData[i,:]), axis=1)
        # Indices of the K nearest training examples (smallest K distances, in no particular order)
        nearest_K_distances = np.argpartition(distances, self.K)[:self.K]
        # Labels of the K matches
        labels_K_matches = self.trainingLabels.take(nearest_K_distances)
        # Top matched label by majority vote (assumed; this step was missing from the snippet)
        best_label = np.bincount(labels_K_matches).argmax()
        Ypred[i] = best_label
        # Positions of the neighbours that carry the winning label
        best_label_arg = np.argwhere(labels_K_matches == best_label)
        # Store the first such match
        Ipred[i] = self.trainingExamples[nearest_K_distances[best_label_arg[0][0]]]
    return Ypred, Ipred
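To see the same idea run end to end without the dataset, here is a small standalone sketch. It is not the code from the repository: the function name knn_predict and the two-cluster data are made up for illustration, and it measures accuracy the same way you would on the 10K test images.

```python
import numpy as np


def knn_predict(train_x, train_y, test_x, k):
    """Predict a label for each test row by majority vote among its k nearest training rows."""
    predictions = np.zeros(test_x.shape[0], dtype=train_y.dtype)
    for i in range(test_x.shape[0]):
        # Broadcasting: (n_train, d) - (d,) gives (n_train, d); summing over axis 1 gives L1 distances.
        distances = np.sum(np.abs(train_x - test_x[i, :]), axis=1)
        # Indices of the k smallest distances (unordered).
        nearest = np.argpartition(distances, k)[:k]
        predictions[i] = np.bincount(train_y[nearest]).argmax()
    return predictions


# Made-up data: class 0 rows sit near 0.2, class 1 rows sit near 0.8.
rng = np.random.default_rng(42)
train_x = np.vstack([rng.normal(0.2, 0.05, (20, 8)), rng.normal(0.8, 0.05, (20, 8))])
train_y = np.array([0] * 20 + [1] * 20)
test_x = np.vstack([rng.normal(0.2, 0.05, (5, 8)), rng.normal(0.8, 0.05, (5, 8))])
test_y = np.array([0] * 5 + [1] * 5)

predicted = knn_predict(train_x, train_y, test_x, k=5)
accuracy = np.mean(predicted == test_y)
print(predicted.tolist(), accuracy)
```

The two clusters are far apart, so this toy data is easy to classify; real images overlap much more, which is why a pixel-distance classifier like this one stays far from state of the art.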


As mentioned above, if you want to try this yourself, the full source code is available on my GitHub page.