Posted in Computer Vision, Dynamics 365

Can Dynamics CRM understand images? Yes! Using deep learning.

Machine Learning is quite a buzzword these days and we have witnessed how quickly Microsoft and other vendors have made progress in this area. Just couple of years back Microsoft had no product or tool in this space and today they have closer to a dozen. Recently Microsoft has integrated Machine Learning into SQL Server and Dynamics CRM, it is slowly becoming core to its product line.

I would not be surprised if machine learning becomes a mandatory skill for most of the development jobs in the next decade.

How Image Recognition can help CRM?

Attaching documents is a common feature asked for in many CRM projects where customers can complete an application form and then upload scanned copies to support their application. Think of invoices, receipts, certificates, licenses, etc. As of now there is no way that Dynamics CRM can detect if the scan that a customer is uploading is a picture of a license, or beach or a car.

What if Dynamics CRM can detect and recognise the scanned image and tell the user that it is expecting a license not a Dilbert on the beach.


Source: Ol.v!er [H2vPk] – Flickr

Wouldn’t it be great?

Although there are some Image engines that can tell you what an uploaded picture contains but there isn’t any engine or tool (as per my knowledge) that can tell whether an upload document is a license or not. This is because there are only subtle differences between scanned copies of various documents.

In this blog series I will build and demonstrate an approach to have this kind of image recognition capability with our favourite Dynamics CRM and we will use a branch of machine learning called Deep Learning that is very good at tasks related to Computer Vision. I would not be delving into the concepts of Deep Learning (there are numerous posts and videos on the internet) but will try to cover the major building block in this whirlwind tour.

Australian Identity Documents

I will take a real business case which is ubiquitous in many online applications in Australia where a customer is asked to provide a scan of their Australian ID as a proof. For our blog we will use the following Australian IDs

1) Victoria Driver’s License clip_image003

Courtesy: VicRoads


2) Australian Visa clip_image005


3) Medicare card clip_image007

Courtesy: Medicare

Note: Because of their sensitive nature I would only be exposing sample documents in this blog

The expectation is that the system can tell if the user is attaching a scanned copy of their Australian Visa when the record type is Australian Visa. So we will validate the image based on its content.

Good thing about deep learning based systems is that the detection algorithms do not rely on exact colour, resolution and placement but rather on pattern and feature matching. I got pretty good results when I built this system which I will share in later posts.

Technical Setup

Deep Learning based systems use a concept of neural networks to train themselves and to perform their tasks. There are many kinds of neural networks and the one that does the job for us is the Convolutional Neural Network. CNNs are good at image related tasks.

In order to train a CNN from scratch you need lot of hardware and computing power and I do not have that. So I will be using a partially trained network and customise it for our specific task i.e. to identify the images of those 3 types of Australian IDs.

Let us cover the building blocks of our solution

TensorFlow TM

TensorFlow is an open source framework for Deep Learning and we will be using it to train our engine.


TensorFlow comes in many platforms but we will use its Python version.

Dynamics 365

Once our model is trained we will deploy it online as web service and CRM can query that. I would not be posting the integration code here as I have already posted code to integrate Dynamics CRM with Machine Learning web services in my other blog


Let us start by training an image recognition model that can classify an image e.g. a scanned copy and tell if it is an Australian ID e.g. driving license or visa scan, etc.


We will use an approach called Transfer Learning. In this approach you take an existing Convolutional Neural Network and retrain its last few layers. Think of it this way that you have already got a network that can detect differences of aeroplane from a dog but you need to retrain it to pick more subtle differences i.e. the difference between a scanned invoice and a scanned passport.

TensorFlow is based on the concept of a tensor which is a mathematical vector that contains the features of an image. We will grab the penultimate layer of tensors and retrain it with some sample images of a Medicare card, an Australian Visa and Victoria’s Driver license.

Once the model is trained we will use a simple Support Vector Machine classify and predict the likelihood of the uploaded image to be an Australian ID. The output of the SVC classifier will a predicted class along with a likelihood probability e.g.

(Visa, 0.83) Model thinks 83% the image is that of an Australian Visa
(Medicare, 0.89) 89%, it is a Medicare
(License, 0.45) 45% it is a license

If the confidence percentage is low it means that image is not in the class of our interest e.g. in the last example the uploaded image is most likely not a license. As a rule of thumb, a probability of 0.80 is good mark for the prediction to be reliable.

Training Pool

Below are the screenshots of the samples that I used as a training for my image classification model. As you can see images differ in terms of angles, positioning, colours, etc. system can still learn based on important properties and disregard irrelevant properties.

Australian Visa

Training Set


Training Set

Victoria Driver’s License

Training Set


Training Phase

The training procedure involves categorising all the training images into a folder which is a named after their class. As you can see in the screenshots above, the windows folders are named after the class i.e. DriversLicense, Medicare and Visa

We then iterate over all these images and pass them to the penultimate layer of TensorFlow which gives us a feature tensor (a 2048 dimensional array of that image), we then label the image with its respective class.

Support Vector Machine

Once we have the feature tensor and label of every image, our training dataset is complete and we feed it to a Support Vector Machine and train the model. To save time, I pickled the model so that it can be reused for all predictions.

I know some of this terminology may be new to you but in the next post I will explain the architecture and some sample code that generates the predictions. Then it will start falling in place. See you then.

Part 3

In the previous two instalments I have been explaining the image recognition system that I built to recognise Australian IDs and discussed how our traditional CRM can benefit from such intelligent capabilities.

In this post I will cover the Architecture and share some sample code



As you can see above there are basically two major pillars of the system

A) Python

B) CRM ecosystem

Python is used to build the model using TensorFlow, then the compiled version of the trained model is deployed to an online webservice that should be able to accept binary contents like image data.

On the CRM ecosystem side, user can upload the image in a web portal or directly from CRM based on the scenario, then we need to pass it to the model and get the score.

Source Code

Below is an excerpt of the source code from one of the unit tests that will give you glimpse of what happens under the hood on Python side of the fence. This is just one class for introductory purposes, not the entire source code.

import os

import pickle

import sklearn

import numpy as np

from sklearn.svm import SVC

import tensorflow as tf

import tensorflow.python.platform

from tensorflow.python.platform import gfile

model_dir = 'inception'

def CreateImageGraph():

#Get the tensorflow graph

with gfile.FastGFile(os.path.join(

model_dir, 'classify_image_graph_def.pb'), 'rb') as f:

graph_def = tf.GraphDef()


_ = tf.import_graph_def(graph_def, name='')

def ClassifyAustralianID(image):

nb_features = 2048

#Initialise the feature tensor

features = np.empty((1,nb_features))


with tf.Session() as sess:

next_to_last_tensor = sess.graph.get_tensor_by_name('pool_3:0')

print('Processing %s...' % (image))

if not gfile.Exists(image):

tf.logging.fatal('File does not exist %s', image)

image_data = gfile.FastGFile(image, 'rb').read()

#Get the feature tensor 

predictions =,{'DecodeJpeg/contents:0': image_data})

features[0,:] = np.squeeze(predictions)

clear = '\n' * 20


return features

if __name__ == '__main__':

#Unpickle the trained model

trainedSVC = pickle.load(open('Trained SVC','rb'))

#Path to the image to be classified

unitTestImagePath = 'Test\\L5.jpg'

#Get feature tensor of the image

X_test = ClassifyAustralianID(unitTestImagePath)

print("Trying to match the image at path %s.....",unitTestImagePath)

#Get predicted probabilities of various classes


#Get predicted class


#Choose the item with the best probability

bestProb = y_predict_prob.argsort()[0][-1]

#Print the predicted class along with its probability

print("(%s, %s)" % (y_predict_class, y_predict_prob[0][bestProb]))


The purpose of the above stub is to test the prediction class ClassifyAustralianID with a sample image L5.jpg which is below. As we can see it is a driving license.


Running this image against the model gives us this output


It means the model says, it is 93% sure that the input image matches the Driving License class. In my testing I found anything above 80% was the correct prediction

i.e. the confidence percentage for the below images was low because they do not belong to one of our classes (Drivers License, Visa or Medicare), which is the expected output


Closing Notes

Image recognition is a field of budding research and getting a lot of attention these days because of driverless cars, robots, etc. This little proof of concept gave me a lot of insight into how things work behind the scenes and it was a great experience to create such a smart system. The world of machine learning is very interesting!!

Hope you enjoyed the blog.

Posted in Computer Vision

How developers can move to the next level

Bored of writing  plugins, workflows, integrations and web pages and want to try something interesting? Try artificial intelligence.

It is so interesting and powerful that once you are into it you will never look back. Drones are in the air and driverless cars are being trialled. All such smart machines have one key requirement i.e. Visual Recognition.

Ability to understand what a frame contains – what is in that image, what is in the video?

It is quite fascinating to think about how can a program interpret an image?

If that is something you like then read on.


How a program understands an image

Images are matrices of pixel values, think of it as a 3D array where first dimension is the with of the image, second dimension is along the height and third dimension is the color channel i.e. RGB.

For the below image – An array value of [10][5][0]=157 means the value of Red Channel of the pixel at 10th row and 5th column is 157

and its Green Channel value may be 34 i.e. [10][5][1]=34




So at very basic level image interpretation is all about applying machine learning to these matrices


How to write a basic Image classifier

In this blog, I will highlight how can you write a very basic image classifier – that would not be state of the art but it can give you an understanding about the basics. There is a great source available that can help you train your image classifier. The CIFAR dataset gives you around 50K classified images in their matrix form that your program can train upon and additional 10K image that you can use to test the accuracy of your program. At the end of this blog I will leave you with the link to full source code a working classifier.


Training Phase

In the training phase you load all these images in an array and also store their category in an equivalent array e.g. let me show you some code

# Get the raw images.
rawImages = unpickledFile[b'data']
# Get the class-numbers for each image. Convert to numpy-array.
classNames = np.array(unpickledFile[b'labels'])
# Reshape 32 *32 * 3 (3D) vector into 3072 (1D) vector
flattenedMatrix = np.reshape(matrixImages, (self.NUM_EXAMPLES, self.NUMBER_OF_PIXELS * self.NUMBER_OF_PIXELS * self.TOTAL_CHANNELS))


In the above code we are loading the CIFAR dataset and converting into two arrays. Array flattenedMatrix contains the image pixels and Array classNames contains what the image actually contains e.g. a boat, horse, car, etc.

So flattenedMatrix [400] will give us pixel values of the 400th example and classNames[400] will give us its category e.g. a car

That way program can relate, what pixel values correspond to what objects and create patterns that it can match against during prediction.


This being a very simple classifier uses a simple prediction algorithm called kNN i.e. k Nearest Neighbour. Prediction occurs by finding the closest neighbour from the images the program already knows.

For example if k=5, then for an input image X the program finds 5 closest images whose pixel values are similar to X. Then the class of X is computed based on the majority vote e.g. if 3 of those images are of category horse, then X is also most likely to be a horse.

Below is some code that shows how this computation occurs

def Predict(self, testData, predictedImages=False):
# testData is the N X 3072 array where each row is 3072 D vector of pixel values between 0 and 1
totalTestRows = testData.shape[0]
# A vector where each element is zero with N rows where each row will be predicted class i.e. 0 to 9
Ypred = np.zeros(totalTestRows, dtype = self.trainingLabels.dtype)
Ipred = np.zeros_like(testData)

# Iterate for each row in the test set
for i in range(totalTestRows):
# It uses Numpy broadcasting. Below is what is happening
# testData[i,:] is test row of 3072 values
# self.trainingExamples - testData[i,:] gives you a difference matrix of size 50000 X 3072 where each element is the difference value
# np.sum() computes sums across the columns e.g. [ 2 4 9] sum is 15,
# distances is 50000 rows where each element is the distance (cummulative sum of all 3072 columns) from test record (i)
distances = np.sum(np.abs(self.trainingExamples - testData[i,:]), axis = 1)
#Partition by nearest K distances (smallest K)
nearest_K_distances= np.argpartition(distances, self.K)[:self.K]
#K matches
labels_K_matches= self.trainingLabels.take(nearest_K_distances)
# top matched label
Ypred[i] = best_label
# do we need to return predicted Image as well
best_label_arg= np.argwhere(labels_K_matches==best_label)
# store the match
Ipred[i] = self.trainingExamples[nearest_K_distances[best_label_arg[0][0]]]
return Ypred, Ipred


As outlined above if you need to try this yourselves, full source code is available on my Github page