Can Dynamics CRM understand images? Yes! Using deep learning.

Machine Learning is quite a buzzword these days, and we have witnessed how quickly Microsoft and other vendors have made progress in this area. Just a couple of years back Microsoft had no product or tool in this space, and today it has close to a dozen. Recently Microsoft has integrated Machine Learning into SQL Server and Dynamics CRM, and it is slowly becoming core to its product line.

I would not be surprised if machine learning becomes a mandatory skill for most of the development jobs in the next decade.

How can Image Recognition help CRM?

Attaching documents is a common feature requested in many CRM projects, where customers complete an application form and then upload scanned copies to support their application. Think of invoices, receipts, certificates, licenses, etc. As of now, there is no way for Dynamics CRM to detect whether the scan that a customer is uploading is a picture of a license, a beach or a car.

What if Dynamics CRM could detect and recognise the scanned image and tell the user that it is expecting a license, not a Dilbert on the beach?


Source: Ol.v!er [H2vPk] – Flickr

Wouldn’t it be great?

Although there are some image engines that can tell you what an uploaded picture contains, there isn’t any engine or tool (to my knowledge) that can tell whether an uploaded document is a license or not. This is because there are only subtle differences between scanned copies of various documents.

In this blog series I will build and demonstrate an approach to add this kind of image recognition capability to our favourite Dynamics CRM. We will use a branch of machine learning called Deep Learning, which is very good at tasks related to Computer Vision. I will not delve into the concepts of Deep Learning (there are numerous posts and videos on the internet) but will try to cover the major building blocks in this whirlwind tour.

Australian Identity Documents

I will take a real business case which is ubiquitous in online applications in Australia, where a customer is asked to provide a scan of their Australian ID as proof. For this blog we will use the following Australian IDs:

1) Victoria Driver’s License



Courtesy: VicRoads


2) Australian Visa




3) Medicare card



Courtesy: Medicare

Note: Because of their sensitive nature, I will only be showing sample documents in this blog.

The expectation is that the system can tell if the user is attaching a scanned copy of their Australian Visa when the record type is Australian Visa. So we will validate the image based on its content.

The good thing about deep learning based systems is that the detection algorithms do not rely on exact colour, resolution and placement but rather on pattern and feature matching. I got pretty good results when I built this system, which I will share in later posts.

Technical Setup

Deep Learning based systems use neural networks to train themselves and to perform their tasks. There are many kinds of neural networks, and the one that does the job for us is the Convolutional Neural Network (CNN). CNNs are good at image related tasks.

Training a CNN from scratch needs a lot of hardware and computing power, which I do not have. So I will use a pre-trained network and customise it for our specific task, i.e. to identify the images of those 3 types of Australian IDs.

Let us cover the building blocks of our solution

TensorFlow TM

TensorFlow is an open source framework for Deep Learning and we will be using it to train our engine.


TensorFlow is available for many platforms, but we will use its Python version.

Dynamics 365

Once our model is trained we will deploy it online as a web service that CRM can query. I will not be posting the integration code here, as I have already posted code to integrate Dynamics CRM with Machine Learning web services in my other blog


Let us start by training an image recognition model that can classify an image, e.g. a scanned copy, and tell whether it is an Australian ID such as a driving license or a visa scan.


We will use an approach called Transfer Learning. In this approach you take an existing Convolutional Neural Network and retrain its last few layers. Think of it this way: you already have a network that can tell an aeroplane from a dog, but you need to retrain it to pick up more subtle differences, e.g. the difference between a scanned invoice and a scanned passport.

TensorFlow is named after the tensor, a multi-dimensional array of numbers. The tensor we care about here is a vector that contains the features of an image. We will take the output of the network's penultimate layer for some sample images of a Medicare card, an Australian Visa and a Victoria Driver’s License, and use those feature vectors for the retraining.

Once the model is trained we will use a simple Support Vector Machine to classify the uploaded image and predict the likelihood that it is an Australian ID. The output of the SVC classifier will be a predicted class along with a probability, e.g.

(Visa, 0.83)

The model is 83% confident that the image is an Australian Visa

(Medicare, 0.89)

89% confident that it is a Medicare card

(License, 0.45)

45% confident that it is a license

If the confidence percentage is low, the image probably does not belong to any of the classes we are interested in, e.g. in the last example the uploaded image is most likely not a license. As a rule of thumb, a probability of 0.80 is a good mark for the prediction to be reliable.
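To make the rule of thumb concrete, here is a minimal sketch of how a (class, probability) pair could be checked against the document type the record expects. The function name validate_upload and its parameters are my own illustration and not part of the actual solution; only the 0.80 threshold comes from the text above.

```python
def validate_upload(predicted_class, probability, expected_class, threshold=0.80):
    """Return True when the prediction is reliable and matches the expected document type."""
    # A low probability means the image is probably none of the known classes
    if probability < threshold:
        return False
    return predicted_class == expected_class


print(validate_upload("Visa", 0.83, "Visa"))        # True
print(validate_upload("License", 0.45, "License"))  # False: confidence too low
print(validate_upload("Medicare", 0.89, "Visa"))    # False: wrong document type
```

The threshold is kept as a parameter because the right cut-off depends on how costly a wrong acceptance is for your process.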

Training Pool

Below are screenshots of the samples that I used as the training set for my image classification model. As you can see, the images differ in terms of angles, positioning, colours, etc. The system can still learn from the important properties and disregard the irrelevant ones.

Australian Visa

Training Set



Training Set


Victoria Driver’s License

Training Set


Training Phase

The training procedure involves sorting all the training images into folders that are named after their class. As you can see in the screenshots above, the Windows folders are named after the classes, i.e. DriversLicense, Medicare and Visa
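The folder-per-class convention can be turned into a labelled list with a few lines of Python. This is only a sketch under my own assumptions: the helper name collect_labelled_images and the accepted file extensions are hypothetical, and the root folder is whatever directory holds the class folders.

```python
import os


def collect_labelled_images(root_dir, extensions=(".jpg", ".jpeg", ".png")):
    """Return (image_path, class_name) pairs, using each sub-folder name as the label."""
    samples = []
    for class_name in sorted(os.listdir(root_dir)):
        class_dir = os.path.join(root_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # ignore stray files next to the class folders
        for file_name in sorted(os.listdir(class_dir)):
            if file_name.lower().endswith(extensions):
                samples.append((os.path.join(class_dir, file_name), class_name))
    return samples
```

Calling it on a folder that contains DriversLicense, Medicare and Visa sub-folders would give one pair per image, ready to be turned into feature vectors and labels.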

We then iterate over all these images and run each one through the network up to its penultimate layer, which gives us a feature tensor (a 2048-dimensional array for that image). We then label the image with its respective class.

Support Vector Machine

Once we have the feature tensor and label of every image, our training dataset is complete and we feed it to a Support Vector Machine and train the model. To save time, I pickled the model so that it can be reused for all predictions.

I know some of this terminology may be new to you, but in the next post I will explain the architecture and share some sample code that generates the predictions. Then it will start falling into place. See you then.

Part 3

In the previous two instalments I have been explaining the image recognition system that I built to recognise Australian IDs and discussed how our traditional CRM can benefit from such intelligent capabilities.

In this post I will cover the Architecture and share some sample code



As you can see above there are basically two major pillars of the system

A) Python

B) CRM ecosystem

Python is used to build the model using TensorFlow. The trained model is then deployed to an online web service that can accept binary content such as image data.

On the CRM ecosystem side, the user can upload the image in a web portal or directly in CRM, depending on the scenario. We then need to pass it to the model and get the score.

Source Code

Below is an excerpt of the source code from one of the unit tests that will give you a glimpse of what happens under the hood on the Python side of the fence. This is just one file for introductory purposes, not the entire source code.

import os
import pickle
import sklearn
import numpy as np
from sklearn.svm import SVC
import tensorflow as tf
import tensorflow.python.platform
from tensorflow.python.platform import gfile

model_dir = 'inception'

def CreateImageGraph():
    #Get the tensorflow graph
    with gfile.FastGFile(os.path.join(
            model_dir, 'classify_image_graph_def.pb'), 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        _ = tf.import_graph_def(graph_def, name='')

def ClassifyAustralianID(image):
    nb_features = 2048
    #Initialise the feature tensor
    features = np.empty((1,nb_features))
    #Load the pre-trained graph
    CreateImageGraph()
    with tf.Session() as sess:
        #Output of the penultimate layer
        next_to_last_tensor = sess.graph.get_tensor_by_name('pool_3:0')
        print('Processing %s...' % (image))
        if not gfile.Exists(image):
            tf.logging.fatal('File does not exist %s', image)
        image_data = gfile.FastGFile(image, 'rb').read()
        #Get the feature tensor
        predictions = sess.run(next_to_last_tensor,{'DecodeJpeg/contents:0': image_data})
        features[0,:] = np.squeeze(predictions)
        clear = '\n' * 20
        print(clear)
    return features

if __name__ == '__main__':
    #Unpickle the trained model
    with open('Trained SVC','rb') as modelFile:
        trainedSVC = pickle.load(modelFile)
    #Path to the image to be classified
    unitTestImagePath = 'Test\\L5.jpg'
    #Get feature tensor of the image
    X_test = ClassifyAustralianID(unitTestImagePath)
    print("Trying to match the image at path %s....." % unitTestImagePath)
    #Get predicted probabilities of various classes
    y_predict_prob = trainedSVC.predict_proba(X_test)
    #Get predicted class
    y_predict_class = trainedSVC.predict(X_test)
    #Choose the item with the best probability
    bestProb = y_predict_prob.argsort()[0][-1]
    #Print the predicted class along with its probability
    print("(%s, %s)" % (y_predict_class[0], y_predict_prob[0][bestProb]))

The purpose of the above stub is to test the prediction function ClassifyAustralianID with a sample image, L5.jpg, which is shown below. As we can see, it is a driving license.


Running this image against the model gives us this output


It means the model is 93% sure that the input image matches the Driving License class. In my testing, I found that predictions with a confidence above 80% were correct.

Conversely, the confidence percentage for the images below was low because they do not belong to any of our classes (Drivers License, Visa or Medicare), which is the expected output


Closing Notes

Image recognition is a budding field of research that is getting a lot of attention these days because of driverless cars, robots, etc. This little proof of concept gave me a lot of insight into how things work behind the scenes, and it was a great experience to create such a smart system. The world of machine learning is very interesting!!

Hope you enjoyed the blog.

Power BI for Data Scientists

With my involvement in some data science work recently, I have had the privilege of exploring a lot of tools of the trade: RapidMiner, Python, TensorFlow and Azure Machine Learning, to name a few. My experience has been highly enriching, but I felt there was no Swiss Army knife that can handle the initial, and most critical, stage of a data science project, i.e. the hypothesis stage.

During this stage, scientists typically need to quickly prep the data, find the correlation patterns and establish hypotheses. It requires them to fail fast by identifying null hypotheses and spurious correlations and stay focussed on the right path. I recently explored Power BI and would like to share my findings through this blog.

Business Problem

Let us take the business case of a juice vendor, say Julie. Julie sells various kinds of juices and collects some data about her business operations on a daily basis. Say we have the following data for the month of July. It is pretty much: when, where, what and for how much?


Now say I am a data scientist who is trying to help Julie increase her sales and give her some insights into what she should focus on to get the best bang for her buck. I have been tasked with building an estimation model for Julie based on simple linear regression.

Feature Engineering

I will start by analysing various correlations between the features and our target variable, i.e. Revenue. This begins with importing the data into Power BI and taking care of the following basics

1) Replace null values with the mean value of the feature

2) Remove any duplicate rows

3) Engineer some new features as below


DAX formula

Day Type

The purpose of this feature is to distinguish between a weekday and a weekend day. I wanted to test the hypothesis that a weekend day might generate more sales than a weekday.

Day Type = IF(WEEKDAY(Lemonade2016[Date],3) >= 5,"Weekend","Weekday")

Total Items Sold

Lemon + Orange

Revenue

Total Items Sold * Price
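For readers who think in code rather than DAX, the same three features can be sketched in Python. This is only an illustration: the function names and the sample numbers are made up, and the weekday logic relies on Python's date.weekday(), which numbers Monday as 0 and Sunday as 6, the same numbering that WEEKDAY(..., 3) uses.

```python
from datetime import date


def day_type(d):
    """Classify a date as 'Weekend' (Saturday, Sunday) or 'Weekday'."""
    return "Weekend" if d.weekday() >= 5 else "Weekday"


def total_items_sold(lemon, orange):
    """Total Items Sold = Lemon + Orange."""
    return lemon + orange


def revenue(lemon, orange, price):
    """Revenue = Total Items Sold * Price."""
    return total_items_sold(lemon, orange) * price


# Hypothetical row, not taken from Julie's dataset
print(day_type(date(2016, 7, 2)))  # Weekend (a Saturday)
print(day_type(date(2016, 7, 4)))  # Weekday (a Monday)
print(revenue(97, 67, 0.25))       # 41.0
```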

Data preparation and feature engineering was a breeze in Power BI, thanks to its extensive support for DAX, calculated columns and measures. The dataset now looks like below.


Hypotheses Development

Once we had our dataset ready in Power BI, the next task was to analyse the patterns between Revenue and other features

Hypothesis 1 – There is a positive correlation between Temperature and Revenue

Result: Passed

Hypothesis 2 – There are more sales on a weekend day

Result: Failed

I derived these results using the visualizations below, built briskly on the Power BI platform
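A hypothesis like the first one can also be checked numerically with the Pearson correlation coefficient. The sketch below is not how I derived the results above (those came from the Power BI visuals); it is a plain-Python illustration, and the temperature and revenue readings in it are invented purely to show the calculation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(variance_x * variance_y)


# Hypothetical readings, not Julie's data
temperature = [20, 22, 25, 27, 30]
revenue = [30.0, 33.0, 37.5, 40.5, 45.0]
r = pearson(temperature, revenue)
print(round(r, 3))
```

A value close to +1 supports a positive correlation, a value close to 0 suggests there is none, and a value close to -1 indicates a negative one.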


Next off to some advanced hypothesis development. Shall we?

I needed to understand the relationship between the leaflets handed out on a particular day and Revenue. Time to pull some heavy plumbing in, so I decided to throw R into the mix. Power BI comes with inbuilt (almost!) support for R, and I was able to quickly produce a coplot using just 6-8 lines of R in the R Script Editor of Power BI


An interesting insight was how the correlation differs based on the day. This was made possible using the Power BI slicer as shown below




Wednesday – Less correlation between leaflets and sales


Sunday – High correlation between leaflets and sales

Power BI + R = Advanced Insights

If you need to analyse the dynamics between various features and how these dynamics impact your target variable, i.e. Revenue, you can easily model that in Power BI. Below is a dynamic coplot that shows the incremental relationship between Leaflets, Revenue and Temperature.

The 6 panels at the bottom should be read in conjunction with the 6 steps in the top box. The bottom left is the first step and the top right the last step of leaflets. Basically, it shows how the correlation between Temperature and Revenue is affected by the leaflets bin size


I ended my experiment by building a simple regression model that predicts your Revenue if you enter Temperature, Price and Leaflets. Below is the code for the model in case you are keen


Power BI is a very simple and powerful tool for the exploratory data scientist in you. Give it a go.

How developers can move to the next level

Bored of writing plugins, workflows, integrations and web pages, and want to try something interesting? Try artificial intelligence.

It is so interesting and powerful that once you are into it you will never look back. Drones are in the air and driverless cars are being trialled. All such smart machines have one key requirement, i.e. Visual Recognition.

That is the ability to understand what a frame contains: what is in that image, what is in the video?

It is quite fascinating to think about how a program can interpret an image.

If that is something you like then read on.


How a program understands an image

Images are matrices of pixel values. Think of an image as a 3D array where the first dimension runs along the height of the image (rows), the second dimension along the width (columns) and the third dimension is the colour channel, i.e. RGB.

For the below image, an array value of [10][5][0]=157 means the value of the Red channel of the pixel at the 10th row and 5th column is 157

and its Green channel value may be 34, i.e. [10][5][1]=34




So at a very basic level, image interpretation is all about applying machine learning to these matrices
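The indexing described above is easy to try out with NumPy. The snippet is a small sketch with a blank, made-up 32 x 32 image (the size used by the dataset later in this post); the pixel values 157 and 34 are just the ones from the example.

```python
import numpy as np

# A hypothetical blank 32 x 32 RGB image: rows x columns x colour channels
image = np.zeros((32, 32, 3), dtype=np.uint8)

# Set the Red and Green channels of the pixel at row 10, column 5
image[10][5][0] = 157
image[10][5][1] = 34

red, green, blue = image[10, 5]
print(image.shape)       # (32, 32, 3)
print(red, green, blue)  # 157 34 0

# Flattening turns the 3D array into one long vector of 32 * 32 * 3 values
flattened = image.reshape(-1)
print(flattened.shape)   # (3072,)
```

Flattening matters because simple classifiers compare images as plain vectors of numbers rather than as grids.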


How to write a basic Image classifier

In this blog, I will highlight how you can write a very basic image classifier. It will not be state of the art, but it can give you an understanding of the basics. There is a great resource available that can help you train your image classifier. The CIFAR dataset gives you around 50K classified images in matrix form that your program can train on, and an additional 10K images that you can use to test the accuracy of your program. At the end of this blog I will leave you with a link to the full source code of a working classifier.


Training Phase

In the training phase you load all these images into an array and also store their categories in a parallel array. Let me show you some code

# Get the raw images.
rawImages = unpickledFile[b'data']
# Get the class-numbers for each image. Convert to numpy-array.
classNames = np.array(unpickledFile[b'labels'])
# Scale pixel values to the range 0 to 1 (assumed step; the excerpt does not show how matrixImages is built)
matrixImages = np.array(rawImages, dtype=float) / 255.0
# Reshape 32 * 32 * 3 (3D) vector into 3072 (1D) vector
flattenedMatrix = np.reshape(matrixImages, (self.NUM_EXAMPLES, self.NUMBER_OF_PIXELS * self.NUMBER_OF_PIXELS * self.TOTAL_CHANNELS))


In the above code we are loading the CIFAR dataset and converting it into two arrays. The array flattenedMatrix contains the image pixels and the array classNames contains what each image actually shows, e.g. a boat, horse, car, etc.

So flattenedMatrix[400] will give us the pixel values of the 400th example and classNames[400] will give us its category, e.g. a car

That way the program can relate which pixel values correspond to which objects and build patterns that it can match against during prediction.


Being a very simple classifier, it uses a simple prediction algorithm called kNN, i.e. k Nearest Neighbours. Prediction works by finding the closest neighbours among the images the program already knows.

For example, if k=5, then for an input image X the program finds the 5 closest images whose pixel values are similar to X. The class of X is then decided by majority vote, e.g. if 3 of those images are of category horse, then X is also most likely to be a horse.

Below is some code that shows how this computation occurs

def Predict(self, testData, predictedImages=False):
    # testData is the N X 3072 array where each row is 3072 D vector of pixel values between 0 and 1
    totalTestRows = testData.shape[0]
    # A vector where each element is zero with N rows where each row will be predicted class i.e. 0 to 9
    Ypred = np.zeros(totalTestRows, dtype = self.trainingLabels.dtype)
    Ipred = np.zeros_like(testData)

    # Iterate for each row in the test set
    for i in range(totalTestRows):
        # It uses Numpy broadcasting. Below is what is happening
        # testData[i,:] is test row of 3072 values
        # self.trainingExamples - testData[i,:] gives you a difference matrix of size 50000 X 3072 where each element is the difference value
        # np.sum() computes sums across the columns e.g. [ 2 4 9] sum is 15,
        # distances is 50000 rows where each element is the distance (cumulative sum of all 3072 columns) from test record (i)
        distances = np.sum(np.abs(self.trainingExamples - testData[i,:]), axis = 1)
        #Partition by nearest K distances (smallest K)
        nearest_K_distances = np.argpartition(distances, self.K)[:self.K]
        #K matches
        labels_K_matches = self.trainingLabels.take(nearest_K_distances)
        # top matched label, i.e. the majority vote among the K labels
        best_label = np.bincount(labels_K_matches).argmax()
        Ypred[i] = best_label
        # do we need to return predicted Image as well
        if predictedImages:
            best_label_arg = np.argwhere(labels_K_matches==best_label)
            # store the match
            Ipred[i] = self.trainingExamples[nearest_K_distances[best_label_arg[0][0]]]
    return Ypred, Ipred
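To see the distance calculation and the majority vote in isolation, here is a toy sketch that is separate from the classifier above. The function name knn_predict and the tiny two-pixel "images" are hypothetical; the distance is the same sum of absolute differences.

```python
import numpy as np


def knn_predict(train_x, train_y, x, k):
    """Predict the label of x by majority vote among its k nearest training rows."""
    # L1 distance from x to every training row (NumPy broadcasting)
    distances = np.sum(np.abs(train_x - x), axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Count the votes per class number and pick the most common one
    votes = np.bincount(train_y[nearest])
    return int(np.argmax(votes))


# Hypothetical 2-pixel "images": class 0 clusters near 0, class 1 near 1
train_x = np.array([[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.9, 1.0], [1.0, 0.9]])
train_y = np.array([0, 0, 0, 1, 1])

print(knn_predict(train_x, train_y, np.array([0.8, 0.8]), k=3))  # 1
print(knn_predict(train_x, train_y, np.array([0.1, 0.1]), k=3))  # 0
```

In the first call two of the three nearest neighbours belong to class 1, so class 1 wins the vote even though one neighbour disagrees.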


As outlined above, if you want to try this yourself, the full source code is available on my GitHub page

Use Machine Learning to predict customers you might lose – Part 4

So far we have seen how Dynamics CRM can be connected to Azure ML to receive predictions. Once the integration is going there is no dearth of possibilities. You may like to build an alert / flagging functionality that prompts a Customer Service rep to contact a customer if their predictors indicate that they might churn. You may incorporate predictions into exec reporting so that the execs are aware of churn trends and can make decisions to minimise churn.


One of the things I discussed at the start of this series was being able to get some insights into the key drivers of customer churn, e.g. how do you know which features are most likely to cause churn? Answering such questions begins with analysing your data. A few starting points can be

1. From your data find out what fields change with respect to the Churn variable e.g. does the churn rate increase as the income of the customer goes up or is it dependent on their usage?

There are measures like correlation, covariance, entropy, etc that can help you answer such questions.

2. Find the distribution of your data and identify any outliers, e.g. check if there is a skew in the data or if the classes are unbalanced. You may need to apply some statistical measures like variance and standard deviation to have a better platform to delve into some of these insights.

Azure Machine Learning does provide some modules straight off the bat that can make the job easier e.g. it has the following modules

Compute Elementary Statistics

Compute Linear Correlation

Getting advanced insights can be tricky depending on your algorithm or the setup of the experiment (project). But there are ways, e.g. with a bit of Python code you can produce the decision tree below. The last label in each box, class = {LEAVE, STAY}, tells us whether the customer will churn based on the path they fall under


Above is the automatically generated insight that tells us that overage is the most important variable in deciding customer churn. If overage exceeds 97.5 then a customer is more likely to churn. This does not mean that every customer whose overage is more than 97.5 will churn, nor does it mean that everyone whose overage is less than that will stay. It is just that overage is the strongest indicator of churn based on our data.

We can even derive decision rules from insights like these, e.g. customers with overage less than 97.5 and leftover minutes less than 24.5 are most likely to stay. On the contrary, customers with overage more than 97.5 and average income more than $100,059.50 are most likely to leave.
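Rules like these translate almost word for word into code. The sketch below is only an illustration of the two example rules: the function name churn_rule and its parameter names are my own, and it deliberately returns None for customers that neither rule covers, because the tree has other branches that are not spelled out here.

```python
def churn_rule(overage, leftover_minutes, average_income):
    """Apply the two example rules; return 'STAY', 'LEAVE' or None when neither applies."""
    if overage < 97.5 and leftover_minutes < 24.5:
        return "STAY"
    if overage > 97.5 and average_income > 100059.5:
        return "LEAVE"
    return None


print(churn_rule(overage=50, leftover_minutes=10, average_income=60000))   # STAY
print(churn_rule(overage=150, leftover_minutes=10, average_income=120000)) # LEAVE
print(churn_rule(overage=150, leftover_minutes=10, average_income=60000))  # None
```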

Here is another one that shows the impact of House Value, Handset Value and other features on the churn


Once decision rules have been identified based on the above insights, policies can be made to retain the customers who are at risk of churn, e.g. give them discounts, offer them a change of plan, reward them with loyalty offerings, etc.

Where to from here?

Hopefully by now you appreciate the potential of machine learning and recognise the opportunity it provides when it is complemented with traditional information systems like CRMs, ERPs and Document Management systems. The field of machine learning is enormous and sometimes quite complex too, as it is based on scientific techniques and mathematics. You need to understand a lot of theory if you want to get into the black box, i.e. how machine learning does what it does.

But the great thing about the Azure Machine Learning suite is that it makes entry into machine learning easier by taking care of the complexities and giving you an easy-to-understand and easy-to-use environment. You have full control over the data structure and algorithms used in your project. It can be tuned to the needs of your organisation to get the best possible results.

For example you can tune the example I provided in the following different ways

1. Rather than going with Random Forest you can choose Support Vector Machines or Neural Networks and compare the results.

2. You are not restricted to JavaScript; you can call the web services from a plugin. That way, in a data migration scenario, you can set the prediction scores while the data is being imported

3. You can also change the confidence threshold to ignore prediction scores where the confidence is less than a certain amount.

So there are a lot of possibilities. Hope you enjoyed the series.

Happy CRM + ML!!

Use Machine Learning to predict customers you might lose – Part 3

Cruising along our machine learning journey and picking up from where we left off in the previous instalment, the next step is to expose our machine learning model as a web service so that it can be invoked from within Dynamics CRM.

Azure Machine Learning has this fantastic concept of converting an experiment into a trained model. A trained model is like a compiled version of your experiment that can be exposed via a web service, all at the click of just one button, i.e. Setup Web Service


Azure ML takes care of the rest by deploying the model. Once deployed you can inspect its configuration by going to the Web Services section as shown here


In order to connect to this web service from within Dynamics CRM, we can use the below JavaScript. We can pass CRM objects to this service in JSON format and get prediction results back

sendRequest: function (avgIncome,overAge,leftOver,houseVal,handsetVal,longCalls) {
    var service = AzureScript.getRequestObject();
    var url = "";
    var jsonObject = {
        "Inputs": {
            "input1": {
                "ColumnNames": [
                    // column names expected by the web service (not shown in this excerpt)
                ],
                "Values": [
                    // one row of values in the same order as ColumnNames (not shown in this excerpt)
                ]
            }
        },
        "GlobalParameters": {}
    };
    var dataString = JSON.stringify(jsonObject);
    if (service != null) {
        service.open("POST", url, false);
        service.setRequestHeader("X-Requested-With", "XMLHttpRequest");
        service.setRequestHeader("Authorization", "Bearer xxxxxxxxx==");
        service.setRequestHeader("Accept", "application/json");
        service.setRequestHeader("Content-Type", "application/json; charset=utf-8");
        service.setRequestHeader("Content-Length", dataString.length);
        service.send(dataString);
        //Receive result
        try {
            var requestResults = JSON.parse(service.responseText);
            var resultOutput = requestResults.Results.output1.value.Values[0];
            return resultOutput;
        }
        catch (err) {
            console.log('Unable to interpret result');
        }
    }
}

Let us prepare Dynamics CRM to start consuming this web service. I have registered a handler on the onSave event of the Telecom Customer form, which passes the relevant data to the Azure service and gets the score. The JavaScript for that is as below

function onFormSave() {
    //Prepare data - only fields with high Information gain
    var houseValue = Xrm.Page.getAttribute('manny_housevalue').getValue();
    var income = Xrm.Page.getAttribute('manny_income').getValue();
    var longcalls = Xrm.Page.getAttribute('manny_longcalls').getValue();
    var overage = Xrm.Page.getAttribute('manny_overage').getValue();
    var phonecost = Xrm.Page.getAttribute('manny_phonecost').getValue();
    var leftOver = Xrm.Page.getAttribute('manny_leftover').getValue();
    var valOutput = AzureScript.sendRequest(income, overage, leftOver, houseValue, phonecost, longcalls);
    if (valOutput != null && valOutput[0]!=null && valOutput[1]!=null) {
        //Prediction score (assumes the field accepts the value as returned by the service)
        Xrm.Page.getAttribute('manny_predictedchurnstatus').setValue(valOutput[0]);
        var prob = parseFloat(valOutput[1]);
        if(prob>=0 && prob<=1.0) {
            //Prediction confidence (assumes the field stores a percentage)
            Xrm.Page.getAttribute('manny_predictionconfidencepercentage').setValue(prob * 100);
        }
    }
}

getRequestObject: function () {
    /// Get an instance of XMLHttpRequest for all browsers
    if (XMLHttpRequest) {
        // Chrome, Firefox, IE7+, Opera, Safari
        // ReSharper disable InconsistentNaming
        return new XMLHttpRequest();
        // ReSharper restore InconsistentNaming
    }
    // IE6
    try {
        // The latest stable version. It has the best security, performance,
        // reliability, and W3C conformance. Ships with Vista, and available
        // with other OS's via downloads and updates.
        return new ActiveXObject('MSXML2.XMLHTTP.6.0');
    } catch (e) {
        try {
            // The fallback.
            return new ActiveXObject('MSXML2.XMLHTTP.3.0');
        } catch (e) {
            alertMessage('This browser is not AJAX enabled.');
            return null;
        }
    }
}

These scripts are simple and should be self-explanatory. Basically, we pass the highly correlated features to the prediction service and get two outputs

Prediction score -> assigned to-> manny_predictedchurnstatus

Prediction confidence -> assigned to -> manny_predictionconfidencepercentage

And they are displayed on the form like this. It is integrated, i.e. the moment you change the data the score is updated.


In the next blog post, we will touch upon the Insights that can be gained from a machine learning integration

How to write a thinking engine for Dynamics CRM

Time to uncover the core functionality that will turn Dynamics CRM into a machine that can learn: its brain! If you haven’t been through the previous posts, Part 1 and Part 2, I recommend you do, to make better sense of the following content. Hoping you understand what we are doing, let’s keep cruising

Analyse Dynamics CRM data

Let us see a sample from our feature set first i.e. a Case record in Dynamics CRM


The engine will train on such data. It will look for measurable attributes like the ETA given by the Support Rep (estimated), how much time the Support Rep actually took (actual) and the nature of the work (whether the work was for a bank or a government agency). Other variables can be introduced depending on what is important for your organisation.


Then the engine will start learning, not just by understanding meaningful words but also by correlating them. In Data Science, it is very important to focus only on those attributes that provide you with the most information gain. So we only need to extract the most meaningful attributes that pertain to the problem at hand, i.e. predicting what department the query belongs to: Tax, Investment or Medical. In our scenario the most meaningful phrases are noun phrases, i.e. proper nouns, combinations of common nouns, industry jargon, etc. So our engine should be able to separate this critical information from a big blurb of text while staying away from the common words which occur in every email / query.

Note: In any other kind of action based application, verb phrases may be more important than nouns, so you need to adjust your extraction module accordingly. Horses for courses.


Designing the engine (brain)

We will use three key Data Science concepts to build this engine

Natural Language Processing

This process will involve tokenisation and meaningful keyword extraction

Term Frequency – Inverse Document Frequency

We will use this measure to weight the extracted terms and determine distances between queries for our classification problem

Support Vector Machine

This will be the classification algorithm for the task at hand, i.e. to determine the department
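To give a feel for the second concept, here is a minimal sketch of Term Frequency – Inverse Document Frequency in plain Python. It is not the code of my engine: the function name tf_idf and the sample keyword lists are made up, and it uses the textbook formula tf * log(N / df), whereas libraries usually apply smoothed variants.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dictionary per tokenised document."""
    n_docs = len(documents)
    # Number of documents in which each term appears at least once
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical keyword lists, as might be extracted from three queries
docs = [
    ["capital", "gain", "tax", "query"],
    ["investment", "portfolio", "query"],
    ["medical", "claim", "query"],
]
weights = tf_idf(docs)
print(weights[0]["query"])  # 0.0, because the word occurs in every document
print(weights[0]["tax"] > weights[0]["query"])  # True
```

The point to notice is that a word present in every query gets a weight of zero, which is exactly the "stay away from common words" behaviour we want before handing the vectors to the classifier.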

Below are various phases involved in the brain training



Writing the engine (brain)

I have written the grammar engine below, which uses regular expressions for synthesis. I found that tokenisation is much faster and more accurate when you use regular expressions, as they give the NLTK engine a jump start.

Then it uses a bigram approach for grammatical tagging of the text. This approach is more effective than the unigram approach because it takes the context of the word in the sentence into account before tagging it (rather than just the word itself)

After the synthesis, tokenisation and tagging, we move on to keyword definition and extraction. I am sharing the source code below; you can tune it to suit your requirements. It uses Python’s NLTK library and the Brown corpus for bigram tagging.
# >>>>>>>> Manny Grewal - 2016  <<<<<<<<<<<<
# Fast and simple POS Tagging module with emphasis on key phrases
# Based on Brown Corpus - News
import nltk
from nltk.corpus import brown

# Below are the regular expressions that give a jump start to the tagger
taggingDatabase = brown.tagged_sents(categories='news')
tokenGrammar = nltk.RegexpTagger(
    [(r'(\W)', 'CD'), #special chars
     (r'(\d+)', 'CD'), #digits only
     (r'\'*$', 'MD'), 
     (r'(The|the|A|a|An|an)$', 'AT'), # match articles
     (r'^-?[0-9]+(.[0-9]+)?$', 'CD'), # match amounts and decimals
     (r'.*able$', 'JJ'),
     (r'(?<![!.?]\s)\b[A-Z]\w+', 'NNP'), # noun phrases
     (r'.+ness$', 'NN'),
     (r'.*ly$', 'RB'),
     (r'.*s$', 'NNS'),
     (r'.*ing$', 'VBG'),
     (r'.*ed$', 'VBD'),    
     (r'.*', 'NN')
    ])
uniGramTagger = nltk.UnigramTagger(taggingDatabase, backoff=tokenGrammar)
biGramTagger = nltk.BigramTagger(taggingDatabase, backoff=uniGramTagger)

# Grammar rules 
#This grammar decides the 3 word, 2 word phrases and what tokens should be chosen
triConfig = {}
triConfig["NNP+NNP+NNP"] = "NNP" # New York City
triConfig["NNP+IN+NNP"] = "NNP" # Ring of Fire
#triConfig["NN+NN+NN"] = "NN" # capital gain tax

biConfig = {}
biConfig["NNP+NNP"] = "NNP"
biConfig["NN+NN"] = "NNI"
biConfig["NNP+NN"] = "NNP"
biConfig["NN+NNP"] = "NNP"
biConfig["AT+NNP"] = "NNP"
biConfig["JJ+NN"] = "NNI"
biConfig["VBG+NN"] = "NNI"
biConfig["RBT+NN"] = "NNI"

uniConfig ={}
uniConfig["NNP"] = "NNP"
uniConfig["NN"] = "NN"
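# Split the sentence into single words/tokens
def tokeniseSentence(textData):
    tokens = nltk.word_tokenize(textData)
    return tokens

# generalise special POS tags
def replaceTagWithGeneral(tagValue):
    if(tagValue=="JJ-TL" or tagValue=="NN-TL" or tagValue=="NNPS"):
        return "NNP"
    elif(tagValue=="NNS"):
        # NOTE: the condition for this branch is missing in the source listing;
        # mapping plural nouns (NNS) to NN is an assumption
        return "NN"
    else:
        return tagValue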

# Extract the main topics from the sentence
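def ExtractKeyTokens(textData):
    tokens = tokeniseSentence(textData)
    generatedTags = biGramTagger.tag(tokens)

    # replace special tags with general tag
    for cnt, (w,t) in enumerate(generatedTags):
        replacedVal = replaceTagWithGeneral(t)
        generatedTags[cnt] = (w, replacedVal)

    # NOTE: the counter set-up and the pop/advance steps in the loops below are
    # missing from the source listing and have been reconstructed
    matchedTokens = []

    #process trigrams
    currentTag = 0
    remainingTags = len(generatedTags)
    while remainingTags >= 3:
        firstTag = generatedTags[currentTag]
        secondTag = generatedTags[currentTag + 1]
        thirdTag = generatedTags[currentTag + 2]
        configKey = "%s+%s+%s" % (firstTag[1], secondTag[1], thirdTag[1])
        value = triConfig.get(configKey)
        if value:
            # consume the matched tokens so that they are not matched again
            for l in range(0,3):
                generatedTags.pop(currentTag)
                remainingTags -= 1
            matchedTokens.append("%s %s %s" %   (firstTag[0], secondTag[0], thirdTag[0]))
        else:
            currentTag += 1
            remainingTags -= 1

    #process bigrams
    currentTag = 0
    remainingTags = len(generatedTags)
    while remainingTags >= 2:
        firstTag = generatedTags[currentTag]
        secondTag = generatedTags[currentTag + 1]            
        configKey = "%s+%s" % (firstTag[1], secondTag[1])
        value = biConfig.get(configKey)
        if value:
            for l in range(0,2):
                generatedTags.pop(currentTag)
                remainingTags -= 1
            matchedTokens.append("%s %s" %   (firstTag[0], secondTag[0]))
        else:
            currentTag += 1
            remainingTags -= 1

    #process unigrams
    currentTag = 0
    remainingTags = len(generatedTags)
    while remainingTags >= 1:
        firstTag = generatedTags[currentTag] 
        value = uniConfig.get(firstTag[1])
        if value:
            matchedTokens.append(firstTag[0])
        currentTag += 1
        remainingTags -= 1
    return set(matchedTokens)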



In a bid to keep this post relevant to the Dynamics CRM audience and not flood it with too much mathematical complexity, I will describe in a nutshell the steps that I performed to develop the ML engine:

1. Wrote a program that gets the key phrases out of the text

2. Fed the phrases to a Linear SVM classifier using the TF-IDF distance as a similarity measure

3. Trained the engine on a corpus of around 300 tickets, 100 from each category

4. Tested it using a rolling windows approach of 10%

5. Adjusted the coefficient of the kernel measures to give the best results and yet avoid over-fitting the model

6. Once trained, I pickled my engine and deployed it as a web service (WS) that accepts the ticket description as input and predicts the department
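Two of the steps above can be sketched in plain Python: the TF-IDF weighting of key phrases (step 2) and the 10% rolling test windows (step 4). This is only an illustration under stated assumptions. The function names are hypothetical, the TF-IDF formula is the textbook one, and "rolling windows of 10%" is read here as holding out a different 10% slice in each round, which may differ from what the engine actually does. The SVM itself is not shown; a library such as scikit-learn provides TfidfVectorizer and LinearSVC for that part.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {phrase: weight} dict per document (each doc is a list of key phrases)."""
    n = len(docs)
    # document frequency: in how many documents does each phrase occur?
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def rolling_windows(n_samples, fraction=0.10):
    """Yield (train_indices, test_indices), holding out a different window each round."""
    size = max(1, int(round(n_samples * fraction)))
    for start in range(0, n_samples, size):
        test = list(range(start, min(start + size, n_samples)))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test

# Hypothetical key phrases extracted from three tickets
docs = [
    ["capital gains", "tax return"],
    ["tax return", "share portfolio"],
    ["health cover", "claim"],
]
vectors = tfidf_vectors(docs)
# A phrase that occurs in a single ticket outweighs one shared by several tickets
print(vectors[0]["capital gains"] > vectors[0]["tax return"])

# With 300 tickets, every round trains on 270 and tests on 30
windows = list(rolling_windows(300))
print(len(windows), len(windows[0][0]), len(windows[0][1]))
```

Each ticket ends up in the test window exactly once, so the accuracy averaged over all rounds uses every sample without ever testing on data the model was trained on in that round.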


Integration of Dynamics CRM with the prediction service

Let us look at how CRM will connect to the Machine Learning web service.

A plugin will fire on creation of a Case. It will pass the ticket description to the web service and receive the predicted department, as shown below by the predictedDepttResult variable.

Below is the source code of the plugin. It uses JSON to connect to the WS.


using System;
using System.IO;
using System.Net;
using System.ServiceModel;
using Microsoft.Xrm.Sdk;

namespace Manny.Xrm.BusinessLogic
{
    public class CasePredictTeam : IPlugin
    {
        public void Execute(IServiceProvider serviceProvider)
        {
            //Extract the tracing service for use in debugging sandboxed plug-ins.
            ITracingService tracingService = (ITracingService)serviceProvider.GetService(typeof(ITracingService));
            // Obtain the execution context from the service provider.
            IPluginExecutionContext context = (IPluginExecutionContext)serviceProvider.GetService(typeof(IPluginExecutionContext));

            // Obtain the organization service used to update the case.
            IOrganizationServiceFactory serviceFactory = (IOrganizationServiceFactory)serviceProvider.GetService(typeof(IOrganizationServiceFactory));
            IOrganizationService crmService = serviceFactory.CreateOrganizationService(context.UserId);
            if (context.InputParameters.Contains("Target") && context.InputParameters["Target"] is Entity)
            {
                Entity entity = (Entity)context.InputParameters["Target"];
                if (entity.LogicalName != "incident")
                    return;
                try
                {
                    if (entity.Attributes.Contains("description"))
                    {
                        var url = "http://<put your host name and WS here>/PredictTicketDeptt/";
                        string predictedDepttResult = "(default)";
                        var httpWebRequest = (HttpWebRequest)WebRequest.Create(url);
                        httpWebRequest.ContentType = "application/json";
                        httpWebRequest.Method = "POST";
                        using (var streamWriter = new StreamWriter(httpWebRequest.GetRequestStream()))
                        {
                            string rawDesc = (string)entity.Attributes["description"];
                            rawDesc = EncodeJson(rawDesc);
                            string json = "{\"descp\":\"" + rawDesc + "\"}";
                            // Assumed: the write call is missing in the source listing
                            streamWriter.Write(json);
                        }
                        var httpResponse = (HttpWebResponse)httpWebRequest.GetResponse();
                        using (var streamReader = new StreamReader(httpResponse.GetResponseStream()))
                        {
                            predictedDepttResult = streamReader.ReadToEnd();
                        }

                        Entity caseToBeUpdated = new Entity("incident") { Id = entity.Id };
                        caseToBeUpdated.Attributes.Add("incidentid", entity.Id);

                        var optionSetValue = GetOptionSetValue(predictedDepttResult);
                        caseToBeUpdated.Attributes.Add("manny_department", new OptionSetValue(optionSetValue));
                        // Assumed: the update call is missing in the source listing
                        crmService.Update(caseToBeUpdated);
                    }
                }
                catch (FaultException<OrganizationServiceFault> ex)
                {
                    throw new InvalidPluginExecutionException("An error occurred in the CasePredictTeam plug-in.", ex);
                }
                catch (Exception ex)
                {
                    tracingService.Trace("v: {0}", ex.ToString());
                }
            }
        }

        public int GetOptionSetValue(string deptt)
        {
            if (deptt == "Tax")
                return 159690000;
            else if (deptt == "Investment")
                return 159690001;
            else
                return 159690002;
        }

        public string EncodeJson(string rawDesc)
        {
            // Escape backslashes before quotes, otherwise the backslash added for each quote gets doubled
            return rawDesc.Replace("\r\n", " ").Replace("\n", " ").Replace("\r", " ").Replace("\t", " ")
                .Replace("\b", " ").Replace("\\", "\\\\").Replace("\"", "\\\"");
        }
    }
}


Once integrated, the department will be predicted for each ticket as soon as it is created.
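For orientation, the contract between the plugin and the web service can be sketched from the Python side. The plugin posts a JSON body with a single "descp" field and reads the response body as the department name. In the sketch below, handle_predict_request and predict_department are hypothetical names, and the keyword lookup is only a stand-in for the trained classifier so that the example runs on its own.

```python
import json

def predict_department(description):
    """Hypothetical stand-in for the trained classifier: a keyword lookup only."""
    text = description.lower()
    if "tax" in text:
        return "Tax"
    if "invest" in text or "share" in text:
        return "Investment"
    return "Medical"

def handle_predict_request(body, predict=predict_department):
    """Parse the JSON payload sent by the plugin and return the department as plain text."""
    payload = json.loads(body)
    return predict(payload["descp"])

# json.dumps escapes quotes, backslashes and newlines, so the description
# survives the round trip even when it contains awkward characters
request_body = json.dumps({"descp": 'Client asked: "Can I claim\nmy home office on tax?"'})
print(handle_predict_request(request_body))  # Tax
```

Keeping the parsing and prediction in a plain function like this makes the contract easy to test before wiring it into whichever web framework hosts the service.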

You can build the Support Rep Prediction WS using a similar approach. Rather than predicting based on the description text, it will use ETA, Actual Time Spent and nature of work as three parameters to choose the best Rep, i.e. it will predict the Rep that will take the minimum amount of time to resolve the case based on the nature of work involved. It is also a classification problem, but rather than classifying into 3 classes (Tax, Investment and Medical), you will be classifying into N classes, where N is the number of Support Reps in the team.


I hope you got the basic idea of the possibilities and potential of machine learning. In the world of Data Science, the sky is the limit and a lot of wonders are waiting to be explored.
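Before training an N-class classifier, a simple baseline helps to sanity-check the idea: for a given nature of work, pick the Rep with the lowest average actual time in the ticket history. The sketch below is that baseline only, not the classifier described above. The history rows, the names and the use of hours as the unit are all made up for illustration, and the ETA is carried along but ignored.

```python
from collections import defaultdict

# Hypothetical history: (support rep, nature of work, estimated hours, actual hours)
HISTORY = [
    ("Alice", "Tax", 4.0, 3.0),
    ("Alice", "Tax", 2.0, 2.5),
    ("Bob", "Tax", 3.0, 4.0),
    ("Bob", "Investment", 5.0, 3.5),
    ("Alice", "Investment", 5.0, 6.0),
]

def average_time_by_rep(history, nature):
    """Average actual time per rep for one nature of work."""
    totals = defaultdict(lambda: [0.0, 0])  # rep -> [sum of hours, ticket count]
    for rep, work, _eta, actual in history:
        if work == nature:
            totals[rep][0] += actual
            totals[rep][1] += 1
    return {rep: total / count for rep, (total, count) in totals.items()}

def best_rep(history, nature):
    """Return the rep with the lowest average time, or None if there is no history."""
    averages = average_time_by_rep(history, nature)
    return min(averages, key=averages.get) if averages else None

print(best_rep(HISTORY, "Tax"))         # Alice (2.75 hours on average vs 4.0)
print(best_rep(HISTORY, "Investment"))  # Bob (3.5 hours vs 6.0)
```

A trained classifier should do at least as well as this lookup on held-out tickets; if it does not, the extra features are probably not adding information.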


Happy treasure-hunting !!

Turn Dynamics CRM into a thinking machine

The saddest aspect of life right now is that science gathers knowledge faster than society gathers wisdom

Isaac Asimov

As the saying goes, wisdom is an asset of unmatchable importance, and wisdom comes with intelligence. In computers, intelligence comes from extracting meaning out of data using Data Science. A little tinge of intelligence can turn an instruction-taking information system into an instruction-giving thinking machine.

In the previous post we discussed the idea of creating an intelligent routing system in Dynamics CRM that can tell which Support Rep is best suited to resolve a customer ticket. If you missed the introductory post and the agenda, I recommend reading it first to understand the following content better.


Prepare Dynamics CRM to marry Data Science

Before we start training our machine learning engine, we need to prepare our data to suit the data science algorithms better. We will use the following techniques:

1. Classification using Support Vector Machines: to find out what team/department the ticket belongs to

2. Logistic Regression: to predict the most suitable agent


We will first train our machine learning engine using a supervised approach based on the existing tickets. In a nutshell, it will learn the characteristics of various types of tickets and convert them into mathematical form, then predict by applying mathematical formulas to those characteristics. Some examples of these characteristics are:

  • Which Support Rep is better at handling certain kinds of customers
  • Which Support Rep generally resolves a ticket earlier than estimated
  • What are the traits of a ticket that belongs to certain category e.g. Investment Category



Let us see what our data looks like.

If you recall, we are an advisory support organisation that primarily deals with Tax, Investment and Medical queries. Below is what our historical ticket database looks like.



It is all fictitious data. Neither the customers nor the Reps are real, but the ticket contents are realistic.

You cannot train a machine learning system with rubbish content. Your samples have to be relevant to the domain for which you are building the ML model.

Data that will be used by Data Science

Let me explain the fields

Description – This is the email or phone transcript from the customer, which contains the queries/questions and the problem definition of the ticket

Department – The Department to which the query belonged. In the past it was set manually by a Tier 1 Agent, but now our system will predict it automatically

Type – It is the industry sector / vertical of the Customer. It will also be used in the algorithm, which I will explain in the upcoming posts

Support Rep – The Rep who worked on the query

Estimated Time – ETA given by the Support Rep before starting work

Total Time Spent – Actual time taken by the Support Rep to perform the work before moving to the next query


Below is a view of our Intelli-routing engine that shows how the engine will fit inside CRM and integrate with the Machine Learning WS
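As a rough illustration of how these fields could be held in code, here is a sketch of a ticket record together with one derived characteristic: how often a Rep finishes earlier than estimated. The class, the attribute names, the sample rows and the choice of hours as the unit are assumptions made for this example; the fields themselves mirror the list above.

```python
from dataclasses import dataclass

@dataclass
class Ticket:
    description: str        # email or phone transcript from the customer
    department: str         # Tax, Investment or Medical
    customer_type: str      # industry sector / vertical of the customer
    support_rep: str        # Rep who worked on the query
    estimated_hours: float  # ETA given by the Rep (unit assumed to be hours)
    actual_hours: float     # total time actually spent (unit assumed to be hours)

def early_resolution_rate(tickets, rep):
    """Fraction of a rep's tickets resolved in less time than estimated."""
    own = [t for t in tickets if t.support_rep == rep]
    if not own:
        return None
    early = sum(1 for t in own if t.actual_hours < t.estimated_hours)
    return early / len(own)

# Made-up sample rows
tickets = [
    Ticket("Query about capital gains", "Tax", "Retail", "Alice", 4.0, 3.0),
    Ticket("Query about a tax offset", "Tax", "Mining", "Alice", 2.0, 2.5),
    Ticket("Query about a share portfolio", "Investment", "Retail", "Bob", 5.0, 3.5),
]
print(early_resolution_rate(tickets, "Alice"))  # 0.5
print(early_resolution_rate(tickets, "Bob"))    # 1.0
```

Derived numbers like this rate are the kind of "mathematical form" a classifier can work with, whereas the raw description text needs the key phrase extraction first.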



It is self-explanatory. Basically, I will build my engine using Python. It can be deployed in Azure Machine Learning (as it supports Python), but my Azure access has expired, so I will use another provider to host my web service.


In the next post we will start building our Intelli-routing model in Python and train the classifier.