Posted in Artificial Intelligence, Machine Learning

Dedupe Duplicates using Fuzzy / Proximity search

Last year I wrote a post about finding similar accounts for Dynamics CRM, which generated a lot of interest in the community. Understandably so, as duplicate accounts are a very common requirement that comes up in nearly every CRM project. CRM's duplicate detection capabilities are only basic: they can do a partial match, but they cannot do any fuzzy or proximity match.

Even with the latest and most advanced weapon in CRM's armoury, i.e. Relevance Search, it is not yet at the point where it can tell that the following accounts are in fact the same companies.


Potential Duplicate

Waste Management | Waset Manaegment
Public Storage Co. | Storage Public Co. | Wrong order
Scotts Miracle-Gro | Scott Miracles Gro
Melbourne University | Melbourne Univ. | Short form

I decided to improve and generalise my code a bit, so that it can be used not only for CRM but for any general requirement where you need to find duplicates based on proximity. I am going to share the code and approach in this blog.


This proximity search is based on machine learning algorithms that base the search on edit distance. The program starts by looking for exact matches; if it cannot find one, it widens the search filter to find partial and proximity matches (i.e. words in the same neighbourhood, ordered in a different way, etc.).
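To make the idea of edit distance concrete, here is a minimal sketch of the classic Levenshtein distance in plain Python. The function names and the 0-100 similarity formula are illustrative assumptions; the fuzzy matching library used later in this post computes its scores differently.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions or substitutions that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> int:
    """Illustrative 0-100 score: 100 means identical (ignoring case)."""
    longest = max(len(a), len(b)) or 1
    return round(100 * (1 - edit_distance(a.lower(), b.lower()) / longest))


print(edit_distance("Waste Management", "Waset Manaegment"))
print(similarity("Melbourne University", "Melbourne Univ."))
```

A small distance relative to the length of the names suggests a likely duplicate; where to draw the line depends on your data.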


I have also attached the original files that I used during my testing, i.e. the file containing duplicates and the results (where duplicates were found). Below is a brief snapshot of the results from my test run.


Duplicate Found


Kimberly Clark

San disk




Starwood Hotels & Resorts

Starwood Hotels And Resorts

Expeditors Washington

Expeditors International of Washington

There were some false positives in the results as well, so you can adjust the thresholds of the algorithm as per your data.

How to use

You have a list of companies and you want to know which of them are duplicates. So, this is what you need to do.

1. Export the list into a CSV file.

2. Point the code to your file.

3. Run the code. It generates a new file, results.csv, with a new column called Duplicate.

Complete source code

Python is a beautiful language and does big things in just a few lines of code. Just install Python on your desktop and run the following file. No frills, no servers, no deployment. Too easy.


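import pandas as pd
from fuzzywuzzy import fuzz
from fuzzywuzzy import process

# Score cut-offs (0-100) for each matching pass and the number of matches to keep.
# The original values are not shown in this post; the numbers below are
# illustrative assumptions, so tune them against your own data.
FULL_MATCHING_THRESHOLD = 80
PARTIAL_MATCHING_THRESHOLD = 95
TOKEN_MATCHING_THRESHOLD = 95
SORT_MATCHING_THRESHOLD = 95
MAX_MATCHES = 1

companies_db = "<local path of your CSV file>/CompaniesShort.csv"
# Skip the header row and load the single column of company names.
current_db_dataframe = pd.read_csv(companies_db, skiprows=1, index_col=False, names=['Company'])

def find_matches(matchThis):
    rows = current_db_dataframe['Company'].values.tolist()
    # Drop one copy of the name being checked so that it does not simply match itself.
    if matchThis in rows:
        rows.remove(matchThis)
    # Start strict (whole-string similarity) and widen the search only when nothing is found.
    matches = process.extractBests(matchThis, rows, scorer=fuzz.ratio, score_cutoff=FULL_MATCHING_THRESHOLD, limit=MAX_MATCHES)
    if len(matches) == 0:
        matches = process.extractBests(matchThis, rows, scorer=fuzz.partial_ratio, score_cutoff=PARTIAL_MATCHING_THRESHOLD, limit=MAX_MATCHES)
    if len(matches) == 0:
        matches = process.extractBests(matchThis, rows, scorer=fuzz.token_set_ratio, score_cutoff=TOKEN_MATCHING_THRESHOLD, limit=MAX_MATCHES)
    if len(matches) == 0:
        matches = process.extractBests(matchThis, rows, scorer=fuzz.token_sort_ratio, score_cutoff=SORT_MATCHING_THRESHOLD, limit=MAX_MATCHES)
    return matches[0][0] if len(matches) > 0 else None

fn_find_matches = lambda x: find_matches(x)

# Add the Duplicate column and write results.csv, as described in the steps above.
current_db_dataframe['Duplicate'] = current_db_dataframe['Company'].apply(fn_find_matches)
current_db_dataframe.to_csv("results.csv", index=False)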

Posted in IoT, Machine Learning

Azure IoT Hub Streaming Analytics Simulator

Azure IoT Hub Streaming Analytics Simulator is an application written by Manny Grewal. The purpose of this blog is to explain What, Why and How of this application.



Streaming analytics is a growing trend that deals with analysing data in real time. Real-time data streams have a short life span and their relevance decreases with time, so they demand quick analysis and rapid action.

Some areas where such applications are highly useful include data streams emitted by

  • Data centres to detect intrusions, outages
  • Factory production line to detect wear and tear of the machinery
  • Transactions and phone calls to detect fraud
  • Time-series analysis
  • Anomaly Detection


Data used by streaming analytics applications is temporal in nature i.e. it is based on short intervals of time. What is happening at the interval TX can be influenced by what happened 2 minutes ago i.e. at the interval TX-2

So the relationships between various events are time-based rather than entity based (e.g. as in general Entity Relational Database based systems)

Take the scenario of a Data Centre which has two sensors that emit a couple of data streams – Fan Speed of the server hardware and its temperature.

If the temperature reading of the server hardware is going up, it could be related to a dwindling Fan Speed reading. We need to look at both readings over an interval of time to establish a hypothesis on their correlation.
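As a rough illustration of looking at two streams over an interval, the sketch below computes the Pearson correlation of fan speed and temperature over a sliding window. The readings and function names are made up for this example and do not come from the simulator.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sum((x - mx) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - my) ** 2 for y in ys) ** 0.5
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a flat stream carries no correlation signal
    return covariance / (spread_x * spread_y)


def windowed_correlation(xs, ys, window):
    """Correlation of the two streams over each consecutive time window."""
    return [pearson(xs[i:i + window], ys[i:i + window])
            for i in range(len(xs) - window + 1)]


# Hypothetical readings, one pair per minute.
fan_speed_rpm = [3000, 2900, 2700, 2400, 2000, 1500]
temperature_c = [40.0, 41.0, 43.5, 47.0, 52.0, 58.0]

print(pearson(fan_speed_rpm, temperature_c))
print(windowed_correlation(fan_speed_rpm, temperature_c, window=3))
```

A value close to -1 in every window would support the hypothesis that the temperature rises as the fan slows down.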




In order to model and work with streaming analytics it is important to have an event generator that can generate the data streams in a time-series fashion.

Some example of such generators can be vehicle sensors, IoT devices, medical devices, transactions, etc. that generate data quickly.

The purpose of this application is to simulate the data generated by those devices. It helps you set up quickly and start modelling some data for your IoT experiments.



Main benefits of this app

1. Integrated with Azure IoT Hub i.e. the messages emitted by this application are sent to the Azure IoT Hub and can be leveraged by the Intelligence and Big Data ecosystem of Azure.

2. This app comes with 4 preset sensors

a. Temperature/Humidity

b. Air Quality

c. Water Pollution

d. Phone call simulator

3. Configure > Ready. App can be easily pointed to your Azure instance and can start sending messages to your Azure IoT Hub

4. Can be extended, if you are handy with .NET development. I have designed the app following the S.O.L.I.D principles so it can be extended and customised; the link to the source code is below.




App and source code can be downloaded from my GitHub


A quick tour of the app is below

IoT Hub




The app needs to be configured with details of your Azure IoT Hub account.

The following files need to be configured

1. App.Config

2. If you are registering Devices in the Hub, then keys for the devices need to be stored in the SensorBuilder.cs

3. You may need to restore the NuGet packages to build the application


Once the above three steps have been completed, you can build the application and the EXE of the application will be generated.


Sensor Tuning

Sensors can be tuned from the classes inheriting IDataPoint e.g. in the FloatDataPoint.cs

The following properties can be used to tune the sensors

Property Name | Tuning
MinValue | The minimum value of the sensor reading, e.g. for climatic temperature it can be -40C
MaxValue | The maximum value of the sensor reading, e.g. for climatic temperature it can be 55C
CommonValue | The average value of the sensor, e.g. for warmer months it can be 30C
FluctuationPercentage | How much variance you want in the generated data
AlertThresholdPercentage | When an alert should be generated, i.e. when the reading passes a certain threshold, e.g. 80% of the maximum value
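The app itself is written in .NET, but the way these properties could shape a simulated reading can be sketched in a few lines of Python. The class and method names below are hypothetical, and the way fluctuation and alerts are calculated is an assumption for illustration rather than the app's actual logic.

```python
import random
from dataclasses import dataclass


@dataclass
class FloatDataPointSketch:
    """Hypothetical stand-in for a tunable sensor; not the app's real class."""
    min_value: float
    max_value: float
    common_value: float
    fluctuation_percentage: float
    alert_threshold_percentage: float

    def next_reading(self, rng: random.Random) -> float:
        # Assumption: readings vary around the common value by the given percentage
        # and are clamped to the sensor's minimum and maximum.
        spread = abs(self.common_value) * self.fluctuation_percentage / 100
        value = rng.uniform(self.common_value - spread, self.common_value + spread)
        return max(self.min_value, min(self.max_value, value))

    def is_alert(self, reading: float) -> bool:
        # Assumption: alert once the reading reaches the given share of the maximum.
        return reading >= self.max_value * self.alert_threshold_percentage / 100


temperature = FloatDataPointSketch(-40.0, 55.0, 30.0, 10.0, 80.0)
rng = random.Random(7)
for _ in range(3):
    reading = temperature.next_reading(rng)
    print(round(reading, 2), temperature.is_alert(reading))
```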


Azure IoT Hub

The messages sent by the sensor simulator can be accessed in the Azure IoT Hub. Once you have configured your hub and the related streaming jobs, the messages can be seen in the dashboard as below.



The messages are sent in the JSON format and below is a structure of one of the messages emitted by a sensor located at Berwick, VIC

"IncludeSensorHeader": 1,

"MessageId": "949a3618-c4a4-42bc-9c2a-39da86aa9191",

"EmittedOn": "2017-06-30T11:13:45.3543200",

"SensorDataHeader": {
"Readings": [






Posted in Machine Learning

Power BI for Data Scientists

With my involvement in some data science work recently, I have had the privilege to explore a lot of tools of the trade: Rapid Miner, Python, TensorFlow and Azure Machine Learning, to name a few. My experience has been highly enriching, but I felt there was no Swiss Army knife that could handle the initial, and most critical, stage of a data science project, i.e. the hypothesis stage.

During this stage, scientists typically need to quickly prep the data, find the correlation patterns and establish hypotheses. It requires them to fail fast by identifying null hypotheses and spurious correlations and stay focussed on the right path. I recently explored Power BI and would like to share my findings through this blog.

Business Problem

Let us take the business case of a juice vendor, say Julie. Julie sells various kinds of juices and collects some data about her business operations on a daily basis. Say we have the following data for the month of July. It is pretty much: when, where, what and for how much?


Now say I am a data scientist who is trying to help Julie increase her sales and give her some insights into what she should focus on to get the best bang for her buck. I have been tasked with building an estimation model for Julie based on simple linear regression.

Feature Engineering

I will start by analysing various correlations between the features and our target variable i.e. Revenue. It can be commenced by importing the data into Power BI and looking after the following basics

1) Replace the null values with the mean value of the feature

2) Dedupe any rows

3) Engineer some new features as below

Feature | DAX formula
Day Type | Day Type = IF(WEEKDAY(Lemonade2016[Date],3) >= 5,"Weekend","Weekday")
Total Items Sold | Lemon + Orange
Revenue | Total Items Sold * Price

The purpose of the Day Type feature is to distinguish between a weekday and a weekend day. I wanted to test the hypothesis that a weekend day might generate more sales than a weekday.
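For readers who prefer code to DAX, the same three preparation steps can be sketched in plain Python. The column names follow the formulas above; the sample rows and helper names are made up for illustration.

```python
from datetime import date
from statistics import mean


def fill_nulls_with_mean(rows, column):
    """Replace missing values in one column with the mean of the known values."""
    column_mean = mean(r[column] for r in rows if r[column] is not None)
    return [{**r, column: column_mean if r[column] is None else r[column]} for r in rows]


def dedupe(rows):
    """Keep the first occurrence of each fully identical row."""
    seen, unique = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def add_features(row):
    total = row["Lemon"] + row["Orange"]
    # date.weekday() numbers Monday as 0 and Sunday as 6, like WEEKDAY(date, 3) in DAX.
    day_type = "Weekend" if row["Date"].weekday() >= 5 else "Weekday"
    return {**row, "Day Type": day_type, "Total Items Sold": total,
            "Revenue": total * row["Price"]}


# Hypothetical sample rows, including one missing value and one duplicate.
raw = [
    {"Date": date(2016, 7, 1), "Lemon": 10, "Orange": 20, "Price": 0.5, "Temperature": 20.0},
    {"Date": date(2016, 7, 2), "Lemon": 12, "Orange": 18, "Price": 0.5, "Temperature": None},
    {"Date": date(2016, 7, 3), "Lemon": 15, "Orange": 25, "Price": 0.5, "Temperature": 30.0},
    {"Date": date(2016, 7, 3), "Lemon": 15, "Orange": 25, "Price": 0.5, "Temperature": 30.0},
]

prepared = [add_features(r) for r in dedupe(fill_nulls_with_mean(raw, "Temperature"))]
for row in prepared:
    print(row["Date"], row["Day Type"], row["Total Items Sold"], row["Revenue"])
```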

Data preparation and feature engineering was a breeze in Power BI, thanks to its extensive support for DAX, calculated columns and measures. The dataset now looks like below.


Hypotheses Development

Once we had our dataset ready in Power BI, the next task was to analyse the patterns between Revenue and other features

Hypothesis 1 – There is a positive correlation between Temperature and Revenue

Result: Passed

Hypothesis 2 – There are more sales on a weekend day

Result: Failed

I derived these results using the below visualizations built briskly using Power BI platform
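Outside Power BI, the same two checks can be sketched numerically. The records below are invented purely to show the mechanics (they are not Julie's data), and the helper names are assumptions.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sum((x - mx) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - my) ** 2 for y in ys) ** 0.5
    return covariance / (spread_x * spread_y)


# Hypothetical daily records.
days = [
    {"Day Type": "Weekday", "Temperature": 22.0, "Revenue": 31.0},
    {"Day Type": "Weekday", "Temperature": 27.0, "Revenue": 40.0},
    {"Day Type": "Weekday", "Temperature": 30.0, "Revenue": 46.0},
    {"Day Type": "Weekend", "Temperature": 24.0, "Revenue": 35.0},
    {"Day Type": "Weekend", "Temperature": 29.0, "Revenue": 43.0},
]


def mean_revenue(day_type):
    return mean(d["Revenue"] for d in days if d["Day Type"] == day_type)


# Hypothesis 1: a clearly positive coefficient supports the Temperature link.
temperature_vs_revenue = pearson([d["Temperature"] for d in days],
                                 [d["Revenue"] for d in days])
print(temperature_vs_revenue)

# Hypothesis 2: compare average revenue on weekend days and weekdays.
print(mean_revenue("Weekend"), mean_revenue("Weekday"))
```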


Next off to some advanced hypothesis development. Shall we?

I needed to understand the relationship between the leaflets given out on a particular day and Revenue. Time to pull some heavy plumbing in, so I decided to tow R into the mix. Power BI comes with inbuilt (almost!) support for R and I was able to quickly spawn a coplot using just 6-8 lines of R in the R Script Editor of Power BI.


An interesting insight was how the correlation differs based on the day. This was made possible using the Power BI slicer as shown below.

Wednesday – Less correlation between leaflets and sales
Sunday – High correlation between leaflets and sales

Power BI + R = Advanced Insights

If you need to analyse the dynamics between various features and how these dynamics impact your target variable, i.e. Revenue, you can easily model that in Power BI. Below is a dynamic coplot that shows the incremental causal relationship between Leaflets, Revenue and Temperature.

The 6 quadrants at the bottom should be read in conjunction with 6 steps in the top box. The bottom left is the first step and the top right the last step of leaflets. Basically it shows how the correlation between Temperature and Revenue is affected by leaflets bin size


I ended my experiment by building a simple regression model that can give you prediction of your Revenue if you enter Temperature, Price and Leaflets. Below is the code for model in case you are keen
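As a rough illustration of such a model, here is a minimal Python sketch of an ordinary least squares fit with three inputs. It is not the exact code from my experiment; the function names and the sample rows are made up, and a statistics library would normally do this work for you.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear_regression(features, target):
    """Return [intercept, coefficient_1, ...] via the normal equations."""
    x = [[1.0] + list(row) for row in features]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, target)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefficients, row):
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], row))


# Hypothetical rows of (Temperature, Price, Leaflets) and the Revenue observed.
history = [(20.0, 0.5, 10.0), (25.0, 0.3, 30.0), (30.0, 0.5, 20.0),
           (35.0, 0.3, 50.0), (28.0, 0.4, 15.0)]
revenue = [12.0, 15.5, 18.0, 24.0, 16.5]

model = fit_linear_regression(history, revenue)
print(predict(model, (32.0, 0.4, 40.0)))
```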


Power BI is a very simple and powerful tool for the exploratory data scientist in you. Give it a go.

Posted in Dynamics 365, Machine Learning

Use Machine Learning to predict customers you might lose – Part 4

So far we have seen how a Dynamics CRM integration can be connected to Azure ML to receive the predictions. Once we got the integration going there is no dearth of possibilities. You may like to build an alert / flagging functionality that can alert a Customer Service rep to contact a customer if their predictors are indicating that they might churn. You may incorporate predictions into exec reporting so that the execs are aware of the churn trends and make decisions to minimise churn.


One of the things I discussed at the start of this series was getting some insights into the key drivers of customer churn, e.g. how do you know which features are most likely to cause churn? Answering such questions begins with analysing your data. A few starting points can be

1. From your data find out what fields change with respect to the Churn variable e.g. does the churn rate increase as the income of the customer goes up or is it dependent on their usage?

There are measures like correlation, covariance, entropy, etc that can help you answer such questions.

2. Find the distribution of your data and identify any outliers e.g. check if there is a skew in the data or if the classes are unbalanced. You may need to apply some statistical techniques like variance, standard deviation to have a better platform to delve into some of these insights.

Azure Machine Learning does provide some modules straight off the bat that can make the job easier e.g. it has the following modules

Compute Elementary Statistics

Compute Linear Correlation

Getting advanced insights can be tricky depending on your algorithm or the setup of the experiment (project). But there are ways, e.g. with a bit of Python code you can produce a decision rule tree like the one below. The last label in each box, class = {LEAVE, STAY}, tells us whether the customer will churn based on the path they fall under.


Above is the automatically generated insight that tells us that overage is the most important variable in deciding customer churn. If overage exceeds 97.5 then a customer is more likely to churn. This does not mean that every customer whose overage is more than 97.5 will churn, nor does it mean that everyone whose overage is less than that will stay. It is just that overage is the strongest indicator of churn based on our data.

We can even derive decision rules from insights like these, e.g. customers with overage less than 97.5 and leftover minutes less than 24.5 are most likely to stay. Conversely, customers with overage more than 97.5 and average income more than $100,059.5 are most likely to leave.
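Written as code, the two rules quoted above look like the sketch below. The function and parameter names are assumptions; the thresholds are the ones read off the tree, and any customer not covered by these two paths is left undecided.

```python
def churn_rule(overage, leftover_minutes, average_income):
    """Apply the two decision rules quoted above; return None when neither applies."""
    if overage < 97.5 and leftover_minutes < 24.5:
        return "STAY"
    if overage > 97.5 and average_income > 100059.5:
        return "LEAVE"
    return None  # other paths of the tree are not covered by these two rules


print(churn_rule(overage=50, leftover_minutes=10, average_income=60000))
print(churn_rule(overage=150, leftover_minutes=10, average_income=120000))
```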

Here is another one that shows the impact of House Value, Handset Value and other features on the churn


Once decision rules have been identified based on the above insights, policies can be made to retain such customers who are at risk of churn e.g. give them discounts, offer them a change of plan, prize them with loyalty offerings, etc.

Where to from here?

Hopefully by now you appreciate the potential of machine learning and recognise the opportunity it provides when it is complemented with traditional information systems like CRMs, ERPs and Document Management systems. The field of machine learning is enormous and sometimes quite complex too, as it is based on scientific techniques and mathematics. You need to understand a lot of theory if you want to get into the black box, i.e. how machine learning does what it does.

But the great thing about the Azure Machine Learning suite is that it makes entry into machine learning easier by taking care of the complexities and giving you an easy-to-understand and easy-to-use environment. You have full control over the data structure and algorithms used in your project. It can be tuned as per the needs of your organisation to receive the best possible results.

For example you can tune the example I provided in the following different ways

1. Rather than going with Random Forest you can choose Support Vector Machines or Neural Networks and compare the results.

2. You are not restricted to JavaScript; you can call the web services from a plugin. That way, in a data migration scenario, you can set the prediction scores while the data is being imported.

3. You can also change the threshold of the confidence percentage to ignore the prediction scores where confidence is less than a certain amount.

So there are a lot of possibilities. Hope you enjoyed the series.

Happy CRM + ML!!

Posted in Dynamics 365, Machine Learning

Use Machine Learning to predict customers you might lose – Part 2

Continuing our journey from the previous post where we defined the issue of churn prediction, in this instalment, let us create the model in Azure Machine Learning. We are trying to predict the likelihood of a customer's churn based on certain features in the profile, which are stored in the Telecom Customer entity. We will use a technique called Supervised Learning, where we train the model on our data first and let it understand the trends before it can start giving us some insights.

Obviously you need access to Azure Machine Learning. Once you log into it, you can create a new Experiment. That gives you a workspace designer and a toolbox (somewhat like SSIS/BizTalk) where you can drag controls and feed them into each other. So it is a flexible model and for most tasks you do not need to write code.

Below is a screenshot of my experiment with toolbox on the left


Now machine learning is something which is slightly atypical for a usual CRM audience. I would not be able to fit full details of each of these tools in this blog, but I will touch on each of these steps so that you can understand at a high level what is going on inside these boxes. Let us address them one by one.

Dynamics CRM 2016 Telecom

This module is the input data module where we are reading the CRM customer information in the form of a dataset. At the moment of writing the blog, there is no direct connection available from Azure Machine Learning to CRM online. But where there is a will, there is a way i.e. I discovered that you can connect to CRM using the following

1. You schedule a daily export of CRM data into a location that Azure Machine Learning can read e.g. Azure blob storage, Web Url over Http

2. You can write a small Python-based module that connects to Dynamics using Azure Directory Services; the module can then pass the data to Azure using a DataFrame control

From my experience having an automatic sync is not important from Dynamics to Azure ML but it is important the other way round i.e. Azure ML to Dynamics.

Split Data

This module basically splits your data into two sets

1. Training dataset – The data based on which the machine learning model will learn

2. Testing dataset – The data based on which the accuracy of the model will be determined

I have chosen a stratified split, which ensures that the Training and Testing datasets each keep the same proportion of the classes being predicted as the full dataset. The split ratio is 80/20, i.e. 80% of the records will be used for training and 20% for testing.
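Azure Machine Learning does this for you in the Split Data module, but the idea of a stratified 80/20 split can be sketched in plain Python as follows. The function name, the record layout and the seed are assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_split(records, label_of, train_fraction=0.8, seed=42):
    """Split records so that each class keeps roughly the same share in both sets."""
    by_class = defaultdict(list)
    for record in records:
        by_class[label_of(record)].append(record)
    rng = random.Random(seed)
    train, test = [], []
    for members in by_class.values():
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


# Hypothetical customers: 30 who left and 70 who stayed.
customers = [{"id": i, "status": "LEAVE" if i < 30 else "STAY"} for i in range(100)]
training, testing = stratified_split(customers, lambda c: c["status"])
print(len(training), len(testing))
```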

Two-class Decision Forest

This is the main classifier, i.e. the module that does the bulk of the work. The classifier of choice here is a random forest with bootstrap aggregation. Two-class makes sense for us because our prediction has two outcomes, i.e. whether the customer will churn or not.

Random forests are fast classifiers and are relatively resistant to overfitting. Rather than taking one path, they learn your data from different angles by training many decision trees (together called an ensemble). In the end the scores of the individual trees are combined to come up with an overall prediction score. You can read more about this classifier here.

Train Model

This module basically connects the classifier to the data. As you can see in the screenshot of the experiment I posted above, there are two arrows coming out of Split Data; the one on the left is the 80% one, i.e. the training dataset. The output of this module is a trained model that is ready to make predictions.

Score Model

This step uses the trained model from the previous step and tests the accuracy of the model against our test data. Put simply, here we start feeding the model data that it has not seen before and count how many times the model gave the correct prediction vs the wrong prediction.
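The counting described above can be sketched in a few lines of Python. The labels and names below are made up; the four counters are the cells of a confusion matrix.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive="LEAVE"):
    """Count true/false positives and negatives for a two-class prediction."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts


def accuracy(counts):
    """Share of predictions that were correct."""
    total = sum(counts.values())
    return (counts["TP"] + counts["TN"]) / total if total else 0.0


# Hypothetical test records: what happened vs what the model predicted.
actual = ["LEAVE", "LEAVE", "STAY", "STAY", "STAY"]
predicted = ["LEAVE", "STAY", "STAY", "LEAVE", "STAY"]

counts = confusion_counts(actual, predicted)
print(dict(counts), accuracy(counts))
```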

Evaluate Model

The scores (hit vs miss) generated from the previous modules are evaluated in this step. In Data Science there are standard metrics to measure this kind of performance e.g. Confusion Matrices, ROC curves and many more. Below is the screenshot of the Confusion matrix


I know there are a lot of confusing details here (hence the name Confusion Matrix) but as a rule of thumb we need to focus on AUC, i.e. area under the curve. As shown in the results above, we have a decent AUC of 72.9%. In layman's terms, this is the chance that the model gives a randomly chosen churner a higher score than a randomly chosen non-churner; 50% is no better than guessing and 100% is a perfect ranking. A higher percentage does not necessarily equate to a better model: a very high figure (e.g. above 90%) can be a sign of overfitting, i.e. a state where your model does very well on the sample data but not so well on real-world data. So our model is good to go.
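To see what AUC measures, it can be computed directly as the share of churner/non-churner pairs that the model ranks in the right order. This is a minimal sketch with made-up scores; Azure Machine Learning calculates the figure for you in the Evaluate Model module.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed by comparing every positive/negative pair.

    labels: 1 for a churner, 0 for a non-churner.
    scores: the model's predicted churn score for each record.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical test set: two churners and two customers who stayed.
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```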

You can read more about the metrics and terms above here

In the next blog we will deploy and integrate the model with Dynamics CRM.

Posted in Dynamics 365, Machine Learning

Use Machine Learning to predict customers you might lose – Part 1

“Customer satisfaction is worthless. Customer loyalty is priceless.”

Jeffrey Gitomer

Business is becoming increasingly competitive these days and getting new customers increasingly difficult. The wisest thing to do in this cut-throat scenario is to hold on to your existing customer base while trying to develop new business. Realistically, no matter how hard it tries, every organisation still loses a percentage of its customers every year to the competition. This process of losing customers is called Churn.

Progressive organisations take churn seriously. They want to know in advance approximately how many customers they are going to lose this year and what is causing the churn. Having an insight into customer churn at least gives an organisation an opportunity to proactively take measures to control the churn before it is too late and the customer is gone.

Two pieces of information help the most when it comes to minimising the churn

1. Which customers are we going to lose this year

2. What are the biggest drivers of customer churn

The answers to the above questions are often hidden in the customer data itself, but revealing these answers out of swathes of data is an art, or rather a science called Data Science. With recent advances in practical Data Science techniques like Machine Learning, getting these answers is becoming increasingly feasible even for small-scale organisations that do not have the luxury of a Data Science team. This is thanks to services like Azure Machine Learning, which are trying to democratise these advanced techniques to a level such that even a small-scale customer can leverage them to solve their business puzzles.

Let me show you how your Dynamics CRM can leverage the powerful Machine Learning cortex to get some insights into the key drivers of customer churn. In this blog series, we will build a machine learning model that will answer the questions regarding churn. I have divided the series into four parts as below

Part 1 – Introduction

Part 2 – Creating a Machine Learning model

Part 3- Integrate the model with Dynamics CRM

Part 4 – Gaining insights within Dynamics CRM

I will take the example of a Telecom organisation, but the model can be extended to any kind of organisation in any capacity and from any industry.


Let us say there is a Telecom company called TelcoOrg which uses Microsoft CRM 2016 and they have an entity called Telecom Customer that stores their telco profile. Such profile may include some data regarding a customer mobile plan, phone usage, demography and reported satisfaction.

Understanding the features

In data science projects, it is crucial to understand the data points (called features). You need to carefully select those features that are relevant to the problem at hand. Some of the features also need to be engineered and normalised before they start generating some information gain. Below are the features that we will be using in this scenario of our Churn problem

Let me quickly explain the features so that we can understand the information contained in them



Has a College degree?

If the customer has a college degree

Cost price of phone

Price of the customer’s phone as per the plan/contract with TelcoOrg

Value of customer’s house

Approximate value of customer’s house based on Property Information websites like RPData, etc.

Average Income

Yearly income as reported by the customer

Leftover minutes per month

Average number of minutes a customer normally does not use from monthly quota

Average call duration

Average duration of calls made based on call history

Usage category

The category customer’s phone usage falls under as compared to other customers e.g. Very High, High, Average, Low or Very Low

Average overcharges

Average number of times a customer is usually overcharged per month

Average long duration calls

Average number of calls a customer usually makes per month that are more than 15 minutes long

Considering change?

How customer responded to TelcoOrg’s survey when asked if they are considering changing to another provider e.g. Yes, considering, Maybe, Not looking, etc.

Reported level of satisfaction

How customer responded to TelcoOrg’s survey when asked if they are satisfied with TelcoOrg’s service e.g. Unsatisfied, Neutral, Satisfied

Account Status

Current Status of the customer (i.e. if they have left or are currently Active)

Predicted Churn Status

This is the predicted status returned by the Azure Machine Learning Web Service

Prediction Confidence Percentage

This field indicates how confident the Azure Machine Learning Web Service is in its prediction. A threshold can be set to consider only the predictions above it, e.g. we can say: take only those predictions where the WS is at least 70% confident.

The screenshot below shows the information from the Telecom Customer entity. The sections highlighted in blue are predictions based on Azure Machine Learning web services. Whenever any of the fields on this CRM form changes, the WS updates its prediction scores based on the record's data. I will provide details later during the series as to how I built this integration.
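Applying such a threshold is a simple filter. The sketch below assumes each prediction is a small record with a status and a confidence percentage; the field names are hypothetical and not the actual CRM schema.

```python
def confident_predictions(predictions, minimum_confidence=70.0):
    """Keep only the predictions whose confidence meets the threshold."""
    return [p for p in predictions if p["confidence"] >= minimum_confidence]


# Hypothetical predictions returned for three customers.
predictions = [
    {"customer": "A", "predicted_status": "LEAVE", "confidence": 91.0},
    {"customer": "B", "predicted_status": "STAY", "confidence": 55.0},
    {"customer": "C", "predicted_status": "LEAVE", "confidence": 70.0},
]

for p in confident_predictions(predictions):
    print(p["customer"], p["predicted_status"], p["confidence"])
```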


Below is a screenshot of some of these records


We will achieve the following business benefits using Azure Machine Learning

1. Customers who are predicted to be at a higher risk of leaving (churn) can be flagged, so the customer retention teams can get in touch with them to proactively address their concerns in a bid to retain them

2. Find what factors affect churn the most, i.e. out of all these fields we will determine which fields are more likely to make a customer leave than others

3. We will also get insights into some business rules that dictate churn, i.e. the drivers

I hope you understand the problem now and find it interesting so far. Let us meet in the next part of the blog where I will show how a machine learning model is created.

Posted in Dynamics 365, Machine Learning

Dynamics CRM – Prediction based routing

Imagine a day at work for a front-desk staff member who handles the support mailbox or reception of any mid-size organisation. It is not uncommon for them to receive hundreds, if not thousands, of email and phone enquiries every day.

If this organisation happens to be using Dynamics CRM, then every enquiry is usually handled as below

  1. Read the description of the enquiry
  2. Understand
  3. Determine what team/department the query belongs to
  4. If there are multiple members in that team, then find who is best suited to answer it
  5. Assign it to that person

Move to the next enquiry. Repeat 1 to 5 above…. hundreds of times.

Now imagine the time spent on every enquiry to perform steps 1 to 5. A fair guess – it can easily take 10 minutes to grasp, digest and route the query.

Realistically, for most queries there is often a support rep matching ritual, back and forth, something like

Hey, who do you think this should go to?

Oh sorry! so it was meant for Helen, no worries I can assign to her

Have you worked on this kind of stuff earlier?

When will you get free to look at it, customer needs an answer today




We have already spent 20 minutes and the ticket has not even landed on the support rep’s desk yet !!

Well, time is money. If we can find a solution to save this time, it's a great return on investment.

Supercharge your Tier 1 Support

Through this blog series, I will try to explore a solution to this problem using Machine Learning. We will automate steps 1 to 5, full automation.

A machine algorithm will predict

  • Which team does this query belong to?
  • Which agent will get free first and which agent is best-skilled to answer this query

And the machine will not take 20 minutes to decide; it will take 20 seconds.



Let us lay out a scenario

A big advisory firm that uses Dynamics CRM offers many kinds of services to its clients. They have professional advisors on their team who can answer queries across the range, no matter if they are tax, investment or even medical enquiries.

Each team (tax, investment and medical) has a range of support reps available to handle enquiries.

Traditionally, Tier 1 staff created support cases upon receiving customer enquiries and assigned them to the relevant support rep by following steps 1 to 5 described at the start.

Upon assignment, the support rep gives an ETA before working on the query and the system tracks how much time the support rep actually spent.

We will also track some other parameters which can be leveraged by the ML engine.



Machine Learning Algorithm

The Machine Learning approach will tackle this situation as shown below

1. ML engine will train itself by synthesising content and correlating parameters that belong to a category

2. ML engine will be deployed as a web service (compiled model) to be consumed by Dynamics CRM

3. It will start predicting what category the enquiry is after reading, tokenising and tagging the content

4. Once it has known the category, it will then find who is best suited to answer the query using parameters like

  • Which customer support rep will be the earliest to get free to look at this
  • Which customer support rep is generally good at these kinds of queries
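Step 4 can be pictured as a simple scoring exercise over the reps of the predicted team. The sketch below is only an illustration of the idea: the SupportRep fields, the 0 to 1 skill score and the equal weighting of availability and skill are all assumptions, not the final design of the series.

```python
from dataclasses import dataclass


@dataclass
class SupportRep:
    name: str
    team: str
    minutes_until_free: float
    skill_score: float  # assumed 0..1, e.g. learned from past cases of this category


def pick_rep(reps, team, wait_weight=0.5):
    """Pick the rep in the team with the best mix of availability and skill."""
    candidates = [r for r in reps if r.team == team]
    if not candidates:
        return None
    longest_wait = max(r.minutes_until_free for r in candidates) or 1.0

    def score(rep):
        availability = 1 - rep.minutes_until_free / longest_wait
        return wait_weight * availability + (1 - wait_weight) * rep.skill_score

    return max(candidates, key=score)


# Hypothetical reps.
reps = [
    SupportRep("Helen", "tax", minutes_until_free=60, skill_score=0.9),
    SupportRep("Raj", "tax", minutes_until_free=10, skill_score=0.8),
    SupportRep("Mia", "investment", minutes_until_free=5, skill_score=0.7),
]
print(pick_rep(reps, "tax").name)
```

Changing wait_weight shifts the balance between answering quickly and answering with the most skilled rep.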



We will see and use the following machine learning techniques to build the smarts

  • Tokenisation & Semantic Analytics using Natural Language Processing
  • Support Vector Machines
  • Term Frequency-Inverse Document Frequency (TF-IDF)
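As a taste of the last technique, here is a minimal TF-IDF sketch in plain Python. The tokeniser, the function names and the sample enquiries are assumptions for illustration; the idea is that words appearing in every enquiry get a weight of zero while words specific to one category stand out.

```python
import math
import re
from collections import Counter


def tokenise(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {term: weight} dictionary per document."""
    tokenised = [tokenise(d) for d in documents]
    total_documents = len(tokenised)
    document_frequency = Counter(t for doc in tokenised for t in set(doc))
    weights = []
    for doc in tokenised:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(total_documents / document_frequency[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical enquiries.
enquiries = ["tax return help", "investment help", "medical help"]
for enquiry, weight in zip(enquiries, tf_idf(enquiries)):
    print(enquiry, weight)
```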


See you in the next post

Posted in Dynamics 365, Machine Learning

Dynamics CRM – Find similar customers using Machine Learning

In the previous blog, we used Machine Learning inside Dynamics CRM to add value to our customer records by getting a quick health check of how customers are doing based on some measurable data points. We used supervised learning, a technique that involves training your machine first, and then deriving your predictions based on the trained model. In this blog, we will use another technique: unsupervised learning. This technique is often used to determine similarity between records, categorise them into clusters and handle other scenarios which involve correlation of records. We will use unsupervised learning to solve a shortcoming that has existed in Dynamics CRM for a decade, i.e. to match records (and detect duplicates) based on a semantic match.


This is a very common requirement in Dynamics CRM when you need to cleanse your data and get rid of duplicates with similar sounding names. CRM does have a duplicate detection wizard but that doesn’t address this problem because it cannot do fuzzy match or a semantic match. I have seen many situations where hundreds (even thousands) of records are distributed among various team members for them to fix by identifying duplicates manually. Sounds familiar?




Let us put some intelligence in Dynamics CRM to save us from the wrath of the painful manual work.


Problems Solved


We will solve the following problems when it comes to matching records

  • Juxtaposed word sequences e.g. it can match Manchester University to University of Manchester and Socceroos Australia to Australian Association of Socceroos
  • Takes care of small punctuation and abbreviation differences e.g. match Manchester University to Manchester Uni or Manchester’s Univ or Man. Utd. University
  • Covers spelling mistakes and similar sounding names e.g. match Scot’s And Christina to Scott & Kristina Corp
  • Phonetic match and verb forms e.g. match Richtie Rich to Rishi Richest
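
To give a feel for the first two points, here is a minimal sketch of name normalisation. It is an assumption about one possible pre-processing step and not the code of the actual package; the function name and the filler-word list are made up for the illustration. It lower-cases the name, strips punctuation and possessives, drops a few filler words and sorts the remaining words so that word order no longer matters.

```javascript
// Hypothetical filler-word list, chosen only for this illustration.
const FILLER_WORDS = new Set(["of", "the", "and", "co", "corp"]);

// Reduce a company name to a word-order-independent key.
function normaliseName(name) {
  return name
    .toLowerCase()
    .replace(/['’]s\b/g, "")       // drop possessive 's
    .replace(/[^a-z0-9\s]/g, " ")  // turn punctuation into spaces
    .split(/\s+/)
    .filter((word) => word && !FILLER_WORDS.has(word))
    .sort()                        // make word order irrelevant
    .join(" ");
}

console.log(
  normaliseName("Manchester University") === normaliseName("University of Manchester")
); // true
```

Abbreviations such as Uni or Univ and spelling mistakes are not handled by this step alone; they still need a fuzzy comparison of the normalised names.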



Matching Accounts

So this is how the solution works inside Dynamics CRM.

A web resource called Similar Accounts is added to the Account form. It lists other accounts with similar names along with their matching score e.g. 100 for a perfect match and 60 for a partial match. The threshold can be adjusted to pick only closer results. Below are some screenshots from my Dynamics CRM where I have applied this algorithm. I have kept it simple as the focus is to demonstrate the matching engine rather than the look and feel.





Powered by Machine Learning Algorithm

This solution is built using Python and uses a Machine Learning algorithm called Levenshtein Distance, an edit-distance measure that counts the single-character insertions, deletions and substitutions needed to turn one string into another, to determine the similarity between two records. I have built a package around this core Python library and integrated it with Dynamics CRM. The package is hosted as a Flask web service that communicates with Dynamics CRM using JSON. More details of the Python package are here
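
For readers who want to see the idea in code, below is a minimal JavaScript sketch of Levenshtein distance and a 0 to 100 matching score derived from it. The actual package is written in Python, so this is only an illustration. The function names and the way the distance is scaled into a score are assumptions, not the package's own formula.

```javascript
// Classic dynamic-programming Levenshtein distance, keeping only two rows.
function levenshtein(a, b) {
  // prev[j] is the distance between the first i-1 characters of a
  // and the first j characters of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed scoring: 100 for identical names, falling towards 0 as the
// edit distance approaches the length of the longer name.
function similarityScore(a, b) {
  const x = a.toLowerCase().trim();
  const y = b.toLowerCase().trim();
  const longest = Math.max(x.length, y.length);
  if (longest === 0) return 100;
  return Math.round(100 * (1 - levenshtein(x, y) / longest));
}

console.log(similarityScore("Waste Management", "Waset Manaegment"));
```

A threshold on this score, such as the 60 mentioned earlier, then decides which accounts are listed as similar.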

Posted in Dynamics 365, Machine Learning

Dynamics CRM meets Machine Learning – Final


Welcome to the final post of this Dynamics CRM meets Machine Learning series, where we have been discussing how to use Machine Learning (ML) to interpret customer happiness from certain cues and behaviour points based on Dynamics CRM data. If you have been following along, you might notice that the only remaining piece of this jigsaw puzzle is the usage of the ML insights, i.e. the score of the email, inside Dynamics CRM.


We have already seen the ML engine and how CRM can connect to the ML web service that calculates the score of the email based on its content. Now we will focus on how to show the Happiness Index on the Contact record that will tell us about the level of satisfaction of a customer when we open their record.


Dynamics CRM customisations

We will be adding the following fields into CRM

Entity | Field | Purpose
Email | sentiment | Stores the sentiment calculated by the online Azure Machine Learning web service. Its value is either 0 (unhappy email) or 4 (happy email)
Contact | Total Emails | Total emails received from this contact
Contact | Average Sentiment Score | Average ML sentiment score based on all emails
Contact | Happiness Index | Average sentiment score if more than 2 emails have been received, otherwise 2


I have used rollup and calculated fields that roll up the sentiment score from the emails to the contact record.
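
In CRM this logic lives in the rollup and calculated fields themselves, but the rule can be sketched in a few lines of JavaScript. The function name and the sample scores below are made up for illustration.

```javascript
// Sketch of the Happiness Index rule: the average sentiment score once
// more than 2 emails have been received, otherwise a neutral 2.
// happinessIndex() is a hypothetical helper, not a CRM API.
function happinessIndex(sentimentScores) {
  const totalEmails = sentimentScores.length;
  if (totalEmails <= 2) {
    return 2;
  }
  const sum = sentimentScores.reduce((acc, score) => acc + score, 0);
  return sum / totalEmails;
}

// Each email is scored 0 (unhappy) or 4 (happy) by the ML web service.
console.log(happinessIndex([4, 0]));          // too few emails, falls back to 2
console.log(happinessIndex([4, 4, 0, 4, 0])); // average of five emails
```

Falling back to 2, the midpoint of the 0 to 4 range, keeps a contact with only one or two emails from being labelled very happy or very unhappy too early.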


Displaying Happiness Index

The Happiness Index is shown on the Contact form and an emoticon is displayed based on the score. In the example below the index score is 2.4, which is on the happier side, so a happy face is shown. The screen below also shows the emails from which the score for Jim Glynn was calculated by the ML web service.



Now let us look at a slightly unhappy customer, Patrick Sands. You can look at his emails on the right to see why he is unhappy. The scores in the Sentiment column have been calculated purely by the ML web service.



Script to show emoticons

I think these graphics look cool and add a face value to your customers. If you are curious to know more about how I displayed them, below is the script. Basically there is an HTML page added as a web resource, and I replace the image based on the score




<html>
<head>
<meta charset="utf-8">
<script type="text/javascript">
// Runs when the web resource loads and swaps the emoticon to match the Happiness Index.
function onPageLoad() {
    // Assumption: the field is read through the parent form's Xrm.Page API;
    // the start of this call was lost in the original listing.
    var happIndexString = window.parent.Xrm.Page.getAttribute("manny_happinessindex").getValue();
    if (happIndexString != null) {
        var happIndex = parseFloat(happIndexString);
        var img = document.getElementById("imgIndex");
        if (happIndex <= 1) {
            img.src = "manny_1.png";
        } else if (happIndex <= 2) {
            img.src = "manny_2.png";
        } else if (happIndex <= 3) {
            img.src = "manny_3.png";
        } else if (happIndex <= 4) {
            img.src = "manny_4.png";
        }
    }
}
</script>
</head>
<body style="margin: 0px; word-wrap: break-word;" onload="onPageLoad()">
 <img id="imgIndex" name="imgIndex" src="manny_2.png" border="0">
</body>
</html>




And below is the entire solution that shows plumbing on the CRM side of the fence




So there you have it, a simple solution that works end to end by consuming the ML web service from within Dynamics CRM and gives you some insight into the satisfaction of your customers. The algorithm, which uses just the body of the email, is indeed a simple one, but it can be enhanced to take into account the various other parameters and behaviour points that we discussed in Part 2. This is just the tip of the iceberg; there are a lot of possibilities with Machine Learning.


Hope you enjoyed the series. Let me know if you have any comments or feedback.

Posted in Dynamics 365, Machine Learning

Dynamics CRM meets Machine Learning–Part 3

So far in this series we have covered the business problem and machine learning setup that drives the score based on the email content. We have also discussed how this approach can be extended to calculate an aggregated happiness index based on various behaviour points from within Dynamics CRM. In this instalment, let us focus on deploying the ML (machine learning) module as web service, so that it can be consumed from within Dynamics CRM.


Deploying the ML module as a web service

Once you have tested your module and you are happy with results of the trained model, you can deploy it as a consumable web service by clicking on the Deploy Web Service button at the bottom of the ML Studio’s Experiment screen as shown below



After deployment, the web service is available in the Web Services section of the Azure ML Studio. You can click on the record and view its properties. In order to connect to this service you will need an API key which can be found in the properties as shown below



The setup of this web service for our scenario is as below

Score: 0 for an unhappy email, 4 for a happy email



Consuming the ML web service from within Dynamics CRM


Once the service has been deployed, it can be consumed from both JavaScript and server side code (e.g. a plugin or a custom workflow activity). To keep it simple, we will consume it from JavaScript

The following script on the email form can be used to call the service and get the score. Here we are sending the body in the tweet_text parameter and retrieving the results in JSON format.


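sendRequest: function (text) {
    // Parts of this listing were lost; lines marked "assumption" are reconstructions.
    // Assumption: a plain XMLHttpRequest object.
    var service = new XMLHttpRequest();
    // Request-response endpoint of the deployed web service (left blank here).
    var url = "";
    // The email body goes in the tweet_text column. Assumption: the "input1" name
    // and the column list must match the input schema of your experiment.
    var jsonObject = {
        "Inputs": {
            "input1": {
                "ColumnNames": ["tweet_text"],
                "Values": [[text]]
            }
        },
        "GlobalParameters": {}
    };
    var dataString = JSON.stringify(jsonObject);
    if (service != null) {
        // Assumption: a synchronous call, so responseText can be read right after send().
        service.open("POST", url, false);
        // Replace the placeholder with the API key from the web service properties.
        service.setRequestHeader("Authorization", "Bearer xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx");
        service.setRequestHeader("Content-Type", "application/json; charset=utf-8");
        // Content-Length is set by the browser itself, so it is not set manually.
        service.send(dataString);
        try {
            // JSON.parse replaces eval() for reading the response.
            var requestResults = JSON.parse(service.responseText);
            // Assumption: the output name and column positions depend on the output
            // schema of your experiment; the original indices were lost.
            var row = requestResults.Results.output1.value.Values[0];
            var resultSentiment = row[0];
            var resultProb = row[1];
            //alert(resultSentiment + " " + resultProb);
            return resultSentiment;
        } catch (e) {
            console.log('Unable to interpret the response');
        }
    }
}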







In the next part we will see how this score gets used within Dynamics CRM