In the previous blog, we used Machine Learning inside Dynamics CRM to add value to our customer records by getting a quick health check of how customers are doing, based on some measurable data points. We used supervised learning, a technique that involves training your machine first and then deriving your predictions from the trained model. In this blog, we will use another technique: unsupervised learning. This technique is often used to determine similarity between records, categorise them into clusters, and handle other scenarios that involve correlating records. We will use unsupervised learning to address a shortcoming that has existed in Dynamics CRM for a decade: matching records (and detecting duplicates) based on a semantic match.
This is a very common requirement in Dynamics CRM when you need to cleanse your data and get rid of duplicates with similar-sounding names. CRM does have a duplicate detection wizard, but it doesn’t address this problem because it cannot do a fuzzy or semantic match. I have seen many situations where hundreds (even thousands) of records are distributed among team members so that they can identify and fix duplicates manually. Sound familiar?
Let us put some intelligence into Dynamics CRM to save us from this painful manual work.
We will solve the following problems when it comes to matching records:
- Juxtaposed word sequences e.g. it can match Manchester University to University of Manchester and Socceroos Australia to Australian Association of Socceroos
- Takes care of small punctuation and abbreviation differences e.g. match Manchester University to Manchester Uni or Manchester’s Univ or Man. Utd. University
- Covers spelling mistakes and similar-sounding names e.g. match Scot’s And Christina to Scott & Kristina Corp
- Phonetic match and verb forms e.g. match Richtie Rich to Rishi Richest
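To make the first two points more concrete, here is a minimal sketch of how names could be normalised before they are compared. It is not the code from my package: the `normalise` function, the `STOP_WORDS` set and the `ABBREVIATIONS` map are hypothetical names chosen purely for illustration, and a real list of abbreviations would need to be much longer.

```python
import re

# Hypothetical stop words and abbreviation map, chosen only for illustration.
STOP_WORDS = {"of", "the", "and"}
ABBREVIATIONS = {"uni": "university", "univ": "university"}


def normalise(name: str) -> str:
    """Lower-case, strip punctuation, expand abbreviations and sort the tokens."""
    # Drop possessive endings ("manchester's" -> "manchester") first.
    text = re.sub(r"['’]s\b", "", name.lower())
    # Replace anything that is not an ASCII letter, digit or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Expand known abbreviations, then drop the stop words.
    tokens = [ABBREVIATIONS.get(t, t) for t in text.split()]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Sorting makes the result independent of the original word order.
    return " ".join(sorted(tokens))
```

With this sketch, Manchester University, University of Manchester and Manchester’s Univ all normalise to the same string, `manchester university`, so any string comparison that follows sees them as identical. Spelling mistakes and phonetic variants are not handled at this step; they need a fuzzy comparison on top.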
So this is how the solution works inside Dynamics CRM.
A web resource called Similar Accounts is added to the Account form. It lists other accounts with similar names along with their matching score e.g. 100 for a perfect match and 60 for a partial match. The threshold can be adjusted to pick only closer results. Below are some of the screenshots from my Dynamics CRM where I have applied this algorithm. I have kept it simple, as the focus is to demonstrate the matching engine rather than the look and feel.
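The scoring and threshold idea can be sketched in a few lines of Python. This is only an illustration under some assumptions: the scorer here is the standard library's `difflib.SequenceMatcher`, used as a stand-in rather than the matching engine behind the web resource, and the `score` and `similar_accounts` names are hypothetical.

```python
from difflib import SequenceMatcher


def score(a: str, b: str) -> int:
    """Similarity on a 0-100 scale (stand-in scorer based on difflib)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def similar_accounts(name, candidates, threshold=60):
    """Return (candidate, score) pairs at or above the threshold, best first."""
    scored = [(c, score(name, c)) for c in candidates]
    matches = [pair for pair in scored if pair[1] >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)
```

Calling `similar_accounts("Manchester University", names, threshold=60)` would return the partial matches as well, while raising the threshold to 100 keeps only names that are identical apart from letter case.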
Powered by Machine Learning Algorithm
This solution is built in Python and uses an algorithm called Levenshtein distance to determine the similarity between two records. Levenshtein distance is an edit-distance measure: it counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. I have built a package around this core Python library and integrated it with Dynamics CRM. The package is hosted as a Flask web service that communicates with Dynamics CRM using JSON. More details of the Python package are here
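For readers who have not come across Levenshtein distance before, the following is a minimal pure-Python sketch of the textbook dynamic-programming version. It is not the code of my package or of the library it wraps. The `similarity` function, which maps the distance onto a 0-100 score, is one possible scaling that I am assuming here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> int:
    """Map the distance onto a 0-100 score; this scaling is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100
    return round(100 * (1 - levenshtein(a, b) / longest))
```

For example, `levenshtein("kitten", "sitting")` is 3 (two substitutions and one insertion). On its own, a character-level distance like this is sensitive to word order, so it works best after the names have been cleaned up and their words put into a consistent order.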