How to write a thinking engine for Dynamics CRM

Time to uncover the core functionality that will turn Dynamics CRM into a machine that can learn: its brain! If you haven't been through the previous posts – Part 1 and Part 2 – I recommend you do, to make better sense of the following content. Hoping you understand what we are doing, let's keep cruising.

Analyse Dynamics CRM data

Let us first see a sample from our feature set, i.e. a Case record in Dynamics CRM:

[Image: a sample Case record in Dynamics CRM]

The engine will train on such data, looking for measurable attributes like the ETA given by the Support Rep (estimated), how much time the Support Rep actually took (actual), and the nature of work (whether the work corresponded to a bank or a government agency). Other variables can be introduced depending on what is important for your organisation.

 

Then the engine will start learning: not just by understanding meaningful words but also by correlating them. In Data Science it is very important to focus only on those attributes that provide the most information gain, so we will extract only the most meaningful attributes that pertain to the problem at hand, i.e. predicting which department the query belongs to – Tax, Investment or Medical. In our scenario the most meaningful phrases are noun phrases, i.e. proper nouns, combinations of common nouns, industry jargon, etc. For example, in "I need advice on capital gains tax for my New York property", the phrases "capital gains tax" and "New York" carry the signal, while "I need advice on" does not. So our engine should be able to separate this critical information from a big blurb of text while staying away from the common words which occur in every email / query.

Note: In any other kind of action-based application, verb phrases may be more important than nouns, so you need to adjust your extraction module accordingly – horses for courses.
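For example, a hypothetical tweak to the grammar configuration you will see further below (the "VPI" label and the patterns are invented for illustration) could promote verb phrases instead:

# Hypothetical additions to the bigram grammar rules defined later in Tokeniser.py
biConfig["VB+NN"] = "VPI"   # action phrases like "approve claim"
biConfig["VBD+NN"] = "VPI"  # past-tense actions like "approved claim"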

 

Designing the engine (brain)

We will use three key Data Science concepts to build this engine:

Natural Language Processing

This process will involve tokenisation and meaningful keyword extraction

Term Frequency – Inverse Document Frequency

We will use this measure to compute distances between feature vectors for our classification problem (see the short example after this list)

Support Vector Machine

This will be the classification algorithm for the task at hand, i.e. determining the department
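To make the TF-IDF measure concrete, below is a minimal sketch using scikit-learn (my library choice, purely for illustration; any TF-IDF implementation will do). A word that occurs in every ticket, like "query", ends up with a relatively low weight, while department-specific jargon scores high:

from sklearn.feature_extraction.text import TfidfVectorizer

# Three toy ticket descriptions; "query" and "about" appear in all of them
# so their TF-IDF weights are low, while department-specific words score high
tickets = [
    "query about capital gains tax",
    "query about my investment portfolio",
    "query about my medical claim",
]
vectoriser = TfidfVectorizer()
weights = vectoriser.fit_transform(tickets)
print(dict(zip(vectoriser.get_feature_names_out(), weights.toarray()[0])))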

Below are the various phases involved in training the brain

[Image: phases involved in training the engine]

 

Writing the engine (brain)

I have written the grammar engine below, which uses regular expressions for synthesis. I found that tokenisation is much faster and more accurate when you use regular expressions, as they give the NLTK engine a jump start.

Then it uses a bi-gram approach for grammatical tagging of the text. This approach is more accurate than a unigram approach because it takes into account the context of the word in the sentence before tagging it (rather than just the word itself).

After the synthesis, tokenisation and tagging, we then move to defining the keywords and extracting them. I am sharing the source code below; you can tune it to suit your requirements. It uses Python's NLTK library and the Brown corpus for bigram tagging.

Tokeniser.py
# >>>>>>>> Manny Grewal - 2016  <<<<<<<<<<<<
# Fast and simple POS tagging module with emphasis on key phrases
# Based on Brown Corpus - News
# Below are the regular expressions that give a jump start to the tagger
#>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
import nltk
from nltk.corpus import brown

taggingDatabase = brown.tagged_sents(categories='news')
tokenGrammar = nltk.RegexpTagger(
    [(r'(\W)', 'CD'), #special chars
     (r'(\d+)', 'CD'), #digits only
     (r'\'*$', 'MD'), # apostrophe-only tokens
     (r'(The|the|A|a|An|an)$', 'AT'), # match articles
     (r'^-?[0-9]+(\.[0-9]+)?$', 'CD'), # match amounts and decimals
     (r'.*able$', 'JJ'), # adjectives ending in -able
     (r'(?<![!.?]\s)\b[A-Z]\w+', 'NNP'), # noun phrases (capitalised words)
     (r'.+ness$', 'NN'),
     (r'.*ly$', 'RB'),
     (r'.*s$', 'NNS'),
     (r'.*ing$', 'VBG'),
     (r'.*ed$', 'VBD'),    
     (r'.*', 'NN')
])
uniGramTagger = nltk.UnigramTagger(taggingDatabase, backoff=tokenGrammar)
biGramTagger = nltk.BigramTagger(taggingDatabase, backoff=uniGramTagger)

# Grammar rules
# This grammar decides which 3-word and 2-word phrases, and which single tokens, should be chosen
#>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
triConfig = {}
triConfig["NNP+NNP+NNP"] = "NNP" # New York City
triConfig["NNP+IN+NNP"] = "NNP" # Ring of Fire
#triConfig["NN+NN+NN"] = "NN" # capital gains tax

biConfig = {}
biConfig["NNP+NNP"] = "NNP"
biConfig["NN+NN"] = "NNI"
biConfig["NNP+NN"] = "NNP"
biConfig["NN+NNP"] = "NNP"
biConfig["AT+NNP"] = "NNP"
biConfig["JJ+NN"] = "NNI"
biConfig["VBG+NN"] = "NNI"
biConfig["RBT+NN"] = "NNI"

uniConfig ={}
uniConfig["NNP"] = "NNP"
uniConfig["NN"] = "NN"
#>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
       
# Split the sentence into single words/tokens
def tokeniseSentence(textData):
    tokens = nltk.word_tokenize(textData)
    return tokens

# generalise special POS tags
def replaceTagWithGeneral(tagValue):
    if(tagValue=="JJ-TL" or tagValue=="NN-TL" or tagValue=="NNPS"):
        return "NNP"
    elif(tagValue=="NNS"):
        return "NN"
    else:
        return tagValue


# Extract the main topics from the sentence
def ExtractKeyTokens(textData):
    tokens = tokeniseSentence(textData)
    generatedTags = biGramTagger.tag(tokens)

    
    # replace special tags with general tag
    for cnt, (w,t) in enumerate(generatedTags):
        replacedVal = replaceTagWithGeneral(t)
        generatedTags[cnt]=(w,replacedVal)
   
    matchedTokens=[]

    #process trigrams
    remainingTags=len(generatedTags)
    currentTag=0
    while remainingTags >= 3:
        firstTag = generatedTags[currentTag]
        secondTag = generatedTags[currentTag + 1]
        thirdTag = generatedTags[currentTag + 2]
        configKey = "%s+%s+%s" % (firstTag[1], secondTag[1], thirdTag[1])
        value = triConfig.get(configKey)
        if value:
            for _ in range(3):
                generatedTags.pop(currentTag)
                remainingTags-=1
            matchedTokens.append("%s %s %s" % (firstTag[0], secondTag[0], thirdTag[0]))
            continue # the next token has shifted into currentTag, so do not advance
        currentTag+=1
        remainingTags-=1

    #process bigrams
    remainingTags=len(generatedTags)
    currentTag=0
    while remainingTags >= 2:
        firstTag = generatedTags[currentTag]
        secondTag = generatedTags[currentTag + 1]            
        configKey = "%s+%s" % (firstTag[1], secondTag[1])
        value = biConfig.get(configKey)
        if value:
            for _ in range(2):
                generatedTags.pop(currentTag)
                remainingTags-=1
            matchedTokens.append("%s %s" % (firstTag[0], secondTag[0]))
            continue # the next token has shifted into currentTag, so do not advance
        currentTag+=1
        remainingTags-=1

    #process unigrams
    remainingTags=len(generatedTags)
    currentTag=0
    while remainingTags >= 1:
        firstTag = generatedTags[currentTag] 
        value = uniConfig.get(firstTag[1])
        if value:
            generatedTags.pop(currentTag)
            remainingTags-=1
            matchedTokens.append(firstTag[0])
            continue # the next token has shifted into currentTag, so do not advance
        currentTag+=1
        remainingTags-=1
    
    return set(matchedTokens)
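Here is a quick usage example of the extractor above (the input and output are hypothetical; it assumes the NLTK data packages the module depends on have been downloaded):

import nltk

# One-off download of the corpus and tokeniser models the module depends on
nltk.download('brown')
nltk.download('punkt')

from Tokeniser import ExtractKeyTokens

phrases = ExtractKeyTokens("My query is about the tax payable on my New York property")
print(phrases)  # e.g. a set like {'New York', 'tax', 'query', 'property'}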

 

 

In a bid to keep this post relevant to the Dynamics CRM audience and not to flood it with too much mathematical complexity, I will describe in a nutshell the steps I performed to develop the ML engine (a sketch of the key steps follows the list):

1. Wrote a program that gets the key phrases out of the text

2. Fed the phrases to a Linear SVM classifier using the TF-IDF distance as a similarity measure

3. Trained the engine on a corpus of around 300 tickets, 100 from each category

4. Tested it using a rolling-window approach, holding out 10% of the tickets at a time

5. Adjusted the coefficients of the kernel to give the best results while avoiding over-fitting of the model

6. Once trained, I pickled my engine and deployed it as a web service (WS) that accepts a ticket description as input and predicts the department
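For the curious, here is a minimal sketch of steps 2, 5 and 6, assuming scikit-learn as the library (the real engine may be implemented differently) and reusing ExtractKeyTokens from Tokeniser.py; the three training tickets shown are hypothetical placeholders for the actual 300-ticket corpus.

import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

from Tokeniser import ExtractKeyTokens

# Hypothetical placeholders for the ~300 historical tickets and their departments
trainTexts = [
    "Query about capital gains tax on my shares",
    "Need advice on rebalancing my investment portfolio",
    "Claim for medical expenses under my health cover",
]
trainLabels = ["Tax", "Investment", "Medical"]

# Step 2: TF-IDF over the extracted key phrases, fed to a Linear SVM classifier
engine = Pipeline([
    ("tfidf", TfidfVectorizer(analyzer=ExtractKeyTokens)),
    ("svm", LinearSVC(C=1.0)),  # Step 5: tune C for the best fit without over-fitting
])
engine.fit(trainTexts, trainLabels)

# Step 6: pickle the trained engine so the web service can load it at startup
with open("ticket_deptt_model.pkl", "wb") as f:
    pickle.dump(engine, f)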

 

Integration of Dynamics CRM with the prediction service

Let us look at how CRM will connect to the Machine Learning web service.
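Before the CRM side, here is a hedged sketch of what the prediction WS itself might look like: a tiny Flask app (my assumption; any HTTP framework will do) that unpickles the trained engine and honours the JSON contract the plugin uses below, i.e. a descp field in and the department name out.

import pickle

from flask import Flask, request

# Tokeniser.py must be importable here because the pickled pipeline
# references its ExtractKeyTokens function
import Tokeniser

app = Flask(__name__)

# Load the pickled engine once at startup
with open("ticket_deptt_model.pkl", "rb") as f:
    engine = pickle.load(f)

@app.route("/PredictTicketDeptt/", methods=["POST"])
def predict_ticket_deptt():
    descp = request.get_json(force=True)["descp"]
    return str(engine.predict([descp])[0])  # "Tax", "Investment" or "Medical"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=80)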

A plugin will fire on creation of a Case; it will pass the ticket description to the service and receive the predicted department in the predictedDepttResult variable, as shown below.

Below is the source code of the plugin; it uses JSON to connect to the WS.

CasePredictTeam.cs

using System;
using System.IO;
using System.Net;
using System.ServiceModel;
using Microsoft.Xrm.Sdk;

namespace Manny.Xrm.BusinessLogic
{
    public class CasePredictTeam : IPlugin
    {
         public void Execute(IServiceProvider serviceProvider)
        {
            //Extract the tracing service for use in debugging sandboxed plug-ins.
            ITracingService tracingService =  (ITracingService)serviceProvider.GetService(typeof(ITracingService));
           
            // Obtain the execution context from the service provider.
            IPluginExecutionContext context = (IPluginExecutionContext)  serviceProvider.GetService(typeof(IPluginExecutionContext));

            // Obtain the organization service so the plugin can update the Case record.
            IOrganizationServiceFactory serviceFactory = (IOrganizationServiceFactory)serviceProvider.GetService(typeof(IOrganizationServiceFactory));
            IOrganizationService crmService = serviceFactory.CreateOrganizationService(context.UserId);
          
            if (context.InputParameters.Contains("Target") && context.InputParameters["Target"] is Entity)
            {                
                Entity entity = (Entity)context.InputParameters["Target"];             
                if (entity.LogicalName != "incident")
                    return;              
                try
                {
                    if (entity.Attributes.Contains("description"))
                    {
                        var url = "http://<put your host name and WS here>/PredictTicketDeptt/";                     
                       
                        string predictedDepttResult = "(default)";
                        var httpWebRequest = (HttpWebRequest)WebRequest.Create(url);
                        httpWebRequest.ContentType = "application/json";
                        httpWebRequest.Method = "POST";                       
                        using (var streamWriter = new StreamWriter(httpWebRequest.GetRequestStream()))
                        {
                            string rawDesc = (string) entity.Attributes["description"];
                            rawDesc = EncodeJson(rawDesc);
                            string json = "{\"descp\":\"" + rawDesc + "\"}";
                            streamWriter.Write(json);
                            streamWriter.Flush();
                            streamWriter.Close();
                            tracingService.Trace("2");
                            var httpResponse = (HttpWebResponse)httpWebRequest.GetResponse();
                            using (var streamReader = new StreamReader(httpResponse.GetResponseStream()))
                            {
                                tracingService.Trace("3");
                                predictedDepttResult = streamReader.ReadToEnd();
                            }
                            tracingService.Trace("4");
                        }

                        Entity caseToBeUpdated = new Entity("incident");
                        tracingService.Trace(predictedDepttResult);
                        caseToBeUpdated.Attributes.Add("incidentid", entity.Id);

                        var optionSetValue = GetOptionSetValue(predictedDepttResult);
                        tracingService.Trace(optionSetValue.ToString());
                        caseToBeUpdated.Attributes.Add("manny_department", new OptionSetValue(optionSetValue));                       
                        crmService.Update(caseToBeUpdated);
                     
                    }
                }
                catch (FaultException<OrganizationServiceFault> ex)
                {
                    throw new InvalidPluginExecutionException("An error occurred in the CasePredictTeam plug-in.", ex);
                }

                catch (Exception ex)
                {
                    tracingService.Trace("v: {0}", ex.ToString());
                    throw;
                }
            }            
        }
        public int GetOptionSetValue(string deptt)
        {
            if (deptt == "Tax")
                return 159690000;
            else if (deptt == "Investment")
                return 159690001;
            else
                return 159690002;

        }
        public string EncodeJson(string rawDesc)
        {
            // Escape backslashes first, otherwise the escaped quotes below would be double-escaped
            return rawDesc.Replace("\\", "\\\\").Replace("\"", "\\\"")
                .Replace("\r\n", " ").Replace("\n", " ").Replace("\r", " ")
                .Replace("\t", " ").Replace("\b", " ");
        }
    }
}

 

Once integrated, the department will be predicted as soon as a ticket is created.

You can build the Support Rep prediction WS using a similar approach. Rather than predicting based on the description text, it will use the ETA, the Actual Time Spent and the nature of work as its three parameters to choose the best Rep, i.e. it will predict the Rep who will take the minimum amount of time to resolve the case given the nature of work involved. It is also a classification problem; rather than classifying into 3 classes (Tax, Investment and Medical), you will be classifying into N classes, where N is the number of Support Reps in the team.
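To illustrate, here is a minimal hypothetical sketch of that N-class setup; the Rep names and the nature-of-work codes are invented for the example.

from sklearn.svm import LinearSVC

# Hypothetical numeric codes for the nature of work
WORK_CODES = {"Bank": 0, "Government": 1}

# Each row: [estimated hours (ETA), actual hours taken, nature-of-work code]
X = [
    [4.0, 3.5, WORK_CODES["Bank"]],
    [8.0, 10.0, WORK_CODES["Government"]],
    [2.0, 2.5, WORK_CODES["Bank"]],
]
y = ["Alice", "Bob", "Carol"]  # the Rep who resolved each case; N classes = N Reps

repModel = LinearSVC().fit(X, y)
print(repModel.predict([[3.0, 3.0, WORK_CODES["Bank"]]]))  # best Rep for a new case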

 

I hope you got a basic idea of the possibilities and potential of machine learning. In the world of Data Science, the sky is the limit and a lot of wonders are waiting to be explored.

 

Happy treasure-hunting!
