Continuing our journey from the previous post, where we defined the problem of churn prediction, in this instalment let us create the model in Azure Machine Learning. We are trying to predict the likelihood of a customer churning based on certain features of the profile, which are stored in the Telecom Customer entity. We will use a technique called Supervised Learning, where we first train the model on our data and let it learn the trends before it can start giving us insights.
Obviously you need access to Azure Machine Learning. Once you log into it, you can create a new Experiment. That gives you a workspace designer and a toolbox (somewhat like SSIS/BizTalk) from which you can drag modules onto the canvas and feed them into each other. It is a flexible model, and for most tasks you do not need to write code.
Below is a screenshot of my experiment, with the toolbox on the left.
Now, machine learning is slightly atypical for the usual CRM audience. I will not be able to fit the full details of each of these tools into this blog, but I will touch on each of these steps so that you can understand at a high level what is going on inside these boxes. Let us address them one by one.
Dynamics CRM 2016 Telecom
This module is the input data module, where we read the CRM customer information in the form of a dataset. At the time of writing, there is no direct connection available from Azure Machine Learning to CRM Online. But where there is a will, there is a way: I discovered that you can connect to CRM in either of the following ways.
1. Schedule a daily export of CRM data to a location that Azure Machine Learning can read e.g. Azure Blob storage or a web URL over HTTP
2. Write a small Python-based module that connects to Dynamics using Azure Active Directory; the module can then pass the data to Azure ML as a DataFrame
From my experience, an automatic sync from Dynamics to Azure ML is not that important, but it is important the other way round i.e. from Azure ML to Dynamics.
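To give a feel for the first option, here is a minimal sketch of turning an exported file into typed records. The column names and the three sample rows are made up for illustration; a real export from the Telecom Customer entity will have its own columns.

```python
import csv
import io

# Made-up stand-in for a daily CRM export; the column names are hypothetical.
EXPORT = """customer_id,tenure_months,monthly_charge,churned
C001,3,70.5,yes
C002,48,35.0,no
C003,12,89.9,yes
"""

def load_customers(text):
    """Parse exported CSV text into a list of typed dictionaries."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            "customer_id": record["customer_id"],
            "tenure_months": int(record["tenure_months"]),
            "monthly_charge": float(record["monthly_charge"]),
            # Convert the yes/no text into a boolean label for the classifier.
            "churned": record["churned"].strip().lower() == "yes",
        })
    return rows

customers = load_customers(EXPORT)
```

In the experiment itself the input module does this parsing for you once it is pointed at the export location; the sketch only shows the shape of the data that ends up in the dataset.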
Split Data
This module splits your data into two sets:
1. Training dataset – The data based on which the machine learning model will learn
2. Testing dataset – The data based on which the accuracy of the model will be determined
I have chosen a stratified split, which ensures that the Training and Testing datasets both keep the same proportion of the classes being predicted as the full dataset. The split ratio is 80/20 i.e. 80% of the records will be used for training and 20% for testing.
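If you are curious what a stratified 80/20 split does under the hood, the idea can be sketched in a few lines of plain Python. This is only an illustration of the technique, not how the Azure ML module is implemented, and the 100 customers below are made-up data.

```python
import random
from collections import defaultdict

def stratified_split(rows, label_of, train_fraction=0.8, seed=42):
    """Split rows so each class keeps roughly the same share in both sets."""
    # Group the rows by the class we want to predict.
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)

    rng = random.Random(seed)
    train, test = [], []
    # Shuffle and cut each class separately, then pool the pieces.
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test

# Made-up data: 100 customers, 20 of whom churned.
customers = [{"id": i, "churned": i < 20} for i in range(100)]
train, test = stratified_split(customers, lambda c: c["churned"])
```

Because each class is cut separately, the 20% churn rate of the full dataset is preserved in both the training and the testing set.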
Two-class Decision Forest
This is the main classifier i.e. the module that does the bulk of the work. The classifier of choice here is a random forest with bootstrap aggregation. Two-class makes sense for us because our prediction has two outcomes i.e. whether the customer will churn or not.
Random forests are fast classifiers and relatively resistant to overfitting. Rather than relying on a single decision tree, they train many trees, each on a different random sample of your data, so that the data is learnt from different angles (together the trees are called an ensemble). In the end the votes of the individual trees are combined to come up with an overall prediction score. You can read more about this classifier here.
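To make bootstrap aggregation a little more concrete, here is a deliberately simplified sketch. Instead of full decision trees it uses one-rule "stumps" on a single made-up feature (tenure in months), so it is not a real random forest and certainly not the Azure ML implementation; the function names and data are my own assumptions for illustration.

```python
import random

def fit_stump(sample):
    """Pick the tenure threshold that misclassifies the fewest rows.

    The stump predicts churn when tenure is below the threshold.
    """
    best_threshold, best_errors = None, None
    for threshold in sorted({tenure for tenure, _ in sample}):
        errors = sum((tenure < threshold) != churned for tenure, churned in sample)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def fit_forest(rows, n_trees=25, seed=0):
    """Bootstrap aggregation: fit each stump on a resample drawn with replacement."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]

def churn_score(forest, tenure):
    """Fraction of stumps voting 'churn'; this acts as the ensemble's score."""
    votes = sum(tenure < threshold for threshold in forest)
    return votes / len(forest)

# Made-up rows of (tenure in months, churned?): short-tenure customers churn.
rows = [(t, t < 12) for t in range(1, 37)]
forest = fit_forest(rows)
```

Calling `churn_score(forest, 2)` gives a score close to 1 and `churn_score(forest, 30)` a score close to 0, because nearly every stump agrees on those customers. Each stump has seen a slightly different resample, and it is the combined vote that becomes the prediction score.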
Train Model
This module connects the classifier to the data. As you can see in the screenshot of the experiment I posted above, there are two arrows coming out of Split Data; the one on the left is the 80% one i.e. the training dataset. The output of this module is a trained model that is ready to make predictions.
Score Model
This step uses the trained model from the previous step and tests the accuracy of the model against our test data. Put simply, here we start feeding the model data that it has not seen before and count how many times the model gave the correct prediction vs the wrong prediction.
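The hit-versus-miss counting can be sketched as follows. I am assuming here that the model outputs a score between 0 and 1 for each customer and that 0.5 is used as the cut-off; the eight customers and their scores are made up.

```python
def count_hits(actual, scores, threshold=0.5):
    """Label each customer as churn when the score reaches the threshold,
    then count how many labels match what actually happened."""
    hits = misses = 0
    for truth, score in zip(actual, scores):
        predicted = score >= threshold
        if predicted == truth:
            hits += 1
        else:
            misses += 1
    return hits, misses

# Made-up test data: whether each customer churned, and the model's score.
actual = [True, True, True, False, False, False, False, False]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]

hits, misses = count_hits(actual, scores)
print(f"{hits} correct, {misses} wrong")
```

On this toy data the model is right for six customers and wrong for two: one churner scored below the cut-off and one loyal customer scored above it.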
Evaluate Model
The scores (hit vs miss) generated by the previous module are evaluated in this step. In Data Science there are standard metrics to measure this kind of performance e.g. confusion matrices, ROC curves and many more. Below is the screenshot of the Confusion Matrix.
I know there are a lot of confusing details here (hence the name Confusion Matrix), but as a rule of thumb we need to focus on AUC i.e. the area under the ROC curve. As shown in the results above, our AUC is a decent 0.729. In layman's terms, this is the chance that the model gives a randomly picked churner a higher score than a randomly picked non-churner: 0.5 is no better than a coin toss and 1.0 is perfect. Note that this is not the same thing as the percentage of correct predictions, which is reported separately as accuracy. A higher number does not necessarily equate to a better model either: a suspiciously high score (e.g. above 0.9) can sometimes be a sign of overfitting i.e. a state where your model does very well on the sample data but not so well on real-world data. So our model is good to go.
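For the curious, here is a small sketch of how the confusion matrix counts, accuracy and AUC can be computed by hand. The labels and scores are made up and the 0.5 cut-off is an assumption; the evaluation module does all of this for you.

```python
def confusion_counts(actual, scores, threshold=0.5):
    """Return the four cells of the confusion matrix, treating True as churn."""
    cells = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, score in zip(actual, scores):
        predicted = score >= threshold
        if truth:
            cells["tp" if predicted else "fn"] += 1
        else:
            cells["fp" if predicted else "tn"] += 1
    return cells

def auc(actual, scores):
    """Area under the ROC curve via pairwise ranking.

    For every (churner, non-churner) pair, count 1 if the churner has the
    higher score and 0.5 for a tie, then average over all pairs.
    """
    positives = [s for truth, s in zip(actual, scores) if truth]
    negatives = [s for truth, s in zip(actual, scores) if not truth]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up test data: whether each customer churned, and the model's score.
actual = [True, True, True, False, False, False, False, False]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]

cells = confusion_counts(actual, scores)
accuracy = (cells["tp"] + cells["tn"]) / len(actual)
area = auc(actual, scores)
```

On this toy data the accuracy is 0.75 while the AUC is about 0.93, which shows that the two numbers measure different things: accuracy depends on the chosen cut-off, whereas AUC only looks at how well the scores rank churners above non-churners.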
You can read more about the metrics and terms above here.
In the next blog we will deploy and integrate the model with Dynamics CRM.