Power BI for Data Scientists

With my involvement in some data science work recently, I have had the privilege to explore a lot tools of the trade – Rapid Miner, Python, Tensorflow and Azure Machine Learning to name a few. My experience has been highly enriching but I felt there was no Swiss knife that can handle the initial – and the most critical stage of a Data science project: i.e. Hypothesis stage.

During this stage, scientists typically need to quickly prep the data, find the correlation patterns and establish hypotheses. It requires them to fail fast by identifying null hypotheses and spurious correlations and stay focussed on the right path. I recently explored Power BI and would like to share my findings through this blog.

Business Problem

Let us take a business case of a juice vendor say Julie. Julie sells various kinds of juices and she collects some data about her business operations on daily basis. Say we have the following data for the month of July which looks like below. It is pretty much – when, where, what and for how much?

clip_image001

Now say I am a data scientist who is trying to help Julie to increase her sales and give her some insights that what should she focus on to get the best bang of her buck. I have been tasked to build an estimation model for Julie based on simple linear regression.

Feature Engineering

I will start by analysing various correlations between the features and our target variable i.e. Revenue. It can be commenced by importing the data into Power BI and looking after the following basics

1) Eliminate the null values with mean value of the feature

2) Dedupe any rows

3) Engineer some new features as below

Feature

DAX formula

Day Type

Purpose of this feature is to distinguish between a week day and a weekend day. I wanted to test a hypothesis that weekend day might generate more sales than a week day.

Day Type = IF(WEEKDAY(Lemonade2016[Date],3) >= 5,”Weekend”,”Weekday”)

Total Items Sold

Lemon + Orange

Revenue

Total Items Sold * Price

Data preparation and feature engineering was a breeze in Power BI, thanks its extensive support of DAX, calculated columns and measures. The dataset looks like below now.

clip_image001[4]

Hypotheses Development

Once we had our dataset ready in Power BI, the next task was to analyse the patterns between Revenue and other features

Hypothesis 1 – There is a positive correlation between Temperature and Revenue

Result: Passed

Hypothesis 2 – There are more sales on a weekend day

Result: Failed

I derived these results using the below visualizations built briskly using Power BI platform

clip_image003

Next off to some advanced hypothesis development. Shall we?

I needed to understand the relationship between the leaflets given on a particular day and their relationship with Revenue. Time to pull some heavy plumbing in, so I decided to tow R into in the mix. Power BI comes with inbuilt (almost!) support with R and I was able to quickly spawn a coplot using just 6-8 lines of R in the R Script Editor of Power BI

clip_image004

Interesting insight was how correlation differs based on the day. This was made possible using the Power BI slicer as shown below

clip_image006

 

clip_image008

Wednesday – Less correlation between leaflets and sales

 

Sunday – High correlation between leaflets and sales

Power BI + R = Advanced Insights

If you need to analyse the dynamics between various features and how this dynamics impacts your target variable i.e. Revenue. You can easily model that in Power BI. Below is a dynamic co plot that shows the incremental causal relationship between Leaflets, Revenue and Temperature.

The 6 quadrants at the bottom should be read in conjunction with 6 steps in the top box. The bottom left is the first step and the top right the last step of leaflets. Basically it shows how the correlation between Temperature and Revenue is affected by leaflets bin size

clip_image009

I ended my experiment by building a simple regression model that can give you prediction of your Revenue if you enter Temperature, Price and Leaflets. Below is the code for model in case you are keen

clip_image010

Power BI is a very simple and powerful tool for the exploratory data scientist in you. Give it a go.