Contribution Analysis

A contribution analysis estimates how different options, programs, or policies affect a final outcome. Its most common use is to identify how each program or option in a decision contributes to that outcome: which ones improve it, which ones hinder it, and by how much. This guide walks through an example application and then breaks down how a contribution analysis works.

Example: An online dress retailer sells a wide range of dresses featuring different combinations of styles and dress features. They want to know which combination of features sells best. To find out, they need to look at how well dresses with different combinations of features have sold in the past, then work out which features (or combinations of features) led to the greatest sales.

Our goals for this task are:

  • Predict the outcome if a feature changes.
  • Estimate the total sales if no dress has the feature.
  • Estimate the total sales if every dress has the feature.
  • Take the difference between the two totals to see how much the feature contributes to sales.

The training data table for this example will look something like this:

NIA1.png

Here we have rows of different dresses, their respective sales, and the combination of dress features each dress has. With this data we can build a numeric prediction (regression) model to tell us which individual features (or combinations of features) correlate with the best sales.

Building the model itself is simple. Following the steps outlined in our walkthrough, the training data is converted to CSV and uploaded, and the prediction task is set to regression with “sales” as the predicted column:

IA2

Counterfactual data construction and predictions

Once the regression model is built, it can be used for a contribution analysis.

To do so, we need to construct datasets that ask the model specific questions. We go back to our training dataset and clear all values in the label column (“sales”). All column names must stay exactly the same.

We then pick one feature column and clear it as well, so that the dataset looks like this:

NIA2

This new dataset can now be uploaded and our model applied to it. Because the label and the “Frilled Skirts” column have both been cleared, the regression model will tell us the predicted sales for our dresses if none of them had frilled skirts. Those sales predictions look like:

NIA3

Next, we go back to the dataset and this time fill the “Frilled Skirts” column with checks, keeping the label column blank:

NIA4.png

We upload this dataset and apply our regression model to get the predicted sales if every dress had frilled skirts:

NIA5
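The two counterfactual tables used above could also be built with a short script instead of by hand. The following is only a sketch: the column names other than “sales” and “Frilled Skirts” are taken from the features mentioned in this guide, the way a check is encoded ("1" for checked, empty for cleared) is an assumption, and the two training rows are placeholders rather than the real data.

```python
import csv
import io

LABEL = "sales"             # label column named in this guide
FEATURE = "Frilled Skirts"  # feature column being tested


def make_counterfactual(rows, feature, value, label=LABEL):
    """Return copies of rows with the label cleared and `feature` set to `value`."""
    out = []
    for row in rows:
        new_row = dict(row)       # copy so the training rows stay untouched
        new_row[label] = ""       # clear the label; the model fills it in
        new_row[feature] = value  # same value for every dress
        out.append(new_row)
    return out


def to_csv(rows, fieldnames):
    """Serialize rows to CSV text, keeping column names and order unchanged."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Placeholder training table (assumed encoding: "1" = checked, "" = not checked).
fieldnames = ["sales", "Frilled Skirts", "Buttons", "Low Cut"]
training = [
    {"sales": "100", "Frilled Skirts": "1", "Buttons": "", "Low Cut": "1"},
    {"sales": "250", "Frilled Skirts": "", "Buttons": "1", "Low Cut": ""},
]

without_feature = make_counterfactual(training, FEATURE, "")  # no dress has it
with_feature = make_counterfactual(training, FEATURE, "1")    # every dress has it

print(to_csv(without_feature, fieldnames))
print(to_csv(with_feature, fieldnames))
```

Each CSV string can then be saved to a file and uploaded for prediction in the same way as the hand-edited tables.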

The last step is to sum the predicted sales for each test:

NIA7

  • All without Frilled Skirts, predicted sales had a sum total of: 47,881.76
  • All with Frilled Skirts, predicted sales had a sum total of: 43,487.90
  • If we calculate the difference, 43,487.90 – 47,881.76 = -4,393.86
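The arithmetic above can be checked with a few lines of code. This sketch uses the two totals reported in this guide; the variable names are illustrative, and the relative change is a derived figure added here for context rather than something the tool reports.

```python
from decimal import Decimal

# Totals reported in this guide for the two counterfactual prediction runs.
total_without = Decimal("47881.76")  # no dress has frilled skirts
total_with = Decimal("43487.90")     # every dress has frilled skirts

# Contribution of the feature: "all with" total minus "all without" total.
contribution = total_with - total_without

# Percent change relative to the "all without" total.
relative = contribution / total_without * 100

print(f"Contribution of frilled skirts: {contribution:,.2f}")
print(f"Relative change: {relative:.1f}%")
```

Decimal is used instead of float so that the subtraction reproduces the two-decimal figures exactly.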

Across the whole catalog, the model predicts that frilled skirts have a negative impact on dress sales, which suggests that we should not sell dresses with frilled skirts in the future. Breaking the result down by row, however, shows that some dresses are predicted to sell better when frilled skirts are their only feature, so it may be that frilled skirts simply should not be combined with any other dress feature.
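The row-by-row breakdown can be sketched as a per-dress difference between the two prediction runs. The prediction values below are hypothetical placeholders, not the figures from the tables in this guide, and the function name is illustrative.

```python
def row_differences(preds_with, preds_without):
    """Per-row change in predicted sales when the feature is switched on."""
    if len(preds_with) != len(preds_without):
        raise ValueError("both prediction lists must cover the same rows")
    return [w - wo for w, wo in zip(preds_with, preds_without)]


# Hypothetical predictions for three dresses (same row order in both runs).
preds_without = [120.0, 80.0, 200.0]  # feature cleared for every dress
preds_with = [100.0, 95.0, 150.0]     # feature checked for every dress

diffs = row_differences(preds_with, preds_without)

# Rows where the feature is predicted to raise sales, and rows where it lowers them.
helped = [i for i, d in enumerate(diffs) if d > 0]
hurt = [i for i, d in enumerate(diffs) if d < 0]

print("Per-row differences:", diffs)
print("Rows helped by the feature:", helped)
print("Rows hurt by the feature:", hurt)
print("Total contribution:", sum(diffs))
```

The sum of the per-row differences equals the difference of the two totals, so the row view and the total view always agree; the row view only adds detail about which dresses drive the result.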

These steps can be repeated for the dress features of “Buttons” and “Low Cut” to determine what impact they have.
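Repeating the procedure for every feature amounts to a loop. In this sketch, `predict` is a hypothetical stand-in for the trained regression model (any function that maps a row to predicted sales), and `toy_predict` uses made-up coefficients purely so the example runs; neither is part of the tool described in this guide.

```python
def contribution(rows, feature, predict, on=1, off=0):
    """Total predicted sales with `feature` on for every row, minus with it off."""
    total_on = sum(predict({**row, feature: on}) for row in rows)
    total_off = sum(predict({**row, feature: off}) for row in rows)
    return total_on - total_off


def toy_predict(row):
    """Stand-in for the trained model; the coefficients are invented."""
    return (100.0
            - 10.0 * row["Frilled Skirts"]
            + 5.0 * row["Buttons"]
            + 2.0 * row["Low Cut"])


# Hypothetical dresses, with 1 meaning the feature is present and 0 absent.
dresses = [
    {"Frilled Skirts": 1, "Buttons": 0, "Low Cut": 1},
    {"Frilled Skirts": 0, "Buttons": 1, "Low Cut": 0},
    {"Frilled Skirts": 1, "Buttons": 1, "Low Cut": 0},
]

features = ["Frilled Skirts", "Buttons", "Low Cut"]
results = {f: contribution(dresses, f, toy_predict) for f in features}

for name, value in results.items():
    print(f"{name}: {value:+.2f}")
```

Only the feature under test is overwritten in each pass; the other columns keep their original values, which matches the manual steps of clearing or filling a single column at a time.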

 
