Why Your Predictions Aren’t Worth As Much As You Think

Lately it has really come into focus how difficult it can be to get a model approved and into production at your typical dinosaur organization. It's hard enough to gather the right data, engineer and enrich the features, select the highest-yield preprocessing and modeling techniques, and deploy. But after all of that, you still probably have a black box.

Unless you are using a stone-age model like logistic regression or some other GLM, you won't get more than a foggy idea of what's going on. At best, maybe you've got the tree-based feature importances from a Random Forest. Cue the groaning of 10,000 analysts asked WHY a model produces a certain prediction, when the drivers of a given predicted value don't seem to compute in the realm of human intuition. Too bad human intuition can't look at a fresh neural net from TensorFlow and figure out what is going on under the hood. For that we need reason codes.
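
To make that "best case" concrete, here's a minimal sketch of what tree-based importances actually give you. The data is synthetic and the column names are made up for illustration; the point is that scikit-learn's feature_importances_ is one number per feature for the whole model, with nothing to say about any individual prediction.

```python
# Sketch only: synthetic data, hypothetical column names.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
X = pd.DataFrame(X, columns=["era", "strikeouts", "walks", "velocity", "age"])

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# One global importance per column -- a model-wide ranking,
# not a reason why any particular row got its score.
print(pd.Series(rf.feature_importances_, index=X.columns)
        .sort_values(ascending=False))
```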

DataRobot (and some other tools out there no doubt) provides reason codes and other features to make the predictions of a model completely transparent, letting us know why a pitching prospect is a fantastic draft pick despite a high ERA, why a flawlessly running truck might be called in for maintenance despite no obvious signs of a problem, and why a patient might need to stay in the hospital despite looking healthy enough to discharge.

Call me crazy, but I think the prediction by itself is actually relatively low value compared to the prediction AND its reason codes together, especially in the phases before and shortly after deploying a new model.

This is an age in which a director of analytics with dragons lurking under the hood should be hunting for ways to eliminate risk in all its forms.

Before I deploy, I want to take the reason codes for my model, build a histogram of the values across a few thousand prediction rows, and visualize the results in Excel or Tableau. That starts to show us the combinations of factors driving the predictions. And if I can tolerate a slightly slower model, I'd love to save the reason codes for EVERY prediction to a database, so there is an auditable trail if the model's predictions are ever questioned. Why isn't that a mega sign of quality? A model with a track record for its decision making is the one I'd pick.
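
As a rough sketch of that workflow (the reason-code format below is invented for illustration, not DataRobot's actual output), the pattern is simple: count the top reasons across a scored batch to see what's driving the model, then persist every prediction together with its reasons so the trail exists if anyone ever asks.

```python
# Hedged sketch: hypothetical scored rows with made-up reason codes.
import json
import sqlite3
from collections import Counter
from datetime import datetime, timezone

scored_rows = [
    {"row_id": 1, "prediction": 0.87, "reasons": ["low ERA", "high velocity"]},
    {"row_id": 2, "prediction": 0.34, "reasons": ["high walk rate", "age"]},
    {"row_id": 3, "prediction": 0.91, "reasons": ["high velocity", "strikeout rate"]},
]

# 1) The "histogram" step: how often does each reason drive a prediction?
reason_counts = Counter(r for row in scored_rows for r in row["reasons"])
print(reason_counts.most_common())

# 2) The audit trail: store every prediction plus its reasons.
conn = sqlite3.connect("prediction_audit.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS audit "
    "(row_id INTEGER, scored_at TEXT, prediction REAL, reasons TEXT)"
)
conn.executemany(
    "INSERT INTO audit VALUES (?, ?, ?, ?)",
    [
        (row["row_id"],
         datetime.now(timezone.utc).isoformat(),
         row["prediction"],
         json.dumps(row["reasons"]))
        for row in scored_rows
    ],
)
conn.commit()
conn.close()
```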
