MAPE Madness

Problem Setup: You want to use MAPE as your loss function for training a Linear Regression, and you want to use Sklearn to solve it.

Setup

You've got to do some forecasting on data where there can be large differences between subsequent samples (e.g., T_1 = 5, T_2 = 5_000). The percentage error is nice because it makes the problem less susceptible to outliers, and because if you take the difference (T_2 - T_1) the scale isn't as important anymore (more on this later).

Attempts

1) A quick look at the Sklearn Linear Model - Linear Regression page tells you that it only supports OLS

  • Hmm, that’s unfortunate. It would have been nice to use Sklearn to solve the problem as it already integrates so well with your system and existing infrastructure.
  • Unfortunately, even after expanding the equations, you realize that there is no way to "finesse" the percentage into the closed-form solution, since you don't have fine-grained control over it.
  • But maybe….

2) Autograd, the excellent package from HIPS.

  • Looking at the examples, it's quite opaque how to handle extremely large datasets and generate the indices to pass in for minibatch training, so I'll save you the time and tell you to go straight to the convnet example, which shows you how to pass minibatches in (see the sketch after this list).
  • Note: you'll want to be sure that your y_true values aren't 0, as this leads to division-by-zero errors in the optimization. I suggest doing something like
import autograd.numpy as np  # autograd's drop-in wrapper around numpy

def objective(params, X, y):
    pred = np.dot(X, params)
    non_zero_mask = y > 0  # skip samples where y == 0 to avoid dividing by zero
    pct_err = (y[non_zero_mask] - pred[non_zero_mask]) / y[non_zero_mask]
    return np.mean(np.abs(pct_err))  # grad needs a scalar, so average the absolute percentage errors
  • Of course, you can do things like add weights to the objective function, since it's conceivable that, if you're extremely unlucky, your objective returns 0, or perhaps you want to give greater weight to batches which have more non-zero values. Your call!
  • Wait… this is actually a little slow, and your boss wants you to write boilerplate to make it fit into the sklearn framework. Back to the drawing board it is… We work smart not har… We work smart AND hard.
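
Here's the minibatch sketch promised above, modeled loosely on the convnet example. Treat it as a sketch: the data, batch size, and optimizer settings are placeholders, and it assumes autograd's adam from autograd.misc.optimizers.

import autograd.numpy as np
from autograd import grad
from autograd.misc.optimizers import adam

# Placeholder data; in practice X and y come from your own pipeline, with y > 0.
rng = np.random.RandomState(0)
X = rng.rand(10_000, 5)
y = np.dot(X, rng.rand(5)) + 1.0

batch_size = 256
num_batches = int(np.ceil(len(y) / batch_size))

def batch_indices(i):
    # Cycle through the dataset one contiguous slice per optimizer step.
    idx = i % num_batches
    return slice(idx * batch_size, (idx + 1) * batch_size)

def objective(params, i):
    Xb, yb = X[batch_indices(i)], y[batch_indices(i)]
    pred = np.dot(Xb, params)
    mask = yb > 0
    return np.mean(np.abs((yb[mask] - pred[mask]) / yb[mask]))

# adam calls grad(objective)(params, i) at step i, so each step sees a different minibatch.
params = adam(grad(objective), np.zeros(X.shape[1]), step_size=0.01, num_iters=1000)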

3) Stop thinking with your eyes

  • The title of this list item was really for me, but I think it shows the necessity of understanding the math and the engineering that comes with it. IMO, a good Machine Learning Engineer at a startup needs to be a competent software engineer AND a competent mathematician.

  • The Catch

    Looking at the Wikipedia page for MAPE, one might notice that it resembles the formula for MAE.

In fact, one might say that MAPE is a weighted version of MAE. Let's just pull out y_true and go for lunch. Restructuring MAPE, we get the following:
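
With y_i the true values, ŷ_i the predictions, and MAPE's constant 100% factor dropped (it doesn't change the minimizer):

$$
\text{MAPE} \;=\; \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|
\;=\; \frac{1}{n}\sum_{i=1}^{n} w_i \,\bigl|y_i - \hat{y}_i\bigr|,
\qquad w_i = \frac{1}{|y_i|}
$$

In other words, MAPE is just MAE where each sample carries a weight of 1/|y_i|.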

  • The Implementation

Great! We know it's basically just an MAE with a weighted … weight? How in the world do we get this crap into Sklearn? (Not a ding on Sklearn - I think they do excellent work; I'm just used to building stuff from tf.Variable in Tensorflow where I have tons of control over all MY code and training process.)

4) RTFM and homestretch

Thankfully, because reading documentation and spending a few minutes a day reading up on the packages I use are both things that I enjoy doing, I had already stumbled on sklearn.SGDRegressor, which doesn't use the least-squares optimization method. That SHOULD mean we have greater control over our gradients and should be able to pass in arbitrary differentiable loss functions.

Looking through the documentation, we see that it supports a loss known as epsilon-insensitive, which is basically 0 if the error for X_i is within epsilon, and how far abs(error) exceeds epsilon otherwise. So, we just set epsilon to 0 and we already have the MAE that we wanted.
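
Concretely, the per-sample epsilon-insensitive loss is

$$
L_\varepsilon(y_i, \hat{y}_i) \;=\; \max\bigl(0,\; |y_i - \hat{y}_i| - \varepsilon\bigr)
$$

so setting ε = 0 leaves us with the plain absolute error |y_i - ŷ_i|.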

Now, it’s just a matter of passing in the weights that we were looking for and bingo-bango we’re done!
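
A minimal sketch of how it comes together (the data here is a placeholder and the hyperparameters are just defaults to tweak):

import numpy as np
from sklearn.linear_model import SGDRegressor

# Placeholder data; y must be non-zero so the 1/|y| weights are defined.
rng = np.random.RandomState(0)
X = rng.rand(10_000, 5)
y = np.dot(X, rng.rand(5)) + 1.0

# epsilon_insensitive with epsilon=0 is plain absolute error (MAE);
# weighting each sample by 1/|y_i| turns that MAE into MAPE.
model = SGDRegressor(loss="epsilon_insensitive", epsilon=0.0, max_iter=1000, tol=1e-3)
model.fit(X, y, sample_weight=1.0 / np.abs(y))

preds = model.predict(X)

As with anything SGD-based, you'll probably want to standardize X first.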

Assumptions

  • When modeling the problem, we assume the scale isn't a huge factor. Depending on your problem, a large T_x might lead to a large T_(x+1) more often than a small T_x does, in which case this assumption gets shakier.

Note:

1) Although the equations and the thought process here are true to something I did, I took some liberties to shorten the scenario, embellish the feedback process, and motivate the steps. My bosses are extremely trusting and would have let me end at the autograd step if I had been happy with it, but something nagged at me even after I had run the experiment.

2) Although it was a fun accomplishment and a quick little math trick, this didn't do any better than the baseline we had established. "C'est la vie," mes amis, and we move on to the next problem.

Thank you for taking the time to read this and happy holidays!

Written on December 25, 2019