MAPE Madness
Problem Setup: You want to use MAPE (mean absolute percentage error) as your loss function for training a linear regression model, and you want to use Sklearn to solve it.
Setup
You’ve got to do some forecasting on data where there is a potentially large difference between subsequent samples (e.g. T_1 = 5, T_2 = 5_000). The percentage error is nice because it makes the problem less susceptible to outliers, and because once the difference (T_2 - T_1) is expressed as a percentage, the scale isn’t as important anymore (more on this later).
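To make that concrete, here’s a tiny toy illustration (the numbers are invented): an absolute error of 1 on a sample of 5 and an absolute error of 1,000 on a sample of 5,000 are the same 20% miss.

```python
import numpy as np

t = np.array([5.0, 5000.0])      # wildly different scales, like T_1 and T_2
pred = np.array([6.0, 6000.0])   # both predictions miss by 20%

abs_err = np.abs(t - pred)       # [1, 1000]: dominated by the large sample
pct_err = abs_err / t            # [0.2, 0.2]: the scale drops out
```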
Attempts
1) A quick look at the Sklearn Linear Model / Linear Regression page tells you that it only supports OLS (ordinary least squares)
- Hmm, that’s unfortunate. It would have been nice to use Sklearn to solve the problem, as it already integrates so well with your system and existing infrastructure.
- Unfortunately, even after expanding the equations, you realize there is no way to “finesse” the percentage into the equation, since we don’t have fine-grained control over it.
- But maybe…
2) Autograd, the excellent package from HIPS.
- Looking at the examples, it’s quite opaque how to handle extremely large datasets and how to generate the indices to pass in for minibatch training, so I’ll just save you the time and tell you to go straight to the convnet example, which shows you how to pass minibatches in.
- Note, you’ll want to be sure that your y_true values aren’t 0, as this can lead to division-by-zero errors during optimization. I suggest doing

```python
import autograd.numpy as np  # autograd's numpy wrapper, so the objective is differentiable

def objective(params, X, y):
    pred = np.dot(X, params)
    non_zero_mask = y > 0  # drop zero (and negative) targets to avoid division by zero
    return np.mean(np.abs((y[non_zero_mask] - pred[non_zero_mask]) / y[non_zero_mask]))
```
- Of course, you can do things like add weights to the objective function, since it’s conceivable that, if you’re extremely unlucky, your objective returns 0, or perhaps you want to give greater weight to batches which have more non-zero values. Your call!
- Wait… this is actually a little slow, and your boss wants you to write boilerplate to make it fit into the sklearn framework. Back to the drawing board it is… We work smart, not har… We work smart AND hard.
3) Stop thinking with your eyes

The title of this list item was really aimed at me, but I think it shows the necessity of understanding the math, and understanding the engineering that comes with it. IMO, a good machine learning engineer at a startup needs to be a competent software engineer AND a competent mathematician.

The Catch
Looking at the Wikipedia page for MAPE, one might notice that it resembles the formula for MAE. In fact, one might say that MAPE is a weighted version of MAE. Let’s just pull out y_true and go for lunch. Restructuring MAPE, we get the following:

MAPE = (1/n) * sum_i |y_i - pred_i| / |y_i| = (1/n) * sum_i w_i * |y_i - pred_i|, with w_i = 1 / |y_i|
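You can sanity-check the rearrangement numerically with made-up numbers: computing MAPE directly and computing an MAE with per-sample weights w_i = 1/|y_i| give exactly the same value.

```python
import numpy as np

y_true = np.array([5.0, 5000.0, 20.0])
y_pred = np.array([6.0, 4500.0, 18.0])

# MAPE as usually written: the mean of per-sample percentage errors.
mape = np.mean(np.abs((y_true - y_pred) / y_true))

# The same quantity, restructured as a weighted MAE.
w = 1.0 / np.abs(y_true)
mae_weighted = np.mean(w * np.abs(y_true - y_pred))
```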
The Implementation
Great! We know it’s basically just an MAE with a weighted … weight? How in the world do we get this crap into Sklearn? (Not a ding on Sklearn; I think they do excellent work. I’m just used to building stuff from tf.Variable in Tensorflow, where I have tons of control over all MY code and the training process.)
4) RTFM and homestretch
Thankfully, because reading documentation and spending a few minutes a day just reading up on the packages I use are both things that I enjoy doing, I had already stumbled on sklearn.linear_model.SGDRegressor, which doesn’t use the least-squares optimization method. That SHOULD mean we have greater control over our gradients and should be able to pass in arbitrary loss functions that are differentiable.
Looking through the documentation, we see that it supports a loss known as epsilon_insensitive, which basically amounts to 0 if the error for X_i is less than epsilon, and abs(error) otherwise. So we just set epsilon to 0 and we already have the MAE that we wanted!
Now it’s just a matter of passing in the weights that we were looking for, and bingo-bango, we’re done!
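Putting it together, here’s a minimal sketch on invented toy data (the dataset and the hyperparameters like eta0 and max_iter are tuned for this toy problem, not taken from the original experiment). The epsilon_insensitive loss with epsilon=0 gives plain absolute error, and passing sample_weight=1/|y| to fit turns that MAE into MAPE:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Toy data with strictly positive targets, so 1/|y| is well defined.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 10.0
X, y = X[y > 0], y[y > 0]          # MAPE is undefined at y == 0

# epsilon_insensitive with epsilon=0 is exactly absolute error (MAE);
# weighting each sample by 1/|y_i| makes the objective MAPE.
model = SGDRegressor(loss="epsilon_insensitive", epsilon=0.0,
                     eta0=0.1, max_iter=5000, tol=1e-6, random_state=0)
model.fit(X, y, sample_weight=1.0 / np.abs(y))

mape = np.mean(np.abs((y - model.predict(X)) / y))
```

Since SGDRegressor is a standard sklearn estimator, this drops straight into pipelines, grid search, and the rest of the existing infrastructure, which was the whole point.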
Assumptions
- When modeling the problem, the scale isn’t a huge factor. Depending on your problem, a large T_x might lead to a large T_(x+1) more often than a small T_x would.
Note:
1) Although the equations and the thought process here are true to something I did, I took some liberties to shorten the scenario, embellish the feedback process, and motivate the steps. My bosses are extremely trusting and would have let me stop at the autograd step if I had been happy with it, but something nagged at me even after I had run the experiment.
2) Although it was a fun accomplishment and a quick little math trick, this didn’t do any better than the baseline we had established. “C’est la vie,” mes amis, and we move on to the next problem.
Thank you for taking the time to read this and happy holidays!