LOESS

Local regression, or local polynomial regression, is a generalization of the moving average and polynomial regression. Its most common methods, initially developed for scatterplot smoothing, are LOESS (locally estimated scatterplot smoothing) and LOWESS (locally weighted scatterplot smoothing). They are two strongly related non-parametric regression methods that combine multiple regression models in a k-nearest-neighbor-based meta-model.

A smooth curve through a set of data points obtained with this statistical technique is called a Loess Curve, particularly when each smoothed value is given by a weighted quadratic least squares regression over the span of values of the y-axis scattergram criterion variable.

LOESS and LOWESS thus build on “classical” methods, such as linear and nonlinear least squares regression. In fact, one of the chief attractions of this method is that the data analyst is not required to specify a global function of any form to fit a model to the data, only to fit segments of the data. The trade-off for these features is increased computation.

LOESS makes less efficient use of data than other least squares methods. It requires fairly large, densely sampled data sets in order to produce good models. This is because LOESS relies on the local data structure when performing the local fitting. Thus, LOESS provides less complex data analysis in exchange for greater experimental costs.

LOESS isn’t really trained like some other models. There is no fitted set of parameters to store, so you need to keep the entire training data around to make predictions.

So say your training data is \(T\), and you want to make a prediction at a point \(x\). The general algorithm is this:

  • Take the \(k\) data points in \(T\) whose \(x\)-coordinates are closest to \(x\), and call this set \(N_{x,k}\). (This is what the "local" in the name LOESS refers to.)
  • Fit a weighted linear (or polynomial) regression using \(N_{x,k}\) as the training set.
  • Use the resulting regression to make a prediction at \(x\). This is the LOESS smoothed value at \(x\).

You have to repeat this entire procedure for every value of \(x\) at which you want the smoothed value of \(y\).
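The three steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function name `loess_point` is made up for this example, it handles only the local linear case, and it assumes tricube weights (the conventional choice in LOWESS, though the text above does not name a weight function).

```python
def loess_point(x0, xs, ys, k):
    """Local linear LOESS value at x0 (sketch; tricube weights are an assumption)."""
    # Step 1: the k training points whose x-coordinate is closest to x0.
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x0))[:k]
    h = abs(nearest[-1][0] - x0)  # distance to the farthest neighbor

    # Tricube weights: near 1 close to x0, falling to 0 at distance h.
    ws = []
    for xi, _ in nearest:
        u = abs(xi - x0) / h if h > 0 else 0.0
        ws.append((1 - u ** 3) ** 3)
    if sum(ws) == 0:  # every neighbor sits exactly at distance h
        ws = [1.0] * len(ws)

    # Step 2: weighted least squares for y = a + b * (x - x0).
    sw = sum(ws)
    ts = [xi - x0 for xi, _ in nearest]
    vs = [yi for _, yi in nearest]
    mt = sum(w * t for w, t in zip(ws, ts)) / sw
    mv = sum(w * v for w, v in zip(ws, vs)) / sw
    stt = sum(w * (t - mt) ** 2 for w, t in zip(ws, ts))
    stv = sum(w * (t - mt) * (v - mv) for w, t, v in zip(ws, ts, vs))
    b = stv / stt if stt > 0 else 0.0
    a = mv - b * mt

    # Step 3: because x is centered on x0, the intercept is the prediction.
    return a


def loess(x_query, xs, ys, k):
    """Repeat the whole procedure at every query point."""
    return [loess_point(x0, xs, ys, k) for x0 in x_query]
```

Each call sorts the full training set, which is why the training data has to stay in memory and why predicting at many points gets expensive.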

If you would like to do predictions efficiently from LOESS, you should set down a grid of evenly spaced \(x\)-coordinates and find the smoothed value of \(y\) at each of these \(x\)-coordinates. Then interpolate these smoothed points piecewise linearly, and remember the linear equations for each segment.
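A rough sketch of that grid strategy follows. The names `make_grid_predictor` and `smooth_at` are hypothetical: `smooth_at` stands for whatever function computes the LOESS smoothed value at a single \(x\). Letting queries outside the grid extend the end segments is a choice made here for simplicity, not something prescribed above.

```python
from bisect import bisect_right


def make_grid_predictor(smooth_at, x_min, x_max, n_grid=200):
    """Precompute smoothed values on an even grid; return a fast predictor."""
    step = (x_max - x_min) / (n_grid - 1)
    grid_x = [x_min + i * step for i in range(n_grid)]
    grid_y = [smooth_at(x) for x in grid_x]  # the expensive part, done once

    # Remember the line y = slope * x + intercept for each grid segment.
    slopes = [(grid_y[i + 1] - grid_y[i]) / step for i in range(n_grid - 1)]
    intercepts = [grid_y[i] - slopes[i] * grid_x[i] for i in range(n_grid - 1)]

    def predict(x):
        # Find the segment containing x; clamp so that points outside the
        # grid reuse the first or last segment.
        i = min(max(bisect_right(grid_x, x) - 1, 0), n_grid - 2)
        return slopes[i] * x + intercepts[i]

    return predict
```

After the one-time setup, each prediction costs a binary search and one multiply-add, and the training data is no longer needed. The price is an approximation error that shrinks as the grid gets finer.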

Implementation in Python and R
