PHP/ir

Information Retrieval and other interesting topics

Smoothing With Holt-Winters

In: statistics

03 Mar 2012

In one of his talks at QCon, John Allspaw mentioned using Holt-Winters exponential smoothing on various monitoring instances. Wikipedia has a good entry on the subject, of course, but the basic idea is to take a noisy, spiky time series and smooth it out, so that unexpected changes stand out even more. That's often initially done by taking a moving average: say, averaging the last 7 days of data and using that as the current day's value. More complicated schemes weight that average, so that the older data contributes less.
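
As a rough illustration of those two starting points, here is a minimal sketch of a trailing moving average and a linearly weighted one. The function names, the window size and the sample numbers are all made up for the example, and the linear weighting is just one of many possible schemes.

<?php
// Trailing moving average: each output is the mean of the most recent
// $window values (fewer at the start, where less history exists).
function moving_average(array $data, $window = 7) {
    $data = array_values($data);
    $smoothed = array();
    foreach ($data as $i => $value) {
        $start = max(0, $i - $window + 1);
        $slice = array_slice($data, $start, $i - $start + 1);
        $smoothed[$i] = array_sum($slice) / count($slice);
    }
    return $smoothed;
}

// Linearly weighted variant: the oldest value in the window gets weight 1,
// the newest gets the largest weight, so older data contributes less.
function weighted_moving_average(array $data, $window = 7) {
    $data = array_values($data);
    $smoothed = array();
    foreach ($data as $i => $value) {
        $start = max(0, $i - $window + 1);
        $total = 0;
        $weights = 0;
        for ($j = $start; $j <= $i; $j++) {
            $weight = $j - $start + 1;
            $total += $weight * $data[$j];
            $weights += $weight;
        }
        $smoothed[$i] = $total / $weights;
    }
    return $smoothed;
}

// Made-up daily counts with a spike on the fourth day
$visits = array(10, 12, 11, 50, 12, 10, 11);
$plain = moving_average($visits, 3);
$weighted = weighted_moving_average($visits, 3);
?>

On the day of the spike the plain average comes out at about 24.3 and the weighted one at about 30.7, since the weighted version leans harder on the newest value.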

Simple exponential smoothing effectively takes this weighted average further, with the weight given to a value decaying exponentially with its age, so recent values matter far more than older ones. However, this has problems in the face of a long term trend, so double exponential smoothing includes a factor for the general tendencies in the data (e.g. an increasing trend over time). Triple exponential smoothing, which we're using here, also includes a factor to consider seasonal changes, so I thought I'd give that one a go at implementing. Each of those three smoothing aspects has its own weighting factor (alpha, beta and gamma respectively) that controls how much of an impact it has, and by setting the ones we don't need to 0 we can have the same code do any one of the three algorithms. Below I've broken out the function into its component parts, but you can see the whole thing on GitHub.
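
To make the difference between the first two concrete, here is a small sketch of simple and double exponential smoothing as standalone functions. These are the textbook forms rather than pieces of the function discussed in this post; the names, the default constants and the choice of starting the trend at the difference between the first two points are assumptions for the example.

<?php
// Simple exponential smoothing: each new level is a blend of the current
// value and the previous level.
function simple_exponential(array $data, $alpha = 0.5) {
    $data = array_values($data);
    $smoothed = array();
    $level = $data[0];
    foreach ($data as $i => $value) {
        $level = $alpha * $value + (1.0 - $alpha) * $level;
        $smoothed[$i] = $level;
    }
    return $smoothed;
}

// Double exponential smoothing: also tracks a trend, so the level is
// pushed along by the expected change before blending in the new value.
// Needs at least two data points to seed the trend.
function double_exponential(array $data, $alpha = 0.5, $beta = 0.5) {
    $data = array_values($data);
    $level = $data[0];
    $trend = $data[1] - $data[0]; // assumed starting trend
    $smoothed = array($level);
    for ($i = 1; $i < count($data); $i++) {
        $previous_level = $level;
        $level = $alpha * $data[$i] + (1.0 - $alpha) * ($previous_level + $trend);
        $trend = $beta * ($level - $previous_level) + (1.0 - $beta) * $trend;
        $smoothed[$i] = $level;
    }
    return $smoothed;
}

// A steadily rising series
$rising = array(10, 20, 30, 40);
$simple = simple_exponential($rising);
$double = double_exponential($rising);
?>

On that steadily rising series the simple version lags behind (10, 15, 22.5, 31.25), while the double version follows the line exactly, which is the trend problem described above.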

We'll give it a go on some web data that has an unexpected spike, and see how visible that is against the timeline. The algorithm is pretty simple, but we need to set up a bunch of variables first. We start off by calculating an initial trend value: we take the difference between the average values of the first two 'seasons' (the length being a configurable parameter of the function) and divide it by the season length to get a per-step change.

<?php
// Calculate an initial trend level
$trend1 = 0;
for($i = 0; $i < $season_length; $i++) {
    $trend1 += $data[$i];
}
$trend1 /= $season_length;
    
$trend2 = 0;
for($i = $season_length; $i < 2*$season_length; $i++) {
    $trend2 += $data[$i];
}
$trend2 /= $season_length;
    
$initial_trend = ($trend2 - $trend1) / $season_length;
?>
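
For a quick sanity check, the same calculation can be written more compactly and tried on a tiny made-up series. The function name here is hypothetical, and note that it assumes the data covers at least two full seasons.

<?php
// Per-step trend estimated from the averages of the first two seasons
function initial_trend(array $data, $season_length) {
    $first  = array_sum(array_slice($data, 0, $season_length)) / $season_length;
    $second = array_sum(array_slice($data, $season_length, $season_length)) / $season_length;
    return ($second - $first) / $season_length;
}

// Two seasons of length 3, rising by one each step: the averages are 2 and 5,
// so the trend is (5 - 2) / 3
echo initial_trend(array(1, 2, 3, 4, 5, 6), 3); // prints 1
?>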

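Next we create an initial value for the 'level' part (the directly smoothed value of the data), build a seasonal index by dividing each data point by the value the initial level and trend would predict for it, and calculate the seasonal factors for the first period by averaging the index over the first two seasons.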

<?php
// Take the first value as the initial level
$initial_level = $data[0];
        
// Build index
$index = array();
foreach($data as $key => $val) {
    $index[$key] = $val / ($initial_level + ($key + 1) * $initial_trend);
}
    
// Build season buffer
$season = array_fill(0, count($data), 0);
for($i = 0; $i < $season_length; $i++) {
    $season[$i] = ($index[$i] + $index[$i+$season_length]) / 2;
}
    
// Normalise season
$season_factor = $season_length / array_sum($season);
foreach($season as $key => $val) {
    $season[$key] *= $season_factor;
}
?>

Finally, we actually run the smoothing. This loops over the data, updates trend, level and season values for the three elements of the smoothing, and finally combines them to calculate the smoothed value, factoring in the weighting constants. By continuing beyond the end of the data, we can even use this to project into the future and make a forecast!

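<?php
$holt_winters = array();
$alpha_level = $initial_level;
$beta_trend = $initial_trend;
foreach($data as $key => $value) {
    // Keep the previous level and trend for the update equations
    $temp_level = $alpha_level;
    $temp_trend = $beta_trend;

    // Level: blend the deseasonalised value with the previous level plus trend
    $alpha_level = $alpha * $value / $season[$key] + 
                  (1.0 - $alpha) * ($temp_level + $temp_trend);
    // Trend: blend the latest change in level with the previous trend
    $beta_trend = $beta * ($alpha_level - $temp_level) + ( 1.0 - $beta ) * $temp_trend;

    // Season: update the factor for the same point in the next season
    $season[$key + $season_length] = $gamma * $value / $alpha_level
                  + (1.0 - $gamma) * $season[$key];

    // Combine level, trend and season into the smoothed value
    $holt_winters[$key] = ($alpha_level + $beta_trend * ($key + 1)) * $season[$key];
}
?>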
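
As a rough sketch of that forecasting step (it is not taken from the original function): once the loop has finished, $alpha_level and $beta_trend hold the final level and trend, and $season has been filled in one season past the end of the data. The hypothetical helper below applies the textbook multiplicative forecast, which is the level plus h steps of trend, multiplied by the matching seasonal factor. That formula is an assumption here rather than the exact expression used inside the loop.

<?php
// Hypothetical helper: project $steps points past the end of the data.
// $level, $trend : the final $alpha_level and $beta_trend from the loop
// $season        : seasonal factors, filled up to index $n + $season_length - 1
// $n             : number of observed data points
function holt_winters_forecast($level, $trend, array $season, $n, $season_length, $steps) {
    $forecast = array();
    for ($h = 1; $h <= $steps; $h++) {
        // Reuse the most recent full season of factors cyclically
        $season_key = $n + (($h - 1) % $season_length);
        $forecast[$n + $h - 1] = ($level + $h * $trend) * $season[$season_key];
    }
    return $forecast;
}
?>

After the main loop this could be called along the lines of holt_winters_forecast($alpha_level, $beta_trend, $season, count($data), $season_length, 14) to get two weeks of projected values, keyed so that they carry on from the end of the data.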

This whole thing is wrapped in a function that sets the values of the smoothing constants, so we can just call $newdata = holt_winters($data, 30). Running this on the webstats data gives us a smoothed graph, as you can (hopefully) see from the Google chart below, assuming the JavaScript is behaving.

John used this kind of smoothing at Etsy in combination with error bars to look for unusual events, and trigger their monitoring systems. One thing I noticed from trying a quick implementation is that the length of time considered for the season can have a big effect on the smoothing, as can the values of the $alpha, $beta and $gamma constants, so some tweaking may be required if using a similar technique on your own data.

If we did want to make some sort of triggering based on the data, we'd need to create confidence intervals as well. We can do that with an extra array in the main Holt-Winters loop that is updated like this:

<?php
$deviations[$key] = $dev_gamma * abs($value - $holt_winters[$key]) + (1-$dev_gamma) 
    * (isset($deviations[$key - $season_length]) ? $deviations[$key - $season_length] : 0);
?>

This tracks how much our data is deviating from the smoothed value, and factors seasonality into that by blending in the deviation from the same point in the previous season. We can add a multiple of these values to the smoothed value, and subtract it from the smoothed value, to create confidence bars, and signal if our data goes outside them. We'll add and subtract three times the deviation score, which gives us error bars like the ones in the accompanying chart. Note that as the data gets more variable, the confidence bars open up to respect the general increased volatility, but when the data isn't changing much day to day the error bars are pretty tight.
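
The check itself might look something like the sketch below. The helper name and the shape of the returned array are made up for the example; it just assumes the data, the smoothed values and the deviations are three arrays sharing the same keys.

<?php
// Hypothetical helper: return the points that fall outside the band of
// smoothed value plus or minus $width times the deviation score.
function find_anomalies(array $data, array $smoothed, array $deviations, $width = 3) {
    $anomalies = array();
    foreach ($data as $key => $value) {
        // Skip points we have no smoothed value or deviation for
        if (!isset($smoothed[$key], $deviations[$key])) {
            continue;
        }
        $upper = $smoothed[$key] + $width * $deviations[$key];
        $lower = $smoothed[$key] - $width * $deviations[$key];
        if ($value > $upper || $value < $lower) {
            $anomalies[$key] = array(
                'value' => $value,
                'lower' => $lower,
                'upper' => $upper,
            );
        }
    }
    return $anomalies;
}
?>

Calling find_anomalies($data, $holt_winters, $deviations) after the main loop would then give back just the points worth alerting on, along with the bounds they broke.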