[Tutorial] Exponential Weighted Average … a good moving windows average

Very often we have a long sequence of data with a very high variance due to noises. The typical approach to this situation is to create a circular vector of size N, put the new incoming value in the vector in a sequential way replacing the older one and finally calculating the mean of the values.
The “smart” approach let us take memory of the sum of the values adding the new value and removing the older before dividing for the size of the vector, not too much complicated, but prone to error.

Math comes to us…

Lets start by this formula:

\left\{ \begin{matrix} m_t=\alpha \cdot m_{t-1}+(1-\alpha)\cdot v_t \\ m_0=0 \end{matrix}\right.

m” is the mean at timet“, “v” is the new incoming value at timet“, “α” is the weight. The only value that we must take care to is “m” that propagates through time.

For example: we are at time “t=100” and we have reached this status:

m_0=0 \\ m_1=\alpha \cdot m_0 +(1-\alpha) \cdot v_1 \\ \vdots \\ m_{98}=\alpha \cdot m_{97} +(1-\alpha) \cdot v_{98} \\ m_{99}=\alpha \cdot m_{98} +(1-\alpha) \cdot v_{99} \\ m_{100}=\alpha \cdot m_{99} +(1-\alpha) \cdot v_{100} = \alpha \cdot [\alpha \cdot m_{98} +(1-\alpha) \cdot v_{99}] +(1-\alpha) \cdot v_{100} = \alpha \cdot \left\{ \alpha \cdot [ \alpha \cdot m_{97} +(1-\alpha) \cdot v_{98}] +(1-\alpha) \cdot v_{99}\right\} +(1-\alpha) \cdot v_{100}

going on replacing each m_t with the value calculate at (t-1) we can see how the weight of older values contributes less to the total sum in an exponential way according to the value of “α”.

An example of how the weigth decreases in time for T=15

An example of how the weigth decreases in time for T=15 (from Wikipedia)

But why can we use this formula to approximate a “moving windows average algorithm”?

The reply is simple: we can approximate the “width” of the windows with the following formula:

W_{size} \approx \frac{1}{ 1 - \alpha}

and we obtain the following window sizes for typical “α” values

\\ \alpha = 0.9 \rightarrow W_{size} \approx \frac{1}{1-0.9} \approx 10 \\ \alpha = 0.95 \rightarrow W_{size} \approx 20 \\ \alpha = 0.97 \rightarrow W_{size} \approx 30 \\ \alpha = 0.975 \rightarrow W_{size} \approx 40 \\ \alpha = 0.98 \rightarrow W_{size} \approx 50 \\ \alpha = 0.99 \rightarrow W_{size} \approx 100

If we want to calculate the value of “α” that must be used to obtain a searched window size we can use the formula

\alpha = \frac{W_{size}-1}{W_{size}}

So many line of codes can be replaced with a single line

The only problem with this approach is that for the first values the mean is really “slow” and very near to zero. We can overcome this problem using some kind of Bias Correction:

\left\{ \begin{matrix} m'_0=0 \\ m'_t=\alpha \cdot m'_{t-1}+(1-\alpha)\cdot v_t \\ m_t = \frac{m'_t}{1- \alpha^t} \\ \end{matrix}\right.

A complete C++ class:

That’s all for this tutorial. I have used this approach more than one time since I discovered it following the “Deep Learning” specialization on Coursera by Andrew Ng.

If you want to learn more follow the links to the three videos about this argument:

 

Comments are closed