# [Tutorial] Exponentially Weighted Average … a good moving window average

Very often we have a long sequence of data with very high variance due to noise. The typical approach in this situation is to create a circular buffer of size N, write each new incoming value into the buffer in sequence, replacing the oldest one, and finally calculate the mean of the values.
The "smart" variant keeps a running sum, adding the new value and subtracting the oldest one before dividing by the size of the buffer. Not too complicated, but prone to errors.

Math comes to us…

Let's start with this formula:

$\left\{ \begin{matrix} m_t = \alpha \cdot m_{t-1} + (1-\alpha) \cdot v_t \\ m_0 = 0 \end{matrix} \right.$

Here "m" is the mean at time "t", "v" is the new incoming value at time "t", and "α" is the weight. The only value we must keep track of is "m", which propagates through time.

For example: suppose we are at time "t = 100" and have reached this state:

$m_0 = 0 \\ m_1 = \alpha \cdot m_0 + (1-\alpha) \cdot v_1 \\ \vdots \\ m_{98} = \alpha \cdot m_{97} + (1-\alpha) \cdot v_{98} \\ m_{99} = \alpha \cdot m_{98} + (1-\alpha) \cdot v_{99} \\ m_{100} = \alpha \cdot m_{99} + (1-\alpha) \cdot v_{100} = \alpha \cdot [\alpha \cdot m_{98} + (1-\alpha) \cdot v_{99}] + (1-\alpha) \cdot v_{100} = \alpha \cdot \left\{ \alpha \cdot [\alpha \cdot m_{97} + (1-\alpha) \cdot v_{98}] + (1-\alpha) \cdot v_{99} \right\} + (1-\alpha) \cdot v_{100}$

Continuing to replace each $m_t$ with its value calculated at $t-1$, we can see that older values contribute less and less to the total sum, decaying exponentially according to the value of "α".

An example of how the weight decreases over time for T=15 (from Wikipedia)

But why can we use this formula to approximate a "moving window average" algorithm?

The answer is simple: we can approximate the "width" of the window with the following formula:

$W_{size} \approx \frac{1}{1 - \alpha}$

and we obtain the following window sizes for typical “α” values

$\\ \alpha = 0.9 \rightarrow W_{size} \approx \frac{1}{1-0.9} \approx 10 \\ \alpha = 0.95 \rightarrow W_{size} \approx 20 \\ \alpha = 0.97 \rightarrow W_{size} \approx 30 \\ \alpha = 0.975 \rightarrow W_{size} \approx 40 \\ \alpha = 0.98 \rightarrow W_{size} \approx 50 \\ \alpha = 0.99 \rightarrow W_{size} \approx 100$

If we want to calculate the value of "α" needed to obtain a desired window size, we can invert the formula:

$\alpha = \frac{W_{size}-1}{W_{size}}$

So many lines of code can be replaced with a single line.

The only problem with this approach is that for the first values the mean is very "slow" and stays close to zero, because it starts from $m_0 = 0$. We can overcome this problem using a bias correction:

$\left\{ \begin{matrix} m'_0 = 0 \\ m'_t = \alpha \cdot m'_{t-1} + (1-\alpha) \cdot v_t \\ m_t = \frac{m'_t}{1 - \alpha^t} \end{matrix} \right.$

A complete C++ class:
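(The original post embedded the class externally; here is a minimal reconstruction along the same lines, with class and member names of my own choosing.)

```cpp
#include <cmath>

// Exponentially weighted moving average with bias correction.
class EWMA {
public:
    // alpha = (W - 1) / W approximates a moving window of W samples.
    explicit EWMA(double alpha) : alpha_(alpha), m_(0.0), t_(0) {}

    // Feed a new sample and return the bias-corrected mean.
    double update(double value) {
        m_ = alpha_ * m_ + (1.0 - alpha_) * value;  // m'_t
        ++t_;
        return mean();
    }

    // m_t = m'_t / (1 - alpha^t); zero before any sample arrives.
    double mean() const {
        return t_ == 0 ? 0.0 : m_ / (1.0 - std::pow(alpha_, t_));
    }

private:
    double alpha_;  // weight of the history
    double m_;      // uncorrected running mean m'_t
    long   t_;      // number of samples seen so far
};
```

Thanks to the bias correction, the very first `update(v)` already returns `v` itself: with $t = 1$, $m'_1 / (1 - \alpha) = (1-\alpha)v / (1-\alpha) = v$, so the mean no longer crawls up from zero.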

That’s all for this tutorial. I have used this approach more than once since I discovered it while following the “Deep Learning” specialization on Coursera by Andrew Ng.