Running mean of numpy ndarrays from iterator

Hal T Source

The question of how to compute a running mean of a series of numbers has been asked and answered before. However, I am trying to compute the running mean of a series of ndarrays, with an unknown length of series. So, for example, I have an iterator data where I would do:

running_mean = np.zeros((1000,3))
while True:
    datum = next(data)
    running_mean = calc_running_mean(datum)

What would calc_running_mean look like? My primary concern here is memory, as I can't have the entirety of the data in memory, and I don't know how much data I will be receiving. datum would be an ndarray, let's say that for this example it's a (1000,3) array, and the running mean would be an array of the same size, with each element containing the elementwise mean of every element we've seen in that position so far.

The key distinction this question has from previous questions is that it's calculating the elementwise mean of a series of ndarrays, and the number of arrays is unknown.

pythonnumpy

Answers

answered 1 week ago Paul Panzer #1

You can use itertools together with standard operators:

>>> import itertools, operator
>>> running_sum = itertools.accumulate(data)
>>> running_mean = map(operator.truediv, running_sum, itertools.count(1))

Example:

>>> data = (np.linspace(-i, i*i, 6) for i in range(10))
>>> 
>>> running_sum = itertools.accumulate(data)
>>> running_mean = map(operator.truediv, running_sum, itertools.count(1))
>>> 
>>> for i in running_mean:
...     print(i)
... 
[0. 0. 0. 0. 0. 0.]
[-0.5 -0.3 -0.1  0.1  0.3  0.5]
[-1.         -0.46666667  0.06666667  0.6         1.13333333  1.66666667]
[-1.5 -0.5  0.5  1.5  2.5  3.5]
[-2.  -0.4  1.2  2.8  4.4  6. ]
[-2.5        -0.16666667  2.16666667  4.5         6.83333333  9.16666667]
[-3.   0.2  3.4  6.6  9.8 13. ]
[-3.5  0.7  4.9  9.1 13.3 17.5]
[-4.          1.33333333  6.66666667 12.         17.33333333 22.66666667]
[-4.5  2.1  8.7 15.3 21.9 28.5]

comments powered by Disqus