Applying statistics methods on numpy arrays: unexpected results

PatrickT

Please explain.

import statistics
x = [0,1]
statistics.mean(x) 
## 0.5

But:

import numpy 
import statistics
x = numpy.array([0,1])
statistics.mean(x) 
## 0

I'm pretty sure it's a basic, well-known, over-discussed issue: please link to a duplicate, as I couldn't find one.

Tags: python, python-3.x, numpy, statistics, python-internals

Answers

answered 5 days ago jpp #1

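The reason is that statistics.mean converts its result back to the type of the input data, using an internal helper, statistics._convert. That helper checks whether the target type is a subclass of int and, if so, returns a float whenever the result is not a whole number. The check succeeds for int but not for np.int32, which is not a subclass of int, so the mean 1/2 is passed straight to np.int32 and truncated to 0.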

import statistics
from fractions import Fraction

import numpy as np

a = statistics._convert(Fraction('1/2'), int)       # 0.5
b = statistics._convert(Fraction('1/2'), np.int32)  # 0

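For reference, the relevant source from the statistics module:

def _convert(value, T):
    """Convert value to given numeric type T."""
    if type(value) is T:
        return value

    # This check passes for int, but not for np.int32, which does not
    # subclass int. T then stays np.int32 and T(value) below truncates.
    if issubclass(T, int) and value.denominator != 1:
        T = float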

    try:
        return T(value)
    except TypeError:
        if issubclass(T, Decimal):
            return T(value.numerator)/T(value.denominator)
        else:
            raise
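
To see the mechanism without depending on NumPy, here is a minimal sketch that reproduces the same check with a hypothetical stand-in class. TruncatingInt is invented for illustration: like np.int32 it converts by truncation but does not subclass int. convert_like_statistics is a simplified re-implementation of the check, not the module's actual code.

```python
from fractions import Fraction


class TruncatingInt:
    """Hypothetical stand-in for np.int32: integer-like, but not a subclass of int."""

    def __init__(self, value):
        # int() truncates a Fraction toward zero, so Fraction(1, 2) becomes 0.
        self.value = int(value)

    def __repr__(self):
        return f"TruncatingInt({self.value})"


def convert_like_statistics(value, T):
    """Illustrative sketch of the int check in statistics._convert."""
    if issubclass(T, int) and value.denominator != 1:
        T = float  # only reached when T is int or a subclass of int
    return T(value)


half = Fraction(1, 2)
as_int = convert_like_statistics(half, int)             # falls back to float
as_fake = convert_like_statistics(half, TruncatingInt)  # check skipped, value truncated
print(as_int, as_fake)
```

With int as the target type the fractional result survives as a float. With the stand-in type the issubclass check is skipped and the fractional part is lost, which is the behaviour the question observed with a NumPy integer array.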

Therefore, you can either use statistics with a list, or numpy with an array:

  1. Use statistics.mean([0, 1]); or
  2. Use np.mean(np.array([0, 1])), or np.array([0, 1]).mean().
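
The two options above can be sketched as follows, assuming NumPy is installed. Converting the array with tolist() and using statistics.fmean (available from Python 3.8) are additions that the answer does not mention; treat them as suggestions rather than the recommended fix.

```python
import statistics

import numpy as np

x = np.array([0, 1])

# Option 1: hand statistics plain Python ints.
# tolist() converts the NumPy scalars to native Python types.
mean_from_list = statistics.mean(x.tolist())

# Option 2: stay within NumPy.
mean_numpy = np.mean(x)
mean_method = x.mean()

# Variation (an addition, Python 3.8+): fmean always computes in float.
mean_fmean = statistics.fmean(x)

print(mean_from_list, mean_numpy, mean_method, mean_fmean)
```

Which one fits best depends on where the data already lives: if it is an array, the NumPy methods avoid a round trip through Python objects.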
