Python: dataframe manipulation and aggregation in pandas

Liquidity Source

I have the following df:

import pandas as pd

dfdict = {'letter': ['a', 'a', 'a', 'b', 'b'], 'category': ['foo', 'foo', 'bar', 'bar', 'spam']}
df1 = pd.DataFrame(dfdict)

  category  letter
0   foo      a
1   foo      a
2   bar      a
3   bar      b
4   spam     b

I want to produce an aggregated count df like this:

     a    b
foo  2    0
bar  1    1
spam 0    1

This seems like it should be an easy operation. I have figured out how to use df1 = df1.groupby(['category','letter']).size() to get:

category  letter
bar       a         1
          b         1
foo       a         2
spam      b         1

This is closer, except now I need the letters a, b along the top and the counts coming down.

Tags: python, python-3.x, pandas, dataframe

Answers

answered 5 days ago Wen #1

You can use pd.crosstab:

pd.crosstab(df1.category,df1.letter)
Out[554]: 
letter    a  b
category      
bar       1  1
foo       2  0
spam      0  1

To fix your code, add unstack:

df1.groupby(['category','letter']).size().unstack(fill_value=0)
Out[556]: 
letter    a  b
category      
bar       1  1
foo       2  0
spam      0  1
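
Both results come back with the rows sorted alphabetically (bar, foo, spam), while the table in the question lists them as foo, bar, spam. If that order matters, one possible approach is to reindex the result by order of first appearance. The sketch below is self-contained and rebuilds df1 from the question; the names via_crosstab, via_unstack, row_order and counts are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'letter': ['a', 'a', 'a', 'b', 'b'],
    'category': ['foo', 'foo', 'bar', 'bar', 'spam'],
})

# Two ways to build the same category-by-letter count table.
via_crosstab = pd.crosstab(df1['category'], df1['letter'])
via_unstack = df1.groupby(['category', 'letter']).size().unstack(fill_value=0)

# Rows are sorted alphabetically by default; reorder them to follow
# the order in which each category first appears in df1.
row_order = df1['category'].unique()
counts = via_crosstab.reindex(row_order)
print(counts)
```

Using unique() keeps the categories in their original order, so this should work without hard-coding the row labels.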
