Adding new columns based on number of unique row values in Pandas

PokerFace Source

I have a data frame with the structure below:

ID | Name | Role  
1 | John | Owner
1 | Bob | Driver
2 | Jake | Owner
2 | Tom | Driver
2 | Sally | Owner
3 | Mary | Owner
3 | Sue | Driver

I'd like to pivot the Role column and use the Name column as the values. However, some IDs (the index in this case) have more than one person in the Owner role and some don't, so the pivot_table function doesn't work. Is there a way to create a new column for each additional owner a particular ID may have? Some may have 2, 3, 4 or more owners. Thanks!

Sample output below:

ID | Owner_1 | Owner_2 | Driver
1 | John | NaN | Bob 
2 | Jake | Sally | Tom 
3 | Mary | NaN | Sue 

This is what I tried:


DataError: No numeric types to aggregate


answered 2 months ago Wen #1

You can create an additional key for the duplicated roles within each ID by using cumcount; after that, a plain pivot works.
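
A minimal sketch of this approach, assuming the question's data is held in a data frame named `df` (the variable names here are illustrative, not taken from the answer):

```python
import pandas as pd

# Sample data from the question; the name `df` is an assumption.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2, 3, 3],
    "Name": ["John", "Bob", "Jake", "Tom", "Sally", "Mary", "Sue"],
    "Role": ["Owner", "Driver", "Owner", "Driver", "Owner", "Owner", "Driver"],
})

# Number repeated roles within each ID: Owner_1, Owner_2, ...
counter = df.groupby(["ID", "Role"]).cumcount() + 1
keyed = df.assign(Role=df["Role"] + "_" + counter.astype(str))

# Every (ID, Role) pair is now unique, so no aggregation is needed.
wide = keyed.pivot(index="ID", columns="Role", values="Name")
```

Because the counter is computed per (ID, Role) group, an ID with three or four owners would simply produce Owner_3 and Owner_4 columns.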

Role Driver_1 Owner_1 Owner_2
1         Bob    John    None
2         Tom    Jake   Sally
3         Sue    Mary    None

answered 2 months ago DYZ #2

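You need to change the default aggregation function from mean, which fails on strings, to one that can combine them, for example ' '.join:

# Join all names that share an ID and Role into one space-separated string
pivoted = pd.pivot_table(df, values='Name', 
                         index='ID', columns='Role', aggfunc=' '.join)
#Role  Driver        Owner
#1       Bob          John
#2       Tom    Jake Sally
#3       Sue          Mary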

Now, some owners are represented as multiword strings. Split them into individual words:

result = pivoted.join(pivoted['Owner'].str.split().apply(pd.Series))\
       .drop("Owner", axis=1)
#    Driver     0      1
#1     Bob   John    NaN
#2     Tom   Jake  Sally
#3     Sue   Mary    NaN

result.columns = "Driver", "Owner_1", "Owner_2"
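
The column names assigned above are hard-coded for at most two owners. A sketch that numbers the owner columns from the data instead, assuming the data frame is named `df` and that no name contains whitespace (both are assumptions):

```python
import pandas as pd

# Sample data from the question; the name `df` is an assumption.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2, 3, 3],
    "Name": ["John", "Bob", "Jake", "Tom", "Sally", "Mary", "Sue"],
    "Role": ["Owner", "Driver", "Owner", "Driver", "Owner", "Owner", "Driver"],
})

# Combine the names that share an ID and Role into one space-separated string.
pivoted = pd.pivot_table(df, values="Name", index="ID",
                         columns="Role", aggfunc=" ".join)

# Split the combined owners into one column per owner.
owners = pivoted["Owner"].str.split(expand=True)

# Name the columns Owner_1, Owner_2, ... for however many there are.
owners.columns = [f"Owner_{i + 1}" for i in owners.columns]

result = pivoted.drop(columns="Owner").join(owners)
```

This only works when names are single words, since the split is on whitespace; names such as "Mary Ann" would need a different separator.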
