I have a data frame with the structure below:
ID | Name  | Role
1  | John  | Owner
1  | Bob   | Driver
2  | Jake  | Owner
2  | Tom   | Driver
2  | Sally | Owner
3  | Mary  | Owner
3  | Sue   | Driver
I'd like to pivot the Role column and use the Name column as the values. However, some IDs (the index in this case) have more than one person in the Owner role and some don't, so the pivot_table function doesn't work. Is there a way to create a new column for each additional owner a particular ID may have? Some may have 2, 3, 4 or more owners. Thanks!
Sample output below:
ID | Owner_1 | Owner_2 | Driver
1  | John    | NaN     | Bob
2  | Jake    | Sally   | Tom
3  | Mary    | NaN     | Sue
This is what I tried:
    pd.pivot_table(df, values='Name', index='ID', columns='Role')
    DataError: No numeric types to aggregate
You can create an additional key for the duplicate roles within each ID by using cumcount, then simply use pivot (with keyword arguments, since recent pandas versions no longer accept them positionally):

    df.Role = df.Role + '_' + df.groupby(['ID', 'Role']).cumcount().add(1).astype(str)
    df.pivot(index='ID', columns='Role', values='Name')
    Out:
    Role Driver_1 Owner_1 Owner_2
    ID
    1         Bob    John    None
    2         Tom    Jake   Sally
    3         Sue    Mary    None
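The question asks for a plain Driver column rather than Driver_1. A minimal sketch of one way to get that, assuming the sample data from the question: suffix only the roles that repeat within at least one ID. The names `n`, `counts`, `repeated`, `key`, and `wide` are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample frame rebuilt from the question
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2, 3, 3],
    "Name": ["John", "Bob", "Jake", "Tom", "Sally", "Mary", "Sue"],
    "Role": ["Owner", "Driver", "Owner", "Driver", "Owner", "Owner", "Driver"],
})

# Running number (1, 2, ...) of each role within an ID, as strings
n = df.groupby(["ID", "Role"]).cumcount().add(1).astype(str)

# Roles that occur more than once for at least one ID
counts = df.groupby(["ID", "Role"]).size()
repeated = counts[counts > 1].index.get_level_values("Role").unique()

# Keep the bare role name unless the role repeats somewhere
key = df["Role"].where(~df["Role"].isin(repeated), df["Role"] + "_" + n)

# 'key' is a hypothetical helper column used only for the pivot
wide = df.assign(key=key).pivot(index="ID", columns="key", values="Name")
print(wide)
```

Because the suffixed labels are built on a copy via `assign`, the original Role column in `df` is left unchanged.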
You need to change the default aggregation function from 'mean', which only works on numeric data, to one that joins the names with a space:

    pivoted = pd.pivot_table(df, values='Name', index='ID', columns='Role', aggfunc=' '.join)
    #Role Driver      Owner
    #ID
    #1       Bob       John
    #2       Tom Jake Sally
    #3       Sue       Mary
Now, some owners are represented as multiword strings. Split them into individual words:
    result = pivoted.join(pivoted['Owner'].str.split().apply(pd.Series))\
                    .drop("Owner", axis=1)
    #   Driver     0      1
    #ID
    #1     Bob  John    NaN
    #2     Tom  Jake  Sally
    #3     Sue  Mary    NaN
    result.columns = "Driver", "Owner_1", "Owner_2"
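The column names above are hard-coded for two owners, while the question mentions IDs with 3, 4 or more. A hedged sketch of the same join-then-split idea that derives the Owner_N names from however many owners turn up; it assumes no name contains whitespace, and the variable `owners` is illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the question
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2, 3, 3],
    "Name": ["John", "Bob", "Jake", "Tom", "Sally", "Mary", "Sue"],
    "Role": ["Owner", "Driver", "Owner", "Driver", "Owner", "Owner", "Driver"],
})

# Join all names sharing an (ID, Role) pair into one space-separated string
pivoted = pd.pivot_table(df, values="Name", index="ID", columns="Role",
                         aggfunc=" ".join)

# One column per owner; IDs with fewer owners get missing values
owners = pivoted["Owner"].str.split(expand=True)
owners.columns = [f"Owner_{i + 1}" for i in range(owners.shape[1])]

# Swap the combined Owner column for the numbered ones
result = pivoted.drop(columns="Owner").join(owners)
print(result)
```

If names can contain spaces, a separator that cannot occur in a name would be needed for both the join and the split.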