Sorting a CSV file based on two columns in Unix

Preethi Babu

I'm a beginner in Unix shell scripting. I'm trying to sort a CSV file based on two columns.

My file looks like this:

sh-4.4$ cat test.csv
603,02,0123456,1111,201806131115
603,20,0123456,1111,201806131115
603,02,9876542,2222,201806131215
603,20,9876542,2222,201806131215
603,02,0123456,1111,201806131117
603,20,0123456,1111,201806131117

I want to group by the 3rd column, with the 2nd column also ordered as shown below:

603,20,0123456,1111,201806131115
603,02,0123456,1111,201806131115
603,20,0123456,1111,201806131117
603,02,0123456,1111,201806131117
603,20,9876542,2222,201806131215
603,02,9876542,2222,201806131215

I tried sort -t',' -k3 -k2 test.csv. This groups by column 3, but column 2 is not ordered the way I want. Its output is:

603,02,0123456,1111,201806131115
603,20,0123456,1111,201806131115
603,02,0123456,1111,201806131117
603,20,0123456,1111,201806131117
603,02,9876542,2222,201806131215
603,20,9876542,2222,201806131215

I also tried sort -t',' -k3 -rk2 test.csv. This sorts column 2 the way I want, but column 3 is no longer sorted as I expected. Its output is:

603,20,9876542,2222,201806131215
603,02,9876542,2222,201806131215
603,20,0123456,1111,201806131117
603,02,0123456,1111,201806131117
603,20,0123456,1111,201806131115
603,02,0123456,1111,201806131115

Any help on this is much appreciated. Suggestions for sorting with awk are also welcome.

linux csv sorting unix awk

Answers

answered 4 months ago karakfa #1

Restrict each sort key to a single field:

$ sort -t, -k3,3 -k2,2 file

should do.

Note, however, that the output you want doesn't match the spec you describe. You'll get:

603,02,0123456,1111,201806131115
603,02,0123456,1111,201806131117
603,20,0123456,1111,201806131115
603,20,0123456,1111,201806131117
603,02,9876542,2222,201806131215
603,20,9876542,2222,201806131215

i.e. grouped by the third field only, and sorted by the second field within each group.

Perhaps this is what you wanted?

$ sort -t, -k3 -k2,2r file

603,20,0123456,1111,201806131115
603,02,0123456,1111,201806131115
603,20,0123456,1111,201806131117
603,02,0123456,1111,201806131117
603,20,9876542,2222,201806131215
603,02,9876542,2222,201806131215

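Note that -k3 means the key starts at the 3rd field and runs to the end of the line, which seems to be what you want, judging by the order of the last fields. Then -k2,2r reorders the rows that tie on that key by the 2nd field in reverse order. Attaching r to the key is what matters here: in your -k3 -rk2 attempt, -r is a global option, so it applies to every key that has no modifiers of its own and reverses both keys.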
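If the intent is "group by field 3, then order by the timestamp, then by field 2 in reverse", the keys can also be spelled out one field at a time, so the result no longer depends on whatever happens to follow field 3 on the line. A minimal sketch, assuming field 5 is a timestamp (the question doesn't say so) and using a hypothetical sample helper to reproduce the rows from the question:

```shell
#!/bin/sh
# Hypothetical helper: prints the sample rows from the question.
sample() {
  printf '%s\n' \
    603,02,0123456,1111,201806131115 \
    603,20,0123456,1111,201806131115 \
    603,02,9876542,2222,201806131215 \
    603,20,9876542,2222,201806131215 \
    603,02,0123456,1111,201806131117 \
    603,20,0123456,1111,201806131117
}

# LC_ALL=C: byte-wise comparison, so the result does not depend on the locale.
# Key 1: field 3 only (the ID to group by).
# Key 2: field 5 only (assumed to be a timestamp).
# Key 3: field 2 only, reversed, so "20" comes before "02" within a tie.
sample | LC_ALL=C sort -t, -k3,3 -k5,5 -k2,2r
```

On this sample it prints the same six lines as sort -t, -k3 -k2,2r above.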

NB: If your numerical fields are not zero-padded, you may want to add the -n option to request numerical rather than lexical ordering. Here it doesn't make a difference.
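To see where -n matters, here is a small sketch with made-up rows (hypothetical, not from the question) whose 2nd field is not zero-padded. Attaching n (and optionally r) to the key, as in -k2,2n, changes the ordering for that key only:

```shell
#!/bin/sh
# Hypothetical rows whose 2nd field is NOT zero-padded.
rows() {
  printf '%s\n' 603,10,a 603,9,b 603,2,c
}

# Lexical ordering on field 2 compares character by character,
# so "10" sorts first because '1' < '2' < '9'.
rows | LC_ALL=C sort -t, -k2,2
# -> 603,10,a / 603,2,c / 603,9,b

# Numeric ordering on field 2 only.
rows | LC_ALL=C sort -t, -k2,2n
# -> 603,2,c / 603,9,b / 603,10,a

# Numeric and reversed on field 2 only.
rows | LC_ALL=C sort -t, -k2,2nr
# -> 603,10,a / 603,9,b / 603,2,c
```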
