numpy.unique sort based on counts


The numpy.unique function can return the counts of the unique elements if return_counts is True. The returned tuple then consists of two arrays, one containing the unique elements and a second containing their counts; both are ordered by the unique elements. Is there a way to have both sorted by the counts array instead? I know how to do it the hard way, but is there a concise one-liner or lambda for such cases?

Current result:

import numpy as np

my_chr_list = ["a", "a", "a", "b", "c", "b", "d", "d"]
unique_els, counts = np.unique(my_chr_list, return_counts=True)
print(unique_els, counts)

Which returns something along the lines of this:

>>> (array(['a', 'b', 'c', 'd'], 
     dtype='<U1'), array([3, 2, 1, 2], dtype=int64))

However, what I would like to get is:

>>> (array(['a', 'b', 'd', 'c'], 
     dtype='<U1'), array([3, 2, 2, 1], dtype=int64))


answered 3 months ago Kasramvd #1

You can't do this directly with the unique function. Instead, as a Numpythonic approach, you can use the return_index keyword to get the index of the first occurrence of each unique item, then use np.argsort on the negated counts to get the order that sorts the counts in descending order, and use that result to index the items by frequency.

In [33]: arr = np.array(my_chr_list)

In [34]: u, ind, count = np.unique(my_chr_list, return_counts=True, return_index=True)

In [35]: count_sort_ind = np.argsort(-count)

In [36]: arr[ind[count_sort_ind]]
Out[36]: array(['a', 'b', 'd', 'c'], dtype='<U1')

In [37]: count[count_sort_ind]
Out[37]: array([3, 2, 2, 1])
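
For reference, the same idea can be wrapped in a small helper. This is only a sketch: the name unique_by_count is made up for illustration, and it indexes the unique array directly rather than going through return_index, which should give the same elements. The kind="stable" argument is an addition not present in the session above; it is assumed here so that elements with equal counts keep the sorted order np.unique produced.

```python
import numpy as np


def unique_by_count(values):
    """Return unique elements and their counts, most frequent first.

    Hypothetical helper for illustration; not part of NumPy.
    """
    uniques, counts = np.unique(values, return_counts=True)
    # Negating the counts turns argsort's ascending order into descending.
    # A stable sort keeps tied elements in the order np.unique returned them.
    order = np.argsort(-counts, kind="stable")
    return uniques[order], counts[order]


my_chr_list = ["a", "a", "a", "b", "c", "b", "d", "d"]
unique_els, counts = unique_by_count(my_chr_list)
print(unique_els, counts)
```

With the list from the question, this should put 'a' first, then 'b' and 'd' (tied at two occurrences each), then 'c'.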
