Get unique values of a matrix and their indexes in a vectorized way (without a loop)

eric lardon

I have to process huge matrices of shape (150000, 120000).

For that reason, I'm looking for an efficient, vectorized way to get the values of a given matrix and their indexes without a loop, because I have about 90,000 matrices to process.
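For scale, here is a back-of-the-envelope size estimate. It assumes each matrix is stored as a dense NumPy integer array; the dtype is not stated in the question, so a few common ones are shown:

```python
import numpy as np

rows, cols = 150_000, 120_000
n_cells = rows * cols  # 18,000,000,000 cells per matrix

# bytes per matrix for a few integer dtypes, in decimal gigabytes (assumed dense storage)
sizes_gb = {
    np.dtype(t).name: n_cells * np.dtype(t).itemsize / 1e9
    for t in (np.int16, np.int32, np.int64)
}
print(sizes_gb)  # {'int16': 36.0, 'int32': 72.0, 'int64': 144.0}
```

At these sizes even a single matrix is tens of gigabytes, so 90,000 of them can only be processed one (or a chunk) at a time. `int16` is only an option if every label value fits in its range (up to 32,767).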

Here is an example for the sake of illustration:

import numpy as np
matrix_lables = np.random.randint(30, size=(5, 8))

array([[24, 18,  4, 17, 24,  0,  3, 26],
       [21, 11, 14,  9,  3, 27, 18, 14],
       [25, 26, 27, 16, 26, 27, 21, 26],
       [ 3, 29, 28,  2, 22, 10, 29, 28],
       [21, 29,  0,  3, 13, 18,  6,  1]])

Then I get the unique values:

unique_labels=np.unique(matrix_lables)

For each label, I collect the set of indexes where that value occurs in the matrix:

dictionnary = []
for p in unique_labels:
    # (row, col) coordinates of every cell equal to p
    z = np.argwhere(matrix_lables == p)
    label_index = {p: z}
    dictionnary.append(label_index)

How can I avoid doing this in a loop?
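One common vectorized alternative, offered as a sketch rather than the only way: sort the flattened matrix once with a stable `np.argsort`, so equal labels become contiguous runs, then cut the sorted coordinates at the start of each run. The function name `label_coordinates` is hypothetical; for each label `p` it should return the same coordinates as `np.argwhere(matrix_lables == p)`, but it touches every cell once instead of once per label.

```python
import numpy as np

def label_coordinates(matrix):
    """Map every label in `matrix` to a (k, 2) array of its (row, col) coordinates."""
    flat = matrix.ravel()
    # one stable sort groups equal labels and keeps each group in row-major order
    order = np.argsort(flat, kind="stable")
    # first position of each label inside the sorted array
    labels, starts = np.unique(flat[order], return_index=True)
    # flat positions -> (row, col) pairs, same layout as np.argwhere
    coords = np.column_stack(np.unravel_index(order, matrix.shape))
    # cut the sorted coordinates at each label boundary
    groups = np.split(coords, starts[1:])
    return dict(zip(labels.tolist(), groups))

matrix_lables = np.array([[24, 18,  4, 17, 24,  0,  3, 26],
                          [21, 11, 14,  9,  3, 27, 18, 14],
                          [25, 26, 27, 16, 26, 27, 21, 26],
                          [ 3, 29, 28,  2, 22, 10, 29, 28],
                          [21, 29,  0,  3, 13, 18,  6,  1]])
label_index = label_coordinates(matrix_lables)
print(label_index[3].tolist())  # [[0, 6], [1, 4], [3, 0], [4, 3]]
```

`np.split` still produces one small array per label, but that is cheap slicing; the expensive part (reading every cell) happens once inside `argsort` instead of once per label.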

When I process thousands of matrices, each with about 15,000 labels, this becomes very time-consuming.
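A rough operation count using the sizes quoted in this question shows why. It assumes the per-label loop pays at least one full scan of the matrix for each `tmp_matrix == p`, while a sort-based grouping sorts the n cells once; constants differ, so only the orders of magnitude are meaningful:

```latex
\begin{aligned}
n &= 150\,000 \times 120\,000 = 1.8 \times 10^{10} \text{ cells}, \qquad L \approx 15\,000 \text{ labels} \\
\text{per-label loop:}\quad n \cdot L &\approx 2.7 \times 10^{14} \text{ comparisons per matrix} \\
\text{one sort:}\quad n \log_2 n &\approx 1.8 \times 10^{10} \times 34 \approx 6.1 \times 10^{11}
\end{aligned}
```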

stored_matrices  # the variable that stores the 90,000 matrices

The full algorithm that processes all the matrices is as follows:

full_dictionnary = []
for tmp_matrix in stored_matrices:
    unique_labels = np.unique(tmp_matrix)
    dictionnary = []
    for p in unique_labels:
        # (row, col) coordinates of every cell equal to p
        z = np.argwhere(tmp_matrix == p)
        label_index = {p: z}
        dictionnary.append(label_index)
    # one list of per-label dicts per matrix (appended once, after the inner loop)
    full_dictionnary.append(dictionnary)
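A sketch of the full algorithm with only the outer loop over matrices left. It swaps the list of single-entry dicts for a compact, CSR-like layout (three arrays per matrix), which avoids creating roughly 15,000 small Python objects per matrix. The names `compact_label_index` and `lookup`, and the random stand-in for `stored_matrices`, are hypothetical:

```python
import numpy as np

def compact_label_index(matrix):
    """Return (labels, offsets, coords) for one matrix.

    coords[offsets[i]:offsets[i + 1]] are the (row, col) pairs where labels[i] occurs.
    """
    labels, inverse, counts = np.unique(matrix, return_inverse=True, return_counts=True)
    # ravel() so this works whether or not the NumPy version returns `inverse` flattened
    order = np.argsort(inverse.ravel(), kind="stable")
    coords = np.column_stack(np.unravel_index(order, matrix.shape))
    # cumulative counts mark where each label's block of coordinates begins and ends
    offsets = np.concatenate(([0], np.cumsum(counts)))
    return labels, offsets, coords


def lookup(index, label):
    """Coordinates of `label` in one compact index; an empty (0, 2) array if absent."""
    labels, offsets, coords = index
    i = np.searchsorted(labels, label)
    if i == len(labels) or labels[i] != label:
        return np.empty((0, 2), dtype=coords.dtype)
    return coords[offsets[i]:offsets[i + 1]]


# hypothetical stand-in for stored_matrices: 10 small random label matrices
rng = np.random.default_rng(0)
stored_matrices = rng.integers(80, size=(10, 5, 8))
# the only Python loop left runs once per matrix, not once per label
full_index = [compact_label_index(m) for m in stored_matrices]
```

Because `labels` comes back sorted from `np.unique`, `np.searchsorted` finds a label's block in O(log L) without a dict.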

Example:

cd = np.random.randint(80, size=(10, 5, 8))
indexes = []
labels = []
for m in np.arange(len(cd)):
    tmp_matrix = cd[m]
    unique_labels = np.unique(tmp_matrix)
    for p in unique_labels:
        z = np.argwhere(tmp_matrix == p)  # (row, col) of every cell equal to p
        indexes.append(z)
        labels.append(p)

The output:

indexes[0]   # x and y coordinates
array([[[[ 3, 51, 14, 28, 50, 30, 16, 40],
         [20, 63, 31,  7, 39, 14, 38, 12],
         [18, 14, 71, 46, 22, 67, 29, 58],
         [34, 10, 70, 65, 18,  7, 69,  7],
         [57, 76, 63, 61, 12, 58, 28, 70]],

        [[ 3, 66,  1, 19, 72, 18, 24, 35],
         [56, 68, 50, 26, 47, 48, 42, 18],
         [74, 18, 52, 40, 37, 38, 55, 66],
         [75, 29, 51, 20, 38, 11, 40, 51],
         [39, 71, 51, 63, 72, 24, 48, 24]]]])

labels[0]=4
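If all matrices share one shape and fit in memory together, as `cd` does in the example, the outer loop can be removed as well: view the stack as (matrices × cells), sort each row independently, and start a new group at the first cell of every matrix and wherever the label changes. This is a sketch; the function name `stack_label_coordinates` is hypothetical. It is meant to reproduce the `indexes` and `labels` lists of the example loop (using `np.argwhere` coordinates), in the same order.

```python
import numpy as np

def stack_label_coordinates(stack):
    """Group (row, col) coordinates by label for every matrix of a 3-D stack at once.

    Returns (matrix_ids, labels, indexes): indexes[g] holds the coordinates of
    labels[g] inside matrix matrix_ids[g]; groups are ordered by matrix, then label.
    """
    n_matrices = stack.shape[0]
    flat = stack.reshape(n_matrices, -1)
    # sort each matrix's cells independently; "stable" keeps ties in row-major order
    order = np.argsort(flat, axis=1, kind="stable")
    sorted_labels = np.take_along_axis(flat, order, axis=1)
    # a new group starts at the first cell of every matrix and wherever the label changes
    is_start = np.ones(sorted_labels.shape, dtype=bool)
    is_start[:, 1:] = sorted_labels[:, 1:] != sorted_labels[:, :-1]
    rows, cols = np.unravel_index(order, stack.shape[1:])
    coords = np.stack((rows, cols), axis=-1).reshape(-1, 2)
    starts = np.flatnonzero(is_start)
    matrix_ids = starts // flat.shape[1]
    labels = sorted_labels.ravel()[starts]
    indexes = np.split(coords, starts[1:])
    return matrix_ids, labels, indexes


cd = np.random.randint(80, size=(10, 5, 8))
matrix_ids, labels, indexes = stack_label_coordinates(cd)
```

For matrices of shape (150000, 120000) a whole stack will not fit in memory (a single matrix is already about 72 GB at 4 bytes per cell), so in practice this would run per matrix or per chunk of matrices.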
Tags: arrays, python-3.x, numpy, dictionary, matrix
