Creating a subset of array from another array : Python

jig

I have a basic question regarding working with arrays:

a = ['c', 'b', 'a', 'a', 'b', 'b', 'c', 'a', 'a', 'b', 'b', 'c', 'a', 'a', 'b', 'a', 'c', 'b']
b = [0, 1, 0, 1, 0, 0, 0, 0, 2, 0, 1, 0, 2, 0, 0, 1, 0, 1]

I) Is there a short way to count the number of times 'c' in a corresponds to 0, 1, and 2 in b, the number of times 'b' in a corresponds to 0, 1, and 2, and so on?

II) How do I create a new array c (a subset of a) and a new array d (a subset of b) that contain only the elements whose corresponding element in a is 'c'?

python arrays

Answers

answered 2 months ago Gene Burinsky #1

The question is a bit vague, but here is a quick (some would say dirty) method using Pandas, though I think a solution that does not rely on Pandas should be preferred.

import pandas as pd

#create OP's lists
a= [ 'c', 'b', 'a',  'a', 'b', 'b', 'c', 'a', 'a', 'b', 'b', 'c', 'a', 'a', 'b', 'a', 'c', 'b']
b= [ 0, 1, 0, 1, 0, 0, 0, 0, 2, 0, 1, 0, 2, 0, 0, 1, 0, 1]

#dump lists to a Pandas DataFrame
df = pd.DataFrame({'a':a, 'b':b})

Question 1

Provided I interpreted it correctly, you can cross-tabulate the two arrays: pd.crosstab(df.a, df.b).stack(). A cross-tabulation counts the number of times each number corresponds to a particular letter. .stack() reshapes the table returned by pd.crosstab into a single long column, which is easier to read here.

#question 1
pd.crosstab(df.a, df.b).stack()

Out[9]:
a  b
a  0    3
   1    2
   2    2
b  0    4
   1    3
   2    0
c  0    4
   1    0
   2    0
dtype: int64

Question 2

Here I use Pandas boolean indexing to select only the elements in array a that correspond to the value 'c'. df.a == 'c' returns True for every value in a that is 'c' and False otherwise. df.loc[df.a == 'c', 'a'] returns the values from a for which the boolean expression is True, and df.loc[df.a == 'c', 'b'] returns the values from b in those same rows.

c = df.loc[df.a == 'c', 'a']
d = df.loc[df.a == 'c', 'b']

In [15]: c
Out[15]:
0     c
6     c
11    c
16    c
Name: a, dtype: object

In [16]: d
Out[16]:
0     0
6     0
11    0
16    0
Name: b, dtype: int64

answered 2 months ago LMD #2

Python lists (https://www.tutorialspoint.com/python/python_lists.htm) have a count method.

I suggest you first zip both lists, as said in the comments, then count the occurrences of the tuple ('c', 1) and of the tuple ('c', 0) and sum them up. That is basically what you need for (I).

For (II), if I understood you correctly, you have to take the zipped lists and apply filter to them with lambda x: x[0] == 'c', which keeps only the pairs whose element from a is 'c'.
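A minimal sketch of both suggestions, using the question's data written as plain Python lists. It assumes the filter is meant to keep the pairs whose letter is 'c'; the names pairs, c_counts and c_pairs are chosen here for illustration only.

```python
# The question's data as plain Python lists.
a = ['c', 'b', 'a', 'a', 'b', 'b', 'c', 'a', 'a',
     'b', 'b', 'c', 'a', 'a', 'b', 'a', 'c', 'b']
b = [0, 1, 0, 1, 0, 0, 0, 0, 2, 0, 1, 0, 2, 0, 0, 1, 0, 1]

# zip returns an iterator in Python 3, so materialise it to use list.count.
pairs = list(zip(a, b))

# (I) How often does 'c' line up with 0, 1 and 2?
c_counts = {n: pairs.count(('c', n)) for n in (0, 1, 2)}
print(c_counts)  # {0: 4, 1: 0, 2: 0}

# (II) Keep only the pairs whose letter is 'c', then split them back apart.
c_pairs = list(filter(lambda x: x[0] == 'c', pairs))
c = [letter for letter, _ in c_pairs]
d = [number for _, number in c_pairs]
print(c)  # ['c', 'c', 'c', 'c']
print(d)  # [0, 0, 0, 0]
```

Calling pairs.count once per tuple rescans the whole list each time, which is fine for short lists like these but slow when there are many distinct pairs.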

answered 2 months ago Srini #3

In [10]: p = ['a', 'b', 'c', 'a', 'c', 'a']

In [11]: q  = [1, 2, 1, 3, 3, 1]

In [12]: z = list(zip(p, q))  # list() is needed on Python 3, where zip returns an iterator

In [13]: z
Out[13]: [('a', 1), ('b', 2), ('c', 1), ('a', 3), ('c', 3), ('a', 1)]

In [14]: counts = {}

In [15]: for pair in z:
    ...:     if pair in counts:
    ...:         counts[pair] += 1
    ...:     else:
    ...:         counts[pair] = 1
    ...:

In [16]: counts
Out[16]: {('a', 1): 2, ('a', 3): 1, ('b', 2): 1, ('c', 1): 1, ('c', 3): 1}

In [17]: sub_p = []

In [18]: sub_q = []

In [19]: for i, element in enumerate(p):
    ...:     if element == 'a':
    ...:         sub_p.append(element)
    ...:         sub_q.append(q[i])

In [20]: sub_p
Out[20]: ['a', 'a', 'a']

In [21]: sub_q
Out[21]: [1, 3, 1]

Explanation

  1. zip takes two lists and runs a figurative zipper between them, resulting in a list of tuples.
  2. I've used a simplistic approach: I maintain a map/dictionary that makes a note of how many times it has seen each (char, int) tuple.
  3. Then I build two sub-lists; you can change the character in the comparison to see what any other letter maps to.

Alternative methods

As abarnert suggested, you could use a Counter from collections instead. Or you could just use the count method on z, e.g. z.count(('a', 1)); note that the pair is passed as a single tuple. Or you could use a defaultdict instead.
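A short sketch of these three alternatives on the same p and q lists. The names pair_counts and tally are illustrative, not part of the original answer.

```python
from collections import Counter, defaultdict

p = ['a', 'b', 'c', 'a', 'c', 'a']
q = [1, 2, 1, 3, 3, 1]

# Counter tallies every (letter, number) pair in one pass.
pair_counts = Counter(zip(p, q))
print(pair_counts[('a', 1)])  # 2

# defaultdict(int) removes the "is the key already there?" branch.
tally = defaultdict(int)
for pair in zip(p, q):
    tally[pair] += 1
print(tally[('c', 3)])  # 1

# list.count takes a single argument, so the pair is passed as one tuple.
z = list(zip(p, q))
print(z.count(('a', 1)))  # 2
```

Counter and defaultdict both make a single pass over the data, so they are the better fit when counts are needed for every pair rather than for one or two specific pairs.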
