# Threading a random number generator function in python

West Source

I have the following code in python:

``````results=[]
for i in range(1,7000000):
a=(random.sample(range(1, 45), 6))
results.append(a)
``````

Is there a way I can use threading or any other method to make this code run faster? At the moment its just taking forever,over 20 minutes.

There's not a lot to be gained from threading here, thanks to the GIL, but this is a problem that can be solved with `numpy`, which can perform the work entirely at the C layer, saving a ton of time and memory to boot. A 7M by 6 sized 2D array with values in the given range can be created in less than a second with:

``````import numpy as np

results = np.random.randint(1, 45, (7000000, 6), np.uint8)
``````

This going to be faster in general, and much more memory efficient; a 7M long `list` of six-`tuple`s will (on a 64 bit version of Python) occupy an absolute minimum of around 700 MB (likely more, given allocator overhead). The `numpy` array will occupy about 40 MB. It's also pretty easy to show that the creation of this `list` with all the inner `tuple`s has unavoidable costs; microbenchmarking the `numpy` array alone shows that all the random number generation only takes around 420 ms, but converting from the `numpy` array to a `list` of six-`tuple`s in the most efficient way brings the cost up to 12.5 seconds; if your machine is similar to mine, that's essentially a cap on the performance of any pure Python solution, because it's the raw cost paid by Python to create the `tuple`s and populate the `list`:

``````>>> %timeit -r5 arr = np.random.randint(1, 45, (7000000, 6), np.uint8)
420 ms ± 875 µs per loop (mean ± std. dev. of 5 runs, 1 loop each)

>>> %timeit -r5 arr = list(map(tuple, np.random.randint(1, 45, (7000000, 6), np.uint8)))
12.5 s ± 254 ms per loop (mean ± std. dev. of 5 runs, 1 loop each)
``````

`np.random.randint(1, 45, (7000000, 6), np.uint8).tolist()` goes faster the `list(map(tuple, ...))` (takes around 2.5 seconds), but again, that's only possible because of the C level accelerator assistance (and it would use even more memory, thanks to `list`s being somewhat less memory efficient).

Without `numpy`, the best I can suggest is to avoiding recreating the `range` over and over, by creating it once outside the loop and reusing it, e.g.:

``````choices = tuple(range(1, 45))  # tuple is generally the fastest structure to index
results = []
for i in range(1, 7000000):
a = random.sample(choices, 6)
results.append(a)
``````

That's not likely to save a whole lot though; the `random` module does an awful lot of Python level work wrapping 1-2 C level randomness generators, and that Python level work is going to be much slower than anything a fully accelerated C module could do.