Why is there no speed up when using OpenMP to generate random numbers?

James

I want to run a type of Monte Carlo simulation that requires generating random numbers and then executing a set of instructions based on those numbers.

I wish to make use of parallel processing, but when testing my code (written in C) there seems to be an inverse speed-up with more cores! I'm not sure what I could be doing wrong. I then copied the code from another answer and still get this effect.

The code, slightly modified from that answer, is

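#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define NRANDS 1000000

int main(void) {

    int a[NRANDS];

    #pragma omp parallel default(none) shared(a)
    {
        int i;
        /* rand_r keeps its state in the caller's seed, so each thread uses a private one */
        unsigned int myseed = omp_get_thread_num();
        #pragma omp for
        for(i=0; i<NRANDS; i++)
            a[i] = rand_r(&myseed);
    }
    /* serial sum, so the generated values are actually used */
    double sum = 0.;
    for (long int i=0; i<NRANDS; i++) {
        sum += a[i];
    }
    printf("sum = %lf\n", sum);

    return 0;
}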

I timed each run with the time command in a terminal, and varied the number of threads allowed using export OMP_NUM_THREADS=2 (and likewise for the other thread counts). The output of my terminal is:

Thread total: 1
sum = 1074808568711883.000000
real    0m0,041s
user    0m0,036s
sys 0m0,004s

Thread total: 2
sum = 1074093295878604.000000
real    0m0,037s
user    0m0,058s
sys 0m0,008s

Thread total: 3
sum = 1073700114076905.000000
real    0m0,032s
user    0m0,061s
sys 0m0,010s

Thread total: 4
sum = 1073422298606608.000000
real    0m0,035s
user    0m0,074s
sys 0m0,024s
Tags: c, multithreading, random, parallel-processing, openmp

Answers

answered 3 months ago TypeIA #1

Note that the time command adds up the time spent on all cores when it prints the user and sys values. Observe that your wall time (real) is nearly constant.

Also, your benchmark is too small. There is a significant cost of creating and managing threads. This overhead may be overshadowing the actual execution time of the random number generation. A million values isn't that many. In other words, the time taken to actually compute the random numbers is so small that it's lost in the noise and dwarfed by the setup/teardown costs. If you generate a whole lot more, you may start to see the advantage due to parallelism.
