I think I may be misunderstanding some fundamental aspect of MPI here. Running with 4 processes, here is the timing breakdown for each process:
13.00 real 4.99 user 3.05 sys
13.00 real 4.84 user 2.87 sys
13.00 real 4.91 user 2.84 sys
13.00 real 5.18 user 3.04 sys
So it seems like the CPU time for each process is only around 5 seconds, yet the total elapsed time is much longer than that. Why is this? If the processes are running in parallel, shouldn't the total elapsed time be much closer to 5 seconds?
If I execute with 1 process, I get
9.92 real 8.49 user 0.88 sys
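As a rough check on the figures above, the sketch below recomputes, for each run, the CPU time charged to a process (user + sys) and the rest of the wall-clock time (real - (user + sys)), during which the process was not running on a CPU, for example because it was blocked, waiting on I/O, or descheduled. The numbers are copied from the output above; the helper name `breakdown` is hypothetical.

```python
# Timings from the question, in seconds (real / user / sys).
parallel_runs = [  # one entry per process in the 4-process run
    {"real": 13.00, "user": 4.99, "sys": 3.05},
    {"real": 13.00, "user": 4.84, "sys": 2.87},
    {"real": 13.00, "user": 4.91, "sys": 2.84},
    {"real": 13.00, "user": 5.18, "sys": 3.04},
]
serial_run = {"real": 9.92, "user": 8.49, "sys": 0.88}


def breakdown(run):
    """Hypothetical helper: return (CPU seconds, seconds not on a CPU)."""
    cpu = run["user"] + run["sys"]  # CPU time includes sys, not just user
    return round(cpu, 2), round(run["real"] - cpu, 2)


for rank, run in enumerate(parallel_runs):
    cpu, idle = breakdown(run)
    print(f"process {rank}: cpu={cpu:.2f}s  not on CPU={idle:.2f}s")

# CPU time summed over all 4 processes vs. the single-process run.
total_cpu = round(sum(breakdown(r)[0] for r in parallel_runs), 2)
serial_cpu, serial_idle = breakdown(serial_run)
speedup = serial_run["real"] / parallel_runs[0]["real"]
print(f"total CPU, 4 processes: {total_cpu:.2f}s vs 1 process: {serial_cpu:.2f}s")
print(f"speedup: {speedup:.2f}x")
```

Counting sys as well as user time, each of the 4 processes is charged roughly 7.7 to 8.2 s of CPU time and spends about 4.8 to 5.3 s of its 13 s off the CPU. Summed over the 4 processes that is about 31.72 s of CPU time versus 9.37 s for the single-process run, and the 4-process run is about 0.76x as fast as the serial one.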
The code in question assembles a fairly large matrix (120,000 x 120,000) and solves a linear system. The only part of the code I can think of that would be slow is the linear solve, but I'm using the superlu_dist package for that, which is specifically designed for solving in parallel.
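One way to see where the elapsed time actually goes, rather than assuming it is the solve, is to time each phase separately and record both wall-clock and CPU time. The sketch below does this in plain Python with `time.perf_counter` (wall clock) and `time.process_time` (user + sys CPU time of the current process). The phase names and the stand-in workloads are hypothetical placeholders for the matrix assembly and the superlu_dist solve. In an MPI code the same idea would typically use `MPI_Wtime` on each rank, with the per-rank times combined, for example via `MPI_Reduce` with `MPI_MAX`. One caveat: many MPI implementations poll actively while waiting for messages, so time spent waiting on other ranks may show up as user or sys CPU time rather than as a gap between wall and CPU time.

```python
import time
from contextlib import contextmanager

# Per-phase results: {phase name: {"wall", "cpu", "not_on_cpu"}} in seconds.
timings = {}


@contextmanager
def phase(name):
    """Record wall-clock and CPU time (this process only) spent in a block."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    try:
        yield
    finally:
        wall = time.perf_counter() - wall0
        cpu = time.process_time() - cpu0  # user + sys, excludes sleeping
        timings[name] = {"wall": wall, "cpu": cpu, "not_on_cpu": wall - cpu}


# Hypothetical stand-in workloads: one compute-bound phase, and one that
# blocks, the way a process might block while waiting on other processes.
with phase("assemble"):
    sum(i * i for i in range(2_000_000))
with phase("solve"):
    time.sleep(0.5)

for name, t in timings.items():
    print(f"{name}: wall={t['wall']:.2f}s cpu={t['cpu']:.2f}s "
          f"not on CPU={t['not_on_cpu']:.2f}s")
```

A phase whose wall time is well above its CPU time (like the `sleep` stand-in here) is one where the process was blocked rather than computing.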
So can someone help me to understand these results?