I think I may be misunderstanding some fundamental aspect of MPI. Using 4 processes, here is the timing breakdown from `time ./main`:

```
13.00 real 4.99 user 3.05 sys
13.00 real 4.84 user 2.87 sys
13.00 real 4.91 user 2.84 sys
13.00 real 5.18 user 3.04 sys
```

So it seems the user CPU time for each process is only around 5 seconds, yet the total elapsed time is much longer than that. Why is this? If the processes are running in parallel, shouldn't the total elapsed time be much closer to 5 seconds?

If I execute with 1 process, I get

```
9.92 real 8.49 user 0.88 sys
```
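To make the gap concrete, here is a small sketch that works out how much of the wall-clock time is not covered by CPU time (user + sys) in each run. The numbers are copied from the `time` output above; the helper names `cpu_seconds` and `off_cpu_seconds` are just ones I made up for illustration, and treating "real minus CPU" as waiting time is my assumption.

```python
# (real, user, sys) in seconds, copied from the `time` output above.
runs_4_procs = [
    (13.00, 4.99, 3.05),
    (13.00, 4.84, 2.87),
    (13.00, 4.91, 2.84),
    (13.00, 5.18, 3.04),
]
run_1_proc = (9.92, 8.49, 0.88)


def cpu_seconds(real, user, sys):
    """CPU time actually consumed by one process (user + sys)."""
    return user + sys


def off_cpu_seconds(real, user, sys):
    """Wall-clock time not accounted for by CPU time.

    Assumption: this is time spent blocked or waiting rather than computing.
    """
    return real - cpu_seconds(real, user, sys)


# Per-process breakdown for the 4-process run.
for i, run in enumerate(runs_4_procs):
    print(f"process {i}: cpu={cpu_seconds(*run):.2f}s "
          f"off-cpu={off_cpu_seconds(*run):.2f}s")

# Same breakdown for the single-process run.
print(f"single:    cpu={cpu_seconds(*run_1_proc):.2f}s "
      f"off-cpu={off_cpu_seconds(*run_1_proc):.2f}s")

# Total CPU work summed over all processes in the 4-process run.
total_cpu_4 = sum(cpu_seconds(*run) for run in runs_4_procs)
print(f"total cpu, 4 processes: {total_cpu_4:.2f}s")
```

By this measure each of the 4 processes spends roughly 5 seconds off the CPU, compared with about half a second in the single-process run, and the CPU time summed over all 4 processes is more than three times that of the single-process run.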

The code in question assembles a fairly large matrix (120,000 x 120,000) and solves a linear system. The only place I can think of that would be slowing things down is the linear solve, but I'm using the `superlu_dist` package for that, which is specifically meant for solving in parallel.

So can someone help me to understand these results?
