The first decision is whether to run with one thread or two. The default option seems to be two threads, but I found with full chemistry that it was better to run with twice the number of MPI tasks and one thread.
Job is mi-ai234 (N96) for 10 days
So one thread is 5.92% quicker here. That is not the 26.3% improvement we get with full chemistry, but it is still better in this case.
At N216, job is mi-ai159 for 10 days on XCE
The difference is about 1.6%, which is well within the standard variation for an individual run anyway. We'd probably expect runs on fewer cores to be faster with one thread and runs on more cores to be faster with two threads.
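The percentages quoted above depend on which run is taken as the baseline. The following is a minimal sketch, assuming "X% quicker" means the time saved divided by the slower run's time (the page does not state the definition). The function name is hypothetical, and the sample input is the pair of 1,024-core N216 timings from the tables on this page (the row with a `* 2` thread factor being the two-thread run); it is not claimed to be the pair behind the figures quoted above.

```python
def percent_quicker(fast_s: float, slow_s: float) -> float:
    """Time saved by the faster run, as a percentage of the slower run's time.

    Assumed definition: the page does not say which baseline was used.
    """
    if slow_s <= 0:
        raise ValueError("slow_s must be positive")
    return 100.0 * (slow_s - fast_s) / slow_s


# N216 on 1,024 cores: 32 * 32 with one thread took 2,382 s,
# 32 * 16 * 2 (two threads) took 2,419 s.
one_thread_s = 2382
two_threads_s = 2419
print(f"{percent_quicker(one_thread_s, two_threads_s):.2f}% quicker with one thread")
```

Dividing by the faster run's time instead gives a slightly larger number, so it is worth stating the baseline when quoting differences this small.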
The job for doing this is mi-ai159
NMPPE*NMPPN (total) | Model run length | Total time (s) | Total time on IBM (Cray/IBM scaling) | Speed (model years per day) |
---|---|---|---|---|
16 * 16 (256, XCE) | 10 days | 7,272 | 4,415 (*1.65) | 0.330 |
32 * 16 (512, XCE) | 10 days | 4,055 | 2,526 (*1.61) | 0.592 |
32 * 32 (1,024, XCE) | 10 days | 2,382 | 1,650 (*1.44) | 1.01 |
64 * 32 (2,048, XCE) | 10 days | 1,808 | 1,284 (*1.41) | 1.33 |
64 * 36 (2,304, XCE) | 10 days | Convergence failure in BiCGstab, suspected NaNs | 1,241 | |
64 * 46 (2,944, XCE) | 10 days | N/A | | |
64 * 64 (4,096, XCE) | 10 days | Too many processors in the North-South direction ( 64) to support the extended halo size ( 7). Try running with 46 processors | 1,647 | |
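The last two columns of the table follow from the timings. The following is a minimal sketch, assuming a 360-day model calendar (an inference: it reproduces the tabulated speeds) and that the bracketed factor is the Cray time divided by the IBM time. The function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400
MODEL_DAYS_PER_YEAR = 360  # assumed calendar; it matches the tabulated speeds


def model_years_per_day(model_days: float, wallclock_s: float) -> float:
    """Model years simulated per wall-clock day."""
    model_years = model_days / MODEL_DAYS_PER_YEAR
    wallclock_days = wallclock_s / SECONDS_PER_DAY
    return model_years / wallclock_days


def cray_ibm_scaling(cray_s: float, ibm_s: float) -> float:
    """Ratio of Cray to IBM run time; above 1 means the Cray run was slower."""
    return cray_s / ibm_s


# First row of the table: 16 * 16 (256 cores), 10 model days.
print(f"{model_years_per_day(10, 7272):.3f}")  # 0.330
print(f"{cray_ibm_scaling(7272, 4415):.2f}")   # 1.65
```

The same functions apply to the one-month runs by passing 30 model days.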
The job for doing this is mi-ai159
NMPPE*NMPPN*threads (total) | Model run length | Total time (s) | Total time on IBM (Cray/IBM scaling) | Speed (model years per day) |
---|---|---|---|---|
16 * 8 * 2 (256, XCE) | 10 days | 7,445 | 4,415 (*1.69) | 0.322 |
16 * 16 * 2 (512, XCE) | 10 days | 4,002 | 2,526 (*1.59) | 0.600 |
32 * 16 * 2 (1,024, XCE) | 10 days | 2,419 | 1,650 (*1.47) | 0.992 |
32 * 32 * 2 (2,048, XCE) | 10 days | 1,750 | 1,284 (*1.36) | 1.37 |
36 * 32 * 2 (2,304, XCE) | 10 days | 1,659 | 1,241 (*1.34) | 1.45 |
64 * 32 * 2 (4,096, XCE) | 10 days | Convergence failure in BiCGstab, suspected NaNs | 1,647 | |
The job for doing this is mi-ai234.
NMPPE*NMPPN (total) | Model run length | Total time (s) | Total time on IBM (Cray/IBM scaling) | Speed (model years per day) |
---|---|---|---|---|
16 * 8 (128) | 1 month | 6,379 | 4,061 (*1.57) | 1.13 |
16 * 16 (256) | 1 month | 3,477 | 2,260 (*1.54) | 2.07 |
32 * 16 (512) | 1 month | 2,190 | 1,439 (*1.52) | 3.29 |
32 * 28 (896) | 1 month | 1,668 | 1,163 (*1.43) | 4.32 |
48 * 28 (1,344, XCE) | 1 month | 1,991 | 1,155 (*1.72) | 3.62 |
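One way to read the table is as strong-scaling speedup and parallel efficiency relative to the smallest run. This metric is not part of the original results; the following is a sketch using the Cray timings from the table above, with hypothetical names.

```python
# (cores, total time in seconds) for the one-month runs in the table above.
runs = [(128, 6379), (256, 3477), (512, 2190), (896, 1668), (1344, 1991)]

base_cores, base_time = runs[0]


def scaling(cores: int, time_s: float) -> tuple[float, float]:
    """Return (speedup, parallel efficiency) relative to the smallest run."""
    speedup = base_time / time_s
    efficiency = speedup / (cores / base_cores)
    return speedup, efficiency


for cores, time_s in runs:
    speedup, efficiency = scaling(cores, time_s)
    print(f"{cores:>5} cores: speedup {speedup:.2f}, efficiency {efficiency:.0%}")

# The configuration with the shortest wall-clock time.
fastest_cores, fastest_time = min(runs, key=lambda r: r[1])
print(f"fastest: {fastest_cores} cores ({fastest_time} s)")
```

On these numbers the 896-core run is the fastest and the 1,344-core run is slower, which agrees with the speed column.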
The job for doing this is mi-ak321.
NMPPE*NMPPN*threads (total) | Model run length | Total time (s) | Total time on IBM (Cray/IBM scaling) | Speed (model years per day) |
---|---|---|---|---|
16 * 8 * 2 (256, XCE) | 1 month | 4,121 | 2,260 (*1.82) | 1.75 |
16 * 16 * 2 (512, XCE) | 1 month | 2,458 | 1,439 (*1.71) | 2.93 |
32 * 16 * 2 (896, XCE) | 1 month | 2,033 | 1,163 (*1.75) | 3.54 |
28 * 24 * 2 (1,344, XCE) | 1 month | 1,739 | 1,155 (*1.51) | 4.14 |
48 * 28 * 2 (2,688, XCE) | 1 month | 1,542 | N/A | 4.67 |
The job for doing this is mi-ai301.
NMPPE*NMPPN (total) | Model run length | Total time (s) | Total time on IBM (Cray/IBM scaling) | Speed (model years per day) |
---|---|---|---|---|
16 * 8 (128) | 1 month | 5,513 | 3,486 (*1.58) | 1.31 |
16 * 16 (256) | 1 month | 2,879 | 1,912 (*1.51) | 2.50 |
32 * 16 (512) | 1 month | 1,835 | 1,189 (*1.54) | 3.92 |
32 * 28 (896) | 1 month | | 984 | |
48 * 28 (1,344) | 1 month | | 995 | |
The job for doing this is mi-ai301. The 32*28 job worked after recompiling the code.
NMPPE*NMPPN (total) | Model run length | Total time (s) | Total time on IBM (Cray/IBM scaling) | Speed (model years per day) |
---|---|---|---|---|
16 * 8 (128) | 1 month | | 3,486 (*) | |
16 * 16 (256) | 1 month | | 1,912 (*) | |
32 * 16 (512) | 1 month | | 1,189 (*) | |
32 * 28 (896, XCF) | 1 month | 1,586 | 984 (*1.61) | 4.54 |
48 * 28 (1,344) | 1 month | | 995 | |