Timings for offline oxidants

How many threads

The first decision is whether to run with one thread or two. The default option seems to be two threads, but I found with full chemistry that it was better to run with twice the number of MPI tasks and one thread.

Job is mi-ai234 (N96) for 10 days

  • 32x16 with 1 thread takes 00:13:14 (794s)
  • 16x16 with 2 threads takes 00:14:04 (844s)

So one thread is 5.92% quicker here. That is not the 26.3% improvement we get with full chemistry, but it is still better in this case.
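The percentage above can be reproduced with a short sketch. It assumes the saving is measured relative to the slower (two-thread) run, which is the convention that gives 5.92%; the function name `percent_quicker` is just an illustrative choice.

```python
def percent_quicker(fast_s: float, slow_s: float) -> float:
    """Time saved by the faster run, as a percentage of the slower run's time."""
    return 100.0 * (slow_s - fast_s) / slow_s


# N96, 10 days: 32x16 with 1 thread (794 s) vs 16x16 with 2 threads (844 s)
n96_saving = percent_quicker(794, 844)
print(f"N96: one thread is {n96_saving:.2f}% quicker")
```

Measuring against the faster run instead would give a slightly larger figure, so it is worth stating the baseline when quoting these numbers.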

At N216, job is mi-ai159 for 10 days on XCE

  • 32x32 with 1 thread takes 00:39:42 (2,382s)
  • 32x16 with 2 threads takes 00:40:19 (2,419s)

One thread is about 1.5% quicker, which is well within the normal run-to-run variation for an individual run. We'd probably expect runs on fewer cores to be faster with one thread and runs on more cores to be faster with two threads.

N216 with 1 thread

The job for doing this is mi-ai159

NMPPE*NMPPN (total)  | Model run length | Total time (s) | Total time on IBM (Cray/IBM scaling) | Speed (model years per day)
16 * 16 (256, XCE)   | 10 days | 7,272 | 4,415 (*1.65) | 0.330
32 * 16 (512, XCE)   | 10 days | 4,055 | 2,526 (*1.61) | 0.592
32 * 32 (1,024, XCE) | 10 days | 2,382 | 1,650 (*1.44) | 1.01
64 * 32 (2,048, XCE) | 10 days | 1,808 | 1,284 (*1.41) | 1.33
64 * 36 (2,304, XCE) | 10 days | Convergence failure in BiCGstab, suspected NaNs | 1,241 |
64 * 46 (2,944, XCE) | 10 days | | N/A |
64 * 64 (4,096, XCE) | 10 days | Too many processors in the North-South direction ( 64) to support the extended halo size ( 7). Try running with 46 processors | 1,647 |
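The speed column can be derived from the run length and the total time. The sketch below assumes a 360-day model calendar, which is not stated here but does reproduce the speeds quoted in the table; the function and constant names are illustrative.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_MODEL_YEAR = 360  # assumption: 360-day calendar, not stated in the notes


def model_years_per_day(run_days: float, wallclock_s: float) -> float:
    """Simulated years completed per wall-clock day at the measured rate."""
    model_years = run_days / DAYS_PER_MODEL_YEAR
    wallclock_days = wallclock_s / SECONDS_PER_DAY
    return model_years / wallclock_days


# (total cores, total time in seconds) for the 10-day N216 one-thread runs
for cores, secs in [(256, 7272), (512, 4055), (1024, 2382), (2048, 1808)]:
    print(f"{cores:>5} cores: {model_years_per_day(10, secs):.3f} model years/day")
```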

N216 with 2 threads

The job for doing this is mi-ai159

NMPPE*NMPPN (total)      | Model run length | Total time (s) | Total time on IBM (Cray/IBM scaling) | Speed (model years per day)
16 * 8 * 2 (256, XCE)    | 10 days | 7,445 | 4,415 (*1.69) | 0.322
16 * 16 * 2 (512, XCE)   | 10 days | 4,002 | 2,526 (*1.59) | 0.600
32 * 16 * 2 (1,024, XCE) | 10 days | 2,419 | 1,650 (*1.47) | 0.992
32 * 32 * 2 (2,048, XCE) | 10 days | 1,750 | 1,284 (*1.36) | 1.37
36 * 32 * 2 (2,304, XCE) | 10 days | 1,659 | 1,241 (*1.34) | 1.45
64 * 32 * 2 (4,096, XCE) | 10 days | Convergence failure in BiCGstab, suspected NaNs | 1,647 |
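The bracketed factor in the IBM column appears to be the Cray time divided by the IBM time on the same number of cores. A minimal sketch under that reading, using a few rows from the two-thread N216 table (the name `cray_ibm_scaling` is illustrative):

```python
def cray_ibm_scaling(cray_s: float, ibm_s: float) -> float:
    """How many times longer the Cray run took than the IBM run."""
    return cray_s / ibm_s


# (decomposition, Cray total time in s, IBM total time in s)
rows = [
    ("16 * 8 * 2", 7445, 4415),
    ("32 * 16 * 2", 2419, 1650),
    ("32 * 32 * 2", 1750, 1284),
    ("36 * 32 * 2", 1659, 1241),
]

for decomposition, cray_s, ibm_s in rows:
    print(f"{decomposition:<12} Cray/IBM = {cray_ibm_scaling(cray_s, ibm_s):.2f}")
```

A factor that falls as the core count rises means the Cray is closing the gap at larger decompositions.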

N96 with 1 thread

The job for doing this is mi-ai234.

NMPPE*NMPPN (total)  | Model run length | Total time (s) | Total time on IBM (Cray/IBM scaling) | Speed (model years per day)
16 * 8 (128)         | 1 month | 6,379 | 4,061 (*1.57) | 1.13
16 * 16 (256)        | 1 month | 3,477 | 2,260 (*1.54) | 2.07
32 * 16 (512)        | 1 month | 2,190 | 1,439 (*1.52) | 3.29
32 * 28 (896)        | 1 month | 1,668 | 1,163 (*1.43) | 4.32
48 * 28 (1,344, XCE) | 1 month | 1,991 | 1,155 (*1.72) | 3.62
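One way to read this table is in terms of parallel speedup and efficiency. The sketch below takes the 128-core run as the baseline, which is an arbitrary choice made for illustration rather than anything specified in these notes.

```python
# (total cores, total time in seconds) for one model month at N96, one thread
timings = [(128, 6379), (256, 3477), (512, 2190), (896, 1668), (1344, 1991)]

base_cores, base_secs = timings[0]


def parallel_efficiency(cores: int, secs: float) -> float:
    """Measured speedup over the baseline divided by the ideal (linear) speedup."""
    speedup = base_secs / secs
    ideal = cores / base_cores
    return speedup / ideal


for cores, secs in timings:
    speedup = base_secs / secs
    eff = parallel_efficiency(cores, secs)
    print(f"{cores:>5} cores: speedup {speedup:.2f}, efficiency {eff:.2f}")
```

The 1,344-core run is slower than the 896-core run, so its efficiency drops sharply; on these numbers 896 cores is the fastest one-thread configuration measured.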

N96 with 2 threads

The job for doing this is mi-ak321.

NMPPE*NMPPN (total)      | Model run length | Total time (s) | Total time on IBM (Cray/IBM scaling) | Speed (model years per day)
16 * 8 * 2 (256, XCE)    | 1 month | 4,121 | 2,260 (*1.82) | 1.75
16 * 16 * 2 (512, XCE)   | 1 month | 2,458 | 1,439 (*1.71) | 2.93
32 * 16 * 2 (896, XCE)   | 1 month | 2,033 | 1,163 (*1.75) | 3.54
28 * 24 * 2 (1,344, XCE) | 1 month | 1,739 | 1,155 (*1.51) | 4.14
48 * 28 * 2 (2,688, XCE) | 1 month | 1,542 | N/A | 4.67
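With threads in the mix, the total core count should be NMPPE * NMPPN * threads. A quick sanity check of the totals quoted in this table can be sketched as follows; the helper name `core_count` is illustrative.

```python
from math import prod


def core_count(decomposition: str) -> int:
    """Multiply out a decomposition string such as '16 * 8 * 2'."""
    return prod(int(part) for part in decomposition.split("*"))


# (decomposition, total cores as quoted in the table)
rows = [
    ("16 * 8 * 2", 256),
    ("16 * 16 * 2", 512),
    ("32 * 16 * 2", 896),
    ("28 * 24 * 2", 1344),
    ("48 * 28 * 2", 2688),
]

mismatches = [(d, quoted, core_count(d)) for d, quoted in rows if core_count(d) != quoted]
for decomposition, quoted, actual in mismatches:
    print(f"{decomposition}: table says {quoted}, product is {actual}")
```

On the rows as listed, 32 * 16 * 2 multiplies out to 1,024 rather than 896, so one of the two figures in that row is presumably a typo. The IBM time in that row matches the 896-core entry of the one-thread table, which suggests the total is right and the decomposition is not, but that is a guess.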

N96 with 30 minute timestep

The job for doing this is mi-ai301.

NMPPE*NMPPN (total) | Model run length | Total time (s) | Total time on IBM (Cray/IBM scaling) | Speed (model years per day)
16 * 8 (128)        | 1 month | 5,513 | 3,486 (*1.58) | 1.31
16 * 16 (256)       | 1 month | 2,879 | 1,912 (*1.51) | 2.50
32 * 16 (512)       | 1 month | 1,835 | 1,189 (*1.54) | 3.92
32 * 28 (896)       | 1 month | | 984 |
48 * 28 (1,344)     | 1 month | | 995 |

N96 with 30 minute timestep after split cabinet fix

The job for doing this is mi-ai301. The 32*28 job worked after recompiling the code.

NMPPE*NMPPN (total) | Model run length | Total time (s) | Total time on IBM (Cray/IBM scaling) | Speed (model years per day)
16 * 8 (128)        | 1 month | | 3,486 (*) |
16 * 16 (256)       | 1 month | | 1,912 (*) |
32 * 16 (512)       | 1 month | | 1,189 (*) |
32 * 28 (896, XCF)  | 1 month | 1,586 | 984 (*1.61) | 4.54
48 * 28 (1,344)     | 1 month | | 995 |