Timings for offline oxidants

How many threads

The first decision is whether to run with one thread or two. The default option seems to be two threads, but I found with full chemistry that it was better to run with twice the number of MPI tasks and one thread.

Job is mi-ai234 (N96) for 10 days

32x16 with 1 thread takes 00:13:14 (794s)
16x16 with 2 threads takes 00:14:04 (844s)

So one thread is 5.92% quicker here. That is well short of the 26.3% improvement we get with full chemistry, but it is still the better option in this case.
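The percentage above can be reproduced with a small sketch. The function name `percent_quicker` is made up for illustration, and it assumes the improvement is measured as the reduction in wall-clock time relative to the slower run.

```python
def percent_quicker(fast_s: float, slow_s: float) -> float:
    """Percentage reduction in wall-clock time of the fast run relative to the slow run."""
    return 100.0 * (slow_s - fast_s) / slow_s


# N96, job mi-ai234, 10 days
one_thread_s = 794   # 32x16 with 1 thread
two_threads_s = 844  # 16x16 with 2 threads

print(f"{percent_quicker(one_thread_s, two_threads_s):.2f}%")  # prints 5.92%
```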

At N216, job is mi-ai159 for 10 days on XCE

32x32 with 1 thread takes 00:39:42 (2,382s)
32x16 with 2 threads takes 00:40:19 (2,419s)

The difference is about 1.6%, which is well within the normal run-to-run variation for an individual run anyway. We'd probably expect runs on fewer cores to be faster with one thread and runs on more cores to be faster with two threads.

N216 with 1 thread

The job for doing this is mi-ai159

NMPPE*NMPPN (total)	Model run length	Total time (s)	Total time on IBM (Cray/IBM scaling)	Speed (model years per day)
16 * 16 (256, XCE)	10 days	7,272	4,415 (*1.65)	0.330
32 * 16 (512, XCE)	10 days	4,055	2,526 (*1.61)	0.592
32 * 32 (1,024, XCE)	10 days	2,382	1,650 (*1.44)	1.01
64 * 32 (2,048, XCE)	10 days	1,808	1,284 (*1.41)	1.33
64 * 36 (2,304, XCE)	10 days	Convergence failure in BiCGstab, suspected NaNs	1,241
64 * 46 (2,944, XCE)	10 days		N/A
64 * 64 (4,096, XCE)	10 days	Too many processors in the North-South direction ( 64) to support the extended halo size ( 7). Try running with 46 processors	1,647
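For reference, the last two columns of the table can be derived from the timings as sketched below. This is a rough reconstruction, not necessarily how the figures were originally produced: it assumes a 360-day model calendar (which reproduces the tabulated speeds) and the helper names are hypothetical.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_MODEL_YEAR = 360  # assumption: 360-day calendar, chosen because it matches the table


def model_years_per_day(run_days: float, wallclock_s: float) -> float:
    """Model years simulated per wall-clock day."""
    return (run_days / DAYS_PER_MODEL_YEAR) * (SECONDS_PER_DAY / wallclock_s)


def cray_ibm_scaling(cray_s: float, ibm_s: float) -> float:
    """Ratio of Cray wall-clock time to IBM wall-clock time."""
    return cray_s / ibm_s


# (total cores, Cray time in s, IBM time in s) for the 10-day N216 runs that completed
rows = [(256, 7272, 4415), (512, 4055, 2526), (1024, 2382, 1650), (2048, 1808, 1284)]

for cores, cray_s, ibm_s in rows:
    speed = model_years_per_day(10, cray_s)
    scaling = cray_ibm_scaling(cray_s, ibm_s)
    print(f"{cores:>5} cores: {speed:.3g} model years/day, Cray/IBM = {scaling:.2f}")
```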

N216 with 2 threads

The job for doing this is mi-ai159

NMPPE*NMPPN (total)	Model run length	Total time (s)	Total time on IBM (Cray/IBM scaling)	Speed (model years per day)
16 * 8 * 2 (256, XCE)	10 days	7,445	4,415 (*1.69)	0.322
16 * 16 * 2 (512, XCE)	10 days	4,002	2,526 (*1.59)	0.600
32 * 16 * 2 (1,024, XCE)	10 days	2,419	1,650 (*1.47)	0.992
32 * 32 * 2 (2,048, XCE)	10 days	1,750	1,284 (*1.36)	1.37
36 * 32 * 2 (2,304, XCE)	10 days	1,659	1,241 (*1.34)	1.45
64 * 32 * 2 (4,096, XCE)	10 days	Convergence failure in BiCGstab, suspected NaNs	1,647

N96 with 1 thread

The job for doing this is mi-ai234.

NMPPE*NMPPN (total)	Model run length	Total time (s)	Total time on IBM (Cray/IBM scaling)	Speed (model years per day)
16 * 8 (128)	1 month	6,379	4,061 (*1.57)	1.13
16 * 16 (256)	1 month	3,477	2,260 (*1.54)	2.07
32 * 16 (512)	1 month	2,190	1,439 (*1.52)	3.29
32 * 28 (896)	1 month	1,668	1,163 (*1.43)	4.32
48 * 28 (1,344, XCE)	1 month	1,991	1,155 (*1.72)	3.62
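One way to read the scaling in this table is to compute speedup and parallel efficiency relative to the smallest run. The sketch below is illustrative only: taking the 128-core run as the baseline is an assumption, and `parallel_efficiency` is a hypothetical helper.

```python
def parallel_efficiency(base_cores: int, base_s: float, cores: int, wallclock_s: float) -> float:
    """Measured speedup divided by the ideal (linear) speedup over the baseline run."""
    speedup = base_s / wallclock_s
    ideal = cores / base_cores
    return speedup / ideal


# (total cores, total time in s) for the 1-month N96 runs with 1 thread
runs = [(128, 6379), (256, 3477), (512, 2190), (896, 1668), (1344, 1991)]
base_cores, base_s = runs[0]  # assumption: smallest run is the baseline

for cores, wallclock_s in runs:
    eff = parallel_efficiency(base_cores, base_s, cores, wallclock_s)
    print(f"{cores:>5} cores: speedup {base_s / wallclock_s:.2f}, efficiency {eff:.2f}")
```

On these numbers the efficiency falls steadily with core count, and the 1,344-core run is slower in absolute terms than the 896-core run. The 1,344-core row is the only one marked XCE, so a platform difference may also play a part.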

N96 with 2 threads

The job for doing this is mi-ak321.

NMPPE*NMPPN (total)	Model run length	Total time (s)	Total time on IBM (Cray/IBM scaling)	Speed (model years per day)
16 * 8 * 2 (256, XCE)	1 month	4,121	2,260 (*1.82)	1.75
16 * 16 * 2 (512, XCE)	1 month	2,458	1,439 (*1.71)	2.93
32 * 16 * 2 (896, XCE)	1 month	2,033	1,163 (*1.75)	3.54
28 * 24 * 2 (1,344, XCE)	1 month	1,739	1,155 (*1.51)	4.14
48 * 28 * 2 (2,688, XCE)	1 month	1,542	N/A	4.67

N96 with 30 minute timestep

The job for doing this is mi-ai301.

NMPPE*NMPPN (total)	Model run length	Total time (s)	Total time on IBM (Cray/IBM scaling)	Speed (model years per day)
16 * 8 (128)	1 month	5,513	3,486 (*1.58)	1.31
16 * 16 (256)	1 month	2,879	1,912 (*1.51)	2.50
32 * 16 (512)	1 month	1,835	1,189 (*1.54)	3.92
32 * 28 (896)	1 month		984
48 * 28 (1,344)	1 month		995

N96 with 30 minute timestep after split cabinet fix

The job for doing this is mi-ai301. The 32*28 job worked after recompiling the code.

NMPPE*NMPPN (total)	Model run length	Total time (s)	Total time on IBM (Cray/IBM scaling)	Speed (model years per day)
16 * 8 (128)	1 month		3,486 (*)
16 * 16 (256)	1 month		1,912 (*)
32 * 16 (512)	1 month		1,189 (*)
32 * 28 (896, XCF)	1 month	1,586	984 (*1.61)	4.54
48 * 28 (1,344)	1 month		995
