Timings for full chemistry

N96

The job for doing this is u-ae884 on broadwell nodes with safe optimisation.

NMPPE*NMPPN (total tasks, nodes)	Model run length	Total time	Speed (model years per day)
24 * 24 (576 tasks, 16 nodes)	10 days	23:39 (1,419s)	1.69
36 * 20 (720 tasks, 20 nodes)	10 days	24:10 (1,450s)	1.66
36 * 24 (864 tasks, 24 nodes)*	10 days	20:31 (1,231s)	1.95
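
The speed column can be reproduced from the wall-clock time. This is a minimal sketch that assumes a 360-day model calendar (not stated here, but consistent with the tabulated values); the function name is only illustrative.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_MODEL_YEAR = 360  # assumption: 360-day calendar, consistent with the table


def model_years_per_day(wallclock_s, model_days):
    """Model years simulated per wall-clock day."""
    model_years = model_days / DAYS_PER_MODEL_YEAR
    return model_years * SECONDS_PER_DAY / wallclock_s


# Wall-clock seconds for the three 10-day runs in the table above.
for tasks, seconds in [(576, 1419), (720, 1450), (864, 1231)]:
    print(tasks, round(model_years_per_day(seconds, 10), 2))
```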

Trying to understand why full chemistry is slower than the runtimes we had before

My previous fullChemN96 estimates were based on mi-ah651 at UM10.2, which was run on the haswell nodes.

Main features	NMPPE*NMPPN (total tasks, nodes)	Model run length	Total time	Speed (model years per day)
u-ae884, broadwell, safe	24 * 24 (576 tasks, 16 broadwell nodes)	10 days	23:39 (1,419s)	1.69
u-ae884, haswell, safe	24 * 24 (576 tasks, 18 haswell nodes)	10 days	22:45 (1,365s)	1.76
u-ae884, haswell, high	24 * 24 (576 tasks, 18 haswell nodes)	10 days	20:56 (1,256s)	1.91
mi-ah651, haswell, safe	24 * 24 (576 tasks, 18 haswell nodes)	10 days	20:34 (1,234s)	1.94
mi-ah651, haswell, high	24 * 24 (576 tasks, 18 haswell nodes)	10 days	17:20 (1,040s)	2.31
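
The node counts follow from the task count and the cores per node. In this sketch the cores-per-node values are inferred from the table (576 tasks on 16 broadwell or 18 haswell nodes), and it assumes one thread per task and no extra I/O server tasks; the names are illustrative.

```python
import math

# Assumption: cores per node inferred from the table above.
CORES_PER_NODE = {"broadwell": 36, "haswell": 32}


def nodes_needed(nmppe, nmppn, node_type):
    """Whole nodes needed for an nmppe * nmppn decomposition."""
    tasks = nmppe * nmppn
    return math.ceil(tasks / CORES_PER_NODE[node_type])


print(nodes_needed(24, 24, "broadwell"))
print(nodes_needed(24, 24, "haswell"))
```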

Clearly my previous full chemistry run at UM10.2 with optimisation `high' is about 31% quicker than my current full chemistry run at UM10.4 with optimisation `safe' (1,040s against 1,365s, both on haswell).

Looking at the profiling it seems that UM10.4 is slower than my previous UM10.2 runs for three reasons:

COSP_MAIN is called in UM10.4 and not UM10.2 (it's called from ATMOS_PHYSICS1 and looks to be worth about 100s - adding about 6% to runtime).
Extra STASH from UM10.4 job.
Difference in optimisation. Compared to `safe':
- going to `high' optimisation increased speed of UM10.4 by about 9%
- going to `high' optimisation increased speed of UM10.2 by about 19% (this is most of the 31% difference).
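
These percentages can be checked against the haswell timings in the table above. A small sketch, taking u-ae884 as the UM10.4 job and mi-ah651 as the UM10.2 job; the helper name is made up for illustration.

```python
# Wall-clock seconds for the 10-day haswell runs in the table above.
timings_s = {
    ("UM10.4", "safe"): 1365,  # u-ae884
    ("UM10.4", "high"): 1256,  # u-ae884
    ("UM10.2", "safe"): 1234,  # mi-ah651
    ("UM10.2", "high"): 1040,  # mi-ah651
}


def percent_faster(slow_s, fast_s):
    """Speed gain of the faster run relative to the slower one, in percent."""
    return 100.0 * (slow_s / fast_s - 1.0)


# UM10.2 `high' against UM10.4 `safe'
total = percent_faster(timings_s[("UM10.4", "safe")], timings_s[("UM10.2", "high")])
# Effect of `high' optimisation within each UM version
um104 = percent_faster(timings_s[("UM10.4", "safe")], timings_s[("UM10.4", "high")])
um102 = percent_faster(timings_s[("UM10.2", "safe")], timings_s[("UM10.2", "high")])
print(round(total), round(um104), round(um102))
```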

Finding an efficient setup

I need to find a setup which runs at a reasonable speed, but still uses resources fairly efficiently. For these tests I've copied u-ak990 (GA7.1 + StratTrop, UM10.7) to u-am972.

Nodes (ATM_PROCX*ATM_PROCY)	Threads	Time for one month	Core hours/model year	Speed (model years/day)
8 (18*16)	1	1:55:28 (6,928s)	6,646	1.04
12 (18*24)^	1	1:23:19 (4,999s)	7,200	1.44
12 (18*24)	1	1:27:11 (5,231s)	7,513	1.38
18 (36*18)	1	1:11:35 (4,295s)	9,257	1.68

* This run didn't have the Luke optimisation branch for the chemistry solver

It looks like we can drop the speed and get more efficiency, but do we really want to run at less than about 1.4 model years/day? I suspect not, so I'll stick with 12 nodes for now and the following settings.

(ATM_PROCX,ATM_PROCY)=(24,18)
OMPTHR_ATM=1
IOS_NPROC=0 (I don't think the I/O server will be beneficial at these speeds, but I should test when I've got more time)
Total nodes=12
Three one-month cycles: no time for the first (PBS epilogue failed), then 1:21:27 (4,887s) and 1:18:02 (4,682s)
This is an average of 4,785s (1:19:45)
A speed of 1.50 model years/day
This is 24 * 36 * 12 / 1.50 = 6,912 core hours/model year
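
The arithmetic above can be sketched as follows. It assumes 36 cores per node (as in the 24 * 36 * 12 expression) and that twelve one-month cycles make a model year; the function name is illustrative.

```python
SECONDS_PER_DAY = 86_400
HOURS_PER_DAY = 24
CORES_PER_NODE = 36         # as used in the expression above
CYCLES_PER_MODEL_YEAR = 12  # assumption: twelve one-month cycles per model year


def core_hours_per_model_year(nodes, model_years_per_day):
    """Core hours consumed per model year at a given speed."""
    return HOURS_PER_DAY * CORES_PER_NODE * nodes / model_years_per_day


# The two cycles with a recorded time (the first cycle's PBS epilogue failed).
cycle_times_s = [4887, 4682]
mean_s = sum(cycle_times_s) / len(cycle_times_s)
speed = round(SECONDS_PER_DAY / mean_s / CYCLES_PER_MODEL_YEAR, 2)

print(mean_s, speed, core_hours_per_model_year(12, speed))
```

The same function reproduces the core hours column of the table when given each row's node count and speed.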

Marc's pages