In UM ticket 1686, I'm adding most of the code associated with the logical l_strip_ukca. Setting this to true should
I've created a UKESM-hybrid N96 N48 ORCA1 job, u-bh774, where I've turned on profiling only for Snr and made sure that Snr is the slowest component even when l_strip_ukca is true. To do this I have had to reduce the number of MPI tasks for Snr to about the minimum that memory will allow and reduce the timestep from 20 mins to 15 mins. The full settings are
The times in the profiling tables below are total times, i.e. the time in a given routine plus all the time in the routines it calls, unless stated as 'itself', in which case they are the time spent in the routine alone.

Profile with l_strip_ukca=false:

Routines | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
UM_SHELL (4,534s) | ||||||||||||||||
U_MODEL_4A (4,528s) | ||||||||||||||||
ATM_STEP_4A* (4,265s) | OASIS3_GETO2A (130s) | OASIS3_GET_HYBRID (45s) | OASIS3_PUTA2O (5s) | OASIS3_PUT_HYBRID (7s) | ||||||||||||
ATMOS_PHYSICS1 (499s) | ATMOS_PHYSICS2 (329s) | EG_SL_HELMHOLTZ (267s) | TR_SET_PHYS_4A* (162s) | EG_CORRECT_TRACERS_PRIESTLEY (199s) | SL_TRACER1_4A (277s) | EG_SL_MOISTURE (34s) | EG_SL_FULL_WIND (88s) | ⇓ | UKCA_MAIN1 (1,506s) | OASIS3_GET (36s) | OASIS3_PUT (9s) | |||||
ATMOS_PHYSICS1 routines | ATMOS_PHYSICS2 routines | EG_SL_HELMHOLTZ routines | EG_SL_WIND_U, EG_SL_WIND_V & EG_SL_WIND_W (16 + 16 + 21 = 53s) | STASH (1,482s) | UKCA_MAIN1 routines | |||||||||||
Itself (64s) | EG_INTERPOLATION_ETA_PMF (234s) | DEPARTURE_POINT_ETA (44s) | STWORK (1,481s) | |||||||||||||
EG_INTERPOLATION_ETA (295s) | Itself (8s) | PP_HEAD (370s) | EXPPXI (303s, itself) | |||||||||||||
EG_CUBIC_LAGRANGE (79s, itself) | MONO_ENFORCE (24s, itself) | Itself (104s) |

Profile with l_strip_ukca=true:

Routines | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
UM_SHELL (2,028s) | |||||||||||||||
U_MODEL_4A (2,022s) | |||||||||||||||
ATM_STEP_4A* (1,894s) | OASIS3_GETO2A (21s) | OASIS3_GET_HYBRID (48s) | OASIS3_PUTA2O (5s) | OASIS3_PUT_HYBRID (6s) | |||||||||||
ATMOS_PHYSICS1 (499s) | ATMOS_PHYSICS2 (178s) | EG_SL_HELMHOLTZ (247s) | TR_SET_PHYS_4A* (9s) | EG_CORRECT_TRACERS_PRIESTLEY (10s) | SL_TRACER1_4A (13s) | EG_SL_MOISTURE (32s) | EG_SL_FULL_WIND (65s) | ⇓ | OASIS3_GET (44s) | OASIS3_PUT (8s) | |||||
ATMOS_PHYSICS1 routines | ATMOS_PHYSICS2 routines | EG_SL_HELMHOLTZ routines | EG_SL_WIND_U, EG_SL_WIND_V & EG_SL_WIND_W (15 + 15 + 17 = 47s) | STASH (629s) | |||||||||||
Itself (3s) | EG_INTERPOLATION_ETA_PMF (61s) | DEPARTURE_POINT_ETA (34s) | STWORK (628s) | ||||||||||||
EG_INTERPOLATION_ETA (60s) | Itself (8s) | PP_HEAD (159s) | EXPPXI (131s, itself) | ||||||||||||
EG_CUBIC_LAGRANGE (19s, itself) | MONO_ENFORCE (4s, itself) | Itself (26s) |
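
The two profiles above can be compared routine by routine. The following is only a minimal sketch: it takes the first table as the l_strip_ukca=false run and the second as the l_strip_ukca=true run, copies a selection of the total times by hand, and assumes that UKCA_MAIN1, which does not appear in the second table, costs 0s there. The names `TIMES` and `compare` are illustrative and not part of the UM.

```python
# Total (inclusive) times in seconds as (l_strip_ukca=false, l_strip_ukca=true),
# copied by hand from the two profiling tables above.
# Assumption: UKCA_MAIN1 is absent from the second table, so it is entered as 0.
TIMES = {
    "UM_SHELL": (4534, 2028),
    "ATM_STEP_4A": (4265, 1894),
    "UKCA_MAIN1": (1506, 0),
    "STASH": (1482, 629),
    "ATMOS_PHYSICS2": (329, 178),
    "SL_TRACER1_4A": (277, 13),
    "EG_CORRECT_TRACERS_PRIESTLEY": (199, 10),
    "TR_SET_PHYS_4A": (162, 9),
    "EG_SL_FULL_WIND": (88, 65),
    "EG_SL_HELMHOLTZ": (267, 247),
    "ATMOS_PHYSICS1": (499, 499),
}


def compare(times):
    """Return (routine, seconds saved, true/false ratio), largest saving first."""
    rows = [(name, off - on, on / off) for name, (off, on) in times.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, saved, ratio in compare(TIMES):
        print(f"{name:<30} saved {saved:>5}s  ratio {ratio:.2f}")
```

Because these are total times, the saving for a routine already includes the savings of everything it calls, so the rows should be read individually and not summed.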
The times highlighted in bold are those that are much smaller than when running with l_strip_ukca=false. The runtime when l_strip_ukca=true is less than half the runtime for l_strip_ukca=false (2,028s against 4,534s for UM_SHELL), and the main reasons for this are

ATMOS_PHYSICS2 with l_strip_ukca=false:

ATMOS_PHYSICS2 (329s) | |||
NI_CONV_CTL (145s) | NI_IMP_CTL (61s) | SWAP_BOUNDS routines | |
---|---|---|---|
GLUE_CONV_6A (102s, 101s) | IMP_SOLVER (30s) | ||
Itself (61s, 61s) | MID_CONV_6A (24s, 24s) | ||
Itself (15s, 14s) |

ATMOS_PHYSICS2 with l_strip_ukca=true:

ATMOS_PHYSICS2 (174s) | |||
NI_CONV_CTL (43s) | NI_IMP_CTL (38s) | SWAP_BOUNDS routines | |
---|---|---|---|
GLUE_CONV_6A (32s, 31s) | IMP_SOLVER (18s) | ||
Itself (12s, 12s) | MID_CONV_6A (11s, 10s) | ||
Itself (3s, 3s) |

The tables above show that with l_strip_ukca=true there is a general reduction in time across all the routines called from ATMOS_PHYSICS2, rather than a saving in one particular routine.
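
As a rough check on this, the reduction can be computed for each routine in the two ATMOS_PHYSICS2 tables. This is a sketch only: the times are copied by hand, the first table is taken as the l_strip_ukca=false run and the second as the l_strip_ukca=true run, and where a cell lists two values, such as "(102s, 101s)", the first value is used (an assumption, since the tables do not say what the second value is). The names `PHYSICS2`, `reduction_percent` and `all_reduced` are illustrative.

```python
# Times in seconds as (l_strip_ukca=false, l_strip_ukca=true),
# copied by hand from the two ATMOS_PHYSICS2 tables above.
# Assumption: where a cell lists two values, e.g. "(102s, 101s)", the first is used.
PHYSICS2 = {
    "ATMOS_PHYSICS2": (329, 174),
    "NI_CONV_CTL": (145, 43),
    "GLUE_CONV_6A": (102, 32),
    "MID_CONV_6A": (24, 11),
    "NI_IMP_CTL": (61, 38),
    "IMP_SOLVER": (30, 18),
}


def reduction_percent(off, on):
    """Percentage reduction in time going from l_strip_ukca=false to true."""
    return 100.0 * (off - on) / off


def all_reduced(times):
    """True if every routine takes less time in the l_strip_ukca=true column."""
    return all(on < off for off, on in times.values())


if __name__ == "__main__":
    for name, (off, on) in PHYSICS2.items():
        print(f"{name:<16} {off:>4}s -> {on:>4}s  ({reduction_percent(off, on):.0f}% less)")
    print("every routine reduced:", all_reduced(PHYSICS2))
```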