All runs are
I'm profiling the Snr UM with:
perl drHook.pl --dir=/data/cr/ukesm/mstringe/cakdx_cakdy --nRoutines=9999 --endPe=256 [--orderBy=total]
Routines | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
UM_SHELL (1,209s) | |||||||||||||||
U_MODEL_4A (1,204s) | |||||||||||||||
ATM_STEP_4A* (1,142s) | |||||||||||||||
ATMOS_PHYSICS1 (315s) | EG_CORRECT_TRACERS (61s) | ATMOS_PHYSICS2 (221s) | EG_SL_HELMHOLTZ (183s) | TR_SET_PHYS_4A* (17s) | EG_SISL_INIT (34s) | SL_TRACER1_4A (46s) | EG_SL_MOISTURE (38s) | EG_SL_FULL_WIND (78s) | EG_Q_TO_MIX (14s) | ATM_STEP_STASH (29s) | ⇓ | ||||
See profile for ATMOS_PHYSICS1 and EG_CORRECT_TRACERS below | See profile for ATMOS_PHYSICS2 and EG_SL_HELMHOLTZ below | EG_SISL_INIT_UVW (32s) | EG_SL_WIND_U, EG_SL_WIND_V & EG_SL_WIND_W (24 + 24 + 26 = 74s) | EG_SWAP_BOUNDS_DP (64s) | STASH (34s) | ||||||||||
Itself (19s) | EG_INTERPOLATION_ETA (103s) | DEPARTURE_POINT_ETA (54s) | See profile for SWAP_BOUNDS_DP below | STWORK (34s) | |||||||||||
EG_CUBIC_LAGRANGE (47s, itself) | EG_VERT_WEIGHTS_ETA (13s, itself) | MONO_ENFORCE (7s, itself) | Itself (20s) | SPATIAL (11s) | PP_HEAD (10s) | EXPPXI (6s, itself) |
PEs for Jnr were 32x16, so I wouldn't expect it to hold up Snr significantly (dynamical core fields are only passed Snr -> Jnr on the hour in this run). So far the calls to UKCA_MAIN1 and EG_CORRECT_TRACERS_UKCA have been stripped out. Bold numbers show where times are much bigger and underlined numbers where times are much smaller.
Routines | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
UM_SHELL (2,107s) | ||||||||||||||||
U_MODEL_4A (2,103s) | ||||||||||||||||
ATM_STEP_4A* (1,802s) | OASIS3_GET_SNR (127s) | |||||||||||||||
ATMOS_PHYSICS1 (505s) | EG_CORRECT_TRACERS (33s) | ATMOS_PHYSICS2 (392s) | EG_SL_HELMHOLTZ (208s) | TR_SET_PHYS_4A* (85s) | EG_SISL_INIT (37s) | SL_TRACER1_4A (234s) | EG_SL_MOISTURE (37s) | EG_SL_FULL_WIND (81s) | EG_Q_TO_MIX (46s) | ATM_STEP_STASH (23s) | ⇓ | |||||
See profile for ATMOS_PHYSICS1 and EG_CORRECT_TRACERS below | See profile for ATMOS_PHYSICS2 and EG_SL_HELMHOLTZ below | EG_SISL_INIT_UVW (35s) | EG_SL_WIND_U, EG_SL_WIND_V & EG_SL_WIND_W (23 + 23 + 26 = 72s) | EG_SWAP_BOUNDS_DP (118s) | STASH (26s) | |||||||||||
Itself (39s) | EG_INTERPOLATION_ETA (224s) | DEPARTURE_POINT_ETA (58s) | See profile for SWAP_BOUNDS_DP below | STWORK (25s) | ||||||||||||
EG_CUBIC_LAGRANGE (126s, itself) | EG_VERT_WEIGHTS_ETA (13s, itself) | MONO_ENFORCE (24s, itself) | Itself (20s) | SPATIAL (9s) | PP_HEAD (6s) | EXPPXI (4s, itself) |
Summary
This is 745s, and there is 899s extra (at U_MODEL_4A), so 154s of the extra time is still unaccounted for.
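The bookkeeping behind those figures can be checked with a few lines of Python. This is only a sketch: it assumes the 899s is the difference between the two U_MODEL_4A totals in the tables above (1,204s and 2,103s) and takes the 745s as given, since its breakdown is not listed here.

```python
# U_MODEL_4A totals from the two profiles above, in seconds.
u_model_4a_reference = 1204  # first profile
u_model_4a_snr = 2103        # second profile

# Extra time already identified in individual routines (figure quoted in the text).
identified_extra = 745

total_extra = u_model_4a_snr - u_model_4a_reference
unaccounted = total_extra - identified_extra

print(f"Extra at U_MODEL_4A: {total_extra}s")
print(f"Still unaccounted:   {unaccounted}s")
```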
In addition to the above, for Snr I've removed the code where TRACER_UKCA/TRACERS_UKCA is added to SUPER_ARRAY, SUPER_TRACER_PHYS1 and SUPER_TRACER_PHYS2.
Routines | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
UM_SHELL (1,794s) | ||||||||||||||||
U_MODEL_4A (1,791s) | ||||||||||||||||
ATM_STEP_4A* (1,478s) | OASIS3_GET_SNR (134s) | |||||||||||||||
ATMOS_PHYSICS1 (507s) | EG_CORRECT_TRACERS (33s) | ATMOS_PHYSICS2 (390s) | EG_SL_HELMHOLTZ (205s) | TR_SET_PHYS_4A* (9s) | EG_SISL_INIT (37s) | SL_TRACER1_4A (26s) | EG_SL_MOISTURE (36s) | EG_SL_FULL_WIND (72s) | EG_Q_TO_MIX (46s) | ATM_STEP_STASH (23s) | ⇓ | |||||
See profile for ATMOS_PHYSICS1 and EG_CORRECT_TRACERS below | See profile for ATMOS_PHYSICS2 and EG_SL_HELMHOLTZ below | EG_SISL_INIT_UVW (35s) | EG_SL_WIND_U, EG_SL_WIND_V & EG_SL_WIND_W (22 + 22 + 24 = 68s) | EG_SWAP_BOUNDS_DP (115s) | STASH (25s) | |||||||||||
Itself (19s) | EG_INTERPOLATION_ETA (86s) | DEPARTURE_POINT_ETA (48s) | See profile for SWAP_BOUNDS_DP below | STWORK (25s) | ||||||||||||
EG_CUBIC_LAGRANGE (37s, itself) | EG_VERT_WEIGHTS_ETA (11s, itself) | MONO_ENFORCE (5s, itself) | Itself (20s) | SPATIAL (9s) | PP_HEAD (7s) | EXPPXI (4s, itself) |
Summary
In addition to the above, for Snr I've removed the code where UKCA_TRACERS is added to TOT_TRACER.
Routines | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
UM_SHELL (1,587s) | |||||||||||||||
U_MODEL_4A (1,576s) | |||||||||||||||
ATM_STEP_4A* (1,256s) | DUMPCTL (68s) | MEANCTL (37s) | OASIS3_GET_SNR (136s) | ||||||||||||
ATMOS_PHYSICS1 (507s) | EG_CORRECT_TRACERS (33s) | ATMOS_PHYSICS2 (193s) | EG_SL_HELMHOLTZ (186s) | TR_SET_PHYS_4A* (9s) | EG_SISL_INIT (34s) | SL_TRACER1_4A (26s) | EG_SL_MOISTURE (36s) | EG_SL_FULL_WIND (72s) | EG_Q_TO_MIX (44s) | ⇓ | STASH (25s) | UM_WRITDUMP (62s) | ACUMPS (42s) | ||
See profile for ATMOS_PHYSICS1 and EG_CORRECT_TRACERS below | See profile for ATMOS_PHYSICS2 and EG_SL_HELMHOLTZ below | EG_SISL_INIT_UVW (33s) | EG_SL_WIND_U, EG_SL_WIND_V & EG_SL_WIND_W (22 + 22 + 24 = 68s) | EG_SWAP_BOUNDS_DP (112s) | STWORK (25s) | GENERAL_GATHER_FIELD (105s) | |||||||||
Itself (19s) | EG_INTERPOLATION_ETA (86s) | DEPARTURE_POINT_ETA (48s) | See profile for SWAP_BOUNDS_DP below | STASH_GATHER_FIELD (104s) | |||||||||||
EG_CUBIC_LAGRANGE (37s, itself) | EG_VERT_WEIGHTS_ETA (11s, itself) | MONO_ENFORCE (5s, itself) | Itself (20s) | GATHER_FIELD (104s) | |||||||||||
GATHER_FIELD_MPL (104s, itself) |
Summary
Routines | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
UM_SHELL (1,532s) | |||||||||||||||
U_MODEL_4A (1,525s) | |||||||||||||||
ATM_STEP_4A* (1,260s) | DUMPCTL (36s) | MEANCTL (9s) | OASIS3_GET_SNR (141s) | ||||||||||||
ATMOS_PHYSICS1 (510s) | EG_CORRECT_TRACERS (33s) | ATMOS_PHYSICS2 (193s) | EG_SL_HELMHOLTZ (187s) | TR_SET_PHYS_4A* (9s) | EG_SISL_INIT (35s) | SL_TRACER1_4A (26s) | EG_SL_MOISTURE (36s) | EG_SL_FULL_WIND (72s) | EG_Q_TO_MIX (42s) | ⇓ | STASH (25s) | UM_WRITDUMP (36s) | ACUMPS (9s) | ||
See profile for ATMOS_PHYSICS1 and EG_CORRECT_TRACERS below | See profile for ATMOS_PHYSICS2 and EG_SL_HELMHOLTZ below | EG_SISL_INIT_UVW (33s) | EG_SL_WIND_U, EG_SL_WIND_V & EG_SL_WIND_W (22 + 22 + 24 = 68s) | EG_SWAP_BOUNDS_DP (114s) | STWORK (24s) | GENERAL_GATHER_FIELD (45s) | |||||||||
Itself (19s) | EG_INTERPOLATION_ETA (86s) | DEPARTURE_POINT_ETA (48s) | See profile for SWAP_BOUNDS_DP below | STASH_GATHER_FIELD (44s) | |||||||||||
EG_CUBIC_LAGRANGE (37s, itself) | EG_VERT_WEIGHTS_ETA (11s, itself) | MONO_ENFORCE (5s, itself) | Itself (20s) | GATHER_FIELD (45s) | |||||||||||
GATHER_FIELD_MPL (45s, itself) |
Removing the section 34 and 38 fields with usage UPMEAN produced a run with 1,532s in UM_SHELL, so 27% slower than the CLASSIC run.
ATMOS_PHYSICS1 (315s) | EG_CORRECT_TRACERS (61s) | ||||||||||||
RAD_CTL (86s) | MICROPHYS_CTL (62s) | NI_GWD_CTL (78s) | ⇓ | ⇓ | EG_MASS_CONSERVATION (48s) | ||||||||
LW_RAD (61s) | SW_RAD (20s) | LS_PPN (56s) | G_WAVE_5A (63s) | GW_USSP (14s) | GLOBAL_2D_SUMS (21s, itself) | Itself (36s) | |||||||
RADIANCE_CALC (79s) | LS_PPNC (52s) | SWAP_BOUNDS (see table below) | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SOLVE_BAND_K_EQV (60s) | LSP_ICE (40s) | Itself (13s) | |||||||||||
MCICA_SAMPLE (47s) | SCALE_ABSORB (8s) | LSP_INIT (9s) | LSP_FALL (8s) | ||||||||||
MONOCHROMATIC_RADIANCE (40s) | Itself (7s) | Itself (5s) | Itself (7s) | ||||||||||
MONOCHROMATIC_RADIANCE_TSEQ (35s) | |||||||||||||
MCICA_COLUMN (35s) | |||||||||||||
TWO_COEFF (24s) | |||||||||||||
TRANS_SOURCE_COEFF (14s) | Itself (4s) | ||||||||||||
Itself (9s) | |||||||||||||
ATMOS_PHYSICS1 (505s) | EG_CORRECT_TRACERS (34s) | |||||||||||||
RAD_CTL (161s) | MICROPHYS_CTL (59s) | NI_GWD_CTL (201s) | ⇓ | ⇓ | EG_MASS_CONSERVATION (31s) | |||||||||
LW_RAD (116s) | SW_RAD (38s) | LS_PPN (54s) | G_WAVE_5A (171s) | GW_USSP (30s) | GLOBAL_2D_SUMS (28s, itself) | Itself (23s) | ||||||||
RADIANCE_CALC (151s) | LS_PPNC (51s) | SWAP_BOUNDS (see table below) | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
UKCA_RADAER_BAND_AVERAGE (76s, itself) | SOLVE_BAND_K_EQV (59s) | UKCA_RADAER_COMPUTE_AOD (6s, itself) | LSP_ICE (39s) | Itself (13s) | ||||||||||
MCICA_SAMPLE (46s) | SCALE_ABSORB (8s) | LSP_INIT (9s) | LSP_FALL (8s) | |||||||||||
MONOCHROMATIC_RADIANCE (40s) | Itself (6s) | Itself (5s) | Itself (7s) | |||||||||||
MONOCHROMATIC_RADIANCE_TSEQ (35s) | ||||||||||||||
MCICA_COLUMN (34s) | ||||||||||||||
TWO_COEFF (24s) | ||||||||||||||
TRANS_SOURCE_COEFF (14s) | Itself (4s) | |||||||||||||
Itself (9s) | ||||||||||||||
The extra time in NI_GWD_CTL may just be a consequence of a barrier and the massive imbalance in UKCA_RADAER_BAND_AVERAGE, where the times vary from 17s to 133s (and the imbalance throughout the day is likely to be much higher).
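A rough plausibility check of that idea, using only numbers from the tables above. It assumes (and this is only an assumption) that the first synchronisation point after the radiation code sits inside NI_GWD_CTL, so that a PE finishing UKCA_RADAER_BAND_AVERAGE early waits there for the slowest PE. It also ignores how DrHook aggregates times across PEs.

```python
# NI_GWD_CTL times from the two ATMOS_PHYSICS1 profiles above, in seconds.
ni_gwd_ctl_without_radaer = 78
ni_gwd_ctl_with_radaer = 201

# Range of per-PE times in UKCA_RADAER_BAND_AVERAGE quoted in the text, in seconds.
radaer_fastest_pe = 17
radaer_slowest_pe = 133

# Extra time that shows up in NI_GWD_CTL.
extra_in_gwd = ni_gwd_ctl_with_radaer - ni_gwd_ctl_without_radaer

# Longest time any PE could spend waiting at a barrier for the slowest PE.
max_barrier_wait = radaer_slowest_pe - radaer_fastest_pe

print(f"Extra time in NI_GWD_CTL:  {extra_in_gwd}s")
print(f"Largest possible PE wait:  {max_barrier_wait}s")
```

The two numbers are of similar size, which is consistent with the barrier explanation but does not prove it.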
ATMOS_PHYSICS2 (221s) | |||
NI_CONV_CTL (95s) | NI_IMP_CTL (49s) | SWAP_BOUNDS, SWAP_BOUNDS_2D_MV & SWAP_BOUNDS_MV (see table below) | |
---|---|---|---|
GLUE_CONV_5A (70s) | IMP_SOLVER (24s) | ||
Itself (30s) | MID_CONV_5A (9s) | ||
Itself (8s) |
ATMOS_PHYSICS2 (392s) | |||
NI_CONV_CTL (212s) | NI_IMP_CTL (73s) | SWAP_BOUNDS, SWAP_BOUNDS_2D_MV & SWAP_BOUNDS_MV (see table below) | |
---|---|---|---|
GLUE_CONV_5A (166s) | IMP_SOLVER (40s) | ||
Itself (104s) | MID_CONV_5A (38s) | ||
Itself (18s) |
All the parts of ATMOS_PHYSICS2 look bigger for my Snr UM.
Colin mentioned that this needed removing, although the total time in UKCA_SCAVENGING_MOD.UKCA_PLUME_SCAV is less than 1s.
Extra part | Do I want to keep it | Flags controlling it |
---|---|---|
RADAER code | Yes | l_ukca_radaer |
Extra storage space | Most, if not all | I think fields need to be present in the start dump and pass the test in TSTMSK, which uses many of the logicals in this table |
UKCA_MAIN1 and code below | No | l_ukca |
EG_CORRECT_TRACERS_UKCA | No | l_tracer, l_conserve_ukca_with_tr = .false. |
Section 1.10 in NI_CONV_CTL | No | l_biomass, l_dust, l_ocff, l_soot, l_sulp_nh3, l_sulp_so2, l_use_cariolle, tr_ukca, tr_vars |
Number of sections in SL_TRACER_4A | No | |
UKCA_PLUME_SCAV | No, but it's small | l_tracer .AND. l_ukca .AND. l_ukca_plume_scav .AND. npnts > 0 |
For the CLASSIC job, gadga, most of the flags in &RUN_Aerosol are TRUE, including L_BIOMASS, L_OCFF, L_SOOT, L_SULPC_NH3 and L_SULPC_SO2.
Estimate of extra components of full chemistry compared to CLASSIC (1,209s in UM_SHELL).
Components | % extra |
---|---|
What can be removed from Snr | |
Chemistry scheme/UKCA_MAIN1 (+4,878s**) | +400% |
Advection of UKCA fields/EG_CORRECT_TRACERS_UKCA (+238s*) | +31% |
SL transport of UKCA fields/SL_TRACER1_4A & TR_SET_PHYS_4A (+256s) | +21% |
Convective transport of UKCA fields/NI_CONV_CTL (+171s) | +14% |
Meaning of UKCA fields for diagnostics (+60s) | +5% |
What is difficult to remove from Snr (done through D1 and blind to section) | |
Scatter and gather of UKCA fields (+45s) | +4% |
What can't be removed from Snr | |
RADAER/UKCA_RADAER_BAND_AVERAGE (+190s) | +16% |
Receive/wait for coupling fields/OASIS3_GET_SNR (+140s) | +12% |
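As a sketch of how the "% extra" column relates to the timings, the snippet below divides each extra time by the 1,209s CLASSIC UM_SHELL total and rounds to the nearest percent. The rows marked with asterisks are left out because the notes that go with them are not shown here, and the short dictionary keys are just labels chosen for this example.

```python
# CLASSIC run total in UM_SHELL, in seconds.
classic_um_shell = 1209

# Extra seconds per component, taken from the table above (unstarred rows only).
extra_seconds = {
    "SL transport of UKCA fields": 256,
    "Convective transport of UKCA fields": 171,
    "Meaning of UKCA fields": 60,
    "Scatter and gather of UKCA fields": 45,
    "RADAER": 190,
    "OASIS3_GET_SNR": 140,
}

# Percentage extra relative to the CLASSIC total, rounded to whole percent.
percent_extra = {
    name: round(100 * seconds / classic_um_shell)
    for name, seconds in extra_seconds.items()
}

for name, pct in percent_extra.items():
    print(f"{name}: +{pct}%")
```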
My final run is overall 26% slower than CLASSIC (but results not checked as cumf is giving a memory fault).
Jnr PEs | Total time for Snr in UM_SHELL (s) | Total time for Snr in OASIS3_GET_SNR (s) |
---|---|---|
48x28=1,344 | 1,506 | 107 |
32x16=512 | 1,532 | 141 |
16x24=384 | 1,740 | 334 |
16x16=256 | 2,413 | 1,016 |
8x24=192 | 3,075 | 1,686 |
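One way to read this table is to subtract the OASIS3_GET_SNR time from the UM_SHELL time for each Jnr decomposition. The sketch below does that, assuming both columns are in seconds like the rest of the profiles.

```python
# (Jnr decomposition, Snr UM_SHELL time, Snr OASIS3_GET_SNR time), from the table above.
runs = [
    ("48x28", 1506, 107),
    ("32x16", 1532, 141),
    ("16x24", 1740, 334),
    ("16x16", 2413, 1016),
    ("8x24", 3075, 1686),
]

# Snr time spent outside OASIS3_GET_SNR for each Jnr decomposition.
outside_coupling = {name: total - wait for name, total, wait in runs}

for name, seconds in outside_coupling.items():
    print(f"{name}: {seconds}s outside OASIS3_GET_SNR")

spread = max(outside_coupling.values()) - min(outside_coupling.values())
print(f"Spread across runs: {spread}s")
```

The remainder stays within a few seconds of 1,400s in every run, so on these numbers the growth in Snr run time with fewer Jnr PEs appears to come almost entirely from time spent in OASIS3_GET_SNR.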