GLOMAP
Comparing Backward-Euler with CLASSIC
In the profiles for the CLASSIC run, bold numbers mark times that
are much lower than in the Backward-Euler run, and underlined
numbers mark times that are much higher (only EG_SISL_INIT and
ATM_STEP_STASH).
The top profile tree
Backward-Euler
Routines |
UM_SHELL (5,000s) |
U_MODEL_4A (4,997s) |
ATM_STEP_4A* (3,374s) |
UKCA_MAIN1 (1,464s) |
ATMOS_PHYSICS1 (1,493s) |
EG_CORRECT_TRACERS (214s) |
ATMOS_PHYSICS2 (442s) |
EG_SL_HELMHOLTZ (227s) |
TR_SET_PHYS_4A* (76s) |
EG_CORRECT_TRACERS_UKCA (167s)
|
SL_TRACER1_4A (162s) |
EG_SL_MOISTURE (80s) |
EG_SL_FULL_WIND (135s) |
⇓ |
UPDATE_M_STAR (74s) |
ATM_STEP_STASH (60s) |
⇓ |
⇓ |
See profiling for UKCA_MAIN1 |
See profile for ATMOS_PHYSICS1 and EG_CORRECT_TRACERS below
|
See profile for ATMOS_PHYSICS2 and EG_SL_HELMHOLTZ below
|
EG_SL_WIND_U, EG_SL_WIND_V & EG_SL_WIND_W (38 + 38 + 45 = 121s)
|
⇓ |
EG_Q_TO_MIX (75s) |
⇓ |
STASH (172s) |
Itself (120s) |
EG_INTERPOLATION_ETA (237s) |
DEPARTURE_POINT_ETA (82s) |
EG_SWAP_BOUNDS_DP (159s) |
STWORK (172s) |
EG_CUBIC_LAGRANGE (98s, itself) |
EG_VERT_WEIGHTS_ETA (19s, itself) |
MONO_ENFORCE (19s, itself) |
Itself (36s) |
See profile for SWAP_BOUNDS_DP below |
SPATIAL (64s) |
PP_HEAD (54s) |
EXPPXI (35s, itself) |
CLASSIC
Routines |
UM_SHELL (2,568s) |
U_MODEL_4A (2,566s) |
ATM_STEP_4A* (2,498s) |
ATMOS_PHYSICS1 (949s) |
EG_CORRECT_TRACERS (112s) |
ATMOS_PHYSICS2 (404s) |
EG_SL_HELMHOLTZ (240s) |
TR_SET_PHYS_4A* (48s) |
EG_SISL_INIT (50s) |
SL_TRACER1_4A (102s) |
EG_SL_MOISTURE (79s) |
EG_SL_FULL_WIND (135s) |
EG_Q_TO_MIX (21s) |
ATM_STEP_STASH (112s) |
⇓ |
See profile for ATMOS_PHYSICS1 and EG_CORRECT_TRACERS below
|
See profile for ATMOS_PHYSICS2 and EG_SL_HELMHOLTZ below
|
EG_SISL_INIT_UVW (47s) |
EG_SL_WIND_U, EG_SL_WIND_V & EG_SL_WIND_W (39 + 38 + 45 = 122s)
|
EG_SWAP_BOUNDS_DP (107s) |
STASH (130s) |
Itself (33s) |
EG_INTERPOLATION_ETA (177s) |
DEPARTURE_POINT_ETA (83s) |
See profile for SWAP_BOUNDS_DP below |
STWORK (130s) |
EG_CUBIC_LAGRANGE (80s, itself) |
EG_VERT_WEIGHTS_ETA (19s, itself) |
MONO_ENFORCE (14s, itself) |
Itself (36s) |
|
|
|
SPATIAL (33s) |
PP_HEAD (46s) |
EXPPXI (30s, itself) |
* Should also link to SWAP_BOUNDS_DP, like many other routines.
From the profiling comparison, most of the 2,432s that the CLASSIC
run saves over the Backward-Euler run comes from:
No UKCA_MAIN1 | 1,464s |
Smaller ATMOS_PHYSICS1 | 544s |
No EG_CORRECT_TRACERS_UKCA | 167s |
Smaller EG_CORRECT_TRACERS | 102s |
Total | 2,277s |
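The arithmetic behind this table can be reproduced with a short sketch. The dictionary and function names are illustrative only; the times are the mean values from the two top-level profile trees. Note that the top-level CLASSIC tree gives 112s for EG_CORRECT_TRACERS while the more detailed profile gives 122s; the sketch uses the top-level value, which is the one that reproduces the 102s in the table.

```python
# Mean times in seconds, copied from the two top-level profile trees.
backward_euler = {
    "UM_SHELL": 5000,
    "UKCA_MAIN1": 1464,
    "ATMOS_PHYSICS1": 1493,
    "EG_CORRECT_TRACERS_UKCA": 167,
    "EG_CORRECT_TRACERS": 214,
}
classic = {
    "UM_SHELL": 2568,
    "UKCA_MAIN1": 0,               # does not appear in the CLASSIC profile
    "ATMOS_PHYSICS1": 949,
    "EG_CORRECT_TRACERS_UKCA": 0,  # does not appear in the CLASSIC profile
    "EG_CORRECT_TRACERS": 112,     # top-level tree value
}


def savings(name):
    """Seconds that CLASSIC saves over Backward-Euler for one routine."""
    return backward_euler[name] - classic[name]


total_saving = savings("UM_SHELL")
explained = sum(savings(n) for n in backward_euler if n != "UM_SHELL")

for name in backward_euler:
    print(f"{name:26s} {savings(name):5d}s")
print(f"explained {explained}s of {total_saving}s "
      f"({100 * explained / total_saving:.0f}%)")
```

The four routines therefore account for most, but not all, of the difference between the two runs.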
Profiling for ATMOS_PHYSICS1 and EG_CORRECT_TRACERS
Backward-Euler
ATMOS_PHYSICS1 (1,493s) |
EG_CORRECT_TRACERS (214s) |
RAD_CTL (494s) |
MICROPHYS_CTL (329s) |
NI_GWD_CTL (328s) |
⇓ |
⇓ |
EG_MASS_CONSERVATION (138s) |
Itself (39s) |
LW_RAD (357s) |
SW_RAD (120s) |
LS_PPN (322s) |
G_WAVE_5A (279s) |
GW_USSP (47s) |
GLOBAL_2D_SUMS (51s, itself) |
Itself (97s) |
RADIANCE_CALC (470s) |
LS_PPNC (313s) |
SWAP_BOUNDS (see table below) |
UKCA_RADAER_BAND_AVERAGE (272s, itself) |
SOLVE_BAND_K_EQV (154s) |
UKCA_RADAER_COMPUTE_AOD (21s, itself) |
LSP_ICE (198s) |
Itself (114s) |
MCICA_SAMPLE (122s) |
SCALE_ABSORB (20s) |
LSP_SUBGRID (107s) |
LSP_INIT (23s) |
LSP_FALL (19s) |
MONOCHROMATIC_RADIANCE (107s) |
Itself (15s) |
LSP_QCLEAR (77s) |
Itself (25s) |
Itself (12s) |
Itself (17s) |
MONOCHROMATIC_RADIANCE_TSEQ (94s) |
QWIDTH (77s, itself) |
MCICA_COLUMN (94s) |
TWO_COEFF (67s) |
TRANS_SOURCE_COEFF (37s) |
Itself (12s) |
Itself (23s) |
|
|
CLASSIC
ATMOS_PHYSICS1 (949s) |
EG_CORRECT_TRACERS (122s) |
RAD_CTL (215s) |
MICROPHYS_CTL (186s) |
NI_GWD_CTL (176s) |
⇓ |
⇓ |
EG_MASS_CONSERVATION (90s) |
Itself (17s) |
LW_RAD (153s) |
SW_RAD (53s) |
LS_PPN (174s) |
G_WAVE_5A (150s) |
GW_USSP (25s) |
GLOBAL_2D_SUMS (38s, itself) |
Itself (63s) |
RADIANCE_CALC (200s) |
LS_PPNC (163s) |
SWAP_BOUNDS (see table below) |
SOLVE_BAND_K_EQV (155s) |
LSP_ICE (95s) |
Itself (68s) |
MCICA_SAMPLE (122s) |
SCALE_ABSORB (20s) |
LSP_SUBGRID (6s) |
LSP_INIT (23s) |
LSP_FALL (19s) |
MONOCHROMATIC_RADIANCE (107s) |
Itself (15s) |
Itself (6s) |
Itself (11s) |
Itself (16s) |
MONOCHROMATIC_RADIANCE_TSEQ (94s) |
MCICA_COLUMN (94s) |
TWO_COEFF (67s) |
TRANS_SOURCE_COEFF (37s) |
Itself (12s) |
Itself (23s) |
|
|
From this profiling, the 544s that the CLASSIC run saves over the
Backward-Euler run in ATMOS_PHYSICS1 largely comes from:
No UKCA_RADAER_BAND_AVERAGE | 272s |
Reduction in use of SWAP_BOUNDS | ~150s |
A big reduction in the use of LSP_SUBGRID (including LSP_QCLEAR and QWIDTH) | 101s |
Total | 523s |
For the EG_CORRECT_TRACERS strand, EG_CORRECT_TRACERS and
EG_MASS_CONSERVATION are about 1.75 and 1.53 times slower for
Backward-Euler than for CLASSIC. Apparently these routines are still
processing all tracers, even though EG_CORRECT_TRACERS_UKCA is
processing many of them, which means these routines should be
processing
- 8 + 35 = 43 fields for Backward-Euler
- 8 + 23 = 31 fields for CLASSIC
This would suggest that Backward-Euler should be 43/31 = 1.39 times
slower, but it seems to be even slower than that. Mohit is creating a
branch to remove the fields that are being advected by
EG_CORRECT_TRACERS_UKCA.
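The comparison between the measured slowdown and the slowdown expected from the field counts can be checked with a few lines. This is only a sketch: the variable names are made up for illustration, and the times are the mean values from the ATMOS_PHYSICS1 and EG_CORRECT_TRACERS profiles above.

```python
# (Backward-Euler, CLASSIC) mean times in seconds from the profiles above.
times = {
    "EG_CORRECT_TRACERS": (214, 122),
    "EG_MASS_CONSERVATION": (138, 90),
}

# Field counts quoted in the text.
fields_backward_euler = 8 + 35
fields_classic = 8 + 23

# Slowdown expected if cost scaled linearly with the number of fields.
expected = fields_backward_euler / fields_classic

# Slowdown actually measured for each routine.
ratios = {name: be / cl for name, (be, cl) in times.items()}

print(f"expected from field count: {expected:.2f}")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} ({ratio / expected:.2f}x the expected factor)")
```

Both measured ratios exceed the field-count ratio, which is the gap the text points out.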
What do LSP_SUBGRID, LSP_QCLEAR and QWIDTH do?
According to its header, LSP_SUBGRID,
! Purpose:
! Perform the subgrid-scale setting up calculations
! Method:
! Parametrizes the width of the vapour distribution in the part
! of the gridbox which does not have liquid water present.
! Calculates the overlaps within each gridbox between the cloud
! fraction prognostics and rainfraction diagnostic.
LSP_QCLEAR and QWIDTH were added by a branch. According to its
header, LSP_QCLEAR,
! Purpose:
! Returns the average relative humidity in the cloud-free (clear-sky)
! portion of gridboxes.
!
! Method:
! Parametrizes the width of the vapour distribution in the part
! of the gridbox which does not have liquid water present.
and according to its header, QWIDTH "Calculates width of vapour
dist. in ice and clear region."
It looks like it is the number of calls that is causing the pain
here:
- calls to QWIDTH for Backward-Euler: 165,589,760 (mean)
- calls to LSP_QCLEAR for Backward-Euler: 937,668 (mean)
- calls to LSP_SUBGRID for Backward-Euler: 754,068 (mean)
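To put the call counts in perspective, the sketch below derives the average cost per QWIDTH call and the number of QWIDTH calls per LSP_QCLEAR call. The variable names are illustrative, the 77s is the "itself" time for QWIDTH in the Backward-Euler profile, and the calls-per-call figure assumes that every QWIDTH call originates in LSP_QCLEAR, which the profile tree suggests but this page does not confirm.

```python
# Mean call counts for the Backward-Euler run, as listed above.
calls = {
    "QWIDTH": 165_589_760,
    "LSP_QCLEAR": 937_668,
    "LSP_SUBGRID": 754_068,
}

# Self time of QWIDTH in the Backward-Euler profile, in seconds.
qwidth_seconds = 77

# Average cost of a single QWIDTH call, in microseconds.
per_call_us = qwidth_seconds / calls["QWIDTH"] * 1e6

# Average number of QWIDTH calls per LSP_QCLEAR call
# (assumption: QWIDTH is only called from LSP_QCLEAR).
qwidth_per_qclear = calls["QWIDTH"] / calls["LSP_QCLEAR"]

print(f"QWIDTH: {per_call_us:.2f} microseconds per call")
print(f"QWIDTH calls per LSP_QCLEAR call: {qwidth_per_qclear:.0f}")
```

Each call is cheap on its own, so under these assumptions the cost comes from the call count rather than from the work done per call.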
Profiling for ATMOS_PHYSICS2 and EG_SL_HELMHOLTZ
Backward-Euler
ATMOS_PHYSICS2 (442s) |
EG_SL_HELMHOLTZ (227s) |
NI_CONV_CTL (206s) |
NI_IMP_CTL (79s) |
SWAP_BOUNDS, SWAP_BOUNDS_2D_MV & SWAP_BOUNDS_MV
(see table below)
|
EG_BICGSTAB (128s) |
EG_HELM_RHS_STAR (67s) |
GLUE_CONV_6A (155s) |
IMP_SOLVER (40s) |
EG_PRECON (87s) |
EG_SISL_INIT (51s) |
Itself (64s) |
MID_CONV_6A (53s) |
TRI_SOR_DP_DP (87s) |
EG_SISL_INIT_UVW (48s) |
Itself (17s) |
Itself (58s) |
Itself (33s) |
CLASSIC
ATMOS_PHYSICS2 (404s) |
EG_SL_HELMHOLTZ (240s) |
NI_CONV_CTL (173s) |
NI_IMP_CTL (79s) |
SWAP_BOUNDS, SWAP_BOUNDS_2D_MV & SWAP_BOUNDS_MV
(see table below)
|
EG_BICGSTAB (140s) |
EG_HELM_RHS_STAR (69s) |
GLUE_CONV_5A (126s) |
IMP_SOLVER (38s) |
EG_PRECON (95s) |
EG_SISL_INIT* (50s) |
Itself (47s) |
MID_CONV_5A (42s) |
TRI_SOR_DP_DP (95s) |
EG_SISL_INIT_UVW (47s) |
Itself (13s) |
Itself (64s) |
Itself (33s) |
* EG_SISL_INIT is also called from ATM_STEP_4A.
The 38s difference in ATMOS_PHYSICS2 looks largely down to the 29s
difference between GLUE_CONV_6A (Backward-Euler) and GLUE_CONV_5A
(CLASSIC).
Profiling for SWAP_* routines
Backward-Euler
Routines |
Total mean time |
EG_SWAP_BOUNDS_DP (159s) |
ATMOS_PHYSICS1, ATMOS_PHYSICS2, G_WAVE_5A, ... |
159 + ... |
SWAP_BOUNDS & SWAP_BOUNDS_DP (547 + 187 = 734s)
|
SWAP_BOUNDS_MV (112s, itself) |
846s |
SWAP_BOUNDS_EW_DP (393s) |
SWAP_BOUNDS_NS_DP (340s, itself) |
845s |
SWAP_BOUNDS_EW_H1_DP (280s, itself) |
Itself (113s) |
845s |
CLASSIC
Routines |
Total mean time |
EG_SWAP_BOUNDS_DP (107s) |
ATMOS_PHYSICS1, ATMOS_PHYSICS2, G_WAVE_5A, ... |
107 + ... |
SWAP_BOUNDS & SWAP_BOUNDS_DP (339 + 118 = 457s)
|
SWAP_BOUNDS_MV (103s, itself) |
560s |
SWAP_BOUNDS_EW_DP (234s) |
SWAP_BOUNDS_NS_DP (222s, itself) |
559s |
SWAP_BOUNDS_EW_H1_DP (153s, itself) |
Itself (81s) |
559s |
The total times for SWAP_BOUNDS_EW_DP and SWAP_BOUNDS_NS_DP are
1.68 and 1.53 times larger for Backward-Euler than for CLASSIC, which
again is more than the factor of 1.39 that the 12 extra tracers would
suggest (43/31 = 1.39). The total time in the SWAP routines is
about 1.5 times larger for Backward-Euler (845s) than for CLASSIC
(559s).
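The ratios quoted here follow directly from the two SWAP_* tables. The sketch below recomputes them; the names are illustrative, and the times are the mean values from those tables.

```python
# (Backward-Euler, CLASSIC) mean times in seconds from the SWAP_* tables.
swap_times = {
    "SWAP_BOUNDS_EW_DP": (393, 234),
    "SWAP_BOUNDS_NS_DP": (340, 222),
    "all SWAP_* routines": (845, 559),
}

# Ratio of field counts quoted earlier in the text (43 vs 31 fields).
field_ratio = 43 / 31

ratios = {name: be / cl for name, (be, cl) in swap_times.items()}
extra_swap_seconds = swap_times["all SWAP_* routines"][0] \
    - swap_times["all SWAP_* routines"][1]

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} (field-count ratio {field_ratio:.2f})")
print(f"extra SWAP_* time in Backward-Euler: {extra_swap_seconds}s")
```

The difference between the two totals is the extra SWAP_* time attributed to the Backward-Euler run.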
Summary
After looking at all the profiles, most of the 2,432s of extra time
needed for the Backward-Euler run is accounted for by the following
differences in the CLASSIC run:
No UKCA_MAIN1 | 1,464s |
Less time in SWAP_* routines (fewer fields) | 286s+ |
No UKCA_RADAER_BAND_AVERAGE | 272s |
No EG_CORRECT_TRACERS_UKCA | 167s |
Smaller EG_CORRECT_TRACERS | 102s |
A big reduction in the use of LSP_SUBGRID (including LSP_QCLEAR and an average of 166 million calls to QWIDTH) | 101s |
Total | 2,392s+ |
+ Some of the extra SWAP_* time will come from UKCA_MAIN1, so this
time will have been double counted here, but it shouldn't be that
much.