GLOMAP

Comparing Backward-Euler with CLASSIC

In the profiles for the CLASSIC run, bold numbers mark times that are much less than in the Backward-Euler run, and underlined numbers mark times that are much more (this applies only to EG_SISL_INIT and ATM_STEP_STASH).

The top profile tree

Backward-Euler

Routines
Level 1: UM_SHELL (5,000s)
Level 2: U_MODEL_4A (4,997s)
Level 3: ATM_STEP_4A* (3,374s) | UKCA_MAIN1 (1,464s)
Level 4: ATMOS_PHYSICS1 (1,493s) | EG_CORRECT_TRACERS (214s) | ATMOS_PHYSICS2 (442s) | EG_SL_HELMHOLTZ (227s) | TR_SET_PHYS_4A* (76s) | EG_CORRECT_TRACERS_UKCA (167s) | SL_TRACER1_4A (162s) | EG_SL_MOISTURE (80s) | EG_SL_FULL_WIND (135s) | UPDATE_M_STAR (74s) | ATM_STEP_STASH (60s) | See profiling for UKCA_MAIN1
Level 5: See profile for ATMOS_PHYSICS1 and EG_CORRECT_TRACERS below | See profile for ATMOS_PHYSICS2 and EG_SL_HELMHOLTZ below | EG_SL_WIND_U, EG_SL_WIND_V & EG_SL_WIND_W (38 + 38 + 45 = 121s) | EG_Q_TO_MIX (75s) | STASH (172s)
Level 6: Itself (120s) | EG_INTERPOLATION_ETA (237s) | DEPARTURE_POINT_ETA (82s) | EG_SWAP_BOUNDS_DP (159s) | STWORK (172s)
Level 7: EG_CUBIC_LAGRANGE (98s, itself) | EG_VERT_WEIGHTS_ETA (19s, itself) | MONO_ENFORCE (19s, itself) | Itself (36s) | See profile for SWAP_BOUNDS_DP below | SPATIAL (64s) | PP_HEAD (54s) | EXPPXI (35s, itself)

CLASSIC

Routines
Level 1: UM_SHELL (2,568s)
Level 2: U_MODEL_4A (2,566s)
Level 3: ATM_STEP_4A* (2,498s)
Level 4: ATMOS_PHYSICS1 (949s) | EG_CORRECT_TRACERS (112s) | ATMOS_PHYSICS2 (404s) | EG_SL_HELMHOLTZ (240s) | TR_SET_PHYS_4A* (48s) | EG_SISL_INIT (50s) | SL_TRACER1_4A (102s) | EG_SL_MOISTURE (79s) | EG_SL_FULL_WIND (135s) | EG_Q_TO_MIX (21s) | ATM_STEP_STASH (112s)
Level 5: See profile for ATMOS_PHYSICS1 and EG_CORRECT_TRACERS below | See profile for ATMOS_PHYSICS2 and EG_SL_HELMHOLTZ below | EG_SISL_INIT_UVW (47s) | EG_SL_WIND_U, EG_SL_WIND_V & EG_SL_WIND_W (39 + 38 + 45 = 122s) | EG_SWAP_BOUNDS_DP (107s) | STASH (130s)
Level 6: Itself (33s) | EG_INTERPOLATION_ETA (177s) | DEPARTURE_POINT_ETA (83s) | See profile for SWAP_BOUNDS_DP below | STWORK (130s)
Level 7: EG_CUBIC_LAGRANGE (80s, itself) | EG_VERT_WEIGHTS_ETA (19s, itself) | MONO_ENFORCE (14s, itself) | Itself (36s) | SPATIAL (33s) | PP_HEAD (46s) | EXPPXI (30s, itself)
* Should also link to SWAP_BOUNDS_DP, like many other routines.

From the profiling comparison, it's clear that most of the 2,432s that the CLASSIC run saves over the Backward-Euler run comes from:

No UKCA_MAIN1 | 1,464s
Smaller ATMOS_PHYSICS1 | 544s
No EG_CORRECT_TRACERS_UKCA | 167s
Smaller EG_CORRECT_TRACERS | 102s
Total | 2,277s
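As a quick check of the arithmetic, the sketch below recomputes each saving from the top-level times in the two trees (using the 112s CLASSIC figure for EG_CORRECT_TRACERS from the top tree). The dictionary layout and variable names are illustrative only, and treating a routine that does not run in CLASSIC as 0s is an assumption of the sketch.

```python
# Mean times (s) copied from the top profile trees above.
be = {"UM_SHELL": 5000, "UKCA_MAIN1": 1464, "ATMOS_PHYSICS1": 1493,
      "EG_CORRECT_TRACERS_UKCA": 167, "EG_CORRECT_TRACERS": 214}
# Routines that do not run in CLASSIC are entered as 0s (an assumption).
classic = {"UM_SHELL": 2568, "UKCA_MAIN1": 0, "ATMOS_PHYSICS1": 949,
           "EG_CORRECT_TRACERS_UKCA": 0, "EG_CORRECT_TRACERS": 112}

total_saving = be["UM_SHELL"] - classic["UM_SHELL"]
# Saving per routine, excluding the top-level UM_SHELL total itself.
savings = {name: be[name] - classic[name] for name in be if name != "UM_SHELL"}
explained = sum(savings.values())

for name, secs in savings.items():
    print(f"{name:<26}{secs:>6,}s")
print(f"{'Total':<26}{explained:>6,}s of {total_saving:,}s "
      f"({100 * explained / total_saving:.1f}%)")
```

The four items account for a little under 94% of the difference in UM_SHELL time.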

Profiling for ATMOS_PHYSICS1 and EG_CORRECT_TRACERS

Backward-Euler

Level 1: ATMOS_PHYSICS1 (1,493s) | EG_CORRECT_TRACERS (214s)
Level 2: RAD_CTL (494s) | MICROPHYS_CTL (329s) | NI_GWD_CTL (328s) | EG_MASS_CONSERVATION (138s) | Itself (39s)
Level 3: LW_RAD (357s) | SW_RAD (120s) | LS_PPN (322s) | G_WAVE_5A (279s) | GW_USSP (47s) | GLOBAL_2D_SUMS (51s, itself) | Itself (97s)
Level 4: RADIANCE_CALC (470s) | LS_PPNC (313s) | SWAP_BOUNDS (see table below)
Level 5: UKCA_RADAER_BAND_AVERAGE (272s, itself) | SOLVE_BAND_K_EQV (154s) | UKCA_RADAER_COMPUTE_AOD (21s, itself) | LSP_ICE (198s) | Itself (114s)
Level 6: MCICA_SAMPLE (122s) | SCALE_ABSORB (20s) | LSP_SUBGRID (107s) | LSP_INIT (23s) | LSP_FALL (19s)
Level 7: MONOCHROMATIC_RADIANCE (107s) | Itself (15s) | LSP_QCLEAR (77s) | Itself (25s) | Itself (12s) | Itself (17s)
Level 8: MONOCHROMATIC_RADIANCE_TSEQ (94s) | QWIDTH (77s, itself)
Level 9: MCICA_COLUMN (94s)
Level 10: TWO_COEFF (67s)
Level 11: TRANS_SOURCE_COEFF (37s) | Itself (12s)
Level 12: Itself (23s)

CLASSIC

Level 1: ATMOS_PHYSICS1 (949s) | EG_CORRECT_TRACERS (122s)
Level 2: RAD_CTL (215s) | MICROPHYS_CTL (186s) | NI_GWD_CTL (176s) | EG_MASS_CONSERVATION (90s) | Itself (17s)
Level 3: LW_RAD (153s) | SW_RAD (53s) | LS_PPN (174s) | G_WAVE_5A (150s) | GW_USSP (25s) | GLOBAL_2D_SUMS (38s, itself) | Itself (63s)
Level 4: RADIANCE_CALC (200s) | LS_PPNC (163s) | SWAP_BOUNDS (see table below)
Level 5: SOLVE_BAND_K_EQV (155s) | LSP_ICE (95s) | Itself (68s)
Level 6: MCICA_SAMPLE (122s) | SCALE_ABSORB (20s) | LSP_SUBGRID (6s) | LSP_INIT (23s) | LSP_FALL (19s)
Level 7: MONOCHROMATIC_RADIANCE (107s) | Itself (15s) | Itself (6s) | Itself (11s) | Itself (16s)
Level 8: MONOCHROMATIC_RADIANCE_TSEQ (94s)
Level 9: MCICA_COLUMN (94s)
Level 10: TWO_COEFF (67s)
Level 11: TRANS_SOURCE_COEFF (37s) | Itself (12s)
Level 12: Itself (23s)

From this profiling, it's clear that the 544s saved in ATMOS_PHYSICS1 by the CLASSIC run relative to the Backward-Euler run comes largely from:

No UKCA_RADAER_BAND_AVERAGE | 272s
Reduction in use of SWAP_BOUNDS | ~150s
A big reduction in the use of LSP_SUBGRID (including LSP_QCLEAR and QWIDTH) | 101s
Total | 523s

For the EG_CORRECT_TRACERS strand, EG_CORRECT_TRACERS and EG_MASS_CONSERVATION are about 1.75 and 1.53 times slower for Backward-Euler than for CLASSIC. Apparently these routines still process all tracers, even though EG_CORRECT_TRACERS_UKCA is processing many of them, which means these routines are processing

  • 8 + 35 = 43 fields for Backward-Euler
  • 8 + 23 = 31 fields for CLASSIC
This would suggest that Backward-Euler should be 43/31 = 1.39 times slower, but it is even slower than that. Mohit is creating a branch to remove the fields advected by EG_CORRECT_TRACERS_UKCA from these routines.
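The field-count argument can be checked in a few lines. This sketch assumes that cost scales linearly with the number of fields processed (an assumption, not something the profiles establish) and compares that prediction with the measured ratios, using the 122s CLASSIC figure for EG_CORRECT_TRACERS from the detailed profile. Variable names are illustrative.

```python
# Field counts from the list above: 8 + 35 (Backward-Euler) and 8 + 23 (CLASSIC).
fields_be = 8 + 35
fields_classic = 8 + 23
# Assumed model: slowdown proportional to the number of fields processed.
expected = fields_be / fields_classic

# Measured mean times (s), (Backward-Euler, CLASSIC), from the detailed profiles.
measured = {
    "EG_CORRECT_TRACERS": (214, 122),
    "EG_MASS_CONSERVATION": (138, 90),
}

print(f"expected from field count: {expected:.2f}x")
for name, (t_be, t_classic) in measured.items():
    ratio = t_be / t_classic
    print(f"{name:<22}{ratio:.2f}x measured "
          f"({ratio / expected:.2f} times the field-count estimate)")
```

Both measured ratios exceed the linear estimate, which is what prompts the suspicion that something beyond the extra fields is involved.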

What do LSP_SUBGRID, LSP_QCLEAR and QWIDTH do?

LSP_SUBGRID's header describes it as follows:

! Purpose:
!   Perform the subgrid-scale setting up calculations

! Method:
!   Parametrizes the width of the vapour distribution in the part
!   of the gridbox which does not have liquid water present.
!   Calculates the overlaps within each gridbox between  the cloud
!   fraction prognostics and rainfraction diagnostic.
LSP_QCLEAR and QWIDTH were added by a branch. According to its header, LSP_QCLEAR:
! Purpose:
! Returns the average relative humidity in the cloud-free (clear-sky)
! portion of gridboxes.
! 
! Method:
!   Parametrizes the width of the vapour distribution in the part
!   of the gridbox which does not have liquid water present.
QWIDTH's header says that it "Calculates width of vapour dist. in ice and clear region."

It looks like the number of calls is what causes the pain here:

  • calls for QWIDTH for Backward-Euler: 165,589,760 (mean)
  • calls for LSP_QCLEAR for Backward-Euler: 937,668 (mean)
  • calls for LSP_SUBGRID for Backward-Euler: 754,068 (mean)
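Dividing QWIDTH's 77s of self time by its call count makes the point concrete: each call is cheap (under half a microsecond), but there are roughly 177 QWIDTH calls for every LSP_QCLEAR call. The sketch assumes the mean time and the mean call count can be divided directly, which ignores any variation between processes; names are illustrative.

```python
# Mean call counts for the Backward-Euler run, from the list above.
calls = {"QWIDTH": 165_589_760, "LSP_QCLEAR": 937_668, "LSP_SUBGRID": 754_068}
qwidth_time = 77.0  # mean seconds in QWIDTH itself, from the ATMOS_PHYSICS1 tree

# Average cost of a single QWIDTH call, in microseconds.
per_call_us = qwidth_time / calls["QWIDTH"] * 1e6
# How many QWIDTH calls are made per LSP_QCLEAR call, on average.
qwidth_per_qclear = calls["QWIDTH"] / calls["LSP_QCLEAR"]

print(f"QWIDTH: {per_call_us:.2f} microseconds per call")
print(f"QWIDTH calls per LSP_QCLEAR call: {qwidth_per_qclear:.0f}")
```

With per-call costs this small, reducing the number of calls (or inlining the work) is likely to matter more than speeding up the body of QWIDTH.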

Profiling for ATMOS_PHYSICS2 and EG_SL_HELMHOLTZ

Backward-Euler

Level 1: ATMOS_PHYSICS2 (442s) | EG_SL_HELMHOLTZ (227s)
Level 2: NI_CONV_CTL (206s) | NI_IMP_CTL (79s) | SWAP_BOUNDS, SWAP_BOUNDS_2D_MV & SWAP_BOUNDS_MV (see table below) | EG_BICGSTAB (128s) | EG_HELM_RHS_STAR (67s)
Level 3: GLUE_CONV_6A (155s) | IMP_SOLVER (40s) | EG_PRECON (87s) | EG_SISL_INIT (51s)
Level 4: Itself (64s) | MID_CONV_6A (53s) | TRI_SOR_DP_DP (87s) | EG_SISL_INIT_UVW (48s)
Level 5: Itself (17s) | Itself (58s) | Itself (33s)

CLASSIC

Level 1: ATMOS_PHYSICS2 (404s) | EG_SL_HELMHOLTZ (240s)
Level 2: NI_CONV_CTL (173s) | NI_IMP_CTL (79s) | SWAP_BOUNDS, SWAP_BOUNDS_2D_MV & SWAP_BOUNDS_MV (see table below) | EG_BICGSTAB (140s) | EG_HELM_RHS_STAR (69s)
Level 3: GLUE_CONV_5A (126s) | IMP_SOLVER (38s) | EG_PRECON (95s) | EG_SISL_INIT* (50s)
Level 4: Itself (47s) | MID_CONV_5A (42s) | TRI_SOR_DP_DP (95s) | EG_SISL_INIT_UVW (47s)
Level 5: Itself (13s) | Itself (64s) | Itself (33s)
* EG_SISL_INIT is also called from ATM_STEP_4A.

The 38s difference in ATMOS_PHYSICS2 looks largely down to the 29s difference between GLUE_CONV_6A (Backward-Euler) and GLUE_CONV_5A (CLASSIC).

Profiling for SWAP_* routines

Backward-Euler

Routines | Total mean time
EG_SWAP_BOUNDS_DP (159s) | ATMOS_PHYSICS1, ATMOS_PHYSICS2, G_WAVE_5A, ... | 159 + ...
SWAP_BOUNDS & SWAP_BOUNDS_DP (547 + 187 = 734s) | SWAP_BOUNDS_MV (112s, itself) | 846s
SWAP_BOUNDS_EW_DP (393s) | SWAP_BOUNDS_NS_DP (340s, itself) | 845s
SWAP_BOUNDS_EW_H1_DP (280s, itself) | Itself (113s) | 845s

CLASSIC

Routines | Total mean time
EG_SWAP_BOUNDS_DP (107s) | ATMOS_PHYSICS1, ATMOS_PHYSICS2, G_WAVE_5A, ... | 107 + ...
SWAP_BOUNDS & SWAP_BOUNDS_DP (339 + 118 = 457s) | SWAP_BOUNDS_MV (103s, itself) | 560s
SWAP_BOUNDS_EW_DP (234s) | SWAP_BOUNDS_NS_DP (222s, itself) | 559s
SWAP_BOUNDS_EW_H1_DP (153s, itself) | Itself (81s) | 559s

The total times for SWAP_BOUNDS_EW_DP and SWAP_BOUNDS_NS_DP are 1.68 and 1.53 times longer for Backward-Euler than for CLASSIC, which again is more than the 1.39 that the extra 12 tracers would suggest (1.39 = 43/31). Overall, the SWAP_* routines take about 1.5 times as long for Backward-Euler (845s) as for CLASSIC (559s).
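The table totals can be reproduced from the row entries. On the EW/NS rows the total appears to include SWAP_BOUNDS_MV carried down from the row above (393 + 340 + 112 = 845 for Backward-Euler); that reading is inferred from the arithmetic, not stated in the profiles. The sketch below makes it explicit and recomputes the ratios quoted above; names are illustrative.

```python
# Mean times (s) from the SWAP_* tables as (Backward-Euler, CLASSIC).
swap_ew = (393, 234)  # SWAP_BOUNDS_EW_DP
swap_ns = (340, 222)  # SWAP_BOUNDS_NS_DP, itself
swap_mv = (112, 103)  # SWAP_BOUNDS_MV, itself

# Assumed row total: EW + NS halves plus SWAP_BOUNDS_MV from the row above.
totals = tuple(ew + ns + mv for ew, ns, mv in zip(swap_ew, swap_ns, swap_mv))
tracer_ratio = 43 / 31  # field-count ratio, Backward-Euler vs CLASSIC

rows = {"SWAP_BOUNDS_EW_DP": swap_ew, "SWAP_BOUNDS_NS_DP": swap_ns,
        "all SWAP_*": totals}
for name, (t_be, t_classic) in rows.items():
    print(f"{name:<18}{t_be / t_classic:.2f}x "
          f"(field-count ratio {tracer_ratio:.2f}x)")
print(f"extra SWAP_* time: {totals[0] - totals[1]}s")
```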

Summary

Looking at all the profiles, most of the extra 2,432s needed by the Backward-Euler run comes from:

No UKCA_MAIN1 | 1,464s
Extra time in SWAP_* routines (extra fields) | 286s+
No UKCA_RADAER_BAND_AVERAGE | 272s
No EG_CORRECT_TRACERS_UKCA | 167s
Smaller EG_CORRECT_TRACERS | 102s
A big reduction in the use of LSP_SUBGRID (including LSP_QCLEAR and an average of 166 million calls to QWIDTH) | 101s
Total | 2,392s+
+ Some of the extra SWAP_* time will come from UKCA_MAIN1, so that time is double-counted here, but it shouldn't be much.
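As a final check, the summary can be tallied against the 2,432s difference in UM_SHELL time. Because part of the SWAP_* figure is double-counted with UKCA_MAIN1 (see the note above), the printed coverage should be read as an upper bound. This is a sketch; the labels simply mirror the summary table.

```python
# Contributions (s) to the extra Backward-Euler time, from the summary table.
contributions = {
    "No UKCA_MAIN1": 1464,
    "Extra time in SWAP_* routines": 845 - 559,
    "No UKCA_RADAER_BAND_AVERAGE": 272,
    "No EG_CORRECT_TRACERS_UKCA": 167,
    "Smaller EG_CORRECT_TRACERS": 214 - 112,
    "Smaller LSP_SUBGRID": 107 - 6,
}
extra_time = 5000 - 2568  # UM_SHELL: Backward-Euler minus CLASSIC

accounted = sum(contributions.values())
# Upper bound, since some SWAP_* time overlaps with UKCA_MAIN1.
print(f"accounted for: {accounted:,}s of {extra_time:,}s "
      f"({100 * accounted / extra_time:.1f}%), unexplained {extra_time - accounted}s")
```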