GLOMAP

Stratospheric chemistry + GLOMAP

Details of run

  • facei (mstringe) copied from aofni (frjy)
  • Stratospheric chemistry + GLOMAP
  • 192 PEs (16*12 rather than 16 * 8 = 128 PEs, which was used before).
  • Grid size is 192*144*85
  • Runs for 1 month.
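
As a rough illustration of what the change in decomposition means per PE, the sketch below splits the horizontal grid evenly across the processor grid. It assumes that 16 is the east-west PE count and 12 (or 8) the north-south count, that 192*144 are the horizontal dimensions, and that halos are ignored; none of these is stated in the run details above.

```python
def local_domain(nx, ny, nz, pe_ew, pe_ns):
    """Return (x points per PE, y points per PE, levels, total PEs) for an even split."""
    if nx % pe_ew or ny % pe_ns:
        raise ValueError("grid does not divide evenly across the PEs")
    return nx // pe_ew, ny // pe_ns, nz, pe_ew * pe_ns

# Grid size from the run details; the orientation of the PE layout is an assumption.
NX, NY, NZ = 192, 144, 85

for pe_ew, pe_ns in [(16, 8), (16, 12)]:
    lx, ly, lz, n_pes = local_domain(NX, NY, NZ, pe_ew, pe_ns)
    print(f"{n_pes} PEs: {lx} x {ly} x {lz} = {lx * ly * lz} points per PE")
```

Under these assumptions, each PE holds 12 x 12 columns in the 192 PE run rather than 12 x 18 in the 128 PE layout.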

I've run Backward-Euler with a similar configuration for comparison (facej).

Profiling for UM_SHELL

Backward-Euler

Routines
UM_SHELL (3,103s)
U_MODEL_4A (3,099s)
ATM_STEP_4A* (2,117s) UKCA_MAIN1 (844s) MEANCTL (28s)
ATMOS_PHYSICS1 (898s) EG_CORRECT_TRACERS (19s) ATMOS_PHYSICS2 (294s) EG_SL_HELMHOLTZ (165s) TR_SET_PHYS_4A* (64s) EG_CORRECT_TRACERS_UKCA (110s) SL_TRACER1_4A (121s) EG_SL_MOISTURE (52s) EG_SL_FULL_WIND (92s) ⇓ UPDATE_M_STAR (59s) ATM_STEP_STASH (56s) ⇓ ⇓ See profiling for UKCA_MAIN1 below
Profile for ATMOS_PHYSICS1 and EG_CORRECT_TRACERS not shown here. See profile for ATMOS_PHYSICS2 and EG_SL_HELMHOLTZ below. EG_SL_WIND_U, EG_SL_WIND_V & EG_SL_WIND_W (26 + 25 + 30 = 81s) ⇓ EG_Q_TO_MIX (60s) ⇓ STASH (160s)
Itself (79s) EG_INTERPOLATION_ETA (157s) DEPARTURE_POINT_ETA (58s) EG_SWAP_BOUNDS_DP (131s) STWORK (160s)
EG_CUBIC_LAGRANGE (64s, itself) EG_VERT_WEIGHTS_ETA (11s, itself) MONO_ENFORCE (13s, itself) Itself (24s) See profile for SWAP_BOUNDS_DP below SPATIAL (55s) PP_HEAD (54s) EXPPXI (34s, itself)
*Should also link to SWAP_BOUNDS_DP, like many other routines.

Stratospheric chemistry + GLOMAP

Routines
UM_SHELL (8,218s)
U_MODEL_4A (8,214s)
ATM_STEP_4A* (3,287s) UKCA_MAIN1 (3,195s) MEANCTL (83s)
ATMOS_PHYSICS1 (923s) EG_CORRECT_TRACERS (19s) ATMOS_PHYSICS2 (456s) EG_SL_HELMHOLTZ (176s) TR_SET_PHYS_4A* (196s) EG_CORRECT_TRACERS_UKCA (612s) SL_TRACER1_4A (387s) EG_SL_MOISTURE (55s) EG_SL_FULL_WIND (106s) ⇓ UPDATE_M_STAR (33s) ATM_STEP_STASH (81s) ⇓ ⇓ See profiling for UKCA_MAIN1 below ACUMPS (72s)
Profile for ATMOS_PHYSICS1 and EG_CORRECT_TRACERS not shown here. See profile for ATMOS_PHYSICS2 and EG_SL_HELMHOLTZ below. EG_SL_WIND_U, EG_SL_WIND_V & EG_SL_WIND_W (26 + 25 + 31 = 82s) ⇓ EG_Q_TO_MIX (34s) ⇓ STASH (407s) GENERAL_GATHER_FIELD (66s)
Itself (461s) EG_INTERPOLATION_ETA (342s) DEPARTURE_POINT_ETA (62s) EG_SWAP_BOUNDS_DP (118s) STWORK (406s) STASH_GATHER_FIELD (66s)
EG_CUBIC_LAGRANGE (140s, itself) EG_VERT_WEIGHTS_ETA (11s, itself) MONO_ENFORCE (32s, itself) Itself (24s) See profile for SWAP_BOUNDS_DP below SPATIAL (110s) PP_HEAD (158s) EXPPXI (99s, itself) GATHER_FIELD (65s)
GATHER_FIELD_MPL (65s, itself)
*Should also link to SWAP_BOUNDS_DP, like many other routines.

The path below MEANCTL did not show up in the calling trees before, because the time spent in the routine was small enough to be ignored (i.e. below 50s).
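
To put the two UM_SHELL calling trees side by side, the sketch below lists the extra mean time and the slowdown factor for a handful of routines. The timings are copied from the trees above; the choice of routines and the helper name `compare` are mine, purely for illustration.

```python
# Mean times in seconds, copied from the two UM_SHELL calling trees above.
backward_euler = {
    "UM_SHELL": 3103, "ATM_STEP_4A": 2117, "UKCA_MAIN1": 844,
    "EG_CORRECT_TRACERS_UKCA": 110, "SL_TRACER1_4A": 121,
    "TR_SET_PHYS_4A": 64, "STASH": 160, "MEANCTL": 28,
}
strat_glomap = {
    "UM_SHELL": 8218, "ATM_STEP_4A": 3287, "UKCA_MAIN1": 3195,
    "EG_CORRECT_TRACERS_UKCA": 612, "SL_TRACER1_4A": 387,
    "TR_SET_PHYS_4A": 196, "STASH": 407, "MEANCTL": 83,
}

def compare(base, new):
    """Return (routine, extra seconds, ratio) tuples, largest increase first."""
    rows = [(name, new[name] - base[name], new[name] / base[name]) for name in base]
    return sorted(rows, key=lambda row: row[1], reverse=True)

for name, extra, ratio in compare(backward_euler, strat_glomap):
    print(f"{name:<26} +{extra:>5d}s  x{ratio:.2f}")
```

MEANCTL, for example, goes from 28s to 83s, which is why it only crosses the 50s threshold in the second tree.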

Profiling for ATMOS_PHYSICS2

Backward-Euler

ATMOS_PHYSICS2 (294s)
NI_CONV_CTL (129s) NI_IMP_CTL (59s) SWAP_BOUNDS, SWAP_BOUNDS_2D_MV & SWAP_BOUNDS_MV (see table below)
GLUE_CONV_6A (94s) IMP_SOLVER (31s)
Itself (37s) MID_CONV_6A (34s)
Itself (11s)

Stratospheric chemistry + GLOMAP

ATMOS_PHYSICS2 (456s)
NI_CONV_CTL (231s) NI_IMP_CTL (87s) SWAP_BOUNDS, SWAP_BOUNDS_2D_MV & SWAP_BOUNDS_MV (see table below)
GLUE_CONV_5A (175s) IMP_SOLVER (47s)
Itself (95s) MID_CONV_5A (46s)
Itself (20s)

The difference appears to be largely in SWAP_BOUNDS, so it is probably caused by the barrier call in there and by the stratospheric chemistry + GLOMAP run being more imbalanced.

Profiling for UKCA_MAIN1

Backward-Euler

Routines Total mean time
UKCA_MAIN* (843s) 843s
UKCA_AERO_CTL (572s) UKCA_ACTIVATE (95s) 667s
UKCA_AERO_STEP (539s) UKCA_ABDULRAZZAK_GHAN (88s) 627s
UKCA_COAGWITHNUCL (238s) UKCA_CONDEN (95s) UKCA_CHECK_MD_ND (48s, itself) UKCA_CALCNUCRATE (46s) UKCA_VOLUME_MODE (34s) Itself (84s) 545s
Itself (198s) UKCA_SOLVECOAGNUCL_V (40s, itself) UKCA_COND_COFF_V (61s, itself) Itself (31s) UKCA_BINAPARA (43s, itself) Itself (17s) 522s
*UKCA_MAIN also calls STASH

Stratospheric chemistry + GLOMAP

Routines
UKCA_MAIN* (3,072s)
Code in offline oxidants Code for the extra chemistry
UKCA_AERO_CTL (539s) UKCA_ACTIVATE (95s) UKCA_CHEMISTRY_CTL (1,489s) UKCA_FASTJX (544s) UKCA_EMISSION_CTL (51s)
UKCA_AERO_STEP (508s) UKCA_ABDULRAZZAK_GHAN (88s) ASAD_CDRIVE (1,381s) UKCA_STRAT_PHOTOL (51s) FASTJX_PHOTOJ (541s)
UKCA_COAGWITHNUCL (209s) UKCA_CONDEN (94s) UKCA_CHECK_MD_ND (47s, itself) UKCA_CALCNUCRATE (46s) UKCA_VOLUME_MODE (33s) Itself (84s) ASAD_SPMJPDRIV (1,274s) ⇓ INIJTAB (51s) FASTJX_OPMIE (251s) FLINT (136s, itself) Itself (105s)
Itself (169s) UKCA_SOLVECOAGNUCL_V (40s, itself) UKCA_COND_COFF_V (63s, itself) Itself (30s) UKCA_BINAPARA (43s, itself) Itself (16s) ASAD_SPIMPMJP (1,261s) ⇓ SETTAB (51s) FASTJX_MIESCT (155s) Itself (96s)
SPLINSLV2 (546s, itself) SPFULJAC (479s, itself) Itself (99s) ASAD_DIFFUN (128s) Itself (13s) BLKSLV (154s)
ASAD_PRLS (119s, itself) Itself (66s) MATINW (54s, itself)
     
*UKCA_MAIN also calls STASH, and the time spent there is probably quite large (~300s)

10 day run with kcdt changed from 3600s to 1200s

I've carried out a run where the run length is reduced to a third (10 days rather than 1 month), but most of the chemistry is called three times as often, so most of the chemistry is called about the same number of times overall. The aim is to see which routines converge more quickly when they are called every timestep. Time in ATM_STEP_4A is 1,084s, so about a third of the 3,287s shown for the run above.
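
The arithmetic behind "called the same amount" can be sketched as follows. It assumes that kcdt is the chemistry timestep in seconds and that the 1 month run is 30 days long; neither is stated explicitly above.

```python
SECONDS_PER_DAY = 86_400

def chemistry_calls(run_days, kcdt_seconds):
    """Number of chemistry calls in a run, given the chemistry timestep."""
    return run_days * SECONDS_PER_DAY // kcdt_seconds

month_run = chemistry_calls(30, 3600)   # 30-day month is an assumption
short_run = chemistry_calls(10, 1200)   # the 10 day run with kcdt = 1200s

# ATM_STEP_4A mean times (s) quoted in the text: 10 day run vs 1 month run.
atm_step_ratio = 1084 / 3287

print(f"chemistry calls: {month_run} (1 month) vs {short_run} (10 days)")
print(f"ATM_STEP_4A time ratio: {atm_step_ratio:.2f}")
```

Under these assumptions both runs make the same number of chemistry calls, while the rest of the model does roughly a third of the work, so the UKCA timings in the table below can be compared directly with the 1 month run.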

Routines
UKCA_MAIN* (2,729s)
Code in offline oxidants Code for the extra chemistry
UKCA_AERO_CTL (517s) UKCA_ACTIVATE (83s) UKCA_CHEMISTRY_CTL (1,405s) UKCA_FASTJX (546s) UKCA_EMISSION_CTL (17s)
UKCA_AERO_STEP (486s) UKCA_ABDULRAZZAK_GHAN (77s) ASAD_CDRIVE (1,297s) UKCA_STRAT_PHOTOL (52s) FASTJX_PHOTOJ (543s)
UKCA_COAGWITHNUCL (207s) UKCA_CONDEN (83s) UKCA_CHECK_MD_ND (47s, itself) UKCA_CALCNUCRATE (46s) UKCA_VOLUME_MODE (31s) Itself (72s) ASAD_SPMJPDRIV (1,190s) ⇓ INIJTAB (51s) FASTJX_OPMIE (252s) FLINT (137s, itself) Itself (106s)
Itself (169s) UKCA_SOLVECOAGNUCL_V (38s, itself) UKCA_COND_COFF_V (53s, itself) Itself (30s) UKCA_BINAPARA (43s, itself) Itself (16s) ASAD_SPIMPMJP (1,177s) ⇓ SETTAB (51s) FASTJX_MIESCT (157s) Itself (95s)
SPLINSLV2 (507s, itself) SPFULJAC (449s, itself) Itself (92s) ASAD_DIFFUN (121s) Itself (13s) BLKSLV (156s)
ASAD_PRLS (112s, itself) Itself (67s) MATINW (54s, itself)
     

Calling tree for Stratospheric chemistry + GLOMAP by maximum time

Until now, the calling trees have been shown by mean time, but as the full chemistry is very imbalanced it may be instructive to see the maximum time. The bold numbers show the maximum times which are significantly larger than the mean times.

Routines
UKCA_MAIN* (s)
Code in offline oxidants Code for the extra chemistry
UKCA_AERO_CTL (564s) UKCA_ACTIVATE (123s) UKCA_CHEMISTRY_CTL (1,671s) UKCA_FASTJX (583s) UKCA_EMISSION_CTL (53s)
UKCA_AERO_STEP (532s) UKCA_ABDULRAZZAK_GHAN (117s) ASAD_CDRIVE (1,563s) UKCA_STRAT_PHOTOL (53s) FASTJX_PHOTOJ (580s)
UKCA_COAGWITHNUCL (216s) UKCA_CONDEN (104s) UKCA_CHECK_MD_ND (50s, itself) UKCA_CALCNUCRATE (47s) UKCA_VOLUME_MODE (34s) Itself (112s) ASAD_SPMJPDRIV (1,456s) ⇓ INIJTAB (52s) FASTJX_OPMIE (278s) FLINT (150s, itself) Itself (111s)
Itself (175s) UKCA_SOLVECOAGNUCL_V (43s, itself) UKCA_COND_COFF_V (72s, itself) Itself (32s) UKCA_BINAPARA (45s, itself) Itself (17s) ASAD_SPIMPMJP (1,443s) ⇓ SETTAB (52s) FASTJX_MIESCT (178s) Itself (119s)
SPLINSLV2 (622s, itself) SPFULJAC (551s, itself) Itself (115s) ASAD_DIFFUN (149s) Itself (13s) BLKSLV (177s)
ASAD_PRLS (136s, itself) Itself (80s) MATINW (59s, itself)
     
*UKCA_MAIN also calls STASH

The imbalance here is smaller than I expected, given that the wait time after UKCA_MAIN1 is 1,570s (mean). But maybe this suggests that the imbalance is worse on a timestep-by-timestep basis (for example, depending on which points are in sunlight) than it appears when averaged over whole days.
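
One way to quantify this is the ratio of maximum to mean time per routine, sketched below for a few routines from the 1 month stratospheric chemistry + GLOMAP run. The timings are copied from the mean-time and maximum-time trees above; using max/mean as the imbalance measure is my choice here, not something the profiler reports.

```python
# (mean s, max s) per routine, copied from the mean and maximum calling trees.
timings = {
    "UKCA_AERO_CTL": (539, 564),
    "UKCA_ACTIVATE": (95, 123),
    "UKCA_CHEMISTRY_CTL": (1489, 1671),
    "UKCA_FASTJX": (544, 583),
    "ASAD_CDRIVE": (1381, 1563),
    "SPLINSLV2": (546, 622),
    "SPFULJAC": (479, 551),
    "FASTJX_OPMIE": (251, 278),
}

def imbalance(table):
    """Return (routine, max/mean ratio) pairs, most imbalanced first."""
    ratios = [(name, t_max / t_mean) for name, (t_mean, t_max) in table.items()]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)

ranked = imbalance(timings)
for name, ratio in ranked:
    print(f"{name:<20} max/mean = {ratio:.2f}")
```

For these routines the ratio stays between about 1.05 and 1.3, which fits the observation that the whole-run maxima are not far above the means.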