It's been very noticeable that the first month of our prototype UKESM0.4 is running much quicker than all the following months, e.g. on 23 nodes the first month takes about 70.5 minutes (1.7 model years/day) and about 102 minutes the remaining months (1.2 model years/day).
Increasing the nodes of the ocean from 3 to 8 (u-af845) doesn't appear to speed the model up, which suggests the bottleneck for all months is still the atmosphere. In addition, Richard has also been turning on all the timers for the model components and his results suggest the time in the UKCA chemistry almost doubles after the first 20 days.
The profiling was done with u-af879.
Routines | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
UM_SHELL (5,221s) | ||||||||||||||
U_MODEL_4A (5,214s) | ||||||||||||||
ATM_STEP_4A* (4,677s) | OASIS3_ PUTA2O (210s) | OASIS3_ GETO2A (14s) | ||||||||||||
ATMOS _PHYS- ICS1 (929s) | ATMOS _PHYS- ICS2 (286s) | EG_ SL_ HELM- HOLTZ (267s) | TR_ SET_ PHYS _4A* (228s) | EG_CORRECT _TRACERS _PRIESTLEY (296s) | SL_ TRAC- ER1_ 4A (394s) | EG_ SL_ MOI- STURE (56s) | EG_SL_ FULL_WIND (138s) | ⇓ | UKCA_MAIN1 (1,003s) | ICE_ SHEET_ MASS (207s) | OASIS3_ GET (9s) | |||
Profile for ATMOS_ PHYSICS1 | Profile for ATMOS_ PHYSICS2 and EG_SL_ HELMHOLTZ | EG_SL_WIND_U, EG_SL_WIND_V & EG_SL_WIND_W (16 + 15 + 53 = 84s) | STASH (289s) | See profiling for UKCA_ MAIN1 | GLOBAL_ 2D_ SUMS** (348s) | |||||||||
Itself (89s) | EG_INTERPOLATION _ETA (254s) | DEP- ARTURE_ POINT _ETA (95s) | STWORK (289s) | |||||||||||
EG_ CUBIC_ LAG- RANGE (78s, itself) | MONO_ ENFORCE (30s, itself) | Itself (120s) | Itself (8s) | PP_ HEAD (129s) | EXP- PXI (102s, itself) |
Routines | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
UM_SHELL (7,000s) | ||||||||||||||
U_MODEL_4A (6,990s) | ||||||||||||||
ATM_STEP_4A* (6,006s) | OASIS3_ PUTA2O (567s) | OASIS3_ GETO2A (52s) | ||||||||||||
ATMOS _PHYS- ICS1 (914s) | ATMOS _PHYS- ICS2 (283s) | EG_ SL_ HELM- HOLTZ (292s) | TR_ SET_ PHYS _4A* (202s) | EG_CORRECT _TRACERS _PRIESTLEY (297s) | SL_ TRAC- ER1_ 4A (386s) | EG_ SL_ MOI- STURE (53s) | EG_SL_ FULL_WIND (121s) | ⇓ | UKCA_MAIN1 (1,624s) | ICE_ SHEET_ MASS (563s) | OASIS3_ GET (39s) | |||
Profile for ATMOS_ PHYSICS1 | Profile for ATMOS_ PHYSICS2 and EG_SL_ HELMHOLTZ | EG_SL_WIND_U, EG_SL_WIND_V & EG_SL_WIND_W (15 + 15 + 41 = 71s) | STASH (289s) | See profiling for UKCA_ MAIN1 | GLOBAL_ 2D_ SUMS** (724s) | |||||||||
Itself (89s) | EG_INTERPOLATION _ETA (254s) | DEP- ARTURE_ POINT _ETA (99s) | STWORK (289s) | |||||||||||
EG_ CUBIC_ LAG- RANGE (78s, itself) | MONO_ ENFORCE (30s, itself) | Itself (120s) | Itself (8s) | PP_ HEAD (129s) | EXP- PXI (102s, itself) |
The two routines showing a much greater increase in mean time are
Routines | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
UKCA_MAIN* (1,003s) | |||||||||||||||
UKCA_AERO_CTL (146s) | UKCA_ ACT- IVATE (41s) | UKCA_CHEMISTRY_CTL (298s) | UKCA_FASTJX (154s) | UKCA_ NEW_ EMISS_ CTL (77s) | |||||||||||
UKCA_AERO_STEP (142s) | UKCA_ ABDUL- RAZZAK _GHAN (39s) | ASAD_CDRIVE (266s) | FASTJX_PHOTOJ (154s) | ||||||||||||
UKCA_ COAGWITH- NUCL (41s) | UKCA_ CONDEN (38s) | Itself (38s) | ASAD_SPMJPDRIV (236s) | FASTJX_OPMIE (93s) | FLINT (28s, itself) | Itself (22s) | |||||||||
Itself (32s) | UKCA_ COND_ COFF _V (30s, itself) | ASAD_SPIMPMJP (232s) | FASTJX_ MIESCT (67s) | Itself (26s) | |||||||||||
SPLINSLV2 (109s, itself) | SPFULJAC (68s, itself) | Itself (21s) | BLKSLV (67s) | ||||||||||||
Itself (45s) |
Routines | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
UKCA_MAIN* (1,624s) | |||||||||||||||
UKCA_AERO_CTL (142s) | UKCA_ ACT- IVATE (45s) | UKCA_CHEMISTRY_CTL (921s) | UKCA_FASTJX (154s) | UKCA_ NEW_ EMISS_ CTL (77s) | |||||||||||
UKCA_AERO_STEP (138s) | UKCA_ ABDUL- RAZZAK _GHAN (44s) | ASAD_CDRIVE (889s) | FASTJX_PHOTOJ (154s) | ||||||||||||
UKCA_ COAGWITH- NUCL (41s) | UKCA_ CONDEN (38s) | Itself (43s) | ASAD_SPMJPDRIV (860s) | FASTJX_OPMIE (93s) | FLINT (28s, itself) | Itself (22s) | |||||||||
Itself (32s) | UKCA_ COND_ COFF _V (30s, itself) | ASAD_SPIMPMJP (854s) | FASTJX_ MIESCT (67s) | Itself (26s) | |||||||||||
SPLINSLV2 (414s, itself) | SPFULJAC (245s, itself) | Itself (73s) | BLKSLV (67s) | ||||||||||||
Itself (45s) |
Remarkably clear signal, the 2nd month is slowed by the extra time in the calling tree of UKCA_CHEMISTRY_CTL -> ASAD_CDRIVE -> ASAD_SPMJPDRIV -> ASAD_SPIMPJP, and most of this can be attributed to the solvers SPLINSLV2 and SPFULJAC.
And if we look at the minimum and maximum time in these two routines, we can see how imbalanced they are
Routine | Minimum | Mean | Maximum |
---|---|---|---|
SPLINSLV2 | 261.856 (PE 404) | 414.23 | 1087.191 (PE 0) |
SPFULJAC | 148.937 (PE 404) | 244.75 | 683.399 (PE 23) |