Profiling Physical Ocean vs MEDUSA

It's been a few years since I last profiled MEDUSA; the last time was in "Updating the profiling of spin-up". We now have only one branch, which contains the changes for both the GO6 package and MEDUSA. This should make profiling easier, as it removes the merge clashes I had before, when there were two branches. Still, it has been more work than I expected, and I've written this up in UKESM ticket 645.

As well as updating the profiling, I also wanted to do a direct comparison between the physical ocean with and without MEDUSA, so we can see what overheads are added by MEDUSA.

Normally we would run the ocean with detached XIOS. However, when I added in the Dr Hook libraries I found this configuration was crashing in XIOS so I've just run with attached XIOS. This means that the runs will be a bit slower, especially for the I/O routines.

The timings below are the ones I usually use. Most are the 'total time', which is the time in a routine plus the time for all the routines called by that routine. Timings labelled 'itself' are the time in that routine only. I have had to hack in the Dr Hook routines, so it's very possible that some routines are attributed time that really belongs to a routine they call, where the called routine doesn't have Dr Hook added. For example, there probably isn't much time which should be attributed solely to STP, but there is in the profiling below, probably because I've not added Dr Hook to some of the routines called by STP.
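
To make the difference between 'total' and 'itself' time concrete, here is a minimal Python sketch. The call tree, routine names and numbers are all made up for illustration and are not taken from the runs on this page, and this is not how Dr Hook itself is implemented. It only shows how time in a callee that has no instrumentation ends up counted as the caller's 'itself' time.

```python
# Hypothetical call tree: routine -> (true self time in seconds, callees).
# Names and numbers are invented for illustration only.
CALL_TREE = {
    "PARENT": (10.0, ["CHILD_A", "CHILD_B"]),
    "CHILD_A": (30.0, []),
    "CHILD_B": (5.0, ["GRANDCHILD"]),
    "GRANDCHILD": (20.0, []),
}


def total_time(routine, tree):
    """Time in the routine itself plus the total time of everything it calls."""
    self_time, callees = tree[routine]
    return self_time + sum(total_time(callee, tree) for callee in callees)


def reported_self_time(routine, tree, instrumented):
    """'Itself' time as it would be reported when only some routines are instrumented.

    Time spent in a callee without instrumentation cannot be separated out,
    so it is folded into the nearest instrumented caller.
    """
    self_time, callees = tree[routine]
    for callee in callees:
        if callee not in instrumented:
            self_time += reported_self_time(callee, tree, instrumented)
    return self_time


everything = set(CALL_TREE)
missing_child_a = everything - {"CHILD_A"}

print(total_time("PARENT", CALL_TREE))                           # total time of PARENT
print(reported_self_time("PARENT", CALL_TREE, everything))       # all routines instrumented
print(reported_self_time("PARENT", CALL_TREE, missing_child_a))  # CHILD_A not instrumented
```

In this toy case the total time of PARENT is the same either way, but its 'itself' time grows by the whole of CHILD_A's time once CHILD_A loses its instrumentation, which is the effect suspected for STP.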

ORCA1

Both the physical-ocean-only run, u-bn904, and the physical ocean + MEDUSA run, u-bn573, were run with the following settings.

  • 2 months
  • (NEMO_IPROC, NEMO_JPROC)=(12, 9)
  • 3 nodes

Without MEDUSA

This profiling comes from u-bn904.

Routines
  • NEMOGCM (2,257s)
    • STP (2,256s)
      • Itself (364s), STP_CTL (594s, itself), ZDF_TKE (70s), DIA_WRI (177s) & DIA_PTR (57s)
      • DYN_SPG (52s)
        • DYN_SPG_FLT (42s)
          • Itself (38s)
      • SBC (266s)
        • SBC_ICE_CICE (179s)
          • Itself (169s)
      • TRD_TRA (123s)
        • TRD_TRA_MNG (65s)
          • TRD_TRA_IOM (65s, itself)
        • Itself (40s)

* A lot of these routines call LBC_LNK, which calls MPP_LNK_2D (59s) or MPP_LNK_3D (244s)

With MEDUSA

This profiling comes from u-bn573.

Routines
  • NEMOGCM (5,252s)
    • STP (5,252s)
      • Itself (387s), STP_CTL (732s, itself), ZDF_TKE (70s), DIA_WRI (182s) & DIA_PTR (57s)
      • DYN_SPG (52s)
        • DYN_SPG_FLT (42s)
          • Itself (38s)
      • SBC (262s)
        • SBC_ICE_CICE (172s)
          • Itself (163s)
      • TRC_STP (2,832s)
        • TRC_SMS (681s)
          • TRC_BIO_MEDUSA (614s)
            • TRC_BIO_CHECK (247s, itself)
            • PLANKTON (87s)
              • PHYTOPLANKTON (77s)
            • AIR_SEA (59s)
              • MOCSY_INTERFACE (56s)
                • MOCSY_CARCHEM (55s)
                  • VARS (48s)
            • BIO_MEDUSA_DIG+FIN (64+49s)
        • TRC_TRP (1,958s)
          • TRC_LDF (533s)
            • Itself (533s)
          • TRC_ZDF (168s)
            • Itself (144s)
          • TRC_BBL (98s)
            • Itself (51s)
          • TRC_SBC (74s)
          • TRC_ADV (697s)
            • Itself (371s)
            • TRD_TRA (578s)
              • Itself (284s)
              • TRD_TRA_ADV (205s, itself)
              • TRD_TRA_MNG (64s)
                • TRD_TRA_IOM (64s)
          • TRC_NXT (271s)
            • Itself (119s)
        • TRC_WRI (82s)

* A lot of these routines call LBC_LNK, which calls MPP_LNK_2D (183s) or MPP_LNK_3D (383s)

ORCA025

Both the physical-ocean-only run, u-bn863, and the physical ocean + MEDUSA run, u-bn355, were run with the following settings.

  • 1 month
  • (NEMO_IPROC, NEMO_JPROC)=(47, 27)
  • NEMO_LAND_SUPPRESS=true
  • NEMO_NPROC=850 (850/36=23.61)
  • 24 nodes
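
The node counts follow from the processor counts. As a small Python sketch, assuming 36 cores per node (read off the "850/36" above, not stated elsewhere on this page), one MPI process per core, and XIOS attached so that no extra nodes are needed for I/O servers:

```python
import math

CORES_PER_NODE = 36  # assumption: taken from the "850/36" in the list above


def nodes_needed(n_procs, cores_per_node=CORES_PER_NODE):
    """Whole nodes needed to give each MPI process its own core."""
    return math.ceil(n_procs / cores_per_node)


# ORCA025: a 47 x 27 decomposition, reduced to 850 processes by land suppression.
orca025_full_grid = 47 * 27          # sub-domains before land suppression
orca025_nodes = nodes_needed(850)    # 850 / 36 = 23.61, rounded up

# ORCA1: a 12 x 9 decomposition, assuming every sub-domain is used.
orca1_nodes = nodes_needed(12 * 9)   # 108 / 36 = 3 exactly

print(orca025_full_grid, orca025_nodes, orca1_nodes)
```

This reproduces the 24 nodes listed here and the 3 nodes used for ORCA1.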

Without MEDUSA

This profiling comes from u-bn863.

Routines
  • NEMOGCM (1,895s)
    • STP (1,894s)
      • Itself (317s), STP_CTL (65s, itself), ZDF_TKE (69s), DIA_WRI (248s) & DIA_PTR (73s)
      • DYN_SPG (116s)
        • DYN_SPG_FLT (116s)
          • Itself (106s)
      • SBC (325s)
        • SBC_ICE_CICE (281s)
          • Itself (267s)
      • TRD_TRA (181s)
        • TRD_TRA_MNG (108s)
          • TRD_TRA_IOM (108s, itself)
        • Itself (49s)

* A lot of these routines call LBC_LNK, which calls MPP_LNK_2D (s) or MPP_LNK_3D (s)

With MEDUSA

This profiling comes from u-bn355.

Routines
  • NEMOGCM (5,803s)
    • STP (5,802s)
      • Itself (338s), STP_CTL (268s, itself), ZDF_TKE (69s), DIA_WRI (249s) & DIA_PTR (93s)
      • DYN_SPG (102s)
        • DYN_SPG_FLT (102s)
          • Itself (94s)
      • SBC (332s)
        • SBC_ICE_CICE (280s)
          • Itself (267s)
      • TRC_STP (3,588s)
        • TRC_SMS (978s)
          • TRC_BIO_MEDUSA (834s)
            • TRC_BIO_CHECK (302s, itself)
            • PLANKTON (93s)
              • PHYTOPLANKTON (78s)
            • AIR_SEA (104s)
              • MOCSY_INTERFACE (98s)
                • MOCSY_CARCHEM (95s)
                  • VARS (83s)
            • BIO_MEDUSA_DIG+FIN (112+77s)
          • TRC_SMS_CFC (85s)
        • TRC_TRP (2,433s)
          • TRC_LDF (809s)
            • Itself (676s)
          • TRC_ZDF (205s)
            • Itself (188s)
          • TRC_BBL (97s)
            • Itself (62s)
          • TRC_SBC (73s)
          • TRC_ADV (890s)
            • Itself (400s)
            • TRD_TRA (851s)
              • Itself (388s)
              • TRD_TRA_ADV (259s, itself)
              • TRD_TRA_MNG (167s)
                • TRD_TRA_IOM (167s)
          • TRC_NXT (340s)
            • Itself (147s)
        • TRC_WRI (139s)

* A lot of these routines call LBC_LNK, which calls MPP_LNK_2D (s) or MPP_LNK_3D (s)

Summary

My summary of this is:

  • The timings I did at comment 2 of UKESM ticket 65 suggest that MEDUSA slows the physical ocean by about x2.75 (but this varied between about x2.3 and x3.1)
  • Nearly all the extra computation can be attributed to time in TRC_STP and the routines called by this. This is about 60% of a physical ocean + MEDUSA run.
    • About 70% of the time in TRC_STP is in TRC_TRP, which I assume is responsible for transport (about 40% of total run).
    • About 25% of the time in TRC_STP is in TRC_SMS, which looks to be the MEDUSA science (about 15% of total run).
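
As a cross-check, the same ratios can be recomputed from the profiles on this page. This is a minimal Python sketch using the NEMOGCM, TRC_STP, TRC_TRP and TRC_SMS total times listed above; the dictionary layout and function name are just for illustration. The slowdown here comes from these four runs, not from the ticket timings quoted in the first bullet.

```python
# Total times in seconds, copied from the profiles above.
RUNS = {
    "ORCA1": {"ocean_only": 2257, "with_medusa": 5252,
              "TRC_STP": 2832, "TRC_TRP": 1958, "TRC_SMS": 681},
    "ORCA025": {"ocean_only": 1895, "with_medusa": 5803,
                "TRC_STP": 3588, "TRC_TRP": 2433, "TRC_SMS": 978},
}


def summarise(t):
    """Return the slowdown factor and the main fractions for one configuration."""
    return {
        "slowdown": t["with_medusa"] / t["ocean_only"],
        "TRC_STP / run": t["TRC_STP"] / t["with_medusa"],
        "TRC_TRP / TRC_STP": t["TRC_TRP"] / t["TRC_STP"],
        "TRC_SMS / TRC_STP": t["TRC_SMS"] / t["TRC_STP"],
        "TRC_TRP / run": t["TRC_TRP"] / t["with_medusa"],
        "TRC_SMS / run": t["TRC_SMS"] / t["with_medusa"],
    }


for name, timings in RUNS.items():
    print(name)
    for label, value in summarise(timings).items():
        print(f"  {label}: {value:.2f}")
```

For these runs the slowdown comes out at about x2.3 for ORCA1 and x3.1 for ORCA025, with TRC_STP at roughly 54% and 62% of the run respectively.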