Profiling of MEDUSA was done at revision 6509 of MEDUSA branch. However, we'd like to know how these timings have changed since the following has happened
and a general updating of code - MEDUSA branch is now at revision 7611.
In addition to this, we'd like to get some of Maff's optimisations into the GO6 package branch, so Richard has created a version of the GO6 package branch, which includes most of them. Profiling this is both interesting and we'll hopefully convince the owners of the GO6 package branch to add them in.
The runs I've made are
All the runs are
Unless stated as `itself', the times below are the total times, which is the time spent in each routine and all the routines which it spawns/calls. Times specified as itself, include only the time spent in that routine. A lot of the smaller routines are not included to make the diagrams simpler, so the total times lower down the table will not add to the total times higher up, but the totals should be closish. For the calls to LBC_LNK I've sometimes indicated the grid-type, such as T, U or V, because it gives a clue as to what is being passed. For example, I'm guessing two fields of grid type T are probably temperature and salinity. I haven't included more here, because I don't have the space and I'm frequently not sure what is being passed.
Routines | |||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NEMOGCM (2,268s) | |||||||||||||
NEMO_INIT (50s) | STP (2,218s) | ||||||||||||
ICB_ INIT (8s) | TRC_ INIT (19s) | TRC_STP (836s) | SBC (784s) | IOM_ INIT* (27s) | DYN_SPG (185s) | STP_ CTL (70s, itself) | TRA_ADV (55s) | DOM _VVL _SF_ SWP (36s) | ZDF_ TKE* (33s) | ||||
ICB_ RST_ READ (8s) | TRC_ RST_ READ (19s) | See TRC_STP routines below for more detail | See SBC routines below for more detail | DYN_SPG_ FLT (185s) | TRA_ADV_TVD (50s) | Itself (29s) | |||||||
Itself (7s) | Itself (12s) | SOL_PCG (179s) | ⇓ for U & V | NONOSC (28s) | ⇓ for U, V & W | ||||||||
Itself (159s) | ⇓ | ⇓ for T (*2), U & V | Itself (14s) | ||||||||||
⇓ | ⇓ | ||||||||||||
LBC_LNK (797s) | |||||||||||||
MPP_LNK_2D (441s) | MPP_LNK_3D (356s) |
Routines | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NEMOGCM (2,135s) | |||||||||||||||
NEMO_INIT (18s) | STP (2,121s) | ||||||||||||||
ICB_ INIT (1s) | TRC_ INIT (2s) | ISTATE_ INIT (4s) | TRC_STP (967s) | SBC (537s) | IOM_ INIT* (86s) | DYN_SPG (106s) | STP_ CTL (62s, itself) | TRA_ADV (56s) | DOM _VVL _SF_ SWP (9s) | DOM _VVL _SF_ NXT (6s) | ZDF_ TKE* (32s) | ||||
ICB_ RST_ READ (1s) | TRC_ RST_ READ (0s) | Itself (3s) | See TRC_STP routines below for more detail | See SBC routines below for more detail | DYN_SPG_ FLT (106s) | TRA_ADV_TVD (50s) | DOM_ VVL_ INTERPOL (16s) | ||||||||
Itself (1s) | Itself (0s) | SOL_PCG (94s) | ⇓ for U & V | NONOSC (27s) | ⇓ for U, V & W | Itself (8s) | ⇓ | ||||||||
Itself (83s) | ⇓ | ⇓ for T (*2), U & V | Itself (15s) | ||||||||||||
⇓ | ⇓ | ||||||||||||||
LBC_LNK (632s) | |||||||||||||||
MPP_LNK_2D (373s) | MPP_LNK_3D (259s) |
Routines | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NEMOGCM (2,085s) | |||||||||||||||
NEMO_INIT (34s) | STP (2,071s) | ||||||||||||||
ICB_ INIT (7s) | TRC_ INIT (3s) | ISTATE_ INIT (8s) | TRC_STP (911s) | SBC (538s) | IOM_ INIT* (98s) | DYN_SPG (109s) | STP_ CTL (64s, itself) | TRA_ADV (50s) | DOM _VVL _SF_ SWP (9s) | DOM _VVL _SF_ NXT (6s) | ZDF_ TKE* (33s) | ||||
ICB_ RST_ READ (3s) | TRC_ RST_ READ (0s) | Itself (5s) | See TRC_STP routines below for more detail | See SBC routines below for more detail | DYN_SPG_ FLT (109s) | TRA_ADV_TVD (44s) | DOM_ VVL_ INTERPOL (16s) | ||||||||
Itself (0s) | Itself (0s) | SOL_PCG (95s) | ⇓ for U & V | NONOSC (23s) | ⇓ for U, V & W | Itself (8s) | ⇓ | ||||||||
Itself (84s) | ⇓ | ⇓ for T (*2), U & V | Itself (10s) | ||||||||||||
⇓ | ⇓ | ||||||||||||||
LBC_LNK (647s) | |||||||||||||||
MPP_LNK_2D (375s) | MPP_LNK_3D (272s) |
For the change between revision 6509 and 7611.
For the optimisation branch, there's no obvious improvement here. This includes SOL_PCG which showed good improvement before.
This is the passive tracer part of the model
Routines | ||||||||
---|---|---|---|---|---|---|---|---|
TRC_STP (836s) | ||||||||
TRC_TRP (669s) | TRC_SMS (84s) | |||||||
TRC_ADV (420s) | TRC_LDF (104s) | TRC_BBL (60s) | TRC_SBC (0s) | TRC_NXT (55s) | TRC_SMS_MEDUSA (83s) | |||
TRA_ADV_MUSCL (323s) | ⇓ for T (*2) | TRA_LDF_ISO (117s, itself) | BBL (59s) | ⇓ for T | ⇓ for T | TRC_BIO_MEDUSA (76s) | ||
Itself (201s) | ⇓ for U (*2) & V (*2) | EOS_RAB/ RAB_2D (58s) | Itself (58s) | ⇓ for T (*2) | ||||
⇓ for T (*2) | ||||||||
LBC_LNK |
Routines | ||||||||
---|---|---|---|---|---|---|---|---|
TRC_STP (967s) | ||||||||
TRC_TRP (676s) | TRC_SMS (184s) | |||||||
TRC_ADV (336s) | TRC_LDF (128s) | TRC_BBL (2s) | TRC_SBC (130s) | TRC_NXT (53s) | TRC_SMS_MEDUSA (182s) | |||
TRA_ADV_MUSCL (331s) | ⇓ for T (*2) | TRA_LDF_ISO (142s, itself) | BBL (2s) | ⇓ for T | ⇓ for T | TRC_BIO_MEDUSA (174s) | ||
Itself (221s) | ⇓ for U (*2) & V (*2) | EOS_RAB/ RAB_2D/ RAB_3D (6s) | Itself (120s) | ⇓ for T (*2) | ||||
⇓ for T (*2) | ||||||||
LBC_LNK |
Routines | ||||||||
---|---|---|---|---|---|---|---|---|
TRC_STP (911s) | ||||||||
TRC_TRP (619s) | TRC_SMS (184s) | |||||||
TRC_ADV (329s) | TRC_LDF (78s) | TRC_BBL (2s) | TRC_SBC (131s) | TRC_NXT (53s) | TRC_SMS_MEDUSA (183s) | |||
TRA_ADV_MUSCL (323s) | ⇓ for T (*2) | TRA_LDF_ISO (85s, itself) | BBL (1s) | ⇓ for T | ⇓ for T | TRC_BIO_MEDUSA (174s) | ||
Itself (197s) | ⇓ for U (*2) & V (*2) | EOS_RAB/ RAB_2D (6s) | Itself (120s) | ⇓ for T (*2) | ||||
⇓ for T (*2) | ||||||||
LBC_LNK |
For the change between revision 6509 and 7611.
For the optimisation branch, the only significant improvement seems to be for TRA_LDF_ISO, which is almost halved. Previously, we had good improvement for TRA_ADV_MUSCL and this has improved, but not by a lot.
These are the routines which update data, open boundaries and provide surface BCs (including sea-ice).
Routines | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
SBC (783s) | ||||||||||||
SBC_BLK_CORE (354s) | SBC_ICE_CICE (303s) | ICB_STP (69s) | SBC_ FWB (33s, itself) | ⇓ for T | ||||||||
FLD_READ (254s*) | BLK_OCE_ CORE (101s) | CICE_RUN: into CICE (~183s) | CICE_SBC_IN (73s) | CICE_SBC _OUT (47s) | ICB_LBC_MPP (57s, itself) | |||||||
FLD_GET (247s, itself) | ⇓ for T (*2), U & V | NEMO2CICE (73s) | CICE2NEMO (36s) | |||||||||
⇓ for Z (*2) | Itself (108s) | ⇓ | Itself (33s) | ⇓ | Itself (19s) | |||||||
LBC_LNK |
Routines | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
SBC (537s) | ||||||||||||
SBC_BLK_CORE (158s) | SBC_ICE_CICE (319s) | ICB_STP (23s) | SBC_ FWB (12s, itself) | ⇓ for T | ||||||||
FLD_READ (127s*) | BLK_OCE_ CORE (35s) | CICE_RUN: into CICE (~188s) | CICE_SBC_IN (77s) | CICE_SBC _OUT (55s) | ICB_LBC_MPP (8s, itself) | |||||||
FLD_GET (119s, itself) | ⇓ for T (*2), U & V | NEMO2CICE (75s) | CICE2NEMO (42s) | |||||||||
⇓ for Z (*2) | Itself (66s) | ⇓ | Itself (35s) | ⇓ | Itself (22s) | |||||||
LBC_LNK |
Routines | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
SBC (538s) | ||||||||||||
SBC_BLK_CORE 137(s) | SBC_ICE_CICE (328s) | ICB_STP (25s) | SBC_ FWB (20s, itself) | ⇓ for T | ||||||||
FLD_READ (110s*) | BLK_OCE_ CORE (32s) | CICE_RUN: into CICE (~193s) | CICE_SBC_IN (81s) | CICE_SBC _OUT (54s) | ICB_LBC_MPP (8s, itself) | |||||||
FLD_GET (102s, itself) | ⇓ for T (*2), U & V | NEMO2CICE (80s) | CICE2NEMO (41s) | |||||||||
⇓ for Z (*2) | Itself (56s) | ⇓ | Itself (38s) | ⇓ | Itself (23s) | |||||||
LBC_LNK |
For the change between revision 6509 and 7611.
For the optimisation branch, FLD_READ is quicker but probably not large enough to be outside the Cray noise on timings to be sure.
For the change between revision 6509 and 7611.