GLOMAP

Most expensive GLOMAP routines

The data for this webpage is from the Dr Hook files in /data/cr/ukesm/mstringe/facef. Run facef is the Backward-Euler run, but with the tracer advection fix, the offline oxidants fix and the change to the lsp_subgrid.F90 file, as described on previous pages.

The mean wall time for this run is 4,423s and mean total time is 4,390s (a reduction of 610s on the original Backward-Euler run without the fixes).

Profile tree for UKCA_MAIN

The times in the tables below are total times, i.e. the time spent in each routine plus the routines below it, except where `itself' is written to indicate time taken solely by that routine. Only the main routes - as ranked by time spent in them - are generally shown. Routines with less than 50s total time are generally not shown.

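Routines (total mean time), indented under the routine that calls them:

UKCA_MAIN* (1,248s)
    UKCA_AERO_CTL (885s)
        UKCA_AERO_STEP (835s)
            UKCA_COAGWITHNUCL (378s)
                Itself (315s)
                UKCA_SOLVECOAGNUCL_V (63s, itself)
            UKCA_CONDEN (141s)
                UKCA_COND_COFF_V (92s, itself)
                Itself (49s)
            UKCA_CHECK_MD_ND (70s, itself)
            UKCA_CALCNUCRATE (69s)
                UKCA_BINAPARA (65s, itself)
            UKCA_VOLUME_MODE (51s)
                Itself (25s)
    UKCA_ACTIVATE (143s)
        UKCA_ABDULRAZZAK_GHAN (132s)
            Itself (125s)

Total mean time shown at each level of the tree, from the top down: 1,248s, 1,028s, 967s, 834s, 804s. The last figure also includes the two `itself' entries from the level above (UKCA_CHECK_MD_ND and UKCA_ABDULRAZZAK_GHAN).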
*UKCA_MAIN also calls STASH
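
As a rough check on how the figures in the tree relate to each other, the sketch below recomputes the `itself' time of a routine as its total time minus the total times of the children shown for it. The dictionary layout and the function name self_time are illustrative assumptions and not part of Dr Hook; the numbers are the rounded values from the tree above, and the parent-child links are those read off the tree.

```python
# Total times (seconds, rounded) taken from the profile tree above.
total = {
    "UKCA_COAGWITHNUCL": 378,
    "UKCA_SOLVECOAGNUCL_V": 63,
    "UKCA_CONDEN": 141,
    "UKCA_COND_COFF_V": 92,
}

# Children shown in the tree for each routine (assumed from the tree layout).
children = {
    "UKCA_COAGWITHNUCL": ["UKCA_SOLVECOAGNUCL_V"],
    "UKCA_CONDEN": ["UKCA_COND_COFF_V"],
}


def self_time(routine):
    """Time spent in the routine itself: total minus the children's totals."""
    return total[routine] - sum(total[c] for c in children.get(routine, []))


for name in ("UKCA_COAGWITHNUCL", "UKCA_CONDEN"):
    print(name, self_time(name))
```

This gives 315s and 49s, matching the `Itself' entries for these two routines. Because routines under 50s are generally not shown, the same subtraction applied to other routines would only give an upper bound on their self time.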

Additional GLOMAP routines

In addition to the routines within UKCA_MAIN there is also UKCA_RADAER_BAND_AVERAGE, which takes 273 seconds and is called from RADIANCE_CALC; see the profiling for ATMOS_PHYSICS1 and EG_CORRECT_TRACERS on the Profile tree & Backward-Euler page. The time spent in this routine is more important than the time spent in the routines below UKCA_MAIN because this overhead is present in almost every UM run, not just the UM runs which call UKCA_MAIN.

The most expensive GLOMAP routines

The most expensive routines by self time are listed below. The @2 indicates routines run by thread 2. I believe that this has something to do with a second OpenMP thread, but as thread 2 is a small percentage of the total wall time I've largely ignored these.

UKCA_SYNC is not a proper routine but has been added to measure the wait time after UKCA_MAIN is called. It's called from U_MODEL_4A straight after UKCA_MAIN, and consists of two calls to dr_hook, which sandwich a call to GC_SYNC - so it measures how long each PE has to wait for the other PEs to catch up. It gives a measure of how unbalanced the workload across PEs is for UKCA_MAIN and the routines below it.
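
To illustrate what a timer wrapped around a synchronisation point measures, here is a minimal sketch of an idealised single barrier: every PE starts together, does a different amount of work, and then waits for the slowest one. The function name barrier_wait_times and the per-PE work times are hypothetical and are not taken from the run; the real UKCA_SYNC figure accumulates over every timestep.

```python
def barrier_wait_times(work_times):
    """Wait of each PE at a barrier: the slowest PE's time minus its own."""
    slowest = max(work_times)
    return [slowest - t for t in work_times]


# Hypothetical per-PE times (seconds) for the work done before the barrier.
work = [100.0, 140.0, 120.0, 160.0]
waits = barrier_wait_times(work)

print(waits)                    # wait per PE; the slowest PE waits 0
print(sum(waits) / len(waits))  # mean wait across PEs
```

In this model the PE with the least work waits longest, so a large spread in the wait times points to a large spread in the work done before the barrier.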

Ordering routines by self: mean (times in seconds)

Routine                     Min              Mean    Max              Max-Min
UKCA_COAGWITHNUCL@1         295.731 (PE 58)  315.32  327.669 (PE 85)  31.94
UKCA_RADAER_BAND_AVERAGE@2  165.126 (PE 1)   276.35  374.797 (PE 61)  209.67
UKCA_RADAER_BAND_AVERAGE@1  171.194 (PE 1)   272.68  373.146 (PE 84)  201.95
UKCA_ABDULRAZZAK_GHAN@1     28.218 (PE 4)    125.38  163.021 (PE 68)  134.80
UKCA_COND_COFF_V@1          81.723 (PE 119)  92.03   102.862 (PE 64)  21.14
UKCA_SYNC@1                 24.754 (PE 76)   88.26   226.223 (PE 2)   201.47
UKCA_CHECK_MD_ND@1          68.498 (PE 52)   69.88   73.374 (PE 33)   4.88
UKCA_BINAPARA@1             64.892 (PE 73)   65.07   67.294 (PE 82)   2.40
UKCA_SOLVECOAGNUCL_V@1      57.818 (PE 5)    62.89   69.052 (PE 64)   11.23
UKCA_CONDEN@1               47.068 (PE 12)   49.37   52.027 (PE 64)   4.96
UKCA_AERO_CTL@1             44.856 (PE 2)    45.52   45.995 (PE 79)   1.14
UKCA_COAG_COFF_V@1          33.705 (PE 77)   33.77   34.695 (PE 48)   1
UKCA_MAIN1@1                27.028 (PE 61)   28.79   29.977 (PE 79)   2.95
UKCA_VOLUME_MODE@1          25.205 (PE 7)    25.83   26.778 (PE 59)   1.57
UKCA_CALC_DRYDIAM@1         21.931 (PE 54)   22.19   22.869 (PE 15)   0.94
UKCA_AERO_STEP@1            20.404 (PE 23)   21.75   23.377 (PE 98)   2.97
UKCA_RADAER_COMPUTE_AOD@2   13.419 (PE 1)    21.54   29.495 (PE 68)   16.08
UKCA_RADAER_COMPUTE_AOD@1   13.797 (PE 1)    21.34   29.746 (PE 68)   15.95
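
The table is ordered by mean self time, but the (Max-Min) column can also be used to pick out the routines whose cost varies most between PEs. The sketch below ranks a few rows by (Max-Min) divided by the mean. That ratio is my own choice of imbalance measure rather than something Dr Hook reports, and the function name relative_spread is illustrative; the numbers are copied from the table above.

```python
# (min, mean, max) self times in seconds, copied from the table above.
rows = {
    "UKCA_COAGWITHNUCL@1": (295.731, 315.32, 327.669),
    "UKCA_RADAER_BAND_AVERAGE@1": (171.194, 272.68, 373.146),
    "UKCA_ABDULRAZZAK_GHAN@1": (28.218, 125.38, 163.021),
    "UKCA_SYNC@1": (24.754, 88.26, 226.223),
}


def relative_spread(lo, mean, hi):
    """(Max - Min) divided by the mean: a simple unit-free imbalance measure."""
    return (hi - lo) / mean


# Most imbalanced routine first.
ranked = sorted(rows, key=lambda r: relative_spread(*rows[r]), reverse=True)
for name in ranked:
    print(f"{name:28s} {relative_spread(*rows[name]):.2f}")
```

On this measure UKCA_SYNC and UKCA_ABDULRAZZAK_GHAN come out as the least balanced of the four, while UKCA_COAGWITHNUCL, although the most expensive, is spread fairly evenly across PEs.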

UKCA_COAGWITHNUCL is the most expensive routine, and it's believed to be badly written. It uses a lot of WHERE statements to find values for masks.

A further problem is how often it's called from UKCA_AERO_STEP. UKCA_AERO_STEP doesn't have a convergence criterion, so the number of calls to UKCA_COAGWITHNUCL depends on the value of NZTS (which is 15/20) - and it's hoped that convergence is met once NZTS is reached. Sometimes NZTS isn't large enough, and most of the time it's probably too large. Colin is trying to come up with a convergence criterion but any input on this is very welcome. I'm also told that it could probably be compartmentalized much more, so that it would run more efficiently.
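
For discussion only, here is a generic sketch of the difference between a fixed number of sub-steps and a loop that stops early once a tolerance is met. It is not based on the actual UKCA_AERO_STEP code: the scalar state, the relative-change test, the tolerance tol and the toy update toy_step are all assumptions made for illustration, and the real routine would need some norm over its arrays instead.

```python
def run_substeps(step, state, nzts, tol):
    """Apply `step` up to nzts times, stopping early once the relative
    change in the state falls below tol.
    Returns (state, steps_taken, converged)."""
    for n in range(1, nzts + 1):
        new_state = step(state)
        # Relative change, guarded against division by zero.
        change = abs(new_state - state) / max(abs(new_state), 1e-30)
        state = new_state
        if change < tol:
            return state, n, True
    # nzts reached without meeting the tolerance.
    return state, nzts, False


def toy_step(x):
    """Toy update that relaxes halfway towards 1.0 on each call."""
    return x + 0.5 * (1.0 - x)


state, used, converged = run_substeps(toy_step, 0.0, nzts=20, tol=1e-3)
print(used, converged, state)
```

With these toy settings the loop stops after 10 of the 20 allowed sub-steps, and the converged flag makes it visible when the cap is hit first, which is the case where NZTS isn't large enough.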

I'm told by Colin that UKCA_RADAER_BAND_AVERAGE was written by Nicolas Bellouin, so it's likely to be pretty good. The imbalance is mostly because, as well as the longwave (LW) code, it's called by the shortwave (SW) code - but only when points are in daylight, and this varies a lot over PEs. It would still be useful if someone looked over it because it's an overhead that's present in all UM runs - not just UKCA runs.

Colin says that Dan Partridge at Oxford has noticed some problems with UKCA_ABDULRAZZAK_GHAN - but he doesn't think he's done anything about it yet. Any improvement to it would obviously be welcome.

Any additions of OpenMP to the code, or advice on adding it, would also be very welcome.