GLOMAP
OpenMP and SMT in the atmosphere-only model
All runs are CLASSIC at N96 (faced), with a PE grid of
16 * 8 (128 tasks). SMT is simultaneous
multithreading, where it is common to have two concurrent threads per
CPU core.
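The node and core counts quoted for each configuration on this page follow from the task count, the threads per task and whether SMT is used. A minimal sketch of that bookkeeping, assuming 32 physical cores per node (inferred from "2 nodes/64 cores", not stated explicitly) and two hardware threads per core under SMT; the function names are illustrative only:

```python
import math

CORES_PER_NODE = 32  # assumption: inferred from "2 nodes/64 cores"
SMT_THREADS_PER_CORE = 2  # two concurrent threads per core, as described above


def cores_needed(tasks, threads_per_task, smt):
    """Physical cores required for tasks * threads_per_task software threads."""
    software_threads = tasks * threads_per_task
    if smt:
        return math.ceil(software_threads / SMT_THREADS_PER_CORE)
    return software_threads


def nodes_needed(tasks, threads_per_task, smt):
    """Whole nodes required, rounding up."""
    return math.ceil(cores_needed(tasks, threads_per_task, smt) / CORES_PER_NODE)


tasks = 16 * 8  # PE grid
configs = {
    "one thread with SMT": (1, True),
    "one thread without SMT": (1, False),
    "two threads with SMT": (2, True),
    "two threads without SMT": (2, False),
}
for name, (threads, smt) in configs.items():
    print(name, cores_needed(tasks, threads, smt), "cores,",
          nodes_needed(tasks, threads, smt), "nodes")
```

Under these assumptions the four configurations come out at 64, 128, 128 and 256 cores, which matches the headings used for the profiles.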
Four threads
I've tried running on 4 threads, but at about timestep 681 the run failed with:
Signal received: SIGFPE - Floating-point exception
Signal generated for floating-point exception:
FP invalid operation
It is not clear where, but it is probably crashing in HALO_EXCHANGE:SWAP_BOUNDS_NS_DP
(see faced000.faced.d14322.t171306.leave).
I tried running again, and this time got a floating-point exception
at around timestep 71, possibly in routine UC_TO_UB (see
faced000.faced.d14323.t085124.leave).
Two threads without SMT (128 tasks on 8 nodes/256 cores)
A similar problem to the above occurred at around timestep 1,657 (see
faced000.faced.d14349.t115553.leave).
JC thinks it might take very roughly 80% of the time of
two threads with SMT. It completed 1,657/2160 = 76.7% of the run
in 1,746s, which suggests it would have taken
1746*2160/1657 = 2,276s, i.e. 2276/2568 = 88.6% of the time for a
normal two-threaded run with SMT.
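The estimate above is a linear extrapolation from the crashed run. A small sketch of the same arithmetic, assuming every timestep costs about the same (which ignores start-up and any periodic output steps); the function name is illustrative:

```python
def extrapolate_runtime(elapsed_s, steps_done, steps_total):
    """Scale the elapsed time linearly up to the full number of timesteps."""
    return elapsed_s * steps_total / steps_done


# Two threads without SMT: crashed at timestep 1,657 of 2,160 after 1,746s.
fraction_done = 1657 / 2160
estimate_s = extrapolate_runtime(1746, 1657, 2160)

# Compare with the completed two-thread run with SMT (UM_SHELL, 2,568s).
ratio = estimate_s / 2568

print(f"completed {fraction_done:.1%} of the run")
print(f"estimated full run: {estimate_s:.0f}s")
print(f"relative to two threads with SMT: {ratio:.1%}")
```

This reproduces the 76.7%, 2,276s and 88.6% figures, so the run without SMT looks somewhat slower than the rough 80% guess would suggest.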
The top profile tree
One thread with SMT (128 tasks on 2 nodes/64 cores)
Routines |
UM_SHELL (4,661s) |
U_MODEL_4A (4,659s) |
ATM_STEP_4A* (4,559s) |
ATMOS_PHYSICS1 (1,611s) |
EG_CORRECT_TRACERS (258s) |
ATMOS_PHYSICS2 (706s) |
EG_SL_HELMHOLTZ (470s) |
TR_SET_PHYS_4A* (100s) |
EG_SISL_INIT (113s) |
SL_TRACER1_4A (200s) |
EG_SL_MOISTURE (79s) |
EG_SL_FULL_WIND (259s) |
EG_Q_TO_MIX (41s) |
ATM_STEP_STASH (181s) |
⇓ |
See profile for ATMOS_PHYSICS1 and EG_CORRECT_TRACERS below
|
See profile for ATMOS_PHYSICS2 and EG_SL_HELMHOLTZ below
|
EG_SISL_INIT_UVW (107s) |
EG_SL_WIND_U, EG_SL_WIND_V & EG_SL_WIND_W (75 + 75 + 88 = 238s)
|
EG_SWAP_BOUNDS_DP (207s) |
STASH (208s) |
Itself (71s) |
EG_INTERPOLATION_ETA (353s) |
DEPARTURE_POINT_ETA (161s) |
See profile for SWAP_BOUNDS_DP below |
STWORK (208s) |
EG_CUBIC_LAGRANGE (159s, itself) |
EG_VERT_WEIGHTS_ETA (39s, itself) |
MONO_ENFORCE (20s, itself) |
Itself (70s) |
|
|
|
SPATIAL (53s) |
PP_HEAD (73s) |
EXPPXI (49s, itself) |
One thread without SMT (128 tasks on 4 nodes/128 cores)
Routines |
UM_SHELL (2,746s) |
U_MODEL_4A (2,744s) |
ATM_STEP_4A* (2,674s) |
ATMOS_PHYSICS1 (1,031s) |
EG_CORRECT_TRACERS (123s) |
ATMOS_PHYSICS2 (438s) |
EG_SL_HELMHOLTZ (228s) |
TR_SET_PHYS_4A* (48s) |
EG_SISL_INIT (49s) |
SL_TRACER1_4A (107s) |
EG_SL_MOISTURE (83s) |
EG_SL_FULL_WIND (154s) |
EG_Q_TO_MIX (27s) |
ATM_STEP_STASH (115s) |
⇓ |
See profile for ATMOS_PHYSICS1 and EG_CORRECT_TRACERS below
|
See profile for ATMOS_PHYSICS2 and EG_SL_HELMHOLTZ below
|
EG_SISL_INIT_UVW (46s) |
EG_SL_WIND_U, EG_SL_WIND_V & EG_SL_WIND_W (45 + 45 + 52 = 142s)
|
EG_SWAP_BOUNDS_DP (107s) |
STASH (132s) |
Itself (33s) |
EG_INTERPOLATION_ETA (200s) |
DEPARTURE_POINT_ETA (98s) |
See profile for SWAP_BOUNDS_DP below |
STWORK (131s) |
EG_CUBIC_LAGRANGE (97s, itself) |
EG_VERT_WEIGHTS_ETA (18s, itself) |
MONO_ENFORCE (12s, itself) |
Itself (50s) |
|
|
|
SPATIAL (34s) |
PP_HEAD (46s) |
EXPPXI (31s, itself) |
Two threads with SMT (128 tasks on 4 nodes/128 cores) - normal OpenMP & SMT selection
Routines |
UM_SHELL (2,568s) |
U_MODEL_4A (2,566s) |
ATM_STEP_4A* (2,498s) |
ATMOS_PHYSICS1 (949s) |
EG_CORRECT_TRACERS (112s) |
ATMOS_PHYSICS2 (404s) |
EG_SL_HELMHOLTZ (240s) |
TR_SET_PHYS_4A* (48s) |
EG_SISL_INIT (50s) |
SL_TRACER1_4A (102s) |
EG_SL_MOISTURE (79s) |
EG_SL_FULL_WIND (135s) |
EG_Q_TO_MIX (21s) |
ATM_STEP_STASH (112s) |
⇓ |
See profile for ATMOS_PHYSICS1 and EG_CORRECT_TRACERS below
|
See profile for ATMOS_PHYSICS2 and EG_SL_HELMHOLTZ below
|
EG_SISL_INIT_UVW (47s) |
EG_SL_WIND_U, EG_SL_WIND_V & EG_SL_WIND_W (39 + 38 + 45 = 122s)
|
EG_SWAP_BOUNDS_DP (107s) |
STASH (130s) |
Itself (33s) |
EG_INTERPOLATION_ETA (177s) |
DEPARTURE_POINT_ETA (83s) |
See profile for SWAP_BOUNDS_DP below |
STWORK (130s) |
EG_CUBIC_LAGRANGE (80s, itself) |
EG_VERT_WEIGHTS_ETA (19s, itself) |
MONO_ENFORCE (14s, itself) |
Itself (36s) |
|
|
|
SPATIAL (33s) |
PP_HEAD (46s) |
EXPPXI (30s, itself) |
* Should also link to SWAP_BOUNDS_DP, like many other
routines.
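The three top-level profiles can be compared both by elapsed time and by the resources they occupy. A rough sketch, assuming the UM_SHELL figure is a fair proxy for the elapsed time of each run and that cost scales with the number of physical cores reserved; the dictionary layout and function name are illustrative:

```python
# UM_SHELL times and core counts taken from the three profiles above.
runs = {
    "one thread with SMT": {"um_shell_s": 4661, "cores": 64},
    "one thread without SMT": {"um_shell_s": 2746, "cores": 128},
    "two threads with SMT": {"um_shell_s": 2568, "cores": 128},
}


def summarise(runs, baseline_name):
    """Return time relative to the baseline run and core-seconds per run."""
    baseline_s = runs[baseline_name]["um_shell_s"]
    summary = {}
    for name, run in runs.items():
        summary[name] = {
            "relative_time": run["um_shell_s"] / baseline_s,
            "core_seconds": run["um_shell_s"] * run["cores"],
        }
    return summary


summary = summarise(runs, "two threads with SMT")
for name, row in summary.items():
    print(f"{name}: {row['relative_time']:.2f}x baseline time, "
          f"{row['core_seconds']:,} core-seconds")
```

On these numbers the one-thread run with SMT is about 1.8 times slower than the normal selection but uses the fewest core-seconds, since it occupies half as many cores.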
Profiling for ATMOS_PHYSICS1 and EG_CORRECT_TRACERS
One thread with SMT (128 tasks on 2 nodes/64 cores)
ATMOS_PHYSICS1 (1,610s) |
EG_CORRECT_TRACERS (258s) |
RAD_CTL (387s) |
MICROPHYS_CTL (279s) |
NI_GWD_CTL (303s) |
⇓ |
⇓ |
EG_MASS_CONSERVATION (187s) |
LW_RAD (285s) |
SW_RAD (93s) |
LS_PPN (259s) |
G_WAVE_5A (257s) |
GW_USSP (45s) |
GLOBAL_2D_SUMS (48s, itself) |
Itself (130s) |
RADIANCE_CALC (366s) |
LS_PPNC (241s) |
SWAP_BOUNDS (see table below) |
SOLVE_BAND_K_EQV (301s) |
GREY_OPT_PROP (40s) |
LSP_ICE (211s) |
Itself (30s) |
MCICA_SAMPLE (233s) |
SCALE_ABSORB (42s) |
OPT_PROP_AEROSOL (32s, itself) |
LSP_INIT (46s) |
LSP_FALL (40s) |
MONOCHROMATIC_RADIANCE (203s) |
Itself (32s) |
Itself (22s) |
Itself (35s) |
MONOCHROMATIC_RADIANCE_TSEQ (176s) |
MCICA_COLUMN (175s) |
TWO_COEFF (118s) |
TRANS_SOURCE_COEFF (66s) |
Itself (20s) |
Itself (42s) |
|
|
One thread without SMT (128 tasks on 4 nodes/128 cores)
ATMOS_PHYSICS1 (1,031s) |
EG_CORRECT_TRACERS (123s) |
RAD_CTL (232s) |
MICROPHYS_CTL (193s) |
NI_GWD_CTL (219s) |
⇓ |
⇓ |
EG_MASS_CONSERVATION (91s) |
LW_RAD (171s) |
SW_RAD (56s) |
LS_PPN (180s) |
G_WAVE_5A (184s) |
GW_USSP (35s) |
GLOBAL_2D_SUMS (32s, itself) |
Itself (65s) |
RADIANCE_CALC (218s) |
LS_PPNC (171s) |
SWAP_BOUNDS (see table below) |
SOLVE_BAND_K_EQV (185s) |
GREY_OPT_PROP (20s) |
LSP_ICE (155s) |
Itself (16s) |
MCICA_SAMPLE (142s) |
SCALE_ABSORB (30s) |
OPT_PROP_AEROSOL (15s, itself) |
LSP_INIT (31s) |
LSP_FALL (32s) |
MONOCHROMATIC_RADIANCE (124s) |
Itself (26s) |
Itself (16s) |
Itself (29s) |
MONOCHROMATIC_RADIANCE_TSEQ (108s) |
MCICA_COLUMN (108s) |
TWO_COEFF (77s) |
TRANS_SOURCE_COEFF (44s) |
Itself (s) |
Itself (s) |
|
|
Two threads with SMT (128 tasks on 4 nodes/128 cores) - normal OpenMP & SMT selection
ATMOS_PHYSICS1 (949s) |
EG_CORRECT_TRACERS (122s) |
RAD_CTL (215s) |
MICROPHYS_CTL (186s) |
NI_GWD_CTL (176s) |
⇓ |
⇓ |
EG_MASS_CONSERVATION (90s) |
LW_RAD (153s, 153s) |
SW_RAD (53s, 53s) |
LS_PPN (174s) |
G_WAVE_5A (150s) |
GW_USSP (25s) |
GLOBAL_2D_SUMS (38s, itself) |
Itself (63s) |
RADIANCE_CALC (200s, 199s) |
LS_PPNC (163s) |
SWAP_BOUNDS (see table below) |
SOLVE_BAND_K_EQV (155s, 154s) |
GREY_OPT_PROP (33s, 33s) |
LSP_ICE (95s, 96s) |
Itself (68s) |
MCICA_SAMPLE (122s, 122s) |
SCALE_ABSORB (20s, 20s) |
OPT_PROP_AEROSOL (29s, 29s, itself) |
LSP_INIT (23s, 23s) |
LSP_FALL (19s, 19s) |
MONOCHROMATIC_RADIANCE (107s, 107s) |
Itself (15s, 15s) |
Itself (11s, 11s) |
Itself (16s, 17s) |
MONOCHROMATIC_RADIANCE_TSEQ (94s, 94s) |
MCICA_COLUMN (94s, 94s) |
TWO_COEFF (67s, 67s) |
TRANS_SOURCE_COEFF (37s, 37s) |
Itself (12s) |
Itself (23s) |
|
|
Profiling for ATMOS_PHYSICS2 and EG_SL_HELMHOLTZ
One thread with SMT (128 tasks on 2 nodes/64 cores)
ATMOS_PHYSICS2 (706s) |
EG_SL_HELMHOLTZ (470s) |
NI_CONV_CTL (325s) |
NI_IMP_CTL (136s) |
SWAP_BOUNDS, SWAP_BOUNDS_2D_MV & SWAP_BOUNDS_MV
(see table below)
|
EG_BICGSTAB (272s) |
EG_HELM_RHS_STAR (132s) |
GLUE_CONV_5A (265s) |
IMP_SOLVER (64s) |
EG_PRECON (198s) |
EG_SISL_INIT* (113s) |
Itself (105s) |
MID_CONV_5A (92s) |
TRI_SOR_DP_DP (198s) |
EG_SISL_INIT_UVW (107s) |
Itself (31s) |
Itself (143s) |
Itself (71s) |
One thread without SMT (128 tasks on 4 nodes/128 cores)
ATMOS_PHYSICS2 (438s) |
EG_SL_HELMHOLTZ (228s) |
NI_CONV_CTL (187s) |
NI_IMP_CTL (87s) |
SWAP_BOUNDS, SWAP_BOUNDS_2D_MV & SWAP_BOUNDS_MV
(see table below)
|
EG_BICGSTAB (123s) |
EG_HELM_RHS_STAR (74s) |
GLUE_CONV_5A (154s) |
IMP_SOLVER (42s) |
EG_PRECON (86s) |
EG_SISL_INIT* (49s) |
Itself (56s) |
MID_CONV_5A (54s) |
TRI_SOR_DP_DP (86s) |
EG_SISL_INIT_UVW (46s) |
Itself (17s) |
Itself (58s) |
Itself (33s) |
Two threads with SMT (128 tasks on 4 nodes/128 cores) - normal OpenMP & SMT selection
ATMOS_PHYSICS2 (404s) |
EG_SL_HELMHOLTZ (240s) |
NI_CONV_CTL (173s) |
NI_IMP_CTL (79s) |
SWAP_BOUNDS, SWAP_BOUNDS_2D_MV & SWAP_BOUNDS_MV
(see table below)
|
EG_BICGSTAB (140s) |
EG_HELM_RHS_STAR (69s) |
GLUE_CONV_5A (126s, 123s) |
IMP_SOLVER (38s) |
EG_PRECON (95s) |
EG_SISL_INIT* (50s) |
Itself (47s, 46s) |
MID_CONV_5A (42s, 42s) |
TRI_SOR_DP_DP (95s) |
EG_SISL_INIT_UVW (47s) |
Itself (13s, 13s) |
Itself (64s) |
Itself (33s) |
* EG_SISL_INIT is also called from ATM_STEP_4A.
Profiling for SWAP_* routines
One thread with SMT (128 tasks on 2 nodes/64 cores)
Routines |
Total mean time |
EG_SWAP_BOUNDS_DP (207s) |
ATMOS_PHYSICS1, ATMOS_PHYSICS2, G_WAVE_5A, ... |
207 + ... |
SWAP_BOUNDS & SWAP_BOUNDS_DP (653 + 227 = 880s)
|
SWAP_BOUNDS_MV (169s, itself) |
1,049s |
SWAP_BOUNDS_EW_DP (452s) |
SWAP_BOUNDS_NS_DP (425s, itself) |
1,046s |
SWAP_BOUNDS_EW_H1_DP (281s, itself) |
Itself (171s) |
1,046s |
One thread without SMT (128 tasks on 4 nodes/128 cores)
Routines |
Total mean time |
EG_SWAP_BOUNDS_DP (107s) |
ATMOS_PHYSICS1, ATMOS_PHYSICS2, G_WAVE_5A, ... |
107 + ... |
SWAP_BOUNDS & SWAP_BOUNDS_DP (375 + 121 = 496s)
|
SWAP_BOUNDS_MV (119s, itself) |
615s |
SWAP_BOUNDS_EW_DP (248s) |
SWAP_BOUNDS_NS_DP (245s, itself) |
612s |
SWAP_BOUNDS_EW_H1_DP (168s, itself) |
Itself (80s) |
612s |
Two threads with SMT (128 tasks on 4 nodes/128 cores) - normal OpenMP & SMT selection
Routines |
Total mean time |
EG_SWAP_BOUNDS_DP (107s) |
ATMOS_PHYSICS1, ATMOS_PHYSICS2, G_WAVE_5A, ... |
107 + ... |
SWAP_BOUNDS & SWAP_BOUNDS_DP (339 + 118 = 457s)
|
SWAP_BOUNDS_MV (103s, itself) |
560s |
SWAP_BOUNDS_EW_DP (234s) |
SWAP_BOUNDS_NS_DP (222s, itself) |
559s |
SWAP_BOUNDS_EW_H1_DP (153s, itself) |
Itself (81s) |
559s |
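The totals in the SWAP_* tables can be cross-checked by adding up each level of the breakdown. A sketch of that check, reading each table as three successive levels (callers, then EW/NS, then the EW split) that each include SWAP_BOUNDS_MV; that reading, the key names and the comparison against UM_SHELL time are assumptions rather than anything stated in the profiles:

```python
# Mean times in seconds, copied from the three SWAP_* tables above.
swap = {
    "one thread with SMT": dict(
        swap_bounds=653, swap_bounds_dp=227, mv=169,
        ew_dp=452, ns_dp=425, ew_h1_dp=281, ew_itself=171, um_shell=4661),
    "one thread without SMT": dict(
        swap_bounds=375, swap_bounds_dp=121, mv=119,
        ew_dp=248, ns_dp=245, ew_h1_dp=168, ew_itself=80, um_shell=2746),
    "two threads with SMT": dict(
        swap_bounds=339, swap_bounds_dp=118, mv=103,
        ew_dp=234, ns_dp=222, ew_h1_dp=153, ew_itself=81, um_shell=2568),
}


def level_totals(t):
    """Totals for the three levels of the breakdown, each including the MV time."""
    top = t["swap_bounds"] + t["swap_bounds_dp"] + t["mv"]
    middle = t["ew_dp"] + t["ns_dp"] + t["mv"]
    bottom = t["ew_h1_dp"] + t["ew_itself"] + t["ns_dp"] + t["mv"]
    return top, middle, bottom


totals = {name: level_totals(t) for name, t in swap.items()}
shares = {name: totals[name][0] / t["um_shell"] for name, t in swap.items()}

for name in swap:
    print(f"{name}: levels {totals[name]}, "
          f"{shares[name]:.1%} of UM_SHELL time")
```

The sums agree with the totals quoted in the tables, and in all three configurations halo swapping comes out at a little over a fifth of the UM_SHELL time, so neither SMT nor the second thread changes its share much.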