Timings for all UKESM configurations

We have the UKESM-LTSM 2017 coming up (12/13 June) and it would be useful to have an idea of the performance of all our configurations.

Comparing efficient setup with fastest setup

See the table further below for a description of these runs. All configurations use N96, ORCA1 or both. Timings on the Cray XC40 vary a lot, so where I have several runs I've generally given the speed to 2 significant figures, as a rough average of the relevant runs. Where there is only one run, I've generally given it to 3 significant figures.

Where I haven't done extensive testing, a number of configurations could probably lose the odd ocean node and run just as fast. As the N96 atmosphere is by far the more expensive component, it is better to have too many ocean nodes (which won't make a big difference to the speed and resource calculations) than too few (which would slow the atmosphere and make a big difference to both).
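To put rough numbers on that trade-off, here is a small Python sketch using the GC3.1 row with 65 atmosphere nodes from the table below (69 nodes, 6.4 model years/day). It assumes 36 cores per node, which is what the core-hour figures on this page imply. The 5% slowdown is a made-up illustrative value, not a measurement.

```python
CORES_PER_NODE = 36  # assumption: implied by the core-hour figures on this page


def core_hours_per_model_year(nodes, speed_myd):
    """Core hours needed for one model year at `speed_myd` model years/day."""
    return nodes * CORES_PER_NODE * 24 / speed_myd


# GC3.1 with 65 ATMOS + 4 OCN nodes, as in the table
baseline = core_hours_per_model_year(69, 6.4)

# One spare ocean node, speed unchanged
one_extra_ocean = core_hours_per_model_year(70, 6.4)

# Hypothetical: one ocean node too few, so the atmosphere waits and
# the whole model runs 5% slower (illustrative value only)
one_too_few = core_hours_per_model_year(68, 6.4 * 0.95)

extra_cost_pct = 100 * (one_extra_ocean / baseline - 1)
too_few_cost_pct = 100 * (one_too_few / baseline - 1)

print(f"Spare ocean node:   +{extra_cost_pct:.1f}% core hours, same speed")
print(f"Too few (5% slower): +{too_few_cost_pct:.1f}% core hours, and slower")
```

Under these assumptions a spare ocean node costs about 1.4% extra, while even a modest slowdown of the atmosphere costs more and also reduces throughput.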

Configuration        Nodes (ATMOS, OCN)   Core hours/model year   Speed (model years/day)
Ocean with MEDUSA    5 (0, 5)             569                     7.59
                     36 (0, 36)           957                     32.5
GA7.1                10 (10, 0)           3,585                   2.41
                     65 (65, 0)           8,666                   6.48
                     97 (97, 0)           12,307                  6.81
GC3.1                13 (10, 3)           4,800                   2.34
                     69 (65, 4)           9,315                   6.4
                     102 (97, 5)          12,960                  6.8
GC3.1 + MEDUSA       14 (10, 4)           4,858                   2.49
                     74 (65, 9)           9,543                   6.6
                     108 (97, 11)         12,610                  7.4
UKESM-CN*            14 (10, 4)           5,214                   2.32
                     74 (65, 9)           10,764                  5.94
                     108 (97, 11)         14,788                  6.31
GA7.1 + StratTrop    12 (12, 0)           6,912                   1.50
                     32 (32, 0)           12,567                  2.20
                     65 (65, 0)           19,299                  2.91
UKESM*               16 (12, 4)           9,600                   1.44
                     36 (32, 4)           14,074                  2.21
                     72 (65, 7)           21,751                  2.86
*Run at UM10.6, so I didn't have Maff's OpenMP branch in the aerosol chemistry, which speeds up GA7 by about 8%.
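As a rough way of reading the table, the sketch below compares the smallest and largest setup for three of the configurations, using numbers copied from the table. Treating the smallest setup as the "efficient" one and the largest as the "fastest" one is my reading of the heading, not something stated explicitly here.

```python
# (nodes, core hours/model year, model years/day), copied from the table above
ROWS = {
    "GA7.1": [(10, 3585, 2.41), (97, 12307, 6.81)],
    "GC3.1": [(13, 4800, 2.34), (102, 12960, 6.8)],
    "UKESM": [(16, 9600, 1.44), (72, 21751, 2.86)],
}


def compare(efficient, fastest):
    """Return (speed-up, cost ratio) going from the efficient to the fastest setup."""
    _, cost_e, speed_e = efficient
    _, cost_f, speed_f = fastest
    return speed_f / speed_e, cost_f / cost_e


for name, (efficient, fastest) in ROWS.items():
    speedup, cost_ratio = compare(efficient, fastest)
    print(f"{name}: {speedup:.2f}x faster for {cost_ratio:.2f}x the core hours")
```

For GA7.1, for example, going from 10 to 97 nodes gives roughly 2.8 times the speed for roughly 3.4 times the core hours per model year.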

The runs which produced the results above are described in the table below.

Ocean with MEDUSA
5 Nodes
  • u-ak506
  • (NEMO_IPROC,NEMO_JPROC)=(12,12)
  • (CICE_BLKX,CICE_BLKY)=(30,28)
  • OCN_PPN=36
  • XIOS_NPROC=8
  • Three six month cycles, which took 1:34:14 (5,654s), 1:33:55 (5,635s) & 1:36:31 (5,791s)
  • This is an average of 5,693s (1:34:53) for one six month cycle
  • A speed of 7.59 model years/day
  • This is 569 core hours/model year
36 Nodes
  • u-ak506
  • Information at MEDUSA spin-up speed tests 2
  • (NEMO_IPROC,NEMO_JPROC)=(30,28)
  • (CICE_BLKX,CICE_BLKY)=(12,12)
  • OCN_PPN=24
  • XIOS_NPROC=8
  • 6 one year cycles
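The arithmetic behind the 5-node figures above can be reproduced with a short Python sketch. The function names are mine, and the 36 cores per node is an assumption taken from OCN_PPN=36 and the core counts quoted elsewhere on this page.

```python
def to_seconds(hms):
    """Convert 'h:mm:ss' or 'mm:ss' to seconds."""
    seconds = 0
    for part in hms.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def timings(cycle_times, months_per_cycle, nodes, cores_per_node=36):
    """Return (mean cycle time in s, model years/day, core hours/model year)."""
    mean_s = sum(to_seconds(t) for t in cycle_times) / len(cycle_times)
    cycles_per_day = 86400 / mean_s
    speed = cycles_per_day * months_per_cycle / 12
    core_hours = nodes * cores_per_node * 24 / speed
    return mean_s, speed, core_hours


# Ocean with MEDUSA, 5 nodes: three six month cycles
mean_s, speed, core_hours = timings(["1:34:14", "1:33:55", "1:36:31"], 6, nodes=5)
print(f"{mean_s:.0f}s, {speed:.2f} model years/day, {core_hours:.0f} core hours/model year")
# prints: 5693s, 7.59 model years/day, 569 core hours/model year
```

The same function applies to the other runs on this page by changing the cycle length (1, 3 or 6 months) and the node count.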
GA7.1
10 Nodes
65 Nodes
  • u-am967
  • UM10.7
  • (ATM_PROCX,ATM_PROCY)=(48,24)
  • OMPTHR_ATM=2
  • IOS_NPROC=6
  • Total ATMOS nodes=65
  • Run for three three month cycles which took 52:02 (3,122s), 1:00:03 (3,603s) & 54:34 (3,274s)
  • This is an average time of 3,333s (55:33)
  • A speed of 6.48 model years/day
  • And 8,666 core hours/model year
97 Nodes
  • u-am967
  • UM10.7
  • (ATM_PROCX,ATM_PROCY)=(48,24)
  • OMPTHR_ATM=3
  • IOS_NPROC=6
  • Total ATMOS nodes=97
  • Run for three three month cycles which took 49:16 (2,956s), 50:02 (3,002s) & 59:11 (3,551s)
  • This is 3,170s (52:50)
  • A speed of 6.81 model years/day
  • And 12,307 core hours/model year
GC3.1
13 Nodes
  • u-am151
  • UM10.7
  • (ATM_PROCX,ATM_PROCY)=(20,18)
  • OMPTHR_ATM=0
  • IOS_NPROC=0
  • Total ATMOS nodes=10
  • (NEMO_IPROC,NEMO_JPROC)=(9,8)
  • (CICE_BLKX,CICE_BLKY)=(40,42)
  • XIOS_NPROC=6
  • Total OCN nodes=3
  • One month cycle for XIOS
  • Using "one_file"
  • Full iodef.xml
  • Run for three one month cycles which took 51:25 (3,085s), 50:47 (3,047s) & 51:46 (3,106s)
  • This is an average of 3,079s (51:19)
  • A speed of 2.34 model years/day
  • And 4,800 core hours/model year
69 Nodes
  • u-am151
  • Under `Benchmark against GC3.1 N96/ORCA1' in Speed tests for MEDUSA + GC3.1
  • UM10.7
  • (ATM_PROCX,ATM_PROCY)=(48,24)
  • OMPTHR_ATM=2
  • IOS_NPROC=6
  • Total ATMOS nodes=65
  • (NEMO_IPROC,NEMO_JPROC)=(12,9)
  • (CICE_BLKX,CICE_BLKY)=(30,37)
  • XIOS_NPROC=6
  • Total OCN nodes=4
  • One month cycle for XIOS
  • Using "multiple_file"
  • Full iodef.xml
  • Run for three three month cycles
102 Nodes
  • u-am151
  • UM10.7
  • (ATM_PROCX,ATM_PROCY)=(48,24)
  • OMPTHR_ATM=3
  • IOS_NPROC=6
  • Total ATMOS nodes=97
  • (NEMO_IPROC,NEMO_JPROC)=(12,12)
  • (CICE_BLKX,CICE_BLKY)=(30,28)
  • XIOS_NPROC=6
  • Total OCN nodes=5
  • One month cycle for XIOS
  • Using "one_file"
  • Full iodef.xml
  • Run for three three month cycles, which took 52:54 (3,174s), 54:50 (3,290s) & 51:35 (3,095s)
  • An average time of 3,186s (53:06)
  • This is a speed of 6.8 model years/day
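The ocean node totals quoted for the three GC3.1 runs above are consistent with a simple rule: enough nodes to hold the NEMO processes, plus one node for the XIOS servers. The Python sketch below is my inference from the numbers, not a documented rule, and both the function name and the one-node-for-XIOS default are assumptions.

```python
import math


def ocean_nodes(nemo_iproc, nemo_jproc, ocn_ppn=36, xios_nodes=1):
    """Estimate ocean nodes: whole nodes for NEMO plus separate XIOS node(s).

    Assumes XIOS servers sit on their own node(s) and that `ocn_ppn`
    NEMO processes are placed on each node (36 unless OCN_PPN says otherwise).
    """
    nemo_nodes = math.ceil(nemo_iproc * nemo_jproc / ocn_ppn)
    return nemo_nodes + xios_nodes


# GC3.1 runs listed above
print(ocean_nodes(9, 8))     # 13-node run, quoted as Total OCN nodes=3
print(ocean_nodes(12, 9))    # 69-node run, quoted as Total OCN nodes=4
print(ocean_nodes(12, 12))   # 102-node run, quoted as Total OCN nodes=5
```

The same rule also matches the 36-node ocean-only run, where OCN_PPN=24 and (NEMO_IPROC,NEMO_JPROC)=(30,28) give 35 NEMO nodes plus one for XIOS.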
GC3.1 + MEDUSA
14 Nodes
  • u-am354
  • UM10.7
  • (ATM_PROCX,ATM_PROCY)=(20,18)
  • OMPTHR_ATM=0
  • IOS_NPROC=0
  • Total ATMOS nodes=10
  • (NEMO_IPROC,NEMO_JPROC)=(12,9)
  • (CICE_BLKX,CICE_BLKY)=(30,37)
  • XIOS_NPROC=6
  • Total OCN nodes=4
  • One month cycle for XIOS
  • Using "one_file"
  • Removed groupMEDUSA_cmip6 from iodef.xml
  • Run for three one month cycles which took 48:41 (2,921s), 48:10 (2,890s) & 47:45 (2,865s)
  • This is an average time of 2,892s (48:12) for one month
  • A speed of 2.49 model years/day
  • And this is 4,858 core hours/model year
74 Nodes
  • u-am354
  • Information at bottom of Speed tests for MEDUSA + GC3.1
  • UM10.7
  • (ATM_PROCX,ATM_PROCY)=(48,24)
  • OMPTHR_ATM=2
  • IOS_NPROC=6
  • Total ATMOS nodes=65
  • (NEMO_IPROC,NEMO_JPROC)=(18,16)
  • (CICE_BLKX,CICE_BLKY)=(20,21)
  • XIOS_NPROC=6
  • Total OCN nodes=9
  • One month cycle for XIOS
  • Using "one_file"
  • Removed groupMEDUSA_cmip6 from iodef.xml
  • Run for three three month cycles
108 Nodes
  • u-am375
  • Information at bottom of Speed tests for MEDUSA + GC3.1
  • UM10.7
  • (ATM_PROCX,ATM_PROCY)=(48,24)
  • OMPTHR_ATM=3
  • IOS_NPROC=6
  • Total ATMOS nodes=97
  • (NEMO_IPROC,NEMO_JPROC)=(20,18)
  • (CICE_BLKX,CICE_BLKY)=(18,19)
  • XIOS_NPROC=6
  • Total OCN nodes=11
  • One month cycle for XIOS
  • Using "one_file"
  • Removed groupMEDUSA_cmip6 from iodef.xml
  • Run for three three month cycles
UKESM-CN
14 Nodes
  • u-aj599
  • UM10.6
  • (ATM_PROCX,ATM_PROCY)=(20,18)
  • OMPTHR_ATM=1
  • IOS_NPROC=0
  • Total ATMOS nodes=10
  • (NEMO_IPROC,NEMO_JPROC)=(12,9)
  • (CICE_BLKX,CICE_BLKY)=(30,37)
  • XIOS_NPROC=6
  • Total OCN nodes=4
  • Using "one_file"
  • An old iodef.xml
  • Run for three one month cycles which took 51:42 (3,102s), 52:23 (3,143s) & 50:51 (3,051s)
  • This is an average of 3,099s (51:39)
  • A speed of 2.32 model years/day
  • And 5,214 core hours/model year
74 Nodes
  • u-aj599
  • UM10.6
  • (ATM_PROCX,ATM_PROCY)=(48,24)
  • OMPTHR_ATM=2
  • IOS_NPROC=6
  • Total ATMOS nodes=65
  • (NEMO_IPROC,NEMO_JPROC)=(18,16)
  • (CICE_BLKX,CICE_BLKY)=(20,21)
  • XIOS_NPROC=6
  • Total OCN nodes=9
  • Using "one_file"
  • An old iodef.xml
  • Run for three three month cycles which took 57:36 (3,456s), 1:04:36 (3,876s) & 59:38 (3,578s)
  • An average time of 3,637s (1:00:37) for one three month cycle
  • This is 5.94 model years/day
  • And is 10,764 core hours/model year
108 Nodes
  • u-aj599
  • UM10.6
  • (ATM_PROCX,ATM_PROCY)=(48,24)
  • OMPTHR_ATM=3
  • IOS_NPROC=6
  • Total ATMOS nodes=97
  • (NEMO_IPROC,NEMO_JPROC)=(20,18)
  • (CICE_BLKX,CICE_BLKY)=(18,19)
  • XIOS_NPROC=6
  • Total OCN nodes=11
  • Using "one_file"
  • An old iodef.xml
  • Run for three three month cycles which took 57:56 (3,476s), 56:17 (3,377s) & 56:54 (3,414s)
  • This is an average of 3,422s (57:02) for one three month cycle
  • A speed of 6.31 model years/day
  • And 14,788 core hours/model year
GA7.1 + StratTrop
12 Nodes
32 Nodes
  • u-am972
  • UM10.7
  • (ATM_PROCX,ATM_PROCY)=(48,24)
  • OMPTHR_ATM=1
  • IOS_NPROC=0
  • Total ATMOS nodes=32
  • Run for three one month cycles which took 54:28 (3,268s), 51:47 (3,107s) & 57:14 (3,434s)
  • This is an average time of 3,270s (54:30)
  • A speed of 2.20 model years/day
  • And 12,567 core hours/model year
65 Nodes
  • u-am972
  • UM10.7
  • (ATM_PROCX,ATM_PROCY)=(48,24)
  • OMPTHR_ATM=2
  • IOS_NPROC=6
  • Total ATMOS nodes=65
  • Run for three one month cycles which took 43:56 (2,636s), 43:53 (2,633s) & 35:56 (2,156s)
  • This is an average time of 2,475s (41:15)
  • This is a speed of 2.91 model years/day
  • And is 19,299 core hours/model year
UKESM
16 Nodes
  • u-ak081
  • UM10.6
  • (ATM_PROCX,ATM_PROCY)=(24,18)
  • OMPTHR_ATM=1
  • IOS_NPROC=0
  • Total ATMOS nodes=12
  • (NEMO_IPROC,NEMO_JPROC)=(12,9)
  • (CICE_BLKX,CICE_BLKY)=(30,37)
  • XIOS_NPROC=6
  • Total OCN nodes=4
  • Using "one_file"
  • An old iodef.xml
  • Run for three one month cycles which took 1:22:40 (4,960s), 1:23:53 (5,033s) & 1:23:24 (5,004s)
  • This is an average time of 4,999s (1:23:19) for one month
  • A speed of 1.44 model years/day
  • And 9,600 core hours/model year
36 Nodes
  • u-ak081
  • UM10.6
  • (ATM_PROCX,ATM_PROCY)=(48,24)
  • OMPTHR_ATM=1
  • IOS_NPROC=6
  • Total ATMOS nodes=32
  • (NEMO_IPROC,NEMO_JPROC)=(12,9)
  • (CICE_BLKX,CICE_BLKY)=(30,37)
  • XIOS_NPROC=6
  • Total OCN nodes=4
  • Using "one_file"
  • An old iodef.xml
  • Run for three one month cycles which took 55:35 (3,335s), 53:55 (3,235s) & 53:21 (3,201s)
  • This is an average of 3,257s (54:17)
  • A speed of 2.21 model years/day
  • And 14,074 core hours/model year
72 Nodes
  • u-ak081
  • UM10.6
  • (ATM_PROCX,ATM_PROCY)=(48,24)
  • OMPTHR_ATM=2
  • IOS_NPROC=6
  • Total ATMOS nodes=65
  • (NEMO_IPROC,NEMO_JPROC)=(18,12) (Probably more than I need)
  • (CICE_BLKX,CICE_BLKY)=(20,28)
  • XIOS_NPROC=6
  • Total OCN nodes=7
  • Using "one_file"
  • An old iodef.xml
  • Run for three one month cycles which took 43:32 (2,612s), 42:10 (2,530s) & 40:10 (2,410s)
  • An average of 2,517s (41:57) for one month
  • This is 2.86 model years/day
  • And 21,751 core hours/model year
UKESM-hybrid
? Nodes
? Nodes

Comparing ARCHER performance with XCS

The one GC job that NCAS-CMS have ported is Dan's GC3.1 N216/ORCA025 UM10.6.1, u-ai599, which they ported to u-ak155. Both jobs have the following settings:

  • (ATM_PROCX,ATM_PROCY)=(42,36)
  • OMPTHR_ATM=1
  • IOS_NPROC=0
  • Total ATMOS nodes
    • For XCS: 42
    • For ARCHER: 63
  • (NEMO_IPROC,NEMO_JPROC)=(24,26)
  • (CICE_BLKX,CICE_BLKY)=(60,47)
  • XIOS_NPROC
    • For XCS: =6
    • For ARCHER: =12 (with XCPU=2, which is probably relevant), spread across 6 nodes to avoid OOMing (running out of virtual memory)
  • Total OCN nodes
    • For XCS: 19 (24*26=624 and 624/36=17.33, so NEMO needs 18 nodes and leaves about 2/3 of one node unused)
    • For ARCHER: 32
  • Total nodes
    • For XCS: 61 (2,196 cores)
    • For ARCHER: 95 (2,280 cores)
  • Performance for XCS
    • Three one month cycles took 1:36:07 (5,767s), 1:33:05 (5,585s) & 1:34:06 (5,646s)
    • This is an average of 5,666s for one month (1:34:26)
    • A speed of 1.27 model years/day
    • And 2196 * 24/1.27 = 41,499 core hours/model year
  • Performance for ARCHER
    • Ros Hatcher timed one month at 1:24:29 (5,069s)
    • This is 1.42 model years/day
    • And 2280 * 24/1.42 = 38,535 core hours/model year
  • For this test ARCHER is 12% faster and uses 7.1% less resource
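The percentages in the last bullet follow directly from the speeds and core counts listed above, as this short Python check shows. The variable names are mine. The 24 is hours per day, as in the two core-hour sums above.

```python
def core_hours_per_model_year(cores, speed_myd):
    """Core hours for one model year at `speed_myd` model years/day."""
    return cores * 24 / speed_myd


xcs_speed, archer_speed = 1.27, 1.42      # model years/day, from the bullets above
xcs_cost = core_hours_per_model_year(2196, xcs_speed)        # 61 nodes on XCS
archer_cost = core_hours_per_model_year(2280, archer_speed)  # 95 nodes on ARCHER

faster_pct = 100 * (archer_speed / xcs_speed - 1)
saving_pct = 100 * (1 - archer_cost / xcs_cost)

print(f"ARCHER is {faster_pct:.0f}% faster and uses {saving_pct:.1f}% less resource")
# prints: ARCHER is 12% faster and uses 7.1% less resource
```

Note that the ARCHER figure rests on a single timed month, so given the run-to-run variability seen elsewhere on this page the comparison is indicative only.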