All the runs below use
Job id | Description | Times for first 4 years | Average time for 1 year | Speed (model yrs/day) |
---|---|---|---|---|
Julien's u-aj946 | Our reference spin-up suite | 1:06:50 (4,010s), 1:05:47 (3,947s), 1:04:28 (3,868s) & 1:03:12 (3,792s) | 3,904.25s (1:05:04) | 22.1 |
u-ak454* | Ported to XCS | 1:00:58 (3,658s), 1:01:35 (3,695s), 0:58:44 (3,524s) & 0:59:48 (3,588s) | 3,616.25s (1:00:16) | 23.9 |
u-ak491** | XCS & using Richard's version of GO6 package branch, which has Maff's pointer->allocatable optimisations in | 1:00:15 (3,615s), 0:56:27 (3,387s), 0:58:06 (3,486s) & 0:56:21 (3,381s) | 3,467.25s (0:57:47) | 24.9 |
u-ak506*** | XCS, Richard's GO6 package branch & Maff's compiler optimisations | 0:50:31 (3,031s), 0:51:12 (3,072s), 0:53:08 (3,188s) & 0:53:04 (3,184s) | 3,118.75s (0:51:59) | 27.7 |
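The last two columns follow from the per-year wall-clock times: the times are averaged, and the speed is the number of seconds in a day divided by that average. A minimal sketch of the arithmetic is below; the function names are illustrative only and are not part of any suite.

```python
SECONDS_PER_DAY = 86_400


def to_seconds(hms):
    """Convert an 'H:MM:SS' or 'MM:SS' string to a number of seconds."""
    total = 0
    for part in hms.split(":"):
        total = total * 60 + int(part)
    return total


def model_years_per_day(year_times):
    """Average the wall-clock times for single model years and
    convert the mean into model years per wall-clock day."""
    seconds = [to_seconds(t) for t in year_times]
    mean_seconds = sum(seconds) / len(seconds)
    return SECONDS_PER_DAY / mean_seconds


# The four yearly times quoted for u-aj946 in the table above.
speed = model_years_per_day(["1:06:50", "1:05:47", "1:04:28", "1:03:12"])
print(f"{speed:.1f} model yrs/day")  # prints "22.1 model yrs/day"
```
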
Not all fields are tested, because there doesn't seem to be an easy way of comparing every field in two netCDF files (unlike for the UM, where we have um-cumf). Instead, I've created the full restart files at the end of each run and used ncdiff to take the difference between them. Rather than checking that every difference is zero, I've generally just checked SN and TN in the *_restart.nc files and a couple of fields from the *_restart_trc.nc files, e.g. TNCHN and TNDIC.
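If the fields from two restart files were loaded into memory, checking every field rather than a hand-picked few could look roughly like the sketch below. It is an illustration only: `differing_fields` is a hypothetical helper, the inputs are assumed to be plain mappings from field name to a flat sequence of numbers, how they are read from the netCDF files is deliberately left out, and NaN or missing values are not handled.

```python
import math


def differing_fields(run_a, run_b, tol=0.0):
    """Return {field name: largest absolute difference} for every field
    whose values differ by more than tol between the two runs.

    Fields present in only one run, or with mismatched lengths, are
    reported with an infinite difference.
    """
    report = {}
    for name in sorted(set(run_a) | set(run_b)):
        if name not in run_a or name not in run_b:
            report[name] = math.inf  # field missing from one run
            continue
        a, b = run_a[name], run_b[name]
        if len(a) != len(b):
            report[name] = math.inf  # shapes do not match
            continue
        worst = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
        if worst > tol:
            report[name] = worst
    return report


# Toy values standing in for restart fields; these are not real model output.
reference = {"SN": [35.0, 34.9], "TN": [10.0, 9.5], "TNCHN": [0.2, 0.3]}
candidate = {"SN": [35.0, 34.9], "TN": [10.0, 9.75], "TNCHN": [0.2, 0.3]}
print(differing_fields(reference, candidate))  # prints {'TN': 0.25}
```

An empty result means every field matched to within `tol`, which is the check that looking only at SN, TN and a couple of tracers approximates.
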
Some unnecessary MPI communication has been removed since I last looked at the maximum speed of MEDUSA, so I'd expect that we can now use more PEs before reaching maximum speed.
I'm using u-ak506 to do all these speed tests, with Richard's optimised version of GO6 package branch and Maff's compiler options.
NEMO_IPROC*NEMO_JPROC (CICE_BLKX*CICE_BLKY) | Total nodes | Times | Average time for 1 year | Speed (model yrs/day) |
---|---|---|---|---|
24*24 (15*14) | 25 | 0:50:31 (3,031s), 0:51:12 (3,072s), 0:53:08 (3,188s) & 0:53:04 (3,184s) | 3,118.75s (0:51:59) | 27.7 |
24*26 (15*13) | 27 | 0:50:44 (3,044s) | 3,044s (0:50:44) | 28.4 |
24*28* (15*12) | 29 | 0:47:27 (2,847s) | 2,847s (0:47:27) | 30.3 |
24*28* (15*12) | 27 | 48:21 (2,901s), 48:18 (2,898s), 47:31 (2,851s) & 47:22 (2,842s) | 2,873s (0:47:53) | 30.1 |
30*24 (12*14) | 31 | 0:48:22 (2,902s) | 2,902s (0:48:22) | 29.8 |
30*24 (12*14) | 31 | 45:13 (2,713s), 45:42 (2,742s), 45:06 (2,706s) & 45:33 (2,733s) | 2,723.5s (0:45:24) | 31.7 |
30*28 (12*12) | 36 | 43:32 (2,612s), 46:26 (2,786s), 44:15 (2,655s) & 43:17 (2,597s) | 2,662.5s (0:44:23) | 32.5 |
40*24 (9*14) | 41 | 45:07 (2,707s), 49:47 (2,987s), 43:19 (2,599s) & 46:20 (2,780s) | 2,768.25s (0:46:08) | 31.2 |
Based on the table above, I'd recommend the u-ak506 configuration with (NEMO_IPROC,NEMO_JPROC)=(30,28), except that we should use the latest version of the GO6 package branch rather than Richard's version of it. We should also push to get Maff's optimisations from Richard's branch into the GO6 package branch.
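As a cross-check on this recommendation, the four-run rows of the decomposition table can be ranked by speed, with speed per node as a rough indication of what the extra nodes buy. This is only a sketch: the tuples are copied from the table above, and the per-node figure is an illustrative measure that was not part of the original tests.

```python
SECONDS_PER_DAY = 86_400

# (NEMO_IPROC, NEMO_JPROC, total nodes, mean seconds per model year),
# taken from the rows of the table above that have four timed years.
RUNS = [
    (24, 24, 25, 3118.75),
    (24, 28, 27, 2873.0),
    (30, 24, 31, 2723.5),
    (30, 28, 36, 2662.5),
    (40, 24, 41, 2768.25),
]


def speed(mean_seconds):
    """Model years per wall-clock day for a given mean time per model year."""
    return SECONDS_PER_DAY / mean_seconds


# Report speed and speed per node for each decomposition.
for iproc, jproc, nodes, mean_seconds in RUNS:
    s = speed(mean_seconds)
    print(f"{iproc}*{jproc}: {s:.1f} yrs/day on {nodes} nodes "
          f"({s / nodes:.2f} yrs/day per node)")

# The decomposition with the highest speed.
fastest = max(RUNS, key=lambda run: speed(run[3]))
print("fastest:", fastest[:2])  # prints "fastest: (30, 28)"
```
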