Timing tests for using MEDUSA (ORCA1) on only a few nodes

Should we underpopulate the Broadwell nodes?

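For the coupled run, I expect that we'll use about 3 nodes, with attached XIOS, for the ocean and MEDUSA. It's not currently clear whether we should underpopulate the Broadwell nodes (use fewer than all 36 cores on each node) or not. All the runs in the table below use 3 nodes; only the number of cores used per node changes.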

The job I'm using for this is u-ae035.

| Cores used per Broadwell node (total cores used) | NEMO_IPROC * NEMO_JPROC (CICE_BLKX * CICE_BLKY) | Time for each of the first 6 months | Average time for 1 month | Speed (model yrs/day) |
|---|---|---|---|---|
| 36 (full, 108) | 12*9 (30*37) | 29:13 (1,753s), 28:40 (1,720s), 28:29 (1,709s), 29:34 (1,774s), 28:33 (1,713s) & 29:02 (1,742s) | 28:55 (1,735s) | 4.15 |
| 32 (96) | 12*8 (30*42) | 30:12 (1,812s), 30:55 (1,855s), 30:09 (1,809s), 30:12 (1,812s), 30:05 (1,805s) & 30:44 (1,844s) | 30:23 (1,823s) | 3.95 |
| 28 (84) | 12*7 (30*48) | 31:53 (1,913s), 30:29 (1,829s), 29:58 (1,798s), 30:38 (1,838s), 30:23 (1,823s) & 30:01 (1,801s) | 30:34 (1,834s) | 3.93 |
| 24 (72) | 9*8 (40*42) | 32:02 (1,922s), 33:59 (2,039s), 34:53 (2,093s), 32:49 (1,969s), 32:59 (1,979s) & 32:17 (1,937s) | 33:10 (1,990s) | 3.62 |
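
The derived columns in the table can be reproduced from the raw numbers. The Python sketch below assumes a CICE global grid of 360 x 330 points: that is my inference (it reproduces every CICE_BLKX * CICE_BLKY value in both tables), not a value read from the suite. Speed is taken as the seconds in a day divided by the seconds needed for 12 model months.

```python
import math

# ASSUMPTION: CICE global grid of 360 x 330 points. Inferred because it
# reproduces every CICE_BLKX * CICE_BLKY value in the tables, not read from u-ae035.
CICE_NX_GLOBAL = 360
CICE_NY_GLOBAL = 330

SECONDS_PER_DAY = 86_400
MONTHS_PER_YEAR = 12


def cice_blocks(iproc: int, jproc: int) -> tuple[int, int]:
    """Block size per task: global grid split over the NEMO decomposition, rounded up."""
    return math.ceil(CICE_NX_GLOBAL / iproc), math.ceil(CICE_NY_GLOBAL / jproc)


def model_years_per_day(mean_month_seconds: float) -> float:
    """Throughput: seconds in a wall-clock day / seconds per model year."""
    return SECONDS_PER_DAY / (MONTHS_PER_YEAR * mean_month_seconds)


# (cores per node, NEMO_IPROC, NEMO_JPROC, monthly times in seconds), from the table above
runs = [
    (36, 12, 9, [1753, 1720, 1709, 1774, 1713, 1742]),
    (32, 12, 8, [1812, 1855, 1809, 1812, 1805, 1844]),
    (28, 12, 7, [1913, 1829, 1798, 1838, 1823, 1801]),
    (24, 9, 8, [1922, 2039, 2093, 1969, 1979, 1937]),
]

for cores, ip, jp, times in runs:
    mean = sum(times) / len(times)
    bx, by = cice_blocks(ip, jp)
    print(f"{cores:2d} cores/node: {ip}*{jp} ({bx}*{by}), "
          f"mean {mean:.0f}s, {model_years_per_day(mean):.2f} yrs/day")
```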

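Hence, we can be fairly confident that using all the cores (no underpopulation) is best. Fully populated nodes give the highest speed, about 5% faster than 32 cores per node, and even the slowest fully populated month (1,774s) is faster than the fastest month with underpopulated nodes (1,798s).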
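
A quick way to check that the difference isn't just run-to-run noise is to compare the slowest fully populated month with the fastest underpopulated one, and to look at how much slower each underpopulated setup is on average. A minimal Python sketch using the monthly times from the table:

```python
# Monthly wall-clock times (seconds) for the first 6 months, from the table above,
# keyed by cores used per Broadwell node (3 nodes in every case).
monthly_times = {
    36: [1753, 1720, 1709, 1774, 1713, 1742],
    32: [1812, 1855, 1809, 1812, 1805, 1844],
    28: [1913, 1829, 1798, 1838, 1823, 1801],
    24: [1922, 2039, 2093, 1969, 1979, 1937],
}

full = monthly_times[36]
slowest_full = max(full)
fastest_under = min(t for cores, ts in monthly_times.items() if cores < 36 for t in ts)

# If even the slowest fully populated month beats the fastest underpopulated month,
# month-to-month variability can't explain the difference.
print(f"slowest month, 36 cores/node: {slowest_full}s")
print(f"fastest month, underpopulated: {fastest_under}s")
print("full population wins every month:", slowest_full < fastest_under)

# Average slowdown of each underpopulated setup relative to full population.
mean_full = sum(full) / len(full)
for cores in (32, 28, 24):
    ts = monthly_times[cores]
    slowdown = (sum(ts) / len(ts)) / mean_full - 1
    print(f"{cores} cores/node: {100 * slowdown:.1f}% slower on average")
```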

Run times for 1 and 2 nodes

We have a run time for 3 nodes from above (4.15 model yrs/day), which is probably a lot faster than our atmosphere will be. Maybe we can get away with fewer nodes for MEDUSA.

Following the timing tests above, I'm fully populating the nodes.

| Nodes (cores) | NEMO_IPROC * NEMO_JPROC (CICE_BLKX * CICE_BLKY) | Time for each of the first 6 months | Average time for 1 month | Speed (model yrs/day) |
|---|---|---|---|---|
| 2 (72) | 9*8 (40*42) | 39:26 (2,366s), 40:35 (2,435s), 39:49 (2,389s), 41:20 (2,480s), 41:50 (2,510s) & 39:31 (2,371s) | 40:25 (2,425s) | 2.97 |
| 1 (36) | 9*4 (40*83) | 1:07:43 (4,063s), 1:10:54 (4,254s), 1:08:02 (4,082s), 1:08:31 (4,111s), 1:07:17 (4,037s) & 1:07:19 (4,039s) | 1:08:18 (4,098s) | 1.76 |
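
To see what each extra node buys, here is a rough strong-scaling summary from the average monthly times (1 and 2 nodes from this table, 3 nodes from the fully populated run earlier), sketched in Python. Node-hours per model year is taken as nodes * 12 * (average month time) / 3600; it ignores queueing and any start-up cost outside the monthly timings.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_HOUR = 3_600
MONTHS_PER_YEAR = 12

# Average wall-clock time for 1 model month (seconds), fully populated nodes:
# 1 and 2 nodes from the table above, 3 nodes from the 36-cores-per-node run.
mean_month_seconds = {1: 4098, 2: 2425, 3: 1735}

baseline = mean_month_seconds[1]
results = {}
for nodes, secs in sorted(mean_month_seconds.items()):
    speed = SECONDS_PER_DAY / (MONTHS_PER_YEAR * secs)              # model years per day
    speedup = baseline / secs                                       # relative to 1 node
    efficiency = speedup / nodes                                    # 1.0 = perfect scaling
    node_hours = nodes * MONTHS_PER_YEAR * secs / SECONDS_PER_HOUR  # cost of 1 model year
    results[nodes] = (speed, speedup, efficiency, node_hours)
    print(f"{nodes} node(s): {speed:.2f} yrs/day, speedup {speedup:.2f}, "
          f"efficiency {efficiency:.0%}, {node_hours:.1f} node-hours per model year")
```

With these numbers, each extra node raises the speed but also the cost per model year, since parallel efficiency drops as nodes are added.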

Likely number of nodes for ocean

Given that the ocean at ORCA1 is so much cheaper than almost any atmosphere configuration at N96, we'll probably use 3 nodes for the ocean, even though 2 nodes is likely to be enough most of the time. Otherwise, we risk the occasional randomly slow ocean run, which would leave all the nodes we're using for the atmosphere waiting.
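
For the slow-run worry, the relevant number is probably the slowest month rather than the average. The Python sketch below computes both, plus the spread of the monthly times, for 2 and 3 fully populated nodes; the atmosphere speed to compare against isn't known yet. It only looks at whole-month timings, so slowdowns on shorter timescales (e.g. between coupling exchanges) aren't captured.

```python
SECONDS_PER_DAY = 86_400
MONTHS_PER_YEAR = 12

# Monthly wall-clock times (seconds) for fully populated nodes, from the tables above.
monthly_times = {
    2: [2366, 2435, 2389, 2480, 2510, 2371],
    3: [1753, 1720, 1709, 1774, 1713, 1742],
}


def yrs_per_day(month_seconds: float) -> float:
    """Model years per wall-clock day if every month took month_seconds."""
    return SECONDS_PER_DAY / (MONTHS_PER_YEAR * month_seconds)


for nodes, ts in sorted(monthly_times.items()):
    mean = sum(ts) / len(ts)
    spread = (max(ts) - min(ts)) / mean  # slowest-minus-fastest month, relative to the mean
    print(f"{nodes} nodes: mean {yrs_per_day(mean):.2f} yrs/day, "
          f"slowest month {yrs_per_day(max(ts)):.2f} yrs/day, spread {spread:.1%}")
```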

It's obviously worth testing this.