CMIP configurations

The CMIP configurations we're considering (according to Colin's e-mail of 2 July 2015) are:

  • Low resolution
    • UKESM-lr: N96/ORCA1: Full strat-trop UKCA and MEDUSA
    • UKESM-lr-cc: N96/ORCA1: Simplified UKCA and MEDUSA
    • UKESM-lr-ao: N96/ORCA1: Simplified UKCA no MEDUSA
  • High resolution
    • UKESM-hr: N216/ORCA025: Full strat-trop UKCA and MEDUSA
    • UKESM-hr-cc: N216/ORCA025: Simplified UKCA and MEDUSA
    • UKESM-hr-ao: N216/ORCA025: Simplified UKCA no MEDUSA
  • Hybrid resolution
    • UKESM-hybrid: N216+N96/ORCA025: Full strat-trop UKCA and MEDUSA
  • NEMO-MEDUSA (ocean only)

Cray unable to share nodes across submodels

I understand that the Cray cannot share nodes across model components, so it makes sense to give each submodel component a multiple of 32 cores, i.e. a whole number of nodes.
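
The rounding can be sketched in Python as a quick check. This is a minimal sketch assuming 32 cores per node and that each submodel gets its own whole nodes; the function names are hypothetical helpers, not part of any UM or NEMO tool.

```python
import math

CORES_PER_NODE = 32  # assumed node size, as in the note above


def nodes_for(cores: int, cores_per_node: int = CORES_PER_NODE) -> int:
    """Whole nodes needed by one submodel, since nodes are not shared."""
    return math.ceil(cores / cores_per_node)


def padded_cores(cores: int, cores_per_node: int = CORES_PER_NODE) -> int:
    """Round a core count up to the next multiple of the node size."""
    return nodes_for(cores, cores_per_node) * cores_per_node


print(nodes_for(220), padded_cores(220))  # 7 224
print(nodes_for(560), padded_cores(560))  # 18 576
```
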

Not skimping on submodels requiring the smallest resources

Some submodels, like the ORCA1 ocean, are very fast for a small amount of resources, so it probably doesn't make sense to try to reduce resources here when doing so could slow the whole coupled model for minimal gain.

BICILES?

Do we need any resources for BICILES?

UKESM-lr

ORCA1

For one node, I think nemoCiceOrca1 will run at 10.3 model years/day. If MEDUSA slows this by a factor of about 3.5, then one node will run at 10.3 / 3.5 = 2.94 model years/day. For two nodes, nemoCiceOrca1 will run at 18.1 model years/day and MEDUSA at about 18.1 / 3.5 = 5.17 model years/day.
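
A minimal sketch of this estimate, assuming MEDUSA simply divides the physical-ocean speed by a constant factor (3.5 here); in practice the slowdown may well vary with node count. The helper name is hypothetical.

```python
def with_medusa(physics_speed: float, slowdown: float = 3.5) -> float:
    """Estimated model years/day once MEDUSA is switched on (constant-factor assumption)."""
    return physics_speed / slowdown


# nemoCiceOrca1 speed estimates (model years/day) for one and two nodes
for nodes, speed in [(1, 10.3), (2, 18.1)]:
    print(nodes, round(with_medusa(speed), 2))  # 1 2.94, then 2 5.17
```
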

UKESM-lr: N96/ORCA1: Full strat-trop UKCA and MEDUSA

Maybe suggest 1 model year/day. If we assume a 10% slowdown for coupling, each component needs to reach 1 / 0.9 = 1.11 model years/day.

Speed (model years/day) | Speed/0.9 (model years/day) | Predicted cores: fullChemN96 | Predicted cores: medOrca1 | Recommended cores (nodes): fullChemN96 | Recommended cores (nodes): medOrca1 | Total nodes
1 | 1.11 | 220 (20x11) | ? | 224 (14x16) (7) | 32 (1?) | 8?
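
Choosing a recommended decomposition amounts to finding a processor grid px x py whose product fills a whole number of nodes. A hypothetical helper that lists the factor pairs of a core count and picks the squarest one is sketched below; it assumes any factor pair is acceptable and ignores model-specific constraints on the UM or NEMO decomposition.

```python
def decompositions(cores: int) -> list[tuple[int, int]]:
    """All (px, py) processor grids with px * py == cores."""
    return [(px, cores // px) for px in range(1, cores + 1) if cores % px == 0]


def squarest(cores: int) -> tuple[int, int]:
    """The grid with the smallest difference between px and py (first found wins)."""
    return min(decompositions(cores), key=lambda p: abs(p[0] - p[1]))


print(squarest(224))  # (14, 16): the 7-node fullChemN96 recommendation
print(squarest(576))  # (24, 24)
```

On the core counts used on this page it returns 14x16 for 224, 24x24 for 576 and 16x24 for 384, matching the recommended grids; for 544 it gives 17x32, the transpose of the 32x17 used for N216.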

Alternatively, maybe suggest 2 model years/day. If we assume a 10% slowdown for coupling, each component needs to reach 2 / 0.9 = 2.22 model years/day.

Speed (model years/day) | Speed/0.9 (model years/day) | Predicted cores: fullChemN96 | Predicted cores: medOrca1 | Recommended cores (nodes): fullChemN96 | Recommended cores (nodes): medOrca1 | Total nodes
2 | 2.22 | 560 (28x20) | ? | 576 (24x24) (18) | 32 (1?) | 19?
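
The Speed/0.9 column divides the target coupled speed by (1 - coupling overhead). A minimal sketch, assuming the 10% coupling overhead used throughout this page; the function name is hypothetical.

```python
def component_target(overall_speed: float, coupling_overhead: float = 0.10) -> float:
    """Speed each component must reach so the coupled model still achieves overall_speed."""
    return overall_speed / (1.0 - coupling_overhead)


# Target coupled speeds (model years/day) considered on this page
for target in (0.6, 1, 2, 4):
    print(target, round(component_target(target), 2))  # 0.67, 1.11, 2.22, 4.44
```
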

UKESM-lr-cc: N96/ORCA1: Simplified UKCA and MEDUSA

Maybe suggest 2 model years/day. If we assume a 10% slowdown for coupling, each component needs to reach 2 / 0.9 = 2.22 model years/day.

Speed (model years/day) | Speed/0.9 (model years/day) | Predicted cores: offOxN96 | Predicted cores: medOrca1 | Recommended cores (nodes): offOxN96 | Recommended cores (nodes): medOrca1 | Total nodes
2 | 2.22 | 150 (10x15) | ? | 160 (10x16) (5) | 32 (1?) | 6?

Hence one model year is completed in 24 / 2 = 12 hours, so we need 12*6 = 72 node-hours to complete one model year. Or 72 * 32 = 2,304 core-hours to complete one model year.

On the Cray we expect a 1.5 times slowdown, so this is 2,304 * 1.5 = 3,456 core-hours to complete one model year. For ARCHER this is 3,456 * 15 = 51,840 AUs.
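
The same chain of arithmetic (hours per model year, node-hours, core-hours, Cray slowdown, ARCHER AUs) is repeated for each configuration, so it can be captured in one function. This is a sketch assuming 32 cores per node, the 1.5 times Cray slowdown and the factor of 15 used in this note to convert core-hours to ARCHER AUs; the names are hypothetical and none of the constants come from an official tool.

```python
CORES_PER_NODE = 32
CRAY_SLOWDOWN = 1.5     # expected slowdown on the Cray, as assumed in the note
AU_PER_CORE_HOUR = 15   # conversion factor to ARCHER AUs used in the note


def cost_per_model_year(speed: float, nodes: int) -> dict[str, float]:
    """Resources needed for one model year at `speed` model years/day on `nodes` nodes."""
    hours = 24 / speed
    node_hours = hours * nodes
    core_hours = node_hours * CORES_PER_NODE
    cray_core_hours = core_hours * CRAY_SLOWDOWN
    return {
        "hours": hours,
        "node_hours": node_hours,
        "core_hours": core_hours,
        "cray_core_hours": cray_core_hours,
        "archer_aus": cray_core_hours * AU_PER_CORE_HOUR,
    }


print(cost_per_model_year(2, 6))  # UKESM-lr-cc at 2 model years/day on 6 nodes
```

For UKESM-lr-cc this reproduces 72 node-hours, 2,304 core-hours, 3,456 Cray core-hours and 51,840 AUs; for UKESM-hr-cc at 1 model year/day on 28 nodes it gives 483,840 AUs.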

Alternatively, maybe suggest 4 model years/day. If we assume a 10% slowdown for coupling, each component needs to reach 4 / 0.9 = 4.44 model years/day.

Speed (model years/day) | Speed/0.9 (model years/day) | Predicted cores: offOxN96 | Predicted cores: medOrca1 | Recommended cores (nodes): offOxN96 | Recommended cores (nodes): medOrca1 | Total nodes
4 | 4.44 | 390 (26x15) | ? | 384 (16x24) (12) | 64 (2?) | 14?

UKESM-lr-ao: N96/ORCA1: Simplified UKCA no MEDUSA

Maybe suggest 2 model years/day. If we assume a 10% slowdown for coupling, each component needs to reach 2 / 0.9 = 2.22 model years/day.

Speed (model years/day) | Speed/0.9 (model years/day) | Predicted cores: offOxN96 | Predicted cores: nemoCiceOrca1 | Recommended cores (nodes): offOxN96 | Recommended cores (nodes): nemoCiceOrca1 | Total nodes
2 | 2.22 | 150 (10x15) | 6 (2x3) | 160 (10x16) (5) | 32 (8x4) (1) | 6

Alternatively, maybe suggest 4 model years/day. If we assume a 10% slowdown for coupling, each component needs to reach 4 / 0.9 = 4.44 model years/day.

Speed (model years/day) | Speed/0.9 (model years/day) | Predicted cores: offOxN96 | Predicted cores: nemoCiceOrca1 | Recommended cores (nodes): offOxN96 | Recommended cores (nodes): nemoCiceOrca1 | Total nodes
4 | 4.44 | 390 (26x15) | 12 (3x4) | 384 (16x24) (12) | 32 (1) | 13

UKESM-hr

ORCA025

Based on Yongming's runs, it looks like MEDUSA slows the ocean by a factor of about 3.

XIOS

I'm assuming that XIOS can share nodes with the ocean. Is that right?

UKESM-hr: N216/ORCA025: Full strat-trop UKCA and MEDUSA

Maybe suggest 0.6 model years/day. If we assume a 10% slowdown for coupling, each component needs to reach 0.6 / 0.9 = 0.67 model years/day.

Speed (model years/day) | Speed/0.9 (model years/day) | Predicted cores: fullChemN216 | Predicted cores: medOrca025 | Predicted cores: XIOS | Recommended cores (nodes): fullChemN216 | Recommended cores (nodes): medOrca025 + XIOS | Total nodes
0.6 | 0.67 | 1,020 (30x34) | 192 (16x12) | 6 | 1,024 (32x32) (32) | 216 (18x12) + 6 (7) | 39?
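
If XIOS can share the ocean's nodes (the open question above), the ocean and XIOS cores are pooled before rounding up to whole nodes, which can save a node compared with giving XIOS its own. A minimal sketch, assuming 32-core nodes; the helper names are hypothetical.

```python
import math


def nodes_separate(ocean: int, xios: int, per_node: int = 32) -> int:
    """Ocean and XIOS each rounded up to whole nodes independently."""
    return math.ceil(ocean / per_node) + math.ceil(xios / per_node)


def nodes_shared(ocean: int, xios: int, per_node: int = 32) -> int:
    """Ocean and XIOS pooled onto the same nodes (assumes sharing is allowed)."""
    return math.ceil((ocean + xios) / per_node)


print(nodes_shared(216, 6), nodes_separate(216, 6))  # 7 8
```

For UKESM-hr-cc (340 + 12 cores) the same comparison gives 11 shared versus 12 separate nodes.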

UKESM-hr-cc: N216/ORCA025: Simplified UKCA and MEDUSA

Maybe suggest 1 model year/day. If we assume a 10% slowdown for coupling, each component needs to reach 1 / 0.9 = 1.11 model years/day.

Speed (model years/day) | Speed/0.9 (model years/day) | Predicted cores: offOxN216 | Predicted cores: medOrca025 | Predicted cores: XIOS | Recommended cores (nodes): offOxN216 | Recommended cores (nodes): medOrca025 + XIOS | Total nodes
1 | 1.11 | 550 (26x21) | 340 (20x17) | 11 | 544 (32x17) (17) | 340 (20x17) + 12 (11) | 28?

Hence one model year is completed in 24 / 1 = 24 hours, so we need 28*24 = 672 node-hours to complete one model year. Or 672 * 32 = 21,504 core-hours to complete one model year.

On the Cray we expect a 1.5 times slowdown, so this is 21,504 * 1.5 = 32,256 core-hours to complete one model year. For ARCHER this is 32,256 * 15 = 483,840 AUs.

UKESM-hr-ao: N216/ORCA025: Simplified UKCA no MEDUSA

Maybe suggest 1 model year/day. If we assume a 10% slowdown for coupling, each component needs to reach 1 / 0.9 = 1.11 model years/day.

Speed (model years/day) | Speed/0.9 (model years/day) | Predicted cores: offOxN216 | Predicted cores: nemoCiceOrca025 | Predicted cores: XIOS | Recommended cores (nodes): offOxN216 | Recommended cores (nodes): nemoCiceOrca025 + XIOS | Total nodes
1 | 1.11 | 550 (26x21) | 102 (6x17) | 3 | 544 (32x17) (17) | 124 + 4 (4) | 21?

UKESM-hybrid?

NEMO-MEDUSA

In an e-mail of 28 September, Andrew Yool asked me about the cost of running NEMO-MEDUSA. He didn't say which resolution or which platform.

NEMO-MEDUSA: ORCA1: low resolution ocean

Speed (model years/day) | Predicted cores: medOrca1 | Predicted cores: XIOS | Recommended cores (nodes): medOrca1 | Recommended cores (nodes): XIOS | Total nodes
12.8 | 160 (16x10) | 5 | 160 (16x10) (5) | 32 (1) | 6

Hence one model year is completed in 24 / 12.8 = 1.875 hours, so we need 1.875 * 6 = 11.25 node-hours to complete one model year. Or 11.25 * 32 = 360 core-hours to complete one model year at ORCA1.

On the Cray we expect a 1.5 times slowdown, so this is 360 * 1.5 = 540 core-hours to complete one model year.

NEMO-MEDUSA: ORCA025: high resolution ocean

The graph of Julien's timings is a bit odd, because at one point the gradient actually increases with the number of PEs, i.e. speed scales better than linearly with PEs. So I'll just take Julien's speed of 1.52 model years/day on 640 cores.
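
One way to quantify "better than linear" is parallel efficiency: the observed speedup divided by the increase in cores, where a value above 1 means super-linear scaling. Julien's raw timings aren't reproduced here, so the sketch below uses the nemoCiceOrca1 estimates for one and two nodes from the ORCA1 section, assuming one node is 32 cores. The function name is hypothetical.

```python
def parallel_efficiency(cores1: int, speed1: float, cores2: int, speed2: float) -> float:
    """Observed speedup over ideal (linear) speedup; > 1 means super-linear scaling."""
    return (speed2 / speed1) / (cores2 / cores1)


# nemoCiceOrca1: 10.3 model years/day on 1 node, 18.1 on 2 nodes
eff = parallel_efficiency(32, 10.3, 64, 18.1)
print(round(eff, 2))  # 0.88, i.e. sub-linear, as normally expected
```
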

Speed (model years/day) | Predicted cores: medOrca025 | Predicted cores: XIOS | Recommended cores (nodes): medOrca025 | Recommended cores (nodes): XIOS | Total nodes
1.52 | 640 (32x20) | 20 | 640 (32x20) (20) | 32 (1) | 21

Hence one model year is completed in 24 / 1.52 = 15.79 hours, so we need 15.79 * 21 ≈ 332 node-hours to complete one model year. Or 331.6 * 32 ≈ 10,611 core-hours to complete one model year at ORCA025.

On the Cray we expect a 1.5 times slowdown, so this is 10,611 * 1.5 ≈ 15,917 core-hours to complete one model year.