Estimates for run time of UKESM1

Estimates of how long UKESM1 might take to run, and how much HPC resource it is likely to use, are evolving. This page records some of those estimates.

1 September 2014: 0.85 model years per day

This estimate largely comes from Colin and Richard's e-mails of 1 September 2014. It was derived as follows:

  • The GC2 run (N216-ORCA025) took 20 hours per model year on 36 nodes (see http://collab.metoffice.gov.uk/twiki/bin/view/Support/MONSooNJobSizes under HadGEM3 and HadGAM3). This is 20 * 36 = 720 node hours on 36 * 32 = 1,152 PEs.
  • Richard found from speaking with Malcolm Roberts that the PE split was about 48:52 for ATMOS:NEMO-CICE. (Richard's old timings suggested a split of 70:30).
  • Colin's conversation with Mike Carter suggests that the new HPC will have a similar speed to the current HPC but more nodes.
  • If we assume
    • Split should be about 50:50
    • ATMOS with simplified chemistry (offline oxidants) is 1.7 times slower than just ATMOS (based on one N96 run I did).
    • NEMO-CICE + MEDUSA is around 2 times slower than just NEMO-CICE (based on Andrew Yool's e-mail to Colin that says that adding MEDUSA is *2.73 for 128 PEs and *2.10 for 256 PEs).
    then the PE split should be about (50 * 1.7):(50 * 2), i.e. 85:100 (or 46:54), so more PEs go to NEMO-CICE + MEDUSA. The scaling factor for completing ATMOS + simplified chemistry is then roughly [T(ATMOS + simplified chem) / T(ATMOS), at the same number of PEs] * [N_PE(ATMOS) / N_PE(ATMOS + simplified chem), with the PE counts expressed as fractions of the total] = (85/50) * ((50/100) / (85/(100 + 85))) = (85/50) * ((50/100) * (185/85)) = 185/100, or 1.85 times longer. As we've balanced NEMO-CICE + MEDUSA against ATMOS + simplified chemistry, that component will take the same time.
  • So 1.85 * 720 = 1,332 node hours. On 36 nodes this is 37.0 hours, or 1.54 days, per model year.
  • Richard thinks the code can probably scale up to about 1500 PEs, i.e. about 47 nodes, which would give about 1,332 / 47 = 28.3 hours, or about 1.18 days, per model year. This is roughly 1.2 days per model year, or 0.85 model years per day.
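
The arithmetic in the list above can be reproduced with a short Python sketch. The function and variable names are illustrative and not taken from any existing script. The sketch assumes, as the estimate itself does, that run time scales inversely with the number of nodes, and it uses the fact that the product of the two ratios in the scaling factor simplifies to (50 * 1.7 + 50 * 2) / (50 + 50).

```python
# Sketch of the 1 September 2014 estimate. All names are illustrative;
# the numbers are the ones quoted in the bullets above.

PES_PER_NODE = 32
GC2_NODES = 36
GC2_HOURS_PER_MODEL_YEAR = 20.0


def scaling_factor(atmos_slowdown, ocean_slowdown,
                   atmos_share=50.0, ocean_share=50.0):
    """Slowdown at a fixed total PE count after re-splitting the PEs
    so that the two (slower) components stay balanced."""
    rebalanced = atmos_share * atmos_slowdown + ocean_share * ocean_slowdown
    return rebalanced / (atmos_share + ocean_share)


def hours_per_model_year(node_hours, nodes):
    """Wall-clock hours per model year, assuming perfect scaling with nodes."""
    return node_hours / nodes


def model_years_per_day(hours):
    """Convert hours per model year into model years per wall-clock day."""
    return 24.0 / hours


gc2_node_hours = GC2_HOURS_PER_MODEL_YEAR * GC2_NODES  # 720 node hours
factor = scaling_factor(1.7, 2.0)                       # 185/100
ukesm_node_hours = factor * gc2_node_hours              # about 1,332 node hours
max_nodes = round(1500 / PES_PER_NODE)                  # about 47 nodes

for nodes in (GC2_NODES, max_nodes):
    hours = hours_per_model_year(ukesm_node_hours, nodes)
    print(f"{nodes} nodes: {hours:.1f} h per model year, "
          f"{hours / 24:.2f} days per model year, "
          f"{model_years_per_day(hours):.2f} model years per day")
```

Changing the two slowdown factors or the assumed 50:50 split shows how sensitive the final figure is to each assumption.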

12 September: performance thresholds

  • 1.2 days for one model year would imply that the historical + RCP run of 250 years would take 300 days, or 10 months (about one year, allowing for the usual IT problems). This suggests that 0.85 model years per day is roughly a lower bound for UKESM1, and we really need to get it higher.
  • Colin says in an e-mail on this day that he wants most jobs to achieve at least 1.5 model years per day (so 0.67 days for one model year), a little under twice the estimate above.
  • The GC2 run (N216-ORCA025) mentioned in the section above achieves 1.2 model years per day (24 / 20 hours per model year).
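
To compare these thresholds, the following Python sketch converts each rate into wall-clock days for the 250-year historical + RCP run. The names are illustrative, and the calculation assumes uninterrupted running with no queueing time or reruns.

```python
# Sketch comparing the throughput figures quoted above for a 250-year run.
# Names are illustrative; no allowance is made for queueing or failures.

RUN_LENGTH_YEARS = 250


def wall_clock_days(model_years, years_per_day):
    """Days of continuous running needed at a given throughput."""
    return model_years / years_per_day


# Throughput in model years per wall-clock day.
rates = {
    "UKESM1 estimate": 1 / 1.2,  # 1.2 days per model year
    "Colin's target": 1.5,
    "GC2": 24 / 20,              # 20 hours per model year
}

for label, rate in rates.items():
    days = wall_clock_days(RUN_LENGTH_YEARS, rate)
    print(f"{label}: {rate:.2f} model years per day, {days:.0f} days in total")
```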