Timings for ocean only

ORCA025 scaling (see Colin's e-mail from 18 Sep 2014). The data originally come from Tim Graham, so I assume the runs were done on our current HPCs.

[Figure: ORCA025 scaling plot, from Tim Graham]

From the figure above, I read off the following values:

Number of cores | Days (seconds) per model year, i.e. 1/speed | Speed (model years per day)
----------------|---------------------------------------------|----------------------------
90              | 1.11 (96,000 s)                             | 0.9
195             | 0.513 (44,308 s)                            | 1.95
260             | 0.370 (32,000 s)                            | 2.7
315             | 0.303 (26,182 s)                            | 3.3
650             | 0.175 (15,158 s)                            | 5.7
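The middle column is just the reciprocal of the speed. As a quick check, here is a minimal Python sketch of that conversion (my own helper, not part of any model code; the speeds are the values read off the figure):

```python
SECONDS_PER_DAY = 86400

# Speeds read off Tim Graham's ORCA025 plot: cores -> model years per day
speeds = {90: 0.9, 195: 1.95, 260: 2.7, 315: 3.3, 650: 5.7}


def time_per_model_year(speed):
    """Return (days, seconds) of wall-clock time per model year."""
    days = 1.0 / speed
    return days, days * SECONDS_PER_DAY


for cores, speed in speeds.items():
    days, seconds = time_per_model_year(speed)
    print(f"{cores:4d} cores: {days:.3f} days ({seconds:,.0f} s)")
```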

ORCA025

We are doing our own ocean runs. Richard has found me Dave Storkey's GO5.0 standard job (amhih), which is NEMO-CICE, and I've copied it to jabha. One month in this job appears to be 28 days, so I've explicitly run it for 30 days to represent one month.

When changing the `Number of PEs for NEMO East-West' and `Number of PEs for NEMO North-South', I need to make sure that

  • (Number of columns for CICE East-West) / (Number of columns per block for CICE East-West) = (Number of PEs for NEMO East-West), and
  • (Number of rows for CICE North-South) / (Number of rows per block for CICE North-South) = (Number of PEs for NEMO North-South)
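In other words, the CICE block size in each direction is the CICE grid size divided by the NEMO PE count, so the PE count has to divide the grid size exactly. A minimal sketch of that check, assuming the CICE grid is 1440 columns by 1020 rows (the figures used below); `cice_block_size` is a hypothetical helper, and its result is what would go into the columns/rows per block settings:

```python
def cice_block_size(n_grid, n_pes):
    """Columns (or rows) per CICE block for a given number of NEMO PEs.

    Raises ValueError if n_pes does not divide the CICE grid size exactly,
    because then (grid size) / (block size) cannot equal the PE count.
    """
    if n_grid % n_pes != 0:
        raise ValueError(f"{n_pes} PEs does not divide {n_grid} evenly")
    return n_grid // n_pes


# Assumed ORCA025 CICE grid: 1440 columns (East-West) x 1020 rows (North-South)
print(cice_block_size(1440, 32), cice_block_size(1020, 30))
```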

According to Richard,

(Number of columns for CICE East-West) / (Number of PEs for NEMO East-West) > 32 or 33

and

(Number of rows for CICE North-South) / (Number of PEs for NEMO North-South) > 32 or 33

Taking 32.5 as the threshold, this suggests that (Number of PEs for NEMO East-West) < 1440 / 32.5 = 44.3 and (Number of PEs for NEMO North-South) < 1020 / 32.5 = 31.4.

The (Number of columns for CICE East-West) is 1440, and the (Number of PEs for NEMO East-West) must be a factor of this. The factors of 1440 are 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 30, 32, 36, 40, 45, 48, ...

The (Number of rows for CICE North-South) is 1020, and the (Number of PEs for NEMO North-South) must be a factor of this. The factors of 1020 are 2, 3, 4, 5, 6, 10, 12, 15, 17, 20, 30, 34, ...
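Combining the divisibility requirement with Richard's bound gives the candidate PE counts in each direction. A minimal sketch, assuming the 32.5 midpoint threshold used above; the helper names are my own:

```python
def factors(n):
    """All divisors of n greater than 1, in increasing order."""
    return [d for d in range(2, n + 1) if n % d == 0]


def candidate_pes(n_grid, min_points_per_pe=32.5):
    """PE counts that divide the grid and leave > min_points_per_pe points each."""
    return [p for p in factors(n_grid) if n_grid / p > min_points_per_pe]


ew = candidate_pes(1440)  # NEMO PEs East-West
ns = candidate_pes(1020)  # NEMO PEs North-South
print(ew[-1], ns[-1])  # 40 30
```

Richard's bound on its own therefore still allows up to 40 PEs East-West.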

NEMO PEs East-West * North-South (total) | Model run length | Total time (s)                        | Speed (model years per day)
-----------------------------------------|------------------|---------------------------------------|----------------------------
16 * 12 (192)                            | 30 days          | 4,991                                 | 1.44
32 * 12 (384)                            | 30 days          | 2,633                                 | 2.73
32 * 20 (640)                            | 30 days          | 1,707                                 | 4.22
32 * 30 (960)                            | 30 days          | 1,319                                 | 5.46
48 * 20 (960)                            | 30 days          | Illegal instruction                   |
36 * 30 (1,080)                          | 30 days          | 1,211                                 | 5.95
40 * 30 (1,200)                          | 30 days          | Symbol resolution failed for nemo.exe |
36 * 34 (1,224)                          | 30 days          |                                       |
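The speed column follows from the wall-clock time if 30 model days are treated as one month and a model year as 12 such months (360 days). That 360-day year is my assumption, but it reproduces the tabulated speeds. A minimal sketch:

```python
SECONDS_PER_DAY = 86400


def model_years_per_day(wallclock_s, model_days=30, days_per_year=360):
    """Model years simulated per wall-clock day (360-day year assumed)."""
    model_years = model_days / days_per_year
    return model_years * SECONDS_PER_DAY / wallclock_s


# My GO5.0 standard-job timings: total PEs -> wall-clock seconds for 30 days
timings = {192: 4991, 384: 2633, 640: 1707, 960: 1319, 1080: 1211}
for pes, secs in timings.items():
    print(f"{pes:5d} PEs: {model_years_per_day(secs):.2f} model years/day")
```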

It seems that the code crashes when either

  • (Number of PEs for NEMO East-West) > (Number of columns per block for CICE East-West), or
  • (Number of PEs for NEMO North-South) > (Number of rows per block for CICE North-South).

Since 36 * 40 = 1440 and 30 * 34 = 1020, this makes 36 * 30 the largest usable PE decomposition.
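Adding this empirical crash rule to the divisibility requirement and Richard's bound picks out 36 * 30. A minimal sketch, assuming the 1440 x 1020 CICE grid and the 32.5 threshold; the crash rule is only what my runs suggest, not a documented NEMO-CICE limit:

```python
def usable_pes(n_grid, min_points_per_pe=32.5):
    """PE counts that divide n_grid, satisfy Richard's points-per-PE rule,
    and do not exceed the resulting block size (the observed crash condition)."""
    result = []
    for p in range(2, n_grid + 1):
        if n_grid % p:
            continue  # PE count must divide the CICE grid exactly
        block = n_grid // p  # columns (or rows) per CICE block
        if block > min_points_per_pe and p <= block:
            result.append(p)
    return result


ew_max = max(usable_pes(1440))  # East-West
ns_max = max(usable_pes(1020))  # North-South
print(f"{ew_max} * {ns_max} = {ew_max * ns_max} PEs")  # 36 * 30 = 1080 PEs
```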

Comparing my times with Tim Graham's results

The plot at the top of the page, from Tim Graham, shows much faster times than I'm getting for my ORCA025 runs. Maybe that plot is for NEMO only and doesn't include CICE?

Wrong: Tim's results do include CICE. He says the difference is because some diagnostics take a long time; see the next section.

ORCA025 with Tim Graham's change

According to Tim, `There were some extra diagnostics added at GO5 that weren’t in my runs. I suspect that calculating these may be quite slow. You can turn them off as follows:

  1. Take a copy of ~frsy/NEMO/GO5.0/nemo_keys_GO5.0_diacorr_trdtra.cfg on your Linux machine.
  2. Remove the last 2 keys (key_diacorr and key_trdtra)
  3. In the UMUI go to FCM Configuration -> FCM options for NEMO
  4. Change the FPP keys configuration file to point to your new file created above
  5. Recompile and rerun the model'
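Steps 1 and 2 can also be scripted. A minimal Python sketch, assuming the FPP keys in the .cfg file are whitespace-separated tokens (I haven't checked the file's exact layout); it drops only the two named keys and leaves everything else alone. `strip_keys` and `write_stripped` are my own hypothetical helpers:

```python
UNWANTED_KEYS = {"key_diacorr", "key_trdtra"}


def strip_keys(cfg_text, unwanted=UNWANTED_KEYS):
    """Remove the given FPP keys from each line, keeping all other tokens.

    Lines without any unwanted key are kept verbatim; on other lines the
    remaining tokens are re-joined with single spaces.
    """
    out = []
    for line in cfg_text.splitlines():
        tokens = line.split()
        if unwanted.isdisjoint(tokens):
            out.append(line)
        else:
            out.append(" ".join(t for t in tokens if t not in unwanted))
    return "\n".join(out) + "\n"


def write_stripped(src, dst):
    """Read the local copy of the keys file (step 1) and write the new file."""
    with open(src) as f:
        text = f.read()
    with open(dst, "w") as f:
        f.write(strip_keys(text))
```

For example, `write_stripped("nemo_keys_GO5.0_diacorr_trdtra.cfg", "nemo_keys_GO5.0_nodiags.cfg")`, where the output name is made up; it's worth diffing the two files before pointing the UMUI at the new one.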

I've created jabhb, although most of the times below are taken directly from Tim (/net/home/h05/hadtd/My_Code/Python_workspace/Ocean_resolution_plots/ORCA025_1month_times).

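NEMO PEs East-West * North-South (total) | Model run length | Total time (s) | Speed (model years per day)
-----------------------------------------|------------------|----------------|----------------------------
80+                                      | 1 month          | 8,193          | 0.879
128+                                     | 1 month          | 5,294          | 1.36
160+                                     | 1 month          | 4,279          | 1.68
192+                                     | 1 month          | 3,651          | 1.97
256+                                     | 1 month          | 2,697          | 2.67
32 * 12 (384)                            | 30 days          | 2,005          | 3.59
32 * 20 (640)                            | 30 days          | 1,336          | 5.39
640+                                     | 1 month          | 1,276          | 5.64
960+                                     | 1 month          | 1,039          | 6.93
36 * 30 (1,080)                          | 30 days          | 998            | 7.21

+ Times from Tim Graham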
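For the three decompositions I ran both with and without the two keys, the saving can be read straight off the two tables. A small sketch of that arithmetic, using only my own 30-day timings:

```python
# Wall-clock seconds for 30 model days, same PE decompositions
go5_standard = {384: 2633, 640: 1707, 1080: 1211}  # GO5 standard job (jabha)
without_diags = {384: 2005, 640: 1336, 1080: 998}  # key_diacorr/key_trdtra removed (jabhb)


def speedup(before_s, after_s):
    """How many times faster the run became, and the fraction of time saved."""
    return before_s / after_s, 1 - after_s / before_s


for pes in sorted(go5_standard):
    ratio, saved = speedup(go5_standard[pes], without_diags[pes])
    print(f"{pes:5d} PEs: {ratio:.2f}x faster, {saved:.0%} less wall-clock time")
```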

Comparing my times with Tim Graham's results II

The speeds after Tim's changes are shown below, where nemoCiceOrca025 is the run with Tim's changes and nemoCiceOrca025-2 is the GO5 standard job.

The questions are:

  • Can we do without the keys key_diacorr and key_trdtra?
    • key_trdtra computes the ocean tracer trends. Dan Copsey says it's useful for diagnosing biases.
  • If not, can we move them to XIOS (which isn't included in the runs above)?
    • Tim and I have been running vn3.4, which doesn't have XIOS.
    • For the atmosphere, the rough ratio of IOS cores to model cores is 1:32. It's a bit early to say, but Dan thinks the ratio is likely to be similar for XIOS in the ocean, e.g. 512 ocean cores would probably need about 16 XIOS cores.

I think we can roughly follow the dark blue line, but multiply the number of cores by (1 + 1/32) to allow for XIOS.
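The (1 + 1/32) allowance works out as below. A minimal sketch, assuming Dan's rough 1:32 ratio and rounding the XIOS server count up to a whole core (the rounding is my choice):

```python
import math


def total_cores_with_xios(ocean_cores, ratio=32):
    """Ocean cores plus XIOS servers at roughly 1 per `ratio` ocean cores.

    The server count is rounded up, so there is always at least one XIOS core.
    """
    xios = math.ceil(ocean_cores / ratio)
    return ocean_cores + xios


print(total_cores_with_xios(512))  # 528 (512 ocean + 16 XIOS)
```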

ORCA1

I've copied Tim Graham's amwmn to jabhc and combined some of my times with his (/net/home/h05/hadtd/My_Code/Python_workspace/Ocean_resolution_plots/ORCA1_2year_times).

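NEMO PEs East-West * North-South (total) | Model run length | Total time (s) | Speed (model years per day)
-----------------------------------------|------------------|----------------|----------------------------
64+                                      | 2 years          | 9,183          | 18.8
64+                                      | 2 years          | 9,046          | 19.1
128+                                     | 2 years          | 5,327          | 32.4
192+                                     | 2 years          | 4,037          | 42.8
12 * 16 (192)                            | 2 years          | 3,678          | 47.0
256+                                     | 2 years          | 3,449          | 50.1
320+                                     | 2 years          | 3,004          | 57.5

+ Times from Tim Graham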
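For these 2-year ORCA1 runs the speed is simply 2 model years divided by the wall-clock time in days. A small sketch comparing the two 192-core entries (`speed_from_run` is my own helper name):

```python
SECONDS_PER_DAY = 86400


def speed_from_run(wallclock_s, model_years):
    """Model years simulated per wall-clock day."""
    return model_years * SECONDS_PER_DAY / wallclock_s


tim_192 = speed_from_run(4037, 2)   # Tim's 192-core run
mine_192 = speed_from_run(3678, 2)  # my 12 * 16 run (jabhc)
print(f"{tim_192:.1f} vs {mine_192:.1f} model years/day "
      f"({mine_192 / tim_192 - 1:.0%} faster)")
```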

The one run I've done myself (12*16 = 192 cores) does look faster than Tim's 192-core run (3,678 s vs 4,037 s), maybe because of improvements over time; I think Tim's runs were done a while ago.

And Tim's scaling plot