Restarting

How NEMO determines how to start and end a cycle

The namelist responsible for determining the start and end of a cycle for NEMO is namelist_cfg, where the important parameters are

  • nn_it000: the first timestep of the cycle
  • nn_itend: the last timestep of the cycle

To convert timesteps into model time, you need to know how many timesteps there are per day. There are a few ways to work this out. If you know the dump cycle is 10 days, which it usually is, divide the value of nn_write by 10. For ORCA1 this probably gives 320 / 10 = 32 timesteps per day (a 45 minute timestep). The equations for nn_it000 and nn_itend are therefore

nn_it000 = (days since start) * (timesteps per day) + 1

and

nn_itend = (days since start + cycle length in days) * (timesteps per day)
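
As a quick check of the two equations, here is a minimal Python sketch. The function names are made up for illustration and are not part of NEMO or the suite; the 10 day dump cycle and nn_write = 320 are the ORCA1 values assumed above.

```python
def timesteps_per_day(nn_write, dump_cycle_days=10):
    """Timesteps per model day, assuming nn_write spans one dump cycle."""
    if nn_write % dump_cycle_days != 0:
        raise ValueError("nn_write is not a whole number of days")
    return nn_write // dump_cycle_days


def cycle_timesteps(days_since_start, cycle_length_days, steps_per_day):
    """Return (nn_it000, nn_itend) using the two equations above."""
    nn_it000 = days_since_start * steps_per_day + 1
    nn_itend = (days_since_start + cycle_length_days) * steps_per_day
    return nn_it000, nn_itend


steps = timesteps_per_day(320)          # 32 for the assumed ORCA1 values
print(cycle_timesteps(0, 10, steps))    # first 10 day cycle: (1, 320)
print(cycle_timesteps(10, 10, steps))   # second 10 day cycle: (321, 640)
```

Note that nn_it000 of one cycle is always nn_itend of the previous cycle plus 1.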

The submission scripts calculate nn_it000 and nn_itend from the namelist of the last successful cycle, which is archived at share/data/History_Data/NEMOhist/namelist_cfg. The new nn_it000 is the archived nn_itend plus 1. The new nn_itend is the archived nn_itend plus the number of timesteps in a complete cycle (calculated using TASKLENGTH from the rose suite).
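
The update described above can be sketched in Python as follows. This is not the actual submission script: the helper names are hypothetical, the cycle length is passed in directly as a number of days (how TASKLENGTH is encoded is not covered here), and the example namelist text is a cut-down illustration rather than a real archived file.

```python
import re


def read_int_param(namelist_text, name):
    """Read an integer parameter such as nn_itend from namelist text."""
    pattern = rf"^\s*{re.escape(name)}\s*=\s*(-?\d+)"
    match = re.search(pattern, namelist_text, re.MULTILINE)
    if match is None:
        raise KeyError(name)
    return int(match.group(1))


def next_cycle(namelist_text, cycle_length_days, steps_per_day):
    """Return the new (nn_it000, nn_itend) from an archived namelist."""
    archived_itend = read_int_param(namelist_text, "nn_itend")
    nn_it000 = archived_itend + 1
    nn_itend = archived_itend + cycle_length_days * steps_per_day
    return nn_it000, nn_itend


# Illustrative stand-in for the archived namelist_cfg of the last cycle.
archived = """
&namrun
   nn_it000 = 321
   nn_itend = 640
/
"""
print(next_cycle(archived, 10, 32))   # (641, 960)
```

If a restart goes wrong, comparing the values in the archived namelist_cfg against this arithmetic is a quick way to see which cycle NEMO thinks it last completed.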

How UM determines how to start and end a cycle

The UM reads the history file, share/data/History_Data/<job id>.xhist, to determine which dump to start from, and runs until it reaches run_target_end, which is set in the rose suite using the variable TASKEND.

Instructions for restarting

I've been getting a number of crashes in MPI_Finalize and I've asked Yongming for instructions for restarting.

  • Note the current cycle which failed, e.g. 19860111T0000Z for a 10 day cycle job
  • cd share/data/History_Data and
    • Remove the UM dump written at the end of the current cycle, e.g. ah370a.da19860121_00
    • Find the temporary history file from the previous cycle in work/19860101T0000Z/coupled/history_archive, e.g. temp_hist.0001 or temp_hist.0002 (they should be the same), and copy it over the history file (the *xhist file).
  • cd share/data/History_Data/NEMOhist and remove all NEMO dumps written at the end of the current cycle, e.g. rm *19860121*
  • cd share/data/History_Data/CICEhist and
    • Remove the CICE dump written at the end of the current cycle, e.g. rm *i.restart.1986-01-21-00000.nc
    • Edit ice.restart_file so that it's moved back one cycle, e.g. replace 1986-01-21-00000 with 1986-01-11-00000
  • If a work directory exists for the next cycle, remove it (in my example the directory work/19860121T0000Z did not exist).
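
To avoid getting the dates wrong when following the steps above, the file names and directories can be derived from the failed cycle point. The Python sketch below only prints what to look for and deletes nothing. The function name is made up for illustration, the patterns are generalised from the examples above (the job id is replaced by a wildcard), and the trailing _00 and -00000 are assumed to be the hour and the seconds of the day. It also assumes a Gregorian calendar; a run on a 360 day calendar needs different date arithmetic across month boundaries.

```python
from datetime import datetime, timedelta

CYCLE_FORMAT = "%Y%m%dT%H%MZ"   # e.g. 19860111T0000Z


def restart_targets(failed_cycle, cycle_days=10):
    """List the files and directories touched when rewinding one cycle."""
    start = datetime.strptime(failed_cycle, CYCLE_FORMAT)
    end = start + timedelta(days=cycle_days)       # end of the failed cycle
    previous = start - timedelta(days=cycle_days)  # start of previous cycle

    def cice_stamp(t):
        # Assumed CICE convention: date followed by seconds of the day.
        seconds = t.hour * 3600 + t.minute * 60 + t.second
        return f"{t:%Y-%m-%d}-{seconds:05d}"

    return {
        "um_dump_to_remove": f"*.da{end:%Y%m%d_%H}",
        "history_archive": f"work/{previous.strftime(CYCLE_FORMAT)}"
                           "/coupled/history_archive",
        "nemo_dumps_to_remove": f"*{end:%Y%m%d}*",
        "cice_dump_to_remove": f"*i.restart.{cice_stamp(end)}.nc",
        "cice_pointer_old": cice_stamp(end),
        "cice_pointer_new": cice_stamp(start),
        "next_work_dir": f"work/{end.strftime(CYCLE_FORMAT)}",
    }


for key, value in restart_targets("19860111T0000Z").items():
    print(f"{key}: {value}")
```

For the example cycle 19860111T0000Z this reproduces the names used in the list above, e.g. *.da19860121_00 for the UM dump and 1986-01-11-00000 as the new stamp in ice.restart_file.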