The calls from LSP_SUBGRID to QWIDTH, both directly and via LSP_QCLEAR (which itself calls QWIDTH), are additional to the GLOMAP runs and add about 100s to the Dr Hook runs compared to the CLASSIC run. The time spent in each call is small, but as QWIDTH is called about 166 million times on each PE, the total time becomes very significant.
Richard has pointed out that a significant proportion of this time could be down to Dr Hook, so I removed the Dr Hook calls in both LSP_QCLEAR and QWIDTH and this has shown Richard to be right - the time has been reduced from 107s to 9s!
It should be possible to save time by inlining the essential code from QWIDTH into LSP_QCLEAR and LSP_SUBGRID. The original code is below.
    SUBROUTINE lsp_subgrid( ...
      ! Cloud modules
      USE cloud_inputs_mod, ONLY: ice_width
      ...
      DO i = 1, points
        ...
        width = qwidth(q(i), qcftot(i), qs(i), qsl(i), rhcpt(i))

    SUBROUTINE LSP_QCLEAR( ...
      ! loop over points
      DO k = 1, npnts
        ...
        width = qwidth(q(k), qcf(k), qsmr(k), qsmr_wat(k), rhcrit(k))

    FUNCTION qwidth(q, qcf, qsmr, qsmr_wat, rhcrit)
      USE cloud_inputs_mod, ONLY: ice_width
      ...
      qwidth = 2.0 *(1.0-rhcrit)*qsmr_wat                             &
             *MAX((1.0-0.5*qcf/(ice_width * qsmr_wat)), 0.001)
      ! The full width cannot be greater than 2q because otherwise
      ! part of the gridbox would have negative q. Also ensure that
      ! the full width is not zero (possible if rhcpt is 1).
      qwidth = MIN(qwidth , MAX(2.0*q,0.001*qsmr))  ! 0.001 is to avoid divide by zero problems
I've changed LSP_SUBGRID above to be

    SUBROUTINE lsp_subgrid( ...
      ! Cloud modules
      USE cloud_inputs_mod, ONLY: ice_width
      ...
      DO i = 1, points
        ...
        width = 2.0 *(1.0-rhcpt(i))*qsl(i)                            &
              *MAX((1.0-0.5*qcftot(i)/(ice_width * qsl(i))), 0.001)
        ! The full width cannot be greater than 2q because otherwise
        ! part of the gridbox would have negative q. Also ensure that
        ! the full width is not zero (possible if rhcpt is 1).
        ! 0.001 is to avoid divide by zero problems
        width = MIN(width , MAX(2.0*q(i),0.001*qs(i)))
I've changed LSP_QCLEAR above to be

    SUBROUTINE LSP_QCLEAR( ...
      ! Cloud modules
      USE cloud_inputs_mod, ONLY: ice_width
      ...
      ! loop over points
      DO k = 1, npnts
        ...
        width = 2.0 *(1.0-rhcrit(k))*qsmr_wat(k)                      &
              *MAX((1.0-0.5*qcf(k)/(ice_width * qsmr_wat(k))), 0.001)
        ! The full width cannot be greater than 2q because otherwise
        ! part of the gridbox would have negative q. Also ensure that
        ! the full width is not zero (possible if rhcpt is 1).
        ! 0.001 is to avoid divide by zero problems
        width = MIN(width , MAX(2.0*q(k),0.001*qsmr(k)))
When running this with Dr Hook in both LSP_SUBGRID and LSP_QCLEAR (QWIDTH no longer exists), the output is identical and the total time taken in LSP_SUBGRID is about 10s.
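The equivalence of the inlined expression and the old function can also be checked in isolation. The program below is only a minimal sketch, not part of the UM: the program name, the sample input values and the value given to ice_width are made up for illustration (the model takes ice_width from cloud_inputs_mod). It evaluates the inlined LSP_SUBGRID form and a local copy of QWIDTH on the same inputs and stops with an error if they differ by more than rounding.

```fortran
PROGRAM check_inline_qwidth
  IMPLICIT NONE
  ! Placeholder value, assumed for illustration only; the model takes
  ! ice_width from cloud_inputs_mod.
  REAL, PARAMETER :: ice_width = 0.04
  INTEGER, PARAMETER :: points = 4
  ! Made-up sample inputs, named as in lsp_subgrid.
  REAL :: q(points), qcftot(points), qs(points), qsl(points), rhcpt(points)
  REAL :: width, ref
  INTEGER :: i

  q      = (/ 1.0e-3, 5.0e-4, 2.0e-3, 0.0    /)
  qcftot = (/ 0.0,    1.0e-4, 5.0e-4, 1.0e-5 /)
  qs     = (/ 2.0e-3, 1.0e-3, 3.0e-3, 1.0e-3 /)
  qsl    = (/ 2.5e-3, 1.2e-3, 3.5e-3, 1.1e-3 /)
  rhcpt  = (/ 0.8,    0.9,    1.0,    0.85   /)

  DO i = 1, points
    ! Inlined form, as in the modified lsp_subgrid
    width = 2.0 *(1.0-rhcpt(i))*qsl(i)                                &
          *MAX((1.0-0.5*qcftot(i)/(ice_width * qsl(i))), 0.001)
    width = MIN(width , MAX(2.0*q(i),0.001*qs(i)))

    ! Reference: the original function call
    ref = qwidth(q(i), qcftot(i), qs(i), qsl(i), rhcpt(i))

    ! Allow for rounding differences only
    IF (ABS(width - ref) > 10.0*EPSILON(ref)*ABS(ref)) THEN
      ERROR STOP 'inlined width differs from qwidth'
    END IF
    PRINT *, i, width, ref
  END DO

CONTAINS

  ! Local copy of the original qwidth body, for comparison only
  REAL FUNCTION qwidth(q, qcf, qsmr, qsmr_wat, rhcrit)
    REAL, INTENT(IN) :: q, qcf, qsmr, qsmr_wat, rhcrit
    qwidth = 2.0 *(1.0-rhcrit)*qsmr_wat                               &
           *MAX((1.0-0.5*qcf/(ice_width * qsmr_wat)), 0.001)
    qwidth = MIN(qwidth , MAX(2.0*q,0.001*qsmr))
  END FUNCTION qwidth

END PROGRAM check_inline_qwidth
```

The third sample point has rhcpt equal to 1 and the fourth has q equal to 0, so both the "width is not zero" and the "not greater than 2q" limits in the comments are exercised.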
As the mean time in this routine was about 107s, this is a saving of about 97 seconds. However, as this was one of the more imbalanced routines (time spent in it varied between about 11s and 191s), the actual run time saved appears much greater, very roughly around 180s.
After our three changes, the total time in UM_SHELL is down from about 5,000s to 4,388s (from about 95% more than the CLASSIC run time to 71% more).