Support for parallelisation in hydromad
hydromad.parallel(settings)
Hydromad allows the model runs performed by some of its functions to be parallelised. The parallelisation methods available for each function depend on its characteristics and are documented below and in that function's help page. Parallelisation is turned off by default, because it must be set up for the user's particular computer and because the user needs to judge whether it is worthwhile in their particular case.
A number of functions in hydromad perform a large number of model runs that are, to some extent, independent of each other. Hydromad uses a number of R packages to allow these model runs to occur simultaneously, usually by splitting the job between separate 'worker' R sessions. However, parallelisation involves an overhead: even on a single computer, it takes extra time to communicate with the workers to transfer instructions and retrieve results. Parallelisation is therefore not always worthwhile, and the best method for parallelising a particular analysis depends on its characteristics. Do not use parallelisation if you only have a single core, if you would run out of memory, or if the analysis already runs quickly. Several settings of the parallelisation can be modified by passing a list of options to the function being parallelised through its argument parallel, e.g.:
evalPars(pars,model,parallel=list(method="foreach",
packages=c("hydromad","fuse"), async=TRUE))
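The overhead described above can be seen in a small sketch that uses only the base parallel package, not hydromad. It runs a deliberately cheap function serially and on a two-worker socket cluster; cheap_run and the number of runs are hypothetical stand-ins for quick model runs. Timings depend on the machine, so none are quoted, but for work this cheap the parallel version is often slower, because communicating with the workers costs more than the work itself.

```r
library(parallel)

# Hypothetical, deliberately cheap stand-in for a quick model run
cheap_run <- function(i) sqrt(i) * 2

runs <- seq_len(1000)

# Serial baseline
t_serial <- system.time(res_serial <- lapply(runs, cheap_run))[["elapsed"]]

# Same work on two socket workers; the timing includes sending tasks and
# collecting results, but not starting the cluster
cl <- makeCluster(2, type = "PSOCK")
t_par <- system.time(res_par <- parLapply(cl, runs, cheap_run))[["elapsed"]]
stopCluster(cl)

# The results are the same; only the elapsed time differs
cat("serial:", t_serial, "s  parallel:", t_par, "s\n")
```

Repeating the comparison with a slower cheap_run (for example one that calls Sys.sleep) is one way to judge at what cost per run the extra workers start to pay off on a given machine.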
The available options are described below.
The available values of method are specific to each function that is parallelised and are documented in its respective help page. A summary of the most common methods is given here.
method="clusterApply" uses either the built-in parallel package or the snow package. Setting up the cluster requires code like:
library(parallel)
# "PSOCK" is the socket cluster built into parallel; type="SOCK" needs snow
cl <- makeCluster(2, type="PSOCK")
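A minimal sketch of the whole life cycle of such a cluster, using only the parallel package: cl is created as a global object, used for a few independent runs via clusterApply directly (not through a hydromad function), and shut down with stopCluster afterwards. toy_run and param_sets are hypothetical. On Unix-like systems the sketch uses makeForkCluster; elsewhere it falls back to a socket cluster.

```r
library(parallel)

# Hypothetical model run: each call depends only on its own parameter set
toy_run <- function(p) p$a * 10 + p$b

param_sets <- list(list(a = 1, b = 2), list(a = 3, b = 4), list(a = 5, b = 6))

# Forked workers are only available on non-Windows systems
if (.Platform$OS.type == "unix") {
  cl <- makeForkCluster(2)
} else {
  cl <- makeCluster(2, type = "PSOCK")
}

# clusterApply hands the parameter sets to the workers in turn and
# returns the results in the original order
results <- unlist(clusterApply(cl, param_sets, toy_run))
print(results)  # 12 34 56

# Release the worker processes when finished
stopCluster(cl)
```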
Functions look for the object cl in the global environment. On non-Windows machines, cl can instead be created using makeForkCluster, which runs the workers on multiple cores as forked copies of the current R session.
method="foreach" allows a number of backends, e.g. those provided by the packages doParallel and doRedis. Each of these packages provides a backend-specific registration function that needs to be called, e.g.:
library(doParallel)
registerDoParallel()
foreach generally incurs a higher overhead than other methods, but has the advantage of flexibility.
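The foreach pattern can be sketched with the doParallel backend. The sketch registers two workers and evaluates a hypothetical toy_objective over a few parameter values. It passes foreach's own .packages and .export arguments, which load packages and copy objects on the workers; these are assumed to play a role similar to the packages and export options of hydromad's parallel list, and the sketch does not show how hydromad calls foreach internally.

```r
library(doParallel)  # also attaches foreach and parallel

# Register a backend with two workers; %dopar% uses it until changed
registerDoParallel(2)

# Hypothetical helper in the global environment, used by the objective
scale_error <- function(e) sum(e^2)
toy_objective <- function(k) scale_error(c(1, 2, 3) - k)

ks <- c(0, 1, 2)

# .packages loads packages on each worker; .export names extra objects to
# copy to the workers (only needed when not picked up automatically)
obj <- foreach(k = ks, .combine = c,
               .packages = "stats",
               .export = "scale_error") %dopar% {
  toy_objective(k)
}
print(obj)  # 14 5 2

# Shut down any implicit cluster created by registerDoParallel
# (none is created when the backend forks, as on Unix-like systems)
stopImplicitCluster()
```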
If the model being run depends on functions or variables not defined in the hydromad package, they may need to be explicitly exported. The names of the variables to be exported are specified as a character vector, e.g.:
export=c("mySMA.sim","myNewObjectiveFunction")
For example, exporting is needed if you have created your own soil moisture accounting function, such as mySMA.sim, or if your objective function or a model you have made yourself depends on other functions or variables in the global environment, e.g.:
objective=function(Q,X){myNewObjectiveFunction(Q-X)+hmadstat("r.squared")(Q,X)}
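Why exporting matters can be seen with a plain socket cluster from the parallel package, whose workers start as fresh R sessions. In this sketch, which does not use hydromad, myNewObjectiveFunction and objective are simplified hypothetical stand-ins for the functions in the example above. The first call fails because the workers cannot find them, and succeeds once clusterExport has copied them to each worker. hydromad's export option is assumed to serve the same purpose; its internals are not shown here.

```r
library(parallel)

# Hypothetical user-defined functions living in the global environment
myNewObjectiveFunction <- function(e) sum(abs(e))
objective <- function(Q, X) myNewObjectiveFunction(Q - X)

cl <- makeCluster(2, type = "PSOCK")

# Without exporting, the fresh worker sessions do not know about 'objective'
res_missing <- tryCatch(
  parLapply(cl, 1:2, function(i) objective(c(1, 2, 3) * i, c(1, 2, 3))),
  error = function(e) e
)
print(conditionMessage(res_missing))

# Copy both functions into each worker's global environment, then retry
clusterExport(cl, c("myNewObjectiveFunction", "objective"))
res_ok <- unlist(parLapply(cl, 1:2,
                           function(i) objective(c(1, 2, 3) * i, c(1, 2, 3))))
print(res_ok)  # 0 6

stopCluster(cl)
```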
If you are using models that are defined in another package, e.g. the fuse set of models, then the workers can be told to load that package as well as hydromad by specifying, e.g.:
packages=c("hydromad","fuse")
Make sure the package is installed on all the workers.
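With a parallel cluster, telling each worker to load a package can be sketched with clusterEvalQ, which evaluates an expression on every worker. The sketch first checks with requireNamespace that the package is installed on each worker, then attaches it. The package tools, which ships with R, stands in for a package such as fuse so that the example runs without extra installs; this is an illustration, not how hydromad implements the packages option.

```r
library(parallel)

cl <- makeCluster(2, type = "PSOCK")

# Stand-in for a model package such as "fuse" (chosen because it ships with R)
pkg <- "tools"
clusterExport(cl, "pkg")

# Check that the package is installed on every worker before relying on it
installed <- unlist(clusterEvalQ(cl, requireNamespace(pkg, quietly = TRUE)))
stopifnot(all(installed))

# Attach the package on every worker, then confirm it is on each search path
invisible(clusterEvalQ(cl, library(pkg, character.only = TRUE)))
attached <- unlist(clusterEvalQ(cl, paste0("package:", pkg) %in% search()))
print(attached)  # TRUE TRUE

stopCluster(cl)
```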
If the parallelisation method supports it, some functions allow runs to occur in the background: the function returns immediately instead of waiting for the runs to finish, and the results can be retrieved later using methods specific to the parallelisation method. Where available, this feature is enabled with async=TRUE.
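As a generic illustration of running work in the background, the sketch below uses mcparallel and mccollect from the parallel package, which start an expression in a forked process and collect its value later. This only works on Unix-like systems, and it is not necessarily the mechanism hydromad uses for async=TRUE; slow_run is a hypothetical stand-in for a batch of model runs.

```r
library(parallel)

# Hypothetical long-running job
slow_run <- function() {
  Sys.sleep(1)
  sum(seq_len(100))
}

if (.Platform$OS.type == "unix") {
  # Start the job in a forked process; mcparallel returns immediately
  job <- mcparallel(slow_run())

  # The main session is free to do other work in the meantime
  message("job started, doing other work...")

  # Block until the job has finished, then retrieve its value
  result <- mccollect(job)[[1]]
  print(result)  # 5050
}
```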
Functions with parallelisation: evalPars, crossValidate, update.runlist, objFunVal.runlist, paretoObjectivesVaryWeights