Support for parallelisation in hydromad

hydromad.parallel(settings)

Arguments

settings: Placeholder

Details

Hydromad allows model runs in some functions to be parallelised. The parallelisation methods available for each function depend on its characteristics and are documented below and in each function's help page. Parallelisation is turned off by default, because the user must set it up for their particular computer and judge whether it is worthwhile for their particular case.
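
One rough way to judge whether parallelisation is worthwhile is to time a small, representative batch of runs with and without workers. The sketch below uses only the built-in parallel package; slow_run is a hypothetical stand-in for a single model run, not a hydromad function, and the worker count of 2 is an assumption.

```r
library(parallel)

# Hypothetical stand-in for one model run (not part of hydromad)
slow_run <- function(x) {
  Sys.sleep(0.01)  # pretend the run takes some time
  x^2
}
inputs <- 1:20

# Time the batch in the current session
t_serial <- system.time(res_serial <- lapply(inputs, slow_run))

# Time the same batch split across two worker sessions
cl <- makeCluster(2)
t_parallel <- system.time(res_parallel <- parLapply(cl, inputs, slow_run))
stopCluster(cl)

# Elapsed wall-clock time in seconds for each approach
print(c(serial = t_serial[["elapsed"]], parallel = t_parallel[["elapsed"]]))
```

The timing excludes the cost of starting the workers, which also counts towards the overhead. If the parallel time is not clearly lower for a realistic batch, the serial version is probably the better choice.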

A number of functions in hydromad perform a large number of model runs that are to some extent independent of each other. Hydromad uses a number of R packages to allow these model runs to occur simultaneously, usually by splitting the job between separate 'worker' R sessions.

However, parallelisation involves an overhead. Even on a single computer, it takes extra time to communicate with workers to transfer instructions and retrieve results. Parallelisation is therefore not always worthwhile, and the best method for parallelising a particular analysis depends on its characteristics. Do not use parallelisation if you only have a single core, if you would run out of memory, or if the analysis already runs quickly.

Several parallelisation settings can be modified by passing a list of options to the function being parallelised, using the argument parallel, e.g.:

evalPars(pars,model,parallel=list(method="foreach", packages=c("hydromad","fuse"), async=TRUE))

The available options are described below.

Method

The available methods are specific to each parallelised function and are documented in its help page. A summary of the most common methods is given here.

method="clusterApply" uses either the built-in parallel package or the snow package. Setting up the cluster requires code like:

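library(parallel)
# "PSOCK" is the socket cluster type native to parallel; type="SOCK" is passed on to the snow package
cl <- makeCluster(2, type="PSOCK")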
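
A slightly fuller sketch of the cluster life cycle, assuming only the built-in parallel package: check how many cores are visible, start the workers, run a trivial job to confirm they respond, and shut them down when finished. The cap of two workers and the toy job are illustrative assumptions.

```r
library(parallel)

# detectCores() can return NA, so fall back to a single worker in that case
n_cores <- detectCores()
n_workers <- if (is.na(n_cores)) 1 else max(1, min(2, n_cores - 1))

cl <- makeCluster(n_workers)

# Trivial job: each element of 1:4 is squared on a worker
squares <- clusterApply(cl, 1:4, function(x) x^2)
print(unlist(squares))

# Release the worker sessions once all parallel work is done
stopCluster(cl)
```

Leave the cluster running while hydromad functions that use it are being called, and call stopCluster(cl) only afterwards.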

Functions look for the cluster object cl in the global environment. On non-Windows machines, the function can be run on multiple cores by creating cl with makeForkCluster.

method="foreach" allows a number of backends, e.g. the packages doParallel and doRedis. Each of these packages provides a backend-specific registration function that needs to be called, e.g.:

library(doParallel)
registerDoParallel()

foreach generally incurs a higher overhead than other methods, but has the advantage of flexibility.

Export

If the model being run depends on functions or variables not defined in the hydromad package, they may need to be explicitly exported. The names of the variables to be exported are specified as a character vector, e.g.:

export=c("mySMA.sim","myNewObjectiveFunction")

Examples:

if you have created your own soil moisture accounting function mySMA.sim
if your objective function or a model you have made yourself depends on other functions or variables in the global environment, e.g. objective=function(Q,X){myNewObjectiveFunction(Q-X)+hmadstat("r.squared")(Q,X)}
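
The reason exporting is needed can be seen with the parallel package directly: worker sessions start with an empty global environment, so objects defined in the main session are not visible until they are sent over. The sketch below illustrates that mechanism only, not hydromad's internals; myNewObjectiveFunction here is a toy definition, and how hydromad forwards the export option to the backend is not shown.

```r
library(parallel)

# Toy stand-in for a user-defined function in the main session
myNewObjectiveFunction <- function(err) -sum(err^2)

cl <- makeCluster(2)

# Before exporting: ask each worker whether it knows the function
before <- clusterEvalQ(cl, exists("myNewObjectiveFunction"))

# Copy the function from this session to every worker
clusterExport(cl, "myNewObjectiveFunction", envir = environment())

# After exporting: ask again
after <- clusterEvalQ(cl, exists("myNewObjectiveFunction"))

stopCluster(cl)

print(unlist(before))
print(unlist(after))
```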

Packages

If you are using models that are defined in another package, e.g. the fuse set of models, then the workers can be told to load that package as well as hydromad by specifying, e.g. packages=c("hydromad","fuse"). Make sure the package is installed on all the workers.
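
Before a long run it can be worth confirming that every worker can actually load the required packages. A minimal sketch with the parallel package follows; stats and utils are stand-ins chosen because they ship with R, and in a real hydromad session the vector would hold the names passed in packages.

```r
library(parallel)

# Stand-in package names; replace with e.g. c("hydromad", "fuse")
pkgs <- c("stats", "utils")

cl <- makeCluster(2)

# On each worker, test whether each package can be loaded
available <- clusterCall(cl, sapply, pkgs, requireNamespace, quietly = TRUE)

stopCluster(cl)

# One row per worker, one column per package
print(do.call(rbind, available))
```

Any FALSE entry points to a worker where the package still needs to be installed.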

Async

If the parallelisation method supports it, some functions allow runs to occur in the background. Instead of waiting for the runs to finish, the function returns immediately, and the results can be retrieved later using methods specific to the parallelisation method. Where available, this feature is enabled with async=TRUE.

Author

Joseph Guillaume