Statistics for use in hydromad

These functions provide access to a built-in set of statistics, and also allow the user to change or add named statistics. Its usage is similar to hydromad.options.

hydromad.stats(...)

buildCachedObjectiveFun(
  objective,
  model,
  DATA = observed(model, select = TRUE),
  Q = DATA[, "Q"]
)

hmadstat(name, DATA = NULL, Q = DATA[, "Q"])

Arguments

...: new stats can be defined, or existing ones modified, using one or more arguments of the form 'name = value' or by passing a list of such tagged values. Existing values can be retrieved by supplying the names (as character strings) of the components as unnamed arguments.
objective: Placeholder
model: Placeholder
DATA, Q: If either DATA or Q is given, the returned statistic function will be pre-evaluated with the given data. Technically, any special .() expressions in the named objective function will be evaluated and may refer to variables DATA and/or Q. These chunks are cached so that repeated evaluation of the objective function is faster; but it is then only valid for models using the same dataset. Remember that objective functions are normally evaluated on time series with the warmup period removed, not the whole time series passed to hydromad. The observed data excluding warmup can be extracted from a model object as Q = observed(model) or DATA = observed(model, select = TRUE).
name: character giving the name of a statistic.

Details

hmadstat returns the named statistical function, while hydromad.stats gets or sets a list of named functions.

The default set of available statistics can be listed with str(hydromad.stats()), and consists of:

abs.err: the mean absolute error.
RMSE: Root Mean Squared Error.
bias: bias in data units, \(\sum ( X - Q )\)
rel.bias: bias as a fraction of the total observed flow, \(\sum ( X - Q ) / \sum Q\) (excluding any time steps with missing values).
r.squared: R Squared (Nash-Sutcliffe Efficiency), \(1 - \sum (Q- X)^2 / \sum (Q - \bar{Q})^2\)
r.sq.sqrt: R Squared using square-root transformed data (less weight on peak flows), \(1 - \frac{\sum |\sqrt{Q} - \sqrt{X}|^2 }{ \sum |\sqrt{Q} - \bar{\sqrt{Q}}|^2 }\)
r.sq.log: R Squared using log transformed data, with an offset: \(1 - \frac{\sum | \log{(Q+\epsilon)} - \log{(X+\epsilon)}|^2 } {\sum |\log{(Q+\epsilon)} - \bar{\log{(Q+\epsilon)}}|^2 }\). Here \(\epsilon\) is the 10 percentile (i.e. lowest decile) of the non- zero values of Q.
r.sq.boxcox: R Squared using a Box-Cox transform. The power lambda is chosen to fit Q to a normal distribution. When lambda = 0 it is a log transform; otherwise it is \(y_* = \frac{(y+\epsilon)^\lambda - 1} {\lambda}\) Here \(\epsilon\) is the 10 percentile (i.e. lowest decile) of the non-zero values of Q.
r.sq.diff: R Squared using differences between successive time steps, i.e. rises and falls.
r.sq.monthly: R Squared with data aggregated into calendar months.
r.sq.smooth5: R Squared using data smoothed with a triangular kernel of width 5 time steps: c(1,2,3,2,1)/9.
r.sq.seasonal: R Squared where the reference model is the mean in each calendar month, rather than the default which is the overall mean.
r.sq.vartd: nseVarTd R Squared where the modelled peaks have been coalesced to observed peaks, minimising timing error. Note that this statistic requires event to be specified using eventseq
persistence: R Squared where the reference model predicts each time step as the previous observed value. This statistic therefore represents a model's performance compared to a naive one-time-step forecast.
X0: correlation of modelled flow with the model residuals.
X1: correlation of modelled flow with the model residuals from the previous time step.
ARPE: Average Relative Parameter Error. Requires that a variance- covariance matrix was estimated during calibration.
KGE: Kling-Gupta Efficiency. \(KGE = 1 - \sqrt{(r - 1)^2 + (\alpha - 1)^2 + (\Beta - 1)^2}\), where r is the linear correlation between observations and simulations, α a measure of the flow variability error, and β a bias term. Equivalently, it can be expressed as: 1 - sqrt( cor(X, Q)^2 + (sd(X)/sd(Q) - 1)^2 + (mean(X)/mean(Q) - 1)^2 ).

References

Gupta, H.V., Kling, H., Yilmaz, K.K., & Martinez, G.F. (2009). Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. Journal of Hydrology, 377(1), 80–91. https://doi.org/10.1016/j.jhydrol.2009.08.003

Examples


## see current set of stats
names(hydromad.stats())
#>  [1] "bias"           "rel.bias"       "abs.err"        "RMSE"          
#>  [5] "r.squared"      "r.sq.sqrt"      "r.sq.log"       "r.sq.boxcox"   
#>  [9] "r.sq.rank"      "r.sq.diff"      "r.sq.monthly"   "r.sq.365"      
#> [13] "r.sq.smooth5"   "r.sq.seasonal"  "r.sq.vs.tf"     "r.sq.vs.tf.bc" 
#> [17] "persistence"    "persistence.bc" "e.rain5"        "e.rain5.log"   
#> [21] "e.rain5.bc"     "e.q90"          "e.q90.log"      "e.q90.bc"      
#> [25] "e.q90.all"      "e.q90.all.log"  "e.q90.all.bc"   "e.q90.min"     
#> [29] "e.q90.min.log"  "ar1"            "X0"             "X1"            
#> [33] "U1"             "r.sq.vartd"     "KGE"           

## extract only one
hmadstat("RMSE")
#> function (Q, X, ...) 
#> sqrt(mean((X - Q)^2, na.rm = TRUE))
#> <bytecode: 0x7f831ef4e638>
#> <environment: 0x7f831ef24830>

## calculate stat value with random data
hmadstat("rel.bias")(Q = 1:10, X = 2:11)
#> [1] 0.1818182

## add a new objective function.
## A weighted combination of NSE and bias
## as proposed by Neil Viney
## OF = NSE – 5*|ln(1+Bias)|^2.5

hydromad.stats("viney" = function(Q, X, ...) {
  hmadstat("r.squared")(Q, X, ...) -
    5 * (abs(log(1 + hmadstat("rel.bias")(Q, X)))^2.5)
})

Arguments

Details

References

See also

Examples