These functions provide access to a built-in set of statistics, and also
allow the user to change or add named statistics. Their usage is similar to
hydromad.options.
hydromad.stats(...)
buildCachedObjectiveFun(
objective,
model,
DATA = observed(model, select = TRUE),
Q = DATA[, "Q"]
)
hmadstat(name, DATA = NULL, Q = DATA[, "Q"])
New statistics can be defined, or existing ones modified, using one or more arguments of the form 'name = value', or by passing a list of such tagged values. Existing values can be retrieved by supplying the names (as character strings) of the components as unnamed arguments.
Placeholder
Placeholder
If either DATA or Q is given, the returned statistic function will be
pre-evaluated with the given data. Technically, any special .() expressions
in the named objective function will be evaluated and may refer to the
variables DATA and/or Q. These chunks are cached so that repeated evaluation
of the objective function is faster; but it is then only valid for models
using the same dataset.
Remember that objective functions are normally evaluated on time series with
the warmup period removed, not the whole time series passed to hydromad. The
observed data excluding warmup can be extracted from a model object as
Q = observed(model) or DATA = observed(model, select = TRUE).
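The caching idea described above can be illustrated with a small base R sketch. It does not use hydromad's .() mechanism or buildCachedObjectiveFun; the helper make_cached_nse and the data are hypothetical, and the closure simply mimics the effect of pre-evaluating the part of a statistic that depends only on the observed data.

```r
# Hypothetical helper, not part of hydromad: build an R Squared (NSE)
# function whose denominator is computed once for a fixed observed series Q.
make_cached_nse <- function(Q) {
  # Pre-evaluated chunk: depends only on the observed data
  denom <- sum((Q - mean(Q))^2)
  function(X) {
    # Only the numerator is recomputed for each simulated series X
    1 - sum((Q - X)^2) / denom
  }
}

Q_obs <- c(1.2, 3.4, 2.8, 0.9, 1.5) # made-up observed flows
nse_cached <- make_cached_nse(Q_obs)

# Repeated evaluations reuse the cached denominator
perfect <- nse_cached(Q_obs)        # simulation identical to observations
scaled <- nse_cached(Q_obs * 1.1)   # simulation 10% too high
```

As with the cached statistics described above, nse_cached is tied to Q_obs: it would give misleading values if applied to a simulation of a different dataset.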
character giving the name of a statistic.
hmadstat returns the named statistical function, while hydromad.stats gets
or sets a list of named functions.
The default set of available statistics can be listed with
str(hydromad.stats()), and consists of:
the mean absolute error.
Root Mean Squared Error.
bias in data units, \(\sum ( X - Q )\)
bias as a fraction of the total observed flow, \(\sum ( X - Q ) / \sum Q\) (excluding any time steps with missing values).
R Squared (Nash-Sutcliffe Efficiency), \(1 - \sum (Q- X)^2 / \sum (Q - \bar{Q})^2\)
R Squared using square-root transformed data (less weight on peak flows), \(1 - \frac{\sum |\sqrt{Q} - \sqrt{X}|^2 }{ \sum |\sqrt{Q} - \overline{\sqrt{Q}}|^2 }\)
R Squared using log transformed data, with an offset: \(1 - \frac{\sum | \log{(Q+\epsilon)} - \log{(X+\epsilon)}|^2 } {\sum |\log{(Q+\epsilon)} - \overline{\log{(Q+\epsilon)}}|^2 }\). Here \(\epsilon\) is the 10th percentile (i.e. lowest decile) of the non-zero values of Q.
R Squared using a Box-Cox transform. The power lambda is chosen to fit Q to a normal distribution. When lambda = 0 it is a log transform; otherwise it is \(y_* = \frac{(y+\epsilon)^\lambda - 1} {\lambda}\). Here \(\epsilon\) is the 10th percentile (i.e. lowest decile) of the non-zero values of Q.
R Squared using differences between successive time steps, i.e. rises and falls.
R Squared with data aggregated into calendar months.
R Squared using data smoothed with a triangular kernel of width 5 time steps: c(1,2,3,2,1)/9.
R Squared where the reference model is the mean in each calendar month, rather than the default which is the overall mean.
R Squared where the modelled peaks have been coalesced to observed peaks, minimising timing error (see nseVarTd). Note that this statistic requires event to be specified using eventseq.
R Squared where the reference model predicts each time step as the previous observed value. This statistic therefore represents a model's performance compared to a naive one-time-step forecast.
correlation of modelled flow with the model residuals.
correlation of modelled flow with the model residuals from the previous time step.
Average Relative Parameter Error. Requires that a variance-covariance matrix was estimated during calibration.
Kling-Gupta Efficiency. \(KGE = 1 - \sqrt{(r - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2}\), where \(r\) is the linear correlation between
observations and simulations, \(\alpha\) a measure of the flow variability error, and
\(\beta\) a bias term. Equivalently, it can be expressed as:
1 - sqrt( (cor(X, Q) - 1)^2 + (sd(X)/sd(Q) - 1)^2 + (mean(X)/mean(Q) - 1)^2 ).
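Several of the formulas listed above can be written directly in base R. The following is a minimal sketch for checking one's understanding of the definitions, not hydromad's own implementation. The function names nse, nse_sqrt, nse_log and kge and the data are made up for illustration, and missing values are assumed to be absent.

```r
# Illustrative base R versions of some formulas listed above.
# Q is the observed series, X the modelled series; no missing values assumed.

# R Squared (Nash-Sutcliffe Efficiency)
nse <- function(Q, X) {
  1 - sum((Q - X)^2) / sum((Q - mean(Q))^2)
}

# R Squared on square-root transformed data
nse_sqrt <- function(Q, X) nse(sqrt(Q), sqrt(X))

# R Squared on log transformed data, offset by the
# 10th percentile of the non-zero observed values
nse_log <- function(Q, X) {
  eps <- quantile(Q[Q > 0], 0.1, names = FALSE)
  nse(log(Q + eps), log(X + eps))
}

# Kling-Gupta Efficiency
kge <- function(Q, X) {
  r <- cor(X, Q)             # linear correlation
  alpha <- sd(X) / sd(Q)     # variability ratio
  beta <- mean(X) / mean(Q)  # bias ratio
  1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2)
}

Q <- c(1.2, 3.4, 2.8, 0.9, 1.5, 0.0) # made-up observed flows
X <- c(1.0, 3.0, 3.1, 1.1, 1.4, 0.2) # made-up modelled flows

stats <- c(nse = nse(Q, X), nse_sqrt = nse_sqrt(Q, X),
           nse_log = nse_log(Q, X), kge = kge(Q, X))
```

Each of these statistics equals 1 for a perfect simulation (X identical to Q) and decreases as the fit gets worse.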
Gupta, H.V., Kling, H., Yilmaz, K.K., & Martinez, G.F. (2009). Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. Journal of Hydrology, 377(1), 80–91. https://doi.org/10.1016/j.jhydrol.2009.08.003
## see current set of stats
names(hydromad.stats())
#> [1] "bias" "rel.bias" "abs.err" "RMSE"
#> [5] "r.squared" "r.sq.sqrt" "r.sq.log" "r.sq.boxcox"
#> [9] "r.sq.rank" "r.sq.diff" "r.sq.monthly" "r.sq.365"
#> [13] "r.sq.smooth5" "r.sq.seasonal" "r.sq.vs.tf" "r.sq.vs.tf.bc"
#> [17] "persistence" "persistence.bc" "e.rain5" "e.rain5.log"
#> [21] "e.rain5.bc" "e.q90" "e.q90.log" "e.q90.bc"
#> [25] "e.q90.all" "e.q90.all.log" "e.q90.all.bc" "e.q90.min"
#> [29] "e.q90.min.log" "ar1" "X0" "X1"
#> [33] "U1" "r.sq.vartd" "KGE"
## extract only one
hmadstat("RMSE")
#> function (Q, X, ...)
#> sqrt(mean((X - Q)^2, na.rm = TRUE))
#> <bytecode: 0x7f831ef4e638>
#> <environment: 0x7f831ef24830>
## calculate stat value with random data
hmadstat("rel.bias")(Q = 1:10, X = 2:11)
#> [1] 0.1818182
## add a new objective function.
## A weighted combination of NSE and bias
## as proposed by Neil Viney
## OF = NSE - 5*|ln(1+Bias)|^2.5
hydromad.stats("viney" = function(Q, X, ...) {
  hmadstat("r.squared")(Q, X, ...) -
    5 * (abs(log(1 + hmadstat("rel.bias")(Q, X)))^2.5)
})
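The arithmetic behind this combined objective can be checked without the package. The sketch below re-implements the two ingredients in base R from the formulas documented above (r.squared as the Nash-Sutcliffe Efficiency, rel.bias as the summed error divided by the summed observed flow) and reuses the toy data from the rel.bias example. The names nse_fun, rel_bias_fun and viney_fun are hypothetical stand-ins, not hydromad functions.

```r
## stand-alone check of the Viney objective, using base R only
## (hypothetical stand-ins for hmadstat("r.squared") and hmadstat("rel.bias"))
nse_fun <- function(Q, X) 1 - sum((Q - X)^2) / sum((Q - mean(Q))^2)
rel_bias_fun <- function(Q, X) sum(X - Q) / sum(Q)

viney_fun <- function(Q, X) {
  nse_fun(Q, X) - 5 * (abs(log(1 + rel_bias_fun(Q, X)))^2.5)
}

## same toy data as the rel.bias example above
Q <- 1:10
X <- 2:11
viney_value <- viney_fun(Q, X)
viney_value
```

The bias penalty is zero when the total modelled flow matches the total observed flow, so in that case the objective reduces to the plain NSE.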