| Title: | Stability-Selection via Correlated Resampling for 'GAMLSS' Models |
|---|---|
| Description: | Extends the 'SelectBoost' approach to Generalized Additive Models for Location, Scale and Shape (GAMLSS). Implements bootstrap stability selection across parameter-specific formulas (mu, sigma, nu, tau) via gamlss::stepGAIC(). Includes optional standardization of predictors and helper functions for corrected AIC calculation. More details can be found in Bertrand and Maumy (2024) <https://hal.science/hal-05352041>, which highlights correlation-aware resampling to improve variable selection for GAMLSS and quantile regression when predictors are numerous and highly correlated. |
| Authors: | Frederic Bertrand [cre, aut] |
| Maintainer: | Frederic Bertrand |
| License: | GPL-3 |
| Version: | 0.2.2 |
| Built: | 2026-06-01 10:19:22 UTC |
| Source: | https://github.com/fbertran/selectboost.gamlss |
Default parameter values per family (adjust as needed per the family documentation)
.family_defaults()
A list with one element per supported distribution, each a list of default parameter values.
Per-family numeric tolerance for equality checks
.family_tolerance()
A list giving the default numeric tolerance for each supported distribution.
Try to generate values for a family
.gen_family(fam, n)
| Argument | Description |
|---|---|
| fam | Character scalar naming the family. This should correspond to a distribution available in the gamlss.dist package. |
| n | Positive integer giving the number of observations to generate for the requested family. |
If successful, a numeric vector of n values randomly generated from the requested family. If the generator is not available or fails, NULL is returned.
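The try-and-fall-back behaviour described above can be sketched in base R. This is a minimal illustration, not the package's .gen_family(): the helper name gen_family_sketch() is hypothetical, and it assumes the random generator is named "r" followed by the family name (as for stats::rnorm(), or gamlss.dist::rNO() when gamlss.dist is attached).

```r
# Hypothetical helper (not part of selectboost.gamlss): look up a random
# generator called "r<fam>" and return NULL if it is missing or fails.
gen_family_sketch <- function(fam, n) {
  rfun <- get0(paste0("r", fam), envir = environment(), mode = "function")
  if (is.null(rfun)) return(NULL)
  # Call the generator with its default parameters; errors become NULL
  out <- tryCatch(rfun(n), error = function(e) NULL)
  if (!is.numeric(out) || length(out) != n) return(NULL)
  out
}

set.seed(1)
gen_family_sketch("norm", 5)   # five draws from stats::rnorm()
gen_family_sketch("NOPE", 5)   # no function rNOPE(): NULL
gen_family_sketch("binom", 5)  # rbinom() needs 'size', so the error gives NULL
```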
AICc for a gamlss fit
AICc_gamlss(object)
| Argument | Description |
|---|---|
| object | A 'gamlss' object. |
Numeric AICc value.
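For reference, the small-sample correction behind AICc is AICc = AIC + 2p(p + 1)/(n - p - 1), where p is the number of estimated parameters and n the number of observations. The sketch below applies that textbook formula to a plain lm() fit so it can be checked against stats::AIC(); it is not the package's AICc_gamlss(). For a gamlss fit one would presumably plug in the global deviance (G.deviance) and effective degrees of freedom (df.fit) instead, which is an assumption here, and the helper name is hypothetical.

```r
# Hypothetical helper: textbook AICc from a log-likelihood, the number of
# estimated parameters (df) and the sample size (n).
aicc_sketch <- function(loglik, df, n) {
  if (n - df - 1 <= 0) stop("AICc needs n > df + 1")
  aic <- -2 * loglik + 2 * df
  aic + (2 * df * (df + 1)) / (n - df - 1)
}

fit <- lm(dist ~ speed, data = cars)   # built-in 'cars' data, n = 50
ll <- logLik(fit)                      # df = 3: intercept, slope, sigma
aicc_sketch(as.numeric(ll), attr(ll, "df"), nobs(fit))
AIC(fit)                               # uncorrected value, for comparison
```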
Runs sb_gamlss() over a grid of c0 values, picks the c0 that maximizes total confidence, and returns the corresponding sb_gamlss fit.
autoboost_gamlss(formula, data, family, mu_scope, sigma_scope = NULL, nu_scope = NULL, tau_scope = NULL, base_sigma = ~1, base_nu = ~1, base_tau = ~1, c0_grid = seq(0.1, 0.9, by = 0.1), B = 60, sample_fraction = 0.7, pi_thr = 0.6, k = 2, direction = c("both", "forward", "backward"), pre_standardize = FALSE, trace = TRUE, progress = TRUE, use_groups = TRUE, corr_func = "cor", group_fun = SelectBoost::group_func_2, ...)
| Argument | Description |
|---|---|
| formula | Base formula for the location parameter (mu). |
| data | Data frame. |
| family | A gamlss family object (e.g., gamlss.dist::NO()). |
| mu_scope | Formula of candidate terms for mu. |
| sigma_scope, nu_scope, tau_scope | Formulas of candidate terms for sigma, nu and tau (default NULL). |
| base_sigma, base_nu, base_tau | Optional base (always-included) formulas for sigma, nu and tau (default ~1). |
| c0_grid | Numeric vector of c0 values to evaluate. |
| B | Number of bootstrap subsamples for stability selection. |
| sample_fraction | Fraction of rows per subsample (e.g., 0.7). |
| pi_thr | Selection proportion threshold defining “stable” terms (e.g., 0.6). |
| k | Penalty weight for stepwise GAIC (k = 2 gives AIC). |
| direction | Stepwise direction for gamlss::stepGAIC(): "both", "forward" or "backward". |
| pre_standardize | Logical; standardize numeric predictors before penalized fits. |
| trace | Logical; print progress messages. |
| progress | Logical; show a progress bar in sequential runs. |
| use_groups | Logical; use SelectBoost correlation groups during resampling. |
| corr_func | Correlation function passed to the SelectBoost grouping step (default "cor"). |
| group_fun | Grouping function passed to SelectBoost (default SelectBoost::group_func_2). |
| ... | Passed to the underlying engines. |
A SelectBoost_gamlss_grid with summary plots/tables.
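How "total confidence" is computed is not spelled out here. As a hedged illustration, the sketch below assumes it is the sum over terms of the positive excess max(prop - pi_thr, 0) at each c0, and picks the c0 with the largest total. The long table layout (parameter, term, c0, prop) mirrors the table element returned by sb_gamlss_c0_grid(); the helper name and the proportions are made up.

```r
# Toy selection proportions for two terms over three c0 values (made-up numbers)
tab <- data.frame(
  parameter = "mu",
  term = rep(c("x1", "x2"), each = 3),
  c0 = rep(c(0.3, 0.5, 0.7), times = 2),
  prop = c(0.9, 0.8, 0.5, 0.4, 0.9, 0.6)
)

# Hypothetical helper: total positive excess over pi_thr per c0, then argmax
pick_c0_sketch <- function(tab, pi_thr) {
  total <- tapply(pmax(tab$prop - pi_thr, 0), tab$c0, sum)
  list(total = total, best_c0 = as.numeric(names(total))[which.max(total)])
}

pick_c0_sketch(tab, pi_thr = 0.6)$best_c0
```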
Cross-sectional anthropometric records for 7,482 Dutch boys aged 0 to 21 years that were used to construct the 1997 Dutch growth references. The dataset stores standard auxological indicators together with puberty and regional classification variables.
A data frame with 7,482 rows and 9 variables:
Decimal age in years ranging from birth to 21.
Standing height in centimetres.
Body weight recorded in kilograms.
Body mass index.
Head circumference in centimetres.
Ordered factor describing genital Tanner stage (G1–G5).
Ordered factor indicating pubic hair stage (P1–P6).
Testicular volume in millilitres.
Region of residence (north, east, west, south, city).
The table combines the complete cross-sectional sample of Dutch boys aged 0–21 years that formed the basis of the Dutch 1997 growth references. Tanner stage variables are stored as ordered factors, while the region indicator is a nominal factor.
Fredriks, A. M., van Buuren, S., Burgmeijer, R. J., Meulmeester, J. F., Beuker, R. J., Brugman, E., Roede, M. J., Verloove-Vanhorick, S. P., & Wit, J. M. (2000). Continuing positive secular growth change in The Netherlands 1955-1997. Pediatric Research, 47, 316-323.
Fredriks, A. M., van Buuren, S., Wit, J. M., & Verloove-Vanhorick, S. P. (2000). Body index measurements in 1996-97 compared with 1980. Archives of Disease in Childhood, 82, 107-112.
Stef van Buuren (2012).
data(boys7482)
str(boys7482)
Computes both loglik_gamlss_newdata_fast() and loglik_gamlss_newdata()
and reports the absolute difference. Useful for sanity checks.
check_fast_vs_generic(fit, newdata, tol = 1e-08)
| Argument | Description |
|---|---|
| fit | A gamlss fit. |
| newdata | Data frame to evaluate on. |
| tol | Tolerance for pass/fail (default 1e-8). |
A list with fields: ll_fast, ll_generic, abs_diff, pass.
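The same sanity check can be illustrated on a Gaussian model, where a "fast" closed-form log-likelihood is compared with a "generic" sum of per-observation log-densities. This is a self-contained sketch that does not call the package's log-likelihood helpers; the function name is hypothetical, and it returns the same four fields listed above.

```r
# Hypothetical helper mirroring the ll_fast / ll_generic / abs_diff / pass fields
check_fast_vs_generic_sketch <- function(y, mu, sigma, tol = 1e-8) {
  # Generic path: sum of per-observation log-densities
  ll_generic <- sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
  # Fast path: closed-form Gaussian log-likelihood
  ll_fast <- -0.5 * sum(log(2 * pi * sigma^2) + (y - mu)^2 / sigma^2)
  abs_diff <- abs(ll_fast - ll_generic)
  list(ll_fast = ll_fast, ll_generic = ll_generic,
       abs_diff = abs_diff, pass = abs_diff <= tol)
}

set.seed(2)
y <- rnorm(20, mean = 1, sd = 2)
check_fast_vs_generic_sketch(y, mu = 1, sigma = 2)
```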
Summarize selection proportions across c0 (SelectBoost threshold) into single-number confidence scores per term/parameter.
confidence_functionals(x, pi_thr = NULL, q = c(0.5, 0.8, 0.9), weight_fun = NULL, conservative = FALSE, B = NULL, method = c("trapezoid", "step"))
| Argument | Description |
|---|---|
| x | An object from sb_gamlss_c0_grid() (class "SelectBoost_gamlss_grid"). |
| pi_thr | Stability threshold; defaults to x$pi_thr. |
| q | Numeric vector of quantiles to compute (in [0, 1]). |
| weight_fun | Optional weight function w(c0) for the weighted area under the stability curve (AUSC); uniform by default. |
| conservative | If TRUE, use Wilson lower confidence bounds for the selection proportions. |
| B | Number of bootstrap subsamples; needed when conservative = TRUE and it cannot be inferred from x. |
| method | Integration method: "trapezoid" (default) or "step". |
A data.frame with per-term summaries, classed as "sb_confidence".
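Two of the ingredients named above can be written down directly: the area under the stability curve (AUSC) of the positive excess max(p(c0) - pi_thr, 0), integrated over c0 by the trapezoid or left-step rule, and the Wilson lower confidence bound for a proportion estimated from B subsamples. The sketch assumes these textbook definitions; the package may normalize or weight them differently, and the helper names wilson_lower() and ausc_sketch() are hypothetical.

```r
# Wilson score lower bound for a proportion p observed over n trials
wilson_lower <- function(p, n, z = qnorm(0.975)) {
  centre <- p + z^2 / (2 * n)
  half <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2))
  pmax(0, (centre - half) / (1 + z^2 / n))
}

# Area under the positive-excess curve over c0, with optional weights w(c0)
ausc_sketch <- function(c0, prop, pi_thr, w = function(c) rep(1, length(c)),
                        method = c("trapezoid", "step")) {
  method <- match.arg(method)
  o <- order(c0)
  c0 <- c0[o]
  f <- pmax(prop[o] - pi_thr, 0) * w(c0)
  h <- diff(c0)
  left <- f[-length(f)]
  right <- f[-1]
  if (method == "trapezoid") sum(h * (left + right) / 2) else sum(h * left)
}

c0 <- c(0.1, 0.2, 0.3)
prop <- c(0.8, 0.7, 0.6)
ausc_sketch(c0, prop, pi_thr = 0.6)                   # trapezoid rule
ausc_sketch(c0, prop, pi_thr = 0.6, method = "step")  # left-endpoint steps
# Conservative variant: replace proportions by Wilson lower bounds (B = 60)
ausc_sketch(c0, wilson_lower(prop, n = 60), pi_thr = 0.6)
```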
Compute SelectBoost-like confidence table across c0
confidence_table(grid, pi_thr = NULL)
| Argument | Description |
|---|---|
| grid | An object returned by sb_gamlss_c0_grid(). |
| pi_thr | Optional override of the threshold (defaults to grid$pi_thr). |
A data.frame with columns term, parameter, conf_index (mean positive excess of prop over pi_thr across c0) and cover (fraction of c0 values with prop >= pi_thr).
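Reading conf_index as the mean of max(prop - pi_thr, 0) over the c0 grid and cover as the share of c0 values with prop >= pi_thr, the table can be rebuilt from a long (parameter, term, c0, prop) table in a few lines of base R. This is our interpretation of the two columns, sketched with made-up proportions and a hypothetical helper name.

```r
tab <- data.frame(
  parameter = "mu",
  term = rep(c("x1", "x2"), each = 3),
  c0 = rep(c(0.3, 0.5, 0.7), times = 2),
  prop = c(0.9, 0.8, 0.5, 0.4, 0.7, 0.6)
)

confidence_table_sketch <- function(tab, pi_thr) {
  pieces <- split(tab, list(tab$parameter, tab$term), drop = TRUE)
  out <- do.call(rbind, lapply(pieces, function(d) data.frame(
    term = d$term[1],
    parameter = d$parameter[1],
    conf_index = mean(pmax(d$prop - pi_thr, 0)),  # mean positive excess
    cover = mean(d$prop >= pi_thr)                 # share of c0 at or above pi_thr
  )))
  out <- out[order(-out$conf_index), ]
  rownames(out) <- NULL
  out
}

confidence_table_sketch(tab, pi_thr = 0.6)
```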
K-fold deviance for an sb_gamlss configuration
cv_deviance_sb(K, build_fit, data)
| Argument | Description |
|---|---|
| K | Number of folds. |
| build_fit | Function(...) that returns an sb_gamlss object. |
| data | data.frame used inside build_fit. |
numeric: mean deviance across folds (-2 * mean loglik)
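A generic K-fold version of this computation, for a Gaussian lm() fit instead of an sb_gamlss object, is sketched below. The fold deviance is taken as -2 times the held-out log-likelihood and then averaged over folds. The helper, its response argument and the assumption that build_fit() receives the training data are illustrative, not the package's interface.

```r
# Hypothetical helper: mean held-out deviance over K random folds
cv_deviance_sketch <- function(K, build_fit, data, response) {
  folds <- sample(rep(seq_len(K), length.out = nrow(data)))
  fold_dev <- vapply(seq_len(K), function(k) {
    train <- data[folds != k, , drop = FALSE]
    test <- data[folds == k, , drop = FALSE]
    fit <- build_fit(train)
    mu <- predict(fit, newdata = test)
    sigma <- sqrt(mean(residuals(fit)^2))  # ML estimate of the residual sd
    -2 * sum(dnorm(test[[response]], mean = mu, sd = sigma, log = TRUE))
  }, numeric(1))
  mean(fold_dev)
}

set.seed(3)
cv_deviance_sketch(5, function(d) lm(dist ~ speed, data = d), cars, "dist")
```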
Varies one variable and holds the others at typical values (median/mode) to plot the predicted parameter curve (default: mu). Uses ggplot2 if available, otherwise base graphics.
effect_plot(fit, var, data, what = "mu", grid = 100)
## S3 method for class 'effect_plot_failure'
print(x, ...)
| Argument | Description |
|---|---|
| fit | sb_gamlss object (or gamlss fit). |
| var | Character; name of the variable to vary. |
| data | Original data.frame used to fit the model. |
| what | Which parameter to predict ("mu", "sigma", "nu" or "tau"). |
| grid | Number of grid points for a numeric variable. |
| x | Object returned by effect_plot() when plotting fails (class "effect_plot_failure"). |
| ... | Unused. |
For effect_plot(): a ggplot object if ggplot2 is installed; otherwise a base-graphics plot is drawn and NULL is returned.
For print(): invisibly returns x.
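The "vary one variable, hold the others at typical values" step amounts to building a small prediction grid. The base R sketch below constructs such a grid (median for numeric columns, most frequent value otherwise); the helper names typical_grid() and mode_value() are hypothetical. The resulting data frame would then be passed to a prediction method, for example predict() on a gamlss fit with what = "mu" and type = "response".

```r
# Most frequent value of a column, keeping its class (factor levels preserved)
mode_value <- function(col) {
  counts <- table(col)
  col[match(names(counts)[which.max(counts)], as.character(col))]
}

# Hypothetical helper: vary 'var', hold every other column at a typical value
typical_grid <- function(data, var, grid = 100) {
  typical <- lapply(data, function(col) {
    if (is.numeric(col)) median(col, na.rm = TRUE) else mode_value(col)
  })
  x <- data[[var]]
  typical[[var]] <- if (is.numeric(x)) {
    seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), length.out = grid)
  } else {
    unique(x)
  }
  as.data.frame(typical, stringsAsFactors = FALSE)  # length-1 columns recycle
}

d <- data.frame(age = c(1, 5, 9, 12),
                reg = factor(c("north", "city", "city", "west")))
head(typical_grid(d, "age", grid = 10), 3)
typical_grid(d, "reg")
```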
Compare the speed of fast vs generic log-likelihood (deviance) evaluation
fast_vs_generic_ll(fit, newdata, reps = 100L, unit = "us")
| Argument | Description |
|---|---|
| fit | A gamlss fit. |
| newdata | Data frame to evaluate on. |
| reps | Number of repetitions (default 100). |
| unit | microbenchmark time unit (default "us", microseconds). |
A data.frame with method, median, and relative speed.
A faster variant of sb_gamlss() that uses fewer bootstraps (B = 30) and smaller subsamples (sample_fraction = 0.6) by default.
fastboost_gamlss(formula, data, family, mu_scope, sigma_scope = NULL, nu_scope = NULL, tau_scope = NULL, base_sigma = ~1, base_nu = ~1, base_tau = ~1, B = 30, sample_fraction = 0.6, pi_thr = 0.6, k = 2, direction = c("both", "forward", "backward"), pre_standardize = FALSE, use_groups = TRUE, c0 = 0.5, trace = TRUE, corr_func = "cor", group_fun = SelectBoost::group_func_2, ...)
| Argument | Description |
|---|---|
| formula | Base formula for the location parameter (mu). |
| data | Data frame. |
| family | A gamlss family object (e.g., gamlss.dist::NO()). |
| mu_scope | Formula of candidate terms for mu. |
| sigma_scope, nu_scope, tau_scope | Formulas of candidate terms for sigma, nu and tau (default NULL). |
| base_sigma, base_nu, base_tau | Optional base (always-included) formulas for sigma, nu and tau (default ~1). |
| B | Number of bootstrap subsamples for stability selection. |
| sample_fraction | Fraction of rows per subsample (default 0.6). |
| pi_thr | Selection proportion threshold defining “stable” terms (e.g., 0.6). |
| k | Penalty weight for stepwise GAIC (k = 2 gives AIC). |
| direction | Stepwise direction for gamlss::stepGAIC(): "both", "forward" or "backward". |
| pre_standardize | Logical; standardize numeric predictors before penalized fits. |
| use_groups | Logical; use SelectBoost correlation groups during resampling. |
| c0 | SelectBoost meta-parameter controlling reweighting/thresholding (see vignette). |
| trace | Logical; print progress messages. |
| corr_func | Correlation function passed to the SelectBoost grouping step (default "cor"). |
| group_fun | Grouping function passed to SelectBoost (default SelectBoost::group_func_2). |
| ... | Passed to the underlying engines. |
Fast SelectBoost (single c0)
An sb_gamlss fit at the given c0.
Get a density function for a gamlss family
get_density_fun(fit)
| Argument | Description |
|---|---|
| fit | A gamlss fit (or family name). |
A density function with signature function(x, mu, sigma, nu, tau, log = FALSE).
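Density functions in gamlss.dist take family-specific parameters (for example, dNO() has only mu and sigma), so exposing a single function(x, mu, sigma, nu, tau, log = FALSE) signature requires some adaptation. One way to do that, sketched here with a locally defined normal density rather than the real gamlss.dist function, is to forward only the arguments the underlying density accepts. The wrapper uniformize_density() and the stand-in dNO_local() are hypothetical, not the package's get_density_fun().

```r
# Hypothetical wrapper: expose any density through a common
# function(x, mu, sigma, nu, tau, log) signature.
uniformize_density <- function(f) {
  accepted <- names(formals(f))
  function(x, mu = NULL, sigma = NULL, nu = NULL, tau = NULL, log = FALSE) {
    args <- list(x = x, mu = mu, sigma = sigma, nu = nu, tau = tau, log = log)
    args <- args[!vapply(args, is.null, logical(1))]  # drop unset parameters
    do.call(f, args[names(args) %in% accepted])       # keep accepted ones only
  }
}

# Stand-in for a two-parameter family density such as gamlss.dist::dNO()
dNO_local <- function(x, mu = 0, sigma = 1, log = FALSE) {
  dnorm(x, mean = mu, sd = sigma, log = log)
}

dens <- uniformize_density(dNO_local)
dens(1, mu = 0, sigma = 2, nu = 5, tau = 3)  # nu and tau are silently ignored
```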
Knockoff filter for mu (approximate group control)
knockoff_filter_mu(data, response, mu_scope, fdr = 0.1, df_smooth = 6L)
| Argument | Description |
|---|---|
| data | data.frame. |
| response | Response variable name. |
| mu_scope | RHS-only term labels. |
| fdr | Target false discovery rate (FDR) level. |
| df_smooth | Degrees of freedom for smoother proxies (splines::bs). |
character vector of selected term names
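Given feature statistics W_j (large positive values favour the original variable over its knockoff), the knockoff+ filter of Barber and Candès selects the variables with W_j >= T, where T is the smallest t > 0 such that (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= fdr. The sketch implements only that thresholding step with made-up statistics; how knockoff_filter_mu() builds its knockoffs and W statistics (including the spline proxies controlled by df_smooth) is not shown and may differ. The helper name is hypothetical.

```r
# knockoff+ threshold: smallest t > 0 with estimated FDP <= fdr (Inf if none)
knockoff_threshold <- function(W, fdr = 0.1, offset = 1) {
  candidates <- sort(unique(abs(W[W != 0])))
  for (t in candidates) {
    fdp_hat <- (offset + sum(W <= -t)) / max(1, sum(W >= t))
    if (fdp_hat <= fdr) return(t)
  }
  Inf
}

# Made-up statistics for ten terms; only the last one favours its knockoff
W <- c(10, 9, 8, 7, 6, 5, 4, 3, 2, -1)
names(W) <- paste0("x", seq_along(W))
thr <- knockoff_threshold(W, fdr = 0.2)
names(W)[W >= thr]   # selected term names
```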
Knockoff filter for sigma/nu/tau (approximate group control)
knockoff_filter_param(data, scope, y_work, fdr = 0.1, df_smooth = 6L)
| Argument | Description |
|---|---|
| data | data.frame. |
| scope | RHS-only term labels. |
| y_work | Working response (numeric). |
| fdr | Target FDR level. |
| df_smooth | Degrees of freedom for smoother proxies. |
character vector of selected term names
Log-likelihood (sum) on newdata given a gamlss fit
loglik_gamlss_newdata(fit, newdata)
| Argument | Description |
|---|---|
| fit | A gamlss object. |
| newdata | data.frame. |
numeric scalar: sum of log-likelihoods
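The returned quantity is the sum, over the rows of newdata, of the fitted log-density evaluated at the observed response. For a constant-variance Gaussian model this reduces to the sketch below, which uses lm() as a stand-in for a gamlss fit; with the maximum-likelihood residual scale it reproduces stats::logLik() on the training data, which makes a convenient check. The helper name is hypothetical.

```r
# Hypothetical helper: sum of Gaussian log-densities on new data
loglik_newdata_sketch <- function(fit, newdata, response) {
  mu <- predict(fit, newdata = newdata)   # predicted location
  sigma <- sqrt(mean(residuals(fit)^2))   # ML residual scale
  sum(dnorm(newdata[[response]], mean = mu, sd = sigma, log = TRUE))
}

fit <- lm(dist ~ speed, data = cars)
loglik_newdata_sketch(fit, cars, "dist")         # in-sample, equals logLik(fit)
loglik_newdata_sketch(fit, cars[1:10, ], "dist")
```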
Plot selection frequencies for sb_gamlss
plot_sb_gamlss(x, top = Inf, ...)
| Argument | Description |
|---|---|
| x | An sb_gamlss object. |
| top | Show only the top N terms per parameter (default Inf, i.e. all terms). |
| ... | Graphical parameters. |
Invisibly returns x, the plotted sb_gamlss object.
Plot stability curves p(c0) for selected terms
plot_stability_curves(grid, terms, parameter = NULL, ncol = 2L)
| Argument | Description |
|---|---|
| grid | An object from sb_gamlss_c0_grid(). |
| terms | Character vector of term names to plot. |
| parameter | Optional parameter name ("mu", "sigma", "nu" or "tau"); if NULL, all parameters. |
| ncol | Number of columns in the multi-panel layout. |
Invisibly returns grid, the plotted object.
Two-panel plot: (1) scatter of area_pos vs cover (size by rank), (2) barplot of top-N rank_score.
## S3 method for class 'sb_confidence'
plot(x, top = 15, label_top = 10, ...)
| Argument | Description |
|---|---|
| x | An object from confidence_functionals() (class "sb_confidence"). |
| top | Number of top terms shown in the barplot (default 15). |
| label_top | Integer; number of points to label in the scatter plot (default 10). |
| ... | Graphical parameters passed to the plotting backend. |
An invisible copy of x.
Plot selection proportions for a single sb_gamlss
## S3 method for class 'SelectBoost_gamlss'
plot(x, ...)
| Argument | Description |
|---|---|
| x | A SelectBoost_gamlss object. |
| ... | Graphical parameters. |
Invisibly returns x, the plotted sb_gamlss object.
Plot summary for sb_gamlss_c0_grid
## S3 method for class 'SelectBoost_gamlss_grid'
plot(x, top = 15, ...)
| Argument | Description |
|---|---|
| x | A SelectBoost_gamlss_grid object. |
| top | Integer; number of top terms to show in the confidence barplot. |
| ... | Ignored (reserved for future use). |
An invisible copy of x.
Predict distribution parameters on newdata
predict_params(fit, newdata)
| Argument | Description |
|---|---|
| fit | A gamlss fit. |
| newdata | data.frame. |
list with available components: mu, sigma, nu, tau
SelectBoost for GAMLSS (stability selection)
sb_gamlss(formula, data, family, mu_scope, sigma_scope = NULL, nu_scope = NULL, tau_scope = NULL, base_sigma = ~1, base_nu = ~1, base_tau = ~1, B = 100, sample_fraction = 0.7, pi_thr = 0.6, k = 2, direction = c("both", "forward", "backward"), pre_standardize = FALSE, use_groups = FALSE, c0 = 0.5, engine = c("stepGAIC", "glmnet", "grpreg", "sgl"), engine_sigma = NULL, engine_nu = NULL, engine_tau = NULL, grpreg_penalty = c("grLasso", "grMCP", "grSCAD"), sgl_alpha = 0.95, df_smooth = 6L, progress = TRUE, glmnet_alpha = 1, glmnet_family = c("gaussian", "binomial", "poisson"), parallel = c("none", "auto", "multisession", "multicore"), workers = NULL, trace = TRUE, corr_func = "cor", group_fun = SelectBoost::group_func_2, ...)
| Argument | Description |
|---|---|
| formula | Base formula for the location parameter (mu). |
| data | Data frame. |
| family | A gamlss family object (e.g., gamlss.dist::NO()). |
| mu_scope | Formula of candidate terms for mu. |
| sigma_scope, nu_scope, tau_scope | Formulas of candidate terms for sigma, nu and tau (default NULL). |
| base_sigma, base_nu, base_tau | Optional base (always-included) formulas for sigma, nu and tau (default ~1). |
| B | Number of bootstrap subsamples for stability selection. |
| sample_fraction | Fraction of rows per subsample (e.g., 0.7). |
| pi_thr | Selection proportion threshold defining “stable” terms (e.g., 0.6). |
| k | Penalty weight for stepwise GAIC (k = 2 gives AIC). |
| direction | Stepwise direction for gamlss::stepGAIC(): "both", "forward" or "backward". |
| pre_standardize | Logical; standardize numeric predictors before penalized fits. |
| use_groups | Logical; use SelectBoost correlation groups during resampling. |
| c0 | SelectBoost meta-parameter controlling reweighting/thresholding (see vignette). |
| engine | Selection engine for mu: "stepGAIC", "glmnet", "grpreg" or "sgl". |
| engine_sigma, engine_nu, engine_tau | Optional engines for sigma, nu and tau (default NULL). |
| grpreg_penalty | Group penalty for grpreg ("grLasso", "grMCP" or "grSCAD"). |
| sgl_alpha | Alpha for the sparse group lasso. |
| df_smooth | Degrees of freedom for proxy spline bases (splines::bs). |
| progress | Logical; show a progress bar in sequential runs. |
| glmnet_alpha | Elastic-net mixing for glmnet (1 = lasso, 0 = ridge). |
| glmnet_family | Family passed to glmnet-based selectors ("gaussian", "binomial" or "poisson"). |
| parallel | Parallel mode ("none", "auto", "multisession" or "multicore"). |
| workers | Integer; number of workers when running in parallel. |
| trace | Logical; print progress messages. |
| corr_func | Correlation function passed to the SelectBoost grouping step (default "cor"). |
| group_fun | Grouping function passed to SelectBoost (default SelectBoost::group_func_2). |
| ... | Passed to the underlying engines. |
An object of class "sb_gamlss" with elements:
final_fit: the final gamlss object.
final_formula: list of formulas for mu/sigma/nu/tau.
selection: data.frame of selection counts and proportions.
B, sample_fraction, pi_thr, k.
scaler: list with center, scale, vars, response.
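The scaler element suggests the usual center/scale bookkeeping behind pre_standardize = TRUE. A minimal sketch, assuming z-scoring of the numeric predictors with their means and standard deviations while leaving the response untouched (the package's exact rule may differ, and the helper name standardize_sketch() is hypothetical):

```r
# Hypothetical helper: z-score numeric predictors and record a scaler
standardize_sketch <- function(data, response) {
  is_num <- vapply(data, is.numeric, logical(1))
  vars <- setdiff(names(data)[is_num], response)
  centers <- vapply(data[vars], mean, numeric(1), na.rm = TRUE)
  sds <- vapply(data[vars], sd, numeric(1), na.rm = TRUE)
  sds[sds == 0] <- 1  # leave constant columns unscaled
  data[vars] <- Map(function(x, m, s) (x - m) / s, data[vars], centers, sds)
  list(data = data,
       scaler = list(center = centers, scale = sds, vars = vars, response = response))
}

s <- standardize_sketch(cars, response = "dist")
s$scaler
```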
set.seed(1)
dat <- data.frame(
  y = gamlss.dist::rNO(60, mu = 0),
  x1 = rnorm(60),
  x2 = rnorm(60),
  x3 = rnorm(60)
)
fit <- sb_gamlss(
  y ~ 1,
  data = dat,
  family = gamlss.dist::NO(),
  mu_scope = ~ x1 + x2 + gamlss::pb(x3),
  B = 8,
  pi_thr = 0.6,
  trace = FALSE
)
fit$final_formula
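Stripped of the GAMLSS machinery and of SelectBoost's correlated resampling, the stability-selection loop behind sb_gamlss() is: draw B subsamples of size sample_fraction * n, run a stepwise search on each, count how often every candidate term is kept, and flag terms whose proportion reaches pi_thr. The sketch below shows that loop with lm() and stats::step() standing in for gamlss and gamlss::stepGAIC(); it is an illustration under those substitutions with a hypothetical helper name, not the package's implementation.

```r
# Hypothetical helper: plain subsampling stability selection for a linear model
stability_selection_sketch <- function(data, response, candidates, B = 20,
                                       sample_fraction = 0.7, pi_thr = 0.6, k = 2) {
  n <- nrow(data)
  counts <- setNames(integer(length(candidates)), candidates)
  upper <- reformulate(candidates, response = response)
  for (b in seq_len(B)) {
    sub <- data[sample.int(n, floor(sample_fraction * n)), , drop = FALSE]
    fit0 <- lm(reformulate("1", response = response), data = sub)
    fit <- step(fit0, scope = list(lower = ~1, upper = upper),
                direction = "both", k = k, trace = 0)
    kept <- attr(terms(fit), "term.labels")
    counts[kept] <- counts[kept] + 1L
  }
  prop <- counts / B
  data.frame(term = candidates, count = as.integer(counts),
             prop = as.numeric(prop), stable = prop >= pi_thr)
}

set.seed(1)
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
dat$y <- 2 * dat$x1 + rnorm(100)
stability_selection_sketch(dat, "y", c("x1", "x2", "x3"))
```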
Stability curves over a c0 grid for sb_gamlss
sb_gamlss_c0_grid(formula, data, family, mu_scope, sigma_scope = NULL, nu_scope = NULL, tau_scope = NULL, base_sigma = ~1, base_nu = ~1, base_tau = ~1, c0_grid = seq(0.1, 0.9, by = 0.1), B = 60, sample_fraction = 0.7, pi_thr = 0.6, k = 2, direction = c("both", "forward", "backward"), pre_standardize = FALSE, trace = TRUE, progress = TRUE, use_groups = TRUE, corr_func = "cor", group_fun = SelectBoost::group_func_2, ...)
| Argument | Description |
|---|---|
| formula | Base formula for the location parameter (mu). |
| data | Data frame. |
| family | A gamlss family object (e.g., gamlss.dist::NO()). |
| mu_scope | Formula of candidate terms for mu. |
| sigma_scope, nu_scope, tau_scope | Formulas of candidate terms for sigma, nu and tau (default NULL). |
| base_sigma, base_nu, base_tau | Optional base (always-included) formulas for sigma, nu and tau (default ~1). |
| c0_grid | Numeric vector of c0 values to evaluate. |
| B | Number of bootstrap subsamples for stability selection. |
| sample_fraction | Fraction of rows per subsample (e.g., 0.7). |
| pi_thr | Selection proportion threshold defining “stable” terms (e.g., 0.6). |
| k | Penalty weight for stepwise GAIC (k = 2 gives AIC). |
| direction | Stepwise direction for gamlss::stepGAIC(): "both", "forward" or "backward". |
| pre_standardize | Logical; standardize numeric predictors before penalized fits. |
| trace | Logical; print progress messages. |
| progress | Logical; show a progress bar across the c0 grid. |
| use_groups | Logical; use SelectBoost correlation groups during resampling. |
| corr_func | Correlation function passed to the SelectBoost grouping step (default "cor"). |
| group_fun | Grouping function passed to SelectBoost (default SelectBoost::group_func_2). |
| ... | Passed to the underlying engines. |
An object of class "SelectBoost_gamlss_grid" containing
results: named list of sb_gamlss fits, names are c0 values
table: data.frame with parameter, term, count, prop, c0
pi_thr: the threshold used
A thin wrapper around sb_gamlss() with SelectBoost-flavored arguments.
SelectBoost_gamlss(formula, data, family, mu_scope, sigma_scope = NULL, nu_scope = NULL, tau_scope = NULL, base_sigma = ~1, base_nu = ~1, base_tau = ~1, B = 100, sample_fraction = 0.7, pi_thr = 0.6, k = 2, direction = c("both", "forward", "backward"), pre_standardize = FALSE, use_groups = TRUE, c0 = 0.5, trace = TRUE, ...)
## S3 method for class 'SelectBoost_gamlss'
summary(object, prop.level = 0.6, ...)
## S3 method for class 'summary.SelectBoost_gamlss'
plot(x, ...)
| Argument | Description |
|---|---|
| formula | Base formula for the location parameter (mu). |
| data | Data frame. |
| family | A gamlss family object (e.g., gamlss.dist::NO()). |
| mu_scope | Formula of candidate terms for mu. |
| sigma_scope, nu_scope, tau_scope | Formulas of candidate terms for sigma, nu and tau (default NULL). |
| base_sigma, base_nu, base_tau | Optional base (always-included) formulas for sigma, nu and tau (default ~1). |
| B | Number of bootstrap subsamples for stability selection. |
| sample_fraction | Fraction of rows per subsample (e.g., 0.7). |
| pi_thr | Selection proportion threshold defining “stable” terms (e.g., 0.6). |
| k | Penalty weight for stepwise GAIC (k = 2 gives AIC). |
| direction | Stepwise direction for gamlss::stepGAIC(): "both", "forward" or "backward". |
| pre_standardize | Logical; standardize numeric predictors before penalized fits. |
| use_groups | Logical; enable SelectBoost grouping. |
| c0 | Correlation threshold for grouping (as in SelectBoost::group_func_2). |
| trace | Logical; print progress messages. |
| ... | Not used. |
| object | A SelectBoost_gamlss object. |
| prop.level | A target proportion level. |
| x | A summary of a SelectBoost_gamlss object. |
An object of class c("SelectBoost_gamlss"), with elements similar to those returned by sb_gamlss().
A list with selection, threshold and confidence.
Invisibly returns x.
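In SelectBoost, c0 acts as a correlation threshold: predictors whose absolute correlation reaches c0 are treated as a group, and these groups drive the correlated resampling. The sketch below shows only the simplest thresholding rule (for each predictor, collect the variables with |cor| >= c0). It is a simplification for illustration with a hypothetical helper name; SelectBoost::group_func_2, the default group_fun, may build groups differently.

```r
# Hypothetical helper: for each predictor, the names of all predictors whose
# absolute correlation with it is at least c0 (the predictor itself included)
correlation_groups_sketch <- function(X, c0 = 0.5) {
  R <- abs(cor(X))
  lapply(setNames(seq_len(ncol(X)), colnames(X)),
         function(j) colnames(X)[R[, j] >= c0])
}

set.seed(1)
z <- rnorm(100)
X <- cbind(a = z, b = z + rnorm(100, sd = 0.1), c = rnorm(100))
correlation_groups_sketch(X, c0 = 0.8)
```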
Selection table accessor
selection_table(x)
| Argument | Description |
|---|---|
| x | An sb_gamlss object. |
data.frame with parameter, term, count, prop
Evaluates a grid of configurations and picks the one maximizing a stability-based score, optionally penalized by complexity. Designed to be lightweight and robust.
tune_sb_gamlss(config_grid, base_args, score_lambda = 0, B_small = 30, metric = c("stability", "deviance"), K = 3, progress = TRUE)
| Argument | Description |
|---|---|
| config_grid | A list of named lists, each containing a subset of sb_gamlss() arguments (e.g., list(engine = "grpreg", engine_sigma = "sgl", grpreg_penalty = "grLasso", sgl_alpha = 0.9)). |
| base_args | A named list of arguments passed to sb_gamlss(). |
| score_lambda | Numeric; complexity penalty weight for the stability metric. |
| B_small | Number of bootstraps to use during tuning (default 30). |
| metric | Character; "stability" or "deviance". |
| K | Integer; number of folds for deviance cross-validation. |
| progress | Logical; show a progress bar across configurations. |
a list: best_config, scores (data.frame), and the fitted sb_gamlss object for the best config.
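The selection rule can be illustrated with a simple penalized score, stability minus score_lambda times the number of selected terms, computed for each configuration and maximized. How tune_sb_gamlss() actually defines its stability score and complexity is not documented here, so both columns below are assumptions with made-up values, and the helper name is hypothetical.

```r
# Made-up per-configuration summaries: a stability score and a model size
results <- data.frame(
  config = c("stepGAIC", "grpreg/grLasso", "glmnet"),
  stability = c(0.80, 0.75, 0.70),
  n_selected = c(10, 3, 6)
)

# Hypothetical helper: penalize larger models, then rank configurations
rank_configs_sketch <- function(results, score_lambda = 0) {
  results$score <- results$stability - score_lambda * results$n_selected
  results[order(-results$score), ]
}

rank_configs_sketch(results, score_lambda = 0)$config[1]     # no penalty
rank_configs_sketch(results, score_lambda = 0.01)$config[1]  # penalized
```

With no penalty the most stable configuration wins; a small complexity penalty shifts the choice toward the sparser grpreg configuration.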