This vignette provides a CRAN-friendly tour of the SelectBoost.beta
workflow. It simulates a reproducible beta-regression data set, runs the
high-level sb_beta() driver, and shows how to interpret the
stability matrix returned by the algorithm. All code is self-contained
and executes quickly under the default knitr settings.
We use the built-in simulation_DATA.beta() helper to
generate a correlated design with three truly associated predictors. The
response lives in (0, 1) and is already compatible with the
beta-regression selectors.
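To make the shape of such a data set concrete, the following base-R sketch builds a comparable toy example by hand. It is not the code behind simulation_DATA.beta(); the sample size, the correlation `rho`, the coefficient vector `beta`, and the precision `phi` are all illustrative assumptions.

```r
set.seed(42)
n <- 150
p <- 6

# AR(1)-style correlation between columns (rho is an illustrative choice)
rho <- 0.5
Sigma <- rho^abs(outer(seq_len(p), seq_len(p), "-"))
X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)
colnames(X) <- paste0("x", seq_len(p))

# Only the first three predictors carry signal (assumed effect sizes)
beta <- c(1.1, -0.9, 0.7, 0, 0, 0)
mu <- plogis(drop(X %*% beta))   # mean response mapped into (0, 1) by the logit link
phi <- 20                        # assumed precision of the beta distribution

# Beta draws parameterised by mean and precision
Y <- rbeta(n, shape1 = mu * phi, shape2 = (1 - mu) * phi)

round(cor(X)[1:3, 1:3], 2)
range(Y)
```

The mean-precision parameterisation (shape1 = mu * phi, shape2 = (1 - mu) * phi) is the usual one in beta regression, which is why the response can be passed to beta-regression selectors without further transformation.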
The sb_beta() wrapper orchestrates the full SelectBoost
loop: it normalises the design matrix, groups correlated predictors,
regenerates surrogate designs, and records selection frequencies for
each threshold.
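The first two steps of that loop, normalising the design and grouping correlated predictors, can be sketched in base R. This is a simplified illustration under assumptions, not the internals of sb_beta(): the toy matrix, the threshold name `c0`, and the rule "absolute correlation of at least c0" are chosen here for the example.

```r
set.seed(1)
n <- 100
z <- rnorm(n)
X <- cbind(a = z + rnorm(n, sd = 0.2),   # a and b share the latent factor z
           b = z + rnorm(n, sd = 0.2),
           c = rnorm(n))                 # c is independent noise

# Centre each column, then scale it to unit Euclidean norm
Xc <- sweep(X, 2, colMeans(X))
Xn <- sweep(Xc, 2, sqrt(colSums(Xc^2)), "/")

# For centred unit-norm columns the cross-product equals the correlation matrix
R <- crossprod(Xn)

# For a threshold c0, the group of predictor j holds every column whose
# absolute correlation with j reaches c0 (each predictor belongs to its own group)
c0 <- 0.8
groups <- lapply(seq_len(ncol(R)), function(j) colnames(R)[abs(R[j, ]) >= c0])
names(groups) <- colnames(R)
groups
```

Lowering `c0` enlarges the groups, so more predictors are perturbed together when surrogate designs are regenerated; this is why results are reported per threshold.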
The returned matrix has one row per correlation threshold. Attributes attached to the matrix document how the fit was produced:
Use summary() to obtain per-threshold summaries and
autoplot.sb_beta() (when ggplot2 is available)
to visualise the stability matrix.
The frequency values range between 0 and 1 and report how often each
predictor received a non-zero coefficient across the correlated
replicates. High values signal stable selections. If your data contain
zeros or ones, keep squeeze = TRUE (the default) so the
algorithm applies the standard SelectBoost transformation before fitting
the selectors.
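Both ideas in this paragraph can be illustrated in a few lines of base R. The coefficient matrix below is made up for the example, and the squeeze formula is one commonly used transformation for boundary responses in beta regression; whether it matches the package's internal formula exactly is an assumption. The helper name `squeeze01` is hypothetical.

```r
# Hypothetical coefficients from 5 replicates (rows) for 3 predictors (columns)
coefs <- rbind(c(0.8, 0.0,  0.1),
               c(0.7, 0.0,  0.0),
               c(0.9, 0.2,  0.0),
               c(0.6, 0.0,  0.0),
               c(0.8, 0.0, -0.3))
colnames(coefs) <- c("x1", "x2", "x3")

# Selection frequency: share of replicates with a non-zero coefficient
freq <- colMeans(coefs != 0)
freq

# A commonly used squeeze for responses that touch 0 or 1 (assumed here):
# y* = (y * (n - 1) + 0.5) / n, which moves every value strictly inside (0, 1)
squeeze01 <- function(y) {
  n <- length(y)
  (y * (n - 1) + 0.5) / n
}

y <- c(0, 0.25, 0.5, 1)
squeeze01(y)
```

In this toy matrix x1 is selected in every replicate, while x2 and x3 are selected only occasionally, which is the pattern a stability matrix is meant to expose.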
When you wish to benchmark multiple selector families, the
compare_selectors_single() helper runs them once on the
same data set and returns both raw coefficients and a tidy summary
table. Column names are briefly shortened internally to satisfy each
selector and then mapped back in the outputs.
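The shorten-then-restore step can be pictured with a small base-R sketch. It only illustrates the idea of a temporary name map; the short-name scheme (`v1`, `v2`, ...) and the stand-in coefficient vector are assumptions, not what compare_selectors_single() does internally.

```r
long_names <- c("systolic_blood_pressure_mmHg", "body mass index", "age")
X <- matrix(rnorm(12), nrow = 4, dimnames = list(NULL, long_names))

# Temporary short, syntactically safe names and a lookup back to the originals
short_names <- paste0("v", seq_along(long_names))
name_map <- setNames(long_names, short_names)

X_short <- X
colnames(X_short) <- short_names

# Stand-in for a selector's output, keyed by the short names
coef_short <- c(v1 = 0.4, v2 = 0, v3 = -0.2)

# Map the short names back to the original column names
coef_long <- setNames(coef_short, unname(name_map[names(coef_short)]))
coef_long
```

Keeping the lookup as a named vector means the restoration still works if a selector returns coefficients in a different order or for a subset of columns.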
Bootstrap tallies add a stability perspective. The freq
column in the table below measures the proportion of resamples where the
variable was selected; values close to 1 indicate consistent
discoveries.
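The tallying idea can be sketched without the package. In the base-R example below the selector is a deliberately simple stand-in (a linear model on the logit scale with a p-value cut-off of 0.05), chosen only to keep the example self-contained; it is not one of the selectors used by compare_selectors_bootstrap(), and the function name `select_vars` is hypothetical.

```r
set.seed(99)
n <- 120
X <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, paste0("x", 1:4)))
y <- plogis(1.5 * X[, 1] + rnorm(n, sd = 0.5))   # only x1 drives the response

# Stand-in selector: flag predictors with p < 0.05 in a logit-scale linear model
select_vars <- function(X, y) {
  fit <- lm(qlogis(y) ~ X)
  pvals <- summary(fit)$coefficients[-1, "Pr(>|t|)"]   # drop the intercept row
  setNames(pvals < 0.05, colnames(X))
}

# Resample rows with replacement, rerun the selector, and tally the selections
B <- 50
hits <- replicate(B, {
  idx <- sample.int(n, replace = TRUE)
  select_vars(X[idx, , drop = FALSE], y[idx])
})

# Proportion of resamples in which each variable was selected
freq <- rowMeans(hits)
freq
```

A truly associated predictor such as x1 should be selected in nearly every resample, whereas noise variables are picked up only sporadically.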
freq <- suppressWarnings(compare_selectors_bootstrap(sim$X, sim$Y, B = 100,
  include_enet = FALSE, seed = 99))
head(freq)

Merge both views with compare_table() and use
plot_compare_coeff() or plot_compare_freq()
for quick diagnostics.
If your outcome is interval-censored, run the
sb_beta_interval() convenience wrapper. It enables the
interval sampling logic inside sb_beta() while keeping the
same output format and attributes.
y_low <- pmax(sim$Y - 0.05, 0)
y_high <- pmin(sim$Y + 0.05, 1)
interval_fit <- sb_beta_interval(sim$X, y_low, y_high, B = 30,
  sample = "uniform", seed = 321)
attr(interval_fit, "interval")

The resulting stability matrix can be summarised and visualised exactly like the point-response output shown earlier.
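As a rough picture of what uniform interval sampling means, the base-R sketch below draws one pseudo-response per observation from within its interval and repeats the draw a few times. It is an assumption about the general idea, not the code used by sb_beta_interval(); the helper name `draw_uniform` is hypothetical.

```r
set.seed(321)
y <- c(0.02, 0.30, 0.55, 0.98)
y_low  <- pmax(y - 0.05, 0)   # lower bounds, clipped at 0
y_high <- pmin(y + 0.05, 1)   # upper bounds, clipped at 1

# One pseudo-response per observation, uniform within its own interval
draw_uniform <- function(lo, hi) runif(length(lo), min = lo, max = hi)

# Three replicate draws: rows are observations, columns are replicates
draws <- replicate(3, draw_uniform(y_low, y_high))
round(draws, 3)
```

Each replicate would then be passed to the selector in place of a fixed response, so the resulting selection frequencies reflect the uncertainty about where the true value lies within each interval.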