---
title: "Focused Benchmark Workflow"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Focused Benchmark Workflow}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

The focused benchmark workflow is designed to demonstrate where FDA-aware
SelectBoost improves on plain SelectBoost for functional predictors: localized
dense signals, confounded blocks, high local correlation, and narrow active
regions.

The driver script writes only to an explicit `--output-dir` or, when omitted,
to `tempdir()`. The current named baseline is
`baseline_focused_benchmark_2026`. Its grid definition is stored in
`inst/extdata/benchmarks/config_focused_baseline.yml`, and each run writes a
copy named `benchmark_config_baseline.yml` beside the CSV outputs.

Three run profiles are available:

- `--quick --n-replicates=1` keeps the smoke-test benchmark small.
- `--medium` runs the n = 30 benchmark and writes `benchmark_summary_n30.csv`.
- `--final` runs the n = 50 benchmark and also writes
  `benchmark_summary_n50_or_n100.csv`.

All raw metrics include the replicate index, simulation seed, benchmark seed,
and method metadata. Summary tables include means, standard deviations, and
standard errors. `paired_gain_summary.csv` reports paired `F1` gains for
FDA-aware SelectBoost over plain SelectBoost, computed as the replicate-level
difference `F1_selectboost_fda - F1_plain_selectboost` for each matched
setting. The driver also writes `paired_gain_bootstrap_ci.csv` with
deterministic percentile bootstrap confidence intervals, win rates, valid
paired replicate counts, and method-failure flags.

Long runs also maintain progress and checkpoint files:

- `progress.tsv` is updated after study, replicate, simulation, and setting
  milestones.
- `benchmark_raw_metrics_checkpoint.csv` is appended at the cadence set by
  `--checkpoint-every=N` and at each completed replicate.
- `checkpoints/benchmark_raw_metrics_settingNNNNNN.csv` holds the
  setting-level checkpoints.
- `checkpoints/benchmark_raw_metrics_latest.csv` is overwritten with the
  latest checkpointed setting.
- `checkpoints/benchmark_raw_metrics_repNNN.csv` holds the per-replicate raw
  metrics.

Each run writes `run_metadata.yml`, creates `RUNNING` while active, writes
`COMPLETED` on successful completion, and removes `RUNNING` only after success,
so a `RUNNING` file without `COMPLETED` marks a run that is still active or was
interrupted. Use distinct `--output-dir` values for parallel runs. `--resume`
preserves previous checkpoint files but does not yet skip completed settings.

For reproducibility, the driver uses a recorded deterministic SelectBoost
perturbation backend by default. Pass `--upstream-rfast-rvmf` only when
comparing against the upstream `Rfast` perturbation generator directly.
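
The paired `F1` comparison and its percentile bootstrap interval can be
sketched in base R. This is only an illustration under stated assumptions: the
`F1` values are simulated placeholders rather than benchmark results, the
column names `f1_fda` and `f1_plain` are hypothetical, and the driver's own
resampling scheme and seed handling may differ.

```{r}
# Toy replicate-level F1 values for one matched setting (simulated placeholders,
# not benchmark results).
set.seed(1)
n_rep <- 30
paired <- data.frame(
  replicate = seq_len(n_rep),
  f1_fda    = pmin(pmax(rnorm(n_rep, mean = 0.70, sd = 0.08), 0), 1),
  f1_plain  = pmin(pmax(rnorm(n_rep, mean = 0.62, sd = 0.08), 0), 1)
)

# Replicate-level difference: F1_selectboost_fda - F1_plain_selectboost.
paired$gain <- paired$f1_fda - paired$f1_plain

# Percentile bootstrap over replicates; fixing the seed makes the interval
# reproducible from run to run.
set.seed(20260616)
boot_means <- replicate(
  2000,
  mean(sample(paired$gain, size = n_rep, replace = TRUE))
)

gain_summary <- data.frame(
  mean_gain = mean(paired$gain),
  ci_lower  = unname(quantile(boot_means, 0.025)),
  ci_upper  = unname(quantile(boot_means, 0.975)),
  win_rate  = mean(paired$gain > 0),
  n_valid   = sum(!is.na(paired$gain))
)
gain_summary
```

Resampling whole replicates keeps the pairing intact, which is why the
difference is formed before the bootstrap rather than after it.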

When interpreting the assessment outputs, read `benchmark_best_settings.csv`
only together with `assessment_all_setting_summary.csv`, since best-setting
rows on their own show only the most favorable cases. The driver also writes
`assessment_top_positive_settings.csv`, `assessment_negative_gain_settings.csv`, and
`assessment_failure_modes.csv`, so the report can state both where FDA-aware
grouping helps and where it loses against plain SelectBoost. A short defensible
interpretation is: FDA-aware grouping is most useful in settings with local
correlation and localized signal, but its advantage is not uniform; negative
gain rows and failure-mode labels should be shown alongside top positive gains.

The driver also produces a compact two-parameter perturbation analysis for
representative scenario types. It evaluates selection surfaces
`(q, c0) -> Pi_hat_j(q, c0)` on a smoke-test grid in `--quick` runs and on the
baseline grid `q = 0.5, 0.632, 0.8` crossed with `c0 = 0.9, 0.7, 0.5, 0.3` in
larger runs. The outputs are `assessment_surface_summary.csv`,
`assessment_monotonicity_summary.csv`, `assessment_precision_recall_paths.csv`, and
`assessment_best_thresholds.csv`. Together they support heatmap-like summaries,
monotonicity checks across the two axes, precision-recall paths, the best
threshold by `F1`, and fixed-threshold summaries at `0.5`, `0.75`, and `0.9`.
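
The grid and the fixed-threshold summaries described above can be sketched as
follows. The `q` and `c0` values come from the text; the selection frequencies,
the truth vector, and the helper `threshold_metrics()` are hypothetical and
only illustrate how one `(q, c0)` cell could be summarized.

```{r}
# Baseline perturbation grid: q crossed with c0.
surface_grid <- expand.grid(
  q  = c(0.5, 0.632, 0.8),
  c0 = c(0.9, 0.7, 0.5, 0.3)
)
nrow(surface_grid)  # 3 x 4 = 12 grid cells

# Hypothetical selection frequencies Pi_hat_j for six features at one (q, c0)
# cell, with a made-up truth vector for illustration only.
pi_hat <- c(0.95, 0.80, 0.60, 0.40, 0.20, 0.05)
truth  <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

# Precision, recall, and F1 when selecting features with Pi_hat_j >= threshold.
threshold_metrics <- function(threshold) {
  selected  <- pi_hat >= threshold
  tp        <- sum(selected & truth)
  precision <- if (any(selected)) tp / sum(selected) else NA_real_
  recall    <- tp / sum(truth)
  f1 <- if (tp > 0) 2 * precision * recall / (precision + recall) else 0
  data.frame(threshold = threshold, precision = precision,
             recall = recall, f1 = f1)
}

# Fixed-threshold summaries at 0.5, 0.75, and 0.9.
fixed <- do.call(rbind, lapply(c(0.5, 0.75, 0.9), threshold_metrics))
fixed

# Best threshold by F1 among the candidates.
best_threshold <- fixed$threshold[which.max(fixed$f1)]
best_threshold
```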

Association geometry is measured separately from selection performance. The
driver writes `association_diagnostics.csv` with sparsity, mean and median
association, within-block and cross-block mass, local and nonlocal mass, and
effective degree. It also writes `association_group_size_summary.csv` with the
number of induced groups and group-size summaries at each `c0`, plus
`assessment_association_comparison_table.csv` for compact method comparisons. The
diagnostic tables retain the same scenario, representation, size, noise, and
association-setting keys used by the benchmark metrics.
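
As a rough illustration of how group sizes and local mass respond to `c0`, the
sketch below uses a toy association matrix. Everything here is an assumption
made for the example: the decaying correlation structure, the rule that a
feature is grouped with all features whose association is at least `c0`, and
the bandwidth-based definition of local mass. The package's own diagnostics
may be defined differently.

```{r}
# Toy association matrix on a functional grid: association decays with the
# distance between grid points (an assumed structure, not package output).
p <- 20
idx <- seq_len(p)
grid_distance <- abs(outer(idx, idx, "-"))
assoc <- 0.9^grid_distance

# Assumed grouping rule: feature j is grouped with every feature whose
# association with j is at least c0 (feature j itself always qualifies).
group_size_summary <- function(c0) {
  sizes <- rowSums(assoc >= c0)
  data.frame(c0 = c0, mean_size = mean(sizes), max_size = max(sizes),
             n_singletons = sum(sizes == 1))
}
group_sizes <- do.call(rbind, lapply(c(0.9, 0.7, 0.5, 0.3), group_size_summary))
group_sizes

# Share of off-diagonal association mass within 4 grid points of the diagonal
# (one possible definition of "local mass").
off_diag <- assoc
diag(off_diag) <- 0
local_share <- sum(off_diag[grid_distance <= 4]) / sum(off_diag)
local_share
```

Lowering `c0` can only enlarge the groups under this rule, which is the kind
of monotone behavior the group-size table makes visible.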

Method comparison is measured separately from the main FDA-versus-plain
SelectBoost contrast. The driver writes `method_comparison_summary.csv`,
`method_comparison_runtime.csv`, and `assessment_method_comparison_table.csv` for
seven labeled methods: `plain_selectboost`, `selectboost_fda_lasso`,
`selectboost_fda_group_lasso`, `selectboost_fda_sparse_group_lasso`,
`stability_lasso`, `stability_group_lasso`, and
`stability_sparse_group_lasso`. These labels keep perturbation type separate
from base selector choice. Optional backends are guarded with package checks:
`glmnet` for lasso, `grpreg` for group lasso, and `SGL` for sparse-group
lasso. Missing packages produce skipped rows in the runtime and assessment tables
rather than aborting the benchmark.
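
A package guard of this kind can be sketched with `requireNamespace()`, which
returns `FALSE` instead of raising an error when a package is not installed.
The table layout and the `"available"` / `"skipped"` labels below are
illustrative assumptions, not the driver's actual output format.

```{r}
# Map each optional base selector to the package the text says it needs.
backends <- c(
  lasso = "glmnet",
  group_lasso = "grpreg",
  sparse_group_lasso = "SGL"
)

# TRUE where the package can be loaded, FALSE otherwise (no error is raised).
available <- vapply(backends, requireNamespace, logical(1), quietly = TRUE)

# One status row per selector; missing backends are marked as skipped.
backend_status <- data.frame(
  selector = names(backends),
  package  = unname(backends),
  status   = ifelse(available, "available", "skipped"),
  row.names = NULL
)
backend_status
```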

Runtime reporting is available at two granularities. `runtime_by_setting.csv`
keeps one row per main benchmark method and setting, including elapsed time,
user time, system time, warning and failure counts, selected-feature summaries,
and fitted-object memory size where available. `runtime_by_method.csv`
summarizes the same diagnostics by method and runtime source, combining the
main benchmark and method-comparison runs. Failed settings are represented as
failed runtime rows, so they can be counted in assessment tables instead of being
silently dropped.
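
One way to record a failed fit as a runtime row rather than dropping it is
sketched below. The helper `time_one_setting()` and its column names are
hypothetical; they only show how `system.time()`, `tryCatch()`, and
`withCallingHandlers()` can be combined so that timings, warning counts, and
failures end up in the same table.

```{r}
# Time one fit and capture failures as a row instead of stopping.
# `fit_fun` stands in for any benchmark method (a hypothetical placeholder).
time_one_setting <- function(method, fit_fun) {
  fit <- NULL
  n_warnings <- 0L
  failed <- FALSE
  timing <- system.time(
    fit <- withCallingHandlers(
      tryCatch(fit_fun(), error = function(e) {
        failed <<- TRUE
        NULL
      }),
      warning = function(w) {
        n_warnings <<- n_warnings + 1L
        invokeRestart("muffleWarning")
      }
    )
  )
  data.frame(
    method = method,
    elapsed = timing[["elapsed"]],
    user = timing[["user.self"]],
    system = timing[["sys.self"]],
    n_warnings = n_warnings,
    failed = failed,
    # Fitted-object size in bytes; NA when the fit failed.
    object_bytes = if (is.null(fit)) NA_real_ else as.numeric(object.size(fit))
  )
}

runtime_rows <- rbind(
  time_one_setting("ok_method", function() lm(dist ~ speed, data = cars)),
  time_one_setting("failing_method", function() stop("simulated failure"))
)
runtime_rows
```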

Size-resolution sensitivity is controlled with comma-separated grids:
`--n-grid=50,100,200` and `--grid-length-grid=30,75,150`. These values are
crossed with the scenario, representation, and SelectBoost setting grids. The
driver writes `benchmark_size_resolution_summary.csv` for recovery metrics by
`n` and `grid_length`, plus `benchmark_runtime_by_size_resolution.csv` for
runtime summaries at the corresponding setting level.
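
To see how quickly the crossed grids grow, the flag values can be parsed and
combined in a few lines. The helper `parse_grid_flag()` is a hypothetical
stand-in and is not the driver's own argument parser.

```{r}
# Parse a comma-separated grid flag such as "--n-grid=50,100,200".
parse_grid_flag <- function(arg) {
  value <- sub("^--[^=]+=", "", arg)
  as.numeric(strsplit(value, ",", fixed = TRUE)[[1]])
}

n_grid <- parse_grid_flag("--n-grid=50,100,200")
grid_length_grid <- parse_grid_flag("--grid-length-grid=30,75,150")

# Cross sample size with functional resolution: 3 x 3 = 9 combinations,
# before multiplying by scenarios, representations, and SelectBoost settings.
size_resolution <- expand.grid(n = n_grid, grid_length = grid_length_grid)
nrow(size_resolution)
```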

Noise and signal-strength sensitivity are controlled with
`--snr-grid=0.5,1,2,4` and `--noise-sd-grid=0.5,1,2`. The fixed-SNR grid is the
recommended axis for the main method comparison because it keeps relative
difficulty comparable across scenarios, representations, and signal structures.
The fixed-noise grid is useful as a stress test for absolute observation noise.
Both axes are recorded in the raw metrics and summaries through `noise_axis`,
`snr`, `noise_sd`, `effective_snr`, and `effective_variance_snr`. Here `snr`
is a signal-to-noise standard-deviation ratio; `effective_variance_snr` is
`effective_snr^2` for variance-ratio reporting. The driver writes
`benchmark_noise_summary.csv` for performance by noise condition and
`benchmark_noise_f1_gain_panel.csv` for a plot-ready `F1` gain panel.
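
The relationship between the two noise axes can be made concrete with a toy
signal. The sketch assumes that the signal standard deviation is taken over
the noiseless response values; how the driver defines the signal scale is not
stated here, so treat the formulas as an illustration of the SD-ratio
convention only.

```{r}
# Toy noiseless signal: a localized bump (illustrative only).
signal <- dnorm(seq(-3, 3, length.out = 200))

# Fixed-SNR axis: choose the noise SD so that sd(signal) / noise_sd equals snr.
snr <- 2
noise_sd_from_snr <- sd(signal) / snr

# Fixed-noise axis: the noise SD is given, and the effective SNR follows.
noise_sd <- 0.5
effective_snr <- sd(signal) / noise_sd
effective_variance_snr <- effective_snr^2  # equals var(signal) / noise_sd^2

c(
  noise_sd_from_snr = noise_sd_from_snr,
  effective_snr = effective_snr,
  effective_variance_snr = effective_variance_snr
)
```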

## Campaign interface

The driver accepts comma-separated grids for assessment and extended benchmark
campaigns:

- `--representation-grid=grid,bspline,fpca` chooses the functional encoding.
- `--scenario-grid=localized_dense,confounded_blocks,smooth_sparse` chooses
  the simulation scenarios.
- `--n-grid=50,100,200` and `--grid-length-grid=30,75,150` choose sample size
  and functional resolution.
- `--snr-grid=0.5,1,2,4` and `--noise-sd-grid=0.5,1,2` choose fixed-SNR and
  fixed-noise sensitivity axes.
- `--q-grid=0.5,0.632,0.8` and `--c0-grid=0.9,0.7,0.5,0.3` choose the
  subsampling and SelectBoost perturbation surface.
- `--association-grid=correlation,neighborhood,hybrid,interval` and
  `--bandwidth-grid=4,8` choose the FDA-aware grouping geometry.
- `--bootstrap-reps=2000` controls the deterministic bootstrap used for paired
  `F1` gain confidence intervals.
- `--checkpoint-every=100` controls setting-level raw-metric checkpoint
  frequency; use `--checkpoint-every=1` when every completed setting should be
  materialized immediately.
- `--resume` preserves existing checkpoint files in an output directory but
  does not yet skip previously completed settings.
- `--surface-use-main-settings` makes surface diagnostics inherit `n`,
  `grid_length`, and noise/SNR from the first representative row of the main
  simulation grid. It does not run surface diagnostics for every main-grid
  setting.
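
Before relaunching into an existing output directory with `--resume`, it can
help to check the `RUNNING` and `COMPLETED` marker files. The helper
`run_status()` and its status labels are hypothetical; only the marker file
names come from the driver description.

```{r}
# Classify an output directory from its marker files.
run_status <- function(output_dir) {
  if (!dir.exists(output_dir)) return("missing")
  if (file.exists(file.path(output_dir, "COMPLETED"))) return("completed")
  # RUNNING is removed only after success, so a leftover RUNNING file means
  # the run is still active or was interrupted.
  if (file.exists(file.path(output_dir, "RUNNING"))) {
    return("running_or_interrupted")
  }
  "not_started"
}

# Demonstration on a scratch directory that mimics an interrupted run.
demo_dir <- file.path(tempdir(), "run_status_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, "RUNNING")))
run_status(demo_dir)
```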

Assessment-oriented summaries, perturbation surfaces, and association diagnostics
are written by default for compatibility with the saved baseline outputs. Use
`--assessment-summary`, `--save-surfaces`, and
`--save-association-diagnostics` to make that choice explicit in command logs.
For lighter local runs, use `--no-save-surfaces` or
`--no-save-association-diagnostics` to skip the heavier optional diagnostics.
The broader `--no-assessment-summary` shortcut disables both of those optional
diagnostic families while keeping the core benchmark and paired-gain tables.

```{r, eval = FALSE}
# Smoke-test run: one replicate on a reduced grid, checkpointing every setting.
system2(
  file.path(R.home("bin"), "Rscript"),
  c(
    "tools/run_focused_benchmark.R",
    "--quick",
    "--n-replicates=1",
    "--seed=20260616",
    "--n-grid=50,100",
    "--grid-length-grid=30,75",
    "--snr-grid=0.5,1,2,4",
    "--checkpoint-every=1",
    paste0("--output-dir=", file.path(tempdir(), "selectboost_fda_focused_benchmark"))
  )
)
```

```{r, eval = FALSE}
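# Campaign run: the --medium profile with every grid and diagnostic flag
# stated explicitly so the command log is self-describing.
system2(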
  file.path(R.home("bin"), "Rscript"),
  c(
    "tools/run_focused_benchmark.R",
    "--medium",
    "--seed=20260616",
    "--representation-grid=grid,bspline",
    "--scenario-grid=localized_dense,confounded_blocks,smooth_sparse",
    "--n-grid=50,100",
    "--grid-length-grid=30,75",
    "--snr-grid=0.5,1,2,4",
    "--q-grid=0.5,0.632,0.8",
    "--c0-grid=0.9,0.7,0.5,0.3",
    "--association-grid=correlation,neighborhood,hybrid,interval",
    "--bandwidth-grid=4,8",
    "--checkpoint-every=100",
    "--assessment-summary",
    "--save-surfaces",
    "--surface-use-main-settings",
    "--save-association-diagnostics",
    "--bootstrap-reps=2000",
    paste0("--output-dir=", file.path(tempdir(), "selectboost_fda_focused_campaign"))
  )
)
```

```{r, eval = FALSE}
# Default --medium run: only the seed and the output directory are set.
system2(
  file.path(R.home("bin"), "Rscript"),
  c(
    "tools/run_focused_benchmark.R",
    "--medium",
    "--seed=20260616",
    paste0("--output-dir=", file.path(tempdir(), "selectboost_fda_focused_n30"))
  )
)
```

The package also exposes table extractors for report material.

```{r}
library(SelectBoost.FDA)

report_summary_table()
report_method_table()
report_formula_blocks()[c("selection_surface", "precision_recall")]
```

Saved sensitivity-study artifacts shipped with the package can be turned into a
compact benchmark table:

```{r}
report_benchmark_table(top_n = 5)
```

For new benchmark objects, use the renderer-neutral summary extractor:

```{r}
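# Minimal metrics table with one row per method for a single scenario
# (illustrative values, not benchmark results).
metrics <- data.frame(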
  scenario = "localized_dense",
  representation = "grid",
  family = "gaussian",
  method = c("selectboost", "plain_selectboost"),
  level = "feature",
  precision = c(0.8, 0.6),
  recall = c(0.7, 0.6),
  f1 = c(0.746, 0.6),
  jaccard = c(0.59, 0.43),
  selection_rate = c(0.2, 0.3),
  stringsAsFactors = FALSE
)

as_benchmark_summary_data(metrics)
```