Package: bigPLSR 0.7.2

bigPLSR: Partial Least Squares Regression Models with Big Matrices

Fast partial least squares (PLS) for dense and out-of-core data. Provides SIMPLS (straightforward implementation of a statistically inspired modification of the PLS method) and NIPALS (non-linear iterative partial least squares) solvers, plus kernel-style PLS variants ('kernelpls' and 'widekernelpls') with parity to 'pls'. Optimized for 'bigmemory'-backed matrices with streamed cross-products and chunked BLAS (Basic Linear Algebra Subprograms) operations (XtX/XtY and XXt/YX), optional file-backed score sinks, and deterministic testing helpers. Includes an auto-selection strategy that chooses between XtX SIMPLS, XXt (wide) SIMPLS, and NIPALS based on (n, p) and a configurable memory budget.

For background on the package, see Bertrand and Maumy (2023) <https://hal.science/hal-05352069> and <https://hal.science/hal-05352061>, which highlight fitting and cross-validating PLS regression models on big data. For more details about some of the techniques featured in the package, see Dayal and MacGregor (1997) <doi:10.1002/(SICI)1099-128X(199701)11:1%3C73::AID-CEM435%3E3.0.CO;2-%23>, Rosipal & Trejo (2001) <https://www.jmlr.org/papers/v2/rosipal01a.html>, Tenenhaus, Viennet, and Saporta (2007) <doi:10.1016/j.csda.2007.01.004>, Rosipal (2004) <doi:10.1007/978-3-540-45167-9_17>, Rosipal (2019) <https://ieeexplore.ieee.org/document/8616346>, and Song, Wang, and Bai (2024) <doi:10.1016/j.chemolab.2024.105238>.

Also includes kernel logistic PLS with 'C++'-accelerated alternating iteratively reweighted least squares (IRLS) updates, streamed reproducing kernel Hilbert space (RKHS) solvers with reusable centering statistics, and bootstrap diagnostics with graphical summaries for coefficients, scores, and cross-validation workflows, alongside dedicated plotting utilities for individuals, variables, ellipses, and biplots.

The streaming backend uses far less memory than the dense in-memory solvers and keeps memory usage bounded as the data size grows. For PLS1, streaming is often fast enough while preserving a small memory footprint; for PLS2 it remains competitive with a bounded footprint. On small problems that fit comfortably in RAM (random-access memory), dense in-memory solvers are slightly faster; the crossover occurs as n or p grow and the Gram/cross-product cost dominates.
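
The streamed cross-products mentioned in the description can be illustrated with a minimal base-R sketch. It is not the package's implementation: `chunked_crossprod` and its `chunk_rows` argument are hypothetical names introduced here, an ordinary in-memory matrix stands in for a 'bigmemory'-backed one, and centering is omitted for brevity. The idea shown is only that XtX and XtY can be accumulated block by block, so that a single row block has to be held in memory at a time.

```r
# Hypothetical helper (not part of bigPLSR): accumulate X'X and X'Y
# over row blocks so that only one block is materialised at a time.
chunked_crossprod <- function(X, Y, chunk_rows = 1000L) {
  n <- nrow(X)
  XtX <- matrix(0, ncol(X), ncol(X))
  XtY <- matrix(0, ncol(X), ncol(Y))
  for (start in seq(1L, n, by = chunk_rows)) {
    idx <- start:min(start + chunk_rows - 1L, n)
    Xc <- X[idx, , drop = FALSE]    # current row block of X
    Yc <- Y[idx, , drop = FALSE]    # matching rows of Y
    XtX <- XtX + crossprod(Xc)      # adds t(Xc) %*% Xc
    XtY <- XtY + crossprod(Xc, Yc)  # adds t(Xc) %*% Yc
  }
  list(XtX = XtX, XtY = XtY)
}

# Simulated data, assumed purely for illustration
set.seed(1)
X <- matrix(rnorm(5000 * 8), nrow = 5000, ncol = 8)
Y <- matrix(rnorm(5000 * 2), nrow = 5000, ncol = 2)
res <- chunked_crossprod(X, Y, chunk_rows = 512L)

# The accumulated blocks should agree with the one-shot products
stopifnot(isTRUE(all.equal(res$XtX, crossprod(X))))
stopifnot(isTRUE(all.equal(res$XtY, crossprod(X, Y))))
```

In this sketch the accumulators are p x p and p x q regardless of n, which is why the footprint stays bounded as the number of rows grows; the cost of forming these products is the Gram/cross-product cost referred to above.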

Authors: Frederic Bertrand [cre, aut], Myriam Maumy [aut]


# Install 'bigPLSR' in R:
install.packages('bigPLSR', repos = c('https://fbertran.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/fbertran/bigplsr/issues

Pkgdown/docs site: https://fbertran.github.io

Uses libs:
  • openblas – Optimized BLAS
  • c++ – GNU Standard C++ Library v3

5.75 score | 1 star | 16 scripts | 142 downloads | 22 exports | 6 dependencies

Last updated from: 2033eb023b. Checks: 13 OK. Indexed: yes.

Target | Result | Time
linux-devel-arm64 | OK | 301
linux-devel-x86_64 | OK | 243
source / vignettes | OK | 376
linux-release-arm64 | OK | 256
linux-release-x86_64 | OK | 260
macos-release-arm64 | OK | 152
macos-release-x86_64 | OK | 327
macos-oldrel-arm64 | OK | 167
macos-oldrel-x86_64 | OK | 438
windows-devel | OK | 331
windows-release | OK | 275
windows-oldrel | OK | 294
wasm-release | OK | 215

Exports: .finalize_pls_fit, bigPLSR_stream_kstats, kf_pls_state_fit, kf_pls_state_new, kf_pls_state_update, plot_pls_biplot, plot_pls_bootstrap_coefficients, plot_pls_bootstrap_scores, plot_pls_individuals, plot_pls_variables, plot_pls_vip, pls_bootstrap, pls_cross_validate, pls_cv_select, pls_fit, pls_information_criteria, pls_predict_response, pls_predict_scores, pls_select_components, pls_threshold, pls_vip, summarise_pls_bootstrap

Dependencies: BH, bigmemory, bigmemory.sri, Rcpp, RcppArmadillo, uuid

Visualising PLS Fits with bigPLSR
Example data | Score plots with ellipses | Variable correlations and biplots | Bootstrap summaries

Last update: 2025-11-18
Started: 2025-11-10

Automatic Algorithm Selection in bigPLSR
When does each win? | Sanity check | Overview | Why these choices? | The decision rule | Configuring the memory budget | Reproducibility knobs | Examples | References | Appendix: streaming Gram math

Last update: 2025-11-18
Started: 2025-11-08

Benchmarking bigPLSR against external PLS implementations
Overview | Benchmark design | Helper summaries | Example: PLS1, fixed size, varying components | Example: PLS2, fixed size, varying components | Short commentary

Last update: 2025-11-18
Started: 2025-11-18

Benchmarking PLS1 Implementations
Overview | Simulated data | Internal benchmarks | External references | Takeaways

Last update: 2025-11-18
Started: 2025-11-04

Benchmarking PLS2 Implementations
Overview | Recent additions | Simulated data | Internal benchmarks | External references | Key messages

Last update: 2025-11-18
Started: 2025-11-04

Bootstrap strategies for bigPLSR
Introduction | Baseline fit | (X, Y) bootstrap | (X, T) bootstrap | Exploring bootstrap scores | Parallel execution | Conclusion

Last update: 2025-11-18
Started: 2025-11-10

Cross-validation and Information Criteria in bigPLSR
Overview | Cross-validation | Information criteria | Parallel execution with future | Summary

Last update: 2025-11-18
Started: 2025-11-10

Double RKHS PLS (rkhs_xy): Theory and Usage
Overview | Operator and Latent Directions | Centering for Prediction | Practical Notes | Minimal Example

Last update: 2025-11-18
Started: 2025-11-09

External PLS benchmarks for bigPLSR: detailed analysis
Introduction | Benchmark design and data structure | PLS1: dense versus streaming | Fixed size, varying number of components | Relative speed and memory ratios | PLS2: multiple responses | Influence of the number of responses | Kernel and wide kernel PLS | Discussion and practical guidance

Last update: 2025-11-18
Started: 2025-11-18

Kernel and Streaming PLS Methods in bigPLSR
Notation | Pseudo-code for bigPLSR algorithms | SIMPLS (dense/bigmem) | NIPALS (dense/streamed) | Kernel PLS / RKHS (dense & streamed) | Double RKHS (algorithm = "rkhs_xy") | Kalman-filter PLS (algorithm = "kf_pls") | Centering the Gram matrix | KLPLS / Kernel PLS (Dayal & MacGregor) | Streaming Gram blocks (column- and row-chunked) | Kernel approximations: Nyström and Random Fourier Features | Kernel Logistic PLS (binary classification) | Sparse Kernel PLS (sketch) | PLS in RKHS for X and Y (double RKHS) | Kalman-Filter PLS (KF-PLS; streaming) | API quick start | Prediction in RKHS PLS | Dependency overview (wrappers → C++ entry points) | References

Last update: 2025-11-18
Started: 2025-11-08

Kernel Logistic PLS
Kernel Logistic PLS (klogitpls)

Last update: 2025-11-18
Started: 2025-11-10

KF-PLS: Streaming PLS with Kalman-style updates
Idea | API | Notes

Last update: 2025-11-18
Started: 2025-11-10

RKHS-based Algorithms in bigPLSR
Overview | Dense example | Streaming example | Logistic response

Last update: 2025-11-18
Started: 2025-11-10

Streaming Kernel PLS in bigPLSR: XX^T and Column-Chunked Variants
Overview | Math sketch | APIs | When to prefer each variant | References

Last update: 2025-11-18
Started: 2025-11-08

Readme and manuals

Help Manual

Help page | Topics
bigPLSR-package | bigPLSR-package bigPLSR
Finalize pls objects | .finalize_pls_fit
Streamed centering statistics for RKHS kernels | bigPLSR_stream_kstats
Fast IRLS for binomial logit with class weights | cpp_irls_binomial
Internal kernel and wide-kernel PLS solver | cpp_kernel_pls
Benchmark results against external PLS implementations | external_pls_benchmarks
Finalize a KF-PLS state into a fitted model | kf_pls_state_fit
KF-PLS streaming state (constructor) | kf_pls_state_new
Update a KF-PLS streaming state with a mini-batch | kf_pls_state_update
PLS biplot | plot_pls_biplot
Boxplots of bootstrap coefficient distributions | plot_pls_bootstrap_coefficients
Boxplots of bootstrap score distributions | plot_pls_bootstrap_scores
Plot individual scores | plot_pls_individuals
Plot variable loadings | plot_pls_variables
Plot Variable Importance in Projection (VIP) | plot_pls_vip
Bootstrap a PLS model | pls_bootstrap
Cross-validate PLS models | pls_cross_validate
Select components from cross-validation results | pls_cv_select
Unified PLS fit with auto backend and selectable algorithm | pls_fit
Compute information criteria for component selection | pls_information_criteria
Predict responses from a PLS fit | pls_predict_response
Predict latent scores from a PLS fit | pls_predict_scores
Component selection via information criteria | pls_select_components
Naive sparsity control by coefficient thresholding | pls_threshold
Variable importance in projection (VIP) scores | pls_vip
Predict method for big_plsr objects | predict.big_plsr
Print a 'summary.big_plsr' object | print.summary.big_plsr
Summarise bootstrap estimates | summarise_pls_bootstrap
Summarize a 'big_plsr' model | summary.big_plsr