Title: | Degrees of Freedom and Statistical Inference for Partial Least Squares Regression |
---|---|
Description: | The plsdof package provides Degrees of Freedom estimates for Partial Least Squares (PLS) Regression. Model selection for PLS is based on various information criteria (aic, bic, gmdl) or on cross-validation. Estimates for the mean and covariance of the PLS regression coefficients are available. They allow the construction of approximate confidence intervals and the application of test procedures (Kraemer and Sugiyama 2011 <doi:10.1198/jasa.2011.tm10107>). Further, cross-validation procedures for Ridge Regression and Principal Components Regression are available. |
Authors: | Nicole Kraemer, Mikio L. Braun |
Maintainer: | Frederic Bertrand <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.3-2 |
Built: | 2025-01-17 04:52:15 UTC |
Source: | https://github.com/fbertran/plsdof |
The plsdof package provides Degrees of Freedom estimates for Partial Least Squares (PLS) Regression.
Model selection for PLS is based on various information criteria (aic, bic, gmdl) or on cross-validation. Estimates for the mean and covariance of the PLS regression coefficients are available. They allow the construction of approximate confidence intervals and the application of test procedures.
Further, cross-validation procedures for Ridge Regression and Principal Components Regression are available.
Package: | plsdof |
Type: | Package |
Version: | 0.2-9 |
Date: | 2019-01-31 |
License: | GPL (>=2) |
LazyLoad: | yes |
Nicole Kraemer, Mikio L. Braun
Maintainer: Frederic Bertrand <[email protected]>
Kraemer, N., Sugiyama M. (2011). "The Degrees of Freedom of Partial Least Squares Regression". Journal of the American Statistical Association 106 (494) https://www.tandfonline.com/doi/abs/10.1198/jasa.2011.tm10107
Kraemer, N., Braun, M.L. (2007) "Kernelizing PLS, Degrees of Freedom, and Efficient Model Selection", Proceedings of the 24th International Conference on Machine Learning, Omni Press, 441 - 448
```r
# Boston Housing data
library(MASS)
data(Boston)
X <- as.matrix(Boston[, -14])
y <- as.vector(Boston[, 14])

# compute PLS coefficients for the first 5 components and plot Degrees of Freedom
my.pls1 <- pls.model(X, y, m = 5, compute.DoF = TRUE)
plot(0:5, my.pls1$DoF, pch = "*", cex = 3, xlab = "components", ylab = "DoF", ylim = c(0, 14))
# add naive estimate
lines(0:5, 1:6, lwd = 3)

# model selection with the Bayesian Information criterion
mypls2 <- pls.ic(X, y, criterion = "bic")

# model selection based on cross-validation;
# compute.covariance=TRUE returns the estimated covariance matrix of the regression coefficients
mypls3 <- pls.cv(X, y, compute.covariance = TRUE)
my.vcov <- vcov(mypls3)
my.sd <- sqrt(diag(my.vcov))  # standard deviation of the regression coefficients
```
This function computes the test error over several runs for different model selection strategies.
```r
benchmark.pls(X, y, m = ncol(X), R = 20, ratio = 0.8, verbose = TRUE, k = 10,
              ratio.samples = 1, use.kernel = FALSE, criterion = "bic",
              true.coefficients = NULL)
```
X | matrix of predictor observations.
y | vector of response observations. The length of y is equal to the number of rows of X.
m | maximal number of Partial Least Squares components. Default is ncol(X).
R | number of runs. Default is 20.
ratio | ratio no of training examples/(no of training examples + no of test examples). Default is 0.8.
verbose | If TRUE, the progress of each run is printed. Default is TRUE.
k | number of cross-validation splits. Default is 10.
ratio.samples | ratio of (no of training examples + no of test examples)/nrow(X). Default is 1.
use.kernel | Use kernel representation? Default is use.kernel=FALSE.
criterion | Choice of the model selection criterion. One of the three options aic, bic, gmdl. Default is "bic".
true.coefficients | the vector of true regression coefficients (without intercept), if available. Default is NULL.
The function estimates the optimal number of PLS components based on four different criteria: (1) cross-validation, (2) information criteria with the naive Degrees of Freedom DoF(m)=m+1, (3) information criteria with the Degrees of Freedom computed via a Lanczos representation of PLS, and (4) information criteria with the Degrees of Freedom computed via a Krylov representation of PLS. Note that the latter two options only differ with respect to the estimation of the model error.
In addition, the function computes the test error of the "zero model", i.e. the mean of y on the training data is used for prediction.
If true.coefficients are available, the function also computes the model error for the different methods, i.e. the sum of squared differences between the true and the estimated regression coefficients.
MSE | data frame of size R x 5. It contains the test error for the five different methods for each of the R runs.
M | data frame of size R x 5. It contains the optimal number of components for the five different methods for each of the R runs.
DoF | data frame of size R x 5. It contains the Degrees of Freedom (corresponding to M) for the five different methods for each of the R runs.
TIME | data frame of size R x 4. It contains the runtime for all methods (apart from the zero model) for each of the R runs.
M.CRASH | data frame of size R x 2. It contains the number of components for which the Krylov representation and the Lanczos representation return negative Degrees of Freedom, thereby indicating numerical problems.
ME | if true.coefficients are available, a data frame of size R x 5 containing the model error for the five different methods for each of the R runs.
SIGMAHAT | data frame of size R x 5. It contains the estimation of the noise level provided by the five different methods for each of the R runs.
Nicole Kraemer
Kraemer, N., Sugiyama M. (2011). "The Degrees of Freedom of Partial Least Squares Regression". Journal of the American Statistical Association 106 (494) https://www.tandfonline.com/doi/abs/10.1198/jasa.2011.tm10107
```r
# generate artificial data
n <- 50  # number of examples
p <- 5   # number of variables
X <- matrix(rnorm(n * p), ncol = p)
true.coefficients <- runif(p, 1, 3)
y <- X %*% true.coefficients + rnorm(n, 0, 5)
my.benchmark <- benchmark.pls(X, y, R = 10, true.coefficients = true.coefficients)
```
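A quick way to compare the selection strategies is to average the returned data frames over the R runs; a minimal follow-up sketch, reusing the my.benchmark object from the example above:

```r
# average test error and average selected number of components per strategy
colMeans(my.benchmark$MSE)
colMeans(my.benchmark$M)
# average model error (available here because true.coefficients were supplied)
colMeans(my.benchmark$ME)
```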
This function computes the test error over several runs for (a) PLS, (b) PCR, (c) Ridge Regression, and (d) the null model, that is the mean of y. In the first three cases, the optimal model is selected via cross-validation.
```r
benchmark.regression(X, y, m = ncol(X), R = 20, ratio = 0.8, verbose = TRUE,
                     k = 10, nsamples = nrow(X), use.kernel = FALSE,
                     supervised = FALSE)
```
X | matrix of predictor observations.
y | vector of response observations. The length of y is equal to the number of rows of X.
m | maximal number of components for PLS. Default is ncol(X).
R | number of runs. Default is 20.
ratio | ratio no of training examples/(no of training examples + no of test examples). Default is 0.8.
verbose | If TRUE, the progress of each run is printed. Default is TRUE.
k | number of cross-validation splits. Default is 10.
nsamples | number of data points. Default is nrow(X).
use.kernel | Use kernel representation for PLS? Default is use.kernel=FALSE.
supervised | Should the principal components be sorted by decreasing squared correlation to the response? Default is FALSE.
The function computes the test error, the cross-validation-optimal model parameters, their corresponding Degrees of Freedom, and the sum-of-squared-residuals (SSR) for PLS and PCR.
MSE | data frame of size R x 4. It contains the test error for the four different methods for each of the R runs.
M | data frame of size R x 4. It contains the optimal model parameters for the four different methods for each of the R runs.
DoF | data frame of size R x 4. It contains the Degrees of Freedom (corresponding to M) for the four different methods for each of the R runs.
res.pls | matrix of size R x (ncol(X)+1). It contains the SSR for PLS for each of the R runs.
res.pcr | matrix of size R x (ncol(X)+1). It contains the SSR for PCR for each of the R runs.
DoF.all | matrix of size R x (ncol(X)+1). It contains the Degrees of Freedom for PLS for all components for each of the R runs.
Nicole Kraemer
Kraemer, N., Sugiyama M. (2011). "The Degrees of Freedom of Partial Least Squares Regression". Journal of the American Statistical Association 106 (494) https://www.tandfonline.com/doi/abs/10.1198/jasa.2011.tm10107
```r
# Boston Housing data
library(MASS)
data(Boston)
X <- as.matrix(Boston[, 1:4])  # select the first 4 columns as predictor variables
y <- as.vector(Boston[, 14])
my.benchmark <- benchmark.regression(X, y, ratio = 0.5, R = 10, k = 5)
# boxplot of the mean squared error
boxplot(my.benchmark$MSE, outline = FALSE)
# boxplot of the degrees of freedom, without the null model
boxplot(my.benchmark$DoF[, -4])
```
This function returns the regression coefficients of a plsdof-object.
```r
## S3 method for class 'plsdof'
coef(object, ...)
```
object | an object of class "plsdof" that is returned by the functions pls.ic and pls.cv.
... | additional parameters
The function returns the regression coefficients (without intercept) for the optimal number of components.
regression coefficients.
Nicole Kraemer
Kraemer, N., Sugiyama M. (2011). "The Degrees of Freedom of Partial Least Squares Regression". Journal of the American Statistical Association 106 (494) https://www.tandfonline.com/doi/abs/10.1198/jasa.2011.tm10107
Kraemer, N., Braun, M.L. (2007) "Kernelizing PLS, Degrees of Freedom, and Efficient Model Selection", Proceedings of the 24th International Conference on Machine Learning, Omni Press, 441 - 448
vcov.plsdof, pls.model, pls.ic, pls.cv
```r
n <- 50  # number of observations
p <- 5   # number of variables
X <- matrix(rnorm(n * p), ncol = p)
y <- rnorm(n)
pls.object <- pls.ic(X, y, criterion = "bic")
mycoef <- coef(pls.object)
```
This function computes the lower bound for the Degrees of Freedom of PLS with 1 component.
```r
compute.lower.bound(X)
```
X | matrix of predictor observations.
If the decay of the eigenvalues of cor(X)
is not too fast, we can
lower-bound the Degrees of Freedom of PLS with 1 component. Note that we
implicitly assume that we use scaled predictor variables to compute the PLS
solution.
bound | logical. bound is TRUE if the lower bound is applicable, i.e. if the eigenvalues of cor(X) decay slowly enough; otherwise it is FALSE.
lower.bound | if bound is TRUE, this is the lower bound; otherwise, it is set to -1.
Nicole Kraemer
Kraemer, N., Sugiyama M. (2011). "The Degrees of Freedom of Partial Least Squares Regression". Journal of the American Statistical Association 106 (494) https://www.tandfonline.com/doi/abs/10.1198/jasa.2011.tm10107
```r
# Boston Housing data
library(MASS)
data(Boston)
X <- Boston[, -14]
my.lower <- compute.lower.bound(X)
```
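To see how the bound relates to the actual estimate, it can be compared with the Degrees of Freedom of one-component PLS; a minimal sketch (the indexing my.pls$DoF[2] assumes the DoF vector starts at zero components, and lower.bound is -1 whenever the bound is not applicable):

```r
library(plsdof)
library(MASS)
data(Boston)
X <- as.matrix(Boston[, -14])
y <- as.vector(Boston[, 14])
my.lower <- compute.lower.bound(X)
my.pls <- pls.model(X, y, m = 1, compute.DoF = TRUE)
# lower bound vs. estimated DoF of PLS with one component
c(lower.bound = my.lower$lower.bound, DoF.1comp = my.pls$DoF[2])
```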
This function computes the derivative of the normalization function $w \mapsto w/\|w\|_A$, where $\|w\|_A = \sqrt{w^\top A w}$, with respect to y.
```r
dA(w, A, dw)
```
w | vector of length n.
A | square matrix that defines the norm $\|w\|_A = \sqrt{w^\top A w}$.
dw | derivative of w with respect to y. As y is a vector of length n, the derivative is a matrix of size n x n.
The first derivative of the normalization operator is $$\frac{\partial}{\partial y}\left(\frac{w}{\|w\|_A}\right) = \frac{1}{\|w\|_A}\left(I_n - \frac{w w^\top A}{\|w\|_A^2}\right)\frac{\partial w}{\partial y}.$$
the Jacobian matrix of the normalization function. This is a matrix of size nxn.
Nicole Kraemer
Kraemer, N., Sugiyama M. (2011). "The Degrees of Freedom of Partial Least Squares Regression". Journal of the American Statistical Association 106 (494) https://www.tandfonline.com/doi/abs/10.1198/jasa.2011.tm10107
Kraemer, N., Braun, M.L. (2007) "Kernelizing PLS, Degrees of Freedom, and Efficient Model Selection", Proceedings of the 24th International Conference on Machine Learning, Omni Press, 441 - 448
```r
w <- rnorm(15)
dw <- diag(15)
A <- diag(1:15)
d.object <- dA(w, A, dw)
```
This function computes the derivative of the normalization function $v \mapsto v/\|v\|$ with respect to y.
```r
dnormalize(v, dv)
```
v | vector of length n.
dv | derivative of v with respect to y. As y is a vector of length n, the derivative is a matrix of size n x n.
The first derivative of the normalization operator is $$\frac{\partial}{\partial y}\left(\frac{v}{\|v\|}\right) = \frac{1}{\|v\|}\left(I_n - \frac{v v^\top}{\|v\|^2}\right)\frac{\partial v}{\partial y}.$$
the Jacobian matrix of the normalization function. This is a matrix of size nxn.
Nicole Kraemer, Mikio L. Braun
Kraemer, N., Sugiyama M. (2011). "The Degrees of Freedom of Partial Least Squares Regression". Journal of the American Statistical Association 106 (494) https://www.tandfonline.com/doi/abs/10.1198/jasa.2011.tm10107
Kraemer, N., Braun, M.L. (2007) "Kernelizing PLS, Degrees of Freedom, and Efficient Model Selection", Proceedings of the 24th International Conference on Machine Learning, Omni Press, 441 - 448
```r
v <- rnorm(15)
dv <- diag(15)
d.object <- dnormalize(v, dv)
```
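A small numerical check (not part of the package documentation): compare the returned Jacobian with a finite-difference approximation, treating v itself as y so that dv/dy is the identity matrix.

```r
library(plsdof)
set.seed(1)
n <- 15
v <- rnorm(n)
dv <- diag(n)                       # derivative of v with respect to itself
J <- dnormalize(v, dv)              # analytic Jacobian of v -> v/||v||
eps <- 1e-6
J.num <- matrix(0, n, n)
for (j in 1:n) {
  e <- numeric(n); e[j] <- eps
  J.num[, j] <- ((v + e) / sqrt(sum((v + e)^2)) - v / sqrt(sum(v^2))) / eps
}
max(abs(J - J.num))                 # should be close to zero
```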
This function computes the first derivative of the projection operator $P_V z = \sum_{i=1}^{k} v_i v_i^\top z$ with respect to y.
```r
dvvtz(v, z, dv, dz)
```
v | orthonormal basis of the space on which z is projected.
z | vector that is projected onto the columns of v.
dv | first derivative of the columns of v with respect to y.
dz | first derivative of z with respect to y.
For the computation of the first derivative, we assume that the columns of v are normalized and mutually orthogonal. (Note that the function will not return an error message if these assumptions are not fulfilled.) If we denote the columns of v by $v_1, \ldots, v_k$, the first derivative of the projection operator is
$$\frac{\partial}{\partial y} P_V z = \sum_{i=1}^{k}\left[\left((v_i^\top z) I_n + v_i z^\top\right)\frac{\partial v_i}{\partial y} + v_i v_i^\top \frac{\partial z}{\partial y}\right].$$
Here, n denotes the length of the vectors $v_i$.
The first derivative of the projection operator with respect to y. This is a matrix of dimension nrow(v) x length(y).
This is an internal function.
Nicole Kraemer, Mikio L. Braun
Kraemer, N., Sugiyama M. (2011). "The Degrees of Freedom of Partial Least Squares Regression". Journal of the American Statistical Association. 106 (494) https://www.tandfonline.com/doi/abs/10.1198/jasa.2011.tm10107
Kraemer, N., Braun, M.L. (2007) "Kernelizing PLS, Degrees of Freedom, and Efficient Model Selection", Proceedings of the 24th International Conference on Machine Learning, Omni Press, 441 - 448
This function computes the index of the first local minimum.
```r
first.local.minimum(x)
```
x | vector.

the index of the first local minimum of x.
Nicole Kraemer
Kraemer, N., Sugiyama M. (2011). "The Degrees of Freedom of Partial Least Squares Regression". Journal of the American Statistical Association 106 (494) https://www.tandfonline.com/doi/abs/10.1198/jasa.2011.tm10107
```r
v <- rnorm(30)
out <- first.local.minimum(v)
```
This function computes the optimal model parameters using three different model selection criteria (aic, bic, gmdl).
```r
information.criteria(RSS, DoF, yhat = NULL, sigmahat, n, criterion = "bic")
```
RSS | vector of residual sum of squares.
DoF | vector of Degrees of Freedom. The length of DoF is equal to the length of RSS.
yhat | vector of squared norm of the fitted values. The length of yhat is equal to the length of RSS. Only needed for "gmdl". Default is NULL.
sigmahat | estimated model error. The length of sigmahat is equal to the length of RSS.
n | number of observations.
criterion | one of the options "aic", "bic" and "gmdl". Default is "bic".
The Akaike information criterion (aic) is defined as $$\mathrm{aic} = \frac{RSS}{n} + 2\,\frac{DoF}{n}\,\hat{\sigma}^2.$$
The Bayesian information criterion (bic) is defined as $$\mathrm{bic} = \frac{RSS}{n} + \log(n)\,\frac{DoF}{n}\,\hat{\sigma}^2.$$
The generalized minimum description length (gmdl) is defined as $$\mathrm{gmdl} = \frac{n}{2}\log(S) + \frac{DoF}{2}\log(F) + \frac{1}{2}\log(n)$$
with $S = \hat{\sigma}^2$ and $F = \|\hat{y}\|^2/(DoF \cdot S)$.
Note that it is also possible to use the function
information.criteria
for other regression methods than Partial Least
Squares.
DoF | degrees of freedom
score | vector of the model selection criterion
par | index of the first local minimum of score
Nicole Kraemer, Mikio Braun
Akaike, H. (1973) "Information Theory and an Extension of the Maximum Likelihood Principle". Second International Symposium on Information Theory, 267 - 281.
Hansen, M., Yu, B. (2001). "Model Selection and the Principle of Minimum Description Length". Journal of the American Statistical Association, 96, 746 - 774
Kraemer, N., Sugiyama M. (2011). "The Degrees of Freedom of Partial Least Squares Regression". Journal of the American Statistical Association 106 (494) https://www.tandfonline.com/doi/abs/10.1198/jasa.2011.tm10107
Kraemer, N., Braun, M.L. (2007) "Kernelizing PLS, Degrees of Freedom, and Efficient Model Selection", Proceedings of the 24th International Conference on Machine Learning, Omni Press, 441 - 448
Schwarz, G. (1978) "Estimating the Dimension of a Model", Annals of Statistics 6(2), 461 - 464.
```r
## This is an internal function called by pls.ic
```
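Since the details note that information.criteria can also be applied to other regression methods, here is a minimal sketch of calling it directly on the quantities returned by pls.model (this assumes information.criteria is exported, as this reference page suggests):

```r
library(plsdof)
set.seed(1)
n <- 50
p <- 5
X <- matrix(rnorm(n * p), ncol = p)
y <- rnorm(n)
fit <- pls.model(X, y, m = p, compute.DoF = TRUE)
ic <- information.criteria(RSS = fit$RSS, DoF = fit$DoF, yhat = fit$yhat,
                           sigmahat = fit$sigmahat, n = n, criterion = "bic")
ic$par  # index of the first local minimum of the BIC score
```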
This function computes the Partial Least Squares fit. This algorithm scales mainly in the number of observations.
```r
kernel.pls.fit(X, y, m = ncol(X), compute.jacobian = FALSE,
               DoF.max = min(ncol(X) + 1, nrow(X) - 1))
```
X | matrix of predictor observations.
y | vector of response observations. The length of y is equal to the number of rows of X.
m | maximal number of Partial Least Squares components. Default is ncol(X).
compute.jacobian | Should the first derivative of the regression coefficients be computed as well? Default is FALSE.
DoF.max | upper bound on the Degrees of Freedom. Default is min(ncol(X)+1, nrow(X)-1).
We first standardize X
to zero mean and unit variance.
coefficients | matrix of regression coefficients
intercept | vector of regression intercepts
DoF | Degrees of Freedom
sigmahat | vector of estimated model error
Yhat | matrix of fitted values
yhat | vector of squared length of fitted values
RSS | vector of residual sum of error
covariance | if compute.jacobian is TRUE, the array of covariance matrices for the PLS regression coefficients
TT | matrix of normalized PLS components
Nicole Kraemer, Mikio L. Braun
Kraemer, N., Sugiyama M. (2011). "The Degrees of Freedom of Partial Least Squares Regression". Journal of the American Statistical Association 106 (494) https://www.tandfonline.com/doi/abs/10.1198/jasa.2011.tm10107
Kraemer, N., Braun, M.L. (2007) "Kernelizing PLS, Degrees of Freedom, and Efficient Model Selection", Proceedings of the 24th International Conference on Machine Learning, Omni Press, 441 - 448
linear.pls.fit, pls.cv, pls.model, pls.ic
```r
n <- 50  # number of observations
p <- 5   # number of variables
X <- matrix(rnorm(n * p), ncol = p)
y <- rnorm(n)
pls.object <- kernel.pls.fit(X, y, m = 5, compute.jacobian = TRUE)
```
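The kernel and the linear implementation compute the same PLS fit and differ only in how they scale with the data; a small sketch (assuming both return the matrix of fitted values Yhat) that checks their agreement numerically:

```r
library(plsdof)
set.seed(1)
n <- 50
p <- 5
X <- matrix(rnorm(n * p), ncol = p)
y <- rnorm(n)
fit.kernel <- kernel.pls.fit(X, y, m = 5)
fit.linear <- linear.pls.fit(X, y, m = 5)
max(abs(fit.kernel$Yhat - fit.linear$Yhat))  # expected to be close to zero
```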
This function computes the Krylov sequence of a matrix and a vector.
```r
krylov(A, b, m)
```
A | square matrix of dimension p x p.
b | vector of length p.
m | length of the Krylov sequence.
A matrix of size p x m containing the sequence b,Ab,..., A^(m-1)b.
Nicole Kraemer
```r
A <- matrix(rnorm(8 * 8), ncol = 8)
b <- rnorm(8)
K <- krylov(A, b, 4)
```
This function computes the Partial Least Squares solution and the first derivative of the regression coefficients. This implementation scales mostly in the number of variables
```r
linear.pls.fit(X, y, m = ncol(X), compute.jacobian = FALSE,
               DoF.max = min(ncol(X) + 1, nrow(X) - 1))
```
X | matrix of predictor observations.
y | vector of response observations. The length of y is equal to the number of rows of X.
m | maximal number of Partial Least Squares components. Default is ncol(X).
compute.jacobian | Should the first derivative of the regression coefficients be computed as well? Default is FALSE.
DoF.max | upper bound on the Degrees of Freedom. Default is min(ncol(X)+1, nrow(X)-1).
We first standardize X
to zero mean and unit variance.
coefficients | matrix of regression coefficients
intercept | vector of regression intercepts
DoF | Degrees of Freedom
sigmahat | vector of estimated model error
Yhat | matrix of fitted values
yhat | vector of squared length of fitted values
RSS | vector of residual sum of error
covariance | if compute.jacobian is TRUE, the function returns the array of covariance matrices for the PLS regression coefficients
TT | matrix of normalized PLS components
Nicole Kraemer
Kraemer, N., Sugiyama M. (2011). "The Degrees of Freedom of Partial Least Squares Regression". Journal of the American Statistical Association 106 (494) https://www.tandfonline.com/doi/abs/10.1198/jasa.2011.tm10107
kernel.pls.fit, pls.cv, pls.model, pls.ic
```r
n <- 50  # number of observations
p <- 5   # number of variables
X <- matrix(rnorm(n * p), ncol = p)
y <- rnorm(n)
pls.object <- linear.pls.fit(X, y, m = 5, compute.jacobian = TRUE)
```
Normalization of vectors.
```r
normalize(v, w = NULL)
```
v | vector
w | optional vector
The vector v
is normalized to length 1. If w
is given, it is
normalized by the length of v
.
v | the normalized vector v
w | if provided, the vector w divided by the length of v
Nicole Kraemer, Mikio L. Braun
```r
v <- rnorm(5)
w <- rnorm(10)
dummy <- normalize(v, w)
```
This function computes the Principal Components Regression (PCR) fit.
```r
pcr(X, y, scale = TRUE, m = min(ncol(X), nrow(X) - 1), eps = 1e-06,
    supervised = FALSE)
```
X | matrix of predictor observations.
y | vector of response observations. The length of y is equal to the number of rows of X.
scale | Should the predictor variables be scaled to unit variance? Default is TRUE.
m | maximal number of principal components. Default is min(ncol(X), nrow(X)-1).
eps | precision. Eigenvalues of the correlation matrix of X below this precision are treated as zero. Default is eps=1e-06.
supervised | Should the principal components be sorted by decreasing squared correlation to the response? Default is FALSE.
The function first scales all predictor variables to unit variance, and then computes the PCR fit for all components. If supervised=TRUE, the principal components are sorted by decreasing squared correlation to the response.
coefficients | matrix of regression coefficients, including the coefficients of the null model, i.e. the constant model mean(y).
intercept | vector of intercepts, including the intercept of the null model, i.e. the constant model mean(y).
Nicole Kraemer
```r
n <- 50  # number of observations
p <- 15  # number of variables
X <- matrix(rnorm(n * p), ncol = p)
y <- rnorm(n)
my.pcr <- pcr(X, y, m = 10)
```
This function computes the optimal model parameter using cross-validation. Model selection is based on mean squared error and correlation to the response, respectively.
```r
pcr.cv(X, y, k = 10, m = min(ncol(X), nrow(X) - 1), groups = NULL, scale = TRUE,
       eps = 1e-06, plot.it = FALSE, compute.jackknife = TRUE,
       method.cor = "pearson", supervised = FALSE)
```
X | matrix of predictor observations.
y | vector of response observations. The length of y is equal to the number of rows of X.
k | number of cross-validation splits. Default is 10.
m | maximal number of principal components. Default is min(ncol(X), nrow(X)-1).
groups | an optional vector with the same length as y. It encodes a partitioning of the data into distinct subgroups. If groups is provided, k=10 is ignored and the cross-validation splits are defined by the subgroups.
scale | Should the predictor variables be scaled to unit variance? Default is TRUE.
eps | precision. Eigenvalues of the correlation matrix of X below this precision are treated as zero. Default is eps=1e-06.
plot.it | Logical. If TRUE, the cross-validation error is plotted as a function of the number of components. Default is FALSE.
compute.jackknife | Logical. If TRUE, the regression coefficients on each of the cross-validation splits are stored. Default is TRUE.
method.cor | How should the correlation to the response be computed? Default is "pearson".
supervised | Should the principal components be sorted by decreasing squared correlation to the response? Default is FALSE.
The function computes the principal components on the scaled predictors.
Based on the regression coefficients coefficients.jackknife
computed
on the cross-validation splits, we can estimate their mean and their
variance using the jackknife. We remark that under a fixed design and the
assumption of normally distributed y
-values, we can also derive the
true distribution of the regression coefficients.
cv.error.matrix | matrix of cross-validated errors based on mean squared error. A row corresponds to one cross-validation split.
cv.error | vector of cross-validated errors based on mean squared error
m.opt | optimal number of components based on mean squared error
intercept | intercept of the optimal model, based on mean squared error
coefficients | vector of regression coefficients of the optimal model, based on mean squared error
cor.error.matrix | matrix of cross-validated errors based on correlation. A row corresponds to one cross-validation split.
cor.error | vector of cross-validated errors based on correlation
m.opt.cor | optimal number of components based on correlation
intercept.cor | intercept of the optimal model, based on correlation
coefficients.cor | vector of regression coefficients of the optimal model, based on correlation
coefficients.jackknife | array of the regression coefficients on each of the cross-validation splits, if compute.jackknife=TRUE.
Nicole Kraemer, Mikio L. Braun
```r
n <- 500  # number of observations
p <- 5    # number of variables
X <- matrix(rnorm(n * p), ncol = p)
y <- rnorm(n)
# compute PCR
pcr.object <- pcr.cv(X, y, scale = FALSE, m = 3)
pcr.object1 <- pcr.cv(X, y, groups = sample(c(1, 2, 3), n, replace = TRUE), m = 3)
```
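Based on the details above, the stored coefficients.jackknife can be turned into jackknife estimates of the mean and variance of the coefficients; a hedged sketch (it assumes the array is ordered as variables x model size x cross-validation splits, analogous to ridge.cv):

```r
library(plsdof)
set.seed(1)
n <- 200
p <- 5
k <- 10
X <- matrix(rnorm(n * p), ncol = p)
y <- X %*% runif(p, 1, 3) + rnorm(n)
pcr.object <- pcr.cv(X, y, k = k, compute.jackknife = TRUE)
cj <- pcr.object$coefficients.jackknife       # assumed: p x (m+1) x k
# jackknife variance of each coefficient, for every model size
jack.var <- apply(cj, c(1, 2), function(b) (k - 1) / k * sum((b - mean(b))^2))
```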
This function computes the optimal model parameter using cross-validation.
```r
pls.cv(X, y, k = 10, groups = NULL, m = ncol(X), use.kernel = FALSE,
       compute.covariance = FALSE, method.cor = "pearson")
```
X | matrix of predictor observations.
y | vector of response observations. The length of y is equal to the number of rows of X.
k | number of cross-validation splits. Default is 10.
groups | an optional vector with the same length as y. It encodes a partitioning of the data into distinct subgroups. If groups is provided, k=10 is ignored and the cross-validation splits are defined by the subgroups.
m | maximal number of Partial Least Squares components. Default is ncol(X).
use.kernel | Use kernel representation? Default is use.kernel=FALSE.
compute.covariance | If TRUE, the covariance matrix of the cv-optimal regression coefficients is computed. Default is FALSE.
method.cor | How should the correlation to the response be computed? Default is "pearson".
The data are centered and scaled to unit variance prior to the PLS
algorithm. It is possible to estimate the covariance matrix of the
cv-optimal regression coefficients (compute.covariance=TRUE
).
Currently, this is only implemented if use.kernel=FALSE
.
cv.error.matrix | matrix of cross-validated errors based on mean squared error. A row corresponds to one cross-validation split.
cv.error | vector of cross-validated errors based on mean squared error
m.opt | optimal number of components based on mean squared error
intercept | intercept of the optimal model, based on mean squared error
coefficients | vector of regression coefficients of the optimal model, based on mean squared error
cor.error.matrix | matrix of cross-validated errors based on correlation. A row corresponds to one cross-validation split.
cor.error | vector of cross-validated errors based on correlation
m.opt.cor | optimal number of components based on correlation
intercept.cor | intercept of the optimal model, based on correlation
coefficients.cor | vector of regression coefficients of the optimal model, based on correlation
covariance | if compute.covariance=TRUE, the covariance matrix of the cv-optimal regression coefficients.
Nicole Kraemer, Mikio L. Braun
Kraemer, N., Sugiyama M. (2011). "The Degrees of Freedom of Partial Least Squares Regression". Journal of the American Statistical Association 106 (494) https://www.tandfonline.com/doi/abs/10.1198/jasa.2011.tm10107
Kraemer, N., Braun, M.L. (2007) "Kernelizing PLS, Degrees of Freedom, and Efficient Model Selection", Proceedings of the 24th International Conference on Machine Learning, Omni Press, 441 - 448
```r
n <- 50  # number of observations
p <- 5   # number of variables
X <- matrix(rnorm(n * p), ncol = p)
y <- rnorm(n)
# compute linear PLS
pls.object <- pls.cv(X, y, m = ncol(X))
# define random partitioning
groups <- sample(c("a", "b", "c"), n, replace = TRUE)
pls.object1 <- pls.cv(X, y, groups = groups)
```
This function computes the Degrees of Freedom using the Krylov representation of PLS.
```r
pls.dof(pls.object, n, y, K, m, DoF.max)
```
pls.object | object returned by the underlying PLS fitting routine.
n | number of observations.
y | vector of response observations.
K | kernel matrix X X^t.
m | number of components.
DoF.max | upper bound on the Degrees of Freedom.
This computation of the Degrees of Freedom is based on the equivalence of PLS regression and the projection of the response vector y onto the Krylov space spanned by $Ky, K^2y, \ldots, K^m y$. Details can be found in Kraemer and Sugiyama (2011).
coefficients | matrix of regression coefficients
intercept | vector of regression intercepts
DoF | Degrees of Freedom
sigmahat | vector of estimated model error
Yhat | matrix of fitted values
yhat | vector of squared length of fitted values
RSS | vector of residual sum of error
TT | matrix of normalized PLS components
Nicole Kraemer, Mikio L. Braun
Kraemer, N., Sugiyama M. (2011). "The Degrees of Freedom of Partial Least Squares Regression". Journal of the American Statistical Association 106 (494) https://www.tandfonline.com/doi/abs/10.1198/jasa.2011.tm10107
Kraemer, N., Sugiyama M., Braun, M.L. (2009) "Lanczos Approximations for the Speedup of Kernel Partial Least Squares Regression." Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS), p. 272-279
```r
# this is an internal function
```
This function computes the optimal model parameters using one of three different model selection criteria (aic, bic, gmdl) and based on two different Degrees of Freedom estimates for PLS.
```r
pls.ic(X, y, m = min(ncol(X), nrow(X) - 1), criterion = "bic", naive = FALSE,
       use.kernel = FALSE, compute.jacobian = FALSE, verbose = TRUE)
```
X | matrix of predictor observations.
y | vector of response observations. The length of y is equal to the number of rows of X.
m | maximal number of Partial Least Squares components. Default is min(ncol(X), nrow(X)-1).
criterion | Choice of the model selection criterion. One of the three options aic, bic, gmdl. Default is "bic".
naive | Use the naive estimate for the Degrees of Freedom? Default is FALSE.
use.kernel | Use kernel representation? Default is use.kernel=FALSE.
compute.jacobian | Should the first derivative of the regression coefficients be computed as well? Default is FALSE.
verbose | If TRUE, additional information is printed. Default is TRUE.
There are two options to estimate the Degrees of Freedom of PLS:
naive=TRUE
defines the Degrees of Freedom as the number of components
+1, and naive=FALSE
uses the generalized notion of Degrees of
Freedom. If compute.jacobian=TRUE
, the function uses the Lanczos
decomposition to derive the Degrees of Freedom, otherwise, it uses the
Krylov representation. (See Kraemer and Sugiyama (2011) for details.) The
latter two methods only differ with respect to the estimation of the noise
level.
The function returns an object of class "plsdof".
DoF | Degrees of Freedom
m.opt | optimal number of components
sigmahat | vector of estimated model errors
intercept | intercept
coefficients | vector of regression coefficients
covariance | if compute.jacobian=TRUE, the covariance matrix of the regression coefficients
m.crash | the number of components for which the algorithm returns negative Degrees of Freedom
Nicole Kraemer, Mikio L. Braun
Akaike, H. (1973) "Information Theory and an Extension of the Maximum Likelihood Principle". Second International Symposium on Information Theory, 267 - 281.
Hansen, M., Yu, B. (2001). "Model Selection and the Principle of Minimum Description Length". Journal of the American Statistical Association, 96, 746 - 774
Kraemer, N., Sugiyama M. (2011). "The Degrees of Freedom of Partial Least Squares Regression". Journal of the American Statistical Association 106 (494) https://www.tandfonline.com/doi/abs/10.1198/jasa.2011.tm10107
Kraemer, N., Braun, M.L. (2007) "Kernelizing PLS, Degrees of Freedom, and Efficient Model Selection", Proceedings of the 24th International Conference on Machine Learning, Omni Press, 441 - 448
Schwarz, G. (1978) "Estimating the Dimension of a Model", Annals of Statistics 6(2), 461 - 464.
```r
n <- 50  # number of observations
p <- 5   # number of variables
X <- matrix(rnorm(n * p), ncol = p)
y <- rnorm(n)
# compute linear PLS
pls.object <- pls.ic(X, y, m = ncol(X))
```
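A small comparison sketch of the two Degrees of Freedom options described above, on the Boston Housing data (the selected number of components may or may not differ, depending on the data):

```r
library(plsdof)
library(MASS)
data(Boston)
X <- as.matrix(Boston[, -14])
y <- as.vector(Boston[, 14])
fit.naive <- pls.ic(X, y, criterion = "bic", naive = TRUE)   # DoF(m) = m + 1
fit.gdof  <- pls.ic(X, y, criterion = "bic", naive = FALSE)  # generalized DoF
c(naive = fit.naive$m.opt, generalized = fit.gdof$m.opt)
```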
This function computes the Partial Least Squares fit.
```r
pls.model(X, y, m = ncol(X), Xtest = NULL, ytest = NULL, compute.DoF = FALSE,
          compute.jacobian = FALSE, use.kernel = FALSE, method.cor = "pearson")
```
X | matrix of predictor observations.
y | vector of response observations. The length of y is equal to the number of rows of X.
m | maximal number of Partial Least Squares components. Default is ncol(X).
Xtest | optional matrix of test observations. Default is NULL.
ytest | optional vector of test observations. Default is NULL.
compute.DoF | Logical variable. If TRUE, the Degrees of Freedom of PLS are computed. Default is FALSE.
compute.jacobian | Should the first derivative of the regression coefficients be computed as well? Default is FALSE.
use.kernel | Should the kernel representation be used to compute the solution? Default is FALSE.
method.cor | How should the correlation to the response be computed? Default is "pearson".
This function computes the Partial Least Squares fit and its Degrees of
Freedom. Further, it returns the regression coefficients and various
quantities that are needed for model selection in combination with
information.criteria
.
coefficients | matrix of regression coefficients
intercept | vector of intercepts
DoF | vector of Degrees of Freedom
RSS | vector of residual sum of error
sigmahat | vector of estimated model error
Yhat | matrix of fitted values
yhat | vector of squared length of fitted values
covariance | if compute.jacobian=TRUE, the array of covariance matrices for the PLS regression coefficients
prediction | if Xtest is provided, the predicted y-values for Xtest
mse | if Xtest and ytest are provided, the mean squared error on the test data
cor | if Xtest and ytest are provided, the correlation to the response on the test data
Nicole Kraemer, Mikio L. Braun
Kraemer, N., Sugiyama M. (2011). "The Degrees of Freedom of Partial Least Squares Regression". Journal of the American Statistical Association 106 (494) https://www.tandfonline.com/doi/abs/10.1198/jasa.2011.tm10107
Kraemer, N., Sugiyama, M., Braun, M.L. (2009) "Lanczos Approximations for the Speedup of Partial Least Squares Regression", Proceedings of the 12th International Conference on Artificial Intelligence and Stastistics, 272 - 279
```r
n <- 50   # number of observations
p <- 15   # number of variables
X <- matrix(rnorm(n * p), ncol = p)
y <- rnorm(n)
ntest <- 200
Xtest <- matrix(rnorm(ntest * p), ncol = p)  # test data
ytest <- rnorm(ntest)                        # test data
# compute PLS + degrees of freedom + prediction on Xtest
first.object <- pls.model(X, y, compute.DoF = TRUE, Xtest = Xtest, ytest = NULL)
# compute PLS + test error
second.object <- pls.model(X, y, m = 10, Xtest = Xtest, ytest = ytest)
```
This function computes the optimal ridge regression model based on cross-validation.
```r
ridge.cv(X, y, lambda = NULL, scale = TRUE, k = 10, plot.it = FALSE,
         groups = NULL, method.cor = "pearson", compute.jackknife = TRUE)
```
X | matrix of input observations. The rows of X contain the samples, the columns of X contain the observed variables.
y | vector of responses. The length of y must equal the number of rows of X.
lambda | Vector of penalty terms. Default is NULL.
scale | Scale the columns of X? Default is scale=TRUE.
k | Number of splits in k-fold cross-validation. Default is k=10.
plot.it | Plot the cross-validation error as a function of lambda? Default is FALSE.
groups | an optional vector with the same length as y. It encodes a partitioning of the data into distinct subgroups. If groups is provided, k=10 is ignored and the cross-validation splits are defined by the subgroups.
method.cor | How should the correlation to the response be computed? Default is "pearson".
compute.jackknife | Logical. If TRUE, the regression coefficients on each of the cross-validation splits are stored. Default is TRUE.
Based on the regression coefficients coefficients.jackknife
computed
on the cross-validation splits, we can estimate their mean and their
variance using the jackknife. We remark that under a fixed design and the
assumption of normally distributed y
-values, we can also derive the
true distribution of the regression coefficients.
cv.error.matrix | matrix of cross-validated errors based on mean squared error. A row corresponds to one cross-validation split.
cv.error | vector of cross-validated errors based on mean squared error
lambda.opt | optimal value of lambda, based on mean squared error
intercept | intercept of the optimal model, based on mean squared error
coefficients | vector of regression coefficients of the optimal model, based on mean squared error
cor.error.matrix | matrix of cross-validated errors based on correlation. A row corresponds to one cross-validation split.
cor.error | vector of cross-validated errors based on correlation
lambda.opt.cor | optimal value of lambda, based on correlation
intercept.cor | intercept of the optimal model, based on correlation
coefficients.cor | vector of regression coefficients of the optimal model, based on correlation
coefficients.jackknife | array of the regression coefficients on each of the cross-validation splits, if compute.jackknife=TRUE. The dimension is ncol(X) x length(lambda) x k.
Nicole Kraemer
pls.cv, pcr.cv, benchmark.regression
```r
n <- 100  # number of observations
p <- 60   # number of variables
X <- matrix(rnorm(n * p), ncol = p)
y <- rnorm(n)
ridge.object <- ridge.cv(X, y)
```
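A short sketch of supplying a custom penalty grid and comparing the MSE-optimal and the correlation-optimal amount of shrinkage (the grid 10^seq(-3, 3, ...) is an arbitrary choice for illustration):

```r
library(plsdof)
set.seed(1)
n <- 100
p <- 60
X <- matrix(rnorm(n * p), ncol = p)
y <- rnorm(n)
ridge.object <- ridge.cv(X, y, lambda = 10^seq(-3, 3, length.out = 50), k = 5)
c(mse.optimal = ridge.object$lambda.opt, cor.optimal = ridge.object$lambda.opt.cor)
```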
This function computes the trace of a matrix.
```r
tr(M)
```
M | square matrix
The trace of the matrix M.
Nicole Kraemer
```r
M <- matrix(rnorm(8 * 8), ncol = 8)
tr.M <- tr(M)
```
This function returns the variance-covariance matrix of a plsdof-object.
```r
## S3 method for class 'plsdof'
vcov(object, ...)
```
object | an object of class "plsdof" that is returned by the functions pls.ic and pls.cv.
... | additional parameters
The function returns the variance-covariance matrix for the optimal number
of components. It can be applied to objects returned by pls.ic
and
pls.cv
.
variance-covariance matrix
Nicole Kraemer
Kraemer, N., Sugiyama M. (2011). "The Degrees of Freedom of Partial Least Squares Regression". Journal of the American Statistical Association 106 (494) https://www.tandfonline.com/doi/abs/10.1198/jasa.2011.tm10107
Kraemer, N., Sugiyama M., Braun, M.L. (2009) "Lanczos Approximations for the Speedup of Kernel Partial Least Squares Regression." Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS), p. 272-279
```r
n <- 50  # number of observations
p <- 5   # number of variables
X <- matrix(rnorm(n * p), ncol = p)
y <- rnorm(n)
pls.object <- pls.ic(X, y, m = 5, criterion = "bic")
my.vcov <- vcov(pls.object)
my.sd <- sqrt(diag(my.vcov))  # standard deviation of regression coefficients
```
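Building on the example above, the estimated standard deviations can be turned into approximate confidence intervals and two-sided z-type tests for the regression coefficients; a sketch that reuses pls.object and my.sd and relies on the normal approximation of the coefficient estimates:

```r
b <- coef(pls.object)
ci <- cbind(lower = b - 1.96 * my.sd, upper = b + 1.96 * my.sd)  # approx. 95% CIs
pval <- 2 * pnorm(-abs(b / my.sd))                               # approx. p-values
```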
This function computes the projection operator $P_V z = \sum_{i=1}^{k} v_i v_i^\top z$.
```r
vvtz(v, z)
```
v | orthonormal basis of the space on which z is projected.
z | vector that is projected onto the columns of v.
The above formula is only valid if the columns of v
are normalized
and mutually orthogonal.
value of the projection operator
Nicole Kraemer
```r
# generate random orthogonal vectors
X <- matrix(rnorm(10 * 100), ncol = 10)  # random data
S <- cor(X)                              # correlation matrix of data
v <- eigen(S)$vectors[, 1:3]             # first three eigenvectors of correlation matrix
z <- rnorm(10)                           # random vector z
projection.z <- vvtz(v, z)
```