---
title: "Partial Least Squares Regression for Beta Regression Models"
subtitle: "The plsRbeta package"
shorttitle: "PLSR Beta Models"
author:
- name: "Frédéric Bertrand and Myriam Maumy-Bertrand"
  affiliation:
  - IRMA, labex IRMIA, Université de Strasbourg and CNRS. LIST3N, Université de technologie de Troyes.
  email: frederic.bertrand@utt.fr
date: "`r Sys.Date()`"
output:
  prettydoc::html_pretty:
    theme: cayman
    highlight: github
vignette: >
  %\VignetteIndexEntry{Partial Least Squares Regression for Beta Regression Models}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
#file.edit(normalizePath("~/.Renviron"))
LOCAL <- identical(Sys.getenv("LOCAL"), "TRUE")
#LOCAL=TRUE
knitr::opts_chunk$set(purl = LOCAL)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.height = 3,
  fig.width = 5
)
```

## Abstract

Many variables of interest, such as experimental results, yields or economic indicators, are naturally expressed as rates, proportions or indices whose values necessarily lie between zero and one or, more generally, between two fixed values known in advance. Beta regression allows us to model these data with a great deal of flexibility, since the density functions of the Beta distributions can take on a wide variety of forms. However, like all usual regression models, it cannot be applied directly when the predictors suffer from multicollinearity or, worse, when they are more numerous than the observations. These situations are frequently encountered in chemistry, medicine, economics or marketing. To circumvent this difficulty, we formulate an extension of PLS regression for Beta regression models. This extension, along with several tools such as cross-validation and bootstrap techniques, is available for the `R` language in the `plsRbeta` package.

### Keywords

Beta Regression, PLS Regression, PLS Beta Regression, Cross validation, Bootstrap techniques, `R` language

Classification/MSC: 62F40, 62J07, 62J12, 62P10, 62P20, 62P30

## Introduction

PLS regression, the result of the NIPALS algorithm initially developed by Wold (Wold, 1966)[^wold66] and explained in detail by Tenenhaus (Tenenhaus, 1998)[^tenenhaus98], has already been successfully extended to generalized linear models by Bastien *et al.* (Bastien, 2005)[^baeste05] and to Cox models by Bastien (Bastien, 2008)[^bastien08] and Bastien *et al.* (Bastien, 2015)[^bastien15]. We proposed an extension of PLS regression to Beta regression models in Bertrand (2013)[^Bertrand2013]. Indeed, the practical interest of the Beta distribution has been asserted several times, for example by Johnson *et al.* (Johnson, 1995)[^jokoba95]: "Beta distributions are very versatile and a variety of uncertainties can be usefully modelled by them. This flexibility encourages its empirical use in a wide range of applications."

Several articles have focused on the study of Beta regression and its properties. Let us mention in particular the article by Ferrari and Cribari-Neto (Ferrari, 2004)[^fecr04] for an introduction to these models and those by Kosmidis and Firth (Kosmidis, 2010)[^kofi10], Simas *et al.* (Simas, 2010)[^sibaro10] and Grün *et al.* (Grün, 2012)[^grkoze12] for extensions or improvements of the estimation techniques of these models.

We will assume in the rest of the paper that the response studied takes its values in the interval $[0;1]$. The model we propose can of course be used as soon as the response $Y$ takes its values in a bounded interval $[a;b]$, with $a<b$, since the transformation $(Y-a)/(b-a)$ brings its values back into $[0;1]$.

## Beta regression

A random variable $Y$ follows a Beta distribution with parameters $p$ and $q$ if its density function is:
\begin{equation}
\pi(y;p,q)=\frac{\Gamma(p+q)}{\Gamma(p)\Gamma(q)}y^{p-1}(1-y)^{q-1}, \quad 0<y<1, \quad \textrm{(E.1)}
\end{equation}
with $p>0$, $q>0$ and $\Gamma(\cdot)$ the Euler gamma function.
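As a quick illustration of the flexibility mentioned in the abstract, the following base-`R` sketch plots the density (E.1) for a few parameter pairs, `dbeta()` taking `shape1` $=p$ and `shape2` $=q$: U-shaped, skewed and symmetric forms are all obtained.

```{r}
# Density (E.1) for several (p,q) pairs: U-shaped, skewed and symmetric forms
y <- seq(0.001, 0.999, length.out = 200)
pars <- list(c(0.5, 0.5), c(2, 5), c(5, 5), c(5, 2))
plot(NULL, xlim = c(0, 1), ylim = c(0, 3), xlab = "y", ylab = "density")
for (i in seq_along(pars)) {
  lines(y, dbeta(y, pars[[i]][1], pars[[i]][2]), lty = i)
}
legend("top", legend = sapply(pars, paste, collapse = ", "),
       lty = seq_along(pars), title = "p, q", cex = 0.8)
```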
If $Y$ follows a $\textrm{Beta}(p,q)$ distribution, its expectation and variance are equal to:
\begin{equation}
\mathbb{E}(Y)=\frac{p}{p+q} \quad \textrm{and} \quad \mathrm{Var}(Y)=\frac{pq}{(p+q)^2(p+q+1)}\cdot \quad \textrm{(E.2)}
\end{equation}
In order to apply techniques similar to those used for generalized linear models by McCullagh and Nelder (McCullagh, 1995)[^mcne95], Ferrari and Cribari-Neto (Ferrari, 2004)[^fecr04] propose to reparametrize the Beta distribution as follows. Setting $\mu=p/(p+q)$ and $\phi=p+q$, i.e. $p=\mu\phi$ and $q=(1-\mu)\phi$, Equation (E.2) becomes:
\[
\mathbb{E}(Y)=\mu \quad \textrm{and} \quad \mathrm{Var}(Y)=\frac{V(\mu)}{1+\phi}
\]
where $V(\mu)=\mu(1-\mu)$. Thus $\mu$ is the mean value of the response and $\phi$ can be interpreted as a precision parameter since, for a fixed $\mu$, the higher the value of $\phi$, the smaller the variance of the response. With these new parameters, the density function given in Equation (E.1) is equal to:
\begin{equation}
\pi(y;\mu,\phi)=\frac{\Gamma(\phi)}{\Gamma(\mu\phi)\Gamma((1-\mu)\phi)}y^{\mu\phi-1}(1-y)^{(1-\mu)\phi-1}, \quad 0<y<1. \quad \textrm{(E.3)}
\end{equation}
The Beta regression model then links the mean $\mu_t$ of the $t$-th observation to the vector of predictors $\mathbf{x}_t$ through a strictly monotonic link function $g$, $g(\mu_t)=\mathbf{x}_t^\mathsf{T}\boldsymbol{\beta}$, the usual choices for $g$ being the logit, probit, complementary log-log and log-log links.

## Example of application to allelotyping data

In allelotyping studies, the number of explanatory variables $j$ frequently exceeds the number of observations $i$, $j>i$, which makes the matrix $\mathbf{X}^\mathsf{T}\mathbf{X}$ non-invertible, posing difficulties in fitting a regression model. The PLS Beta regression that we have developed is therefore particularly well suited to deal with allelotyping data in the particular context of predicting a rate variable.

The example is that of allelotyping data obtained on a series of 93 patients with different types of lung cancer and with $23.2\%$ missing values. The predicted variable is the tumor cellularity rate of the intraoperative tumor specimen. The explanatory variables are composed of 56 binary variables indicating the presence of an abnormality on each of the 56 microsatellites and three clinical variables.

The selection of variables is very important in this example because it makes it possible to define a subset of predictors, i.e. gene sites, able to predict the tumor cell rate. Indeed, cancer pathologies are acquired genetic pathologies, and some of these abnormalities are the cause while others are the consequence of the tumor pathology. Moreover, the information contained in the different microsatellite gene sites is potentially redundant. Variable selection, separating the variables that probably play a driving role in tumor development from those that merely reflect the random background noise induced by the abnormalities caused by tumor development, is therefore an indispensable aid to understanding the underlying mechanisms of tumorigenesis.

Table 3 summarizes the values of the various criteria used to determine the number of components needed to properly model the allelotyping data with BR and a logit link.

|Nb of Comp.|0|1|2|3|4|5|6|7|8|9|
|--|--|--|--|--|--|--|--|--|--|--|
|AIC|$-23.9$|$-48.2$|$-63.4$|$-76.2$|$-90.2$|$-101.5$|$-114.2$|$-116.9$|$-120.5$|$\mathbf{-121.6}$|
|BIC|$-18.5$|$-40.2$|$-52.7$|$-62.9$|$-74.2$|$-82.8$|$-92.9$|$-92.9$|$\mathbf{-93.8}$|$-92.3$|
|Pred Sig||$15$|$7$|$1$|$2$|$0$|$\mathbf{2}$|$0$|$0$|$0$|
|$Q^2\chi^2$ (5-CV)||$\mathbf{-0.3}$|$-1.0$|$-1.8$|$-2.8$|$-3.8$|$-5.9$|$-8.9$|$-10.4$|$-8.0$|
|$\chi^2$ Pearson|$98.3$|$97.2$|$96.0$|$\mathbf{100.0}$|$96.8$|$91.5$|$89.1$|$88.6$|$87.4$|$86.4$|
|pseudo-$R^2$||$0.21$|$0.31$|$0.40$|$0.47$|$0.53$|$0.58$|$0.59$|$0.61$|$\mathbf{0.62}$|
|$R^2$ Pearson||$0.22$|$0.35$|$0.41$|$0.50$|$0.57$|$0.64$|$0.65$|$0.67$|$\mathbf{0.68}$|

Table: Table 3. Selecting the number of components, BR method and logit link.
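The criteria gathered in Table 3 can be recomputed with the cross-validation helpers of the package. The following is a minimal sketch, not evaluated because of its runtime, assuming that `PLS_beta_kfoldcv_formula()` and `kfolds2CVinfos_beta()` from `plsRbeta` behave like their `plsRglm` counterparts and that the `type` argument is passed through to `betareg` as in `plsRbeta()`:

```{r, eval=FALSE}
# Hedged sketch: repeated 5-fold cross-validation behind the Q^2 row of Table 3.
# Assumes PLS_beta_kfoldcv_formula()/kfolds2CVinfos_beta() mirror the plsRglm API.
library(plsRbeta)
data(TxTum)
TxTum.cv <- PLS_beta_kfoldcv_formula(I(CELTUMCO/100)~., data=TxTum, nt=9,
                                     modele="pls-beta", type="BR", K=5, NK=1)
kfolds2CVinfos_beta(TxTum.cv)
```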
The stopping rule "stop as soon as the new component contains no further significant predictors" would lead us to choose 6 components. This choice is confirmed by the pseudo-$R^2$ and $R^2$ criteria, for which a kink appears from 6 components onwards. The BIC criterion invites us to choose 8 components and the AIC criterion an even higher number. During the simulation study, we noticed that these criteria are liberal and generally retain a few too many components. Depending on the criteria, 6 or 8 components should thus be retained for a logit link. Taking into account the previous simulation study, we decide to retain 6 components. The same study was performed with the BC method and with a log-log link, and it leads to the same conclusions.

Confidence intervals are then obtained for each of the predictors using 1000 bootstrap samples and the normal, basic, percentile and $BC_a$ techniques. The use of bootstrap techniques to establish the significance of the predictors of a PLS Beta regression model is illustrated in Figure 2 for the 6-component model with the BR bias reduction technique. In the end, we use the $BC_a$ technique, known for its good properties, to select the variables that are significant at the 5 \% threshold by retaining those whose $BC_a$ confidence interval does not contain 0, as shown in Figure 3 for the 6-component model with the BR bias reduction technique. We thus retain the variables in Table 4 as having a significant effect on the tumor cellularity rate of the intraoperative tumor specimen. The two techniques for dealing with the bias of Beta regression give results that differ only for the single variable EGF3, which is significant at the 5 \% threshold for the BC technique but not for the BR technique.

|Nb Comp|Significant Variables||||||
|--|--|--|--|--|--|--|
|6 components|P4 |C3M |RB |FL7A |W2 |W4 |
| |MT4 |HLA |HLD |HLB |EA3 |EA2 |
|Bias Correction|EGF2 |EGF3 |FL7B |VSFGFR3 |VSTOP1 |VSTOP2A |
| |VSEGFR |AFRAEGFR |SRXRA |SMT |SHL |SEB |
|6 components|P4 |C3M |RB |FL7A |W2 |W4 |
| |MT4 |HLA |HLD |HLB |EA3 |EA2 |
|Bias Reduction|EGF2 |FL7B |VSFGFR3 |VSTOP1 |VSTOP2A |VSEGFR |
| |AFRAEGFR |SRXRA |SMT |SHL |SEB | |

Table: Table 4. Significant variables.

Figure 4 allows us to evaluate the stability of the significant variables selected by the $BC_a$ bootstrap technique for a number of components varying from 1 to 6 and for the two techniques, BC and BR, used to fit the Beta regression model.

```{r}
library(plsRglm)
library(plsRbeta)
```

```{r, eval=FALSE}
data(TxTum)
TxTum.mod.bootBR6 <- bootplsbeta(plsRbeta(formula=I(CELTUMCO/100)~.,data=TxTum,nt=6,modele="pls-beta",type="BR"), sim="ordinary", stype="i", R=1000)
#save(TxTum.mod.bootBR6,file="TxTum.mod.bootBR6.Rdata")
```

```{r, warning=FALSE}
data("TxTum.mod.bootBR6")
temp.ci <- suppressWarnings(plsRglm::confints.bootpls(TxTum.mod.bootBR6))
```

```{r, fig.cap="Figure 1. Boxplot of bootstrap distributions, 6 components BR."}
boxplots.bootpls(TxTum.mod.bootBR6,indices = 2:nrow(temp.ci))
```

```{r, fig.cap="Figure 2. Bootstrap 95% confidence intervals, 6 components BR."}
plsRglm::plots.confints.bootpls(temp.ci,prednames=TRUE,indices = 2:nrow(temp.ci))
```

```{r}
# Flag the predictors whose BCa interval (columns 7 and 8 of temp.ci) excludes 0
ind_BCa_nt6BR <- (temp.ci[,7]<0&temp.ci[,8]<0)|(temp.ci[,7]>0&temp.ci[,8]>0)
rownames(temp.ci)[ind_BCa_nt6BR]
```

```{r, fig.cap="Figure 3. Bootstrap 95% confidence intervals, BCa significant variables, 6 components BR."}
plsRglm::plots.confints.bootpls(temp.ci,prednames=TRUE,indices = (2:nrow(temp.ci))[ind_BCa_nt6BR[-1]],articlestyle=FALSE,legendpos="topright")
```
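The indicator vectors `ind_BCa_nt1BC`, ..., `ind_BCa_nt6BR` loaded below were precomputed offline. A hedged sketch of how they can be rebuilt, mirroring the calls above for 1 to 6 components and both `betareg` bias treatments, is given here; it is not evaluated because of its long runtime:

```{r, eval=FALSE}
# Rebuild the ind_BCa_nt*BC / ind_BCa_nt*BR indicators: bootstrap the models
# with 1 to 6 components and both bias treatments ("BC" and "BR"), then flag
# the predictors whose BCa interval (columns 7 and 8) excludes 0.
for (type in c("BC", "BR")) {
  for (nt in 1:6) {
    boot.obj <- bootplsbeta(plsRbeta(formula=I(CELTUMCO/100)~., data=TxTum,
                                     nt=nt, modele="pls-beta", type=type),
                            sim="ordinary", stype="i", R=1000)
    ci <- plsRglm::confints.bootpls(boot.obj)
    assign(paste0("ind_BCa_nt", nt, type),
           (ci[,7]<0 & ci[,8]<0) | (ci[,7]>0 & ci[,8]>0))
  }
}
```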
```{r}
#TxTum <- read.table("MET_rev_edt.txt",sep="\t",header=TRUE)
data("TxTum.mod.bootBC1")
temp.ci <- plsRglm::confints.bootpls(TxTum.mod.bootBC1)

data("ind_BCa_nt1BC")
data("ind_BCa_nt2BC")
data("ind_BCa_nt3BC")
data("ind_BCa_nt4BC")
data("ind_BCa_nt5BC")
data("ind_BCa_nt6BC")
data("ind_BCa_nt1BR")
data("ind_BCa_nt2BR")
data("ind_BCa_nt3BR")
data("ind_BCa_nt4BR")
data("ind_BCa_nt5BR")
data("ind_BCa_nt6BR")

rownames(temp.ci)[ind_BCa_nt1BC]
# Names of the predictors selected with 1 component and the BC technique
(colnames(TxTum.mod.bootBC1$data)[-1])[ind_BCa_nt1BC[-1]]

indics <- cbind(ind_BCa_nt1BC,
ind_BCa_nt2BC,
ind_BCa_nt3BC,
ind_BCa_nt4BC,
ind_BCa_nt5BC,
ind_BCa_nt6BC,
ind_BCa_nt1BR,
ind_BCa_nt2BR,
ind_BCa_nt3BR,
ind_BCa_nt4BR,
ind_BCa_nt5BR,
ind_BCa_nt6BR)

# Number of selected predictors per number of components and bias treatment
nbpreds <- rbind(colSums(indics[,1:6]),colSums(indics[,7:12]))
colnames(nbpreds) <- c("1","2","3","4","5","6")
rownames(nbpreds) <- c("BC","BR")
```

```{r}
nbpreds
```

```{r}
# Keep the predictors selected at least once and interleave the BC and BR columns
inone <- indics[rowSums(indics)>0,c(1,7,2,8,3,9,4,10,5,11,6,12)]
inone <- t(apply(inone,1,as.numeric))
rownames(inone)
colnames(inone) <- c("1 BC"," BR","2 BC"," BR","3 BC"," BR","4 BC"," BR","5 BC"," BR","6 BC"," BR")
```

```{r}
inone
```

```{r, fig.cap="Figure 4. Selected variables using the bootstrap BCa technique."}
library(bipartite)
bipartite::visweb(t(inone),type="None",labsize=2,square="b",box.col="grey25",pred.lablength=7)
```

## Example of application in chemometrics

The goal is to find compounds that can predict the rate of cancer cell infiltration in patients from spectrometry data. A healthy patient has 0 \% of cancer cells in a biopsy, whereas for a diseased patient the percentage is all the higher as the sample contains more cancer cells. The interest of this experiment is to try to considerably reduce the analysis time of the biopsies by removing the need for a count performed by a physician specialized in anatomical pathology and cytopathology. An additional statistical difficulty appears here: there are more variables (180) than individuals (80). More details on the experimental protocol are available in the article in which this dataset was first published (Piotto, 2012)[^pinm12].

We begin by determining the number of components to use to fit a model with all the areas of the spectrum as explanatory variables. Table 5 reports the criteria used for selecting the number of components. We again see the tendency of the AIC and BIC criteria to retain a very high number of components, namely more than 10. On the other hand, in this example the $Q^2\chi^2$ criterion, estimated using a 10-group cross-validation (10-CV), seems relevant and invites us to keep 3 components. The same is true for the Pearson $\chi^2$. The pseudo-$R^2$ is also in line with this, since we observe a rapid decrease in the gain of explanatory power brought by each additional component beyond the third one.
|Nb of Comp.|0|1|2|3|4|5|6|7|8|9|10|
|--|--|--|--|--|--|--|--|--|--|--|--|
|AIC|$-32.35$|$-110.26$|$-128.86$|$-141.93$|$-152.50$|$-165.93$|$-183.71$|$-193.49$|$-206.75$|$-217.21$|$\mathbf{-231.30}$|
|BIC|$-27.58$|$-103.11$|$-119.33$|$-130.02$|$-138.21$|$-149.25$|$-164.66$|$-172.05$|$-182.93$|$-191.01$|$\mathbf{-202.71}$|
|$Q^2\chi^2$ (10-CV)||$-1.05$|$-2.68$|$\mathbf{-1.00}$|$-1.51$|$-2.34$|$-3.71$|$-8.04$|$-211.19$|$-5.16E3$|$-4.16E4$|
|$\chi^2$ Pearson|$76.78$|$76.44$|$\mathbf{80.58}$|$76.20$|$81.95$|$81.08$|$74.06$|$76.39$|$72.62$|$73.21$|$71.82$|
|pseudo-$R^2$||$0.70$|$0.78$|$0.82$|$0.85$|$0.87$|$0.89$|$0.90$|$0.92$|$0.93$|$\mathbf{0.94}$|
|$R^2$ Pearson||$0.57$|$0.64$|$0.73$|$0.75$|$0.81$|$0.86$|$0.88$|$0.90$|$0.92$|$\mathbf{0.94}$|

Table: Table 5. Selecting the number of components.

We then construct 250 bootstrap samples of the predictor coefficients for a 3-component model with the bias reduction (BR) method. Figure 5 shows the box plots of these distributions. Confidence intervals are then obtained for each of the predictors with the normal, basic, percentile and $BC_a$ techniques. In the end, we use the $BC_a$ technique, known for its good properties (Diciccio, 1996)[^dief96], to select the variables that are significant at the 5 \% threshold by retaining those whose $BC_a$ confidence interval does not contain 0, as illustrated by Figure 6 for the 3-component model with the BR bias reduction technique.

```{r}
data(colon)
orig <- colon
```

```{r, eval=FALSE}
modpls.boot3 <- bootplsbeta(plsRbeta(X..Cellules.tumorales~.,data=colon,nt=3,modele="pls-beta"), sim="ordinary", stype="i", R=250)
#save(modpls.boot3,file="modpls.boot_nt3.Rdata")
```

```{r}
data("modpls.boot_nt3")
temp.ci <- suppressWarnings(plsRglm::confints.bootpls(modpls.boot3))
# Flag the predictors whose BCa interval (columns 7 and 8 of temp.ci) excludes 0
ind_BCa_nt3 <- (temp.ci[,7]<0&temp.ci[,8]<0)|(temp.ci[,7]>0&temp.ci[,8]>0)
#save(ind_BCa_nt3,file="ind_BCa_nt3.Rdata")
```

```{r, fig.cap="Figure 5. Boxplots of the bootstrap distribution of the coefficients of the predictors, 3 components BR"}
boxplots.bootpls(modpls.boot3,indices = 2:nrow(temp.ci))
```

```{r, fig.cap="Figure 6. Bootstrap 95% confidence intervals, 3 components BR"}
plsRglm::plots.confints.bootpls(temp.ci,prednames=TRUE,indices = 2:nrow(temp.ci))
```

```{r}
data("ind_BCa_nt3")
rownames(temp.ci)[ind_BCa_nt3]
```

```{r, fig.cap="Figure 7. Bootstrap 95% confidence intervals, BCa significant variables, 3 components BR."}
plsRglm::plots.confints.bootpls(temp.ci,prednames=TRUE,indices = (2:nrow(temp.ci))[ind_BCa_nt3[-1]],articlestyle=FALSE)
```

```{r, fig.cap="Figure 8. BCa Bootstrap 95% confidence intervals, BCa significant variables, 3 components BR."}
plsRglm::plots.confints.bootpls(temp.ci,prednames=TRUE,indices = (2:nrow(temp.ci))[ind_BCa_nt3[-1]],articlestyle=FALSE,typeIC="BCa")
```

Finally, we fit a Beta PLS regression model built from the 79 predictors found significant by the $BC_a$ bootstrap technique. The selection of the number of components, not reproduced here for lack of space, leads us to retain 2 of them.

```{r}
data("ind_BCa_nt3")
colon_sub4 <- colon[c(TRUE,ind_BCa_nt3[-1])]
```

```{r, eval=FALSE}
modpls_sub4 <- plsRbeta(X..Cellules.tumorales~.,data=colon_sub4,nt=10,modele="pls-beta")
#save(modpls_sub4,file="modpls_sub4.Rdata")
```

```{r}
data("modpls_sub4")
modpls_sub4
```
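Before examining the residuals, the component selection just mentioned can be reproduced on the reduced model. A minimal sketch, not evaluated here, assuming that the `InfCrit` element of a `plsRbeta` fit holds the same table of information criteria as in `plsRglm`:

```{r, eval=FALSE}
# Component selection on the reduced model (not reproduced in the text):
# inspect the criteria table of the 10-component fit, then refit with 2 components.
modpls_sub4$InfCrit
modpls_sub4_nt2 <- plsRbeta(X..Cellules.tumorales~., data=colon_sub4, nt=2,
                            modele="pls-beta")
```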
```{r, fig.cap="Figure 9. Residuals index plot.", fig.keep='last'}
# Index plot of the standardized weighted residuals of the final model
sfit <- modpls_sub4$FinalModel
rd <- resid(sfit, type="sweighted2")
plot(seq(1,sfit$n), rd, xlab="Subject Index", ylab="Std Weighted Resid 2", main="Index Plot")
abline(h=0, lty=3)
```

We note in Figure 10 that this model is validated by the use of a simulated envelope (Atkinson, 1981)[^atkinson1981]. Following Atkinson's recommendation, the envelope is constructed from 19 order statistics.

```{r}
library(betareg)
# Half-normal plot with simulated envelope
phat <- predict(sfit, type="response")
phihat <- predict(sfit, type="precision")
tt <- modpls_sub4$tt
n <- sfit$n
# Simulate 19 response vectors from the fitted model, refit the Beta regression
# on the components and store the sorted absolute residuals of each fit
absrds <- list()
for (i in 1:19) {
  tYwotNA <- rbeta(n, phat*phihat, (1-phat)*phihat)
  tsfit <- betareg(tYwotNA~tt, x=TRUE)
  trd <- residuals(tsfit, type="sweighted2")
  absrds[[i]] <- sort(abs(trd))
}
# Pointwise envelope: minimum, maximum and mean over the 19 simulated samples
lower <- upper <- middle <- rep(0, n)
for (j in 1:n) {
  min <- max <- sum <- absrds[[1]][j]
  for (i in 2:19) {
    if (min > absrds[[i]][j]) min <- absrds[[i]][j]
    if (max < absrds[[i]][j]) max <- absrds[[i]][j]
    sum <- sum + absrds[[i]][j]
  }
  lower[j] <- min
  upper[j] <- max
  middle[j] <- sum / 19
}
# Half-normal plotting positions
qnval <- qnorm((1:n+n-1/8) / (2*n+1/2))
```

```{r simulated_envelope_submod4, fig.cap="Figure 10. Residuals and simulated envelope.", fig.keep='last'}
plot(qnval, sort(abs(rd)), ylim=range(0,3.5), xlab="Expected", ylab="Observed", main="Half-Normal Plot with Simulated Envelope")
lines(qnval, lower, lty=3)
lines(qnval, upper, lty=3)
lines(qnval, middle, lty=1, lwd=2)
```

As we can see in Figure 11, the two components of the model built from the 79 predictors found significant by the $BC_a$ bootstrap technique allow a clear linear separation between the healthy subjects, in dark green, and the sick subjects, whose condition is all the more serious as the color approaches red. To allow a graphical evaluation of the predictive quality of the model, we represent the observed values as a function of the predicted values on the $\textrm{logit}$ scale, Figure 12, and on the original scale, Figure 13. Note that, since the transition from one graph to the other is done by means of a continuous, bijective and increasing transformation of the axes, the Kendall correlation coefficient between the observed and predicted values is identical for both scales. For both graphs, we measure an association, in the sense of Kendall's correlation coefficient, between the observed and predicted values equal to $0.72$ for a $p$-value of $7.03\times 10^{-19}$, thus significant at the $1\%$ threshold.

```{r mod_sub4_nt2_12_beta, fig.cap="Figure 11. Representation of individuals on the plane formed by the first two components."}
plot(modpls_sub4$tt[,1],modpls_sub4$tt[,2],col=plotrix::color.scale(colon[,1],c(0,1,1),c(1,1,0),0),pch=16,ylab="",xlab="")
```

```{r plot_logit_xpred_yorig_submod4, fig.cap="Figure 12. Observed values versus predicted values on the logit scale.", fig.keep='last'}
plot(binomial()$linkfun(modpls_sub4$ValsPredictY),binomial()$linkfun(orig[,1]),xlab="Predicted values",ylab="Observed values")
abline(0,1,lwd=2,lty=2)
```

```{r plot_xpred_yorig_submod4, fig.cap="Figure 13. Observed values versus predicted values on the original scale.", fig.keep='last'}
plot(modpls_sub4$ValsPredictY,orig[,1],xlim=c(0,1),ylim=c(0,1),xlab="Predicted values",ylab="Observed values")
abline(0,1,lwd=2,lty=2)
```
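The Kendall association reported above can be checked directly with base `R`; with 80 observations `cor.test()` uses the normal approximation for the $p$-value:

```{r}
# Kendall correlation between observed and predicted values; it is identical on
# the logit and original scales since the axis transformation is monotonic.
cor.test(orig[,1], as.vector(modpls_sub4$ValsPredictY), method="kendall")
```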
Note, however, that the data in this example are in fact a mixture of healthy individuals, for whom the infiltration rate is zero, and sick individuals, for whom these rates are non-zero and vary between 0 and 1. The use of a model combining a PLS logistic regression with a PLS Beta regression, or of a zero-inflated Beta regression model as introduced by Ospina and Ferrari (Ospina, 2012)[^osfe12], would therefore likely make sense and is currently being investigated by the authors.

## Conclusion and perspectives

Our goal has been to propose an extension of PLS regression to Beta regression models, and then to make it available to the users of the free language `R`. We thus offer the possibility of working with collinear predictors to model rates or proportions, an unavoidable difficulty when modeling mixtures or when analyzing spectra or studying genetic, proteomic or metabonomic data. Moreover, Beta PLS regression can also be applied to incomplete datasets. In this case, as in the complete-data case, it is still possible to select the number of components by repeated $k$-fold cross-validation. Finally, we propose bootstrap techniques in order, for example, to test the significance of each of the predictors present in the dataset and thus validate the models built. The study of two real datasets allowed the proposed tools to demonstrate their effectiveness.

### Acknowledgements

The authors are grateful to the two referees for their comments and suggestions, which helped improve several points of the article, as well as to Jean-Pierre Gauchi for his availability and his pertinent and constructive remarks.

[^atkinson1981]: *A.C. ATKINSON*. Two graphical displays for outlying and influential observations in regression. Biometrika, 68(1):13–20, 1981.

[^atkinson85]: *A.C. ATKINSON*. Plots, Transformations and Regression: An Introduction to Graphical Methods of Diagnostic Regression Analysis. Oxford University Press, Oxford, 1985.

[^baeste05]: *Ph. BASTIEN, V. ESPOSITO VINZI and M. TENENHAUS*. PLS generalised linear regression. Computational Statistics & Data Analysis, 48(1):17–46, 2005.

[^bastien08]: *Ph. BASTIEN*. Deviance residuals based PLS regression for censored data in high dimensional setting. Chemometrics and Intelligent Laboratory Systems, 91(1):78–86, 2008.

[^bastien15]: *Ph. BASTIEN, F. BERTRAND, N. MEYER and M. MAUMY-BERTRAND*. Deviance residuals based sparse PLS and sparse kernel PLS regression for censored data. Bioinformatics, 31(3):397–404, 2015.

[^Bertrand2013]: *F. BERTRAND, N. MEYER, M. BEAU-FALLER, K. EL BAYED, I.-J. NAMER and M. MAUMY-BERTRAND*. Régression Bêta PLS. (French) [PLS Beta regression.]. J. SFdS, 154(3):143–159, 2013.

[^cariboot]: *A. CANTY and B. RIPLEY*. boot: Bootstrap R (S-Plus) Functions. R package version 1.3-26, 2021.

[^crze10]: *F. CRIBARI-NETO and A. ZEILEIS*. Beta Regression in R. Journal of Statistical Software, 34(2):1–24, 2010.

[^dahi97]: *A.C. DAVISON and D.V. HINKLEY*. Bootstrap Methods and their Application. Cambridge University Press, Cambridge, 1997.

[^dief96]: *T.J. DICICCIO and B. EFRON*. Bootstrap confidence intervals (with discussion). Statistical Science, 11:189–228, 1996.

[^efti93]: *B. EFRON and R.J. TIBSHIRANI*. An Introduction to the Bootstrap. Chapman \& Hall, New York, 1993.

[^fecr04]: *S.L.P. FERRARI and F. CRIBARI-NETO*. Beta Regression for Modeling Rates and Proportions. Journal of Applied Statistics, 31(7):799–815, 2004.

[^grkoze12]: *B. GRÜN, I. KOSMIDIS and A. ZEILEIS*. Extended Beta Regression in R: Shaken, Stirred, Mixed and Partitioned. Journal of Statistical Software, 48(11):1–25, 2012.

[^hoskuldsson88]: *A. HÖSKULDSSON*. PLS regression methods. Journal of Chemometrics, 2:211–228, 1988.
[^jokoba95]: *N.L. JOHNSON, S. KOTZ and N. BALAKRISHNAN*. Continuous Univariate Distributions, volume 2. Wiley, New York, 2nd edition, 1995.

[^kofi10]: *I. KOSMIDIS and D. FIRTH*. A Generic Algorithm for Reducing Bias in Parametric Estimation. Electronic Journal of Statistics, 4:1097–1112, 2010.

[^krsu11]: *N. KRAEMER and M. SUGIYAMA*. The Degrees of Freedom of Partial Least Squares Regression. Journal of the American Statistical Association, 106(494):697–705, 2011.

[^limoma02]: *B. LI, J. MORRIS and E.B. MARTIN*. Model selection for partial least squares regression. Chemometrics and Intelligent Laboratory Systems, 64:79–89, 2002.

[^mcne95]: *P. MCCULLAGH and J.A. NELDER*. Generalized Linear Models. Chapman & Hall/CRC, Boca Raton, 2nd edition, 1995.

[^memabe10]: *N. MEYER, M. MAUMY-BERTRAND and F. BERTRAND*. Comparaison de variantes de régressions logistiques PLS et de régression PLS sur variables qualitatives : application aux données d'allélotypage. Journal de la Société Française de Statistique, 151(2):1–18, 2010.

[^morris82]: *C.N. MORRIS*. Natural exponential families with quadratic variance functions. The Annals of Statistics, 10:65–80, 1982.

[^nama85]: *T. NAES and H. MARTENS*. Comparison of prediction methods for multicollinear data. Communications in Statistics – Simulation and Computation, 14:545–576, 1985.

[^newe72]: *J. NELDER and R. WEDDERBURN*. Generalized Linear Models. Journal of the Royal Statistical Society. Series A (General), 135(3):370–384, 1972.

[^osfe12]: *R. OSPINA and S.L.P. FERRARI*. A general class of zero-or-one inflated beta regression models. Computational Statistics & Data Analysis, 56(1):1609–1623, 2012.

[^pinm12]: *M. PIOTTO, F.-M. MOUSSALLIEH, A. NEUVILLE, J.-P. BELLOCQ, K. ELBAYED and I.J. NAMER*. Towards real-time metabolic profiling of a biopsy specimen during a surgical operation by 1H high resolution magic angle spinning nuclear magnetic resonance: a case report. Journal of Medical Case Reports, 6(1), 2012.

[^sibaro10]: *A.B. SIMAS, W. BARRETO-SOUZA and A.V. ROCHA*. Improved Estimators for a General Class of Beta Regression Models. Computational Statistics & Data Analysis, 54(2):348–366, 2010.

[^tenenhaus98]: *M. TENENHAUS*. La régression PLS : Théorie et Pratique. Technip, Paris, 1998.

[^wold66]: *H. WOLD*. Estimation of principal components and related models by iterative least squares. In *P.R. KRISHNAIAH*, editor: Multivariate Analysis, pages 391–420. Academic Press, New York, 1966.

[^wosjer01]: *S. WOLD, M. SJÖSTRÖM and L. ERIKSSON*. PLS-regression: a basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems, 58:109–130, 2001.