--- title: "Towards Confidence Estimates in Cascade Networks using the SelectBoost Package" shorttitle: "SelectBoost: Confidence Estimates in Cascade Networks" author: - name: "Frédéric Bertrand and Myriam Maumy-Bertrand" affiliation: - Université de Strasbourg and CNRS - IRMA, labex IRMIA email: frederic.bertrand@utt.fr date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Towards Confidence Estimates in Cascade Networks using the SelectBoost Package} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} #file.edit(normalizePath("~/.Renviron")) LOCAL <- identical(Sys.getenv("LOCAL"), "TRUE") #LOCAL=TRUE knitr::opts_chunk$set(purl = LOCAL) knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` #Introduction ## Aims of the vignette Extending results from the `Cascade` package: reverse engineering with selectboost to compute **confidence indices for a fitted model**. We first fit a model to a cascade network using the `Cascade` package `inference` function then we compute confidence indices for the inferred links using the `Selecboost` algorithm. If you are a Linux/Unix or a Macos user, you can install a version of SelectBoost with support for `doMC` from [github](https://github.com) with: ```{r, eval = FALSE} devtools::install_github("fbertran/SelectBoost", ref = "doMC") ``` ## References * Reference for the Cascade modelling: Vallat, L., Kemper, C. a., Jung, N., Maumy-Bertrand, M., Bertrand, F., Meyer, N., Pocheville, A., Fisher, J. W., Gribben, J. G. et Bahram, S. (2013). Reverse-engineering the genetic circuitry of a cancer cell with predicted intervention in chronic lymphocytic leukemia. *Proceedings of the National Academy of Sciences of the United States of America*, 110(2), 459-64. * Reference for the Cascade package: Jung, N., Bertrand, F., Bahram, S., Vallat, L. et Maumy-Bertrand, M. (2014). Cascade : A R package to study, predict and simulate the diffusion of a signal through a temporal gene network. *Bioinformatics*. # Code Code to reproduce the datasets saved with the package and some the figures of the article Bertrand et al. (2020), Bioinformatics. ## Data simulation ```{r Cascade, cache= FALSE, eval = LOCAL} library(Cascade) ``` We define the F array for the simulations. ```{r, cache= TRUE, eval = LOCAL} T<-4 F<-array(0,c(T-1,T-1,T*(T-1)/2)) for(i in 1:(T*(T-1)/2)){diag(F[,,i])<-1} F[,,2]<-F[,,2]*0.2 F[2,1,2]<-1 F[3,2,2]<-1 F[,,4]<-F[,,2]*0.3 F[3,1,4]<-1 F[,,5]<-F[,,2] ``` We set the seed to make the results reproducible ```{r, cache= TRUE, eval = LOCAL} set.seed(1) Net<-Cascade::network_random( nb=100, time_label=rep(1:4,each=25), exp=1, init=1, regul=round(rexp(100,1))+1, min_expr=0.1, max_expr=2, casc.level=0.4 ) Net@F<-F ``` We simulate gene expression according to the network Net ```{r message=FALSE, cache=TRUE, eval = LOCAL} M <- Cascade::gene_expr_simulation( network=Net, time_label=rep(1:4,each=25), subject=5, level_peak=200) ``` ## Network inference We infer the new network. ```{r, cache= TRUE, fig.keep='none', eval = LOCAL} Net_inf_C <- Cascade::inference(M,cv.subjects=TRUE) ``` Heatmap of the coefficients of the Omega matrix of the network. Run the code to get the graph. ```{r, cache= TRUE, fig.keep='none', eval = LOCAL} stats::heatmap(Net_inf_C@network, Rowv = NA, Colv = NA, scale="none", revC=TRUE) ``` ```{r, cache= TRUE, eval = LOCAL} Fab_inf_C <- Net_inf_C@F ``` ## Compute the confidence indices for the inference ```{r, cache= TRUE, eval = LOCAL} library(SelectBoost) set.seed(1) ``` By default the crossvalidation is made subjectwise according to a leave one out scheme and the resampling analysis is made at the .95 `c0` level. To pass CRAN tests, `use.parallel = FALSE` is required. Set `use.parallel = TRUE` and select the number of cores using `ncores = 4`. ```{r, cache= TRUE, eval = LOCAL} net_pct_selected <- selectboost(M, Fab_inf_C, use.parallel = FALSE) ``` ```{r, cache= TRUE, eval = LOCAL} net_pct_selected_.5 <- selectboost(M, Fab_inf_C, c0value = .5, use.parallel = FALSE) ``` ```{r, cache= TRUE, eval = LOCAL} net_pct_selected_thr <- selectboost(M, Fab_inf_C, group = group_func_1, use.parallel = FALSE) ``` Use `cv.subject=FALSE` to use default crossvalidation ```{r, cache= TRUE, eval = LOCAL} net_pct_selected_cv <- selectboost(M, Fab_inf_C, cv.subject=FALSE, use.parallel = FALSE) ``` ## Analysis of the confidence indices Use plot to display the result of the confidence analysis. ```{r, cache= TRUE, eval = LOCAL} plot(net_pct_selected) ``` Run the code to plot the other results. ```{r, cache= TRUE, fig.keep='none', eval = LOCAL} plot(net_pct_selected_.5) ``` ```{r, cache= TRUE, fig.keep='none', eval = LOCAL} plot(net_pct_selected_thr) ``` ```{r, cache= TRUE, fig.keep='none', eval = LOCAL} plot(net_pct_selected_cv) ``` Run the code to plot the remaning graphs. Distribution of non-zero (absolute value > 1e-5) coefficients ```{r, cache= FALSE, fig.keep="none", eval = LOCAL} hist(Net_inf_C@network[abs(Net_inf_C@network)>1e-5]) ``` Plot of confidence at .95 resampling level versus coefficient value for non-zero (absolute value > 1e-5) coefficients ```{r, cache= FALSE, fig.keep="none", eval = LOCAL} plot(Net_inf_C@network[abs(Net_inf_C@network)>1e-5],net_pct_selected@network.confidence[abs(Net_inf_C@network)>1e-5]) ``` ```{r, cache= TRUE, fig.keep='none', eval = LOCAL} hist(net_pct_selected@network.confidence[abs(Net_inf_C@network)>1e-5]) ``` Plot of confidence at .5 resampling level versus coefficient value for non-zero (absolute value > 1e-5) coefficients ```{r, cache= TRUE, fig.keep='none', eval = LOCAL} plot(Net_inf_C@network[abs(Net_inf_C@network)>1e-5],net_pct_selected_.5@network.confidence[abs(Net_inf_C@network)>1e-5]) ``` ```{r, cache= TRUE, fig.keep='none', eval = LOCAL} hist(net_pct_selected_.5@network.confidence[abs(Net_inf_C@network)>1e-5]) ``` Plot of confidence at .95 resamling level with groups created by thresholding the correlation matrix versus coefficient value for non-zero (absolute value > 1e-5) coefficients. ```{r, cache= TRUE, fig.keep='none', eval = LOCAL} plot(Net_inf_C@network[abs(Net_inf_C@network)>1e-5],net_pct_selected_thr@network.confidence[abs(Net_inf_C@network)>1e-5]) ``` ```{r, cache= TRUE, fig.keep='none', eval = LOCAL} hist(net_pct_selected_thr@network.confidence[abs(Net_inf_C@network)>1e-5]) ``` Plot of confidence at .95 resampling level versus coefficient value for non-zero (absolute value > 1e-5) coefficients using standard cross-validation. ```{r, cache= TRUE, fig.keep='none', eval = LOCAL} plot(Net_inf_C@network[abs(Net_inf_C@network)>1e-5],net_pct_selected_cv@network.confidence[abs(Net_inf_C@network)>1e-5]) ``` ```{r, cache= TRUE, fig.keep='none', eval = LOCAL} hist(net_pct_selected_cv@network.confidence[abs(Net_inf_C@network)>1e-5]) ``` ## Further improvements For further recommandations on the choice of the `c0` parameter, for instance as a quantile, please consult the SI of Bertrand et al. 2020.