--- title: Accounting for uncertainty in residual variances for small sample studies author: William R.P. Denault date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Small data example} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r knitr-opts, include=FALSE} knitr::opts_chunk$set( comment = "#", collapse = TRUE, results = "markup", fig.align = "center", fig.width = 4.5, fig.height = 3.2, warning = FALSE, message = FALSE ) ``` This is vignette is a modified example based on Figure 1 panel B-C-D in [Denault et al paper](https://doi.org/10.1101/2025.05.16.654543). ```{r load-pkgs} library(susieR) ``` For reproducibility, set the seed: ```{r} set.seed(1) ``` ## Data In this example, we analyze a simulated eQTL data set. The goal is to finemap causal variants for expression (eQTLs). ```{r load-data} data(data_small) y <- data_small$y X <- data_small$X dim(X) ``` ## Baseline SuSiE fit The original SuSiE method displays signs of misscalibration: the result is highly suspicious as we find 10 credible sets in a data set containing only 47 samples. ```{r run-susie, fig.height=3.5, fig.width=5, message=FALSE} res_susie <- susie(X,y,L = 10,verbose = TRUE) res_susie$sets$cs susie_plot(res_susie,y = "PIP") ``` Another clue is that the fine-mapped SNPs explain >99% of the variation in gene expression, which might be explained by overfitting: ```{r, fig.height=4.5, fig.width=4} ypred <- predict(res_susie, X) pve <- 1 - drop(res_susie$sigma2 / var(y)) round(100 * pve, 3) plot(y, ypred, pch = 20, xlab = "observed", ylab = "predicted") abline(0, 1, col = "magenta", lty = "dotted") ``` ## SuSiE with Servin-Stephens SER Setting `estimate_residual_method = "NIG"` switches SuSiE to a variant of the single-effect regression (SER) model that accounts for uncertainty in the residual variance. This is based on the linear regression model for single-SNP association tests described in [Servin and Stephens (2007)](https://doi.org/10.1371/journal.pgen.0030114). ```{r run-susie-small, message=FALSE, warning=FALSE} res_susie_small <- susie(X,y,L = 1,estimate_residual_method = "NIG", verbose = TRUE) res_susie_small$sets$cs ``` This analysis looks more plausible as it identifies only 1 CS: ```{r, fig.height=3, fig.width=5} susie_plot(res_susie_small,y = "PIP") ``` And, indeed, the predictions with the Servin-Stephens SER do not seem to "overfit" the expression data quite so strongly. ```{r, fig.height=4.5, fig.width=4} pred_small <- predict(res_susie_small, X) plot(y, ypred, pch = 20,col = "darkblue", xlab = "observed", ylab = "predicted") points(y, pred_small, pch = 20, col = "darkorange") abline(0, 1, col = "magenta", lty = "dotted") legend("topleft", pch = c(20, 20), col = c("darkblue","darkorange"), legend = c("SuSiE (default Gaussian SER)", "SuSiE (Servin-Stephens SER)")) ``` ### References Servin, B. & Stephens, M. (2007). Imputation-based analysis of association studies: Candidate regions and quantitative traits. *PLoS Genetics*, 3(7): e114. Denault et al (2025). Accounting for uncertainty in residual variances improves calibration for fine-mapping with small sample sizes. *bioRxiv* doi:10.1101/2025.05.16.654543.