---
title: Accounting for uncertainty in residual variances for small sample studies
author: William R.P. Denault
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Small data example}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r knitr-opts, include=FALSE}
knitr::opts_chunk$set(
  comment = "#",
  collapse = TRUE,
  results  = "markup",
  fig.align = "center",
  fig.width = 4.5,
  fig.height = 3.2,
  warning = FALSE,
  message = FALSE
)
```

This is vignette is a modified example based on Figure 1 panel B-C-D
in [Denault et al paper](https://doi.org/10.1101/2025.05.16.654543).

```{r load-pkgs}
library(susieR)
```

For reproducibility, set the seed:

```{r}
set.seed(1)
```

## Data

In this example, we analyze a simulated eQTL data set. The goal
is to finemap causal variants for expression (eQTLs).

```{r load-data}
data(data_small)
y <- data_small$y
X <- data_small$X
dim(X)
```

## Baseline SuSiE fit

The original SuSiE method displays signs of misscalibration: the
result is highly suspicious as we find 10 credible sets in a data set
containing only 47 samples.

```{r run-susie, fig.height=3.5, fig.width=5, message=FALSE}
res_susie <- susie(X,y,L = 10,verbose = TRUE)
res_susie$sets$cs
susie_plot(res_susie,y = "PIP")
```

Another clue is that the fine-mapped SNPs explain >99% of the
variation in gene expression, which might be explained by overfitting:

```{r, fig.height=4.5, fig.width=4}
ypred <- predict(res_susie, X)
pve <- 1 - drop(res_susie$sigma2 / var(y))
round(100 * pve, 3)
plot(y, ypred, pch = 20,
     xlab = "observed",
     ylab = "predicted")
abline(0, 1, col = "magenta", lty = "dotted")
```

## SuSiE with Servin-Stephens SER

Setting `estimate_residual_method = "NIG"` switches SuSiE
to a variant of the single-effect regression (SER) model that accounts
for uncertainty in the residual variance. This is based on the linear
regression model for single-SNP association tests described in
[Servin and Stephens (2007)](https://doi.org/10.1371/journal.pgen.0030114).

```{r run-susie-small, message=FALSE, warning=FALSE} 
res_susie_small <-
  susie(X,y,L = 1,estimate_residual_method = "NIG",
        verbose = TRUE)
res_susie_small$sets$cs
```

This analysis looks more plausible as it identifies only 1 CS:

```{r, fig.height=3, fig.width=5}
susie_plot(res_susie_small,y = "PIP")
```

And, indeed, the predictions with the Servin-Stephens SER do not seem
to "overfit" the expression data quite so strongly.

```{r, fig.height=4.5, fig.width=4}
pred_small <- predict(res_susie_small, X)
plot(y, ypred, pch = 20,col = "darkblue",
     xlab = "observed",
     ylab = "predicted")
points(y, pred_small, pch = 20, col = "darkorange")
abline(0, 1, col = "magenta", lty = "dotted")
legend("topleft", pch = c(20, 20), col = c("darkblue","darkorange"),
       legend = c("SuSiE (default Gaussian SER)",
                  "SuSiE (Servin-Stephens SER)"))
```
  
### References

Servin, B. & Stephens, M. (2007). Imputation-based analysis of
association studies: Candidate regions and quantitative traits. *PLoS
Genetics*, 3(7): e114.

Denault et al (2025). Accounting for uncertainty in residual
variances improves calibration for fine-mapping with small sample
sizes. *bioRxiv* doi:10.1101/2025.05.16.654543.