| Title: | Sum of Single Effects Linear Regression |
|---|---|
| Description: | Implements methods for variable selection in linear regression based on the "Sum of Single Effects" (SuSiE) model, as described in Wang et al (2020) <DOI:10.1101/501114> and Zou et al (2021) <DOI:10.1101/2021.11.03.467167>. These methods provide simple summaries, called "Credible Sets", for accurately quantifying uncertainty in which variables should be selected. The methods are motivated by genetic fine-mapping applications, and are particularly well-suited to settings where variables are highly correlated and detectable effects are sparse. The fitting algorithm, a Bayesian analogue of stepwise selection methods called "Iterative Bayesian Stepwise Selection" (IBSS), is simple and fast, allowing the SuSiE model be fit to large data sets (thousands of samples and hundreds of thousands of variables). |
| Authors: | Gao Wang [aut], Yuxin Zou [aut], Alexander McCreight [aut], Kaiqian Zhang [aut], William R.P. Denault [aut], Peter Carbonetto [aut, cre], Matthew Stephens [aut] |
| Maintainer: | Peter Carbonetto <[email protected]> |
| License: | BSD_3_clause + file LICENSE |
| Version: | 0.16.3 |
| Built: | 2026-06-04 03:17:14 UTC |
| Source: | https://github.com/stephenslab/susier |
This function orders the predictors by decreasing order of the magnitude of the estimated regression coefficient.
absolute.order(beta)absolute.order(beta)
beta |
A vector of estimated regression coefficients. |
An ordering of the predictors.
### generate synthetic data set.seed(1) n = 200 p = 300 X = matrix(rnorm(n*p),n,p) beta = double(p) beta[1:10] = 1:10 y = X %*% beta + rnorm(n) ### order predictors by magnitude of univariate regression coefficient beta.hat = univariate_regression(X,y)$betahat order = absolute.order(beta.hat)### generate synthetic data set.seed(1) n = 200 p = 300 X = matrix(rnorm(n*p),n,p) beta = double(p) beta[1:10] = 1:10 y = X %*% beta + rnorm(n) ### order predictors by magnitude of univariate regression coefficient beta.hat = univariate_regression(X,y)$betahat order = absolute.order(beta.hat)
Generic framework for post-convergence refinement of fitted models.
Repeatedly applies a block update (step_fn) to a fitted model,
monitoring ELBO for convergence and stability. Both CS refinement
(run_refine) and residual variance estimation (mvsusieR)
are instances of block coordinate ascent over different parameter blocks.
block_coordinate_ascent( model, data, step_fn, max_iter = 100, tol = 0.001, verbose = FALSE )block_coordinate_ascent( model, data, step_fn, max_iter = 100, tol = 0.001, verbose = FALSE )
model |
Fitted model (e.g., from |
data |
Data object passed to |
step_fn |
A function with signature
|
max_iter |
Maximum number of block ascent iterations (default 100). |
tol |
Convergence tolerance for relative ELBO change (default 1e-3). |
verbose |
If |
Convergence is reached when either:
step_fn returns converged = TRUE
(the block update signals no further improvement), or
the relative ELBO change falls below tol
(ELBO stabilized across block updates).
A warning is issued if the ELBO decreases between iterations.
The refined model, with model$converged set to
TRUE or FALSE.
Retrieve posterior mean estimates of the regression coefficients in a Mr.ASH model.
## S3 method for class 'mr.ash' coef(object, ...)## S3 method for class 'mr.ash' coef(object, ...)
object |
A Mr.ASH fit, usually the result of calling
|
... |
Additional arguments passed to the default S3 method. |
A p+1 vector. The first element gives the estimated intercept, and the remaining p elements are the estimated regression coefficients.
## generate synthetic data set.seed(1) n = 200 p = 300 X = matrix(rnorm(n*p),n,p) beta = double(p) beta[1:10] = 1:10 y = X
## fit mr.ash fit.mr.ash = mr.ash(X, y)
## coefficient coef.mr.ash = coef(fit.mr.ash) intercept = coef.mr.ash[1] beta = coef.mr.ash[-1]
Extract regression coefficients from susie fit
## S3 method for class 'susie' coef(object, ...)## S3 method for class 'susie' coef(object, ...)
object |
A susie fit. |
... |
Additional arguments passed to the generic |
A p+1 vector, the first element being an intercept, and the remaining p elements being estimated regression coefficients.
Computes the marginal OLS regression coefficient and standard error for each '(X column, Y column)' pair, treating the regressions as independent. 'X' is assumed column-centred (no intercept term in the per-pair regression); each 'Y' column is treated independently. Returns the J x T matrices 'Bhat' and 'Shat'.
Used internally by single-effect-regression style routines that need a per-position marginal estimate. Vectorised across columns of 'Y' so callers can pass either a numeric vector (T = 1) or a numeric matrix (T > 1) without looping at the call site.
compute_marginal_bhat_shat(X, Y, predictor_weights = NULL, sigma2 = NULL)compute_marginal_bhat_shat(X, Y, predictor_weights = NULL, sigma2 = NULL)
X |
numeric matrix 'n x J', expected column-centred. |
Y |
numeric matrix 'n x T' or numeric vector of length 'n'. When a vector, is treated as a one-column matrix. |
predictor_weights |
optional numeric vector of length 'J' giving 'colSums(X^2)'. Computed internally when 'NULL'. Callers that have this cached on the data object pass it through to avoid recomputation. |
sigma2 |
optional numeric scalar giving a known residual variance. When supplied, 'Shat[j, t] = sqrt(sigma2 / predictor_weights[j])' (single-effect-residual form). When 'NULL', 'Shat' is the per-pair empirical residual standard error: for each '(j, t)' pair, the sample SD of 'Y[, t] - X[, j] * Bhat[j, t]' divided by 'sqrt(n - 1)'. The latter matches the form used by data-driven prior init routines (e.g., for fitting a normal-mixture prior via 'ashr::ash'). |
list with elements 'Bhat' ('J x T') and 'Shat' ('J x T').
set.seed(1) X <- matrix(rnorm(50 * 5), 50, 5) X <- scale(X, center = TRUE, scale = FALSE) Y <- matrix(rnorm(50 * 3), 50, 3) out <- compute_marginal_bhat_shat(X, Y) dim(out$Bhat) # 5 x 3 dim(out$Shat) # 5 x 3set.seed(1) X <- matrix(rnorm(50 * 5), 50, 5) X <- scale(X, center = TRUE, scale = FALSE) Y <- matrix(rnorm(50 * 3), 50, 3) out <- compute_marginal_bhat_shat(X, Y) dim(out$Bhat) # 5 x 3 dim(out$Shat) # 5 x 3
susie_ss
Computes the sufficient statistics
and after centering (and possibly standardizing) the
columns of and centering to have mean zero. We also
store the column means of and mean of .
compute_suff_stat(X, y, standardize = FALSE)compute_suff_stat(X, y, standardize = FALSE)
X |
An n by p matrix of covariates. |
y |
An n vector. |
standardize |
Logical flag indicating whether to standardize columns of X to unit variance prior to computing summary data |
A list of sufficient statistics (XtX, Xty, yty, n)
and X_colmeans, y_mean.
data(N2finemapping) ss <- compute_suff_stat(N2finemapping$X, N2finemapping$Y[, 1])data(N2finemapping) ss <- compute_suff_stat(N2finemapping$X, N2finemapping$Y[, 1])
A simulated eQTL data set with 47 individuals and 7,430 variables. The response is a simulated gene expression phenotype and the variables are genotypes. This data set illustrates the small sample-size setting considered in Denault et al (2025).
data_small is a list with the following elements:
Simulated gene expression response.
Genotype matrix.
W. R. P. Denault et al (2025). Accounting for uncertainty in residual variances improves fine-mapping in small sample studies. bioRxiv doi:10.1101/2025.05.16.654543.
The “Small data example” vignette.
data(data_small)data(data_small)
susie_rss Model Using Regularized LDThe estimated s gives information about the
consistency between the z scores and LD matrix. A larger
means there is a strong inconsistency between z scores and LD
matrix. The “null-mle” method obtains mle of under
, . The
“null-partialmle” method obtains mle of under
, in which is a matrix containing
the of eigenvectors that span the null space of R; that is, the
eigenvectors corresponding to zero eigenvalues of R. The estimated
from “null-partialmle” could be greater than 1. The
“null-pseudomle” method obtains mle of under
pseudolikelihood , .
estimate_s_rss(z, R, n, r_tol = 1e-08, method = "null-mle")estimate_s_rss(z, R, n, r_tol = 1e-08, method = "null-mle")
z |
A p-vector of z scores. |
R |
A p by p symmetric, positive semidefinite correlation matrix. |
n |
The sample size. (Optional, but highly recommended.) |
r_tol |
Tolerance level for eigenvalue check of positive semidefinite matrix of R. |
method |
a string specifies the method to estimate |
A number between 0 and 1.
set.seed(1) n <- 500 p <- 1000 beta <- rep(0, p) beta[1:4] <- 0.01 X <- matrix(rnorm(n * p), nrow = n, ncol = p) X <- scale(X, center = TRUE, scale = TRUE) y <- drop(X %*% beta + rnorm(n)) input_ss <- compute_suff_stat(X, y, standardize = TRUE) ss <- univariate_regression(X, y) R <- cor(X) attr(R, "eigen") <- eigen(R, symmetric = TRUE) zhat <- with(ss, betahat / sebetahat) # Estimate s using the unadjusted z-scores. s0 <- estimate_s_rss(zhat, R) # Estimate s using the adjusted z-scores. s1 <- estimate_s_rss(zhat, R, n)set.seed(1) n <- 500 p <- 1000 beta <- rep(0, p) beta[1:4] <- 0.01 X <- matrix(rnorm(n * p), nrow = n, ncol = p) X <- scale(X, center = TRUE, scale = TRUE) y <- drop(X %*% beta + rnorm(n)) input_ss <- compute_suff_stat(X, y, standardize = TRUE) ss <- univariate_regression(X, y) R <- cor(X) attr(R, "eigen") <- eigen(R, symmetric = TRUE) zhat <- with(ss, betahat / sebetahat) # Estimate s using the unadjusted z-scores. s0 <- estimate_s_rss(zhat, R) # Estimate s using the adjusted z-scores. s1 <- estimate_s_rss(zhat, R, n)
Data simulated using real genotypes from 50,000 individuals and 200 SNPs. Two of the SNPs have non-zero effects on the multivariate response. The response data are generated under a linear regression model. The simulated response and the columns of the genotype matrix are centered.
FinemappingConvergence is a list with the following
elements:
Summary statistics computed using the centered and scaled genotype matrix.
Summary statistics computed using the centered and scaled genotype data, and the centered simulated response.
yty is computed using the centered simulated response.
The sample size (50,000).
The coefficients used to simulate the responses.
z-scores from a simple (single-SNP) linear regression.
A similar data set with more SNPs is used in the “Refine SuSiE model” vignette.
data(FinemappingConvergence)data(FinemappingConvergence)
This function evaluates the correlation between single effect CSs. It is not part of the SuSiE inference. Rather, it is designed as a diagnostic tool to assess how correlated the reported CS are.
get_cs_correlation(model, X = NULL, Xcorr = NULL, max = FALSE)get_cs_correlation(model, X = NULL, Xcorr = NULL, max = FALSE)
model |
A SuSiE fit, typically an output from
|
X |
n by p matrix of values of the p variables (covariates) in
n samples. When provided, correlation between variables will be
computed and used to remove CSs whose minimum correlation among
variables is smaller than |
Xcorr |
p by p matrix of correlations between variables
(covariates). When provided, it will be used to remove CSs whose
minimum correlation among variables is smaller than
|
max |
When |
A matrix of correlations between CSs, or the maximum
absolute correlation when max = TRUE.
Recover the parameters specifying the variational
approximation to the posterior distribution of the regression
coefficients. To streamline the model fitting implementation, and
to reduce memory requirements, mr.ash does not store
all the parameters needed to specify the approximate posterior.
get.full.posterior(fit)get.full.posterior(fit)
fit |
A Mr.ASH fit obtained, for example, by running
|
A list object with the following elements:
phi |
An p x K matrix containing the posterior assignment
probabilities, where p is the number of predictors, and K is the
number of mixture components. (Each row of |
m |
An p x K matrix containing the posterior means conditional on assignment to each mixture component. |
s2 |
An p x K matrix containing the posterior variances conditional on assignment to each mixture component. |
## generate synthetic data set.seed(1) n = 200 p = 300 X = matrix(rnorm(n*p),n,p) beta = double(p) beta[1:10] = 1:10 y = X %*% beta + rnorm(n) ## fit mr.ash fit.mr.ash = mr.ash(X, y) ## recover full posterior full.post = get.full.posterior(fit.mr.ash)## generate synthetic data set.seed(1) n = 200 p = 300 X = matrix(rnorm(n*p),n,p) beta = double(p) beta[1:10] = 1:10 y = X %*% beta + rnorm(n) ## fit mr.ash fit.mr.ash = mr.ash(X, y) ## recover full posterior full.post = get.full.posterior(fit.mr.ash)
Under the null, the rss model with regularized LD
matrix is . We use a mixture of
normals to model the conditional distribution of z_j given other z
scores, ,
,
is a grid of fixed positive numbers. We estimate the mixture
weights We detect the possible allele switch issue
using likelihood ratio for each variant.
kriging_rss( z, R, n, r_tol = 1e-08, s = estimate_s_rss(z, R, n, r_tol, method = "null-mle") )kriging_rss( z, R, n, r_tol = 1e-08, s = estimate_s_rss(z, R, n, r_tol, method = "null-mle") )
z |
A p-vector of z scores. |
R |
A p by p symmetric, positive semidefinite correlation matrix. |
n |
The sample size. (Optional, but highly recommended.) |
r_tol |
Tolerance level for eigenvalue check of positive semidefinite matrix of R. |
s |
an estimated s from |
a list containing a ggplot2 plot object and a table. The plot compares observed z score vs the expected value. The possible allele switched variants are labeled as red points (log LR > 2 and abs(z) > 2). The table summarizes the conditional distribution for each variant and the likelihood ratio test. The table has the following columns: the observed z scores, the conditional expectation, the conditional variance, the standardized differences between the observed z score and expected value, the log likelihood ratio statistics.
# See also the vignette, "Diagnostic for fine-mapping with summary # statistics." set.seed(1) n <- 500 p <- 1000 beta <- rep(0, p) beta[1:4] <- 0.01 X <- matrix(rnorm(n * p), nrow = n, ncol = p) X <- scale(X, center = TRUE, scale = TRUE) y <- drop(X %*% beta + rnorm(n)) ss <- univariate_regression(X, y) R <- cor(X) attr(R, "eigen") <- eigen(R, symmetric = TRUE) zhat <- with(ss, betahat / sebetahat) cond_dist <- kriging_rss(zhat, R, n = n) cond_dist$plot# See also the vignette, "Diagnostic for fine-mapping with summary # statistics." set.seed(1) n <- 500 p <- 1000 beta <- rep(0, p) beta[1:4] <- 0.01 X <- matrix(rnorm(n * p), nrow = n, ncol = p) X <- scale(X, center = TRUE, scale = TRUE) y <- drop(X %*% beta + rnorm(n)) ss <- univariate_regression(X, y) R <- cor(X) attr(R, "eigen") <- eigen(R, symmetric = TRUE) zhat <- with(ss, betahat / sebetahat) cond_dist <- kriging_rss(zhat, R, n = n) cond_dist$plot
Model fitting algorithms for Multiple Regression with Adaptive Shrinkage ("Mr.ASH"). Mr.ASH is a variational empirical Bayes (VEB) method for multiple linear regression. The fitting algorithms (locally) maximize the approximate marginal likelihood (the "evidence lower bound", or ELBO) via coordinate-wise updates.
mr.ash( X, y, Z = NULL, sa2 = NULL, method_q = c("sigma_dep_q", "sigma_indep_q"), method_g = c("caisa", "accelerate", "block"), max.iter = 1000, min.iter = 1, beta.init = NULL, update.pi = TRUE, pi = NULL, update.sigma2 = TRUE, sigma2 = NULL, update.order = NULL, standardize = FALSE, intercept = TRUE, tol = set_default_tolerance(), verbose = TRUE )mr.ash( X, y, Z = NULL, sa2 = NULL, method_q = c("sigma_dep_q", "sigma_indep_q"), method_g = c("caisa", "accelerate", "block"), max.iter = 1000, min.iter = 1, beta.init = NULL, update.pi = TRUE, pi = NULL, update.sigma2 = TRUE, sigma2 = NULL, update.order = NULL, standardize = FALSE, intercept = TRUE, tol = set_default_tolerance(), verbose = TRUE )
X |
The input matrix, of dimension (n,p); each column is a single predictor; and each row is an observation vector. Here, n is the number of samples and p is the number of predictors. The matrix cannot be sparse. |
y |
The observed continuously-valued responses, a vector of length p. |
Z |
The covariate matrix, of dimension (n,k), where k is the
number of covariates. This feature is not yet implemented;
|
sa2 |
The vector of prior mixture component variances. The
variances should be in increasing order, starting at zero; that is,
|
method_q |
The algorithm used to update the variational
approximation to the posterior distribution of the regression
coefficients, |
method_g |
|
max.iter |
The maximum number of outer loop iterations allowed. |
min.iter |
The minimum number of outer loop iterations allowed. |
beta.init |
The initial estimate of the (approximate)
posterior mean regression coefficients. This should be |
update.pi |
If |
pi |
The initial estimate of the mixture proportions
|
, where
K = length(sa2).
update.sigma2 |
If |
sigma2 |
The initial estimate of the residual variance,
|
update.order |
The order in which the co-ordinate ascent
updates for estimating the posterior mean coefficients are
performed. |
standardize |
The logical flag for standardization of the columns of X variable, prior to the model fitting. The coefficients are always returned on the original scale. |
intercept |
When |
tol |
Additional settings controlling behaviour of the model
fitting algorithm. |
verbose |
If |
Mr.ASH is a statistical inference method for the following multiple linear regression model:
in which the regression coefficients
admit a mixture-of-normals prior,
Each mixture
component in the prior, , is a normal density centered at
zero, with variance .
The fitting algorithm, it run for a large enough number of
iterations, will find an approximate posterior for the regression
coefficients, denoted by , residual variance
parameter , and prior mixture weights maximizing the evidence lower bound,
where denotes the
Kullback-Leibler (KL) divergence, a measure of the "distance"
between (approximate) posterior and prior
. The fitting algorithm iteratively updates the
approximate posteriors , separately for each
(in an order determined by
update.order), then separately updates the mixture weights
and and residual variance . This
coordinate-wise update scheme iterates until the convergence
criterion is met, or until the algorithm hits an upper bound on
the number of iterations (specified by max.iter). Coordinate-wise
optimization algorithms for model fitting are implemented in C++ for
efficient handling of large-scale data
See ‘References’ for more details about the model and algorithm.
A list object with the following elements:
intercept |
The estimated intercept. |
beta |
A vector containing posterior mean estimates of the regression coefficients for all predictors. |
sigma2 |
The estimated residual variance. |
pi |
A vector of containing the estimated mixture proportions. |
iter |
The number of outer-loop iterations that were performed. |
update.order |
The ordering used for performing the
coordinate-wise updates. For |
varobj |
A vector of length |
data |
The preprocessed data (X, Z, y) provided as input to the model
fitting algorithm. |
Y. Kim (2020), Bayesian shrinkage methods for high dimensional regression. Ph.D. thesis, University of Chicago.
get.full.posterior, predict.mr.ash
### generate synthetic data set.seed(1) n = 200 p = 300 X = matrix(rnorm(n*p),n,p) beta = double(p) beta[1:10] = 1:10 y = X %*% beta + rnorm(n) ### fit Mr.ASH fit.mr.ash = mr.ash(X,y, method_q = "sigma_indep_q") #' fit.mr.ash = mr.ash(X,y, method_q = "sigma_dep_q") ### prediction routine Xnew = matrix(rnorm(n*p),n,p) ynew = Xnew %*% beta + rnorm(n) ypred = predict(fit.mr.ash, Xnew) ### test error rmse = norm(ynew - ypred, '2') / sqrt(n) ### coefficients betahat = predict(fit.mr.ash, type = "coefficients") # this equals c(fit.mr.ash$intercept, fit.mr.ash$beta)### generate synthetic data set.seed(1) n = 200 p = 300 X = matrix(rnorm(n*p),n,p) beta = double(p) beta[1:10] = 1:10 y = X %*% beta + rnorm(n) ### fit Mr.ASH fit.mr.ash = mr.ash(X,y, method_q = "sigma_indep_q") #' fit.mr.ash = mr.ash(X,y, method_q = "sigma_dep_q") ### prediction routine Xnew = matrix(rnorm(n*p),n,p) ynew = Xnew %*% beta + rnorm(n) ypred = predict(fit.mr.ash, Xnew) ### test error rmse = norm(ynew - ypred, '2') / sqrt(n) ### coefficients betahat = predict(fit.mr.ash, type = "coefficients") # this equals c(fit.mr.ash$intercept, fit.mr.ash$beta)
This function performs Bayesian multiple regression with a mixture-of-normals prior using summary statistics (RSS: Regression with Summary Statistics). It uses a C++ implementation for efficient computation.
mr.ash.rss( bhat, shat, R, var_y, n, s0, w0, sigma2_e = NULL, mu1_init = numeric(0), tol = 1e-08, max_iter = 1e+05, z = numeric(0), update_w0 = TRUE, update_sigma = TRUE, compute_ELBO = TRUE, standardize = FALSE )mr.ash.rss( bhat, shat, R, var_y, n, s0, w0, sigma2_e = NULL, mu1_init = numeric(0), tol = 1e-08, max_iter = 1e+05, z = numeric(0), update_w0 = TRUE, update_sigma = TRUE, compute_ELBO = TRUE, standardize = FALSE )
bhat |
Numeric vector of observed effect sizes (standardized). |
shat |
Numeric vector of standard errors of effect sizes. |
R |
Numeric matrix of the correlation matrix. |
var_y |
Numeric value of the variance of the outcome. If NULL, it is set to Inf (effects on standardized scale). |
n |
Integer value of the sample size. |
s0 |
Numeric vector of prior variances for the mixture components. |
w0 |
Numeric vector of prior weights for the mixture components. |
sigma2_e |
Numeric value of the initial error variance estimate.
If |
mu1_init |
Numeric vector of initial values for the posterior mean of
the coefficients. Default is |
tol |
Numeric value of the convergence tolerance. Default is 1e-8. |
max_iter |
Integer value of the maximum number of iterations. Default is 1e5. |
z |
Numeric vector of Z-scores. If not provided, computed as
|
update_w0 |
Logical value indicating whether to update the mixture weights. Default is TRUE. |
update_sigma |
Logical value indicating whether to update the error variance. Default is TRUE. |
compute_ELBO |
Logical value indicating whether to compute the Evidence Lower Bound (ELBO). Default is TRUE. |
standardize |
Logical value indicating whether to standardize the input data. Default is FALSE. |
A list containing the following components:
Numeric vector of posterior mean coefficients (same as mu1).
Numeric value of the residual variance (same as sigma2_e).
Numeric vector of mixture weights (same as w0).
Integer, number of iterations performed.
Numeric vector of ELBO values per iteration.
Numeric vector of the posterior mean of the coefficients.
Numeric vector of the posterior variance of the coefficients.
Numeric matrix of the posterior assignment probabilities.
Numeric value of the error variance.
Numeric vector of the mixture weights.
Numeric value of the Evidence Lower Bound (if compute_ELBO = TRUE).
# Generate example data set.seed(985115) n <- 350 p <- 16 sigmasq_error <- 0.5 zeroes <- rbinom(p, 1, 0.6) beta.true <- rnorm(p, 1, sd = 4) beta.true[zeroes] <- 0 X <- cbind(matrix(rnorm(n * p), nrow = n)) X <- scale(X, center = TRUE, scale = FALSE) y <- X %*% matrix(beta.true, ncol = 1) + rnorm(n, 0, sqrt(sigmasq_error)) y <- scale(y, center = TRUE, scale = FALSE) # Set the prior K <- 9 sigma0 <- c(0.001, .1, .5, 1, 5, 10, 20, 30, .005) omega0 <- rep(1 / K, K) # Calculate summary statistics b.hat <- sapply(1:p, function(j) { summary(lm(y ~ X[, j]))$coefficients[-1, 1] }) s.hat <- sapply(1:p, function(j) { summary(lm(y ~ X[, j]))$coefficients[-1, 2] }) R.hat <- cor(X) var_y <- var(y) sigmasq_init <- 1.5 # Run mr.ash.rss out <- mr.ash.rss(b.hat, s.hat, R = R.hat, var_y = var_y, n = n, sigma2_e = sigmasq_init, s0 = sigma0, w0 = omega0, mu1_init = rep(0, ncol(X)), tol = 1e-8, max_iter = 1e5, update_w0 = TRUE, update_sigma = TRUE, compute_ELBO = TRUE, standardize = FALSE )# Generate example data set.seed(985115) n <- 350 p <- 16 sigmasq_error <- 0.5 zeroes <- rbinom(p, 1, 0.6) beta.true <- rnorm(p, 1, sd = 4) beta.true[zeroes] <- 0 X <- cbind(matrix(rnorm(n * p), nrow = n)) X <- scale(X, center = TRUE, scale = FALSE) y <- X %*% matrix(beta.true, ncol = 1) + rnorm(n, 0, sqrt(sigmasq_error)) y <- scale(y, center = TRUE, scale = FALSE) # Set the prior K <- 9 sigma0 <- c(0.001, .1, .5, 1, 5, 10, 20, 30, .005) omega0 <- rep(1 / K, K) # Calculate summary statistics b.hat <- sapply(1:p, function(j) { summary(lm(y ~ X[, j]))$coefficients[-1, 1] }) s.hat <- sapply(1:p, function(j) { summary(lm(y ~ X[, j]))$coefficients[-1, 2] }) R.hat <- cor(X) var_y <- var(y) sigmasq_init <- 1.5 # Run mr.ash.rss out <- mr.ash.rss(b.hat, s.hat, R = R.hat, var_y = var_y, n = n, sigma2_e = sigmasq_init, s0 = sigma0, w0 = omega0, mu1_init = rep(0, ncol(X)), tol = 1e-8, max_iter = 1e5, update_w0 = TRUE, update_sigma = TRUE, compute_ELBO = TRUE, standardize = FALSE )
This data set contains a genotype matrix for 574 individuals and 1,002 variables. The variables are genotypes after centering and scaling, and therefore retain the correlation structure of the original genotype data. Two of the variables have non-zero effects on the multivariate response. The response data are generated under a multivariate linear regression model. See Wang et al (2020) for details.
N2finemapping is a list with the following elements:
Centered and scaled genotype data.
Chromomsome of the original data, in hg38 coordinates.
Chromomosomal position of the original data, in hg38
coordinates. The information can be used to compare impact of using
other genotype references of the same variables in susie_rss
application.
Simulated effect sizes.
Simulated residual covariance matrix.
Simulated multivariate response.
Allele frequencies based on the original genotype data.
Suggested prior covariance matrix for effect sizes of the two non-zero effect variables.
G. Wang, A. Sarkar, P. Carbonetto and M. Stephens (2020). A simple new approach to variable selection in regression, with application to genetic fine-mapping. Journal of the Royal Statistical Society, Series B doi:10.1101/501114.
data(N2finemapping)data(N2finemapping)
The data-set contains a matrix of 574 individuals and 1,001 variables. These variables are real-world genotypes centered and scaled, and therefore retains the correlation structure of variables in the original genotype data. 3 out of the variables have non-zero effects. The response data is generated under a multivariate linear regression model. See Wang et al (2020) for more details.
N3finemapping is a list with the following elements:
N by P variable matrix of centered and scaled genotype data.
Chromomsome of the original data, in hg38 coordinate.
Chromomosomal positoin of the original data, in hg38 coordinate. The information can be used to compare impact of using other genotype references of the same variables in susie_rss application.
The simulated effect sizes.
The simulated residual covariance matrix.
The simulated response variables.
Allele frequency of the original genotype data.
Prior covariance matrix for effect size of the three non-zero effect variables.
G. Wang, A. Sarkar, P. Carbonetto and M. Stephens (2020). A simple new approach to variable selection in regression, with application to genetic fine-mapping. Journal of the Royal Statistical Society, Series B doi:10.1101/501114.
data(N3finemapping)data(N3finemapping)
This function determines an ordering of the predictors based on the regularization path of the penalized regression; in particular, the predictors are ordered based on the order in which the coefficients are included in the model as the penalty strength decreases.
path.order(fit)path.order(fit)
fit |
A fit object whose |
An ordering of the predictors.
### generate synthetic data set.seed(1) n = 200 p = 30 X = matrix(rnorm(n*p),n,p) beta = double(p) beta[1:10] = 1:10 y = X %*% beta + rnorm(n) ### build a minimal example 'fit' object with the same structure as a ### fit from a penalized regression: a coefficient matrix with the ### intercept in row 1 and one column per (decreasing) penalty value. beta_path = matrix(0, p + 1, p) for (k in 1:p) beta_path[k + 1, k:p] = 1 fit = list(coefficients = beta_path) order = path.order(fit)### generate synthetic data set.seed(1) n = 200 p = 30 X = matrix(rnorm(n*p),n,p) beta = double(p) beta[1:10] = 1:10 y = X %*% beta + rnorm(n) ### build a minimal example 'fit' object with the same structure as a ### fit from a penalized regression: a coefficient matrix with the ### intercept in row 1 and one column per (decreasing) penalty value. beta_path = matrix(0, p + 1, p) for (k in 1:p) beta_path[k + 1, k:p] = 1 fit = list(coefficients = beta_path) order = path.order(fit)
This function predicts outcomes (y) given the observed variables (X) and a Mr.ASH model; alternatively, retrieve the estimates of the regression coefficients.
## S3 method for class 'mr.ash' predict(object, newx = NULL, type = c("response", "coefficients"), ...)## S3 method for class 'mr.ash' predict(object, newx = NULL, type = c("response", "coefficients"), ...)
object |
A mr_ash fit, usually the result of calling
|
newx |
The input matrix, of dimension (n,p); each column is a
single predictor; and each row is an observation vector. Here, n is
the number of samples and p is the number of predictors. When
|
type |
The type of output. For |
... |
Additional arguments passed to the default S3 method. |
For type = "response", predicted or fitted outcomes
are returned; for type = "coefficients", the estimated
coefficients are returned.
## generate synthetic data set.seed(1) n = 200 p = 300 X = matrix(rnorm(n*p),n,p) beta = double(p) beta[1:10] = 1:10 y = X %*% beta + rnorm(n) ## fit mr.ash fit.mr.ash = mr.ash(X, y) ## predict Xnew = matrix(rnorm(n*p),n,p) ypred = predict(fit.mr.ash, Xnew)## generate synthetic data set.seed(1) n = 200 p = 300 X = matrix(rnorm(n*p),n,p) beta = double(p) beta[1:10] = 1:10 y = X %*% beta + rnorm(n) ## fit mr.ash fit.mr.ash = mr.ash(X, y) ## predict Xnew = matrix(rnorm(n*p),n,p) ypred = predict(fit.mr.ash, Xnew)
Predict outcomes or extract coefficients from susie fit.
## S3 method for class 'susie' predict(object, newx = NULL, type = c("response", "coefficients"), ...)## S3 method for class 'susie' predict(object, newx = NULL, type = c("response", "coefficients"), ...)
object |
A susie fit. |
newx |
A new value for X at which to do predictions. |
type |
The type of output. For |
... |
Other arguments used by generic predict function. These extra arguments are not used here. |
For type = "response", predicted or fitted outcomes
are returned; for type = "coefficients", the estimated
coefficients are returned. If the susie fit has intercept =
NA (which is common when using susie_ss) then
predictions are computed using an intercept of 0, and a warning is
emitted.
Pretty-prints the tidy tables built by [summary.susie_post_outcome_configuration()] with optional ANSI color highlighting. See that page for the color encoding.
## S3 method for class 'summary.susie_post_outcome_configuration' print(x, ...)## S3 method for class 'summary.susie_post_outcome_configuration' print(x, ...)
x |
Output of [summary.susie_post_outcome_configuration()]. |
... |
Ignored. |
The input invisibly.
[summary.susie_post_outcome_configuration()]
Summary statistics from one real-data gene region, together with a reference correlation matrix computed from a real-world 1000 Genomes European reference panel with roughly 500 samples. The data object is name-free and contains a 500-variant subset used to illustrate the 'R_finite' and 'R_mismatch' options in 'susie_rss'.
rss_mismatch_example is a list with the following
elements:
z-scores.
Estimated marginal regression coefficients.
Standard errors of bhat.
Reference sample-size estimate used by susie_rss.
Reference correlation matrix.
The “Diagnosing and modeling R-reference mismatch in SuSiE-RSS” vignette.
data(rss_mismatch_example)data(rss_mismatch_example)
Construct a prior specification for the slot activity model, which regularizes the number of active single effects in SuSiE. Two prior families are available: Beta-Binomial (default, recommended for single-locus) and Gamma-Poisson (recommended for genome-wide applications via susieAnn).
slot_prior_betabinom( a_beta = NULL, b_beta = NULL, c_hat_init = NULL, skip_threshold_multiplier = 0 ) slot_prior_poisson( C, nu = NULL, update_schedule = c("sequential", "batch"), c_hat_init = NULL, skip_threshold_multiplier = 0 )slot_prior_betabinom( a_beta = NULL, b_beta = NULL, c_hat_init = NULL, skip_threshold_multiplier = 0 ) slot_prior_poisson( C, nu = NULL, update_schedule = c("sequential", "batch"), c_hat_init = NULL, skip_threshold_multiplier = 0 )
a_beta |
Shape parameter for the Beta prior on inclusion probability rho. Default 1. |
b_beta |
Shape parameter for the Beta prior on inclusion
probability rho. Default 2, giving a moderate sparsity preference
with |
c_hat_init |
Optional numeric L-vector of initial slot activity probabilities for warm-starting. If NULL, initialized at the prior mean. |
skip_threshold_multiplier |
Multiplier for the adaptive skip threshold. Slots with c_hat below this fraction of the baseline (prior with zero signal) are skipped. Default 0 (no skipping). The threshold is recomputed after each IBSS iteration from the current model state, and is set to 0 on the first iteration so all slots are evaluated at least once. |
C |
Expected number of causal variants for the Gamma-Poisson prior
on the per-block causal rate. Must be positive. Not used by
|
nu |
Overdispersion parameter for the Gamma-Poisson prior on the
per-block causal rate. Not used by |
update_schedule |
How the Gamma shape parameter is updated
during IBSS iterations (Gamma-Poisson only; ignored for
Beta-Binomial which is inherently sequential).
|
Two prior types are available:
slot_prior_betabinomUses a Beta-Binomial model for slot inclusion. The inclusion probability rho is given a Beta(a_beta, b_beta) prior and integrated out analytically, yielding an adaptive multiplicity correction that penalizes less when more slots are active. This is the recommended default for single-locus applications. See Scott and Berger (2010) for the theoretical justification.
slot_prior_poissonUses the Gamma-Poisson model with Poisson approximation for slot indicators. Recommended for genome-wide applications via susieAnn, where C and nu are estimated across loci.
A list of class "slot_prior" with the appropriate
subclass.
Scott, J. G. and Berger, J. O. (2010). Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem. Annals of Statistics, 38(5), 2587–2619.
# Default: Beta-Binomial with Beta(1, 2) prior on inclusion probability slot_prior_betabinom() # Gamma-Poisson for susieAnn slot_prior_poisson(C = 5, nu = 8) # Pass to susie # fit <- susie(X, y, slot_prior = slot_prior_betabinom())# Default: Beta-Binomial with Beta(1, 2) prior on inclusion probability slot_prior_betabinom() # Gamma-Poisson for susieAnn slot_prior_poisson(C = 5, nu = 8) # Pass to susie # fit <- susie(X, y, slot_prior = slot_prior_betabinom())
summary method for the “susie” class.
## S3 method for class 'susie' summary(object, ...) ## S3 method for class 'summary.susie' print(x, ...)## S3 method for class 'susie' summary(object, ...) ## S3 method for class 'summary.susie' print(x, ...)
object |
A susie fit. |
... |
Additional arguments passed to the generic |
x |
A susie summary. |
summary.susie returns a list containing a data frame
of variables and a data frame of credible sets.
Builds tidy tables from the nested list returned by [susie_post_outcome_configuration()] and prints them with ANSI color highlighting via [print.summary.susie_post_outcome_configuration()]. The summary itself is an S3 object: index '$susiex' and '$coloc_pairwise' to grab the data.frames for downstream use.
## S3 method for class 'susie_post_outcome_configuration' summary( object, prob_thresh = 0.8, ambiguous_lower = 0.5, signal_only = TRUE, color = "auto", ... )## S3 method for class 'susie_post_outcome_configuration' summary( object, prob_thresh = 0.8, ambiguous_lower = 0.5, signal_only = TRUE, color = "auto", ... )
object |
Output of [susie_post_outcome_configuration()]. |
prob_thresh |
Threshold above which 'marginal_prob' counts as a signal (default '0.8'). |
ambiguous_lower |
Lower edge of the "ambiguous" band for the SuSiEx color coding: marginals in '[ambiguous_lower, prob_thresh)' are colored yellow. Default '0.5'. Set to 'prob_thresh' to disable the band. |
signal_only |
Logical. If 'TRUE' (default), drop CS tuples where no trait is active and drop coloc rows whose dominant hypothesis is H0. Pass 'FALSE' to keep everything. |
color |
One of '"auto"' (default; honors [crayon::has_color()]), 'TRUE' (force colors on), or 'FALSE' (force them off). |
... |
Ignored. |
Color encoding (when ANSI colors are available):
SuSiEx per-trait marginal probability: bold dark green when '>= prob_thresh' (active), yellow when in '[ambiguous_lower, prob_thresh)', dim otherwise. The 'active' logical from the raw result is encoded by color and is not shown as a separate column.
Coloc verdict: bold dark green for H4 (shared causal), magenta for H3 (distinct causals), blue for H1 or H2 (single-trait causal), dim for H0 (no signal). The dominant PP per row is bolded.
Robustness: this method is defensive against malformed input. Empty lists, NULL components, missing fields, length-mismatched per-trait vectors, trait names that collide with reserved columns ('tuple', 'top_pattern', 'top_prob'), and coloc data.frames that lack some optional columns ('hit1', 'hit2') all degrade gracefully rather than erroring.
A list of class '"summary.susie_post_outcome_configuration"' with components:
'data.frame' (or 'NULL' when no signals): one row per CS tuple. Columns: 'tuple' (e.g. '"(1,1,1)"'), one numeric column per trait carrying that trait's 'marginal_prob', 'top_pattern' (binary configuration string for the most-probable configuration), 'top_prob' (its probability).
'data.frame' (or 'NULL'): the original coloc table extended with 'verdict' (named hypothesis label) and 'top_pp' (the dominant PP value).
row counts before and after 'signal_only' filtering, used by the print method to footer hidden rows.
parameters echoed for the print method.
[susie_post_outcome_configuration()], [print.summary.susie_post_outcome_configuration()]
Data simulated using real genotypes from 10,000 individuals and 200 SNPs. One SNP have non-zero effect on the multivariate response. The response data are generated under a linear regression model. There is also one SNP with flipped allele between summary statistics and the reference panel.
SummaryConsistency is a list with the following
elements:
z-scores computed by fitting univariate simple regression variable-by-variable.
LD matrix estimated from the reference panel.
The index of the SNP with the flipped allele.
The index of the SNP with the non-zero effect.
A similar data set with more samples is used in the “Diagnostic for fine-mapping with summary statistics” vignette.
data(SummaryConsistency)data(SummaryConsistency)
Performs a sparse Bayesian multiple linear regression
of y on X, using the "Sum of Single Effects" model from Wang et al
(2020). In brief, this function fits the regression model , where elements of are i.i.d. normal
with zero mean and variance residual_variance, is
an intercept term and is a vector of length p representing
the effects to be estimated. The “susie assumption” is that
where each is a vector of
length p with exactly one non-zero element. The prior on the
non-zero element is normal with zero mean and variance var(y)
* scaled_prior_variance. The value of L is fixed, and
should be chosen to provide a reasonable upper bound on the number
of non-zero effects to be detected. Typically, the hyperparameters
residual_variance and scaled_prior_variance will be
estimated during model fitting, although they can also be fixed as
specified by the user. See functions susie_get_cs and
other functions of form susie_get_* to extract the most
commonly-used results from a susie fit.
#' @details The function susie implements the IBSS algorithm
from Wang et al (2020). The option refine = TRUE implements
an additional step to help reduce problems caused by convergence of
the IBSS algorithm to poor local optima (which is rare in our
experience, but can provide misleading results when it occurs). The
refinement step incurs additional computational expense that
increases with the number of CSs found in the initial run.
The function susie_ss implements essentially the same
algorithms, but using sufficient statistics. (The statistics are
sufficient for the regression coefficients , but not for the
intercept ; see below for how the intercept is treated.)
If the sufficient statistics are computed correctly then the
results from susie_ss should be the same as (or very
similar to) susie, although runtimes will differ as
discussed below. The sufficient statistics are the sample
size n, and then the p by p matrix , the p-vector
, and the sum of squared y values , all computed
after centering the columns of and the vector to
have mean 0; these can be computed using compute_suff_stat.
The handling of the intercept term in susie_ss needs
some additional explanation. Computing the summary data after
centering X and y effectively ensures that the
resulting posterior quantities for allow for an intercept
in the model; however, the actual value of the intercept cannot be
estimated from these centered data. To estimate the intercept term
the user must also provide the column means of and the mean
of (X_colmeans and y_mean). If these are not
provided, they are treated as NA, which results in the
intercept being NA. If for some reason you prefer to have
the intercept be 0 instead of NA then set
X_colmeans = 0,y_mean = 0.
For completeness, we note that if susie_ss is run on
computed without centering and
, and with X_colmeans = 0,y_mean = 0, this is
equivalent to susie applied to with
intercept = FALSE (although results may differ due to
different initializations of residual_variance and
scaled_prior_variance). However, this usage is not
recommended for for most situations.
The computational complexity of susie is per
iteration, whereas susie_ss is per
iteration (not including the cost of computing the sufficient
statistics, which is dominated by the cost of
computing ). Because of the cost of computing ,
susie will usually be faster. However, if ,
and/or if is already computed, then
susie_ss may be faster.
susie( X, y, L = min(10, ncol(X)), scaled_prior_variance = 0.2, residual_variance = NULL, prior_weights = NULL, null_weight = 0, standardize = TRUE, intercept = TRUE, estimate_residual_variance = TRUE, estimate_residual_method = c("MoM", "MLE", "NIG"), estimate_prior_variance = TRUE, estimate_prior_method = c("optim", "EM", "simple"), prior_variance_grid = NULL, mixture_weights = NULL, unmappable_effects = c("none", "inf", "ash", "ash_filter_archived"), check_null_threshold = 0, prior_tol = 1e-09, residual_variance_upperbound = Inf, model_init = NULL, s_init = NULL, coverage = 0.95, min_abs_corr = 0.5, compute_univariate_zscore = FALSE, na.rm = FALSE, max_iter = 100, L_greedy = NULL, greedy_lbf_cutoff = 0.1, tol = NULL, convergence_method = c("elbo", "pip"), verbose = FALSE, track_fit = FALSE, residual_variance_lowerbound = NULL, refine = FALSE, n_purity = 100, alpha0 = NULL, beta0 = NULL, init_only = FALSE, slot_prior = NULL )susie( X, y, L = min(10, ncol(X)), scaled_prior_variance = 0.2, residual_variance = NULL, prior_weights = NULL, null_weight = 0, standardize = TRUE, intercept = TRUE, estimate_residual_variance = TRUE, estimate_residual_method = c("MoM", "MLE", "NIG"), estimate_prior_variance = TRUE, estimate_prior_method = c("optim", "EM", "simple"), prior_variance_grid = NULL, mixture_weights = NULL, unmappable_effects = c("none", "inf", "ash", "ash_filter_archived"), check_null_threshold = 0, prior_tol = 1e-09, residual_variance_upperbound = Inf, model_init = NULL, s_init = NULL, coverage = 0.95, min_abs_corr = 0.5, compute_univariate_zscore = FALSE, na.rm = FALSE, max_iter = 100, L_greedy = NULL, greedy_lbf_cutoff = 0.1, tol = NULL, convergence_method = c("elbo", "pip"), verbose = FALSE, track_fit = FALSE, residual_variance_lowerbound = NULL, refine = FALSE, n_purity = 100, alpha0 = NULL, beta0 = NULL, init_only = FALSE, slot_prior = NULL )
X |
An n by p matrix of covariates. |
y |
The observed responses, a vector of length n. |
L |
Maximum number of non-zero effects in the model. If L is larger than the number of covariates, p, L is set to p. |
scaled_prior_variance |
The prior variance, divided by
|
residual_variance |
Variance of the residual. If
|
prior_weights |
A vector of length p, in which each entry
gives the prior probability that corresponding column of X has a
nonzero effect on the outcome, y. The weights are internally
normalized to sum to 1. When |
null_weight |
Prior probability of no effect (a number between 0 and 1, and cannot be exactly 1). |
standardize |
If |
intercept |
If |
estimate_residual_variance |
If
|
estimate_residual_method |
The method used for estimating residual variance. For the original SuSiE model, "MLE" and "MoM" estimation is equivalent, but for the infinitesimal model, "MoM" is more stable. We recommend using "NIG" when n < 80 for improved coverage, although it is currently only implemented for individual-level data. |
estimate_prior_variance |
If |
estimate_prior_method |
The method used for estimating prior
variance. When |
prior_variance_grid |
Numeric vector of K prior variances defining
a mixture-of-normals prior on effect sizes. When provided, the SER
evaluates Bayes factors at each grid point and forms a mixture BF
weighted by |
mixture_weights |
Numeric vector of K non-negative weights summing
to 1, giving the mixture proportions for the variance grid. Default is
|
unmappable_effects |
The method for modeling unmappable effects: "none", "inf", "ash". |
check_null_threshold |
When the prior variance is estimated,
compare the estimate with the null, and set the prior variance to
zero unless the log-likelihood using the estimate is larger by this
threshold amount. For example, if you set
|
prior_tol |
When the prior variance is estimated, compare the
estimated value to |
residual_variance_upperbound |
Upper limit on the estimated
residual variance. It is only relevant when
|
model_init |
A previous susie fit with which to initialize. |
s_init |
Deprecated alias for |
coverage |
A number between 0 and 1 specifying the “coverage” of the estimated confidence sets. |
min_abs_corr |
Minimum absolute correlation allowed in a
credible set. The default, 0.5, corresponds to a squared
correlation of 0.25, which is a commonly used threshold for
genotype data in genetic studies. This "purity" filter is
applied to the CSs reported in the fit object, so the CS list
returned here may be a subset of the one produced by calling
|
compute_univariate_zscore |
If |
na.rm |
Drop any missing values in y from both X and y. |
max_iter |
Maximum number of IBSS iterations to perform. For
|
L_greedy |
Integer or |
greedy_lbf_cutoff |
Numeric saturation threshold for the
|
tol |
A small, non-negative number specifying the convergence
tolerance for the IBSS fitting procedure. When |
convergence_method |
When |
verbose |
If |
track_fit |
If |
residual_variance_lowerbound |
Lower limit on the estimated
residual variance. It is only relevant when
|
refine |
If |
n_purity |
Passed as argument |
alpha0 |
Numerical parameter for the NIG prior when using
|
beta0 |
Numerical parameter for the NIG prior when using
|
init_only |
Logical. If |
slot_prior |
Optional slot activity prior created by
|
A "susie" object with some or all of the following elements:
alpha |
An L by p matrix of posterior inclusion probabilities. |
mu |
An L by p matrix of posterior means, conditional on inclusion. |
mu2 |
An L by p matrix of posterior second moments, conditional on inclusion. |
Xr |
A vector of length n, equal to |
lbf |
Log-Bayes Factor for each single effect. |
lbf_variable |
Log-Bayes Factor for each variable and single effect. |
intercept |
Intercept (fixed or estimated). |
sigma2 |
Residual variance (fixed or estimated). |
V |
Prior variance of the non-zero elements of b. |
elbo |
The variational lower bound (or ELBO) achieved at each iteration. |
fitted |
Vector of length n containing the fitted values. |
sets |
Credible sets estimated from model fit. |
pip |
A vector of length p giving the marginal posterior inclusion probabilities. |
z |
A vector of univariate z-scores. |
niter |
Number of IBSS iterations performed. |
converged |
|
theta |
If |
tau2 |
If |
susie_auto is an attempt to automate reliable
running of susie even on hard problems. It implements a three-stage
strategy for each L: first, fit susie with very small residual
error; next, estimate residual error; finally, estimate the prior
variance. If the last step estimates some prior variances to be
zero, stop. Otherwise, double L, and repeat. Initial runs are
performed with relaxed tolerance; the final run is performed using
the default susie tolerance.
susie_auto( X, y, L_init = 1, L_max = 512, verbose = FALSE, init_tol = 1, standardize = TRUE, intercept = TRUE, max_iter = 100, tol = 0.01, ... )susie_auto( X, y, L_init = 1, L_max = 512, verbose = FALSE, init_tol = 1, standardize = TRUE, intercept = TRUE, max_iter = 100, tol = 0.01, ... )
X |
An n by p matrix of covariates. |
y |
The observed responses, a vector of length n. |
L_init |
The initial value of L. |
L_max |
The largest value of L to consider. |
verbose |
If |
init_tol |
The tolerance to passed to |
standardize |
If |
intercept |
If |
max_iter |
Maximum number of IBSS iterations to perform. |
tol |
A small, non-negative number specifying the convergence
tolerance for the IBSS fitting procedure. The fitting procedure
will halt when the difference in the variational lower bound, or
“ELBO” (the objective function to be maximized), is
less than |
... |
Additional arguments passed to |
See susie for a description of return values.
set.seed(1) n = 1000 p = 1000 beta = rep(0,p) beta[1:4] = 1 X = matrix(rnorm(n*p),nrow = n,ncol = p) X = scale(X,center = TRUE,scale = TRUE) y = drop(X %*% beta + rnorm(n)) res = susie_auto(X,y) plot(beta,coef(res)[-1]) abline(a = 0,b = 1,col = "skyblue",lty = "dashed") plot(y,predict(res)) abline(a = 0,b = 1,col = "skyblue",lty = "dashed")set.seed(1) n = 1000 p = 1000 beta = rep(0,p) beta[1:4] = 1 X = matrix(rnorm(n*p),nrow = n,ncol = p) X = scale(X,center = TRUE,scale = TRUE) y = drop(X %*% beta + rnorm(n)) res = susie_auto(X,y) plot(beta,coef(res)[-1]) abline(a = 0,b = 1,col = "skyblue",lty = "dashed") plot(y,predict(res)) abline(a = 0,b = 1,col = "skyblue",lty = "dashed")
These functions access basic properties or draw inferences from a fitted susie model.
susie_get_cs_attainable returns credible sets after
a post-hoc filter based on attainable coverage and per-effect
entropy. Use this as a fallback when no LD reference is available
for the standard purity filter in susie_get_cs.
susie_get_objective(res, last_only = TRUE, warning_tol = 1e-06) susie_get_posterior_mean(res, prior_tol = 1e-09) susie_get_posterior_sd(res, prior_tol = 1e-09) susie_get_niter(res) susie_get_prior_variance(res) susie_get_residual_variance(res) susie_get_lfsr(res) susie_get_posterior_samples(susie_fit, num_samples) susie_get_cs( res, X = NULL, Xcorr = NULL, coverage = 0.95, min_abs_corr = 0.5, dedup = TRUE, squared = FALSE, check_symmetric = TRUE, n_purity = 100, use_rfast = NULL, ld_extend_threshold = 0.99 ) susie_get_cs_attainable(res, coverage = 0.95, ethres = NULL, ...) susie_get_pip(res, prune_by_cs = FALSE, prior_tol = 1e-09)susie_get_objective(res, last_only = TRUE, warning_tol = 1e-06) susie_get_posterior_mean(res, prior_tol = 1e-09) susie_get_posterior_sd(res, prior_tol = 1e-09) susie_get_niter(res) susie_get_prior_variance(res) susie_get_residual_variance(res) susie_get_lfsr(res) susie_get_posterior_samples(susie_fit, num_samples) susie_get_cs( res, X = NULL, Xcorr = NULL, coverage = 0.95, min_abs_corr = 0.5, dedup = TRUE, squared = FALSE, check_symmetric = TRUE, n_purity = 100, use_rfast = NULL, ld_extend_threshold = 0.99 ) susie_get_cs_attainable(res, coverage = 0.95, ethres = NULL, ...) susie_get_pip(res, prune_by_cs = FALSE, prior_tol = 1e-09)
res |
A susie fit, typically an output from
|
last_only |
If |
warning_tol |
Warn if ELBO is decreasing by this tolerance level. |
prior_tol |
Filter out effects having estimated prior variance smaller than this threshold. |
susie_fit |
A susie fit, an output from |
num_samples |
The number of draws from the posterior distribution. |
X |
n by p matrix of values of the p variables (covariates) in
n samples. When provided, correlation between variables will be
computed and used to remove CSs whose minimum correlation among
variables is smaller than |
Xcorr |
p by p matrix of correlations between variables
(covariates). When provided, it will be used to remove CSs whose
minimum correlation among variables is smaller than
|
coverage |
A number between 0 and 1 specifying desired coverage of each CS. |
min_abs_corr |
A "purity" threshold for the CS. Any CS that
contains a pair of variables with correlation less than this
threshold will be filtered out and not reported. This filter is
only applied when |
dedup |
If |
squared |
If |
check_symmetric |
If |
n_purity |
The maximum number of credible set (CS) variables used in calculating the correlation (“purity”) statistics. When the number of variables included in the CS is greater than this number, the CS variables are randomly subsampled. |
use_rfast |
Use the Rfast package for the purity calculations.
By default |
ld_extend_threshold |
Threshold for extending CS by LD (default 0.99). Variants with |correlation| > threshold with any CS member are added. Set to NULL to disable LD extension. Requires Xcorr (would not work if only X is provided). |
ethres |
Entropy threshold expressed as an effective number of
variables: an effect is dropped when its attainable-projection
entropy is at least |
prune_by_cs |
Whether or not to ignore single effects not in a reported CS when calculating PIP. |
For each effect , the attainable coverage is
computed by zeroing every entry of res$alpha that is not the
column maximum, then summing across variables. Intuitively, it is
the coverage effect could attain if it did not have to
share probability mass with other effects. Effects with attainable
coverage at most coverage, or with attainable-projection
entropy at least log(ethres), are dropped before delegating
to susie_get_cs. Any X or Xcorr passed through
... is ignored, since the procedure is intended for the
no-LD setting.
susie_get_objective returns the evidence lower bound
(ELBO) achieved by the fitted susie model and, optionally, at each
iteration of the IBSS fitting procedure.
susie_get_residual_variance returns the (estimated or
fixed) residual variance parameter.
susie_get_prior_variance returns the (estimated or fixed)
prior variance parameters.
susie_get_posterior_mean returns the posterior mean for the
regression coefficients of the fitted susie model.
susie_get_posterior_sd returns the posterior standard
deviation for coefficients of the fitted susie model.
susie_get_niter returns the number of model fitting
iterations performed.
susie_get_pip returns a vector containing the posterior
inclusion probabilities (PIPs) for all variables.
susie_get_lfsr returns a vector containing the average lfsr
across variables for each single-effect, weighted by the posterior
inclusion probability (alpha).
susie_get_posterior_samples returns a list containing the
effect sizes samples and causal status with two components: b,
an num_variables x num_samples matrix of effect
sizes; gamma, an num_variables x num_samples
matrix of causal status random draws.
susie_get_cs returns credible sets (CSs) from a susie fit,
as well as summaries of correlation among the variables included in
each CS. If desired, one can filter out CSs that do not meet a
specified “purity” threshold; to do this, either X or
Xcorr must be supplied. It returns a list with the following
elements:
cs |
A list in which each list element is a vector containing the indices of the variables in the CS. |
coverage |
The nominal coverage specified for each CS. |
purity |
If |
cs_index |
If |
set.seed(1) n <- 1000 p <- 1000 beta <- rep(0, p) beta[1:4] <- 1 X <- matrix(rnorm(n * p), nrow = n, ncol = p) X <- scale(X, center = TRUE, scale = TRUE) y <- drop(X %*% beta + rnorm(n)) s <- susie(X, y, L = 10) susie_get_objective(s) susie_get_objective(s, last_only = FALSE) susie_get_residual_variance(s) susie_get_prior_variance(s) susie_get_posterior_mean(s) susie_get_posterior_sd(s) susie_get_niter(s) susie_get_pip(s) susie_get_lfsr(s)set.seed(1) n <- 1000 p <- 1000 beta <- rep(0, p) beta[1:4] <- 1 X <- matrix(rnorm(n * p), nrow = n, ncol = p) X <- scale(X, center = TRUE, scale = TRUE) y <- drop(X %*% beta + rnorm(n)) s <- susie(X, y, L = 10) susie_get_objective(s) susie_get_objective(s, last_only = FALSE) susie_get_residual_variance(s) susie_get_prior_variance(s) susie_get_posterior_mean(s) susie_get_posterior_sd(s) susie_get_niter(s) susie_get_pip(s) susie_get_lfsr(s)
Initialize a susie object using regression coefficients
susie_init_coef(coef_index, coef_value, p)susie_init_coef(coef_index, coef_value, p)
coef_index |
An L-vector containing the the indices of the nonzero coefficients. |
coef_value |
An L-vector containing initial coefficient estimates. |
p |
A scalar giving the number of variables. |
A list with elements alpha, mu and mu2
to be used by susie.
set.seed(1) n = 1000 p = 1000 beta = rep(0,p) beta[sample(1:1000,4)] = 1 X = matrix(rnorm(n*p),nrow = n,ncol = p) X = scale(X,center = TRUE,scale = TRUE) y = drop(X %*% beta + rnorm(n)) # Initialize susie to ground-truth coefficients. s = susie_init_coef(which(beta != 0),beta[beta != 0],length(beta)) res = susie(X,y,L = 10,model_init=s)set.seed(1) n = 1000 p = 1000 beta = rep(0,p) beta[sample(1:1000,4)] = 1 X = matrix(rnorm(n*p),nrow = n,ncol = p) X = scale(X,center = TRUE,scale = TRUE) y = drop(X %*% beta + rnorm(n)) # Initialize susie to ground-truth coefficients. s = susie_init_coef(which(beta != 0),beta[beta != 0],length(beta)) res = susie(X,y,L = 10,model_init=s)
susie_plot produces a per-variable summary of
the SuSiE credible sets. susie_plot_iteration produces a
diagnostic plot for the susie model fitting. For
susie_plot_iteration, several plots will be created if
track_fit = TRUE when calling susie.
susie_plot( model, y, add_bar = FALSE, pos = NULL, b = NULL, max_cs = 400, add_legend = NULL, ... ) susie_plot_iteration(model, L, file_prefix, pos = NULL)susie_plot( model, y, add_bar = FALSE, pos = NULL, b = NULL, max_cs = 400, add_legend = NULL, ... ) susie_plot_iteration(model, L, file_prefix, pos = NULL)
model |
A SuSiE fit, typically an output from
|
y |
A string indicating what to plot: either |
add_bar |
If |
pos |
Indices of variables to plot. If |
b |
For simulated data, set |
max_cs |
The largest credible set to display, either based on
purity (set |
add_legend |
If |
... |
Additional arguments passed to
|
L |
An integer specifying the number of single-effect rows to plot. |
file_prefix |
Prefix to path of output plot file. If not
specified, the plot, or plots, will be saved to a temporary
directory generated using |
Invisibly returns NULL.
set.seed(1) n <- 1000 p <- 1000 beta <- rep(0, p) beta[sample(1:1000, 4)] <- 1 X <- matrix(rnorm(n * p), nrow = n, ncol = p) X <- scale(X, center = TRUE, scale = TRUE) y <- drop(X %*% beta + rnorm(n)) res <- susie(X, y, L = 10) susie_plot(res, "PIP") susie_plot(res, "PIP", add_bar = TRUE) susie_plot(res, "PIP", add_legend = TRUE) susie_plot(res, "PIP", pos = 1:500, add_legend = TRUE) # Plot selected regions with adjusted x-axis position label res$genomic_position <- 1000 + (1:length(res$pip)) susie_plot(res, "PIP", add_legend = TRUE, pos = list(attr = "genomic_position", start = 1000, end = 1500) ) # True effects are shown in red. susie_plot(res, "PIP", b = beta, add_legend = TRUE) set.seed(1) n <- 1000 p <- 1000 beta <- rep(0, p) beta[sample(1:1000, 4)] <- 1 X <- matrix(rnorm(n * p), nrow = n, ncol = p) X <- scale(X, center = TRUE, scale = TRUE) y <- drop(X %*% beta + rnorm(n)) res <- susie(X, y, L = 10, track_fit = TRUE) susie_plot_iteration(res, L = 10)set.seed(1) n <- 1000 p <- 1000 beta <- rep(0, p) beta[sample(1:1000, 4)] <- 1 X <- matrix(rnorm(n * p), nrow = n, ncol = p) X <- scale(X, center = TRUE, scale = TRUE) y <- drop(X %*% beta + rnorm(n)) res <- susie(X, y, L = 10) susie_plot(res, "PIP") susie_plot(res, "PIP", add_bar = TRUE) susie_plot(res, "PIP", add_legend = TRUE) susie_plot(res, "PIP", pos = 1:500, add_legend = TRUE) # Plot selected regions with adjusted x-axis position label res$genomic_position <- 1000 + (1:length(res$pip)) susie_plot(res, "PIP", add_legend = TRUE, pos = list(attr = "genomic_position", start = 1000, end = 1500) ) # True effects are shown in red. susie_plot(res, "PIP", b = beta, add_legend = TRUE) set.seed(1) n <- 1000 p <- 1000 beta <- rep(0, p) beta[sample(1:1000, 4)] <- 1 X <- matrix(rnorm(n * p), nrow = n, ncol = p) X <- scale(X, center = TRUE, scale = TRUE) y <- drop(X %*% beta + rnorm(n)) res <- susie(X, y, L = 10, track_fit = TRUE) susie_plot_iteration(res, L = 10)
Plots original data, y, overlaid with line showing susie fitted value and shaded rectangles showing credible sets for changepoint locations.
susie_plot_changepoint( s, y, line_col = "blue", line_size = 1.5, cs_col = "red" )susie_plot_changepoint( s, y, line_col = "blue", line_size = 1.5, cs_col = "red" )
s |
A susie fit generated by
|
y |
An n-vector of observations that are ordered in time or space (assumed equally-spaced). |
line_col |
Color for the line showing fitted values. |
line_size |
Size of the lines showing fitted values |
cs_col |
Color of the shaded rectangles showing credible sets. |
A ggplot2 plot object.
set.seed(1) mu <- c(rep(0, 50), rep(1, 50), rep(3, 50), rep(-2, 50), rep(0, 300)) y <- mu + rnorm(500) # Here we use a less sensitive tolerance so that the example takes # less time; in practice you will likely want to use a more stringent # setting such as tol = 0.001. s <- susie_trendfilter(y, tol = 0.1) # Produces ggplot with credible sets for changepoints. susie_plot_changepoint(s, y)set.seed(1) mu <- c(rep(0, 50), rep(1, 50), rep(3, 50), rep(-2, 50), rep(0, 300)) y <- mu + rnorm(500) # Here we use a less sensitive tolerance so that the example takes # less time; in practice you will likely want to use a more stringent # setting such as tol = 0.001. s <- susie_trendfilter(y, tol = 0.1) # Produces ggplot with credible sets for changepoints. susie_plot_changepoint(s, y)
Runs one of two complementary post-hoc analyses, selected by
method: "susiex" (default) for the SuSiEx
combinatorial enumeration, reporting the posterior probability of
every binary causality pattern across the input traits; or
"coloc_pairwise" for the coloc pairwise ABF, reporting the
five colocalisation hypothesis posteriors (H0/H1/H2/H3/H4) for every
pair of traits. To get both, call the function twice and combine.
susie_post_outcome_configuration( input, by = c("fit", "outcome"), method = c("susiex", "coloc_pairwise"), prob_thresh = 0.8, cs_only = TRUE, p1 = 1e-04, p2 = 1e-04, p12 = 5e-06, ... )susie_post_outcome_configuration( input, by = c("fit", "outcome"), method = c("susiex", "coloc_pairwise"), prob_thresh = 0.8, cs_only = TRUE, p1 = 1e-04, p2 = 1e-04, p12 = 5e-06, ... )
input |
A single fit of class |
by |
Either |
method |
Character scalar; one of |
prob_thresh |
Per-trait marginal threshold for the convenience
|
cs_only |
Logical. If |
p1, p2, p12
|
Coloc per-SNP causal priors: |
... |
Currently ignored. |
Two grouping modes are supported through the by argument:
"fit"Each input fit contributes a single trait view.
Multi-output fits (mvsusie, mfsusie) are kept whole: the
trait's per-(CS, SNP) log Bayes factors are the joint composite
stored on the fit as lbf_variable. Configuration enumeration
loops over the cross-product of CS
indices.
"outcome"Multi-output fits fan out into per-outcome views,
each with its own per-(CS, SNP) log Bayes factors read from
fit$lbf_variable_outcome (an or
array). All per-outcome views share the
joint fit's PIP matrix and CS list, so the configuration enumeration
reduces to a single index . Single-output susie
fits pass through unchanged. Requires $lbf_variable_outcome on the
fit (set attach_lbf_variable_outcome = TRUE when fitting).
For each credible-set tuple :
Per-trait CS-level log BF (alpha-weighted SNP average):
Enumerate the binary configurations
.
Configuration log BF:
Normalise under a uniform prior over the configurations.
Per-trait marginal: .
For each unordered trait pair and each CS pair
, with per-SNP log BFs
and
(length ), the
five hypothesis log-BFs are
and the corresponding posteriors are
, where
is the log-sum-exp.
H0: no causal variant in either CS.
H1: causal in trait only.
H2: causal in trait only.
H3: distinct causals in the two traits.
H4: a single shared causal variant.
A list of class "susie_post_outcome_configuration" with
exactly one of the following components, depending on method:
$susiex(when method = "susiex") A list of length
equal to the number of CS tuples considered. Each element has
components cs_indices (length-N integer tuple),
logBF_trait (length N), configs (
binary matrix), config_prob (length ),
marginal_prob (length-N per-trait marginal posterior
probability of being active across the configuration ensemble),
and active (logical, marginal_prob >= prob_thresh).
$coloc_pairwise(when method = "coloc_pairwise")
A data.frame with one row per (trait1, trait2, l1, l2)
combination, columns trait1, trait2, l1, l2, hit1, hit2,
PP.H0, PP.H1, PP.H2, PP.H3, PP.H4.
Pretty-print with summary(out).
SuSiEx, Nature Genetics 2024 (combinatorial enumeration).
Wallace, PLoS Genetics 2020 (coloc pairwise H0/H1/H2/H3/H4 ABF).
Performs SuSiE regression using z-scores and correlation matrix.
This is the sufficient-statistics RSS interface. For the specialized
regularized eigendecomposition likelihood with lambda > 0, use
susie_rss_lambda.
susie_rss( z = NULL, R = NULL, n = NULL, X = NULL, bhat = NULL, shat = NULL, var_y = NULL, L = min(10, if (is.list(R) && !is.matrix(R)) ncol(R[[1]]) else if (!is.null(R)) ncol(R) else if (is.list(X) && !is.matrix(X)) ncol(X[[1]]) else ncol(X)), maf = NULL, maf_thresh = 0, scaled_prior_variance = 0.2, residual_variance = NULL, prior_weights = NULL, null_weight = 0, standardize = TRUE, estimate_residual_variance = FALSE, estimate_residual_method = c("MoM", "MLE", "NIG"), estimate_prior_variance = TRUE, estimate_prior_method = c("optim", "EM", "simple"), prior_variance_grid = NULL, mixture_weights = NULL, unmappable_effects = c("none", "inf", "ash", "ash_filter_archived"), check_null_threshold = 0, prior_tol = 1e-09, residual_variance_lowerbound = 0, residual_variance_upperbound = Inf, model_init = NULL, s_init = NULL, coverage = 0.95, min_abs_corr = 0.5, max_iter = NULL, L_greedy = NULL, greedy_lbf_cutoff = 0.1, tol = NULL, convergence_method = c("elbo", "pip"), verbose = FALSE, track_fit = FALSE, check_input = FALSE, check_prior = TRUE, n_purity = 100, r_tol = 1e-08, refine = FALSE, R_finite = NULL, R_mismatch = c("none", "eb"), R_mismatch_method = c("mle", "map"), eig_delta_rel = 0.001, eig_delta_abs = 0, artifact_threshold = 0.1, R_sensitivity_threshold = log(20), alpha0 = NULL, beta0 = NULL, init_only = FALSE, slot_prior = NULL )susie_rss( z = NULL, R = NULL, n = NULL, X = NULL, bhat = NULL, shat = NULL, var_y = NULL, L = min(10, if (is.list(R) && !is.matrix(R)) ncol(R[[1]]) else if (!is.null(R)) ncol(R) else if (is.list(X) && !is.matrix(X)) ncol(X[[1]]) else ncol(X)), maf = NULL, maf_thresh = 0, scaled_prior_variance = 0.2, residual_variance = NULL, prior_weights = NULL, null_weight = 0, standardize = TRUE, estimate_residual_variance = FALSE, estimate_residual_method = c("MoM", "MLE", "NIG"), estimate_prior_variance = TRUE, estimate_prior_method = c("optim", "EM", "simple"), prior_variance_grid = NULL, mixture_weights = NULL, unmappable_effects = c("none", "inf", "ash", "ash_filter_archived"), check_null_threshold = 0, prior_tol = 1e-09, residual_variance_lowerbound = 0, residual_variance_upperbound = Inf, model_init = NULL, s_init = NULL, coverage = 0.95, min_abs_corr = 0.5, max_iter = NULL, L_greedy = NULL, greedy_lbf_cutoff = 0.1, tol = NULL, convergence_method = c("elbo", "pip"), verbose = FALSE, track_fit = FALSE, check_input = FALSE, check_prior = TRUE, n_purity = 100, r_tol = 1e-08, refine = FALSE, R_finite = NULL, R_mismatch = c("none", "eb"), R_mismatch_method = c("mle", "map"), eig_delta_rel = 0.001, eig_delta_abs = 0, artifact_threshold = 0.1, R_sensitivity_threshold = log(20), alpha0 = NULL, beta0 = NULL, init_only = FALSE, slot_prior = NULL )
z |
A p-vector of z-scores. |
R |
A p by p correlation matrix. Exactly one of |
n |
The sample size, not required but recommended. |
X |
A factor matrix (B x p) such that |
bhat |
Alternative summary data giving the estimated effects
(a vector of length p). This, together with |
shat |
Alternative summary data giving the standard errors of
the estimated effects (a vector of length p). This, together with
|
var_y |
The sample variance of y, defined as |
L |
Maximum number of non-zero effects in the model. If L is larger than the number of covariates, p, L is set to p. |
maf |
A p-vector of minor allele frequencies; to be used along with
|
maf_thresh |
Variants with MAF smaller than this threshold are not used. |
scaled_prior_variance |
The prior variance, divided by
|
residual_variance |
Variance of the residual. If
|
prior_weights |
A vector of length p, in which each entry
gives the prior probability that corresponding column of X has a
nonzero effect on the outcome, y. The weights are internally
normalized to sum to 1. When |
null_weight |
Prior probability of no effect (a number between 0 and 1, and cannot be exactly 1). |
standardize |
If |
estimate_residual_variance |
The default is FALSE, the
residual variance is fixed to 1 or variance of y. If the in-sample
R matrix is provided, we recommend setting
|
estimate_residual_method |
The method used for estimating residual variance. For the original SuSiE model, "MLE" and "MoM" estimation is equivalent, but for the infinitesimal model, "MoM" is more stable. We recommend using "NIG" when n < 80 for improved coverage, although it is currently only implemented for individual-level data. |
estimate_prior_variance |
If |
estimate_prior_method |
The method used for estimating prior
variance. When |
prior_variance_grid |
Numeric vector of K prior variances defining
a mixture-of-normals prior on effect sizes. When provided, the SER
evaluates Bayes factors at each grid point and forms a mixture BF
weighted by |
mixture_weights |
Numeric vector of K non-negative weights summing
to 1, giving the mixture proportions for the variance grid. Default is
|
unmappable_effects |
The method for modeling unmappable effects: "none", "inf", "ash". |
check_null_threshold |
When the prior variance is estimated,
compare the estimate with the null, and set the prior variance to
zero unless the log-likelihood using the estimate is larger by this
threshold amount. For example, if you set
|
prior_tol |
When the prior variance is estimated, compare the
estimated value to |
residual_variance_lowerbound |
Lower limit on the estimated
residual variance. It is only relevant when
|
residual_variance_upperbound |
Upper limit on the estimated
residual variance. It is only relevant when
|
model_init |
A previous susie fit with which to initialize. |
s_init |
Deprecated alias for |
coverage |
A number between 0 and 1 specifying the “coverage” of the estimated confidence sets. |
min_abs_corr |
Minimum absolute correlation allowed in a
credible set. The default, 0.5, corresponds to a squared
correlation of 0.25, which is a commonly used threshold for
genotype data in genetic studies. This "purity" filter is
applied to the CSs reported in the fit object, so the CS list
returned here may be a subset of the one produced by calling
|
max_iter |
Maximum number of IBSS iterations to perform. For
|
L_greedy |
Integer or |
greedy_lbf_cutoff |
Numeric saturation threshold for the
|
tol |
A small, non-negative number specifying the convergence
tolerance for the IBSS fitting procedure. When |
convergence_method |
When |
verbose |
If |
track_fit |
If |
check_input |
If |
check_prior |
If |
n_purity |
Passed as argument |
r_tol |
Tolerance level for eigenvalue check of positive semidefinite
matrix |
refine |
If |
R_finite |
Controls variance inflation to account for estimating the R matrix from a finite reference panel. Accepts four types of input:
When active, this dynamically inflates the null variance of each
variable's score statistic at every IBSS iteration to account for
finite-reference uncertainty in the Single Effect Regression (SER).
When provided, the output includes a
|
R_mismatch |
R-mismatch correction mode. |
R_mismatch_method |
Estimator for the region-level
|
eig_delta_rel, eig_delta_abs
|
Cutoffs for "low-eigenvalue"
directions of |
artifact_threshold |
Flag threshold on the QC score |
R_sensitivity_threshold |
Flag threshold for the credible-set
Bayes-factor attenuation diagnostic. Default |
alpha0 |
Numerical parameter for the NIG prior when using
|
beta0 |
Numerical parameter for the NIG prior when using
|
init_only |
Logical. If |
slot_prior |
Optional slot activity prior created by
|
In addition to the standard "susie" output (see
susie), the returned object may contain:
R_finite_diagnostics |
A list of diagnostics for the
R-uncertainty correction (only present when |
Specialized interface for the regularized eigendecomposition RSS likelihood of Zou et al. (2022). This path accepts a single reference matrix or a single factor matrix and does not support multi-panel mixture, finite-reference inflation, or R-mismatch correction.
susie_rss_lambda( z = NULL, R = NULL, n = NULL, X = NULL, L = min(10, if (!is.null(R)) ncol(R) else ncol(X)), lambda, maf = NULL, maf_thresh = 0, prior_variance = 50, residual_variance = NULL, prior_weights = NULL, null_weight = 0, intercept_value = 0, estimate_residual_variance = FALSE, estimate_residual_method = "MLE", estimate_prior_variance = TRUE, estimate_prior_method = c("optim", "EM", "simple"), prior_variance_grid = NULL, mixture_weights = NULL, check_null_threshold = 0, prior_tol = 1e-09, residual_variance_lowerbound = 0, model_init = NULL, coverage = 0.95, min_abs_corr = 0.5, max_iter = NULL, L_greedy = NULL, greedy_lbf_cutoff = 0.1, tol = NULL, convergence_method = c("elbo", "pip"), verbose = FALSE, track_fit = FALSE, check_prior = TRUE, check_R = TRUE, check_z = FALSE, n_purity = 100, r_tol = 1e-08, refine = FALSE, init_only = FALSE, slot_prior = NULL )susie_rss_lambda( z = NULL, R = NULL, n = NULL, X = NULL, L = min(10, if (!is.null(R)) ncol(R) else ncol(X)), lambda, maf = NULL, maf_thresh = 0, prior_variance = 50, residual_variance = NULL, prior_weights = NULL, null_weight = 0, intercept_value = 0, estimate_residual_variance = FALSE, estimate_residual_method = "MLE", estimate_prior_variance = TRUE, estimate_prior_method = c("optim", "EM", "simple"), prior_variance_grid = NULL, mixture_weights = NULL, check_null_threshold = 0, prior_tol = 1e-09, residual_variance_lowerbound = 0, model_init = NULL, coverage = 0.95, min_abs_corr = 0.5, max_iter = NULL, L_greedy = NULL, greedy_lbf_cutoff = 0.1, tol = NULL, convergence_method = c("elbo", "pip"), verbose = FALSE, track_fit = FALSE, check_prior = TRUE, check_R = TRUE, check_z = FALSE, n_purity = 100, r_tol = 1e-08, refine = FALSE, init_only = FALSE, slot_prior = NULL )
z |
A p-vector of z-scores. |
R |
A p by p correlation matrix. Exactly one of |
n |
The sample size, not required but recommended. |
X |
Optional factor matrix (B by p) such that
|
L |
Maximum number of non-zero effects in the model. If L is larger than the number of covariates, p, L is set to p. |
lambda |
Regularization parameter for the RSS-lambda likelihood.
Must be supplied. |
maf |
A p-vector of minor allele frequencies; to be used along with
|
maf_thresh |
Variants with MAF smaller than this threshold are not used. |
prior_variance |
Prior variance for each non-zero effect on the
z-score scale. Replaces |
residual_variance |
Residual variance on the RSS-lambda scale. If
|
prior_weights |
A vector of length p, in which each entry
gives the prior probability that corresponding column of X has a
nonzero effect on the outcome, y. The weights are internally
normalized to sum to 1. When |
null_weight |
Prior probability of no effect (a number between 0 and 1, and cannot be exactly 1). |
intercept_value |
Intercept used by the RSS-lambda likelihood.
Default |
estimate_residual_variance |
If |
estimate_residual_method |
Variance-component estimator. The
RSS-lambda path supports |
estimate_prior_variance |
If |
estimate_prior_method |
The method used for estimating prior
variance. When |
prior_variance_grid |
Numeric vector of K prior variances defining
a mixture-of-normals prior on effect sizes. When provided, the SER
evaluates Bayes factors at each grid point and forms a mixture BF
weighted by |
mixture_weights |
Numeric vector of K non-negative weights summing
to 1, giving the mixture proportions for the variance grid. Default is
|
check_null_threshold |
When the prior variance is estimated,
compare its likelihood to the likelihood at zero and use zero
unless the larger value exceeds it by at least
|
prior_tol |
When the prior variance is estimated, compare the
estimated value to |
residual_variance_lowerbound |
Lower limit on the estimated
residual variance. It is only relevant when
|
model_init |
A previous susie fit with which to initialize. |
coverage |
A number between 0 and 1 specifying the “coverage” of the estimated confidence sets. |
min_abs_corr |
Minimum absolute correlation allowed in a
credible set. The default, 0.5, corresponds to a squared
correlation of 0.25, which is a commonly used threshold for
genotype data in genetic studies. This "purity" filter is
applied to the CSs reported in the fit object, so the CS list
returned here may be a subset of the one produced by calling
|
max_iter |
Maximum number of IBSS iterations to perform. For
|
L_greedy |
Integer or |
greedy_lbf_cutoff |
Numeric saturation threshold for the
|
tol |
A small, non-negative number specifying the convergence
tolerance for the IBSS fitting procedure. When |
convergence_method |
When |
verbose |
If |
track_fit |
If |
check_prior |
If |
check_R |
If TRUE, verify that |
check_z |
If TRUE, verify that |
n_purity |
Passed as argument |
r_tol |
Tolerance level for eigenvalue check of positive semidefinite
matrix |
refine |
If |
init_only |
Logical. If |
slot_prior |
Optional slot activity prior created by
|
A "susie" fit (or, with init_only = TRUE, the
constructed data and params objects).
Fit a single-effect regression (SER) model directly from marginal summary
statistics. Unlike susie_rss, this interface does not take an
LD matrix or reference factor matrix, and it never constructs a diagonal LD
matrix internally. The model has exactly one single effect.
susie_ser( z = NULL, bhat = NULL, shat = NULL, n = NULL, var_y = NULL, prior_variance = 50, scaled_prior_variance = 0.2, prior_weights = NULL, null_weight = 0, estimate_prior_method = c("optim", "simple"), check_null_threshold = 0, prior_tol = 1e-09, coverage = 0.95, ethres = NULL )susie_ser( z = NULL, bhat = NULL, shat = NULL, n = NULL, var_y = NULL, prior_variance = 50, scaled_prior_variance = 0.2, prior_weights = NULL, null_weight = 0, estimate_prior_method = c("optim", "simple"), check_null_threshold = 0, prior_tol = 1e-09, coverage = 0.95, ethres = NULL )
z |
A vector of z-scores. Provide either |
bhat |
Alternative summary statistics: marginal effect estimates. |
shat |
Standard errors for |
n |
Optional sample size. When supplied, z-scores are adjusted using
the same PVE adjustment as |
var_y |
Optional sample variance of the outcome. When supplied with
|
prior_variance |
Prior variance on the z-score scale when |
scaled_prior_variance |
Prior variance divided by the working outcome
variance when |
prior_weights |
A vector of prior probabilities, one per variable. The weights are normalized internally. |
null_weight |
Prior probability of no effect. If nonzero, a null variable is appended with zero effect. |
estimate_prior_method |
Method used for the single prior variance.
|
check_null_threshold |
If the log-likelihood at prior variance zero is within this amount of the selected prior variance, set the prior variance to zero. |
prior_tol |
Prior-variance tolerance used by |
coverage |
Coverage for credible-set construction. If |
ethres |
Effective-number threshold passed to
|
A "susie" object with one single effect.
susie_rss, susie_get_cs_attainable,
susie_get_cs
Performs SuSiE regression using sufficient statistics (XtX, Xty, yty, n) instead of individual-level data (X, y).
susie_ss( XtX, Xty, yty, n, L = min(10, ncol(XtX)), X_colmeans = NA, y_mean = NA, maf = NULL, maf_thresh = 0, check_input = FALSE, r_tol = 1e-08, standardize = TRUE, scaled_prior_variance = 0.2, residual_variance = NULL, prior_weights = NULL, null_weight = 0, model_init = NULL, s_init = NULL, estimate_residual_variance = TRUE, estimate_residual_method = c("MoM", "MLE", "NIG"), residual_variance_lowerbound = 0, residual_variance_upperbound = Inf, estimate_prior_variance = TRUE, estimate_prior_method = c("optim", "EM", "simple"), prior_variance_grid = NULL, mixture_weights = NULL, unmappable_effects = c("none", "inf", "ash", "ash_filter_archived"), check_null_threshold = 0, prior_tol = 1e-09, max_iter = 100, L_greedy = NULL, greedy_lbf_cutoff = 0.1, tol = NULL, convergence_method = c("elbo", "pip"), coverage = 0.95, min_abs_corr = 0.5, n_purity = 100, verbose = FALSE, track_fit = FALSE, check_prior = FALSE, refine = FALSE, alpha0 = NULL, beta0 = NULL, slot_prior = NULL )susie_ss( XtX, Xty, yty, n, L = min(10, ncol(XtX)), X_colmeans = NA, y_mean = NA, maf = NULL, maf_thresh = 0, check_input = FALSE, r_tol = 1e-08, standardize = TRUE, scaled_prior_variance = 0.2, residual_variance = NULL, prior_weights = NULL, null_weight = 0, model_init = NULL, s_init = NULL, estimate_residual_variance = TRUE, estimate_residual_method = c("MoM", "MLE", "NIG"), residual_variance_lowerbound = 0, residual_variance_upperbound = Inf, estimate_prior_variance = TRUE, estimate_prior_method = c("optim", "EM", "simple"), prior_variance_grid = NULL, mixture_weights = NULL, unmappable_effects = c("none", "inf", "ash", "ash_filter_archived"), check_null_threshold = 0, prior_tol = 1e-09, max_iter = 100, L_greedy = NULL, greedy_lbf_cutoff = 0.1, tol = NULL, convergence_method = c("elbo", "pip"), coverage = 0.95, min_abs_corr = 0.5, n_purity = 100, verbose = FALSE, track_fit = FALSE, check_prior = FALSE, refine = FALSE, alpha0 = NULL, beta0 = NULL, slot_prior = NULL )
XtX |
A p by p matrix, X'X, with columns of X centered to have mean zero. |
Xty |
A p-vector, X'y, with y and columns of X centered to have mean zero. |
yty |
A scalar, y'y, with y centered to have mean zero. |
n |
The sample size. |
L |
Maximum number of non-zero effects in the model. If L is larger than the number of covariates, p, L is set to p. |
X_colmeans |
A p-vector of column means of |
y_mean |
A scalar containing the mean of |
maf |
A p-vector of minor allele frequencies; to be used along with
|
maf_thresh |
Variants with MAF smaller than this threshold are not used. |
check_input |
If |
r_tol |
Tolerance level for eigenvalue check of positive semidefinite
matrix |
standardize |
If |
scaled_prior_variance |
The prior variance, divided by
|
residual_variance |
Variance of the residual. If
|
prior_weights |
A vector of length p, in which each entry
gives the prior probability that corresponding column of X has a
nonzero effect on the outcome, y. The weights are internally
normalized to sum to 1. When |
null_weight |
Prior probability of no effect (a number between 0 and 1, and cannot be exactly 1). |
model_init |
A previous susie fit with which to initialize. |
s_init |
Deprecated alias for |
estimate_residual_variance |
If
|
estimate_residual_method |
The method used for estimating residual variance. For the original SuSiE model, "MLE" and "MoM" estimation is equivalent, but for the infinitesimal model, "MoM" is more stable. We recommend using "NIG" when n < 80 for improved coverage, although it is currently only implemented for individual-level data. |
residual_variance_lowerbound |
Lower limit on the estimated
residual variance. It is only relevant when
|
residual_variance_upperbound |
Upper limit on the estimated
residual variance. It is only relevant when
|
estimate_prior_variance |
If |
estimate_prior_method |
The method used for estimating prior
variance. When |
prior_variance_grid |
Numeric vector of K prior variances defining
a mixture-of-normals prior on effect sizes. When provided, the SER
evaluates Bayes factors at each grid point and forms a mixture BF
weighted by |
mixture_weights |
Numeric vector of K non-negative weights summing
to 1, giving the mixture proportions for the variance grid. Default is
|
unmappable_effects |
The method for modeling unmappable effects: "none", "inf", "ash". |
check_null_threshold |
When the prior variance is estimated,
compare the estimate with the null, and set the prior variance to
zero unless the log-likelihood using the estimate is larger by this
threshold amount. For example, if you set
|
prior_tol |
When the prior variance is estimated, compare the
estimated value to |
max_iter |
Maximum number of IBSS iterations to perform. For
|
L_greedy |
Integer or |
greedy_lbf_cutoff |
Numeric saturation threshold for the
|
tol |
A small, non-negative number specifying the convergence
tolerance for the IBSS fitting procedure. When |
convergence_method |
When |
coverage |
A number between 0 and 1 specifying the “coverage” of the estimated confidence sets. |
min_abs_corr |
Minimum absolute correlation allowed in a
credible set. The default, 0.5, corresponds to a squared
correlation of 0.25, which is a commonly used threshold for
genotype data in genetic studies. This "purity" filter is
applied to the CSs reported in the fit object, so the CS list
returned here may be a subset of the one produced by calling
|
n_purity |
Passed as argument |
verbose |
If |
track_fit |
If |
check_prior |
If |
refine |
If |
alpha0 |
Numerical parameter for the NIG prior when using
|
beta0 |
Numerical parameter for the NIG prior when using
|
slot_prior |
Optional slot activity prior created by
|
Fits the non-parametric Gaussian regression model
, where the mean is modelled as , X is a matrix with columns containing an appropriate basis,
and b is vector with a (sparse) SuSiE prior. In particular, when
order = 0, the jth column of X is a vector with the first j
elements equal to zero, and the remaining elements equal to 1, so
that corresponds to the change in the mean of y between
indices j and j+1. For background on trend filtering, see
Tibshirani (2014). See also the "Trend filtering" vignette,
vignette("trend_filtering").
susie_trendfilter(y, order = 0, standardize = FALSE, use_mad = TRUE, ...)susie_trendfilter(y, order = 0, standardize = FALSE, use_mad = TRUE, ...)
y |
An n-vector of observations ordered in time or space (assumed to be equally spaced). |
order |
An integer specifying the order of trend filtering.
The default, |
standardize |
Logical indicating whether to standardize the X
variables ("basis functions"); |
use_mad |
Logical indicating whether to use the "median
absolute deviation" (MAD) method to the estimate residual
variance. If |
... |
Other arguments passed to |
This implementation exploits the special structure of X,
which means that the matrix-vector product is fast to
compute; in particular, the computation time is rather
than if X were formed explicitly. For
implementation details, see the "Implementation of SuSiE trend
filtering" vignette by running
vignette("trendfiltering_derivations").
A "susie" fit; see susie for details.
R. J. Tibshirani (2014). Adaptive piecewise polynomial estimation via trend filtering. Annals of Statistics 42, 285-323.
set.seed(1) mu <- c(rep(0, 50), rep(1, 50), rep(3, 50), rep(-2, 50), rep(0, 200)) y <- mu + rnorm(400) s <- susie_trendfilter(y) plot(y) lines(mu, col = 1, lwd = 3) lines(predict(s), col = 2, lwd = 2) # Calculate credible sets (indices of y that occur just before # changepoints). susie_get_cs(s) # Plot with credible sets for changepoints. susie_plot_changepoint(s, y)set.seed(1) mu <- c(rep(0, 50), rep(1, 50), rep(3, 50), rep(-2, 50), rep(0, 200)) y <- mu + rnorm(400) s <- susie_trendfilter(y) plot(y) lines(mu, col = 1, lwd = 3) lines(predict(s), col = 2, lwd = 2) # Calculate credible sets (indices of y that occur just before # changepoints). susie_get_cs(s) # Plot with credible sets for changepoints. susie_plot_changepoint(s, y)
This function extracts the ordering of the predictors according to the coefficients estimated in a basic univariate regression; in particular, the predictors are ordered in decreasing order by magnitude of the univariate regression coefficient estimate.
univar.order(X, y)univar.order(X, y)
X |
An input design matrix. This may be centered and/or standardized prior to calling function. |
y |
A vector of response variables. |
An ordering of the predictors.
### generate synthetic data set.seed(1) n = 200 p = 300 X = matrix(rnorm(n*p),n,p) beta = double(p) beta[1:10] = 1:10 y = X %*% beta + rnorm(n) univ.order = univar.order(X,y)### generate synthetic data set.seed(1) n = 200 p = 300 X = matrix(rnorm(n*p),n,p) beta = double(p) beta[1:10] = 1:10 y = X %*% beta + rnorm(n) univ.order = univar.order(X,y)
This function performs the univariate linear regression y ~ x separately for each column x of X. The estimated effect size and stardard error for each variable are outputted.
univariate_regression( X, y, Z = NULL, center = TRUE, scale = FALSE, return_residuals = FALSE, method = c("lmfit", "sumstats") ) calc_z(X, Y, center = FALSE, scale = FALSE)univariate_regression( X, y, Z = NULL, center = TRUE, scale = FALSE, return_residuals = FALSE, method = c("lmfit", "sumstats") ) calc_z(X, Y, center = FALSE, scale = FALSE)
X |
n by p matrix of regressors. |
y |
n-vector of response variables. |
Z |
Optional n by k matrix of covariates to be included in all
regresions. If Z is not |
center |
If |
scale |
If |
return_residuals |
Whether or not to output the residuals if Z
is not |
method |
Either “sumstats” (faster implementation) or
“lmfit” (uses |
A list with two vectors containing the least-squares
estimates of the coefficients (betahat) and their standard
errors (sebetahat). Optionally, and only when a matrix of
covariates Z is provided, a third vector residuals
containing the residuals is returned.
set.seed(1) n = 1000 p = 1000 beta = rep(0,p) beta[1:4] = 1 X = matrix(rnorm(n*p),nrow = n,ncol = p) X = scale(X,center = TRUE,scale = TRUE) y = drop(X %*% beta + rnorm(n)) res = univariate_regression(X,y) plot(res$betahat/res$sebetahat)set.seed(1) n = 1000 p = 1000 beta = rep(0,p) beta[1:4] = 1 X = matrix(rnorm(n*p),nrow = n,ncol = p) X = scale(X,center = TRUE,scale = TRUE) y = drop(X %*% beta + rnorm(n)) res = univariate_regression(X,y) plot(res$betahat/res$sebetahat)
A simulated data set with 1,000 individuals and 5,000 variants, combining 3 sparse, 5 oligogenic and 15 polygenic non-zero effects. The response is generated under a linear regression model. This data set illustrates fine-mapping with SuSiE-ash and SuSiE-inf.
unmappable_data is a list with the following elements:
Centered and scaled genotype matrix.
Simulated response.
Simulated effect sizes.
Total proportion of variance in the response explained by the simulated effects.
The “Fine-mapping with SuSiE-ash and SuSiE-inf” vignette.
data(unmappable_data)data(unmappable_data)