---
title: Extending ebnm with custom ebnm-style functions
output:
  rmarkdown::html_vignette:
    toc: yes
vignette: >
  %\VignetteIndexEntry{Extending ebnm with custom ebnm-style functions}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r knitr-opts, include=FALSE}
knitr::opts_chunk$set(comment = "#", collapse = TRUE, results = "hold",
                      fig.align = "center", dpi = 90)
```

The **ebnm** package, in addition to providing implementations of several commonly used priors (normal, Laplace, etc.), was designed to be easily extensible so that researchers are not limited by the existing options (despite the fact that a wide variety of options are available!). In this vignette, we illustrate how to extend **ebnm** by creating a *custom EBNM solver* in the style of other **ebnm** functions such as `ebnm_normal()` and `ebnm_point_laplace()`. Specifically, we implement an EBNM solver, `ebnm_t()`, that uses the family of scaled (Student's) *t* priors. (As of this writing, this is not one of the prior families included in **ebnm**.)

**Please note:** This vignette assumes that you have read the [**ebnm** paper][ebnm-paper] and are familiar with the basic functionality of the **ebnm** package.

## The scaled-*t* prior family

The *empirical Bayes normal means* (EBNM) model with scaled-*t* prior is:
\begin{equation}
\begin{aligned}
x_i &\sim \mathcal{N}(\theta_i, s_i^2), \\
\theta_i &\sim g \in \mathcal{G}_t.
\end{aligned}
\end{equation}
$\mathcal{G}_t$ is the family of scaled-*t* priors, defined as follows:
\begin{equation}
\mathcal{G}_t := \{g: g(x) = \tfrac{1}{\sigma} \, t_{\nu}(x/\sigma); \sigma > 0, \nu > 0\},
\end{equation}
where $t_{\nu}(\cdot)$ denotes the density function of the *t* distribution with $\nu$ degrees of freedom. Fitting the prior $g \in \mathcal{G}_t$ therefore involves estimating two parameters: the scale parameter $\sigma$ and the degrees of freedom $\nu$.

## Overview of the implementation process

The **ebnm** package is intended to encompass a very broad range of prior families. In general, creating a custom EBNM solver involves the following steps:

1. Define the prior family class $\mathcal{G}$.

2. Implement a function that estimates the prior $g \in \mathcal{G}$.

3. Implement a function that computes summaries of the posteriors $p(\theta_i \mid x_i, s_i, \hat{g})$.

4. Create the main EBNM solver function.

5. Test the new EBNM solver.

6. Use the solver to analyze a data set.

In the following sections, we work through each of these steps in detail with the aim of creating a new function `ebnm_t()` that can fit the EBNM model with the scaled-*t* prior.

For readability, we advise adhering to the [Tidyverse style guide][tidyverse-style]. Functions should also be carefully tested; at a minimum, functions should pass the checks performed by `ebnm_check_fn()`. Additional unit tests are strongly encouraged. The **ebnm** package implements a large suite of unit tests using the [**testthat** package][testthat].

## Step 1: Define the prior family class

First, we define a data structure for the priors in our prior family. **ebnm** uses these structures in two ways: (1) to store information about the fitted prior $\hat{g}$ (via the `fitted_g` field in the returned `"ebnm"` object); and (2) to initialize solutions (via the `g_init` argument). Sometimes, an existing data structure can be used. For example, `ebnm_normal()`, `ebnm_point_normal()`, `ebnm_normal_scale_mixture()`, and `ebnm_point_mass()` all share the `"normalmix"` class.
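To see what such a data structure looks like in practice, we can inspect the fitted prior returned by one of the existing solvers. (This quick look is optional; the comments below assume the usual `pi`, `mean`, and `sd` fields of a `"normalmix"` object.)

```{r normalmix-example, eval=FALSE}
library(ebnm)
# Fit a normal prior to some simulated data, then inspect the data structure
# used to store the fitted prior.
g <- ebnm_normal(rnorm(100))$fitted_g
class(g)  # "normalmix"
names(g)  # mixture weights ("pi"), means ("mean"), standard deviations ("sd")
```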
For the scaled-*t* prior, we define a new class, `"tdist"`, that includes the scale and degrees of freedom:

```{r tdist}
tdist <- function (scale, df) {
  structure(data.frame(scale, df), class = "tdist")
}
```

## Step 2: Implement the optimization function

Next we implement a function for estimating the two parameters specifying the prior. Prior estimation is typically done by maximizing the likelihood. There are many approaches one might take to solve this optimization problem, and the best approach very much depends on context. For an excellent overview of the many R packages that can be used for numerical optimization, please see the [CRAN task view on optimization][cran-task-view-opt].

Here, we use the L-BFGS-B method (implemented by the `optim()` function). There are at least a couple of reasons why we prefer L-BFGS-B: (1) it doesn't require installing any additional packages; (2) it allows for bound constraints, which is helpful since the two parameters in the prior both need to be positive. Setting sensible upper and lower bounds can also help avoid numerical issues. Here, we use the constraints $\min_i s_i / 10 \le \sigma \le \max_i x_i$ and $1 \le \nu \le 1000$:

```{r opt_t}
opt_t <- function (x, s, sigma_init, nu_init) {
  optim(
    par = c(sigma_init, nu_init),
    fn = function (par) -llik_t(x, s, par[1], par[2]),
    method = "L-BFGS-B",
    lower = c(min(s)/10, 1),
    upper = c(max(x), 1e3)
  )
}
```

Our optimization function `opt_t()` calls another function, `llik_t()`, which isn't yet implemented; this function should give us the log likelihood at the current parameter estimates. (Note that, since `optim()` seeks to minimize the objective, we compute the negative log likelihood.) Computing the log likelihood involves taking 1-d integrals, or *1-d convolutions*, over the unknown means $\theta_i$:
\begin{equation}
\log p(\mathbf{x} \mid g, \, \mathbf{s}) = \sum_{i=1}^n \textstyle \log \int p(x_i \mid \theta_i, s_i) \, g(\theta_i) \, d\theta_i.
\end{equation}
Since we do not have a convenient closed-form expression for these integrals, we compute them numerically using the `integrate()` function:

```{r llik_t}
llik_t <- function (x, s, sigma, nu) {
  lik_one_obs <- function (x, s) {
    integrate(lik_times_prior, -Inf, Inf, x = x, s = s,
              sigma = sigma, nu = nu)$value
  }
  vlik <- Vectorize(lik_one_obs)
  return(sum(log(vlik(x, s))))
}
```

The integrand is the likelihood times the scaled-*t* prior density:

```{r lik_times_prior}
lik_times_prior <- function (theta, x, s, sigma, nu) {
  dnorm(x - theta, sd = s) * dt(theta / sigma, df = nu) / sigma
}
```
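As a quick sanity check (optional, and not needed for the solver itself), note that as $\nu$ becomes large the scaled-*t* prior approaches the normal prior $\mathcal{N}(0, \sigma^2)$, in which case the marginal log likelihood has a simple closed form. The two quantities computed below (with illustrative parameter values) should therefore be nearly equal:

```{r llik-check, eval=FALSE}
# When nu is large, the t prior is close to N(0, sigma^2), so the marginal
# distribution of each x_i is approximately N(0, sigma^2 + s_i^2).
set.seed(1)
x <- rnorm(20, sd = 2)
s <- rep(1, 20)
llik_t(x, s, sigma = 2, nu = 1000)
sum(dnorm(x, mean = 0, sd = sqrt(2^2 + s^2), log = TRUE))
```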
### (Optional) Include gradients in the optimization

As we found empirically in our [numerical experiments][ebnm-paper], providing the gradient calculations to `optim()` can in some cases greatly speed up the optimization. When implementing your own custom EBNM solvers, you should consider providing gradients, particularly when analytic expressions are available (either via pen and paper or via automatic differentiation). Gradients for the scaled-*t* priors turn out to be difficult to obtain, but to illustrate how one might provide them, we estimate the gradients numerically using the `grad()` function from the **numDeriv** package. We include this code for illustrative purposes only; since `optim()` also computes gradients numerically, we do not expect this solution to provide any speedup.

```{r opt_grad, eval=FALSE}
opt_t <- function (x, s, sigma_init, nu_init) {
  optim(
    par = c(sigma_init, nu_init),
    fn = function (par) -llik_t(x, s, par[1], par[2]),
    gr = function (par) -grad_t(x, s, par[1], par[2]),
    method = "L-BFGS-B",
    lower = c(min(s)/10, 1),
    upper = c(max(x), 1e3)
  )
}
```

The `grad_t()` function used above estimates the gradients numerically using **numDeriv**:

```{r grad_t, eval=FALSE}
library(numDeriv)
grad_t <- function (x, s, sigma, nu) {
  grad(function (par) llik_t(x, s, par[1], par[2]), c(sigma, nu))
}
```

Using this version of `opt_t()` should produce very similar results to the implementation that does not include the gradient.

## Step 3: Implement the posterior summary function

Once we've estimated a prior $\hat{g} \in \mathcal{G}_t$, we can compute summary statistics (means, variances, etc.) from the posterior distributions. From Bayes' rule, the posterior distribution for the *i*-th unknown mean is
\begin{equation}
p(\theta_i \mid x_i, s_i, \hat{g}) \propto p(x_i \mid \theta_i, s_i) \, \hat{g}(\theta_i).
\end{equation}
For this example, we compute three posterior statistics: the posterior mean, the posterior standard deviation, and the posterior second moment. This is all accomplished by a single function that returns a data frame containing the posterior statistics:

```{r post_summary-t}
post_summary_t <- function (x, s, sigma, nu) {
  samp <- post_sampler_t(x, s, sigma, nu, nsamp = 1000)
  return(data.frame(
    mean          = colMeans(samp),
    sd            = apply(samp, 2, sd),
    second_moment = apply(samp, 2, function (x) mean(x^2))
  ))
}
```

The missing piece is a function `post_sampler_t()` that draws random samples from the posteriors. While drawing independent samples is difficult, we can easily design an MCMC scheme to approximately draw samples from the posteriors. This is implemented using the **mcmc** package (which you should install if you haven't already):

```{r post_sampler_t}
# install.packages("mcmc")
library(mcmc)
post_sampler_t <- function (x, s, sigma, nu, nsamp) {
  sample_one_theta <- function (x_i, s_i) {
    lpostdens <- function (theta) {
      dt(theta/sigma, df = nu, log = TRUE) - log(sigma) +
        dnorm(x_i - theta, sd = s_i, log = TRUE)
    }
    metrop(lpostdens, initial = x_i, nbatch = nsamp)$batch
  }
  vsampler <- Vectorize(sample_one_theta)
  return(vsampler(x, s))
}
```

This is certainly not the most efficient or numerically stable way to perform these computations, but we do it this way here to keep the example simple.
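Before moving on, it can be reassuring to spot-check the sampler on a single observation. With a very diffuse prior (large $\sigma$), the posterior is approximately $\mathcal{N}(x_i, s_i^2)$, so the sampled values should have mean and standard deviation close to the observation and its standard error (the particular values used below are just for illustration):

```{r sampler-check, eval=FALSE}
# With a nearly flat prior, the posterior for theta is roughly N(x, s^2).
set.seed(1)
samp <- post_sampler_t(x = 3, s = 1, sigma = 100, nu = 5, nsamp = 1000)
mean(samp)  # should be near 3
sd(samp)    # should be near 1
```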
## Step 4: Put it all together

Having implemented the key computations for our new EBNM solver, we will now incorporate these computations into a single function, `ebnm_t()`, which accepts the same inputs as the solvers in the **ebnm** package. For simplicity, we ignore the `output` parameter and just return all the results (data, posterior summaries, fitted prior, log likelihood, and posterior sampler). See `help(ebnm)` for details about the expected structure of the return value. Here's the new function:

```{r ebnm_t}
ebnm_t <- function (x,
                    s = 1,
                    mode = 0,
                    scale = "estimate",
                    g_init = NULL,
                    fix_g = FALSE,
                    output = ebnm_output_default(),
                    optmethod = NULL,
                    control = NULL) {

  # Some basic argument checks.
  if (mode != 0) {
    stop("The mode of the t-prior must be fixed at zero.")
  }
  if (scale != "estimate") {
    stop("The scale of the t-prior must be estimated rather than fixed ",
         "at a particular value.")
  }

  # If g_init is provided, extract the parameters. Otherwise, provide
  # reasonable initial estimates.
  if (!is.null(g_init)) {
    sigma_init <- g_init$scale
    nu_init    <- g_init$df
  } else {
    sigma_init <- sqrt(mean(x^2))
    nu_init    <- 4
  }

  # If g is fixed, use g_init. Otherwise optimize g.
  if (fix_g) {
    sigma <- sigma_init
    nu    <- nu_init
    llik  <- llik_t(x, s, sigma, nu)
  } else {
    opt_res <- opt_t(x, s, sigma_init, nu_init)
    sigma   <- opt_res$par[1]
    nu      <- opt_res$par[2]
    llik    <- -opt_res$value
  }

  # Prepare the final output.
  retval <- structure(list(
    data = data.frame(x = x, s = s),
    posterior = post_summary_t(x, s, sigma, nu),
    fitted_g = tdist(scale = sigma, df = nu),
    log_likelihood = llik,
    post_sampler = function (nsamp) post_sampler_t(x, s, sigma, nu, nsamp)
  ), class = c("list", "ebnm"))

  return(retval)
}
```

## Step 5: Verify the EBNM function

**ebnm** provides a function, `ebnm_check_fn()`, that runs basic tests to verify that the EBNM function works as expected. Let's run the checks using a small, simulated data set:

```{r check}
library(ebnm)
set.seed(1)
x <- rnorm(10, sd = 2)
s <- rep(1, 10)
ebnm_check_fn(ebnm_t, x, s)
```

## Step 6: Use the new EBNM function to analyze a data set

Finally, we analyze a simulated data set in which the unobserved means are simulated from a *t* distribution with a scale of 2 and 5 degrees of freedom:

```{r sim-data}
set.seed(1)
theta <- 2 * rt(100, df = 5)
x <- theta + rnorm(100)
```

Let's compare the use of the scaled-*t* prior with a normal prior:

```{r t-vs-normal}
normal_res <- ebnm_normal(x, s = 1)
t_res <- ebnm_t(x, s = 1)
```

(Note that the call to `ebnm_t()` is considerably slower than the call to `ebnm_normal()` because the computations with the scaled-*t* prior are more complex and we did not put any effort into making the computations efficient.)

Let's compare the two results:

```{r plot-t-vs-normal, fig.width=6, fig.height=4}
plot(normal_res, t_res)
```

`ebnm_t()` shrinks large observations less aggressively than `ebnm_normal()`, and so the fit with the scaled-*t* prior results in slightly more accurate estimates:

```{r rmse}
rmse_normal <- sqrt(mean((coef(normal_res) - theta)^2))
rmse_t      <- sqrt(mean((coef(t_res) - theta)^2))
c(rmse_normal = rmse_normal, rmse_t = rmse_t)
```

Reassuringly, the parameters of the estimated prior are similar to the simulation parameters ($\sigma = 2$, $\nu = 5$):

```{r t-fitted-g}
c(t_res$fitted_g)
```

## Session information

The following R version and packages were used to generate this vignette:

```{r session-info}
sessionInfo()
```

[ebnm-paper]: https://arxiv.org/abs/2110.00152
[testthat]: https://testthat.r-lib.org
[tidyverse-style]: https://style.tidyverse.org
[cran-task-view-opt]: https://cran.r-project.org/web/views/Optimization.html