--- title: "Fast Robust Bayesian Meta-Analysis via Spike and Slab Algorithm" author: "František Bartoš" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: self_contained: yes bibliography: ../inst/REFERENCES.bib csl: ../inst/apa.csl vignette: > %\VignetteIndexEntry{Fast Robust Bayesian Meta-Analysis via Spike and Slab Algorithm} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown_notangle} --- ```{r setup, include = FALSE} is_check <- ("CheckExEnv" %in% search()) || any(c("_R_CHECK_TIMINGS_", "_R_CHECK_LICENSE_") %in% names(Sys.getenv())) || !file.exists("../models/FastRoBMA/fit_RoBMA.bridge.RDS") knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = !is_check, dev = "png") if(.Platform$OS.type == "windows"){ knitr::opts_chunk$set(dev.args = list(type = "cairo")) } ``` ```{r include = FALSE} library(RoBMA) # we pre-load the RoBMA models, the fitting time is around 2-5 minutes fit_RoBMA.bridge <- readRDS(file = "../models/FastRoBMA/fit_RoBMA.bridge.RDS") fit_RoBMA.ss <- readRDS(file = "../models/FastRoBMA/fit_RoBMA.ss.RDS") ``` ```{r include = FALSE, eval = FALSE} library(RoBMA) t1_bridge <- Sys.time() fit_RoBMA.bridge <- RoBMA.reg( ~ measure + age, data = Andrews2021, algorithm = "bridge", chains = 3, sample = 5000, burnin = 2000, adapt = 500, parallel = TRUE ) t2_bridge <- Sys.time() t1_ss <- Sys.time() fit_RoBMA.ss <- RoBMA.reg( ~ measure + age, data = Andrews2021, algorithm = "ss", chains = 6, sample = 10000, burnin = 2500, adapt = 2500, parallel = TRUE ) t2_ss <- Sys.time() # save memory space fit_RoBMA.bridge <- RoBMA:::.remove_model_posteriors(fit_RoBMA.bridge) fit_RoBMA.bridge <- RoBMA:::.remove_model_margliks(fit_RoBMA.bridge) fit_RoBMA.ss$model$fit <- NULL saveRDS(fit_RoBMA.bridge, file = "../models/FastRoBMA/fit_RoBMA.bridge.RDS", compress = "xz") saveRDS(fit_RoBMA.ss, file = "../models/FastRoBMA/fit_RoBMA.ss.RDS", compress = "xz") ``` This vignette demonstrates how to fit a robust Bayesian model-averaged meta-regression (`RoBMA.reg()`) in the `RoBMA` R package using two different algorithms: - Bridge sampling (the current default, `algorithm = "bridge"`), - Spike-and-slab (`algorithm = "ss"`). We compare both approaches in terms of fitting time, side-by-side summaries, marginal summaries, and illustrative plots. We then discuss advantages and disadvantages of the spike-and-slab approach, especially for complex meta-regression with multiple moderators. We will use the example from [@bartos2023robust] which focuses on the effect of household chaos on child executive functions with two moderators: assessment type (measure: "direct" vs. "informant") and mean child age (age). See [@bartos2023robust] and [Robust Bayesian Model-Averaged Meta-Regression](MetaRegression.html) vignette for in depth discussion of the example and guidance on applying robust Bayesian model-averaged meta-regression. Here, we show that both algorithms produce essentially identical results; however, spike-and-slab can be much faster when many moderators (hence many models) are present. ### Fitting Models via Bridge Sampling vs. Spike-and-Slab Below, we specify and fit the same meta-regression with publication bias adjustment using the two algorithms. First, we load the package and inspect the data set. ```{r} library(RoBMA) data("Andrews2021", package = "RoBMA") head(Andrews2021) ``` Second, we estimate the meta-regression models using the two algorithms. We use the same formula, data, and settings for both algorithms. 
The bridge sampling approach is specified with `algorithm = "bridge"`, while the spike-and-slab approach is specified with `algorithm = "ss"`. For the bridge algorithm, we use the current default settings for the number of chains, samples, burn-in, and adaptation; for the spike-and-slab algorithm, we substantially increase the number of chains and samples to ensure convergence. This is important because the spike-and-slab algorithm estimates the complete model-averaged ensemble within a single MCMC run, whereas the bridge algorithm estimates each model separately and then combines the results. We also time the estimation process for comparison using the `Sys.time()` function.

```r
## Bridge sampling
t1_bridge <- Sys.time()
fit_RoBMA.bridge <- RoBMA.reg(
  formula   = ~ measure + age,
  data      = Andrews2021,
  algorithm = "bridge",
  chains    = 3,
  sample    = 5000,
  burnin    = 2000,
  adapt     = 500,
  parallel  = TRUE
)
t2_bridge <- Sys.time()

## Spike-and-slab
t1_ss <- Sys.time()
fit_RoBMA.ss <- RoBMA.reg(
  formula   = ~ measure + age,
  data      = Andrews2021,
  algorithm = "ss",
  chains    = 6,
  sample    = 10000,
  burnin    = 2500,
  adapt     = 2500,
  parallel  = TRUE
)
t2_ss <- Sys.time()
```

Once the models are fitted, we compute the time taken to fit each model.

```r
bridge_time <- difftime(t2_bridge, t1_bridge, units = "mins")
ss_time     <- difftime(t2_ss,     t1_ss,     units = "mins")
```

Running on a high-performance laptop with a 6-core/12-thread Intel CPU, the bridge sampling approach took about 24 minutes, while the spike-and-slab approach took approximately 0.58 minutes. This highlights the efficiency advantage of the spike-and-slab algorithm as the number of moderators (and hence the number of candidate models) grows. This matters because the number of models grows as 36 × 2^p, where p corresponds to the number of moderators. For example, with 3 moderators the ensemble contains 36 × 2^3 = 288 models, and with 4 moderators it contains 36 × 2^4 = 576 models (fitting more than that model by model is essentially unattainable on a consumer PC).
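As a quick illustration of how fast the ensemble grows, the short sketch below evaluates the 36 × 2^p relation for a few numbers of moderators (the helper `n_models()` is a convenience function defined only for this illustration, not part of the `RoBMA` package):

```r
# number of models in the ensemble for p moderators,
# following the 36 * 2^p relation described above
n_models <- function(p) 36 * 2^p
n_models(0:4)
#> [1]  36  72 144 288 576
```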
### Comparing Summary Outputs

Below, we compare the numeric summaries obtained with the `summary()` function for each fitted object.

```{r}
summary(fit_RoBMA.bridge)
summary(fit_RoBMA.ss)
```

The posterior probabilities for each component are very similar. For example, the posterior probability of including the `measure` moderator is approximately 0.95 in both fits, and the probability of including the `age` moderator is approximately 0.15. The posterior mean estimates for `mu`, `tau`, and the regression coefficients are essentially the same (differences appear only in the third decimal place). The inclusion Bayes factors differ very slightly (0.499 vs. 0.508, etc.), but not in any meaningful or qualitatively different way. Hence, both algorithms yield consistent inferences but differ drastically in computation time when there are many models to evaluate.

Visual inspection of the posterior distributions for the effect size and the moderators reveals that the two algorithms produce nearly identical results.

```{r fig_effect, dpi = 300, fig.width = 12, fig.height = 6, out.width = "100%", fig.align = "center"}
par(mfrow = c(1, 2), mar = c(4, 4, 2, 5))
plot(fit_RoBMA.bridge, parameter = "mu", prior = TRUE, main = "Bridge Sampling")
plot(fit_RoBMA.ss,     parameter = "mu", prior = TRUE, main = "Spike-and-Slab")
```

Next, we compare the marginal estimates for the moderators: at each level of `measure`, and at `age` centered at -1 SD, 0 SD, and +1 SD. These "marginal means" clarify the estimated model-averaged effect for each factor level. We use the `marginal_summary()` function to extract these summaries.

```{r}
marginal_summary(fit_RoBMA.bridge)
marginal_summary(fit_RoBMA.ss)
```

Again, the mean estimates differ only in the third decimal place, and the inclusion Bayes factors are qualitatively equivalent (e.g., 8.950 vs. 8.639 for the inclusion Bayes factor at the `measure[informant]` level).

A visual comparison of the estimated marginal means reveals that the two algorithms produce nearly identical results.

```{r fig_marginal, dpi = 300, fig.width = 12, fig.height = 6, out.width = "100%", fig.align = "center"}
par(mfrow = c(1, 2))
marginal_plot(fit_RoBMA.bridge, parameter = "measure", conditional = TRUE, prior = TRUE,
              xlim = c(-1, 1), ylim = c(0, 7), main = "Bridge Sampling")
marginal_plot(fit_RoBMA.ss,     parameter = "measure", conditional = TRUE, prior = TRUE,
              xlim = c(-1, 1), ylim = c(0, 7), main = "Spike-and-Slab")
```

### Conclusion

The spike-and-slab algorithm (`algorithm = "ss"`) is much faster than bridge sampling (`algorithm = "bridge"`) for estimating robust Bayesian meta-analysis model ensembles. This advantage grows further for meta-regression models involving many predictors. Despite the large differences in computation time, both algorithms produce nearly identical inferences for model-averaged effect sizes, moderator effects, and posterior probabilities. Currently, the `"bridge"` algorithm remains the default option in the `RoBMA` R package, but future major releases will make the `"ss"` algorithm the new default (and modify the default MCMC settings accordingly).

### References