Jake Callahan

Reverse-Annealed Sequential Monte Carlo for Efficient Bayesian Optimal Experiment Design

Neural Information Processing Systems, December 2025

Jake Callahan, Andrew Chin, Jason Pacheco, Tommie Catanach

Expected information gain (EIG) is a crucial quantity in Bayesian optimal experimental design (BOED), quantifying how useful an experiment is by the amount we expect the posterior to differ from the prior. However, evaluating the EIG can be computationally expensive since it generally requires estimating the posterior normalizing constant. In this work, we leverage two idiosyncrasies of BOED to improve the efficiency of EIG estimation via sequential Monte Carlo (SMC). First, in BOED we simulate the data and thus know the true underlying parameters. Second, we ultimately care about the EIG, not the individual normalizing constants. Often we observe that the Monte Carlo variance of standard SMC estimators for the normalizing constant of a single dataset is significantly lower than the variance of the normalizing constants across datasets; the latter thus contributes the majority of the variance for EIG estimates. This suggests the potential to slightly increase variance while drastically decreasing computation time by reducing the SMC population size, which leads us to an EIG-specific SMC estimator that starts with only a single sample from the posterior and tempers backwards towards the prior. Using this single-sample estimator, which we call reverse-annealed SMC (RA-SMC), we show that it is possible to estimate EIG with orders of magnitude fewer likelihood evaluations in three models: a four-dimensional spring-mass model, a six-dimensional Johnson-Cook model, and a four-dimensional source-finding problem.

Bayesian optimal experiment design, Expected information gain, mutual information, model evidence, sequential Monte Carlo, Markov chain Monte Carlo, Bayesian methods

Analysis and optimization of seismic monitoring networks with Bayesian optimal experimental design

Geophysical Journal International, 17 January 2025

Jake Callahan, Kevin Monogue, Ruben Villareal, Tommie Catanach

Monitoring networks increasingly aim to assimilate data from a large number of diverse sensors covering many sensing modalities. Bayesian optimal experimental design (OED) seeks to identify data, sensor configurations or experiments which can optimally reduce uncertainty and hence increase the performance of a monitoring network. Information theory guides OED by formulating the choice of experiment or sensor placement as an optimization problem that maximizes the expected information gain (EIG) about quantities of interest given prior knowledge and models of expected observation data. Therefore, within the context of seismo-acoustic monitoring, we can use Bayesian OED to configure sensor networks by choosing sensor locations, types and fidelity in order to improve our ability to identify and locate seismic sources. In this work, we develop the framework necessary to use Bayesian OED to optimize a sensor network’s ability to locate seismic events from arrival time data of detected seismic phases at the regional scale.

Bayesian OED, Seismic Sensing, Bayesian inference

Mathematical analysis of redistricting in Utah

Statistics and Public Policy, 27 September 2022

Annika King, Jacob Murri, Jake Callahan, Adrienne Russell, Tyler J. Jarvis

We discuss difficulties of evaluating partisan gerrymandering in the congressional districts in Utah and the failure of many common metrics in Utah. We explain why the Republican vote share in the least-Republican district (LRVS) is a good indicator of the advantage or disadvantage each party has in the Utah congressional districts. Although the LRVS only makes sense in settings with at most one competitive district, in that setting it directly captures the extent to which a given redistricting plan gives advantage or disadvantage to the Republican and Democratic parties. We use the LRVS to evaluate the most common measures of partisan gerrymandering in the context of Utah’s 2011 congressional districts. We do this by generating large ensembles of alternative redistricting plans using Markov chain Monte Carlo methods. We also discuss the implications of this new metric and our results on the question of whether the 2011 Utah congressional plan was gerrymandered.

Ensemble methods, Gerrymandering, Markov chain Monte Carlo, Partisan symmetry, Redistricting, Utah

On the Misinformation in a Statistical Experiment

Under review.

Jake Callahan, Tommie Catanach

The principle that more informative experiments are always better is a cornerstone of Bayesian experimental design. This principle assumes the practitioner’s model and inference are correct. In practice, both the data-generating model and the inferential approximation are inevitably misspecified, and we show that under these conditions the classical framework for comparing experiments breaks down. Designs ranked as most informative can become actively harmful, amplifying bias to produce confident but incorrect inferences. We demonstrate that the commonly accepted axioms of experimental utility, such as Blackwell monotonicity, fail under misspecification, and that information measures proposed to handle it, like the Expected Generalized Information Gain (EGIG), do not obey these axioms. To resolve this, we propose a generalized axiomatic framework for robust Bayesian experimental design. We prove that EGIG satisfies our axioms as a criterion that penalizes inferential error, providing a principled foundation for its use in Bayesian experimental design. As a complementary approach, we derive a new measure that instead penalizes model error. Finally, we demonstrate our framework’s utility across common modes of misspecification, showing it provides a reliable guide for experimental design where classical methods fail.

Bayesian optimal experimental design, Model misspecification, Bayesian inference, Robust Bayesian analysis

Robust Bayesian Optimal Experimental Design for Seismic Sensor Placement under Model Uncertainty

Under review.

Jake Callahan, Thomas Luckie, Robert Porritt, Tommie Catanach

We study robust experimental design for optimizing seismic monitoring networks under model misspecification. Classical Bayesian optimal experimental design (BOED) maximizes expected information gain (EIG) but assumes an accurate forward model, which can lead to severe degradation when the model is misspecified. We introduce a criterion based on the Expected Generalized Information Gain (EGIG), which averages information gain over an ensemble of perturbed forward models to account for structural uncertainty. To compute EGIG at scale, we construct differentiable neural surrogates for seismic travel times and incorporate logistical constraints such as road access through differentiable cost functions. This enables gradient-based Pareto optimization of sensor networks that balance robustness and deployment feasibility. Numerical experiments on perturbed velocity models show that EGIG-optimized networks maintain informativeness under severe misspecification. Pareto front analyses further reveal how robustness interacts with deployment cost, providing interpretable trade-offs for practice. The methodology illustrates how generalized information measures, combined with differentiable surrogates, extend BOED to inverse problems with structural error and logistical constraints.

Bayesian OED, Seismic Sensing, Bayesian inference, Robust Bayesian analysis, Model misspecification

Hi, I'm Jake

Jake Callahan

PhD Student at The University of Arizona

Publications

Reverse-Annealed Sequential Monte Carlo for Efficient Bayesian Optimal Experiment Design

Analysis and optimization of seismic monitoring networks with Bayesian optimal experimental design

Mathematical analysis of redistricting in Utah

On the Misinformation in a Statistical Experiment

Robust Bayesian Optimal Experimental Design for Seismic Sensor Placement under Model Uncertainty

Education

The University of Arizona

2023-Present

Ph.D. in Applied Mathematics

Publications:
Reverse-Annealed Sequential Monte Carlo for Efficient Bayesian Optimal Experiment Design
Analysis and Optimization of Seismic Monitoring Networks with Bayesian Optimal Experimental Design
Mathematical Analysis of Redistricting in Utah

Brigham Young University

2021-2023

M.Sc. in Mathematics

Thesis: Hamiltonian Monte Carlo for Reconstructing Historical Earthquake-Induced Tsunamis

Advisor: Jared Whitehead

Brigham Young University

2016-2020

B.Sc. in Mathematics: Applied and Computational Emphasis

Taken Courses: Deep Learning, Natural Language Processing, Computer Vision, Optimal Control Theory, Probability Theory, Graph Theory, Theory of Predictive Modeling, Probabilistic Graphical Modeling, Linear Algebra, Numerical Methods

Experience

Sandia National Laboratories

Research & Design Intern

The University of Arizona

Instructor of Record — College Algebra (Math 112)

Brigham Young University

Research Assistant

Mathematics Instructor & TA

OrderBoard, Inc.

Data Scientist

Honeywell

Automation & Cognitive Services Intern

Skills

Python

JAX

PyTorch

PyMC

Bayesian Inference

Reinforcement Learning

SLURM

MPI

Git

Linux

NumPy/SciPy

Matplotlib

LaTeX

Markdown