
Hi, I'm Jake

Jake Callahan

PhD Student at The University of Arizona

I am a PhD student in Applied Mathematics at the University of Arizona, where I work with Dr. Jason Pacheco in the Stochastic Systems and Learning Group. My work blends math, machine learning and a healthy respect for uncertainty.

Before Arizona, I earned a Master’s in Mathematics at Brigham Young University, where I worked with Dr. Jared Whitehead on the TsunamiBayes project developing Bayesian tools for analyzing tsunamis from historical records.

I study how to make good decisions under uncertainty: how to ask the right questions, gather the right data, and take the right actions. My research focuses on Bayesian optimal experimental design, information-theoretic goals, and inference via MCMC, with applications in geophysics and political science.

Outside of research, you’ll find me in the mountains, backpacking or skiing, making music, or off-grid with my family.

Publications

Reverse-Annealed Sequential Monte Carlo for Efficient Bayesian Optimal Experiment Design

Expected information gain (EIG) is a crucial quantity in Bayesian optimal experimental design (BOED), quantifying how useful an experiment is by the amount we expect the posterior to differ from the prior. However, evaluating the EIG can be computationally expensive since it generally requires estimating the posterior normalizing constant. In this work, we leverage two idiosyncrasies of BOED to improve efficiency of EIG estimation via sequential Monte Carlo (SMC). First, in BOED we simulate the data and thus know the true underlying parameters. Second, we ultimately care about the EIG, not the individual normalizing constants. Often we observe that the Monte Carlo variance of standard SMC estimators for the normalizing constant of a single dataset is significantly lower than the variance of the normalizing constants across datasets; the latter thus contributes the majority of the variance for EIG estimates. This suggests the potential to slightly increase variance while drastically decreasing computation time by reducing the SMC population size, which leads us to an EIG-specific SMC estimator that starts with only a single sample from the posterior and tempers backwards towards the prior. Using this single-sample estimator, which we call reverse-annealed SMC (RA-SMC), we show that it is possible to estimate EIG with orders of magnitude fewer likelihood evaluations in three models: a four-dimensional spring-mass, a six-dimensional Johnson-Cook model and a four-dimensional source-finding problem.

Analysis and optimization of seismic monitoring networks with Bayesian optimal experimental design

Monitoring networks increasingly aim to assimilate data from a large number of diverse sensors covering many sensing modalities. Bayesian optimal experimental design (OED) seeks to identify data, sensor configurations or experiments which can optimally reduce uncertainty and hence increase the performance of a monitoring network. Information theory guides OED by formulating the choice of experiment or sensor placement as an optimization problem that maximizes the expected information gain (EIG) about quantities of interest given prior knowledge and models of expected observation data. Therefore, within the context of seismo-acoustic monitoring, we can use Bayesian OED to configure sensor networks by choosing sensor locations, types and fidelity in order to improve our ability to identify and locate seismic sources. In this work, we develop the framework necessary to use Bayesian OED to optimize a sensor network’s ability to locate seismic events from arrival time data of detected seismic phases at the regional scale.

Mathematical analysis of redistricting in Utah

We discuss difficulties of evaluating partisan gerrymandering in the congressional districts in Utah and the failure of many common metrics in Utah. We explain why the Republican vote share in the least-Republican district (LRVS) is a good indicator of the advantage or disadvantage each party has in the Utah congressional districts. Although the LRVS only makes sense in settings with at most one competitive district, in that setting it directly captures the extent to which a given redistricting plan gives advantage or disadvantage to the Republican and Democratic parties. We use the LRVS to evaluate the most common measures of partisan gerrymandering in the context of Utah’s 2011 congressional districts. We do this by generating large ensembles of alternative redistricting plans using Markov chain Monte Carlo methods. We also discuss the implications of this new metric and our results on the question of whether the 2011 Utah congressional plan was gerrymandered.

Education

Ph.D. in Applied Mathematics
B.Sc. in Mathematics: Applied and Computational Emphasis
Courses Taken:
  • Deep Learning
  • Natural Language Processing
  • Computer Vision
  • Optimal Control Theory
  • Probability Theory
  • Graph Theory
  • Theory of Predictive Modeling
  • Probabilistic Graphical Modeling
  • Linear Algebra
  • Numerical Methods

Experience

Sandia National Laboratories

Livermore, CA

Sandia is a U.S. national lab conducting science-based technology development to support national security.

Research & Design Intern

May 2021 - Present

Responsibilities:
  • Developed Bayesian Optimal Experimental Design algorithms for seismic sensor placement.
  • Modeled detection accuracy using HPC resources and large-scale geophysical datasets.
  • Presented findings internally; co-authored a technical report on importance sampling for OED.

The University of Arizona

Aug 2023 - Present

Tucson, AZ

The University of Arizona is a tier-1 research university known for strengths in applied math, geosciences, and computational modeling.

Instructor of Record — College Algebra (Math 112)

Aug 2023 - Present

Responsibilities:
  • Designed and delivered lectures, homework, and assessments for a large undergraduate College Algebra course.
  • Taught and supported students in mathematical foundations, problem solving, and functional modeling.
  • Received consistently strong feedback from students for clarity, support, and engagement.

Brigham Young University

Jan 2021 - Apr 2023

Provo, UT

BYU is a major research university with a strong applied mathematics program focused on computation and modeling.

Research Assistant

Jun 2020 - Apr 2023

Responsibilities:
  • Conducted MCMC-based research on tsunami reconstruction and political redistricting.
  • Developed custom simulation tools and statistical models for historical geophysics and policy analysis.
  • Co-authored a peer-reviewed paper in Statistics and Public Policy.

Mathematics Instructor & TA

Jan 2021 - Apr 2023

Responsibilities:
  • Instructor of record for Math 102 (Quantitative Reasoning); designed and taught all lectures, homework, and exams.
  • Led weekly recitation sessions for Calculus II and held office hours for Real Analysis II.
  • Consistently earned glowing student reviews for effective instruction and support.

OrderBoard, Inc.

May 2019 - Jun 2021

Orem, UT

OrderBoard was a recruiting tech startup focused on optimizing job placement in difficult-to-source markets.

Data Scientist

May 2019 - Jun 2021

Responsibilities:
  • Built NLP models to match candidates to job descriptions using free-text data.
  • Contributed to a product pipeline that drove $2M in ARR.
  • Designed systems for ingesting, cleaning, and modeling structured and unstructured data.

Honeywell

Jun 2020 - Aug 2020

Charlotte, NC

Honeywell is a global technology company working across aerospace, automation, and software sectors.

Automation & Cognitive Services Intern

Jun 2020 - Aug 2020

Responsibilities:
  • Designed a scalable computer vision pipeline to analyze contract documents at terabyte scale.
  • Built automated infrastructure for storage, transformation, and model deployment.

Skills