I am planning a three-part series:

1. Introduction to Bayesian calibration

2. Bayesian calibration meets decision and its application

3. $\text{Simulation-based calibration}^{\text{TM}}$ for Bayesian calibration

The goals of this first post are to:

  • Explore the differences between inference and calibration, and how the two terms are used in Bayesian and non-Bayesian contexts
  • Interpret the Bayesian update from an optimization perspective

1. Inference

1.1 Conventional calibration inference

$$
\text{Given the observation model } p_1(y \mid \theta),
$$

$$
\min_{\theta} \; \mathrm{div}\left(p_1(\tilde{y} \mid \theta),\, p(y_{obs})\right)
$$
*$\mathrm{div}(\cdot,\cdot)$: a divergence between distributions
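
To make this concrete, below is a minimal sketch of conventional calibration as divergence minimization. The simulator, the noise scale, and the mean-matching divergence are all illustrative assumptions, not part of any particular reference.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative setup (all names hypothetical): the observation model
# p_1(y | theta) is Normal(theta, 0.3), and div() is approximated by the
# squared difference between simulated and observed sample means.
rng = np.random.default_rng(0)
y_obs = rng.normal(loc=2.5, scale=0.3, size=50)  # stand-in for real data

def simulate(theta, n=50):
    """Draw y_tilde from p_1(y | theta)."""
    return rng.normal(loc=theta, scale=0.3, size=n)

def divergence(theta):
    """Plug-in divergence between simulated and observed outputs."""
    y_tilde = simulate(theta[0])
    return (y_tilde.mean() - y_obs.mean()) ** 2

# Conventional calibration returns a single point estimate of theta.
result = minimize(divergence, x0=np.array([0.0]), method="Nelder-Mead")
print(result.x)
```

Note that the output is a single point estimate: any uncertainty in $\theta$ or in the data is collapsed away, which is exactly disadvantage i) below.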

Disadvantages:

i) structural inability to separate epistemic (E) uncertainty from aleatoric (A) uncertainty during calibration, as it assumes no uncertainty exists in the real-world data; excluding modeling errors can make conventional calibration methods amplify model output uncertainty when the data is biased (Muehleisen)

ii) the contribution of each parameter to a failed test cannot be separated

iii) it can overfit

1.2 Bayesian calibration inference

$$\text{Given the observation model and prior } p_1(y \mid \theta) \text{ and } p(\theta)\,(=\pi),$$
the Bayesian update's loss function is
$$L(\nu; \pi, y) = \int l(\theta,y) \nu(d\theta) + d_{KL}(\nu, \pi)$$


$$l(\theta ; y_{1}, \ldots, y_{n})=-\sum_{i=1}^{n} \log {p_1 (y_{i} \mid \theta)}$$


which returns
$$\begin{aligned} \hat{\nu}(\theta) &= \arg\min_{\nu} L(\nu; \pi, y) \\ &= \frac{\exp\{-l(\theta, y)\}\,\pi(\theta)}{\int \exp\{-l(\theta, y)\}\,\pi(\mathrm{d}\theta)} \end{aligned}$$
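
As an illustration, here is a minimal grid sketch of this update; the normal observation model and the Normal(0, 2) prior are assumptions made just for the example.

```python
import numpy as np

# Grid sketch of the update nu_hat(theta) ∝ exp{-l(theta, y)} pi(theta),
# assuming (for illustration) p_1(y | theta) = Normal(theta, 1) and a
# Normal(0, 2) prior.
rng = np.random.default_rng(1)
y = rng.normal(loc=1.0, scale=1.0, size=20)        # observed data

theta = np.linspace(-4.0, 4.0, 1001)               # grid over theta
log_prior = -0.5 * (theta / 2.0) ** 2              # log pi(theta), up to a constant
l = 0.5 * ((y[:, None] - theta) ** 2).sum(axis=0)  # total negative log likelihood

log_post = -l + log_prior
post = np.exp(log_post - log_post.max())           # stabilize before normalizing
post /= post.sum() * (theta[1] - theta[0])         # normalize on the grid
print(theta[np.argmax(post)])                      # mode of nu_hat, near the truth
```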

  • The Bayesian update corresponds to minimizing the KL-regularized total log loss over all distributions (Bissiri13), which introduces a general, axiomatic treatment that can be extended to other losses
  • Along with the likelihood conditional on a certain distribution over $\theta$, a term that measures the distance between the assumed distribution and a given prior is required. This explains why formulations such as the following, which lack the distance from the prior, are incorrect:
    $\underset{p(\theta)}{\max} \; \int p_1(y_{test} \mid \theta)\, p(\theta)\, d\theta \quad \text{or} \quad \underset{p(\theta)}{\min} \; \int \mathrm{div}\left(p_1(\tilde{y} \mid \theta),\, p(y_{test})\right) p(\theta)\, d\theta$
  • The posterior from the Bayesian update is the most likely input parameter uncertainty $p(\theta)$ that yields the data's output uncertainty $y_{test}$
  • The prior term acts as a regularizer
  • Uncertainty is included in the result, enabling further updates: starting from a crude uniform or triangular prior, the parameter distribution is iteratively updated into a customized, empirical form
  • This differs from popular optimization methods, but an interpretation in terms of optimization over measures is possible
  • Based on this optimization perspective of Bayesian inference, optimization strategies can be applied to Bayesian computation (i.e., distribution optimization: finding samples that best represent the high-probability mass, i.e., the typical set). This is the basis of numerous sampling-based non-convex optimization methods, including:
    – Stochastic Gradient Descent as Approximate Bayesian Inference, which uses SGD as a Bayesian posterior method to create samples that represent the typical set. Related to this, SWA has been suggested here and, further, SWAG
    – Non-convex learning via Stochastic Gradient Langevin Dynamics (see the sketch after this list)
    – Global Non-convex Optimization with Discretized Diffusions
    – Gradient flows and Langevin dynamics, which view sampling as optimization in the space of measures, framed as a composite optimization problem in Wibisono18
    – The SA(L)SA(+) algorithms, proposed in Statistical Adaptive Stochastic Gradient Methods, which schedule stepsizes adaptively in stochastic optimization and can be applied to sampling schemes via the above connection
    – The viabel package, which provides robust convergence diagnostics for variational inference based on the similarity of sampling and optimization
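
To make the sampling-as-optimization connection concrete, here is a minimal SGLD sketch, assuming a normal likelihood and a Normal(0, 2) prior; the model, minibatch size, and stepsize are illustrative choices, not taken from any of the papers above.

```python
import numpy as np

# Minimal SGLD sketch: noisy gradient steps on the negative log posterior,
# plus injected Gaussian noise, turn an optimizer into a sampler that
# explores the typical set rather than collapsing to a point estimate.
rng = np.random.default_rng(2)
y = rng.normal(loc=1.0, scale=1.0, size=100)      # data from Normal(theta, 1)

def grad_neg_log_post(theta, batch):
    """Stochastic gradient of -log posterior with a Normal(0, 2) prior;
    the minibatch gradient is rescaled by n / batch_size."""
    n, m = len(y), len(batch)
    return (n / m) * np.sum(theta - batch) + theta / 4.0

samples, theta, eps = [], 0.0, 1e-3
for _ in range(5000):
    batch = rng.choice(y, size=10)
    theta -= 0.5 * eps * grad_neg_log_post(theta, batch)
    theta += np.sqrt(eps) * rng.normal()          # Langevin noise injection
    samples.append(theta)

print(np.mean(samples[1000:]))                    # approximate posterior mean
```

Dropping the injected noise term recovers plain SGD on the negative log posterior, which is exactly the optimization-sampling bridge the references above exploit.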

2. Bayesian Calibration

In Bayesian calibration, the simulated ground truth and the results inferred from it are compared in order to diagnose and update the model.

$\theta' \sim \pi_{S}(\theta)$
$y \sim \pi_{S}(y \mid \theta')$
$\tilde{U} = U(a(y), \theta')$
To be specific, we repeatedly simulate model configurations from the prior distribution, simulate observations from the corresponding data-generating process, run our analysis to completion, and then scrutinize the resulting likelihood functions, posterior distributions, and decision outcomes in the context of the simulated ground truth.

The comparison tells us how effective the decision-making process is within the scope of the modeling assumptions. The utility $U(a(y), \theta)$ and the action $a(y)$ are defined specifically for each problem, including prediction, posterior inference, computational algorithms, etc. (Betancourt, 2019). This newly proposed workflow uses this Bayesian calibration spirit as well; a minimal sketch of the loop follows, and one concrete example is simulation-based calibration.
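
The sketch below runs this simulate-then-compare loop; the normal prior and likelihood and the squared-error utility are illustrative assumptions, not canonical choices.

```python
import numpy as np

# Simulate-then-compare sketch: draw (theta', y) from the model's joint
# distribution, take an action a(y), and score it against the simulated
# ground truth theta'. Averaging U over replications estimates U_bar_S.
rng = np.random.default_rng(3)

def utility(action, theta_true):
    """Hypothetical utility: negative squared error of the action."""
    return -(action - theta_true) ** 2

scores = []
for _ in range(1000):
    theta_prime = rng.normal(0.0, 2.0)             # theta' ~ pi_S(theta)
    y = rng.normal(theta_prime, 1.0, size=20)      # y ~ pi_S(y | theta')
    action = y.mean()                              # a(y): here, a point estimate
    scores.append(utility(action, theta_prime))    # U(a(y), theta')

print(np.mean(scores))  # Monte Carlo estimate of the expected utility
```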

Simulation-based calibration

The procedure for simulation-based calibration (SBC) is as follows:

1. Take one draw of the vector of parameters $\theta'$ from the prior distribution, $\pi(\theta')$.

2. Take one draw of the data vector $y$ from the likelihood model $f(y|\theta')$.

3. Take a bunch of posterior draws $\theta^1,…,\theta^M$ of $\theta$ from $g(\theta|y)$. This is the part of the computation that typically needs to be checked.

4. For each scalar component of $\theta$ or quantity of interest $U(\theta)$, compute the quantile of the true value $U(\theta')$ within the distribution of values $U(\theta^m)$. For a continuous parameter or summary, this quantile will take on one of the values $0/M, 1/M, \ldots, 1$.

If all the computations above are correct, then the result of step 4 should be uniformly distributed over the $M+1$ possible values. To do simulation-based calibration, repeat the above four steps $N$ times independently and then check that the distribution of the quantiles from step 4 is approximately uniform.

To rephrase, it compares the ensemble posterior sample and the prior sample using the rank statistic $\rho = \sum_{m=1}^{M} I_{U(\theta^{m}) < U(\theta')}$, which is shown to be uniform if there is no problem in your method for $\theta' \rightarrow y \rightarrow \theta$. The method takes the prior distribution, the likelihood model $f$, and the computational algorithm $g$ as inputs.
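
A minimal sketch of this rank computation, using a conjugate normal model so the exact posterior is available in closed form; in practice the closed-form posterior is replaced by the algorithm $g(\theta \mid y)$ under test.

```python
import numpy as np

# SBC sketch for a conjugate model: prior Normal(0, 1), likelihood
# Normal(theta, 1) with n observations, so the exact posterior is
# Normal(n * ybar / (n + 1), 1 / (n + 1)).
rng = np.random.default_rng(4)
N, M, n = 1000, 99, 10
ranks = np.empty(N, dtype=int)

for i in range(N):
    theta_prime = rng.normal(0.0, 1.0)                # step 1: draw from prior
    y = rng.normal(theta_prime, 1.0, size=n)          # step 2: draw data
    post_mean = n * y.mean() / (n + 1)                # exact posterior here;
    post_sd = np.sqrt(1.0 / (n + 1))                  # swap in g(theta|y) to test it
    theta_m = rng.normal(post_mean, post_sd, size=M)  # step 3: M posterior draws
    ranks[i] = np.sum(theta_m < theta_prime)          # step 4: rank of theta'

hist, _ = np.histogram(ranks, bins=M + 1, range=(-0.5, M + 0.5))
print(hist)  # should look roughly uniform across the M+1 possible ranks
```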

With uniform ranks, the process passes the SBC test; even if the ranks are not uniform, SBC provides graphical feedback on the source of the error, such as the bias direction and the dispersion of your method.

Some comments on Bayesian calibration beyond SBC:

  • It bootstraps model development, since improvement happens through repeated fake-data simulation without the use of any real data; an initial crude model is calibrated by iterating simulate-then-compare. These calibrations can identify the limitations of that model and inform improvements such as more sophisticated observational models, more precise domain expertise, or even more robust methods for computing expectations.
  • Simulate-then-compare with $\bar{U}_{\mathrm{S}} = \int \mathrm{d}y\, \mathrm{d}\theta\; \pi_{\mathrm{S}}(y, \theta)\, U(a(y), \theta)$ is the engine of calibration, where the model configuration $\pi_{S}(y, \theta)$ and the utility $U$ need to be defined in advance.
  • One disadvantage is that improvement guidelines are not always clear, but Yao18 shows one example of improving approximate computation. This motivates this research to substitute the original "iterating simulate-then-compare" with "calibrate-then-choose".
  • To measure the difference used to compare the expected performance of the constructed model, both the utility function $U$ and the model configuration are needed.
  • It relates prior information, with its uncertainty, to future information based on the likelihood of the observed output.

3. Literature on Bayesian calibration

The following two references provide a modern view of Bayesian calibration that incorporates more stochasticity and computational concepts.

Kennedy and O'Hagan, 2001 Bayesian calibration of computer models

  • slide summary
  • Take account of all sources of uncertainty:
    Parameter uncertainty, Measurement error, Model discrepancy, Code uncertainty
  • Two information sources:
    – Computer model $M(x, \theta)$
    – Field data $D_{field}$: a noisy collection of measurements of reality at a variety of $x$ values, compared with the model outputs $D_{sim} = \{M(x_i, \theta_i),\ i = 1, \ldots, N\}$.
    With the discrepancy measure $\delta(x)$ and measurement error $\epsilon$, the field data decompose as
    $\mathcal{D}_{\text{field}}(x) = M(x, \hat{\theta}) + \delta(x) + \epsilon$
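
A small sketch of this decomposition with synthetic data; the model $M$, the discrepancy $\delta(x)$, and the noise scale are illustrative stand-ins, not from the paper.

```python
import numpy as np

# Sketch of the Kennedy-O'Hagan decomposition: field observations equal
# model output at the best-fit theta, plus a discrepancy delta(x), plus
# measurement error. All functions here are hypothetical stand-ins.
rng = np.random.default_rng(5)
x = np.linspace(0.0, 1.0, 30)
theta_hat = 1.5

def M(x, theta):
    """Hypothetical computer model output."""
    return theta * np.sin(2 * np.pi * x)

delta = 0.2 * x**2                        # model discrepancy delta(x)
eps = rng.normal(0.0, 0.05, size=x.size)  # measurement error
D_field = M(x, theta_hat) + delta + eps   # field data per the decomposition

residual = D_field - M(x, theta_hat)      # what remains after calibration:
print(residual.round(2))                  # discrepancy plus noise, not zero
```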

Betancourt 2019 Calibrating Model-Based Inferences and Decisions

  • Bayesian inference doesn't say anything about what decisions to make; it just provides information with which to make decisions.
  • Bayesian decision theory combines a posterior distribution with a utility function to make a decision based on the information contained within that posterior distribution. The decision analysis in the manual is an example of this.
  • Bayesian calibration considers how effective any decision-making process might be within the scope of the modeling assumptions.

The following misuses the term Bayesian calibration:

Muehleisen 2016 Bayesian Calibration – What, Why And How

  • An example of misuse of the term Bayesian calibration, since the concept introduced there is actually Bayesian inference; all the same, it explains the advantages of Bayesian inference, including how it avoids overfitting

More historical views can be found in:

Dawid, 1982 The Well-Calibrated Bayesian

  • Well-calibration := the relative frequency of actual events is close to the forecast probability of the event

Theorem 1: an admissible selection process's empirical frequency converges to the average forecast probability with probability 1 (proved with the martingale convergence theorem).

Oakes, 1985 Self-Calibrating Priors Do Not Exist

  • The nonexistence of a self-calibrating prior is shown with a counterexample: one $P$ for which the calibration property (i.e., the limiting relative frequency of actual events equals the forecast probability of the event) does not hold

Dawid, 1985 The Impossibility of Inductive Inference (Dawid's comment on "Self-Calibrating Priors Do Not Exist")

  • The implication of Oakes' result is explained as: "no statistical analysis of sequential data can be guaranteed to provide asymptotically valid forecasts for every possible set of outcomes"
  • However, once stochasticity is added, the story becomes different.

The next post!

is about Bayesian calibration for decisions, aiming to:

  • Apply Bayesian calibration to Bayesian decision theory
  • Introduce its applications in neural networks and reinforcement learning
  • Introduce its applications in finance and bioinformatics