Iterative model validation, including simulation-based calibration (SBC) and predictive checks, is needed as introduced in the Bayesian workflow. The aim of this writing is to show the importance of SBC for decision analysis, with a thorough anatomy of the components that affect the final decision. The motivation is that in most business settings, calibration focuses on the predictive distribution rather than the posterior. In short, SBC and predictive checks differ in the target of the recovery test: parameter vs. data. SBC uses the self-consistency principle to argue the need for symmetry in $p(\theta|\tilde{\theta})$, while a predictive check uses conformity to real data, $p(y|\tilde{y})$. Let us begin!

Road to Decision

There are three sequential propagation steps: posterior, predictive, and decision. $p(\theta), p(y), p(\theta_{D})$ are the end goals, but for two reasons we are more interested in $p(\theta|\tilde{y}), p(y|\tilde{y}), p(\theta_{D}|\tilde{y})$. First, we do not know the underlying true distributions; one approach is to investigate the convergence of $p(\theta|\tilde{y})$ toward them. Second, tracking the dynamic update of these distributions conditional on the observed data is more realistic than chasing their ideal forms, and we are interested in that update itself. Our aim is therefore:


  1. $p(\theta|\tilde{y})$

  2. $p(y|\tilde{y}) = \int d\theta p(y|\theta)p(\theta|\tilde{y})$

  3. $p(\theta_{D}|\tilde{y}) = P\left(I\left[\theta_D = \underset{\theta_D}{\operatorname{argmax}}\, E_{p(y \mid \theta_D, \tilde{y}, \tilde{\theta}_D)}[U(y \mid \theta_D)]\right]\right)$

Question: $U(y \mid \theta_D)$ vs. $U(y, \theta_D)$?

Notation: the decision parameter is a map $\theta_D: (D,\mathcal{D}) \rightarrow (R, \mathcal{R})$. The decision type can be discrete or continuous, and its form a single value or a set: a model index for model selection, a binary value for hypothesis testing, an integer for integer programming, etc. The observation space is $y \in Y$; the observed data and past decisions are $\tilde{y}, \tilde{\theta}_D$.


Refer to this for more information on posterior and predictive checks.

The decision step needs further explanation. This example from the Stan manual involves the choice of commute mode, where the outcome is the commute time ($y$). The utility is a function of time, whose distributional parameter ($\theta$) depends on the commute mode. The parameter does not appear in the final form because it is marginalized out in the predictive distribution.

$p(y \mid \theta_D, \tilde{y}, \tilde{\theta}_D)=\int d\theta\, p(y \mid \theta_D, \theta) \cdot p(\theta \mid \tilde{y}, \tilde{\theta}_D)$
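A minimal Monte Carlo sketch of this marginalization and the resulting argmax decision, loosely modeled on the commute-mode example; the lognormal outcome model, the utility function, and all numbers below are hypothetical stand-ins rather than the Stan manual's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws p(theta | y~): lognormal (mu, sigma) of commute
# time per mode, as if taken from a fitted model.
posterior = {
    "car":  {"mu": rng.normal(3.4, 0.05, 4000), "sigma": np.abs(rng.normal(0.30, 0.02, 4000))},
    "bike": {"mu": rng.normal(3.6, 0.05, 4000), "sigma": np.abs(rng.normal(0.10, 0.01, 4000))},
}

def utility(minutes):
    return -minutes  # hypothetical utility: shorter commute is better

expected_utility = {}
for mode, draws in posterior.items():
    # p(y | theta_D, y~) = int dtheta p(y | theta_D, theta) p(theta | y~):
    # push each posterior draw through the outcome model, then average the utility.
    y_rep = rng.lognormal(mean=draws["mu"], sigma=draws["sigma"])
    expected_utility[mode] = utility(y_rep).mean()

decision = max(expected_utility, key=expected_utility.get)  # argmax over theta_D
print(expected_utility, decision)
```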

The existence of $\tilde{\theta}_D$ should be noted: it is highly likely that past observations are outcomes of past decisions. For example, in inventory optimization, past observed demands are censored values of the real demand because of the inventory-level constraint, which is itself a past decision.
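As a toy illustration of how a past decision censors the data, here is a minimal sketch assuming Poisson demand observed only as sales capped at the stocked quantity; the likelihood mixes an exact term for ordinary days with a tail-probability term for stock-out days. The function name and all numbers are hypothetical.

```python
import numpy as np
from scipy import stats

def censored_demand_loglik(lam, sales, stock):
    """Log-likelihood of Poisson(lam) demand observed as sales = min(demand, stock).

    `stock` is the past inventory decision (theta_D~); on days with
    sales == stock we only learn that demand >= stock (right-censoring).
    """
    sales, stock = np.asarray(sales), np.asarray(stock)
    censored = sales >= stock
    ll_exact = stats.poisson.logpmf(sales[~censored], lam).sum()
    ll_censored = np.log(stats.poisson.sf(stock[censored] - 1, lam)).sum()  # P(demand >= stock)
    return ll_exact + ll_censored

# Sales of 20 on a day stocked at 20 is a censored observation, not the true demand.
print(censored_demand_loglik(lam=18.0, sales=[12, 20, 17, 20], stock=[25, 20, 25, 20]))
```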

Viewing the decision as another parameter increases the complexity but allows us to model its role in data generation. This provides an advantage over models where decisions are passively optimized given the previous data: the prior-posterior structure supports online updating through $p(\theta_D|\tilde{\theta}_D, \tilde{y})$, and prior knowledge or preferences can be added through $p(\theta_D)$.


Need for SBC

SBC has largely two use cases, as explained in the Appendix. For use case 2, the advantage of SBC is clear, as it is one of the few tools for evaluating the critical but frequently unexamined choice of computational method. This list shows its wide application. However, this writing focuses on use case 1. We investigate its usefulness in a decision-making context by suggesting cases where the posterior, which is marginalized out for the predictive, holds importance for decisions. My approach is to view the decision as another model parameter.

Here are specific questions to develop this approach:

  1. What are example situations where a self-inconsistent model that falsely gives decent predictive results can be a problem?

  2. For cases where only good predictive inference is needed, what could go wrong if the model is self-inconsistent?

  3. Could the term $p(\theta_D|\tilde{\theta}_D, \tilde{y})$ be the basis for the need of SBC in decision making?

For questions 1 and 2, the following two papers, which focus heavily on the predictive behavior of the model, could be helpful: Projection Predictive Inference for Generalized Linear and Additive Multilevel Models; A Decision-Theoretic Approach for Model Interpretability in Bayesian Framework.

Note) A validation-methodology comparison that concentrates on the parameter distribution appears in Yao, 2018. It explains that VSBC diagnostics can complement the PSIS diagnostic when the VI posterior produces good point estimates even though the underlying distribution differs starkly from the true posterior.

Appendix. SBC use case

  1. Test the prior and likelihood on the basis of computational consistency. Any canonical Bayesian model has the self-recovering property: averaging the posterior distributions fitted to samples from the prior predictive distribution always recovers the prior distribution. SBC uses this principle to evaluate the combination of prior and likelihood model under a fixed computation algorithm; users should choose one algorithm in advance, such as full HMC, ADVI, or a Laplace approximation. A minimal sketch of the rank-based check is given after this list.

  2. Test approximation algorithms. Approximation-based Bayesian computation is very promising, but one limitation is that its reliability can be hard to diagnose; for example, a full-HMC benchmark is usually needed to measure its error. SBC, which evaluates how well an algorithm samples from the posterior distribution given a model and a prior, could be an alternative tool for measuring reliability. $\pi(\theta) \simeq \int \mathrm{d}\tilde{y}\, \mathrm{d}\tilde{\theta}\, \pi_{\text{approx}}(\theta \mid \tilde{y})\, \pi(\tilde{y} \mid \tilde{\theta})\, \pi(\tilde{\theta})$ Inconsistency is observed when the approximate joint or conditional distribution of the computation algorithm is not close enough to that of the original model, i.e. when $\pi(\theta \mid y)$ and $\pi_{\text{approx}}(\theta \mid y)$ differ substantially. Note that the identity holds for any choice of $\pi(\tilde{y} \mid \tilde{\theta})\, \pi(\tilde{\theta})$.
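A minimal sketch of the rank-based check behind both use cases, assuming a conjugate normal-mean model so that the "computation algorithm" (here exact conjugate posterior sampling) can be swapped for ADVI, a Laplace approximation, etc.; with a correct algorithm the ranks of the prior draw among posterior draws should be approximately uniform. All model choices and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
prior_mu, prior_sd, noise_sd = 0.0, 1.0, 1.0
n_obs, n_draws, n_sims = 10, 100, 1000

ranks = []
for _ in range(n_sims):
    theta_tilde = rng.normal(prior_mu, prior_sd)             # theta~ ~ pi(theta)
    y_tilde = rng.normal(theta_tilde, noise_sd, size=n_obs)  # y~ ~ pi(y | theta~)

    # "Computation algorithm": exact conjugate posterior here; replace this block
    # with the approximation under test (ADVI, Laplace, ...) for use case 2.
    post_prec = 1 / prior_sd**2 + n_obs / noise_sd**2
    post_mean = (prior_mu / prior_sd**2 + y_tilde.sum() / noise_sd**2) / post_prec
    theta_draws = rng.normal(post_mean, np.sqrt(1 / post_prec), size=n_draws)

    # Rank of the prior draw among posterior draws; uniform iff self-consistent.
    ranks.append(int((theta_draws < theta_tilde).sum()))

hist, _ = np.histogram(ranks, bins=np.arange(n_draws + 2))
print(hist)  # a roughly flat histogram over 0..n_draws indicates calibration
```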