The current model here is for prediction, i.e., the result is the optimal parameter value that minimizes the distance between prediction and real data. My goal is to extend this model to a Bayesian setting; this includes designing a prior structure. The distributional information of the model parameters should be propagated to the final target, which is the expected utility.
The basic outline for the Bayesian decision model is as follows:
A few things to note:

- The form of the implementation depends on the cardinality of the decision set D. For a discrete and small D, declare the utility in the `generated quantities` block and compare the average of each outcome, `util[d]`.
- For policy optimization, cost parameter determination and optimization are needed, which correspond to steps 3 and 4 above. However, depending on how the expected utility is calculated (analytical integration or stochastic Monte Carlo integration), the result of step 3 can be either a function form (15 + 10x − 2x^2) or an actual value calculated by averaging the samples generated with `_rng` functions.
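To make the two routes of step 3 concrete, here is a small Python sketch with made-up numbers: the analytic route keeps the expected utility as a function form (the quadratic 15 + 10x − 2x^2 above), while the Monte Carlo route averages noisy draws, standing in for Stan's `_rng` output.

```python
import numpy as np

# Analytic route: the expected utility is kept as a function form,
# here the quadratic 15 + 10*x - 2*x^2 quoted in the text.
def eu_analytic(x):
    return 15.0 + 10.0 * x - 2.0 * x**2

rng = np.random.default_rng(1)

# Monte Carlo route: average utility over stochastic draws at a fixed x.
# The noise model here is hypothetical, standing in for draws produced
# by a Stan generated quantities block with _rng functions.
def eu_monte_carlo(x, n=200_000):
    draws = eu_analytic(x) + rng.normal(0.0, 1.0, size=n)
    return draws.mean()
```

With enough draws the two routes agree; the analytic form additionally exposes the optimum in closed form (here x = 2.5).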
I will show two pieces of code which serve two different purposes: inference (data fitting) plus utility prediction, and utility optimization. They also have different use-cases.
The first is the prediction code. It only applies to the discrete decision problem. Contrary to the framework above, where the cost is the variable, we assume the cost is fixed; instead, the feature (how much and how long it took) is a variable (c'x).
Inference code for the discrete decision problem (for background information, please refer to the Stan manual):
```stan
functions {
  // simplified from U(c, t) = -(c + 25 * t)
  real U(real t) {
    return -t;
  }
}
data {
  int<lower = 0> N;
  int<lower = 1, upper = 4> d[N];
  real t[N];
}
parameters {
  vector[4] mu;
  vector<lower = 0>[4] sigma;
}
model {
  mu ~ normal(0, 1);
  sigma ~ lognormal(0, 0.25);
  for (n in 1:N)
    t[n] ~ lognormal(mu[d[n]], sigma[d[n]]);
}
generated quantities {
  vector[4] util;
  for (k in 1:4)
    util[k] = U(lognormal_rng(mu[k], sigma[k]));
}
```
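For completeness, a sketch of how the `util` draws from the generated quantities block would be used downstream. The posterior draws here are simulated from assumed `mu` and `sigma` values; in a real workflow they would come from a Stan fit (e.g. `fit.stan_variable("util")` with cmdstanpy).

```python
import numpy as np

# Hypothetical posterior draws of util, shape (n_draws, 4): one column
# per decision d. The mu/sigma values below are assumptions for
# illustration, not fitted quantities.
rng = np.random.default_rng(0)
mu = np.array([0.5, 0.3, 0.8, 0.4])
sigma = np.array([0.2, 0.3, 0.1, 0.25])
util = -rng.lognormal(mean=mu, sigma=sigma, size=(4000, 4))

expected_util = util.mean(axis=0)           # Monte Carlo E[U | d]
best_d = int(np.argmax(expected_util)) + 1  # decisions are 1-indexed
```

Each column averages U over the posterior predictive draws for one decision, and the decision with the highest average is chosen.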
If the decision is continuous (e.g. optimal state level, frequency of inspection), we need to know the explicit functional form of our target. Note that the coefficient of the target function is an averaged value E[c|x] (in Aki's example, the target function is (5000/x^2) * (x − 11), where 5000/x^2 is the average number of customers at dinner price x). Therefore, before starting continuous decision optimization (CDO), we first need to infer the cost vector (e.g. via interpolation on discrete data).
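As a sanity check on the continuous case, the expected utility of Aki's pricing example can be optimized directly once its functional form is known; the sketch below uses only the numbers quoted above.

```python
import numpy as np

# Expected utility from the example: 5000 / x^2 expected customers at
# dinner price x, each contributing (x - 11) in profit.
def expected_utility(x):
    return (5000.0 / x**2) * (x - 11.0)

# Grid search over candidate prices; setting the derivative
# 5000 * (22 - x) / x^3 to zero confirms the optimum at x = 22.
prices = np.linspace(12.0, 40.0, 2801)
best_price = prices[np.argmax(expected_utility(prices))]
```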
The previous model and the second model differ in this sense. Contrary to code 1, where `d` is only used as an index and therefore has a discrete domain and codomain, `d` in code 2 is the actual feature and therefore has a continuous domain and codomain. It could be viewed as a value-index.
In other words, Stan can be used as a function from a choice to expected utilities. Then an external optimizer can call that function. This optimization can be difficult without gradient information. Gradients could be supplied by automatic differentiation, but Stan is not currently instrumented to calculate those derivatives.
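The "external optimizer calling Stan" loop can be sketched without gradients. Here the expected utility is a stand-in analytic function; in practice each evaluation would run Stan (or average `_rng` draws) for that candidate decision.

```python
# Derivative-free ternary search over a unimodal expected utility,
# mimicking an external optimizer that treats Stan as a black box.
def ternary_search_max(f, lo, hi, iters=200):
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1   # maximum lies to the right of m1
        else:
            hi = m2   # maximum lies to the left of m2
    return 0.5 * (lo + hi)

# Stand-in expected utility with a known maximum at d = 6.
d_opt = ternary_search_max(lambda d: -(d - 6.0) ** 2, 1.0, 10.0)
```

With noisy Monte Carlo evaluations the comparison f(m1) < f(m2) becomes unreliable near the optimum, which is exactly why gradient information would help.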
Building on the Stan manual's suggestion, I wish to use the new `grad_log_prob` function for gradient-based optimization of the expected-utility model. For this, the blocks should be redesigned: the expected utility should be the target, and therefore move into the `model` block from `generated quantities`, and the parameter `d`, the variable of interest for the sensitivity calculation, should move to `parameters` from `data`. However, I don't know how to design the `model` block so that the final expected utility is affected by the parameter priors as in code 1.
```stan
functions {
  real U(real t) {
    return -t;
  }
}
parameters {
  real<lower = 1, upper = 10> d;
  real mu;
  real<lower = 0> sigma;
  real<lower = 0> t;
}
model {
  // ??? how should the priors below propagate into the expected utility?
  mu ~ normal(0, 1);
  sigma ~ lognormal(0, 0.25);
  t ~ lognormal(mu, sigma);
  target += U(t);
}
```
My questions:

- How should the `model` block be designed for a continuous decision variable `d` if one wishes to use `grad_log_prob` for optimization?
- If two separate Stan files (inference and optimization, maybe?) are needed for this purpose, what would the input and output of each file be?
This might be related to the Smart Predict-then-Optimize (SPO) framework suggested by Elmachtoub, where the cost vector prediction model is trained with an optimization loss, not a prediction loss. The four components suggested are as follows:
- Nominal (downstream) optimization problem
- Training data of the form (x_1, c_1), (x_2, c_2), …, (x_n, c_n)
- A hypothesis class H of cost vector prediction models f : X → R^d, where ĉ := f(x) is interpreted as the predicted cost vector associated with feature vector x.
- A loss function l(·,·)
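A toy illustration of the SPO idea under made-up numbers: the loss of a predicted cost vector is the extra true cost incurred by the decision it induces (decision regret), not the prediction error.

```python
import numpy as np

# Feasible decisions: pick exactly one of three actions (unit basis
# vectors), so minimizing c'w reduces to an argmin over components.
def spo_loss(c_hat, c_true):
    w_hat = int(np.argmin(c_hat))    # decision optimal for the prediction
    w_star = int(np.argmin(c_true))  # decision optimal for the truth
    return c_true[w_hat] - c_true[w_star]

c_true = np.array([3.0, 1.0, 2.0])
c_pred = np.array([1.5, 1.8, 5.0])  # small prediction error, wrong decision
regret = spo_loss(c_pred, c_true)   # true cost 3.0 vs optimal 1.0
```

A prediction can be close in squared error yet induce a costly decision, which is why SPO trains against the regret rather than the fit.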
Comments are the energy for a writer, thanks!