The current model here is for prediction, i.e. the result is the optimal parameter value that minimizes the distance between the prediction and the observed data. My goal is to extend this model to a Bayesian setting, which includes designing a prior structure. The distributional information of the model parameters should propagate to the final target, the expected utility.

The basic outline for the Bayesian decision model is as follows:

The four decision-analysis steps from BDA (Bayesian Data Analysis)

A few things to note:

  • The form of the implementation depends on the cardinality of the decision set D. For a discrete and small D, declare the utility in the generated quantities block and compare the average of each outcome, util[d].
  • For policy optimization, cost-parameter determination and optimization are needed, corresponding to steps 3 and 4 above. However, depending on how the expected utility is calculated (analytical integration or stochastic Monte Carlo integration), the result of step 3 is either a functional form (e.g., 15 + 10x − 2x^2) or an actual value obtained by averaging samples generated with _rng functions.
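To make the two routes for step 3 concrete, here is a small sketch (my own illustration, with made-up mu and sigma and the simplified utility U(c) = -c) contrasting the analytical expected utility of a lognormal outcome, E[U] = -exp(mu + sigma^2/2), with the stochastic Monte Carlo average that an _rng-based generated quantities block would produce:

```python
import math
import random

random.seed(0)
mu, sigma = 1.0, 0.3  # hypothetical posterior point values, not fitted

# Route 1: analytical integration, using the closed-form lognormal mean.
analytic = -math.exp(mu + sigma**2 / 2)

# Route 2: stochastic Monte Carlo integration, averaging _rng-style draws.
draws = [random.lognormvariate(mu, sigma) for _ in range(200000)]
monte_carlo = -sum(draws) / len(draws)

print(analytic, monte_carlo)  # the two estimates should agree closely
```

With enough draws the Monte Carlo value converges to the analytical one; the trade-off is that the analytical route yields a function of the decision variable that an optimizer can use directly, while the Monte Carlo route yields only a value per evaluated decision.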

I will show two pieces of code that serve two different purposes: inference (data fitting) plus utility prediction, and utility optimization. They also have different use cases.

The first is the prediction code. It applies only to discrete decision problems. Contrary to the framework above, where cost is the variable, here we assume the cost is fixed; instead, the feature (how much it cost and how long it took) is the variable (c’x).


Inference code for a discrete decision problem (for background, please refer to the Stan manual):

functions {
  // simplified from U(c, t) = -(c + 25 * t)
  real U(real c) {
    return -c;
  }
}

data {
  int<lower = 0> N;
  int<lower = 1, upper = 4> d[N];  // decision index must match vector[4] below
  real<lower = 0> c[N];            // observed outcome, positive for lognormal
}

parameters {
  vector[4] mu;
  vector<lower = 0>[4] sigma;
}

model {
  mu ~ normal(0, 1);
  sigma ~ lognormal(0, 0.25);
  for (n in 1:N)
    c[n] ~ lognormal(mu[d[n]], sigma[d[n]]);
}

generated quantities {
  vector[4] util;
  for (k in 1:4)
    util[k] = U(lognormal_rng(mu[k], sigma[k]));
}
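After fitting, steps 3 and 4 reduce to post-processing the posterior draws of util[k]: average per decision, then take the argmax. A hedged sketch of that post-processing (the draws here are fabricated stand-ins for real posterior output):

```python
# decision -> posterior draws of util[k]; made-up numbers for illustration
util_draws = {
    1: [-2.9, -2.7, -3.0],
    2: [-2.4, -2.6, -2.5],
    3: [-3.3, -3.5, -3.4],
    4: [-2.7, -2.6, -2.8],
}

# Step 3: estimate the expected utility of each decision by averaging draws.
expected_util = {k: sum(v) / len(v) for k, v in util_draws.items()}

# Step 4: pick the decision with the highest expected utility.
best_decision = max(expected_util, key=expected_util.get)
print(best_decision, expected_util[best_decision])
```

The point is that the decision rule uses the full posterior (through the draws of util[k]), not a single point estimate of the parameters.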

If the decision is continuous (e.g., optimal state level, frequency of inspection), we need to know the explicit functional form of our target. Note that the coefficients of the target function are averaged values E[c|x] (in Aki’s example, the target function is (5000/x^2) ∗ (x − 11), where 5000/x^2 is the average number of customers at dinner price x). Therefore, before starting continuous decision optimization (CDO), we first need to infer the cost vector (e.g., via interpolation on discrete data).

The previous model and the second model differ in this sense. Contrary to code 1, where d is only used as an index and therefore has a discrete domain and codomain, d in code 2 is the actual feature and therefore has a continuous domain and codomain. It could be viewed as a value-index.

 In other words, Stan can be used as a function from a choice to expected utilities. Then an external optimizer can call that function. This optimization can be difficult without gradient information. Gradients could be supplied by automatic differentiation, but Stan is not currently instrumented to calculate those derivatives.
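The external-optimizer pattern described above can be sketched as follows. This is my own illustration: expected_utility(d) stands in for a call into Stan (here a made-up smooth function), and the missing gradient is supplied by central finite differences instead of automatic differentiation:

```python
def expected_utility(d):
    # Hypothetical stand-in for running Stan at decision d and averaging
    # the resulting utility draws; peaks at d = 3 by construction.
    return -(d - 3.0) ** 2

def fd_grad(f, d, h=1e-5):
    # Central finite-difference approximation of df/dd.
    return (f(d + h) - f(d - h)) / (2.0 * h)

d = 1.0                 # start inside the feasible range
for _ in range(500):    # plain gradient ascent on the expected utility
    d += 0.1 * fd_grad(expected_utility, d)
print(d)                # converges near the maximizer d = 3
```

In practice each expected_utility evaluation would be a full (and noisy) Monte Carlo run, which is exactly why finite differences are fragile and a grad_log_prob-style exact gradient would be attractive.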

Building on the Stan manual’s suggestion, I wish to use the new grad_log_prob function for gradient-based optimization of the expected-utility model. For this, the blocks should be redesigned: the expected utility should become the target and therefore move into the model block, and d, the variable of interest for sensitivity calculation, should move from data to parameters. However, I don’t know how to design the model block so that the final expected utility is affected by the parameter priors, as in code 1.

functions {
  real U(real d) {
    return - d;
  }
}

parameters {
  real<lower = 1, upper = 10> d;
}

model { // ???????
  mu ~ normal(0, 1);
  sigma ~ lognormal(0, 0.25);
  t ~ lognormal(mu, sigma);
  target += U(t);
}

My questions:

  1. How should the model block be designed for a continuous decision variable d if one wishes to use grad_log_prob for optimization?
  2. If two separate Stan files are needed for this purpose (perhaps one for inference and one for optimization), what would the inputs and outputs of each file be?

This might be related to the Smart Predict-then-Optimize framework suggested by Elmachtoub, where the cost-vector prediction model is trained with an optimization loss rather than a prediction loss. The four components suggested are as follows:

  1. Nominal (downstream) optimization problem
  2. Training data of the form (x_1, c_1), (x_2, c_2), …, (x_n, c_n)
  3. A hypothesis class H of cost-vector prediction models f : X → R^d, where ĉ := f(x) is interpreted as the predicted cost vector associated with feature vector x
  4. A loss function ℓ(·, ·)
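To illustrate the key idea (my own toy construction, not taken from the paper): the loss scores a predicted cost vector ĉ by the decision regret it induces through the nominal optimization problem, rather than by its prediction error. Here the feasible set is a tiny hypothetical collection of 0/1 decisions:

```python
# Hypothetical feasible decisions for the nominal optimization (component 1).
feasible = [(1, 0, 1), (0, 1, 1), (1, 1, 0)]

def w_star(c):
    # Nominal problem: pick the feasible decision with minimal cost c'w.
    return min(feasible, key=lambda w: sum(ci * wi for ci, wi in zip(c, w)))

def spo_regret(c_hat, c_true):
    # Optimization loss: extra true cost incurred by deciding under c_hat.
    cost = lambda w: sum(ci * wi for ci, wi in zip(c_true, w))
    return cost(w_star(c_hat)) - cost(w_star(c_true))

c_true = (4.0, 1.0, 2.0)
good_pred = (40.0, 10.0, 20.0)  # huge prediction error, same cost ranking
bad_pred = (4.0, 4.1, 2.0)      # tiny prediction error, flips the decision
print(spo_regret(good_pred, c_true), spo_regret(bad_pred, c_true))
```

The first prediction has a large squared error but zero regret because it preserves the optimal decision, while the second is numerically much closer to the truth yet incurs positive regret; that asymmetry is exactly what an optimization loss captures and a prediction loss misses.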