I wish to solve multimodal distribution optimization in two stage approach. Example usecase is newsvendor problem; order quantity should be optimized for the multimodal-shaped demand due to its mixed customer types or high seasonality. By two stage approach, I mean a support division approach where random variable is an indicator variable for divided region of a support (e.g. certain range of demand); this has the effect of preventing robust solution being too conservative by making final objective, worst case cost, as the combination of respective worst case cost of subproblems (approach in robust optimization1). Based on the experiment which compared this method with two existing models that assumed full or no distributional information at the decision stage, this approach outperformed the other two especially for multimodal distribution (korean preprint and its profit comparison plot below).

Below are few questions I need to address to further this research.

  1. Could I address variational approximation’s uncertainty underestimation problem with support division approach? This might prevent zero-avoiding property by forcing the probability to intermediate zero regions (normalization may be needed) and therefore keep the approximation to concentrate on one peak with the largest variance. In other words, inflating the denominator of KLD, true distribution density, could lead to lower penalty and improve compactness and concentration.
  2. What is the role and effect of time in applying variational approximation? Mean-field variational posterior becomes extremely concentrated in the existence of strong correlation. For example, temporally factored variational methods for time series models will result in much narrower posterior approximation. I wish to focus on these case and truly understand phrases like “compactness property of variational inference leads to a failure to propagate posterior uncertainty through time” and “mean-field approximations are most over-confident exactly when they are poorest” from Turner10. Comparing with its application on clustering would help.
  3. Could high seasonality be modeled as mixture models?
    This would be the extension of my previous research, Mixed pooling of seasonality for time series forecasting  [Paper] [Code].

This is the two stage approach, stochastic formulation provided in this paper.

range forecast for travel time on arc (i,j) of vehicle k is given as [\bar{t}^{\xi}-d^{\xi}, \bar{t}^{\xi}-d^{\xi}]

This is the profit comparison plot for multimodal distribution from preprint: Interval divide (support division) approach has highest performance.

profit ratio of three model by \beta (margin ratio)

This is a nice introduction on mixture models implementation in Stan by Michael Betancourt.

  1. In robust optimization, to address the problem of extreme conservatism of worst case scenario, the following have been attempted. Constraint relaxation approach where \lambda constraints out of m constraints are allowed to be violated from Bertsimas04. Another is support division (or two stage) approach where final worst case is the combination of respective subproblem’s worst case by adding range index random variable. Section 3 of this paper reframe the vehicle routing problem for cases where accurate range forecasts of each range being realized is possible.