Optimal exploration of model+algorithm space is my research interest. We modelers face this problem every day as we add, subtract, and transform the predictors of our time-series, linear regression, and decision tree models. This partially ordered process constructs a network that spans the continuous model space, a manifold. By algorithm I mean a posterior simulator [1]. Further discussion can be found in model topology (sec. 7.4) and model exploration (sec. 12.1) of this paper by Gelman (2020).
Bayesian statistics and computation, optimal transport, and time-series filtering all happen on the space of probability measures, and its geometric structure could guide model+algorithm exploration. So how does this abstract structure turn into a concrete guide that modelers can use? Here are some examples.
A model, a point on the model manifold, is transported to a new model point along a flow induced by the structure we equip the manifold with. The range and structure of this manifold depend on our model restriction: if we limit our scope to Gaussian distribution models, the manifold becomes a hyperbolic space with negative curvature, while the multinomial distribution family resides on a spherical manifold. Also, I view the term transport as a passive update, in that the optimal trajectory the model follows is predetermined by the global structure. A point (q, p) transported by the Hamiltonian flow [2] in phase space is one example (a minimal numerical sketch follows the list below).
Other examples include
- a differential structure allows geodesics to be represented as differential equations such as the Fokker-Planck equation
- a toric structure enables combinatorial calculations on which practical computations can be based (I still need to understand this better)
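To make the passive-transport picture concrete, here is a minimal sketch of a point (q, p) being carried along the Hamiltonian flow from footnote [2], using a leapfrog discretization and a standard Gaussian potential; both are illustrative choices of mine, not tied to any particular library.

```python
import numpy as np

def leapfrog(q, p, grad_U, step_size, n_steps):
    """Transport (q, p) along a discretized Hamiltonian flow of
    H(q, p) = U(q) + 0.5 * p @ p  (unit mass matrix assumed)."""
    q, p = q.copy(), p.copy()
    p -= 0.5 * step_size * grad_U(q)        # initial half step for momentum
    for i in range(n_steps):
        q += step_size * p                  # full step for position
        if i < n_steps - 1:
            p -= step_size * grad_U(q)      # full step for momentum
    p -= 0.5 * step_size * grad_U(q)        # final half step for momentum
    return q, p

# Illustrative target: standard Gaussian, so U(q) = 0.5 * q @ q and grad_U(q) = q.
grad_U = lambda q: q
q0, p0 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
q1, p1 = leapfrog(q0, p0, grad_U, step_size=0.1, n_steps=20)

# Passive transport: the trajectory is fixed by H; total energy is (nearly) conserved.
H = lambda q, p: 0.5 * q @ q + 0.5 * p @ p
print(H(q0, p0), H(q1, p1))
```

The modeler does not steer the point; H and the symplectic structure determine the trajectory, which is what I mean by a passive update.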
The following are my research questions; the starting point is the Stan simulation-based calibration (SBC) conference (2021/08/31), where simulation-based data exploration will be presented (joint work with Andrew Gelman in this direction) along with a tutorial on the SBC library.
- how best to discretize the continuous flow
A modeler can inspect only a finite number of models. How best to construct the candidate set of models from which the modeler chooses?
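As a toy illustration of what a discretized candidate set could look like (a hypothetical setup of mine, not a general answer to the question): take a one-parameter model family, here ridge regression indexed by a continuous regularization strength, and discretize its continuous path into a finite grid of candidate models.

```python
import numpy as np

# Hypothetical one-parameter model family: ridge regression indexed by a
# continuous regularization strength lambda > 0. The "continuous flow" is the
# regularization path; we discretize it into a finite candidate set.
def ridge_fit(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + rng.normal(size=100)

# Log-uniform discretization: each grid point is one model candidate.
lambdas = np.logspace(-3, 3, num=10)
candidates = {lam: ridge_fit(X, y, lam) for lam in lambdas}
```

Whether log-uniform steps, equal arc-length steps on the manifold, or steps adapted to the flow's speed give the better candidate set is exactly the open question.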
- workflow along the constraint-respecting flow
Modelers (especially scientists) do not just want to fit their data; they want models that are understandable and that follow invariance and conservation principles. Structure, mass, momentum, and energy are the main objects of these principles, with specific examples as follows: the exchangeable structure of conditional independence (de Finetti), order preservation between input and output, the symmetric structure of the joint $\theta, y, \theta'$ space in SBC, flow amount across a network's cut, molecular mass in a chemical reaction, adsorption rate in chromatography, and the sum of potential and kinetic energy in HMC. This talk by Cranmer discusses this from a physicist's perspective. Another example is geometric deep learning in this talk by Bronstein, which relates equivariant relations observed in five popular data structures: grids, groups, graphs, geodesics, and gauges. Approaches to designing a low-dimensional and interpretable latent space (e.g. variational autoencoders) share this view; however, this constraint-respecting flow has not been discussed much in the model development workflow.
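As a minimal sketch of what one such constraint check could look like inside a workflow (my own illustrative example, assuming an iid Gaussian likelihood): exchangeability implies the log-likelihood is invariant under permutation of the observations.

```python
import numpy as np
from scipy.stats import norm

def log_lik(y, mu, sigma):
    """Log-likelihood of an iid (hence exchangeable) Gaussian model."""
    return norm.logpdf(y, loc=mu, scale=sigma).sum()

rng = np.random.default_rng(1)
y = rng.normal(loc=2.0, scale=1.5, size=50)

# Constraint check: permuting the data must not change the log-likelihood.
ll = log_lik(y, mu=2.0, sigma=1.5)
ll_perm = log_lik(rng.permutation(y), mu=2.0, sigma=1.5)
assert np.isclose(ll, ll_perm)
```

A constraint-respecting flow would restrict model updates to candidates that satisfy such invariances by construction, rather than testing them after the fact.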
- early-stopping the model update once the outcome-consistent region is entered
Practically, only a certain parameter functional is of interest. For instance, after all the hectic work of sampling, optimizing, and pushforwarding, every process is translated into the language of 'Decide between A or B' [3]. The multiverse analysis figure from Gelman (2020) shows that moderately well-fitted models tend to reach a consensus on the conclusion; the colors within each row, our quantity of interest, are homogeneous. The object of choice is not a model but the conclusion the models give, and therefore early-stopping the model update once it enters the region that guarantees a consistent outcome would be economical. I would dub this the outcome-consistent region demarcation problem.
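A toy sketch of such an early-stopping rule (entirely hypothetical: each model is summarized by a scalar effect estimate and the decision is a sign threshold): walk through the candidate models in order and stop once the last k candidates imply the same decision.

```python
def decision(effect_estimate, threshold=0.0):
    """Hypothetical decision functional: 'A' if the effect exceeds the threshold, else 'B'."""
    return "A" if effect_estimate > threshold else "B"

def early_stop(model_trajectory, fit_and_summarize, k=3):
    """Stop updating once the last k models agree on the decision."""
    decisions = []
    for m in model_trajectory:
        decisions.append(decision(fit_and_summarize(m)))
        if len(decisions) >= k and len(set(decisions[-k:])) == 1:
            return m, decisions          # entered the outcome-consistent region
    return model_trajectory[-1], decisions

# Illustrative stand-in: each "model" already carries its effect estimate.
trajectory = [0.8, 0.5, 0.3, 0.2, 0.15, 0.12]
chosen, trace = early_stop(trajectory, fit_and_summarize=lambda m: m, k=3)
```

The demarcation problem is precisely about justifying the stopping criterion, i.e. knowing that the region entered really does guarantee a consistent conclusion.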
- outcome-consistent region gradient descent
Once the outcome-consistent region (OCR) is demarcated, our goal becomes reaching the region with as few updates as possible. Gradient descent with a loss function that measures the distance between the current model and the OCR, $\inf\{d(m, r) \mid r \in \mathrm{OCR}\}$, helps achieve this goal. This differs from approaches in this paper where the output is checked pointwise (the outcome is simulated at every form of the model) along a model trajectory that is not guided by any outer force. Publications could also encourage reporting the model trajectory.
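Here is a minimal sketch under a made-up geometry: models live in $\mathbb{R}^2$, the OCR is a Euclidean ball, and the loss is the distance to that ball. None of this comes from the paper above; it only illustrates the update rule.

```python
import numpy as np

# Hypothetical setup: a model is a point m in R^2 and the outcome-consistent
# region (OCR) is a ball of radius `radius` around `center`.
center, radius = np.array([3.0, -1.0]), 0.5

def dist_to_ocr(m):
    """inf { d(m, r) : r in OCR } for a ball-shaped OCR."""
    return max(0.0, np.linalg.norm(m - center) - radius)

def grad_dist(m):
    """Gradient of the distance when m lies outside the ball (unit vector away from the center)."""
    gap = m - center
    norm = np.linalg.norm(gap)
    return gap / norm if norm > radius else np.zeros_like(m)

m = np.array([0.0, 0.0])
lr = 0.4
trajectory = [m.copy()]
while dist_to_ocr(m) > 1e-6:
    m = m - lr * grad_dist(m)        # one model update toward the OCR
    trajectory.append(m.copy())

print(len(trajectory) - 1, "updates to reach the OCR")
```

In a real workflow the distance would be measured on the model manifold rather than in Euclidean coordinates, and the OCR boundary would come from the demarcation step above; reporting the whole trajectory, not just the final model, is what the last remark suggests.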
Further references:
Geometry
- Decision geometry (Dawid and Lauritzen, 2005)
- Geometrical Insights for Implicit Generative Modeling
- Differential geometry for machine learning
Optimal transport
- Wasserstein GAN and the Kantorovich-Rubinstein Duality: https://vincentherrmann.github.io/blog/wasserstein/
Footnotes:

[1] This concept is general enough, since a deterministic solution corresponds to a simulator with zero variance.

[2] Formally, on a space equipped with $(M, \omega, H)$ there exists a Hamiltonian flow defined by $dH(\cdot) = \omega(X_H, \cdot)$; likewise, a space equipped with $(M, \langle \cdot, \cdot \rangle, f)$ has a gradient flow $df(\cdot) = \langle \operatorname{grad} f, \cdot \rangle$.

[3] Even if the short-term goal is passive inference, why would you infer if you did not have the motivation to take some action?
Comments are the energy for a writer, thanks!