My question began from asking how and where the idea of lifting, or augmentation, could be used as cleverly as in HMC, where adding the momentum variable enables structured sampling. I listed concepts that I thought could connect. The first part of this post describes the static relationship that lifting creates between the original coordinate $(q)$ and its new coordinate $(q, p)$. The second part lists how this relation can be developed into a converging process through "lift then project" iteration. The third part plans to present how the second strategy can be used in optimization and verification. This is the image I have in mind for now:
1. Relation created by lifting
Origin X $\xrightarrow{\text{Lift}\; f}$ Augmented (X, f(X))
- The structures newly created on the augmented space, listed below, are what make the efficient updates of Section 2 possible.
- Convex structure through dual variable: Lagrangian-related
- Vector flow structure through dual variable: HMC, Augmented neural ODE(?)
- Markov structure through extra state: CVaR MDP, Stopping time
- Sampling structure through latent var: EM, Imputation
- Efficiency and diversity through copy: Coupling, Splitting
Algorithm | Original, Augmented (= Orig. + New) | New variable | Effect | Constraint / Invariant / Conserved | Ref. |
---|---|---|---|---|---|
SBC | $\theta, (\theta, y)$ | data $y$ | verification, calibration | $(\theta, \theta')$ symmetry | |
Coupling | $\theta(w), (\theta_1(w), \theta_2(w))$ | rng-sharing process | unbiasedness, variance reduction | | Maximal Couplings of the Metropolis–Hastings Algorithm |
Splitting | $\theta(w), (\theta_1(w), \theta_2(w))$? | ? | | | |
Copula | $x_a, (x_a, x_b)$ | parameter with dependence | dependence inference, calibration | uniform marginals | |
Info. bottleneck | $x, (x, \tilde{x})$ | encoding of the source | optimal assignment $p(\tilde{x} \mid x)$ | self-consistent equations for the $X \to \tilde{X}$ and $\tilde{X} \to Y$ codes: $p(y \mid \tilde{x})$, $p(\tilde{x})$ | Tishby et al. (2000), Thm. 4 |
EM | $\theta, (\theta, z)$ | latent variable $z$ | sampling is easier for $p(\theta \mid z)$ than for $p(\theta)$ | $p(\theta) = \int p(\theta \mid z)\, p(z)\, dz$ | |
Imputation | $\theta, (\theta, z)$ | data augmentation | | | |
CVaR MDP1 | $X, (X, C)$ | running-cost state $C$ | | | Bäuerle (2011), Huang (2016), Miller (2017), Chow (2018), Backhoff-Veraguas (2020); via Min (2020) |
CVaR MDP2 | $X, (X, Q)$ | risk-aversion quantile state $Q$ | | CVaR dual | Pflug (2016), Chow et al. (2015), Chapman (2018), Li (2020), Min (2020) |
Lagrangian dual | $x, (x, \lambda)$ | dual variable $\lambda$ | dual's convex structure | weak duality: $\inf_x \sup_\lambda \ge \sup_\lambda \inf_x$ | |
Aug. Lagrangian or ADMM | $x, (x, \lambda)$ or $x, z, (x, z, \lambda)$ | dual variable $\lambda$ | convex structure, better convergence | weak duality | A generalized risk budgeting approach to portfolio construction |
Stopping time | $X, (\alpha, X_\alpha)$ | stopping time $\alpha$ | Markovian structure | | Chung, Intro to prob., Ch. 9 |
HMC | $q, (p, q)$ | momentum $p$ in phase space | vector flow structure | Hamiltonian, symplectic volume | |
Augmented Neural ODE | $x, (x_1, x_2)$? | feature | flexible feature mapping | | Augmented Neural ODEs (Dupont et al., 2019) |
Augmented preference | ? | augmented preference inequality | infer preference without a utility function | partial order | Boyd, Convex Optimization, Ch. 6 |
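To make the "efficiency and diversity through copy" row concrete, here is a minimal Python sketch of coupling via shared randomness (common random numbers): two copies $\theta_1(w), \theta_2(w)$ driven by the same $w$ estimate a difference of expectations with far smaller variance than independent draws. The shift model and the `tanh` test function are hypothetical toys of my choosing, not from any of the referenced papers.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
f = np.tanh                                   # any test function of interest

# coupled copies theta_1(w), theta_2(w): the SAME noise w feeds both
w = rng.normal(size=n)
coupled = f(1.0 + w) - f(1.2 + w)

# independent baseline: fresh noise for each copy
indep = f(1.0 + rng.normal(size=n)) - f(1.2 + rng.normal(size=n))

# same estimand E[f(theta_1)] - E[f(theta_2)], very different standard errors
print("coupled:", coupled.mean(), coupled.std() / n**0.5)
print("indep.: ", indep.mean(), indep.std() / n**0.5)
```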
2. Lift then project iteration
Origin $X_t \xrightarrow[\text{Lift}]{f}$ Augmented $(X_t, f(X_t)) \xrightarrow[\text{Project}]{f^{-1}}$ Origin $X_{t+1}$
- used for efficient updates (≈ calibration) or for verification via model bootstrap
- verification is my aim, developing the attractor idea across coupling, copula, and calibration
- similar to a source–receiver coding picture: Source $(X_t, P)$ $\xrightarrow[\text{Compress}]{f}$ Aug. $(X_t, f(X_t))$ $\xrightarrow[\text{Decompress}]{P(X_{t+1} \mid f(X_t))}$ Receiver $X_{t+1}$? (a minimal sketch of this loop shape follows below)
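As a minimal sketch, the loop shape I have in mind in Python; `lift` and `project` are placeholders that each row of the table below fills in:

```python
def lift_then_project(x0, lift, project, n_steps, rng):
    """Iterate: origin x  ->  lift to (x, aux)  ->  project back to a new x."""
    x, path = x0, [x0]
    for _ in range(n_steps):
        aux = lift(x, rng)         # augment x with its new coordinate f(x)
        x = project(x, aux, rng)   # map the augmented pair back to the origin space
        path.append(x)
    return path
```

Under this shape, HMC's lift draws a momentum and its project drops it, while imputation's lift draws $Z_{t+1} \sim P(Z \mid Y, \theta_t)$ and its project draws $\theta_{t+1} \sim P(\theta \mid Y, Z_{t+1})$.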
Algorithm | Update, target | Lift $f$ | Projection $f^{-1}$ |
---|---|---|---|
SBC | $\theta_t \xrightarrow{(\theta_t, y_t)} \theta_{t+1}$, joint dist. simulator | $p(y \mid \theta)$ data simulator | $p(\theta \mid y)$ posterior simulator |
Copula | $x^a_t \xrightarrow{(x^a_t, x^b_t)} x^a_{t+1}$, $x^b_t \xrightarrow{(x^a_t, x^b_t)} x^b_{t+1}$, copula function $C(x_a, x_b)$ | | |
Info. bottleneck | $x_t \xrightarrow{(x_t, \tilde{x}_t)} x_{t+1}$, $p^{*}(\tilde{x} \mid x)$ | | |
EM | $\theta_t \mid Y \xrightarrow{(\theta_t, z_t) \mid Y} \theta_{t+1} \mid Y$, $p(\theta \mid Y)$ | $p(z \mid \theta, Y)$ | $p(\theta \mid z, Y)$ |
Imputation | $\theta_t \xrightarrow{(\theta_t, z_t)} \theta_{t+1}$, $p(\theta)$ | $Z_{t+1} \sim P(Z \mid Y, \theta_t)$ | $\theta_{t+1} \sim P(\theta \mid Y, Z_{t+1})$ |
CVaR MDP1 | $X_t \xrightarrow{(X_t, C_t)} X_{t+1}$, optimum | | |
CVaR MDP2 | $X_t \xrightarrow{(X_t, Q_t)} X_{t+1}$, optimum | | |
Lagrangian dual | $x_t \xrightarrow{(x_t, \lambda_t)} x_{t+1}$, optimum | $f(x) + \lambda g(x)$ | |
Aug. Lagrangian or ADMM | $x_t \xrightarrow{(x_t, \lambda_t)} x_{t+1}$ or $x_t, z_t \xrightarrow{(x_t, z_t, \lambda_t)} x_{t+1}, z_{t+1}$, optimum | | |
HMC | $q_t \xrightarrow{(p_t, q_t)} q_{t+1}$, typical set | $q \rightarrow \pi^{-1}(q)$ (sample momentum on the fiber) | $\pi: (p, q) \mapsto q$ |
Augmented Neural ODE | $x^a_t \xrightarrow{(x^a_t, x^b_t)} x^a_{t+1}$ (incorrect?), ? | | |
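Taking the SBC row as the concrete instance: a minimal self-checking sketch in Python, using a hypothetical conjugate normal-normal model of my choosing so that the projection $p(\theta \mid y)$ is exact. If lift and project are consistent, the rank of the prior draw among the posterior draws is uniform:

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_post, n_reps = 10, 99, 1000
ranks = np.empty(n_reps, dtype=int)

for r in range(n_reps):
    theta0 = rng.normal()                          # theta ~ p(theta), origin space
    y = rng.normal(theta0, 1.0, size=n_obs)        # lift: y ~ p(y | theta), data simulator
    post_mean = y.sum() / (n_obs + 1.0)            # exact posterior for N(0,1) prior,
    post_sd = (1.0 / (n_obs + 1.0)) ** 0.5         # N(theta,1) likelihood
    theta_post = rng.normal(post_mean, post_sd, size=n_post)  # project: theta' ~ p(theta | y)
    ranks[r] = int((theta_post < theta0).sum())    # rank of theta0 among the theta'

# A correct lift/project pair makes `ranks` uniform on {0, ..., n_post};
# a skewed or U-shaped rank histogram flags a broken posterior simulator.
```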
3. Verify or optimize via lift-then-project iteration
tbc.
---
Quotes from papers
- combining the augmented Lagrangian approach with MCMC sampling to generate a point in the proximity of the global optimum of the GRB problem ... sample points with a higher objective function value and simultaneously drive the sample path in the direction of the feasible region using the augmented Lagrangian terms. – A generalized risk budgeting approach to portfolio construction
- by introducing an extra state variable, an optimal policy can be sufficiently characterized as a Markov process defined on this augmented state space. – Min’s paper
- augment the known preferences (6.22) with the inequality $u(a_k) \le u(a_l)$. If the augmented set of preferences is infeasible, it means that any concave nondecreasing utility function that is consistent with the original given consumer preference data must also satisfy $u(a_k) > u(a_l)$; conclude that basket $k$ is preferred to basket $l$, without knowing the underlying utility function. – Boyd, Convex Optimization, p. 341
Q1. From the quote in Boyd's textbook, it seems the role of "augmented preference" is a feasibility certificate, but that role is tricky to understand. In particular, does "augment" mean sampling in this context?
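My current reading, as a hedged sketch: "augment" here means appending one more inequality to a constraint set and testing feasibility, not sampling. A toy Python version of the construction (concavity via supergradient variables as in Boyd Ch. 6; the unit-gap encoding of strict preference and all names are my assumptions, not Boyd's exact formulation):

```python
import numpy as np
from scipy.optimize import linprog

def preference_implied(a, prefs, k, l):
    """Is 'basket k preferred to basket l' forced by the data?

    a: (m, n) baskets; prefs: list of (i, j) meaning u(a_i) > u(a_j),
    normalized here to a unit gap (my encoding). We search for utility
    values u_i with nonnegative supergradients g_i (concave, nondecreasing)
    consistent with prefs AND the augmented inequality u_k <= u_l."""
    m, n = a.shape
    nv = m + m * n                              # variables: u (m), then g rows (m*n)
    rows, rhs = [], []

    def row(u_coeffs, g_block=None, g_idx=None):
        r = np.zeros(nv)
        for i, c in u_coeffs:
            r[i] += c
        if g_block is not None:
            r[m + g_idx * n : m + (g_idx + 1) * n] = g_block
        return r

    # concavity: u_j <= u_i + g_i^T (a_j - a_i) for all i != j
    for i in range(m):
        for j in range(m):
            if i != j:
                rows.append(row([(j, 1.0), (i, -1.0)], -(a[j] - a[i]), i))
                rhs.append(0.0)
    # known strict preferences, unit-gap normalization: u_i >= u_j + 1
    for i, j in prefs:
        rows.append(row([(j, 1.0), (i, -1.0)]))
        rhs.append(-1.0)
    # augmented hypothesis: u_k <= u_l
    rows.append(row([(k, 1.0), (l, -1.0)]))
    rhs.append(0.0)

    bounds = [(None, None)] * m + [(0, None)] * (m * n)   # g >= 0: nondecreasing
    res = linprog(np.zeros(nv), A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=bounds, method="highs")
    return res.status == 2                      # LP infeasible => k preferred to l

# usage: preference_implied(baskets, prefs=[(2, 0)], k=2, l=0) -> True
```

If the augmented LP is infeasible, every concave nondecreasing utility consistent with the data must have $u(a_k) > u(a_l)$; that is the feasibility-certificate role.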
Q2. Could there be any relationship between splitting and coupling? Do you agree they have a similar flavor, the only difference being the direction of the bifurcation: splitting as –= (one path becoming two) vs. coupling as =– (two paths merging into one)?
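For the coupling side of Q2, the "=–" merge can be made concrete with a maximal coupling sampler (the standard rejection construction behind the coupling reference in the table above; the toy normal marginals are my choice):

```python
import numpy as np
from scipy.stats import norm

def maximal_coupling(p, q, rng):
    """Draw (X, Y) with X ~ p, Y ~ q, maximizing P(X = Y).
    p, q: frozen scipy.stats distributions (need .rvs and .logpdf)."""
    X = p.rvs(random_state=rng)
    if np.log(rng.uniform()) + p.logpdf(X) <= q.logpdf(X):
        return X, X                      # the two paths merge: '=-'
    while True:                          # otherwise draw Y from q's residual part
        Y = q.rvs(random_state=rng)
        if np.log(rng.uniform()) + q.logpdf(Y) > p.logpdf(Y):
            return X, Y

rng = np.random.default_rng(2)
pairs = [maximal_coupling(norm(0.0, 1.0), norm(0.5, 1.0), rng) for _ in range(1000)]
meet_rate = sum(x == y for x, y in pairs) / len(pairs)   # ~ 1 - TV(p, q)
```

Splitting would then be the reverse picture: one draw $w$ fans out into $\theta_1(w), \theta_2(w)$, as in the common-random-numbers sketch in Section 1.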
Here are some related references that I am going through:
- Geometry of regularized optimal transport from here
- Information projection from Nielsen's lecture
Comments are the energy for a writer, thanks!