Bringing the advantages of the Bayesian paradigm (hierarchical modeling, coherent treatment of uncertainty) to the big-data setting is an active topic in variational Bayes (VB).
In VB, a variational lower bound on the marginal likelihood serves as the objective function, and SVI applies stochastic gradient descent (SGD) to it.
The stochastic gradient is computed from a single data point or a small subset (minibatch) of data points (documents) at a time, but the posterior being targeted is still the posterior for all D data points.
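To make the role of D explicit, the lower bound for D conditionally i.i.d. data points and its minibatch gradient estimate can be written generically (λ denotes the variational parameters; this is the standard form, not quoted from any particular paper):

$$
\mathcal{L}(\lambda) = \mathbb{E}_{q_\lambda}\left[\log p(\Theta)\right] - \mathbb{E}_{q_\lambda}\left[\log q_\lambda(\Theta)\right] + \sum_{d=1}^{D} \mathbb{E}_{q_\lambda}\left[\log p(x_d \mid \Theta)\right]
$$

SVI replaces the sum with an unbiased minibatch estimate, scaled back up to D:

$$
\sum_{d=1}^{D} \mathbb{E}_{q_\lambda}\left[\log p(x_d \mid \Theta)\right] \approx \frac{D}{|B|} \sum_{d \in B} \mathbb{E}_{q_\lambda}\left[\log p(x_d \mid \Theta)\right]
$$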
The goal is approximate Bayes that is scalable like SVI but streaming, in that it yields an approximate posterior after each processed collection of data points; what is produced is a sequence of posteriors, not successive approximations to one fixed posterior.
Streaming here means the posterior from the first b-1 minibatches is saved; to incorporate the bth minibatch, Bayes' theorem (eq 1) is reapplied with that saved posterior as the prior, p(Θ | C_1, …, C_b) ∝ p(C_b | Θ) p(Θ | C_1, …, C_{b-1}), which gives the new posterior (up to its normalizing constant) without recomputing anything for old data points.
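As a minimal concrete sketch (my example, not from the paper): with a conjugate Beta-Bernoulli model the recursion is exact, and updating with minibatch b needs only the saved posterior from the first b-1 minibatches.

```python
def streaming_update(posterior, minibatch):
    """One streaming Bayes step for a Beta-Bernoulli model.

    posterior: (alpha, beta) after the first b-1 minibatches
    minibatch: 0/1 observations forming the b-th minibatch C_b
    Returns (alpha, beta) after b minibatches; no old data needed.
    """
    alpha, beta = posterior
    heads = sum(minibatch)
    return alpha + heads, beta + len(minibatch) - heads

posterior = (1.0, 1.0)  # Beta(1, 1) prior
for C in [[1, 0, 1], [1, 1]]:
    posterior = streaming_update(posterior, C)
print(posterior)  # (5.0, 2.0): same as conditioning on all five points at once
```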
1. streaming Bayes
q(Θ) = A(C, p(Θ)): an approximation algorithm A computes an approximate posterior q from a minibatch C and a prior p(Θ); streaming chains these updates, feeding each approximate posterior back in as the next prior.
2. distributed Bayes (see the sketch after this list)
3. asynchronous Bayes
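For the distributed case, when A returns an exponential-family approximation with natural parameter xi (the notation used below), the combination step can be sketched as follows; `combine` and the array shapes are my illustrative assumptions:

```python
import numpy as np

def combine(xi_0, worker_xis):
    """Merge worker posteriors computed independently from the same prior.

    xi_0: natural parameter of the prior
    worker_xis: xi_b = A(C_b, xi_0), one per worker/minibatch
    Each worker contributes its change relative to the prior.
    """
    return xi_0 + sum(xi_b - xi_0 for xi_b in worker_xis)

xi_0 = np.array([1.0, 1.0])
worker_xis = [np.array([3.0, 2.0]), np.array([3.0, 1.0])]
print(combine(xi_0, worker_xis))  # [5. 2.]
```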
Notation: xi_post is the global (master) posterior parameter, xi_loc a worker's local copy, xi the worker's approximate posterior. Each worker repeatedly:
1) collects a new minibatch C
2) sets xi_loc <- xi_post // copy the master posterior value
3) computes xi <- A(C, xi_loc) // compute the local approximate posterior
4) returns diff_xi <- xi - xi_loc to the master, which updates xi_post <- xi_post + diff_xi
xi_post can change at the master node while diff_xi is being computed at a worker node; since each worker sends back only its difference, the master simply adds whatever arrives.
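A minimal runnable sketch of this protocol (the `Master` class, `worker` loop, and stand-in primitive `A` are my illustrative names, not an implementation from the paper):

```python
import threading
import numpy as np

class Master:
    def __init__(self, xi_0):
        self.xi_post = np.array(xi_0, dtype=float)
        self._lock = threading.Lock()

    def copy_posterior(self):
        with self._lock:
            return self.xi_post.copy()

    def apply(self, diff_xi):
        # xi_post may have moved since the worker copied it; only the
        # difference is added, so concurrent updates can interleave.
        with self._lock:
            self.xi_post += diff_xi

def A(C, xi_prior):
    # Stand-in primitive: a conjugate Beta-Bernoulli update in
    # (alpha, beta) form, standing in for a real approximation algorithm.
    heads = sum(C)
    return xi_prior + np.array([heads, len(C) - heads])

def worker(master, minibatches):
    for C in minibatches:                 # 1) collect new minibatch C
        xi_loc = master.copy_posterior()  # 2) copy master posterior value
        xi = A(C, xi_loc)                 # 3) local approximate posterior
        master.apply(xi - xi_loc)         # 4) return diff_xi to the master

master = Master([1.0, 1.0])
threads = [threading.Thread(target=worker, args=(master, [[1, 0, 1]])),
           threading.Thread(target=worker, args=(master, [[1, 1]]))]
for t in threads: t.start()
for t in threads: t.join()
print(master.xi_post)  # [5. 2.] regardless of interleaving
```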
Posterior approximation algorithm
MFVB (mean-field variational Bayes): factorizes the approximate posterior (not the joint distribution). SVI's stochastic gradient descent might be viewed as a streaming algorithm, but its optimization depends on D, since it targets the posterior of exactly D data points.
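For reference, the mean-field restriction (its standard definition) is the factorized family

$$
q(\Theta) = \prod_{i=1}^{m} q_i(\theta_i)
$$

over some partition of Θ into components θ_1, …, θ_m.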