Use cases:

  • MAP and MLE estimation with gradient descent or quasi-Newton methods (log density, order 1; see the sketch after this list)
  • posterior sampling with HMC (log density, order 1 for Euclidean metrics, order 3 for Riemannian metrics)
  • standard errors and posterior covariance with the Laplace approximation (log density, order 2)
  • maximum marginal likelihood and marginal maximum a posteriori estimation with Monte Carlo expectation methods (E_mc, order 1)
  • variational inference with stochastic gradient descent based on nested expectations (E_mc, order 1)
  • sensitivity analysis with respect to data, hyperparameters, and parameters
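
As a toy sketch of the first use case (my own example, not from any particular library): MAP estimation by gradient ascent on the log posterior of a conjugate normal model, where the hand-written first-order gradient below is exactly the quantity AD would supply for a real model.

```python
import numpy as np

# Toy model: y_i ~ Normal(theta, 1) with prior theta ~ Normal(0, 1).
# Log posterior (up to a constant): -0.5 * sum((y - theta)^2) - 0.5 * theta^2
y = np.array([1.2, 0.8, 1.5, 0.9])

def grad_log_post(theta):
    # First-order derivative of the log posterior; for a real model this is
    # what automatic differentiation would provide.
    return np.sum(y - theta) - theta

theta, lr = 0.0, 0.1
for _ in range(200):
    theta += lr * grad_log_post(theta)   # gradient ascent step

print(theta)                      # converges to the MAP estimate
print(y.sum() / (len(y) + 1))     # closed-form MAP for this conjugate toy
```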

Two types of automatic differentiation (AD) exist: forward mode and reverse mode.

| mode | forward | reverse |
| --- | --- | --- |
| derivative | \dot{u} (du/dx) | \bar{u} (dy/du) |
| map | R -> R^N | R^M -> R |
| rule | dual numbers, tangent propagation | adjoint propagation |
Forward mode is input-centered (starting from the input x), while reverse mode is output-centered (starting from the output y).
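
Stated in matrix form (a standard way to summarize this, with J the Jacobian of y = f(x)): a single forward-mode pass computes a Jacobian-vector product, while a single reverse-mode pass computes a vector-Jacobian product.

$latex \text{forward:}\;\; \dot{y} = J\,\dot{x} \qquad\qquad \text{reverse:}\;\; \bar{x} = J^{\top}\bar{y}$

So for a scalar output (R^M -> R), one reverse pass seeded with \bar{y} = 1 returns the full gradient, whereas forward mode would need M passes, one per input direction; the situation reverses for R -> R^N.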

If b = f(a), the tangent is $latex \dot{b} = \frac{\partial f}{\partial a}\,\dot{a}$, while the adjoint is $latex \bar{a} = \bar{b}\,\frac{\partial f}{\partial a}$. Recall the philosophy: forward mode is input-centered and reverse mode is output-centered, matching their natural maps R -> R^N and R^M -> R.

Reverse: Adjoints are propagated in the reverse order of function computation, hence the name of the method.
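
As a small worked illustration (my own notation), take the chain x -> a = f(x) -> b = g(a) -> y = h(b). The forward sweep propagates tangents in the same order as the evaluation; the reverse sweep, seeded with $latex \bar{y} = 1$, visits the nodes in the opposite order:

$latex \text{forward:}\;\; \dot{a} = f'(x)\,\dot{x},\quad \dot{b} = g'(a)\,\dot{a},\quad \dot{y} = h'(b)\,\dot{b}$

$latex \text{reverse:}\;\; \bar{b} = h'(b)\,\bar{y},\quad \bar{a} = g'(a)\,\bar{b},\quad \bar{x} = f'(x)\,\bar{a}$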

Without explicitly calculating the gradient of an expression, we can derive the same result with dual-number operations; the following are the basic operation rules for dual numbers <u, u'> and <v, v'>.

Dual-number operation rules:
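
The standard identities, where u' and v' are the tangent components, are:

$latex \langle u, u' \rangle + \langle v, v' \rangle = \langle u + v,\; u' + v' \rangle$

$latex \langle u, u' \rangle \times \langle v, v' \rangle = \langle u\,v,\; u'\,v + u\,v' \rangle$

$latex f(\langle u, u' \rangle) = \langle f(u),\; f'(u)\,u' \rangle$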

For example, if the input tangents are seeded with a direction vector v, the final dual number's tangent (<f, f'>) equals the gradient-vector product (grad_f · v).
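
A minimal sketch in Python (a toy Dual class of my own, not any particular library), evaluating f(x1, x2) = x1*x2 + sin(x1) with the tangents seeded by the direction v = (1, 0):

```python
import math

class Dual:
    """Dual number <value, tangent> for forward-mode AD."""
    def __init__(self, value, tangent=0.0):
        self.value, self.tangent = value, tangent

    def __add__(self, other):
        # <u, u'> + <v, v'> = <u + v, u' + v'>
        return Dual(self.value + other.value, self.tangent + other.tangent)

    def __mul__(self, other):
        # <u, u'> * <v, v'> = <u v, u' v + u v'>
        return Dual(self.value * other.value,
                    self.value * other.tangent + self.tangent * other.value)

def sin(d):
    # elementary function rule: f(<u, u'>) = <f(u), f'(u) u'>
    return Dual(math.sin(d.value), math.cos(d.value) * d.tangent)

def f(x1, x2):
    return x1 * x2 + sin(x1)

x1, x2 = Dual(1.0, 1.0), Dual(2.0, 0.0)   # v = (1, 0) picks out df/dx1
out = f(x1, x2)
print(out.value)    # f(1, 2) = 2 + sin(1)
print(out.tangent)  # grad_f . v = df/dx1 = x2 + cos(x1) = 2 + cos(1)
```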

For reverse mode, a variable may receive gradient contributions from several downstream expressions (a could also be used in another subexpression involving c, for example), so cumulative derivatives are used: each incoming adjoint is added to the variable's running total, as in the sketch below.
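
A minimal sketch of reverse-mode accumulation in Python (my own toy code, not Stan's implementation); note the += where adjoints from different uses of the same variable are summed:

```python
class Var:
    """Minimal reverse-mode node: value, accumulated adjoint, and parent edges."""
    def __init__(self, value, parents=()):
        self.value = value
        self.adjoint = 0.0        # cumulative derivative dy/d(this variable)
        self.parents = parents    # list of (parent_node, local_derivative)

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

def backward(output):
    """Propagate adjoints in the reverse of the computation order."""
    # Build a topological order of the graph (inputs first, output last).
    order, seen = [], set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)
    visit(output)

    output.adjoint = 1.0          # seed: dy/dy = 1
    for node in reversed(order):  # reverse sweep over the computation
        for parent, local in node.parents:
            parent.adjoint += node.adjoint * local   # accumulate, don't overwrite

# a feeds into two products, so its adjoint accumulates two contributions.
a, b, c = Var(2.0), Var(3.0), Var(4.0)
y = a * b + a * c
backward(y)
print(a.adjoint)   # dy/da = b + c = 7.0
print(b.adjoint)   # dy/db = a = 2.0
```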

Stan exposes Hessians and Hessian-vector products in the modelfit class, with which $latex H_{\eta,\eta}$ can be calculated. Refer to the following Sensitivity post.
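
For intuition only (this is not Stan's API): a Hessian-vector product $latex H v$ can be approximated from two gradient evaluations, which is a handy check on an AD-based implementation; with AD proper one would instead nest forward mode over reverse mode.

```python
import numpy as np

# Toy log density: f(x) = -0.5 * x^T A x, so its Hessian is -A.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])

def grad_f(x):
    # Analytic gradient of the toy log density; in practice this would come
    # from reverse-mode AD (e.g. the gradient of a model's log density).
    return -A @ x

def hvp(grad, x, v, eps=1e-6):
    # Hessian-vector product via a finite difference of gradients:
    # H v ~= (grad(x + eps*v) - grad(x)) / eps
    return (grad(x + eps * v) - grad(x)) / eps

x = np.array([1.0, -1.0])
v = np.array([0.0, 1.0])
print(hvp(grad_f, x, v))   # ~= -A @ v = [-0.5, -1.0]
```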