What is the difference between Bayesian Model Averaging and Ensemble (model combination methods)?
-Question from [Forecasting in Business in Talks]
Let’s see the difference with an example:
density estimation using a mixture of Gaussians
several Gaussian components are combined probabilistically
the model contains a latent binary variable, z, that indicates which component of the mixture is responsible for generating the corresponding data point
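The mixture idea above can be sketched with a small simulation. This is a minimal, hypothetical example (the weights, means, and standard deviations are illustrative, not from the source): for each data point we first draw the latent indicator z, then sample the point from the component that z selects.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-component Gaussian mixture (illustrative parameters):
pi = np.array([0.3, 0.7])      # mixing weights p(z)
mu = np.array([-2.0, 3.0])     # component means
sigma = np.array([0.5, 1.0])   # component standard deviations

N = 1000
# For EACH data point, draw a latent indicator z selecting a component...
z = rng.choice(2, size=N, p=pi)
# ...then sample that point from the chosen Gaussian:
x = rng.normal(mu[z], sigma[z])

# Different data points are generated by different components:
print(np.bincount(z))  # counts per component, roughly N*pi
```

The key point for the comparison with BMA: z is drawn anew for every single data point, so the data set as a whole is a blend of the components.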
different models indexed by h = 1, …, H with prior probabilities p(h)
the marginal distribution over the data set is p(X) = Σ_h p(X|h) p(h)
the summation over h means that just one model is responsible for generating the whole data set
the probability distribution over h reflects our uncertainty as to which model that is
as the size of the data set increases, this uncertainty reduces, and the posterior probabilities p(h|X) become increasingly focused on just one of the models
In BMA, the whole data set is generated by a single model. By contrast, when we combine multiple models, different data points within the data set can be generated from different values of the latent variable z, and hence by different components.
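To see the posterior p(h|X) concentrating, here is a hedged sketch (the two candidate models, their means, and the uniform prior are all assumptions for illustration): two Gaussian models compete to explain the whole data set, and we watch their posterior weights as more data arrives.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical candidate models of the WHOLE data set:
# h=0: N(0, 1) and h=1: N(0.5, 1), with uniform prior p(h) = 1/2.
means = np.array([0.0, 0.5])
log_prior = np.log([0.5, 0.5])

def log_posterior(X):
    """log p(h|X) ∝ log p(h) + Σ_n log N(x_n | mu_h, 1)."""
    # Log-likelihood up to a constant shared by both models:
    ll = np.array([-0.5 * np.sum((X - m) ** 2) for m in means])
    lp = log_prior + ll
    return lp - np.logaddexp.reduce(lp)  # normalize over h

X = rng.normal(0.0, 1.0, size=2000)  # data truly generated by model h=0
for n in (10, 100, 2000):
    print(n, np.exp(log_posterior(X[:n])).round(3))
```

With few points the posterior still hedges between the two models; with the full data set it places essentially all its mass on the single model that generated everything, which is exactly the BMA behavior described above.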
References
Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
Comments are the energy for a writer, thanks!