What is the difference between Bayesian Model Averaging and ensemble (model combination) methods?

- Question from [Forecasting in Business in Talks]

Let’s see the difference with an example from Bishop (2006): density estimation using a mixture of Gaussians, in which several Gaussian components are combined probabilistically.


[Figure: Model Combination 1: the mixture-of-Gaussians density, p(x) = Σ_k π_k N(x | μ_k, Σ_k)]

[Figure: Model Combination 2: the same mixture written with a latent variable z, p(x) = Σ_z p(x, z)]

The model contains a latent binary variable z that indicates which component of the mixture is responsible for generating the corresponding data point.
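To make this concrete, here is a minimal sketch of sampling from a two-component Gaussian mixture; the mixing coefficients, means, and standard deviations are invented for illustration. The key point is that every data point draws its own latent indicator z, so different points come from different components:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative mixture parameters (chosen for this sketch, not from the text):
pi = np.array([0.3, 0.7])      # mixing coefficients p(z=k)
mu = np.array([-2.0, 3.0])     # component means
sigma = np.array([0.5, 1.0])   # component standard deviations

# Each data point draws its OWN latent indicator z, so different points
# can be generated by different components -- this is model combination.
n = 1000
z = rng.choice(2, size=n, p=pi)
x = rng.normal(mu[z], sigma[z])

# Both components contribute data points (roughly in proportion to pi).
print(np.bincount(z))
```

Note that z is sampled once per data point, not once for the whole data set; that distinction is exactly what separates a mixture model from BMA below.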


[Figure: Bayesian Model Averaging: p(X) = Σ_h p(X | h) p(h)]

Bayesian Model Averaging

Suppose we have different models indexed by h = 1, …, H, with prior probabilities p(h).

The interpretation of the summation over h is that just one model is responsible for generating the whole data set.

The probability distribution over h reflects our uncertainty as to which model that is.

As the size of the data set increases, this uncertainty reduces, and the posterior probabilities p(h|X) become increasingly focused on just one of the models.
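This concentration of p(h|X) can be sketched numerically. Below, two hypothetical candidate models (the parameters are invented for this example) are compared via p(h|X) ∝ p(X|h) p(h), with data actually drawn from the first model; the posterior puts essentially all its mass on that model as n grows:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_lik(x, mu, sigma):
    # Gaussian log-likelihood of the whole data set under one model h.
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu)**2 / (2 * sigma**2))

# Two hypothetical candidate models h = 1, 2: (mu, sigma) for each.
models = [(0.0, 1.0), (0.5, 1.0)]
prior = np.array([0.5, 0.5])       # p(h)

# The data is actually generated by the first model (mu = 0).
for n in [10, 100, 1000]:
    X = rng.normal(0.0, 1.0, size=n)
    log_post = np.log(prior) + np.array([log_lik(X, m, s) for m, s in models])
    log_post -= log_post.max()     # stabilize before exponentiating
    post = np.exp(log_post)
    post /= post.sum()             # normalize to get p(h|X)
    print(n, post)
```

With n = 1000 the posterior is almost entirely on the data-generating model, illustrating the focusing behaviour described above.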

 

In BMA, the whole data set is generated by a single model. By contrast, when we combine multiple models, different data points within the data set can be generated from different values of the latent variable z, and hence by different components.
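The contrast is visible in where the sum sits relative to the product over data points. A small sketch, with components and data invented for illustration: the mixture puts the sum over z inside the product over points, while BMA puts the product over points inside the sum over h, and the two quantities differ in general:

```python
import numpy as np

def gauss(x, mu, sigma):
    # Univariate Gaussian density N(x | mu, sigma^2).
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

# Hypothetical components / candidate models for this sketch.
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])

x = np.array([-1.0, 1.0, -1.2, 0.9])  # toy data set

# Mixture (model combination): sum over z INSIDE the product over points.
p_mix = np.prod([np.sum(pi * gauss(xn, mu, sigma)) for xn in x])

# BMA: product over points INSIDE the sum over models h.
p_bma = np.sum(pi * np.prod(gauss(x[:, None], mu, sigma), axis=0))

print(p_mix, p_bma)
```

Here p_mix exceeds p_bma because the toy data straddles both components, so no single model h explains the whole data set well, whereas the mixture lets each point pick its own component.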

 

References

Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.