Next month I shall be attending a workshop at the American Institute of Mathematics on Singular learning theory: connecting algebraic geometry and model selection in statistics. The participants were invited to share their goals for the meeting. I can’t show you the other contributions, but here is mine.
My main interest is in statistical computing and model choice for Bayesian hierarchical models. In the last 10 years, the Deviance Information Criterion (DIC) has become a popular model choice criterion, largely because it is easy to calculate using Markov Chain Monte Carlo (MCMC) methods and, in particular, because it is implemented in the popular OpenBUGS software.
DIC extends the Akaike Information Criterion to Bayesian hierarchical models by replacing the number of parameters p with an estimated “effective number of parameters” pD. DIC was introduced with only a heuristic justification by Spiegelhalter et al. (2002). My own work (Plummer 2008) is an attempt to establish a rigorous foundation for DIC. This work suggests that DIC is an approximation that requires certain asymptotic conditions for its validity. In particular, a necessary (but not sufficient) condition is pD << n, where n is the sample size. Thus it seems plausible that DIC is being misapplied in models where the asymptotic conditions do not hold. Moreover, we currently lack an easily computable criterion for such models.
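To make the definition concrete, here is a minimal sketch of how DIC and pD are usually computed from posterior draws, following the definition in Spiegelhalter et al. (2002): pD is the posterior mean deviance minus the deviance at the posterior mean, and DIC is the posterior mean deviance plus pD. The toy model (a normal mean with known variance and a flat prior), the function names `deviance` and `dic_from_draws`, and the use of direct posterior sampling in place of MCMC are all assumptions made for illustration; this is not how OpenBUGS implements it.

```python
import numpy as np


def deviance(theta, y, sigma=1.0):
    """Deviance D(theta) = -2 log p(y | theta) for y_i ~ Normal(theta, sigma^2)."""
    n = y.size
    return n * np.log(2.0 * np.pi * sigma**2) + np.sum((y - theta) ** 2) / sigma**2


def dic_from_draws(draws, y, sigma=1.0):
    """Return (DIC, pD) from posterior draws of theta (illustrative helper)."""
    # Posterior mean of the deviance, estimated by averaging over the draws.
    d_bar = np.mean([deviance(t, y, sigma) for t in draws])
    # Deviance evaluated at the posterior mean of theta (the "plug-in" deviance).
    d_hat = deviance(np.mean(draws), y, sigma)
    # Effective number of parameters.
    p_d = d_bar - d_hat
    return d_bar + p_d, p_d


rng = np.random.default_rng(0)
n = 50
y = rng.normal(loc=1.0, scale=1.0, size=n)

# Assumption: flat prior on theta, so the posterior is Normal(mean(y), 1/n).
# Sampling it directly stands in for the output of an MCMC run.
draws = rng.normal(loc=y.mean(), scale=1.0 / np.sqrt(n), size=20000)

dic, p_d = dic_from_draws(draws, y)
print(f"pD = {p_d:.3f}, DIC = {dic:.3f}")
```

In this toy model there is one free parameter and pD comes out close to 1, so pD << n holds comfortably. The concern raised above is with hierarchical models in which pD is a substantial fraction of n.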
A second issue of interest to Bayesian statisticians is how to extend DIC to models with missing data. In the Bayesian approach, missing data and unknown parameters are treated symmetrically as unobserved random variables. However, in the model choice problem, we may wish to treat the missing data as a nuisance, and the model parameters as the “focus” of interest. Finite mixture models are an important test case for this problem. An extensive survey of possible solutions was provided by Celeux et al. (2006), but the question remains unresolved. Again, the problem is to find a criterion that is both theoretically sound and computationally feasible.
I hope to gain some insight into these issues from the workshop by discussing parallel developments in machine learning.
- Celeux, G., Forbes, F., Robert, C. and Titterington, M. (2006). Deviance information criteria for missing data models. Bayesian Analysis 1: 701-706.
- Plummer, M. (2008). Penalized loss functions for Bayesian model comparison. Biostatistics 9: 523-539.
- Spiegelhalter, D.J., Best, N., Carlin, B. and Van der Linde, A. (2002). Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B 64: 583-639.