Approximate Bayesian computation

Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics that can be used to estimate the posterior distributions of model parameters. Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics that can be used to estimate the posterior distributions of model parameters. In all model-based statistical inference, the likelihood function is of central importance, since it expresses the probability of the observed data under a particular statistical model, and thus quantifies the support data lend to particular values of parameters and to choices among different models. For simple models, an analytical formula for the likelihood function can typically be derived. However, for more complex models, an analytical formula might be elusive or the likelihood function might be computationally very costly to evaluate. ABC methods bypass the evaluation of the likelihood function. In this way, ABC methods widen the realm of models for which statistical inference can be considered. ABC methods are mathematically well-founded, but they inevitably make assumptions and approximations whose impact needs to be carefully assessed. Furthermore, the wider application domain of ABC exacerbates the challenges of parameter estimation and model selection. ABC has rapidly gained popularity over the last years and in particular for the analysis of complex problems arising in biological sciences, e.g. in population genetics, ecology, epidemiology, and systems biology. The first ABC-related ideas date back to the 1980s. Donald Rubin, when discussing the interpretation of Bayesian statements in 1984, described a hypothetical sampling mechanism that yields a sample from the posterior distribution. This scheme was more of a conceptual thought experiment to demonstrate what type of manipulations are done when inferring the posterior distributions of parameters. The description of the sampling mechanism coincides exactly with that of the ABC-rejection scheme, and this article can be considered to be the first to describe approximate Bayesian computation. However, a two-stage quincunx was constructed by Francis Galton in the late 1800s that can be seen as a physical implementation of an ABC-rejection scheme for a single unknown (parameter) and a single observation. Another prescient point was made by Rubin when he argued that in Bayesian inference, applied statisticians should not settle for analytically tractable models only, but instead consider computational methods that allow them to estimate the posterior distribution of interest. This way, a wider range of models can be considered. These arguments are particularly relevant in the context of ABC. In 1984, Peter Diggle and Richard Gratton suggested using a systematic simulation scheme to approximate the likelihood function in situations where its analytic form is intractable. Their method was based on defining a grid in the parameter space and using it to approximate the likelihood by running several simulations for each grid point. The approximation was then improved by applying smoothing techniques to the outcomes of the simulations. While the idea of using simulation for hypothesis testing was not new, Diggle and Gratton seemingly introduced the first procedure using simulation to do statistical inference under a circumstance where the likelihood is intractable. Although Diggle and Gratton’s approach had opened a new frontier, their method was not yet exactly identical to what is now known as ABC, as it aimed at approximating the likelihood rather than the posterior distribution. An article of Simon Tavaré et al. was first to propose an ABC algorithm for posterior inference. In their seminal work, inference about the genealogy of DNA sequence data was considered, and in particular the problem of deciding the posterior distribution of the time to the most recent common ancestor of the sampled individuals. Such inference is analytically intractable for many demographic models, but the authors presented ways of simulating coalescent trees under the putative models. A sample from the posterior of model parameters was obtained by accepting/rejecting proposals based on comparing the number of segregating sites in the synthetic and real data. This work was followed by an applied study on modeling the variation in human Y chromosome by Jonathan K. Pritchard et al. using the ABC method. Finally, the term approximate Bayesian computation was established by Mark Beaumont et al., extending further the ABC methodology and discussing the suitability of the ABC-approach more specifically for problems in population genetics. Since then, ABC has spread to applications outside population genetics, such as systems biology, epidemiology, and phylogeography. A common incarnation of Bayes’ theorem relates the conditional probability (or density) of a particular parameter value θ {displaystyle heta } given data D {displaystyle D} to the probability of D {displaystyle D} given θ {displaystyle heta } by the rule where p ( θ | D ) {displaystyle p( heta |D)} denotes the posterior, p ( D | θ ) {displaystyle p(D| heta )} the likelihood, p ( θ ) {displaystyle p( heta )} the prior, and p ( D ) {displaystyle p(D)} the evidence (also referred to as the marginal likelihood or the prior predictive probability of the data).

Parent Topic

Child Topic

No Parent Topic