Good reference on Bayesian techniques
Reviewed in the United States 🇺🇸 on November 5, 2017
Almost every piece of statistical literature I've seen that mentions Bayesian analysis references this book. That is what finally led me to purchase a copy and read it almost cover to cover.
First I want to comment on the Bayesian vs. frequentist debate, and why one may want to use Bayesian methods. Anyone who objects to the Bayesian paradigm on the basis of subjectivity has to realize that all statistical models are subjective. Choosing a linear model, logistic regression, or a normal distribution for your data, to list a few examples, is a subjective decision. It's no more subjective than putting a prior on your parameters. A prior doesn't have to be very informative, but it can encode a reasonable range of values for the parameters, such as that a person's height is between 0 and 10 feet, or that the number of siblings is less than 100, rather than having the data completely determine the parameters. When properly incorporated, prior knowledge will help produce more precise parameter estimates.
However, Bayesian analysis is more than just incorporating prior knowledge into your models. It provides probability distributions on the parameters, instead of asymptotic interval estimates. It provides an automatic way of doing regularization, without the need for cross validation. This allows one to estimate more parameters than classical frequentist models can handle, and even to deal with cases where p >= n. Another advantage is relaxing the assumption of independent and identically distributed observations, as hierarchical Bayesian models automatically build dependence between observations, similar to latent variables in classical statistics.
So in my opinion classical statistics already incorporates Bayesian ideas through the subjective selection of parametric models, the practice of regularization such as ridge regression and lasso, and dependence through latent variable models, although it's done in a somewhat ad hoc manner. Bayesian statistics formalizes these notions within probability theory and, together with simulation, allows easy extensions of them in various non-trivial directions.
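To make the ridge connection concrete, here is a minimal sketch of my own (not taken from the book). It assumes a one-parameter regression y = w*x + noise with known noise standard deviation `sigma` and a prior w ~ N(0, tau^2). The function names and the toy data are made up for illustration. Under those assumptions the posterior mean of w coincides with the ridge estimate whose penalty is sigma^2 / tau^2.

```python
def ridge_slope(xs, ys, lam):
    """Minimize sum((y - w*x)^2) + lam * w^2 for a single slope w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def posterior_slope(xs, ys, sigma, tau):
    """Posterior mean and variance of w, assuming
    w ~ N(0, tau^2) and y_i ~ N(w * x_i, sigma^2) with sigma known."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    precision = sxx / sigma**2 + 1.0 / tau**2  # data precision + prior precision
    mean = (sxy / sigma**2) / precision
    return mean, 1.0 / precision


# Hypothetical toy data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.8]
sigma, tau = 0.5, 2.0

post_mean, post_var = posterior_slope(xs, ys, sigma, tau)
ridge = ridge_slope(xs, ys, lam=sigma**2 / tau**2)
print(post_mean, ridge, post_var)
```

The two point estimates agree, but the Bayesian version also returns a posterior variance, which the penalized fit alone does not give you.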
Now about this book. It covers all these advantages of Bayesian methods and more, although it sometimes requires considerable effort from the reader to uncover and pull out the relevant concepts. It's definitely not meant to be an introduction to statistics. It's assumed the reader is well versed in classical statistics and has a good grasp of topics such as hypothesis testing and interval estimation, sufficient statistics and the exponential family, MLE and its asymptotic properties, the EM algorithm, and generalized linear models, to name a few. I also think that Bayesian methods require a deeper intuition for probability theory and involve more computation and approximation techniques to build even simple models. Considering the background needed, it's likely that the reader will have had considerable prior exposure to Bayesian techniques, and I think this is the target audience the authors had in mind when writing this book.
The book is definitely tough on the first reading, especially if this is your first book entirely devoted to the subject. But reading it is well worth the effort. It covers a lot of details and subtleties of the Bayesian approach that are not well emphasized in books devoted to general statistics and machine learning.
The book is of an applied nature, written the way every applied book should be. There is enough discussion of the theory to understand, apply, and extend the described methods. Each chapter is followed by a short section discussing the relevant references if you need to follow the theory in more detail. The authors make great use of non-trivial examples that show the implementation details and possible complications in the discussed models. In addition, there's an appendix covering computations with R and Stan.
The first five chapters present a solid, if somewhat terse, introduction to general Bayesian methods, including asymptotics and the connection to MLE, culminating in hierarchical Bayesian models in chapter 5. Two chapters follow on the important topic of model testing and selection. Chapter 8 covers data collection, and while it's a fascinating read and a novel idea if you've never seen it before, I think it could be skipped on the first reading without much affecting the understanding of later chapters.
Chapters 10-13 deal with simulation and analytic approximations, two central tools for Bayesian analysis, because for most practical models direct analytic expressions are intractable. The authors provide a good overview of the rejection sampling, Gibbs, and Metropolis-Hastings algorithms. The explanations are enough for basic implementations. Chapter 13 introduces approximations around posterior modes. There is a very intuitive explanation of the EM algorithm along with its mathematical derivation. This is followed by variational inference and expectation propagation, approximations which are based on the Kullback-Leibler divergence.
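As an idea of what a "basic implementation" looks like, here is a minimal random-walk Metropolis sketch of my own, not code from the book. It assumes a deliberately simple target, the posterior of a normal mean with known `sigma` and a normal prior, so the sampler can be checked against the closed-form answer. The data, step size, burn-in length, and all function names are assumptions chosen for illustration.

```python
import math
import random


def log_posterior(mu, ys, sigma, prior_mean, prior_sd):
    """Unnormalized log posterior of mu, assuming
    mu ~ N(prior_mean, prior_sd^2) and y_i ~ N(mu, sigma^2)."""
    log_prior = -0.5 * ((mu - prior_mean) / prior_sd) ** 2
    log_lik = sum(-0.5 * ((y - mu) / sigma) ** 2 for y in ys)
    return log_prior + log_lik


def metropolis(log_target, start, step_sd, n_iter, rng):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    samples = []
    current, current_lp = start, log_target(start)
    accepted = 0
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step_sd)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        samples.append(current)
    return samples, accepted / n_iter


# Hypothetical toy data, chosen only for illustration.
ys = [4.8, 5.1, 5.3, 4.9, 5.4]
sigma, prior_mean, prior_sd = 1.0, 0.0, 10.0

rng = random.Random(0)
samples, accept_rate = metropolis(
    lambda mu: log_posterior(mu, ys, sigma, prior_mean, prior_sd),
    start=0.0, step_sd=1.0, n_iter=20000, rng=rng,
)
kept = samples[2000:]  # discard burn-in
mcmc_mean = sum(kept) / len(kept)

# Closed-form posterior mean for this conjugate model, for comparison.
precision = len(ys) / sigma**2 + 1.0 / prior_sd**2
exact_mean = (sum(ys) / sigma**2 + prior_mean / prior_sd**2) / precision
print(mcmc_mean, exact_mean, accept_rate)
```

For a conjugate model like this you would never need MCMC in practice. The point is only that the same loop works unchanged when `log_posterior` has no closed form.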
Up to this point, the book is a solid overview of Bayesian inference, model checking, simulation, and approximation techniques. The remaining chapters are mixed in their level of presentation and content.
The second half of the book deals with regression. The chapters here become terser and the language less precise. The level of presentation deteriorates towards the end, where in my opinion the chapters on non-parametric models are almost impossible to understand without some prior exposure. There are more sections that require multiple re-readings, and places where I feel that reading the references before the book is a good idea (such as for Dirichlet processes). However, I do think that the chapters on robust inference and finite mixture models were exceptionally good.
I was disappointed that only 2 pages were devoted to regularization and variable selection in linear regression. In my opinion Bayesian techniques provide powerful alternatives to classical regularization methods: instead of choosing the regularization hyperparameters through cross validation, we marginalize over them, effectively taking an average over all possible regularizations. Although the authors do spend more time on regularization in the context of basis function selection in chapter 20, I feel it's a pity they didn't devote more space to it in the linear regression setting.
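Here is a rough sketch of what I mean by marginalizing over the regularization, again my own illustration rather than anything from the book. It assumes the one-parameter model y = w*x + noise with known `sigma`, a prior w ~ N(0, tau^2), and a uniform prior over a small hypothetical grid of tau values. Each tau corresponds to a ridge penalty sigma^2 / tau^2, and the final estimate averages the per-tau estimates, weighted by how well each tau explains the data.

```python
import math


def log_marginal_likelihood(xs, ys, sigma, tau):
    """log p(y | tau) with w integrated out, assuming
    y_i = w * x_i + noise, w ~ N(0, tau^2), noise ~ N(0, sigma^2)."""
    n = len(xs)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    syy = sum(y * y for y in ys)
    # Covariance of y is sigma^2 * I + tau^2 * x x^T; its determinant and
    # inverse follow from the rank-one update formulas.
    log_det = n * math.log(sigma**2) + math.log(1.0 + tau**2 * sxx / sigma**2)
    quad = (syy - tau**2 * sxy**2 / (sigma**2 + tau**2 * sxx)) / sigma**2
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)


def conditional_mean(xs, ys, sigma, tau):
    """Posterior mean of w for a fixed tau (a ridge fit with lam = sigma^2 / tau^2)."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / (sxx + sigma**2 / tau**2)


def averaged_slope(xs, ys, sigma, tau_grid):
    """Average the per-tau estimates, weighted by p(tau | y) on the grid
    (uniform prior over the grid points is an assumption)."""
    log_w = [log_marginal_likelihood(xs, ys, sigma, t) for t in tau_grid]
    top = max(log_w)  # subtract the max before exponentiating, for stability
    unnorm = [math.exp(lw - top) for lw in log_w]
    total = sum(unnorm)
    weights = [u / total for u in unnorm]
    means = [conditional_mean(xs, ys, sigma, t) for t in tau_grid]
    return sum(w * m for w, m in zip(weights, means)), weights, means


# Hypothetical toy data and grid, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.8]
tau_grid = [0.1, 0.3, 1.0, 3.0, 10.0]

slope, weights, means = averaged_slope(xs, ys, sigma=0.5, tau_grid=tau_grid)
print(slope, weights)
```

A grid is the crudest way to do this. In a real model you would put a continuous prior on the hyperparameter and sample it along with everything else.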
Some other minor negatives about the book, in my opinion:
- constant references to later chapters in the book
- various small typos/mistakes that distract from the reading
- the presentation of expectation propagation in chapter 13 is confusing, and no mention is made that it's related to minimizing the Kullback-Leibler divergence
- no mention of relevance vector machines for basis function selection in chapter 20
- no mention of Bayesian dimensionality reduction and factor models
However, I think that the excellent presentation in the first half of the book alone makes it well worth studying. Its use as a reference far outweighs its shortcomings as an introduction, and I'm sure I'll be picking it up countless times when reading other Bayesian material. I highly recommend this book to anyone with a classical statistics background looking to understand Bayesian methods in depth.