Preface
This is the second volume in a graduate sequence on statistical computing for biostatistics. The first volume, Statistical Computing in the Age of AI (SCAI), covers the foundational material of a typical one-quarter graduate course: programming in R, numerical linear algebra, optimisation, simulation, the bootstrap, the standard statistical models, and reproducibility infrastructure. This second volume picks up where the first leaves off and treats the topics that a one-year graduate sequence builds toward.
The book assumes one year of graduate study in biostatistics. Specifically, it expects the linear algebra and probability of a typical first-year sequence; fluency in R at the level of R for Data Science and Advanced R; working knowledge of generalised linear models and mixed-effects models; and practical literacy in the bootstrap and simulation. Topics in the introductory SCAI volume are therefore treated as prerequisites, not recapped.
What this book covers
The 12 chapters are organised into five parts:
- Numerical foundations: computer arithmetic and conditioning; numerical linear algebra in depth.
- Optimisation and estimation: advanced optimisation; the EM algorithm and its extensions.
- Monte Carlo and Bayesian computation: Monte Carlo methods in depth; MCMC in depth; modern Bayesian computation with Stan, variational inference, and model comparison.
- Scaling and modelling: high-performance and distributed computing; high-dimensional and sparse methods; machine learning for biostatistics.
- Software engineering and communication: software engineering for statisticians; advanced interactive visualisation.
The chapter list was constructed by surveying advanced statistical-computing syllabi from a dozen major US biostatistics programmes (the survey is documented in docs/syllabi-survey.md) and adopting the topics that appeared in three or more of them. The result is the mainstream curriculum for an advanced graduate computing course, with the addition of explicit AI-collaboration sections in every chapter.
What this book does not cover
The book deliberately omits topics that are typically taught in dedicated methods courses elsewhere in a biostatistics curriculum:
- Causal inference (propensity scores, instrumental variables, mediation, marginal structural models).
- Longitudinal data analysis beyond the GLMM material in SCAI.
- Missing-data methods in depth (multiple imputation, MNAR sensitivity, FIML).
- Meta-analysis (network and individual-participant-data meta-analysis).
Each of these is typically a course in its own right. Including them here would make the book unusual relative to comparable texts and would dilute its computing focus. Pointers to standard references appear where these topics arise.
Age-of-AI framing
Every chapter has two named structural sections that earn the ‘in the Age of AI’ subtitle. The first, The statistician’s contribution, comes near the start of the chapter: it articulates the judgements at the centre of the chapter that no large language model can make on the reader’s behalf. The second, Collaborating with an LLM on [topic], comes at the end of the chapter and pairs specific prompts with what to watch for and how to verify the output. Together they treat AI assistance as an amplifier to be used with discipline, not as a replacement for the statistical judgement that the rest of the curriculum exists to build.
Advanced material is exactly where the human-LLM division of labour matters most. Foundational chapters can survive a careless prompt; advanced chapters cannot. This two-section framework is therefore applied in every chapter.
How to read this book
Each content chapter follows the same structure: Learning objectives, Orientation, The statistician’s contribution, content sections (with collapsible Check-your-understanding callouts at natural pauses), Collaborating with an LLM, Exercises, Further reading.
Chapters can be read in sequence or selectively. Where chapters depend on one another (e.g., chapter 06 builds on chapter 05, and chapter 07 builds on chapter 06), the dependency is noted in the relevant Orientation section.
Acknowledgements
A peer survey of US biostatistics MS programmes shaped the chapter list. The authors of R for Data Science, Advanced R, R Packages, Bayesian Data Analysis, and Statistical Rethinking established conventions that this book inherits. Mastering Shiny and ggplot2 informed the visual-design decisions and the chapter-template layout.