1 Introduction
1.1 What this book is
A graduate textbook in advanced statistical computing for biostatistics, intended as the second volume in a two-book sequence. The introductory volume, Statistical Computing in the Age of AI, covers programming, numerical linear algebra, optimisation, simulation, the bootstrap, standard statistical models, and reproducibility infrastructure at a one-quarter graduate pace. This second volume picks up where the first leaves off.
1.2 What ‘advanced’ means here
Three meanings, in increasing strength.
Deeper treatment of foundational topics. Chapters 1 and 2 (numerical stability, numerical linear algebra) revisit material the introductory volume touches but does not treat in depth. Floating-point arithmetic, condition numbers, sparse and iterative solvers, and BLAS-level performance issues are the load-bearing fundamentals for everything else; an advanced book that omits them is missing a layer.
Topics that exceed introductory scope. Chapters 3 through 7 (advanced optimisation, EM and its extensions, Monte Carlo in depth, MCMC in depth, modern Bayesian computation) extend the introductory treatment of each topic into the territory needed for current methodological research and applied practice with modern tools.
Topics that did not appear in the introductory volume at all. Chapters 8 through 12 (high-performance computing, high-dimensional methods, machine learning, software engineering for statisticians, advanced interactive visualisation) cover ground the introductory volume deliberately did not. Each is the kind of topic a practising biostatistician encounters mid-career and that graduate training should prepare them for.
1.3 What ‘in the Age of AI’ commits the book to
The subtitle is a structural commitment, not decoration. Every chapter has two named sections that exercise it:
The statistician’s contribution. Front-loaded; an explicit articulation of the judgements at the centre of the chapter that no large language model can make on the reader’s behalf. Advanced material is exactly where the human-LLM division of labour matters most: a careless prompt in an introductory bootstrap chapter produces a bug that a re-run will catch, but a careless prompt in an HMC or high-dimensional chapter produces results that look plausible and are wrong.
Collaborating with an LLM on <topic>. End of chapter; specific prompts paired with what to watch for and how to verify. Each prompt is structured as a triple: the prompt itself, the failure modes the LLM may exhibit, and a verification step the reader runs to catch them.
This framework is applied uniformly across all 12 chapters.
1.4 Reading order
Chapters can be read in sequence or selectively, subject to the following dependencies:
- Chapters 1 and 2 (numerical foundations) underpin all the numerical work in chapters 3 through 7.
- Chapter 3 (advanced optimisation) feeds chapter 4 (EM and its extensions) and chapter 9 (high-dimensional methods).
- Chapter 5 (Monte Carlo) feeds chapter 6 (MCMC) which feeds chapter 7 (modern Bayesian).
- Chapters 10 (machine learning), 11 (software engineering), and 12 (interactive visualisation) are largely independent of each other and of the numerical chapters, except that 10 uses optimisation tools from chapter 3.
A reader following the dependencies in order treats the book as a course; a reader picking topics ad hoc treats it as a reference.
1.5 What this book does not cover
Topics that are typically taught in dedicated methods courses elsewhere in a biostatistics curriculum:
- Causal inference computing.
- Longitudinal data analysis beyond the introductory GLMM material.
- Missing-data computing in depth.
- Meta-analysis.
The book points to the standard references for each as they arise. The peer-syllabus survey in docs/syllabi-survey.md documents the rationale.
1.6 Software environment
The book assumes a current R installation, current Quarto, and the package set used in the introductory volume. New software introduced in this volume is named in the relevant chapter. The companion Biostatistics Practicum volume documents the workflow, infrastructure, and deployment conventions assumed throughout.
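As a quick sanity check of the assumed toolchain, a reader can verify that the R and Quarto executables are on the path before starting. This is only an illustrative sketch: it checks the standard `R`, `Rscript`, and `quarto` command names, and says nothing about the package set, which is documented in the introductory volume.

```shell
#!/bin/sh
# Report whether each tool the book assumes is available on this machine.
# Prints one status line per tool; a MISSING line means that tool needs
# installing before the code in this volume will run.
for tool in R Rscript quarto; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING"
  fi
done
```

Running the script prints three lines, one per tool; exact version checks (e.g. `quarto --version`) are left to the reader, since "current" is a moving target.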