
To guide its development, we have analyzed a longitudinal dataset around yellow fever vaccination (some results of this analysis are published). Yellow fever serves as a model of acute infection in humans, and here we present analyses of this dataset that highlight the inferential power of our approach to uncover perturbed repertoire dynamics.

(A) Clone frequencies are sampled from a prior density of power-law form with power ν and minimum frequency f_min. (B) Each clone's frequency f determines the count distribution, P(n|f), that governs its mRNA count statistics in the observed sample. We consider three forms for P(n|f): Poisson, negative binomial, and a two-step (negative binomial to Poisson) model. The negative binomial and two-step measurement models are parametrized through a mean-variance relationship specifying the power, γ, and coefficient, a, of the over-dispersion of cell count statistics. The mean cell count scales with the number of cells in the sample, M, while the mean read count scales with the number of cells, m, and the sampling efficiency, M/N_read, with N_read the measured number of molecules in the sample. The parameters of the measurement model are learned on pairs of sequenced repertoire replicates. (C) Differential expression is implemented in the model via a random log fold change, s, distributed according to the prior ρ(s|θ_exp).
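The sampling steps in panels (A) and (B) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the analysis. The assumptions are: the power-law prior is truncated at an upper cutoff of 1; the over-dispersion takes the form variance = mean + a·mean^γ; frequencies are drawn independently, without normalizing them to sum to 1; and `reads_per_cell` is a hypothetical stand-in for the sampling-efficiency factor that converts cell counts into expected read counts. All function names and parameter values are illustrative.

```python
import math
import random


def sample_frequency(nu, f_min, rng):
    """Inverse-CDF draw from rho(f) ~ f**(-nu) on [f_min, 1] (upper cutoff assumed)."""
    u = rng.random()
    if nu == 1.0:
        return f_min ** (1.0 - u)
    lo = f_min ** (1.0 - nu)
    return (lo + u * (1.0 - lo)) ** (1.0 / (1.0 - nu))


def sample_poisson(lam, rng):
    """Knuth's multiplication method, applied in chunks so exp(-lam) never underflows."""
    total = 0
    while lam > 0:
        chunk = min(lam, 50.0)
        limit = math.exp(-chunk)
        k, p = 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        total += k
        lam -= chunk
    return total


def sample_negative_binomial(mean, a, gamma, rng):
    """Gamma-Poisson mixture with variance = mean + a * mean**gamma (assumed form)."""
    if mean <= 0.0:
        return 0
    shape = mean ** (2.0 - gamma) / a      # NB variance is mean + mean**2 / shape
    rate = rng.gammavariate(shape, mean / shape)
    return sample_poisson(rate, rng)


def sample_read_count(f, model, M, a, gamma, reads_per_cell, rng):
    """Draw a read count n from one of the three forms of P(n|f)."""
    if model == "poisson":
        return sample_poisson(f * M * reads_per_cell, rng)
    if model == "negative_binomial":
        return sample_negative_binomial(f * M * reads_per_cell, a, gamma, rng)
    if model == "two_step":
        cells = sample_negative_binomial(f * M, a, gamma, rng)   # cell count m
        return sample_poisson(cells * reads_per_cell, rng)       # reads given m
    raise ValueError(f"unknown model: {model}")


rng = random.Random(0)
nu, f_min = 2.0, 1e-4                      # illustrative values, not fitted parameters
M, a, gamma, reads_per_cell = 10_000, 0.5, 2.0, 2.0
frequencies = [sample_frequency(nu, f_min, rng) for _ in range(1000)]
counts = [
    sample_read_count(f, "two_step", M, a, gamma, reads_per_cell, rng)
    for f in frequencies
]
observed = sum(1 for n in counts if n > 0)
print(f"{observed} of {len(counts)} simulated clones have at least one read")
```

Drawing a second set of counts from the same frequencies mimics a replicate pair, which is the kind of data on which the measurement-model parameters are learned.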


Our model incorporates known features of clonal frequency statistics and the statistics of the sequencing process. The models we consider are designed to be learnable from RepSeq data and then used to infer properties of the repertoires of the individuals providing the samples.

Longitudinal repertoire sequencing (RepSeq) makes possible the characterization of repertoire dynamics. Although the large number of samples (clones) in these datasets lends itself to model-based inference, there are few existing model-based approaches to this analysis. Existing approaches quantify repertoire response properties using measurement statistics that are limited to what is observed in the sample, rather than what transpires in the individual. Model-based approaches, in contrast, can in principle capture features of the actual repertoire response to, for instance, ongoing natural stimuli, modeled as a point process of infections and giving rise to diffusion-like response dynamics. Another regime for model-based approaches is the response to a single, strong perturbation, such as a vaccine, giving rise to stereotyped, transient response dynamics. In either case, a measurement model is needed since what is observed (molecule counts) is indirect. We also observe only a small fraction of the total number of clones, so some extrapolation is necessary. Finally, both the underlying clonal population dynamics and the transformation applied by the measurement are stochastic, each contributing its own variability, which makes inferences based on sample ratios of molecule counts inaccurate.

Inference of frequency variation from sequencing data has been intensely researched in other areas of systems biology, such as in RNAseq studies. There, approaches are becoming standardized (DESeq2, edgeR, etc.) and technical problems have been formulated and partly addressed. The differences between RNAseq and RepSeq data, however, mean that direct translation of these methods is questionable. Moreover, the known structure of clonal populations may be leveraged for model-based inference using RepSeq, potentially providing advantages over existing RNAseq-based approaches. Here, we take a generative modeling approach to repertoire dynamics.
We then use that null model as a baseline to infer a model of clonal expansion from two repertoire time points taken before and after an immune challenge. Applying our approach to yellow fever vaccination as a model of acute infection in humans, we identify candidate clones participating in the response.

Next-generation sequencing gives access to repertoire-wide data, supporting more comprehensive repertoire analysis and more robust vaccine design. Despite large-scale efforts, how repertoire statistics respond to such acute perturbations is unknown.

Immune repertoire sequencing challenges how to
High-throughput sequencing of B- and T-cell receptors makes it possible to track immune repertoires across time, in different tissues, and in acute and chronic diseases or in healthy individuals. However, quantitative comparison between repertoires is confounded by variability in the read count of each receptor clonotype due to sampling, library preparation, and expression noise. Here, we present a general Bayesian approach to disentangle repertoire variations from these stochastic effects. Using replicate experiments, we first show how to learn the natural variability of read counts by inferring the distribution of clone sizes as well as an explicit noise model relating the true frequencies of clones to their read counts.
