This study was approved by the Western Institutional Review Board, and all participants provided informed consent.
Participants
All on-campus students and employees of the University of Illinois at Urbana-Champaign are required to submit saliva for RT–qPCR testing every 2–4 days as part of the SHIELD campus surveillance testing programme. Individuals testing positive were instructed to isolate and were eligible to enrol in this study for a period of 24 h following receipt of their positive test result. Close contacts of individuals who test positive (particularly those co-housed with them) are instructed to quarantine and were eligible to enrol for up to 5 days after their last known exposure to an infected individual. All participants were also required to have received a negative saliva RT–qPCR result 7 days before enrolment.
Individuals were recruited via a link shared in an automated text message providing isolation information sent within 30 min of a positive test result, a call from a study recruiter, or a link shared by an enrolled study participant or included in information provided to all quarantining close contacts. In addition, signs were used at each testing location and a website was available to inform the community about the study.
Participants were required to be at least 18 years of age, have a valid university ID, speak English, have Internet access and live within 8 miles of the university campus. After enrolment and consent, participants completed an initial survey to collect information on demographics and health history and were provided with sample collection supplies. Participants who tested positive before enrolment or during quarantine were followed for up to 14 days. Quarantining participants who continued to test negative by saliva RT–qPCR were followed for up to 7 days after their last exposure. All participants’ data and survey responses were collected in the Eureka digital study platform. All study participants were asked whether they had previously tested positive for SARS-CoV-2 or been vaccinated against SARS-CoV-2. All participants included in this cohort reported no previous SARS-CoV-2 infection and were unvaccinated at the time of enrolment.
Sample collection
Each day, participants were remotely observed by trained study staff, who collected the following samples.

(1) Saliva (2 ml), into a 50-ml conical tube

(2) One nasal swab from a single nostril using a foam-tipped swab that was placed within a dry collection tube

(3) One nasal swab from the other nostril using a flocked swab that was subsequently placed in a collection vial containing 3 ml of viral transport medium (VTM)

The swab and VTM manufacturers were not changed throughout the study.
The order of nostrils (left versus right) used for the two different swabs was randomized. For nasal swabs, participants were instructed to insert the soft tip of the swab at least 1 cm into the indicated nostril until they encountered mild resistance, rotate the swab around the nostril five times and leave it in place for 10–15 s. After daily sample collection, participants completed a symptom survey. A courier collected all participant samples within 1 h of sampling using a no-contact pickup protocol designed to minimize courier exposure to infected participants.
Saliva RT–qPCR
After collection, saliva samples were stored at room temperature and RT–qPCR was run within 12 h of initial collection in a Clinical Laboratory Improvement Amendments (CLIA)-certified diagnostic laboratory. The protocol for the covidSHIELD direct saliva-to-RT–qPCR assay used has been detailed previously^{24}. In brief, saliva samples were heated at 95 °C for 30 min followed by the addition of 2× Tris/Borate/EDTA buffer (TBE) at a 1:1 ratio (final concentration 1× TBE) and Tween-20 to a final concentration of 0.5%. Samples were assayed using the Thermo TaqPath COVID-19 assay.
Antigen testing
Foam-tipped nasal swabs were placed in collection tubes, transported in cold packs and stored at 4 °C overnight based on guidance from the manufacturer. The morning after collection, swabs were run through the Sofia SARS antigen FIA on Sofia devices according to the manufacturer’s protocol.
Nasal swab RT–qPCR
Collection tubes containing VTM and flocked nasal swabs were stored at −80 °C after collection and were subsequently shipped to Johns Hopkins University for RT–qPCR and virus culture testing. After thawing, VTM was aliquoted for RT–qPCR and infectivity assays. One millilitre of VTM from the nasal swab was assayed on the Abbott Alinity, according to the manufacturer’s instructions, in a College of American Pathologists and CLIA-certified laboratory.
Calibration curve for nasal swab RT–qPCR assay
Calibration curves for the Alinity assay were determined using digital droplet PCR (ddPCR) as previously described^{56}. Nasal swab samples previously quantified using the Alinity assay were stored in a freezer at −80 °C between initial quantification and extraction for calibration curves. Samples were extracted simultaneously using the Perkin Elmer Chemagic 360 automated extraction platform, with sample input and eluate volumes of 300 and 60 µl, respectively. RNA eluates were stored at −80 °C. Digital droplet RT–PCR was performed following the Bio-Rad EUA assay package insert (https://www.fda.gov/media/137579/download). A master mix was prepared per sample using the reagents provided in the ddPCR Supermix for Probes kit as follows: 5.5 µl of SuperMix (Bio-Rad), 2.2 µl of reverse transcriptase (Bio-Rad), 1.1 µl of dithiothreitol (Bio-Rad), 1.1 µl of CDC triplex SARS-CoV-2 primer and probe mix (IDT) and 7.1 µl of nuclease-free water; 17 µl of master mix was then transferred to a 96-well PCR plate and combined with 5 µl of RNA in eluate, and the plate was then loaded on to a QX200 automated droplet generator (Bio-Rad). The droplet-containing plate was then heat sealed with foil in a plate sealer (Bio-Rad) and placed on a C1000 Touch thermal cycler (Bio-Rad) to perform reverse transcription and amplification. Droplets were read using the QX200 droplet reader (Bio-Rad). Data were analysed with QuantaSoft Analysis Pro 1.0 software.
Virus culture from nasal swabs
Vero-TMPRSS2 cells were grown in complete medium (CM) consisting of DMEM with 10% foetal bovine serum (FBS; Gibco), 1 mM glutamine (Invitrogen), 1 mM sodium pyruvate (Invitrogen), 100 U ml^{−1} penicillin (Invitrogen) and 100 μg ml^{−1} streptomycin (Invitrogen)^{57}. Viral infectivity was assessed on Vero-TMPRSS2 cells as previously described using infection medium (identical to CM except that FBS is reduced to 2.5%)^{26}. When a cytopathic effect was visible in >50% of cells in a given well, the supernatant was harvested. The presence of SARS-CoV-2 was confirmed through RT–qPCR, as described previously, by extracting RNA from the cell culture supernatant using the Qiagen viral RNA isolation kit and performing RT–qPCR using N1 and N2 SARS-CoV-2-specific primers and probes, in addition to primers and probes for the human RNaseP gene with the CDC research-use-only 2019-Novel Coronavirus (2019-nCoV) Real-time RT–PCR primer and probe sequences, and utilizing synthetic RNA target sequences to establish a standard curve^{58}.
Viral genome sequencing and analysis
Viral RNA was extracted from 140 µl of heat-inactivated (30 min at 95 °C, as part of the protocol detailed in ref. ^{24}) saliva samples using the QIAamp viral RNA mini kit (Qiagen); 100 ng of viral RNA was used to generate complementary DNA using the SuperScript IV first strand synthesis kit (Invitrogen). Viral cDNA was then used to generate sequencing libraries utilizing the Swift SNAP Amplicon SARS-CoV-2 kit with additional coverage panel and unique dual indexing (Swift Biosciences), which were sequenced on an Illumina NovaSeq SP lane. Data were run through the nf-core/viralrecon workflow (https://nf-co.re/viralrecon/1.1.0) using the Wuhan-Hu-1 reference genome (NCBI accession NC_045512.2). Swift v.2 primer sequences were trimmed before variant analysis with iVar v.1.3.1 (https://doi.org/10.1186/s13059-018-1618-7), retaining all calls with an allele frequency of 0.01 or higher. Viral lineages were called using the Pangolin tool (https://github.com/cov-lineages/pangolin) v.2.4.2, pango v.1.2.6 and the 5/19/21 version of the pangoLEARN model based on the nomenclature system described in ref. ^{59}.
Statistics and reproducibility
Details of statistical analysis methods are given below. No statistical method was used to predetermine sample size. For some analyses, a small number of individuals were excluded for reasons detailed above, where relevant. Experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment.
Statistical analyses
The difference in the distribution of a parameter of interest between the non-B.1.1.7 and B.1.1.7 infection groups was assessed using univariate analysis, and P values were calculated using the Wilcoxon rank-sum test. Comparison of infectious virus shedding between the two groups was performed using multivariate analysis with age as an additional variate. Levels of infectious viral shedding, after adjusting for age, were predicted by assuming an age of 28 years, that is, the median age of the cohort (Fig. 4c).
Generation of figures
All figures, except for Fig. 2a, were generated using RStudio. Figure 2a was generated using Microsoft PowerPoint.
Overview of model construction and parameter estimation
The goal of quantitative analyses is to use mathematical models to characterize viral shedding dynamics based on both viral genome loads (as measured by RT–qPCR) and the presence or absence of infectious virus (as measured by viral culture assay). Analysing the model results, we quantify individual-level heterogeneity in both viral genome shedding dynamics and individual infectiousness. See Extended Data Fig. 6 for an overview of the analysis workflow.
First, we performed experiments to derive the calibration curves for transformation of Ct/CN values from RT–qPCR to viral genome loads (Viral genome load calibration from Ct/CN values). Note that, due to the nature of RT–qPCR assays and sampling noise, viral genome loads derived using calibration curves represent a proxy for the actual quantities. Nonetheless, this approach is the best available to derive viral genome loads for the purpose of viral dynamic modelling, and is widely used in understanding SARS-CoV-2 dynamics^{21,60}.
Second, we constructed viral dynamic models and fit these to viral genome loads (Viral dynamics models). We estimated key parameters governing infection processes in the nasal and the saliva-associated compartments, such as viral exponential growth rate before peak viral genome load and viral clearance rate. This allows us to characterize individual-level heterogeneity in infection kinetics.
Third, we constructed mathematical models to describe how the amount of infectious virus shed relates to changes in viral genome load, as measured by RT–qPCR (Modelling infectiousness of an individual). We fit the models to viral culture assay data. Using the best model and predicted viral genome load kinetics from the viral dynamics model, we predicted the extent of infectious virus shedding (that is, the infectiousness) for each individual, and thus quantified the individual-level heterogeneity in infectiousness.
Viral genome load calibration from Ct/CN values
Viral genome load calibration: nasal samples
To calculate viral genome loads from CN values reported for nasal samples, we performed calibration curve experiments to empirically define the relationship between CN values obtained from the RT–qPCR assay used on nasal swab samples, and absolute viral genome loads within samples, as quantified by ddPCR. We quantified viral genome loads for 62 nasal samples with CN values ranging between 17 and 38. For each sample, absolute copy numbers of viral genomes were measured using two different N-gene-specific primer sets (N1 and N2). To account for technical noise between samples, we also determined the concentration of the host RNase P (RP) transcript as a control (Supplementary Table 10). We then normalized copy numbers of N1 and N2 targets by dividing by their corresponding RP target numbers, then multiplying by the mean RP concentration across all samples. Note that the unit of these measurements is per millilitre: this is because nasal swab samples were each collected in 3 ml of VTM.
Plotting the logarithm of normalized viral genome loads against the associated CN values shows a clear linear relationship, justifying the use of linear regression below. Linear regression lines with similar coefficients were used as calibration curves in other studies^{21,60}. We also note that the noise in viral genome loads is high when CN values are high (for example, >33), probably a reflection of increased noise when the signal is low^{26}. However, this high level of variation at high CN values will not affect the conclusions of our study, because the range of viral loads relevant to transmission is much higher (>10^{6} copies ml^{−1}; Fig. 3d).
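The RP normalization described above can be illustrated with a minimal Python sketch. The function name `normalize_by_rp` and all numeric values are hypothetical and chosen only for illustration; they are not measurements from Supplementary Table 10.

```python
from statistics import mean


def normalize_by_rp(target_copies, rp_copies):
    """Divide each target copy number by the RP level of the same sample,
    then multiply by the mean RP level across all samples."""
    if len(target_copies) != len(rp_copies):
        raise ValueError("need exactly one RP measurement per sample")
    rp_mean = mean(rp_copies)
    return [t / rp * rp_mean for t, rp in zip(target_copies, rp_copies)]


# Hypothetical ddPCR readouts for three samples (not data from the study).
n1_copies = [2.0e6, 5.0e4, 8.0e2]
rp_copies = [1.0e4, 2.0e4, 3.0e4]
n1_normalized = normalize_by_rp(n1_copies, rp_copies)
```

A sample whose RP level equals the cohort mean is left unchanged, whereas samples with below-average RP recovery are scaled up and those with above-average recovery are scaled down.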
We then performed linear regression on measured CN values and log_{10} viral genome loads (Extended Data Fig. 9). This led to the following formula for the relationship between CN values and viral genome load:
$$\log_{10} V = 11.35 - 0.25\,\mathrm{CN}$$
where V and CN denote the viral genome load and CN value, respectively. Note that, because of the high number of data points measured, the level of uncertainty in the regression line is minimal (Extended Data Fig. 9).
Viral genome load calibration: saliva samples
Unlike for nasal samples, we were unable to measure the calibration curve using saliva samples taken from participants. To quantify the efficiency of the RT–qPCR assay used on saliva samples, we used data from calibration experiments in which saliva samples obtained from healthy donors were spiked with SARS-CoV-2 genomic RNA. More specifically, 0.9 ml of saliva from a healthy donor was spiked with 0.1 ml of 1.8 × 10^{8}, 5.4 × 10^{5} or 6.0 × 10^{4} RNA copies ml^{−1}. For samples spiked with 1.8 × 10^{8} RNA copies ml^{−1}, tenfold serial dilutions were performed to a final concentration of 1.8 × 10^{4} RNA copies ml^{−1}. A total of 24 samples were collected and Ct values of the N gene then measured (Supplementary Table 11).
As above, we plotted the logarithm of viral loads against Ct values (Extended Data Fig. 10). The plot shows a clear linear relationship, justifying the use of linear regression below. We then performed linear regression on measured Ct values and log_{10} viral genome loads (Extended Data Fig. 10). This led to the following formula for the relationship between Ct values and viral genome load:
$$\log_{10} V = 14.24 - 0.28\,\mathrm{Ct}$$
where V and Ct denote viral genome load and Ct value, respectively. As for the nasal calibration curve, the level of uncertainty in the regression line is minimal (Extended Data Fig. 10).
Note that a major difference between samples spiked with viral genomes and those taken from infected individuals is that the latter are likely to be noisier because of variation in the sample collection process. However, the two approaches should not differ substantially in assessing the efficiency of the RT–PCR protocol. The impact of noise in the nasal sample can be minimized by taking a large number of samples over a wide range of CN values, as we did for the nasal samples. Therefore, the calibration curves derived above represent an accurate translation of Ct/CN values to viral load.
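As an illustration of how the two calibration curves translate cycle values into viral genome loads, the following Python sketch applies the regression formulas given above. The intercepts and slopes are copied from those formulas; the function and dictionary names are hypothetical.

```python
# (intercept, slope) of log10(V) = intercept - slope * cycle_value,
# copied from the regression formulas in the text.
CALIBRATION = {
    "nasal": (11.35, 0.25),   # CN values from the nasal swab assay
    "saliva": (14.24, 0.28),  # Ct values from the saliva assay
}


def log10_viral_genome_load(cycle_value, sample_type):
    """Return log10 of the viral genome load (copies per ml)."""
    intercept, slope = CALIBRATION[sample_type]
    return intercept - slope * cycle_value


def viral_genome_load(cycle_value, sample_type):
    """Return the viral genome load (copies per ml) on the linear scale."""
    return 10.0 ** log10_viral_genome_load(cycle_value, sample_type)
```

For example, `log10_viral_genome_load(20, "nasal")` evaluates 11.35 − 0.25 × 20, and each additional cycle lowers the estimated nasal load by 0.25 log_{10} units.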
Viral dynamics models
We constructed viral dynamics models to describe the dynamic changes in viral genome load. The viral genome load patterns in nasal and saliva samples are distinct from each other in many individuals, suggesting compartmentalization of infection dynamics in these two sample sites. Therefore, we use the models below to describe data collected from these two compartments separately. See Fig. 2a and Extended Data Fig. 4 for schematics of these models.
The target-cell-limited model
We first constructed a within-host model based on the target-cell-limited (TCL) model used for other respiratory viruses such as influenza^{61} and, more recently, SARS-CoV-2 (refs. ^{27,29,62}). We keep track of the total numbers of target cells (T), cells in the eclipse phase of infection (E; that is, infected cells not yet producing virus), productively infected cells (I) and viruses (V). The ordinary differential equations are:
$$\begin{array}{lll} \frac{\mathrm{d}T}{\mathrm{d}t} & = & -\beta VT \\ \frac{\mathrm{d}E}{\mathrm{d}t} & = & \beta VT - kE \\ \frac{\mathrm{d}I}{\mathrm{d}t} & = & kE - \delta I \\ \frac{\mathrm{d}V}{\mathrm{d}t} & = & \pi I - cV \end{array}$$
(1)
In this model, target cells are infected by virus with rate constant β, cells in the eclipse phase become productively infected cells at per-capita rate k and productively infected cells die at per-capita rate δ. We use V to describe viruses measured in nasal or saliva samples, representing a proportion of the total virus in the compartment under consideration. Therefore, rate π is the product of viral production rate per infected cell and the proportion of virus that is sampled (see Ke et al.^{27} for a detailed derivation). Viruses are cleared at per-capita rate c.
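To make the structure of equation (1) concrete, the following Python sketch integrates the TCL model with a hand-written fourth-order Runge–Kutta step. It is an illustration only: the parameter values in `ILLUSTRATIVE` are hypothetical rather than fitted estimates, the initial state (8 × 10^{7} target cells and one cell in the eclipse phase) follows the choices described later in this section, and the study itself fitted the models in Monolix rather than with this integrator.

```python
def tcl_rhs(state, beta, k, delta, pi, c):
    """Right-hand side of the TCL model, equation (1)."""
    T, E, I, V = state
    return (
        -beta * V * T,
        beta * V * T - k * E,
        k * E - delta * I,
        pi * I - c * V,
    )


def rk4_step(f, state, dt, **params):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    k1 = f(state, **params)
    k2 = f(tuple(s + 0.5 * dt * d for s, d in zip(state, k1)), **params)
    k3 = f(tuple(s + 0.5 * dt * d for s, d in zip(state, k2)), **params)
    k4 = f(tuple(s + dt * d for s, d in zip(state, k3)), **params)
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * g + d)
        for s, a, b, g, d in zip(state, k1, k2, k3, k4)
    )


def simulate_tcl(days=20.0, dt=0.001, T0=8e7, E0=1.0, **params):
    """Return the list of (T, E, I, V) states from day 0 to `days`."""
    state = (T0, E0, 0.0, 0.0)
    trajectory = [state]
    for _ in range(int(round(days / dt))):
        state = rk4_step(tcl_rhs, state, dt, **params)
        trajectory.append(state)
    return trajectory


# Hypothetical rates per day, chosen only to produce a visible infection peak.
ILLUSTRATIVE = dict(beta=5e-8, k=4.0, delta=2.0, pi=50.0, c=10.0)
trajectory = simulate_tcl(**ILLUSTRATIVE)
peak_V = max(state[3] for state in trajectory)
```

Because target cells are only consumed in this model, T can only decline over the simulation, and the viral load rises to a single peak before being cleared.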
Refractory cell model
We extend the TCL model by including an early innate response, that is, the type-I/III interferon response, where interferons are secreted from infected cells and bind to receptors on uninfected target cells, stimulating an antiviral response that renders them refractory to viral infection. Note that this is the best model to describe the viral genome load dynamics as measured by RT–qPCR from nasal samples.
We keep track of interferon (F) and cells refractory to infection (R), in addition to other quantities in the TCL model. The full ordinary differential equations (ODEs) for target cells, refractory cells and interferon are
$$\begin{array}{lll} \frac{\mathrm{d}T}{\mathrm{d}t} & = & -\beta VT - \phi FT + \rho R \\ \frac{\mathrm{d}R}{\mathrm{d}t} & = & \phi FT - \rho R \\ \frac{\mathrm{d}E}{\mathrm{d}t} & = & \beta VT - kE \\ \frac{\mathrm{d}I}{\mathrm{d}t} & = & kE - \delta I \\ \frac{\mathrm{d}V}{\mathrm{d}t} & = & \pi I - cV \\ \frac{\mathrm{d}F}{\mathrm{d}t} & = & sI - \mu F \end{array}$$
(2)
In this model, the impact of the innate immune response is to convert target cells into refractory cells at rate ϕFT where ϕ is a rate constant. Refractory cells can become target cells again at rate ρ. Interferon is produced and cleared at rates s and μ, respectively.
For simplicity, and due to a lack of empirical data on interferon responses in our study, we simplify the model by making the quasi-steady-state assumption that the interferon dynamics are much faster than the dynamics of infected cells and assume that \(\frac{\mathrm{d}F}{\mathrm{d}t} = 0\). Thus \(sI = \mu F\) or \(F = \frac{s}{\mu}I\).
Let \(\Phi = \phi \frac{s}{\mu}\), so that the ODEs for the innate immunity model become:
$$\begin{array}{lll} \frac{\mathrm{d}T}{\mathrm{d}t} & = & -\beta VT - \Phi IT + \rho R \\ \frac{\mathrm{d}R}{\mathrm{d}t} & = & \Phi IT - \rho R \\ \frac{\mathrm{d}E}{\mathrm{d}t} & = & \beta VT - kE \\ \frac{\mathrm{d}I}{\mathrm{d}t} & = & kE - \delta I \\ \frac{\mathrm{d}V}{\mathrm{d}t} & = & \pi I - cV \end{array}$$
(3)
Viral production reduction model
In addition to making target cells refractory to infection, the impact of interferons may include reducing virus production from infected cells. We include this action of interferons in the viral production reduction model. As above, we make the quasi-steady-state assumption that interferon dynamics are much faster than those of infected cells and assume that F is proportional to I. The ODEs for the model are:
$$\begin{array}{lll} \frac{\mathrm{d}T}{\mathrm{d}t} & = & -\beta VT \\ \frac{\mathrm{d}E}{\mathrm{d}t} & = & \beta VT - kE \\ \frac{\mathrm{d}I}{\mathrm{d}t} & = & kE - \delta I \\ \frac{\mathrm{d}V}{\mathrm{d}t} & = & \frac{\pi}{1 + \gamma I}I - cV \end{array}$$
(4)
where γ is a constant representing the effect of interferon in reducing viral production.
Immune effector cell model
Over the course of infection, immune effector cells are activated and recruited to kill infected cells. These immune effector cells include innate immune cells such as macrophages and natural killer cells, as well as cells developed during the adaptive immune response such as cytotoxic T lymphocytes and antibody-secreting B cells. To consider the impact of these immune effector cells, we develop a model (the effector cell model) based on a previous model for influenza infection^{28}. In this model, we assume that the death rate of infected cells is δ_{1} at the beginning of the infection. This may reflect the cytotoxic effects of viral infection. After time t_{1}, the death rate of infected cells increases by δ_{2}, where δ_{2} models the killing of infected cells by immune effector cells. The ODEs for the model are:
$$\begin{array}{lll} \frac{\mathrm{d}T}{\mathrm{d}t} & = & -\beta VT \\ \frac{\mathrm{d}E}{\mathrm{d}t} & = & \beta VT - kE \\ \frac{\mathrm{d}I}{\mathrm{d}t} & = & kE - \delta(t)I \\ \frac{\mathrm{d}V}{\mathrm{d}t} & = & \pi I - cV \\ \delta(t) & = & \left\{ \begin{array}{ll} \delta_1 & t < t_1 \\ \delta_1 + \delta_2 & t \ge t_1 \end{array} \right. \end{array}$$
(5)
Note that this is the best model to describe the viral genome load dynamics as measured by RT–qPCR from saliva samples.
Combined model
In the full model, we combine the refractory cell model and immune effector cell model to consider both the immediate interferon response and immune effector response. The ODEs for the model are:
$$\begin{array}{lll} \frac{\mathrm{d}T}{\mathrm{d}t} & = & -\beta VT - \Phi IT + \rho R \\ \frac{\mathrm{d}R}{\mathrm{d}t} & = & \Phi IT - \rho R \\ \frac{\mathrm{d}E}{\mathrm{d}t} & = & \beta VT - kE \\ \frac{\mathrm{d}I}{\mathrm{d}t} & = & kE - \delta(t)I \\ \frac{\mathrm{d}V}{\mathrm{d}t} & = & \pi I - cV \\ \delta(t) & = & \left\{ \begin{array}{ll} \delta_1 & t < t_1 \\ \delta_1 + \delta_2 & t \ge t_1 \end{array} \right. \end{array}$$
(6)
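The relationship between the combined model and the simpler models can be made explicit with a short Python sketch of the right-hand side of equation (6). The function name and the default arguments are illustrative choices, not part of the original analysis: setting Φ = 0 and ρ = 0 recovers the effector cell model, and additionally setting δ_{2} = 0 recovers the TCL model.

```python
def combined_rhs(t, state, beta, k, pi, c, delta1,
                 delta2=0.0, t1=float("inf"), Phi=0.0, rho=0.0):
    """Right-hand side of the combined model, equation (6).

    state is (T, R, E, I, V). With the default arguments the function
    reduces to the TCL model with death rate delta1.
    """
    T, R, E, I, V = state
    delta_t = delta1 if t < t1 else delta1 + delta2  # piecewise death rate
    to_refractory = Phi * I * T                      # interferon-driven conversion
    return (
        -beta * V * T - to_refractory + rho * R,
        to_refractory - rho * R,
        beta * V * T - k * E,
        k * E - delta_t * I,
        pi * I - c * V,
    )
```

A useful consistency check is that the cell compartments only lose cells through the death of productively infected cells, so dT/dt + dR/dt + dE/dt + dI/dt equals −δ(t)I for any state.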
Choice of parameter values
Total target cell numbers
We calculate the total numbers of target cells in the nasal and saliva compartments by multiplying the total number of epithelial cells in these two compartments by the fraction of epithelial cells expected to be targets for SARS-CoV-2 infection.
For the total number of epithelial cells in the nasal compartment, we use the estimate from Baccam et al.^{61}, 4 × 10^{8} cells. This is calculated from the estimate that the surface area of the nasal turbinates is 160 cm^{2} (ref. ^{63}) and the surface area per epithelial cell is 2 × 10^{−11} to 4 × 10^{−11} m^{2} per cell (ref. ^{61}). For the saliva compartment, the total surface area of the mouth was estimated to be 214.7 cm^{2} (ref. ^{64}). Therefore, we estimate that the total number of epithelial cells in the mouth is approximately 4 × 10^{8} × 214.7/160 = 5.4 × 10^{8}.
Hou et al. estimated that the fraction of cells expressing angiotensin-converting enzyme 2 (that is, the receptor for SARS-CoV-2 entry) on the cell surface is approximately 20% in the upper respiratory tract^{65}. Therefore, in our model, the initial numbers of target cells in the nasal and saliva compartments are calculated as 4 × 10^{8} × 20% = 8 × 10^{7} and 5.4 × 10^{8} × 20% = 1.08 × 10^{8}, respectively.
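The arithmetic behind these initial target cell numbers can be reproduced with the short Python sketch below. All inputs are the values quoted in the text; the variable names are illustrative, and the rounding of the mouth estimate to 5.4 × 10^{8} mirrors the rounding used above.

```python
NASAL_EPITHELIAL_CELLS = 4e8   # estimate from Baccam et al., as quoted above
NASAL_AREA_CM2 = 160.0         # surface area of the nasal turbinates
MOUTH_AREA_CM2 = 214.7         # total surface area of the mouth
ACE2_FRACTION = 0.20           # fraction of cells expressing ACE2

# Scale the nasal cell count by the ratio of surface areas.
mouth_epithelial_cells = NASAL_EPITHELIAL_CELLS * MOUTH_AREA_CM2 / NASAL_AREA_CM2

# Round to two significant figures (5.4e8), as in the text.
mouth_epithelial_cells_rounded = round(mouth_epithelial_cells, -7)

nasal_target_cells = NASAL_EPITHELIAL_CELLS * ACE2_FRACTION
saliva_target_cells = mouth_epithelial_cells_rounded * ACE2_FRACTION
```

Without the intermediate rounding, the saliva estimate would be 5.3675 × 10^{8} × 20% ≈ 1.07 × 10^{8}, a difference that is negligible relative to the uncertainty in the underlying literature estimates.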
Note that these estimates are approximations using available best estimates in the literature. For a standard viral dynamics model, the number of initial target cells and virus production rate are unidentifiable and only their product is identifiable^{66}. Thus, if the actual number of target cells differs from that estimated here, an increase in the initial number of target cells will lead to a corresponding decrease in the estimate of virus production rate, and vice versa.
Initial number of infected cells
We assume that one cell in the compartment of interest is infected at the start of infection, E_{0} = one cell, consistent with refs. ^{27,67}. The small number of infected cells is also consistent with recent work that estimated from sequencing data that the transmission bottleneck is small for SARS-CoV-2 and that there are probably between one and three infected cells at the initiation of infection^{68,69,70}. Note that, in an earlier work, we showed that changes in the number of initially infected cells of between one and five in the model do not substantially change the inference results^{27}.
Initial viral growth rate, r
For all models above, the initial growth of the viral population before peak viral genome load is dominated by viral infection. This means that the immune responses considered in our models act to change the viral growth trajectory substantially only at later time points^{71}. Thus, we derive an approximation to the initial viral growth rate using the TCL model only (equation (1)). This approximation also represents a good approximation for other models.
We first make two simplifying assumptions commonly used in analysis of the initial dynamics of viral dynamic models^{72,73}. First, because at the initial stage of infection the number of infected cells is orders of magnitude lower than the number of target cells, we assume that the number of target cells is at a constant level, T_{0}. Second, the dynamics of viruses are much faster than those of infected cells. For example, the rate of viral clearance is on the time scale of minutes to hours, whereas the death of productively infected cells takes days. Therefore, we make the quasi-steady-state assumption, \(\frac{\mathrm{d}V}{\mathrm{d}t} \approx 0\), such that the concentrations of viruses are always in proportion to the concentration of productively infected cells, that is, \(\pi I \approx cV\). This gives \(V \approx \frac{\pi}{c}I\).
With these two assumptions, equation (1) becomes a system of linear ODEs with two variables, E and I:
$$\begin{array}{lll} \frac{\mathrm{d}E}{\mathrm{d}t} & = & \beta \frac{\pi}{c}IT_0 - kE \\ \frac{\mathrm{d}I}{\mathrm{d}t} & = & kE - \delta I \end{array}$$
(7)
The Jacobian matrix, J, for this system of ODEs is:
$$J = \left[ \begin{array}{cc} -k & \beta \frac{\pi}{c}T_0 \\ k & -\delta \end{array} \right]$$
The initial growth rate, r, is the leading eigenvalue of the Jacobian matrix of the ODE system. We calculate the eigenvalues, λ, for the Jacobian matrix above from \(\left| J - \lambda I \right| = 0\), where I is the identity matrix, and get:
\(\lambda = \frac{1}{2}\left[ -(k + \delta) \pm \sqrt{(k + \delta)^2 + 4k\delta(R_0 - 1)} \right]\), where \(R_0 = \frac{\beta \pi}{\delta c}T_0\).
Then, the leading eigenvalue, that is, the initial growth rate r, is:
$$r = \frac{1}{2}\left[ -(k + \delta) + \sqrt{(k + \delta)^2 + 4k\delta(R_0 - 1)} \right].$$
(8)
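Equation (8) can be checked numerically with a small Python sketch. The parameter values below are hypothetical and were chosen so that the numbers are easy to follow (they give R_{0} = 10 and r = 6 per day); the function names are illustrative. The helper `characteristic_polynomial` evaluates det(J − λI) directly from the Jacobian above, so a value of zero at λ = r confirms that r is an eigenvalue.

```python
import math


def basic_reproduction_number(beta, pi, delta, c, T0):
    """R0 = beta * pi * T0 / (delta * c)."""
    return beta * pi * T0 / (delta * c)


def initial_growth_rate(beta, pi, delta, c, k, T0):
    """Leading eigenvalue of the linearized (E, I) system, equation (8)."""
    R0 = basic_reproduction_number(beta, pi, delta, c, T0)
    discriminant = (k + delta) ** 2 + 4.0 * k * delta * (R0 - 1.0)
    return 0.5 * (-(k + delta) + math.sqrt(discriminant))


def characteristic_polynomial(lam, beta, pi, delta, c, k, T0):
    """det(J - lam * I) for J = [[-k, beta*pi*T0/c], [k, -delta]]."""
    return (-k - lam) * (-delta - lam) - k * beta * pi * T0 / c


# Hypothetical rates per day and target cell number, for illustration only.
params = dict(beta=5e-8, pi=50.0, delta=2.0, c=10.0, k=4.0, T0=8e7)
R0 = basic_reproduction_number(params["beta"], params["pi"],
                               params["delta"], params["c"], params["T0"])
r = initial_growth_rate(**params)
```

Note that r is positive only when R_{0} > 1, and that r = 0 exactly when R_{0} = 1, as expected for a threshold quantity.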
Model fitting strategy
Fitting viral dynamic models to viral genome load data
We took a nonlinear mixed-effect modelling approach to fit the viral dynamic models to viral genome load data from all individuals simultaneously. All estimations were performed using Monolix (Monolix Suite 2019R2, Lixoft: https://lixoft.com/products/monolix/). We allowed random effects on the fitted parameters (unless specified otherwise). All population parameters, except for the starting time of simulation, t_{0}, are positive and therefore we assume that they follow log-normal distributions. For t_{0} we assume a normal distribution because t_{0} can be positive or negative.
The parameters β and π in the viral dynamic models strongly correlate with each other when the models are fitted to viral genome load data^{66}. We tested three choices in handling this correlation in fitting all five viral dynamic models: (1) a correlation is assumed between parameter β and π in Monolix; (2) parameter β has a fixed effect only (that is, its value is set to be the same across all individuals); and (3) parameter π has a fixed effect only.
To test whether the age of the individuals and/or the infecting viral genotype (categorized as either non-B.1.1.7 or B.1.1.7) explains the heterogeneous patterns in viral genome load trajectories across the cohort, we tested whether they covary with any of the fitted parameters in the model by setting the two variables as a continuous and a categorical covariate, respectively, in Monolix.
The assumptions on parameters β and π and the choice of parameters that covary with age or viral strain of infection led to a large number of model choices for fitting. Therefore, we took the following strategy to ensure that we identified the best model and parameter combinations to describe the data.
First, we tested the three assumptions about parameters β and π in the five viral dynamic models without any covariate and selected the best assumption for further analysis based on their corrected Akaike information criterion (AICc) scores.
Second, using the best assumption, we first tested the model by including the age of the individuals as a continuous covariate of all fitted parameters with a random effect. We then took an iterative approach to test whether the covariate should be removed from any of the parameters in the model using Pearson’s correlation test in Monolix. The parameter(s) that has a non-significant P value (P > 0.05) or with the lowest P value is removed from the next round of parameter fitting. We iterated the process until all parameters were removed.
The best model variant with the lowest AICc score was then selected for analysis on whether parameter estimates differed in individuals infected by different viral strains. As before, we took an iterative approach. We first set the viral strain (that is, non-B.1.1.7 or B.1.1.7) as a categorical covariate of all fitted parameters with a random effect in the model. We then tested whether the covariate should be removed from any of the parameters in the model using the analysis of variance in Monolix. The parameter(s) that has a non-significant P value (P > 0.05) or with the lowest P value is removed from the next round of parameter fitting. We iterated the process until all parameters were removed.
Finally, the model variant with the lowest AICc score was selected as the best model.
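The selection rule used throughout this strategy (pick the variant with the lowest AICc) can be sketched in Python as follows. The sketch assumes the standard small-sample correction, AICc = AIC + 2p(p + 1)/(n − p − 1), with p parameters and n observations; exactly how Monolix counts parameters and observations for mixed-effect models is not specified here, so this should be read as an illustration of the criterion rather than a reproduction of the software's output. The candidate names and numbers are hypothetical.

```python
def aicc(neg2_log_likelihood, n_params, n_obs):
    """Corrected AIC under the standard small-sample formula (an assumption)."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc is undefined when n_obs <= n_params + 1")
    aic = neg2_log_likelihood + 2.0 * n_params
    return aic + 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)


def best_model(candidates):
    """Return the name of the candidate with the lowest AICc.

    candidates maps a model name to (-2 log-likelihood, n_params, n_obs).
    """
    return min(candidates, key=lambda name: aicc(*candidates[name]))


# Hypothetical fits, for illustration only.
candidates = {
    "variant_a": (1250.0, 10, 400),
    "variant_b": (1240.0, 12, 400),
    "variant_c": (1260.0, 9, 400),
}
selected = best_model(candidates)
```

In this example, variant_b is selected because its improvement in likelihood outweighs the penalty for its two additional parameters.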
Prediction of viral genome load trajectories for non-B.1.1.7 and B.1.1.7 strains
We randomly sampled 5,000 sets of parameter combinations from the distribution specified by the best-fit population parameters (Supplementary Table 4). For the effector cell model for the saliva compartment, β and π are strongly correlated. We thus applied formulations such that correlations between the two parameter values are preserved in the random sampling in accordance with the estimated correlation coefficient. We simulated the best-fit model using the 5,000 sets of parameter combinations for each of the strains. The median and the fifth and 95th quantiles of viral genome loads at each time point are reported.
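One way to draw correlated log-normal parameter pairs while preserving a given correlation coefficient is sketched below in Python. This is an assumption about how such sampling could be implemented, not the authors' code: the correlation is imposed on the log scale through a two-variable Cholesky construction, and the population medians, standard deviations and correlation coefficient are hypothetical placeholders rather than values from Supplementary Table 4.

```python
import math
import random
import statistics


def draw_correlated_lognormal(rng, median_a, omega_a, median_b, omega_b, corr):
    """Draw one (a, b) pair whose logarithms are bivariate normal with
    standard deviations omega_a, omega_b and correlation corr."""
    z_a = rng.gauss(0.0, 1.0)
    z_b = corr * z_a + math.sqrt(1.0 - corr ** 2) * rng.gauss(0.0, 1.0)
    return median_a * math.exp(omega_a * z_a), median_b * math.exp(omega_b * z_b)


def log_correlation(pairs):
    """Pearson correlation between log(a) and log(b) across the draws."""
    xs = [math.log(a) for a, _ in pairs]
    ys = [math.log(b) for _, b in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def summarize(values):
    """Return the 5th percentile, median and 95th percentile of values."""
    cuts = statistics.quantiles(values, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return cuts[0], statistics.median(values), cuts[-1]


rng = random.Random(2021)  # fixed seed so the sketch is repeatable
# Hypothetical population values for (beta, pi); not estimates from the study.
draws = [draw_correlated_lognormal(rng, 5e-8, 0.5, 50.0, 0.8, -0.8)
         for _ in range(5000)]
beta_p5, beta_median, beta_p95 = summarize([beta for beta, _ in draws])
```

In the workflow described above, each sampled parameter set would be passed to the best-fit model, and `summarize` would then be applied to the simulated viral genome loads at each time point rather than to the parameters themselves.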
Modelling infectiousness of an individual
We model how infectiousness depends on the viral genome load in an individual, similarly to the framework proposed in Ke et al.^{27}. Specifically, we first use the viral culture data collected in this study to infer how the level of infectious virus shed relates to viral genome loads as measured by RT–qPCR. From this model, we predict how the level of infectious virus shedding changes over time in each individual and how the overall infectiousness of the infection varies among participants.
Relationship between viral genome load and infectious viruses
We first consider three alternative models describing how the amount of infectious virus in a sample is related to viral genome load (derived from the CN values): the ‘linear’ model, ‘power-law’ model and ‘saturation’ model. In these models, because sampling is stochastic, we assume that the number of infectious viruses in the sample used for the cell culture experiment is a random variable, Y, that follows a Poisson distribution, with V_{inf} representing the expected number of infectious viruses; that is, \(V_{\mathrm{inf}} = E(Y)\).

(1) The linear model:
We assume that V_{inf} is proportional to the viral genome load, V, in the sample:
$$V_{\mathrm{inf}} = E(Y) = AV$$
(9)
where A is a constant.

(2) The power-law model:
We assume that V_{inf} is related to the viral genome load, V, by a power function:
$$V_{\mathrm{inf}} = E(Y) = BV^h$$
(10)
where B and h are constants.

(3) The saturation model:
We assume that V_{inf} is related to the viral genome load, V, by a Hill function:
$$V_{\mathrm{inf}} = E(Y) = V_m\frac{V^h}{V^h + K_m^h}$$
(11)
where V_{m} and K_{m} are constants and h is the Hill coefficient.
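The three candidate relationships in equations (9) to (11) can be written as plain functions, as in the sketch below. The function and argument names are chosen here for illustration. The saturation model is coded in an algebraically equivalent form that avoids evaluating V^h directly at high viral genome loads, and it assumes V > 0.

```python
def v_inf_linear(v, a):
    """Equation (9): expected infectious virus proportional to genome load."""
    return a * v

def v_inf_power_law(v, b, h):
    """Equation (10): power function of genome load."""
    return b * v ** h

def v_inf_saturation(v, v_max, k_m, h):
    """Equation (11): Hill function of genome load (requires v > 0).

    Equal to v_max * v**h / (v**h + k_m**h) after dividing the numerator
    and denominator by v**h.
    """
    return v_max / (1.0 + (k_m / v) ** h)
```

At V = K_{m} the saturation model returns half of V_{m}, and with h = 1 the power-law model reduces to the linear model.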
Probability of cell culture being positive
If each infectious virus has a probability \(\varrho\) to establish infection such that the cell culture becomes positive, the number of viruses that successfully establish an infection in cell culture is Poisson distributed with parameter \(\lambda = E(Y)\varrho = V_{\mathrm{inf}}\varrho\). Thus, the probability of one or more viruses successfully infecting the culture so that it tests positive is
$$p_{\mathrm{positive}} = 1 - \exp\left(-\lambda\right) = 1 - \exp\left(-V_{\mathrm{inf}}\varrho\right)$$
(12)
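Equation (12) can be checked with a small Monte Carlo simulation: draw the number of infectious viruses from a Poisson distribution, let each one establish infection independently with probability ϱ, and count the fraction of cultures with at least one success. The values of V_{inf} and ϱ below are hypothetical, and the Poisson sampler is a simple textbook method that is only suitable for small means.

```python
import math
import random

def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for small lam."""
    limit = math.exp(-lam)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1

def simulate_positive_fraction(v_inf, rho, n_trials=20000, seed=1):
    """Fraction of simulated cultures in which at least one virus infects."""
    rng = random.Random(seed)
    positives = 0
    for _ in range(n_trials):
        y = poisson_draw(rng, v_inf)  # infectious viruses in this sample
        if any(rng.random() < rho for _ in range(y)):
            positives += 1
    return positives / n_trials

v_inf, rho = 5.0, 0.2  # hypothetical values
simulated = simulate_positive_fraction(v_inf, rho)
analytic = 1.0 - math.exp(-v_inf * rho)  # equation (12)
```

The simulated fraction should agree with the analytic value up to sampling noise.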
Substituting the expressions of V_{inf} from the three models above, we get the following expressions for p_{positive} from the three models (note that we use the subscripts ‘1’, ‘2’ and ‘3’ to denote the three models for V_{inf}):
$$p_{\mathrm{positive},1} = 1 - \exp\left(-V_{\mathrm{inf}}\varrho\right) = 1 - \exp\left(-DV\right)$$
(13)
where \(D = A\varrho\).
$$p_{\mathrm{positive},2} = 1 - \exp\left(-V_{\mathrm{inf}}\varrho\right) = 1 - \exp\left(-GV^h\right)$$
(14)
where \(G = B\varrho\).
$$p_{\mathrm{positive},3} = 1 - \exp\left(-V_{\mathrm{inf}}\varrho\right) = 1 - \exp\left(-J\frac{V^h}{V^h + K_m^h}\right)$$
(15)
where \(J = V_m\varrho\).
Note that, from the expressions above, parameters A, B and V_{m} cannot be estimated in the three models because they appear only as products with the unknown parameter \(\varrho\) in the equations. This means that the viral culture data do not allow us to estimate the absolute number of infectious viruses in a sample or provide a viral genome load; instead, we are able to estimate a quantity that is a constant proportion of the actual number of infectious viruses over time and across individuals. Therefore, we report estimates of infectious viruses in arbitrary units. These estimates represent a relative measure of infectiousness, so two estimates measured at different time points and/or from different individuals can be compared using this method.
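The identifiability issue can be illustrated numerically with the linear model: multiplying A by a constant and dividing ϱ by the same constant leaves the probability of culture positivity unchanged, so the data can only inform the product. The parameter values and viral genome loads in this sketch are hypothetical.

```python
import math

def p_positive_linear(v, a, rho):
    """Equation (13) with A and rho kept as separate arguments."""
    return -math.expm1(-a * rho * v)  # equals 1 - exp(-a * rho * v)

viral_loads = [1e3, 1e5, 1e7]  # hypothetical genome loads

# Two different (A, rho) pairs with the same product D = A * rho.
p_original = [p_positive_linear(v, a=2e-6, rho=0.1) for v in viral_loads]
p_rescaled = [p_positive_linear(v, a=2e-5, rho=0.01) for v in viral_loads]
```

Both parameter pairs give the same predicted probabilities at every viral genome load, which is why only D, G and J are reported.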
Model fitting using a population effect modelling approach
For each sample, viral genome load and cell culture positivity were measured. Using these data, we estimate parameter values in the three models by minimizing the negative log-likelihood of the data.
More specifically, the likelihood of the m^{th} observation being positive or negative in cell culture is calculated as:
$$p_{i,m} = \left\{ \begin{array}{ll} p_{\mathrm{positive},i}(V_m), & \mathrm{if}\,\mathrm{the}\,m\mathrm{th}\,\mathrm{observation}\,\mathrm{is}\,\mathrm{positive} \\ 1 - p_{\mathrm{positive},i}(V_m), & \mathrm{if}\,\mathrm{the}\,m\mathrm{th}\,\mathrm{observation}\,\mathrm{is}\,\mathrm{negative} \end{array} \right.$$
(16)
where V_{m} is the viral genome load of the mth observation.
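The likelihood in equation (16) can be sketched for the saturation model as follows. This is a fixed-effects-only simplification for illustration and not the mixed-effect estimation performed in Monolix. The data, the fixed values of J and h, and the coarse grid search over K_{m} are all hypothetical.

```python
import math

def p_positive_saturation(v, j, k_m, h):
    """Equation (15): probability that a sample with genome load v is culture positive."""
    hill = 1.0 / (1.0 + (k_m / v) ** h)
    return -math.expm1(-j * hill)  # equals 1 - exp(-j * hill)

def negative_log_likelihood(params, data):
    """Sum of -log p_{i,m} over (viral load, culture result) observations."""
    j, k_m, h = params
    total = 0.0
    for v, is_positive in data:
        p = p_positive_saturation(v, j, k_m, h)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= math.log(p) if is_positive else math.log(1.0 - p)
    return total

# Hypothetical paired observations: (viral genome load, culture positive?)
data = [
    (1e3, False), (1e4, False), (1e5, False), (1e6, True),
    (1e6, False), (1e7, True), (1e8, True), (1e9, True),
]

# Illustrative one-parameter search over log10(K_m), with J and h held fixed.
J_FIXED, H_FIXED = 5.0, 1.0
grid = [3.0 + 0.5 * i for i in range(13)]  # log10(K_m) from 3 to 9
best_log10_km = min(
    grid,
    key=lambda x: negative_log_likelihood((J_FIXED, 10.0 ** x, H_FIXED), data),
)
```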
Because we have the paired nasal RT–qPCR and viral culture data for each individual, we fit the three mathematical models using a nonlinear mixed-effect modelling approach. Again, all estimations were performed using Monolix. We allowed random effects on the fitted parameters (unless specified otherwise). All population parameters with a random effect are assumed to follow lognormal distributions.
To find the best model explaining the data, we tested models with different combinations of parameters either with or without a random effect (Supplementary Table 7). The model with the lowest AIC score was selected as the best model.
Note that, for each of the three models, we tested a model variation where all parameters in the models have fixed effects only; that is, a single set of parameters is used to explain viral culture data from every individual. In this case, there is no heterogeneity in parameter values across individuals. The resulting AIC scores are significantly worse than that of the best-fit model assuming random effects on parameters (Supplementary Table 7). This indicates that there is a substantial level of individual heterogeneity in the relationship between infectious virus shedding and viral genome loads (as shown in Fig. 3d).
Calculation of CIs of the cell culture positivity curve (Fig. 3c)
Similar to the procedures performed for prediction of CIs of viral genome load trajectories, we randomly sampled 5,000 sets of parameter combinations from the distribution specified by the best-fit population parameters of the best model (that is, the saturation model assuming that K_{m} has only a fixed effect; Supplementary Table 8). More specifically, we sampled parameters from a lognormal distribution for J and h, with their means and standard deviations at the best-fit values. Using the parameter combinations, we generated curves of probability of cell culture positivity at CN values ranging between 10 and 40. The median and the fifth and 95th quantiles of the probability of cell culture positivity at each CN value are reported.
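The band calculation can be sketched as below. All parameter values are placeholders rather than the estimates in Supplementary Table 8, and the lognormal means and standard deviations are assumed to be specified on the log scale. Because the CN-to-viral-genome-load calibration is not reproduced here, the curves are evaluated on a grid of log10 viral genome loads instead of CN values.

```python
import math
import random

def p_positive_saturation(v, j, k_m, h):
    """Equation (15) for a viral genome load v > 0."""
    return -math.expm1(-j / (1.0 + (k_m / v) ** h))

def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in 0..100) of a sorted list."""
    pos = q / 100.0 * (len(sorted_values) - 1)
    low = int(math.floor(pos))
    high = min(low + 1, len(sorted_values) - 1)
    frac = pos - low
    return sorted_values[low] * (1.0 - frac) + sorted_values[high] * frac

rng = random.Random(42)

# Placeholder population parameters (hypothetical).
J_MU, J_SIGMA = math.log(5.0), 0.6  # log-scale mean and s.d. of J
H_MU, H_SIGMA = math.log(1.0), 0.3  # log-scale mean and s.d. of h
K_M = 1e7                           # fixed effect only

draws = [
    (rng.lognormvariate(J_MU, J_SIGMA), rng.lognormvariate(H_MU, H_SIGMA))
    for _ in range(5000)
]

log10_loads = [3.0 + 0.5 * i for i in range(15)]  # 3 to 10
bands = []
for x in log10_loads:
    v = 10.0 ** x
    probs = sorted(p_positive_saturation(v, j, K_M, h) for j, h in draws)
    bands.append((x, percentile(probs, 5), percentile(probs, 50), percentile(probs, 95)))
```

Each entry of `bands` holds the load, the fifth percentile, the median and the 95th percentile of the predicted probability of culture positivity.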
Reporting Summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.