Metagenomic Sequencing of HIV-1 in the Blood and Female Genital Tract Reveals Little Quasispecies Diversity during Acute Infection

American Society for Microbiology
Journal of Virology
Anne Piantadosi (a,b,c), Catherine A. Freije (b,d), Christina Gosmann (c,e), Simon Ye (b,f), Daniel Park (b), Stephen F. Schaffner (b,g,h), Damien C. Tully (e), Todd M. Allen (e), Krista L. Dong (a,e), Pardis C. Sabeti (b,g,h,i), Douglas S. Kwon (a,c,e)
(a) Division of Infectious Diseases, Massachusetts General Hospital, Boston, Massachusetts, USA
(b) Broad Institute, Cambridge, Massachusetts, USA
(c) Harvard Medical School, Boston, Massachusetts, USA
(d) Ph.D. Program in Virology, Division of Medical Sciences, Harvard University, Boston, Massachusetts, USA
(e) Ragon Institute of MGH, MIT, and Harvard, Cambridge, Massachusetts, USA
(f) Harvard-MIT Program of Health Sciences and Technology, Cambridge, Massachusetts, USA
(g) FAS Center for Systems Biology, Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA
(h) Department of Immunology and Infectious Disease, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, USA
(i) Howard Hughes Medical Institute, Chevy Chase, Maryland, USA
ABSTRACT Heterosexual transmission of human immunodeficiency virus type 1
(HIV-1) is associated with a significant bottleneck in the viral quasispecies popula-
tion, yet the timing of that bottleneck is poorly understood. We characterized HIV-1
diversity in the blood and female genital tract (FGT) within 2 weeks after detection
of infection in three women enrolled in a unique prospective cohort in South Africa.
We assembled full-length HIV-1 genomes from matched cervicovaginal lavage (CVL)
samples and plasma. Deep sequencing allowed us to identify intrahost single-
nucleotide variants (iSNVs) and to characterize within-sample HIV-1 diversity. Our re-
sults demonstrated very little HIV-1 diversity in the FGT and plasma by the time
viremia was detectable. Within each subject, the consensus HIV-1 sequences were
identical in plasma and CVL fluid. No iSNV was present at >6% frequency. One subject
had 77 low-frequency iSNVs across both CVL fluid and plasma, another subject
had 14 iSNVs in only CVL fluid from the earliest time point, and the third subject
had no iSNVs in CVL fluid or plasma. Overall, the small amount of diversity that we
detected was greater in the FGT than in plasma and declined over the first 2 weeks
after viremia was detectable, compatible with a very early HIV-1 transmission bottle-
neck. To our knowledge, our study represents the earliest genomic analysis of HIV-1
in the FGT after transmission. Further, the use of metagenomic sequencing allowed
us to characterize other organisms in the FGT, including commensal bacteria and
sexually transmitted infections, highlighting the utility of the method to sequence
both HIV-1 and its metagenomic environment.
IMPORTANCE Due to error-prone replication, HIV-1 generates a diverse population
of viruses within a chronically infected individual. When HIV-1 is transmitted to a
new individual, one or a few viruses establish the new infection, leading to a ge-
netic bottleneck in the virus population. Understanding the timing and nature of
this bottleneck may provide insight into HIV-1 vaccine design and other preventa-
tive strategies. We examined the HIV-1 population in three women enrolled in a
unique prospective cohort in South Africa who were followed closely during the ear-
liest stages of HIV-1 infection. We found very little HIV-1 diversity in the blood and
female genital tract during the first 2 weeks after virus was detected in the bloodstream.
These results are compatible with a very early HIV-1 population bottleneck,
suggesting the need to study the HIV-1 population in the female genital tract before
virus is detectable in the bloodstream.
KEYWORDS bottleneck, female genital tract, human immunodeficiency virus,
metagenomic
GENETIC DIVERSITY AND EVOLUTION
Citation Piantadosi A, Freije CA, Gosmann C, Ye S, Park D, Schaffner SF, Tully DC, Allen TM, Dong KL, Sabeti PC, Kwon DS. 2019. Metagenomic sequencing of HIV-1 in the blood and female genital tract reveals little quasispecies diversity during acute infection. J Virol 93:e00804-18. https://doi.org/10.1128/JVI.00804-18.
Editor Viviana Simon, Icahn School of Medicine at Mount Sinai
Copyright © 2019 Piantadosi et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license.
Address correspondence to Anne Piantadosi, apiantadosi@partners.org, or Douglas S. Kwon, dkwon@mgh.harvard.edu.
P.C.S. and D.S.K. contributed equally to this article.
Received 10 May 2018
Accepted 17 October 2018
Accepted manuscript posted online 31 October 2018
Published 4 January 2019
January 2019 Volume 93 Issue 2 e00804-18 jvi.asm.org Journal of Virology
Heterosexual transmission of human immunodeficiency virus type 1 (HIV-1) is
associated with a significant bottleneck in the viral population. Typically, a trans-
mitting partner with chronic HIV-1 infection harbors a diverse viral quasispecies, which
is reduced to only one or a few viral variants after transmission to the recipient partner.
Prior studies investigating the HIV-1 transmission bottleneck, which mostly examined
HIV-1 in the plasma 1 to 3 months after infection, have demonstrated that sexually
acquired infection with HIV-1 is established by a single transmitted/founder (T/F) virus
in approximately 60% to 80% of individuals (1–7). However, some women acquire a
more heterogeneous viral population in the plasma (8) and in the female genital tract
(FGT) (9), and a recent study demonstrated greater HIV-1 env diversity in the FGT than
in blood within the first 3 months after infection (10). Understanding the timing and
location of the HIV-1 transmission bottleneck, as well as factors that contribute to it,
may provide critical insight into the design of an HIV-1 vaccine and other preventative
strategies.
Multiple factors likely contribute to the HIV-1 transmission bottleneck during male-
to-female transmission, including compartmentalization within the male genital tract
prior to transmission (11). The FGT is also believed to contribute to the HIV-1 trans-
mission bottleneck by providing a mucosal barrier and a limited number of target cells
for HIV-1 infection (11). When these factors are disrupted by sexually transmitted
infections (STIs) or hormonal contraceptive use, women can acquire a more diverse
HIV-1 population (12, 13). The FGT has also been described as supporting compart-
mentalized evolution of the HIV-1 population throughout infection (14–16). However,
relatively little is known about viral populations present in the FGT during the earliest
stages of acute infection.
We evaluated very early viral diversity and the HIV-1 transmission bottleneck in the
FGT in three subjects from a unique prospective cohort of South African women with
hyperacute HIV-1 infection in whom infection was detected prior to the time of peak
viral load (17–19). We performed metagenomic sequencing of RNA extracted from
plasma and cell-free cervicovaginal lavage (CVL) samples and examined HIV-1 quasi-
species present in the blood and FGT during the first 2 weeks after detection of viremia.
This approach allowed quantification of HIV-1 variants and assessment of other organ-
isms present in the FGT.
RESULTS
The Females Rising through Education, Support and Health (FRESH) study enrolls
HIV-negative women near Durban, South Africa, and provides a unique opportunity to
study HIV-1 infection within days of viremia being detectable (18–20). Women are
screened for HIV-1 infection by fingerstick testing every 3 or 4 days, and HIV-1 incidence
in the cohort is 8.2 per 100 person-years (19). After detection of infection, women
return for weekly collection of blood and FGT samples. Given this frequency of close
follow-up, HIV in the FGT is assessed with CVL samples rather than invasive tissue
samples; this approach has previously been used to study HIV-1 populations in the FGT
(15).
We obtained paired plasma and CVL samples from three subjects diagnosed with
HIV-1 infection in Fiebig stage I (41). In all three subjects, we included paired plasma
and CVL samples from the time of peak HIV-1 load in CVL fluid. This occurred on the
fourth day after a positive HIV-1 fingerstick (day 4) for subject A, day 1 for subject B, and
day 7 for subject C (Fig. 1 and Table 1). We also included paired samples from the time
of peak viral load in plasma for subjects A (day 11) and B (day 7); for subject C, we did
not include samples from the time of peak viral load in plasma because it was only
3 days later than the earlier sample. All samples were collected prior to the initiation of
antiretroviral therapy. We sequenced a total of 40 million to 250 million RNA reads per
sample and performed both HIV-1-specific analysis and metagenomic classification (Fig.
2). As expected, given the unbiased nature of metagenomic sequencing, plasma
samples contained relatively few HIV-1 reads (mean, 1.5%) and had a mean of 48.5%
human reads. CVL samples had a mean of 0.3% HIV-1 reads and 7.9% human reads.
HIV-1 consensus genomes. Although the HIV-1 reads represented a small propor-
tion of the total reads obtained by metagenomic sequencing, we were able to assemble
a consensus HIV-1 genome sequence from each sample. The depth of HIV-1 genome
coverage based on unique reads was at least 50× in all but one sample and was at least
1,000× in the four samples with the highest HIV-1 RNA content (Table 1). HIV-1
genomes from all three subjects belonged to subtype C (Fig. 3), the most common
HIV-1 subtype in South Africa. The HIV-1 consensus genomes were distinct between
different subjects. However, within each subject, the HIV-1 consensus genomes from all
the samples were identical: there was no difference between plasma and CVL fluid or
between the first and second time points examined. As described below, the consensus
genomes represented the vast majority of viruses sequenced, indicating an overall
similarity of the HIV-1 populations between the two compartments and little change
during the first 2 weeks that viremia was detectable.
Within-sample HIV-1 single-nucleotide variants. In order to characterize the
diversity of the HIV-1 quasispecies in each sample, we first mapped all unique HIV-1
reads within each sample to the consensus sequence from that sample. Because our
unbiased sequencing approach generated cDNA fragments with unique start and end
positions prior to low-cycle library amplification, we were able to remove PCR dupli-
cates by collapsing reads with the same start and end positions to their consensus. We
calculated the depth of HIV-1 genome coverage based on unique reads; correspond-
ingly, samples with higher HIV-1 RNA content yielded higher coverage (Table 1). In all
the samples, we achieved robust coverage across the HIV-1 genome (Fig. 4), with some
variation in depth, likely due to known biases in random-hexamer priming and library
construction (21).
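The duplicate-removal step described above can be illustrated with a minimal Python sketch. It assumes that each aligned read is available as a (start, end, sequence) tuple and that reads sharing both coordinates have the same length; the function name `collapse_duplicates` and the per-position majority vote are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter, defaultdict


def collapse_duplicates(reads):
    """Collapse reads that share start and end positions into one consensus read.

    `reads` is an iterable of (start, end, sequence) tuples. This layout is a
    hypothetical one chosen for illustration.
    """
    groups = defaultdict(list)
    for start, end, seq in reads:
        groups[(start, end)].append(seq)

    unique = []
    for (start, end), seqs in sorted(groups.items()):
        # Majority base at each position; ties go to the base seen first.
        consensus = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
        unique.append((start, end, consensus))
    return unique


example = [(10, 14, "ACGT"), (10, 14, "ACGT"), (10, 14, "ACTT"), (20, 24, "GGGG")]
unique_reads = collapse_duplicates(example)
```

Depth computed from `unique_reads` then counts each original cDNA fragment once, which is the sense in which coverage "based on unique reads" is used here.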
We identified within-sample HIV intrahost single-nucleotide variants (iSNVs) using
V-Phaser 2 (22). In order to distinguish iSNVs from errors introduced during library
construction (including reverse transcription to cDNA) and sequencing, we sequenced
duplicate libraries that had been independently constructed from RNA from each
sample (Fig. 2). Our stringent parameters for reporting an iSNV required that it be found
in both libraries, with at least 0.5% frequency overall, and without substantial strand
bias.
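The three reporting criteria can be sketched as a simple filter. This is a hedged illustration only: the per-library dictionaries with `fwd`, `rev`, and `depth` keys, the function name, and the strand-ratio cutoff of 10 are all assumptions introduced here; only the 0.5% overall frequency threshold and the both-libraries requirement come from the text.

```python
def passes_isnv_filters(lib1, lib2, min_freq=0.005, max_strand_ratio=10.0):
    """Return True if a candidate iSNV meets the three reporting criteria.

    lib1 and lib2 are dicts with variant read counts on each strand
    ('fwd', 'rev') and the total depth at the site ('depth').
    """
    # Criterion 1: the variant is seen in both independent libraries.
    if lib1["fwd"] + lib1["rev"] == 0 or lib2["fwd"] + lib2["rev"] == 0:
        return False

    # Criterion 2: overall frequency across both libraries is at least 0.5%.
    fwd = lib1["fwd"] + lib2["fwd"]
    rev = lib1["rev"] + lib2["rev"]
    depth = lib1["depth"] + lib2["depth"]
    if depth == 0 or (fwd + rev) / depth < min_freq:
        return False

    # Criterion 3: no substantial strand bias. The ratio cutoff is a
    # hypothetical stand-in for whatever test the variant caller applies.
    low, high = sorted((fwd, rev))
    if low == 0 or high / low > max_strand_ratio:
        return False
    return True
```

Requiring support in both libraries is the key design choice: an error introduced during reverse transcription or amplification of one library is unlikely to recur at the same position in an independently prepared one.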
FIG 1 Viral load patterns in plasma and CVL fluid. The plots indicate the HIV-1 load in plasma (solid lines) and the HIV-1 RNA quantification in CVL fluid (dashed
lines) for the 3 subjects in this study over time. Viral loads in plasma are not directly comparable to viral loads in CVL fluid due to differences in specimen
collection, processing and HIV-1 quantification. Samples used for sequencing in this study are indicated by large circles (plasma) and squares (CVL fluid).
We validated this approach by performing metagenomic sequencing on a plasma
sample that was previously studied using single-genome amplification (SGA) and 454
sequencing (2). Our metagenomic sequencing and analysis pipeline recovered 90%
(44/49) of the previously identified iSNVs (see Table S1 in the supplemental material)
and produced no false positives. These results were consistent with expectations from
simulations performed to assess whether the differences between the two methods
were more than could be expected from sampling variance (Fig. 5). Upon further
investigation, we found that the remaining 5 iSNVs were present in our sequencing
reads but did not pass our stringent variant-calling filters. We nevertheless maintained
our stringent filters for iSNV identification, sacrificing some sensitivity for true iSNVs at
low sequencing depth in order to avoid false detection of iSNVs at high sequencing
depth.
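The per-iSNV comparison between SGA and metagenomic sequencing (Fig. 5) uses Fisher's exact test. A minimal standard-library sketch of a two-sided test on a 2x2 table follows; the layout of the table (variant and nonvariant counts for each method) and the function name are assumptions made for illustration, and the simulation step is not reproduced.

```python
import math


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout: row 1 = SGA (variant, nonvariant genomes),
    row 2 = metagenomic sequencing (variant, nonvariant reads).
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def pmf(x):
        # Hypergeometric probability of x variant counts in row 1.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(total, col1)

    p_observed = pmf(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p_value = sum(
        pmf(x) for x in range(low, high + 1) if pmf(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)
```

A large P value for a given iSNV indicates that the two frequency estimates differ by no more than sampling variance would produce, which is the comparison the simulations are meant to calibrate.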
We then evaluated iSNVs in samples from the three subjects in the study. A full
description of the iSNVs identified is provided in Table S2 in the supplemental material,
including the iSNV position, linkage with other iSNVs, consensus and variant alleles,
overall frequency, and frequency within each of the two duplicate sequencing libraries.
Overall, we observed relatively little HIV-1 diversity in these samples from acute
infection. Most samples contained few iSNVs, and no iSNV was present at greater than
6% frequency.
TABLE 1 Summary of samples, HIV-1 levels, and sequencing depths (a)
         Sample  Time point (days since fingerstick)  Plasma HIV-1 load  CD4 count        HIV-1 RNA copies/μl   HIV-1 sequencing depth (mean)
Subject  name    First positive   Last negative       (copies/ml)        [cells/mm^3 (%)] Plasma      CVL       Plasma      CVL
A        D4      4                7                   6,300,000          306 (30)         382,500     79,750    2,735       192
         D11     11               14                  30,000,000         217 (23)         538,500     72,550    1,835       338
B        D1      1                5                   445,000            432 (40)         5,686       251,900   37          2,328
         D7      7                12                  13,000,000         304 (41)         50,630      56,150    50          206
C        D7      7                11                  100,000,000        208 (50)         1,330,000   64,100    1,904       424
(a) Plasma and CVL samples were selected during the first 2 weeks that viremia was detectable. For each sample, both the number of days since the first positive
fingerstick test for HIV-1 and the number of days since the last negative fingerstick test are listed. Clinical measurements of the HIV-1 load in plasma and the CD4
count are shown, as well as the HIV-1 quantification in RNA extracted from both plasma and CVL using qRT-PCR. The mean depth of HIV-1 genome sequencing for
each sample is shown, and the sequencing depth across the HIV-1 genome is shown in Fig. 4.
FIG 2 Metagenomic sequencing and analysis approach. The schematics indicate laboratory sequencing methods,
metagenomic analysis of microbial content, and HIV-1-specific analysis. The asterisks mark steps in which two
independent preparations were performed to ensure reproducibility.
Samples from subject A demonstrated the greatest diversity, with a total of 77
low-frequency iSNVs identified across the genome (see Table S2). Twenty-nine (38%) of
the iSNVs were synonymous, and 48 (62%) were nonsynonymous. Fifteen (19%) were
G-to-A changes in a dinucleotide context compatible with hypermutation by APOBEC3,
a family of human proteins that play a role in antiviral innate immunity. The day 4 CVL
sample had the highest number of iSNVs (n = 55) and the greatest average Shannon
entropy (Fig. 6a); fewer iSNVs were present in day 4 plasma (n = 24) and even fewer
at day 11 in CVL fluid (n = 5) and plasma (n = 6). These differences are unlikely to
reflect iSNVs lost to low sequencing depth: all the samples had good depth of coverage
(192× to 2,735×), and the sample with the lowest depth of coverage (day 4 CVL fluid)
had the greatest number of iSNVs. Overall HIV-1 diversity therefore decreased between
CVL fluid and plasma and between day 4 and day 11.
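Average Shannon entropy, used above to compare samples, can be sketched as follows. The sketch assumes that each iSNV is biallelic, that entropy is measured in bits, and that the average is taken over all genome positions with invariant sites contributing zero; these choices and the function names are assumptions, since the section does not state the exact definition used.

```python
import math


def site_entropy(freqs):
    """Shannon entropy, in bits, of the allele frequencies at one site."""
    return -sum(p * math.log2(p) for p in freqs if p > 0)


def mean_entropy(variant_freqs, genome_length):
    """Average per-site entropy over a genome of `genome_length` positions.

    `variant_freqs` holds the minor-allele frequency of each iSNV; every
    other position is treated as invariant (entropy 0).
    """
    total = sum(site_entropy((f, 1.0 - f)) for f in variant_freqs)
    return total / genome_length
```

Under this definition a sample with many low-frequency iSNVs can still have a small average entropy, because each variant at a frequency of a few percent contributes only a fraction of a bit.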
Upon closer examination of the iSNVs in subject A (Fig. 6a; see Table S2), several
patterns emerged. Most of the iSNVs that were present in day 4 CVL fluid (52 out of 55)
were at lower frequency or absent in day 4 plasma and were absent in both compart-
ments by day 11. Most of these “bottlenecked” iSNVs were linked with at least one
other iSNV within a 100-bp sequencing read at similar frequency (Table S2), suggesting
that they might have been acquired from the transmitting partner. We also observed
17 iSNVs that were present only in the plasma at day 4 and in no other samples,
including day 4 CVL fluid. Most of these were not linked with other iSNVs (n = 13; 76%),
suggesting that they could have arisen during viral replication in either the plasma or
regional lymph nodes prior to the development of viremia. The disappearance of all 69
of these iSNVs by day 11 in plasma suggests the presence of a bottleneck that restricted
most of the HIV-1 quasispecies variants that were both transmitted and generated very
early during infection.
FIG 3 Phylogenetic tree of HIV-1 consensus sequences from each sample. Sequences from subject A are
labeled in blue, those from subject B are labeled in purple, and those from subject C are labeled in
magenta. Reference subtype C sequences from South Africa are labeled in green, other subtype C
sequences are labeled in yellow, one subtype B sequence is labeled in orange, and one subtype A
sequence is labeled in red. Sequence names for samples in this study indicate the subject, day of
sampling, and sample type. Reference sequences are named by subtype, country of origin, year, and
GenBank accession number. Nodes with at least 80% support (out of 1,000 bootstraps) are labeled with
the bootstrap values.
A few iSNVs persisted between day 4 and day 11, while no new iSNVs arose during
that week. One of the persistent iSNVs was present only in CVL fluid, and three were
present only in plasma, suggesting possible compartmentalization. The remaining
three persistent iSNVs were present in both compartments. While these persistent
variants may have been transmitted, all were unlinked with other iSNVs, raising the
possibility that they arose in the recipient partner prior to the detection of viremia. Each
was present at similar frequencies between samples, suggesting that they may repre-
sent individual foci of infection in the FGT (being distributed through plasma) or
elsewhere (seeding the FGT).
FIG 4 HIV-1 genome coverage for each sample. The plots indicate the sequencing depth across the HIV-1 genome for each sample, which was calculated as
a sliding average with a bin width of 100 nt and a sliding window of 10 nt. The coverage represents the sum across both independently prepared sequencing
libraries. The dashed lines represent the mean coverage.
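The smoothing described in the FIG 4 legend can be sketched as below, reading "bin width of 100 nt and a sliding window of 10 nt" as a 100-nt window advanced in 10-nt steps. That reading, the function name, and the list-of-depths input are assumptions made for illustration.

```python
def sliding_mean_coverage(depth, window=100, step=10):
    """Mean depth in windows of `window` nt, advanced `step` nt at a time.

    `depth` is a list of per-position depths (summed over both libraries).
    Returns (window_start, mean_depth) pairs.
    """
    if not depth:
        return []
    points = []
    last_start = max(len(depth) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = depth[start:start + window]
        points.append((start, sum(chunk) / len(chunk)))
    return points
```

Overlapping windows smooth out position-to-position noise while still showing the regional dips and peaks that priming and library-construction biases produce.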
For subjects B and C, we observed substantially fewer iSNVs. In subject B, there were
14 iSNVs in the day 1 CVL sample (Fig. 6b; see Table S2), 9 (64%) of which were G-to-A
changes compatible with hypermutation by APOBEC3. No iSNVs were identified in any
other sample from this subject, which could in part be attributable to lower sequencing
depth (due to low input levels of HIV-1 RNA), but also likely reflects some true loss of
diversity. We used a binomial distribution to estimate the probability that iSNVs
detected in CVL fluid on day 1 would not be detected in CVL fluid on day 7 (depth of
coverage, 206×) due to chance. We found there would be a 0.05% probability of not
detecting an iSNV with a frequency of 3.7% (the highest-frequency iSNV in day 1 CVL
fluid) and a 16% probability of not detecting an iSNV with a frequency of 0.9% (the
median frequency of iSNVs in day 1 CVL fluid, with 10 iSNVs in day 1 CVL fluid detected
at this frequency or higher). It therefore seems unlikely that sequencing depth alone
accounted for the lack of iSNVs in day 7 CVL fluid compared to day 1 CVL fluid,
supporting a biological decrease in HIV-1 diversity between day 1 and day 7 in the FGT.
We did not perform a similar comparison for plasma samples, because the sequencing
depth was too low to make meaningful comparisons (mean depths of coverage, 37×
for day 1 plasma and 50× for day 7 plasma).
FIG 5 Comparison of SGA and metagenomic sequencing for iSNV detection. For each iSNV in the
validation sample, Fisher's exact test was used to compare the frequency of the iSNV detected by SGA
with the frequency of the iSNV detected by metagenomic sequencing. The plot shows a frequency
distribution of the resulting P values for the observed comparisons, as well as P values for simulations
performed to assess whether the differences between the two methods were more than could be
expected from sampling variance.
FIG 6 Frequencies of iSNVs in subject A (a) and subject B (b). Each row represents one sample, and each column represents one iSNV position; invariant
positions are not shown. iSNV frequencies are indicated by color. No chart is shown for subject C because no iSNVs were detected. Table S2 contains further
information, including the exact position of each iSNV and its consensus and variant alleles.
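The binomial estimate for subject B can be approximated with a short sketch. It assumes that "not detected" means fewer than `k` variant reads among `depth` independent reads at the mean depth of 206; the function name and the choice of k = 1 are assumptions, because the exact detection criterion used in the calculation is not given in this section.

```python
import math


def prob_fewer_than(k, freq, depth):
    """Binomial probability of seeing fewer than k variant reads out of `depth`.

    With k = 1 this is the chance of sampling no variant reads at all.
    """
    return sum(
        math.comb(depth, i) * freq**i * (1.0 - freq) ** (depth - i)
        for i in range(k)
    )


# Frequencies taken from the text: the highest and the median iSNV frequency
# in day 1 CVL fluid, evaluated at the day 7 CVL depth of 206.
p_highest = prob_fewer_than(1, 0.037, 206)
p_median = prob_fewer_than(1, 0.009, 206)
```

Under these assumptions `p_median` is about 0.16 and `p_highest` is well below 0.001, in line with the probabilities reported above; small differences are expected because the published frequencies are rounded.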
In subject C, we did not identify any iSNVs in either CVL fluid or plasma at day 7, the
earliest time point available to assess diversity (see Table S2). Both samples were
sequenced to high depth (mean depths of coverage, 1,904× from day 7 plasma and
424× from day 7 CVL fluid), so our results likely reflect truly low HIV-1 diversity in both
CVL fluid and plasma in this subject at day 7. These results are consistent with those for
subject A, in whom very few iSNVs were detected by day 11, and subject B, in whom
no iSNVs were detected at day 7.
Within-sample HIV-1 complex variants. In addition to iSNVs, we sought to identify
more complex intrahost variants by assembling reads de novo to capture regions of
high diversity or insertions/deletions that would have been missed by our standard
read mapping. We identified two complex variants in subject A, one in the gag gene
and one in env. The gag variant, a 36-bp in-frame deletion (Fig. 7a), was detected in
both plasma and CVL fluid at both time points and in both independent sequencing
libraries from each sample, arguing against its being a PCR artifact. Its frequency, based
on unique reads, was higher in CVL fluid than plasma at both time points (Fig. 7b, left).
FIG 7 gag and env variants in subject A. (a) (Top) Schematic showing the genome positions of the deletion in the
PTAP region of gag and the complex variant in env gp120 (prior to the V1 loop) in subject A, each marked with
an asterisk. (Bottom) Nucleotide positions are indicated relative to the HXB2 reference sequence. Red indicates
mismatches from subject A’s consensus sequence, and the red vertical line represents the 12-amino-acid deletion.
The italicized Ns indicate potential N-linked glycosylation sites. (b) Frequency of each variant in each sample from
subject A. (Left) Frequencies of the gag deletion. (Right) Frequencies of the env variant. For gag, the plot shows the
lower limit of the frequency of the deletion in each sample. Because the deletion occurred in an area of genome
duplication, not all reads could be unambiguously mapped; the upper limit of the frequency of the deletion in each
sample was approximately 25%. For env, all the reads could be unambiguously mapped, and the plot shows the
frequency of the variant in each sample. Each sample is identified by the number of days (D) after the first positive
HIV-1 fingerstick.
This deletion was found in the PTAP region of gag, which is known to harbor
duplications, particularly in HIV-1 subtype C viruses (23). PTAP duplication has been
associated with increased compensatory fitness in the setting of protease inhibitor
resistance (24), which was likely not a contributing factor in this subject, because
protease inhibitors are not commonly used in South Africa.
The env variant detected in subject A was located in gp120, outside the V1 loop on the
5′ end (Fig. 7a). This variant, 346 bp in length, shared only 75% nucleotide identity with the
consensus HIV-1 sequence from subject A; it was equally different from the consensus
sequences in subjects B and C (73% and 74% nucleotide identity, respectively). Using BLAST
(NCBI), we found that the sequence was most similar (86% identity) to a subtype C HIV-1
isolate from South Africa (GenBank accession number AY463226.1). This was unlikely to be
a contaminant, because we identified reads bridging this env variant with the consensus
backbone on both the 5′ and 3′ ends, and other HIV-1 subtype C samples have not
been previously sequenced in our laboratory. The env variant was present at low
frequency overall, and similar to most of the iSNVs in subject A, it decreased in
frequency between CVL fluid and plasma and between day 4 and day 11 (Fig. 7b, right).
We did not identify any complex variants in subjects B and C.
FIG 8 Metagenomic profiles of CVL fluid and plasma over time. (a) Results of metagenomic classification, indicating the log proportions of genera frequently
found in CVL fluid after removal of human, HIV-1, and contaminating Burkholderiales reads. The columns represent genera, and the rows represent samples.
Each sample is identified by subject identifier–number of days after the first positive HIV-1 fingerstick (D)–sample type. (b) PCA of centered log ratio-transformed
genus abundance proportions for each sample.
Metagenomic analysis. We performed metagenomic classification of all nonhuman
sequencing reads to characterize microbial diversity in the FGT. In addition to using
water controls, we analyzed each CVL sample using its paired plasma sample as an
internal control. CVL samples had overall increased microbial diversity compared to
plasma, including a number of unique taxa that are commonly associated with the FGT
environment (Fig. 8). These taxa, which were not detected in water controls, include
commensal organisms, such as Porphyromonas and Prevotella, as well as organisms that
may be commensal or associated with the disease state, such as Ureaplasma and
Mycoplasma. Of note, our methods were not optimized to detect all bacterial species
because we used CVL supernatants rather than cell pellets and we did not perform
stringent lysis procedures, such as bead beating, that are necessary for some bacteria.
Perhaps for these reasons, we did not detect high abundances of the genera Prevotella,
Gardnerella, and Gemella, as had previously been detected in this cohort by 16S rRNA
amplification and sequencing (18). Interestingly, the CVL sample from subject C on day
11 did not have high abundances of CVL fluid-associated taxa, and the sample clustered
between plasma and CVL samples on principal-component analysis (PCA) (Fig. 8). The
observed reduction in CVL fluid-associated taxa may be the result of administration of
antibiotics (which was not noted at the study visit) or other interventions that would
drastically reduce the microbial diversity in the FGT.
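The centered log ratio transformation applied before PCA (Fig. 8b) can be sketched for a single sample as follows. The pseudocount used to avoid taking the logarithm of zero and the function name are assumptions; the section does not say how zero proportions were handled.

```python
import math


def clr(proportions, pseudocount=1e-6):
    """Centered log ratio transform of one sample's genus proportions.

    Each value is the log proportion minus the mean log proportion, so the
    transformed values for a sample sum to zero.
    """
    logs = [math.log(p + pseudocount) for p in proportions]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]
```

PCA would then be run on the matrix of transformed samples. Working with log ratios rather than raw proportions avoids the spurious correlations that arise because proportions within a sample must sum to one.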
In some CVL samples, we detected reads from pathogens known to cause STIs,
which were also not detected in our water controls. We detected Trichomonas and its
associated bacteriophage in subjects A and B, who were also found to have Trichomo-
nas by clinical STI screening. We detected Chlamydia trachomatis in subjects A and C,
which had not been detected by clinical screening using nucleic acid amplification.
Subject A was clinically diagnosed with Neisseria gonorrhoeae infection; although we
detected reads from Neisseria, we were not able to definitively classify these as N.
gonorrhoeae due to high rRNA sequence similarity among nonpathogenic species
within the genus Neisseria. Furthermore, we consistently observed low levels of back-
ground reads from the genus Neisseria in water controls, as well as CVL and plasma
samples from all three subjects (Fig. 8a). Therefore, our classification methods are not
specific enough to confidently identify N. gonorrhoeae.
Overall, although our current methods were not optimized for sequencing all
bacteria, we did detect known representatives of the FGT microbiome and STI, provid-
ing proof of concept for the use of metagenomic sequencing in evaluating complex
sites, such as the FGT, that contain viral and protozoan populations, which would not
be detected by bacterial 16S rRNA sequencing.
DISCUSSION
Overall, we found little diversity in the HIV-1 RNA quasispecies populations in both
the FGT (CVL samples) and plasma in three women with very early HIV-1 infection. Our
findings are compatible with a very early HIV-1 transmission bottleneck, which was
mostly complete by the time viremia was detectable in these three individuals. Because
our study design did not include samples from the transmitting partners, we cannot
exclude the possibility that the subjects were exposed to a relatively homogeneous
viral population (e.g., if the transmitting partners had acute infections themselves). We
also were not able to assess contributions to the HIV-1 transmission bottleneck prior to
the FGT or assess features of transmitted viruses compared to nontransmitted viruses,
as in prior studies that included transmission pairs (1, 3).
Nevertheless, by comparing HIV-1 populations in the FGT—the site of HIV-1 acqui-
sition—and blood compartments over the first 1 to 2 weeks after viremia was detect-
able, we observed a reduction in HIV-1 diversity suggestive of a bottleneck during
hyperacute infection. This was most evident in subject A, who had clearly demonstrable
low-frequency HIV-1 quasispecies diversity in the FGT sampled 4 days after detection of
viremia, including multiple sets of iSNVs linked on 100-bp sequencing reads and a
complex 346-bp env variant. Because it would be unlikely for multiple mutations to
occur so closely together on the same HIV-1 template this early in infection, many of
these variants were likely acquired from the transmitting partner. The complex env
variant was quite divergent from the consensus env sequence, but reads were present
that bridged it to the consensus backbone on both sides, arguing against contamina-
tion. We cannot exclude the possibility that this variant was transmitted by a different
partner in close temporal proximity and subsequently recombined with the consensus
variant. The disappearance of the env variant and other sets of linked iSNVs in the
plasma by day 11 supports the presence of a bottleneck between the FGT and blood
at this very early time in infection. However, the bottleneck was not absolute; in subject
A, three unlinked iSNVs and a gag deletion persisted at a frequency of 1% to 5% of the
HIV-1 population in both compartments through day 11. We therefore characterize
subject A as having multiple transmitted viruses (detected in day 4 CVL fluid) and a
smaller number of founder viruses (detected in day 11 CVL fluid and plasma). It remains
Piantadosi et al. Journal of Virology
January 2019 Volume 93 Issue 2 e00804-18 jvi.asm.org 10
unclear whether the low-frequency founder viruses, which contained unlinked iSNVs,
were transmitted or arose through replication in the FGT during the eclipse phase prior
to the development of viremia.
In subject B, we observed HIV-1 quasispecies diversity in the FGT at day 1, but not
at day 7 or in plasma at either time point, though these samples had lower HIV-1 loads
and consequently lower depth of coverage in sequencing. In subject C, we observed no
HIV-1 quasispecies diversity in the FGT or blood 1 week after the detection of viremia.
We therefore characterize subjects B and C as having single founder viruses. Interest-
ingly, all three women in our study had little HIV-1 diversity despite using hormonal
contraceptives and despite the presence of concurrent STIs; these factors have been
associated with acquisition of a more diverse HIV-1 population in prior studies (12, 13).
Our study was limited by the inclusion of only three individuals. Notably, however,
the FRESH study design uniquely enabled us to investigate samples from the blood and
FGT within the first 2 weeks after viremia was detectable. Prior studies of the HIV-1
transmission bottleneck have examined samples several months after infection, mostly
from blood. Recently, Klein et al. observed a greater number of distinct HIV-1 env
C2-V3-C3 clones present in the FGT than in blood in 12 women with matched cervical
and plasma samples collected between 0 and 3 months after infection (10). Similarly,
we found greater HIV-1 diversity in FGT than in blood in two out of three subjects for
at least one time point. Interestingly, however, Klein et al. reported that the predom-
inant HIV-1 clone in the FGT was most often different from the predominant HIV-1
clone in blood, whereas we found the same HIV-1 consensus sequence in FGT and
blood in all three subjects and at all the time points examined. It is possible that by
chance we investigated three subjects who did not acquire as diverse an HIV-1
population as the subjects in the earlier study. The difference could also be explained
by compartmentalized evolution between the time of infection and sampling, with a
shorter period available for compartmentalized evolution in our study.
Our results are consistent with prior animal studies, in which macaques vaginally
infected with a diverse simian immunodeficiency virus (SIV) population were found to
have a reduced number of viral variants within weeks after inoculation, both in the
plasma (25, 26) and in the vaginal tract (27). Given the challenge of capturing similarly
early time points in human studies, our study represents the earliest examination of
HIV-1 quasispecies in the human FGT, to our knowledge. Our results suggest that future
work is needed to investigate viral diversity within the FGT even earlier in infection,
ideally including FGT samples collected prior to peak viral load in the FGT and prior to
the development of viremia, with comparison to the transmitting partner samples.
Our sequencing and analytic methods offer several advances for the field of HIV-1
genome sequencing. Metagenomic sequencing has previously been employed to
assemble consensus HIV-1 genomes (28, 29), and here, we expand the use of this
technique to quantify iSNVs from high-depth next-generation sequencing (NGS) in
HIV-1 infection. Historically, HIV-1 quasispecies diversity has been assessed by endpoint
dilution to single-genome templates, which limits the number of templates that can be
assessed, or, alternatively, by PCR amplification followed by NGS, which can be limited
by amplification bias, recombination, and resampling. Our sequencing libraries are
randomly generated so that each read that is derived from a unique template has a
unique start position and end position, allowing removal of the duplicates that are
generated by limited-cycle PCR. This approach is conceptually similar to primer ID (30)
but allows identification of unique reads based on characteristics of the starting and
ending positions of the reads themselves.
An additional benefit of this approach is the opportunity to perform metagenomic
analysis of other organisms in a sample, which is especially important in HIV-1 trans-
mission at microbially diverse sites, such as the FGT. Although our current methods
were not optimized for sequencing bacteria, we did detect known representatives of
the FGT microbiome, including organisms causing sexually transmitted infections. A
more comprehensive evaluation of bacterial species could be achieved with the
addition of upstream processing steps, e.g., bead beating for bacterial lysis. Unlike
bacterial 16S rRNA sequencing, metagenomic sequencing offers the potential to inter-
rogate a wide range of microbes, including protozoa and viruses. This technique,
therefore, has broad application to the study of HIV-1, including viral population
diversity and coinfections.
In conclusion, we utilized metagenomic sequencing to study HIV-1 populations
present in the blood and FGT during the earliest stages of acute infection and observed
little viral diversity, supporting a very early transmission bottleneck.
MATERIALS AND METHODS
Ethics statement. The study protocol was approved by the Biomedical Research Ethics Committee
of the University of KwaZulu-Natal and the Partners Institutional Review Board (IRB) (2012P001812/MGH).
The Broad Institute of MIT and Harvard has a standing reliance agreement with Partners through which
it relied on the Partners IRB to provide a review of this study. Written informed consent was obtained
from all participants following the explanation of the nature and possible consequences of the study; all
the participants were 18 years of age or older.
Study cohort. The women were enrolled in the FRESH study, a prospective observational study
conducted near Durban, South Africa (19). The participants received HIV-1 infection prevention coun-
seling, and both female and male condoms were provided at the study site. To be eligible for the study,
participants had to be female, 18 to 23 years old, HIV-1 uninfected, and sexually active. They further had
to be willing to adhere to study requirements, to have HIV-1 tests performed twice per week, and to have
samples stored. Exclusion criteria included pregnancy, anemia, enrollment in another study, and en-
gagement in full-time employment or school. None of the subjects included had a known history of
injection drug use.
Clinical procedures. Twice per week, the participants attended classes focused on HIV-1 infection
prevention, personal empowerment, and job skills training. At each visit, they underwent a finger prick
blood draw for quantitative HIV-1 RNA testing. Every 3 months, the participants had a peripheral blood
draw and pelvic examination (not performed during menstruation) that included the collection of CVL
fluid for sampling from the FGT. CVL fluid was obtained by washing the cervicovaginal walls with 5 ml
of sterile saline, which was then centrifuged at 1,700 rpm at 4°C to pellet cells, and the cell-free
supernatant, containing free viral particles, was used for RNA sequencing and further analysis in the
study (18–20). The participants also completed a detailed HIV-1 risk questionnaire, which was adminis-
tered by a counselor; it included STI history, sexual behavior, family planning, use of antibiotics, and diet.
Upon detection of a positive HIV-1 RNA test result, the participants underwent blood collection and
pelvic examinations with CVL sample collection at 1, 2, 3, 5, 9, 12, 24, 36, and 48 weeks postdetection.
Measurement of HIV-1 load in CVL fluid and plasma. HIV-1 clinical viral load testing in plasma was
performed by the Global Clinical Viral Laboratory, South Africa, as previously described (18). Viral RNA
was extracted from 500 to 1,000 µl of plasma and 500 µl of CVL supernatant using a QIAamp Viral RNA
Mini Kit (Qiagen) according to the manufacturer’s instructions, including a step for on-column DNase.
HIV-1 RNA was quantified by one-step quantitative reverse transcription (qRT)-PCR using a QuantiFast
SYBR Green RT-PCR kit (Qiagen) and the following gag primers (from the Amplicor HIV-1 Monitor viral
load test): SK145 primer (forward), AGTGGGGGGACATCAAGCAGCCATGCAAAT; SK431 primer (reverse),
TGCTATGTCACTTCCCCTTGGTTCTCT (IDT). PCR conditions were as follows: (i) RT, 50°C for 10 min; (ii)
reactivation, 95°C for 5 min; (iii) 40 cycles of 95°C for 10 s and 60°C for 30 s; (iv) melting curve, 95°C for
15 s, 60°C for 15 s, and then ramp to 95°C; (v) cooling, 40°C for 30 s. Precise calculation of viral copy
numbers was achieved by using a standard curve derived from a linear, nearly full-length plasmid HIV-1B
genome fragment that had been prepared by digestion of an HIV-1B infectious molecular clone (pNL4-3
and pHXB2-RU3), gel purification, and quantification by spectrophotometry (Nanodrop; Qubit).
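The standard-curve calculation can be illustrated with a minimal sketch. It assumes a conventional log-linear fit of threshold cycle (Ct) against log10 input copies; the function names and the dilution-series values are hypothetical and are not taken from the study.

```python
def fit_standard_curve(log10_copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copies per reaction."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical plasmid dilution series (values invented for illustration).
standards_log10 = [2.0, 3.0, 4.0, 5.0, 6.0]
standards_ct = [33.0, 29.7, 26.4, 23.1, 19.8]

slope, intercept = fit_standard_curve(standards_log10, standards_ct)
sample_copies = copies_from_ct(26.4, slope, intercept)
```

Copies per reaction would then be scaled by the extraction and elution volumes to obtain copies per milliliter of plasma or CVL supernatant.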
Metagenomic sequencing. Sequencing libraries were constructed from 5 µl of RNA, corresponding
to a starting input of at least 25,000 HIV-1 copies (and in most cases at least 250,000 HIV-1 copies). The
library construction methods have been previously described (31, 32). Briefly, carrier RNA [poly(rA)] that
had been introduced during the RNA extraction process was depleted using 40-nucleotide (nt) oligo(dT)
probes and Hybridase thermostable RNase H (Epicentre), followed by RNase-free DNase treatment
(Qiagen). cDNA was constructed using random-hexamer primers and SuperScript III (Invitrogen) for
first-strand synthesis, followed by second-strand synthesis (NEB). There is no amplification during this
step. Sequencing libraries were generated using the Nextera XT DNA Library Prep kit (Illumina), which
uses transposases to randomly fragment the cDNA, resulting in unique start and end positions of the
cDNA fragments prior to any amplification steps. Sequencing adapters including dual indexes were
added, and the cDNA fragments underwent amplification with 16 cycles of PCR. The libraries were
quantified using a KAPA universal complete kit (Roche), pooled to equal concentration with 4 to 10
samples per lane, and sequenced on an Illumina MiSeq or HiSeq using paired-end 101-bp reads. As a
negative control, a water sample was included with each batch of library construction and sequencing.
Duplicate independent sequencing libraries were made from each RNA sample. Reads from duplicate
libraries were merged and analyzed together to assemble consensus HIV-1 genome sequences. Duplicate
libraries were also analyzed independently to verify the presence of iSNVs, as described below.
Metagenomic analysis. Reads were taxonomically classified using a combination of BWA (42),
Kraken (43), and DIAMOND (44). The first-pass classification was via bwa mem v0.7.15 with a custom
database containing SILVA (45) LSU and SSU Ref rRNA sequences (release 128), the whole human
genome (hg38), and all the sequences from NCBI’s viral accession list (46) with a human host as of
October 2015. PCR and optical duplicates were removed via Picard MarkDuplicates v2.6.0. The remaining
reads were further classified with DIAMOND v0.8.18.80 on NCBI’s nr database. Lastly, reads were classified
using Kraken v0.10.6 on a custom database built on the default “full” database containing all RefSeq
whole genomes of bacteria and viruses as of October 2015. All Kraken classified reads were additionally
subjected to a kraken-filter step with a threshold of 0.05 to reduce false-positive hits. Sequences added
to the “full” Kraken database include all whole chromosomes from PlasmoDB (47) and RefSeq (48) whole
genomes of fungi, protozoa, and plasmids (October 2015). Reads that had hits to multiple taxa were
assigned to the hits’ lowest common ancestor taxon. Taxa that had <0.01% cumulative read abundance
had their reads pushed up the taxonomy tree until all nodes contained at least 0.01% read abundance.
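The lowest-common-ancestor assignment and the abundance-based push-up step can be sketched as follows. This is a simplified illustration, not the code used in the pipeline: the taxonomy is represented as a hypothetical child-to-parent dictionary, and all taxon names and read counts are invented.

```python
def lineage(taxon, parent):
    """Path from a taxon up to the root, starting with the taxon itself."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def lowest_common_ancestor(taxa, parent):
    """Deepest taxon shared by the lineages of all hits (taxa is non-empty)."""
    common = lineage(taxa[0], parent)
    for taxon in taxa[1:]:
        keep = set(lineage(taxon, parent))
        common = [node for node in common if node in keep]
    return common[0]


def push_up_rare_taxa(direct_counts, parent, min_fraction=1e-4):
    """Move reads from taxa below the cumulative threshold to their parents."""
    total = sum(direct_counts.values())
    direct = dict(direct_counts)
    cumulative = {}  # reads assigned to a taxon or any of its descendants
    for taxon, n in direct_counts.items():
        for node in lineage(taxon, parent):
            cumulative[node] = cumulative.get(node, 0) + n
    # Visit the deepest nodes first so that reads can cascade upward.
    by_depth = sorted(cumulative, key=lambda t: len(lineage(t, parent)),
                      reverse=True)
    for node in by_depth:
        if node in parent and cumulative[node] < min_fraction * total:
            up = parent[node]
            direct[up] = direct.get(up, 0) + direct.pop(node, 0)
    return {taxon: n for taxon, n in direct.items() if n > 0}


# Hypothetical toy taxonomy (child -> parent).
parent = {"sp_A1": "genus_A", "sp_A2": "genus_A", "genus_A": "family_X",
          "genus_B": "family_X", "family_X": "root"}
counts = {"sp_A1": 1, "sp_A2": 99_999, "genus_B": 100_000}
adjusted = push_up_rare_taxa(counts, parent)
```

In this toy example, the single read assigned to `sp_A1` falls below 0.01% of all reads and is reassigned to `genus_A`, whose cumulative abundance exceeds the threshold.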
Given the high sensitivity and unbiased nature of metagenomic sequencing, we and others (33–36)
frequently detect microbial reads that are present as background in reagents or the laboratory environ-
ment. Failure to account for this background can lead to erroneous identification of pathogenic
microbes, as recently described (37). For this study, we addressed the presence of background microbial
reads in two ways. First, we included a water sample as a negative control with each sequencing batch,
which went through the entire library construction and sequencing process. Microbial species that we
commonly detect in water controls include the genera Burkholderia, Ralstonia, Cupriavidus, and
Pseudomonas. Second, we analyzed each CVL sample using its paired plasma sample as an internal control.
HIV-1 genome assembly. HIV-1 genomes were assembled using a published and freely available
pipeline called viral-ngs (38), described in detail at https://viral-ngs.readthedocs.io. Briefly, samples were
demultiplexed, and reads from human and known laboratory microbial contaminants (e.g., Escherichia
coli and Pseudomonas fluorescens) were removed. For each sample, reads from duplicate independent
sequencing libraries were combined. The consensus HIV-1 genome sequence of the viral population from
each sample was constructed by de novo assembly using reads that matched a database of HIV-1
reference genomes (GenBank accession numbers are provided in File S1 in the supplemental material).
Assemblies were completed by scaffolding de novo contigs against a subtype C reference genome
(GenBank accession no. AF286227), followed by two rounds of refinement with the unfiltered reads.
Consensus sequences represent the full HIV-1 genome, excluding long terminal repeats (LTRs), which
could not be unambiguously assembled using this method.
To calculate the depth of sequencing and analyze within-sample variants, all HIV-1 reads from a
sample were mapped to the consensus HIV-1 genome from that sample using Novoalign (Novocraft).
Because cDNA fragments generated by Nextera XT library preparation have unique start and end
positions, all reads with the same start and end positions were collapsed to their consensus, allowing
removal of PCR duplicates and correction for errors generated during PCR and sequencing. The reported
HIV-1 genome coverage is based on unique (deduplicated) reads.
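The duplicate-collapsing step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the viral-ngs implementation: each read is reduced to a hypothetical `(start, end, sequence)` tuple on the sample consensus, reads in a group are assumed to have equal length, and the group consensus is taken by per-position majority vote.

```python
from collections import Counter, defaultdict


def collapse_duplicates(reads):
    """Collapse reads sharing start and end positions into one consensus.

    reads: iterable of (start, end, sequence) tuples aligned to one reference.
    Returns a dict mapping (start, end) to the consensus sequence.
    """
    groups = defaultdict(list)
    for start, end, seq in reads:
        groups[(start, end)].append(seq)

    collapsed = {}
    for key, seqs in groups.items():
        # Majority base at each column; this also masks isolated PCR or
        # sequencing errors when a template was read more than once.
        collapsed[key] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return collapsed


# Hypothetical reads: three copies of one template (one carrying an error)
# and one read from a different template.
reads = [
    (10, 15, "ACGTA"),
    (10, 15, "ACGTA"),
    (10, 15, "ACCTA"),
    (12, 17, "GTACC"),
]
unique_reads = collapse_duplicates(reads)
```

Depth of coverage computed from `unique_reads` counts each starting template once, which is the sense in which coverage is reported for deduplicated reads.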
Phylogenetic analysis. Consensus genome sequences were aligned to reference sequences using
Geneious 8.1.7 (Biomatters). Regions that could not be unambiguously aligned were manually removed.
A maximum-likelihood tree was constructed using PhyML (39) with automatic model selection (GTR) and
1,000 bootstrap replicates.
Identification and analysis of iSNVs. iSNVs were identified using V-Phaser2 (22). To distinguish true
iSNVs from sequencing errors, iSNVs were restricted to those present in two independent sequencing
libraries and in at least one forward and one reverse read and with forward or reverse strand bias of
5-fold or less. We also restricted our analysis to iSNVs present at 0.5% frequency or greater, since
lower-frequency iSNVs could not be confidently identified with the above-mentioned criteria and the
sequencing depth achieved for these samples. Sites meeting these criteria were manually inspected and
removed from the final iSNV list if their positions in reads did not appear to be evenly distributed across
the read length (40). For each sample, Shannon entropy was calculated as the sum across all iSNVs of
negative ln(frequency) times frequency and then was divided by the total sequence length to yield the
average Shannon entropy.
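The filtering criteria and the entropy calculation can be sketched as follows. Several details are assumptions made for illustration only: strand bias is taken as the ratio of the larger to the smaller strand-specific count of variant-supporting reads, strand counts and frequency are pooled across the two libraries, and the entropy term uses only the variant frequency at each site, as the description above implies. The function names and input layout are hypothetical.

```python
import math


def passes_isnv_filters(libraries, min_freq=0.005, max_strand_ratio=5.0):
    """Apply the library, strand, and frequency filters to one candidate iSNV.

    libraries: one dict per sequencing library with keys 'fwd' and 'rev'
    (variant-supporting reads on each strand) and 'depth' (reads at the site).
    """
    # The variant must be observed in two independent libraries.
    if len(libraries) < 2:
        return False
    if any(lib["fwd"] + lib["rev"] == 0 for lib in libraries):
        return False
    fwd = sum(lib["fwd"] for lib in libraries)
    rev = sum(lib["rev"] for lib in libraries)
    depth = sum(lib["depth"] for lib in libraries)
    # At least one forward and one reverse read must support the variant.
    if fwd == 0 or rev == 0:
        return False
    # Strand bias of 5-fold or less (assumed definition; see the lead-in).
    if max(fwd, rev) / min(fwd, rev) > max_strand_ratio:
        return False
    return (fwd + rev) / depth >= min_freq


def average_shannon_entropy(isnv_frequencies, sequence_length):
    """Sum of -f * ln(f) over all iSNVs, divided by the sequence length."""
    total = -sum(f * math.log(f) for f in isnv_frequencies if f > 0)
    return total / sequence_length
```

The manual inspection of variant positions within reads is not represented here.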
Validation of the iSNV identification method. To validate the method described above for
identifying iSNVs, metagenomic sequencing was performed from RNA from a plasma sample from a
patient with Fiebig stage IV infection in a different cohort. This sample had previously undergone SGA
and 454 sequencing of the 5′ half of the genome, generating a total of 19 sequences, among which 49
iSNVs were identified (2). Metagenomic sequencing was performed as described above to a moderate
depth (45×) in order to compare iSNV detection between the two methods. For each of the 5 iSNVs
identified by SGA but not by metagenomic sequencing, we manually inspected the metagenomic
sequencing reads.
To assess whether the differences between the two methods were more than could be expected
from sampling variance, we applied Fisher’s exact test to each iSNV, including those that failed the
duplicate library and strand bias filters. To better understand the resulting P value distribution, we
simulated the comparison between the two methods. SGA and metagenomic sequencing results were
both modeled as binomial random variates, based on allele frequency. A set of 49 samples was
simulated, with coverage and allele frequencies taken from the observed data (the allele frequency was
estimated as the mean of those measured by the two methods); the coverage for the two metagenomic
libraries was also taken from the data. Metagenomic sequencing had two filters imposed: both alleles
had to be seen in two libraries, and both had to be seen on at least one forward and one reverse read.
Forward and reverse strand assignment was modeled as a binomial random variate with no strand bias.
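The per-site comparison and the simulation can be sketched as follows. This is a simplified illustration, not the analysis code: it implements a two-sided Fisher's exact test from the hypergeometric distribution and draws both methods' variant counts as binomial variates, but it omits the duplicate-library and strand filters that were imposed in the simulation. The allele frequency and depth passed to the simulation below are hypothetical.

```python
import math
import random


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x variant counts in the first row.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as unlikely as the observed one.
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)


def simulate_p_values(sites, n_sga=19, seed=0):
    """Simulate SGA and metagenomic variant counts and test each site.

    sites: list of (allele_frequency, metagenomic_depth) pairs.
    """
    rng = random.Random(seed)
    p_values = []
    for freq, depth in sites:
        sga_alt = sum(rng.random() < freq for _ in range(n_sga))
        ngs_alt = sum(rng.random() < freq for _ in range(depth))
        p_values.append(
            fisher_exact_two_sided(sga_alt, n_sga - sga_alt,
                                   ngs_alt, depth - ngs_alt)
        )
    return p_values


# 49 simulated sites with a hypothetical frequency and depth.
simulated = simulate_p_values([(0.1, 45)] * 49)
```

Comparing the distribution of `simulated` with the observed P values indicates whether the observed discrepancies exceed those expected from sampling variance alone.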
Identification and analysis of complex intrahost variants. In addition to a full-length consensus
genome, viral-ngs also assembled subgenomic contigs using Trinity (version 2011-11-16), a transcriptome
assembly program that can produce alternative contigs often used to identify splice isoforms in eukaryotic
transcripts. This analysis allowed detection of complex variants that would have been missed in the initial
mapping of HIV-1 reads to the consensus genome from each sample, either due to the presence of
insertions/deletions or due to high diversity that would have been excluded by our Novoalign (Novo-
craft) parameters (-r Random -l 40 -g 40 -x 20 -t 1,500 -k). To assess the frequency of each variant
identified, all the sequencing reads were negatively filtered for human and known laboratory contam-
inants, and these reads were aligned independently to the consensus and variant genomes. The variant
frequency was defined as the percentage of reads that mapped to the variant genome unambiguously.
Accession number(s). The HIV-1 consensus genome sequences are available under GenBank acces-
sion numbers MH933704 to MH933714, and all the metagenomic sequencing reads (cleaned of human
reads) are available on NCBI under BioProject PRJNA473698.
SUPPLEMENTAL MATERIAL
Supplemental material for this article may be found at https://doi.org/10.1128/JVI.00804-18.
SUPPLEMENTAL FILE 1, XLS file, 0.1 MB.
SUPPLEMENTAL FILE 2, XLSX file, 0.04 MB.
SUPPLEMENTAL FILE 3, XLSX file, 0.1 MB.
ACKNOWLEDGMENTS
We thank the FRESH study staff and the women who participated in this study. We
thank Morgane Rolland for very helpful comments on the manuscript and Christopher
Tomkins-Tinch for valuable assistance with the viral-ngs analysis pipeline.
This work was supported by National Institute of Allergy and Infectious Diseases
grants R01AI11918 (D.S.K.), U19AI110818 (P.C.S.), and T32 AI007387-26 (A.P.) as well as
the Bill and Melinda Gates Foundation and the Burroughs Wellcome Fund.
T.M.A.’s spouse was an employee of Bristol Myers Squibb (BMS), which has a focus
in virology, specifically, treatments for hepatitis B and C and HIV/AIDS. T.M.A.’s spouse
no longer works for BMS and retained only a small stock interest in the public company.
T.M.A.’s interests were reviewed and managed by Massachusetts General Hospital and
Partners HealthCare, in accordance with their conflict of interest policies.
REFERENCES
1. Carlson JM, Schaefer M, Monaco DC, Batorsky R, Claiborne DT, Prince
J, Deymier MJ, Ende ZS, Klatt NR, DeZiel CE, Lin T-H, Peng J, Seese AM,
Shapiro R, Frater J, Ndung’u T, Tang J, Goepfert P, Gilmour J, Price MA,
Kilembe W, Heckerman D, Goulder PJR, Allen TM, Allen S, Hunter E.
2014. HIV transmission. Selection bias at the heterosexual HIV-1
transmission bottleneck. Science 345:1254031. https://doi.org/10.1126/
science.1254031.
2. Tully DC, Ogilvie CB, Batorsky RE, Bean DJ, Power KA, Ghebremichael M,
Bedard HE, Gladden AD, Seese AM, Amero MA, Lane K, McGrath G,
Bazner SB, Tinsley J, Lennon NJ, Henn MR, Brumme ZL, Norris PJ,
Rosenberg ES, Mayer KH, Jessen H, Kosakovsky Pond SL, Walker BD,
Altfeld M, Carlson JM, Allen TM. 2016. Differences in the selection
bottleneck between modes of sexual transmission influence the genetic
composition of the HIV-1 founder virus. PLoS Pathog 12:e1005619.
https://doi.org/10.1371/journal.ppat.1005619.
3. Deymier MJ, Ende Z, Fenton-May AE, Dilernia DA, Kilembe W, Allen SA,
Borrow P, Hunter E. 2015. Heterosexual transmission of subtype C HIV-1
selects consensus-like variants without increased replicative capacity or
interferon-α resistance. PLoS Pathog 11:e1005154. https://doi.org/10.1371/journal.ppat.1005154.
4. Janes H, Herbeck JT, Tovanabutra S, Thomas R, Frahm N, Duerr A, Hural
J, Corey L, Self SG, Buchbinder SP, McElrath MJ, O’Connell RJ, Paris RM,
Rerks-Ngarm S, Nitayaphan S, Pitisuttihum P, Kaewkungwal J, Robb ML,
Michael NL, Mullins JI, Kim JH, Gilbert PB, Rolland M. 2015. HIV-1
infections with multiple founders are associated with higher viral loads
than infections with single founders. Nat Med 21:1139–1141. https://doi.org/10.1038/nm.3932.
5. Keele BF, Giorgi EE, Salazar-Gonzalez JF, Decker JM, Pham KT, Salazar
MG, Sun C, Grayson T, Wang S, Li H, Wei X, Jiang C, Kirchherr JL, Gao F,
Anderson JA, Ping L-H, Swanstrom R, Tomaras GD, Blattner WA, Goepfert
PA, Kilby JM, Saag MS, Delwart EL, Busch MP, Cohen MS, Montefiori DC,
Haynes BF, Gaschen B, Athreya GS, Lee HY, Wood N, Seoighe C, Perelson
AS, Bhattacharya T, Korber BT, Hahn BH, Shaw GM. 2008. Identification
and characterization of transmitted and early founder virus envelopes in
primary HIV-1 infection. Proc Natl Acad Sci U S A 105:7552–7557. https://
doi.org/10.1073/pnas.0802203105.
6. Zhu T, Mo H, Wang N, Nam DS, Cao Y, Koup RA, Ho DD. 1993. Genotypic
and phenotypic characterization of HIV-1 in patients with primary infection.
Science 261:1179–1181. https://doi.org/10.1126/science.8356453.
7. Herbeck JT, Rolland M, Liu Y, McLaughlin S, McNevin J, Zhao H, Wong K,
Stoddard JN, Raugi D, Sorensen S, Genowati I, Birditt B, McKay A, Diem
K, Maust BS, Deng W, Collier AC, Stekler JD, McElrath MJ, Mullins JI. 2011.
Demographic processes affect HIV-1 evolution in primary infection be-
fore the onset of selective processes. J Virol 85:7523–7534. https://doi
.org/10.1128/JVI.02697-10.
8. Long EM, Martin HL, Kreiss JK, Rainwater SM, Lavreys L, Jackson DJ,
Rakwar J, Mandaliya K, Overbaugh J. 2000. Gender differences in HIV-1
diversity at time of infection. Nat Med 6:71–75. https://doi.org/10.1038/
71563.
9. Poss M, Martin HL, Kreiss JK, Granville L, Chohan B, Nyange P, Mandaliya
K, Overbaugh J. 1995. Diversity in virus populations from genital secre-
tions and peripheral blood from women recently infected with human
immunodeficiency virus type 1. J Virol 69:8118–8122.
10. Klein K, Nickel G, Nankya I, Kyeyune F, Demers K, Ndashimye E, Kwok C,
Chen P, Rwambuya S, Poon A, Munjoma M, Chipato T, Byamugisha J,
Mugyenyi P, Salata RA, Morrison CS, Arts EJ. 2018. Higher sequence
diversity in the vaginal tract than in blood at early HIV-1 infection. PLoS
Pathog 14:e1006754. https://doi.org/10.1371/journal.ppat.1006754.
11. Joseph SB, Swanstrom R, Kashuba ADM, Cohen MS. 2015. Bottlenecks in
HIV-1 transmission: insights from the study of founder viruses. Nat Rev
Microbiol 13:414–425. https://doi.org/10.1038/nrmicro3471.
12. Sagar M, Lavreys L, Baeten JM, Richardson B, Mandaliya K, Ndinya-Achola
JO, Kreiss JK, Overbaugh J. 2004. Identification of modifiable factors that
affect the genetic diversity of the transmitted HIV-1 population. AIDS
18:615–619. https://doi.org/10.1097/00002030-200403050-00005.
13. Haaland RE, Hawkins PA, Salazar-Gonzalez J, Johnson A, Tichacek A,
Karita E, Manigart O, Mulenga J, Keele BF, Shaw GM, Hahn BH, Allen SA,
Derdeyn CA, Hunter E. 2009. Inflammatory genital infections mitigate a
severe genetic bottleneck in heterosexual transmission of subtype A and
C HIV-1. PLoS Pathog 5:e1000274. https://doi.org/10.1371/journal.ppat
.1000274.
14. Boeras DI, Hraber PT, Hurlston M, Evans-Strickfaden T, Bhattacharya T,
Giorgi EE, Mulenga J, Karita E, Korber BT, Allen S, Hart CE, Derdeyn CA,
Hunter E. 2011. Role of donor genital tract HIV-1 diversity in the transmission
bottleneck. Proc Natl Acad Sci U S A 108:E1156–E1163. https://doi.org/10.1073/pnas.1103764108.
15. Bull ME, Heath LM, McKernan-Mullin JL, Kraft KM, Acevedo L, Hitti JE,
Cohn SE, Tapia KA, Holte SE, Dragavon JA, Coombs RW, Mullins JI,
Frenkel LM. 2013. Human immunodeficiency viruses appear compart-
mentalized to the female genital tract in cross-sectional analyses but
genital lineages do not persist over time. J Infect Dis 207:1206–1215.
https://doi.org/10.1093/infdis/jit016.
16. Poss M, Rodrigo AG, Gosink JJ, Learn GH, de Vange Panteleeff D,
Martin HL, Bwayo J, Kreiss JK, Overbaugh J. 1998. Evolution of
envelope sequences from the genital tract and peripheral blood of
women infected with clade A human immunodeficiency virus type 1.
J Virol 72:8240–8251.
17. Ndhlovu ZM, Kamya P, Mewalal N, Kløverpris HN, Nkosi T, Pretorius K,
Laher F, Ogunshola F, Chopera D, Shekhar K, Ghebremichael M, Ismail N,
Moodley A, Malik A, Leslie A, Goulder PJR, Buus S, Chakraborty A, Dong
K, Ndung’u T, Walker BD. 2015. Magnitude and kinetics of CD8+ T cell
activation during hyperacute HIV infection impact viral set point. Immunity
43:591–604. https://doi.org/10.1016/j.immuni.2015.08.012.
18. Anahtar MN, Byrne EH, Doherty KE, Bowman BA, Yamamoto HS,
Soumillon M, Padavattan N, Ismail N, Moodley A, Sabatini ME, Ghe-
bremichael MS, Nusbaum C, Huttenhower C, Virgin HW, Ndung’u T,
Dong KL, Walker BD, Fichorova RN, Kwon DS. 2015. Cervicovaginal
bacteria are a major modulator of host inflammatory responses in the
female genital tract. Immunity 42:965–976. https://doi.org/10.1016/j
.immuni.2015.04.019.
19. Dong KL, Moodley A, Kwon DS, Ghebremichael MS, Dong M, Ismail N,
Ndhlovu ZM, Mabuka JM, Muema DM, Pretorius K, Lin N, Walker BD,
Ndung’u T. 2018. Detection and treatment of Fiebig stage I HIV-1
infection in young at-risk women in South Africa: a prospective
cohort study. Lancet HIV 5:e35–e44. https://doi.org/10.1016/S2352-3018(17)30146-7.
20. Gosmann C, Anahtar MN, Handley SA, Farcasanu M, Abu-Ali G, Bowman
BA, Padavattan N, Desai C, Droit L, Moodley A, Dong M, Chen Y, Ismail
N, Ndung’u T, Ghebremichael MS, Wesemann DR, Mitchell C, Dong KL,
Huttenhower C, Walker BD, Virgin HW, Kwon DS. 2017. Lactobacillus-
deficient cervicovaginal bacterial communities are associated with in-
creased HIV acquisition in young South African women. Immunity 46:
29–37. https://doi.org/10.1016/j.immuni.2016.12.013.
21. Hansen KD, Brenner SE, Dudoit S. 2010. Biases in Illumina transcriptome
sequencing caused by random hexamer priming. Nucleic Acids Res
38:e131. https://doi.org/10.1093/nar/gkq224.
22. Yang X, Charlebois P, Macalalad A, Henn MR, Zody MC. 2013. V-Phaser 2:
variant inference for viral populations. BMC Genomics 14:674. https://
doi.org/10.1186/1471-2164-14-674.
23. Sharma S, Aralaguppe SG, Abrahams M-R, Williamson C, Gray C,
Balakrishnan P, Saravanan S, Murugavel KG, Solomon S, Ranga U. 2017.
The PTAP sequence duplication in HIV-1 subtype C Gag p6 in drug-naive
subjects of India and South Africa. BMC Infect Dis 17:95. https://doi.org/
10.1186/s12879-017-2184-4.
24. Martins AN, Waheed AA, Ablan SD, Huang W, Newton A, Petropoulos CJ,
Brindeiro RDM, Freed EO. 2016. Elucidation of the molecular mechanism
driving duplication of the HIV-1 PTAP late domain. J Virol 90:768–779.
https://doi.org/10.1128/JVI.01640-15.
25. Stone M, Keele BF, Ma Z-M, Bailes E, Dutra J, Hahn BH, Shaw GM, Miller
CJ. 2010. A limited number of simian immunodeficiency virus (SIV) env
variants are transmitted to rhesus macaques vaginally inoculated with
SIVmac251. J Virol 84:7083–7095. https://doi.org/10.1128/JVI.00481-10.
26. Tsai L, Tasovski I, Leda A, Chin MP, Cheng-Mayer C. 2014. The number
and genetic relatedness of transmitted/founder virus impact clinical
outcome in vaginal R5 SHIVSF162P3N infection. Retrovirology 11:22.
https://doi.org/10.1186/1742-4690-11-22.
27. Enose Y, Ibukil K, Shimada T, Ui M, Hayami M. 1998. Genomic analysis of
the viral population in genital secretions early after infection of simian
immunodeficiency viruses in macaque monkeys. Microbiol Immunol
42:715–722. https://doi.org/10.1111/j.1348-0421.1998.tb02344.x.
28. Luk K-C, Berg MG, Naccache SN, Kabre B, Federman S, Mbanya D, Kaptué
L, Chiu CY, Brennan CA, Hackett J. 2015. Utility of metagenomic next-
generation sequencing for characterization of HIV and human pegivirus
diversity. PLoS One 10:e0141723. https://doi.org/10.1371/journal.pone
.0141723.
29. Manso CF, Bibby DF, Mbisa JL. 2017. Efficient and unbiased metag-
enomic recovery of RNA virus genomes from human plasma samples. Sci
Rep 7:4173. https://doi.org/10.1038/s41598-017-02239-5.
30. Jabara CB, Jones CD, Roach J, Anderson JA, Swanstrom R. 2011. Accurate
sampling and deep sequencing of the HIV-1 protease gene using a
Primer ID. Proc Natl Acad Sci U S A 108:20166–20171. https://doi.org/10.1073/pnas.1110064108.
31. Gire SK, Goba A, Andersen KG, Sealfon RSG, Park DJ, Kanneh L, Jalloh S,
Momoh M, Fullah M, Dudas G, Wohl S, Moses LM, Yozwiak NL, Winnicki
S, Matranga CB, Malboeuf CM, Qu J, Gladden AD, Schaffner SF, Yang X,
Jiang P, Nekoui M, Colubri A, Coomber MR, Fonnie M, Moigboi A, Gbakie
M, Kamara FK, Tucker V, Konuwa E, Saffa S, Sellu J, Jalloh AA, Kovoma A,
Koninga J, Mustapha I, Kargbo K, Foday M, Yillah M, Kanneh F, Robert W,
Massally JLB, Chapman SB, Bochicchio J, Murphy C, Nusbaum C, Young
S, Birren BW, Grant DS, Scheiffelin JS, Lander ES, Happi C, Gevao SM,
Gnirke A, Rambaut A, Garry RF, Khan SH, Sabeti PC. 2014. Genomic
surveillance elucidates Ebola virus origin and transmission during the
2014 outbreak. Science 345:1369–1372. https://doi.org/10.1126/science.1259657.
32. Matranga CB, Andersen KG, Winnicki S, Busby M, Gladden AD, Tewhey R,
Stremlau M, Berlin A, Gire SK, England E, Moses LM, Mikkelsen TS, Odia
I, Ehiane PE, Folarin O, Goba A, Kahn S, Grant DS, Honko A, Hensley L,
Happi C, Garry RF, Malboeuf CM, Birren BW, Gnirke A, Levin JZ, Sabeti PC.
2014. Enhanced methods for unbiased deep sequencing of Lassa and
Ebola RNA viruses from clinical and biological samples. Genome Biol
15:519. https://doi.org/10.1186/PREACCEPT-1698056557139770.
33. Salter SJ, Cox MJ, Turek EM, Calus ST, Cookson WO, Moffatt MF, Turner
P, Parkhill J, Loman NJ, Walker AW. 2014. Reagent and laboratory
contamination can critically impact sequence-based microbiome analy-
ses. BMC Biol 12:1–12.
34. Laurence M, Hatzis C, Brash DE. 2014. Common contaminants in
next-generation sequencing that hinder discovery of low-abundance
microbes. PLoS One 9:e97876. https://doi.org/10.1371/journal.pone
.0097876.
35. Piantadosi A, Kanjilal S, Ganesh V, Khanna A, Hyle EP, Rosand J, Bold T,
Metsky HC, Lemieux J, Leone MJ, Freimark L, Matranga CB, Adams G,
McGrath G, Zamirpour S, Telford S, Rosenberg E, Cho T, Frosch MP,
Goldberg MB, Mukerji SS, Sabeti PC. 2018. Rapid detection of Powassan
virus in a patient with encephalitis by metagenomic sequencing. Clin
Infect Dis 66:789–792. https://doi.org/10.1093/cid/cix792.
36. Kim D, Hofstaedter CE, Zhao C, Mattei L, Tanes C, Clarke E, Lauder A,
Sherrill-Mix S, Chehoud C, Kelsen J, Conrad M, Collman RG, Baldassano
R, Bushman FD, Bittinger K. 2017. Optimizing methods and dodging
pitfalls in microbiome research. Microbiome 5:52. https://doi.org/10
.1186/s40168-017-0267-5.
37. Wilson MR, O’Donovan BD, Gelfand JM, Sample HA, Chow FC, Betjemann
JP, Shah MP, Richie MB, Gorman MP, Hajj-Ali RA, Calabrese LH, Zorn KC,
Chow ED, Greenlee JE, Blum JH, Green G, Khan LM, Banerji D, Langelier
C, Bryson-Cahn C, Harrington W, Lingappa JR, Shanbhag NM, Green AJ,
Brew BJ, Soldatos A, Strnad L, Doernberg SB, Jay CA, Douglas V, Josephson SA,
DeRisi JL. 2018. Chronic meningitis investigated via metagenomic
next-generation sequencing. JAMA Neurol 75:947–955.
https://doi.org/10.1001/jamaneurol.2018.0463.
38. Park DJ, Dudas G, Wohl S, Goba A, Whitmer SLM, Andersen KG, Sealfon
RS, Ladner JT, Kugelman JR, Matranga CB, Winnicki SM, Qu J, Gire SK,
Gladden-Young A, Jalloh S, Nosamiefan D, Yozwiak NL, Moses LM, Jiang
P-P, Lin AE, Schaffner SF, Bird B, Towner J, Mamoh M, Gbakie M, Kanneh
L, Kargbo D, Massally JLB, Kamara FK, Konuwa E, Sellu J, Jalloh AA,
Mustapha I, Foday M, Yillah M, Erickson BR, Sealy T, Blau D, Paddock C,
Brault A, Amman B, Basile J, Bearden S, Belser J, Bergeron E, Campbell S,
Chakrabarti A, Dodd K, Flint M, Gibbons A, Goodman C, Klena J, McMul-
lan L, Morgan L, Russell B, Salzer J, Sanchez A, Wang D, Jungreis I,
Tomkins-Tinch C, Kislyuk A, Lin MF, Chapman S, MacInnis B, Matthews A,
Bochicchio J, Hensley LE, Kuhn JH, Nusbaum C, Schieffelin JS, Birren BW,
Forget M, Nichol ST, Palacios GF, Ndiaye D, Happi C, Gevao SM, Vandi
MA, Kargbo B, Holmes EC, Bedford T, Gnirke A, Ströher U, Rambaut A,
Garry RF, Sabeti PC. 2015. Ebola virus epidemiology, transmission, and
evolution during seven months in Sierra Leone. Cell 161:1516–1526.
https://doi.org/10.1016/j.cell.2015.06.007.
39. Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O.
2010. New algorithms and methods to estimate maximum-likelihood
phylogenies: assessing the performance of PhyML 3.0. Syst Biol
59:307–321. https://doi.org/10.1093/sysbio/syq010.
40. McCrone JT, Lauring AS. 2016. Measurements of intrahost viral diversity
are extremely sensitive to systematic errors in variant calling. J Virol
90:6884–6895. https://doi.org/10.1128/JVI.00667-16.
41. Fiebig EW, Wright DJ, Rawal BD, Garrett PE, Schumacher RT, Peddada L,
Heldebrant C, Smith R, Conrad A, Kleinman SH, Busch MP. 2003. Dynam-
ics of HIV viremia and antibody seroconversion in plasma donors: im-
plications for diagnosis and staging of primary HIV infection. AIDS
17:1871–1879. https://doi.org/10.1097/01.aids.0000076308.76477.b8.
42. Li H. 2013. Aligning sequence reads, clone sequences and assembly
contigs with BWA-MEM. arXiv:1303.3997 [q-bio].
43. Wood DE, Salzberg SL. 2014. Kraken: ultrafast metagenomic sequence
classification using exact alignments. Genome Biol 15:R46. https://doi
.org/10.1186/gb-2014-15-3-r46.
44. Buchfink B, Xie C, Huson DH. 2015. Fast and sensitive protein alignment
using DIAMOND. Nat Methods 12:59–60. https://doi.org/10.1038/nmeth
.3176.
45. Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J,
Glöckner FO. 2013. The SILVA ribosomal RNA gene database project:
improved data processing and web-based tools. Nucleic Acids Res 41:
D590–D596. https://doi.org/10.1093/nar/gks1219.
46. Brister JR, Ako-adjei D, Bao Y, Blinkova O. 2015. NCBI viral genomes
resource. Nucleic Acids Res 43:D571–D577. https://doi.org/10.1093/nar/
gku1207.
47. Aurrecoechea C, Brestelli J, Brunk BP, Dommer J, Fischer S, Gajria B, Gao
X, Gingle A, Grant G, Harb OS, Heiges M, Innamorato F, Iodice J, Kissinger
JC, Kraemer E, Li W, Miller JA, Nayak V, Pennington C, Pinney DF, Roos
DS, Ross C, Stoeckert CJ, Treatman C, Wang H. 2009. PlasmoDB: a
functional genomic database for malaria parasites. Nucleic Acids Res
37:D539–D543. https://doi.org/10.1093/nar/gkn814.
48. O’Leary NA, Wright MW, Brister JR, Ciufo S, Haddad D, McVeigh R, Rajput
B, Robbertse B, Smith-White B, Ako-Adjei D, Astashyn A, Badretdin A, Bao
Y, Blinkova O, Brover V, Chetvernin V, Choi J, Cox E, Ermolaeva O, Farrell
CM, Goldfarb T, Gupta T, Haft D, Hatcher E, Hlavina W, Joardar VS, Kodali
VK, Li W, Maglott D, Masterson P, McGarvey KM, Murphy MR, O’Neill
K, Pujar S, Rangwala SH, Rausch D, Riddick LD, Schoch C, Shkeda A,
Storz SS, Sun H, Thibaud-Nissen F, Tolstoy I, Tully RE, Vatsan AR,
Wallin C, Webb D, Wu W, Landrum MJ, Kimchi A, Tatusova T, DiCuccio
M, Kitts P, Murphy TD, Pruitt KD. 2016. Reference sequence (RefSeq)
database at NCBI: current status, taxonomic expansion, and func-
tional annotation. Nucleic Acids Res 44:D733–D745. https://doi.org/
10.1093/nar/gkv1189.
Piantadosi et al. Journal of Virology
January 2019 Volume 93 Issue 2 e00804-18 jvi.asm.org 16
... Following primary infection, a population of viral variants, or quasispecies, becomes established in the host (Domingo and Perales, 2018). Historically, the viral population within an HIV-1-infected individual has been represented by the sequences obtained from single genome sequencing (SGS) or the consensus sequence obtained from ultra-deep sequencing (UDS) (Piantadosi et al., 2019;Wymant et al., 2018), which may not show the less frequent variations that occur at each nucleotide site (Capoferri et al., 2019;Lorenzo-Redondo et al., 2017;Wagner et al., 2013;Wagner et al., 2014). ...
... The iSNVs analysis which differed from the previous method, used counts and relative frequencies of iSNVs to indicate the intra-host viral diversity. In recent years, iSNVs have been widely used in studies of RNA viral quasispecies, such as Influenza virus (Gorman et al., 2016), Norovirus, Ebola virus (Ni et al., 2016), Dengue virus (Sim et al., 2015), Yellow Fever virus (Chen et al., 2018), and HIV (Piantadosi et al., 2019). iSNVs analysis can not J o u r n a l P r e -p r o o f only reflect intra-host dynamic changes at different infection stages, but also capture the differences between in plasma RNA and in cellular DNA. ...
Article
Full-text available
Objectives HIV quasispecies diversity presents a large barrier to eradicating HIV. The aim was to study the intra-host HIV quasispecies diversity and evolutionary patterns founder standing the mechanisms of viral pathogenesis during anti-retroviral therapy (ART). Methods Forty-five participants infected with HIV-1 were enrolled for more than 84 months’ follow-up cohort in 2004 and received a lamivudine-based first-line ART regimen. The blood samples were collected every six months for measurement of viral load and CD4 count. We used ultra-deep sequencing and phylogenetic analysis to characterize the dynamics governing quasispecies diversity of HIV-1 circulating between plasma RNA and cellular DNA of ART participants with either treatment failure (TF, n = 20) or virologic suppression (VS, n = 25). Results Analysis of the distribution of intra-host single-nucleotide variations (iSNVs), as well as their mutated allele frequencies revealed that ∼65% of the quasispecies co-occurred in plasma HIV RNA and cellular DNA either before or after ART. The number and frequency of iSNVs are more representative of intra-host HIV diversity and has better generalizability than the phylogenetic inference by measurements of phylogenetic associations. Furthermore, drug resistance-associated mutations (DRAMs) accumulated to high levels, dramatically increasing the DRAM-to-total-mutation ratio for TF patients. Linear regression analysis revealed that the emergent mutations accumulated faster in TF than in VS participants, at a rate of 0.02 mutations/day/kb. Conclusions Based on iSNVs analysis, the results from this study demonstrate the dynamics of intra-host HIV quasispecies diversity under ART and provided a novel insight on understanding HIV persistence and DRAMs development.
... Studies on the quasispecies diversity of other RNA viruses have found more abundant intra-host genetic variations. For example, metatranscriptomic sequencing results of HIV-1 in the blood and female genital tract have identified 77 iSNVs in an individual [74]. Research on intra-host dynamics of the Ebola virus during 2014 identified 710 iSNVs in 135 EBOV samples [56]. ...
Article
Full-text available
New SARS-CoV-2 mutants have been continuously indentified with enhanced transmission ever since its outbreak in early 2020. As an RNA virus, SARS-CoV-2 has a high mutation rate due to the low fidelity of RNA polymerase. To study the single nucleotide polymorphisms (SNPs) dynamics of SARS-CoV-2, 158 SNPs with high confidence were identified by deep meta-transcriptomic sequencing, and the most common SNP type was C > T. Analyses of intra-host population diversity revealed that intra-host quasispecies’ composition varies with time during the early onset of symptoms, which implicates viral evolution during infection. Network analysis of co-occurring SNPs revealed the most abundant non-synonymous SNP 22,638 in the S glycoprotein RBD region and 28,144 in the ORF8 region. Furthermore, SARS-CoV-2 variations differ in an individual’s respiratory tissue (nose, throat, BALF, or sputum), suggesting independent compartmentalization of SARS-CoV-2 populations in patients. The positive selection analysis of the SARS-CoV-2 genome uncovered the positive selected amino acid G251V on ORF3a. Alternative allele frequency spectrum (AAFS) of all variants revealed that ORF8 could bear alternate alleles with high frequency. Overall, the results show the quasispecies’ profile of SARS-CoV-2 in the respiratory tract in the first two months after the outbreak.
... RNA was extracted from the primary sample, and repeat SARS-CoV-2 testing was performed by triplex RT-PCR (2). Samples underwent RNA mNGS as previously described (3,4). Briefly, methods included DNase treatment, random primer cDNA synthesis, and Nextera XT tagmentation. ...
Article
Full-text available
Broad testing for respiratory viruses among persons under investigation (PUIs) for SARS-CoV-2 has been performed inconsistently, limiting our understanding of alternative viral infections and co-infections in these patients. RNA metagenomic next-generation sequencing (mNGS) offers an agnostic tool for the detection of both SARS-CoV-2 and other RNA respiratory viruses in PUIs. Herein, we used RNA mNGS to assess the frequencies of alternative viral infections in SARS-CoV-2 RT-PCR negative PUIs (n=30) and viral co-infections in SARS-CoV-2 RT-PCR positive PUIs (n=45). mNGS identified all viruses detected by routine clinical testing (Influenza A (N=3), Human metapneumovirus (N=2), Human coronavirus OC43 (N=2) and Human coronavirus HKU1(N=1)). mNGS also identified both co-infections (1, 2.2%) and alternative viral infections (4, 13.3%) that were not detected by routine clinical workup (Respiratory syncytial virus (N=3), Human metapneumovirus (N=1), Human coronavirus NL63 (N=1)). Among SARS-CoV-2 RT-PCR positive PUIs, lower cycle threshold (C T ) values correlated with greater SARS-CoV-2 read recovery by mNGS (R ² : 0.65, p -value: <0.001). Our results suggest that current broad-spectrum molecular testing algorithms identify most respiratory viral infections among SARS-CoV-2 PUIs, when available and implemented consistently.
... However, other works have supported other conflicting ideas with findings showing absence of compartmentalization of HIV-1 between the gut and blood (Avettand-Fenoel et al., 2011;Imamichi et al., 2011;Evering et al., 2012), providing evidence for cross infection between these two compartments (Chun et al., 2008). HIV-1 sequence diversity has been reported to be either higher (Klein et al., 2018) or similar (Piantadosi et al., 2019) to genital tract compared to blood. Viral compartmentalization between the blood and the male genital tract has been reported by multiple studies including SIV-infected macaques (Delwart et al., 1998;Paranjpe et al., 2002;Pillai et al., 2005;Coombs et al., 2006;Diem et al., 2008;Houzet et al., 2018). ...
Article
Full-text available
One of the most explored therapeutic approaches aimed at eradicating HIV-1 reservoirs is the “shock and kill” strategy which is based on HIV-1 reactivation in latently-infected cells (“shock” phase) while maintaining antiretroviral therapy (ART) in order to prevent spreading of the infection by the neosynthesized virus. This kind of strategy allows for the “kill” phase, during which latently-infected cells die from viral cytopathic effects or from host cytolytic effector mechanisms following viral reactivation. Several latency reversing agents (LRAs) with distinct mechanistic classes have been characterized to reactivate HIV-1 viral gene expression. Some LRAs have been tested in terms of their potential to purge latent HIV-1 in vivo in clinical trials, showing that reversing HIV-1 latency is possible. However, LRAs alone have failed to reduce the size of the viral reservoirs. Together with the inability of the immune system to clear the LRA-activated reservoirs and the lack of specificity of these LRAs, the heterogeneity of the reservoirs largely contributes to the limited success of clinical trials using LRAs. Indeed, HIV-1 latency is established in numerous cell types that are characterized by distinct phenotypes and metabolic properties, and these are influenced by patient history. Hence, the silencing mechanisms of HIV-1 gene expression in these cellular and tissue reservoirs need to be better understood to rationally improve this cure strategy and hopefully reach clinical success.
Article
Full-text available
Background The role of the human microbiome in health and disease is an emerging and important area of research; however, there is a concern that African populations are under-represented in human microbiome studies. We, therefore, conducted a systematic survey of African human microbiome studies to provide an overview and identify research gaps. Our secondary objectives were: (i) to determine the number of peer-reviewed publications; (ii) to identify the extent to which the researches focused on diseases identified by the World Health Organization [WHO] State of Health in the African Region Report as being the leading causes of morbidity and mortality in 2018; (iii) to describe the extent and pattern of collaborations between researchers in Africa and the rest of the world; and (iv) to identify leadership and funders of the studies. Methodology We systematically searched Medline via PubMed, Scopus, CINAHL, Academic Search Premier, Africa-Wide Information through EBSCOhost, and Web of Science from inception through to 1st April 2020. We included studies that characterized samples from African populations using next-generation sequencing approaches. Two reviewers independently conducted the literature search, title and abstract, and full-text screening, as well as data extraction. Results We included 168 studies out of 5515 records retrieved. Most studies were published in PLoS One (13%; 22/168), and samples were collected from 33 of the 54 African countries. The country where most studies were conducted was South Africa (27/168), followed by Kenya (23/168) and Uganda (18/168). 26.8% (45/168) focused on diseases of significant public health concern in Africa. Collaboration between scientists from the United States of America and Africa was most common (96/168). The first and/or last authors of 79.8% of studies were not affiliated with institutions in Africa. 
Major funders were the United States of America National Institutes of Health (45.2%; 76/168), Bill and Melinda Gates Foundation (17.8%; 30/168), and the European Union (11.9%; 20/168). Conclusions There are significant gaps in microbiome research in Africa, especially those focusing on diseases of public health importance. There is a need for local leadership, capacity building, intra-continental collaboration, and national government investment in microbiome research within Africa.
Chapter
Clinical metagenomics enables universal pathogen detection by unbiased next-generation sequencing. Recent advances in sequencing and bioinformatics technology have paved the way for increasing adoption of precision diagnostic tests based on metagenomic sequencing. In patients with infectious syndromes that can have many causes (e.g. bloodstream infections, bone and joint infections, meningitis/encephalitis, ocular infections, gastroenteritis, and respiratory tract infections), clinical metagenomics enables comprehensive detection of pathogens and profiling of the local microbiota. In addition, the genetic makeup of pathogens can be characterized in detail informing treatment decisions, infection prevention efforts, and pathogen surveillance. This chapter summarizes technological advances that have enabled integration of clinical metagenomic tests in diagnostic algorithms and discusses remaining challenges that need to be addressed to realize the full potential of this powerful technology.
Article
Purpose of review: Although HIV-1 diversity is a critical barrier to HIV-1 vaccine development, implementing vaccine strategies that directly address HIV-1 genetic specificities has been challenging. Here, we discuss the intersection between HIV-1 phylogenetics and vaccine development. Recent findings: We describe the vaccine regimens that are currently tested in two vaccine efficacy trials and recent research highlighting HIV-1 genetic features that were associated with the development of broadly neutralizing antibodies. Summary: Compared with how widely HIV-1 diversity is recognized as a critical issue for vaccine research, relatively few genetically informed vaccine solutions have been compared, in part because the lack of correlates of protection against HIV-1 limits the ability to develop and test multiple vaccine candidates in a fully rational manner. Yet, recent findings have provided a better understanding of the viral features associated with the development of broad and potent neutralizing antibodies, offering new avenues for engineering vaccine candidates. Future research should also plan to address potential consequences associated with the rollout of an efficacious vaccine, including the possibility of vaccine resistance spreading in the population.
Article
Full-text available
Importance Identifying infectious causes of subacute or chronic meningitis can be challenging. Enhanced, unbiased diagnostic approaches are needed. Objective To present a case series of patients with diagnostically challenging subacute or chronic meningitis using metagenomic next-generation sequencing (mNGS) of cerebrospinal fluid (CSF) supported by a statistical framework generated from mNGS of control samples from the environment and from patients who were noninfectious. Design, Setting, and Participants In this case series, mNGS data obtained from the CSF of 94 patients with noninfectious neuroinflammatory disorders and from 24 water and reagent control samples were used to develop and implement a weighted scoring metric based on z scores at the species and genus levels for both nucleotide and protein alignments to prioritize and rank the mNGS results. Total RNA was extracted for mNGS from the CSF of 7 participants with subacute or chronic meningitis who were recruited between September 2013 and March 2017 as part of a multicenter study of mNGS pathogen discovery among patients with suspected neuroinflammatory conditions. The neurologic infections identified by mNGS in these 7 participants represented a diverse array of pathogens. The patients were referred from the University of California, San Francisco Medical Center (n = 2), Zuckerberg San Francisco General Hospital and Trauma Center (n = 2), Cleveland Clinic (n = 1), University of Washington (n = 1), and Kaiser Permanente (n = 1). A weighted z score was used to filter out environmental contaminants and facilitate efficient data triage and analysis. Main Outcomes and Measures Pathogens identified by mNGS and the ability of a statistical model to prioritize, rank, and simplify mNGS results. Results The 7 participants ranged in age from 10 to 55 years, and 3 (43%) were female. 
A parasitic worm (Taenia solium, in 2 participants), a virus (HIV-1), and 4 fungi (Cryptococcus neoformans, Aspergillus oryzae, Histoplasma capsulatum, and Candida dubliniensis) were identified among the 7 participants by using mNGS. Evaluating mNGS data with a weighted z score–based scoring algorithm reduced the reported microbial taxa by a mean of 87% (range, 41%-99%) when taxa with a combined score of 0 or less were removed, effectively separating bona fide pathogen sequences from spurious environmental sequences so that, in each case, the causative pathogen was found within the top 2 scoring microbes identified using the algorithm. Conclusions and Relevance Diverse microbial pathogens were identified by mNGS in the CSF of patients with diagnostically challenging subacute or chronic meningitis, including a case of subarachnoid neurocysticercosis that defied diagnosis for 1 year, the first reported case of CNS vasculitis caused by Aspergillus oryzae, and the fourth reported case of C dubliniensis meningitis. Prioritizing metagenomic data with a scoring algorithm greatly clarified data interpretation and highlighted the problem of attributing biological significance to organisms present in control samples used for metagenomic sequencing studies.
Article
Full-text available
In the majority of cases, human immunodeficiency virus type 1 (HIV-1) infection is transmitted through sexual intercourse. A single founder virus in the blood of the newly infected donor emerges from a genetic bottleneck, while in rarer instances multiple viruses are responsible for systemic infection. We sought to characterize the sequence diversity at early infection, between two distinct anatomical sites; the female reproductive tract vs. systemic compartment. We recruited 72 women from Uganda and Zimbabwe within seven months of HIV-1 infection. Using next generation deep sequencing, we analyzed the total genetic diversity within the C2-V3-C3 envelope region of HIV-1 isolated from the female genital tract at early infection and compared this to the diversity of HIV-1 in plasma. We then compared intra-patient viral diversity in matched cervical and blood samples with three or seven months post infection. Genetic analysis of the C2-V3-C3 region of HIV-1 env revealed that early HIV-1 isolates within blood displayed a more homogeneous genotype (mean 1.67 clones, range 1–5 clones) than clones in the female genital tract (mean 5.7 clones, range 3–10 clones) (p<0.0001). The higher env diversity observed within the genital tract compared to plasma was independent of HIV-1 subtype (A, C and D). Our analysis of early mucosal infections in women revealed high HIV-1 diversity in the vaginal tract but few transmitted clones in the blood. These novel in vivo finding suggest a possible mucosal sieve effect, leading to the establishment of a homogenous systemic infection.
Article
Full-text available
We describe a patient with severe and progressive encephalitis of unknown etiology. We performed rapid metagenomic sequencing from cerebrospinal fluid and identified Powassan virus, an emerging tick-borne flavivirus that has been increasingly detected in the United States.
Article
Full-text available
RNA viruses cause significant human pathology and are responsible for the majority of emerging zoonoses. Mainstream diagnostic assays are challenged by their intrinsic diversity, leading to false negatives and incomplete characterisation. New sequencing techniques are expanding our ability to agnostically interrogate nucleic acids within diverse sample types, but in the clinical setting are limited by overwhelming host material and ultra-low target frequency. Through selective host RNA depletion and compensatory protocol adjustments for ultra-low RNA inputs, we are able to detect three major blood-borne RNA viruses - HIV, HCV and HEV. We recovered complete genomes and up to 43% of the genome from samples with viral loads of 104 and 103 IU/ml respectively. Additionally, we demonstrated the utility of this method in detecting and characterising members of diverse RNA virus families within a human plasma background, some present at very low levels. By applying this method to a patient sample series, we have simultaneously determined the full genome of both a novel subtype of HCV genotype 6, and a co-infecting human pegivirus. This method builds upon earlier RNA metagenomic techniques and can play an important role in the surveillance and diagnostics of blood-borne viruses
Article
Full-text available
Research on the human microbiome has yielded numerous insights into health and disease, but also has resulted in a wealth of experimental artifacts. Here, we present suggestions for optimizing experimental design and avoiding known pitfalls, organized in the typical order in which studies are carried out. We first review best practices in experimental design and introduce common confounders such as age, diet, antibiotic use, pet ownership, longitudinal instability, and microbial sharing during cohousing in animal studies. Typically, samples will need to be stored, so we provide data on best practices for several sample types. We then discuss design and analysis of positive and negative controls, which should always be run with experimental samples. We introduce a convenient set of non-biological DNA sequences that can be useful as positive controls for high-volume analysis. Careful analysis of negative and positive controls is particularly important in studies of samples with low microbial biomass, where contamination can comprise most or all of a sample. Lastly, we summarize approaches to enhancing experimental robustness by careful control of multiple comparisons and to comparing discovery and validation cohorts. We hope the experimental tactics summarized here will help researchers in this exciting field advance their studies efficiently while avoiding errors. Electronic supplementary material The online version of this article (doi:10.1186/s40168-017-0267-5) contains supplementary material, which is available to authorized users.
Article
Full-text available
Background HIV-1 subtype C demonstrates several biological properties distinct from other viral subtypes. One such variation is the duplication of PTAP motif in p6 Gag. PTAP motif is a key player in viral budding. Here, we studied the prevalence of PTAP motif duplication in subtype C viral strains in a longitudinal study. Methods In a prospective follow-up study, 65 HIV-1 seropositive drug-naive subjects were monitored in two different clinical cohorts of India for 2 years with repeated sampling at 6-month intervals. The viral RNA was extracted from plasma, the gag segment was amplified and sequenced. From a subset of viral isolates the sequences of pol, env and LTR were sequenced. Using HIV-1 gag amino acid sequences available from public databases and additional sequences derived from the Indian and South-African cohorts, we examined the nature of PTAP motif duplication in subtype C. Results In 16% (8 of 50) of the primary viral strains of India, we identified a sequence duplication of the PTAP motif in Gag p6. The length of the sequence duplication varied from 6 to 14 amino acids in the viral isolates but remained fixed within a subject over a period of 24–36 month follow-up. In the duplicated motif, the core PTAP motif was invariable, but the flanking residues were highly variable. In an acute phase clinical cohort of South Africa, in a subset of 75 subjects, we found the presence of the PTAP duplication at a frequency of 29.3%. An analysis of the gag sequences from the extant databases showed that unlike other subtypes of HIV-1, subtype C has a natural propensity to generate the PTAP motif duplication at a significantly higher frequency and of greater length. Additionally, the global prevalence of PTAP duplication in subtype C appears to be increasing progressively over the past 30 years. 
Conclusion We showed that in subtype C, the duplication of the PTAP motif in p6 Gag involves sequence stretches of greater length, and at a much higher frequency as compared to other HIV-1 subtypes. Given that subtype C naturally lacks the Alix binding motif, the acquisition of an additional PTAP motif may confer replication advantage on this HIV-1 subtype. Further investigation is warranted to examine the significance of PTAP motif duplication on the replicative fitness of HIV-1.
Article
Full-text available
Importance: Advances in sequencing technology have made it feasible to sequence patient-derived viral samples at a level sufficient for detection of rare mutations. These high-throughput, cost-effective methods are revolutionizing the study of within-host viral diversity. However, these techniques are error prone, and the methods commonly used to control for these errors have not been validated under the conditions that characterize patient-derived samples. Here we show that these conditions affect measurements of viral diversity. We found that the accuracy of previously benchmarked analysis pipelines were greatly reduced under patient-derived conditions. By carefully validating our sequencing analysis using known control samples, we were able to identify biases in our method and improve our accuracy to acceptable levels. Application of our modified pipeline to a set of influenza samples from a cohort study provide a realistic picture of intrahost diversity and suggest the need for rigorous quality control in such studies.
Article
Full-text available
Due to the stringent population bottleneck that occurs during sexual HIV-1 transmission, systemic infection is typically established by a limited number of founder viruses. Elucidation of the precise forces influencing the selection of founder viruses may reveal key vulnerabilities that could aid in the development of a vaccine or other clinical interventions. Here, we utilize deep sequencing data and apply a genetic distance-based method to investigate whether the mode of sexual transmission shapes the nascent founder viral genome. Analysis of 74 acute and early HIV-1 infected subjects revealed that 83% of men who have sex with men (MSM) exhibit a single founder virus, levels similar to those previously observed in heterosexual (HSX) transmission. In a metadata analysis of a total of 354 subjects, including HSX, MSM and injecting drug users (IDU), we also observed no significant differences in the frequency of single founder virus infections between HSX and MSM transmissions. However, comparison of HIV-1 envelope sequences revealed that HSX founder viruses exhibited a greater number of codon sites under positive selection, as well as stronger transmission indices possibly reflective of higher fitness variants. Moreover, specific genetic "signatures" within MSM and HSX founder viruses were identified, with single polymorphisms within gp41 enriched among HSX viruses while more complex patterns, including clustered polymorphisms surrounding the CD4 binding site, were enriched in MSM viruses. While our findings do not support an influence of the mode of sexual transmission on the number of founder viruses, they do demonstrate that there are marked differences in the selection bottleneck that can significantly shape their genetic composition. This study illustrates the complex dynamics of the transmission bottleneck and reveals that distinct genetic bottleneck processes exist dependent upon the mode of HIV-1 transmission.
Background: HIV incidence among young women in sub-Saharan Africa remains high and their inclusion in vaccine and cure efforts is crucial. We aimed to establish a cohort of young women detected during Fiebig stage I acute HIV infection in whom treatment was initiated immediately after diagnosis to advance research in this high-risk group.
Methods: 945 women aged 18-23 years in KwaZulu-Natal, South Africa, who were HIV uninfected and sexually active consented to HIV-1 RNA testing twice a week and biological sampling and risk assessment every 3 months during participation in a 48-96 week life-skills and job-readiness programme. We analysed the effect of immediate combination antiretroviral therapy (ART) on viraemia and immune responses, sexual risk behaviour, and the effect of the socioeconomic intervention.
Findings: 42 women were diagnosed with acute HIV infection between Dec 1, 2012, and June 30, 2016 (incidence 8·2 per 100 person-years, 95% CI 5·9-11·1), of whom 36 (86%) were diagnosed in Fiebig stage I infection with a median initial viral load of 2·97 log10 copies per mL (IQR 2·42-3·85). 23 of these 36 women started ART at a median of 1 day (1-1) after detection, which limited the median peak viral load to 4·22 log10 copies per mL (3·27-4·83) and the CD4 nadir to 685 cells per μL (561-802). ART also suppressed viral load (to <20 copies per mL) within a median of 16 days (12-26) and, in 20 (87%) of 23 women, prevented seroconversion, as shown with western blotting. 385 women completed the 48 week socioeconomic intervention, of whom 231 were followed up for 1 year. 202 (87%) of these 231 women were placed in jobs, returned to school, or started a business.
Interpretation: Frequent HIV screening combined with a socioeconomic intervention facilitated sampling and risk assessment before and after infection. In addition to detection of acute infection and immediate treatment, we established a cohort optimised for prevention and cure research.
Funding: Bill & Melinda Gates Foundation, National Institute of Allergy and Infectious Diseases, International AIDS Vaccine Initiative, Wellcome Trust, Howard Hughes Medical Institute.
Elevated inflammation in the female genital tract is associated with increased HIV risk. Cervicovaginal bacteria modulate genital inflammation; however, their role in HIV susceptibility has not been elucidated. In a prospective cohort of young, healthy South African women, we found that individuals with diverse genital bacterial communities dominated by anaerobes other than Gardnerella were at over 4-fold higher risk of acquiring HIV and had increased numbers of activated mucosal CD4⁺ T cells compared to those with Lactobacillus crispatus-dominant communities. We identified specific bacterial taxa linked with reduced (L. crispatus) or elevated (Prevotella, Sneathia, and other anaerobes) inflammation and HIV infection and found that high-risk bacteria increased numbers of activated genital CD4⁺ T cells in a murine model. Our results suggest that highly prevalent genital bacteria increase HIV risk by inducing mucosal HIV target cells. These findings might be leveraged to reduce HIV acquisition in women living in sub-Saharan Africa.