Human Cytomegalovirus Genomes Sequenced Directly From Clinical Material: Variation, Multiple-Strain Infection, Recombination, and Gene Loss

MAJOR ARTICLE
Clinical HCMV Genomes • JID 2019:XX (XX XXXX) • 1
The Journal of Infectious Diseases
Received 21 February 2019; editorial decision 17 April 2019; accepted 24 April 2019; published
online May 2, 2019.
Presented in part: Seventh International Congenital Cytomegalovirus (CMV) Conference and
17th International CMV Workshop, Birmingham, Alabama, April 2019.
Published as a bioRxiv preprint on 23 December 2018 and revised on 18 February 2019
(https://doi.org/10.1101/505735).
aN. M.S.and G.S. W.contributed equally to this work.
Present affiliations: bIllumina, Scoreseby, Victoria, Australia; cSGS Vitrology Ltd, Glasgow,
United Kingdom; dIT Services–Business Systems Team, University of Glasgow, United Kingdom.
Correspondence: Andrew J.Davison, MRC–University of Glasgow Centre for Virus Research,
Sir Michael Stoker Bldg, 464 Bearsden Road, Glasgow G61 1QH, UK (andrew.davison@
glasgow.ac.uk)
The Journal of Infectious Diseases® 2019;XX(XX):1–11
© The Author(s) 2019. Published by Oxford University Press for the Infectious Diseases Society
of America. This is an Open Access article distributed under the terms of the Creative Commons
Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted
reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
DOI: 10.1093/infdis/jiz208
NicolásM. Suárez,1,a GavinS. Wilkie,1,a,b Elias Hage,2,3 Salvatore Camiolo,1 Marylouisa Holton,1,c Joseph Hughes,1, Maha Maabar,1,d SreenuB. Vattipally,1
Akshay Dhingra,2 UrsulaA. Gompels,4 GavinW.G. Wilkinson,5 Fausto Baldanti,6,7 Milena Furione,6 Daniele Lilleri,8 Alessia Arossa,9
Tina Ganzenmueller,2,3,10 Giuseppe Gerna,8 Petr Hubáček,11 ThomasF. Schulz,2,3 Dana Wolf,12 Maurizio Zavattoni,6 and AndrewJ. Davison1,
1Medical Research Council–University of Glasgow Centre for Virus Research, United Kingdom; 2Institute of Virology, Hannover Medical School, and 3German Center for Infection Research,
Hannover-Braunschweig site; 4Pathogen Molecular Biology Department, London School of Hygiene and Tropical Medicine, and 5Division of Infection and Immunity, School of Medicine, Cardiff
University, United Kingdom; 6Molecular Virology Unit, Microbiology and Virology Department, Fondazione Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Policlinico San Matteo,
7Department of Clinical, Surgical, Diagnostic and Pediatric Sciences, University of Pavia, and 8Laboratory of Genetics-Transplantology and Cardiovascular Diseases, and 9Departments of Obstetrics
and Gynecology, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy; 10Institute for Medical Virology and Epidemiology of Viral Diseases, University Hospital Tuebingen, Germany; 11Department
of Medical Microbiology, Motol University Hospital, Prague, Czech Republic; and 12Clinical Virology Unit, Department of Clinical Microbiology and Infectious Diseases, Hadassah University
Hospital, Jerusalem, Israel
The genomic characteristics of human cytomegalovirus (HCMV) strains sequenced directly from clinical pathology samples were investigated, focusing on variation, multiple-strain infection, recombination, and gene loss. A total of 207 datasets generated in this and previous studies using target enrichment and high-throughput sequencing were analyzed, in the process enabling the determination of genome sequences for 91 strains. Key findings were that (i) it is important to monitor the quality of sequencing libraries in investigating variation; (ii) many recombinant strains have been transmitted during HCMV evolution, and some have apparently survived for thousands of years without further recombination; (iii) mutants with nonfunctional genes (pseudogenes) have been circulating and recombining for long periods and can cause congenital infection and resulting clinical sequelae; and (iv) intrahost variation in single-strain infections is much less than that in multiple-strain infections. Future population-based studies are likely to continue illuminating the evolution, epidemiology, and pathogenesis of HCMV.
Keywords. human cytomegalovirus; genome sequence; target enrichment; genotype; variation; multiple-strain infection; recombination; gene loss; mutation.
Human cytomegalovirus (HCMV) poses a risk, particularly to
people with immature or compromised immune systems, and
can have serious outcomes in congenitally infected children,
transplant recipients, and people with human immunodefi-
ciency virus/AIDS. Prior to the advent of high-throughput
technologies, studies of HCMV genomes in natural infections
were limited to Sanger sequencing of polymerase chain reac-
tion (PCR) amplicons, often focusing on a small number of
polymorphic (hypervariable) genes [1]. This left out most
of the genome and also restricted the characterization of
multiple-strain infections, which may have more serious
outcomes.
e rst complete HCMV genome sequence to be determined
was that of the high-passage strain AD169 [2], from a plasmid
library. Over a decade later, additional genomes were sequenced
from bacterial articial chromosomes [3–5], virion DNA [6] and
overlapping PCR amplicons [7, 8]. ese sequences were also de-
termined using Sanger technology, and were complemented sub-
sequently by many others, increasingly using high-throughput
methods [7, 9–13]. With only 3 exceptions [7, 11], all were de-
rived from laboratory strains isolated in cell culture. Mounting
evidence of the existence of multiple-strain infections and the
propensity of HCMV to mutate during cell culture [6–8, 14, 15]
added impetus to sequencing genomes directly from clinical ma-
terial to dene natural populations. One strategy for this involves
sequencing overlapping PCR amplicons [7, 16]. Another utilizes
an oligonucleotide bait library representing known HCMV di-
versity to select target sequences from random DNA fragments.
is target enrichment technology originated in commercial kits
for cellular exome sequencing, and was subsequently applied to
applyparastyle “g//caption/p[1]” parastyle “FigCapt”
Downloaded from https://academic.oup.com/jid/advance-article-abstract/doi/10.1093/infdis/jiz208/5485299 by MHH-Bibliothek user on 17 June 2019
2 • JID 2019:XX (XX XXXX) • Suárez etal
various pathogens [17, 18], including HCMV [19–21]. We have
applied it to HCMV since 2012 and have systematically released
via GenBank many genome sequences that have proved pivotal
in other studies [11, 12, 19–21].
e HCMV genome exhibits several evolutionary phe-
nomena, including variation, multiple-strain infection, recom-
bination, and gene loss, all of which were discovered prior to
high-throughput sequencing and have since been illuminated
by this technology (early references are [22–26]). We explore
these and other key genomic features of HCMV, with an em-
phasis on the strains present in clinical material.
METHODS
Samples
For convenience, samples were analyzed as collections
1–3, which are summarized in Table 1 and described in
Supplementary Tables 1–3, respectively. Collection 3 represents
samples sequenced by others in previous studies using target
enrichment with a different oligonucleotide bait library. The
features of the samples are shown in Supplementary Tables 1–3
(rows 3–6), and the clinical outcomes of congenital infection
are in Supplementary Table 1 (row205).
DNA Sequencing
Target enrichment and sequencing library preparation were
performed using the SureSelect XT version 1.7 system for
Illumina paired-end libraries with biotinylated RNA bait
libraries (Agilent) [21]. Bait libraries representing known
HCMV diversity were designed in February 2012 and April
2014 from 31 and 64 complete genome sequences, respec-
tively. Information on and access to the latter library (55210
baits of 120 nucleotides [nt] with overrepresentation of G
+ C–rich regions) are available from the corresponding au-
thor. Data on viral loads and library construction are shown
in Supplementary Tables 1–3 (rows 9–12). Datasets of 300
or 150 nt paired-end reads were generated using a MiSeq
(Illumina). Their names are shown in Supplementary Tables
1–3 (row 7). They were prepared for analysis using Trim Galore version 0.4.0 (program available at http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/; length = 21, quality = 10, and stringency = 3). The numbers of trimmed reads are in Supplementary Tables 1–3 (row 15).
Library Diversity
Estimating the number of reads in a dataset derived from
unique HCMV fragments initially involved using Bowtie2 ver-
sion 2.2.6 [29] to align the reads against the strain Merlin se-
quence (GenBank accession number AY446894.2), and, where
it could be determined, the consensus genome sequence derived
from the dataset. The relevant data are in Supplementary Tables
1–3 (rows 17–19 and 23–26). Reads containing insertions or deletions were removed to preserve coordinate numbering, as were duplicate read pairs sharing both end coordinates and duplicate unpaired reads sharing one end coordinate, thereby producing an alignment file for unique reads derived from unique HCMV fragments (program available at https://centre-for-virus-research.github.io/VATK/AssemblyPostProcessing). This file was viewed using Tablet version 1.14.11.7 [30]. The coverage depth values for total and unique fragment reads are in Supplementary Tables 1–3 (rows 20–21 and 27–28).

Table 1. Selected Characteristics of Sample Collections 1–3
Characteristic | Collection 1 | Collection 2 | Collection 3
Patients, No. (a) | 48 | 29 | 25
Patient condition | Congenital infection | Mostly transplant recipients | Various
Samples, No. | 53 | 89 | 57
Sample source, city (prefix) | Pavia (PAV), Jerusalem (JER), Prague (PRA) | Hannover (Child, RTR, SCTR), Pavia (PAV) | Rotterdam (Rot), London (Lon, Pat_)
Datasets, No. | 53 | 97 (b) | 57 (c)
Duplicated libraries, No. | 0 | 7 | 0
HCMV load, IU/µL (d) | 26–559968 | 5–194840 | 104–18377
Genome copies for library, No. (e) | 225–8399520 | 280–3896800 | Unknown
Reads in Merlin alignment, % | 2–91 | 0–85 | 0–90
Coverage ratio in Merlin alignment, % unique/total reads | 0.40–83.12 | 0.00–76.09 | 0.00–90.21
Genome sequences determined, No. (f) | 42 | 25 | 24
Details are provided in Supplementary Tables 1–3.
Abbreviation: HCMV, human cytomegalovirus.
(a) Archived diagnostic samples were used, and clinical data were retrieved, with the approval of the institutional review boards of Policlinico San Matteo, Pavia (reference numbers 35853/2010 and 35854/2010), Hadassah University Hospital, Jerusalem (reference number HMO-063911), Motol University Hospital, Prague (reference number EK-701a/16) and Hannover Medical School, Hannover (reference number 2527-2014).
(b) We reported 68 of the Hannover datasets previously [21].
(c) These datasets were reported previously by others, and were either provided by the authors [19] or downloaded from the European Nucleotide Archive (study PRJEB12814) [20].
(d) Viral load in most extracted samples was quantified in the laboratory of origin or the sequencing laboratory. In some instances, the entire sample was used blind to generate a sequencing library.
(e) Assumes that 1 IU is equivalent to 1 genome copy.
(f) The trimmed paired-read data were aligned to the UCSC hg19 human reference genome (http://genome.ucsc.edu/) using Bowtie2. Nonmatching reads were assembled de novo into contigs using SPAdes version 3.5.0 [27]. The contigs were ordered using Scaffold_builder version 2.2 [28] by reference to a version of the strain Merlin sequence lacking all but 100 nt of the terminal repeat regions (TRL at the left end and TRS at the right end; Figure 1), and merged into a draft genome sequence. Residual gaps were filled by identifying relevant reads anchored in flanking regions and assembling them manually in a reiterative fashion. TRL and TRS were reinstated, and the complete genome sequence was verified by aligning it against the read data using Bowtie2 and inspecting the alignment in Tablet. An annotated genome sequence was produced using Sequin (https://www.ncbi.nlm.nih.gov/Sequin/).
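The duplicate-removal step described under Library Diversity can be illustrated with a minimal Python sketch. This is not the VATK program: the input layout (coordinate tuples rather than an alignment file), the function names, and the example coordinates are all assumptions made for illustration, and the unique/total percentage is only one plausible reading of the coverage ratio reported in Table 1.

```python
def unique_fragments(pairs, singles):
    """Collapse duplicate reads by their alignment coordinates.

    pairs   -- iterable of (start, end) coordinates of aligned read pairs
    singles -- iterable of the single end coordinate of each unpaired read
    Both layouts are hypothetical simplifications of an alignment file.
    """
    unique_pairs = set(pairs)      # pairs sharing both end coordinates collapse to one
    unique_singles = set(singles)  # unpaired reads sharing one end coordinate collapse to one
    return unique_pairs, unique_singles


def unique_to_total_percent(total_reads, unique_reads):
    """Unique reads as a percentage of total reads (assumed definition)."""
    if total_reads == 0:
        return 0.0
    return 100.0 * unique_reads / total_reads


# Hypothetical coordinates, for illustration only.
pairs = [(100, 450), (100, 450), (100, 460), (300, 700)]
singles = [1200, 1200, 1500]

u_pairs, u_singles = unique_fragments(pairs, singles)
ratio = unique_to_total_percent(len(pairs) + len(singles),
                                len(u_pairs) + len(u_singles))
```

A low percentage from such a calculation would indicate a highly clonal library, in which many reads derive from the same original fragment.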
Strain Enumeration
The number of strains represented in a dataset was estimated by
2 strategies: genotype read-matching and motif read-matching
(program available at https://centre-for-virus-research.github.io/VATK/HCMV_pipeline). Both strategies utilized datasets
concatenated from the paired-end datasets. The genotype
designations used were either based on reported phylogenies
[6, 12, 25, 31, 32], amended or extended as appropriate, or
constructed afresh using Clustal Omega version 1.2.4 [33] and
MEGA version 6.0.6 [34] with data for the genomes listed in
Supplementary Table 4 and individual genes for which addi-
tional sequences were available in GenBank. Alignments and
phylogenetic reconstructions are in Supplementary Figures 1
and 2, respectively.
For genotype read-matching, Bowtie2 was used to align the reads to sequences representing the genotypes of 2 hypervariable genes, UL146 and RL13 [6, 12, 35]. The sequences from the entire coding region of UL146 and the central coding region of RL13 are in Supplementary Tables 1–3 (rows 34–58). In contrast to the UL146 genotypes, the RL13 genotypes cross-matched within 4 groups (G1, G2, G3; G4A, G4B; G6, G10; and G7, G8). In these instances, the genotype within the group with most matching reads was scored. The number of reads aligned to each genotype is in Supplementary Tables 1–3 (rows 34–58). A genotype was scored if the number of reads was >10 and represented >2% of the total number detected for all genotypes of that gene. For 14 samples in collection 1 that had been sequenced prior to the availability of ultrapure (TruGrade) oligonucleotides, these values were >25 and >5%, respectively. The number of strains in a sample was scored as the greater of the numbers of genotypes detected for the 2 target genes, and is in Supplementary Tables 1–3 (row 13).
For motif read-matching, conserved genotype-specific motifs (20–31 nt) were identified by visual inspection of alignments (Supplementary Figure 1) for 12 hypervariable genes [6, 12, 19, 35]. Additional motifs for identifying common intergenotypic recombinants were included. The motif sequences and number of reads containing perfect matches to a sequence or its reverse complement are in Supplementary Tables 1–3 (rows 60–170). Genotypes were scored as described above. The number of strains in a sample was estimated as the maximum number of genotypes detected for at least 2 genes, and is in Supplementary Tables 1–3 (row 14).
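Motif read-matching amounts to counting reads that contain an exact copy of a motif in either orientation. The sketch below is a simplified illustration, not the HCMV_pipeline implementation. The motif sequences are invented and shorter than the 20–31 nt motifs used in the study, and the strain estimate follows one interpretation of "the maximum number of genotypes detected for at least 2 genes", namely the largest count reached by at least 2 genes.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_motif_reads(reads, motifs):
    """Count reads with a perfect match to each motif or its reverse complement.

    motifs -- dict mapping a genotype name to its motif sequence
    """
    counts = {name: 0 for name in motifs}
    for read in reads:
        read = read.upper()
        for name, motif in motifs.items():
            motif = motif.upper()
            if motif in read or reverse_complement(motif) in read:
                counts[name] += 1
    return counts


def strains_by_motif_matching(genotypes_per_gene):
    """Largest genotype count reached by at least 2 genes (assumed reading).

    genotypes_per_gene -- dict mapping a gene to its number of scored genotypes
    Returns 0 if fewer than 2 genes are supplied.
    """
    ordered = sorted(genotypes_per_gene.values(), reverse=True)
    return ordered[1] if len(ordered) >= 2 else 0


# Hypothetical motifs and reads, for illustration only.
motifs = {"G1": "ACGTACGGATCC", "G2": "GATTACAGATTACA"}
reads = [
    "TTACGTACGGATCCAA",   # contains the G1 motif
    "CCGGATCCGTACGTAA",   # contains the reverse complement of the G1 motif
    "AAAAAAAAAAAAAAAA",   # contains neither motif
]
counts = count_motif_reads(reads, motifs)
n_strains = strains_by_motif_matching({"UL146": 2, "RL13": 1, "UL73": 2, "UL74": 1})
```

Requiring agreement between at least 2 genes makes the estimate less sensitive to a spurious genotype call in any single gene.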
Pseudogene Analysis
The genomes of some HCMV strains exhibit gene loss apparent
as pseudogenes resulting from mutations causing premature
translational termination [7, 11, 12, 26]. These mutations are
substitutions that introduce in-frame stop codons or ablate
splice sites, or insertions or deletions that cause frameshifting
or loss of protein-coding regions. Motif read-matching was
used to assess the presence of common mutations and also to
determine the prevalence of mutations identified in collection
1. These data are in Supplementary Tables 1–3 (rows 171–178)
and Supplementary Table 1 (rows 180–203), respectively.
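Two of the mutation classes named above, in-frame stop codons and frameshifting insertions or deletions, can be recognized with simple checks. The following sketch is for illustration only: it is not the motif-based method used in the study, the function names and example sequences are hypothetical, and splice-site ablation and loss of coding regions are not covered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_inframe_stop(cds):
    """Return the 0-based codon index of the first in-frame stop codon, or None."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def has_premature_stop(cds):
    """True if a stop codon occurs before the final codon of the coding region."""
    index = first_inframe_stop(cds)
    last_codon = len(cds) // 3 - 1
    return index is not None and index < last_codon


def causes_frameshift(indel_length):
    """True if an insertion or deletion of this length shifts the reading frame."""
    return indel_length % 3 != 0


# Hypothetical coding sequences, for illustration only.
intact = "ATGGCCAAATGA"   # ATG GCC AAA TGA: stop only at the final codon
mutant = "ATGGCCTAATGA"   # ATG GCC TAA TGA: substitution creates an early stop
```

In this toy example, `has_premature_stop(mutant)` is true and `has_premature_stop(intact)` is false, while a 1 nt deletion shifts the frame and a 3 nt deletion does not.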
Intrahost Variation
Minor genome populations were analyzed by enumerating
single-nucleotide polymorphisms (SNPs) in datasets for which
consensus genome sequences had been determined. Thus, the
term mutant applies hereafter to a strain that has a mutation in
the consensus sequence resulting in a pseudogene, and the term
SNP applies to a minor variation from the consensus within
a population. To enumerate SNPs, original datasets were prepared for analysis using Trim Galore (length = 100, quality = 30, and stringency = 1), and trimmed reads were mapped using
Bowtie2. Alignment files in SAM format were converted into
BAM format, sorted using SAMtools version 1.3 [36], and
analyzed using LoFreq version 2.1.2 [37] and V-Phaser 2 [38].
Data Deposition
Original datasets were purged of human reads and deposited
in the European Nucleotide Archive (ENA; project number
PRJEB29585), and consensus genome sequences were deposited
in GenBank. The accession numbers are in Supplementary
Tables 1–3 (rows 8 and 29, respectively). Updated genome se-
quence determinations in collection 3 were deposited by the
original submitters in GenBank [19] or by us as third-party
annotations in ENA (project number PRJEB29374) [20].
Sequence features are in Supplementary Tables 1–3 (rows
30–32).
RESULTS
Operational Limitations
A total of 207 datasets from 199 samples and 102 individuals
were analyzed (Table 1 and Supplementary Tables 1–3). Library
quality was represented in the percentage of HCMV reads and
the coverage depth by unique fragment reads. These values were
related to sample type, being higher for urine than blood pre-
sumably because of a higher proportion of viral to host DNA.
They also depended on the number of viral genome copies used
to make the library, with >1000 copies generally being needed
to determine a complete genome sequence. However, despite
high library diversity, it was not possible to assemble complete
genome sequences from most datasets in collection 3 because of
gaps in RL12 and some G + C–rich regions, perhaps as a result
of limitations in the bait library. The use of excessive PCR cycles
with some samples in collections 1 and 2 led to high coverage
depth by total fragment reads but low coverage depth by unique
fragment reads, and thus to highly clonal libraries (eg, PAV2 in
collection 1). Genotypes present at subthreshold levels may rep-
resent multiple-strain infections or cross-contamination during
the complex sample processing pathway (eg, PRA4 reads in
PRA6A in collection 1).
Genome Sequences
A total of 91 complete or almost complete HCMV genome
sequences were determined (Table 1). We reported 5 previ-
ously [21], and 16 are improvements on published sequences
[19]. Most originated from single-strain infections or multiple-
strain infections in which one strain was predominant, and
some originated from different strains that predominated in a
patient at different times. Defining a strain as a viral genome
present in an individual, these 91 sequences, plus an addi-
tional 49 deposited by our group and 104 by others, brought
the number of strains sequenced to 244 (Supplementary Table
4). Of these, 91 were sequenced directly from clinical material,
and all but one were determined in this and our previous study
[21]. The average size of the HCMV genome, based on the 78
complete sequences in this set, is 235465 bp (range, 234316–237120 bp).
Multiple-Strain Infections
Genotypic differences in hypervariable genes (Figure 1 and
Supplementary Figures 1 and 2) were exploited to distinguish
single-strain from multiple-strain infections by genotype read-
matching and motif read-matching with threshold values. To
our knowledge, these methods, employed in the present work
and the companion study [39], have not been used previously
for categorizing HCMV infections. Single strains were common
in congenitally infected patients (n = 43/50 in collections 1 and 2), but significantly less so in transplant recipients (n = 11/25 in collections 2 and 3; χ² = 14.583, P < .05). Intrahost variation is discussed below.
Recombination
The 244 genome sequences were genotyped in the 12
hypervariable genes used for motif read-matching and then in 5
additional genes (Figure 1 and Supplementary Table 4).
Hypervariation in UL55, which encodes glycoprotein B (gB), is located in 2 regions (UL55N near the N terminus, and UL55X encompassing the proteolytic cleavage site) [23, 40]. Five genotypes (G1–G5) have been assigned to each region [23, 40–42], which are separated by 927 bp that are 80% identical in all strains. All genomes had a recognized UL55X genotype (Supplementary Table 5). As reported previously [40], UL55N G2 and G3 could not be distinguished reliably from each other, and 2 additional genotypes (G6–G7) were detected that may have arisen from ancient recombination events within UL55N (Supplementary Tables 4 and 5 and Supplementary Figure 1). There was evidence for recombination in the region between UL55N and UL55X in only 8 genomes. This low proportion of recombination (3.3%) contrasts with the higher levels proposed in UL55 from PCR-based studies [40, 43], which may have been affected by artefactual recombination.

Figure 1. Locations in the human cytomegalovirus strain Merlin genome of genes used for genotyping. The genome consists of 2 unique regions, UL (1325–194343 bp) and US (197627–233108 bp), the former flanked by inverted repeats TRL (1–1324 bp) and IRL (194344–195667 bp), and the latter flanked by inverted repeats IRS (195090–197626 bp) and TRS (233109–235646 bp). Protein-coding regions are indicated by shaded arrows, and noncoding RNAs as narrower, white arrows, with gene nomenclature below. Introns are shown as narrow white bars. The 12 genes (RL5A, RL6, RL12, RL13, UL1, UL9, UL11, UL73, UL74, UL120, UL146, and UL139) used for motif read-matching are in dark gray (red in online version). Two of these genes (RL13 and UL146) were also used for genotype read-matching. The additional 5 genes (UL20, UL33, UL37, UL55, and US9) used to genotype sequences by alignment are medium gray (orange in online version). All other genes are shown in white (pink in online version).
UL73 and UL74, which encode glycoproteins N and O (gN and gO), respectively, are adjacent hypervariable genes that exist as 8 genotypes each [25, 32, 44]. There was evidence for recombination between them in only 7 genomes (2.9%), in accordance with the low levels (2.2%) detected previously in PCR-based studies [25, 32, 45]. In the region containing adjacent hypervariable genes RL12, RL13, and UL1, recombinants were also rare (1.2%) within RL12 and absent from RL13 and UL1. In contrast, hypervariable genes UL146 and UL139, which encode a CXC chemokine and a membrane glycoprotein, respectively, are separated by a well-conserved region of over 5 kbp. The number (66) of the 126 possible genotype combinations represented in the 244 genomes is too large to allow any underlying genotypic linkage to be discerned, consistent with previous conclusions from PCR-based studies [31]. No recombinants were noted within UL146.
In principle, strains in multiple-strain infections have the opportunity to recombine. In our previous analysis of RTR1 in collection 2, we noted that one strain (RTR1A) predominated at earlier times and another (RTR1B) at later times [21]. From the low frequency of SNPs across a large part of the genome, we concluded that the second strain had arisen either by recombination involving the first strain or by reinfection with, or reactivation of, a second strain fortuitously similar to the first. In the present study, recombination was strongly supported by a comparison of the 2 genome sequences, which showed that approximately two-thirds of the genome is almost identical (differing by 3 substitutions in noncoding regions), whereas the remaining third is highly dissimilar.
To investigate whether strains have been transmitted without recombination occurring, identical genotypic constellations were identified among the 244 genomes (Table 2). This revealed the existence of 12 haplotype groups within which multiple strains lack signs of having recombined since diverging from their last common ancestor; these are henceforth termed nonrecombinant strains. As an incidental outcome, the 2 strains in group 1 (PRA8 and CZ/3/2012), which were characterized in different studies, were confirmed as having originated from the same patient, reducing the set of sequenced strains to 243. The results from the other 11 groups suggest that nonrecombinant strains have been circulating, some for periods sufficient to allow the accumulation of >100 substitutions. Among the highly divergent groups, group 9 (3 strains) exhibited 135 differences, with the 50 that would affect protein coding distributed among 38 genes, and group 10 (2 strains) exhibited 138 differences, with the 38 that would affect protein coding distributed among 27 genes. No obvious bias was observed toward greater diversity in any particular gene or group of genes, including those in the hypervariable category.
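Identifying identical genotypic constellations is essentially a grouping operation keyed on the tuple of genotypes. The sketch below illustrates the idea and is not the procedure actually used. It considers only 4 of the 17 genotyped genes, with values for the first 4 strains taken from groups 1 and 2 of Table 2, whereas the strain named EXAMPLE and its genotypes are hypothetical.

```python
from collections import defaultdict


def group_by_constellation(genotypes, genes):
    """Group strains that share the same genotype in every listed gene.

    genotypes -- dict mapping a strain name to a dict of gene -> genotype
    genes     -- ordered list of genes defining the constellation
    Returns only groups containing more than one strain.
    """
    groups = defaultdict(list)
    for strain, calls in genotypes.items():
        key = tuple(calls[gene] for gene in genes)
        groups[key].append(strain)
    return [sorted(members) for members in groups.values() if len(members) > 1]


genes = ["RL5A", "RL6", "UL146", "UL139"]
genotypes = {
    "PRA8":       {"RL5A": "1", "RL6": "1", "UL146": "1", "UL139": "4"},
    "CZ/3/2012":  {"RL5A": "1", "RL6": "1", "UL146": "1", "UL139": "4"},
    "BE/3/2011":  {"RL5A": "2", "RL6": "4", "UL146": "8", "UL139": "2"},
    "BE/21/2011": {"RL5A": "2", "RL6": "4", "UL146": "8", "UL139": "2"},
    "EXAMPLE":    {"RL5A": "3", "RL6": "5", "UL146": "9", "UL139": "5"},  # hypothetical
}

shared = group_by_constellation(genotypes, genes)
```

With so few genes, unrelated strains could fall into the same group by chance, which is why a constellation spanning many hypervariable genes is needed to infer the absence of recombination.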
Pseudogenes
Among the strains sequenced from clinical material, 77%
are mutated in at least one gene (compared with 79% among
all sequenced strains), and one is mutated in as many as 6
genes (Pat_D in collection 3) (Supplementary Table 4). The
most frequently mutated genes are UL9, RL5A, UL1 and RL6
(members of the RL11 family), US7 and US9 (members of the
US6 gene family), and UL111A (encoding viral interleukin
10)(Table3). In addition, there was evidence from the PAV6
datasets (collection 1)for maternal transmission of a US7 mu-
tant (Supplementary Table 1), and from PCR data (not shown)
for maternal transmission of a UL111A mutant to PAV16 (col-
lection 1). Focusing on the most common mutations, strains
in which UL9, RL5A, UL1, US9, US7, and UL111A were af-
fected (singly or in combination) were, like strains that were not
mutated in any gene, transmitted in congenital infections and,
in some cases, linked to defects in neurological development
(Supplementary Table 1).
Intrahost Diversity
LoFreq and V-Phaser analyses showed that single-strain
infections contained markedly fewer SNPs (median values of 60
and 140, respectively) than multiple-strain infections (median
values of 2444 and 2955, respectively; Figure 2). The differences
between the values for single- and multiple-strain infections
were significant (Kruskal–Wallis rank-sum test; LoFreq:
χ2=67.918, P<2.2× 10-16; V-Phaser: χ2= 63.536, P= 1.6 ×
10-15).
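For readers unfamiliar with the test, a minimal sketch of the Kruskal–Wallis calculation follows. The SNP counts are invented for illustration and are not the study data; the function names are hypothetical, and the P value formula applies only to a comparison of 2 groups (1 degree of freedom). The same formula applied to the reported statistics gives probabilities consistent with those quoted above.

```python
import math


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H with average ranks for ties and the usual tie correction.

    Assumes at least 2 observations overall and that not all values are equal.
    """
    pooled = sorted((value, gi) for gi, group in enumerate(groups) for value in group)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0   # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += average_rank
        tied = j - i
        tie_term += tied ** 3 - tied
        i = j
    h = 12.0 / (n * (n + 1)) * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    h -= 3 * (n + 1)
    return h / (1.0 - tie_term / (n ** 3 - n))


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical SNP counts, for illustration only.
single_strain = [40, 55, 60, 72, 90]
multiple_strain = [1800, 2444, 2600, 2955, 3100]
h = kruskal_wallis_h(single_strain, multiple_strain)

# Probabilities implied by the statistics reported in the text.
p_lofreq = p_value_df1(67.918)
p_vphaser = p_value_df1(63.536)
```

Because the test uses ranks rather than raw counts, a few extreme SNP counts in either category have limited influence on the statistic.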
DISCUSSION
Advances in high-throughput sequencing technology have
made it possible to generate a wealth of viral genome informa-
tion directly from clinical material. However, operational
limitations should be registered. These include sample charac-
teristics (source, viral content and presence of multiple strains),
confounding factors (technical limitations, logistical errors and
cross-contamination), design of the bait library (ability to en-
rich all strains and acquire data across the genome), and quality
and extent of the sequencing data (library diversity and coverage
depth). Since perceived levels of intrahost variation are partic-
ularly sensitive to these factors, we proceeded cautiously with
this aspect. However, as indicated in our previous study [21],
it is clear that the number of SNPs in single-strain infections
was markedly less than that in multiple-strain infections. It was
also far less than that reported by others in samples from con-
genital infections [16]. The factors listed above may have been
responsible for the outliers observed in single-strain infections;
for example, the PAV6 (collection 1) library was made using
non-TruGrade oligonucleotides, RTR6B (collection 2) had a
low coverage depth and also came from a patient from whom
other samples contained multiple strains, and CMV-35 (collec-
tion 3) may have contained subthreshold levels of additional
Downloaded from https://academic.oup.com/jid/advance-article-abstract/doi/10.1093/infdis/jiz208/5485299 by MHH-Bibliothek user on 17 June 2019
6 • JID 2019:XX (XX XXXX) • Suárez etal
Table 2. Groups of Nonrecombinant Strains
Genotypesa
Group Strain
RL5A
RL6
RL12
RL13
UL1
UL9
UL11
UL20
UL33
UL37
UL55N
UL73
UL74
UL120
UL146
UL139
US9
Mutated Genes DifferencesbShared Mutations
1 PRA8 1 1 6 6 6 6 2 5 1 5 2/3 4C 1C 2B 1 4 1 UL145 0 These strains share a UL145 mutation, were characterized
in different studies, and were confirmed a s having b een
derived from the same patient
CZ/3/2012 1 1 6 6 6 6 2 5 1 5 2/3 4C 1C 2B 1 4 1 UL145
2 BE/3/2011 2 4 1B 1 1 4 1 6 2 2 2/3 4A 3 1A 8 2 1 None 1 None
BE/21/2011 2 4 1B 1 1 4 1 6 2 2 2/3 4A 3 1A 8 2 1 None
3 UK/Lon6/Urine/2011 5 1 7 7 7 1 1 6 2 1 4 3A 1B 3B 13 1A 1 None 23 None
2CEN15 5 1 7 7 7 1 1 6 2 1 4 3A 1B 3B 13 1A 1 None
BE/5/2012 5 1 7 7 7 1 1 6 2 1 4 3A 1B 3B 13 1A 1 None
4 BE/14/2012 3 5 4A 4A 4 6 6 5 4 5 4 3A 1B 2A 9 5 1 RL6 UL9 UL40 US7 26 These strains share a UL9 mutation and also RL6 and UL40
mutations that are present in other strains
BE/36/2011 3 5 4A 4A 4 6 6 5 4 5 4 3A 1B 2A 9 5 1 RL6 UL9 UL40
5 BE/10/2012 6 3 1A 1 1 1 1 2 4 5 4 3A 1B 2A 3 7 1 None 35 None
BE/26/2011 6 3 1A 1 1 1 1 2 4 5 4 3A 1B 2A 3 7 1 None
6 BE/1/2011 1 1 4A 4A 4 6 6 7 4 6 2/3 4A 3 2A 1 4 1 UL1 UL9 65 These strains bear a UL9 mutation that is present in other strains, and 2 strains share a UL1 mutation
BE/8/2010 1 1 4A 4A 4 6 6 7 4 6 2/3 4A 3 2A 1 4 1 UL9
BE/9/2012 1 1 4A 4A 4 6 6 7 4 6 2/3 4A 3 2A 1 4 1 UL1 UL9
7 NAN1LA 3 5 5 5 5 7 3 2 2 6 2/3 4D 5 3B 7 5 2 RL6 US9 73 These strains share RL6 and US9 mutations that are present in other strains
BE/6/2012 3 5 5 5 5 7 3 2 2 6 2/3 4D 5 3B 7 5 2 RL6 US9 US27
8 BE/7/2012 2 4 1A 1 1 4 1 7 5 3 2/3 4A 3 2B 13 5 1 RL5A RL13 UL150 125 These strains share a UL150 mutation that is present in other strains
BE/11/2012 2 4 1A 1 1 4 1 7 5 3 2/3 4A 3 2B 13 5 1 UL150
BE/16/2012 2 4 1A 1 1 4 1 7 5 3 2/3 4A 3 2B 13 5 1 UL150
BE/26/2010 2 4 1A 1 1 4 1 7 5 3 2/3 4A 3 2B 13 5 1 UL150
BE/30/2011 2 4 1A 1 1 4 1 7 5 3 2/3 4A 3 2B 13 5 1 UL150
9 JER851 1 1 3 3 3 2 1 3 4 6 1 2 2B 3B 7 4 1 UL1 UL9 UL111A 135 These strains share a UL111A mutation that is present in another strain
JER4041 1 1 3 3 3 2 1 3 4 6 1 2 2B 3B 7 4 1 UL111A
BE/25/2010 1 1 3 3 3 2 1 3 4 6 1 2 2B 3B 7 4 1 UL111A
10 JER5695 1 1 7 7 7 1 1 6 2 1 2/3 3B 2A 4B 13 2 1 UL9 UL111A 138 These strains share a UL111A mutation that is present in other strains, and have different UL9 mutations
BE/15/2010 1 1 7 7 7 1 1 6 2 1 2/3 3B 2A 4B 13 2 1 RL1 UL9 UL111A
11 PRA7 1 1 4B 4B 4 9 6 5 5 6 6 4D 5 4B 10 2 1 RL5A UL111A 143 These strains share RL5A and UL111A mutations that are present in other strains
JP 1 1 4B 4B 4 9 6 5 5 6 6 4D 5 4B 10 2 1 RL5A UL111A
BE/4/2010 1 1 4B 4B 4 9 6 5 5 6 6 4D 5 4B 10 2 1 RL5A UL111A
12 BE/6/2011 5 1 4B 4B 4 9 6 1 5 3 2/3 3A 1B 1B 9 5 1 UL9 155 Two strains share a UL9 mutation that is present in other strains
BE/18/2011 5 1 4B 4B 4 9 6 1 5 3 2/3 3A 1B 1B 9 5 1 None
BE/27/2011 5 1 4B 4B 4 9 6 1 5 3 2/3 3A 1B 1B 9 5 1 UL9
a See Supplementary Figures 1 and 2 for genotype definitions. G prefix omitted.
b Total number of differences among all strains in the group, not including size variations in tandem repeats. To exclude repeat regions, sequences were aligned from the TATA box of RL1 to the end of US, omitting the region from the AATAAA polyadenylation signal of UL150A to the beginning of TRS.
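Within each group in the table above, the member strains carry the same genotype at every listed locus. As an illustrative sketch only (not the authors' pipeline), such groups could be recovered by using the ordered genotype profile as a dictionary key. The function name `group_identical_profiles` is hypothetical. The profiles are copied from rows of the table, and only one member of group 7 (NAN1LA) is included so that it appears here as an ungrouped singleton.

```python
from collections import defaultdict


def group_identical_profiles(profiles):
    """Return lists of strains that share an identical genotype profile.

    `profiles` maps strain name -> list of genotype labels, one per locus,
    in a fixed locus order. Strains with a unique profile are dropped.
    """
    groups = defaultdict(list)
    for strain, genotypes in profiles.items():
        # A tuple of labels is hashable, so it can serve as the grouping key.
        groups[tuple(genotypes)].append(strain)
    return [sorted(members) for members in groups.values() if len(members) > 1]


profiles = {
    "BE/10/2012": "6 3 1A 1 1 1 1 2 4 5 4 3A 1B 2A 3 7 1".split(),
    "BE/26/2011": "6 3 1A 1 1 1 1 2 4 5 4 3A 1B 2A 3 7 1".split(),
    "JER5695": "1 1 7 7 7 1 1 6 2 1 2/3 3B 2A 4B 13 2 1".split(),
    "BE/15/2010": "1 1 7 7 7 1 1 6 2 1 2/3 3B 2A 4B 13 2 1".split(),
    "NAN1LA": "3 5 5 5 5 7 3 2 2 6 2/3 4D 5 3B 7 5 2".split(),  # singleton here
}
nonrecombinant_groups = group_identical_profiles(profiles)
print(nonrecombinant_groups)
```

Exact matching of labels is a deliberate simplification: it says nothing about how the genotypes themselves were assigned, which is defined in the Supplementary Figures.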
strains or cross-contaminants. In our view, accurate estimates of the levels of intrahost variation in single-strain infections are not available from the present and previous studies, and will require sequencing and bioinformatic approaches that are demonstrably reliable, robust, and reproducible [46, 47].

Whole-genome analyses have confirmed the significant role of recombination during HCMV evolution reported in numerous earlier studies [12, 19]. Recombination has occurred over a very long period but nonetheless remains limited in extent, with surviving events being more numerous in long regions, less numerous in short regions, and rare or absent in hypervariable regions, consistent with the role of homologous recombination. Recombination frequency may be restricted in some circumstances by functional interdependence within the same protein (eg, gB) or possibly between separate proteins (eg, gN and gO [25, 32, 44]). However, it is not known whether differential recombination due to sequence relatedness is of general biological significance for the virus. Also, strains have circulated that seem not to have recombined for long periods. Application of an evolutionary rate estimated for herpesviruses (3.5 × 10^−8 substitutions/nt/year) [48] implies that these periods may have extended to many thousands of years. Moreover, as suggested by the lack of diversity within genotypes in comparison with the marked diversity among them, the distribution of substitutions
Table 3. Mutated Genes in Order of Decreasing Frequency

| Gene | Feature(s) | Passaged, No.^a,b | Clinical, No.^a,c | All, No.^a,d | Passaged, %^a,b | Clinical, %^a,c | All, %^a,d |
|---|---|---|---|---|---|---|---|
| UL9 | RL11 family; type 1 membrane protein | 50 | 31 | 81 | 32.89 | 34.07 | 33.33 |
| RL5A | RL11 family | 31 | 27 | 58 | 20.39 | 29.67 | 23.87 |
| UL1 | RL11 family; type 1 membrane protein | 20 | 18 | 38 | 13.16 | 19.78 | 15.64 |
| RL6 | RL11 family | 23 | 14 | 37 | 15.13 | 15.38 | 15.23 |
| US9 | US6 family; type 1 membrane protein | 26 | 11 | 37 | 17.11 | 12.09 | 15.23 |
| UL111A | Viral interleukin-10 | 16 | 7 | 23 | 10.53 | 7.69 | 9.47 |
| UL150 | Unknown | 11 | 3 | 14 | 7.24 | 3.30 | 5.76 |
| US7 | US6 family; type 1 membrane protein | 7 | 7 | 14 | 4.61 | 7.69 | 5.76 |
| UL40 | Type 1 membrane protein | 8 | 2 | 10 | 5.26 | 2.20 | 4.12 |
| UL30 | UL30 family | 2 | 3 | 5 | 1.32 | 3.30 | 2.06 |
| UL142 | MHC family; type 1 membrane protein | 2 | 3 | 5 | 1.32 | 3.30 | 2.06 |
| RL12 | RL11 family; type 1 membrane protein | 3 | 1 | 4 | 1.97 | 1.10 | 1.65 |
| RL1 | RL1 family | 1 | 2 | 3 | 0.66 | 2.20 | 1.23 |
| UL136 | Potential transmembrane domain | 3 | 0 | 3 | 1.97 | 0.00 | 1.23 |
| US13 | US12 family; type 3 membrane protein | 3 | 0 | 3 | 1.97 | 0.00 | 1.23 |
| UL133 | Potential transmembrane domain | 2 | 0 | 2 | 1.32 | 0.00 | 0.82 |
| US6 | US6 family; type 1 membrane protein | 1 | 1 | 2 | 0.66 | 1.10 | 0.82 |
| US8 | US6 family; type 1 membrane protein | 0 | 2 | 2 | 0.00 | 2.20 | 0.82 |
| US27 | GPCR family; type 3 membrane protein | 2 | 0 | 2 | 1.32 | 0.00 | 0.82 |
| UL11 | RL11 family; type 1 membrane protein | 1 | 0 | 1 | 0.66 | 0.00 | 0.41 |
| UL13 | Unknown | 0 | 1 | 1 | 0.00 | 1.10 | 0.41 |
| UL14 | UL14 family; type 1 membrane protein | 0 | 1 | 1 | 0.00 | 1.10 | 0.41 |
| UL15A | Potential transmembrane domain | 0 | 1 | 1 | 0.00 | 1.10 | 0.41 |
| UL20 | Type 1 membrane protein | 1 | 0 | 1 | 0.66 | 0.00 | 0.41 |
| UL43 | US22 family | 0 | 1 | 1 | 0.00 | 1.10 | 0.41 |
| UL99 | Envelope-associated protein | 1 | 0 | 1 | 0.66 | 0.00 | 0.41 |
| UL148 | Type 1 membrane protein | 1 | 0 | 1 | 0.66 | 0.00 | 0.41 |
| UL147 | CXCL family | 1 | 0 | 1 | 0.66 | 0.00 | 0.41 |
| UL145 | Unknown | 0 | 1 | 1 | 0.00 | 1.10 | 0.41 |
| UL150A | Unknown | 1 | 0 | 1 | 0.66 | 0.00 | 0.41 |
| IRS1 | US22 family | 1 | 0 | 1 | 0.66 | 0.00 | 0.41 |
| US1 | US1 family | 1 | 0 | 1 | 0.66 | 0.00 | 0.41 |
| US12 | US12 family; type 3 membrane protein | 1 | 0 | 1 | 0.66 | 0.00 | 0.41 |
| US19 | US12 family; type 3 membrane protein | 0 | 1 | 1 | 0.00 | 1.10 | 0.41 |

Abbreviations: CXCL, chemokine (CXC motif) ligand; GPCR, G protein–coupled receptor; MHC, major histocompatibility complex.
a Omitting mutations that occurred in RL13, UL128, UL130, and UL131A probably during passage, or that were engineered during bacterial artificial chromosome construction.
b Strains sequenced from strains passaged in cell culture, not taking into account the minority of mutations confirmed from the clinical samples (n = 152, excludes CZ/3/2012, which is the same strain as PRA8).
c Strains sequenced directly from clinical material (n = 91).
d Strains sequenced directly from clinical material or passaged virus (n = 243).
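The percentage columns of Table 3 appear to follow directly from the counts and the denominators given in footnotes b to d (152 passaged, 91 clinical, 243 in total). A minimal sketch of that arithmetic is given below; the function name `mutated_percentages` is hypothetical, and rounding to 2 decimal places is assumed from the precision shown in the table.

```python
# Denominators taken from the Table 3 footnotes.
TOTALS = {"passaged": 152, "clinical": 91, "all": 243}


def mutated_percentages(passaged_count, clinical_count):
    """Return the percentage of strains mutated in each category."""
    counts = {
        "passaged": passaged_count,
        "clinical": clinical_count,
        "all": passaged_count + clinical_count,
    }
    return {name: round(100 * counts[name] / TOTALS[name], 2) for name in counts}


# Counts for UL9 and US9 as listed in Table 3.
ul9 = mutated_percentages(50, 31)
us9 = mutated_percentages(26, 11)
print(ul9)
print(us9)
```

Recomputing a row in this way is also a quick check on digits that are easily garbled in extraction, such as the passaged percentage for US9.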
in nonrecombinant strains fits with the view that intense diversification of the hypervariable genes occurred early in human or pre-human history [25, 31] and has long since ceased.
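To make the timescale argument concrete, the rough calculation below converts a count of nucleotide differences into an approximate time since two nonrecombinant strains shared an ancestor, using the rate of 3.5 × 10^−8 substitutions/nt/year quoted in the text. Several things here are assumptions rather than the authors' method: the strict-clock formula t = d / (2 × rate × L), the function name, and the aligned length of 230 000 nt, which is a hypothetical round figure. The count of 35 differences is the value listed for a two-strain group in Table 2.

```python
def years_since_divergence(differences, aligned_length_nt, rate=3.5e-8):
    """Approximate years since two sequences diverged under a strict clock.

    The factor of 2 reflects the assumption that substitutions accumulate
    independently along both lineages after they split.
    """
    return differences / (2 * rate * aligned_length_nt)


ALIGNED_LENGTH_NT = 230_000  # hypothetical aligned length, for illustration only

estimate = years_since_divergence(35, ALIGNED_LENGTH_NT)
print(f"{estimate:.0f} years")
```

Because the estimate scales linearly with the number of differences, groups with larger counts give proportionally longer periods under the same assumptions.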
Assessing the extent to which recombinants arise and survive in individuals with multiple-strain infections is problematic. Except where populations fluctuate significantly and are sampled serially (eg, RTR1 in collection 2), it is difficult to approach this using short-read data, as they are based on PCR methodologies prone to generating recombinational artefacts. Long- or single-read sequencing technologies and demonstrably reliable bioinformatic approaches are needed. Also, conclusions drawn from transplant recipients, who are immunosuppressed and in whom HCMV populations may be diversified by transplantation from HCMV-positive donors or selected with antiviral drugs, are unlikely to represent other situations, such as maternal transmission via breast milk [39].

Evidence for pseudogenes was largely derived previously from strains isolated in cell culture, and it was unclear to what extent
[Figure 2, panels A and B. y-axis: Number of variants (0 to 5000); x-axis: Single strains, Multiple strains. Labeled outlying datasets: PAV6, PAV21, RTR6B, RTR2, CMV-19, CMV-31, CMV-35, CMV-37, CMV-38, PRA6A, SCTR12, and ERR1279054.]
Figure 2. Box-and-whisker graphs created using ggplot2 (https://ggplot2.tidyverse.org) showing the total number of single-nucleotide polymorphisms (SNPs) detected at a frequency of >2% in single-strain and multiple-strain infections using LoFreq (A) and V-Phaser (B). Single-strain (n = 134 and 131, respectively) and multiple-strain datasets (n = 29 and 29, respectively) for which consensus genome sequences had been derived were identified by motif read-matching, and the total number of SNPs in each dataset was enumerated (insertions, deletions, and length polymorphisms were not considered). LoFreq employed a minimal coverage depth of 10 reads (minimal SNP quality [phred] 64) and strand-bias significance with a false discovery rate correction of P < .001. V-Phaser employed phasing with a window size of 500 nucleotides and quality score (phred) 20 for calibrating the significance of strand-bias at P < .05. Each box (light gray for single strains and dark gray for multiple strains) encompasses the first to third quartiles (Q1–Q3) and shows the median as a thick line. For each box, the horizontal line at the end of the upper dashed whisker marks the upper extreme (defined as the smaller of Q3 + 1.5 × [Q3 − Q1] and the highest single value), and the horizontal line at the end of the lower dashed whisker marks the lower extreme (the greater of Q1 − 1.5 × [Q3 − Q1] and the lowest single value).
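The whisker definitions in the Figure 2 legend can be written out as a short calculation. The sketch below follows the legend as worded (upper extreme as the smaller of Q3 + 1.5 × [Q3 − Q1] and the maximum, lower extreme as the greater of Q1 − 1.5 × [Q3 − Q1] and the minimum). It is not the authors' plotting code: the function name and the SNP counts are made up for illustration, and the quartile interpolation method ("inclusive") is an assumption, since plotting packages may compute quartiles and whisker ends slightly differently.

```python
import statistics


def box_extremes(values):
    """Compute quartiles and whisker extremes as described in the legend."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    upper = min(q3 + 1.5 * iqr, max(values))  # smaller of fence and maximum
    lower = max(q1 - 1.5 * iqr, min(values))  # greater of fence and minimum
    return {"q1": q1, "median": median, "q3": q3, "lower": lower, "upper": upper}


snp_counts = [1, 2, 3, 4, 100]  # made-up SNP totals, one per dataset
summary = box_extremes(snp_counts)
# Datasets beyond the whisker extremes would be drawn as individual points.
outliers = [v for v in snp_counts if v < summary["lower"] or v > summary["upper"]]
print(summary, outliers)
```

In this toy input the single large value lies beyond the upper extreme, which is the situation in which a dataset would be plotted and labeled individually.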
pseudogenes were present in natural populations. For example, in a study reporting that 75% of strains carry pseudogenes [12], 157 mutations were identified in 101 strains, with all but one of these strains having been passaged in cell culture, although 35 mutations were confirmed by PCR of the clinical material. Nonetheless, we found that the distribution of pseudogenes among the 91 strains sequenced in the present study directly from clinical material is similar to that among strains isolated in cell culture, thus generally validating the earlier suppositions. The likelihood that many of these mutants are ancient is supported by the finding that all were detected at levels very close to 100% in collection 1, and by previous observations identifying the same mutation in different strains [7, 12]. Moreover, 9 of the groups of nonrecombinant strains contained pseudogenes, and some of the mutations were common to group members and even to additional strains among the 243, indicating that they have been transferred by recombination. The implication that some mutants have a selective advantage in certain individuals may be extended to their presence in pathogenic congenital infections, probably in combination with host factors. The genes from which pseudogenes have arisen are involved, or are suspected to be involved, in immune modulation. They include UL111A, which encodes viral interleukin 10 [49]; UL40, which is involved in protecting infected cells against natural killer cell lysis [50] via its cleaved signal peptide, in which mutations occur; and UL9, which bears a potential immunoglobulin-binding domain [2]. These findings also suggest, but do not prove, that maternal HCMV genotyping might be useful in developing strategies for preventing congenital CMV.

Modern approaches offer a powerful means for analyzing HCMV genomes directly from clinical material, with the important proviso that the data should be quality assessed and interpreted in the context of the known evolutionary and biological characteristics of the virus. Extensive high-throughput sequence data are likely to illuminate further the epidemiology, pathogenesis, and evolution of HCMV in clinical and natural settings, thus facilitating the identification of virulence determinants and the development of new interventions.
SupplementaryData
Supplementary materials are available at e Journal of Infectious
Diseases online. Consisting of data provided by the authors to
benet the reader, the posted materials are not copyedited and are
the sole responsibility of the authors, so questions or comments
should be addressed to the corresponding author.
Notes
Acknowledgments. We are grateful to Florent Lassalle, Daniel Depledge, and Judith Breuer (University College London) for providing unpublished collection 3 datasets and for updating the associated genome sequences in GenBank. We also thank Jenny Witthuhn (Hannover Medical School) for excellent technical assistance.

Financial support. This work was supported by the Medical Research Council (grant numbers MC_UU_12014/3 and MC_UU_12014/12 to A. J. D.); the Wellcome Trust (grant numbers 204870/Z/16/Z to A. J. D. and WT090323MA to G. W. G. W.); the Ministry of Health of the Czech Republic for conceptual development of research organization (University Hospital, Motol, Prague, Czech Republic, grant number 00064203 to P. H.); the Fondazione Regionale per la Ricerca Biomedica, Regione Lombardia (grant number FRRB 2015-043 to D. L.); the Niedersächsische Ministerium für Wissenschaft und Kultur (grant COALITION–Communities Allied in Infection to T. G.); the Deutsche Forschungsgemeinschaft Collaborative Research Centre 900 (core project Z1, grant number SFB-9001 to T. F. S.); and the German Center of Infection Research Thematic Translational Unit “Infections of the Immunocompromised Host” (grant to T. G. and T. F. S.). Two authors (E. H. and A. D.) were supported by the Infection Biology graduate program of Hannover Biomedical Research School.

Potential conflicts of interest. G. S. W. reports that his part in the present study was completed prior to his present employment. G. W. G. W. has received a grant from the Wellcome Trust. D. L. has received a grant from the Fondazione Regionale per la Ricerca Biomedica, Regione Lombardia. T. G. has received grants from the German Federal Ministry of Education and Research and from the Niedersächsische Ministerium für Wissenschaft und Kultur. P. H. has received a grant from the Ministry of Health of the Czech Republic for the conceptual development of University Hospital, Motol, Prague, Czech Republic; personal fees and nonfinancial support from MSD and from Chimerix; and personal fees from Dynex. T. F. S. has received grants from the Deutsche Forschungsgemeinschaft Collaborative Research Centre 900 and from the German Federal Ministry of Education and Research. A. J. D. has received grants from the Medical Research Council and the Wellcome Trust. All other authors report no potential conflicts of interest.

All authors have submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Conflicts that the editors consider relevant to the content of the manuscript have been disclosed.
References
1. Puchhammer-Stöckl E, Görzer I. Cytomegalovirus and
Epstein-Barr virus subtypes—the search for clinical signifi-
cance. J Clin Virol 2006; 36:239–48.
2. CheeMS, BankierAT, BeckS, etal. Analysis of the protein-
coding content of the sequence of human cytomegalo-
virus strain AD169. Curr Top Microbiol Immunol 1990;
154:125–69.
3. Dunn W, Chou C, Li H, et al. Functional profiling of a
human cytomegalovirus genome. Proc Natl Acad Sci U S A
2003; 100:14223–8.
4. MurphyE, YuD, GrimwoodJ, etal. Coding potential of lab-
oratory and clinical strains of human cytomegalovirus. Proc
Natl Acad Sci U S A 2003; 100:14976–81.
5. SinzgerC, HahnG, DigelM, etal. Cloning and sequencing
of a highly productive, endotheliotropic virus strain derived
from human cytomegalovirus TB40/E. J Gen Virol 2008;
89:359–68.
6. DolanA, CunninghamC, HectorRD, etal. Genetic content
of wild-type human cytomegalovirus. J Gen Virol 2004;
85:1301–12.
7. CunninghamC, GathererD, HilfrichB, etal. Sequences of
complete human cytomegalovirus genomes from infected
cell cultures and clinical specimens. J Gen Virol 2010;
91:605–15.
8. Dargan DJ, Douglas E, Cunningham C, et al. Sequential
mutations associated with adaptation of human cyto-
megalovirus to growth in cell culture. J Gen Virol 2010;
91:1535–46.
9. Bradley AJ, Lurain NS, Ghazal P, et al. High-throughput
sequence analysis of variants of human cytomegalo-
virus strains Towne and AD169. J Gen Virol 2009;
90:2375–80.
10. Jung GS, Kim YY, KimJI, et al. Full genome sequencing
and analysis of human cytomegalovirus strain JHC isolated
from a Korean patient. Virus Res 2011; 156:113–20.
11. SijmonsS, ThysK, CorthoutM, etal. A method enabling
high-throughput sequencing of human cytomegalovirus
complete genomes from clinical isolates. PLoS One 2014;
9:e95501.
12. Sijmons S, Thys K, Mbong Ngwese M, et al. High-
throughput analysis of human cytomegalovirus genome
diversity highlights the widespread occurrence of gene-
disrupting mutations and pervasive recombination. J Virol
2015; 89:7673–95.
13. ZhaoF, ShenZZ, LiuZY, etal. Identification and BAC con-
struction of Han, the first characterized HCMV clinical
strain in China. J Med Virol 2016; 88:859–70.
14. Cha TA, Tom E, Kemble GW, Duke GM, Mocarski ES,
SpaeteRR. Human cytomegalovirus clinical isolates carry
at least 19 genes not found in laboratory strains. J Virol
1996; 70:78–83.
15. StantonRJ, BaluchovaK, DarganDJ, etal. Reconstruction
of the complete human cytomegalovirus genome in a BAC
reveals RL13 to be a potent inhibitor of replication. J Clin
Invest 2010; 120:3191–208.
16. Renzette N, Bhattacharjee B, Jensen JD, Gibson L,
KowalikTF. Extensive genome-wide variability of human
cytomegalovirus in congenitally infected infants. PLoS
Pathog 2011; 7:e1001344.
17. MelnikovA, Galinsky K, Rogov P, et al. Hybrid selection
for sequencing pathogen genomes from clinical samples.
Genome Biol 2011; 12:R73.
18. DepledgeDP, Palser AL, WatsonSJ, etal. Specific capture
and whole-genome sequencing of viruses from clinical
samples. PLoS One 2011; 6:e27805.
19. LassalleF, DepledgeDP, ReevesMB, etal. Islands of linkage
in an ocean of pervasive recombination reveals two-speed
evolution of human cytomegalovirus genomes. Virus Evol
2016; 2:vew017.
20. HouldcroftCJ, Bryant JM, Depledge DP, etal. Detection
of low frequency multi-drug resistance and novel puta-
tive maribavir resistance in immunocompromised pedi-
atric patients with cytomegalovirus. Front Microbiol 2016;
7:1317.
21. Hage E, Wilkie GS, Linnenweber-Held S, et al.
Characterization of human cytomegalovirus genome di-
versity in immunocompromised hosts by whole-genome
sequencing directly from clinical specimens. J Infect Dis
2017; 215:1673–83.
22. Chou SW, Dennison KM. Analysis of interstrain var-
iation in cytomegalovirus glycoprotein B sequences
encoding neutralization-related epitopes. J Infect Dis 1991;
163:1229–34.
23. Meyer-KönigU, Ebert K, SchrageB, Pollak S, Hufert FT.
Simultaneous infection of healthy people with multiple
human cytomegalovirus strains. Lancet 1998; 352:1280–1.
24. RasmussenL, GeisslerA, WintersM. Inter- and intragenic
variations complicate the molecular epidemiology of
human cytomegalovirus. J Infect Dis 2003; 187:809–19.
25. MattickC, DewinD, PolleyS, etal. Linkage of human cyto-
megalovirus glycoprotein gO variant groups identified from
worldwide clinical isolates with gN genotypes, implications
for disease associations and evidence for N-terminal sites of
positive selection. Virology 2004; 318:582–97.
26. Sekulin K, Görzer I, Heiss-Czedik D, Puchhammer-
StöcklE. Analysis of the variability of CMV strains in the
RL11D domain of the RL11 multigene family. Virus Genes
2007; 35:577–83.
27. BankevichA, NurkS, AntipovD, etal. SPAdes: a new ge-
nome assembly algorithm and its applications to single-cell
sequencing. J Comput Biol 2012; 19:455–77.
28. Silva GG, Dutilh BE, Matthews TD, etal. Combining de
novo and reference-guided assembly with scaffold_builder.
Source Code Biol Med 2013; 8:23.
29. LangmeadB, SalzbergSL. Fast gapped-read alignment with
Bowtie 2. Nat Methods 2012; 9:357–9.
30. MilneI, StephenG, BayerM, et al. Using Tablet for visual
exploration of second-generation sequencing data. Brief
Bioinform 2013; 14:193–202.
31. BradleyAJ, KovácsIJ, GathererD, etal. Genotypic analysis
of two hypervariable human cytomegalovirus genes. J Med
Virol 2008; 80:1615–23.
32. Bates M, Monze M, Bima H, Kapambwe M, Kasolo FC,
Gompels UA; CIGNIS Study Group. High human
cytomegalovirus loads and diverse linked variable genotypes in both HIV-1 infected and exposed, but uninfected, children in Africa. Virology 2008; 382:28–36.
33. Sievers F, Wilm A, Dineen D, et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 2011; 7:539.
34. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol 2013; 30:2725–9.
35. Davison AJ, Holton M, Dolan A, Dargan DJ, Gatherer D, Hayward GS. Comparative genomics of primate cytomegaloviruses. In: Reddehase MJ, ed. Cytomegaloviruses: from molecular pathogenesis to intervention. Vol 1. Norwich, UK: Caister Academic Press, 2013.
36. Li H, Handsaker B, Wysoker A, et al. The sequence alignment/map format and SAMtools. Bioinformatics 2009; 25:2078–9.
37. Wilm A, Aw PP, Bertrand D, et al. LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets. Nucleic Acids Res 2012; 40:11189–201.
38. Yang X, Charlebois P, Macalalad A, Henn MR, Zody MC. V-Phaser 2: variant inference for viral populations. BMC Genomics 2013; 14:674.
39. Suárez NM, Musonda KG, Escriva E, et al. Multiple-strain infections of human cytomegalovirus with high genomic diversity are common in breast milk from HIV-positive women in Zambia. J Infect Dis 2019; XX:XXX–XXX. doi:10.1093/infdis/jiz209.
40. Meyer-König U, Haberland M, von Laer D, Haller O, Hufert FT. Intragenic variability of human cytomegalovirus glycoprotein B in clinical strains. J Infect Dis 1998; 177:1162–9.
41. Shepp DH, Match ME, Lipson SM, Pergolizzi RG. A fifth human cytomegalovirus glycoprotein B genotype. Res Virol 1998; 149:109–14.
42. Deckers M, Hofmann J, Kreuzer KA, et al. High genotypic diversity and a novel variant of human cytomegalovirus revealed by combined UL33/UL55 genotyping with broad-range PCR. Virol J 2009; 6:210.
43. Haberland M, Meyer-König U, Hufert FT. Variation within the glycoprotein B gene of human cytomegalovirus is due to homologous recombination. J Gen Virol 1999; 80:1495–500.
44. Paterson DA, Dyer AP, Milne RS, Sevilla-Reyes E, Gompels UA. A role for human cytomegalovirus glycoprotein O (gO) in cell fusion and a new hypervariable locus. Virology 2002; 293:281–94.
45. Yan H, Koyano S, Inami Y, et al. Genetic linkage among human cytomegalovirus glycoprotein N (gN) and gO genes, with evidence for recombination from congenitally and post-natally infected Japanese infants. J Gen Virol 2008; 89:2275–9.
46. Xu C, Nezami Ranjbar MR, Wu Z, DiCarlo J, Wang Y. Detecting very low allele fraction variants using targeted DNA sequencing and a novel molecular barcode-aware variant caller. BMC Genomics 2017; 18:5.
47. Illingworth CJR, Roy S, Beale MA, Tutill H, Williams R, Breuer J. On the effective depth of viral sequence data. Virus Evol 2017; 3:vex030.
48. McGeoch DJ, Cook S, Dolan A, Jamieson FE, Telford EA. Molecular phylogeny and evolutionary timescale for the family of mammalian herpesviruses. J Mol Biol 1995; 247:443–58.
49. McSharry BP, Avdic S, Slobedman B. Human cytomegalovirus encoded homologs of cytokines, chemokines and their receptors: roles in immunomodulation. Viruses 2012; 4:2448–70.
50. Prod’homme V, Tomasec P, Cunningham C, et al. Human cytomegalovirus UL40 signal peptide regulates cell surface expression of the natural killer cell ligands HLA-E and gpUL18. J Immunol 2012; 188:2794–804.
... Although HCMV mutates and evolves more slowly than many RNA viruses and not any faster than other herpes viruses, high levels of genetic variation due to mixed (i.e. multiple) viral strain infections in an individual are often observed [17][18][19][20]. These multiple strain infections likely result from reactivation of latent strains and/or re-infections [17,21,22]. ...
... Mixed infections with multiple HCMV strains are commonly observed in patients with active HCMV replication [10,[17][18][19][20]. Accurately reconstructing the genomic sequences of the individual haplotypes has implications for gaining a deeper understanding of viral pathogenicity and viral diversity within the host. ...
Article
Full-text available
Infection with human cytomegalovirus (HCMV) can cause severe complications in immunocompromised individuals and congenitally infected children. Characterizing heterogeneous viral populations and their evolution by high-throughput sequencing of clinical specimens requires the accurate assembly of individual strains or sequence variants and suitable variant calling methods. However, the performance of most methods has not been assessed for populations composed of low divergent viral strains with large genomes, such as HCMV. In an extensive benchmarking study, we evaluated 15 assemblers and 6 variant callers on 10 lab-generated benchmark data sets created with two different library preparation protocols, to identify best practices and challenges for analyzing such data. Most assemblers, especially metaSPAdes and IVA, performed well across a range of metrics in recovering abundant strains. However, only one, Savage, recovered low abundant strains and in a highly fragmented manner. Two variant callers, LoFreq and VarScan2, excelled across all strain abundances. Both shared a large fraction of false positive variant calls, which were strongly enriched in T to G changes in a 'G.G' context. The magnitude of this context-dependent systematic error is linked to the experimental protocol. We provide all benchmarking data, results and the entire benchmarking workf low named QuasiModo, Quasispecies Metric determination on omics, under the GNU General Public License v3.0 (https://github.com/ hzi-bifo/Quasimodo), to enable full reproducibility and further benchmarking on these and other data. studies the human microbiome, viral and bacterial pathogens, and human cell lineages within individual patients by analysis of large-scale biological and epidemiological data sets with computational techniques.
... This is in line with the authors' previous finding that knockdown of RL13 can increase cell-free infectivity and makes it a candidate to further accelerate the process by including an additional shRNA against this gene. Unfortunately, RL13 is highly polymorphic [36][37][38] with ten different genotypes [39], and the authors have previously failed to identify a suitable target region shared by all genotypes [28]. To solve this issue, either a combination of shRNAs would be necessary or the function of RL13 might be inhibited by a suitable drug, which, however, remains to be identified. ...
Article
Full-text available
Working with recent isolates of human cytomegalovirus (HCMV) is complicated by their strictly cell-associated growth with lack of infectivity in the supernatant. Adaptation to cell-free growth is associated with disruption of the viral UL128 gene locus. The authors transduced fibroblasts with a lentiviral vector encoding UL128-specific-shRNA to allow the release of cell-free infectivity without genetic alteration. Transduced cells were cocultured with fibroblasts containing cell-associated isolates, and knockdown of the UL128 protein was validated by immunoblotting. Cell-free infectivity increased 1000-fold in isolate cocultures with UL128-shRNA compared with controls, and virions could be purified by density gradients. Transduced fibroblasts also allowed direct isolation of HCMV from a clinical specimen and cell-free transfer to other cell types. In conclusion, UL128-shRNA-transduced fibroblasts allow applications previously unsuitable for recent isolates.
... What has become clear from the ability to reconstruct the wild-type HCMV genome using bacterial artificial chromosome technology is that HCMV strains that express the pentameric complex propagate in a highly cell-cell fashion, but when the pentamer is absent (as in Ad169 or Towne laboratory strains), a high proportion of cell-free virus is produced 136 . We now know (through whole-genome sequencing) that HCMV has the largest genome of any virus known to infect humans (double-stranded DNA of 235-250 kb) [137][138][139] . HCMV encodes ~170 canonical open reading frames although non-canonical open reading frames may increase this coding capacity fivefold 140,141 . ...
Article
Human cytomegalovirus (HCMV) is a herpesvirus that infects ~60% of adults in developed countries and more than 90% in developing countries. Usually, it is controlled by a vigorous immune response so that infections are asymptomatic or symptoms are mild. However, if the immune system is compromised, HCMV can replicate to high levels and cause serious end organ disease. Substantial progress is being made in understanding the natural history and pathogenesis of HCMV infection and disease in the immunocompromised host. Serial measures of viral load defined the dynamics of HCMV replication and are now used routinely to allow intervention with antiviral drugs in individual patients. They are also used as pharmacodynamic read-outs to evaluate prototype vaccines that may protect against HCMV replication and to define immune correlates of this protection. This novel information is informing the design of randomized controlled trials of new antiviral drugs and vaccines currently under evaluation. In this Review, we discuss immune responses to HCMV and countermeasures deployed by the virus, the establishment of latency and reactivation from it, exogenous reinfection with additional strains, pathogenesis, development of end organ disease, indirect effects of infection, immune correlates of control of replication, current treatment strategies and the evaluation of novel vaccine candidates. Human cytomegalovirus (HCMV) infection is ordinarily controlled by a vigorous immune response; however, HCMV can replicate to high levels and cause end organ disease when the immune system is compromised. In this Review, Griffiths and Reeves discuss HCMV pathogenesis in immunocompromised individuals and emerging strategies to treat and prevent infection and disease.
Article
Full-text available
The 2022 mpox outbreak has led to more than 91,000 cases in 115 countries. Whole genome sequencing (WGS) has been at the forefront of surveillance and outbreak investigations for different pathogens of public health significance. Many institutions performing WGS on Monkeypox virus (MPXV) use a resource-intensive metagenomic approach. Here we present a targeted amplification method for WGS of MPXV from clinical specimens. We designed 43 pairs of primers (amplicons ~5 kb) with PrimalScheme to span the ~200 kb viral genome and then added 12 additional primers to optimize amplification. We extracted nucleic acid from clinical specimens and amplified the two primer pools. All libraries were sequenced on the MiniSeq platform. Resulting reads were filtered by quality and then mapped to a MPXV reference genome. Consensus sequences were generated for phylogenetic analysis. A total of 91 specimens with a real-time-PCR cycle threshold (Ct) values ≤27.9 were sequenced using our targeted amplification protocol. The sequenced MPXV genomes were of high quality with mean genome coverage of 99.56% (95% CI 99.32-99.80%), mean depth 1,395× (95% CI 1275–1515), and mean mapping quality of 52.87 (95% CI 52.1–53.6) and allowed for greater multiplexing of samples relative to metagenomics. The MPXV genomes belong to 8 of the 13 clades observed during the 2022 global mpox outbreak. Targeted amplification enrichment provides high coverage, throughput, and short turnaround times. It is an efficient low-cost method for MPXV WGS and can benefit public health surveillance and outbreak management. IMPORTANCE We present a protocol to efficiently sequence genomes of the MPXV-causing mpox. This enables researchers and public health agencies to acquire high-quality genomic data using a rapid and cost-effective approach. Genomic data can be used to conduct surveillance and investigate mpox outbreaks. We present 91 mpox genomes that show the diversity of the 2022 mpox outbreak in Ontario, Canada.
Article
Full-text available
Human cytomegalovirus (HCMV) is a common viral pathogen of solid organ transplant recipients, neonates, and HIV-infected individuals. HCMV encodes homologs of several host genes with the potential to influence viral persistence and/or pathogenesis.
Article
Full-text available
Herpesvirus genomes show abundant evidence of past recombination. Its functional importance is unknown. A key question is whether recombinant viruses can outpace the immunity induced by their parents to reach higher loads. We tested this by co-infecting mice with attenuated mutants of Murid Herpesvirus-4 (MuHV-4). Infection by the natural olfactory route routinely allowed mutant viruses to reconstitute wild-type genotypes and reach normal viral loads. Lung co-infections rescued much less well. Attenuated murine cytomegalovirus mutants similarly showed recombinational rescue via the nose but not the lungs. These infections spread similarly, so route-specific rescue implied that recombination occurred close to the olfactory entry site. Rescue of replication-deficient MuHV-4 confirmed this, showing that co-infection occurred in the first encountered olfactory cells. This worked even with asynchronous inoculation, implying that a defective virus can wait here for later rescue. Virions entering the nose get caught on respiratory mucus, which the respiratory epithelial cilia push back towards the olfactory surface. Early infection was correspondingly focussed on the anterior olfactory edge. Thus, by concentrating incoming infection into a small area, olfactory entry seems to promote functionally significant recombination. Importance: All organisms depend on genetic diversity to cope with environmental change. Small viruses rely on frequent point mutations. This is harder for herpesviruses because they have larger genomes. Recombination provides another means of genetic optimization. Human herpesviruses often co-infect, and they show evidence of past recombination, but whether this is rare and incidental or functionally important is unknown. We showed that herpesviruses entering mice via the natural olfactory route meet reliably enough for recombination routinely to repair crippling mutations and restore normal viral loads. It appeared to occur in the first encountered olfactory cells and reflected a concentration of infection at the anterior olfactory edge. Thus, natural host entry incorporates a significant capacity for herpesvirus recombination.
Preprint
Short read sequencing, which has been used extensively to decipher the genome diversity of human cytomegalovirus (HCMV) strains, often cannot assess the co-linearity of non-adjacent polymorphic sites in mixed HCMV populations. In the present study, we established a long amplicon sequencing workflow to identify the number and relative quantities of unique HCMV haplotypes in mixtures. Accordingly, long read PacBio sequencing was applied to amplicons spanning multiple polymorphic sites. Initial validation of this approach was performed with defined HCMV DNA templates derived from cell-free viruses, and the approach was further tested for its suitability on patient samples carrying mixed HCMV infections. Our data show that artificial HCMV DNA mixtures were correctly determined upon long amplicon sequencing down to 1% abundance of the minor DNA source. Total error rate of mapped reads ranged from 0.17 to 0.43 depending on the stringency of quality trimming. PCR products of up to 7.7 kb and a GC content <55% were efficiently generated when DNA was directly isolated from bronchoalveolar lavage samples, yet long range PCR may display a slightly lower sensitivity compared to short amplicons. In a single sample, up to three distinct haplotypes were identified showing varying relative frequencies. Intra-patient haplotype diversity is unevenly distributed across the target site and often interspersed by long identical stretches, so the variable sites cannot be linked by short reads. Moreover, diversity at single polymorphic regions as assessed by short amplicon sequencing may markedly underestimate the overall diversity of mixed populations. Quantitative haplotype determination by long amplicon sequencing provides a novel approach for HCMV strain characterisation in mixed infected samples which can be scaled up to cover the majority of the genome. This will substantially improve our understanding of intra-host HCMV strain diversity and its dynamic behaviour.
Impact statement: Human cytomegalovirus (HCMV), a large enveloped DNA virus, displays the highest inter-host genome variability among all human herpesviruses. Primary infection, reinfection and reactivation are mostly asymptomatic but may cause devastating harm in congenitally infected newborns and in immunosuppressed individuals. Multiple distinct strains circulate in humans, each characterised by a unique assembly of well-defined polymorphic genes, most of which are linked to cell entry, persistence and immune evasion. Mixed HCMV strain infections are common and may pose a high pathogenic potential for patients at risk for symptomatic infections. To better understand the biological behaviour and dynamics of individual viral genomes, it is essential to assess the co-linearity of polymorphic sites in a genetically heterogeneous population. In this study, we established and successfully applied a long read sequencing technique to long amplicons and identified co-linear genome stretches (haplotypes) in patient samples with mixed HCMV populations. This strategy for haplotype determination allows linkage analysis of multiple non-adjacent polymorphic sites along up to 7.7 kb. This allows a better approximation to the true strain diversity in mixed samples, which short read sequencing approaches fail to provide. It thereby improves our knowledge of mixed HCMV infections, which is important for clinical outcome, diagnostics, treatment and vaccine development. Data Summary: Sequence data generated in this study were deposited in GenBank with the accession numbers MW560357-MW560373. Raw data of Illumina and PacBio sequencing were submitted to the NCBI Sequence Read Archive (SRA) under project number SUB8972240. BioSample accession numbers are provided in Supplementary Tables 3 and 4. Additional sequence data for reference purposes were accessed from GenBank. Accession numbers are listed in Supplementary Tables 6 and 7.
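The idea of linking non-adjacent polymorphic sites on single long reads can be illustrated with a minimal sketch. It is not the workflow of the study above: it assumes reads that are already aligned to a common coordinate system, and the function name `haplotype_frequencies` and its parameters are hypothetical.

```python
from collections import Counter


def haplotype_frequencies(aligned_reads, sites):
    """Estimate haplotype frequencies from long reads (illustrative only).

    aligned_reads: strings in a shared coordinate system (assumption:
                   alignment was done beforehand, '-' marks a gap).
    sites:         0-based positions of the polymorphic sites to link.
    Returns a dict mapping each observed haplotype (the bases at `sites`,
    joined into one string) to its fraction of the usable reads.
    """
    counts = Counter()
    for read in aligned_reads:
        # Use only reads that cover every site with an unambiguous base;
        # this coverage requirement is what short reads cannot satisfy.
        if all(pos < len(read) and read[pos] in "ACGT" for pos in sites):
            counts["".join(read[pos] for pos in sites)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {hap: n / total for hap, n in counts.items()}
```

With ten reads of which six carry C and G, three carry T and C, and one carries C and C at the two chosen sites, the function reports three haplotypes at 60%, 30% and 10%. Counting each site separately would hide the fact that the third haplotype is a distinct combination.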
Article
The endemic betaherpesvirus HCMV circulates in human populations as a complex mixture of genetically distinct variants, establishes lifelong persistent infections, and causes significant disease in neonates and immunocompromised adults. This study capitalizes on our recent characterizations of three genetically distinct HCMV BAC clones to discern the functions of the envelope glycoprotein complexes gH/gL/gO and gH/gL/pUL128-131, which are promising vaccine targets that share the herpesvirus core fusion apparatus component, gH/gL.
Article
Long-read, single-molecule DNA sequencing technologies have triggered a revolution in genomics by enabling the determination of large, reference-quality genomes in ways that overcome some of the limitations of short-read sequencing. However, the greater length and higher error rate of the reads generated on long-read platforms make the tools used for assembling short reads unsuitable for use in data assembly and motivate the development of new approaches. We present LoReTTA (Long Read Template-Targeted Assembler), a tool designed for performing de novo assembly of long reads generated from viral genomes on the PacBio platform. LoReTTA exploits a reference genome to guide the assembly process, an approach that has been successful with short reads. The tool was designed to deal with reads originating from viral genomes, which feature high genetic variability, possible multiple isoforms, and the dominant presence of additional organisms in clinical or environmental samples. LoReTTA was tested on a range of simulated and experimental datasets and outperformed established long-read assemblers in terms of assembly contiguity and accuracy. The software runs under the Linux operating system, is designed for easy adaptation to alternative systems, and features an automatic installation pipeline that takes care of the required dependencies. A command-line version and a user-friendly graphical interface version are available under a GPLv3 license at https://bioinformatics.cvr.ac.uk/software/ with the manual and a test dataset.
Article
Human cytomegalovirus (HCMV) is known for its broad cell tropism, as reflected by the different organs and tissues affected by HCMV infection. Hence, inhibition of HCMV entry into distinct cell types could be considered a promising therapeutic option to limit cell-free HCMV infection. Soluble forms of cellular entry receptor PDGFRα rather than those of entry receptor neuropilin-2 inhibit infection of multiple cell types. sPDGFRα specifically interacts with gO of the trimeric gH/gL/gO envelope glycoprotein complex. HCMV strains may differ with respect to the amounts of trimer in virions and the highly polymorphic gO sequence. In this study, we show that the major gO genotypes of HCMV that are also found in vivo are similarly well inhibited by sPDGFRα. Novel gO genotypic forms potentially emerging through recombination, however, may evade sPDGFRα inhibition on epithelial cells. These findings provide useful additional information for the future development of anti-HCMV therapeutic compounds based on sPDGFRα.
Article
Background: In developed countries, human cytomegalovirus (HCMV) is a major pathogen in congenitally infected and immunocompromised individuals, where multiple-strain infection appears linked to disease severity. The situation is less documented in developing countries. In Zambia, breast milk is a key route for transmitting HCMV and carries higher viral loads in human immunodeficiency virus (HIV)-infected women. We investigated HCMV strain diversity. Methods: High-throughput sequence datasets were generated from 28 HCMV-positive breast milk samples donated by 22 mothers (15 HIV-infected and 7 HIV-negative) at 4-16 weeks postpartum, then analyzed by genome assembly and novel motif-based genotyping in 12 hypervariable HCMV genes. Results: Among the 20 samples from 14 donors (13 HIV-infected and one HIV-negative) who yielded data meeting quality thresholds, 89 of the possible 109 genotypes were detected, and multiple-strain infections involving up to 5 strains per person were apparent in 9 HIV-infected women. Strain diversity was extensive among individuals but conserved compartmentally and longitudinally within them. Genotypic linkage was maintained within hypervariable UL73/UL74 and RL12/RL13/UL1 loci for virus entry and immunomodulation, but not between genes more distant from each other. Conclusions: Breast milk from HIV-infected women contains multiple HCMV strains of high genotypic complexity and thus constitutes a major source for transmitting viral diversity.
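Motif-based genotyping of the kind mentioned above can be sketched as counting reads that contain a short sequence specific to one genotype of a hypervariable gene. The sketch below is an assumption-laden illustration, not the published method: the motif table, the `min_reads` threshold and the function names are all hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def genotype_by_motif(reads, motifs, min_reads=2):
    """Count reads supporting each genotype (illustrative only).

    reads:     iterable of uppercase read sequences.
    motifs:    dict mapping a genotype name to a motif assumed to be
               unique to that genotype (hypothetical example data).
    min_reads: minimum supporting reads needed to report a genotype,
               a crude guard against sequencing errors.
    Returns a dict of genotype name to supporting read count.
    """
    counts = {name: 0 for name in motifs}
    for read in reads:
        for name, motif in motifs.items():
            # A read may come from either strand, so check both.
            if motif in read or reverse_complement(motif) in read:
                counts[name] += 1
    return {name: n for name, n in counts.items() if n >= min_reads}
```

More than one genotype reported for the same gene in a single sample would then suggest a multiple-strain infection, subject to the chosen threshold.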
Article
Genome sequence data are of great value in describing evolutionary processes in viral populations. However, in such studies, the extent to which data accurately describe the viral population is a matter of importance. Multiple factors may influence the accuracy of a dataset, including the quantity and nature of the sample collected, and the subsequent steps in viral processing. To investigate this phenomenon, we sequenced replica datasets spanning a range of viruses, and in which the point at which samples were split was different in each case, from a dataset in which independent samples were collected from a single patient to another in which all processing steps up to sequencing were applied to a single sample before splitting the sample and sequencing each replicate. We conclude that neither a high read depth nor a high template number in a sample guarantees the precision of a dataset. Measures of consistency calculated from within a single biological sample may also be insufficient; distortion of the composition of a population by the experimental procedure or genuine within-host diversity between samples may each affect the results. Where it is possible, data from replicate samples should be collected to validate the consistency of short-read sequence data.
Article
Background: Advances in next-generation sequencing (NGS) technologies allow comprehensive studies of genetic diversity over the entire genome of human cytomegalovirus (HCMV), a significant pathogen for immunocompromised individuals. Methods: NGS was performed on target-enriched sequence libraries prepared directly from a variety of clinical specimens (blood, urine, breast-milk, respiratory samples, biopsies and vitreous humor) obtained longitudinally or from different anatomical compartments from 20 HCMV-infected patients (renal transplant recipients, stem cell transplant recipients and congenitally infected children). Results: De novo assembled HCMV genome sequences were obtained for 57/68 sequenced samples. Analysis of longitudinal or compartmental HCMV diversity revealed various patterns: no major differences were detected among longitudinal, intra-individual blood samples from 9/15 patients and in most of the patients with compartmental samples, whereas a switch of the major HCMV population was observed in six individuals with sequential blood samples and upon compartmental analysis of one patient with HCMV retinitis. Variant analysis revealed additional aspects of minor virus population dynamics and antiviral resistance mutations. Conclusions: In immunosuppressed patients, HCMV can remain relatively stable or undergo drastic genomic changes that are suggestive of the emergence of minor resident strains or de novo infection.
Article
Background: Detection of DNA mutations at very low allele fractions with high accuracy will significantly improve the effectiveness of precision medicine for cancer patients. To achieve this goal through next generation sequencing, researchers need a detection method that 1) captures rare mutation-containing DNA fragments efficiently in the mix of abundant wild-type DNA; 2) sequences the DNA library extensively to deep coverage; and 3) distinguishes low level true variants from amplification and sequencing errors with high accuracy. Targeted enrichment using PCR primers provides researchers with a convenient way to achieve deep sequencing for a small, yet most relevant region using benchtop sequencers. Molecular barcoding (or indexing) provides a unique solution for reducing sequencing artifacts analytically. Although different molecular barcoding schemes have been reported in recent literature, most variant calling has been done on limited targets, using simple custom scripts. The analytical performance of barcode-aware variant calling can be significantly improved by incorporating advanced statistical models. Results: We present here a highly efficient, simple and scalable enrichment protocol that integrates molecular barcodes in multiplex PCR amplification. In addition, we developed smCounter, an open source, generic, barcode-aware variant caller based on a Bayesian probabilistic model. smCounter was optimized and benchmarked on two independent read sets with SNVs and indels at 5 and 1% allele fractions. Variants were called with very good sensitivity and specificity within coding regions. Conclusions: We demonstrated that we can accurately detect somatic mutations with allele fractions as low as 1% in coding regions using our enrichment protocol and variant caller. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-3425-4) contains supplementary material, which is available to authorized users.
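The way molecular barcodes reduce sequencing artifacts can be shown with a deliberately simple sketch. It replaces the Bayesian model of smCounter with a plain majority vote per barcode family, so it illustrates the principle only; the function names, the input format and both thresholds are assumptions made for this example.

```python
from collections import Counter, defaultdict


def barcode_consensus_calls(observations, min_family_size=2, min_agreement=0.8):
    """Collapse reads sharing a molecular barcode into one call each.

    observations:    iterable of (barcode, base) pairs at ONE position
                     (hypothetical input format).
    min_family_size: families with fewer reads are discarded.
    min_agreement:   fraction of a family that must agree on the base.
    Returns a Counter of consensus bases, one vote per original molecule.
    """
    families = defaultdict(list)
    for barcode, base in observations:
        families[barcode].append(base)

    calls = Counter()
    for bases in families.values():
        if len(bases) < min_family_size:
            continue  # too few reads to separate an error from a variant
        base, n = Counter(bases).most_common(1)[0]
        if n / len(bases) >= min_agreement:
            calls[base] += 1  # PCR duplicates count once
    return calls


def allele_fraction(calls, allele):
    """Fraction of consensus molecules that carry `allele`."""
    total = sum(calls.values())
    return calls[allele] / total if total else 0.0
```

Because each barcode family contributes a single vote, an error introduced in one read of a family is outvoted, and amplification bias no longer inflates the apparent allele fraction.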
Article
Human cytomegalovirus (HCMV) is a significant pathogen in immunocompromised individuals, with the potential to cause fatal pneumonitis and colitis, as well as increasing the risk of organ rejection in transplant patients. With the advent of new anti-HCMV drugs there is therefore considerable interest in using virus sequence data to monitor emerging resistance to antiviral drugs in HCMV viraemia and disease, including the identification of putative new mutations. We used target-enrichment to deep sequence HCMV DNA from 11 immunosuppressed paediatric patients receiving single or combination anti-HCMV treatment, serially sampled over 1-27 weeks. Changes in consensus sequence and resistance mutations were analysed for three ORFs targeted by anti-HCMV drugs and the frequencies of drug resistance mutations monitored. Target-enriched sequencing of clinical material detected mutations occurring at frequencies of 2%. Seven patients showed no evidence of drug resistance mutations. Four patients developed drug resistance mutations a mean of 16 weeks after starting treatment. In two patients, multiple resistance mutations accumulated at frequencies of 20% or less, including putative maribavir and ganciclovir resistance mutations P522Q (UL54) and C480F (UL97). In one patient, resistance was detected 14 days earlier than by PCR. Phylogenetic analysis suggested recombination or superinfection in one patient. Deep sequencing of HCMV enriched from clinical samples excluded resistance in 7 of 11 subjects and identified resistance mutations earlier than conventional PCR-based resistance testing in 2 patients. Detection of multiple low level resistance mutations was associated with poor outcome.
Article
Human cytomegalovirus (HCMV) infects most of the population worldwide, persisting throughout the host's life in a latent state with periodic episodes of reactivation. While typically asymptomatic, HCMV can cause fatal disease among congenitally infected infants and immunocompromised patients. These clinical issues are compounded by the emergence of antiviral resistance and the absence of an effective vaccine, the development of which is likely complicated by the numerous immune evasins encoded by HCMV to counter the host's adaptive immune responses, a feature that facilitates frequent super-infections. Understanding the evolutionary dynamics of HCMV is essential for the development of effective new drugs and vaccines. By comparing viral genomes from uncultivated or low-passaged clinical samples of diverse origins, we observe evidence of frequent homologous recombination events, both recent and ancient, and no structure of HCMV genetic diversity at the whole-genome scale. Analysis of individual gene-scale loci reveals a striking dichotomy: while most of the genome is highly conserved, recombines essentially freely and has evolved under purifying selection, 21 genes display extreme diversity, structured into distinct genotypes that do not recombine with each other. Most of these hyper-variable genes encode glycoproteins involved in cell entry or escape of host immunity. Evidence that half of them have diverged through episodes of intense positive selection suggests that rapid evolution of hyper-variable loci is likely driven by interactions with host immunity. It appears that this process is enabled by recombination unlinking hyper-variable loci from strongly constrained neighboring sites. 
It is conceivable that viral mechanisms facilitating super-infection have evolved to promote recombination between diverged genotypes, allowing the virus to continuously diversify at key loci to escape immune detection, while maintaining a genome optimally adapted to its asymptomatic infectious lifecycle.
Article
Human cytomegalovirus (HCMV) is the leading infectious cause of birth defects, and may lead to severe or lethal diseases in immunocompromised individuals. Several HCMV strains have been identified and widely applied in research, but no isolate from China has been characterized. In the present study, we isolated, characterized and sequenced the first Chinese HCMV clinical strain, Han, and constructed the novel and functional HCMV infectious clone Han-BAC-2311. HCMV Han was isolated from the urine sample of a Chinese infant with multiple developmental disorders. It expresses HCMV-specific proteins and contains a representative HCMV genome with minor differences compared to other strains. By homologous recombination using the mini-F derived BAC vector pUS-F6, the infectious clone Han-BAC-2311 was constructed containing representative viral genes across the HCMV genome. The insertion site and orientation of the BAC sequence were confirmed by restriction enzyme digestion and Southern blotting. The reconstituted recombinant virus Han-BAC-2311 expresses typical viral proteins with the same pattern as that of wild-type Han, and also displays growth kinetics similar to those of wild-type Han. The identification of the first clinical HCMV strain in China and the construction of its infectious clone will greatly facilitate pathogenesis studies and vaccine development in China.
Article
Importance: Human cytomegalovirus has the largest genome of all viruses that infect humans. Currently, there is a great interest in establishing associations between genetic variants and strain pathogenicity of this herpesvirus. Since the number of publicly available full-genome sequences is limited, knowledge about strain diversity is highly fragmented and biased towards a small set of loci. Combined with our previous work, we have now contributed 101 complete genome sequences. We have used these data to conduct the first high-resolution analysis of interhost genome diversity, providing an unbiased and comprehensive overview of cytomegalovirus variability. These data are of major value to the development of novel antivirals and a vaccine and to identify potential targets for genotype-phenotype experiments. Furthermore, they have enabled a thorough study of the evolutionary processes that have shaped cytomegalovirus diversity.
Article
Human cytomegalovirus (HCMV) is a ubiquitous virus that can cause serious sequelae in immunocompromised patients and in the developing fetus. The coding capacity of the 235 kbp genome is still incompletely understood, and there is a pressing need to characterize genomic contents in clinical isolates. In this study, a procedure for the high-throughput generation of full genome consensus sequences from clinical HCMV isolates is presented. This method relies on low number passaging of clinical isolates on human fibroblasts, followed by digestion of cellular DNA and purification of viral DNA. After multiple displacement amplification, highly pure viral DNA is generated. These extracts are suitable for high-throughput next-generation sequencing and assembly of consensus sequences. Throughout a series of validation experiments, we showed that the workflow reproducibly generated consensus sequences representative for the virus population present in the original clinical material. Additionally, the performance of 454 GS FLX and/or Illumina Genome Analyzer datasets in consensus sequence deduction was evaluated. Based on assembly performance data, the Illumina Genome Analyzer was the platform of choice in the presented workflow. Analysis of the consensus sequences derived in this study confirmed the presence of gene-disrupting mutations in clinical HCMV isolates independent from in vitro passaging. These mutations were identified in genes RL5A, UL1, UL9, UL111A and UL150. In conclusion, the presented workflow provides opportunities for high-throughput characterization of complete HCMV genomes that could deliver new insights into HCMV coding capacity and genetic determinants of viral tropism and pathogenicity.