MAJOR ARTICLE
Clinical HCMV Genomes • JID 2019:XX (XX XXXX) • 1
The Journal of Infectious Diseases
Received 21 February 2019; editorial decision 17 April 2019; accepted 24 April 2019; published
online May 2, 2019.
Presented in part: Seventh International Congenital Cytomegalovirus (CMV) Conference and
17th International CMV Workshop, Birmingham, Alabama, April 2019.
Published as a bioRxiv preprint on 23 December 2018 and revised on 18 February 2019
(https://doi.org/10.1101/505735).
aN. M. S. and G. S. W. contributed equally to this work.
Present affiliations: bIllumina, Scoresby, Victoria, Australia; cSGS Vitrology Ltd, Glasgow,
United Kingdom; dIT Services–Business Systems Team, University of Glasgow, United Kingdom.
Correspondence: Andrew J. Davison, MRC–University of Glasgow Centre for Virus Research,
Sir Michael Stoker Bldg, 464 Bearsden Road, Glasgow G61 1QH, UK (andrew.davison@glasgow.ac.uk).
The Journal of Infectious Diseases® 2019;XX(XX):1–11
© The Author(s) 2019. Published by Oxford University Press for the Infectious Diseases Society
of America. This is an Open Access article distributed under the terms of the Creative Commons
Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted
reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
DOI: 10.1093/infdis/jiz208
Human Cytomegalovirus Genomes Sequenced Directly
From Clinical Material: Variation, Multiple-Strain
Infection, Recombination, and Gene Loss
Nicolás M. Suárez,1,a Gavin S. Wilkie,1,a,b Elias Hage,2,3 Salvatore Camiolo,1 Marylouisa Holton,1,c Joseph Hughes,1 Maha Maabar,1,d Sreenu B. Vattipally,1
Akshay Dhingra,2 Ursula A. Gompels,4 Gavin W. G. Wilkinson,5 Fausto Baldanti,6,7 Milena Furione,6 Daniele Lilleri,8 Alessia Arossa,9
Tina Ganzenmueller,2,3,10 Giuseppe Gerna,8 Petr Hubáček,11 Thomas F. Schulz,2,3 Dana Wolf,12 Maurizio Zavattoni,6 and Andrew J. Davison1
1Medical Research Council–University of Glasgow Centre for Virus Research, United Kingdom; 2Institute of Virology, Hannover Medical School, and 3German Center for Infection Research,
Hannover-Braunschweig site; 4Pathogen Molecular Biology Department, London School of Hygiene and Tropical Medicine, and 5Division of Infection and Immunity, School of Medicine, Cardiff
University, United Kingdom; 6Molecular Virology Unit, Microbiology and Virology Department, Fondazione Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Policlinico San Matteo,
7Department of Clinical, Surgical, Diagnostic and Pediatric Sciences, University of Pavia, and 8Laboratory of Genetics-Transplantology and Cardiovascular Diseases, and 9Departments of Obstetrics
and Gynecology, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy; 10Institute for Medical Virology and Epidemiology of Viral Diseases, University Hospital Tuebingen, Germany; 11Department
of Medical Microbiology, Motol University Hospital, Prague, Czech Republic; and 12Clinical Virology Unit, Department of Clinical Microbiology and Infectious Diseases, Hadassah University
Hospital, Jerusalem, Israel
The genomic characteristics of human cytomegalovirus (HCMV) strains sequenced directly from clinical pathology samples were
investigated, focusing on variation, multiple-strain infection, recombination, and gene loss. A total of 207 datasets generated in this
and previous studies using target enrichment and high-throughput sequencing were analyzed, in the process enabling the
determination of genome sequences for 91 strains. Key findings were that (i) it is important to monitor the quality of sequencing libraries
in investigating variation; (ii) many recombinant strains have been transmitted during HCMV evolution, and some have apparently
survived for thousands of years without further recombination; (iii) mutants with nonfunctional genes (pseudogenes) have been
circulating and recombining for long periods and can cause congenital infection and resulting clinical sequelae; and (iv) intrahost
variation in single-strain infections is much less than that in multiple-strain infections. Future population-based studies are likely to
continue illuminating the evolution, epidemiology, and pathogenesis of HCMV.
Keywords. human cytomegalovirus; genome sequence; target enrichment; genotype; variation; multiple-strain infection;
recombination; gene loss; mutation.
Human cytomegalovirus (HCMV) poses a risk, particularly to
people with immature or compromised immune systems, and
can have serious outcomes in congenitally infected children,
transplant recipients, and people with human immunodefi-
ciency virus/AIDS. Prior to the advent of high-throughput
technologies, studies of HCMV genomes in natural infections
were limited to Sanger sequencing of polymerase chain reac-
tion (PCR) amplicons, often focusing on a small number of
polymorphic (hypervariable) genes [1]. This left out most
of the genome and also restricted the characterization of
multiple-strain infections, which may have more serious
outcomes.
The first complete HCMV genome sequence to be determined
was that of the high-passage strain AD169 [2], from a plasmid
library. Over a decade later, additional genomes were sequenced
from bacterial artificial chromosomes [3–5], virion DNA [6] and
overlapping PCR amplicons [7, 8]. These sequences were also
determined using Sanger technology, and were complemented
subsequently by many others, increasingly using high-throughput
methods [7, 9–13]. With only 3 exceptions [7, 11], all were
derived from laboratory strains isolated in cell culture. Mounting
evidence of the existence of multiple-strain infections and the
propensity of HCMV to mutate during cell culture [6–8, 14, 15]
added impetus to sequencing genomes directly from clinical
material to define natural populations. One strategy for this involves
sequencing overlapping PCR amplicons [7, 16]. Another utilizes
an oligonucleotide bait library representing known HCMV
diversity to select target sequences from random DNA fragments.
This target enrichment technology originated in commercial kits
for cellular exome sequencing, and was subsequently applied to
Downloaded from https://academic.oup.com/jid/advance-article-abstract/doi/10.1093/infdis/jiz208/5485299 by MHH-Bibliothek user on 17 June 2019
various pathogens [17, 18], including HCMV [19–21]. We have
applied it to HCMV since 2012 and have systematically released
via GenBank many genome sequences that have proved pivotal
in other studies [11, 12, 19–21].
The HCMV genome exhibits several evolutionary phenomena,
including variation, multiple-strain infection, recombination,
and gene loss, all of which were discovered prior to
high-throughput sequencing and have since been illuminated
by this technology (early references are [22–26]). We explore
these and other key genomic features of HCMV, with an
emphasis on the strains present in clinical material.
METHODS
Samples
For convenience, samples were analyzed as collections
1–3, which are summarized in Table 1 and described in
Supplementary Tables 1–3, respectively. Collection 3 represents
samples sequenced by others in previous studies using target
enrichment with a different oligonucleotide bait library. The
features of the samples are shown in Supplementary Tables 1–3
(rows 3–6), and the clinical outcomes of congenital infection
are in Supplementary Table 1 (row 205).
DNA Sequencing
Target enrichment and sequencing library preparation were
performed using the SureSelect XT version 1.7 system for
Illumina paired-end libraries with biotinylated RNA bait
libraries (Agilent) [21]. Bait libraries representing known
HCMV diversity were designed in February 2012 and April
2014 from 31 and 64 complete genome sequences, respec-
tively. Information on and access to the latter library (55210
baits of 120 nucleotides [nt] with overrepresentation of G
+ C–rich regions) are available from the corresponding au-
thor. Data on viral loads and library construction are shown
in Supplementary Tables 1–3 (rows 9–12). Datasets of 300
or 150 nt paired-end reads were generated using a MiSeq
(Illumina). Their names are shown in Supplementary Tables
1–3 (row 7). They were prepared for analysis using Trim
Galore version 0.4.0 (program available at
http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/; length = 21,
quality = 10, and stringency = 3). The numbers of trimmed
reads are in Supplementary Tables 1–3 (row 15).
Library Diversity
Estimating the number of reads in a dataset derived from
unique HCMV fragments initially involved using Bowtie2 ver-
sion 2.2.6 [29] to align the reads against the strain Merlin se-
quence (GenBank accession number AY446894.2), and, where
it could be determined, the consensus genome sequence derived
from the dataset. The relevant data are in Supplementary Tables
1–3 (rows 17–19 and 23–26). Reads containing insertions or
deletions were removed to preserve coordinate numbering, as
Table 1. Selected Characteristics on Sample Collections 1–3
Characteristic | Collection 1 | Collection 2 | Collection 3
Patients, No.^a | 48 | 29 | 25
Patient condition | Congenital infection | Mostly transplant recipients | Various
Samples, No. | 53 | 89 | 57
Sample source, city (prefix) | Pavia (PAV), Jerusalem (JER), Prague (PRA) | Hannover (Child, RTR, SCTR), Pavia (PAV) | Rotterdam (Rot), London (Lon, Pat_)
Datasets, No. | 53 | 97^b | 57^c
Duplicated libraries, No. | 0 | 7 | 0
HCMV load, IU/µL^d | 26–559968 | 5–194840 | 104–18377
Genome copies for library, No.^e | 225–8399520 | 280–3896800 | Unknown
Reads in Merlin alignment, % | 2–91 | 0–85 | 0–90
Coverage ratio in Merlin alignment, % unique/total reads | 0.40–83.12 | 0.00–76.09 | 0.00–90.21
Genome sequences determined, No.^f | 42 | 25 | 24
Details are provided in Supplementary Tables 1–3.
Abbreviation: HCMV, human cytomegalovirus.
^a Archived diagnostic samples were used, and clinical data were retrieved, with the approval of the institutional review boards of Policlinico San Matteo, Pavia (reference numbers 35853/2010
and 35854/2010), Hadassah University Hospital, Jerusalem (reference number HMO-063911), Motol University Hospital, Prague (reference number EK-701a/16) and Hannover Medical
School, Hannover (reference number 2527-2014).
^b We reported 68 of the Hannover datasets previously [21].
^c These datasets were reported previously by others, and were either provided by the authors [19] or downloaded from the European Nucleotide Archive (study PRJEB12814) [20].
^d Viral load in most extracted samples was quantified in the laboratory of origin or the sequencing laboratory. In some instances, the entire sample was used blind to generate a sequencing
library.
^e Assumes that 1 IU is equivalent to 1 genome copy.
^f The trimmed paired-read data were aligned to the UCSC hg19 human reference genome (http://genome.ucsc.edu/) using Bowtie2. Nonmatching reads were assembled de novo into contigs
using SPAdes version 3.5.0 [27]. The contigs were ordered using Scaffold_builder version 2.2 [28] by reference to a version of the strain Merlin sequence lacking all but 100 nt of the terminal
repeat regions (TRL at the left end and TRS at the right end; Figure 1), and merged into a draft genome sequence. Residual gaps were filled by identifying relevant reads anchored in flanking
regions and assembling them manually in a reiterative fashion. TRL and TRS were reinstated, and the complete genome sequence was verified by aligning it against the read data using
Bowtie2 and inspecting the alignment in Tablet. An annotated genome sequence was produced using Sequin (https://www.ncbi.nlm.nih.gov/Sequin/).
were duplicate read pairs sharing both end coordinates and
duplicate unpaired reads sharing one end coordinate, thereby
producing an alignment file for unique reads derived from
unique HCMV fragments (program available at
https://centre-for-virus-research.github.io/VATK/AssemblyPostProcessing).
This file was viewed using Tablet version 1.14.11.7 [30]. The
coverage depth values for total and unique fragment reads are
in Supplementary Tables 1–3 (rows 20–21 and 27–28).
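The filtering described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the VATK program: the `Fragment` record and its field names are hypothetical, a read pair is reduced to its two outer end coordinates, and an unpaired read is keyed on a single coordinate (here its leftmost aligned position, which is an assumption, since the text does not say which end is used).

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Fragment:
    """Hypothetical record: one aligned read pair or unpaired read."""
    name: str
    start: int                # leftmost aligned reference coordinate
    end: Optional[int]        # rightmost coordinate of the pair; None if unpaired
    has_indel: bool = False   # True if any read carries an insertion or deletion


def unique_fragments(fragments: Iterable[Fragment]) -> List[Fragment]:
    """Keep the first fragment seen for each coordinate key, dropping indel reads."""
    seen_pairs = set()
    seen_unpaired = set()
    kept = []
    for frag in fragments:
        if frag.has_indel:
            continue  # removed to preserve coordinate numbering
        if frag.end is not None:
            key = (frag.start, frag.end)  # pairs: both end coordinates must match
            if key in seen_pairs:
                continue
            seen_pairs.add(key)
        else:
            if frag.start in seen_unpaired:  # unpaired: one end coordinate
                continue
            seen_unpaired.add(frag.start)
        kept.append(frag)
    return kept


example = [
    Fragment("p1", 100, 400),
    Fragment("p2", 100, 400),                  # duplicate of p1
    Fragment("p3", 100, 420),                  # same start, different end: kept
    Fragment("u1", 250, None),
    Fragment("u2", 250, None),                 # duplicate of u1
    Fragment("p4", 500, 800, has_indel=True),  # dropped because of the indel
]
kept_names = [f.name for f in unique_fragments(example)]
print(kept_names)
```

Comparing coverage depth computed from the kept fragments with that from all aligned reads gives the kind of unique/total contrast summarized in Table 1.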
Strain Enumeration
The number of strains represented in a dataset was estimated by
2 strategies: genotype read-matching and motif read-matching
(program available at
https://centre-for-virus-research.github.io/VATK/HCMV_pipeline). Both strategies utilized datasets
concatenated from the paired-end datasets. The genotype
designations used were either based on reported phylogenies
[6, 12, 25, 31, 32], amended or extended as appropriate, or
constructed afresh using Clustal Omega version 1.2.4 [33] and
MEGA version 6.0.6 [34] with data for the genomes listed in
Supplementary Table 4 and individual genes for which addi-
tional sequences were available in GenBank. Alignments and
phylogenetic reconstructions are in Supplementary Figures 1
and 2, respectively.
For genotype read-matching, Bowtie2 was used to align the
reads to sequences representing the genotypes of 2 hypervariable
genes, UL146 and RL13 [6, 12, 35]. The sequences from the
entire coding region of UL146 and the central coding region of
RL13 are in Supplementary Tables 1–3 (rows 34–58). In contrast
to the UL146 genotypes, the RL13 genotypes cross-matched
within 4 groups (G1, G2, G3; G4A, G4B; G6, G10; and G7, G8).
In these instances, the genotype within the group with most
matching reads was scored. The number of reads aligned to each
genotype is in Supplementary Tables 1–3 (rows 34–58). A
genotype was scored if the number of reads was >10 and represented
>2% of the total number detected for all genotypes of that gene.
For 14 samples in collection 1 that had been sequenced prior to
the availability of ultrapure (TruGrade) oligonucleotides, these
values were >25 and >5%, respectively. The number of strains in
a sample was scored as the greater of the numbers of genotypes
detected for the 2 target genes, and is in Supplementary Tables
1–3 (row 13).
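The scoring rules above can be expressed as a short sketch. The function names and the example read counts are hypothetical, and one detail is an assumption: within a cross-matching RL13 group, the counts of the lower-scoring genotypes are discarded before the thresholds are applied, which the text does not spell out.

```python
from typing import Dict, List, Sequence, Tuple

# RL13 genotypes that cross-match, as listed in the text.
RL13_GROUPS: Sequence[Tuple[str, ...]] = (
    ("G1", "G2", "G3"), ("G4A", "G4B"), ("G6", "G10"), ("G7", "G8"),
)


def collapse_cross_matching(counts: Dict[str, int],
                            groups: Sequence[Tuple[str, ...]]) -> Dict[str, int]:
    """Within each group keep only the genotype with the most matching reads."""
    collapsed = dict(counts)
    for group in groups:
        present = [g for g in group if collapsed.get(g, 0) > 0]
        if len(present) > 1:
            best = max(present, key=lambda g: collapsed[g])
            for g in present:
                if g != best:
                    del collapsed[g]  # assumption: discarded before thresholding
    return collapsed


def score_genotypes(counts: Dict[str, int], min_reads: int = 10,
                    min_percent: float = 2.0) -> List[str]:
    """Score genotypes with > min_reads reads and > min_percent of the gene total."""
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(g for g, n in counts.items()
                  if n > min_reads and 100.0 * n / total > min_percent)


def strains_by_genotype_matching(ul146: Dict[str, int], rl13: Dict[str, int],
                                 min_reads: int = 10,
                                 min_percent: float = 2.0) -> int:
    """Greater of the numbers of genotypes scored for UL146 and RL13."""
    rl13 = collapse_cross_matching(rl13, RL13_GROUPS)
    return max(len(score_genotypes(ul146, min_reads, min_percent)),
               len(score_genotypes(rl13, min_reads, min_percent)))


# Hypothetical read counts for one dataset.
ul146_counts = {"G1": 950, "G7": 40, "G13": 5}
rl13_counts = {"G1": 800, "G2": 120, "G5": 8}
n_default = strains_by_genotype_matching(ul146_counts, rl13_counts)
n_strict = strains_by_genotype_matching(ul146_counts, rl13_counts,
                                        min_reads=25, min_percent=5.0)
print(n_default, n_strict)
```

With the default thresholds the minor UL146 genotype (40 of 995 reads) is scored and the dataset counts as 2 strains; with the stricter thresholds used for the non-TruGrade libraries it falls below 5% and the dataset counts as 1.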
For motif read-matching, conserved genotype-specific motifs
(20–31 nt) were identified by visual inspection of alignments
(Supplementary Figure 1) for 12 hypervariable genes [6, 12, 19,
35]. Additional motifs for identifying common intergenotypic
recombinants were included. The motif sequences and number
of reads containing perfect matches to a sequence or its reverse
complement are in Supplementary Tables 1–3 (rows 60–170).
Genotypes were scored as described above. The number of
strains in a sample was estimated as the maximum number of
genotypes detected for at least 2 genes, and is in Supplementary
Tables 1–3 (row 14).
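A minimal sketch of motif read-matching follows. The motif sequences are invented placeholders rather than real HCMV motifs, the function names are hypothetical, and "the maximum number of genotypes detected for at least 2 genes" is read here as the largest n such that at least 2 genes each show n or more genotypes, which is one plausible interpretation of the text.

```python
from typing import Dict, Iterable

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_motif_reads(reads: Iterable[str], motifs: Dict[str, str]) -> Dict[str, int]:
    """Count reads containing a perfect match to each motif or its reverse complement."""
    probes = {name: (m.upper(), reverse_complement(m.upper()))
              for name, m in motifs.items()}
    counts = {name: 0 for name in motifs}
    for read in reads:
        read = read.upper()
        for name, (forward, reverse) in probes.items():
            if forward in read or reverse in read:
                counts[name] += 1
    return counts


def strains_by_motif_matching(genotypes_per_gene: Dict[str, int]) -> int:
    """Largest n for which at least 2 genes each show n or more scored genotypes."""
    ranked = sorted(genotypes_per_gene.values(), reverse=True)
    return ranked[1] if len(ranked) >= 2 else 0


# Placeholder 20-nt motifs (not real HCMV sequences).
motifs = {
    "geneX_G1": "ACGTACGTTAGCCGATAGCA",
    "geneX_G2": "TTGACCGGTAACCGTTAGGC",
}
reads = [
    "GG" + motifs["geneX_G1"] + "GG",        # forward match
    reverse_complement(motifs["geneX_G1"]),  # reverse-complement match
    "GATTACA" * 4,                           # no match
]
motif_counts = count_motif_reads(reads, motifs)
print(motif_counts)
```

The per-motif read counts would then be thresholded in the same way as for genotype read-matching before the genotypes per gene are tallied and passed to `strains_by_motif_matching`.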
Pseudogene Analysis
The genomes of some HCMV strains exhibit gene loss apparent
as pseudogenes resulting from mutations causing premature
translational termination [7, 11, 12, 26]. These mutations are
substitutions that introduce in-frame stop codons or ablate
splice sites, or insertions or deletions that cause frameshifting
or loss of protein-coding regions. Motif read-matching was
used to assess the presence of common mutations and also to
determine the prevalence of mutations identified in collection
1. These data are in Supplementary Tables 1–3 (rows 171–178)
and Supplementary Table 1 (rows 180–203), respectively.
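Two of the mutation classes listed above lend themselves to a simple check, sketched below. This is an illustration only and not the method used in the study, which relied on motif read-matching; the function names and the toy coding sequences are hypothetical, and splice-site ablation and loss of coding regions are not covered.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stop_codon(cds: str) -> Optional[int]:
    """Return the 0-based index of the first in-frame stop codon that occurs
    before the final codon of a coding sequence, or None if there is none."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    for i in range(n_codons - 1):  # the final codon is the normal terminator
        if cds[3 * i:3 * i + 3] in STOP_CODONS:
            return i
    return None


def indel_causes_frameshift(indel_length: int) -> bool:
    """An insertion or deletion shifts the reading frame unless its length
    is a multiple of 3."""
    return indel_length % 3 != 0


intact = "ATGGCTGAAAAATAA"   # ATG GCT GAA AAA TAA
mutant = "ATGGCTTAAAAATAA"   # GAA -> TAA substitution at the third codon
print(premature_stop_codon(intact), premature_stop_codon(mutant))
```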
Intrahost Variation
Minor genome populations were analyzed by enumerating
single-nucleotide polymorphisms (SNPs) in datasets for which
consensus genome sequences had been determined. Thus, the
term mutant applies hereafter to a strain that has a mutation in
the consensus sequence resulting in a pseudogene, and the term
SNP applies to a minor variation from the consensus within
a population. To enumerate SNPs, original datasets were
prepared for analysis using Trim Galore (length = 100, quality = 30,
and stringency = 1), and trimmed reads were mapped using
Bowtie2. Alignment files in SAM format were converted into
BAM format, sorted using SAMtools version 1.3 [36], and
analyzed using LoFreq version 2.1.2 [37] and V-Phaser 2 [38].
Data Deposition
Original datasets were purged of human reads and deposited
in the European Nucleotide Archive (ENA; project number
PRJEB29585), and consensus genome sequences were deposited
in GenBank. The accession numbers are in Supplementary
Tables 1–3 (rows 8 and 29, respectively). Updated genome se-
quence determinations in collection 3 were deposited by the
original submitters in GenBank [19] or by us as third-party
annotations in ENA (project number PRJEB29374) [20].
Sequence features are in Supplementary Tables 1–3 (rows
30–32).
RESULTS
Operational Limitations
A total of 207 datasets from 199 samples and 102 individuals
were analyzed (Table 1 and Supplementary Tables 1–3). Library
quality was represented in the percentage of HCMV reads and
the coverage depth by unique fragment reads. These values were
related to sample type, being higher for urine than blood pre-
sumably because of a higher proportion of viral to host DNA.
They also depended on the number of viral genome copies used
to make the library, with >1000 copies generally being needed
to determine a complete genome sequence. However, despite
high library diversity, it was not possible to assemble complete
genome sequences from most datasets in collection 3 because of
gaps in RL12 and some G + C–rich regions, perhaps as a result
of limitations in the bait library. The use of excessive PCR cycles
with some samples in collections 1 and 2 led to high coverage
depth by total fragment reads but low coverage depth by unique
fragment reads, and thus to highly clonal libraries (eg, PAV2 in
collection 1). Genotypes present at subthreshold levels may rep-
resent multiple-strain infections or cross-contamination during
the complex sample processing pathway (eg, PRA4 reads in
PRA6A in collection 1).
Genome Sequences
A total of 91 complete or almost complete HCMV genome
sequences were determined (Table 1). We reported 5 previ-
ously [21], and 16 are improvements on published sequences
[19]. Most originated from single-strain infections or multiple-
strain infections in which one strain was predominant, and
some originated from different strains that predominated in a
patient at different times. Defining a strain as a viral genome
present in an individual, these 91 sequences, plus an addi-
tional 49 deposited by our group and 104 by others, brought
the number of strains sequenced to 244 (Supplementary Table
4). Of these, 91 were sequenced directly from clinical material,
and all but one were determined in this and our previous study
[21]. The average size of the HCMV genome, based on the 78
complete sequences in this set, is 235465 bp (range, 234316–237120 bp).
Multiple-Strain Infections
Genotypic differences in hypervariable genes (Figure 1 and
Supplementary Figures 1 and 2) were exploited to distinguish
single-strain from multiple-strain infections by genotype read-
matching and motif read-matching with threshold values. To
our knowledge, these methods, employed in the present work
and the companion study [39], have not been used previously
for categorizing HCMV infections. Single strains were common
in congenitally infected patients (n = 43/50 in collections 1 and
2), but significantly less so in transplant recipients (n = 11/25 in
collections 2 and 3; χ² = 14.583, P < .05). Intrahost variation is
discussed below.
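The comparison of single-strain frequencies can be reproduced from the counts given in the text. The sketch below computes an uncorrected Pearson χ² statistic for the 2 × 2 table (43 single-strain and 7 multiple-strain congenital infections; 11 single-strain and 14 multiple-strain transplant recipients). The source does not state which variant of the test was used; the uncorrected form is an assumption that happens to match the reported value. The function names are hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def p_value_df1(chi2: float) -> float:
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Congenital: 43 of 50 single-strain; transplant: 11 of 25 single-strain.
chi2 = chi_square_2x2(43, 50 - 43, 11, 25 - 11)
p = p_value_df1(chi2)
print(round(chi2, 3), p < 0.05)
```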
Recombination
The 244 genome sequences were genotyped in the 12
hypervariable genes used for motif read-matching and then in 5
additional genes (Figure 1 and Supplementary Table 4).
Hypervariation in UL55, which encodes glycoprotein B
(gB), is located in 2 regions (UL55N near the N terminus, and
UL55X encompassing the proteolytic cleavage site) [23, 40].
Five genotypes (G1–G5) have been assigned to each region [23,
40–42], which are separated by 927 bp that are 80% identical
in all strains. All genomes had a recognized UL55X genotype
(Supplementary Table 5). As reported previously [40], UL55N
G2 and G3 could not be distinguished reliably from each other,
and 2 additional genotypes (G6–G7) were detected that may
have arisen from ancient recombination events within UL55N
(Supplementary Tables 4 and 5 and Supplementary Figure 1).
There was evidence for recombination in the region between
UL55N and UL55X in only 8 genomes. This low proportion of
recombination (3.3%) contrasts with the higher levels proposed
Figure 1. Locations in the human cytomegalovirus strain Merlin genome of genes used for genotyping. The genome consists of 2 unique regions, UL (1325–194343 bp)
and US (197627–233108 bp), the former flanked by inverted repeats TRL (1–1324 bp) and IRL (194344–195667 bp), and the latter flanked by inverted repeats IRS
(195090–197626 bp) and TRS (233109–235646 bp). Protein-coding regions are indicated by shaded arrows, and noncoding RNAs as narrower, white arrows, with gene nomenclature
below. Introns are shown as narrow white bars. The 12 genes (RL5A, RL6, RL12, RL13, UL1, UL9, UL11, UL73, UL74, UL120, UL146, and UL139) used for motif read-matching
are in dark gray (red in online version). Two of these genes (RL13 and UL146) were also used for genotype read-matching. The additional 5 genes (UL20, UL33, UL37, UL55,
and US9) used to genotype sequences by alignment are medium gray (orange in online version). All other genes are shown in white (pink in online version).
in UL55 from PCR-based studies [40, 43], which may have been
affected by artefactual recombination.
UL73 and UL74, which encode glycoproteins N and O (gN
and gO), respectively, are adjacent hypervariable genes that
exist as 8 genotypes each [25, 32, 44]. There was evidence for
recombination between them in only 7 genomes (2.9%), in
accordance with the low levels (2.2%) detected previously in
PCR-based studies [25, 32, 45]. In the region containing adjacent
hypervariable genes RL12, RL13, and UL1, recombinants were
also rare (1.2%) within RL12 and absent from RL13 and UL1.
In contrast, hypervariable genes UL146 and UL139, which
encode a CXC chemokine and a membrane glycoprotein,
respectively, are separated by a well-conserved region of over 5 kbp.
The number (66) of the 126 possible genotype combinations
represented in the 244 genomes is too large to allow any
underlying genotypic linkage to be discerned, consistent with previous
conclusions from PCR-based studies [31]. No recombinants
were noted within UL146.
In principle, strains in multiple-strain infections have the
opportunity to recombine. In our previous analysis of RTR1 in
collection 2, we noted that one strain (RTR1A) predominated
at earlier times and another (RTR1B) at later times [21]. From
the low frequency of SNPs across a large part of the genome,
we concluded that the second strain had arisen either by
recombination involving the first strain or by reinfection with, or
reactivation of, a second strain fortuitously similar to the first.
In the present study, recombination was strongly supported by
a comparison of the 2 genome sequences, which showed that
approximately two-thirds of the genome is almost identical
(differing by 3 substitutions in noncoding regions), whereas the
remaining third is highly dissimilar.
To investigate whether strains have been transmitted
without recombination occurring, identical genotypic
constellations were identified among the 244 genomes (Table
2). This revealed the existence of 12 haplotype groups within
which multiple strains lack signs of having recombined since
diverging from their last common ancestor; these are
henceforth termed nonrecombinant strains. As an incidental
outcome, the 2 strains in group 1 (PRA8 and CZ/3/2012), which
were characterized in different studies, were confirmed as
having originated from the same patient, reducing the set of
sequenced strains to 243. The results from the other 11 groups
suggest that nonrecombinant strains have been circulating,
some for periods sufficient to allow the accumulation of >100
substitutions. Among the highly divergent groups, group 9 (3
strains) exhibited 135 differences, with the 50 that would
affect protein coding distributed among 38 genes, and group 10
(2 strains) exhibited 138 differences, with the 38 that would
affect protein coding distributed among 27 genes. No obvious
bias was observed toward greater diversity in any particular
gene or group of genes, including those in the hypervariable
category.
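Finding strains with identical genotypic constellations amounts to grouping genomes by their tuple of genotypes across the 17 genes. The sketch below illustrates this with the genotypes listed in Table 2 for groups 1 and 2, plus one invented strain (`HYPOTHETICAL_X`) that shares no constellation with the others. The function name is hypothetical, and the grouping is only the first screen: the source further characterizes each group by counting sequence differences.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

GENES = ("RL5A", "RL6", "RL12", "RL13", "UL1", "UL9", "UL11", "UL20", "UL33",
         "UL37", "UL55N", "UL73", "UL74", "UL120", "UL146", "UL139", "US9")


def haplotype_groups(genotypes: Dict[str, Tuple[str, ...]]) -> List[List[str]]:
    """Group strains sharing an identical genotype at every gene; return only
    groups with more than one member, in order of first appearance."""
    groups = defaultdict(list)
    for strain, constellation in genotypes.items():
        if len(constellation) != len(GENES):
            raise ValueError(f"{strain}: expected {len(GENES)} genotypes")
        groups[tuple(constellation)].append(strain)
    return [sorted(members) for members in groups.values() if len(members) > 1]


group1 = tuple("1 1 6 6 6 6 2 5 1 5 2/3 4C 1C 2B 1 4 1".split())
group2 = tuple("2 4 1B 1 1 4 1 6 2 2 2/3 4A 3 1A 8 2 1".split())
other = tuple("2 4 1B 1 1 4 1 6 2 2 2/3 4A 3 1A 8 2 2".split())  # invented

strains = {
    "PRA8": group1,
    "CZ/3/2012": group1,
    "BE/3/2011": group2,
    "BE/21/2011": group2,
    "HYPOTHETICAL_X": other,
}
found = haplotype_groups(strains)
print(found)
```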
Pseudogenes
Among the strains sequenced from clinical material, 77%
are mutated in at least one gene (compared with 79% among
all sequenced strains), and one is mutated in as many as 6
genes (Pat_D in collection 3) (Supplementary Table 4). The
most frequently mutated genes are UL9, RL5A, UL1 and RL6
(members of the RL11 family), US7 and US9 (members of the
US6 gene family), and UL111A (encoding viral interleukin
10) (Table 3). In addition, there was evidence from the PAV6
datasets (collection 1) for maternal transmission of a US7
mutant (Supplementary Table 1), and from PCR data (not shown)
for maternal transmission of a UL111A mutant to PAV16
(collection 1). Focusing on the most common mutations, strains
in which UL9, RL5A, UL1, US9, US7, and UL111A were
affected (singly or in combination) were, like strains that were not
mutated in any gene, transmitted in congenital infections and,
in some cases, linked to defects in neurological development
(Supplementary Table 1).
Intrahost Diversity
LoFreq and V-Phaser analyses showed that single-strain
infections contained markedly fewer SNPs (median values of 60
and 140, respectively) than multiple-strain infections (median
values of 2444 and 2955, respectively; Figure 2). The differences
between the values for single- and multiple-strain infections
were significant (Kruskal–Wallis rank-sum test; LoFreq:
χ² = 67.918, P < 2.2 × 10⁻¹⁶; V-Phaser: χ² = 63.536, P = 1.6 × 10⁻¹⁵).
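For readers unfamiliar with the Kruskal–Wallis rank-sum test, the sketch below computes its H statistic for two groups, with the standard correction for ties, and refers H to a χ² distribution with 1 degree of freedom (which is presumably why the statistic is reported above as χ²). The SNP counts in the example are invented for illustration; the per-dataset values from the study are not reproduced here, so the output does not correspond to the reported statistics.

```python
import math
from typing import Sequence, Tuple


def kruskal_wallis_two_groups(x: Sequence[float],
                              y: Sequence[float]) -> Tuple[float, float]:
    """Return (H, P) for a two-group Kruskal-Wallis test with tie correction."""
    values = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(values)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and values[j][0] == values[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of the ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sums = [0.0, 0.0]
    for (_, group), rank in zip(values, ranks):
        rank_sums[group] += rank
    sizes = (len(x), len(y))
    h = 12.0 / (n * (n + 1)) * sum(r * r / s for r, s in zip(rank_sums, sizes))
    h -= 3.0 * (n + 1)
    correction = 1.0 - tie_term / (n ** 3 - n)
    if correction == 0.0:
        raise ValueError("all observations are identical")
    h = max(h / correction, 0.0)
    p = math.erfc(math.sqrt(h / 2.0))  # chi-square upper tail, 1 degree of freedom
    return h, p


# Invented SNP counts, for illustration only.
single_strain = [12, 60, 75]
multiple_strain = [1800, 2444, 2955]
h_stat, p_val = kruskal_wallis_two_groups(single_strain, multiple_strain)
print(round(h_stat, 3))
```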
DISCUSSION
Advances in high-throughput sequencing technology have
made it possible to generate a wealth of viral genome informa-
tion directly from clinical material. However, operational
limitations should be registered. These include sample charac-
teristics (source, viral content and presence of multiple strains),
confounding factors (technical limitations, logistical errors and
cross-contamination), design of the bait library (ability to en-
rich all strains and acquire data across the genome), and quality
and extent of the sequencing data (library diversity and coverage
depth). Since perceived levels of intrahost variation are partic-
ularly sensitive to these factors, we proceeded cautiously with
this aspect. However, as indicated in our previous study [21],
it is clear that the number of SNPs in single-strain infections
was markedly less than that in multiple-strain infections. It was
also far less than that reported by others in samples from con-
genital infections [16]. The factors listed above may have been
responsible for the outliers observed in single-strain infections;
for example, the PAV6 (collection 1) library was made using
non-TruGrade oligonucleotides, RTR6B (collection 2) had a
low coverage depth and also came from a patient from whom
other samples contained multiple strains, and CMV-35 (collec-
tion 3) may have contained subthreshold levels of additional
Table 2. Groups of Nonrecombinant Strains
Columns RL5A to US9 give genotypes^a. Shared mutations are described once per group, in the row of the first strain.
Group | Strain | RL5A | RL6 | RL12 | RL13 | UL1 | UL9 | UL11 | UL20 | UL33 | UL37 | UL55N | UL73 | UL74 | UL120 | UL146 | UL139 | US9 | Mutated Genes | Differences^b | Shared Mutations
1 | PRA8 | 1 | 1 | 6 | 6 | 6 | 6 | 2 | 5 | 1 | 5 | 2/3 | 4C | 1C | 2B | 1 | 4 | 1 | UL145 | 0 | These strains share a UL145 mutation, were characterized in different studies, and were confirmed as having been derived from the same patient
1 | CZ/3/2012 | 1 | 1 | 6 | 6 | 6 | 6 | 2 | 5 | 1 | 5 | 2/3 | 4C | 1C | 2B | 1 | 4 | 1 | UL145 | |
2 | BE/3/2011 | 2 | 4 | 1B | 1 | 1 | 4 | 1 | 6 | 2 | 2 | 2/3 | 4A | 3 | 1A | 8 | 2 | 1 | None | 1 | None
2 | BE/21/2011 | 2 | 4 | 1B | 1 | 1 | 4 | 1 | 6 | 2 | 2 | 2/3 | 4A | 3 | 1A | 8 | 2 | 1 | None | |
3 | UK/Lon6/Urine/2011 | 5 | 1 | 7 | 7 | 7 | 1 | 1 | 6 | 2 | 1 | 4 | 3A | 1B | 3B | 13 | 1A | 1 | None | 23 | None
3 | 2CEN15 | 5 | 1 | 7 | 7 | 7 | 1 | 1 | 6 | 2 | 1 | 4 | 3A | 1B | 3B | 13 | 1A | 1 | None | |
3 | BE/5/2012 | 5 | 1 | 7 | 7 | 7 | 1 | 1 | 6 | 2 | 1 | 4 | 3A | 1B | 3B | 13 | 1A | 1 | None | |
4 | BE/14/2012 | 3 | 5 | 4A | 4A | 4 | 6 | 6 | 5 | 4 | 5 | 4 | 3A | 1B | 2A | 9 | 5 | 1 | RL6 UL9 UL40 US7 | 26 | These strains share a UL9 mutation and also RL6 and UL40 mutations that are present in other strains
4 | BE/36/2011 | 3 | 5 | 4A | 4A | 4 | 6 | 6 | 5 | 4 | 5 | 4 | 3A | 1B | 2A | 9 | 5 | 1 | RL6 UL9 UL40 | |
5 | BE/10/2012 | 6 | 3 | 1A | 1 | 1 | 1 | 1 | 2 | 4 | 5 | 4 | 3A | 1B | 2A | 3 | 7 | 1 | None | 35 | None
5 | BE/26/2011 | 6 | 3 | 1A | 1 | 1 | 1 | 1 | 2 | 4 | 5 | 4 | 3A | 1B | 2A | 3 | 7 | 1 | None | |
6 | BE/1/2011 | 1 | 1 | 4A | 4A | 4 | 6 | 6 | 7 | 4 | 6 | 2/3 | 4A | 3 | 2A | 1 | 4 | 1 | UL1 UL9 | 65 | These strains bear a UL9 mutation that is present in other strains, and 2 strains share a UL1 mutation
6 | BE/8/2010 | 1 | 1 | 4A | 4A | 4 | 6 | 6 | 7 | 4 | 6 | 2/3 | 4A | 3 | 2A | 1 | 4 | 1 | UL9 | |
6 | BE/9/2012 | 1 | 1 | 4A | 4A | 4 | 6 | 6 | 7 | 4 | 6 | 2/3 | 4A | 3 | 2A | 1 | 4 | 1 | UL1 UL9 | |
7 | NAN1LA | 3 | 5 | 5 | 5 | 5 | 7 | 3 | 2 | 2 | 6 | 2/3 | 4D | 5 | 3B | 7 | 5 | 2 | RL6 US9 | 73 | These strains share RL6 and US9 mutations that are present in other strains
7 | BE/6/2012 | 3 | 5 | 5 | 5 | 5 | 7 | 3 | 2 | 2 | 6 | 2/3 | 4D | 5 | 3B | 7 | 5 | 2 | RL6 US9 US27 | |
8 | BE/7/2012 | 2 | 4 | 1A | 1 | 1 | 4 | 1 | 7 | 5 | 3 | 2/3 | 4A | 3 | 2B | 13 | 5 | 1 | RL5A RL13 UL150 | 125 | These strains share a UL150 mutation that is present in other strains
8 | BE/11/2012 | 2 | 4 | 1A | 1 | 1 | 4 | 1 | 7 | 5 | 3 | 2/3 | 4A | 3 | 2B | 13 | 5 | 1 | UL150 | |
8 | BE/16/2012 | 2 | 4 | 1A | 1 | 1 | 4 | 1 | 7 | 5 | 3 | 2/3 | 4A | 3 | 2B | 13 | 5 | 1 | UL150 | |
8 | BE/26/2010 | 2 | 4 | 1A | 1 | 1 | 4 | 1 | 7 | 5 | 3 | 2/3 | 4A | 3 | 2B | 13 | 5 | 1 | UL150 | |
8 | BE/30/2011 | 2 | 4 | 1A | 1 | 1 | 4 | 1 | 7 | 5 | 3 | 2/3 | 4A | 3 | 2B | 13 | 5 | 1 | UL150 | |
9 | JER851 | 1 | 1 | 3 | 3 | 3 | 2 | 1 | 3 | 4 | 6 | 1 | 2 | 2B | 3B | 7 | 4 | 1 | UL1 UL9 UL111A | 135 | These strains share a UL111A mutation that is present in another strain
9 | JER4041 | 1 | 1 | 3 | 3 | 3 | 2 | 1 | 3 | 4 | 6 | 1 | 2 | 2B | 3B | 7 | 4 | 1 | UL111A | |
9 | BE/25/2010 | 1 | 1 | 3 | 3 | 3 | 2 | 1 | 3 | 4 | 6 | 1 | 2 | 2B | 3B | 7 | 4 | 1 | UL111A | |
10 | JER5695 | 1 | 1 | 7 | 7 | 7 | 1 | 1 | 6 | 2 | 1 | 2/3 | 3B | 2A | 4B | 13 | 2 | 1 | UL9 UL111A | 138 | These strains share a UL111A mutation that is present in other strains, and have different UL9 mutations
10 | BE/15/2010 | 1 | 1 | 7 | 7 | 7 | 1 | 1 | 6 | 2 | 1 | 2/3 | 3B | 2A | 4B | 13 | 2 | 1 | RL1 UL9 UL111A | |
11 | PRA7 | 1 | 1 | 4B | 4B | 4 | 9 | 6 | 5 | 5 | 6 | 6 | 4D | 5 | 4B | 10 | 2 | 1 | RL5A UL111A | 143 | These strains share RL5A and UL111A mutations that are present in other strains
11 | JP | 1 | 1 | 4B | 4B | 4 | 9 | 6 | 5 | 5 | 6 | 6 | 4D | 5 | 4B | 10 | 2 | 1 | RL5A UL111A | |
11 | BE/4/2010 | 1 | 1 | 4B | 4B | 4 | 9 | 6 | 5 | 5 | 6 | 6 | 4D | 5 | 4B | 10 | 2 | 1 | RL5A UL111A | |
12 | BE/6/2011 | 5 | 1 | 4B | 4B | 4 | 9 | 6 | 1 | 5 | 3 | 2/3 | 3A | 1B | 1B | 9 | 5 | 1 | UL9 | 155 | Two strains share a UL9 mutation that is present in other strains
12 | BE/18/2011 | 5 | 1 | 4B | 4B | 4 | 9 | 6 | 1 | 5 | 3 | 2/3 | 3A | 1B | 1B | 9 | 5 | 1 | None | |
12 | BE/27/2011 | 5 | 1 | 4B | 4B | 4 | 9 | 6 | 1 | 5 | 3 | 2/3 | 3A | 1B | 1B | 9 | 5 | 1 | UL9 | |
^a See Supplementary Figures 1 and 2 for genotype definitions. G prefix omitted.
^b Total number of differences among all strains in the group, not including size variations in tandem repeats. To exclude repeat regions, sequences were aligned from the TATA box of RL1 to the end of US, omitting the region from the AATAAA polyadenylation
signal of UL150A to the beginning of TRS.
strains or cross-contaminants. In our view, accurate estimates of the levels of intrahost variation in single-strain infections are not available from the present and previous studies, and will require sequencing and bioinformatic approaches that are demonstrably reliable, robust, and reproducible [46, 47].
Whole-genome analyses have confirmed the significant role of recombination during HCMV evolution reported in numerous earlier studies [12, 19]. Recombination has occurred over a very long period but nonetheless remains limited in extent, with surviving events being more numerous in long regions, less numerous in short regions, and rare or absent in hypervariable regions, consistent with the role of homologous recombination. Recombination frequency may be restricted in some circumstances by functional interdependence within the same protein (eg, gB) or possibly between separate proteins (eg, gN and gO [25, 32, 44]). However, it is not known whether differential recombination due to sequence relatedness is of general biological significance for the virus. Also, strains have circulated that seem not to have recombined for long periods. Application of an evolutionary rate estimated for herpesviruses (3.5 × 10⁻⁸ substitutions/nt/year) [48] implies that these periods may have extended to many thousands of years. Moreover, as suggested by the lack of diversity within genotypes in comparison with the marked diversity among them, the distribution of substitutions
Table 3. Mutated Genes in Order of Decreasing Frequency
Gene | Feature(s) | Strains Mutated, No.ᵃ: Passagedᵇ | Clinicalᶜ | Allᵈ | Strains Mutated, %ᵃ: Passagedᵇ | Clinicalᶜ | Allᵈ
UL9 | RL11 family; type 1 membrane protein | 50 | 31 | 81 | 32.89 | 34.07 | 33.33
RL5A | RL11 family | 31 | 27 | 58 | 20.39 | 29.67 | 23.87
UL1 | RL11 family; type 1 membrane protein | 20 | 18 | 38 | 13.16 | 19.78 | 15.64
RL6 | RL11 family | 23 | 14 | 37 | 15.13 | 15.38 | 15.23
US9 | US6 family; type 1 membrane protein | 26 | 11 | 37 | 17.11 | 12.09 | 15.23
UL111A | Viral interleukin-10 | 16 | 7 | 23 | 10.53 | 7.69 | 9.47
UL150 | Unknown | 11 | 3 | 14 | 7.24 | 3.30 | 5.76
US7 | US6 family; type 1 membrane protein | 7 | 7 | 14 | 4.61 | 7.69 | 5.76
UL40 | Type 1 membrane protein | 8 | 2 | 10 | 5.26 | 2.20 | 4.12
UL30 | UL30 family | 2 | 3 | 5 | 1.32 | 3.30 | 2.06
UL142 | MHC family; type 1 membrane protein | 2 | 3 | 5 | 1.32 | 3.30 | 2.06
RL12 | RL11 family; type 1 membrane protein | 3 | 1 | 4 | 1.97 | 1.10 | 1.65
RL1 | RL1 family | 1 | 2 | 3 | 0.66 | 2.20 | 1.23
UL136 | Potential transmembrane domain | 3 | 0 | 3 | 1.97 | 0.00 | 1.23
US13 | US12 family; type 3 membrane protein | 3 | 0 | 3 | 1.97 | 0.00 | 1.23
UL133 | Potential transmembrane domain | 2 | 0 | 2 | 1.32 | 0.00 | 0.82
US6 | US6 family; type 1 membrane protein | 1 | 1 | 2 | 0.66 | 1.10 | 0.82
US8 | US6 family; type 1 membrane protein | 0 | 2 | 2 | 0.00 | 2.20 | 0.82
US27 | GPCR family; type 3 membrane protein | 2 | 0 | 2 | 1.32 | 0.00 | 0.82
UL11 | RL11 family; type 1 membrane protein | 1 | 0 | 1 | 0.66 | 0.00 | 0.41
UL13 | Unknown | 0 | 1 | 1 | 0.00 | 1.10 | 0.41
UL14 | UL14 family; type 1 membrane protein | 0 | 1 | 1 | 0.00 | 1.10 | 0.41
UL15A | Potential transmembrane domain | 0 | 1 | 1 | 0.00 | 1.10 | 0.41
UL20 | Type 1 membrane protein | 1 | 0 | 1 | 0.66 | 0.00 | 0.41
UL43 | US22 family | 0 | 1 | 1 | 0.00 | 1.10 | 0.41
UL99 | Envelope-associated protein | 1 | 0 | 1 | 0.66 | 0.00 | 0.41
UL148 | Type 1 membrane protein | 1 | 0 | 1 | 0.66 | 0.00 | 0.41
UL147 | CXCL family | 1 | 0 | 1 | 0.66 | 0.00 | 0.41
UL145 | Unknown | 0 | 1 | 1 | 0.00 | 1.10 | 0.41
UL150A | Unknown | 1 | 0 | 1 | 0.66 | 0.00 | 0.41
IRS1 | US22 family | 1 | 0 | 1 | 0.66 | 0.00 | 0.41
US1 | US1 family | 1 | 0 | 1 | 0.66 | 0.00 | 0.41
US12 | US12 family; type 3 membrane protein | 1 | 0 | 1 | 0.66 | 0.00 | 0.41
US19 | US12 family; type 3 membrane protein | 0 | 1 | 1 | 0.00 | 1.10 | 0.41
Abbreviations: CXCL, chemokine (CXC motif) ligand; GPCR, G protein–coupled receptor; MHC, major histocompatibility complex.
ᵃ Omitting mutations that occurred in RL13, UL128, UL130, and UL131A probably during passage, or that were engineered during bacterial artificial chromosome construction.
ᵇ Strains sequenced from strains passaged in cell culture, not taking into account the minority of mutations confirmed from the clinical samples (n=152, excludes CZ/3/2012, which is the same strain as PRA8).
ᶜ Strains sequenced directly from clinical material (n=91).
ᵈ Strains sequenced directly from clinical material or passaged virus (n=243).
in nonrecombinant strains fits with the view that intense diversification of the hypervariable genes occurred early in human or pre-human history [25, 31] and has long since ceased.
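To make the timescale concrete, the cited rate can be turned into a rough age estimate for a pair of nonrecombinant strains. The following is a minimal sketch and not part of the analysis reported here. It assumes a strict molecular clock, that the two lineages have accumulated substitutions independently since their common ancestor (hence the factor of 2), and an illustrative aligned length of 230 000 nt, a hypothetical value that is not stated in this section. The difference counts are those listed in Table 2 for the two-strain groups 7 and 10.

```python
# Rate cited in the text for herpesviruses [48], in substitutions/nt/year.
RATE = 3.5e-8

# ASSUMPTION: illustrative aligned length in nucleotides (hypothetical value,
# chosen only to show the arithmetic; the true aligned length is not given here).
ALIGNED_LENGTH_NT = 230_000


def years_since_common_ancestor(differences,
                                aligned_length_nt=ALIGNED_LENGTH_NT,
                                rate=RATE):
    """Estimate time to the common ancestor of two strains.

    Uses t = d / (2 * rate * L): d differences over L aligned sites,
    with substitutions accumulating along both lineages (strict clock assumed).
    """
    if differences < 0 or aligned_length_nt <= 0 or rate <= 0:
        raise ValueError("differences must be >= 0; length and rate must be > 0")
    per_site_divergence = differences / aligned_length_nt
    return per_site_divergence / (2.0 * rate)


# Difference counts for the two-strain groups in Table 2.
two_strain_groups = {"group 7": 73, "group 10": 138}

for name, diffs in two_strain_groups.items():
    years = years_since_common_ancestor(diffs)
    print(f"{name}: {diffs} differences, roughly {years:,.0f} years")
```

Under these assumptions both pairs date to several thousand years, in line with the statement above. The estimate scales inversely with both the assumed aligned length and the rate, so it should be read as an order of magnitude only.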
Assessing the extent to which recombinants arise and survive in individuals with multiple-strain infections is problematic. Except where populations fluctuate significantly and are sampled serially (eg, RTR1 in collection 2), it is difficult to approach this using short-read data, as they are based on PCR methodologies prone to generating recombinational artefacts. Long- or single-read sequencing technologies and demonstrably reliable bioinformatic approaches are needed. Also, conclusions drawn from transplant recipients, who are immunosuppressed and in whom HCMV populations may be diversified by transplantation from HCMV-positive donors or selected with antiviral drugs, are unlikely to represent other situations, such as maternal transmission via breast milk [39].
Evidence for pseudogenes was largely derived previously from
strains isolated in cell culture, and it was unclear to what extent
[Figure 2 image: two box-and-whisker panels, A and B, each with y-axis "Number of variants" (0 to 5000) and x-axis categories "Single strains" and "Multiple strains". Outlying datasets labeled in the panels include PAV6, PAV21, RTR2, RTR6B, CMV-19, CMV-31, CMV-35, CMV-37, CMV-38, PRA6A, SCTR12, and ERR1279054.]
Figure 2. Box-and-whisker graphs created using ggplot2 (https://ggplot2.tidyverse.org) showing the total number of single-nucleotide polymorphisms (SNPs) detected at a frequency of >2% in single-strain and multiple-strain infections using LoFreq (A) and V-Phaser (B). Single-strain (n=134 and 131, respectively) and multiple-strain datasets (n=29 and 29, respectively) for which consensus genome sequences had been derived were identified by motif read-matching, and the total number of SNPs in each dataset was enumerated (insertions, deletions, and length polymorphisms were not considered). LoFreq employed a minimal coverage depth of 10 reads (minimal SNP quality [phred] 64) and strand-bias significance with a false discovery rate correction of P<.001. V-Phaser employed phasing with a window size of 500 nucleotides and quality score (phred) 20 for calibrating the significance of strand-bias at P<.05. Each box (light gray for single strains and dark gray for multiple strains) encompasses the first to third quartiles (Q1–Q3) and shows the median as a thick line. For each box, the horizontal line at the end of the upper dashed whisker marks the upper extreme (defined as the smaller of Q3 + 1.5 [Q3–Q1] and the highest single value), and the horizontal line at the end of the lower dashed whisker marks the lower extreme (the greater of Q1 – 1.5 [Q3–Q1] and the lowest single value).
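The whisker limits defined in the legend can be written out explicitly. The sketch below is illustrative only: the function name and the example counts are hypothetical, and quartiles are computed with Python's `statistics.quantiles` using the inclusive (linear interpolation) method, which is an assumption and may differ slightly from the quartile convention used by the plotting software.

```python
import statistics


def box_whisker_limits(values):
    """Return quartiles and whisker extremes as defined in the Figure 2 legend.

    upper extreme = smaller of Q3 + 1.5 * (Q3 - Q1) and the highest value
    lower extreme = greater of Q1 - 1.5 * (Q3 - Q1) and the lowest value
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("at least two values are needed to compute quartiles")
    # ASSUMPTION: inclusive (linear interpolation) quartiles.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "upper_extreme": min(q3 + 1.5 * iqr, data[-1]),
        "lower_extreme": max(q1 - 1.5 * iqr, data[0]),
    }


# Hypothetical SNP counts per dataset (not taken from the study).
example_snp_counts = [12, 18, 25, 31, 40, 44, 57, 63, 2100]
limits = box_whisker_limits(example_snp_counts)
for key, value in limits.items():
    print(f"{key}: {value}")
```

With these definitions, a dataset whose SNP count lies above the upper extreme (the last value in the hypothetical list) would be drawn as an individual outlying point beyond the whisker.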
pseudogenes were present in natural populations. For example, in a study reporting that 75% of strains carry pseudogenes [12], 157 mutations were identified in 101 strains, with all but one of these strains having been passaged in cell culture, although 35 mutations were confirmed by PCR of the clinical material. Nonetheless, we found that the distribution of pseudogenes among the 91 strains sequenced in the present study directly from clinical material is similar to that among strains isolated in cell culture, thus generally validating the earlier suppositions. The likelihood that many of these mutants are ancient is supported by the finding that all were detected at levels very close to 100% in collection 1, and by previous observations identifying the same mutation in different strains [7, 12]. Moreover, 9 of the groups of nonrecombinant strains contained pseudogenes, and some of the mutations were common to group members and even to additional strains among the 243, indicating that they have been transferred by recombination. The implication that some mutants have a selective advantage in certain individuals may be extended to their presence in pathogenic congenital infections, probably in combination with host factors. The genes from which pseudogenes have arisen are involved, or are suspected to be involved, in immune modulation. They include UL111A, which encodes viral interleukin 10 [49]; UL40, which is involved in protecting infected cells against natural killer cell lysis [50] via its cleaved signal peptide, in which mutations occur; and UL9, which bears a potential immunoglobulin-binding domain [2]. These findings also suggest, but do not prove, that maternal HCMV genotyping might be useful in developing strategies for preventing congenital CMV.
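The similarity between the two sets of strains can be checked by recomputing the percentages in Table 3 from the counts and from the denominators given in its footnotes (152 passaged and 91 clinical strains). The following is a simple sketch for the six most frequently mutated genes. It is a recomputation for illustration, not the authors' analysis, and no statistical test is implied; the function and variable names are hypothetical.

```python
# Denominators from the Table 3 footnotes.
N_PASSAGED = 152
N_CLINICAL = 91

# (passaged, clinical) numbers of strains mutated, copied from Table 3.
MUTATED_COUNTS = {
    "UL9": (50, 31),
    "RL5A": (31, 27),
    "UL1": (20, 18),
    "RL6": (23, 14),
    "US9": (26, 11),
    "UL111A": (16, 7),
}


def percent(count, total):
    """Return count / total as a percentage rounded to 2 decimal places."""
    return round(100.0 * count / total, 2)


def summarize(counts=MUTATED_COUNTS):
    """Return (gene, % passaged, % clinical, % all) for each gene."""
    rows = []
    for gene, (passaged, clinical) in counts.items():
        rows.append((
            gene,
            percent(passaged, N_PASSAGED),
            percent(clinical, N_CLINICAL),
            percent(passaged + clinical, N_PASSAGED + N_CLINICAL),
        ))
    return rows


for gene, pct_passaged, pct_clinical, pct_all in summarize():
    print(f"{gene:7s} passaged {pct_passaged:6.2f}%  "
          f"clinical {pct_clinical:6.2f}%  all {pct_all:6.2f}%")
```

The recomputed values reproduce the percentage columns of Table 3, for example 32.89% (passaged) and 34.07% (clinical) for UL9.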
Modern approaches offer a powerful means for analyzing HCMV genomes directly from clinical material, with the important proviso that the data should be quality assessed and interpreted in the context of the known evolutionary and biological characteristics of the virus. Extensive high-throughput sequence data are likely to illuminate further the epidemiology, pathogenesis, and evolution of HCMV in clinical and natural settings, thus facilitating the identification of virulence determinants and the development of new interventions.
Supplementary Data
Supplementary materials are available at The Journal of Infectious Diseases online. Consisting of data provided by the authors to benefit the reader, the posted materials are not copyedited and are the sole responsibility of the authors, so questions or comments should be addressed to the corresponding author.
Notes
Acknowledgments. We are grateful to Florent Lassalle, Daniel Depledge, and Judith Breuer (University College London) for providing unpublished collection 3 datasets and for updating the associated genome sequences in GenBank. We also thank Jenny Witthuhn (Hannover Medical School) for excellent technical assistance.
Financial support. This work was supported by the Medical Research Council (grant numbers MC_UU_12014/3 and MC_UU_12014/12 to A. J. D.); the Wellcome Trust (grant numbers 204870/Z/16/Z to A. J. D. and WT090323MA to G. W. G. W.); the Ministry of Health of the Czech Republic for conceptual development of research organization (University Hospital, Motol, Prague, Czech Republic, grant number 00064203 to P. H.); the Fondazione Regionale per la Ricerca Biomedica, Regione Lombardia (grant number FRRB 2015-043 to D. L.); the Niedersächsische Ministerium für Wissenschaft und Kultur (grant COALITION–Communities Allied in Infection to T. G.); the Deutsche Forschungsgemeinschaft Collaborative Research Centre 900 (core project Z1, grant number SFB-9001 to T. F. S.); and the German Center of Infection Research Thematic Translational Unit “Infections of the Immunocompromised Host” (grant to T. G. and T. F. S.). Two authors (E. H. and A. D.) were supported by the Infection Biology graduate program of Hannover Biomedical Research School.
Potential conflicts of interest. G. S. W. reports that his part in the present study was completed prior to his present employment. G. W. G. W. has received a grant from the Wellcome Trust. D. L. has received a grant from the Fondazione Regionale per la Ricerca Biomedica, Regione Lombardia. T. G. has received grants from the German Federal Ministry of Education and Research and from the Niedersächsische Ministerium für Wissenschaft und Kultur. P. H. has received a grant from the Ministry of Health of the Czech Republic for the conceptual development of University Hospital, Motol, Prague, Czech Republic; personal fees and nonfinancial support from MSD and from Chimerix; and personal fees from Dynex. T. F. S. has received grants from the Deutsche Forschungsgemeinschaft Collaborative Research Centre 900 and from the German Federal Ministry of Education and Research. A. J. D. has received grants from the Medical Research Council and the Wellcome Trust. All other authors report no potential conflicts of interest.
All authors have submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Conflicts that the editors consider relevant to the content of the manuscript have been disclosed.
References
1. Puchhammer-Stöckl E, Görzer I. Cytomegalovirus and
Epstein-Barr virus subtypes—the search for clinical signifi-
cance. J Clin Virol 2006; 36:239–48.
2. Chee MS, Bankier AT, Beck S, et al. Analysis of the protein-coding content of the sequence of human cytomegalovirus strain AD169. Curr Top Microbiol Immunol 1990; 154:125–69.
3. Dunn W, Chou C, Li H, et al. Functional profiling of a human cytomegalovirus genome. Proc Natl Acad Sci U S A 2003; 100:14223–8.
4. Murphy E, Yu D, Grimwood J, et al. Coding potential of laboratory and clinical strains of human cytomegalovirus. Proc Natl Acad Sci U S A 2003; 100:14976–81.
5. Sinzger C, Hahn G, Digel M, et al. Cloning and sequencing of a highly productive, endotheliotropic virus strain derived from human cytomegalovirus TB40/E. J Gen Virol 2008; 89:359–68.
6. Dolan A, Cunningham C, Hector RD, et al. Genetic content of wild-type human cytomegalovirus. J Gen Virol 2004; 85:1301–12.
7. Cunningham C, Gatherer D, Hilfrich B, et al. Sequences of complete human cytomegalovirus genomes from infected cell cultures and clinical specimens. J Gen Virol 2010; 91:605–15.
8. Dargan DJ, Douglas E, Cunningham C, et al. Sequential mutations associated with adaptation of human cytomegalovirus to growth in cell culture. J Gen Virol 2010; 91:1535–46.
9. Bradley AJ, Lurain NS, Ghazal P, et al. High-throughput sequence analysis of variants of human cytomegalovirus strains Towne and AD169. J Gen Virol 2009; 90:2375–80.
10. Jung GS, Kim YY, Kim JI, et al. Full genome sequencing and analysis of human cytomegalovirus strain JHC isolated from a Korean patient. Virus Res 2011; 156:113–20.
11. Sijmons S, Thys K, Corthout M, et al. A method enabling high-throughput sequencing of human cytomegalovirus complete genomes from clinical isolates. PLoS One 2014; 9:e95501.
12. Sijmons S, Thys K, Mbong Ngwese M, et al. High-throughput analysis of human cytomegalovirus genome diversity highlights the widespread occurrence of gene-disrupting mutations and pervasive recombination. J Virol 2015; 89:7673–95.
13. Zhao F, Shen ZZ, Liu ZY, et al. Identification and BAC construction of Han, the first characterized HCMV clinical strain in China. J Med Virol 2016; 88:859–70.
14. Cha TA, Tom E, Kemble GW, Duke GM, Mocarski ES, Spaete RR. Human cytomegalovirus clinical isolates carry at least 19 genes not found in laboratory strains. J Virol 1996; 70:78–83.
15. Stanton RJ, Baluchova K, Dargan DJ, et al. Reconstruction of the complete human cytomegalovirus genome in a BAC reveals RL13 to be a potent inhibitor of replication. J Clin Invest 2010; 120:3191–208.
16. Renzette N, Bhattacharjee B, Jensen JD, Gibson L, Kowalik TF. Extensive genome-wide variability of human cytomegalovirus in congenitally infected infants. PLoS Pathog 2011; 7:e1001344.
17. Melnikov A, Galinsky K, Rogov P, et al. Hybrid selection for sequencing pathogen genomes from clinical samples. Genome Biol 2011; 12:R73.
18. Depledge DP, Palser AL, Watson SJ, et al. Specific capture and whole-genome sequencing of viruses from clinical samples. PLoS One 2011; 6:e27805.
19. Lassalle F, Depledge DP, Reeves MB, et al. Islands of linkage in an ocean of pervasive recombination reveals two-speed evolution of human cytomegalovirus genomes. Virus Evol 2016; 2:vew017.
20. Houldcroft CJ, Bryant JM, Depledge DP, et al. Detection of low frequency multi-drug resistance and novel putative maribavir resistance in immunocompromised pediatric patients with cytomegalovirus. Front Microbiol 2016; 7:1317.
21. Hage E, Wilkie GS, Linnenweber-Held S, et al. Characterization of human cytomegalovirus genome diversity in immunocompromised hosts by whole-genome sequencing directly from clinical specimens. J Infect Dis 2017; 215:1673–83.
22. Chou SW, Dennison KM. Analysis of interstrain variation in cytomegalovirus glycoprotein B sequences encoding neutralization-related epitopes. J Infect Dis 1991; 163:1229–34.
23. Meyer-König U, Ebert K, Schrage B, Pollak S, Hufert FT. Simultaneous infection of healthy people with multiple human cytomegalovirus strains. Lancet 1998; 352:1280–1.
24. Rasmussen L, Geissler A, Winters M. Inter- and intragenic variations complicate the molecular epidemiology of human cytomegalovirus. J Infect Dis 2003; 187:809–19.
25. Mattick C, Dewin D, Polley S, et al. Linkage of human cytomegalovirus glycoprotein gO variant groups identified from worldwide clinical isolates with gN genotypes, implications for disease associations and evidence for N-terminal sites of positive selection. Virology 2004; 318:582–97.
26. Sekulin K, Görzer I, Heiss-Czedik D, Puchhammer-Stöckl E. Analysis of the variability of CMV strains in the RL11D domain of the RL11 multigene family. Virus Genes 2007; 35:577–83.
27. Bankevich A, Nurk S, Antipov D, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 2012; 19:455–77.
28. Silva GG, Dutilh BE, Matthews TD, et al. Combining de novo and reference-guided assembly with scaffold_builder. Source Code Biol Med 2013; 8:23.
29. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods 2012; 9:357–9.
30. Milne I, Stephen G, Bayer M, et al. Using Tablet for visual exploration of second-generation sequencing data. Brief Bioinform 2013; 14:193–202.
31. Bradley AJ, Kovács IJ, Gatherer D, et al. Genotypic analysis of two hypervariable human cytomegalovirus genes. J Med Virol 2008; 80:1615–23.
32. Bates M, Monze M, Bima H, Kapambwe M, Kasolo FC, Gompels UA; CIGNIS Study Group. High human cytomegalovirus loads and diverse linked variable genotypes in both HIV-1 infected and exposed, but uninfected, children in Africa. Virology 2008; 382:28–36.
33. Sievers F, Wilm A, Dineen D, et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 2011; 7:539.
34. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol 2013; 30:2725–9.
35. Davison AJ, Holton M, Dolan A, Dargan DJ, Gatherer D, Hayward GS. Comparative genomics of primate cytomegaloviruses. In: Reddehase MJ, ed. Cytomegaloviruses: from molecular pathogenesis to intervention. Vol 1. Norwich, UK: Caister Academic Press, 2013.
36. Li H, Handsaker B, Wysoker A, et al. The sequence alignment/map format and SAMtools. Bioinformatics 2009; 25:2078–9.
37. Wilm A, Aw PP, Bertrand D, et al. LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets. Nucleic Acids Res 2012; 40:11189–201.
38. Yang X, Charlebois P, Macalalad A, Henn MR, Zody MC. V-Phaser 2: variant inference for viral populations. BMC Genomics 2013; 14:674.
39. Suárez NM, Musonda KG, Escriva E, et al. Multiple-strain infections of human cytomegalovirus with high genomic diversity are common in breast milk from HIV-positive women in Zambia. J Infect Dis 2019; XX:XXX–XXX. doi:10.1093/infdis/jiz209.
40. Meyer-König U, Haberland M, von Laer D, Haller O, Hufert FT. Intragenic variability of human cytomegalovirus glycoprotein B in clinical strains. J Infect Dis 1998; 177:1162–9.
41. Shepp DH, Match ME, Lipson SM, Pergolizzi RG. A fifth human cytomegalovirus glycoprotein B genotype. Res Virol 1998; 149:109–14.
42. Deckers M, Hofmann J, Kreuzer KA, et al. High genotypic diversity and a novel variant of human cytomegalovirus revealed by combined UL33/UL55 genotyping with broad-range PCR. Virol J 2009; 6:210.
43. Haberland M, Meyer-König U, Hufert FT. Variation within the glycoprotein B gene of human cytomegalovirus is due to homologous recombination. J Gen Virol 1999; 80:1495–500.
44. Paterson DA, Dyer AP, Milne RS, Sevilla-Reyes E, Gompels UA. A role for human cytomegalovirus glycoprotein O (gO) in cell fusion and a new hypervariable locus. Virology 2002; 293:281–94.
45. Yan H, Koyano S, Inami Y, et al. Genetic linkage among human cytomegalovirus glycoprotein N (gN) and gO genes, with evidence for recombination from congenitally and post-natally infected Japanese infants. J Gen Virol 2008; 89:2275–9.
46. Xu C, Nezami Ranjbar MR, Wu Z, DiCarlo J, Wang Y. Detecting very low allele fraction variants using targeted DNA sequencing and a novel molecular barcode-aware variant caller. BMC Genomics 2017; 18:5.
47. Illingworth CJR, Roy S, Beale MA, Tutill H, Williams R, Breuer J. On the effective depth of viral sequence data. Virus Evol 2017; 3:vex030.
48. McGeoch DJ, Cook S, Dolan A, Jamieson FE, Telford EA. Molecular phylogeny and evolutionary timescale for the family of mammalian herpesviruses. J Mol Biol 1995; 247:443–58.
49. McSharry BP, Avdic S, Slobedman B. Human cytomegalovirus encoded homologs of cytokines, chemokines and their receptors: roles in immunomodulation. Viruses 2012; 4:2448–70.
50. Prod’homme V, Tomasec P, Cunningham C, et al. Human cytomegalovirus UL40 signal peptide regulates cell surface expression of the natural killer cell ligands HLA-E and gpUL18. J Immunol 2012; 188:2794–804.