
Human Cytomegalovirus Genomes Sequenced Directly From Clinical Material: Variation, Multiple-Strain Infection, Recombination, and Gene Loss

December 2018
The Journal of Infectious Diseases


Authors:

Nicolás M Suárez

Universidad de Las Palmas de Gran Canaria

Gavin Scott Wilkie

Illumina

Elias Hage

Agilent Technologies Belgium

Salvatore Camiolo

University of Glasgow


MAJOR ARTICLE

Clinical HCMV Genomes • JID 2019:XX (XX XXXX) • 1

The Journal of Infectious Diseases

Received 21 February 2019; editorial decision 17 April 2019; accepted 24 April 2019; published online May 2, 2019.

Presented in part: Seventh International Congenital Cytomegalovirus (CMV) Conference and 17th International CMV Workshop, Birmingham, Alabama, April 2019.

Published as a bioRxiv preprint on 23 December 2018 and revised on 18 February 2019 (https://doi.org/10.1101/505735).

aN. M.S.and G.S. W.contributed equally to this work.

Present affiliations: ᵇIllumina, Scoresby, Victoria, Australia; ᶜSGS Vitrology Ltd, Glasgow, United Kingdom; ᵈIT Services–Business Systems Team, University of Glasgow, United Kingdom.

Correspondence: Andrew J.Davison, MRC–University of Glasgow Centre for Virus Research,

Sir Michael Stoker Bldg, 464 Bearsden Road, Glasgow G61 1QH, UK (andrew.davison@

glasgow.ac.uk)

The Journal of Infectious Diseases® 2019;XX(XX):1–11

of America. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

DOI: 10.1093/infdis/jiz208

Human Cytomegalovirus Genomes Sequenced Directly From Clinical Material: Variation, Multiple-Strain Infection, Recombination, and Gene Loss

NicolásM. Suárez,1,a GavinS. Wilkie,1,a,b Elias Hage,2,3 Salvatore Camiolo,1 Marylouisa Holton,1,c Joseph Hughes,1, Maha Maabar,1,d SreenuB. Vattipally,1

Akshay Dhingra,2 UrsulaA. Gompels,4 GavinW.G. Wilkinson,5 Fausto Baldanti,6,7 Milena Furione,6 Daniele Lilleri,8 Alessia Arossa,9

Tina Ganzenmueller,2,3,10 Giuseppe Gerna,8 Petr Hubáček,11 ThomasF. Schulz,2,3 Dana Wolf,12 Maurizio Zavattoni,6 and AndrewJ. Davison1,

1Medical Research Council–University of Glasgow Centre for Virus Research, United Kingdom; 2Institute of Virology, Hannover Medical School, and 3German Center for Infection Research, Hannover-Braunschweig site; 4Pathogen Molecular Biology Department, London School of Hygiene and Tropical Medicine, and 5Division of Infection and Immunity, School of Medicine, Cardiff University, United Kingdom; 6Molecular Virology Unit, Microbiology and Virology Department, Fondazione Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Policlinico San Matteo, 7Department of Clinical, Surgical, Diagnostic and Pediatric Sciences, University of Pavia, and 8Laboratory of Genetics-Transplantology and Cardiovascular Diseases, and 9Departments of Obstetrics and Gynecology, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy; 10Institute for Medical Virology and Epidemiology of Viral Diseases, University Hospital Tuebingen, Germany; 11Department of Medical Microbiology, Motol University Hospital, Prague, Czech Republic; and 12Clinical Virology Unit, Department of Clinical Microbiology and Infectious Diseases, Hadassah University Hospital, Jerusalem, Israel

The genomic characteristics of human cytomegalovirus (HCMV) strains sequenced directly from clinical pathology samples were investigated, focusing on variation, multiple-strain infection, recombination, and gene loss. A total of 207 datasets generated in this and previous studies using target enrichment and high-throughput sequencing were analyzed, in the process enabling the determination of genome sequences for 91 strains. Key findings were that (i) it is important to monitor the quality of sequencing libraries in investigating variation; (ii) many recombinant strains have been transmitted during HCMV evolution, and some have apparently survived for thousands of years without further recombination; (iii) mutants with nonfunctional genes (pseudogenes) have been circulating and recombining for long periods and can cause congenital infection and resulting clinical sequelae; and (iv) intrahost variation in single-strain infections is much lower than that in multiple-strain infections. Future population-based studies are likely to continue illuminating the evolution, epidemiology, and pathogenesis of HCMV.

Keywords. human cytomegalovirus; genome sequence; target enrichment; genotype; variation; multiple-strain infection; recombination; gene loss; mutation.

Human cytomegalovirus (HCMV) poses a risk, particularly to people with immature or compromised immune systems, and can have serious outcomes in congenitally infected children, transplant recipients, and people with human immunodeficiency virus/AIDS. Prior to the advent of high-throughput technologies, studies of HCMV genomes in natural infections were limited to Sanger sequencing of polymerase chain reaction (PCR) amplicons, often focusing on a small number of polymorphic (hypervariable) genes [1]. This left out most of the genome and also restricted the characterization of multiple-strain infections, which may have more serious outcomes.

e rst complete HCMV genome sequence to be determined

was that of the high-passage strain AD169 [2], from a plasmid

library. Over a decade later, additional genomes were sequenced

from bacterial articial chromosomes [3–5], virion DNA [6] and

overlapping PCR amplicons [7, 8]. ese sequences were also de-

termined using Sanger technology, and were complemented sub-

sequently by many others, increasingly using high-throughput

methods [7, 9–13]. With only 3 exceptions [7, 11], all were de-

rived from laboratory strains isolated in cell culture. Mounting

evidence of the existence of multiple-strain infections and the

propensity of HCMV to mutate during cell culture [6–8, 14, 15]

added impetus to sequencing genomes directly from clinical ma-

terial to dene natural populations. One strategy for this involves

sequencing overlapping PCR amplicons [7, 16]. Another utilizes

an oligonucleotide bait library representing known HCMV di-

versity to select target sequences from random DNA fragments.

is target enrichment technology originated in commercial kits

for cellular exome sequencing, and was subsequently applied to

applyparastyle “g//caption/p[1]” parastyle “FigCapt”

Downloaded from https://academic.oup.com/jid/advance-article-abstract/doi/10.1093/infdis/jiz208/5485299 by MHH-Bibliothek user on 17 June 2019

2 • JID 2019:XX (XX XXXX) • Suárez etal

various pathogens [17, 18], including HCMV [19–21]. We have

applied it to HCMV since 2012 and have systematically released

via GenBank many genome sequences that have proved pivotal

in other studies [11, 12, 19–21].

The HCMV genome exhibits several evolutionary phenomena, including variation, multiple-strain infection, recombination, and gene loss, all of which were discovered prior to high-throughput sequencing and have since been illuminated by this technology (early references are [22–26]). We explore these and other key genomic features of HCMV, with an emphasis on the strains present in clinical material.

METHODS

Samples

For convenience, samples were analyzed as collections 1–3, which are summarized in Table 1 and described in Supplementary Tables 1–3, respectively. Collection 3 represents samples sequenced by others in previous studies using target enrichment with a different oligonucleotide bait library. The features of the samples are shown in Supplementary Tables 1–3 (rows 3–6), and the clinical outcomes of congenital infection are in Supplementary Table 1 (row 205).

DNA Sequencing

Target enrichment and sequencing library preparation were performed using the SureSelect XT version 1.7 system for Illumina paired-end libraries with biotinylated RNA bait libraries (Agilent) [21]. Bait libraries representing known HCMV diversity were designed in February 2012 and April 2014 from 31 and 64 complete genome sequences, respectively. Information on and access to the latter library (55 210 baits of 120 nucleotides [nt] with overrepresentation of G + C–rich regions) are available from the corresponding author. Data on viral loads and library construction are shown in Supplementary Tables 1–3 (rows 9–12). Datasets of 300 or 150 nt paired-end reads were generated using a MiSeq (Illumina). Their names are shown in Supplementary Tables 1–3 (row 7). They were prepared for analysis using Trim Galore version 0.4.0 (program available at http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/; length = 21, quality = 10, and stringency = 3). The numbers of trimmed reads are in Supplementary Tables 1–3 (row 15).

Library Diversity

Estimating the number of reads in a dataset derived from unique HCMV fragments initially involved using Bowtie2 version 2.2.6 [29] to align the reads against the strain Merlin sequence (GenBank accession number AY446894.2) and, where it could be determined, the consensus genome sequence derived from the dataset. The relevant data are in Supplementary Tables 1–3 (rows 17–19 and 23–26). Reads containing insertions or deletions were removed to preserve coordinate numbering, as were duplicate read pairs sharing both end coordinates and duplicate unpaired reads sharing one end coordinate, thereby producing an alignment file for unique reads derived from unique HCMV fragments (program available at https://centre-for-virus-research.github.io/VATK/AssemblyPostProcessing). This file was viewed using Tablet version 1.14.11.7 [30]. The coverage depth values for total and unique fragment reads are in Supplementary Tables 1–3 (rows 20–21 and 27–28).

Table 1. Selected Characteristics on Sample Collections 1–3

| Characteristic | Collection 1 | Collection 2 | Collection 3 |
|---|---|---|---|
| Patients, No.ᵃ | 48 | 29 | 25 |
| Patient condition | Congenital infection | Mostly transplant recipients | Various |
| Samples, No. | 53 | 89 | 57 |
| Sample source, city (prefix) | Pavia (PAV), Jerusalem (JER), Prague (PRA) | Hannover (Child, RTR, SCTR), Pavia (PAV) | Rotterdam (Rot), London (Lon, Pat_) |
| Datasets, No. | 53 | 97ᵇ | 57ᶜ |
| Duplicated libraries, No. | 0 | 7 | 0 |
| HCMV load, IU/µLᵈ | 26–559 968 | 5–194 840 | 104–18 377 |
| Genome copies for library, No.ᵉ | 225–8 399 520 | 280–3 896 800 | Unknown |
| Reads in Merlin alignment, % | 2–91 | 0–85 | 0–90 |
| Coverage ratio in Merlin alignment, % unique/total reads | 0.40–83.12 | 0.00–76.09 | 0.00–90.21 |
| Genome sequences determined, No.ᶠ | 42 | 25 | 24 |

Details are provided in Supplementary Tables 1–3.

Abbreviation: HCMV, human cytomegalovirus.

ᵃArchived diagnostic samples were used, and clinical data were retrieved, with the approval of the institutional review boards of Policlinico San Matteo, Pavia (reference numbers 35853/2010 and 35854/2010), Hadassah University Hospital, Jerusalem (reference number HMO-063911), Motol University Hospital, Prague (reference number EK-701a/16), and Hannover Medical School, Hannover (reference number 2527-2014).

ᵇWe reported 68 of the Hannover datasets previously [21].

ᶜThese datasets were reported previously by others and were either provided by the authors [19] or downloaded from the European Nucleotide Archive (study PRJEB12814) [20].

ᵈViral load in most extracted samples was quantified in the laboratory of origin or the sequencing laboratory. In some instances, the entire sample was used blind to generate a sequencing library.

ᵉAssumes that 1 IU is equivalent to 1 genome copy.

ᶠThe trimmed paired-read data were aligned to the UCSC hg19 human reference genome (http://genome.ucsc.edu/) using Bowtie2. Nonmatching reads were assembled de novo into contigs using SPAdes version 3.5.0 [27]. The contigs were ordered using Scaffold_builder version 2.2 [28] by reference to a version of the strain Merlin sequence lacking all but 100 nt of the terminal repeat regions (TRL at the left end and TRS at the right end; Figure 1) and merged into a draft genome sequence. Residual gaps were filled by identifying relevant reads anchored in flanking regions and assembling them manually in a reiterative fashion. TRL and TRS were reinstated, and the complete genome sequence was verified by aligning it against the read data using Bowtie2 and inspecting the alignment in Tablet. An annotated genome sequence was produced using Sequin (https://www.ncbi.nlm.nih.gov/Sequin/).
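The deduplication rule described under Library Diversity can be illustrated with a minimal Python sketch. It is not the VATK AssemblyPostProcessing program: the `Read` record, its fields, and the use of the leftmost coordinate as the single end compared for unpaired reads are assumptions made for illustration. The percentage it prints is computed from read counts, whereas the coverage ratio in Table 1 is a ratio of coverage depths, so the two are only analogous.

```python
from typing import List, NamedTuple, Optional


class Read(NamedTuple):
    """Hypothetical, simplified aligned read (not a SAM/BAM record)."""
    name: str
    start: int                      # leftmost aligned coordinate on the Merlin reference
    has_indel: bool                 # alignment contains an insertion or deletion
    mate_end: Optional[int] = None  # far end of the fragment if paired, else None


def unique_fragment_reads(reads: List[Read]) -> List[Read]:
    """Drop indel-containing reads, then keep one read per unique fragment.

    Paired reads count as duplicates when both fragment end coordinates match;
    unpaired reads count as duplicates when their single end coordinate
    (assumed here to be the start) matches.
    """
    seen = set()
    kept = []
    for read in reads:
        if read.has_indel:
            continue  # removed so that alignment coordinates stay consistent
        if read.mate_end is not None:
            key = ("paired", read.start, read.mate_end)
        else:
            key = ("unpaired", read.start)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


def unique_percentage(total: int, unique: int) -> float:
    """Unique reads as a percentage of total reads (0 if there are no reads)."""
    return 100.0 * unique / total if total else 0.0


reads = [
    Read("a", 100, False, 400),
    Read("b", 100, False, 400),  # PCR duplicate of "a": same fragment ends
    Read("c", 120, True, 420),   # contains an indel: discarded
    Read("d", 500, False),       # unpaired
    Read("e", 500, False),       # unpaired duplicate of "d"
]
unique = unique_fragment_reads(reads)
print([r.name for r in unique], unique_percentage(len(reads), len(unique)))
# ['a', 'd'] 40.0
```

A library built with too many PCR cycles would show many reads collapsing onto the same keys, which is the "highly clonal" pattern described under Operational Limitations.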

Strain Enumeration

The number of strains represented in a dataset was estimated by 2 strategies: genotype read-matching and motif read-matching (program available at https://centre-for-virus-research.github.io/VATK/HCMV_pipeline). Both strategies utilized datasets concatenated from the paired-end datasets. The genotype designations used were either based on reported phylogenies [6, 12, 25, 31, 32], amended or extended as appropriate, or constructed afresh using Clustal Omega version 1.2.4 [33] and MEGA version 6.0.6 [34] with data for the genomes listed in Supplementary Table 4 and individual genes for which additional sequences were available in GenBank. Alignments and phylogenetic reconstructions are in Supplementary Figures 1 and 2, respectively.

For genotype read-matching, Bowtie2 was used to align the reads to sequences representing the genotypes of 2 hypervariable genes, UL146 and RL13 [6, 12, 35]. The sequences from the entire coding region of UL146 and the central coding region of RL13 are in Supplementary Tables 1–3 (rows 34–58). In contrast to the UL146 genotypes, the RL13 genotypes cross-matched within 4 groups (G1, G2, G3; G4A, G4B; G6, G10; and G7, G8). In these instances, the genotype within the group with the most matching reads was scored. The number of reads aligned to each genotype is in Supplementary Tables 1–3 (rows 34–58). A genotype was scored if the number of reads was >10 and represented >2% of the total number detected for all genotypes of that gene. For 14 samples in collection 1 that had been sequenced prior to the availability of ultrapure (TruGrade) oligonucleotides, these values were >25 and >5%, respectively. The number of strains in a sample was scored as the greater of the numbers of genotypes detected for the 2 target genes and is in Supplementary Tables 1–3 (row 13).

For motif read-matching, conserved genotype-specific motifs (20–31 nt) were identified by visual inspection of alignments (Supplementary Figure 1) for 12 hypervariable genes [6, 12, 19, 35]. Additional motifs for identifying common intergenotypic recombinants were included. The motif sequences and the number of reads containing perfect matches to a sequence or its reverse complement are in Supplementary Tables 1–3 (rows 60–170). Genotypes were scored as described above. The number of strains in a sample was estimated as the maximum number of genotypes detected for at least 2 genes and is in Supplementary Tables 1–3 (row 14).
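Motif read-matching reduces to exact substring searches on each read for a motif and its reverse complement. The sketch below assumes uppercase reads and uses a hypothetical 20-nt motif that is not taken from the paper. The strain estimate implements one reading of "the maximum number of genotypes detected for at least 2 genes" (the second-largest per-gene genotype count); this is an interpretation, not the authors' pipeline code.

```python
from typing import Dict, Iterable, Optional, Set

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_motif_reads(reads: Iterable[str], motif: str) -> int:
    """Number of reads with a perfect match to the motif or its reverse complement."""
    rc = reverse_complement(motif)
    return sum(1 for read in reads if motif in read or rc in read)


def strains_by_motif_read_matching(genotypes_per_gene: Dict[str, Set[str]]) -> Optional[int]:
    """Largest genotype count reached by at least 2 genes.

    Per-gene genotype counts are sorted in descending order and the second
    value is returned; None is returned if fewer than 2 genes were assessed.
    """
    counts = sorted((len(g) for g in genotypes_per_gene.values()), reverse=True)
    return counts[1] if len(counts) >= 2 else None


motif = "AAGGCCTTGACCATGGTACA"  # hypothetical motif, for illustration only
reads = ["GGG" + motif + "TTT", "CC" + reverse_complement(motif) + "A", "ACGT" * 6]
print(count_motif_reads(reads, motif))  # 2
print(strains_by_motif_read_matching(
    {"UL146": {"G1", "G7"}, "RL13": {"G1", "G4A", "G6"}, "UL73": {"G1"}}))  # 2
```

Requiring support from at least 2 genes means a single gene with an extra (possibly spurious) genotype does not by itself raise the strain count, which is the point of this rule as read here.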

Pseudogene Analysis

The genomes of some HCMV strains exhibit gene loss apparent as pseudogenes resulting from mutations causing premature translational termination [7, 11, 12, 26]. These mutations are substitutions that introduce in-frame stop codons or ablate splice sites, or insertions or deletions that cause frameshifting or loss of protein-coding regions. Motif read-matching was used to assess the presence of common mutations and also to determine the prevalence of mutations identified in collection 1. These data are in Supplementary Tables 1–3 (rows 171–178) and Supplementary Table 1 (rows 180–203), respectively.
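As a toy illustration of how a substitution or a frameshifting indel leads to premature translational termination, the following sketch compares the first in-frame stop codon of a reference coding sequence with that of a variant. The 15-nt sequence is invented, and the check does not model ablated splice sites or loss of protein-coding regions, which the text also lists as causes of pseudogenes.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(cds: str) -> Optional[int]:
    """0-based codon index of the first in-frame stop codon, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def truncates_early(reference_cds: str, variant_cds: str) -> bool:
    """True if translation of the variant terminates before that of the reference."""
    ref_stop = first_stop_codon(reference_cds)
    if ref_stop is None:
        raise ValueError("reference CDS has no in-frame stop codon")
    var_stop = first_stop_codon(variant_cds)
    return var_stop is not None and var_stop < ref_stop


reference = "ATGCAGCTGACCTAA"                       # toy ORF: Met-Gln-Leu-Thr-stop
substitution = reference[:3] + "T" + reference[4:]  # CAG -> TAG, an in-frame stop
deletion = reference[:6] + reference[7:]            # 1-nt deletion shifts frame onto TGA
print(first_stop_codon(reference), first_stop_codon(substitution), first_stop_codon(deletion))
# 4 1 2
```

Both variants stop translating before codon 4, so both would count as truncating mutations in this simplified model.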

Intrahost Variation

Minor genome populations were analyzed by enumerating single-nucleotide polymorphisms (SNPs) in datasets for which consensus genome sequences had been determined. Thus, the term mutant applies hereafter to a strain that has a mutation in the consensus sequence resulting in a pseudogene, and the term SNP applies to a minor variation from the consensus within a population. To enumerate SNPs, original datasets were prepared for analysis using Trim Galore (length = 100, quality = 30, and stringency = 1), and trimmed reads were mapped using Bowtie2. Alignment files in SAM format were converted into BAM format, sorted using SAMtools version 1.3 [36], and analyzed using LoFreq version 2.1.2 [37] and V-Phaser 2 [38].

Data Deposition

Original datasets were purged of human reads and deposited in the European Nucleotide Archive (ENA; project number PRJEB29585), and consensus genome sequences were deposited in GenBank. The accession numbers are in Supplementary Tables 1–3 (rows 8 and 29, respectively). Updated genome sequence determinations in collection 3 were deposited by the original submitters in GenBank [19] or by us as third-party annotations in ENA (project number PRJEB29374) [20]. Sequence features are in Supplementary Tables 1–3 (rows 30–32).

RESULTS

Operational Limitations

A total of 207 datasets from 199 samples and 102 individuals were analyzed (Table 1 and Supplementary Tables 1–3). Library quality was reflected in the percentage of HCMV reads and the coverage depth by unique fragment reads. These values were related to sample type, being higher for urine than blood, presumably because of a higher proportion of viral to host DNA. They also depended on the number of viral genome copies used to make the library, with >1000 copies generally being needed to determine a complete genome sequence. However, despite high library diversity, it was not possible to assemble complete genome sequences from most datasets in collection 3 because of gaps in RL12 and some G + C–rich regions, perhaps as a result of limitations in the bait library. The use of excessive PCR cycles with some samples in collections 1 and 2 led to high coverage depth by total fragment reads but low coverage depth by unique fragment reads, and thus to highly clonal libraries (eg, PAV2 in collection 1). Genotypes present at subthreshold levels may represent multiple-strain infections or cross-contamination during the complex sample processing pathway (eg, PRA4 reads in PRA6A in collection 1).

Genome Sequences

A total of 91 complete or almost complete HCMV genome sequences were determined (Table 1). We reported 5 previously [21], and 16 are improvements on published sequences [19]. Most originated from single-strain infections or multiple-strain infections in which one strain was predominant, and some originated from different strains that predominated in a patient at different times. Defining a strain as a viral genome present in an individual, these 91 sequences, plus an additional 49 deposited by our group and 104 by others, brought the number of strains sequenced to 244 (Supplementary Table 4). Of these, 91 were sequenced directly from clinical material, and all but one were determined in this and our previous study [21]. The average size of the HCMV genome, based on the 78 complete sequences in this set, is 235 465 bp (range, 234 316–237 120 bp).

Multiple-Strain Infections

Genotypic differences in hypervariable genes (Figure 1 and Supplementary Figures 1 and 2) were exploited to distinguish single-strain from multiple-strain infections by genotype read-matching and motif read-matching with threshold values. To our knowledge, these methods, employed in the present work and the companion study [39], have not been used previously for categorizing HCMV infections. Single strains were common in congenitally infected patients (n = 43/50 in collections 1 and 2) but significantly less so in transplant recipients (n = 11/25 in collections 2 and 3; χ² = 14.583, P < .05). Intrahost variation is discussed below.

Recombination

The 244 genome sequences were genotyped in the 12 hypervariable genes used for motif read-matching and then in 5 additional genes (Figure 1 and Supplementary Table 4).

Hypervariation in UL55, which encodes glycoprotein B (gB), is located in 2 regions (UL55N near the N terminus, and UL55X encompassing the proteolytic cleavage site) [23, 40]. Five genotypes (G1–G5) have been assigned to each region [23, 40–42], which are separated by 927 bp that are 80% identical in all strains. All genomes had a recognized UL55X genotype (Supplementary Table 5). As reported previously [40], UL55N G2 and G3 could not be distinguished reliably from each other, and 2 additional genotypes (G6 and G7) were detected that may have arisen from ancient recombination events within UL55N (Supplementary Tables 4 and 5 and Supplementary Figure 1). There was evidence for recombination in the region between UL55N and UL55X in only 8 genomes. This low proportion of recombination (3.3%) contrasts with the higher levels proposed in UL55 from PCR-based studies [40, 43], which may have been affected by artefactual recombination.

Figure 1. Locations in the human cytomegalovirus strain Merlin genome of genes used for genotyping. The genome consists of 2 unique regions, UL (1325–194 343 bp) and US (197 627–233 108 bp), the former flanked by inverted repeats TRL (1–1324 bp) and IRL (194 344–195 667 bp), and the latter flanked by inverted repeats IRS (195 090–197 626 bp) and TRS (233 109–235 646 bp). Protein-coding regions are indicated by shaded arrows, and noncoding RNAs as narrower, white arrows, with gene nomenclature below. Introns are shown as narrow white bars. The 12 genes (RL5A, RL6, RL12, RL13, UL1, UL9, UL11, UL73, UL74, UL120, UL146, and UL139) used for motif read-matching are in dark gray (red in online version). Two of these genes (RL13 and UL146) were also used for genotype read-matching. The additional 5 genes (UL20, UL33, UL37, UL55, and US9) used to genotype sequences by alignment are medium gray (orange in online version). All other genes are shown in white (pink in online version).

UL73 and UL74, which encode glycoproteins N and O (gN and gO), respectively, are adjacent hypervariable genes that exist as 8 genotypes each [25, 32, 44]. There was evidence for recombination between them in only 7 genomes (2.9%), in accordance with the low levels (2.2%) detected previously in PCR-based studies [25, 32, 45]. In the region containing adjacent hypervariable genes RL12, RL13, and UL1, recombinants were also rare (1.2%) within RL12 and absent from RL13 and UL1. In contrast, hypervariable genes UL146 and UL139, which encode a CXC chemokine and a membrane glycoprotein, respectively, are separated by a well-conserved region of over 5 kbp. The number (66) of the 126 possible genotype combinations represented in the 244 genomes is too large to allow any underlying genotypic linkage to be discerned, consistent with previous conclusions from PCR-based studies [31]. No recombinants were noted within UL146.

In principle, strains in multiple-strain infections have the opportunity to recombine. In our previous analysis of RTR1 in collection 2, we noted that one strain (RTR1A) predominated at earlier times and another (RTR1B) at later times [21]. From the low frequency of SNPs across a large part of the genome, we concluded that the second strain had arisen either by recombination involving the first strain or by reinfection with, or reactivation of, a second strain fortuitously similar to the first. In the present study, recombination was strongly supported by a comparison of the 2 genome sequences, which showed that approximately two-thirds of the genome is almost identical (differing by 3 substitutions in noncoding regions), whereas the remaining third is highly dissimilar.

To investigate whether strains have been transmitted without recombination occurring, identical genotypic constellations were identified among the 244 genomes (Table 2). This revealed the existence of 12 haplotype groups within which multiple strains lack signs of having recombined since diverging from their last common ancestor; these are henceforth termed nonrecombinant strains. As an incidental outcome, the 2 strains in group 1 (PRA8 and CZ/3/2012), which were characterized in different studies, were confirmed as having originated from the same patient, reducing the set of sequenced strains to 243. The results from the other 11 groups suggest that nonrecombinant strains have been circulating, some for periods sufficient to allow the accumulation of >100 substitutions. Among the highly divergent groups, group 9 (3 strains) exhibited 135 differences, with the 50 that would affect protein coding distributed among 38 genes, and group 10 (2 strains) exhibited 138 differences, with the 38 that would affect protein coding distributed among 27 genes. No obvious bias was observed toward greater diversity in any particular gene or group of genes, including those in the hypervariable category.
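Identifying haplotype groups amounts to grouping strains by their full genotype constellation. In this sketch the genotype strings are copied from Table 2 for groups 2 and 5, while the strain named HYPOTHETICAL is invented to show that a single differing genotype excludes a strain. Sharing a constellation across these 17 genes is only a screening criterion; the number of sequence differences within each group (Table 2) comes from genome comparisons that the sketch does not perform.

```python
from collections import defaultdict
from typing import Dict, List

# Genotype calls in the Table 2 column order:
# RL5A RL6 RL12 RL13 UL1 UL9 UL11 UL20 UL33 UL37 UL55N UL73 UL74 UL120 UL146 UL139 US9
CONSTELLATIONS: Dict[str, str] = {
    "BE/3/2011": "2 4 1B 1 1 4 1 6 2 2 2/3 4A 3 1A 8 2 1",
    "BE/21/2011": "2 4 1B 1 1 4 1 6 2 2 2/3 4A 3 1A 8 2 1",
    "BE/10/2012": "6 3 1A 1 1 1 1 2 4 5 4 3A 1B 2A 3 7 1",
    "BE/26/2011": "6 3 1A 1 1 1 1 2 4 5 4 3A 1B 2A 3 7 1",
    "HYPOTHETICAL": "6 3 1A 1 1 1 1 2 4 5 4 3A 1B 2A 3 7 2",  # invented: differs in US9
}


def haplotype_groups(constellations: Dict[str, str]) -> List[List[str]]:
    """Group strains sharing an identical genotype in every gene; drop singletons."""
    groups = defaultdict(list)
    for strain, calls in constellations.items():
        groups[tuple(calls.split())].append(strain)
    return sorted(sorted(members) for members in groups.values() if len(members) > 1)


print(haplotype_groups(CONSTELLATIONS))
# [['BE/10/2012', 'BE/26/2011'], ['BE/21/2011', 'BE/3/2011']]
```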

Pseudogenes

Among the strains sequenced from clinical material, 77% are mutated in at least one gene (compared with 79% among all sequenced strains), and one is mutated in as many as 6 genes (Pat_D in collection 3) (Supplementary Table 4). The most frequently mutated genes are UL9, RL5A, UL1, and RL6 (members of the RL11 family), US7 and US9 (members of the US6 gene family), and UL111A (encoding viral interleukin 10) (Table 3). In addition, there was evidence from the PAV6 datasets (collection 1) for maternal transmission of a US7 mutant (Supplementary Table 1), and from PCR data (not shown) for maternal transmission of a UL111A mutant to PAV16 (collection 1). Focusing on the most common mutations, strains in which UL9, RL5A, UL1, US9, US7, and UL111A were affected (singly or in combination) were, like strains that were not mutated in any gene, transmitted in congenital infections and, in some cases, linked to defects in neurological development (Supplementary Table 1).

Intrahost Diversity

LoFreq and V-Phaser analyses showed that single-strain infections contained markedly fewer SNPs (median values of 60 and 140, respectively) than multiple-strain infections (median values of 2444 and 2955, respectively; Figure 2). The differences between the values for single- and multiple-strain infections were significant (Kruskal–Wallis rank-sum test; LoFreq: χ² = 67.918, P < 2.2 × 10⁻¹⁶; V-Phaser: χ² = 63.536, P = 1.6 × 10⁻¹⁵).
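With 2 groups, the Kruskal–Wallis H statistic is referred to a chi-square distribution with 1 degree of freedom, which is why it is reported as χ². The sketch below computes H with the usual correction for ties on toy SNP counts, since the per-sample counts are not reproduced in this section. Applying the 1-degree-of-freedom tail function to the reported V-Phaser statistic (63.536) gives approximately 1.6 × 10⁻¹⁵, and to the LoFreq statistic (67.918) a value below 2.2 × 10⁻¹⁶, consistent with the reported P values.

```python
import math
from collections import Counter
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values given the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups: Sequence[float]) -> float:
    """Kruskal-Wallis H with tie correction (assumes not all values are equal)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


def p_value_1df(stat: float) -> float:
    """Upper-tail chi-square probability for 1 degree of freedom (2 groups)."""
    return math.erfc(math.sqrt(stat / 2))


single = [1, 2, 3]  # toy SNP counts, not data from the study
multiple = [4, 5, 6]
h = kruskal_wallis_h(single, multiple)
print(round(h, 4), p_value_1df(h))
```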

DISCUSSION

Advances in high-throughput sequencing technology have made it possible to generate a wealth of viral genome information directly from clinical material. However, operational limitations should be registered. These include sample characteristics (source, viral content, and presence of multiple strains), confounding factors (technical limitations, logistical errors, and cross-contamination), design of the bait library (ability to enrich all strains and acquire data across the genome), and quality and extent of the sequencing data (library diversity and coverage depth). Since perceived levels of intrahost variation are particularly sensitive to these factors, we proceeded cautiously with this aspect. However, as indicated in our previous study [21], it is clear that the number of SNPs in single-strain infections was markedly lower than that in multiple-strain infections. It was also far lower than that reported by others in samples from congenital infections [16]. The factors listed above may have been responsible for the outliers observed in single-strain infections; for example, the PAV6 (collection 1) library was made using non-TruGrade oligonucleotides, RTR6B (collection 2) had a low coverage depth and also came from a patient from whom other samples contained multiple strains, and CMV-35 (collection 3) may have contained subthreshold levels of additional


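Table 2. Groups of Nonrecombinant Strains

| Group | Strain | Genotypesᵃ: RL5A | RL6 | RL12 | RL13 | UL1 | UL9 | UL11 | UL20 | UL33 | UL37 | UL55N | UL73 | UL74 | UL120 | UL146 | UL139 | US9 | Mutated Genes | Differencesᵇ | Shared Mutations |
| 1 | PRA8 | 1 | 1 | 6 | 6 | 6 | 6 | 2 | 5 | 1 | 5 | 2/3 | 4C | 1C | 2B | 1 | 4 | 1 | UL145 | 0 | These strains share a UL145 mutation, were characterized in different studies, and were confirmed as having been derived from the same patient |
|  | CZ/3/2012 | 1 | 1 | 6 | 6 | 6 | 6 | 2 | 5 | 1 | 5 | 2/3 | 4C | 1C | 2B | 1 | 4 | 1 | UL145 |  |  |
| 2 | BE/3/2011 | 2 | 4 | 1B | 1 | 1 | 4 | 1 | 6 | 2 | 2 | 2/3 | 4A | 3 | 1A | 8 | 2 | 1 | None | 1 | None |
|  | BE/21/2011 | 2 | 4 | 1B | 1 | 1 | 4 | 1 | 6 | 2 | 2 | 2/3 | 4A | 3 | 1A | 8 | 2 | 1 | None |  |  |
| 3 | UK/Lon6/Urine/2011 | 5 | 1 | 7 | 7 | 7 | 1 | 1 | 6 | 2 | 1 | 4 | 3A | 1B | 3B | 13 | 1A | 1 | None | 23 | None |
|  | 2CEN15 | 5 | 1 | 7 | 7 | 7 | 1 | 1 | 6 | 2 | 1 | 4 | 3A | 1B | 3B | 13 | 1A | 1 | None |  |  |
|  | BE/5/2012 | 5 | 1 | 7 | 7 | 7 | 1 | 1 | 6 | 2 | 1 | 4 | 3A | 1B | 3B | 13 | 1A | 1 | None |  |  |
| 4 | BE/14/2012 | 3 | 5 | 4A | 4A | 4 | 6 | 6 | 5 | 4 | 5 | 4 | 3A | 1B | 2A | 9 | 5 | 1 | RL6, UL9, UL40, US7 | 26 | These strains share a UL9 mutation and also RL6 and UL40 mutations that are present in other strains |
|  | BE/36/2011 | 3 | 5 | 4A | 4A | 4 | 6 | 6 | 5 | 4 | 5 | 4 | 3A | 1B | 2A | 9 | 5 | 1 | RL6, UL9, UL40 |  |  |
| 5 | BE/10/2012 | 6 | 3 | 1A | 1 | 1 | 1 | 1 | 2 | 4 | 5 | 4 | 3A | 1B | 2A | 3 | 7 | 1 | None | 35 | None |
|  | BE/26/2011 | 6 | 3 | 1A | 1 | 1 | 1 | 1 | 2 | 4 | 5 | 4 | 3A | 1B | 2A | 3 | 7 | 1 | None |  |  |
| 6 | BE/1/2011 | 1 | 1 | 4A | 4A | 4 | 6 | 6 | 7 | 4 | 6 | 2/3 | 4A | 3 | 2A | 1 | 4 | 1 | UL1, UL9 | 65 | These strains bear a UL9 mutation that is present in other strains, and 2 strains share a UL1 mutation |
|  | BE/8/2010 | 1 | 1 | 4A | 4A | 4 | 6 | 6 | 7 | 4 | 6 | 2/3 | 4A | 3 | 2A | 1 | 4 | 1 | UL9 |  |  |
|  | BE/9/2012 | 1 | 1 | 4A | 4A | 4 | 6 | 6 | 7 | 4 | 6 | 2/3 | 4A | 3 | 2A | 1 | 4 | 1 | UL1, UL9 |  |  |
| 7 | NAN1LA | 3 | 5 | 5 | 5 | 5 | 7 | 3 | 2 | 2 | 6 | 2/3 | 4D | 5 | 3B | 7 | 5 | 2 | RL6, US9 | 73 | These strains share RL6 and US9 mutations that are present in other strains |
|  | BE/6/2012 | 3 | 5 | 5 | 5 | 5 | 7 | 3 | 2 | 2 | 6 | 2/3 | 4D | 5 | 3B | 7 | 5 | 2 | RL6, US9, US27 |  |  |
| 8 | BE/7/2012 | 2 | 4 | 1A | 1 | 1 | 4 | 1 | 7 | 5 | 3 | 2/3 | 4A | 3 | 2B | 13 | 5 | 1 | RL5A, RL13, UL150 | 125 | These strains share a UL150 mutation that is present in other strains |
|  | BE/11/2012 | 2 | 4 | 1A | 1 | 1 | 4 | 1 | 7 | 5 | 3 | 2/3 | 4A | 3 | 2B | 13 | 5 | 1 | UL150 |  |  |
|  | BE/16/2012 | 2 | 4 | 1A | 1 | 1 | 4 | 1 | 7 | 5 | 3 | 2/3 | 4A | 3 | 2B | 13 | 5 | 1 | UL150 |  |  |
|  | BE/26/2010 | 2 | 4 | 1A | 1 | 1 | 4 | 1 | 7 | 5 | 3 | 2/3 | 4A | 3 | 2B | 13 | 5 | 1 | UL150 |  |  |
|  | BE/30/2011 | 2 | 4 | 1A | 1 | 1 | 4 | 1 | 7 | 5 | 3 | 2/3 | 4A | 3 | 2B | 13 | 5 | 1 | UL150 |  |  |
| 9 | JER851 | 1 | 1 | 3 | 3 | 3 | 2 | 1 | 3 | 4 | 6 | 1 | 2 | 2B | 3B | 7 | 4 | 1 | UL1, UL9, UL111A | 135 | These strains share a UL111A mutation that is present in another strain |
|  | JER4041 | 1 | 1 | 3 | 3 | 3 | 2 | 1 | 3 | 4 | 6 | 1 | 2 | 2B | 3B | 7 | 4 | 1 | UL111A |  |  |
|  | BE/25/2010 | 1 | 1 | 3 | 3 | 3 | 2 | 1 | 3 | 4 | 6 | 1 | 2 | 2B | 3B | 7 | 4 | 1 | UL111A |  |  |
| 10 | JER5695 | 1 | 1 | 7 | 7 | 7 | 1 | 1 | 6 | 2 | 1 | 2/3 | 3B | 2A | 4B | 13 | 2 | 1 | UL9, UL111A | 138 | These strains share a UL111A mutation that is present in other strains, and have different UL9 mutations |
|  | BE/15/2010 | 1 | 1 | 7 | 7 | 7 | 1 | 1 | 6 | 2 | 1 | 2/3 | 3B | 2A | 4B | 13 | 2 | 1 | RL1, UL9, UL111A |  |  |
| 11 | PRA7 | 1 | 1 | 4B | 4B | 4 | 9 | 6 | 5 | 5 | 6 | 6 | 4D | 5 | 4B | 10 | 2 | 1 | RL5A, UL111A | 143 | These strains share RL5A and UL111A mutations that are present in other strains |
|  | JP | 1 | 1 | 4B | 4B | 4 | 9 | 6 | 5 | 5 | 6 | 6 | 4D | 5 | 4B | 10 | 2 | 1 | RL5A, UL111A |  |  |
|  | BE/4/2010 | 1 | 1 | 4B | 4B | 4 | 9 | 6 | 5 | 5 | 6 | 6 | 4D | 5 | 4B | 10 | 2 | 1 | RL5A, UL111A |  |  |
| 12 | BE/6/2011 | 5 | 1 | 4B | 4B | 4 | 9 | 6 | 1 | 5 | 3 | 2/3 | 3A | 1B | 1B | 9 | 5 | 1 | UL9 | 155 | Two strains share a UL9 mutation that is present in other strains |
|  | BE/18/2011 | 5 | 1 | 4B | 4B | 4 | 9 | 6 | 1 | 5 | 3 | 2/3 | 3A | 1B | 1B | 9 | 5 | 1 | None |  |  |
|  | BE/27/2011 | 5 | 1 | 4B | 4B | 4 | 9 | 6 | 1 | 5 | 3 | 2/3 | 3A | 1B | 1B | 9 | 5 | 1 | UL9 |  |  |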

ᵃSee Supplementary Figures 1 and 2 for genotype definitions. G prefix omitted.

ᵇTotal number of differences among all strains in the group, not including size variations in tandem repeats. To exclude repeat regions, sequences were aligned from the TATA box of RL1 to the end of US, omitting the region from the AATAAA polyadenylation signal of UL150A to the beginning of TRS.


strains or cross-contaminants. In our view, accurate estimates of the levels of intrahost variation in single-strain infections are not available from the present and previous studies, and will require sequencing and bioinformatic approaches that are demonstrably reliable, robust, and reproducible [46, 47].
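The SNP comparison above, and the counts plotted in Figure 2, come down to two steps: keeping sites at which a non-consensus base exceeds a frequency threshold at adequate depth, then summarizing the per-dataset totals as a box plot. The Python sketch below illustrates both under stated assumptions. The per-site input (`SiteCounts`) and the functions `count_snps` and `box_summary` are hypothetical; the 2% frequency and 10-read depth thresholds are taken from the Figure 2 legend; and the simple threshold filter stands in for, and is far cruder than, the statistical models in LoFreq and V-Phaser (no base-quality, strand-bias, or phasing tests). Quartiles are assumed to be linearly interpolated, as in R's default quantile method, and the whisker extremes follow the legend's definition.

```python
from dataclasses import dataclass
from statistics import median, quantiles


@dataclass
class SiteCounts:
    """Hypothetical per-site pileup summary for one dataset."""
    position: int   # genome coordinate
    depth: int      # reads covering the site
    alt_count: int  # reads carrying the most frequent non-consensus base


def count_snps(sites, min_freq=0.02, min_depth=10):
    """Count sites whose non-consensus base frequency exceeds min_freq at a
    depth of at least min_depth. Indels are not considered, and no
    base-quality or strand-bias testing is done (unlike LoFreq/V-Phaser)."""
    return sum(
        1
        for s in sites
        if s.depth >= min_depth and s.alt_count / s.depth > min_freq
    )


def box_summary(values):
    """Box-plot statistics using the Figure 2 legend's whisker definition:
    upper = min(Q3 + 1.5*IQR, max), lower = max(Q1 - 1.5*IQR, min).
    Quartiles are linearly interpolated (equivalent to R quantile type 7)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    upper = min(q3 + 1.5 * iqr, max(values))
    lower = max(q1 - 1.5 * iqr, min(values))
    outliers = [v for v in values if v > upper or v < lower]
    return {
        "q1": q1,
        "median": median(values),
        "q3": q3,
        "lower": lower,
        "upper": upper,
        "outliers": outliers,
    }


if __name__ == "__main__":
    sites = [SiteCounts(100, 250, 3), SiteCounts(200, 250, 10), SiteCounts(300, 8, 4)]
    # Only position 200 passes: 3/250 is below 2%, and position 300 has depth < 10.
    print(count_snps(sites))  # 1
    print(box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

With the toy values in the example, the legend's rule places the upper extreme at 14.5, whereas the ggplot2 documentation describes the whisker as extending to the largest observation within 1.5 × IQR of the hinge, which here would be 9. Under either rule, the value 100 lies beyond the whisker and is treated as an outlier.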

Whole-genome analyses have confirmed the significant role of recombination during HCMV evolution reported in numerous earlier studies [12, 19]. Recombination has occurred over a very long period but nonetheless remains limited in extent, with surviving events being more numerous in long regions, less numerous in short regions, and rare or absent in hypervariable regions, consistent with the role of homologous recombination, which depends on sequence similarity between the recombining genomes. Recombination frequency may be restricted in some circumstances by functional interdependence within the same protein (eg, gB) or possibly between separate proteins (eg, gN and gO [25, 32, 44]). However, it is not known whether differential recombination due to sequence relatedness is of general biological significance for the virus. Also, strains have circulated that seem not to have recombined for long periods. Application of an evolutionary rate estimated for herpesviruses (3.5 × 10⁻⁸ substitutions/nt/year) [48] implies that these periods may have extended to many thousands of years. Moreover, as suggested by the lack of diversity within genotypes in comparison with the marked diversity among them, the distribution of substitutions

Table 3. Mutated Genes in Order of Decreasing Frequency

| Gene | Feature(s) | Strains Mutated, No.ᵃ |  |  | Strains Mutated, %ᵃ |  |  |
|  |  | Passagedᵇ | Clinicalᶜ | Allᵈ | Passagedᵇ | Clinicalᶜ | Allᵈ |
| UL9 | RL11 family; type 1 membrane protein | 50 | 31 | 81 | 32.89 | 34.07 | 33.33 |
| RL5A | RL11 family | 31 | 27 | 58 | 20.39 | 29.67 | 23.87 |
| UL1 | RL11 family; type 1 membrane protein | 20 | 18 | 38 | 13.16 | 19.78 | 15.64 |
| RL6 | RL11 family | 23 | 14 | 37 | 15.13 | 15.38 | 15.23 |
| US9 | US6 family; type 1 membrane protein | 26 | 11 | 37 | 17.11 | 12.09 | 15.23 |
| UL111A | Viral interleukin-10 | 16 | 7 | 23 | 10.53 | 7.69 | 9.47 |
| UL150 | Unknown | 11 | 3 | 14 | 7.24 | 3.30 | 5.76 |
| US7 | US6 family; type 1 membrane protein | 7 | 7 | 14 | 4.61 | 7.69 | 5.76 |
| UL40 | Type 1 membrane protein | 8 | 2 | 10 | 5.26 | 2.20 | 4.12 |
| UL30 | UL30 family | 2 | 3 | 5 | 1.32 | 3.30 | 2.06 |
| UL142 | MHC family; type 1 membrane protein | 2 | 3 | 5 | 1.32 | 3.30 | 2.06 |
| RL12 | RL11 family; type 1 membrane protein | 3 | 1 | 4 | 1.97 | 1.10 | 1.65 |
| RL1 | RL1 family | 1 | 2 | 3 | 0.66 | 2.20 | 1.23 |
| UL136 | Potential transmembrane domain | 3 | 0 | 3 | 1.97 | 0.00 | 1.23 |
| US13 | US12 family; type 3 membrane protein | 3 | 0 | 3 | 1.97 | 0.00 | 1.23 |
| UL133 | Potential transmembrane domain | 2 | 0 | 2 | 1.32 | 0.00 | 0.82 |
| US6 | US6 family; type 1 membrane protein | 1 | 1 | 2 | 0.66 | 1.10 | 0.82 |
| US8 | US6 family; type 1 membrane protein | 0 | 2 | 2 | 0.00 | 2.20 | 0.82 |
| US27 | GPCR family; type 3 membrane protein | 2 | 0 | 2 | 1.32 | 0.00 | 0.82 |
| UL11 | RL11 family; type 1 membrane protein | 1 | 0 | 1 | 0.66 | 0.00 | 0.41 |
| UL13 | Unknown | 0 | 1 | 1 | 0.00 | 1.10 | 0.41 |
| UL14 | UL14 family; type 1 membrane protein | 0 | 1 | 1 | 0.00 | 1.10 | 0.41 |
| UL15A | Potential transmembrane domain | 0 | 1 | 1 | 0.00 | 1.10 | 0.41 |
| UL20 | Type 1 membrane protein | 1 | 0 | 1 | 0.66 | 0.00 | 0.41 |
| UL43 | US22 family | 0 | 1 | 1 | 0.00 | 1.10 | 0.41 |
| UL99 | Envelope-associated protein | 1 | 0 | 1 | 0.66 | 0.00 | 0.41 |
| UL148 | Type 1 membrane protein | 1 | 0 | 1 | 0.66 | 0.00 | 0.41 |
| UL147 | CXCL family | 1 | 0 | 1 | 0.66 | 0.00 | 0.41 |
| UL145 | Unknown | 0 | 1 | 1 | 0.00 | 1.10 | 0.41 |
| UL150A | Unknown | 1 | 0 | 1 | 0.66 | 0.00 | 0.41 |
| IRS1 | US22 family | 1 | 0 | 1 | 0.66 | 0.00 | 0.41 |
| US1 | US1 family | 1 | 0 | 1 | 0.66 | 0.00 | 0.41 |
| US12 | US12 family; type 3 membrane protein | 1 | 0 | 1 | 0.66 | 0.00 | 0.41 |
| US19 | US12 family; type 3 membrane protein | 0 | 1 | 1 | 0.00 | 1.10 | 0.41 |

Abbreviations: CXCL, chemokine (CXC motif) ligand; GPCR, G protein–coupled receptor; MHC, major histocompatibility complex.

ᵃOmitting mutations that occurred in RL13, UL128, UL130, and UL131A probably during passage, or that were engineered during bacterial artificial chromosome construction.

ᵇStrains sequenced from strains passaged in cell culture, not taking into account the minority of mutations confirmed from the clinical samples (n = 152; excludes CZ/3/2012, which is the same strain as PRA8).

ᶜStrains sequenced directly from clinical material (n = 91).

ᵈStrains sequenced directly from clinical material or passaged virus (n = 243).


in nonrecombinant strains fits with the view that intense diversification of the hypervariable genes occurred early in human or pre-human history [25, 31] and has long since ceased.
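The "many thousands of years" figure can be approximated with a strict molecular-clock calculation. If two strains last shared an ancestor t years ago, both lineages have accumulated substitutions for t years, so t ≈ d / (2μL), where d is the number of nucleotide differences between them, L the number of aligned nucleotides, and μ the substitution rate (3.5 × 10⁻⁸ substitutions/nt/year [48]). The sketch below is an illustration under assumptions, not the authors' calculation: the functions are hypothetical, differences are counted naively as mismatched alignment columns with gapped columns skipped, multiple substitutions at the same site are ignored, and L = 235,000 nt (close to the 235,646 bp Merlin genome, but not the exact length of the alignment used for Table 2) is an illustrative choice.

```python
RATE = 3.5e-8  # substitutions per nucleotide per year, herpesvirus estimate [48]


def count_differences(seq_a: str, seq_b: str) -> int:
    """Count aligned columns at which two sequences carry different bases,
    skipping columns where either sequence has a gap ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-" and a != b
    )


def divergence_years(differences: int, aligned_nt: int, rate: float = RATE) -> float:
    """Strict-clock time since two lineages split: substitutions accumulate
    independently along both branches, hence the factor of 2. Multiple hits
    at the same site are ignored, which is reasonable only at low divergence."""
    return differences / (2 * rate * aligned_nt)


if __name__ == "__main__":
    # The G/C column differs; the gapped column is skipped.
    print(count_differences("ACGT-A", "ACCTTA"))  # 1
    # Illustrative only: 138 differences (group 10, Table 2) over an assumed 235,000 nt.
    print(round(divergence_years(138, 235_000)))  # 8389
```

Under these assumptions, the 138 differences between the two group 10 strains in Table 2 correspond to roughly 8,400 years since the lineages diverged; because t is inversely proportional to L, using the shorter repeat-free alignment length instead would lengthen the estimate.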

Assessing the extent to which recombinants arise and survive in individuals with multiple-strain infections is problematic. Except where populations fluctuate significantly and are sampled serially (eg, RTR1 in collection 2), it is difficult to approach this using short-read data, as these data are generated by PCR-based methodologies prone to generating recombinational artefacts. Long- or single-read sequencing technologies and demonstrably reliable bioinformatic approaches are needed. Also, conclusions drawn from transplant recipients, who are immunosuppressed and in whom HCMV populations may be diversified by transplantation from HCMV-positive donors or selected with antiviral drugs, are unlikely to represent other situations, such as maternal transmission via breast milk [39].

Evidence for pseudogenes was largely derived previously from

strains isolated in cell culture, and it was unclear to what extent

[Figure 2 image: panels A (LoFreq) and B (V-Phaser), each plotting the number of variants (axis ticks 1000 to 5000) for single strains and multiple strains. Labeled datasets in panel A: PAV6; PAV21, CMV-38, RTR2; RTR6B; CMV-35; ERR1279054, CMV-37. Labeled datasets in panel B: PAV6; RTR6B, CMV-35; CMV-37, CMV-19; CMV-38, PRA6A, CMV-31, SCTR12; ERR1279054, RTR2.]

Figure 2. Box-and-whisker graphs created using ggplot2 (https://ggplot2.tidyverse.org) showing the total number of single-nucleotide polymorphisms (SNPs) detected at a frequency of >2% in single-strain and multiple-strain infections using LoFreq (A) and V-Phaser (B). Single-strain (n = 134 and 131, respectively) and multiple-strain datasets (n = 29 and 29, respectively) for which consensus genome sequences had been derived were identified by motif read-matching, and the total number of SNPs in each dataset was enumerated (insertions, deletions, and length polymorphisms were not considered). LoFreq employed a minimal coverage depth of 10 reads (minimal SNP quality [phred] 64) and strand-bias significance with a false discovery rate correction of P < .001. V-Phaser employed phasing with a window size of 500 nucleotides and quality score (phred) 20 for calibrating the significance of strand bias at P < .05. Each box (light gray for single strains and dark gray for multiple strains) encompasses the first to third quartiles (Q1–Q3) and shows the median as a thick line. For each box, the horizontal line at the end of the upper dashed whisker marks the upper extreme (defined as the smaller of Q3 + 1.5 [Q3 − Q1] and the highest single value), and the horizontal line at the end of the lower dashed whisker marks the lower extreme (the greater of Q1 − 1.5 [Q3 − Q1] and the lowest single value).


pseudogenes were present in natural populations. For example, in a study reporting that 75% of strains carry pseudogenes [12], 157 mutations were identified in 101 strains, with all but one of these strains having been passaged in cell culture, although 35 mutations were confirmed by PCR of the clinical material. Nonetheless, we found that the distribution of pseudogenes among the 91 strains sequenced in the present study directly from clinical material is similar to that among strains isolated in cell culture, thus generally validating the earlier suppositions. The likelihood that many of these mutants are ancient is supported by the finding that all were detected at levels very close to 100% in collection 1, and by previous observations identifying the same mutation in different strains [7, 12]. Moreover, 9 of the groups of nonrecombinant strains contained pseudogenes, and some of the mutations were common to group members and even to additional strains among the 243, indicating that they have been transferred by recombination. The implication that some mutants have a selective advantage in certain individuals may be extended to their presence in pathogenic congenital infections, probably in combination with host factors. The genes from which pseudogenes have arisen are involved, or are suspected to be involved, in immune modulation. They include UL111A, which encodes viral interleukin 10 [49]; UL40, which is involved in protecting infected cells against natural killer cell lysis [50] via its cleaved signal peptide, in which mutations occur; and UL9, which bears a potential immunoglobulin-binding domain [2]. These findings also suggest, but do not prove, that maternal HCMV genotyping might be useful in developing strategies for preventing congenital CMV.
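Each percentage in Table 3 is the number of mutated strains divided by the size of the corresponding strain set (152 passaged, 91 clinical, 243 in all, per the table footnotes). The short Python sketch below recomputes a few rows from the Table 3 counts so that the comparison between passaged and clinical strains made above can be checked directly. The dictionary layout and the function name `percent_mutated` are illustrative only, and the sketch makes no attempt to test whether any difference between the sets is statistically significant.

```python
# Strain-set sizes taken from the Table 3 footnotes.
DENOMINATORS = {"passaged": 152, "clinical": 91, "all": 243}

# Mutated-strain counts (passaged, clinical) for a few genes, copied from Table 3.
COUNTS = {
    "UL9": (50, 31),
    "RL5A": (31, 27),
    "UL1": (20, 18),
    "US9": (26, 11),
    "UL111A": (16, 7),
    "UL40": (8, 2),
}


def percent_mutated(passaged: int, clinical: int) -> dict:
    """Percentage of strains mutated in each set, rounded to 2 decimals as in Table 3."""
    counts = {"passaged": passaged, "clinical": clinical, "all": passaged + clinical}
    return {key: round(100 * n / DENOMINATORS[key], 2) for key, n in counts.items()}


if __name__ == "__main__":
    for gene, (p, c) in COUNTS.items():
        pct = percent_mutated(p, c)
        print(f"{gene:<7} passaged {pct['passaged']:6.2f}%  "
              f"clinical {pct['clinical']:6.2f}%  all {pct['all']:6.2f}%")
```

For UL9, for example, this gives 32.89% (passaged), 34.07% (clinical), and 33.33% (all), matching the table.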

Modern approaches offer a powerful means for analyzing HCMV genomes directly from clinical material, with the important proviso that the data should be quality assessed and interpreted in the context of the known evolutionary and biological characteristics of the virus. Extensive high-throughput sequence data are likely to illuminate further the epidemiology, pathogenesis, and evolution of HCMV in clinical and natural settings, thus facilitating the identification of virulence determinants and the development of new interventions.

SupplementaryData

Supplementary materials are available at e Journal of Infectious

Diseases online. Consisting of data provided by the authors to

benet the reader, the posted materials are not copyedited and are

the sole responsibility of the authors, so questions or comments

should be addressed to the corresponding author.

Notes

Acknowledgments. We are grateful to Florent Lassalle, Daniel Depledge, and Judith Breuer (University College London) for providing unpublished collection 3 datasets and for updating the associated genome sequences in GenBank. We also thank Jenny Witthuhn (Hannover Medical School) for excellent technical assistance.

Financial support. This work was supported by the Medical Research Council (grant numbers MC_UU_12014/3 and MC_UU_12014/12 to A. J. D.); the Wellcome Trust (grant numbers 204870/Z/16/Z to A. J. D. and WT090323MA to G. W. G. W.); the Ministry of Health of the Czech Republic for conceptual development of research organization (University Hospital, Motol, Prague, Czech Republic, grant number 00064203 to P. H.); the Fondazione Regionale per la Ricerca Biomedica, Regione Lombardia (grant number FRRB 2015-043 to D. L.); the Niedersächsische Ministerium für Wissenschaft und Kultur (grant COALITION–Communities Allied in Infection to T. G.); the Deutsche Forschungsgemeinschaft Collaborative Research Centre 900 (core project Z1, grant number SFB-9001 to T. F. S.); and the German Center of Infection Research Thematic Translational Unit "Infections of the Immunocompromised Host" (grant to T. G. and T. F. S.). Two authors (E. H. and A. D.) were supported by the Infection Biology graduate program of Hannover Biomedical Research School.

Potential conflicts of interest. G. S. W. reports that his part in the present study was completed prior to his present employment. G. W. G. W. has received a grant from the Wellcome Trust. D. L. has received a grant from the Fondazione Regionale per la Ricerca Biomedica, Regione Lombardia. T. G. has received grants from the German Federal Ministry of Education and Research and from the Niedersächsische Ministerium für Wissenschaft und Kultur. P. H. has received a grant from the Ministry of Health of the Czech Republic for the conceptual development of University Hospital, Motol, Prague, Czech Republic; personal fees and nonfinancial support from MSD and from Chimerix; and personal fees from Dynex. T. F. S. has received grants from the Deutsche Forschungsgemeinschaft Collaborative Research Centre 900 and from the German Federal Ministry of Education and Research. A. J. D. has received grants from the Medical Research Council and the Wellcome Trust. All other authors report no potential conflicts of interest.

All authors have submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Conflicts that the editors consider relevant to the content of the manuscript have been disclosed.

References

1. Puchhammer-Stöckl E, Görzer I. Cytomegalovirus and

Epstein-Barr virus subtypes—the search for clinical signifi-

cance. J Clin Virol 2006; 36:239–48.

2. CheeMS, BankierAT, BeckS, etal. Analysis of the protein-

coding content of the sequence of human cytomegalo-

virus strain AD169. Curr Top Microbiol Immunol 1990;

154:125–69.

3. Dunn W, Chou C, Li H, et al. Functional profiling of a

human cytomegalovirus genome. Proc Natl Acad Sci U S A

2003; 100:14223–8.


4. MurphyE, YuD, GrimwoodJ, etal. Coding potential of lab-

oratory and clinical strains of human cytomegalovirus. Proc

Natl Acad Sci U S A 2003; 100:14976–81.

5. SinzgerC, HahnG, DigelM, etal. Cloning and sequencing

of a highly productive, endotheliotropic virus strain derived

from human cytomegalovirus TB40/E. J Gen Virol 2008;

89:359–68.

6. DolanA, CunninghamC, HectorRD, etal. Genetic content

of wild-type human cytomegalovirus. J Gen Virol 2004;

85:1301–12.

7. CunninghamC, GathererD, HilfrichB, etal. Sequences of

complete human cytomegalovirus genomes from infected

cell cultures and clinical specimens. J Gen Virol 2010;

91:605–15.

8. Dargan DJ, Douglas E, Cunningham C, et al. Sequential

mutations associated with adaptation of human cyto-

megalovirus to growth in cell culture. J Gen Virol 2010;

91:1535–46.

9. Bradley AJ, Lurain NS, Ghazal P, et al. High-throughput

sequence analysis of variants of human cytomegalo-

virus strains Towne and AD169. J Gen Virol 2009;

90:2375–80.

10. Jung GS, Kim YY, KimJI, et al. Full genome sequencing

and analysis of human cytomegalovirus strain JHC isolated

from a Korean patient. Virus Res 2011; 156:113–20.

11. SijmonsS, ThysK, CorthoutM, etal. A method enabling

high-throughput sequencing of human cytomegalovirus

complete genomes from clinical isolates. PLoS One 2014;

9:e95501.

12. Sijmons S, Thys K, Mbong Ngwese M, et al. High-

throughput analysis of human cytomegalovirus genome

diversity highlights the widespread occurrence of gene-

disrupting mutations and pervasive recombination. J Virol

2015; 89:7673–95.

13. ZhaoF, ShenZZ, LiuZY, etal. Identification and BAC con-

struction of Han, the first characterized HCMV clinical

strain in China. J Med Virol 2016; 88:859–70.

14. Cha TA, Tom E, Kemble GW, Duke GM, Mocarski ES,

SpaeteRR. Human cytomegalovirus clinical isolates carry

at least 19 genes not found in laboratory strains. J Virol

1996; 70:78–83.

15. StantonRJ, BaluchovaK, DarganDJ, etal. Reconstruction

of the complete human cytomegalovirus genome in a BAC

reveals RL13 to be a potent inhibitor of replication. J Clin

Invest 2010; 120:3191–208.

16. Renzette N, Bhattacharjee B, Jensen JD, Gibson L,

KowalikTF. Extensive genome-wide variability of human

cytomegalovirus in congenitally infected infants. PLoS

Pathog 2011; 7:e1001344.

17. MelnikovA, Galinsky K, Rogov P, et al. Hybrid selection

for sequencing pathogen genomes from clinical samples.

Genome Biol 2011; 12:R73.

18. DepledgeDP, Palser AL, WatsonSJ, etal. Specific capture

and whole-genome sequencing of viruses from clinical

samples. PLoS One 2011; 6:e27805.

19. LassalleF, DepledgeDP, ReevesMB, etal. Islands of linkage

in an ocean of pervasive recombination reveals two-speed

evolution of human cytomegalovirus genomes. Virus Evol

2016; 2:vew017.

20. HouldcroftCJ, Bryant JM, Depledge DP, etal. Detection

of low frequency multi-drug resistance and novel puta-

tive maribavir resistance in immunocompromised pedi-

atric patients with cytomegalovirus. Front Microbiol 2016;

7:1317.

21. Hage E, Wilkie GS, Linnenweber-Held S, et al.

Characterization of human cytomegalovirus genome di-

versity in immunocompromised hosts by whole-genome

sequencing directly from clinical specimens. J Infect Dis

2017; 215:1673–83.

22. Chou SW, Dennison KM. Analysis of interstrain var-

iation in cytomegalovirus glycoprotein B sequences

encoding neutralization-related epitopes. J Infect Dis 1991;

163:1229–34.

23. Meyer-KönigU, Ebert K, SchrageB, Pollak S, Hufert FT.

Simultaneous infection of healthy people with multiple

human cytomegalovirus strains. Lancet 1998; 352:1280–1.

24. RasmussenL, GeisslerA, WintersM. Inter- and intragenic

variations complicate the molecular epidemiology of

human cytomegalovirus. J Infect Dis 2003; 187:809–19.

25. MattickC, DewinD, PolleyS, etal. Linkage of human cyto-

megalovirus glycoprotein gO variant groups identified from

worldwide clinical isolates with gN genotypes, implications

for disease associations and evidence for N-terminal sites of

positive selection. Virology 2004; 318:582–97.

26. Sekulin K, Görzer I, Heiss-Czedik D, Puchhammer-

StöcklE. Analysis of the variability of CMV strains in the

RL11D domain of the RL11 multigene family. Virus Genes

2007; 35:577–83.

27. BankevichA, NurkS, AntipovD, etal. SPAdes: a new ge-

nome assembly algorithm and its applications to single-cell

sequencing. J Comput Biol 2012; 19:455–77.

28. Silva GG, Dutilh BE, Matthews TD, etal. Combining de

novo and reference-guided assembly with scaffold_builder.

Source Code Biol Med 2013; 8:23.

29. LangmeadB, SalzbergSL. Fast gapped-read alignment with

Bowtie 2. Nat Methods 2012; 9:357–9.

30. MilneI, StephenG, BayerM, et al. Using Tablet for visual

exploration of second-generation sequencing data. Brief

Bioinform 2013; 14:193–202.

31. BradleyAJ, KovácsIJ, GathererD, etal. Genotypic analysis

of two hypervariable human cytomegalovirus genes. J Med

Virol 2008; 80:1615–23.

32. Bates M, Monze M, Bima H, Kapambwe M, Kasolo FC,

Gompels UA; CIGNIS Study Group. High human

Downloaded from https://academic.oup.com/jid/advance-article-abstract/doi/10.1093/infdis/jiz208/5485299 by MHH-Bibliothek user on 17 June 2019

Clinical HCMV Genomes • JID 2019:XX (XX XXXX) • 11

cytomegalovirus loads and diverse linked variable

genotypes in both HIV-1 infected and exposed, but unin-

fected, children in Africa. Virology 2008; 382:28–36.

33. SieversF, WilmA, DineenD, etal. Fast, scalable generation

of high-quality protein multiple sequence alignments using

Clustal Omega. Mol Syst Biol 2011; 7:539.

34. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S.

MEGA6: molecular evolutionary genetics analysis version

6.0. Mol Biol Evol 2013; 30:2725–9.

35. Davison AJ, Holton M, Dolan A, Dargan DJ,

Gatherer D, Hayward GS. Comparative genomics of

primate cytomegaloviruses. In: Reddehase MJ, ed.

Cytomegaloviruses: from molecular pathogenesis to inter-

vention. Vol 1. Norwich, UK: Caister Academic Press, 2013.

36. LiH, HandsakerB, WysokerA, etal. The sequence align-

ment/map format and SAMtools. Bioinformatics 2009;

25:2078–9.

37. Wilm A, Aw PP, Bertrand D, et al. LoFreq: a sequence-

quality aware, ultra-sensitive variant caller for uncovering

cell-population heterogeneity from high-throughput

sequencing datasets. Nucleic Acids Res 2012; 40:11189–201.

38. YangX, CharleboisP, MacalaladA, HennMR, ZodyMC.

V-Phaser 2: variant inference for viral populations. BMC

Genomics 2013; 14:674.

39. Suárez NM, MusondaKG, Escriva E, etal. Multiple-strain

infections of human cytomegalovirus with high genomic di-

versity are common in breast milk from HIV-positive women

in Zambia. J Infect Dis 2019; XX:XXX–XXX. doi:10.1093/

infdis/jiz209.

40. Meyer-König U, Haberland M, von Laer D, Haller O,

Hufert FT. Intragenic variability of human cytomegalo-

virus glycoprotein B in clinical strains. J Infect Dis 1998;

177:1162–9.

41. Shepp DH, MatchME, LipsonSM, PergolizziRG. A fifth

human cytomegalovirus glycoprotein B genotype. Res Virol

1998; 149:109–14.

42. DeckersM, HofmannJ, KreuzerKA, etal. High genotypic

diversity and a novel variant of human cytomegalovirus re-

vealed by combined UL33/UL55 genotyping with broad-

range PCR. Virol J 2009; 6:210.

43. HaberlandM, Meyer-KönigU, HufertFT. Variation within

the glycoprotein B gene of human cytomegalovirus is

due to homologous recombination. J Gen Virol 1999; 80:

1495–500.

44. Paterson DA, Dyer AP, Milne RS, Sevilla-Reyes E,

GompelsUA. A role for human cytomegalovirus glycopro-

tein O (gO) in cell fusion and a new hypervariable locus.

Virology 2002; 293:281–94.

45. Yan H, Koyano S, Inami Y, etal. Genetic linkage among

human cytomegalovirus glycoprotein N (gN) and gO genes,

with evidence for recombination from congenitally and

post-natally infected Japanese infants. J Gen Virol 2008;

89:2275–9.

46. Xu C, Nezami Ranjbar MR, Wu Z, DiCarlo J, Wang Y.

Detecting very low allele fraction variants using targeted

DNA sequencing and a novel molecular barcode-aware var-

iant caller. BMC Genomics 2017; 18:5.

47. Illingworth CJR, Roy S, Beale MA, TutillH, Williams R,

Breuer J. On the effective depth of viral sequence data.

Virus Evol 2017; 3:vex030.

48. McGeochDJ, CookS, DolanA, JamiesonFE, TelfordEA.

Molecular phylogeny and evolutionary timescale for the

family of mammalian herpesviruses. J Mol Biol 1995;

247:443–58.

49. McSharry BP, Avdic S, Slobedman B. Human cytomega-

lovirus encoded homologs of cytokines, chemokines and

their receptors: roles in immunomodulation. Viruses 2012;

4:2448–70.

50. Prod’hommeV, TomasecP, CunninghamC, et al. Human

cytomegalovirus UL40 signal peptide regulates cell sur-

face expression of the natural killer cell ligands HLA-E and

gpUL18. J Immunol 2012; 188:2794–804.

Downloaded from https://academic.oup.com/jid/advance-article-abstract/doi/10.1093/infdis/jiz208/5485299 by MHH-Bibliothek user on 17 June 2019

Evaluating assembly and variant calling software for strain-resolved analysis of large DNA viruses

Article

Full-text available

Sep 2020

Infection with human cytomegalovirus (HCMV) can cause severe complications in immunocompromised individuals and congenitally infected children. Characterizing heterogeneous viral populations and their evolution by high-throughput sequencing of clinical specimens requires the accurate assembly of individual strains or sequence variants and suitable variant calling methods. However, the performance of most methods has not been assessed for populations composed of low divergent viral strains with large genomes, such as HCMV. In an extensive benchmarking study, we evaluated 15 assemblers and 6 variant callers on 10 lab-generated benchmark data sets created with two different library preparation protocols, to identify best practices and challenges for analyzing such data. Most assemblers, especially metaSPAdes and IVA, performed well across a range of metrics in recovering abundant strains. However, only one, Savage, recovered low abundant strains and in a highly fragmented manner. Two variant callers, LoFreq and VarScan2, excelled across all strain abundances. Both shared a large fraction of false positive variant calls, which were strongly enriched in T to G changes in a 'G.G' context. The magnitude of this context-dependent systematic error is linked to the experimental protocol. We provide all benchmarking data, results and the entire benchmarking workf low named QuasiModo, Quasispecies Metric determination on omics, under the GNU General Public License v3.0 (https://github.com/ hzi-bifo/Quasimodo), to enable full reproducibility and further benchmarking on these and other data. studies the human microbiome, viral and bacterial pathogens, and human cell lineages within individual patients by analysis of large-scale biological and epidemiological data sets with computational techniques.

Generation of UL128 -shRNA Transduced Fibroblasts for the Release Of Cell-free Virus from Clinical Human Cytomegalovirus Isolates

Article

Full-text available

Oct 2023

Working with recent isolates of human cytomegalovirus (HCMV) is complicated by their strictly cell-associated growth with lack of infectivity in the supernatant. Adaptation to cell-free growth is associated with disruption of the viral UL128 gene locus. The authors transduced fibroblasts with a lentiviral vector encoding UL128-specific-shRNA to allow the release of cell-free infectivity without genetic alteration. Transduced cells were cocultured with fibroblasts containing cell-associated isolates, and knockdown of the UL128 protein was validated by immunoblotting. Cell-free infectivity increased 1000-fold in isolate cocultures with UL128-shRNA compared with controls, and virions could be purified by density gradients. Transduced fibroblasts also allowed direct isolation of HCMV from a clinical specimen and cell-free transfer to other cell types. In conclusion, UL128-shRNA-transduced fibroblasts allow applications previously unsuitable for recent isolates.

Pathogenesis of human cytomegalovirus in the immunocompromised host

Article

Jun 2021

Human cytomegalovirus (HCMV) is a herpesvirus that infects ~60% of adults in developed countries and more than 90% in developing countries. Usually, it is controlled by a vigorous immune response so that infections are asymptomatic or symptoms are mild. However, if the immune system is compromised, HCMV can replicate to high levels and cause serious end organ disease. Substantial progress is being made in understanding the natural history and pathogenesis of HCMV infection and disease in the immunocompromised host. Serial measures of viral load defined the dynamics of HCMV replication and are now used routinely to allow intervention with antiviral drugs in individual patients. They are also used as pharmacodynamic read-outs to evaluate prototype vaccines that may protect against HCMV replication and to define immune correlates of this protection. This novel information is informing the design of randomized controlled trials of new antiviral drugs and vaccines currently under evaluation. In this Review, we discuss immune responses to HCMV and countermeasures deployed by the virus, the establishment of latency and reactivation from it, exogenous reinfection with additional strains, pathogenesis, development of end organ disease, indirect effects of infection, immune correlates of control of replication, current treatment strategies and the evaluation of novel vaccine candidates. Human cytomegalovirus (HCMV) infection is ordinarily controlled by a vigorous immune response; however, HCMV can replicate to high levels and cause end organ disease when the immune system is compromised. In this Review, Griffiths and Reeves discuss HCMV pathogenesis in immunocompromised individuals and emerging strategies to treat and prevent infection and disease.

Targeted amplification-based whole genome sequencing of Monkeypox virus in clinical specimens

Article

Full-text available

Dec 2023

The 2022 mpox outbreak has led to more than 91,000 cases in 115 countries. Whole genome sequencing (WGS) has been at the forefront of surveillance and outbreak investigations for different pathogens of public health significance. Many institutions performing WGS on Monkeypox virus (MPXV) use a resource-intensive metagenomic approach. Here we present a targeted amplification method for WGS of MPXV from clinical specimens. We designed 43 pairs of primers (amplicons ~5 kb) with PrimalScheme to span the ~200 kb viral genome and then added 12 additional primers to optimize amplification. We extracted nucleic acid from clinical specimens and amplified the two primer pools. All libraries were sequenced on the MiniSeq platform. Resulting reads were filtered by quality and then mapped to a MPXV reference genome. Consensus sequences were generated for phylogenetic analysis. A total of 91 specimens with a real-time-PCR cycle threshold (Ct) values ≤27.9 were sequenced using our targeted amplification protocol. The sequenced MPXV genomes were of high quality with mean genome coverage of 99.56% (95% CI 99.32-99.80%), mean depth 1,395× (95% CI 1275–1515), and mean mapping quality of 52.87 (95% CI 52.1–53.6) and allowed for greater multiplexing of samples relative to metagenomics. The MPXV genomes belong to 8 of the 13 clades observed during the 2022 global mpox outbreak. Targeted amplification enrichment provides high coverage, throughput, and short turnaround times. It is an efficient low-cost method for MPXV WGS and can benefit public health surveillance and outbreak management. IMPORTANCE We present a protocol to efficiently sequence genomes of the MPXV-causing mpox. This enables researchers and public health agencies to acquire high-quality genomic data using a rapid and cost-effective approach. Genomic data can be used to conduct surveillance and investigate mpox outbreaks. We present 91 mpox genomes that show the diversity of the 2022 mpox outbreak in Ontario, Canada.

Sequencing Directly from Clinical Specimens Reveals Genetic Variations in HCMV-Encoded Chemokine Receptor US28 That May Influence Antibody Levels and Interactions with Human Chemokines

Article

Oct 2021

Human cytomegalovirus (HCMV) is a common viral pathogen of solid organ transplant recipients, neonates, and HIV-infected individuals. HCMV encodes homologs of several host genes with the potential to influence viral persistence and/or pathogenesis.

Olfactory Entry Promotes Herpesvirus Recombination

Article

Nov 2021
J VIROL

Herpesvirus genomes show abundant evidence of past recombination, but its functional importance is unknown. A key question is whether recombinant viruses can outpace the immunity induced by their parents to reach higher loads. We tested this by coinfecting mice with attenuated mutants of Murid Herpesvirus-4 (MuHV-4). Infection by the natural olfactory route routinely allowed mutant viruses to reconstitute wild-type genotypes and reach normal viral loads. Lung coinfections yielded much less rescue. Attenuated murine cytomegalovirus mutants similarly showed recombinational rescue via the nose but not via the lungs. Because these infections spread similarly, route-specific rescue implied that recombination occurred close to the olfactory entry site. Rescue of replication-deficient MuHV-4 confirmed this, showing that coinfection occurred in the first-encountered olfactory cells. This worked even with asynchronous inoculation, implying that a defective virus can wait there for later rescue. Virions entering the nose get caught on respiratory mucus, which the respiratory epithelial cilia push back towards the olfactory surface, and early infection was correspondingly focused on the anterior olfactory edge. Thus, by concentrating incoming infection into a small area, olfactory entry seems to promote functionally significant recombination. Importance All organisms depend on genetic diversity to cope with environmental change. Small viruses rely on frequent point mutations; this is harder for herpesviruses because their genomes are larger. Recombination provides another means of genetic optimization. Human herpesviruses often coinfect and show evidence of past recombination, but whether this is rare and incidental or functionally important is unknown. We showed that herpesviruses entering mice via the natural olfactory route meet reliably enough for recombination to routinely repair crippling mutations and restore normal viral loads. Recombination appeared to occur in the first-encountered olfactory cells and reflected a concentration of infection at the anterior olfactory edge. Thus, natural host entry incorporates a significant capacity for herpesvirus recombination.

Long Range PCR-based deep sequencing for haplotype determination in mixed HCMV infections

Preprint

Jul 2021

Short-read sequencing, which has been used extensively to decipher the genome diversity of human cytomegalovirus (HCMV) strains, often falls short in assessing the co-linearity of non-adjacent polymorphic sites in mixed HCMV populations. In the present study, we established a long-amplicon sequencing workflow to identify the number and relative quantities of unique HCMV haplotypes in mixtures. To this end, long-read PacBio sequencing was applied to amplicons spanning multiple polymorphic sites. The approach was first validated with defined HCMV DNA templates derived from cell-free viruses and was then tested for its suitability on patient samples carrying mixed HCMV infections. Our data show that the composition of artificial HCMV DNA mixtures was correctly determined by long-amplicon sequencing down to 1% abundance of the minor DNA source. The total error rate of mapped reads ranged from 0.17 to 0.43, depending on the stringency of quality trimming. PCR products of up to 7.7 kb with a GC content <55% were efficiently generated when DNA was directly isolated from bronchoalveolar lavage samples, although long-range PCR may be slightly less sensitive than short-amplicon PCR. Up to three distinct haplotypes with varying relative frequencies were identified in a single sample. Intra-patient haplotype diversity is unevenly distributed across the target site and is often interspersed with long identical stretches, so the polymorphic sites cannot be linked by short reads. Moreover, diversity at single polymorphic regions, as assessed by short-amplicon sequencing, may markedly underestimate the overall diversity of mixed populations. Quantitative haplotype determination by long-amplicon sequencing provides a novel approach for HCMV strain characterisation in samples with mixed infections and can be scaled up to cover the majority of the genome. This will substantially improve our understanding of intra-host HCMV strain diversity and its dynamic behaviour.
Impact statement Human cytomegalovirus (HCMV), a large enveloped DNA virus, displays the highest inter-host genome variability among all human herpesviruses. Primary infection, reinfection and reactivation are mostly asymptomatic but may cause devastating harm in congenitally infected newborns and in immunosuppressed individuals. Multiple distinct strains circulate in humans, each characterised by a unique combination of well-defined polymorphic genes, most of which are linked to cell entry, persistence and immune evasion. Mixed HCMV strain infections are common and may pose a high pathogenic potential for patients at risk for symptomatic infections. To better understand the biological behaviour and dynamics of individual viral genomes, it is essential to assess the co-linearity of polymorphic sites in a genetically heterogeneous population. In this study, we established and successfully applied a long-read sequencing technique to long amplicons and identified co-linear genome stretches (haplotypes) in patient samples with mixed HCMV populations. This strategy for haplotype determination allows linkage analysis of multiple non-adjacent polymorphic sites across up to 7.7 kb. It therefore gives a better approximation of the true strain diversity in mixed samples than short-read sequencing approaches, improving our knowledge of mixed HCMV infections, which is important for clinical outcome, diagnostics, treatment and vaccine development. Data Summary Sequence data generated in this study were deposited in GenBank with the accession numbers MW560357–MW560373. Raw data from Illumina and PacBio sequencing were submitted to the NCBI Sequence Read Archive (SRA) under project number SUB8972240. BioSample accession numbers are provided in Supplementary Tables 3 and 4. Additional sequence data for reference purposes were accessed from GenBank; accession numbers are listed in Supplementary Tables 6 and 7.
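
Long reads help with mixed infections because one read spanning all polymorphic sites of interest already carries a phased haplotype, so counting identical allele combinations gives relative quantities directly. The toy Python sketch below shows that counting step. It assumes error-free allele calls (real PacBio data would need error filtering or read clustering first), and its `min_fraction` cutoff is a hypothetical parameter that merely mirrors the 1% detection floor reported above; it is not the authors' pipeline.

```python
from collections import Counter


def haplotype_frequencies(read_alleles, min_fraction=0.01):
    """Relative frequencies of distinct haplotypes among long reads.

    Each item in `read_alleles` lists the alleles one read carries at the
    same ordered set of polymorphic sites (None = site not covered). Because
    a single long read spans all sites, its alleles are already phased.
    Assumes error-free allele calls; `min_fraction` is a hypothetical cutoff.
    """
    haplotypes = [tuple(r) for r in read_alleles]
    counts = Counter(h for h in haplotypes if None not in h)  # drop incomplete reads
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return {h: n / total for h, n in ranked if n / total >= min_fraction}


# Toy mixture: a major haplotype, a minor one, and a single rare read.
reads = ["AC"] * 90 + ["GT"] * 9 + ["AT"]
print(haplotype_frequencies(reads, min_fraction=0.05))
# {('A', 'C'): 0.9, ('G', 'T'): 0.09}
```

With short reads, the two sites in this toy example would be observed separately whenever they lie farther apart than the read length, and the A/G and C/T allele frequencies alone could not reveal which combinations co-occur.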

Mutagenesis of Human Cytomegalovirus Glycoprotein L Disproportionately Disrupts gH/gL/gO over gH/gL/pUL128-131

Article

Aug 2021
J VIROL

The endemic betaherpesvirus HCMV circulates in human populations as a complex mixture of genetically distinct variants, establishes lifelong persistent infections, and causes significant disease in neonates and immunocompromised adults. This study capitalizes on our recent characterizations of three genetically distinct HCMV BAC clones to discern the functions of the envelope glycoprotein complexes gH/gL/gO and gH/gL/pUL128-131, which are promising vaccine targets that share the herpesvirus core fusion apparatus component, gH/gL.

LoReTTA, a user-friendly tool for assembling viral genomes from PacBio sequence data

Article

Jan 2021

Long-read, single-molecule DNA sequencing technologies have triggered a revolution in genomics by enabling the determination of large, reference-quality genomes in ways that overcome some of the limitations of short-read sequencing. However, the greater length and higher error rate of the reads generated on long-read platforms make tools designed for assembling short reads unsuitable for these data and motivate the development of new approaches. We present LoReTTA (Long Read Template-Targeted Assembler), a tool designed for performing de novo assembly of long reads generated from viral genomes on the PacBio platform. LoReTTA exploits a reference genome to guide the assembly process, an approach that has been successful with short reads. The tool was designed to handle reads originating from viral genomes, which feature high genetic variability and possibly multiple isoforms, and which are often outnumbered by reads from other organisms in clinical or environmental samples. LoReTTA was tested on a range of simulated and experimental datasets and outperformed established long-read assemblers in terms of assembly contiguity and accuracy. The software runs under the Linux operating system, is designed for easy adaptation to alternative systems, and features an automatic installation pipeline that takes care of the required dependencies. A command-line version and a user-friendly graphical interface version are available under a GPLv3 license at https://bioinformatics.cvr.ac.uk/software/ with the manual and a test dataset.

Influence of Human Cytomegalovirus Glycoprotein O Polymorphism on the Inhibitory Effect of Soluble Forms of Trimer- and Pentamer-Specific Entry Receptors

Article

Jul 2020
J VIROL

Human cytomegalovirus (HCMV) is known for its broad cell tropism, as reflected by the different organs and tissues affected by HCMV infection. Hence, inhibition of HCMV entry into distinct cell types could be considered a promising therapeutic option to limit cell-free HCMV infection. Soluble forms of the cellular entry receptor PDGFRα (sPDGFRα), but not those of the entry receptor neuropilin-2, inhibit infection of multiple cell types. sPDGFRα specifically interacts with gO of the trimeric gH/gL/gO envelope glycoprotein complex. HCMV strains may differ in the amount of trimer in virions and in the highly polymorphic gO sequence. In this study, we show that the major gO genotypes of HCMV that are also found in vivo are similarly well inhibited by sPDGFRα. Novel gO genotypic forms potentially emerging through recombination, however, may evade sPDGFRα inhibition on epithelial cells. These findings provide useful additional information for the future development of anti-HCMV therapeutic compounds based on sPDGFRα.

Multiple-Strain Infections of Human Cytomegalovirus with High Genomic Diversity are Common In Breast Milk from HIV-Positive Women in Zambia

Article

May 2019

Background: In developed countries, human cytomegalovirus (HCMV) is a major pathogen in congenitally infected and immunocompromised individuals, where multiple-strain infection appears linked to disease severity. The situation is less documented in developing countries. In Zambia, breast milk is a key route for transmitting HCMV and carries higher viral loads in human immunodeficiency virus (HIV)-infected women. We investigated HCMV strain diversity. Methods: High-throughput sequence datasets were generated from 28 HCMV-positive breast milk samples donated by 22 mothers (15 HIV-infected and 7 HIV-negative) at 4-16 weeks postpartum, then analyzed by genome assembly and novel motif-based genotyping in 12 hypervariable HCMV genes. Results: Among the 20 samples from 14 donors (13 HIV-infected and one HIV-negative) who yielded data meeting quality thresholds, 89 of the possible 109 genotypes were detected, and multiple-strain infections involving up to 5 strains per person were apparent in 9 HIV-infected women. Strain diversity was extensive among individuals but conserved compartmentally and longitudinally within them. Genotypic linkage was maintained within hypervariable UL73/UL74 and RL12/RL13/UL1 loci for virus entry and immunomodulation, but not between genes more distant from each other. Conclusions: Breast milk from HIV-infected women contains multiple HCMV strains of high genotypic complexity and thus constitutes a major source for transmitting viral diversity.
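
Motif-based genotyping can be sketched as read matching against short genotype-specific sequences within a hypervariable gene. In the minimal Python sketch below, the motif table, the genotype labels, and the `min_reads` threshold are hypothetical placeholders (the study's actual motifs and thresholds are not given here). Reads are checked on both strands, and detecting more than one genotype for the same gene would point to a multiple-strain infection.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def motif_genotype(reads, motifs_by_genotype, min_reads=10):
    """Count reads containing any motif of each genotype (on either strand)
    and return the genotypes supported by at least `min_reads` reads."""
    counts = {g: 0 for g in motifs_by_genotype}
    for read in reads:
        rc = revcomp(read)
        for genotype, motifs in motifs_by_genotype.items():
            if any(m in read or m in rc for m in motifs):
                counts[genotype] += 1
    return {g: n for g, n in counts.items() if n >= min_reads}


# Hypothetical motif table for one hypervariable gene.
motifs = {"G1": ["ACGTACGA"], "G2": ["TTGACCAT"]}
reads = ["GGACGTACGAGG"] * 3 + ["CCTCGTACGTCC"] + ["AATTGACCATAA"]
print(motif_genotype(reads, motifs, min_reads=1))  # {'G1': 4, 'G2': 1}
```

The fourth read matches G1 only through its reverse complement, which is why both strands are searched.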

On the effective depth of viral sequence data

Article

Nov 2017

Genome sequence data are of great value in describing evolutionary processes in viral populations. However, in such studies, the extent to which the data accurately describe the viral population is a matter of importance. Multiple factors may influence the accuracy of a dataset, including the quantity and nature of the sample collected and the subsequent steps in viral processing. To investigate this phenomenon, we sequenced replicate datasets spanning a range of viruses, in which the point at which samples were split differed in each case: from a dataset in which independent samples were collected from a single patient to another in which all processing steps up to sequencing were applied to a single sample before it was split and each replicate sequenced. We conclude that neither a high read depth nor a high template number in a sample guarantees the precision of a dataset. Measures of consistency calculated from within a single biological sample may also be insufficient; distortion of the composition of a population by the experimental procedure, or genuine within-host diversity between samples, may each affect the results. Where possible, data from replicate samples should be collected to validate the consistency of short-read sequence data.
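
A simplified two-stage sampling model shows why neither read depth nor template number alone guarantees precision. This model is an assumption for illustration, not the paper's analysis, and it ignores amplification bias and sequencing error. Suppose T template genomes are drawn from the viral population and R reads are then drawn with replacement from those templates. The variance of the observed allele frequency is then p(1 − p)(R + T − 1)/(TR), so the effective depth TR/(R + T − 1) can never exceed the smaller of T and R.

```python
import math


def effective_depth(templates, reads):
    """Effective sample size n_eff for an allele frequency under two-stage
    sampling: `templates` genomes drawn from the population, then `reads`
    drawn with replacement from those templates (simplified model; ignores
    amplification bias and sequencing error).

    Var(p_hat) = p(1 - p) * (reads + templates - 1) / (templates * reads)
               = p(1 - p) / n_eff
    """
    return templates * reads / (reads + templates - 1)


def allele_freq_se(p, templates, reads):
    """Standard error of an observed allele frequency under the same model."""
    return math.sqrt(p * (1 - p) / effective_depth(templates, reads))


print(round(effective_depth(100, 10_000), 1))  # 99.0: deep reads, few templates
print(round(effective_depth(1_000_000, 1_000), 1))  # 999.0: many templates, few reads
```

With a single template the effective depth is 1 however many reads are generated, which matches the abstract's point that a high read depth does not by itself guarantee precision.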

Characterization of Human Cytomegalovirus Genome Diversity in Immunocompromised Hosts by Whole-Genome Sequencing Directly From Clinical Specimens

Article

Mar 2017
J INFECT DIS

Background: Advances in next-generation sequencing (NGS) technologies allow comprehensive studies of genetic diversity over the entire genome of human cytomegalovirus (HCMV), a significant pathogen for immunocompromised individuals. Methods: NGS was performed on target-enriched sequence libraries prepared directly from a variety of clinical specimens (blood, urine, breast milk, respiratory samples, biopsies and vitreous humor) obtained longitudinally or from different anatomical compartments from 20 HCMV-infected patients (renal transplant recipients, stem cell transplant recipients and congenitally infected children). Results: De novo assembled HCMV genome sequences were obtained for 57/68 sequenced samples. Analysis of longitudinal or compartmental HCMV diversity revealed various patterns: no major differences were detected among longitudinal, intra-individual blood samples from 9/15 patients and in most of the patients with compartmental samples, whereas a switch of the major HCMV population was observed in six individuals with sequential blood samples and upon compartmental analysis of one patient with HCMV retinitis. Variant analysis revealed additional aspects of minor virus population dynamics and antiviral resistance mutations. Conclusions: In immunosuppressed patients, HCMV can remain relatively stable or undergo drastic genomic changes that are suggestive of the emergence of minor resident strains or de novo infection.

Detecting very low allele fraction variants using targeted DNA sequencing and a novel molecular barcode-aware variant caller

Article

Jan 2017
BMC GENOMICS

Background Detection of DNA mutations at very low allele fractions with high accuracy will significantly improve the effectiveness of precision medicine for cancer patients. To achieve this goal through next generation sequencing, researchers need a detection method that 1) captures rare mutation-containing DNA fragments efficiently in the mix of abundant wild-type DNA; 2) sequences the DNA library extensively to deep coverage; and 3) distinguishes low level true variants from amplification and sequencing errors with high accuracy. Targeted enrichment using PCR primers provides researchers with a convenient way to achieve deep sequencing for a small, yet most relevant region using benchtop sequencers. Molecular barcoding (or indexing) provides a unique solution for reducing sequencing artifacts analytically. Although different molecular barcoding schemes have been reported in recent literature, most variant calling has been done on limited targets, using simple custom scripts. The analytical performance of barcode-aware variant calling can be significantly improved by incorporating advanced statistical models. Results We present here a highly efficient, simple and scalable enrichment protocol that integrates molecular barcodes in multiplex PCR amplification. In addition, we developed smCounter, an open source, generic, barcode-aware variant caller based on a Bayesian probabilistic model. smCounter was optimized and benchmarked on two independent read sets with SNVs and indels at 5 and 1% allele fractions. Variants were called with very good sensitivity and specificity within coding regions. Conclusions We demonstrated that we can accurately detect somatic mutations with allele fractions as low as 1% in coding regions using our enrichment protocol and variant caller. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-3425-4) contains supplementary material, which is available to authorized users.
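
The core idea behind molecular barcoding can be sketched in a few lines of Python. Reads sharing a barcode are collapsed into one original molecule by majority vote, so an error present in only some copies of a molecule is not counted as a variant. This sketch is a simple majority-vote consensus, not smCounter's Bayesian model, and the `min_reads_per_umi` and `min_agreement` thresholds are hypothetical.

```python
from collections import Counter, defaultdict


def collapse_by_barcode(reads, min_reads_per_umi=3, min_agreement=0.8):
    """Collapse (barcode, base) observations at one position into molecules.

    Reads sharing a molecular barcode (UMI) are assumed to copy one original
    DNA molecule; a majority vote gives that molecule's base. Barcode families
    that are too small or too discordant are discarded. Thresholds are
    hypothetical, and this is NOT smCounter's Bayesian model.
    """
    families = defaultdict(list)
    for umi, base in reads:
        families[umi].append(base)
    molecules = Counter()
    for bases in families.values():
        if len(bases) < min_reads_per_umi:
            continue
        base, n = Counter(bases).most_common(1)[0]
        if n / len(bases) >= min_agreement:
            molecules[base] += 1
    return molecules


def molecule_fraction(molecules, alt):
    """Fraction of consensus molecules carrying the alternate base."""
    total = sum(molecules.values())
    return molecules[alt] / total if total else 0.0
```

With the default thresholds, a barcode family reading A, A, T is discarded rather than contributing a spurious T call. Allele fractions are therefore counted in molecules rather than in raw reads.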

Detection of Low Frequency Multi-Drug Resistance and Novel Putative Maribavir Resistance in Immunocompromised Pediatric Patients with Cytomegalovirus

Article

Sep 2016

Human cytomegalovirus (HCMV) is a significant pathogen in immunocompromised individuals, with the potential to cause fatal pneumonitis and colitis, as well as increasing the risk of organ rejection in transplant patients. With the advent of new anti-HCMV drugs, there is considerable interest in using virus sequence data to monitor emerging resistance to antiviral drugs in HCMV viraemia and disease, including the identification of putative new mutations. We used target enrichment to deep sequence HCMV DNA from 11 immunosuppressed paediatric patients receiving single or combination anti-HCMV treatment, serially sampled over 1–27 weeks. Changes in consensus sequence and resistance mutations were analysed for three ORFs targeted by anti-HCMV drugs, and the frequencies of drug resistance mutations were monitored. Target-enriched sequencing of clinical material detected mutations occurring at frequencies of 2%. Seven patients showed no evidence of drug resistance mutations. Four patients developed drug resistance mutations a mean of 16 weeks after starting treatment. In two patients, multiple resistance mutations accumulated at frequencies of 20% or less, including putative maribavir and ganciclovir resistance mutations P522Q (UL54) and C480F (UL97). In one patient, resistance was detected 14 days earlier than by PCR. Phylogenetic analysis suggested recombination or superinfection in one patient. Deep sequencing of HCMV enriched from clinical samples excluded resistance in 7 of 11 subjects and identified resistance mutations earlier than conventional PCR-based resistance testing in 2 patients. Detection of multiple low-level resistance mutations was associated with poor outcome.
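
Serial monitoring of resistance mutations can be sketched as finding, for each mutation, the first sample in which its read frequency reaches a detection threshold. This Python sketch assumes that per-sample (alternate reads, depth) counts are already available for each mutation of interest. The 2% default follows the detection level quoted above, while `min_depth` and the input layout are hypothetical.

```python
def first_detection(series, threshold=0.02, min_depth=100):
    """Return {mutation: first sampling time at which it reached `threshold`}.

    `series` is a list of (time, {mutation: (alt_reads, depth)}) pairs, one
    per serial sample. Positions with depth below `min_depth` (hypothetical
    cutoff) are skipped rather than treated as negative.
    """
    first = {}
    for time, calls in sorted(series, key=lambda item: item[0]):
        for mutation, (alt_reads, depth) in calls.items():
            if mutation in first or depth < min_depth:
                continue
            if alt_reads / depth >= threshold:
                first[mutation] = time
    return first
```

Mutation keys are free-form labels (for example "UL97 C480F"). Skipping under-covered positions avoids reporting a mutation as absent merely because too few reads were available to see it.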

Islands of linkage in an ocean of pervasive recombination reveals two-speed evolution of human cytomegalovirus genomes

Article

Jun 2016

Human cytomegalovirus (HCMV) infects most of the population worldwide, persisting throughout the host's life in a latent state with periodic episodes of reactivation. While typically asymptomatic, HCMV can cause fatal disease among congenitally infected infants and immunocompromised patients. These clinical issues are compounded by the emergence of antiviral resistance and the absence of an effective vaccine, the development of which is likely complicated by the numerous immune evasins encoded by HCMV to counter the host's adaptive immune responses, a feature that facilitates frequent super-infections. Understanding the evolutionary dynamics of HCMV is essential for the development of effective new drugs and vaccines. By comparing viral genomes from uncultivated or low-passaged clinical samples of diverse origins, we observe evidence of frequent homologous recombination events, both recent and ancient, and no structure of HCMV genetic diversity at the whole-genome scale. Analysis of individual gene-scale loci reveals a striking dichotomy: while most of the genome is highly conserved, recombines essentially freely and has evolved under purifying selection, 21 genes display extreme diversity, structured into distinct genotypes that do not recombine with each other. Most of these hyper-variable genes encode glycoproteins involved in cell entry or escape of host immunity. Evidence that half of them have diverged through episodes of intense positive selection suggests that rapid evolution of hyper-variable loci is likely driven by interactions with host immunity. It appears that this process is enabled by recombination unlinking hyper-variable loci from strongly constrained neighboring sites. 
It is conceivable that viral mechanisms facilitating super-infection have evolved to promote recombination between diverged genotypes, allowing the virus to continuously diversify at key loci to escape immune detection, while maintaining a genome optimally adapted to its asymptomatic infectious lifecycle.
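
The contrast between linked hypervariable loci and freely recombining regions is commonly quantified with the linkage-disequilibrium statistic r² between pairs of sites across a set of genomes. Below is a minimal Python sketch for two biallelic sites. It is a standard textbook measure, not necessarily the statistic used in the study. Values near 1 indicate sites that are inherited together; values near 0 indicate that recombination has decoupled them.

```python
def r_squared(hap_a, hap_b):
    """Linkage disequilibrium r^2 between two biallelic sites.

    hap_a, hap_b: equal-length sequences of alleles, one entry per genome.
    The first allele seen at each site is used as the reference allele;
    r^2 does not depend on which allele is chosen.
    """
    n = len(hap_a)
    if n == 0 or len(hap_b) != n:
        raise ValueError("need two non-empty allele lists of equal length")
    a0, b0 = hap_a[0], hap_b[0]
    pa = sum(x == a0 for x in hap_a) / n
    pb = sum(y == b0 for y in hap_b) / n
    pab = sum(x == a0 and y == b0 for x, y in zip(hap_a, hap_b)) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    d = pab - pa * pb  # the disequilibrium coefficient D
    return d * d / denom


# Four genomes: sites fully linked vs. independent.
print(r_squared("AAGG", "CCTT"), r_squared("AAGG", "CTCT"))  # 1.0 0.0
```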

Identification and BAC construction of Han, the first characterized HCMV clinical strain in China

Article

Oct 2015
J MED VIROL

Human cytomegalovirus (HCMV) is the leading infectious cause of birth defects and may lead to severe or lethal diseases in immunocompromised individuals. Several HCMV strains have been identified and widely applied in research, but no isolate from China has been characterized. In the present study, we isolated, characterized and sequenced the first Chinese HCMV clinical strain, Han, and constructed the novel, functional HCMV infectious clone Han-BAC-2311. HCMV Han was isolated from the urine sample of a Chinese infant with multiple developmental disorders. It expresses HCMV-specific proteins and contains a representative HCMV genome with minor differences compared to other strains. By homologous recombination using the mini-F-derived BAC vector pUS-F6, the infectious clone Han-BAC-2311 was constructed containing representative viral genes across the HCMV genome. The insertion site and orientation of the BAC sequence were confirmed by restriction enzyme digestion and Southern blotting. The reconstituted recombinant virus Han-BAC-2311 expresses typical viral proteins in the same pattern as wild-type Han and displayed growth kinetics similar to those of wild-type Han. The identification of the first clinical HCMV strain in China and the construction of its infectious clone will greatly facilitate pathogenesis studies and vaccine development in China.

High-Throughput Analysis of Human Cytomegalovirus Genome Diversity Highlights the Widespread Occurrence of Gene-Disrupting Mutations and Pervasive Recombination

Article

Jul 2015
J VIROL

Importance: Human cytomegalovirus has the largest genome of all viruses that infect humans. Currently, there is a great interest in establishing associations between genetic variants and strain pathogenicity of this herpesvirus. Since the number of publicly available full-genome sequences is limited, knowledge about strain diversity is highly fragmented and biased towards a small set of loci. Combined with our previous work, we have now contributed 101 complete genome sequences. We have used these data to conduct the first high-resolution analysis of interhost genome diversity, providing an unbiased and comprehensive overview of cytomegalovirus variability. These data are of major value to the development of novel antivirals and a vaccine and to identify potential targets for genotype-phenotype experiments. Furthermore, they have enabled a thorough study of the evolutionary processes that have shaped cytomegalovirus diversity.
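
A gene-disrupting mutation of the kind highlighted in this entry's title can be flagged with a simple check on the coding sequence: an in-frame stop codon before the final codon, or a length that is no longer a multiple of three. This heuristic Python sketch assumes the input is the ORF region already extracted on the coding strand, from the start codon to the reference stop position. It ignores spliced genes (UL111A, for example, is spliced) and does not reproduce the authors' annotation procedure.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_disrupted(orf):
    """Heuristic check for a disrupted ORF.

    `orf` is the coding-strand sequence from the start codon to the end of the
    reference stop codon (unspliced genes only). Returns True if its length is
    not a multiple of 3 (suggesting a frameshift) or if an in-frame stop codon
    occurs before the final codon (premature termination).
    """
    seq = orf.upper()
    if len(seq) % 3 != 0:
        return True
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])
```

For example, `ATGAAATAA` passes, `ATGTAAAAATAA` is flagged for a premature TAA, and `ATGAAAATAA` is flagged as a likely frameshift.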

A Method Enabling High-Throughput Sequencing of Human Cytomegalovirus Complete Genomes from Clinical Isolates

Article

Apr 2014
PLOS ONE

Human cytomegalovirus (HCMV) is a ubiquitous virus that can cause serious sequelae in immunocompromised patients and in the developing fetus. The coding capacity of the 235 kbp genome is still incompletely understood, and there is a pressing need to characterize genomic contents in clinical isolates. In this study, a procedure for the high-throughput generation of full genome consensus sequences from clinical HCMV isolates is presented. This method relies on passaging clinical isolates a small number of times on human fibroblasts, followed by digestion of cellular DNA and purification of viral DNA. After multiple displacement amplification, highly pure viral DNA is obtained. These extracts are suitable for high-throughput next-generation sequencing and assembly of consensus sequences. In a series of validation experiments, we showed that the workflow reproducibly generated consensus sequences representative of the virus population present in the original clinical material. Additionally, the performance of 454 GS FLX and/or Illumina Genome Analyzer datasets in consensus sequence deduction was evaluated. Based on assembly performance data, the Illumina Genome Analyzer was the platform of choice in the presented workflow. Analysis of the consensus sequences derived in this study confirmed the presence of gene-disrupting mutations in clinical HCMV isolates independent of in vitro passaging. These mutations were identified in genes RL5A, UL1, UL9, UL111A and UL150. In conclusion, the presented workflow provides opportunities for high-throughput characterization of complete HCMV genomes that could deliver new insights into HCMV coding capacity and genetic determinants of viral tropism and pathogenicity.
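
Deriving a consensus sequence from mapped reads can be sketched as a per-position majority rule over pileup counts. This minimal Python sketch is not the workflow's actual consensus caller, and its thresholds are hypothetical. Positions with too little depth, or with no base above `min_fraction` (as can happen in mixed infections), are emitted as `N`.

```python
def majority_consensus(pileup, min_depth=10, min_fraction=0.5):
    """Majority-rule consensus from per-position base counts.

    `pileup` is a list of {base: count} dicts, one per reference position.
    A position gets its most frequent base when depth >= `min_depth` and that
    base's share exceeds `min_fraction`; otherwise it gets 'N'.
    Thresholds are hypothetical defaults.
    """
    out = []
    for counts in pileup:
        depth = sum(counts.values())
        if depth == 0 or depth < min_depth:
            out.append("N")
            continue
        base, n = max(counts.items(), key=lambda kv: kv[1])
        out.append(base if n / depth > min_fraction else "N")
    return "".join(out)


pileup = [{"A": 20}, {"C": 5}, {"G": 12, "T": 8}, {"A": 10, "C": 10}, {}]
print(majority_consensus(pileup))  # ANGNN
```

The 50:50 position becomes `N` rather than an arbitrary base, which keeps an evenly mixed population from being silently reported as a single strain.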

Comparative genomics of primate cytomegaloviruses

Article

Jan 2013
