Reference standards for next-generation sequencing

Abstract

Next-generation sequencing (NGS) provides a broad investigation of the genome, and it is being readily applied for the diagnosis of disease-associated genetic features. However, the interpretation of NGS data remains challenging owing to the size and complexity of the genome and the technical errors that are introduced during sample preparation, sequencing and analysis. These errors can be understood and mitigated through the use of reference standards - well-characterized genetic materials or synthetic spike-in controls that help to calibrate NGS measurements and to evaluate diagnostic performance. The informed use of reference standards, and associated statistical principles, ensures rigorous analysis of NGS data and is essential for its future clinical use.
Next-generation sequencing (NGS) is increasingly being applied in clinical diagnosis, as it can identify genetic variations associated with disease1, can determine fusion genes that cause cancer2 and can detect pathogens in patient samples or isolates3. Unlike previous diagnostic sequencing, NGS can deliver a full qualitative and quantitative analysis of the DNA or RNA sequences within a sample in a single test, and thereby promises improved diagnostic yield.
Despite these advantages, the application and analysis of NGS data remain challenging. The sheer size and diversity of the human genome defy simple analysis, and the breadth of the tested sequences increases the risk of both false-positive and false-negative diagnoses. Furthermore, the detection of clinically relevant features in extreme or repetitive regions of the genome is limited4, and technical variables that are introduced during sample preparation, library construction, sequencing and bioinformatic analysis further confound analysis5. These errors can cause inaccurate analysis and misdiagnosis, and many clinical laboratories continue to validate complex or ambiguous results with Sanger sequencing6,7. Many of these errors can be mitigated with the use of reference standards, which have been recommended by a range of professional organizations8–16. The use of reference standards to assess diagnostic performance has been well studied in fields such as analytical chemistry17–19, providing a conceptual framework that can be readily applied to NGS assays.
A reference standard is defined as a material that is "sufficiently homogeneous and stable with respect to one or more specified properties, which has been established to be fit for its intended use in a measurement process" (REF. 20). These properties can be qualitative, such as the sequence of a DNA molecule; or quantitative, such as its abundance within a sample. Given that DNA is stable, can be replicated with fidelity and can be accurately characterized, it constitutes an ideal reference material. Although RNA enjoys similar advantages, its lower stability necessitates additional care in handling and storage.
Although the adoption of reference standards for NGS has been fairly slow, partly owing to the rapid pace of technological innovation, the increasing translation of NGS technology into clinical practice has recently focused attention on the development of accompanying reference standards. In this Review, we describe the development and use of reference standards for NGS. We particularly focus on the conceptual and statistical principles that underpin the design of reference standards, and the relative merits and limitations of differing approaches. We propose that an understanding of reference standards, and their associated statistical principles, is required for the rigorous analysis of NGS data and its application in clinical diagnosis.
Reference standards
To minimize bias associated with any particular method, a reference standard should be characterized using different procedures. For NGS, this could entail characterization with different NGS technologies or with orthogonal methods, such as Sanger sequencing and quantitative PCR (qPCR). After extensive characterization, a consensus value, such as a genotype, can then be assigned to the reference standard. Metrological organizations, such as the National Institute of Standards and Technology (NIST), can certify reference
Affiliations: 1 Genomics and Epigenetics Division, Garvan Institute of Medical Research, Sydney, NSW 2010, Australia. 2 St Vincent's Clinical School, Faculty of Medicine, University of New South Wales, Sydney, NSW 2052, Australia. 3 School of Biotechnology & Biomolecular Sciences, Faculty of Science, University of New South Wales, Sydney, NSW 2052, Australia. 4 Altius Institute for Biomedical Sciences, Seattle, Washington 98121, USA.
Correspondence to T.R.M. (t.mercer@garvan.org.au)
doi:10.1038/nrg.2017.44
Published online 19 Jun 2017
Reference standards: Control materials with known characteristics (for example, a known genotype) against which test performance can be measured.
Simon A. Hardwick1,2, Ira W. Deveson1,3 and Tim R. Mercer1,2,4
REVIEWS: Applications of Next-Generation Sequencing. Nature Reviews Genetics, Volume 18, August 2017, page 473.
[Box 1 figure: a | Regression-based approach: allele frequency measurement 1 versus allele frequency measurement 2 (0.1–1.0), with the 95% confidence interval and 95% prediction interval shown around the patient-sample regression line. b | PCA-based approach: PC1 versus PC2. Both panels show patient samples, commutable reference standards and non-commutable reference standards.]
Commutability: The ability of a reference standard to perform comparably to actual patient samples when measured using more than one measurement procedure.
standards, indicating that they have been characterized for composition with a stated uncertainty and are directly traceable to an external system of units21.
To be fit for its intended use, a reference standard must be commutable; that is, it must perform comparably to samples undergoing testing22. For example, commutability requires a clinical DNA standard to perform similarly to patient genomic DNA (gDNA) during library preparation, sequencing and analysis. The use of reference standards with poor commutability will bias any calibrated measurements and will result in inaccurate diagnosis23. Given the importance of commutability, clear guidelines have been developed for clinical standards that can be readily adapted to reference standards in NGS (BOX 1).
Once commutability has been established, reference standards can be used to calibrate measurements made from patient samples. For example, the abundance of a DNA sequence in a patient sample can be determined by comparison to a reference sequence of known abundance, along with the uncertainty associated with this measurement. This calibration allows standardization of measurements across multiple samples, and allows diagnostic thresholds to be anchored to reference standards.
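Calibrating abundance against a reference of known quantity amounts to fitting a calibration curve. The sketch below is a hypothetical illustration: the ladder abundances, read counts and log-linear model are invented for demonstration, not taken from any real assay.

```python
import numpy as np

# Hypothetical spike-in ladder: known input abundances (attomoles)
# and the read counts observed for each control after sequencing.
known_abundance = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0])
observed_reads  = np.array([210, 390, 820, 1650, 3300, 6700])

# Fit a calibration line in log-log space (counts scale roughly
# linearly with input amount over the ladder's dynamic range).
slope, intercept = np.polyfit(np.log2(known_abundance),
                              np.log2(observed_reads), 1)

def reads_to_abundance(reads):
    """Convert a read count from the same library into an
    estimated abundance via the spike-in calibration curve."""
    return 2 ** ((np.log2(reads) - intercept) / slope)

# A patient-sample sequence with 1,000 reads maps back to an
# estimated input abundance on the ladder's scale.
estimate = reads_to_abundance(1000)
print(round(float(estimate), 2))
```

Because the ladder anchors the curve, the same conversion can be applied across libraries, which is what allows measurements to be standardized between samples.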
Reference standards in NGS. The development of reference standards should address some of the unique challenges posed by NGS. Foremost, the breadth and depth of NGS enable many genome regions to be tested in a single assay, increasing diagnostic (or prognostic) yield but also increasing the risk of erroneously identifying false positives. To evaluate these events requires reference standards that reflect the diversity of the genomic or transcriptomic features being tested by an NGS assay.
Given the breadth of the interrogated sequence, precision is of crucial importance for NGS tests in which, for example, only a few rare mutations are expected within large regions of the genome, and even a low false-positive rate can overwhelm diagnosis with erroneous mutation calls. Filtering practices are often applied to maximize precision; however, these practices can concomitantly increase the proportion of false-negative results, which represent missed opportunities for diagnosis12. This false-negative diagnosis rate is poorly understood for many NGS applications and is difficult to measure without the use of well-characterized reference standards.
Sequencing coverage is one of the most relevant technical variables in NGS and is typically limited by library complexity and cost considerations24. Sufficient coverage is crucial for sensitivity and is required for confident measurements of gene expression and variant detection. At low sequence coverage, sensitivity limits are reached, and uncertainty increases. Accordingly, the ability of reference standards to assess the impact of sequencing coverage is important for many NGS applications.
High sequencing coverage can also be required to overcome the appreciable error rate of NGS that results from DNA damage and sequencing errors25. By contrast, systematic sequencing errors due to sequencing artefacts and short-read misalignment cannot be overcome by high coverage and often require reference standards to be identified and understood26.
Given that reference standards can provide known 'truths', the difference between the expected values and the measured values can provide an empirical estimate of uncertainty27. This is otherwise difficult for NGS workflows, which typically include multiple steps that each introduce different types and amounts of uncertainty. A reference standard will accrue such uncertainties as it progresses through the NGS workflow, and can provide a cumulative measure of uncertainty associated with the final diagnosis.
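The empirical uncertainty estimate described above can be made concrete with a toy calculation: repeated measurements of a reference of known value yield a bias (systematic deviation) and a spread (cumulative uncertainty). The replicate counts below are invented placeholders.

```python
import statistics

# Hypothetical repeated measurements of a spike-in of known
# abundance (100 copies) across replicate libraries.
known = 100.0
measured = [92.0, 108.0, 97.0, 111.0, 95.0, 103.0]

errors = [m - known for m in measured]
bias = statistics.mean(errors)   # systematic deviation from the truth
sd = statistics.stdev(errors)    # spread = empirical uncertainty

print(f"bias={bias:+.1f} copies, uncertainty (SD)={sd:.1f} copies")
```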
Box 1 | Commutability
Commutability describes the ability of a reference standard to perform comparably to
tested samples. For example, a reference human genome is considered commutable if it
performs similarly to a patient genome sample during sequencing and analysis.
Commutability can also be influenced by matrix effects, which encompass all sample
components other than the analyte of interest22. For example, the impact of fixation is
a matrix effect that can reduce the commutability of reference standards to
formalin-fixed, paraffin-embedded (FFPE) samples110.
Commutability must be established in cases in which a reference standard uses
a non-native version of an analyte22, such as in the case of synthetic spike-in controls.
It can also be difficult to develop commutable reference standards for samples that are
highly variable. For example, the quality of RNA samples tested or post-transcriptional
modifications of endogenous mRNAs (such as poly-adenylation and 5ʹ capping) can lead
to divergent performance compared with reference standards111. Using a poorly
commutable standard to calibrate a diagnostic test leads to inaccurate and variable test
results, and the commutability of reference standards should be established before use23.
A regression analysis can be used to show that reference standards and patient samples
exhibit similar quantitative performances112. For example (see the figure, part a), we can
plot the variant allele frequencies measured in patient samples using two measurement
procedures (such as different next-generation sequencing technologies). In this case,
commutable reference standards should be distributed within a specified prediction
interval (shaded region) relative to the regression line of patient samples (black line).
Alternatively, multivariate descriptive statistics can be used to compare reference standards with patient samples according to their principal components22. The results obtained for authentic patient samples typically form a cluster
of points (see the figure, part b). Data points from commutable reference standards
should be clustered similarly with patient samples, and non-commutable reference
standards would be distributed outside of patient samples.
PCA, principal component analysis.
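The regression-based commutability check in part a of the figure can be sketched numerically: fit the patient-sample regression, compute its 95% prediction interval, and test whether a candidate reference standard falls inside it. The allele frequencies below are invented, and the Student t quantile is hardcoded for the example's degrees of freedom; this is a minimal sketch, not a validated procedure.

```python
import numpy as np

# Hypothetical allele frequencies for the same variants measured by
# two procedures (e.g. two NGS platforms) in patient samples.
patient_x = np.array([0.11, 0.20, 0.31, 0.42, 0.55, 0.63, 0.78, 0.90])
patient_y = np.array([0.10, 0.22, 0.30, 0.44, 0.53, 0.65, 0.80, 0.88])

def prediction_interval(x, y, x_new):
    """95% prediction interval of the patient-sample regression at x_new."""
    n = len(x)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    s = np.sqrt(np.sum(resid ** 2) / (n - 2))      # residual SD
    se = s * np.sqrt(1 + 1 / n +
                     (x_new - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2))
    t = 2.447  # two-sided 97.5% Student t quantile for n - 2 = 6 d.f.
    y_hat = slope * x_new + intercept
    return y_hat - t * se, y_hat + t * se

def is_commutable(ref_x, ref_y):
    """Flag a reference standard commutable if its paired measurement
    falls inside the patient-sample prediction interval."""
    lo, hi = prediction_interval(patient_x, patient_y, ref_x)
    return lo <= ref_y <= hi

print(is_commutable(0.50, 0.51))  # behaves like the patient samples
print(is_commutable(0.50, 0.85))  # falls outside the interval
```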
[Figure 1 appears here. Panels: a | Spike-in controls added to a DNA or RNA sample; b | Biological reference materials (e.g. NA12878); c | Simulated in silico libraries (.FASTQ). Workflow stages shown: library preparation (sample quality, library complexity, amplification bias); sequencing (random errors, systematic errors); bioinformatic analysis (read alignment, variant calling and annotation, gene expression profiling, fusion gene detection, 16S rDNA analysis, de novo assembly); inter-sample normalization; reference ladders; tool analysis and development; diagnostic statistics.]
Matrix effects: Effects caused by any sample component other than the analyte of interest that can lead to the non-commutability of reference standards.
Biological reference materials
Natural genetic materials can act as useful reference standards that are relatively cheap and easy to develop, encompass the full size and diversity of the human genome or transcriptome, and are expected to be generally commutable with patient samples. Natural genetic materials are also agnostic to the limitations of current NGS technologies and constitute an impartial reference against which to compare alternative methods (FIG. 1).
Human genome reference standards. The detection of genetic variation associated with disease is the leading clinical use of NGS. However, the use of alternative sequencing technologies and bioinformatic analyses typically returns substantial disagreement among variant calls, often at thousands of genomic sites, for the same individual genome28–31. The difficulty in establishing a comprehensive and unambiguous set of variants for even a single genome indicates the need for reliably genotyped human samples that can serve as reference standards.
Because the original human reference genome sequence was assembled from a consensus of multiple individuals32, it does not provide a biological material to use as a reference standard. Instead, various individual human genomes have been established as reference standards to benchmark NGS test performance. Despite some concerns regarding genomic instability and drift33, stable gDNA from these individuals can be fairly easily and inexpensively sourced from transformed cell lines.
The genome of a healthy female donor of European ancestry known as NA12878 has become the foremost human genome reference standard. The limitations of individual NGS technologies in analysing the genome were offset by integrating multiple sequencing and analytical approaches to generate a high-confidence set of single nucleotide variants (SNVs) and small insertions and deletions (indels) across most of the genome34. For example, long-read sequencing was required to resolve structural variant sets35 and to assign phasing information to NA12878 variants36. Despite these efforts, a substantial proportion of the human genome remains refractory to sequencing analysis owing to extreme GC content, low complexity or repetitive sequences. These difficult regions often vary between individuals and host a range of clinically relevant mutations4.
Many clinical laboratories routinely sequence the NA12878 gDNA as a process control for their NGS workflow (BOX 2). The identified variants can be benchmarked against high-confidence genotypes to assess performance, and sequencing multiple replicates enables repeatability (within-run variation) and reproducibility (between-run variation) to be assessed37. Despite such widespread usage, the consent provided by the NA12878 individual is limited to research use only, and commercial products cannot be derived from the genome.
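Benchmarking identified variants against high-confidence genotypes, as described above, reduces to set comparison between a call set and a truth set. The sketch below uses invented variants keyed by (chromosome, position, ref, alt); the coordinates and alleles are placeholders, not NA12878 data.

```python
# Hypothetical truth set (high-confidence genotypes) and a call set
# produced by an NGS workflow, keyed by (chrom, pos, ref, alt).
truth = {("chr1", 101, "A", "G"), ("chr1", 250, "C", "T"),
         ("chr2", 333, "G", "A"), ("chr3", 421, "T", "C")}
calls = {("chr1", 101, "A", "G"), ("chr2", 333, "G", "A"),
         ("chr3", 421, "T", "C"), ("chr3", 500, "A", "T")}

tp = len(truth & calls)   # called and present in the truth set
fp = len(calls - truth)   # called but absent from the truth set
fn = len(truth - calls)   # in the truth set but missed

recall = tp / (tp + fn)
precision = tp / (tp + fp)
print(tp, fp, fn, round(recall, 2), round(precision, 2))
```

Real benchmarking tools additionally normalize variant representations and restrict the comparison to the assay's reportable range, as Box 2 discusses.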
The diversity of human genetic variation has motivated the development and characterization of reference genomes from different ancestries. Accordingly, NIST is expanding its set of supported genome reference standards to include representatives from different ethnic populations38. These additional genomes were collected by the Personal Genome Project, with consent provided for a broader range of uses39. These efforts are further supported by a number of regional initiatives that aim to develop reference genome banks for specific countries and to provide more relevant reference standards for local populations40–43.
Figure 1 | Schematic overview of a next-generation sequencing workflow showing the use of reference standards. a | Spike-in controls can be added to DNA or RNA samples for combined library preparation and sequencing, and partitioned for subsequent bioinformatic analysis as internal quantitative and qualitative controls. b | Well-characterized biological reference materials act as valuable process controls but cannot be directly added to samples. c | Finally, in silico sequencing libraries can be used to rapidly evaluate key bioinformatic steps (indicated in blue). Reference standards can be used to assess various biases and errors in the next-generation sequencing workflow (indicated in beige). rDNA, ribosomal DNA.
[Box 2 figure: confusion matrix of predicted versus actual variant calls.
TP (true positives) = 3,244,053; FN (false negatives) = 27,548; FP (false positives) = 613; TN (true negatives) = 2,812,727,786 (*unbalanced class).
Sensitivity (recall) = TP / (TP + FN) = 99.15%
Specificity = TN / (TN + FP) = 99.99%
Accuracy = (TP + TN) / Total = 99.99%
Precision = TP / (TP + FP) = 99.98%
F1 = 2TP / (2TP + FP + FN) = 99.56%]
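The metrics in the Box 2 figure can be recomputed directly from the confusion-matrix counts; this short check reproduces the figure's values to within rounding.

```python
# Confusion-matrix counts from the Box 2 figure.
tp, fn, fp, tn = 3_244_053, 27_548, 613, 2_812_727_786

sensitivity = tp / (tp + fn)                   # recall
specificity = tn / (tn + fp)
accuracy    = (tp + tn) / (tp + tn + fp + fn)
precision   = tp / (tp + fp)                   # positive predictive value
f1          = 2 * tp / (2 * tp + fp + fn)

print(f"recall={sensitivity:.4f} specificity={specificity:.6f} "
      f"precision={precision:.4f} F1={f1:.4f}")
```

Note how the huge true-negative class inflates specificity and accuracy to near 1 regardless of test quality, which is why precision and F1 are the more informative summaries here.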
Reference genomes for disease studies. Patient genomes that harbour disease-causing variants can also provide valuable reference standards to guide clinical diagnosis with NGS. However, the sheer diversity of causative variants that are tested with NGS, of which only a small proportion can be present within any single genome, presents a challenge to developing reference genomes for disease.
Cells from patient samples with clinical variants of interest can be transformed to provide a renewable source of reference material. The Genetic Testing Reference Materials Coordination Program (GeT-RM) has characterized a wide range of cell lines that harbour pathogenic mutations for a range of inherited diseases44–49, as well as variants in pharmacogenetic loci50,51. These genomes stand as representative examples of the variation associated with disease.
Genome editing to engineer specific variants into a cell line offers an alternative approach52,53. However, the risk of unintended off-target effects requires careful validation of engineered cell lines to ensure that they remain isogenic at other genome positions.
Establishing a stable, well-characterized and renewable reference genome material for cancer has proved particularly difficult. The extensive characterization of several matched tumour and normal samples has illustrated the complexity of tumour genome populations and has provided useful reference data for benchmarking analysis54–56. However, tumour samples are typically small and finite, and do not provide a ready source of biological reference material. Furthermore, a tumour sample can encompass multiple, evolving sub-clonal populations and can be insufficiently stable to derive a reliable and homogeneous reference material57. Derived cell lines can provide a simplified example of a cancer genome and can be mixed with other matched cell lines to simulate tumour samples58. Ongoing efforts, such as the Sequencing Quality Control (SEQC) Consortium, aim to establish reference cell lines for use in cancer studies and diagnosis59.
Reference RNA samples. RNA-sequencing (RNA-seq) is confounded by the sheer size and diversity of the transcriptome, variation in RNA sample quality and library preparation methods, and complex bioinformatic analysis60. Nevertheless, the importance of accurately and reproducibly measuring gene expression has motivated the development of well-characterized natural RNA reference materials. However, the responsiveness of gene expression to external stimuli, even during laboratory culture, can cause substantial batch-specific transcriptome variation, and many reference RNA materials must be non-renewably stocked from a single large batch61.
The SEQC59, GEUVADIS62 and Association of Biomolecular Resource Facilities (ABRF)63 projects have used human reference RNA samples to provide comprehensive assessments of RNA-seq accuracy and reproducibility with different protocols, NGS technologies and laboratory sites. Combining these reference samples at known ratios has also allowed the relative accuracy of NGS technologies to be evaluated based on the detection of differentially expressed genes, and this consensus analysis of reference RNA samples has subsequently informed best practices for RNA-seq experimentation59,62,63.
Box 2 | Diagnostic statistics
Reference standards can provide a ‘truth set’ to evaluate the classifier performance
of a next-generation sequencing (NGS)-based diagnostic test. The number of
true-positive (TP; see the figure, green), true-negative (TN; red), false-positive
(FP; blue) and false-negative (FN; yellow) predictions for an NGS test can be
calculated by comparison to this reference standard. These outcomes are often
tabulated into a confusion matrix (see the figure) and used to calculate a range of
statistical metrics that describe different aspects of the NGS test performance113.
To illustrate these metrics, consider the use of whole-genome sequencing (WGS)
to identify single nucleotide variants (SNVs) in the ‘accessible’ portion of the
human genome. By sequencing and analysing an NA12878 genomic DNA (gDNA)
sample, we can evaluate test performance by comparing results to the
high-confidence genotypes34. In this example, confident variant calls are not
possible within difficult regions of the human genome that are omitted from the
reportable range of the assay. Targeted gene or exome sequencing can further
reduce the reportable range to specific genes of interest. Reporting procedures
can also restrict this range. For example, if silent mutations are not reported owing
to their functional uncertainty, this variant type would be omitted from the
validated reportable range and would not be considered when calculating
diagnostic performance.
It should be noted that benchmarking against only high-confidence genotypes is
likely to overestimate performance given the difficulty in identifying variants that
are in the omitted ‘difficult’ regions37. This example also ignores other variant
types, such as small insertions and deletions (indels) and structural variants,
for which performance metrics may be lower114. Finally, many variant-calling
algorithms preferentially identify variants at known polymorphic sites115.
Therefore, novel or rare variants in patient samples may not be identified with
comparable sensitivity to known variants that are found in HapMap reference
samples such as the NA12878 genome.
Recall (or sensitivity) is one of the most important qualities of any diagnostic test and, in the case of NGS, is often correlated with sequencing quality and depth. In this example, recall is the fraction of known variants in the NA12878 genome that were correctly identified. However, false-positive predictions do not affect this metric, and our WGS test could simply identify variants at all sites and still have high recall. Therefore, our WGS test should also be specific, and only identify sites where a variant is present and show an absence of variants across the remainder of the genome. Given that variants are relatively rare across the genome, precision (also commonly known as positive predictive value), which excludes true negatives, can most usefully describe this property in NGS, where positive and negative class values are unbalanced. There is often a trade-off between the precision and the recall of an NGS test that can be visualized (see BOX 3) or described with the F1 value, which is the harmonic mean of precision and recall and provides a single measure of classifier performance.
[Box 3 figure: a | Somatic variant detection: density plotted against variant allele frequency, with the distributions of sequence errors and true mutations separated by a diagnostic threshold (dashed line) into true positives, false positives, true negatives and false negatives. b | ROC curve: recall plotted against the false-positive rate (1 − specificity), with the area under the curve (AUC) shaded. c | Precision-recall curve: precision plotted against recall.]
Variant allele frequencies: The fraction of alleles in a given sample (for example, a tumour biopsy sample) that correspond to a variant of interest.
Precision: (Also known as positive predictive value). The fraction of positive predictions made by a test that are true.
Sensitivity: (Also known as recall). The fraction of known positives that are correctly predicted by a test.
Systematic sequencing errors: Nonrandom errors in sequence determination due to sample preparation and sequencing processes.
NA12878: The well-characterized genome from a healthy female individual that is commonly used to benchmark genome analysis.
Long-read sequencing: A sequencing approach that uses reads in excess of several kilobases, enabling the resolution of large structural genomic features.
Phasing: The process of determining the chromosome from which a particular DNA variant is derived.
Although these RNA samples were sequenced using multiple NGS technologies, novel isoforms continue to be discovered with increasing depth, suggesting that further transcriptional diversity remains unannotated. This highlights the challenge of using natural RNA reference samples for which there is no comprehensive consensus annotation to evaluate false-positive and false-negative findings with RNA-seq. Despite these challenges, natural reference RNA standards are invaluable resources for understanding complex transcriptional features associated with disease, such as the diagnosis of the BCR-ABL1 fusion transcript in leukaemia64.
Reference standards for microorganisms. Metagenomic sequencing can deliver a global profile of the microbial community within an environmental sample65 and can diagnose the presence of pathogens directly from patient samples or isolates3. Unlike previous technologies, NGS can also discover microorganisms that are entirely novel or uncultivable in the laboratory66. However, microbial diversity poses a challenge to analysis by NGS, with individual microorganisms exhibiting a range of genome architectures. Analysis is further confounded by shared and missing reference genome sequences and the presence of background matrix DNA, such as human DNA in patient samples9.
Box 3 | Illustrating diagnostic performance
Many tests do not return a clear positive or negative result but instead provide a continuous range of
quantitative or likelihood values (see the figure, part a). Therefore, a diagnostic threshold (dashed line) can
be used to divide this range of values into positive or negative predictions and thus provide a clear positive or
negative assessment to inform clinical decision making.
To determine the optimal threshold level, it is often useful to visualize the relationship between true-positive and
false-positive events. To illustrate this problem, consider the sequencing of a patient tumour sample to identify
low-frequency somatic mutations. Increased sequencing coverage typically improves recall by enabling the detection
of mutations at lower allele frequencies. However, sequencing errors also occur at low frequencies and can be
difficult to distinguish from somatic mutations116. Therefore, an allele frequency threshold must be selected that
maximizes the detection of true somatic mutations while minimizing the mistaken classification of sequencing errors
(see the figure, part a).
To establish this diagnostic threshold, we can use a mixture of DNA spike-in standards that represent
mutations that are titrated across a range of allele fractions77 and that are added to a patient tumour sample
before library preparation and sequencing. These DNA spike-ins allow us to measure both the true somatic
mutations and the false sequencing errors that are detected across the range of allele frequencies.
When selecting an appropriate diagnostic threshold, it may be useful to plot a receiver operating
characteristic (ROC) curve, which shows the relative acquisition of true positives and false positives after
all potential mutations are ranked according to allele frequency (see the figure, part b). The area under
the curve (AUC) quantifies the relationship between true-positive and false-positive rates, illustrated in the
ROC curve, and can be used to compare the diagnostic performance of independent tests. For data sets with
unbalanced classes, ROC curves should be interpreted cautiously because a small change in the false-positive
rate can lead to a large change in the absolute number of erroneous predictions117.
A precision-recall curve can be more useful for resolving diagnostic thresholds in next-generation
sequencing tests, in which the true-negative class is usually far larger than the true-positive class
(see the figure, part c). In this schematic example, the precision-recall curve distinctly shows the allele
frequency when abundant sequencing errors are no longer excluded. The AUC for the precision-recall curve
also provides a greater quantitative discrimination when comparing independent tests or alternative
filtering strategies.
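Selecting the threshold that balances precision and recall, as Box 3 describes, can be sketched as a single sweep over candidate calls ranked by allele frequency, tracking F1 at each cut-off. The candidate calls and truth labels below are invented for illustration; a real analysis would derive the labels from DNA spike-in standards.

```python
# Hypothetical candidate calls: (allele frequency, is_true_mutation),
# where labels would in practice come from spiked-in known mutations.
calls = [(0.45, True), (0.30, True), (0.12, True), (0.08, True),
         (0.05, False), (0.04, True), (0.03, False), (0.02, False),
         (0.015, False), (0.01, False)]

calls.sort(key=lambda c: c[0], reverse=True)
total_true = sum(1 for _, t in calls if t)

best = (0.0, None)  # (F1, allele-frequency threshold)
tp = fp = 0
for af, is_true in calls:
    tp += is_true            # accepting everything down to this frequency
    fp += not is_true
    precision = tp / (tp + fp)
    recall = tp / total_true
    f1 = 2 * precision * recall / (precision + recall)
    if f1 > best[0]:
        best = (f1, af)

print(f"best F1={best[0]:.2f} at allele-frequency threshold {best[1]}")
```

The same sweep yields the points of the precision-recall curve, so plotting (recall, precision) at each step reproduces part c of the figure for this toy data.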
Mock microbial communities: Reference standards generated by combining the genomic DNA (or cells) from multiple individually cultured microorganisms at known concentrations.
Spike-in controls: DNA or RNA molecules of known length, sequence composition and abundance that are directly added to samples to act as qualitative and quantitative internal controls.
Various microbial reference genomes have been released by NIST for tool development and analysis67. The genomes were selected due to their importance in food safety and clinical microbiology, and encompass a wide range of GC contents. The US Food and Drug Administration (FDA) has also established FDA-ARGOS, a database that lists validated genome sequences from a diverse range of infectious microorganisms, which can be used to standardize the development of NGS tests9.
Mock microbial communities, in which multiple microorganisms have been individually cultured and combined at known abundances to form a community, are often favoured as reference standards to benchmark metagenome analysis. Mock communities can be assembled from extracted gDNA samples, or directly from individual cultures, to allow biases that arise during DNA extraction to be examined. The use of microorganisms with completed reference genomes and combined at known concentrations also allows the limitations of genome quantification and de novo assembly to be investigated. Similarly, mock communities can act as common templates for the multiplex PCR primers that are used in 16S ribosomal DNA (rDNA) profiling, and indicate whether specific microbial lineages are under-estimated or missed during analysis68.
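Detecting under-represented lineages in a mock community amounts to comparing observed relative abundances with the known input fractions. The taxa and abundances below are hypothetical; a log2 fold deviation beyond an arbitrary 2-fold cut-off flags a lineage as over- or under-represented.

```python
import math

# Hypothetical mock community: known input fractions versus the
# relative abundances observed after 16S rDNA profiling.
expected = {"E. coli": 0.25, "B. subtilis": 0.25,
            "S. aureus": 0.25, "P. aeruginosa": 0.25}
observed = {"E. coli": 0.40, "B. subtilis": 0.24,
            "S. aureus": 0.30, "P. aeruginosa": 0.06}

# Log2 fold deviation flags lineages the assay distorts.
for taxon in expected:
    lfc = math.log2(observed[taxon] / expected[taxon])
    flag = ("under-represented" if lfc < -1 else
            "over-represented" if lfc > 1 else "within 2-fold")
    print(f"{taxon}: log2 deviation {lfc:+.2f} ({flag})")
```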
To improve standardization between participat-
ing laboratories, the Human Microbiome Project
Consortium assembled a mock community of different
bacteria and archaea that represent a range of GC con-
tents, genome sizes, repeat content and phylogenetic
diversity69,70. The Microbiome Quality Control (MBQC)
project was subsequently initiated to evaluate methods
for measuring the human microbiome, using a range of
reference standards71. More recent efforts have expanded
the scope of represented microorganisms, and have
tailored mock communities to specific environmental
sources or to NGS applications72. These reference com-
munities have served as useful controls to compare 16S
and shotgun sequencing data, to evaluate bias due to GC
content and to benchmark metagenome analysis.
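The core comparison in a mock-community benchmark — known input proportions against observed read counts — can be sketched in a few lines. The taxa, counts and the L1 summary statistic below are illustrative choices, not taken from any of the cited projects:

```python
# Hypothetical sketch: compare the observed composition of a sequenced mock
# community against its known input proportions. All values are invented.

def relative_abundance(counts):
    """Convert raw read counts per taxon to relative abundances."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}

def l1_deviation(expected, observed):
    """Total absolute deviation between expected and observed relative
    abundances (0 = perfect agreement, 2 = completely disjoint)."""
    taxa = set(expected) | set(observed)
    return sum(abs(expected.get(t, 0.0) - observed.get(t, 0.0)) for t in taxa)

# Known (equimolar) input proportions of the mock community.
expected = {"E. coli": 0.25, "S. aureus": 0.25,
            "P. aeruginosa": 0.25, "B. subtilis": 0.25}

# Read counts recovered after sequencing and taxonomic classification.
observed_counts = {"E. coli": 30000, "S. aureus": 27000,
                   "P. aeruginosa": 28000, "B. subtilis": 15000}

observed = relative_abundance(observed_counts)
print(f"L1 deviation from expected: {l1_deviation(expected, observed):.3f}")
```

A systematic shortfall for one taxon (here B. subtilis) is the kind of signature that would prompt investigation of extraction or primer bias for that lineage.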
Spike-in controls
A major limitation of using natural genetic materials as
NGS reference standards is that they cannot typically be
combined with patient samples without contaminating
downstream analysis. Spike-in controls, by contrast, are
designed to be directly added to a sample and to undergo
concurrent library preparation and sequencing, thereby
acting as internal quantitative and qualitative controls
that are subject to the same downstream technical
variables as the accompanying sample (FIG.1).
Spike-in controls often comprise non-human or
artificial sequences73, or contain unique molecular bar-
codes74, so derivative reads can be distinguished from
the accompanying sample following sequencing. For
example, the PhiX bacteriophage genome is routinely used as a spike-in control to determine basic quality metrics and the error rate in Illumina sequencing runs75.
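As a sketch of how a spike-in with a fully known sequence supports error-rate estimation, the toy code below counts mismatches between ungapped reads and a reference at known positions. The sequences and alignments are invented; a real PhiX analysis would work on aligned reads against the actual PhiX genome:

```python
# Minimal sketch (toy data): estimate a per-base error rate by comparing
# reads against the known spike-in reference at their aligned positions.

def per_base_error_rate(reference, aligned_reads):
    """aligned_reads: list of (start, read_sequence) pairs, ungapped."""
    mismatches = bases = 0
    for start, read in aligned_reads:
        ref_segment = reference[start:start + len(read)]
        mismatches += sum(r != f for r, f in zip(read, ref_segment))
        bases += len(read)
    return mismatches / bases if bases else 0.0

reference = "ACGTACGTACGTACGT"                     # stand-in for PhiX
reads = [(0, "ACGTACGT"), (4, "ACGTACTT"), (8, "ACGTACGT")]
rate = per_base_error_rate(reference, reads)
print(f"estimated per-base error rate: {rate:.4f}")
```

Because every spike-in base is known in advance, any disagreement can be attributed to the sequencing process rather than to genuine variation.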
The design of spike-in control sequences is flexible and
constrained only by the limits of synthesis, enabling them
to be rapidly developed to represent diagnostic features
and to address the specific requirements of an NGS test.
Spike-ins are typically prepared individually and can be
combined at different concentrations to formulate com-
plex mixtures in which many features are represented
and internal ‘ladders’ are built to measure the quantita-
tive features of the accompanying sample. Despite these
advantages, achieving commutability remains a constant
challenge in developing spike-in controls, as synthetic
constructs may not reflect the complexity or behaviour
of native DNA or RNA samples76.
Genome spike-ins. Synthetic DNA spike-in controls have
been used to represent instances of human genetic var-
iation, including SNVs, indels, and large structural and
copy number variants77. The ability to represent genetic
variation, particularly with clinical relevance, enables
spike-ins to evaluate the detection of these variants with
NGS technologies. Furthermore, many variants can be
represented within a single mixture of DNA spike-ins,
enabling the breadth of NGS diagnosis to be appraised.
A substantial proportion of clinically relevant vari-
ants are difficult to resolve using current NGS technol-
ogies4. Genome spike-ins can be used to represent such
difficult variants, the presence of which may be other-
wise ambiguous in natural genome materials. Similarly,
natural genome reference materials can be supplemented
with synthetic controls that represent clinically relevant
or difficult variants that are not otherwise present74,78.
By manipulating the abundance of specific DNA
spike-ins, it is also possible to simulate quantitative fea-
tures of genome biology, such as variant allele frequency
and copy number variation. For example, pairs of DNA
spike-ins that represent reference and variant alleles can
be either combined to emulate heterozygous genotypes,
or further titrated to emulate lower somatic variant allele
frequencies that are commonly observed in tumour sam-
ples77. These internal DNA spike-in ladders can be used to derive quantitative statistics that are specific to an individual library and to empirically define thresholds for distinguishing sequencing errors from true positives at low allele frequencies (BOX3).
RNA spike-ins. RNA spike-ins were originally developed
by the External RNA Controls Consortium (ERCC) as
reference standards for quantitative reverse transcription PCR (qRT-PCR) and microarray assays73,79,80, but have since
been widely adopted by the RNA-seq community. The
ERCC spike-ins comprise a set of polyadenylated tran-
scripts with a range of lengths and GC contents, and
without homology to the human genome. The develop-
ment of spliced RNA spike-ins, which emulate the com-
plex exon and intron architecture of human genes, has
allowed further assessments of alternative splicing and
transcript assembly using RNA-seq81–83. Custom RNA
spike-in sets have also been developed for more specific
applications, including the sequencing of small RNA
classes84 and the detection of oncogenic fusion genes85.
RNA spike-ins can be combined to form staggered
mixtures that encompass the range of human gene
expression86. The accuracy of gene expression measure-
ments with RNA-seq can then be empirically assessed
by comparison to this quantitative ladder (BOX4).
[Box 4 figure (Nature Reviews | Genetics): (a) measured spike-in abundance plotted against known input concentration, with a fitted least-squares regression line (R2 = 0.945, slope = 1.02 ± 0.03, Sy.x = 0.271), an inset of residuals, and empirically defined detection and quantification limits below which sequencing is variable and inaccurate; (b) coefficient of variation (SD/mean) between replicates, which rises at low measured abundance; (c) density plot of all transcripts detected in the accompanying sample, with transcripts below the quantification limit omitted from downstream quantitative analysis.]
Limit of detection
The lowest concentration of an analyte that can be detected by an assay.
Box 4 | Quantitative accuracy and regression analysis
Next-generation sequencing can measure the amount of a DNA or RNA sequence in a sample, and is routinely
used to measure gene expression, allele frequencies and microbial abundance. Each of these quantitative
measurements can be assessed and calibrated against reference standards.
Consider an RNA-sequencing (RNA-seq) experiment that aims to measure the expression of thousands of genes
within a human RNA sample. RNA spike-ins, formulated into a staggered mixture, are added to the human sample
(in triplicate) and then undergo concurrent library preparation, sequencing and analysis.
The quantitative performance of the RNA-seq experiment can be evaluated by plotting the measured
abundance of each spike-in (y axis) against its known input concentration (x axis) (see the figure, part a).
Spike-ins at such low concentrations that they fail to be sequenced indicate the lower limit of detection of the
experiment. Users may also be interested in specifying an empirical limit of quantification118 below which gene
expression measurements become unreliably sparse and variable (blue dashed line). This is indicated by the high
standard deviation (SD; see the figure, part a) and coefficient of variation (see the figure, part b) between
replicates at low abundances.
A linear regression line (green) can be fitted to model the relationship between the measured abundance and the known concentration of RNA spike-ins. This is most commonly performed using least-squares regression, in which the line is fitted to minimize the sum of the squared residuals119. The residual for each data point is the difference between its observed value and the value predicted by the model (see the figure, part a (inset)), and can indicate whether error is constant across the measured range.
The strength of this linear relationship is indicated by the coefficient of determination (R2), which is the fraction
of variation in the dependent variable (measured abundance) that is predictable from the independent variable
(input concentration). This describes how closely the regression line fits the data and how well the RNA-seq
experiment has measured the known concentration of the spike-ins.
Once the linear relationship between known concentration and measured gene expression is established,
the model can be used in reverse to calibrate the concentration of each gene from measured expression. This also
enables thresholds, such as the limit of quantification, to omit any endogenous genes in the accompanying
sample that are insufficiently expressed for reliable analysis83,86 (see the figure, part c).
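The regression described in this box can be reproduced in a few lines of Python. The spike-in concentrations and measured abundances below are invented, and both axes are log2-transformed here (a common choice for concentration ladders spanning several orders of magnitude):

```python
# Sketch of the Box 4 analysis with made-up numbers: fit an ordinary
# least-squares line of measured spike-in abundance against known input
# concentration, then report slope and the coefficient of determination.
from math import log2

def linear_fit(xs, ys):
    """Ordinary least-squares fit y = a*x + b; returns (slope, intercept, r2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot
    return slope, intercept, r2

# Known input concentrations and measured abundances (both hypothetical).
inputs   = [1, 4, 16, 64, 256, 1024]
measured = [2.1, 7.9, 33.0, 130.0, 510.0, 2050.0]

xs = [log2(c) for c in inputs]
ys = [log2(m) for m in measured]
slope, intercept, r2 = linear_fit(xs, ys)
print(f"slope={slope:.3f}  R^2={r2:.3f}")
```

A slope near 1 on the log-log scale indicates proportional recovery of input concentration; once fitted, the model can be inverted to calibrate concentrations from measured expression, as the box describes.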
Normalization
The adjustment of technical bias between multiple samples to facilitate accurate comparisons.
Reportable range
The genomic region or regions in which sequencing data of an acceptable quality can be derived by a next-generation sequencing test.
Reference interval
The spectrum of sequence variants that occur in an unaffected population from which the patient specimen has been derived.
Proficiency testing
The provision of reference samples to participating laboratories for testing, with results reported to an independent organization for evaluation (often known as external quality assessment in Europe).
This comparison also allows gene expression to be quan-
tified with absolute transcript copy numbers86. RNA
spike-ins can be formulated at different concentrations
between alternative mixtures to provide both positive
and negative controls in differential gene expression
tests. Adding alternative mixtures to different samples
enables users to empirically assess the accuracy of fold-
change measurements at different gene expression levels,
and can inform the interpretation of differential gene
expression between accompanying samples87.
RNA spike-ins can provide a completely characterized truth set that enables the evaluation of false-positive and false-negative findings, which is not otherwise possible with natural reference RNA samples. This advantage was utilized during the SEQC and ABRF projects,
was utilized during the SEQC and ABRF projects,
which complemented reference RNA samples with
ERCC controls, enabling a broader analysis of RNA-
seq performance that included evaluations of sensitiv-
ity and technical variation between NGS methods and
laboratories59,63.
Spike-in controls can also be used as scaling factors
for normalization between multiple samples (BOX5). This
has proved particularly useful in single-cell RNA-seq
experiments, which typically compare thousands of
individual cells between which the mRNA composi-
tion and the impact of experimental variables can vary
substantially88. By adding spike-ins during cell lysis,
the mRNA content returned from each cell can be
estimated according to the fraction of the library that
is derived from spike-ins, with an atypically high frac-
tion being indicative of low RNA quantity (and poten-
tially an experimental error)89. This ability to measure
absolute transcript numbers at high cellular resolution
allows researchers to investigate novel aspects of tran-
scriptome kinetics that were previously imperceptible
using conventional (bulk) RNA-seq90.
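The single-cell quality-control idea described above can be sketched directly: the fraction of each cell's library derived from spike-ins is inversely related to its endogenous mRNA content, so cells with an atypically high spike-in fraction can be flagged. The counts, the "ERCC-" naming convention for spike-in features and the 20% threshold below are all illustrative assumptions:

```python
# Hypothetical sketch: flag low-RNA-content cells in a single-cell
# experiment by their spike-in read fraction. All counts are invented.

def spikein_fraction(cell_counts):
    """cell_counts: dict mapping feature -> read count; spike-in features
    are assumed here to carry an 'ERCC-' prefix."""
    spike = sum(n for f, n in cell_counts.items() if f.startswith("ERCC-"))
    total = sum(cell_counts.values())
    return spike / total if total else 0.0

def flag_low_content_cells(cells, max_fraction=0.2):
    """Return ids of cells whose spike-in fraction exceeds the threshold,
    indicating low endogenous mRNA (or a possible experimental error)."""
    return [cid for cid, counts in cells.items()
            if spikein_fraction(counts) > max_fraction]

cells = {
    "cell_1": {"GeneA": 800, "GeneB": 600, "ERCC-0001": 100},  # ~7% spike-in
    "cell_2": {"GeneA": 50,  "GeneB": 30,  "ERCC-0001": 120},  # ~60% spike-in
}
print(flag_low_content_cells(cells))
```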
In silico data sets for bioinformatic analysis
The bioinformatic steps during the analysis of NGS
libraries are often complex, and are a substantial source
of bias and errors. In silico data sets can be generated
quickly and easily, and have proved useful for devel-
oping and troubleshooting software tools, and for
assessing bioinformatic performance (FIG.1).
Common data sets (typically in FASTQ or SAM/
BAM format) can be rapidly simulated or altered
to generate ‘ground truth’ examples for testing bio-
informatics analysis. For example, rare or challenging
variants can be quickly represented at any desired fre-
quency within a simulated data set91. Furthermore, the
progress of each simulated read can be traced at each
step during the analytical workflow, from their orig-
inal genomic position, through alignment and final
analysis92. This allows each step to be assessed, and
enables the rapid optimization of the NGS workflow.
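A toy version of this simulation idea is easy to write: generate reads from a reference with a known SNV injected at a chosen allele frequency, so that every read's true origin and allele are recorded and can be traced through downstream analysis. The sequences, positions and parameters below are invented for illustration:

```python
# Illustrative sketch: simulate ungapped reads from a toy reference with a
# known SNV at a chosen variant allele frequency (VAF).
import random

def simulate_reads(reference, n_reads, read_len, snv_pos, snv_base, vaf, seed=0):
    """Return (read_sequence, true_start, carries_variant) tuples."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(reference) - read_len + 1)
        seq = reference[start:start + read_len]
        carries = False
        # Reads covering the SNV carry the variant allele with probability vaf.
        if start <= snv_pos < start + read_len and rng.random() < vaf:
            offset = snv_pos - start
            seq = seq[:offset] + snv_base + seq[offset + 1:]
            carries = True
        reads.append((seq, start, carries))
    return reads

reference = "ACGT" * 25                      # 100 bp toy reference
reads = simulate_reads(reference, n_reads=1000, read_len=20,
                       snv_pos=50, snv_base="T", vaf=0.1)
covering = [r for r in reads if r[1] <= 50 < r[1] + 20]
observed_vaf = sum(r[2] for r in covering) / len(covering)
print(f"observed VAF among covering reads: {observed_vaf:.2f}")
```

Because the ground truth for every read is retained, a variant caller's output on the resulting data can be scored exactly, which is what makes such simulations useful for workflow optimization.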
Various software tools have been developed to pro-
duce human genomes with known genotypes and to
simulate derivative NGS libraries93. These tools can
often incorporate sequencing errors and can model
other sources of error present in NGS libraries. Similarly,
for RNA-seq, insilico reference data sets have been used
to benchmark existing analytical tools94 and, more
recently, to develop innovative methods for transcript
quantification95,96.
The most obvious limitation of insilico data sets is
that their use is restricted to assessing bioinformatic
steps of NGS workflows, and it is difficult to fully
model the complexity and variability present in real
data using simulated data. Therefore, although insilico
reference standards are a useful supplement for test-
ing bioinformatic steps, they do not replace the use
of physical standards that measure the full range of
variables faced in clinical diagnosis.
Regulatory considerations
The ability to return diagnostic information from the
human genome sequence, which may form a perma-
nent component of an individual’s health record, neces-
sitates clear and robust regulatory oversight. Regional
organizations are typically vested with authority and
responsibility to regulate the development and the val-
idation of clinical diagnostic NGS tests, including the
use of reference standards to assess and monitor test
performance. The FDA and the European Medicines
Agency are two of the largest regulatory organizations.
For clarity, we outline the regulatory environment in
the United States, but analogous principles are applied
in other countries.
Validation data demonstrating the diagnostic
performance for a gene or a mutation of interest are
required for FDA approval of invitro diagnostic tests.
However, this requirement may be prohibitively diffi-
cult and expensive for NGS tests that can detect many
variants across large genome regions. Nevertheless,
in the few examples of NGS tests that have sought
FDA approval, such as the diagnosis of CFTR muta-
tions with Illumina’s MiSeqDx instrument97, refer-
ence standards have proved critical for benchmarking
performance.
Alternatively, an NGS test can be accredited for
use within a single laboratory under the Clinical
Laboratory Improvement Amendments (CLIA)98.
Reference standards are central to demonstrating the validity of the NGS assay, including a global analysis of accuracy, precision, sensitivity, specificity, reportable range and reference interval14. Currently, most
NGS diagnostic laboratories have sought approval through this pathway, partly because rapidly evolving sequencing and bioinformatics tools can be accredited faster and at lower cost than through FDA approval.
The ongoing performance of CLIA-approved NGS
tests is routinely monitored with proficiency testing, in
which blinded samples are periodically sent to partic-
ipating clinical laboratories for analysis, which then
report results for performance evaluation14. In some
cases, performance can also be verified by the infor-
mal inter-laboratory exchange of patient samples99.
Given the breadth of the variants tested, NGS tests are more suited to methods-based evaluation than to evaluation of specific genes or mutations of interest100,101.
Using this approach, proficiency testing can use a set
[Box 5 figure: (a) mRNA levels of genes A, B and C plus a spike-in control under two conditions, the second showing transcriptional amplification; (b) normalization using total counts produces false negatives (FN) and false positives (FP); (c) normalization using spike-ins accurately detects the gene expression changes.]
of central reference samples to provide an independent and standardized evaluation of many different laboratories and different types of NGS tests.
The College of American Pathologists (CAP) offers
one of the most comprehensive proficiency testing
programmes for NGS (CAP proficiency testing),
including germline and somatic variants102, as well as
common or actionable fusion genes103. For microbial
genomics, the Global Microbial Identifier recently
launched a proficiency testing challenge for bacterial
whole-genome sequencing (GMI proficiency tests),
in which participants are sent live cultures, extracted
gDNA and NGS data from bacterial strains9.
Given the analytical complexity of NGS data, in silico proficiency testing challenges have also been introduced to test bioinformatics workflows102,104,105. In these programmes, participants are sent NGS library data for analysis using their local bioinformatics workflow. This will prove particularly useful for assessing the diagnosis of complex structural variants or for evaluating false-negative rates. Although not intended
as a formal proficiency testing programme, the FDA
recently launched precisionFDA, an online portal
where participants can access and share NGS data sets
and bioinformatic tools, and can standardize analytical
best practices106.
Conclusions
The high-sequence throughput of NGS enables the
broad interrogation of the genome or transcriptome
with a single test. Given this advantage, NGS is being
rapidly established in clinics for the diagnosis of dis-
ease-associated genetic features. However, the diagno-
sis of features is far from simple, particularly given the
size and diversity of the genome and the complexity of
sequence data and bioinformatic analysis. Reference
standards are an invaluable resource through which to
understand these limitations.
We have described a range of reference standards
that have been developed for NGS (summarized in
TABLE1). Natural biological samples retain genome
complexity and can stand as a record of common and
pathogenic human genetic variation that is agnostic
to current sequencing or bioinformatic technologies.
By contrast, synthetic controls can be precisely
designed to address specific clinical or technological
applications and, through their careful synthesis and
preparation, can enable quantitative aspects of genome
biology to be assessed. Although not a substitute for
physical reference standards, insilico data sets can be
used to efficiently optimize bioinformatic steps. Each
type of reference standard has its own relative merits
and limitations, and ideally a combination of differ-
ent types should be used in order to provide a robust
framework for validation and quality control14.
Reference standards have so far mostly been used
to benchmark NGS workflows. However, we anticipate
that the routine use of reference standards will increas-
ingly enable the development of novel statistical and
bioinformatic analyses87,107. This includes the ability
to empirically measure library statistics and uncer-
tainty, which can then inform and train new genera-
tions of bioinformatic tools26,108. Furthermore, the use
of reference standards to expand and to standardize
the assessment of difficult, complex or quantitative
features of the genome can lead to further gains in
diagnostic yield109.
Box 5 | Normalization with spike-in controls
The normalization of next-generation sequencing (NGS) data aims to reduce the
differences due to technical effects between multiple samples while preserving
biological difference for analysis. As internal controls, spike-ins can be used to measure
these technical variables and to mitigate their effects between samples.
Approaches such as remove unwanted variation (RUV) can adjust for technical effects
by performing factor analysis on spike-in controls that remain constant between
samples107. These approaches assume that technical effects have an equal impact on both
spike-ins and the accompanying sample, and therefore that any differences in spike-ins
between samples result from experimental factors that can be mitigated accordingly.
Normalization has proved particularly important for RNA-sequencing (RNA-seq), owing
to the need to accurately compare gene expression across different cell types, conditions
and timepoints. To illustrate this, consider an RNA-seq experiment that aims to identify
differentially expressed genes between two conditions (see the figure, part a).
At a minimum, both conditions should be normalized for library sizes to ensure that the
differences in gene counts are not simply the result of one sample being sequenced at
greater depth (see the figure, part b).
However, consider a set of genes that is highly expressed in one condition but not in
another. Normalizing by library size alone will cause non-differentially expressed genes
to appear downregulated, whereas the genuine differences between the highly
expressed genes will be minimized (see the figure, part b). To address this problem, most
analyses assume that most genes are not differentially expressed between samples and
estimate a scaling factor based on differences in the mean gene expression120.
This assumption is violated when global differences in gene expression occur, such as
during transcriptional amplification in cancer121. In this case, the addition of RNA spike-in
controls in absolute amounts, such as in proportion to the number of cells in
a sample122, can act as constant scaling factors that detect and normalize the global shifts
in gene expression (see the figure, part c).
Although spike-in controls have been most widely used for normalization between
RNA-seq samples, they can similarly act as scaling factors in different NGS applications,
such as copy number variation between genomes77 or for normalizing microbial
communities in environmental samples123.
FN, false negative; FP, false positive.
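The contrast drawn in this box between total-count and spike-in normalization can be made concrete with a small numerical sketch. The counts are invented: one condition undergoes a genuine global (2x) increase in transcription while the spike-in, added in a constant amount, stays fixed:

```python
# Schematic sketch of Box 5 with invented counts: under global (2x)
# transcriptional amplification, total-count normalization masks the change
# while scaling by a constant spike-in recovers it.

def scale(counts, factor):
    return {g: n / factor for g, n in counts.items()}

cond1 = {"GeneA": 100, "GeneB": 200, "Spike": 50}
cond2 = {"GeneA": 200, "GeneB": 400, "Spike": 50}  # genuine 2x amplification

# Total-count normalization: divide each gene by the library size.
t1 = scale(cond1, sum(cond1.values()))
t2 = scale(cond2, sum(cond2.values()))
fold_total = t2["GeneA"] / t1["GeneA"]

# Spike-in normalization: divide by the spike-in count (constant input).
s1 = scale(cond1, cond1["Spike"])
s2 = scale(cond2, cond2["Spike"])
fold_spike = s2["GeneA"] / s1["GeneA"]

print(f"fold change (total-count): {fold_total:.2f}")  # amplification masked
print(f"fold change (spike-in):    {fold_spike:.2f}")  # 2x recovered
```

Total-count normalization divides the amplified condition by its larger library size, flattening a genuine two-fold change towards unity; the spike-in scaling factor, being independent of the transcriptome, preserves it.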
Continued technological innovation is expected
to generate more sophisticated synthetic controls and to
lead to more comprehensively characterized biological
materials. This relationship between technological inno-
vation and reference standards is reciprocal, as reference
standards will in turn inform the development and opti-
mization of new sequencing technologies. The continued
development of reference standards is a relatively simple
alternative approach to improving the accuracy, reliability
and standardization of clinical diagnosis, without requiring
further advances in NGS technologies. Accordingly, the
use of reference standards is likely to expand in step with
the broader implementation of NGS in clinical diagnosis
and our evolving understanding of genetic disease.
1. Yang,Y. etal. Clinical whole-exome sequencing for the
diagnosis of Mendelian disorders. N.Engl. J.Med.
369, 1502–1511 (2013).
2. Byron,S.A., Van Keuren-Jensen,K.R.,
Engelthaler,D.M., Carpten,J.D. & Craig,D.W.
Translating RNA sequencing into clinical diagnostics:
opportunities and challenges. Nat. Rev. Genet. 17,
257–271 (2016).
3. Lefterova,M.I., Suarez,C.J., Banaei,N. &
Pinsky,B.A. Next-generation sequencing for infectious
disease diagnosis and management. J.Mol. Diagn.
17, 623–634 (2015).
4. Goldfeder,R.L. etal. Medical implications of technical
accuracy in genome sequencing. Genome Med. 8, 24
(2016).
This study investigated the location of clinically
relevant variants in regions of the human genome
that are refractory to reliable genotyping with NGS
owing to the presence of extreme GC content or
repetitive sequences.
5. van Dijk,E.L., Jaszczyszyn,Y. & Thermes,C. Library
preparation methods for next-generation sequencing:
tone down the bias. Exp. Cell Res. 322, 12–20 (2014).
6. Mu,W., Lu,H.-M., Chen,J., Li,S. & Elliott,A.M.
Sanger confirmation is required to achieve optimal
sensitivity and specificity in next-generation sequencing
panel testing. J.Mol. Diagn. 18, 923–932 (2016).
7. Beck,T.F., Mullikin,J.C. & Biesecker,L.G. Systematic
evaluation of Sanger validation of next-generation
sequencing variants. Clin. Chem. 62, 647–654 (2016).
8. Matthijs,G. etal. Guidelines for diagnostic next-
generation sequencing. Eur. J.Hum. Genet. 24, 2–5
(2016).
9. Gargis,A.S., Kalman,L. & Lubin,I.M. Assuring the
quality of next-generation sequencing in clinical
microbiology and public health laboratories.
J.Clin. Microbiol. 54, 2857–2865 (2016).
10. Gargis,A.S. etal. Good laboratory practice for clinical
next-generation sequencing informatics pipelines.
Nat. Biotechnol. 33, 689–693 (2015).
11. Aziz,N. etal. College of American Pathologists’
laboratory standards for next-generation sequencing
clinical tests. Arch. Pathol. Lab. Med. 139, 481–493
(2015).
12. Rehm,H.L. etal. ACMG clinical laboratory standards
for next-generation sequencing. Genet. Med. 15,
733–747 (2013).
13. Schrijver,I. etal. Opportunities and challenges
associated with clinical diagnostic genome sequencing.
J.Mol. Diagn. 14, 525–540 (2012).
14. Gargis,A.S. etal. Assuring the quality of next-
generation sequencing in clinical laboratory practice.
Nat. Biotechnol. 30, 1033–1036 (2012).
The Nex-StoCT (Next-generation Sequencing:
Standardization of Clinical Testing) workgroup
developed a set of guidelines to ensure that results
from NGS tests are sufficiently reliable for clinical
diagnosis, including the recommendation of
reference standards for test validation, quality
control and proficiency testing.
Table 1 | Overview of the main types of reference standards for next-generation sequencing

DNA biological materials
Examples: GeT-RM patient cell lines44–51; NA12878 reference genome34; additional GIAB reference genomes38; cancer reference genomes54–56; NIST microbial reference genomes67; metagenome mock communities69,70,72.
Advantages: readily commutable with patient samples; performance can be assessed across the whole genome, encompassing millions of DNA variants (BOX2); agnostic to different NGS technologies; transformed cell lines can provide a renewable source of gDNA.
Disadvantages: limited representation of clinically relevant variants or sequences; consent and privacy considerations (for human samples); difficult to achieve comprehensive characterization; diagnostic performance based on high-confidence calls that overestimate accuracy over the whole genome; cell lines may suffer from genomic instability and drift; cannot be added to samples without the risk of contaminating downstream analysis.

Engineered cell lines
Examples: Horizon Diagnostics52.
Advantages: contain mutations of interest in the context of the complete genome; readily commutable with patient samples.
Disadvantages: external process control that cannot be added to samples; engineering can cause off-target mutations and unintended effects.

RNA biological materials
Examples: SEQC and ABRF reference samples59,63; BCR-ABL1 mRNA standard64.
Advantages: readily commutable with patient samples (subject to RNA degradation); agnostic to evolving NGS technologies.
Disadvantages: RNA must be non-renewably stocked from a single batch owing to batch-specific variation; absence of comprehensive transcriptome annotation.

Spike-in controls
Examples: synthetic germline78 and somatic74 mutation reference panels; DNA variant spike-ins77; ERCC controls86,87; spliced RNA spike-ins81–83; fusion gene spike-ins85; small RNA spike-ins84.
Advantages: internal controls can assess library preparation, sequencing and downstream analysis; represent any variant or sequence of interest (constrained only by limits of synthesis); establish reference ladders to evaluate NGS quantitative accuracy (BOX4); scaling factors for normalization between multiple samples (BOX5).
Disadvantages: must first establish commutability with biological samples (BOX1); do not represent the full size and complexity of the genome or transcriptome; require a small fraction (typically <5%) of reads in the library.

In silico data sets
Examples: simulated or real FASTQ files91,92,102,105.
Advantages: data sets can be readily edited to include any desired genetic feature; in silico proficiency testing programmes can standardize bioinformatic best practices.
Disadvantages: only able to assess bioinformatic analysis; difficult to model the complex and variable sources of variation present in experimental data.

ABRF, Association of Biomolecular Resource Facilities; ERCC, External RNA Controls Consortium; gDNA, genomic DNA; GeT-RM, Genetic Testing Reference Materials Coordination Program; GIAB, Genome in a Bottle Consortium; NGS, next-generation sequencing; NIST, National Institute of Standards and Technology; SEQC, Sequencing Quality Control. Table adapted with permission from REF.14, Macmillan Publishers Limited.
15. Centers for Disease Control and Prevention. Good
laboratory practices for molecular genetic testing for
heritable diseases and conditions. MMWR Recomm.
Rep. 58, 1–29 (2009).
16. Chen,B. etal. Developing a sustainable process to
provide quality control materials for genetic testing.
Genet. Med. 7, 534–549 (2005).
17. Greg Miller,W. etal. Roadmap for harmonization of
clinical laboratory measurement procedures.
Clin. Chem. 57, 1108–1117 (2011).
18. Franzini,C. & Ceriotti,F. Impact of reference materials
on accuracy in clinical chemistry. Clin. Biochem. 31,
449–457 (1998).
19. Radin,N. What is a standard? Clin. Chem. 13, 55–76
(1967).
20. International Organization for Standardization. ISO
Guide 30:2015 — Reference Materials — Selected
Terms and Definitions (ISO, 2015).
21. Bunk,D.M. Reference materials and reference
measurement procedures: an overview from a national
metrology institute. Clin. Biochem. Rev. 28, 131–137
(2007).
22. Vesper,H.W., Miller,W.G. & Myers,G.L. Reference
materials and commutability. Clin. Biochem. Rev. 28,
139–147 (2007).
23. Miller,W.G., Myers,G.L. & Rej,R. Why commutability
matters. Clin. Chem. 52, 553–554 (2006).
24. Sims,D., Sudbery,I., Ilott,N.E., Heger,A. &
Ponting,C.P. Sequencing depth and coverage: key
considerations in genomic analyses. Nat. Rev. Genet.
15, 121–132 (2014).
25. Chen,L., Liu,P., Evans,T.C. & Ettwiller,L.M. DNA
damage is a pervasive cause of sequencing errors,
directly confounding variant identification.
Science 355, 752–756 (2017).
26. Zook,J.M., Samarov,D., McDaniel,J., Sen,S.K. &
Salit,M. Synthetic spike-in standards improve run-
specific systematic error analysis for DNA and RNA
sequencing. PLoS ONE 7, e41356 (2012).
27. White,G.H. & Farrance,I. Uncertainty of
measurement in quantitative medical testing:
a laboratory implementation guide. Clin. Biochem.
Rev. 25, S1–S24 (2004).
28. Ross,M.G. etal. Characterizing and measuring bias
in sequence data. Genome Biol. 14, R51 (2013).
29. O’Rawe,J. etal. Low concordance of multiple variant-
calling pipelines: practical implications for exome and
genome sequencing. Genome Med. 5, 28 (2013).
30. Reumers,J. etal. Optimized filtering reduces the error
rate in detecting genomic variants by short-read
sequencing. Nat. Biotechnol. 30, 61–68 (2012).
31. Lam,H.Y.K. etal. Performance comparison of whole-
genome sequencing platforms. Nat. Biotechnol. 30,
78–82 (2012).
32. Church,D.M. etal. Extending reference assembly
models. Genome Biol. 16, 13 (2015).
33. Torsvik,A. etal. U-251 revisited: genetic drift and
phenotypic consequences of long-term cultures of
glioblastoma cells. Cancer Med. 3, 812–824 (2014).
34. Zook,J.M. etal. Integrating human sequence data
sets provides a resource of benchmark SNP and indel
genotype calls. Nat. Biotechnol. 32, 246–251 (2014).
The Genome in a Bottle Consortium used a range
of NGS technologies and analytical tools to
characterize the NA12878 genome and to provide
a set of high-confidence genotypes that can be
used to benchmark germline variant-calling
pipelines.
35. Parikh,H. etal. svclassify: a method to establish
benchmark structural variant calls. BMC Genomics
17, 64 (2016).
36. Eberle,M.A. etal. A reference data set of 5.4 million
phased human variants validated by genetic
inheritance from sequencing a three-generation
17-member pedigree. Genome Res. 27, 157–164
(2017).
37. Linderman,M.D. etal. Analytical validation of whole
exome and whole genome sequencing for clinical
applications. BMC Med. Genomics 7, 20 (2014).
38. Zook,J.M. etal. Extensive sequencing of seven
human genomes to characterize benchmark reference
materials. Sci. Data 3, 160025 (2016).
39. Ball,M.P. etal. A public resource facilitating clinical
use of genomes. Proc. Natl Acad. Sci. USA 109,
11920–11927 (2012).
40. Seo,J.-S. etal. De novo assembly and phasing of
a Korean human genome. Nature 538, 243–247
(2016).
41. Gudbjartsson,D.F. etal. Large-scale whole-genome
sequencing of the Icelandic population. Nat. Genet.
47, 435–444 (2015).
42. Cao,H. etal. De novo assembly of a haplotype-
resolved human genome. Nat. Biotechnol. 33,
617–622 (2015).
43. Besenbacher,S. etal. Novel variation and denovo
mutation rates in population-wide denovo assembled
Danish trios. Nat. Commun. 6, 5969 (2015).
44. Kalman,L.V. etal. Development of a genomic DNA
reference material panel for Rett syndrome
(MECP2-related disorders) genetic testing.
J.Mol. Diagn. 16, 273–279 (2014).
45. Kalman,L. etal. Development of a genomic DNA
reference material panel for myotonic dystrophy
type1 (DM1) genetic testing. J.Mol. Diagn. 15,
518–525 (2013).
46. Kalman,L. etal. Quality assurance for Duchenne and
Becker muscular dystrophy genetic testing.
J.Mol. Diagn. 13, 167–174 (2011).
47. Pratt,V.M. etal. Development of genomic reference
materials for cystic fibrosis genetic testing.
J.Mol. Diagn. 11, 186–193 (2009).
48. Amos Wilson,J. etal. Consensus characterization of
16 FMR1 reference materials: a consortium study.
J.Mol. Diagn. 10, 2–12 (2008).
49. Kalman,L. etal. Development of genomic reference
materials for Huntington disease genetic testing.
Genet. Med. 9, 719–723 (2007).
50. Pratt,V.M. etal. Characterization of 137 genomic
DNA reference materials for 28 pharmacogenetic
genes. J.Mol. Diagn. 18, 109–123 (2016).
This paper illustrates the process undertaken by
GeT-RM to develop reference materials for genetic
testing, including characterization by multiple
laboratories and subsequent consensus verification
of genotypes.
51. Pratt,V.M. etal. Characterization of 107 genomic
DNA reference materials for CYP2D6, CYP2C19,
CYP2C9, VKORC1, and UGT1A1: a GeT-RM and
Association for Molecular Pathology collaborative
project. J.Mol. Diagn. 12, 835–846 (2010).
52. Tsongalis,G.J. etal. Routine use of the Ion Torrent
AmpliSeq Cancer Hotspot Panel for identification of
clinically actionable somatic mutations.
Clin. Chem. Lab. Med. 52, 707 (2014).
53. Jarvis,M. etal. A novel method for creating artificial
mutant samples for performance evaluation and
quality control in clinical molecular genetics.
J.Mol. Diagn. 7, 247–251 (2005).
54. Craig,D.W. etal. A somatic reference standard for
cancer genome sequencing. Sci. Rep. 6, 24607 (2016).
55. Griffith,M. etal. Optimizing cancer genome
sequencing and analysis. Cell Syst. 1, 210–223
(2015).
This characterization of matched tumour and
normal samples shows the requirement for deep
sequencing to reveal the diversity of somatic
mutations and subclonal populations, with the
resulting data providing a useful resource for the
bioinformatic analysis of tumour samples.
56. Pleasance,E.D. etal. A comprehensive catalogue of
somatic mutations from a human cancer genome.
Nature 463, 191–196 (2010).
57. Zook,J.M. & Salit,M. Advancing benchmarks for
genome sequencing. Cell Syst. 1, 176–177 (2015).
58. Denroche,R.E. etal. A cancer cell-line titration series
for evaluating somatic classification. BMC Res. Notes
8, 823 (2015).
59. SEQC/MAQC-III Consortium. A comprehensive
assessment of RNA-seq accuracy, reproducibility and
information content by the Sequencing Quality Control
Consortium. Nat. Biotechnol. 32, 903–914 (2014).
This is a comprehensive study of RNA-seq accuracy
and reproducibility across multiple sequencing
platforms and laboratory sites, using human
reference RNA samples spiked with the ERCC
controls.
60. Conesa,A. etal. A survey of best practices for RNA-
seq data analysis. Genome Biol. 17, 13 (2016).
61. Novoradovskaya,N. etal. Universal Reference RNA as
a standard for microarray experiments.
BMC Genomics 5, 20 (2004).
62. ‘t Hoen,P.A.C. etal. Reproducibility of high-
throughput mRNA and small RNA sequencing across
laboratories. Nat. Biotechnol. 31, 1015–1022 (2013).
63. Li,S. etal. Multi-platform assessment of
transcriptome profiling using RNA-seq in the ABRF
next-generation sequencing study. Nat. Biotechnol.
32, 915–925 (2014).
64. White,H.E. etal. Establishment of the first World
Health Organization International Genetic Reference
Panel for quantitation of BCR-ABL mRNA. Blood 116 ,
e111–e117 (2010).
65. Escobar-Zepeda,A., Vera-Ponce de León,A. &
Sanchez-Flores,A. The road to metagenomics: from
microbiology to DNA sequencing technologies and
bioinformatics. Front. Genet. 6, 348 (2015).
66. Brown,C.T. etal. Unusual biology across a group
comprising more than 15% of domain bacteria.
Nature 523, 208–211 (2015).
67. Olson,N.D. etal. Best practices for evaluating single
nucleotide variant calling methods for microbial
genomics. Front. Genet. 6, 235 (2015).
68. Parada,A.E., Needham,D.M. & Fuhrman,J.A. Every
base matters: assessing small subunit rRNA primers
for marine microbiomes with mock communities, time
series and global field samples. Environ. Microbiol.
18, 1403–1414 (2016).
69. The Human Microbiome Project Consortium.
A framework for human microbiome research.
Nature 486, 215–221 (2012).
70. Jumpstart Consortium Human Microbiome Project
Data Generation Working Group. Evaluation of 16S
rDNA-based community profiling for human
microbiome research. PLoS ONE 7, e39315 (2012).
The Human Microbiome Project developed a mock
community of microbes commonly found on or in
the human body, which has been used to
benchmark metagenome sequencing and analysis.
71. Sinha,R., Abnet,C.C., White,O., Knight,R. &
Huttenhower,C. The microbiome quality control
project: baseline study design and future directions.
Genome Biol. 16, 276 (2015).
72. Singer,E. etal. High-resolution phylogenetic microbial
community profiling. ISME J. 10, 2020–2032 (2016).
73. The External RNA Controls Consortium. The External RNA Controls Consortium: a progress report. Nat. Methods 2, 731–734 (2005).
74. Sims, D. J. et al. Plasmid-based materials as multiplex quality controls and calibrators for clinical next-generation sequencing assays. J. Mol. Diagn. 18, 336–349 (2016).
75. Quail, M. A. et al. SASI-Seq: sample assurance spike-ins, and highly differentiating 384 barcoding for Illumina sequencing. BMC Genomics 15, 110 (2014).
76. Strom, C. M. et al. Technical validation of a multiplex platform to detect thirty mutations in eight genetic diseases prevalent in individuals of Ashkenazi Jewish descent. Genet. Med. 7, 633–639 (2005).
77. Deveson, I. W. et al. Representing genetic variation with synthetic DNA standards. Nat. Methods 13, 784–791 (2016).
This study presents a set of synthetic spike-in controls representing DNA variants (SNVs, indels and structural variants), which can function as qualitative and quantitative controls for genome sequencing.
78. Kudalkar, E. M. et al. Multiplexed reference materials as controls for diagnostic next-generation sequencing. J. Mol. Diagn. 18, 882–889 (2016).
79. The External RNA Controls Consortium. Proposed methods for testing and selecting the ERCC external RNA controls. BMC Genomics 6, 150 (2005).
80. Cronin, M. et al. Universal RNA reference materials for gene expression. Clin. Chem. 50, 1464–1471 (2004).
81. Paul, L. et al. SIRVs: Spike-In RNA Variants as external isoform controls in RNA-sequencing. Preprint at bioRxiv http://dx.doi.org/10.1101/080747 (2016).
82. Leshkowitz, D. et al. Using synthetic mouse spike-in transcripts to evaluate RNA-seq analysis tools. PLoS ONE 11, e0153782 (2016).
83. Hardwick, S. A. et al. Spliced synthetic genes as internal controls in RNA sequencing experiments. Nat. Methods 13, 792–798 (2016).
84. Locati, M. D. et al. Improving small RNA-seq by using a synthetic spike-in set for size-range quality control together with a set for data normalization. Nucleic Acids Res. 43, e89 (2015).
85. Tembe, W. D. et al. Open-access synthetic spike-in mRNA-seq data for cancer gene fusions. BMC Genomics 15, 824 (2014).
86. Jiang, L. et al. Synthetic spike-in standards for RNA-seq experiments. Genome Res. 21, 1543–1551 (2011).
This study used the ERCC controls to measure the sensitivity, dynamic range, quantitative accuracy and biases of RNA-seq experiments.
87. Munro, S. A. et al. Assessing technical performance in differential gene expression experiments with external spike-in RNA control ratio mixtures. Nat. Commun. 5, 5125 (2014).
88. Stegle, O., Teichmann, S. A. & Marioni, J. C. Computational and analytical challenges in single-cell transcriptomics. Nat. Rev. Genet. 16, 133–145 (2015).
NATURE REVIEWS GENETICS | VOLUME 18 | AUGUST 2017 | 483
89. Brennecke,P. etal. Accounting for technical noise in
single-cell RNA-seq experiments. Nat. Methods 10,
1093–1095 (2013).
90. Owens,N.D.L. etal. Measuring absolute RNA copy
numbers at high temporal resolution reveals
transcriptome kinetics in development. Cell Rep. 14,
632–647 (2016).
91. Ewing,A.D. etal. Combining tumor genome
simulation with crowdsourcing to benchmark somatic
single-nucleotide-variant detection. Nat. Methods 12,
623–630 (2015).
92. Daber,R., Sukhadia,S. & Morrissette,J.J.D.
Understanding the limitations of next generation
sequencing informatics, an approach to clinical
pipeline validation using artificial data sets. Cancer
Genet. 206, 441–448 (2014).
93. Escalona,M., Rocha,S. & Posada,D. A comparison of
tools for the simulation of genomic next-generation
sequencing data. Nat. Rev. Genet. 17, 459–469 (2016).
94. Engstrom,P.G. etal. Systematic evaluation of spliced
alignment programs for RNA-seq data. Nat. Methods
10, 1185–1191 (2013).
95. Patro,R., Duggal,G., Love,M.I., Irizarry,R.A. &
Kingsford,C. Salmon provides fast and bias-aware
quantification of transcript expression. Nat. Methods
14, 417–419 (2017).
96. Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L.
Near-optimal probabilistic RNA-seq quantification.
Nat. Biotechnol. 34, 525–527 (2016).
97. Sheridan,C. Milestone approval lifts Illumina’s NGS
from research into clinic. Nat. Biotechnol. 32,
111–112 (2014).
98. Centers for Medicare and Medicaid Services. US
Department of Health and Human Services. Part 493
— Laboratory Requirements: Clinical Laboratory
Improvement Amendments of 1988. 42 CFR
§493.1443–1495 https://www.cdc.gov/clia/
Regulatory/default.aspx
99. Richards,C.S. & Grody,W.W. Alternative approaches
to proficiency testing in molecular genetics.
Clin. Chem. 49, 717–718 (2003).
100. Schrijver,I. etal. Methods-based proficiency testing in
molecular genetic pathology. J.Mol. Diagn. 16,
283–287 (2014).
101. Richards,C.S., Palomaki,G.E., Lacbawan,F.L., Lyon,E.
& Feldman,G.L. Three-year experience of a CAP/ACMG
methods-based external proficiency testing program for
laboratories offering DNA sequencing for rare inherited
disorders. Genet. Med. 16, 25–32 (2014).
102. Duncavage,E.J. etal. A model study of insilico
proficiency testing for clinical next-generation
sequencing. Arch. Pathol. Lab. Med. 140,
1085–1091 (2016).
103. Tang,W., Hu,Z., Muallem,H. & Gulley,M.L. Quality
assurance of RNA expression profiling in clinical
laboratories. J.Mol. Diagn. 14, 1–11 (2012).
104. Duncavage,E.J., Abel,H.J. & Pfeifer,J.D. In silico
proficiency testing for clinical next-generation
sequencing. J.Mol. Diagn. 19, 35–42 (2017).
105. Davies,K.D. etal. Multi-institutional FASTQ file
exchange as a means of proficiency testing for next-
generation sequencing bioinformatics and variant
interpretation. J.Mol. Diagn. 18, 572–579 (2016).
106. Altman,R.B. etal. A research roadmap for next-
generation sequencing informatics. Sci. Transl Med. 8,
335ps10 (2016).
107. Risso,D., Ngai,J., Speed,T.P. & Dudoit,S.
Normalization of RNA-seq data using factor analysis of
control genes or samples. Nat. Biotechnol. 32,
896–902 (2014).
These authors developed a normalization strategy
for RNA-seq termed RUV (remove unwanted
variation), which adjusts for nuisance technical
effects between samples by performing factor
analysis on suitable sets of control genes (for
example, RNA spike-ins).
108. Poplin,R. etal. Creating a universal SNP and small
indel variant caller with deep neural networks.
Preprint at bioRxiv http://dx.doi.org/10.1101/092890
(2016).
109. Zheng,G.X.Y. etal. Haplotyping germline and
cancer genomes with high-throughput linked-read
sequencing. Nat. Biotechnol. 34, 303–311 (2016).
110 . Singh,R.R. etal. Clinical validation of a next-
generation sequencing screen for mutational hotspots
in 46 cancer-related genes. J.Mol. Diagn. 15,
607–622 (2013).
111. Svensson,V. etal. Power analysis of single-cell RNA-
sequencing experiments. Nat. Methods 14, 381–387
(2017).
112 . Franzini,C. Commutability of reference materials in
clinical chemistry. J.Int. Fed. Clin. Chem. 5, 169–173
(1993).
113 . Lever,J., Krzywinski,M. & Altman,N. Points of
significance: classification evaluation. Nat. Methods
13, 603–604 (2016).
114 . Telenti,A. etal. Deep sequencing of 10,000 human
genomes. Proc. Natl Acad. Sci. USA 113 ,
11901–11906 (2016).
115 . Nielsen,R., Paul,J.S., Albrechtsen,A. & Song,Y.S.
Genotype and SNP calling from next-generation
sequencing data. Nat. Rev. Genet. 12, 443–451 (2011).
116 . Cibulskis,K. etal. Sensitive detection of somatic point
mutations in impure and heterogeneous cancer
samples. Nat. Biotechnol. 31, 213–219 (2013).
117 . Saito,T. & Rehmsmeier,M. The precision-recall plot is
more informative than the ROC plot when evaluating
binary classifiers on imbalanced datasets. PLoS ONE
10, e0118432 (2015).
118 . Armbruster,D.A. & Pry,T. Limit of blank, limit of
detection and limit of quantitation. Clin. Biochem. Rev.
29, S49–S52 (2008).
119 . Altman,N. & Krzywinski,M. Points of significance:
simple linear regression. Nat. Methods 12,
999–1000 (2015).
120. Robinson,M.D. & Oshlack,A. A scaling normalization
method for differential expression analysis of RNA-seq
data. Genome Biol. 11, R25 (2010).
121. Lin,C.Y. etal. Transcriptional amplification in
tumor cells with elevated c-Myc. Cell 151, 56–67
(2012).
122. Lovén,J. etal. Revisiting global gene expression
analysis. Cell 151, 476–482 (2012).
123. Stämmler,F. etal. Adjusting microbiome profiles for
differences in microbial load by spike-in bacteria.
Microbiome 4, 28 (2016).
Acknowledgements
The authors thank the following funding sources: Australian National Health and Medical Research Council (NHMRC) Australia Fellowship 1062470 (to T.R.M.). S.A.H. and I.W.D. are supported by Australian Postgraduate Award scholarships. The contents of the published material are solely the responsibility of the administering institution, a participating institution or individual authors and do not reflect the views of the NHMRC. The authors also thank L. Burnett (Kinghorn Centre for Clinical Genomics, Australia) for helpful suggestions during manuscript preparation.
Competing interests statement
The authors declare competing interests: see Web version for
details.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
FURTHER INFORMATION
College of American Pathologists (CAP) proficiency testing: http://www.cap.org/web/home/lab/proficiency-testing
External RNA Controls Consortium (ERCC): http://jimb.stanford.edu/ercc/
FDA dAtabase for Regulatory Grade micrObial Sequences (FDA-ARGOS): https://www.ncbi.nlm.nih.gov/bioproject/231221
Genetic Testing Reference Materials Coordination Program (GeT-RM): https://wwwn.cdc.gov/clia/Resources/GETRM/default.aspx
Genome in a Bottle Consortium (GIAB): http://jimb.stanford.edu/giab/
Global Microbial Identifier (GMI) proficiency tests: http://www.globalmicrobialidentifier.org/workgroups/about-the-gmi-proficiency-tests
PrecisionFDA: https://precision.fda.gov/
Sequins: http://www.sequin.xyz/
ALL LINKS ARE ACTIVE IN THE ONLINE PDF
REVIEWS
484
|
AUGUST 2017
|
VOLUME 18 www.nature.com/nrg
... Reference materials with known characteristics have crucial roles in sequencing by providing standards for quality control and proficiency testing (Hardwick et al., 2017). The impacts of the common technical variables during the generation of metagenomic datasets, such as extraction kit, skills of operating person, extraction batch, library preparation batch, and even different sequencing platforms, cause difficulty in discerning confident biological signals, especially for longitudinal time-series studies or inter-laboratory comparison. ...
Article
Full-text available
Antimicrobial resistance (AMR) poses an increasing threat to global health. To deliberate the distribution and transmission of environmental resistance in a wide geographic and longitudinal scope, standardization of the analytical methods is desperately required for the extensive implementation of large-scale environmental antibiotic resistance gene (ARGs) surveillance. In this review, a standardized surveillance method using metagenomic analysis, coupled with proper quantification tools and environmental reference materials as technical benchmarks, was established to facilitate the generation of comparable and informative resistome data-sets. As global and long-term ARGs surveillance has recently been performed in various environmental compartments, increasing efforts are also needed for assessing the health risks of ARGs. The development of risk assessment schemes that incorporate factors including transfer potential , host species, viability, and absolute quantification is essential to the regulatory guidelines for high-risk priority ARGs. This review provides guidance to ARGs surveillance regarding the level and the risk of ARG exposure, especially to identify and address critical hotspots.
... ES and GS offer comprehensive genomic analysis, they may necessitate supplementary measures to enhance coverage in regions with low mappability, as achieved in MGP through complementary methods like Sanger sequencing and qPCR, augmenting depth and coverage [5][6][7][8]. Despite potentially lower coverage compared to MGP, EGBP, which use in silico target selection, presents an adaptable alternative, characterized by its ability to swiftly modify gene content and expedite analysis, which is particularly advantageous in the evolving domain of genetics [4]. ...
Article
Full-text available
Background Though next-generation sequencing (NGS) tests like exome sequencing (ES), genome sequencing (GS), and panels derived from exome and genome data (EGBP) are effective for rare diseases, the ideal diagnostic approach is debated. Limited research has explored reanalyzing raw ES and GS data post-negative EGBP results for diagnostics. Results: We analyzed complete ES/GS raw sequencing data from Mayo Clinic's Program for Rare and Undiagnosed Diseases (PRaUD) patients to assess whether supplementary findings could augment diagnostic yield. ES data from 80 patients (59 adults) and GS data from 20 patients (10 adults), averaging 43 years in age, were analyzed. Most patients had renal ( n =44) and auto-inflammatory ( n =29) phenotypes. Ninety-six cases had negative findings and in four cases additional genetic variants were found, including a variant related to a recently described disease (RRAGD-related hypomagnesemia), a variant missed due to discordant inheritance pattern ( COL4A3 ), a variant with high allelic frequency ( NPHS2 ) in the general population, and a variant associated with an initially untargeted phenotype ( HNF1A ). Conclusion: ES and GS show diagnostic yields comparable to EGBP for single-system diseases. However, EGBP's limitations in detecting new disease-associated genes underscore the necessity for periodic updates.
... Source data are provided in a Source Data File. proteomic technologies 8 . Well-characterised human genomes, such as NA12878, have been used to improve the accuracy and reliability of genome sequencing. ...
Article
Full-text available
The expression of genes encompasses their transcription into mRNA followed by translation into protein. In recent years, next-generation sequencing and mass spectrometry methods have profiled DNA, RNA and protein abundance in cells. However, there are currently no reference standards that are compatible across these genomic, transcriptomic and proteomic methods, and provide an integrated measure of gene expression. Here, we use synthetic biology principles to engineer a multi-omics control, termed pREF, that can act as a universal molecular standard for next-generation sequencing and mass spectrometry methods. The pREF sequence encodes 21 synthetic genes that can be in vitro transcribed into spike-in mRNA controls, and in vitro translated to generate matched protein controls. The synthetic genes provide qualitative controls that can measure sensitivity and quantitative accuracy of DNA, RNA and peptide detection. We demonstrate the use of pREF in metagenome DNA sequencing and RNA sequencing experiments and evaluate the quantification of proteins using mass spectrometry. Unlike previous spike-in controls, pREF can be independently propagated and the synthetic mRNA and protein controls can be sustainably prepared by recipient laboratories using common molecular biology techniques. Together, this provides a universal synthetic standard able to integrate genomic, transcriptomic and proteomic methods.
... Moreover, variations in DNA sheer sizes and the diversity of human genome material within clinical samples further affect the interpretation of the sequencing results. These factors collectively contribute to variations in the positive predictive value (PPV) and negative predictive value (NPV) of molecular assays in clinical diagnosis [13][14][15]. ...
Article
Full-text available
Metagenomic next-generation sequencing (mNGS) provides considerable advantages in identifying emerging and re-emerging, difficult-to-detect and co-infected pathogens; however, the clinical application of mNGS remains limited primarily due to the lack of quantitative capabilities. This study introduces a novel approach, KingCreate-Quantification (KCQ) system, for quantitative analysis of microbes in clinical specimens by mNGS, which co-sequence the target DNA extracted from the specimens along with a set of synthetic dsDNA molecules used as Internal-Standard (IS). The assay facilitates the conversion of microbial reads into their copy numbers based on IS reads utilizing a mathematical model proposed in this study. The performance of KCQ was systemically evaluated using commercial mock microbes with varying IS input amounts, different proportions of human genomic DNA, and at varying amounts of sequence analysis data. Subsequently, KCQ was applied in microbial quantitation in 36 clinical specimens including blood, bronchoalveolar lavage fluid , cerebrospinal fluid and oropharyngeal swabs. A total of 477 microbe genetic fragments were screened using the bioinformatic system. Of these 83 fragments were quantitatively compared with digital droplet PCR (ddPCR), revealing a correlation coefficient of 0.97 between the quantitative results of KCQ and ddPCR. Our study demonstrated that KCQ presents a practical approach for the quantitative analysis of microbes by mNGS in clinical samples.
... Low-coverage (or low-pass) whole-genome next-generation sequencing (NGS) is a low-cost technique with a short turnaround time, unprecedented resolution, reliable high-throughput, and minimal DNA requirements. It has been widely used in clinics [6]. Compared to CMA, NGS has significant advantages in terms of quality, speed, and affordability [7][8][9]. ...
Article
Full-text available
Purpose We evaluated the value of copy number variation sequencing (CNV-seq) and quantitative fluorescence (QF)-PCR for analyzing chromosomal abnormalities (CA) in spontaneous abortion specimens. Methods A total of 650 products of conception (POCs) were collected from spontaneous abortion between April 2018 and May 2020. CNV-seq and QF-PCR were performed to determine the characteristics and frequencies of copy number variants (CNVs) with clinical significance. The clinical features of the patients were recorded. Results Clinically significant chromosomal abnormalities were identified in 355 (54.6%) POCs, of which 217 (33.4%) were autosomal trisomies, 42(6.5%) were chromosomal monosomies and 40 (6.2%) were pathogenic CNVs (pCNVs). Chromosomal trisomy occurs mainly on chromosomes 15, 16, 18, 21and 22. Monosomy X was not associated with the maternal or gestational age. The frequency of chromosomal abnormalities in miscarriages from women with a normal live birth history was 55.3%; it was 54.4% from women without a normal live birth history (P > 0.05). There were no significant differences among women without, with 1, and with ≥ 2 previous miscarriages regarding the rate of chromosomal abnormalities (P > 0.05); CNVs were less frequently detected in women with advanced maternal age than in women aged ≤ 29 and 30–34 years (P < 0.05). Conclusion Chromosomal abnormalities are the most common cause of pregnancy loss, and maternal and gestational ages are strongly associated with fetal autosomal trisomy aberrations. Embryo chromosomal examination is recommended regardless of the gestational age, modes of conception or previous abortion status.
Preprint
Full-text available
A main limitation of bulk transcriptomic technologies is that individual measurements normally contain contributions from multiple cell populations, impeding the identification of cellular heterogeneity within diseased tissues. To extract cellular insights from existing large cohorts of bulk transcriptomic data, we present CSsingle, a novel method designed to accurately deconvolve bulk data into a predefined set of cell types using a scRNA-seq reference. Through comprehensive benchmark evaluations and analyses using diverse real data sets, we reveal the systematic bias inherent in existing methods, stemming from differences in cell size or library size. Our extensive experiments demonstrate that CSsingle exhibits superior accuracy and robustness compared to leading methods, particularly when dealing with bulk mixtures originating from cell types of markedly different cell sizes, as well as when handling bulk and single-cell reference data obtained from diverse sources. Our work provides an efficient and robust methodology for the integrated analysis of bulk and scRNA-seq data, facilitating various biological and clinical studies.
Article
Full-text available
High-throughput technologies for multiomics or molecular phenomics profiling have been extensively adopted in biomedical research and clinical applications, offering a more comprehensive understanding of biological processes and diseases. Omics reference materials play a pivotal role in ensuring the accuracy, reliability, and comparability of laboratory measurements and analyses. However, the current application of omics reference materials has revealed several issues, including inappropriate selection and underutilization, leading to inconsistencies across laboratories. This review aims to address these concerns by emphasizing the importance of well-characterized reference materials at each level of omics, encompassing (epi-)genomics, transcriptomics, proteomics, and metabolomics. By summarizing their characteristics, advantages, and limitations along with appropriate performance metrics pertinent to study purposes, we provide an overview of how omics reference materials can enhance data quality and data integration, thus fostering robust scientific investigations with omics technologies.
Article
Full-text available
The vast amount of available sequencing data allows the scientific community to explore different genetic alterations that may drive cancer or favor cancer progression. Software developers have proposed a myriad of predictive tools, allowing researchers and clinicians to compare and prioritize driver genes and mutations and their relative pathogenicity. However, there is little consensus on the computational approach or a golden standard for comparison. Hence, benchmarking the different tools depends highly on the input data, indicating that overfitting is still a massive problem. One of the solutions is to limit the scope and usage of specific tools. However, such limitations force researchers to walk on a tightrope between creating and using high-quality tools for a specific purpose and describing the complex alterations driving cancer. While the knowledge of cancer development increases daily, many bioinformatic pipelines rely on single nucleotide variants or alterations in a vacuum without accounting for cellular compartments, mutational burden or disease progression. Even within bioinformatics and computational cancer biology, the research fields work in silos, risking overlooking potential synergies or breakthroughs. Here, we provide an overview of databases and datasets for building or testing predictive cancer driver tools. Furthermore, we introduce predictive tools for driver genes, driver mutations, and the impact of these based on structural analysis. Additionally, we suggest and recommend directions in the field to avoid silo-research, moving towards integrative frameworks.
Preprint
Full-text available
Next-generation sequencing (NGS) is a rapidly evolving set of technologies that can be used to determine the sequence of an individual’s genome ¹ by calling genetic variants present in an individual using billions of short, errorful sequence reads ² . Despite more than a decade of effort and thousands of dedicated researchers, the hand-crafted and parameterized statistical models used for variant calling still produce thousands of errors and missed variants in each genome 3,4 . Here we show that a deep convolutional neural network ⁵ can call genetic variation in aligned next-generation sequencing read data by learning statistical relationships (likelihoods) between images of read pileups around putative variant sites and ground-truth genotype calls. This approach, called DeepVariant, outperforms existing tools, even winning the “highest performance” award for SNPs in a FDA-administered variant calling challenge. The learned model generalizes across genome builds and even to other mammalian species, allowing non-human sequencing projects to benefit from the wealth of human ground truth data. We further show that, unlike existing tools which perform well on only a specific technology, DeepVariant can learn to call variants in a variety of sequencing technologies and experimental designs, from deep whole genomes from 10X Genomics to Ion Ampliseq exomes. DeepVariant represents a significant step from expert-driven statistical modeling towards more automatic deep learning approaches for developing software to interpret biological instrumentation data.
Preprint
Full-text available
Spike-In RNA variants (SIRVs) enable for the first time the validation of RNA sequencing workflows using external isoform transcript controls. 69 transcripts, derived from seven human model genes, cover the eukaryotic transcriptome complexity of start- and end-site variations, alternative splicing, overlapping genes, and antisense transcription in a condensed format. Reference RNA samples were spiked with SIRV mixes, sequenced, and exemplarily four data evaluation pipelines were challenged to account for biases introduced by the RNA-Seq workflow. The deviations of the respective isoform quantifications from the known inputs allow to determine the comparability of sequencing experiments and to extrapolate to which degree alterations in an RNA-Seq workflow affect gene expression measurements. The SIRVs as external isoform controls are an important gauge for inter-experimental comparability and a modular spike-in contribution to clear the way for diagnostic RNA-Seq applications.
Article
Full-text available
Background Gestational diabetes mellitus (GDM) is increasing partly due to the obesity epidemic. Adipocytokines have thus been suggested as first trimester screening markers for GDM. In this study we explore the associations between body mass index (BMI) and serum concentrations of adiponectin, leptin, and the adiponectin/leptin ratio. Furthermore, we investigate whether these markers can improve the ability to screen for GDM in the first trimester. Methods A cohort study in which serum adiponectin and leptin were measured between gestational weeks 6+0 and 14+0 in 2590 pregnant women, categorized into normal weight, moderately obese, or severely obese. Results Lower concentrations of adiponectin were associated with GDM in all BMI groups; the association was more pronounced in BMI<35 kg/m Conclusions Low adiponectin measured in the first trimester is associated with the development of GDM; higher BMI was associated with lower performance of adiponectin, though this was insignificant. Leptin had an inverse relationship with GDM in severely obese women and did not improve the ability to predict GDM.
Article
When is a mutation a true genetic variant? Large-scale sequencing studies have set out to determine the low-frequency pathogenic genetic variants in individuals and populations. However, Chen et al. demonstrate that many so-called low-frequency genetic variants in large public databases may be due to DNA damage. They scored libraries sequenced with and without a DNA damage-repairing enzymatic mix to assess the proportion of true rare variants. It remains to be seen how best to repair DNA before sequencing to provide more accurate assessments of mutation. Science, this issue, p. 752
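The comparison Chen et al. describe can be illustrated with a toy sketch: call variants in a library sequenced as-is and in the same library treated with a damage-repair mix, and treat calls that vanish after repair as candidate artifacts. The variant identifiers below are invented, and this is not their scoring method, only the underlying set comparison.

```python
# Variant calls from the same library, with and without enzymatic
# damage repair before sequencing (hypothetical identifiers).
untreated = {"chr1:100A>T", "chr1:250C>A", "chr2:500G>T", "chr3:75C>A"}
repaired = {"chr1:100A>T", "chr3:75C>A"}  # calls surviving repair

# Calls absent after repair are candidate damage artifacts; oxidative
# damage (8-oxoG) classically produces spurious C>A / G>T calls.
artifacts = untreated - repaired
artifact_fraction = len(artifacts) / len(untreated)
print(sorted(artifacts), artifact_fraction)
```

In this toy example half of the low-frequency calls would be flagged as possible damage rather than true rare variants.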
Article
This paper reviews the definition and the use of standards and the standards available for clinical chemists, and discusses the limitations of various so-called standards.
Article
Background The prostate-specific antigen (PSA) test is of paramount importance as a diagnostic tool for the detection and monitoring of patients with prostate cancer. In the presence of interfering factors such as heterophilic antibodies or anti-PSA antibodies, the PSA test can yield significantly falsified results. The prevalence of these factors is unknown. Methods We determined the recovery of PSA concentrations by diluting patient samples with a standard serum of known PSA concentration. Based on the frequency distribution of recoveries in a pre-study of 268 samples, samples with recoveries <80% or >120% were defined as suspect, re-tested and further characterized to identify the cause of interference. Results A total of 1158 consecutive serum samples were analyzed. Four samples (0.3%) showed reproducibly disturbed recoveries of 10%, 68%, 166% and 4441%. In three samples heterophilic antibodies were identified as the probable cause; in the fourth, anti-PSA autoantibodies. The very low recovery caused by the latter interference was confirmed in serum, as well as in heparin and EDTA plasma of blood samples obtained 6 months later. Analysis by eight different immunoassays showed recoveries ranging between <10% and 80%. In a follow-up study of 212 random plasma samples we found seven samples with autoantibodies against PSA, which, however, did not show any disturbed PSA recovery. Conclusions About 0.3% of PSA determinations by the electrochemiluminescence assay (ECLIA) of Roche Diagnostics are disturbed by heterophilic or anti-PSA autoantibodies. Although they are rare, these interferences can cause relevant misinterpretations of a PSA test result.
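The recovery check above reduces to simple arithmetic: mix patient sample and standard serum, compare the measured concentration with the expected mixture concentration, and flag samples outside the 80-120% window. A minimal sketch, with invented concentrations and an assumed 1:1 dilution:

```python
def recovery_percent(measured, patient_conc, standard_conc, dilution=0.5):
    """Recovery (%) for a mix of patient sample and standard serum.

    `dilution` is the volume fraction contributed by the patient sample
    (0.5 assumes a 1:1 mix; the study's exact protocol may differ).
    """
    expected = dilution * patient_conc + (1 - dilution) * standard_conc
    return 100.0 * measured / expected

def is_suspect(recovery, low=80.0, high=120.0):
    """Apply the <80% / >120% suspect window used in the study."""
    return recovery < low or recovery > high

# Hypothetical sample: 4.0 ng/ml patient PSA mixed 1:1 with a
# 2.0 ng/ml standard should read 3.0 ng/ml; it reads 2.1 ng/ml.
rec = recovery_percent(measured=2.1, patient_conc=4.0, standard_conc=2.0)
print(round(rec, 1), is_suspect(rec))  # 70.0 True -> re-test this sample
```

A reproducibly disturbed recovery like this would then prompt characterization for heterophilic or autoantibody interference.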
Article
Single-cell RNA sequencing (scRNA-seq) has become an established and powerful method to investigate transcriptomic cell-to-cell variation, thereby revealing new cell types and providing insights into developmental processes and transcriptional stochasticity. A key question is how the many available protocols compare in terms of their ability to detect and accurately quantify gene expression. Here, we assessed the protocol sensitivity and accuracy of many published data sets, on the basis of spike-in standards and uniform data processing. For our workflow, we developed a flexible tool for counting the number of unique molecular identifiers (https://github.com/vals/umis/). We compared 15 protocols computationally and 4 protocols experimentally for batch-matched cell populations, in addition to investigating the effects of spike-in molecular degradation. Our analysis provides an integrated framework for comparing scRNA-seq protocols.
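The core idea behind UMI counting, which the `umis` tool mentioned above performs at scale, is to collapse PCR duplicates by counting distinct molecular tags per gene rather than raw reads. A toy illustration with invented reads and tags (not the `umis` API itself):

```python
from collections import defaultdict

# (gene, UMI) pairs extracted from aligned reads; PCR duplicates of the
# same original molecule carry the same UMI (hypothetical data).
reads = [
    ("GAPDH", "AACGT"), ("GAPDH", "AACGT"), ("GAPDH", "TTGCA"),
    ("ACTB", "CCGTA"), ("ACTB", "CCGTA"), ("ACTB", "CCGTA"),
]

def count_umis(read_tags):
    """Collapse reads to unique molecular identifiers per gene."""
    umis = defaultdict(set)
    for gene, umi in read_tags:
        umis[gene].add(umi)
    return {gene: len(tags) for gene, tags in umis.items()}

# GAPDH: 3 reads but 2 molecules; ACTB: 3 reads but 1 molecule.
print(count_umis(reads))  # {'GAPDH': 2, 'ACTB': 1}
```

Counting molecules rather than reads removes amplification bias, which is why UMI counts are the preferred expression estimate in many scRNA-seq protocols.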
Article
We introduce Salmon, a lightweight method for quantifying transcript abundance from RNA-seq reads. Salmon combines a new dual-phase parallel inference algorithm and feature-rich bias models with an ultra-fast read-mapping procedure. It is the first transcriptome-wide quantifier to correct for fragment GC-content bias, which, as we demonstrate here, substantially improves the accuracy of abundance estimates and the sensitivity of subsequent differential expression analysis.
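Quantifiers such as Salmon typically report abundances in TPM (transcripts per million), a unit that normalizes counts by transcript length before scaling, so that values are comparable across transcripts and samples. A minimal sketch of that arithmetic (invented numbers, and deliberately ignoring the bias-correction machinery Salmon itself applies):

```python
def tpm(counts, lengths):
    """Length-normalize estimated read counts, then scale to parts per million."""
    rates = {t: counts[t] / lengths[t] for t in counts}
    total = sum(rates.values())
    return {t: 1e6 * r / total for t, r in rates.items()}

counts = {"tx1": 500, "tx2": 500}     # equal read counts...
lengths = {"tx1": 1000, "tx2": 2000}  # ...but tx2 is twice as long

result = tpm(counts, lengths)
# tx1 gets twice the TPM of tx2: longer transcripts attract more reads
# per molecule, and TPM corrects for that.
print({t: round(v) for t, v in result.items()})
```

By construction, TPM values sum to one million per sample, which is what makes relative abundances directly comparable between libraries of different depth.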