SNP typing using the HID-Ion AmpliSeq™ Identity Panel in a southern Chinese population

Abstract In the present study, 90 autosomal single nucleotide polymorphisms (SNPs) and 34 Y chromosomal SNPs were sequenced simultaneously using the HID-Ion AmpliSeq™ Identity Panel on the Ion PGM™ platform for 125 samples from a southern Chinese population. Raw data were analyzed and forensic parameters were calculated. Haplogrouping concordance was also assessed using alternative methods based on Y-SNP haplotypes and Y-STR haplotypes. The results showed that allelic imbalance occurred more frequently at low coverage, while several SNPs with high coverage also showed poor allelic balance, including rs214955, rs430046, rs7520386, rs876724, rs917118, rs16981290, and rs2032631. In total, 21,261 miscalled reads (0.28%) were observed. The rate of allele-specific miscalled reads (ASMRs) was higher than that of allele nonspecific miscalled reads (ANMRs) and was associated with the genetic diversity of the SNP. The ASMR rate of the major allele was lower than that of the minor allele, while there was no difference for ANMRs. The combined discrimination power (CDP) was 1 − 4.81 × 10⁻³⁴ and the combined power of exclusion (CPE) was 0.99989 and 0.99999992 for duo and trio paternity testing, respectively. No significant genetic difference was detected between southern and northern Chinese populations. In the haplogroup study, O2 was the predominant haplogroup and 97.01% of samples were assigned consistent haplogroups with Y-SNP and Y-STR haplotypes. In conclusion, the AmpliSeq™ Identity Panel was powerful for individual identification and trio paternity testing. ASMRs were associated with genetic diversity and allele frequency, whereas ANMRs were associated with neither. High concordance of haplogroup assignment can be obtained with Y-STR and Y-SNP haplotypes.
ORIGINAL ARTICLE
Ran Li¹ · Chuchu Zhang¹ · Haiyan Li² · Riga Wu¹ · Haixia Li¹ · Zhenya Tang² · Chenhao Zhen³ · Jianye Ge⁴ · Dan Peng¹ · Ying Wang¹ · Hongying Chen² · Hongyu Sun¹,⁵
Received: 19 March 2017 / Accepted: 11 October 2017 / Published online: 18 October 2017
© Springer-Verlag GmbH Germany 2017
Keywords Single nucleotide polymorphism (SNP) · Next generation sequencing (NGS) · Ion Torrent PGM™ · Population genetics · Miscalled reads
Introduction
Single nucleotide polymorphisms (SNPs), with lower mutation rates and smaller amplicon sizes than the routinely used short tandem repeats (STRs), are considered a potentially useful tool in forensic human identification [1, 2]. Because SNPs are di-allelic, the per-locus discrimination power is weaker than that of STRs, but this can be compensated for by typing additional independent loci [3]. Several autosomal SNP marker sets have been developed with various genotyping methods, including single-base extension, chip-based microarrays, and allele-specific hybridization arrays [1, 4–6]. However, because they either covered a small number of SNP loci in a single analysis or required a large amount of input DNA, these sets were not widely used by forensic DNA labs [7, 8].
Ran Li and Chuchu Zhang contributed equally to the article.
Electronic supplementary material The online version of this article (https://doi.org/10.1007/s00414-017-1706-3) contains supplementary material, which is available to authorized users.
* Hongyu Sun
sunhy@mail.sysu.edu.cn; sunhongyu2002@163.com
1 Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, People's Republic of China
2 The Center of Criminal Technology of Guangdong Province, Guangzhou 510050, People's Republic of China
3 The Second Clinical Medical School (Zhujiang Hospital), Southern Medical University, Guangzhou 510280, People's Republic of China
4 Thermo Fisher Scientific Inc, South San Francisco, CA 94080, USA
5 Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510089, Guangdong, People's Republic of China
Int J Legal Med (2018) 132:997–1006
https://doi.org/10.1007/s00414-017-1706-3
Recently, massively parallel sequencing (MPS), or next-generation sequencing (NGS), technologies, with acceptable sequencing accuracy and costs, have attracted great interest in the forensic genetics community. They make it possible to detect several hundred to several thousand markers (including different kinds of markers, e.g., SNPs and STRs) simultaneously and allow multiple samples to be processed in a joint sequencing run using sample-tagging DNA barcodes. Furthermore, detailed sequence information on the target regions can also be generated using this technology [9–11]. The Ion Torrent Personal Genome Machine (PGM) was launched in early 2011 and was the first commercial sequencing machine that does not require fluorescence and camera scanning, resulting in higher speed, lower cost, and smaller instrument size [10, 12]. In addition, a study by Elena et al. showed that, on this platform, it was possible to obtain consistent SNP profiles with 31 pg of DNA and partially informative profiles with as little as 5 pg or with severely degraded DNA [13]. The HID-Ion AmpliSeq™ Identity Panel (HID Identity Panel) released by Thermo Fisher Scientific co-amplifies 90 autosomal SNPs (A-SNPs) and 34 Y chromosomal SNPs (Y-SNPs), which were selected based on the studies of Pakstis et al. [5], Sanchez et al. [4], and Karafet et al. [14]. This panel has been reported to provide powerful capacity for personal identification [14].
Previous studies based on this panel were performed on relatively small samples, and no population data had been reported for Guangdong province in south China, especially for Y-SNP typing. Therefore, further exploration was conducted in the present study.
Materials and methods
Samples, DNA extraction, and DNA quantification
Peripheral blood samples from 101 male and 24 female unrelated individuals in Guangdong province in south China were collected with informed consent. DNA was extracted on the AutoMate Express™ Forensic DNA Extraction System (Thermo Fisher Scientific, MA, USA) with the PrepFiler Express BAT Forensic DNA Extraction Kit (Thermo Fisher). DNA extracts were quantified on the Qubit® 2.0 fluorometer (Thermo Fisher) using the Qubit® dsDNA HS Assay Kit (Thermo Fisher) according to the manufacturer's protocol. The study was approved by the Human Subjects Committee of Sun Yat-sen University (No. 2016-008).
Y-STR genotyping and haplogrouping
All of the male samples were genotyped using the AmpFLSTR® Yfiler™ PCR Amplification Kit (Thermo Fisher), and Y haplogroups were predicted using Whit Athey's Haplogroup Predictor (http://www.hprg.com/hapest5/index.html) [16, 17] with minimum fitness score = 20, minimum probability = 85%, and area priors = "equal priors".
Library preparation, purification, and quantification
Libraries were constructed using the Ion AmpliSeq™ Library Kit 2.0 and the Ion AmpliSeq™ Identity Panel v2.3 (Thermo Fisher) following the manufacturer's recommendations. A total of 1 ng of input DNA was amplified on the GeneAmp® 9700 System (Thermo Fisher) under the following thermal cycling conditions: 2 min at 99 °C, then 21 cycles of 15 s at 99 °C and 4 min at 60 °C, and a final hold at 10 °C. Then, 2 μL of FuPa reagent (Thermo Fisher) was added to digest excess PCR primers, and the reactions were incubated for 10 min at 50 °C, 10 min at 55 °C, and 20 min at 60 °C with a final hold at 10 °C. The libraries were barcoded using Ion Xpress™ barcode adapters (Thermo Fisher) with the following incubation steps: 22 °C for 30 min, 72 °C for 10 min, and a hold at 10 °C. Libraries were then purified using 1.5× Agencourt® AMPure® XP reagent (Beckman Coulter, FL, USA) according to the manufacturer's instructions. Purified libraries were quantified on the ABI 7500 Real-time PCR System with the Ion Library Quantitation Kit (Thermo Fisher) and subsequently diluted to 20 pM. All barcoded libraries were pooled in equal volumes.
Emulsion PCR and sequencing
Emulsion PCR (emPCR) was performed on the Ion OneTouch™ 2 (OT2) instrument (Thermo Fisher) with the Ion PGM™ Template OT2 200 Kit (Thermo Fisher), and template-positive Ion Sphere Particles (ISPs) were enriched on the Ion OneTouch™ ES instrument (Thermo Fisher). Sequencing was performed using the Ion PGM™ Sequencing 200 Kit v2 on Ion 314™ or 316™ chips (depending on the sample size) following the manufacturer's protocols.
Data analysis
Raw data were processed using the Ion Torrent Suite Server version 4.6 (Thermo Fisher). Homo sapiens hg19 was used as the reference genome for alignment. The HID_SNP_Genotyper plugin v.4.3.1 was run to genotype the SNPs with the germline low-stringency setting. This plugin was also used to generate comprehensive analysis reports, including CSV files containing detailed mapping, genotype, coverage, and quality check information for each sample in the run.
Statistics
The CSV files were further analyzed using Microsoft Excel 2010. The frequency of major allele reads (F_MAR) was adopted to assess allelic balance [16, 17]. Y-SNPs with no calls were re-genotyped by checking the data manually and making an allele call subject to a minimum coverage threshold and a minimum F_MAR of 50%. Base miscalling in autosomal SNP homozygotes and Y-SNPs was analyzed separately for allele-specific miscalled reads (ASMRs, defined as reads miscalled as the alternative locus-specific allele) and allele nonspecific miscalled reads (ANMRs, defined as reads miscalled as a base that is not a locus-specific allele). Cervus 3.0 [18] was employed to calculate allele frequency, observed and expected heterozygosity (H_obs and H_exp), matching probability (MP), discrimination power (DP), polymorphism information content (PIC), and exclusion probability for duo paternity testing (PE_duo) and trio paternity testing (PE_trio). Hardy-Weinberg equilibrium (HWE), linkage disequilibrium (LD), and F-statistics (Fst) were calculated using Arlequin 3.5 [19]. To compare allele frequency distributions between southern and northern Chinese Han populations, as well as populations from other countries and continents, frequency data of autosomal SNPs were downloaded from SPSmart (http://spsmart.cesga.es/) [20]. For Y-SNPs, the genetic diversity (GD) among individuals was calculated as $GD = 1 - \sum_i p_i^2$, where $p_i$ represents the frequency of the ith allele. Y-SNP haplotypes were manually counted and haplotype diversity (HD) was calculated as $HD = N\left(1 - \sum_i p_i^2\right) / (N - 1)$, where N represents the number of haplotypes and $p_i$ represents the frequency of the ith haplotype.
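The quantities defined in this section are simple enough to sketch in a few lines of Python. The block below is an illustrative sketch only, not the authors' Excel workflow: the function names and input layouts (a dict of per-allele read counts, a list of observed alleles or haplotypes) are assumptions, and N in the HD formula is taken here as the number of sampled haplotypes, one per male sample.

```python
from collections import Counter


def f_mar(read_counts):
    """Frequency of major allele reads (%) from per-allele read counts at one SNP."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads at this locus")
    return 100.0 * max(read_counts.values()) / total


def genetic_diversity(observations):
    """GD = 1 - sum(p_i^2) over the observed allele (or haplotype) frequencies."""
    counts = Counter(observations)
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def haplotype_diversity(haplotypes):
    """HD = N * (1 - sum(p_i^2)) / (N - 1).

    Assumption: N is the number of sampled haplotypes (one per individual).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    return n * genetic_diversity(haplotypes) / (n - 1)


# Hypothetical heterozygous call with 60 reads of A and 40 reads of G.
print(f_mar({"A": 60, "G": 40}))
# Hypothetical Y-SNP observed as C in three males and T in one.
print(genetic_diversity(["C", "C", "C", "T"]))
```

With this definition a perfectly balanced heterozygote has an F_MAR of 50% and a clean homozygote approaches 100%, which matches how the values are read in the Results.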
Y-SNP haplogrouping
Haplogroup assignment was determined according to the
International Society of Genetic Genealogy (ISOGG) Y-
DNA Haplogroup Tree 2017 (http://www.isogg.org/tree/)
[21]. Concordances between Y-SNP-based haplogrouping
and Y-STR-based haplogrouping were explored.
Results
Coverage and allele balance
Figure 1 shows the coverage variation for each SNP. Compared with autosomal SNPs, Y-SNPs displayed relatively lower coverage (400 ± 450× vs 950 ± 989× on average), which might be explained by the single-copy nature of the Y chromosome. For autosomal SNPs, the highest coverage was observed for rs13218440 (1972 ± 1272×), while the lowest coverage was only 169 ± 112× for rs2342747. Other inefficiently amplified (< 300×) SNPs were rs876724 (231 ± 175×), rs10488710 (253 ± 196×), rs729172 (263 ± 311×), rs993934 (254 ± 380×), and rs12997453 (253 ± 368×).
As shown in Fig. 2, significant differences were observed between the F_MAR (%) values of homozygotes and heterozygotes. The F_MAR (%) value for most homozygotes was > 90%, apart from two SNPs in three individuals with borderline values of 89.3% (rs2046361), 89.5% (rs7520386), and 89.8% (rs7520386), respectively. The F_MAR (%) values for heterozygotes were mainly between 50 and 60%, and most loci showed good allelic balance except five: rs214955, rs430046, rs7520386, rs876724, and rs917118, of which rs7520386 performed the worst. Heterozygous genotypes with unusual allelic balance (F_MAR > 65%) showed relatively low coverage (< 160× on average). Similar results were obtained for Y-SNPs: F_MAR was > 90% in all cases except four samples with low coverage of 6×, 7×, 7×, and 38×, respectively. Two loci (rs16981290 and rs2032631) exhibited reduced allelic balance values in comparison with the other Y-SNPs.
Miscalled reads and miscalled rates
Miscalled reads were defined as reads with base calls that differed from the SNP genotype call, and they comprised ASMRs and ANMRs (see example in Fig. 3a). For autosomal SNPs, only the miscalled reads of homozygotes were counted because it was difficult to identify ASMRs for heterozygotes. In total, 21,261 miscalled reads (0.28%) were observed out of 7,631,248 total reads in 6492 homozygous autosomal SNPs and 3434 Y-SNPs, a very small fraction of the total reads. Most of them were ASMRs, which were over four times more numerous than ANMRs (Fig. 3b). To explore the relationship between miscalled reads and total reads, a function was fitted (Fig. 3c) and a good linear correlation was observed. On average, the miscalled rate per sample was 0.26 ± 0.16% for ASMRs and 0.06 ± 0.05% for ANMRs. Additionally, different rates were observed among autosomal SNPs, highly polymorphic Y-SNPs (two alleles observed in the present population), and poorly polymorphic Y-SNPs (only one allele observed in the present population), and the rate of ASMRs was highly associated with genetic diversity (He or GD). As shown in Fig. 4a, the slope for ANMRs was quite flat compared with that of ASMRs (0.037 vs 0.209). The ASMR rate of the major allele (the allele with the higher frequency at a SNP) was lower than that of the minor allele (the allele with the lower frequency), while there was no significant difference for ANMRs, for both autosomal SNPs and Y-SNPs (Fig. 4b, c).
Fig. 1 The coverage of autosomal SNPs and Y-SNPs
Fig. 2 Allelic balance of 90 autosomal SNPs and 34 Y-SNPs. The circle, cross, and rhombus represent homozygotes for autosomal SNPs, heterozygotes for autosomal SNPs, and Y-SNPs, respectively
Fig. 3 Allele-specific miscalled reads (ASMRs) and allele nonspecific miscalled reads (ANMRs) for autosomal homozygotes and Y-SNPs. a Illustration of CCRs, ASMRs, and ANMRs. b The proportion of CCRs, ASMRs, and ANMRs. c Plots of miscalled reads against total reads per sample (heterozygotes for autosomal SNPs were excluded)
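The split of reads into correctly called reads, ASMRs, and ANMRs described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names and the per-base count dictionary are hypothetical, and the example read counts are invented. Only the totals 21,261 and 7,631,248 come from the text.

```python
def classify_reads(base_counts, called_allele, locus_alleles):
    """Split reads at a homozygous (or Y-SNP) call into (correct, ASMR, ANMR).

    base_counts   -- dict mapping observed base -> read count (hypothetical layout)
    called_allele -- the allele of the genotype call
    locus_alleles -- the two known alleles of the SNP
    """
    correct = asmr = anmr = 0
    for base, n in base_counts.items():
        if base == called_allele:
            correct += n      # read agrees with the genotype call
        elif base in locus_alleles:
            asmr += n         # miscalled as the other allele of this locus
        else:
            anmr += n         # miscalled as a base that is not an allele here
    return correct, asmr, anmr


def miscalled_rate(miscalled, total):
    """Miscalled reads as a percentage of all reads."""
    return 100.0 * miscalled / total


# Invented example: an A/G SNP genotyped as homozygous A.
correct, asmr, anmr = classify_reads(
    {"A": 990, "G": 7, "C": 2, "T": 1}, called_allele="A", locus_alleles={"A", "G"}
)
print(correct, asmr, anmr)

# Overall rate from the totals reported in the text.
print(round(miscalled_rate(21261, 7631248), 2))
```

Restricting the count to homozygotes and Y-SNPs, as the text does, is what makes the ASMR category well defined: in a heterozygote both alleles are expected, so a read carrying the other allele cannot be told apart from a correct one.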
Allele frequency of autosomal SNPs and forensic
parameters
A total of 149 no calls were observed for 11,250 SNPs (90 × 125). These no calls were mainly due to low coverage (17× on average), and the mean call rate per sample was 98.68%. Of the samples, 74.4% (93 out of 125) were fully genotyped for autosomal SNPs, and 10 "NN" calls ("NN" means "no call", an invalid genotype for which allele calling by the Torrent Variant Caller software failed) out of 90 SNPs were detected for four samples. The allele frequency, H_obs, H_exp, PIC, PE_duo, PE_trio, and DP values are shown in Table 1. PIC ranged from 0.095 to 0.375, with rs740910 and rs251934 being the least polymorphic SNPs. Three SNPs, rs1058083, rs10773760, and rs7520386, failed the Hardy-Weinberg equilibrium test (p < 0.05), and all of the SNPs passed after Bonferroni correction (p = 0.05/90) except
Fig. 4 The rate of allele-specific miscalled reads (ASMRs) and allele nonspecific miscalled reads (ANMRs) for autosomal SNPs and Y-SNPs. a The relationship between miscalled rate and He (or GD). The miscalled rates were the medians of the former and latter 50% of SNPs ordered by He for autosomal SNPs or by GD for Y-SNPs. b Comparison of the miscalled rates between major and minor alleles for autosomal SNPs (tested by one-way ANOVA). c Comparison of the miscalled rates between major and minor alleles for Y-SNPs (tested by Mann-Whitney U test)
Table 1 The frequencies and related forensic parameters of 90 autosomal SNPs in the population from Guangdong province, South China (N = 125)
Locus Allele Frequency H_obs H_exp PIC MP PE_duo PE_trio DP HWE
rs1490413 A/G 0.604/0.396 0.488 0.480 0.364 0.387 0.114 0.182 0.613 1.000
rs7520386 A/G 0.648/0.352 0.240 0.458 0.352 0.400 0.104 0.176 0.600 < 10⁻³ **
rs4847034 A/G 0.556/0.444 0.488 0.496 0.372 0.378 0.122 0.186 0.622 1.000
rs560681 A/G 0.632/0.368 0.496 0.467 0.357 0.394 0.108 0.178 0.606 0.566
rs10495407 G/A 0.726/0.274 0.435 0.400 0.319 0.442 0.079 0.159 0.558 0.370
rs891700 G/A 0.512/0.487 0.542 0.502 0.375 0.375 0.125 0.188 0.625 0.465
rs1413212 C/T 0.536/0.464 0.496 0.499 0.374 0.376 0.124 0.187 0.624 1.000
rs876724 C/T 0.516/0.484 0.500 0.502 0.375 0.375 0.125 0.187 0.625 1.000
rs1109037 G/A 0.548/0.452 0.520 0.497 0.373 0.377 0.123 0.186 0.623 0.719
rs993934 G/A 0.532/0.468 0.551 0.500 0.374 0.376 0.124 0.187 0.624 0.334
rs12997453 G/A 0.717/0.283 0.443 0.407 0.323 0.436 0.082 0.162 0.564 0.369
rs907100 C/G 0.537/0.463 0.500 0.499 0.374 0.376 0.124 0.187 0.624 1.000
rs1357617 T/A 0.800/0.200 0.304 0.321 0.269 0.514 0.051 0.134 0.486 0.577
rs4364205 G/T 0.588/0.412 0.536 0.486 0.367 0.383 0.117 0.184 0.617 0.271
rs1872575 A/G 0.614/0.386 0.463 0.476 0.362 0.389 0.112 0.181 0.611 0.849
rs1355366 T/C 0.826/0.174 0.248 0.288 0.246 0.550 0.041 0.123 0.450 0.196
rs6444724 T/C 0.512/0.488 0.464 0.502 0.375 0.375 0.125 0.187 0.625 0.473
rs2046361 A/T 0.504/0.496 0.455 0.502 0.375 0.375 0.125 0.187 0.625 0.345
rs6811238 G/T 0.633/0.367 0.444 0.466 0.357 0.395 0.108 0.178 0.605 0.698
rs1979255 G/C 0.564/0.436 0.504 0.494 0.371 0.379 0.121 0.185 0.621 0.858
rs717302 A/G 0.832/0.168 0.304 0.281 0.240 0.558 0.039 0.120 0.442 0.522
rs159606 G/A 0.700/0.300 0.488 0.422 0.332 0.425 0.088 0.166 0.575 0.089
rs7704770 A/G 0.628/0.372 0.488 0.469 0.358 0.393 0.109 0.179 0.607 0.704
rs251934 A/G 0.940/0.060 0.120 0.113 0.106 0.793 0.006 0.053 0.207 1.000
rs338882 A/G 0.544/0.456 0.464 0.498 0.373 0.377 0.123 0.187 0.623 0.474
rs13218440 G/A 0.592/0.408 0.464 0.485 0.366 0.384 0.117 0.183 0.616 0.713
rs214955 T/C 0.500/0.500 0.552 0.502 0.375 0.375 0.125 0.188 0.625 0.286
rs727811 T/G 0.657/0.343 0.504 0.453 0.349 0.403 0.102 0.175 0.597 0.228
rs6955448 C/T 0.704/0.296 0.448 0.418 0.330 0.427 0.087 0.165 0.573 0.521
rs917118 C/T 0.732/0.268 0.424 0.394 0.315 0.446 0.077 0.158 0.554 0.493
rs321198 C/T 0.584/0.416 0.480 0.488 0.368 0.382 0.118 0.184 0.618 0.855
rs737681 C/T 0.884/0.116 0.168 0.206 0.184 0.653 0.021 0.092 0.347 0.059
rs10092491 C/T 0.656/0.344 0.426 0.453 0.350 0.403 0.102 0.175 0.597 0.550
rs4288409 C/A 0.612/0.388 0.488 0.477 0.362 0.388 0.113 0.181 0.612 0.852
rs2056277 C/T 0.844/0.156 0.262 0.264 0.228 0.578 0.035 0.114 0.422 1.000
rs1015250 G/C 0.556/0.444 0.456 0.496 0.372 0.378 0.122 0.186 0.622 0.467
rs7041158 C/T 0.720/0.280 0.447 0.405 0.322 0.437 0.081 0.161 0.563 0.270
rs1463729 C/T 0.520/0.480 0.496 0.501 0.375 0.375 0.125 0.187 0.625 1.000
rs1360288 C/T 0.616/0.384 0.496 0.475 0.361 0.390 0.112 0.181 0.610 0.705
rs10776839 G/T 0.545/0.455 0.484 0.498 0.373 0.377 0.123 0.186 0.623 0.855
rs826472 C/T 0.833/0.167 0.279 0.279 0.239 0.560 0.039 0.120 0.440 1.000
rs735155 T/C 0.892/0.108 0.216 0.193 0.174 0.670 0.019 0.087 0.330 0.357
rs3780962 G/A 0.564/0.436 0.504 0.494 0.371 0.379 0.121 0.185 0.621 0.856
rs740598 A/G 0.508/0.492 0.500 0.502 0.375 0.375 0.125 0.187 0.625 1.000
rs964681 T/C 0.660/0.340 0.472 0.451 0.348 0.405 0.101 0.174 0.595 0.690
rs1498553 C/T 0.570/0.430 0.533 0.492 0.370 0.380 0.120 0.185 0.620 0.461
rs901398 T/C 0.772/0.228 0.312 0.353 0.290 0.482 0.062 0.145 0.518 0.205
rs10488710 G/C 0.596/0.404 0.504 0.484 0.366 0.385 0.116 0.183 0.615 0.711
rs2076848 A/T 0.680/0.320 0.416 0.437 0.341 0.414 0.095 0.170 0.586 0.680
rs7520386, which was one of the SNPs that exhibited a skewed allelic balance. The linkage disequilibrium test indicated that all of the SNPs were independent of each other after Bonferroni correction, except rs12997453/rs993934 (p < 0.05/4050) and rs2056277/rs10092491 (p < 0.05/4050) (supplementary Table 1). The combined discrimination power (CDP) was 1 − 4.81 × 10⁻³⁴, and the combined power of exclusion (CPE) was 0.99989 for duo paternity testing and 0.99999992 for trio paternity testing. Little genetic differentiation [22] was detected between southern and northern
Table 1 (continued)
Locus Allele Frequency H_obs H_exp PIC MP PE_duo PE_trio DP HWE
rs2269355 G/C 0.620/0.380 0.520 0.473 0.360 0.391 0.111 0.180 0.609 0.343
rs2111980 T/C 0.624/0.376 0.464 0.471 0.359 0.392 0.110 0.180 0.608 1.000
rs10773760 A/G 0.604/0.396 0.360 0.480 0.364 0.387 0.114 0.182 0.613 0.008*
rs1335873 A/T 0.624/0.376 0.544 0.471 0.359 0.392 0.110 0.180 0.608 0.089
rs1886510 G/A 0.843/0.157 0.261 0.265 0.229 0.576 0.035 0.115 0.424 1.000
rs1058083 G/A 0.596/0.404 0.392 0.484 0.366 0.385 0.116 0.183 0.615 0.041*
rs354439 A/T 0.569/0.431 0.476 0.493 0.370 0.380 0.120 0.185 0.620 0.718
rs1454361 T/A 0.612/0.388 0.520 0.477 0.362 0.388 0.113 0.181 0.612 0.350
rs722290 G/C 0.517/0.483 0.466 0.502 0.375 0.375 0.125 0.187 0.625 0.464
rs873196 T/C 0.912/0.088 0.128 0.161 0.148 0.718 0.013 0.074 0.282 0.051
rs4530059 G/A 0.764/0.236 0.376 0.362 0.296 0.474 0.065 0.148 0.526 0.804
rs2016276 T/C 0.556/0.444 0.488 0.496 0.372 0.378 0.122 0.186 0.622 1.000
rs1821380 C/G 0.568/0.432 0.464 0.493 0.370 0.380 0.120 0.185 0.620 0.585
rs1528460 T/C 0.575/0.425 0.517 0.491 0.369 0.381 0.119 0.185 0.619 0.579
rs729172 G/T 0.848/0.152 0.294 0.264 0.229 0.577 0.033 0.112 0.423 0.460
rs2342747 G/A 0.644/0.356 0.440 0.460 0.353 0.398 0.105 0.177 0.602 0.696
rs430046 C/T 0.656/0.344 0.464 0.453 0.349 0.403 0.102 0.175 0.597 0.843
rs1382387 A/C 0.684/0.316 0.392 0.434 0.339 0.416 0.093 0.169 0.584 0.304
rs9905977 G/A 0.624/0.376 0.432 0.471 0.359 0.392 0.110 0.180 0.608 0.444
rs740910 A/G 0.947/0.053 0.105 0.100 0.095 0.815 0.005 0.048 0.185 1.000
rs938283 T/C 0.876/0.124 0.232 0.218 0.194 0.636 0.024 0.097 0.364 0.690
rs2292972 T/C 0.644/0.356 0.536 0.460 0.353 0.398 0.105 0.177 0.602 0.079
rs1493232 C/A 0.624/0.376 0.544 0.471 0.359 0.392 0.110 0.180 0.608 0.088
rs9951171 G/A 0.524/0.476 0.536 0.501 0.374 0.376 0.124 0.187 0.624 0.476
rs1736442 C/T 0.621/0.379 0.452 0.473 0.360 0.391 0.111 0.180 0.609 0.703
rs1024116 C/T 0.915/0.085 0.171 0.157 0.144 0.724 0.012 0.072 0.276 0.599
rs719366 A/G 0.792/0.208 0.320 0.331 0.275 0.504 0.054 0.138 0.496 0.786
rs576261 A/C 0.592/0.408 0.528 0.485 0.366 0.384 0.117 0.183 0.616 0.355
rs1031825 A/C 0.504/0.496 0.480 0.502 0.375 0.375 0.125 0.187 0.625 0.721
rs445251 C/G 0.656/0.344 0.492 0.453 0.350 0.403 0.102 0.175 0.597 0.422
rs1005533 G/A 0.660/0.340 0.424 0.451 0.348 0.405 0.101 0.174 0.595 0.552
rs1523537 T/C 0.585/0.415 0.427 0.488 0.368 0.382 0.118 0.184 0.618 0.194
rs722098 G/A 0.532/0.468 0.520 0.500 0.374 0.376 0.124 0.187 0.624 0.721
rs2830795 G/A 0.548/0.452 0.440 0.497 0.373 0.377 0.123 0.186 0.623 0.208
rs2831700 G/A 0.572/0.428 0.456 0.492 0.370 0.380 0.120 0.185 0.620 0.463
rs914165 G/A 0.684/0.316 0.424 0.434 0.339 0.416 0.093 0.169 0.584 0.837
rs221956 C/T 0.596/0.404 0.456 0.484 0.366 0.385 0.116 0.183 0.615 0.580
rs733164 G/A 0.876/0.124 0.200 0.218 0.194 0.636 0.024 0.097 0.364 0.399
rs987640 T/A 0.552/0.448 0.496 0.497 0.372 0.378 0.122 0.186 0.622 1.000
rs2040411 G/A 0.712/0.288 0.368 0.412 0.326 0.432 0.084 0.163 0.568 0.277
rs1028528 A/G 0.612/0.388 0.424 0.477 0.362 0.388 0.113 0.181 0.612 0.260
* p < 0.05, ** p < 0.00056 (0.05/90)
Chinese populations (Fst < 0.05 for all of the SNPs), and the most dramatic differences in allele frequency were detected in comparison with Africans and Europeans. Details are presented in supplementary Tables 2 and 3.
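For a biallelic SNP, the per-locus parameters in Table 1 can be approximated from the allele frequency alone, and the combined values follow by multiplying across loci. The sketch below uses standard closed forms for a biallelic locus in Hardy-Weinberg proportions. It is an assumption that these match what Cervus computes internally, although they do reproduce the rounded Table 1 entries for rs1490413 (frequency 0.604, N = 125). All function names are hypothetical. Because 1 − 4.81 × 10⁻³⁴ is indistinguishable from 1 in double precision, the sketch returns the combined matching probability (1 − CDP) instead of the CDP itself.

```python
import math


def locus_parameters(p, n_individuals):
    """Per-locus parameters for a biallelic SNP with allele frequency p.

    Assumes Hardy-Weinberg proportions; illustrative closed forms only.
    """
    q = 1.0 - p
    het = 2.0 * p * q                      # expected heterozygote frequency
    two_n = 2 * n_individuals
    mp = p ** 4 + het ** 2 + q ** 4        # sum of squared genotype frequencies
    return {
        "H_exp": het * two_n / (two_n - 1),  # small-sample corrected heterozygosity
        "PIC": het - 2.0 * (p * q) ** 2,
        "MP": mp,
        "DP": 1.0 - mp,
        "PE_duo": 2.0 * (p * q) ** 2,
        "PE_trio": p * q * (1.0 - p * q),
    }


def combined_match_probability(mps):
    """Product of per-locus MPs (equal to 1 - CDP), accumulated in log space."""
    return math.exp(math.fsum(math.log(mp) for mp in mps))


def combined_power_of_exclusion(pes):
    """CPE = 1 - prod(1 - PE_i), assuming independent loci."""
    return 1.0 - math.prod(1.0 - pe for pe in pes)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


params = locus_parameters(0.604, 125)      # rs1490413 in Table 1
print({name: round(value, 3) for name, value in params.items()})
print(bonferroni_threshold(0.05, 90))      # the HWE threshold quoted under Table 1
```

Multiplying per-locus values is only valid for independent loci, which is why the linkage disequilibrium test matters for the combined figures.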
Comparison of haplogrouping based on Y-STR
and Y-SNP haplotypes
A no-call rate of 2.3% (79 out of 3434), higher than that of autosomal SNPs, was observed in male samples, again mostly due to low coverage. Out of the 34 Y-SNPs, only 15 showed two alleles in the population. In total, seven haplotypes were detected (Fig. 5), and the HD was 0.644. Based on Y-SNP haplotypes, seven haplogroups were assigned. Haplogroup O2 accounted for more than half of the samples (54.5%), and O1a, O1b, C, N, D, and Q accounted for 18.8%, 9.9%, 7.9%, 5.9%, 2.0%, and 1.0%, respectively. For the concordance study, 67 out of the 101 samples were automatically assigned haplogroups based on Y-STR haplotypes with the parameter settings mentioned above, of which 97.01% (65/67) were assigned consistent haplogroups (Table 2).
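The concordance figure can be recomputed from a cross-tabulation of the two haplogroup assignments. The sketch below is illustrative only. The counts follow the row and column totals of Table 2, with the placement of the two discordant samples (one O1 prediction typed as N, one R2 prediction typed as O1b) inferred from those totals, which is an assumption. The prefix rule for treating a haplogroup and its sub-haplogroup as consistent (O1 vs O1a, C2 vs C) is a simplification of the judgement described in the Discussion, and the names are hypothetical.

```python
def is_consistent(ystr_hg, ysnp_hg):
    """Consistent when one label is a prefix of the other (e.g. O1 vs O1a, C2 vs C).

    Assumption: this prefix rule stands in for the manual comparison.
    """
    return ystr_hg.startswith(ysnp_hg) or ysnp_hg.startswith(ystr_hg)


def concordance(cross_table):
    """Return (agreeing, total, percent) for {(ystr_hg, ysnp_hg): count}."""
    total = sum(cross_table.values())
    agreeing = sum(
        n for (ystr_hg, ysnp_hg), n in cross_table.items()
        if is_consistent(ystr_hg, ysnp_hg)
    )
    return agreeing, total, 100.0 * agreeing / total


# (Y-STR prediction, Y-SNP haplogroup) -> number of samples; cell placement
# of the two discordant samples is inferred from the table margins.
TABLE_2 = {
    ("O2", "O2"): 32,
    ("O1", "O1a"): 19,
    ("O1", "O1b"): 8,
    ("O1", "N"): 1,
    ("C2", "C"): 4,
    ("D", "D"): 1,
    ("N", "N"): 1,
    ("R2", "O1b"): 1,
}

agreeing, total, percent = concordance(TABLE_2)
print(agreeing, total, round(percent, 2))
```

The denominator is the 67 samples that the predictor assigned automatically, not all 101 males, so the rate describes agreement among confident predictions only.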
Discussion
In this study, we investigated the performance and polymorphisms of the HID-Ion AmpliSeq™ Identity Panel in a population from southern China. When coverage was low, allelic imbalance occurred more frequently, even resulting in no calls. High coverage with relatively poor allelic balance was also observed for several SNPs, of which rs214955, rs430046, and rs7520386 were also shown to exhibit allelic imbalance in other studies [8, 9, 11, 15]. These three loci did not perform as well as the other SNPs, and rs7520386 performed the worst. Additionally, rs7520386 showed the highest miscalled rate, with a value of 3.23% (supplementary Table 4). Therefore, the primers of these problematic SNPs should be modified to improve the performance of the panel.
Upon analysis of miscalled reads, we observed that the rate of ASMRs was higher than that of ANMRs and that miscalled reads increased linearly with coverage; both findings were also observed, and regarded as background signals, in the Guo study [8]. Additionally, different rates were observed among autosomal SNPs, highly polymorphic Y-SNPs, and poorly polymorphic Y-SNPs, which implied that genetic diversity might also be related. As Fig. 4a shows, the rate of ASMRs was highly associated with genetic diversity (k = 0.209), while the slope for ANMRs was quite flat (k = 0.037), which indicated that He or GD contributed little to the variation of ANMRs. To explore whether allele frequency was related as well, the miscalled rates of major and minor alleles were compared. Despite a nonsignificant p value (p = 0.178) for Y-SNPs, the ASMR rate of the major allele was lower than that of the minor allele for both autosomal SNPs and Y-SNPs, while there was no significant difference for ANMRs. Therefore, these two kinds of miscalled reads seemed to arise in different ways. It is worth mentioning that these miscalled reads (ASMRs especially), or background noise, similar to stutter for STRs, have a critical influence on mixture analysis in forensic practice. On the other hand, how this background noise was produced was still
Table 2 Haplogrouping based on Y-STR haplotypes and Y-SNP haplotypes (n = 67)
Y-STR O2 O1a O1b C D N R2 Total (columns O2 to R2 are Y-SNP haplogroups)
O2 32 - - - - - - 32
O1 - 19 8 - - 1 - 28
C2 - - - 4 - - - 4
D - - - - 1 - - 1
N - - - - - 1 - 1
R2 - - 1 - - - - 1
Total 32 19 9 4 1 2 0 67
Numbers colored gray represent consistent results
Fig. 5 Haplotypes of 101 male samples based on 34 Y-SNPs. Loci colored gray represent highly polymorphic loci
not very clear. Artifacts produced in the PCR procedure and sequencing errors might be part of the reason. Furthermore, since barcodes were utilized and samples were sequenced simultaneously, barcode contamination might also be one of the reasons, possibly as a result of incorrect ligation of carry-over barcodes after pooling. This should be studied further in the future.
In this study, rs12997453/rs993934 and rs2056277/rs10092491 failed the linkage disequilibrium test even after Bonferroni correction, which differed from the results of other studies [5, 8, 15]. Since the physical distances were > 109 Mb for rs2056277/rs10092491 and > 58 Mb for rs12997453/rs993934, these two pairs of markers are generally thought to be independent. Considering the relatively small sample size of our study, the failure of the linkage disequilibrium test might result from random effects. These polymorphic SNPs are powerful tools for individual identification and trio paternity testing, performing comparably to 22 STRs [8]. However, they may not be sufficient for duo paternity testing, let alone for testing of other relationships. In current research and applications of paternity and kinship testing, SNPs are more frequently regarded as complements to STR typing [3, 23]. As estimated by Mo et al. [3], 85, 127, 491, and 1858 putative SNP loci are required to investigate parent-child, full-sibling, half-sibling/uncle-nephew, and first-cousin relationships, respectively, with a false testing level of 0.1%. However, when a great number of SNPs are utilized, linkage between loci becomes more likely, which should be taken into account.
Moderate genetic differences were observed for some SNPs between the Chinese population and Japanese, Africans, Americans, or Europeans, but little frequency variation was detected between southern and northern Chinese, indicating that this panel could be widely applicable across Chinese populations. Allele frequency variation tended to follow differences in geographic location, and a similar pattern was also detected in a previous study based on Y-STRs [24].
Haplogroup determination is of great interest in human population
genetics as well as forensic genetics, as it reveals phylogenetic
relationships by descent [25]. A haplogroup can be inferred by either
Y-SNP or Y-STR typing, and a high degree of concordance between the
two methods has been reported. Muzzio et al., however, demonstrated
that Y-STR-based haplogrouping software offered relatively low
accuracy [25]. In this study, 97.01% of samples were assigned
consistent haplogroups with Y-STR and Y-SNP haplotypes, which is
similar to other reported results [26–28]. The small number of
markers typed (only seven STRs) may explain the low accuracy in
Muzzio's study [29]. It should be noted that, either because limited
markers are included in this panel or because no adequate database is
available to calibrate the program, some samples may be assigned the
same haplogroup but different sub-haplogroups by the two methods. For
example, haplogroup C was defined based on Y-SNPs while C2 was
predicted based on Y-STRs. Similarly, haplogroups O1a and O1b were
determined based on Y-SNPs while O1 was obtained using Whit Athey's
method. We nevertheless considered these assignments consistent.
Additionally, the high concordance was based on strict parameter
settings in Whit Athey's method (minimum fitness score = 20, minimum
probability = 85%, and area priors = "equal priors"), under which 34
of the 101 samples could not be assigned a haplogroup automatically.
If, for these 34 samples, we chose the haplogroup with the maximum
probability, all samples would obtain a haplogroup assignment but the
concordance would decrease to 86.14%. Given the high mutation rates
of Y-STRs, the same Y-STR haplotype can be found in samples from
different haplogroups [29], which might explain why the concordance
cannot reach 100%. On the other hand, although Y-SNP analysis appears
to be the better approach for haplogroup determination, the
widespread use of Y-STR typing in forensic DNA laboratories means
that it is still practical to determine haplogroups preliminarily
from Y-STR haplotypes.
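The effect of the strict thresholds on concordance can be sketched as follows. This is a minimal illustration, not the predictor's actual interface: the `Prediction` record, the function names, and the toy samples are all hypothetical, and treating a haplogroup and its own sub-haplogroup as consistent is implemented here by simple prefix matching, which is a simplification of the rule described above.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical top-ranked Y-STR-based prediction for one sample."""
    haplogroup: str     # e.g. "C2"
    probability: float  # probability in percent
    fitness: float      # fitness score


def assign(pred, min_fitness=20.0, min_probability=85.0, strict=True):
    """Return the predicted haplogroup, or None when the strict
    thresholds are not met. With strict=False the top-ranked
    haplogroup is always accepted."""
    if strict and (pred.fitness < min_fitness
                   or pred.probability < min_probability):
        return None
    return pred.haplogroup


def consistent(y_snp, y_str):
    """Same haplogroup, or one is a sub-haplogroup of the other
    (C vs C2, O1a vs O1). Prefix matching is an assumption."""
    return y_snp.startswith(y_str) or y_str.startswith(y_snp)


def concordance(pairs, strict=True):
    """pairs: list of (Y-SNP haplogroup, Prediction).
    Returns (number of assigned samples, percent concordant)."""
    assigned = [(snp, assign(pred, strict=strict)) for snp, pred in pairs]
    assigned = [(snp, hg) for snp, hg in assigned if hg is not None]
    if not assigned:
        return 0, float("nan")
    agree = sum(consistent(snp, hg) for snp, hg in assigned)
    return len(assigned), 100.0 * agree / len(assigned)


# Toy data, invented for illustration only.
samples = [
    ("C", Prediction("C2", 99.0, 45.0)),
    ("O1a", Prediction("O1", 92.0, 30.0)),
    ("O2", Prediction("O2", 60.0, 25.0)),  # fails the probability threshold
    ("O2", Prediction("N", 55.0, 12.0)),   # fails both thresholds
]
print(concordance(samples, strict=True))
print(concordance(samples, strict=False))
```

In the toy data, relaxing the thresholds assigns every sample but admits a discordant call, which mirrors the trade-off described above. The reported percentages are consistent with 65 of 67 and 87 of 101 concordant samples; these counts are inferred from the percentages and are not stated in the text.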
Conclusion
The Ion Torrent PGM™ is a promising platform for forensic genetics
research and applications. The HID-Ion AmpliSeq™ Identity Panel
proved to be a powerful tool for individual identification and trio
paternity testing in Chinese populations. However, additional SNPs
are required for both duo paternity testing and testing of other
relationships with this panel. The miscalled rates were
0.26 ± 0.16% for ASMRs and 0.06 ± 0.05% for ANMRs. ASMRs were
associated with genetic diversity and allele frequency, whereas ANMRs
were associated with neither, suggesting that the two types of
miscalled reads may arise through different mechanisms. Additionally,
high concordance of haplogroup assignment can be obtained with Y-STR
and Y-SNP haplotypes.
Funding This study was funded by the National Natural
Science Foundation of China (81671873, 81273347) and the
Fundamental Research Funds for the Central Universities
(16ykzd08).
Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of
interest.
Ethical approval All procedures performed in studies involving hu-
man participants were in accordance with the ethical standards of the
institutional and/or national research committee and with the 1964
Helsinki declaration and its later amendments or comparable ethical
standards.
Informed consent Informed consent was obtained from all individual
participants included in the study.
References
1. Kidd KK, Pakstis AJ, Speed WC, Grigorenko EL, Kajuna SLB,
Karoma NJ, Kungulilo S, Kim J, Lu R, Odunsi A, Okonofua F,
Parnas J, Schulz LO, Zhukova OV, Kidd JR (2006) Developing a
SNP panel for forensic identification of individuals. Forensic Sci Int
164:20–32
2. Amorim A, Pereira L (2005) Pros and cons in the use of SNPs in
forensic kinship investigation: a comparative analysis with STRs.
Forensic Sci Int 150:17–21
3. Mo S, Liu Y, Wang S, Bo X, Li Z, Chen Y, Ni M (2016) Exploring
the efficacy of paternity and kinship testing based on single
nucleotide polymorphisms. Forensic Sci Int Genet 22:161–168
4. Sanchez JJ, Phillips C, Borsting C, Balogh K, Bogus M, Fondevila
M, Harrison CD, Musgrave-Brown E, Salas A, Syndercombe-Court D,
Schneider PM, Carracedo A, Morling N (2006) A multiplex
assay with 52 single nucleotide polymorphisms for human
identification. Electrophoresis 27:1713–1724
5. Pakstis AJ, Speed WC, Fang R, Hyland FCL, Furtado MR, Kidd
JR, Kidd KK (2010) SNPs for a universal individual identification
panel. Hum Genet 127:315–324
6. Sobrino B, Brión M, Carracedo A (2005) SNPs in forensic genetics:
a review on SNP typing methodologies. Forensic Sci Int
154:181–194
7. Seo SB, King JL, Warshauer DH, Davis CP, Ge J, Budowle B
(2013) Single nucleotide polymorphism typing with massively
parallel sequencing for human identification. Int J Legal Med
127:1079–1086
8. Guo F, Zhou Y, Song H, Zhao J, Shen H, Zhao B, Liu F, Jiang X
(2016) Next generation sequencing of SNPs using the HID-Ion
AmpliSeq™ Identity Panel on the Ion Torrent PGM™ platform.
Forensic Sci Int Genet 25:73–84
9. Børsting C, Fordyce SL, Olofsson J, Mogensen HS, Morling N
(2014) Evaluation of the Ion Torrent™ HID SNP 169-plex: a
SNP typing assay developed for human identification by second
generation sequencing. Forensic Sci Int Genet 12:144–154
10. Loman NJ, Misra RV, Dallman TJ, Constantinidou C, Gharbia SE,
Wain J, Pallen MJ (2012) Performance comparison of benchtop
high-throughput sequencing platforms. Nat Biotechnol 30:434–439
11. Eduardoff M, Santos C, de la Puente M, Gross TE, Fondevila M,
Strobl C, Sobrino B, Ballard D, Schneider PM, Carracedo Á, Lareu
MV, Parson W, Phillips C (2015) Inter-laboratory evaluation of
SNP-based forensic identification by massively parallel sequencing
using the Ion PGM™. Forensic Sci Int Genet 17:110–121
12. Rothberg JM, Hinz W, Rearick TM, Schultz J, Mileski W, Davey
M, Leamon JH, Johnson K, Milgrew MJ, Edwards M, Hoon J,
Simons JF, Marran D, Myers JW, Davidson JF, Branting A,
Nobile JR, Puc BP, Light D, Clark TA, Huber M, Branciforte JT,
Stoner IB, Cawley SE, Lyons M, Fu Y, Homer N, Sedova M, Miao
X, Reed B, Sabina J, Feierstein E, Schorn M, Alanjary M,
Dimalanta E, Dressman D, Kasinskas R, Sokolsky T, Fidanza JA,
Namsaraev E, McKernan KJ, Williams A, Roth GT, Bustillo J
(2011) An integrated semiconductor device enabling non-optical
genome sequencing. Nature 475:348–352
13. Elena S, Alessandro A, Ignazio C, Sharon W, Luigi R, Andrea B
(2016) Revealing the challenges of low template DNA analysis
with the prototype Ion AmpliSeq™ Identity panel v2.3 on the
PGM™ Sequencer. Forensic Sci Int Genet 22:25–36
14. Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura
SL, Hammer MF (2008) New binary polymorphisms reshape and
increase resolution of the human Y chromosomal haplogroup tree.
Genome Res 18:830–838
15. Zhang S, Bian Y, Zhang Z, Zheng H, Wang Z, Zha L, Cai J, Gao Y,
Ji C, Hou Y, Li C (2015) Parallel analysis of 124 universal SNPs for
human identification by targeted semiconductor sequencing. Sci
Rep-UK 5:18683
16. Athey TW (2006) Haplogroup prediction from Y-STR values using
a Bayesian-allele-frequency approach. J Genet Geneal 2:34–39
17. Athey TW (2005) Haplogroup prediction from Y-STR values using
an allele-frequency approach. J Genet Geneal 1:1–7
18. Kalinowski ST, Taper ML, Marshall TC (2007) Revising how the
computer program CERVUS accommodates genotyping error
increases success in paternity assignment. Mol Ecol 16:1099–1106
19. Excoffier L, Lischer HE (2010) Arlequin suite ver 3.5: a new series
of programs to perform population genetics analyses under Linux
and Windows. Mol Ecol Resour 10:564–567
20. Amigo J, Salas A, Phillips C, Carracedo A (2008) SPSmart:
adapting population based SNP genotype databases for fast and
comprehensive web access. BMC Bioinformatics 9:428
21. International Society of Genetic Genealogy (ISOGG): Y-DNA
Haplogroup Tree 2017, Version: 12.128. In, 2017. Available at:
http://www.isogg.org/tree/
22. Wright S (1978) Evolution and the genetics of populations. Vol. 4.
Variability within and among natural populations. University of
Chicago Press, Chicago
23. Phillips C, García-Magariños M, Salas A, Carracedo Á, Lareu MV
(2012) SNPs as supplements in simple kinship analysis or as core
markers in distant pairwise relationship tests: when do SNPs add
value or replace well-established and powerful STR tests? Transfus
Med Hemoth 39:202–210
24. Wang Y, Liu C, Zhang CC, Li R, Li Y, XL O, Sun HY (2015)
Analysis of 17 Y-STR loci haplotype and Y-chromosome
haplogroup distribution in five Chinese ethnic groups.
Electrophoresis 36:2546–2552
25. Muzzio M, Ramallo V, Motti JMB, Santos MR, López Camelo JS,
Bailliet G (2011) Software for Y-haplogroup predictions: a word of
caution. Int J Legal Med 125:143–147
26. Petrejcikova E, Carnogurska J, Hronska D, Bernasovska J,
Boronova I, Gabrikova D, Bozikova A, Macekova S (2014)
Y-SNP analysis versus Y-haplogroup predictor in the Slovak
population. Anthropol Anz 71:275–285
27. Dogan S, Babic N, Gurkan C, Goksu A, Marjanovic D, Hadziavdic
V (2016) Y-chromosomal haplogroup distribution in the Tuzla
Canton of Bosnia and Herzegovina: a concordance study using four
different in silico assignment algorithms based on Y-STR data.
Homo 67:471–483
28. Nunez C, Geppert M, Baeta M, Roewer L, Martinez-Jarreta B
(2012) Y chromosome haplogroup diversity in a Mestizo
population of Nicaragua. Forensic Sci Int Genet 6:e192–e195
29. Athey W (2011) Comments on the article, "Software for Y
haplogroup predictions, a word of caution". Int J Legal Med
125(901–903):905–906
... Similarly, Schirmer et al. found a substantial bias related to motifs ending in "GG," and the top three motifs were linked to 16 % of all substitution errors [19]. Two independent studies by Guo et al. [20] and Li et al. [21], based on the Ion Torrent PGM™ platform (Thermo Fisher Scientific, Waltham, USA), both reported higher proportions of allele-specific miscalled reads than allele-nonspecific miscalled reads. With respect to MGI MPS platform, it has been only recently used for forensic analysis [6,22] and there are few studies on its noise characteristics. ...
... A heterozygote with F MAR is between 0.6 and 0.9 as imbalance according to [30]. With respect to noises, two types of noises were evaluated, i.e., allele-specific miscalled reads (ASMR) and allele-nonspecific miscalled reads (ANMR) [21]. For example, guanine (G) is an allele-specific miscalled base, while cytosine (C) and thymine (T) are allele-nonspecific miscalled bases for a "A/A" homozygote at rs1490413 (G/A). ...
... Sample contamination may contribute to the higher levels of ASMR than ANMR. However, similar results were also obtained by two independent studies [20,21]. Thus, it is more likely a systematic issue. ...
Article
Full-text available
Three MPS platforms are being used in forensic genetic analysis, i.e., MiSeq FGx, Ion S5 XL, and MGISEQ-2000. However, few studies compared their performance. In this study, we sequenced 83 common SNPs of 71 samples using the ForenSeq™ DNA Signature Prep Kit on MiSeq FGx, the Precision ID Identity Panel on Ion S5 XL, and the MGIEasy Signature Identification Library Prep Kit on MGISEQ-2000 and then the performance was compared. Results showed that the MiSeq FGx had the highest sequence quality but the lowest sequencing depth and allele balance. Discordant genotypes were observed at six SNPs, which may be caused by variants at primer binding regions, indel errors, or misalignments. Besides, two kinds of background noises, allele-specific miscalled reads (ASMR) and allele-nonspecific miscalled reads (ANMR), were characterized. MGISEQ-2000 showed the highest level of ASMR while Ion S5 XL had the highest level of ANMR. Site- and genotype-dependent miscalled patterns were observed at several SNPs on Ion S5 XL and MGISEQ-2000, but few on MiSeq FGx. In conclusion, the three MPS platforms perform differently with respect to sequencing quality, sequencing depth, allele balance, concordance, and background noise. These findings may be useful for data comparison, mixture deconvolution, and heteroplasmy analysis in forensic genetics.
... Well-designed studies demonstrated that this panel is a robust, reliable, and informative tool for human identi cation. This panel has been used for the investigation of diverse populations including Han Chinese [7,9], southern Chinese [10], three East Asian minorities (Tibetan, Uygur, and Hui) [11], Brazilian [12], and central Indian [13]. However, there are very limited reports on the Southeast Asian population. ...
... An ideal median LB is 1.00; however, the median LBs of 28 autosomal SNPs and 9 Y-SNPs were less than 0.50, indicating that these loci had less than half the mean read depth. In accordance with previous studies [10,11,17], rs2342747 also had the lowest median LB in this study. Allele dropouts are generally more likely to be observed in SNPs with low LB when analyzing trace amounts of DNA. ...
Preprint
Full-text available
Background—Single nucleotide polymorphisms (SNPs) have become popular in forensic genetics as an alternative to short tandem repeats (STRs) due to low mutation rates and small amplicon sizes. The Precision ID Identity Panel (Thermo Fisher Scientific), consisting of 90 autosomal SNPs and 34 Y-SNPs, was introduced for human identification by next-generation sequencing (NGS), enabling many studies on the global population; however, few reports are available on the Southeast Asian population. Methods and Results—A total of 96 unrelated male samples from Myanmar (Yangon) were analyzed with the Precision ID Identity Panel on a MiSeq (Illumina) using an in-house TruSeq compatible universal adapter. The sequencing performance was evaluated by locus balance and heterozygote balance, and the results were comparable to those of the Ion Torrent platform. For 90 autosomal SNPs, minor allele frequencies ranged between 0.068 and 0.500, and combined match probability (6.994×10⁻³⁴) was lower than that of 22 PowerPlex Fusion autosomal STRs (3.130×10⁻²⁶). Moreover, we identified 51 cryptic variations around the target SNPs using a custom variant caller, Visual SNP. For 34 Y-SNPs, 14 Y-haplogroups were observed—mostly O2 and O1b groups. Interpopulation analysis revealed that the Myanmar population is genetically closer to the East and Southeast Asian populations than the South Asian population. Conclusions—We demonstrate that the Precision ID Identity Panel can be successfully analyzed on a MiSeq using a custom data analysis pipeline and provide high discrimination power for human identification in the Myanmar population, while extending the accessibility of NGS analysis for SNPs in forensics.
Article
Background: Forensic DNA analysis has seen remarkable advancements with the advent of Next Generation Sequencing (NGS). In particular, NGS analysis of single nucleotide polymorphisms (SNPs) offers significant advantages in the analysis of challenging samples compared to conventional STR analysis. Objective: This study aimed to investigate the SNPs of the Precision ID Identity Panel, a commercially available NGS panel for personal identification, by generating genetic profiles of 298 Koreans and comparing them with other global populations. Methods: A total of 124 SNPs, including 90 autosomal and 34 Y-SNPs, were analyzed using the Precision ID Identity Panel, and forensic parameters, microhaplotypes, and population differences were investigated. Results: The NGS data were successfully obtained from 298 Koreans. The analysis of forensic parameters exhibited a low combined match probability of 1.532 × 10- 34, which is comparable to that obtained from commonly used STR analysis. Additionally, the microhaplotype analysis revealed that the use of 16 microhaplotypes provided higher discriminatory power compared to single target SNPs. Furthermore, the adoption of microhaplotype data resulted in an increase of over 20% in expected heterozygosity at five loci. Inter-population analysis showed a close genetic relationship between Koreans and individuals from China and Myanmar in East and Southeast Asia, which are geographically adjacent to Korea. Conclusions: The results of this study show that the Precision ID Identity panel can be a useful alternative where traditional STR typing is not feasible. Also, the data from our study will be useful as a reference for Koreans in forensic investigations and the prosecution of criminal justice.
Article
Full-text available
PCR-MPS is an emerging tool for the analysis of low-quality DNA samples. In this study, we used PCR-MPS to analyse 32 challenging bone DNA samples from three Second World War victims, which previously yielded no results in conventional STR PCR-CE typing. The Identity Panel was used with 27 cycles of PCR. Despite that we only had an average of 6.8 pg of degraded DNA as template, 30 out of 32 libraries (93.8%) produced sequencing data for about 63/90 autosomal markers per sample. Out of the 30 libraries, 14 (46.7%) yielded single source genetic profiles in agreement with the biological identity of the donor, whereas 12 cases (40.0%) resulted in SNP profiles that did not match or were mixed. The misleading outcomes for those 12 cases were likely due to hidden exogenous human contamination, as shown by the higher frequencies of allelic imbalance, unusual high frequencies of allelic drop-ins, high heterozygosity levels in the consensus profiles generated from challenging samples, and traces of amplified molecular products in four out of eight extraction negative controls. Even if the source and the time of the contamination were not identified, it is likely that it occurred along the multi-step bone processing workflow. Our results suggest that only positive identification by statistical tools (e.g. likelihood ratio) should be accepted as reliable; oppositely, the results leading to exclusion should be treated as inconclusive because of potential contamination issues. Finally, strategies are discussed for monitoring the workflow of extremely challenging bone samples in PCR-MPS experiments with an increased number of PCR cycles.
Article
Single nucleotide polymorphisms (SNPs) have become popular in forensic genetics as an alternative to short tandem repeats (STRs). The Precision ID Identity Panel (Thermo Fisher Scientific), consisting of 90 autosomal SNPs and 34 Y-chromosomal SNPs, enabled human identification studies on global populations through next-generation sequencing (NGS). However, most previous studies on the panel have used the Ion Torrent platform, and there are few reports on the Southeast Asian population. Here, a total of 96 unrelated males from Myanmar (Yangon) were analyzed with the Precision ID Identity Panel on a MiSeq (Illumina) using an in-house TruSeq compatible universal adapter and a custom variant caller, Visual SNP. The sequencing performance evaluated by locus balance and heterozygote balance was comparable to that of the Ion Torrent platform. For 90 autosomal SNPs, the combined match probability (CMP) was 6.994 × 10-34, lower than that of 22 PowerPlex Fusion autosomal STRs (3.130 × 10-26). For 34 Y-SNPs, 14 Y-haplogroups (mostly O2 and O1b) were observed. We found 51 cryptic variations (42 haplotypes) around target SNPs, of which haplotypes corresponding to 33 autosomal SNPs decreased CMP. Interpopulation analysis revealed that the Myanmar population is genetically closer to the East and Southeast Asian populations. In conclusion, the Precision ID Identity Panel can be successfully analyzed on the Illumina MiSeq and provides high discrimination power for human identification in the Myanmar population. This study broadened the accessibility of the NGS-based SNP panel by expanding the available NGS platforms and adopting a robust NGS data analysis tool.
Article
Distant kinship identification is one of the critical problems in forensic genetics. As a new type of genetic marker defined and discussed in the last decade, the microhaplotype (MH) has drawn much attention in such identification owing to its specific advantages to traditional short tandem repeat (STR) or single nucleotide polymorphism (SNP) markers. In this study, MH markers were screened step by step from the 1000 Genomes Project database, and a novel multiplex panel containing 188 MHs (in which 181 are reported the first time, while 1 was reported in a previous study and the other 6 have partial overlaps with known markers) was constructed for application in 2nd- and 3rd-degree kinship identification. Along with the construction, a novel MH nomenclature was proposed, in which the SNP position information they contained was taken into account to eliminate the possibility that the same locus was named differently interlaboratory. After a series of evaluations, the panel was shown to have good sequencing accuracy, high sensitivity, species specificity, and resistance to anti-PCR inhibitors or degradation. Population data of the 188 MHs were calculated based on the genetic information of 221 unrelated Hebei Han individuals, and the effective number of alleles (Ae) ranged from 2.0925 to 8.2634 (with an average of 2.9267). For the whole system, the cumulative matching probability (CMP), the cumulative power of exclusion in paternity testing of duos (CPEduo) and that of trios (CPEtrio) reached 2.8422 × 10-137, 1-1.3109 × 10-21, and 1-2.8975 × 10-39, respectively, indicating that this panel was satisfactory for individual identification and paternity testing. Then, the efficiency of the 188 MHs in 2nd- and 3rd-degree kinship testing was studied based on 30 extended families consisting of 179 2nd-degree and 121 3rd-degree relatives, as well as simulations of 0.5 million pairs of those two kinships. 
The results showed that clear opinions would be given in 83.36% of 2nd-degree identifications with a false rate less than 10-5, when the confirming and excluding thresholds of cumulative likelihood ratio (CLR) were set as 104 and 10-4, respectively. This panel is still not sufficient to solve the problem of 3rd-degree kinship identification alone, and approximately 300 or 870 MH loci would be needed in 2nd- or 3rd-degree kinship identification, respectively, to achieve a system efficiency not less than 0.99 with such a threshold set; such necessary numbers would be used only as a reference in further research.
Article
Multiplex DNA typing methods using massively parallel sequencing can be used to predict externally visible characteristics (EVCs) in forensic DNA phenotyping through the analysis of single-nucleotide polymorphisms. The focus of EVC determination has focused on hair color, eye color, and skin tone as well as visible biogeographical ancestry features. In this study, we researched off-label applications beyond what is currently marketed by the manufacturer of the Verogen ForenSeq kit primer set B and Imagen primer set E SNP loci. We investigated additional EVC predictions by examining published genome wide sequencing studies and reported allele-specific gene expression and predictive values. We have identified 15 SNPs included in the ForenSeq kit panel and Imagen kits that have additional EVC prediction capabilities beyond what is published in the Verogen manuals. The additional EVCs that can be predicted include hair graying, ephelides hyperpigmented spots, dermatoheliosis, facial pigmented spots, standing height, pattern balding, helix-rolling ear morphology, hair shape, hair thickness, facial morphology, eyebrow thickness, sarcoidosis, obesity, vitiligo, and tanning propensity. The loci can be used to augment and refine phenotype predictions with software such as MetaHuman for missing persons, cold case, and historic case investigations.
Article
Full-text available
Single nucleotide polymorphism (SNP) possesses a promising application in forensic individual identification due to its wide distribution in the human genome and the ability to carry out the genotyping of degraded biological samples by designing short amplicons. Some commonly used individual identification SNPs are less polymorphic in East Asian populations. In order to improve the individual identification efficiencies in East Asian populations, SNP genetic markers with relatively higher polymorphisms were selected from the 1,000 Genome Project phase III database in East Asian populations. A total of 111 individual identification SNPs (II-SNPs) with the observed heterozygosity values greater than 0.4 were screened in East Asian populations, and then, the forensic efficiencies of these selected SNPs were also evaluated in Chinese Inner Mongolia Manchu group. The observed heterozygosity and power of discrimination values at 111 II-SNPs in the Inner Mongolia Manchu group ranged from 0.4011 to 0.7005, and 0.5620 to 0.8025, respectively, and the average value of polymorphism information content was greater than 0.3978. The cumulative match probability and combined probability of exclusion values at II-SNPs were 7.447E⁻⁵¹ and 1-4.17E⁻¹² in the Inner Mongolia Manchu group, respectively. The accumulative efficiency results indicated that the set of II-SNPs could be used as a potential tool for forensic individual identification and parentage testing in the Manchu group. The sequencing depths ranged from 781× to 12374×. And the mean allele count ratio and noise level were 0.8672 and 0.0041, respectively. The sequencing results indicated that the SNP genetic marker detection based on the massively parallel sequencing technology for SNP genetic markers had high sequencing performance and could meet the sequencing requirements of II-SNPs in the studied group.
Article
Full-text available
SNPs, abundant in human genome with lower mutation rate, are attractive to genetic application like forensic, anthropological and evolutionary studies. Universal SNPs showing little allelic frequency variation among populations while remaining highly informative for human identification were obtained from previous studies. However, genotyping tools target only dozens of markers simultaneously, limiting their applications. Here, 124 SNPs were simultaneous tested using Ampliseq technology with Ion Torrent PGM platform. Concordance study was performed with 2 reference samples of 9947A and 9948 between NGS and Sanger sequencing. Full concordance were obtained except genotype of rs576261 with 9947A. Parameter of FMAR (%) was introduced for NGS data analysis for the first time, evaluating allelic performance, sensitivity testing and mixture testing. FMAR values for accurate heterozygotes should be range from 50% to 60%, for homozygotes or Y-SNP should be above 90%. SNPs of rs7520386, rs4530059, rs214955, rs1523537, rs2342747, rs576261 and rs12997453 were recognized as poorly performing loci, either with allelic imbalance or with lower coverage. Sensitivity testing demonstrated that with DNA range from 10 ng-0.5 ng, all correct genotypes were obtained. For mixture testing, a clear linear correlation (R2 = 0.9429) between the excepted FMAR and observed FMAR values of mixtures was observed.
Article
Y-chromosomal haplogroups are sets of ancestrally related paternal lineages, traditionally assigned by the use of Y-chromosomal single nucleotide polymorphism (Y-SNP) markers. An increasingly popular and a less labor-intensive alternative approach has been Y-chromosomal haplogroup assignment based on already available Y-STR data using a variety of different algorithms. In the present study, such in silico haplogroup assignments were made based on 23-loci Y-STR data for 100 unrelated male individuals from the Tuzla Canton, Bosnia and Herzegovina (B&H) using the following four different algorithms: Whit Athey's Haplogroup Predictor, Jim Cullen's World Haplogroup & Haplogroup-I Subclade Predictor, Vadim Urasin's YPredictor and the NevGen Y-DNA Haplogroup Predictor. Prior in-house assessment of these four different algorithms using a previously published dataset (n = 132) from B&H with both Y-STR (12-loci) and Y-SNP data suggested haplogroup misassignment rates between 0.76% and 3.02%. Subsequent analyses with the Tuzla Canton population sample revealed only a few differences in the individual haplogroup assignments when using different algorithms. Nevertheless, the resultant Y-chromosomal haplogroup distribution by each method was very similar, where the most prevalent haplogroups observed were I, R and E with their sublineages I2a, R1a and E1b1b, respectively, which is also in accordance with the previously published Y-SNP data for the B&H population. In conclusion, results presented herein not only constitute a concordance study on the four most popular haplogroup assignment algorithms, but they also give a deeper insight into the inter-population differentiation in B&H on the basis of Y haplogroups for the first time.
Article
The HID-Ion AmpliSeq™ Identity Panel (the HID Identity Panel) is designed to detect 124-plex single nucleotide polymorphisms (SNPs) with next generation sequencing (NGS) technology on the Ion Torrent PGM™ platform, including 90 individual identification SNPs (IISNPs) on autosomal chromosomes and 34 lineage informative SNP (LISNPs) on Y chromosome. In this study, we evaluated performance for the HID Identity Panel to provide a reference for NGS-SNP application, focusing on locus strand balance, locus coverage balance, heterozygote balance, and background signals. Besides, several experiments were carried out to find out improvements and limitations of this panel, including studies of species specificity, repeatability and concordance, sensitivity, mixtures, case-type samples and degraded samples, population genetics and pedigrees following the Scientific Working Group on DNA Analysis Methods (SWGDAM) guidelines. In addition, Southern and Northern Chinese Han were investigated to assess applicability of this panel. Results showed this panel led to cross-reactivity with primates to some extent but rarely with non-primate animals. Repeatable and concordant genotypes could be obtained in triplicate with one exception at rs7520386. Full profiles could be obtained from 100 pg input DNA, but the optimal input DNA would be 1 ng–200 pg with 21 initial PCR cycles. A sample with ≥20% minor contributors could be considered as a mixture by the number of homozygotes, and full profiles belonging to minor contributors could be detected between 9:1 and 1:9 mixtures with known reference profiles. Also, this assay could be used for case-type samples and degraded samples. For autosomal SNPs (A-SNPs), FST across all 90 loci was not significantly different between Southern and Northern Chinese Han or between male and female samples. All A-SNP loci were independent in Chinese Han population. 
Except for 18 loci with He <0.4, most of the A-SNPs in the HID Identity Panel presented high polymorphism. Forensic parameters were calculated >99.999% for combined discrimination power (CDP), 0.999999724 for combined power of exclusion (CPE), 1.390 × 1011 for combined likelihood ratio (CLR) of trios, and 2.361 × 106 for CLR of motherless duos. For Y-SNPs, a total of 8 haplotypes were observed with the value of 0.684 for haplotype diversity. As a whole, the HID Identity Panel is a well-performed, robust, reliable and high informative NGS-SNP assay and it can fully meet requirements for individual identification and paternity testing in forensic science.
Article
Short tandem repeats (STRs) are conventional genetic markers typically used for paternity and kinship testing. As supplementary markers of STRs, single nucleotide polymorphisms (SNPs) have less discrimination power but broader applicability to degraded samples. The rapid improvement of next-generation sequencing (NGS) and multiplex amplification technologies also make it possible now to simultaneously identify dozens or even hundreds of SNP loci in a single pool. However, few studies have been endeavored to kinship testing based on SNP loci. In this study, we genotyped 90 autosomal human identity SNP loci with NGS, and investigated their testing efficacies based on the likelihood ratio model in eight pedigree scenarios involving paternity, half/full-sibling, uncle/nephew, and first-cousin relationships. We found that these SNPs might be sufficient to discriminate paternity and full-sibling, but impractical for more distant relatives such as uncle and cousin. Furthermore, we conducted an in silico study to obtain the theoretical tendency of how testing efficacy varied with increasing number of SNP loci. For each testing battery in a given pedigree scenario, we obtained distributions of logarithmic likelihood ratio for both simulated relatives and unrelated controls. The proportion of the overlapping area between the two distributions was defined as a false testing level (FTL) to evaluate the testing efficacy. We estimated that 85, 127, 491, and 1,858 putative SNP loci were required to discriminate paternity, full-sibling, half-sibling/uncle-nephew, and first-cousin (FTL, 0.1%), respectively. To test a half-sibling or nephew, an additional uncle relative could be included to decrease the required number of putative SNP loci to ∼320 (FTL, 0.1%). As a systematic computation of paternity and kinship testing based only on SNPs, our results could be informative for further studies and applications on paternity and kinship testing using SNP loci.
Article
Forensic scientists frequently have to analyze challenging sources of DNA, such as degraded and low template DNA (LtDNA). The capacity to genotype difficult biological traces has been facilitated by emerging technologies. Massively parallel sequencing (MPS) on a microchip, among other technologies, promises high sensitivity and discrimination power. In this study we evaluated the combined use of the Quantifiler® Trio DNA Quantification Kit with the prototype Ion AmpliSeq™ Identity panel v2.3 and the PGM™ platform on LtDNA samples. Coverage, allele balance, allele drop-out/in, consistency and variance were assessed. Overall, the results showed a high level of performance and consistency in genotyping capability even under the most challenging conditions, making it possible to obtain consistent SNP profiles with 31 pg of DNA and partial informative profiles with as little as 5 pg or with severely degraded DNA. In addition, we demonstrated that the stochastic effects observed in some samples are due to the amplification of the library rather than to sequencing. Based on our data, we propose general recommendations for the analysis of casework samples, starting from the use of quantification data, which proved critical in deciding whether to process the samples via STR (short tandem repeat) analysis or SNP MPS. In our experience, the prototype Ion AmpliSeq™ Identity panel v2.3 offers a new, applicable solution for processing LtDNA. This approach provides users with an additional tool for the analysis of traces that would not give informative results with conventional STR-based techniques.
Article
To investigate genetic diversity in Chinese populations, 706 unrelated male individuals from five ethnic groups (Han, Korean, Hui, Mongolian and Tibetan) were analyzed with 17 Y-chromosomal short tandem repeats (STRs). The haplotype diversity was 0.99985 in the combined data. A total of 675 distinct haplotypes were observed, of which 649 were unique. Y-chromosome haplogroups in the five groups were also predicted from the Y-STR haplotypes. Genetic distances between the five studied ethnic groups and other published groups were analyzed by analysis of molecular variance (AMOVA) and visualized in a multi-dimensional scaling (MDS) plot. In conclusion, the 17 Y-STR loci are highly polymorphic markers in the five groups and hence are very useful in forensic applications, population genetics and human evolution studies. This article is protected by copyright. All rights reserved.
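Haplotype diversity figures such as the one quoted above are commonly computed with Nei's unbiased estimator, HD = n/(n − 1) · (1 − Σ pᵢ²). The abstract does not say which estimator was used, so the sketch below assumes that one. The function name and the toy haplotype labels are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity for a sample of haplotype labels.

    HD = n / (n - 1) * (1 - sum(p_i ** 2)), where p_i is the sample
    frequency of the i-th distinct haplotype and n is the sample size.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two haplotypes are required")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Toy sample, for illustration only: two shared and two unique haplotypes.
sample = ["H1", "H1", "H2", "H2", "H3", "H4"]
print(haplotype_diversity(sample))
```

The estimator equals 1 when every sampled haplotype is distinct and 0 when all individuals share one haplotype, which is why a sample with many unique haplotypes yields a value very close to 1.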