ORIGINAL ARTICLE
SNP typing using the HID-Ion AmpliSeq™ Identity Panel in a southern Chinese population
Ran Li¹ & Chuchu Zhang¹ & Haiyan Li² & Riga Wu¹ & Haixia Li¹ & Zhenya Tang² & Chenhao Zhen³ & Jianye Ge⁴ & Dan Peng¹ & Ying Wang¹ & Hongying Chen² & Hongyu Sun¹,⁵
Received: 19 March 2017 / Accepted: 11 October 2017 / Published online: 18 October 2017
© Springer-Verlag GmbH Germany 2017
Abstract In the present study, 90 autosomal single nucleotide polymorphisms (SNPs) and 34 Y chromosomal SNPs were sequenced simultaneously using the HID-Ion AmpliSeq™ Identity Panel on the Ion PGM™ platform for 125 samples from a southern Chinese population. Raw data were analyzed and forensic parameters were calculated. Haplogroup assignments based on Y-SNP haplotypes and on Y-STR haplotypes were also compared for concordance. The results showed that allelic imbalance occurred more frequently at low coverage, while several SNPs with high coverage also showed poor allelic balance, including rs214955, rs430046, rs7520386, rs876724, rs917118, rs16981290, and rs2032631. In total, 21,261 miscalled reads (0.28%) were observed. The rate of allele-specific miscalled reads (ASMRs) was higher than that of allele nonspecific miscalled reads (ANMRs) and was associated with the genetic diversity of the SNP. The ASMR rate of the major allele was lower than that of the minor allele, whereas there was no difference for ANMRs. The combined discrimination power (CDP) was $1 - 4.81 \times 10^{-34}$, and the combined power of exclusion (CPE) was 0.99989 and 0.99999992 for duo and trio paternity testing, respectively. No significant genetic difference was detected between southern and northern Chinese populations. In the haplogroup study, O2 was the predominant haplogroup, and 97.01% of the samples that could be assigned automatically from Y-STR haplotypes received haplogroups consistent with those from Y-SNP haplotypes. In conclusion, the AmpliSeq™ Identity Panel was powerful for individual identification and trio paternity testing. ASMRs were associated with genetic diversity and allele frequency, whereas ANMRs were associated with neither. High concordance of haplogroup assignment can be obtained with Y-STR and Y-SNP haplotypes.
Keywords Single nucleotide polymorphism (SNP) · Next generation sequencing (NGS) · Ion Torrent PGM™ · Population genetics · Miscalled reads
Ran Li and Chuchu Zhang contributed equally to the article.
Electronic supplementary material The online version of this article (https://doi.org/10.1007/s00414-017-1706-3) contains supplementary material, which is available to authorized users.
*Hongyu Sun
sunhy@mail.sysu.edu.cn; sunhongyu2002@163.com
¹ Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, People’s Republic of China
² The Center of Criminal Technology of Guangdong Province, Guangzhou 510050, People’s Republic of China
³ The Second Clinical Medical School (Zhujiang Hospital), Southern Medical University, Guangzhou 510280, People’s Republic of China
⁴ Thermo Fisher Scientific Inc, South San Francisco, CA 94080, USA
⁵ Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510089, Guangdong, People’s Republic of China
Int J Legal Med (2018) 132:997–1006
https://doi.org/10.1007/s00414-017-1706-3
Introduction
Single nucleotide polymorphisms (SNPs), which have lower mutation rates and allow smaller amplicons than the routinely used short tandem repeats (STRs), are considered potentially useful markers for forensic human identification [1, 2]. Because SNPs are mostly di-allelic, their per-locus discrimination power is lower than that of STRs, but this can be compensated for by typing additional independent loci [3]. Several autosomal SNP marker sets have been developed with various genotyping methods, including single-base extension, chip-based microarrays, and allele-specific hybridization arrays [1, 4–6]. However, because they either include only a small number of SNP loci per analysis or require large amounts of input DNA, these sets have not been widely adopted by forensic DNA laboratories [7, 8].
Recently, massively parallel sequencing (MPS), or next-generation sequencing (NGS), technologies with acceptable sequencing accuracy and costs have attracted great interest in the forensic genetics community. They can detect several hundred to thousands of markers simultaneously (including different kinds of markers, e.g., SNPs and STRs) and allow multiple samples to be processed in a single sequencing run using sample-tagging DNA barcodes. Furthermore, detailed sequence information for the target regions can be obtained with this technology [9–11]. The Ion Torrent Personal Genome Machine (PGM) was launched in early 2011 and was the first commercial sequencing instrument that requires neither fluorescence nor camera scanning, resulting in higher speed, lower cost, and smaller instrument size [10, 12]. In addition, a study by Elena et al. showed that with this platform, consistent SNP profiles could be obtained from 31 pg of DNA, and partially informative profiles from as little as 5 pg or from severely degraded DNA [13].
The HID-Ion AmpliSeq™ Identity Panel (HID Identity Panel), released by Thermo Fisher Scientific, co-amplifies 90 autosomal SNPs (A-SNPs) and 34 Y chromosomal SNPs (Y-SNPs), which were selected based on the studies of Pakstis et al. [5], Sanchez et al. [4], and Karafet et al. [14]. This panel has been reported to provide strong power for personal identification [14].
Previous studies of this panel were performed on relatively small population samples, and no population data had been reported for Guangdong Province in southern China, especially for Y-SNPs. We therefore evaluated the panel in this population in the present study.
Materials and methods
Samples, DNA extraction, and DNA quantification
Peripheral blood samples were collected with informed consent from 125 unrelated individuals (101 males and 24 females) in Guangdong Province, southern China. DNA was extracted on the AutoMate Express™ Forensic DNA Extraction System (Thermo Fisher Scientific, MA, USA) with the PrepFiler Express BAT™ Forensic DNA Extraction Kit (Thermo Fisher). DNA extracts were quantified on the Qubit® 2.0 Fluorometer (Thermo Fisher) using the Qubit® dsDNA HS Assay Kit (Thermo Fisher) according to the manufacturer’s protocol. The study was approved by the Human Subjects Committee of Sun Yat-sen University (No. 2016-008).
Y-STR genotyping and haplogrouping
All male samples were genotyped using the AmpFLSTR® Yfiler™ PCR Amplification Kit (Thermo Fisher), and Y haplogroups were predicted using Whit Athey’s Haplogroup Predictor (http://www.hprg.com/hapest5/index.html) [16, 17] with minimum fitness score = 20, minimum probability = 85%, and area priors = "equal priors".
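The acceptance rule above can be pictured as a simple filter on the predictor's ranked output. The sketch below is a minimal illustration only: it assumes each sample's predictions have been exported as (haplogroup, fitness score, probability) records, and the `Prediction` record, the `assign_haplogroup` function, and the example values are all hypothetical rather than the predictor's actual output format. The `strict=False` branch mirrors the maximum-probability fallback considered in the Discussion.

```python
"""Sketch: accept a Y-STR-based haplogroup prediction only above fixed thresholds.

The input format is hypothetical; the thresholds are those stated in the text.
"""
from dataclasses import dataclass
from typing import List, Optional

MIN_FITNESS = 20.0      # minimum fitness score
MIN_PROBABILITY = 0.85  # minimum probability (85%)


@dataclass
class Prediction:
    haplogroup: str
    fitness: float
    probability: float  # fraction between 0 and 1


def assign_haplogroup(predictions: List[Prediction], strict: bool = True) -> Optional[str]:
    """Return the most probable haplogroup, or None if it fails the thresholds.

    With strict=False the top-probability haplogroup is always returned
    (the maximum-probability fallback).
    """
    if not predictions:
        return None
    best = max(predictions, key=lambda pred: pred.probability)
    if not strict:
        return best.haplogroup
    if best.fitness >= MIN_FITNESS and best.probability >= MIN_PROBABILITY:
        return best.haplogroup
    return None


# Hypothetical predictor output for one sample.
sample = [Prediction("O2", 42.0, 0.97), Prediction("O1", 18.5, 0.03)]
print(assign_haplogroup(sample))  # O2
```

Applying both thresholds to the top-ranked haplogroup is an assumption about how the settings are combined; samples returning `None` correspond to those that "could not be assigned the haplogroups automatically."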
Library preparation, purification, and quantification
Libraries were constructed using the Ion AmpliSeq™ Library Kit 2.0 and the Ion AmpliSeq™ Identity Panel v2.3 (Thermo Fisher) following the manufacturer’s recommendations. A total of 1 ng of input DNA was amplified on the GeneAmp® PCR System 9700 (Thermo Fisher) with the following thermal cycling conditions: 2 min at 99 °C; 21 cycles of 15 s at 99 °C and 4 min at 60 °C; and a final hold at 10 °C. Then, 2 μL of FuPa reagent (Thermo Fisher) was added to digest excess PCR primers, and the reactions were incubated for 10 min at 50 °C, 10 min at 55 °C, and 20 min at 60 °C, with a final hold at 10 °C. The libraries were barcoded using Ion Xpress™ barcode adapters (Thermo Fisher) with the following incubation steps: 22 °C for 30 min, 72 °C for 10 min, and a hold at 10 °C. Libraries were then purified using 1.5× Agencourt® AMPure® XP reagent (Beckman Coulter, FL, USA) according to the manufacturer’s instructions. Purified libraries were quantified on the ABI 7500 Real-Time PCR System with the Ion Library Quantitation Kit (Thermo Fisher) and subsequently diluted to 20 pM. All barcoded libraries were then pooled in equal volumes.
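The dilution step above is C1V1 = C2V2 arithmetic. The sketch below computes how much library and diluent to combine to reach 20 pM. It assumes hypothetical qPCR concentrations and a hypothetical 20 µL final volume per library (neither is given in the text), and the barcode labels are placeholders.

```python
"""Sketch: dilute qPCR-quantified libraries to a common target (C1*V1 = C2*V2)."""

TARGET_PM = 20.0  # target concentration after dilution (pM), as in the text


def dilution_volumes(stock_pm: float, final_ul: float,
                     target_pm: float = TARGET_PM) -> tuple[float, float]:
    """Return (library_ul, diluent_ul) that give target_pm in final_ul.

    Solves C1 * V1 = C2 * V2 for V1; a stock below the target cannot be used.
    """
    if stock_pm < target_pm:
        raise ValueError("stock is below the target concentration")
    library_ul = target_pm * final_ul / stock_pm
    return library_ul, final_ul - library_ul


# Hypothetical qPCR results (pM) for three barcoded libraries.
libraries = {"barcode_01": 400.0, "barcode_02": 250.0, "barcode_03": 80.0}

for barcode, conc in libraries.items():
    lib_ul, diluent_ul = dilution_volumes(conc, final_ul=20.0)
    print(f"{barcode}: {lib_ul:.2f} uL library + {diluent_ul:.2f} uL diluent")

# Because every diluted library is at the same 20 pM, mixing equal volumes
# gives each barcode the same molar share of the pool.
```

Bringing every library to one concentration first is what makes simple equal-volume pooling yield balanced read depth across barcodes.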
Emulsion PCR and sequencing
Emulsion PCR (emPCR) was performed on the OneTouch™ 2 (OT2) instrument (Thermo Fisher) with the Ion PGM™ Template OT2 200 Kit (Thermo Fisher), and template-positive Ion Sphere Particles (ISPs) were enriched on the Ion OneTouch™ ES instrument (Thermo Fisher). Sequencing was performed using the Ion PGM™ Sequencing 200 Kit v2 on Ion 314™ or 316™ chips (depending on the number of samples) following the manufacturer’s protocols.
Data analysis
Raw data were processed using the Ion Torrent Suite Server version 4.6 (Thermo Fisher). The human reference genome hg19 was used for alignment. The HID_SNP_Genotyper plugin v4.3.1 was run to genotype the SNPs using the germline low-stringency setting. This plugin was also used to generate comprehensive analysis reports, including CSV files containing detailed mapping, genotype, coverage, and quality-check information for each sample in the run.
Statistics
The CSV files were further analyzed in Microsoft Excel 2010. The frequency of major allele reads ($F_{MAR}$) was used to assess allelic balance [16, 17]. Y-SNPs with no calls were re-genotyped by inspecting the data manually and making an allele call when coverage was ≥ 6× and $F_{MAR}$ was ≥ 50%. Base miscalls in autosomal SNP homozygotes and in Y-SNPs were analyzed separately for two classes: allele-specific miscalled reads (ASMRs), defined as reads carrying the alternative locus-specific allele, and allele nonspecific miscalled reads (ANMRs), defined as reads carrying a base that is neither of the locus-specific alleles. Cervus 3.0 [18] was used to calculate allele frequencies, observed and expected heterozygosity ($H_{obs}$ and $H_{exp}$), matching probability (MP), discrimination power (DP), polymorphism information content (PIC), and the exclusion probabilities for duo ($PE_{duo}$) and trio ($PE_{trio}$) paternity testing. Hardy-Weinberg equilibrium (HWE), linkage disequilibrium (LD), and F-statistics ($F_{ST}$) were calculated with Arlequin 3.5 [19]. To compare allele frequency distributions between southern and northern Chinese Han populations, as well as populations from other countries and continents, autosomal SNP frequency data were downloaded from SPSmart (http://spsmart.cesga.es/) [20]. For Y-SNPs, genetic diversity (GD) was calculated as $GD = 1 - \sum_i p_i^2$, where $p_i$ is the frequency of the $i$th allele. Y-SNP haplotypes were counted manually, and haplotype diversity (HD) was calculated as $HD = \frac{N}{N-1}\left(1 - \sum_i p_i^2\right)$, where $N$ is the number of haplotypes sampled (i.e., the number of male individuals typed) and $p_i$ is the frequency of the $i$th haplotype.
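As a worked illustration of the GD and HD formulas above, the following minimal sketch counts observations and applies them directly. The example alleles and haplotype labels are hypothetical; treating N as the number of sampled haplotypes (one per male) follows Nei's unbiased estimator.

```python
"""Sketch: gene diversity (GD) and Nei's unbiased haplotype diversity (HD)."""
from collections import Counter
from typing import Hashable, Sequence


def _sum_squared_frequencies(observations: Sequence[Hashable]) -> float:
    """Sum of p_i^2 over the distinct values observed."""
    counts = Counter(observations)
    n = len(observations)
    return sum((c / n) ** 2 for c in counts.values())


def gene_diversity(alleles: Sequence[Hashable]) -> float:
    """GD = 1 - sum(p_i^2) over the alleles observed at one Y-SNP."""
    if not alleles:
        raise ValueError("need at least one allele observation")
    return 1.0 - _sum_squared_frequencies(alleles)


def haplotype_diversity(haplotypes: Sequence[Hashable]) -> float:
    """HD = N / (N - 1) * (1 - sum(p_i^2)), N = number of haplotypes sampled."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    return n / (n - 1) * (1.0 - _sum_squared_frequencies(haplotypes))


# Hypothetical data: one Y-SNP typed in six males, and the same males'
# 34-SNP haplotypes abbreviated to short labels.
print(gene_diversity(["C", "C", "C", "T", "T", "C"]))
print(haplotype_diversity(["hapA", "hapA", "hapB", "hapA", "hapC", "hapB"]))
```

The N/(N − 1) factor corrects the downward bias of the plain 1 − Σp² estimate in small samples; for 101 males it changes HD by only about 1%.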
Y-SNP haplogrouping
Haplogroups were assigned according to the International Society of Genetic Genealogy (ISOGG) Y-DNA Haplogroup Tree 2017 (http://www.isogg.org/tree/) [21]. Concordance between Y-SNP-based and Y-STR-based haplogrouping was then assessed.
Results
Coverage and allele balance
Figure 1 shows the coverage of each SNP. Compared with autosomal SNPs, Y-SNPs displayed lower coverage (400 ± 450× vs 950 ± 989× on average), which might be explained by the single-copy nature of the Y chromosome. Among autosomal SNPs, the highest coverage was observed for rs13218440 (1972 ± 1272×), and the lowest was only 169 ± 112× for rs2342747. Other inefficiently amplified SNPs (< 300×) were rs876724 (231 ± 175×), rs10488710 (253 ± 196×), rs729172 (263 ± 311×), rs993934 (254 ± 380×), and rs12997453 (253 ± 368×).
As shown in Fig. 2, $F_{MAR}$ values differed significantly between homozygotes and heterozygotes. The $F_{MAR}$ value for most homozygotes was > 90%, apart from two SNPs in three individuals with borderline values of 89.3% (rs2046361), 89.5% (rs7520386), and 89.8% (rs7520386). The $F_{MAR}$ values for heterozygotes were mainly between 50 and 60%, and most loci showed good allelic balance except five: rs214955, rs430046, rs7520386, rs876724, and rs917118, of which rs7520386 performed the worst. Heterozygotes with unusual allelic balance ($F_{MAR}$ > 65%) had relatively low coverage (< 160× on average). Similar results were obtained for Y-SNPs: $F_{MAR}$ was > 90% in all cases except four samples with low coverage of 6×, 7×, 7×, and 38×, respectively. Two loci (rs16981290 and rs2032631) showed reduced allelic balance compared with the other Y-SNPs.
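The allele-balance rules used here and in the Methods can be expressed directly in terms of $F_{MAR}$. This minimal sketch assumes per-position base counts as input (a hypothetical dictionary, not the plugin's CSV layout). The 6× and 50% thresholds for manual Y-SNP calls and the 65% flag for heterozygote imbalance come from the text; the tie guard is an added assumption.

```python
"""Sketch: F_MAR-based allele balance checks for one SNP in one sample."""
from typing import Dict, Optional

MIN_Y_COVERAGE = 6         # minimum coverage for a manual Y-SNP call
MIN_Y_FMAR = 0.50          # minimum F_MAR for a manual Y-SNP call
HET_IMBALANCE_FMAR = 0.65  # heterozygotes above this F_MAR are flagged


def f_mar(base_counts: Dict[str, int]) -> float:
    """Frequency of major allele reads: reads of the top base / all reads."""
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("no reads at this position")
    return max(base_counts.values()) / total


def manual_y_call(base_counts: Dict[str, int]) -> Optional[str]:
    """Call the major base at a no-call Y-SNP if coverage and F_MAR suffice."""
    coverage = sum(base_counts.values())
    if coverage < MIN_Y_COVERAGE or f_mar(base_counts) < MIN_Y_FMAR:
        return None
    ranked = sorted(base_counts.items(), key=lambda item: item[1], reverse=True)
    # Added assumption (not stated in the text): never call an exact tie.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def het_is_imbalanced(base_counts: Dict[str, int]) -> bool:
    """Flag a heterozygote whose major allele exceeds 65% of its reads."""
    return f_mar(base_counts) > HET_IMBALANCE_FMAR


print(manual_y_call({"A": 5, "G": 1}))       # A
print(het_is_imbalanced({"A": 70, "G": 30}))  # True
```

Because $F_{MAR}$ is computed over all four bases, a value near 50% at a Y-SNP signals a genuinely mixed signal rather than a normal hemizygous call.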
Miscalled reads and miscalled rates
Miscalled reads were defined as reads with base calls that differed from the SNP genotype call; they comprised ASMRs and ANMRs (see the example in Fig. 3a). For autosomal SNPs, only miscalled reads at homozygous calls were counted, because ASMRs cannot be distinguished from true allele reads in heterozygotes. In total, 21,261 miscalled reads (0.28%) were observed among 7,631,248 total reads at 6492 homozygous autosomal SNP calls and 3434 Y-SNP calls, a very small fraction of all reads. Most of these were ASMRs, which were more than four times as frequent as ANMRs (Fig. 3b). To explore the relationship between miscalled reads and total reads, a regression function was fitted (Fig. 3c), and a good linear correlation was observed. On average, the miscalled rate per sample was 0.26 ± 0.16% for ASMRs and 0.06 ± 0.05% for ANMRs. Miscalled rates also differed among autosomal SNPs, highly polymorphic Y-SNPs (two alleles observed in the present population), and poorly polymorphic Y-SNPs (only one allele observed in the present population), and the ASMR rate was strongly associated with genetic diversity (He or GD). As shown in Fig. 4a, the slope for ANMRs was quite flat compared with that for ASMRs (0.037 vs 0.209). The ASMR rate of the major allele (the allele with the higher frequency at a given SNP) was lower than that of the minor allele (the allele with the lower frequency), whereas there was no significant difference in ANMR rates for either autosomal SNPs or Y-SNPs (Fig. 4b, c).
Fig. 1 The coverage of autosomal SNPs and Y-SNPs
Fig. 2 Allelic balance of 90 autosomal SNPs and 34 Y-SNPs. The circle, cross, and rhombus represent autosomal homozygotes, autosomal heterozygotes, and Y-SNPs, respectively
Fig. 3 Allele-specific miscalled reads (ASMRs) and allele nonspecific miscalled reads (ANMRs) for autosomal homozygotes and Y-SNPs. a Illustration of CCRs, ASMRs, and ANMRs. b Proportions of CCRs, ASMRs, and ANMRs. c Miscalled reads plotted against total reads per sample (autosomal heterozygotes excluded)
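The ASMR/ANMR definitions translate into a per-position split of reads, which can then be pooled per sample to obtain miscalled rates. The sketch below assumes hypothetical per-position base counts as input; the `ReadClasses`, `classify`, and `pool` names are illustrative, not part of any tool used in the study.

```python
"""Sketch: split reads at a homozygous autosomal SNP or a Y-SNP into correctly
called reads, ASMRs, and ANMRs, then compute miscalled rates."""
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class ReadClasses:
    correct: int = 0
    asmr: int = 0  # reads showing the locus's other (alternative) allele
    anmr: int = 0  # reads showing a base that is neither locus allele

    @property
    def total(self) -> int:
        return self.correct + self.asmr + self.anmr

    def rate(self, kind: str) -> float:
        """Miscalled rate for 'asmr' or 'anmr' as a fraction of all reads."""
        if kind not in ("asmr", "anmr"):
            raise ValueError("kind must be 'asmr' or 'anmr'")
        return getattr(self, kind) / self.total if self.total else 0.0


def classify(base_counts: Dict[str, int], called: str,
             locus_alleles: Tuple[str, str]) -> ReadClasses:
    """Classify reads at one position given the called (single) allele."""
    if called not in locus_alleles:
        raise ValueError("called allele must be one of the locus alleles")
    alternative = locus_alleles[1] if called == locus_alleles[0] else locus_alleles[0]
    return ReadClasses(
        correct=base_counts.get(called, 0),
        asmr=base_counts.get(alternative, 0),
        anmr=sum(n for base, n in base_counts.items() if base not in locus_alleles),
    )


def pool(classes: Iterable[ReadClasses]) -> ReadClasses:
    """Sum read classes over positions (e.g., all eligible calls in one sample)."""
    out = ReadClasses()
    for c in classes:
        out.correct += c.correct
        out.asmr += c.asmr
        out.anmr += c.anmr
    return out


# Hypothetical A/G SNP called homozygous A, and a C/T Y-SNP called T.
site1 = classify({"A": 990, "G": 6, "C": 2, "T": 1}, "A", ("A", "G"))
site2 = classify({"T": 497, "C": 2, "G": 1}, "T", ("C", "T"))
sample = pool([site1, site2])
print(f"ASMR {sample.rate('asmr'):.2%}, ANMR {sample.rate('anmr'):.2%}")
```

Heterozygous autosomal calls are deliberately not passed to `classify`: with both locus alleles genuinely present, an ASMR cannot be told apart from a correct read.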
Allele frequency of autosomal SNPs and forensic parameters
A total of 149 no calls were observed among 11,250 autosomal SNP genotypes (90 × 125). These no calls were mainly due to low coverage (17× on average), and the mean call rate per sample was 98.68%. Of the samples, 74.4% (93 of 125) were fully genotyped for all autosomal SNPs, and four samples had ≥ 10 "NN" results out of 90 SNPs ("NN" denotes a no call, i.e., an invalid genotype for which the Torrent Variant Caller software failed to call an allele). The allele frequencies, $H_{obs}$, $H_{exp}$, PIC, MP, $PE_{duo}$, $PE_{trio}$, and DP values are shown in Table 1. PIC ranged from 0.095 to 0.375, with rs740910 and rs251934 being the least polymorphic SNPs. Three SNPs, rs1058083, rs10773760, and rs7520386, deviated from Hardy-Weinberg equilibrium (p < 0.05); after Bonferroni correction (p = 0.05/90), all SNPs passed except rs7520386, which was also one of the SNPs with skewed allelic balance. The linkage disequilibrium test indicated that all SNPs were independent of each other except rs12997453/rs993934 and rs2056277/rs10092491 (p < 0.05/4005 after Bonferroni correction; supplementary Table 1). The combined discrimination power (CDP) was $1 - 4.81 \times 10^{-34}$, and the combined power of exclusion (CPE) was 0.99989 for duo paternity testing and 0.99999992 for trio paternity testing. Little genetic differentiation [22] was detected between southern and northern Chinese populations ($F_{ST}$ < 0.05 for all SNPs), whereas the most pronounced allele frequency differences were observed with Africans and Europeans. Details are presented in supplementary Tables 2 and 3.
Fig. 4 The rate of allele-specific miscalled reads (ASMRs) and allele nonspecific miscalled reads (ANMRs) for autosomal SNPs and Y-SNPs. a The relationship between miscalled rate and He (or GD); the miscalled rates are the medians of the first and second halves of the SNPs ordered by He (autosomal SNPs) or by GD (Y-SNPs). b Comparison of miscalled rates between major and minor alleles for autosomal SNPs (one-way ANOVA). c Comparison of miscalled rates between major and minor alleles for Y-SNPs (Mann-Whitney U test)
Table 1 The frequencies and related forensic parameters of 90 autosomal SNPs in the population from Guangdong Province, South China (N = 125)
Locus Allele Frequency $H_{obs}$ $H_{exp}$ PIC MP $PE_{duo}$ $PE_{trio}$ DP HWE p
rs1490413 A/G 0.604/0.396 0.488 0.480 0.364 0.387 0.114 0.182 0.613 1.000
rs7520386 A/G 0.648/0.352 0.240 0.458 0.352 0.400 0.104 0.176 0.600 $<10^{-3}$**
rs4847034 A/G 0.556/0.444 0.488 0.496 0.372 0.378 0.122 0.186 0.622 1.000
rs560681 A/G 0.632/0.368 0.496 0.467 0.357 0.394 0.108 0.178 0.606 0.566
rs10495407 G/A 0.726/0.274 0.435 0.400 0.319 0.442 0.079 0.159 0.558 0.370
rs891700 G/A 0.512/0.487 0.542 0.502 0.375 0.375 0.125 0.188 0.625 0.465
rs1413212 C/T 0.536/0.464 0.496 0.499 0.374 0.376 0.124 0.187 0.624 1.000
rs876724 C/T 0.516/0.484 0.500 0.502 0.375 0.375 0.125 0.187 0.625 1.000
rs1109037 G/A 0.548/0.452 0.520 0.497 0.373 0.377 0.123 0.186 0.623 0.719
rs993934 G/A 0.532/0.468 0.551 0.500 0.374 0.376 0.124 0.187 0.624 0.334
rs12997453 G/A 0.717/0.283 0.443 0.407 0.323 0.436 0.082 0.162 0.564 0.369
rs907100 C/G 0.537/0.463 0.500 0.499 0.374 0.376 0.124 0.187 0.624 1.000
rs1357617 T/A 0.800/0.200 0.304 0.321 0.269 0.514 0.051 0.134 0.486 0.577
rs4364205 G/T 0.588/0.412 0.536 0.486 0.367 0.383 0.117 0.184 0.617 0.271
rs1872575 A/G 0.614/0.386 0.463 0.476 0.362 0.389 0.112 0.181 0.611 0.849
rs1355366 T/C 0.826/0.174 0.248 0.288 0.246 0.550 0.041 0.123 0.450 0.196
rs6444724 T/C 0.512/0.488 0.464 0.502 0.375 0.375 0.125 0.187 0.625 0.473
rs2046361 A/T 0.504/0.496 0.455 0.502 0.375 0.375 0.125 0.187 0.625 0.345
rs6811238 G/T 0.633/0.367 0.444 0.466 0.357 0.395 0.108 0.178 0.605 0.698
rs1979255 G/C 0.564/0.436 0.504 0.494 0.371 0.379 0.121 0.185 0.621 0.858
rs717302 A/G 0.832/0.168 0.304 0.281 0.240 0.558 0.039 0.120 0.442 0.522
rs159606 G/A 0.700/0.300 0.488 0.422 0.332 0.425 0.088 0.166 0.575 0.089
rs7704770 A/G 0.628/0.372 0.488 0.469 0.358 0.393 0.109 0.179 0.607 0.704
rs251934 A/G 0.940/0.060 0.120 0.113 0.106 0.793 0.006 0.053 0.207 1.000
rs338882 A/G 0.544/0.456 0.464 0.498 0.373 0.377 0.123 0.187 0.623 0.474
rs13218440 G/A 0.592/0.408 0.464 0.485 0.366 0.384 0.117 0.183 0.616 0.713
rs214955 T/C 0.500/0.500 0.552 0.502 0.375 0.375 0.125 0.188 0.625 0.286
rs727811 T/G 0.657/0.343 0.504 0.453 0.349 0.403 0.102 0.175 0.597 0.228
rs6955448 C/T 0.704/0.296 0.448 0.418 0.330 0.427 0.087 0.165 0.573 0.521
rs917118 C/T 0.732/0.268 0.424 0.394 0.315 0.446 0.077 0.158 0.554 0.493
rs321198 C/T 0.584/0.416 0.480 0.488 0.368 0.382 0.118 0.184 0.618 0.855
rs737681 C/T 0.884/0.116 0.168 0.206 0.184 0.653 0.021 0.092 0.347 0.059
rs10092491 C/T 0.656/0.344 0.426 0.453 0.350 0.403 0.102 0.175 0.597 0.550
rs4288409 C/A 0.612/0.388 0.488 0.477 0.362 0.388 0.113 0.181 0.612 0.852
rs2056277 C/T 0.844/0.156 0.262 0.264 0.228 0.578 0.035 0.114 0.422 1.000
rs1015250 G/C 0.556/0.444 0.456 0.496 0.372 0.378 0.122 0.186 0.622 0.467
rs7041158 C/T 0.720/0.280 0.447 0.405 0.322 0.437 0.081 0.161 0.563 0.270
rs1463729 C/T 0.520/0.480 0.496 0.501 0.375 0.375 0.125 0.187 0.625 1.000
rs1360288 C/T 0.616/0.384 0.496 0.475 0.361 0.390 0.112 0.181 0.610 0.705
rs10776839 G/T 0.545/0.455 0.484 0.498 0.373 0.377 0.123 0.186 0.623 0.855
rs826472 C/T 0.833/0.167 0.279 0.279 0.239 0.560 0.039 0.120 0.440 1.000
rs735155 T/C 0.892/0.108 0.216 0.193 0.174 0.670 0.019 0.087 0.330 0.357
rs3780962 G/A 0.564/0.436 0.504 0.494 0.371 0.379 0.121 0.185 0.621 0.856
rs740598 A/G 0.508/0.492 0.500 0.502 0.375 0.375 0.125 0.187 0.625 1.000
rs964681 T/C 0.660/0.340 0.472 0.451 0.348 0.405 0.101 0.174 0.595 0.690
rs1498553 C/T 0.570/0.430 0.533 0.492 0.370 0.380 0.120 0.185 0.620 0.461
rs901398 T/C 0.772/0.228 0.312 0.353 0.290 0.482 0.062 0.145 0.518 0.205
rs10488710 G/C 0.596/0.404 0.504 0.484 0.366 0.385 0.116 0.183 0.615 0.711
rs2076848 A/T 0.680/0.320 0.416 0.437 0.341 0.414 0.095 0.170 0.586 0.680
rs2269355 G/C 0.620/0.380 0.520 0.473 0.360 0.391 0.111 0.180 0.609 0.343
rs2111980 T/C 0.624/0.376 0.464 0.471 0.359 0.392 0.110 0.180 0.608 1.000
rs10773760 A/G 0.604/0.396 0.360 0.480 0.364 0.387 0.114 0.182 0.613 0.008*
rs1335873 A/T 0.624/0.376 0.544 0.471 0.359 0.392 0.110 0.180 0.608 0.089
rs1886510 G/A 0.843/0.157 0.261 0.265 0.229 0.576 0.035 0.115 0.424 1.000
rs1058083 G/A 0.596/0.404 0.392 0.484 0.366 0.385 0.116 0.183 0.615 0.041*
rs354439 A/T 0.569/0.431 0.476 0.493 0.370 0.380 0.120 0.185 0.620 0.718
rs1454361 T/A 0.612/0.388 0.520 0.477 0.362 0.388 0.113 0.181 0.612 0.350
rs722290 G/C 0.517/0.483 0.466 0.502 0.375 0.375 0.125 0.187 0.625 0.464
rs873196 T/C 0.912/0.088 0.128 0.161 0.148 0.718 0.013 0.074 0.282 0.051
rs4530059 G/A 0.764/0.236 0.376 0.362 0.296 0.474 0.065 0.148 0.526 0.804
rs2016276 T/C 0.556/0.444 0.488 0.496 0.372 0.378 0.122 0.186 0.622 1.000
rs1821380 C/G 0.568/0.432 0.464 0.493 0.370 0.380 0.120 0.185 0.620 0.585
rs1528460 T/C 0.575/0.425 0.517 0.491 0.369 0.381 0.119 0.185 0.619 0.579
rs729172 G/T 0.848/0.152 0.294 0.264 0.229 0.577 0.033 0.112 0.423 0.460
rs2342747 G/A 0.644/0.356 0.440 0.460 0.353 0.398 0.105 0.177 0.602 0.696
rs430046 C/T 0.656/0.344 0.464 0.453 0.349 0.403 0.102 0.175 0.597 0.843
rs1382387 A/C 0.684/0.316 0.392 0.434 0.339 0.416 0.093 0.169 0.584 0.304
rs9905977 G/A 0.624/0.376 0.432 0.471 0.359 0.392 0.110 0.180 0.608 0.444
rs740910 A/G 0.947/0.053 0.105 0.100 0.095 0.815 0.005 0.048 0.185 1.000
rs938283 T/C 0.876/0.124 0.232 0.218 0.194 0.636 0.024 0.097 0.364 0.690
rs2292972 T/C 0.644/0.356 0.536 0.460 0.353 0.398 0.105 0.177 0.602 0.079
rs1493232 C/A 0.624/0.376 0.544 0.471 0.359 0.392 0.110 0.180 0.608 0.088
rs9951171 G/A 0.524/0.476 0.536 0.501 0.374 0.376 0.124 0.187 0.624 0.476
rs1736442 C/T 0.621/0.379 0.452 0.473 0.360 0.391 0.111 0.180 0.609 0.703
rs1024116 C/T 0.915/0.085 0.171 0.157 0.144 0.724 0.012 0.072 0.276 0.599
rs719366 A/G 0.792/0.208 0.320 0.331 0.275 0.504 0.054 0.138 0.496 0.786
rs576261 A/C 0.592/0.408 0.528 0.485 0.366 0.384 0.117 0.183 0.616 0.355
rs1031825 A/C 0.504/0.496 0.480 0.502 0.375 0.375 0.125 0.187 0.625 0.721
rs445251 C/G 0.656/0.344 0.492 0.453 0.350 0.403 0.102 0.175 0.597 0.422
rs1005533 G/A 0.660/0.340 0.424 0.451 0.348 0.405 0.101 0.174 0.595 0.552
rs1523537 T/C 0.585/0.415 0.427 0.488 0.368 0.382 0.118 0.184 0.618 0.194
rs722098 G/A 0.532/0.468 0.520 0.500 0.374 0.376 0.124 0.187 0.624 0.721
rs2830795 G/A 0.548/0.452 0.440 0.497 0.373 0.377 0.123 0.186 0.623 0.208
rs2831700 G/A 0.572/0.428 0.456 0.492 0.370 0.380 0.120 0.185 0.620 0.463
rs914165 G/A 0.684/0.316 0.424 0.434 0.339 0.416 0.093 0.169 0.584 0.837
rs221956 C/T 0.596/0.404 0.456 0.484 0.366 0.385 0.116 0.183 0.615 0.580
rs733164 G/A 0.876/0.124 0.200 0.218 0.194 0.636 0.024 0.097 0.364 0.399
rs987640 T/A 0.552/0.448 0.496 0.497 0.372 0.378 0.122 0.186 0.622 1.000
rs2040411 G/A 0.712/0.288 0.368 0.412 0.326 0.432 0.084 0.163 0.568 0.277
rs1028528 A/G 0.612/0.388 0.424 0.477 0.362 0.388 0.113 0.181 0.612 0.260
*p < 0.05; **p < 0.00056 (0.05/90)
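For a di-allelic SNP, the per-locus parameters in Table 1 reduce to closed-form expressions in the allele frequency p (with q = 1 − p). CDP and CPE are then complements of products over loci. The sketch below is an illustration, not Cervus code. It assumes Hardy-Weinberg genotype proportions and independent loci, and uses the standard two-allele exclusion probability formulas. For rs1357617 (p = 0.800, N = 125) it reproduces the tabulated values.

```python
"""Sketch: forensic parameters for a bi-allelic SNP and their combination
across independent loci (expected genotype frequencies, assuming HWE)."""
import math
from typing import Dict, Iterable, Optional


def biallelic_parameters(p: float, n: Optional[int] = None) -> Dict[str, float]:
    q = 1.0 - p
    het = 2 * p * q
    # Unbiased expected heterozygosity when the sample size n is given.
    h_exp = het * (2 * n) / (2 * n - 1) if n else het
    mp = p**4 + het**2 + q**4  # sum of squared expected genotype frequencies
    return {
        "Hexp": h_exp,
        "PIC": 1 - (p**2 + q**2) - 2 * p**2 * q**2,
        "MP": mp,
        "DP": 1 - mp,
        "PE_duo": 2 * p**2 * q**2,       # child and alleged father are opposite homozygotes
        "PE_trio": p * q * (1 - p * q),  # mother's genotype known
    }


def combined_discrimination_power(mps: Iterable[float]) -> float:
    """CDP = 1 - product of per-locus matching probabilities."""
    return 1 - math.prod(mps)


def combined_power_of_exclusion(pes: Iterable[float]) -> float:
    """CPE = 1 - product of per-locus non-exclusion probabilities."""
    return 1 - math.prod(1 - pe for pe in pes)


row = biallelic_parameters(0.800, n=125)
print({k: round(v, 3) for k, v in row.items()})
# Hexp 0.321, PIC 0.269, MP 0.514, DP 0.486, PE_duo 0.051, PE_trio 0.134
```

Multiplying across loci is only valid for independent markers, which is why the two LD-flagged pairs matter when combining all 90 SNPs.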
Comparison of haplogrouping based on Y-STR and Y-SNP haplotypes
A no-call rate of 2.3% (79 of 3434), higher than that for autosomal SNPs, was observed for Y-SNPs in the male samples, again mostly due to low coverage. Of the 34 Y-SNPs, only 15 showed two alleles in the population. In total, seven haplotypes were detected (Fig. 5), and the HD was 0.644. Based on the Y-SNP haplotypes, seven haplogroups were assigned. Haplogroup O2 accounted for more than half of the samples (54.5%), and O1a, O1b, C, N, D, and Q accounted for 18.8%, 9.9%, 7.9%, 5.9%, 2.0%, and 1.0%, respectively. For the concordance study, 67 of the 101 samples were assigned haplogroups automatically from their Y-STR haplotypes with the parameter settings described above, and 97.01% (65/67) of these received consistent haplogroups (Table 2).
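The 97.01% figure can be recomputed from the cell counts of Table 2. In this sketch, a Y-STR prediction counts as consistent when its label is an ancestor or descendant of the Y-SNP haplogroup. This is approximated by prefix matching (so O1 matches O1a and O1b, and C2 matches C), as argued in the Discussion. The prefix rule is a simplifying assumption that would misjudge some ISOGG labels. The placement of the two discordant samples (an O1 prediction typed as N, and an R2 prediction typed as O1b) is inferred from the row and column totals of Table 2.

```python
"""Sketch: haplogroup concordance between Y-STR predictions and Y-SNP calls."""
from typing import Iterable, Tuple


def consistent(ystr_group: str, ysnp_group: str) -> bool:
    """Treat a coarser label as consistent with a finer one (O1 ~ O1a, C2 ~ C).

    Simplification: plain prefix matching on ISOGG-style labels.
    """
    return ysnp_group.startswith(ystr_group) or ystr_group.startswith(ysnp_group)


def concordance(cells: Iterable[Tuple[str, str, int]]) -> float:
    """Fraction of samples whose Y-STR and Y-SNP haplogroups are consistent."""
    agree = total = 0
    for ystr_group, ysnp_group, count in cells:
        total += count
        if consistent(ystr_group, ysnp_group):
            agree += count
    return agree / total


# Cells (Y-STR prediction, Y-SNP haplogroup, number of samples) from Table 2.
table2 = [
    ("O2", "O2", 32), ("O1", "O1a", 19), ("O1", "O1b", 8), ("O1", "N", 1),
    ("C2", "C", 4), ("D", "D", 1), ("N", "N", 1), ("R2", "O1b", 1),
]
print(f"{concordance(table2):.2%}")  # 97.01%
```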
Discussion
In this study, we investigated the performance and polymorphism of the HID-Ion AmpliSeq™ Identity Panel in a population from southern China. When coverage was low, allelic imbalance occurred more frequently and could even result in no calls. Poor allelic balance despite high coverage was also observed for several SNPs, of which rs214955, rs430046, and rs7520386 have also been reported to show allelic imbalance in other studies [8, 9, 11, 15]. These three loci performed less well than the other SNPs, and rs7520386 performed the worst. Additionally, rs7520386 showed the highest miscalled rate, 3.23% (supplementary Table 4). Therefore, the primers for these problematic SNPs should be modified to improve the performance of the panel.
Analysis of miscalled reads showed that the rate of ASMRs was higher than that of ANMRs and that miscalled reads increased linearly with coverage; both observations were also reported, and regarded as background signals, in the study by Guo et al. [8]. Additionally, miscalled rates differed among autosomal SNPs, highly polymorphic Y-SNPs, and poorly polymorphic Y-SNPs, which implied that genetic diversity might also be involved. As Fig. 4a shows, the rate of ASMRs was strongly associated with genetic diversity (k = 0.209), whereas the slope for ANMRs was quite flat (k = 0.037), indicating that He or GD contributed little to the variation in ANMRs. To explore whether allele frequency was also related, the miscalled rates of major and minor alleles were compared. Although the difference was not significant for Y-SNPs (p = 0.178), the ASMR rate of the major allele was lower than that of the minor allele for both autosomal SNPs and Y-SNPs, whereas there was no significant difference for ANMRs. These two kinds of miscalled reads therefore appear to arise through different mechanisms. Notably, these miscalled reads (especially ASMRs), or background noise, resemble stutter for STRs and can critically affect mixture analysis in forensic practice. How this background noise is produced is still not fully clear. Artifacts produced during PCR and sequencing errors might be part of the reason. Furthermore, because samples were barcoded and sequenced simultaneously, barcode cross-contamination might also contribute, possibly through incorrect ligation of carry-over barcodes after pooling. This should be studied further.
Table 2 Haplogrouping based on Y-STR haplotypes and Y-SNP haplotypes (n = 67)
Y-STR \ Y-SNP O2 O1a O1b C D N R2 Total
O2 32 – – – – – – 32
O1 – 19 8 – – 1 – 28
C2 – – – 4 – – – 4
D – – – – 1 – – 1
N – – – – – 1 – 1
R2 – – 1 – – – – 1
Total 32 19 9 4 1 2 0 67
Consistent results (colored gray): O2/O2, O1/O1a, O1/O1b, C2/C, D/D, and N/N (65 samples)
Fig. 5 Haplotypes of 101 male samples based on 34 Y-SNPs. Loci colored gray represent highly polymorphic loci
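The slope k quoted above summarizes how quickly the miscalled rate rises with He. The fitting procedure behind Fig. 4a is described only briefly (medians of the two halves of the SNPs ordered by He), so this sketch substitutes an ordinary least-squares slope as a stand-in. The per-SNP values are hypothetical.

```python
"""Sketch: ordinary least-squares slope of miscalled rate against He (or GD)."""
from typing import Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Slope k of y = a + k * x fitted by ordinary least squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical per-SNP He values with ASMR and ANMR rates (%).
he = [0.10, 0.20, 0.30, 0.40, 0.50]
asmr_rate = [0.15, 0.18, 0.22, 0.24, 0.23]
anmr_rate = [0.05, 0.06, 0.05, 0.06, 0.06]
print(round(ols_slope(he, asmr_rate), 3), round(ols_slope(he, anmr_rate), 3))
```

A steep slope for ASMRs alongside a near-zero slope for ANMRs is the pattern that suggests the two error classes have different origins.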
In this study, rs12997453/rs993934 and rs2056277/rs10092491 failed the linkage disequilibrium test even after Bonferroni correction, which differs from the results of other studies [5, 8, 15]. Because the physical distances are > 109 Mb for rs2056277/rs10092491 and > 58 Mb for rs12997453/rs993934, these two pairs of markers are generally considered independent. Given the relatively small sample size of our study, these LD test failures might be random effects. These polymorphic SNPs are powerful tools for individual identification and trio paternity testing, performing comparably to 22 STRs [8]. However, they may not be sufficient for duo paternity testing, let alone other kinship testing. In current research and applications of paternity and kinship testing, SNPs are more often regarded as complements to STR typing [3, 23]. As estimated by Mo et al. [3], 85, 127, 491, and 1858 putative SNP loci are required to investigate parent-child, full-sibling, half-sibling/uncle-nephew, and first-cousin relationships, respectively, at a false testing level of 0.1%. However, when a large number of SNPs are used, linkage between markers becomes more likely and should be taken into account.
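The Bonferroni thresholds used in the Results follow from the number of tests: one HWE test per locus and one LD test per pair of loci. The sketch below illustrates this arithmetic (90 SNPs give C(90, 2) = 4005 pairs) and shows how quickly the number of pairwise LD tests grows for the panel sizes estimated by Mo et al. [3].

```python
"""Sketch: Bonferroni thresholds for per-locus HWE tests and pairwise LD tests."""
from math import comb

ALPHA = 0.05


def pairwise_tests(n_loci: int) -> int:
    """Number of distinct locus pairs tested for LD: n(n - 1) / 2."""
    return comb(n_loci, 2)


def bonferroni(n_tests: int, alpha: float = ALPHA) -> float:
    """Per-test significance threshold controlling the family-wise error rate."""
    return alpha / n_tests


print(f"HWE, 90 SNPs: p < {bonferroni(90):.5f}")
print(f"LD, 90 SNPs: {pairwise_tests(90)} pairs, p < {bonferroni(pairwise_tests(90)):.2e}")

# Panel sizes estimated by Mo et al. for different relationships.
for n in (85, 127, 491, 1858):
    print(n, "loci ->", pairwise_tests(n), "pairwise LD tests")
```

The quadratic growth in pairs is one practical reason linkage becomes harder to rule out, and the per-test threshold harder to meet, as SNP panels grow.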
Moderate genetic differences were observed for some SNPs between the Chinese population and Japanese, African, American, or European populations, but little allele frequency variation was detected between southern and northern Chinese, indicating that this panel could be widely applicable across Chinese populations. Allele frequency variation tended to follow differences in geographic location, and a similar pattern was detected in a previous study based on Y-STRs [24].
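Allele frequency differentiation between two populations at a single SNP can be illustrated with Nei's $G_{ST} = (H_T - H_S)/H_T$. This is a simpler estimator than the AMOVA-based $F_{ST}$ computed by Arlequin, so the sketch below is illustrative only, and the frequencies are hypothetical.

```python
"""Sketch: Nei's G_ST for one bi-allelic SNP in two equally weighted populations."""


def gst_biallelic(p1: float, p2: float) -> float:
    """(H_T - H_S) / H_T with H = 2p(1 - p); returns 0 for a monomorphic SNP."""
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                        # pooled heterozygosity
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2   # mean within-population
    if h_total == 0:
        return 0.0
    return (h_total - h_within) / h_total


# Hypothetical frequencies of one allele in two populations.
print(round(gst_biallelic(0.60, 0.55), 4))  # similar populations
print(round(gst_biallelic(0.60, 0.20), 4))  # more divergent populations
```

Values near zero, as reported for southern versus northern Chinese samples, mean that almost all heterozygosity lies within rather than between populations.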
Haplogroup determination is of great interest in human population genetics as well as forensic genetics, because it reveals phylogenetic relationships by descent [25]. Haplogroups can be inferred from either Y-SNP or Y-STR typing. A study has shown a high degree of concordance between these two methods. In contrast, Muzzio et al. demonstrated that Y-STR-based haplogrouping software offered relatively low accuracy [25]. However, in this study, 97.01% of the automatically assigned samples received consistent haplogroups from Y-STR and Y-SNP haplotypes, which is similar to other results [26–28]. The small number of markers in Muzzio’s study (only seven STRs were typed) may explain its low accuracy [29]. Note that, either because the panel includes a limited number of markers or because no adequate database is available to calibrate the program, some samples may be assigned the same haplogroup but different sub-haplogroups by the two methods. For example, haplogroup C was assigned based on Y-SNPs, whereas C2 was predicted based on Y-STRs. Similarly, haplogroups O1a and O1b were determined based on Y-SNPs, whereas O1 was obtained using Whit Athey’s method. We nevertheless considered these results consistent. Additionally, the high concordance was based on strict parameter settings in Whit Athey’s method (minimum fitness score = 20, minimum probability = 85%, and area priors = "equal priors"), and 34 of the 101 samples could not be assigned haplogroups automatically. If the haplogroup with the maximum probability were chosen for these 34 samples, all samples would obtain a haplogroup assignment, but concordance would decrease to 86.14%. Given the high mutation rates of Y-STRs, the same Y-STR haplotype can occur in samples from different haplogroups [29], which might explain why concordance does not reach 100%. On the other hand, although Y-SNP analysis appears to be the better approach to haplogroup determination, the widespread use of Y-STR typing in forensic DNA laboratories makes preliminary haplogroup determination from Y-STR haplotypes practical.
Conclusion
The Ion Torrent PGM™ is a promising platform for forensic
genetics research and applications. The HID-Ion AmpliSeq™
Identity Panel proved to be a powerful tool for individual
identification and trio paternity testing in Chinese populations.
However, additional SNPs are required before this panel can support
duo paternity testing and kinship testing. The miscall rates were
0.26 ± 0.16% for ASMRs and 0.06 ± 0.05% for ANMRs. ASMRs were
associated with genetic diversity and allele frequency, whereas ANMRs
were associated with neither, suggesting that the two types of
miscalls may arise through different mechanisms. Additionally,
haplogroup assignments based on Y-STR and Y-SNP haplotypes were
highly concordant.
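The contrast between trio and duo paternity testing follows from how per-SNP powers combine across loci. The sketch below assumes biallelic SNPs, Hardy-Weinberg equilibrium, and independent loci. It uses the textbook expressions PD = 1 − Σ(genotype frequency)², trio PE = pq(1 − pq), duo PE = 2p²q², and combined power = 1 − ∏(1 − xᵢ), rather than the study's own calculation from observed genotype data. The function names are hypothetical.

```python
import math
from typing import Iterable


def pd_biallelic(p: float) -> float:
    """Power of discrimination of one biallelic SNP under Hardy-Weinberg:
    1 minus the sum of squared genotype frequencies p^2, 2pq and q^2."""
    q = 1.0 - p
    return 1.0 - ((p * p) ** 2 + (2.0 * p * q) ** 2 + (q * q) ** 2)


def pe_trio_biallelic(p: float) -> float:
    """Power of exclusion when mother, child and alleged father are typed."""
    q = 1.0 - p
    return p * q * (1.0 - p * q)


def pe_duo_biallelic(p: float) -> float:
    """Power of exclusion when only child and alleged father are typed:
    an exclusion needs opposite homozygotes (AA vs BB), i.e. 2 p^2 q^2."""
    q = 1.0 - p
    return 2.0 * (p * q) ** 2


def residual(powers: Iterable[float]) -> float:
    """Product of (1 - x_i) over independent loci (e.g. 1 - CDP)."""
    return math.prod(1.0 - x for x in powers)


def combine(powers: Iterable[float]) -> float:
    """Combined power over independent loci: 1 - prod(1 - x_i)."""
    return 1.0 - residual(powers)


if __name__ == "__main__":
    print(pd_biallelic(0.5))       # 0.625
    print(pe_trio_biallelic(0.5))  # 0.1875
    print(pe_duo_biallelic(0.5))   # 0.125
    # Hypothetical best case: 90 SNPs, all with minor allele frequency 0.5.
    freqs = [0.5] * 90
    print(combine(pe_trio_biallelic(p) for p in freqs))
    print(combine(pe_duo_biallelic(p) for p in freqs))
```

Even at the most informative allele frequency (p = 0.5), a SNP contributes an exclusion power of 0.125 in a duo versus 0.1875 in a trio. A duo test therefore needs considerably more SNPs to reach the same combined power. For very powerful panels, `residual` is the quantity worth reporting, because 1 − ∏(1 − xᵢ) rounds to 1.0 in double precision once the product falls below about 1e-16.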
Funding This study was funded by the National Natural
Science Foundation of China (81671873, 81273347) and the
Fundamental Research Funds for the Central Universities
(16ykzd08).
Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of
interest.
Ethical approval All procedures performed in studies involving human
participants were in accordance with the ethical standards of the
institutional and/or national research committee and with the 1964
Helsinki declaration and its later amendments or comparable ethical
standards.
Int J Legal Med (2018) 132:997–1006 1005
Informed consent Informed consent was obtained from all individual
participants included in the study.
References
1. Kidd KK, Pakstis AJ, Speed WC, Grigorenko EL, Kajuna SLB,
Karoma NJ, Kungulilo S, Kim J, Lu R, Odunsi A, Okonofua F,
Parnas J, Schulz LO, Zhukova OV, Kidd JR (2006) Developing a
SNP panel for forensic identification of individuals. Forensic Sci Int
164:20–32
2. Amorim A, Pereira L (2005) Pros and cons in the use of SNPs in
forensic kinship investigation: a comparative analysis with STRs.
Forensic Sci Int 150:17–21
3. Mo S, Liu Y, Wang S, Bo X, Li Z, Chen Y, Ni M (2016) Exploring
the efficacy of paternity and kinship testing based on single
nucleotide polymorphisms. Forensic Sci Int Genet 22:161–168
4. Sanchez JJ, Phillips C, Børsting C, Balogh K, Bogus M, Fondevila M,
Harrison CD, Musgrave-Brown E, Salas A, Syndercombe-Court D,
Schneider PM, Carracedo A, Morling N (2006) A multiplex assay with
52 single nucleotide polymorphisms for human identification.
Electrophoresis 27:1713–1724
5. Pakstis AJ, Speed WC, Fang R, Hyland FCL, Furtado MR, Kidd
JR, Kidd KK (2010) SNPs for a universal individual identification
panel. Hum Genet 127:315–324
6. Sobrino B, Brión M, Carracedo A (2005) SNPs in forensic genetics:
a review on SNP typing methodologies. Forensic Sci Int 154:181–
194
7. Seo SB, King JL, Warshauer DH, Davis CP, Ge J, Budowle B
(2013) Single nucleotide polymorphism typing with massively
parallel sequencing for human identification. Int J Legal Med
127:1079–1086
8. Guo F, Zhou Y, Song H, Zhao J, Shen H, Zhao B, Liu F, Jiang X
(2016) Next generation sequencing of SNPs using the HID-Ion
AmpliSeq™ Identity Panel on the Ion Torrent PGM™ platform.
Forensic Sci Int Genet 25:73–84
9. Børsting C, Fordyce SL, Olofsson J, Mogensen HS, Morling N
(2014) Evaluation of the Ion Torrent™ HID SNP 169-plex: a
SNP typing assay developed for human identification by second
generation sequencing. Forensic Sci Int Genet 12:144–154
10. Loman NJ, Misra RV, Dallman TJ, Constantinidou C, Gharbia SE,
Wain J, Pallen MJ (2012) Performance comparison of benchtop
high-throughput sequencing platforms. Nat Biotechnol 30:434–439
11. Eduardoff M, Santos C, de la Puente M, Gross TE, Fondevila M,
Strobl C, Sobrino B, Ballard D, Schneider PM, Carracedo Á, Lareu
MV, Parson W, Phillips C (2015) Inter-laboratory evaluation of
SNP-based forensic identification by massively parallel sequencing
using the Ion PGM™. Forensic Sci Int Genet 17:110–121
12. Rothberg JM, Hinz W, Rearick TM, Schultz J, Mileski W, Davey
M, Leamon JH, Johnson K, Milgrew MJ, Edwards M, Hoon J,
Simons JF, Marran D, Myers JW, Davidson JF, Branting A,
Nobile JR, Puc BP, Light D, Clark TA, Huber M, Branciforte JT,
Stoner IB, Cawley SE, Lyons M, Fu Y, Homer N, Sedova M, Miao
X, Reed B, Sabina J, Feierstein E, Schorn M, Alanjary M,
Dimalanta E, Dressman D, Kasinskas R, Sokolsky T, Fidanza JA,
Namsaraev E, McKernan KJ, Williams A, Roth GT, Bustillo J
(2011) An integrated semiconductor device enabling non-optical
genome sequencing. Nature 475:348–352
13. Elena S, Alessandro A, Ignazio C, Sharon W, Luigi R, Andrea B
(2016) Revealing the challenges of low template DNA analysis
with the prototype Ion AmpliSeq™ Identity panel v2.3 on the
PGM™ Sequencer. Forensic Sci Int Genet 22:25–36
14. Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura
SL, Hammer MF (2008) New binary polymorphisms reshape and
increase resolution of the human Y chromosomal haplogroup tree.
Genome Res 18:830–838
15. Zhang S, Bian Y, Zhang Z, Zheng H, Wang Z, Zha L, Cai J, Gao Y,
Ji C, Hou Y, Li C (2015) Parallel analysis of 124 universal SNPs for
human identification by targeted semiconductor sequencing. Sci Rep
5:18683
16. Athey TW (2006) Haplogroup prediction from Y-STR values using
a Bayesian-allele-frequency approach. J Genet Geneal 2:34–39
17. Athey TW (2005) Haplogroup prediction from Y-STR values using
an allele-frequency approach. J Genet Geneal 1:1–7
18. Kalinowski ST, Taper ML, Marshall TC (2007) Revising how the
computer program CERVUS accommodates genotyping error increases
success in paternity assignment. Mol Ecol 16:1099–1106
19. Excoffier L, Lischer HE (2010) Arlequin suite ver 3.5: a new series
of programs to perform population genetics analyses under Linux
and Windows. Mol Ecol Resour 10:564–567
20. Amigo J, Salas A, Phillips C, Carracedo A (2008) SPSmart:
adapting population based SNP genotype databases for fast and
comprehensive web access. BMC Bioinformatics 9:428
21. International Society of Genetic Genealogy (ISOGG) (2017) Y-DNA
Haplogroup Tree 2017, version 12.128. Available at:
http://www.isogg.org/tree/
22. Wright S (1978) Evolution and the genetics of populations. Vol. 4.
Variability within and among natural populations. University of
Chicago Press, Chicago
23. Phillips C, García-Magariños M, Salas A, Carracedo Á, Lareu MV
(2012) SNPs as supplements in simple kinship analysis or as core
markers in distant pairwise relationship tests: when do SNPs add
value or replace well-established and powerful STR tests? Transfus
Med Hemoth 39:202–210
24. Wang Y, Liu C, Zhang CC, Li R, Li Y, XL O, Sun HY (2015)
Analysis of 17 Y-STR loci haplotype and Y-chromosome
haplogroup distribution in five Chinese ethnic groups.
Electrophoresis 36:2546–2552
25. Muzzio M, Ramallo V, Motti JMB, Santos MR, López Camelo JS,
Bailliet G (2011) Software for Y-haplogroup predictions: a word of
caution. Int J Legal Med 125:143–147
26. Petrejcikova E, Carnogurska J, Hronska D, Bernasovska J,
Boronova I, Gabrikova D, Bozikova A, Macekova S (2014) Y-SNP
analysis versus Y-haplogroup predictor in the Slovak population.
Anthropol Anz 71:275–285
27. Dogan S, Babic N, Gurkan C, Goksu A, Marjanovic D, Hadziavdic
V (2016) Y-chromosomal haplogroup distribution in the Tuzla
Canton of Bosnia and Herzegovina: a concordance study using four
different in silico assignment algorithms based on Y-STR data.
Homo 67:471–483
28. Nunez C, Geppert M, Baeta M, Roewer L, Martinez-Jarreta B
(2012) Y chromosome haplogroup diversity in a Mestizo population
of Nicaragua. Forensic Sci Int Genet 6:e192–e195
29. Athey W (2011) Comments on the article “Software for Y
haplogroup predictions, a word of caution”. Int J Legal Med
125(901–903):905–906
Content available from International Journal of Legal Medicine