
SNP typing using the HID-Ion AmpliSeq™ Identity Panel in a southern Chinese population

July 2018
International Journal of Legal Medicine 132(901–903)

DOI:10.1007/s00414-017-1706-3

Authors:

Ran Li

Sun Yat-Sen University

Riga Wu

Sun Yat-Sen University




ORIGINAL ARTICLE

SNP typing using the HID-Ion AmpliSeq™ Identity Panel in a southern Chinese population

Ran Li, Chuchu Zhang, Haiyan Li, Riga Wu, Haixia Li, Zhenya Tang, Chenhao Zhen, Jianye Ge, Dan Peng, Ying Wang, Hongying Chen, Hongyu Sun

Received: 19 March 2017 /Accepted: 11 October 2017 /Published online: 18 October 2017

© Springer-Verlag GmbH Germany 2017

Abstract In the present study, 90 autosomal single nucleotide polymorphisms (SNPs) and 34 Y chromosomal SNPs were sequenced simultaneously using the HID-Ion AmpliSeq™ Identity Panel on the Ion PGM™ platform for 125 samples from a southern Chinese population. Raw data were analyzed and forensic parameters were calculated. Haplogrouping concordance was also assessed using alternative methods based on Y-SNP haplotypes and Y-STR haplotypes. The results showed that allelic imbalance occurred more frequently at low coverage, although several SNPs with high coverage also showed poor allelic balance, including rs214955, rs430046, rs7520386, rs876724, rs9171188, rs16981290, and rs2032631. In total, 21,261 miscalled reads (0.28%) were observed. The rate of allele-specific miscalled reads (ASMRs) was higher than that of allele nonspecific miscalled reads (ANMRs) and was associated with the genetic diversity of the SNP. The ASMR rate of the major allele was lower than that of the minor allele, while there was no difference for ANMRs. The combined discrimination power (CDP) was 1 − 4.81 × 10⁻³⁴ and the combined power of exclusion (CPE) was 0.99989 and 0.99999992 for duo and trio paternity testing, respectively. No significant genetic difference was detected between southern and northern Chinese populations. In the haplogroup study, O2 was the predominant haplogroup and 97.01% of samples were assigned consistent haplogroups with Y-SNP and Y-STR haplotypes. In conclusion, the AmpliSeq™ Identity Panel was powerful for individual identification and trio paternity testing. ASMRs were associated with genetic diversity and allele frequency, whereas ANMRs were related to neither. High concordance of haplogroup assignment can be obtained with Y-STR and Y-SNP haplotypes.

Keywords Single nucleotide polymorphism (SNP) · Next generation sequencing (NGS) · Ion Torrent PGM™ · Population genetics · Miscalled reads

Ran Li and Chuchu Zhang contributed equally to the article.

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s00414-017-1706-3) contains supplementary material, which is available to authorized users.

* Hongyu Sun

sunhy@mail.sysu.edu.cn; sunhongyu2002@163.com

Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, People’s Republic of China

The Center of Criminal Technology of Guangdong Province, Guangzhou 510050, People’s Republic of China

The Second Clinical Medical School (Zhujiang Hospital), Southern Medical University, Guangzhou 510280, People’s Republic of China

Thermo Fisher Scientific Inc, South San Francisco, CA 94080, USA

Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510089, Guangdong, People’s Republic of China

Int J Legal Med (2018) 132:997–1006

https://doi.org/10.1007/s00414-017-1706-3

Introduction

Single nucleotide polymorphisms (SNPs), with lower mutation rates and smaller amplicon sizes than the routinely used short tandem repeats (STRs), are considered a potentially useful tool in forensic human identification [1,2]. Because SNPs are di-allelic, their per-locus discrimination power is weaker than that of STRs, but this can be compensated for by typing additional independent loci [3]. Several autosomal SNP marker sets have been developed with various genotyping methods, including single-base extension, chip-based microarrays, and allele-specific hybridization arrays [1,4–6]. However, because they either cover only a small number of SNP loci in a single analysis or require a large amount of input DNA, these sets have not been widely used by forensic DNA laboratories [7,8].

Recently, massively parallel sequencing (MPS), or next-generation sequencing (NGS), technologies, with acceptable sequencing accuracy and costs, have attracted great interest in the forensic genetics community. They make it possible to detect several hundred to several thousand markers (including different kinds of markers, e.g., SNPs and STRs) simultaneously and allow multiple samples to be processed in a single sequencing run using sample-tagging DNA barcodes. Furthermore, detailed sequence information for the target regions can also be generated [9–11]. The Ion Torrent Personal Genome Machine (PGM) was launched in early 2011 and was the first commercial sequencing machine that does not require fluorescence and camera scanning, resulting in higher speed, lower cost, and smaller instrument size [10,12]. In addition, a study by Elena et al. showed that, on this platform, it was possible to obtain consistent SNP profiles with 31 pg of DNA and partially informative profiles with as little as 5 pg or with severely degraded DNA [13]. The HID-Ion AmpliSeq™ Identity Panel (HID Identity Panel) released by Thermo Fisher Scientific co-amplifies 90 autosomal SNPs (A-SNPs) and 34 Y chromosomal SNPs (Y-SNPs), which were selected based on the studies of Pakstis et al. [5], Sanchez et al. [4], and Karafet et al. [14]. This panel has been reported to provide powerful capacity for personal identification [14].

Previous studies based on this panel were performed on relatively small samples, and no population data have been reported for Guangdong province in south China, especially for Y-SNP typing. Therefore, further exploration was conducted in the present study.

Materials and methods

Samples, DNA extraction, and DNA quantification

Peripheral blood samples from 101 male and 24 female unrelated individuals in Guangdong province in south China were collected with informed consent. DNA was extracted on the AutoMate Express™ Forensic DNA Extraction System (Thermo Fisher Scientific, MA, USA) with the PrepFiler Express BAT™ Forensic DNA Extraction Kit (Thermo Fisher). DNA extracts were quantified on the Qubit® 2.0 fluorometer (Thermo Fisher) using the Qubit® dsDNA HS Assay Kit (Thermo Fisher) according to the manufacturer’s protocol. The study was approved by the Human Subjects Committee of Sun Yat-sen University (No. 2016-008).

Y-STR genotyping and haplogrouping

All of the male samples were genotyped using the AmpFLSTR® Yfiler™ PCR Amplification kit (Thermo Fisher), and Y haplogroups were predicted using Whit Athey’s Haplogroup Predictor (http://www.hprg.com/hapest5/index.html) [16,17] with minimum fitness score = 20, minimum probability = 85%, and area priors = “equal priors.”

Library preparation, purification, and quantification

Libraries were constructed using the Ion AmpliSeq™ Library Kit 2.0 and the Ion AmpliSeq™ Identity Panel v2.3 (Thermo Fisher) following the manufacturer’s recommendations. A total of 1 ng of input DNA was processed on the GeneAmp® 9700 System (Thermo Fisher) under the following thermal cycling conditions: 2 min at 99 °C; 21 cycles of 15 s at 99 °C and 4 min at 60 °C; and a final hold at 10 °C. Then 2 μL of FuPa reagent (Thermo Fisher) was added to digest excess PCR primers, and the reactions were incubated for 10 min at 50 °C, 10 min at 55 °C, and 20 min at 60 °C with a final hold at 10 °C. The libraries were barcoded using the Ion Xpress™ barcode adapter (Thermo Fisher) with the following incubation steps: 22 °C for 30 min, 72 °C for 10 min, and a hold step at 10 °C. Libraries were then purified using 1.5× Agencourt® AMPure® XP reagent (Beckman Coulter, FL, USA) according to the manufacturer’s instructions. Purified libraries were quantified on the ABI 7500 Real-time PCR System with the Ion Library Quantitation Kit (Thermo Fisher) and subsequently diluted to 20 pM. All barcoded libraries were pooled in equal volumes.

Emulsion PCR and sequencing

Emulsion PCR (emPCR) was performed on the OneTouch™ 2 (OT2) instrument (Thermo Fisher) with the Ion PGM™ Template OT2 200 Kit (Thermo Fisher), and template-positive Ion Sphere Particles (ISPs) were enriched on the Ion OneTouch™ ES instrument (Thermo Fisher). Sequencing was performed using the Ion PGM™ Sequencing 200 Kit v2 on Ion 314™ or 316™ chips (depending on the sample size) following the manufacturer’s protocols.

Data analysis

Raw data were processed using the Ion Torrent Suite Server version 4.6 (Thermo Fisher). Homo sapiens hg19 was used as the reference genome for alignment. The HID_SNP_Genotyper plugin v.4.3.1 was used to genotype the SNPs with the germline low-stringency setting. This plugin was also used to generate comprehensive analysis reports, including CSV files containing detailed mapping, genotype, coverage, and quality check information for each sample in the run.

Statistics

The CSV files were further analyzed using Microsoft Excel 2010. The frequency of major allele reads (F_MAR) was adopted to assess allelic balance [16,17]. Y-SNPs with no calls were re-genotyped by inspecting the data manually and making an allele call when the coverage was ≥ 6× and the F_MAR was ≥ 50%. Base miscalling in autosomal SNP homozygotes and Y-SNPs was analyzed separately for allele-specific miscalled reads (ASMRs, defined as reads miscalled as the alternative locus-specific allele) and allele nonspecific miscalled reads (ANMRs, defined as reads miscalled as a base that is not a locus-specific allele). Cervus 3.0 [18] was employed to calculate allele frequency, observed and expected heterozygosity (H_Obs and H_Exp), matching probability (MP), discrimination power (DP), polymorphism information content (PIC), and exclusion probability for duo paternity testing (PE_duo) and trio paternity testing (PE_trio). Hardy-Weinberg equilibrium (HWE), linkage disequilibrium (LD), and the F-statistic (Fst) were calculated using Arlequin 3.5 [19]. To compare allele frequency distributions between southern and northern Chinese Han populations as well as populations from other countries and continents, frequency data for autosomal SNPs were downloaded from SPSmart (http://spsmart.cesga.es/) [20]. For Y-SNPs, the genetic diversity (GD) among individuals was calculated as $GD = 1 - \sum p_i^2$, where $p_i$ represents the frequency of the ith allele. Y-SNP haplotypes were counted manually and haplotype diversity (HD) was calculated as $HD = N(1 - \sum p_i^2)/(N - 1)$, where $N$ represents the number of haplotypes and $p_i$ represents the frequency of the ith haplotype.
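The two diversity formulas above can be illustrated with a minimal Python sketch. The function names and the example counts are hypothetical and are not taken from the study. The text describes N as the number of haplotypes, whereas estimators of this form are often written with N as the number of sampled individuals, so the sketch takes N as an explicit argument rather than choosing between the two readings.

```python
def gene_diversity(freqs):
    """Return GD = 1 - sum(p_i^2) for a list of frequencies that sum to 1."""
    return 1.0 - sum(p * p for p in freqs)


def haplotype_diversity(counts, n):
    """Return HD = n * (1 - sum(p_i^2)) / (n - 1).

    counts: observed count of each distinct haplotype.
    n: the N of the formula, passed explicitly (assumption: the caller
       decides which definition of N applies).
    """
    total = sum(counts)
    freqs = [c / total for c in counts]
    return n * gene_diversity(freqs) / (n - 1)


# Hypothetical Y-SNP whose two alleles have frequencies 0.8 and 0.2.
gd = gene_diversity([0.8, 0.2])

# Hypothetical data set: three haplotypes seen 6, 3, and 1 times.
hd = haplotype_diversity([6, 3, 1], n=10)

print(round(gd, 3), round(hd, 3))
```

With these made-up inputs, GD is 1 − (0.64 + 0.04) and HD is 10 × (1 − 0.46) / 9.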

Y-SNP haplogrouping

Haplogroup assignment was determined according to the International Society of Genetic Genealogy (ISOGG) Y-DNA Haplogroup Tree 2017 (http://www.isogg.org/tree/) [21]. Concordance between Y-SNP-based haplogrouping and Y-STR-based haplogrouping was explored.

Results

Coverage and allele balance

Figure 1 shows the coverage variation for each SNP. Compared with autosomal SNPs, Y-SNPs displayed relatively lower coverage (400 ± 450× vs 950 ± 989× on average), which might be explained by the single-copy nature of the Y chromosome. For autosomal SNPs, the highest coverage was observed for rs13218440 (1972 ± 1272×), while the lowest coverage was only 169 ± 112× for rs2342747. Other inefficiently amplified (< 300×) SNPs were rs876724 (231 ± 175×), rs10488710 (253 ± 196×), rs729172 (263 ± 311×), rs993934 (254 ± 380×), and rs12997453 (253 ± 368×).

As shown in Fig. 2, significant differences were observed between the F_MAR (%) values of homozygotes and heterozygotes. The F_MAR (%) value for most homozygotes was > 90%, apart from two SNPs in three individuals with critical values of 89.3% (rs2046361), 89.5% (rs7520386), and 89.8% (rs7520386), respectively. The F_MAR (%) values for heterozygotes were mainly between 50 and 60%, and most loci showed good allelic balance except five: rs214955, rs430046, rs7520386, rs876724, and rs9171188, of which rs7520386 performed the worst. Heterozygotes with unusual allelic balance (F_MAR > 65%) showed relatively low coverage (< 160× on average). Similar results were obtained for Y-SNPs, for which the F_MAR was > 90% in all cases except four samples with low coverage of 6×, 7×, 7×, and 38×, respectively. Two loci (rs16981290 and rs2032631) exhibited reduced allelic balance values in comparison with the other Y-SNPs.
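As an illustration of how F_MAR and the thresholds quoted in this paper might be applied, the following Python sketch computes F_MAR from per-base read counts. It assumes that F_MAR is the share of reads carrying the most frequent base among all reads at the position; the paper does not spell out the denominator, and the function names are hypothetical. The 6× and 50% limits come from the manual Y-SNP re-genotyping rule in the Statistics section, and the 65% limit is the value used above to flag unusual heterozygote balance.

```python
def fmar_percent(base_counts):
    """F_MAR (%): reads of the most frequent base over all reads (assumed definition)."""
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("no reads at this position")
    return 100.0 * max(base_counts.values()) / total


def manual_ysnp_call(base_counts, min_coverage=6, min_fmar=50.0):
    """Return the major base if coverage >= 6x and F_MAR >= 50%, else None."""
    coverage = sum(base_counts.values())
    if coverage < min_coverage:
        return None
    if fmar_percent(base_counts) < min_fmar:
        return None
    return max(base_counts, key=base_counts.get)


def is_imbalanced_heterozygote(base_counts, threshold=65.0):
    """Flag a heterozygote whose F_MAR exceeds the 65% mark used in the text."""
    return fmar_percent(base_counts) > threshold


# Hypothetical read counts, not taken from the study.
print(fmar_percent({"A": 60, "G": 40}))
print(manual_ysnp_call({"C": 6, "T": 1}))
print(is_imbalanced_heterozygote({"A": 70, "G": 30}))
```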

Miscalled reads and miscalled rates

Miscalled reads were defined as reads with base calls that differed from the SNP genotype calls, which encompassed ASMRs and ANMRs (see example in Fig. 3a). For autosomal SNPs, only the miscalled reads of homozygotes were counted because it was difficult to identify ASMRs for heterozygotes. In total, 21,261 miscalled reads (0.28%) were observed out of 7,631,248 total reads in 6492 homozygous autosomal SNP genotypes and 3434 Y-SNP genotypes, which was a very small part of the total reads. Among them, most were ASMRs, which were over four times more numerous than ANMRs (Fig. 3b). To explore the relationship between miscalled reads and total reads, a function was fitted (Fig. 3c), and a good linear correlation was observed. On average, the miscalled rate per sample was 0.26 ± 0.16% for ASMRs and 0.06 ± 0.05% for ANMRs. Additionally, different rates were observed among autosomal SNPs, highly polymorphic Y-SNPs (two alleles observed in the present population), and poorly polymorphic Y-SNPs (only one allele observed in the present population), and the rate of ASMRs was highly associated with genetic diversity (He or GD). As shown in Fig. 4a, the slope for ANMRs was quite flat compared with that for ASMRs (0.037 vs 0.209). The ASMR rate of the major allele (the allele with the higher frequency at a SNP) was lower than that of the minor allele (the allele with the lower frequency), while there was no significant difference for ANMRs for either autosomal SNPs or Y-SNPs (Fig. 4b, c).

Fig. 1 The coverage of autosomal SNPs and Y-SNPs

Fig. 2 Allelic balance of 90 autosomal SNPs and 34 Y-SNPs. The circle, cross, and rhombus represent homozygotes for autosomal SNPs and heterozygotes for autosomal SNPs and Y-SNPs, respectively

Fig. 3 Allele-specific miscalled reads (ASMRs) and allele nonspecific miscalled reads (ANMRs) for autosomal homozygotes and Y-SNPs. a The illustration of CCRs, ASMRs, and ANMRs. b The proportion of CCRs, ASMRs, and ANMRs. c Plots of miscalled reads and total reads per sample (heterozygotes for autosomal SNPs were excluded)
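The read categories used in this section can be sketched in Python as follows. This is a hedged illustration, not the authors' pipeline: the function name and the example pileup are hypothetical, and CCRs are taken here to mean reads that match the genotype call, since the abbreviation is not expanded in the text. The last lines only re-derive the overall miscalled rate from the two totals reported above.

```python
def classify_reads(base_counts, called_allele, locus_alleles):
    """Split reads at a homozygous (or Y-SNP) position into CCR, ASMR, ANMR.

    base_counts: mapping of base -> read count at the SNP position.
    called_allele: the base of the genotype call.
    locus_alleles: the two known alleles of the SNP.
    """
    ccr = base_counts.get(called_allele, 0)
    # ASMR: reads showing the other allele of this locus.
    asmr = sum(n for base, n in base_counts.items()
               if base in locus_alleles and base != called_allele)
    # ANMR: reads showing a base that is not an allele of this locus.
    anmr = sum(n for base, n in base_counts.items()
               if base not in locus_alleles)
    return ccr, asmr, anmr


# Hypothetical pileup for an A/G SNP typed as homozygous A.
print(classify_reads({"A": 990, "G": 8, "C": 1, "T": 1}, "A", ("A", "G")))

# Overall miscalled rate from the totals reported in the text.
overall_rate = 100.0 * 21261 / 7631248
print(round(overall_rate, 2))
```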

Allele frequency of autosomal SNPs and forensic

parameters

A total of 149 no calls were observed among 11,250 SNP genotypes (90 × 125). These no calls were mainly due to low coverage (17× on average), and the mean call rate per sample was 98.68%. Of the samples, 74.4% (93 out of 125) were fully genotyped for autosomal SNPs, and ≥ 10 “NN” genotypes (where “NN” means “no call”, an invalid genotype that failed allele calling by the Torrent Variant Caller software) out of 90 SNPs were detected in four samples. The allele frequency, H_Obs, H_Exp, PIC, PE_duo, PE_trio, and DP values are shown in Table 1. PIC ranged from 0.095 to 0.375, and rs740910 and rs251934 were the least polymorphic SNPs. Three SNPs, rs1058083, rs10773760, and rs7520386, failed the Hardy-Weinberg equilibrium test (p < 0.05), and all of the SNPs passed after Bonferroni correction (p = 0.05/90) except

Fig. 4 The rate of allele-specific miscalled reads (ASMRs) and allele nonspecific miscalled reads (ANMRs) for autosomal SNPs and Y-SNPs. a The relationship between miscalled rate and He (or GD). The miscalled rates were the median of former and later 50% ordered by He for autosomal SNPs or by GD for Y-SNPs. b Comparison of the miscalled rates between major and minor allele for autosomal SNPs (tested by one-way ANOVA). c Comparison of the miscalled rate between major and minor allele for Y-SNPs (tested by Mann-Whitney U test)

Table 1 The frequencies and related forensic parameters of 90 autosomal SNPs in the population from Guangdong province, South China (N = 125)

Locus Allele Frequency H_Obs H_Exp PIC MP PE_duo PE_trio DP HWE

rs1490413 A/G 0.604/0.396 0.488 0.480 0.364 0.387 0.114 0.182 0.613 1.000

rs7520386 A/G 0.648/0.352 0.240 0.458 0.352 0.400 0.104 0.176 0.600 < 10⁻³

rs4847034 A/G 0.556/0.444 0.488 0.496 0.372 0.378 0.122 0.186 0.622 1.000

rs560681 A/G 0.632/0.368 0.496 0.467 0.357 0.394 0.108 0.178 0.606 0.566

rs10495407 G/A 0.726/0.274 0.435 0.400 0.319 0.442 0.079 0.159 0.558 0.370

rs891700 G/A 0.512/0.487 0.542 0.502 0.375 0.375 0.125 0.188 0.625 0.465

rs1413212 C/T 0.536/0.464 0.496 0.499 0.374 0.376 0.124 0.187 0.624 1.000

rs876724 C/T 0.516/0.484 0.500 0.502 0.375 0.375 0.125 0.187 0.625 1.000

rs1109037 G/A 0.548/0.452 0.520 0.497 0.373 0.377 0.123 0.186 0.623 0.719

rs993934 G/A 0.532/0.468 0.551 0.500 0.374 0.376 0.124 0.187 0.624 0.334

rs12997453 G/A 0.717/0.283 0.443 0.407 0.323 0.436 0.082 0.162 0.564 0.369

rs907100 C/G 0.537/0.463 0.500 0.499 0.374 0.376 0.124 0.187 0.624 1.000

rs1357617 T/A 0.800/0.200 0.304 0.321 0.269 0.514 0.051 0.134 0.486 0.577

rs4364205 G/T 0.588/0.412 0.536 0.486 0.367 0.383 0.117 0.184 0.617 0.271

rs1872575 A/G 0.614/0.386 0.463 0.476 0.362 0.389 0.112 0.181 0.611 0.849

rs1355366 T/C 0.826/0.174 0.248 0.288 0.246 0.550 0.041 0.123 0.450 0.196

rs6444724 T/C 0.512/0.488 0.464 0.502 0.375 0.375 0.125 0.187 0.625 0.473

rs2046361 A/T 0.504/0.496 0.455 0.502 0.375 0.375 0.125 0.187 0.625 0.345

rs6811238 G/T 0.633/0.367 0.444 0.466 0.357 0.395 0.108 0.178 0.605 0.698

rs1979255 G/C 0.564/0.436 0.504 0.494 0.371 0.379 0.121 0.185 0.621 0.858

rs717302 A/G 0.832/0.168 0.304 0.281 0.240 0.558 0.039 0.120 0.442 0.522

rs159606 G/A 0.700/0.300 0.488 0.422 0.332 0.425 0.088 0.166 0.575 0.089

rs7704770 A/G 0.628/0.372 0.488 0.469 0.358 0.393 0.109 0.179 0.607 0.704

rs251934 A/G 0.940/0.060 0.120 0.113 0.106 0.793 0.006 0.053 0.207 1.000

rs338882 A/G 0.544/0.456 0.464 0.498 0.373 0.377 0.123 0.187 0.623 0.474

rs13218440 G/A 0.592/0.408 0.464 0.485 0.366 0.384 0.117 0.183 0.616 0.713

rs214955 T/C 0.500/0.500 0.552 0.502 0.375 0.375 0.125 0.188 0.625 0.286

rs727811 T/G 0.657/0.343 0.504 0.453 0.349 0.403 0.102 0.175 0.597 0.228

rs6955448 C/T 0.704/0.296 0.448 0.418 0.330 0.427 0.087 0.165 0.573 0.521

rs917118 C/T 0.732/0.268 0.424 0.394 0.315 0.446 0.077 0.158 0.554 0.493

rs321198 C/T 0.584/0.416 0.480 0.488 0.368 0.382 0.118 0.184 0.618 0.855

rs737681 C/T 0.884/0.116 0.168 0.206 0.184 0.653 0.021 0.092 0.347 0.059

rs10092491 C/T 0.656/0.344 0.426 0.453 0.350 0.403 0.102 0.175 0.597 0.550

rs4288409 C/A 0.612/0.388 0.488 0.477 0.362 0.388 0.113 0.181 0.612 0.852

rs2056277 C/T 0.844/0.156 0.262 0.264 0.228 0.578 0.035 0.114 0.422 1.000

rs1015250 G/C 0.556/0.444 0.456 0.496 0.372 0.378 0.122 0.186 0.622 0.467

rs7041158 C/T 0.720/0.280 0.447 0.405 0.322 0.437 0.081 0.161 0.563 0.270

rs1463729 C/T 0.520/0.480 0.496 0.501 0.375 0.375 0.125 0.187 0.625 1.000

rs1360288 C/T 0.616/0.384 0.496 0.475 0.361 0.390 0.112 0.181 0.610 0.705

rs10776839 G/T 0.545/0.455 0.484 0.498 0.373 0.377 0.123 0.186 0.623 0.855

rs826472 C/T 0.833/0.167 0.279 0.279 0.239 0.560 0.039 0.120 0.440 1.000

rs735155 T/C 0.892/0.108 0.216 0.193 0.174 0.670 0.019 0.087 0.330 0.357

rs3780962 G/A 0.564/0.436 0.504 0.494 0.371 0.379 0.121 0.185 0.621 0.856

rs740598 A/G 0.508/0.492 0.500 0.502 0.375 0.375 0.125 0.187 0.625 1.000

rs964681 T/C 0.660/0.340 0.472 0.451 0.348 0.405 0.101 0.174 0.595 0.690

rs1498553 C/T 0.570/0.430 0.533 0.492 0.370 0.380 0.120 0.185 0.620 0.461

rs901398 T/C 0.772/0.228 0.312 0.353 0.290 0.482 0.062 0.145 0.518 0.205

rs10488710 G/C 0.596/0.404 0.504 0.484 0.366 0.385 0.116 0.183 0.615 0.711

rs2076848 A/T 0.680/0.320 0.416 0.437 0.341 0.414 0.095 0.170 0.586 0.680


rs7520386, which was one of the SNPs that exhibited a skewed allelic balance. The linkage disequilibrium test indicated that all of the SNPs were independent of each other except rs12997453/rs993934 (p < 0.05/4050) and rs2056277/rs10092491 (p < 0.05/4050) after Bonferroni correction (supplementary Table 1). The total discrimination power (TDP) was 1 − 4.81 × 10⁻³⁴, and the combined power of exclusion (CPE) was 0.99989 for duo paternity testing and 0.99999992 for trio paternity testing. Little genetic differentiation [22] was detected between southern and northern

Table 1 (continued)

Locus Allele Frequency H_Obs H_Exp PIC MP PE_duo PE_trio DP HWE

rs2269355 G/C 0.620/0.380 0.520 0.473 0.360 0.391 0.111 0.180 0.609 0.343

rs2111980 T/C 0.624/0.376 0.464 0.471 0.359 0.392 0.110 0.180 0.608 1.000

rs10773760 A/G 0.604/0.396 0.360 0.480 0.364 0.387 0.114 0.182 0.613 0.008*

rs1335873 A/T 0.624/0.376 0.544 0.471 0.359 0.392 0.110 0.180 0.608 0.089

rs1886510 G/A 0.843/0.157 0.261 0.265 0.229 0.576 0.035 0.115 0.424 1.000

rs1058083 G/A 0.596/0.404 0.392 0.484 0.366 0.385 0.116 0.183 0.615 0.041*

rs354439 A/T 0.569/0.431 0.476 0.493 0.370 0.380 0.120 0.185 0.620 0.718

rs1454361 T/A 0.612/0.388 0.520 0.477 0.362 0.388 0.113 0.181 0.612 0.350

rs722290 G/C 0.517/0.483 0.466 0.502 0.375 0.375 0.125 0.187 0.625 0.464

rs873196 T/C 0.912/0.088 0.128 0.161 0.148 0.718 0.013 0.074 0.282 0.051

rs4530059 G/A 0.764/0.236 0.376 0.362 0.296 0.474 0.065 0.148 0.526 0.804

rs2016276 T/C 0.556/0.444 0.488 0.496 0.372 0.378 0.122 0.186 0.622 1.000

rs1821380 C/G 0.568/0.432 0.464 0.493 0.370 0.380 0.120 0.185 0.620 0.585

rs1528460 T/C 0.575/0.425 0.517 0.491 0.369 0.381 0.119 0.185 0.619 0.579

rs729172 G/T 0.848/0.152 0.294 0.264 0.229 0.577 0.033 0.112 0.423 0.460

rs2342747 G/A 0.644/0.356 0.440 0.460 0.353 0.398 0.105 0.177 0.602 0.696

rs430046 C/T 0.656/0.344 0.464 0.453 0.349 0.403 0.102 0.175 0.597 0.843

rs1382387 A/C 0.684/0.316 0.392 0.434 0.339 0.416 0.093 0.169 0.584 0.304

rs9905977 G/A 0.624/0.376 0.432 0.471 0.359 0.392 0.110 0.180 0.608 0.444

rs740910 A/G 0.947/0.053 0.105 0.100 0.095 0.815 0.005 0.048 0.185 1.000

rs938283 T/C 0.876/0.124 0.232 0.218 0.194 0.636 0.024 0.097 0.364 0.690

rs2292972 T/C 0.644/0.356 0.536 0.460 0.353 0.398 0.105 0.177 0.602 0.079

rs1493232 C/A 0.624/0.376 0.544 0.471 0.359 0.392 0.110 0.180 0.608 0.088

rs9951171 G/A 0.524/0.476 0.536 0.501 0.374 0.376 0.124 0.187 0.624 0.476

rs1736442 C/T 0.621/0.379 0.452 0.473 0.360 0.391 0.111 0.180 0.609 0.703

rs1024116 C/T 0.915/0.085 0.171 0.157 0.144 0.724 0.012 0.072 0.276 0.599

rs719366 A/G 0.792/0.208 0.320 0.331 0.275 0.504 0.054 0.138 0.496 0.786

rs576261 A/C 0.592/0.408 0.528 0.485 0.366 0.384 0.117 0.183 0.616 0.355

rs1031825 A/C 0.504/0.496 0.480 0.502 0.375 0.375 0.125 0.187 0.625 0.721

rs445251 C/G 0.656/0.344 0.492 0.453 0.350 0.403 0.102 0.175 0.597 0.422

rs1005533 G/A 0.660/0.340 0.424 0.451 0.348 0.405 0.101 0.174 0.595 0.552

rs1523537 T/C 0.585/0.415 0.427 0.488 0.368 0.382 0.118 0.184 0.618 0.194

rs722098 G/A 0.532/0.468 0.520 0.500 0.374 0.376 0.124 0.187 0.624 0.721

rs2830795 G/A 0.548/0.452 0.440 0.497 0.373 0.377 0.123 0.186 0.623 0.208

rs2831700 G/A 0.572/0.428 0.456 0.492 0.370 0.380 0.120 0.185 0.620 0.463

rs914165 G/A 0.684/0.316 0.424 0.434 0.339 0.416 0.093 0.169 0.584 0.837

rs221956 C/T 0.596/0.404 0.456 0.484 0.366 0.385 0.116 0.183 0.615 0.580

rs733164 G/A 0.876/0.124 0.200 0.218 0.194 0.636 0.024 0.097 0.364 0.399

rs987640 T/A 0.552/0.448 0.496 0.497 0.372 0.378 0.122 0.186 0.622 1.000

rs2040411 G/A 0.712/0.288 0.368 0.412 0.326 0.432 0.084 0.163 0.568 0.277

rs1028528 A/G 0.612/0.388 0.424 0.477 0.362 0.388 0.113 0.181 0.612 0.260

*p < 0.05, **p < 0.00056 (0.05/90)


Chinese populations (Fst < 0.05 for all of the SNPs), and the most dramatic differences in allele frequency were detected with Africans and Europeans. Details are presented in supplementary Tables 2 and 3.
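To make the per-locus and combined parameters in this section more concrete, the Python sketch below uses closed-form expressions for a biallelic locus under Hardy-Weinberg equilibrium. These expressions are an assumption on our part: the paper obtained its values from Cervus 3.0 and does not list the formulas, although the expressions reproduce the Table 1 row for rs1490413 to three decimals. All function names are hypothetical. The combined values are formed as one minus the product of the per-locus complements, which presumes independent loci.

```python
from math import prod


def biallelic_parameters(p, n_individuals):
    """Forensic parameters for a biallelic SNP with major allele frequency p.

    Assumed closed forms (HWE, two alleles); Cervus may compute them differently.
    """
    q = 1.0 - p
    pq = p * q
    two_n = 2 * n_individuals
    mp = p ** 4 + (2 * pq) ** 2 + q ** 4       # sum of squared genotype frequencies
    return {
        "H_Exp": 2 * pq * two_n / (two_n - 1),  # small-sample corrected heterozygosity
        "PIC": 2 * pq - 2 * pq ** 2,            # 1 - p^2 - q^2 - 2 p^2 q^2
        "MP": mp,
        "PE_duo": 2 * pq ** 2,
        "PE_trio": pq * (1 - pq),
        "DP": 1.0 - mp,
    }


def combine(values):
    """Combined power over independent loci: 1 - product of (1 - value)."""
    return 1.0 - prod(1.0 - v for v in values)


# rs1490413 from Table 1: frequency 0.604, N = 125.
row = biallelic_parameters(0.604, 125)
print({name: round(value, 3) for name, value in row.items()})

# Combining only two Table 1 loci (PE_trio of rs1490413 and rs7520386) as a toy case.
print(round(combine([0.182, 0.176]), 3))
```

Applied to all 90 loci, `combine` over the DP column would correspond to the combined discrimination power and `combine` over the PE columns to the CPE values, subject to the assumptions stated above.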

Comparison of haplogrouping based on Y-STR

and Y-SNP haplotypes

A no call rate of 2.3% (79 out of 3434), higher than that of autosomal SNPs, was observed in male samples, again mostly due to low coverage. Out of 34 Y-SNPs, only 15 SNPs showed two alleles in the population. In total, seven haplotypes were detected (Fig. 5) and the HD was 0.644. Based on Y-SNP haplotypes, seven haplogroups were assigned. Haplogroup O2 accounted for more than half of the samples (54.5%), and O1a, O1b, C, N, D, and Q accounted for 18.8%, 9.9%, 7.9%, 5.9%, 2.0%, and 1.0%, respectively. For the concordance study, 67 out of the 101 samples were assigned haplogroups automatically based on Y-STR haplotypes with the parameter settings mentioned before, of which 97.01% (65/67) were assigned consistent haplogroups (Table 2).
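The concordance figure can be re-derived from the Table 2 counts with a short Python sketch. The cell placement below is our reading of Table 2, with the two discordant cells inferred from the row and column totals. The rule that a Y-STR label and a Y-SNP label agree when one is a prefix of the other (for example O1 and O1a, or C2 and C) is an assumption made for illustration, not a rule stated by the authors.

```python
# (Y-STR haplogroup, Y-SNP haplogroup) -> number of samples, as read from Table 2.
TABLE2 = {
    ("O2", "O2"): 32,
    ("O1", "O1a"): 19,
    ("O1", "O1b"): 8,
    ("O1", "N"): 1,     # discordant cell inferred from the totals
    ("C2", "C"): 4,
    ("D", "D"): 1,
    ("N", "N"): 1,
    ("R2", "O1b"): 1,   # discordant cell inferred from the totals
}


def same_clade(ystr_label, ysnp_label):
    """Assumed rule: labels agree if one is a prefix of the other."""
    return ystr_label.startswith(ysnp_label) or ysnp_label.startswith(ystr_label)


def concordance(table):
    """Return (consistent, total, percent consistent) for a count table."""
    total = sum(table.values())
    consistent = sum(n for (ystr, ysnp), n in table.items()
                     if same_clade(ystr, ysnp))
    return consistent, total, 100.0 * consistent / total


consistent, total, percent = concordance(TABLE2)
print(consistent, total, round(percent, 2))
```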

Table 2 Haplogrouping based on Y-STR haplotypes and Y-SNP haplotypes (n = 67)

Y-STR   Y-SNP                                      Total
        O2    O1a   O1b   C     D     N     R2
O2      32*   –     –     –     –     –     –      32
O1      –     19*   8*    –     –     1     –      28
C2      –     –     –     4*    –     –     –      4
D       –     –     –     –     1*    –     –      1
N       –     –     –     –     –     1*    –      1
R2      –     –     1     –     –     –     –      1
Total   32    19    9     4     1     2     0      67

Consistent results (colored gray in the original) are marked with *

Fig. 5 Haplotypes of 101 male samples based on 34 Y-SNPs. Loci colored gray represent highly polymorphic loci

Discussion

In this study, we investigated the performance and polymorphisms of the HID-Ion AmpliSeq™ Identity Panel in a population from southern China. When coverage was low, allelic imbalance occurred more frequently, sometimes even resulting in no calls. High coverage combined with relatively poor allelic balance was also observed for several SNPs, of which rs214955, rs430046, and rs7520386 have also been shown to exhibit allelic imbalance in other studies [8,9,11,15]. These three loci did not perform as well as the other SNPs, and rs7520386 performed the worst. Additionally, rs7520386 showed the highest miscalled rate, 3.23% (supplementary Table 4). Therefore, the primers of these problematic SNPs need to be modified to improve the performance of the panel.

Upon analysis of miscalled reads, we observed that the rate of ASMRs was higher than that of ANMRs and that miscalled reads increased linearly with coverage, both of which were also observed and regarded as background signals in the Guo study [8]. Additionally, different rates were observed among autosomal SNPs, highly polymorphic Y-SNPs, and poorly polymorphic Y-SNPs, which implied that genetic diversity might also be related. As Fig. 4a showed, the rate of ASMRs was highly associated with genetic diversity (k = 0.209), while the slope for ANMRs was quite flat (k = 0.037), which indicated that He or GD contributed little to the variation of ANMRs. To explore whether allele frequency was related as well, the miscalled rates of the major and minor alleles were compared. In spite of a nonsignificant p value (p = 0.178) for Y-SNPs, the ASMR rate of the major allele was lower than that of the minor allele for both autosomal SNPs and Y-SNPs, while there was no significant difference for ANMRs. Therefore, these two kinds of miscalled reads appear to arise through different mechanisms. It is worth mentioning that these miscalled reads (ASMRs especially), or background noise, similar to stutter for STRs, have a critical influence on mixture analysis in forensic practice. On the other hand, how this background noise is produced is still not very clear. Artifacts produced in the PCR procedure and sequencing errors might be part of the reason. Furthermore, since barcodes were used and samples were sequenced simultaneously, barcode contamination might also be one of the reasons, possibly as a result of incorrect ligation of carry-over barcodes after pooling. This should be studied further in the future.

In this study, rs12997453/rs993934 and rs2056277/rs10092491 failed the linkage disequilibrium test even after Bonferroni correction, which differs from the results of other studies [5,8,15]. Since the physical distances were > 109 Mb for rs2056277/rs10092491 and > 58 Mb for rs12997453/rs993934, these two pairs of markers are generally considered independent. Given the relatively small sample size of our study, the failure of the linkage disequilibrium test might result from random effects. These polymorphic SNPs are powerful tools for individual identification and trio paternity testing, performing comparably to 22 STRs [8]. However, they may not be sufficient for duo paternity testing, let alone for testing of other relationships. In current research and applications of paternity and kinship testing, SNPs are more frequently regarded as complements to STR typing [3,23]. As estimated by Mo et al. [3], 85, 127, 491, and 1858 putative SNP loci are required to investigate parent-child, full-sibling, half-sibling/uncle-nephew, and first-cousin relationships, respectively, with a false testing level of 0.1%. However, when a great number of SNPs are used, linkage between loci becomes more likely, which should be taken into account.
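To make the multiple-testing burden of pairwise linkage disequilibrium analysis concrete, the short Python sketch below counts the locus pairs and the corresponding Bonferroni-corrected per-test threshold for several panel sizes. It assumes a nominal significance level of 0.05 and that every pair of loci is tested; neither setting is stated in this section, and the function name is hypothetical.

```python
from math import comb  # available in Python 3.8+


def bonferroni_threshold(n_loci, alpha=0.05):
    """Return (number of locus pairs, Bonferroni-corrected per-test threshold).

    alpha = 0.05 is an assumed nominal level, not a value taken from the study.
    """
    n_pairs = comb(n_loci, 2)  # every unordered pair of loci is one test
    return n_pairs, alpha / n_pairs


# 90 autosomal SNPs in the panel, then the locus counts estimated by Mo et al. [3].
for n_loci in (90, 85, 127, 491, 1858):
    n_pairs, threshold = bonferroni_threshold(n_loci)
    print(f"{n_loci:>5} loci: {n_pairs:>9,} pairs, per-test threshold {threshold:.2e}")
```

Because the number of pairs grows quadratically with the number of loci, a panel of 1858 SNPs involves more than 1.7 million pairwise tests, compared with 4005 for the 90 autosomal SNPs.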

Moderate genetic differences were observed for some SNPs between the Chinese population and Japanese, African, American, or European populations, but little frequency variation was detected between southern and northern Chinese, indicating that this panel could be widely applicable across Chinese populations. Allele frequency variation tended to follow differences in geographic location, and a similar pattern was detected in a previous study based on Y-STRs [24].

Haplogroup determination is of great interest in human population genetics as well as forensic genetics, as it reveals phylogenetic relationships by descent [25]. Haplogroups can be inferred by either Y-SNP or Y-STR typing. A study has shown that there is a high degree of concordance between these two methods. Muzzio et al., in contrast, demonstrated that Y-STR-based haplogrouping software offered relatively low accuracy [25]. In this study, however, 97.01% of samples were assigned consistent haplogroups with Y-STR and Y-SNP haplotypes, which was similar to results reported by others [26–28]. An insufficient number of markers (only seven STRs were typed) may explain the low accuracy in Muzzio’s study [29]. It should be noted that, either because limited markers are included in this panel or because no adequate database is available to calibrate the program, some samples may be assigned to the same haplogroup but different sub-haplogroups by the two methods. For example, haplogroup C was defined based on Y-SNPs, while it was predicted as C2 based on Y-STRs. Similarly, haplogroups O1a and O1b were determined based on Y-SNPs, while O1 was obtained using Whit Athey’s method. We nevertheless considered these assignments consistent. Additionally, the high concordance was based on strict parameter settings in Whit Athey’s method (minimum fitness score = 20, minimum probability = 85%, and area priors = “equal priors”), and 34 of the 101 samples could not be assigned a haplogroup automatically. For these 34 samples, if we chose the haplogroup with the maximum probability, all samples could obtain a haplogroup assignment, but the concordance would decrease to 86.14%. Given the high mutation rates of Y-STRs, it is possible to find the same Y-STR haplotype in samples from different haplogroups [29], which might explain why the concordance cannot reach 100%. On the other hand, although Y-SNP analysis appears to be a better approach for haplogroup determination, considering the widespread use of Y-STR typing in forensic DNA laboratories, it is still practical to determine haplogroups preliminarily with Y-STR haplotypes.
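The concordance figure can be reproduced from the counts in Table 2. The minimal Python sketch below assumes that the two discordant samples sit in the cells implied by the row and column totals (one O1 prediction typed as N and one R2 prediction typed as O1b). It also uses a simple prefix rule to treat a haplogroup and its sub-haplogroup as consistent; this rule is an illustrative simplification, not the authors' procedure, and all names in the code are hypothetical.

```python
# Columns: Y-SNP assignment. Rows: Y-STR prediction. Counts follow Table 2;
# the positions of the two discordant samples are inferred from the totals.
Y_SNP_COLUMNS = ["O2", "O1a", "O1b", "C", "D", "N", "R2"]
TABLE_2 = {
    "O2": [32, 0, 0, 0, 0, 0, 0],
    "O1": [0, 19, 8, 0, 0, 1, 0],
    "C2": [0, 0, 0, 4, 0, 0, 0],
    "D":  [0, 0, 0, 0, 1, 0, 0],
    "N":  [0, 0, 0, 0, 0, 1, 0],
    "R2": [0, 0, 1, 0, 0, 0, 0],
}


def is_consistent(y_str, y_snp):
    """Treat labels as consistent if one is a prefix of the other (e.g. C2 vs C)."""
    return y_str.startswith(y_snp) or y_snp.startswith(y_str)


def concordance(table, columns):
    """Return (consistent samples, total samples) for a Y-STR by Y-SNP table."""
    consistent = total = 0
    for y_str, row in table.items():
        for y_snp, n_samples in zip(columns, row):
            total += n_samples
            if is_consistent(y_str, y_snp):
                consistent += n_samples
    return consistent, total


consistent, total = concordance(TABLE_2, Y_SNP_COLUMNS)
print(f"{consistent}/{total} consistent = {100 * consistent / total:.2f}%")
```

By the same arithmetic, the 86.14% concordance obtained when every sample receives its maximum-probability haplogroup would correspond to 87 of 101 samples; this back-calculation is an inference and not a count reported in the text.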

Conclusion

The Ion Torrent PGM™ is a promising platform for forensic genetics research and applications. The HID-Ion AmpliSeq™ Identity Panel proved to be a powerful tool for individual identification and trio paternity testing in Chinese populations. However, additional SNPs are required to facilitate both duo paternity testing and testing of other relationships using this panel. The miscalled rates were 0.26 ± 0.16% for ASMRs and 0.06 ± 0.05% for ANMRs. ASMRs were associated with genetic diversity and allele frequency, while ANMRs were associated with neither, which indicated that the two might arise in different ways. Additionally, high concordance of haplogroup assignment can be obtained with Y-STR and Y-SNP haplotypes.

Funding This study was funded by the National Natural Science Foundation of China (81671873, 81273347) and the Fundamental Research Funds for the Central Universities (16ykzd08).

Compliance with ethical standards

Conflict of interest The authors declare that they have no conflict of interest.

Ethical approval All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed consent Informed consent was obtained from all individual participants included in the study.

References

1. Kidd KK, Pakstis AJ, Speed WC, Grigorenko EL, Kajuna SLB, Karoma NJ, Kungulilo S, Kim J, Lu R, Odunsi A, Okonofua F, Parnas J, Schulz LO, Zhukova OV, Kidd JR (2006) Developing a SNP panel for forensic identification of individuals. Forensic Sci Int 164:20–32

2. Amorim A, Pereira L (2005) Pros and cons in the use of SNPs in forensic kinship investigation: a comparative analysis with STRs. Forensic Sci Int 150:17–21

3. Mo S, Liu Y, Wang S, Bo X, Li Z, Chen Y, Ni M (2016) Exploring the efficacy of paternity and kinship testing based on single nucleotide polymorphisms. Forensic Sci Int Genet 22:161–168

4. Sanchez JJ, Phillips C, Borsting C, Balogh K, Bogus M, Fondevila M, Harrison CD, Musgrave-Brown E, Salas A, Syndercombe-Court D, Schneider PM, Carracedo A, Morling N (2006) A multiplex assay with 52 single nucleotide polymorphisms for human identification. Electrophoresis 27:1713–1724

5. Pakstis AJ, Speed WC, Fang R, Hyland FCL, Furtado MR, Kidd JR, Kidd KK (2010) SNPs for a universal individual identification panel. Hum Genet 127:315–324

6. Sobrino B, Brión M, Carracedo A (2005) SNPs in forensic genetics: a review on SNP typing methodologies. Forensic Sci Int 154:181–194

7. Seo SB, King JL, Warshauer DH, Davis CP, Ge J, Budowle B (2013) Single nucleotide polymorphism typing with massively parallel sequencing for human identification. Int J Legal Med 127:1079–1086

8. Guo F, Zhou Y, Song H, Zhao J, Shen H, Zhao B, Liu F, Jiang X (2016) Next generation sequencing of SNPs using the HID-Ion AmpliSeq™ Identity Panel on the Ion Torrent PGM™ platform. Forensic Sci Int Genet 25:73–84

9. Børsting C, Fordyce SL, Olofsson J, Mogensen HS, Morling N (2014) Evaluation of the Ion Torrent™ HID SNP 169-plex: a SNP typing assay developed for human identification by second generation sequencing. Forensic Sci Int Genet 12:144–154

10. Loman NJ, Misra RV, Dallman TJ, Constantinidou C, Gharbia SE, Wain J, Pallen MJ (2012) Performance comparison of benchtop high-throughput sequencing platforms. Nat Biotechnol 30:434–439

11. Eduardoff M, Santos C, de la Puente M, Gross TE, Fondevila M, Strobl C, Sobrino B, Ballard D, Schneider PM, Carracedo Á, Lareu MV, Parson W, Phillips C (2015) Inter-laboratory evaluation of SNP-based forensic identification by massively parallel sequencing using the Ion PGM™. Forensic Sci Int Genet 17:110–121

12. Rothberg JM, Hinz W, Rearick TM, Schultz J, Mileski W, Davey M, Leamon JH, Johnson K, Milgrew MJ, Edwards M, Hoon J, Simons JF, Marran D, Myers JW, Davidson JF, Branting A, Nobile JR, Puc BP, Light D, Clark TA, Huber M, Branciforte JT, Stoner IB, Cawley SE, Lyons M, Fu Y, Homer N, Sedova M, Miao X, Reed B, Sabina J, Feierstein E, Schorn M, Alanjary M, Dimalanta E, Dressman D, Kasinskas R, Sokolsky T, Fidanza JA, Namsaraev E, McKernan KJ, Williams A, Roth GT, Bustillo J (2011) An integrated semiconductor device enabling non-optical genome sequencing. Nature 475:348–352

13. Elena S, Alessandro A, Ignazio C, Sharon W, Luigi R, Andrea B (2016) Revealing the challenges of low template DNA analysis with the prototype Ion AmpliSeq™ Identity panel v2.3 on the PGM™ Sequencer. Forensic Sci Int Genet 22:25–36

14. Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL, Hammer MF (2008) New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 18:830–838

15. Zhang S, Bian Y, Zhang Z, Zheng H, Wang Z, Zha L, Cai J, Gao Y, Ji C, Hou Y, Li C (2015) Parallel analysis of 124 universal SNPs for human identification by targeted semiconductor sequencing. Sci Rep-UK 5:18683

16. Athey TW (2006) Haplogroup prediction from Y-STR values using a Bayesian-allele-frequency approach. J Genet Geneal 2:34–39

17. Athey TW (2005) Haplogroup prediction from Y-STR values using an allele-frequency approach. J Genet Geneal 1:1–7

18. Kalinowski ST, Taper ML, Marshall TC (2007) Revising how the computer program CERVUS accommodates genotyping error increases success in paternity assignment. Mol Ecol 16:1099–1106

19. Excoffier L, Lischer HE (2010) Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol Ecol Resour 10:564–567

20. Amigo J, Salas A, Phillips C, Carracedo A (2008) SPSmart: adapting population based SNP genotype databases for fast and comprehensive web access. BMC Bioinformatics 9:428

21. International Society of Genetic Genealogy (ISOGG) (2017) Y-DNA Haplogroup Tree 2017, Version: 12.128. Available at: http://www.isogg.org/tree/

22. Wright S (1978) Evolution and the genetics of populations. Vol. 4. Variability within and among natural populations. University of Chicago Press, Chicago

23. Phillips C, García-Magariños M, Salas A, Carracedo Á, Lareu MV (2012) SNPs as supplements in simple kinship analysis or as core markers in distant pairwise relationship tests: when do SNPs add value or replace well-established and powerful STR tests? Transfus Med Hemoth 39:202–210

24. Wang Y, Liu C, Zhang CC, Li R, Li Y, Ou XL, Sun HY (2015) Analysis of 17 Y-STR loci haplotype and Y-chromosome haplogroup distribution in five Chinese ethnic groups. Electrophoresis 36:2546–2552

25. Muzzio M, Ramallo V, Motti JMB, Santos MR, López Camelo JS, Bailliet G (2011) Software for Y-haplogroup predictions: a word of caution. Int J Legal Med 125:143–147

26. Petrejcikova E, Carnogurska J, Hronska D, Bernasovska J, Boronova I, Gabrikova D, Bozikova A, Macekova S (2014) Y-SNP analysis versus Y-haplogroup predictor in the Slovak population. Anthropol Anz 71:275–285

27. Dogan S, Babic N, Gurkan C, Goksu A, Marjanovic D, Hadziavdic V (2016) Y-chromosomal haplogroup distribution in the Tuzla Canton of Bosnia and Herzegovina: a concordance study using four different in silico assignment algorithms based on Y-STR data. Homo 67:471–483

28. Nunez C, Geppert M, Baeta M, Roewer L, Martinez-Jarreta B (2012) Y chromosome haplogroup diversity in a Mestizo population of Nicaragua. Forensic Sci Int Genet 6:e192–e195

29. Athey W (2011) Comments on the article, “Software for Y haplogroup predictions, a word of caution”. Int J Legal Med 125:901–903 (author reply 905–906)


Comparison of three massively parallel sequencing platforms for single nucleotide polymorphism (SNP) genotyping in forensic genetics

Article

Full-text available

Jun 2023
INT J LEGAL MED

Three MPS platforms are being used in forensic genetic analysis, i.e., MiSeq FGx, Ion S5 XL, and MGISEQ-2000. However, few studies compared their performance. In this study, we sequenced 83 common SNPs of 71 samples using the ForenSeq™ DNA Signature Prep Kit on MiSeq FGx, the Precision ID Identity Panel on Ion S5 XL, and the MGIEasy Signature Identification Library Prep Kit on MGISEQ-2000 and then the performance was compared. Results showed that the MiSeq FGx had the highest sequence quality but the lowest sequencing depth and allele balance. Discordant genotypes were observed at six SNPs, which may be caused by variants at primer binding regions, indel errors, or misalignments. Besides, two kinds of background noises, allele-specific miscalled reads (ASMR) and allele-nonspecific miscalled reads (ANMR), were characterized. MGISEQ-2000 showed the highest level of ASMR while Ion S5 XL had the highest level of ANMR. Site- and genotype-dependent miscalled patterns were observed at several SNPs on Ion S5 XL and MGISEQ-2000, but few on MiSeq FGx. In conclusion, the three MPS platforms perform differently with respect to sequencing quality, sequencing depth, allele balance, concordance, and background noise. These findings may be useful for data comparison, mixture deconvolution, and heteroplasmy analysis in forensic genetics.

Investigation of SNPs in the Precision ID Identity Panel using next-generation sequencing in a Myanmar population

Preprint

Full-text available

Mar 2022

Background—Single nucleotide polymorphisms (SNPs) have become popular in forensic genetics as an alternative to short tandem repeats (STRs) due to low mutation rates and small amplicon sizes. The Precision ID Identity Panel (Thermo Fisher Scientific), consisting of 90 autosomal SNPs and 34 Y-SNPs, was introduced for human identification by next-generation sequencing (NGS), enabling many studies on the global population; however, few reports are available on the Southeast Asian population. Methods and Results—A total of 96 unrelated male samples from Myanmar (Yangon) were analyzed with the Precision ID Identity Panel on a MiSeq (Illumina) using an in-house TruSeq compatible universal adapter. The sequencing performance was evaluated by locus balance and heterozygote balance, and the results were comparable to those of the Ion Torrent platform. For 90 autosomal SNPs, minor allele frequencies ranged between 0.068 and 0.500, and combined match probability (6.994×10⁻³⁴) was lower than that of 22 PowerPlex Fusion autosomal STRs (3.130×10⁻²⁶). Moreover, we identified 51 cryptic variations around the target SNPs using a custom variant caller, Visual SNP. For 34 Y-SNPs, 14 Y-haplogroups were observed—mostly O2 and O1b groups. Interpopulation analysis revealed that the Myanmar population is genetically closer to the East and Southeast Asian populations than the South Asian population. Conclusions—We demonstrate that the Precision ID Identity Panel can be successfully analyzed on a MiSeq using a custom data analysis pipeline and provide high discrimination power for human identification in the Myanmar population, while extending the accessibility of NGS analysis for SNPs in forensics.

Development and validation of YARN: A novel SE-400 MPS kit for East Asian paternal lineage analysis

Article

Mar 2024

Tools and techniques of using NGS platforms in forensic population genetic studies

Chapter

Jan 2024

Forensic genetic analysis of single-nucleotide polymorphisms and microhaplotypes in Koreans through next-generation sequencing using precision ID identity panel

Article

Jul 2023
GENES GENOM

Background: Forensic DNA analysis has seen remarkable advancements with the advent of Next Generation Sequencing (NGS). In particular, NGS analysis of single nucleotide polymorphisms (SNPs) offers significant advantages in the analysis of challenging samples compared to conventional STR analysis. Objective: This study aimed to investigate the SNPs of the Precision ID Identity Panel, a commercially available NGS panel for personal identification, by generating genetic profiles of 298 Koreans and comparing them with other global populations. Methods: A total of 124 SNPs, including 90 autosomal and 34 Y-SNPs, were analyzed using the Precision ID Identity Panel, and forensic parameters, microhaplotypes, and population differences were investigated. Results: The NGS data were successfully obtained from 298 Koreans. The analysis of forensic parameters exhibited a low combined match probability of 1.532 × 10- 34, which is comparable to that obtained from commonly used STR analysis. Additionally, the microhaplotype analysis revealed that the use of 16 microhaplotypes provided higher discriminatory power compared to single target SNPs. Furthermore, the adoption of microhaplotype data resulted in an increase of over 20% in expected heterozygosity at five loci. Inter-population analysis showed a close genetic relationship between Koreans and individuals from China and Myanmar in East and Southeast Asia, which are geographically adjacent to Korea. Conclusions: The results of this study show that the Precision ID Identity panel can be a useful alternative where traditional STR typing is not feasible. Also, the data from our study will be useful as a reference for Koreans in forensic investigations and the prosecution of criminal justice.

SNP analysis of challenging bone DNA samples using the HID-Ion AmpliSeq™ Identity Panel: facts and artefacts

Article

Full-text available

May 2023
INT J LEGAL MED

PCR-MPS is an emerging tool for the analysis of low-quality DNA samples. In this study, we used PCR-MPS to analyse 32 challenging bone DNA samples from three Second World War victims, which previously yielded no results in conventional STR PCR-CE typing. The Identity Panel was used with 27 cycles of PCR. Despite that we only had an average of 6.8 pg of degraded DNA as template, 30 out of 32 libraries (93.8%) produced sequencing data for about 63/90 autosomal markers per sample. Out of the 30 libraries, 14 (46.7%) yielded single source genetic profiles in agreement with the biological identity of the donor, whereas 12 cases (40.0%) resulted in SNP profiles that did not match or were mixed. The misleading outcomes for those 12 cases were likely due to hidden exogenous human contamination, as shown by the higher frequencies of allelic imbalance, unusual high frequencies of allelic drop-ins, high heterozygosity levels in the consensus profiles generated from challenging samples, and traces of amplified molecular products in four out of eight extraction negative controls. Even if the source and the time of the contamination were not identified, it is likely that it occurred along the multi-step bone processing workflow. Our results suggest that only positive identification by statistical tools (e.g. likelihood ratio) should be accepted as reliable; oppositely, the results leading to exclusion should be treated as inconclusive because of potential contamination issues. Finally, strategies are discussed for monitoring the workflow of extremely challenging bone samples in PCR-MPS experiments with an increased number of PCR cycles.

Genetic investigation of 124 SNPs in a Myanmar population using the Precision ID Identity Panel and the Illumina MiSeq

Article

Apr 2023

Single nucleotide polymorphisms (SNPs) have become popular in forensic genetics as an alternative to short tandem repeats (STRs). The Precision ID Identity Panel (Thermo Fisher Scientific), consisting of 90 autosomal SNPs and 34 Y-chromosomal SNPs, enabled human identification studies on global populations through next-generation sequencing (NGS). However, most previous studies on the panel have used the Ion Torrent platform, and there are few reports on the Southeast Asian population. Here, a total of 96 unrelated males from Myanmar (Yangon) were analyzed with the Precision ID Identity Panel on a MiSeq (Illumina) using an in-house TruSeq compatible universal adapter and a custom variant caller, Visual SNP. The sequencing performance evaluated by locus balance and heterozygote balance was comparable to that of the Ion Torrent platform. For 90 autosomal SNPs, the combined match probability (CMP) was 6.994 × 10-34, lower than that of 22 PowerPlex Fusion autosomal STRs (3.130 × 10-26). For 34 Y-SNPs, 14 Y-haplogroups (mostly O2 and O1b) were observed. We found 51 cryptic variations (42 haplotypes) around target SNPs, of which haplotypes corresponding to 33 autosomal SNPs decreased CMP. Interpopulation analysis revealed that the Myanmar population is genetically closer to the East and Southeast Asian populations. In conclusion, the Precision ID Identity Panel can be successfully analyzed on the Illumina MiSeq and provides high discrimination power for human identification in the Myanmar population. This study broadened the accessibility of the NGS-based SNP panel by expanding the available NGS platforms and adopting a robust NGS data analysis tool.

Development and evaluation of a novel panel containing 188 microhaplotypes for 2nd-degree kinship testing in the Hebei Han population

Article

Mar 2023
FORENSIC SCI INT-GEN

Distant kinship identification is one of the critical problems in forensic genetics. As a new type of genetic marker defined and discussed in the last decade, the microhaplotype (MH) has drawn much attention in such identification owing to its specific advantages to traditional short tandem repeat (STR) or single nucleotide polymorphism (SNP) markers. In this study, MH markers were screened step by step from the 1000 Genomes Project database, and a novel multiplex panel containing 188 MHs (in which 181 are reported the first time, while 1 was reported in a previous study and the other 6 have partial overlaps with known markers) was constructed for application in 2nd- and 3rd-degree kinship identification. Along with the construction, a novel MH nomenclature was proposed, in which the SNP position information they contained was taken into account to eliminate the possibility that the same locus was named differently interlaboratory. After a series of evaluations, the panel was shown to have good sequencing accuracy, high sensitivity, species specificity, and resistance to anti-PCR inhibitors or degradation. Population data of the 188 MHs were calculated based on the genetic information of 221 unrelated Hebei Han individuals, and the effective number of alleles (Ae) ranged from 2.0925 to 8.2634 (with an average of 2.9267). For the whole system, the cumulative matching probability (CMP), the cumulative power of exclusion in paternity testing of duos (CPEduo) and that of trios (CPEtrio) reached 2.8422 × 10-137, 1-1.3109 × 10-21, and 1-2.8975 × 10-39, respectively, indicating that this panel was satisfactory for individual identification and paternity testing. Then, the efficiency of the 188 MHs in 2nd- and 3rd-degree kinship testing was studied based on 30 extended families consisting of 179 2nd-degree and 121 3rd-degree relatives, as well as simulations of 0.5 million pairs of those two kinships. 
The results showed that clear opinions would be given in 83.36% of 2nd-degree identifications with a false rate less than 10-5, when the confirming and excluding thresholds of cumulative likelihood ratio (CLR) were set as 104 and 10-4, respectively. This panel is still not sufficient to solve the problem of 3rd-degree kinship identification alone, and approximately 300 or 870 MH loci would be needed in 2nd- or 3rd-degree kinship identification, respectively, to achieve a system efficiency not less than 0.99 with such a threshold set; such necessary numbers would be used only as a reference in further research.

Additional predictions for forensic DNA phenotyping of externally visible characteristics using the ForenSeq and Imagen kits

Article

Feb 2023
J FORENSIC SCI

Multiplex DNA typing methods using massively parallel sequencing can be used to predict externally visible characteristics (EVCs) in forensic DNA phenotyping through the analysis of single-nucleotide polymorphisms. The focus of EVC determination has focused on hair color, eye color, and skin tone as well as visible biogeographical ancestry features. In this study, we researched off-label applications beyond what is currently marketed by the manufacturer of the Verogen ForenSeq kit primer set B and Imagen primer set E SNP loci. We investigated additional EVC predictions by examining published genome wide sequencing studies and reported allele-specific gene expression and predictive values. We have identified 15 SNPs included in the ForenSeq kit panel and Imagen kits that have additional EVC prediction capabilities beyond what is published in the Verogen manuals. The additional EVCs that can be predicted include hair graying, ephelides hyperpigmented spots, dermatoheliosis, facial pigmented spots, standing height, pattern balding, helix-rolling ear morphology, hair shape, hair thickness, facial morphology, eyebrow thickness, sarcoidosis, obesity, vitiligo, and tanning propensity. The loci can be used to augment and refine phenotype predictions with software such as MetaHuman for missing persons, cold case, and historic case investigations.

Systematic selections and forensic application evaluations of 111 individual identification SNPs in the Chinese Inner Mongolia Manchu group

Article

Full-text available

Sep 2022

Single nucleotide polymorphism (SNP) possesses a promising application in forensic individual identification due to its wide distribution in the human genome and the ability to carry out the genotyping of degraded biological samples by designing short amplicons. Some commonly used individual identification SNPs are less polymorphic in East Asian populations. In order to improve the individual identification efficiencies in East Asian populations, SNP genetic markers with relatively higher polymorphisms were selected from the 1,000 Genome Project phase III database in East Asian populations. A total of 111 individual identification SNPs (II-SNPs) with the observed heterozygosity values greater than 0.4 were screened in East Asian populations, and then, the forensic efficiencies of these selected SNPs were also evaluated in Chinese Inner Mongolia Manchu group. The observed heterozygosity and power of discrimination values at 111 II-SNPs in the Inner Mongolia Manchu group ranged from 0.4011 to 0.7005, and 0.5620 to 0.8025, respectively, and the average value of polymorphism information content was greater than 0.3978. The cumulative match probability and combined probability of exclusion values at II-SNPs were 7.447E⁻⁵¹ and 1-4.17E⁻¹² in the Inner Mongolia Manchu group, respectively. The accumulative efficiency results indicated that the set of II-SNPs could be used as a potential tool for forensic individual identification and parentage testing in the Manchu group. The sequencing depths ranged from 781× to 12374×. And the mean allele count ratio and noise level were 0.8672 and 0.0041, respectively. The sequencing results indicated that the SNP genetic marker detection based on the massively parallel sequencing technology for SNP genetic markers had high sequencing performance and could meet the sequencing requirements of II-SNPs in the studied group.

Parallel Analysis of 124 Universal SNPs for Human Identification by Targeted Semiconductor Sequencing

Article

Full-text available

Dec 2015

SNPs, abundant in human genome with lower mutation rate, are attractive to genetic application like forensic, anthropological and evolutionary studies. Universal SNPs showing little allelic frequency variation among populations while remaining highly informative for human identification were obtained from previous studies. However, genotyping tools target only dozens of markers simultaneously, limiting their applications. Here, 124 SNPs were simultaneous tested using Ampliseq technology with Ion Torrent PGM platform. Concordance study was performed with 2 reference samples of 9947A and 9948 between NGS and Sanger sequencing. Full concordance were obtained except genotype of rs576261 with 9947A. Parameter of FMAR (%) was introduced for NGS data analysis for the first time, evaluating allelic performance, sensitivity testing and mixture testing. FMAR values for accurate heterozygotes should be range from 50% to 60%, for homozygotes or Y-SNP should be above 90%. SNPs of rs7520386, rs4530059, rs214955, rs1523537, rs2342747, rs576261 and rs12997453 were recognized as poorly performing loci, either with allelic imbalance or with lower coverage. Sensitivity testing demonstrated that with DNA range from 10 ng-0.5 ng, all correct genotypes were obtained. For mixture testing, a clear linear correlation (R2 = 0.9429) between the excepted FMAR and observed FMAR values of mixtures was observed.

Y-chromosomal haplogroup distribution in the Tuzla Canton of Bosnia and Herzegovina: A concordance study using four different in silico assignment algorithms based on Y-STR data

Article

Dec 2016
HOMO

Y-chromosomal haplogroups are sets of ancestrally related paternal lineages, traditionally assigned by the use of Y-chromosomal single nucleotide polymorphism (Y-SNP) markers. An increasingly popular and a less labor-intensive alternative approach has been Y-chromosomal haplogroup assignment based on already available Y-STR data using a variety of different algorithms. In the present study, such in silico haplogroup assignments were made based on 23-loci Y-STR data for 100 unrelated male individuals from the Tuzla Canton, Bosnia and Herzegovina (B&H) using the following four different algorithms: Whit Athey's Haplogroup Predictor, Jim Cullen's World Haplogroup & Haplogroup-I Subclade Predictor, Vadim Urasin's YPredictor and the NevGen Y-DNA Haplogroup Predictor. Prior in-house assessment of these four different algorithms using a previously published dataset (n = 132) from B&H with both Y-STR (12-loci) and Y-SNP data suggested haplogroup misassignment rates between 0.76% and 3.02%. Subsequent analyses with the Tuzla Canton population sample revealed only a few differences in the individual haplogroup assignments when using different algorithms. Nevertheless, the resultant Y-chromosomal haplogroup distribution by each method was very similar, where the most prevalent haplogroups observed were I, R and E with their sublineages I2a, R1a and E1b1b, respectively, which is also in accordance with the previously published Y-SNP data for the B&H population. In conclusion, results presented herein not only constitute a concordance study on the four most popular haplogroup assignment algorithms, but they also give a deeper insight into the inter-population differentiation in B&H on the basis of Y haplogroups for the first time.

Haplogroup prediction from Y-STR values using a Bayesian-allele-frequency approach

Article

Jan 2006

W.T. Athey

Next generation sequencing of SNPs using the HID-Ion AmpliSeq™ Identity Panel on the Ion Torrent PGM™ platform:

Article

Jul 2016

The HID-Ion AmpliSeq™ Identity Panel (the HID Identity Panel) is designed to detect 124 single nucleotide polymorphisms (SNPs) in one multiplex with next generation sequencing (NGS) technology on the Ion Torrent PGM™ platform, including 90 individual identification SNPs (IISNPs) on autosomal chromosomes and 34 lineage-informative SNPs (LISNPs) on the Y chromosome. In this study, we evaluated the performance of the HID Identity Panel to provide a reference for NGS-SNP application, focusing on locus strand balance, locus coverage balance, heterozygote balance, and background signals. In addition, several experiments were carried out to identify the strengths and limitations of this panel, including studies of species specificity, repeatability and concordance, sensitivity, mixtures, case-type samples and degraded samples, population genetics and pedigrees, following the Scientific Working Group on DNA Analysis Methods (SWGDAM) guidelines. Southern and Northern Chinese Han were also investigated to assess the applicability of this panel. Results showed that this panel cross-reacted with primates to some extent but rarely with non-primate animals. Repeatable and concordant genotypes could be obtained in triplicate, with one exception at rs7520386. Full profiles could be obtained from 100 pg of input DNA, but the optimal input would be 200 pg–1 ng with 21 initial PCR cycles. A sample with ≥20% minor contributors could be recognized as a mixture by the number of homozygotes, and full profiles belonging to minor contributors could be detected in mixtures between 9:1 and 1:9 when reference profiles were known. This assay could also be used for case-type samples and degraded samples. For autosomal SNPs (A-SNPs), FST across all 90 loci was not significantly different between Southern and Northern Chinese Han or between male and female samples. All A-SNP loci were independent in the Chinese Han population.
Except for 18 loci with He <0.4, most of the A-SNPs in the HID Identity Panel were highly polymorphic. The calculated forensic parameters were >99.999% for combined discrimination power (CDP), 0.999999724 for combined power of exclusion (CPE), 1.390 × 10¹¹ for the combined likelihood ratio (CLR) of trios, and 2.361 × 10⁶ for the CLR of motherless duos. For Y-SNPs, a total of 8 haplotypes were observed, with a haplotype diversity of 0.684. As a whole, the HID Identity Panel is a well-performing, robust, reliable and highly informative NGS-SNP assay that can fully meet the requirements for individual identification and paternity testing in forensic science.
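The relationship between per-locus heterozygosity and a combined discrimination power can be illustrated with a minimal sketch. It assumes biallelic SNPs in Hardy-Weinberg equilibrium and independent loci; the function names and the allele frequencies are hypothetical and are not taken from the study, which does not state how its parameters were computed.

```python
from math import prod

def expected_heterozygosity(p):
    """He = 2pq for a biallelic SNP with allele frequency p (Hardy-Weinberg assumed)."""
    return 2.0 * p * (1.0 - p)

def locus_match_probability(p):
    """Probability that two unrelated people share a genotype at one locus:
    the sum of squared genotype frequencies p^2, 2pq and q^2."""
    q = 1.0 - p
    return (p * p) ** 2 + (2.0 * p * q) ** 2 + (q * q) ** 2

def combined_match_probability(freqs):
    """Product of per-locus match probabilities (loci assumed independent)."""
    return prod(locus_match_probability(p) for p in freqs)

def combined_discrimination_power(freqs):
    """CDP = 1 - combined match probability."""
    return 1.0 - combined_match_probability(freqs)

# Illustrative allele frequencies only, not data from the panel.
example_freqs = [0.5, 0.4, 0.3]
print([round(expected_heterozygosity(p), 2) for p in example_freqs])
print(combined_match_probability(example_freqs))
```

For a panel of around 90 loci the combined match probability is far below floating-point resolution near 1, so it is usually more informative to report the match probability itself (or its logarithm) than to evaluate `1 - x` numerically.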

Exploring the efficacy of paternity and kinship testing based on single nucleotide polymorphisms

Article

May 2016

Short tandem repeats (STRs) are the conventional genetic markers used for paternity and kinship testing. As supplementary markers to STRs, single nucleotide polymorphisms (SNPs) have less discrimination power but broader applicability to degraded samples. The rapid improvement of next-generation sequencing (NGS) and multiplex amplification technologies now makes it possible to simultaneously identify dozens or even hundreds of SNP loci in a single pool. However, few studies have addressed kinship testing based on SNP loci. In this study, we genotyped 90 autosomal human identity SNP loci with NGS, and investigated their testing efficacy based on the likelihood ratio model in eight pedigree scenarios involving paternity, half/full-sibling, uncle/nephew, and first-cousin relationships. We found that these SNPs might be sufficient to discriminate paternity and full-sibling relationships, but impractical for more distant relatives such as uncles and cousins. Furthermore, we conducted an in silico study to obtain the theoretical trend of how testing efficacy varies with an increasing number of SNP loci. For each testing battery in a given pedigree scenario, we obtained distributions of the logarithmic likelihood ratio for both simulated relatives and unrelated controls. The proportion of the overlapping area between the two distributions was defined as the false testing level (FTL) and used to evaluate testing efficacy. We estimated that 85, 127, 491, and 1,858 putative SNP loci were required to discriminate paternity, full-sibling, half-sibling/uncle-nephew, and first-cousin relationships (FTL, 0.1%), respectively. To test a half-sibling or nephew, an additional uncle relative could be included to decrease the required number of putative SNP loci to ∼320 (FTL, 0.1%). As a systematic computation of paternity and kinship testing based only on SNPs, our results could be informative for further studies and applications of paternity and kinship testing using SNP loci.
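The overlap-based false testing level described in this abstract can be approximated as follows. This is only a sketch: it estimates the shared area of two empirical distributions from histograms on common bins, and the function name, the bin count and the normalization (each histogram sums to 1) are assumptions, since the abstract does not specify how the area was computed.

```python
def overlap_proportion(related_llr, unrelated_llr, n_bins=50):
    """Estimate the overlapping area of two empirical distributions.

    Both samples are binned on a shared grid, each histogram is normalized
    to sum to 1, and the per-bin minimum is summed. The result is 0 for
    fully separated samples and 1 for identical ones.
    """
    lo = min(min(related_llr), min(unrelated_llr))
    hi = max(max(related_llr), max(unrelated_llr))
    if hi == lo:
        return 1.0  # every value is identical, so the distributions coincide
    width = (hi - lo) / n_bins

    def normalized_histogram(values):
        counts = [0] * n_bins
        for v in values:
            # Clamp the maximum value into the last bin.
            index = min(int((v - lo) / width), n_bins - 1)
            counts[index] += 1
        return [c / len(values) for c in counts]

    h_related = normalized_histogram(related_llr)
    h_unrelated = normalized_histogram(unrelated_llr)
    return sum(min(a, b) for a, b in zip(h_related, h_unrelated))

# Toy log-likelihood-ratio values, invented for illustration only.
relatives = [4.1, 5.0, 6.2, 3.8, 5.5]
controls = [-6.0, -4.5, -5.2, -3.9, -7.1]
print(overlap_proportion(relatives, controls))
```

A histogram estimate depends on the bin count and sample size, so a very small target level such as 0.1% would need many simulated pairs before the estimate is stable.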

Revising how the computer program CERVUS accommodates genotyping error increases success in paternity assignment

Article

Jan 2007

Revealing the challenges of low template DNA analysis with the prototype Ion AmpliSeq™ Identity panel v2.3 on the PGM™ Sequencer

Article

Jul 2015
FORENSIC SCI INT-GEN

Forensic scientists frequently have to deal with the analysis of challenging sources of DNA such as degraded and low template DNA (LtDNA). The capacity to genotype difficult biological traces has been facilitated by emerging technologies. Massively parallel sequencing (MPS) on microchip, among other technologies, promises high sensitivity and discrimination power. In this study we evaluated the combined use of the Quantifiler® Trio DNA Quantification Kit with the prototype Ion AmpliSeq™ Identity panel v2.3 and the PGM™ platform in LtDNA samples. Coverage, allele balance, allele drop-out/in, consistency and variance were assessed. Overall, the results showed a high level of performance and consistency in terms of genotyping capability even under the most challenging conditions, making it possible to obtain consistent SNP profiles with 31 pg of DNA and partial informative profiles with as little as 5 pg or with severely degraded DNA. In addition, we demonstrated that the stochastic effects observed in some samples are due to the amplification of the library rather than to sequencing. Based on our data, we proposed general recommendations for the analysis of casework samples, starting from the use of quantification data, which proved to be critical in deciding whether to process the samples via STR (short tandem repeat) analysis or SNP MPS. In our experience, the prototype Ion AmpliSeq™ Identity panel v2.3 offers a new, applicable solution for processing LtDNA. This approach provides users with an additional tool for the analysis of traces that would otherwise not give informative results with conventional STR-based techniques.

Analysis of 17 Y‐STR loci haplotype and Y‐chromosome haplogroup distribution in five Chinese ethnic groups

Article

Jun 2015
ELECTROPHORESIS

To investigate genetic diversity in Chinese populations, 706 unrelated male individuals from five ethnic groups (Han, Korean, Hui, Mongolian and Tibetan) were analyzed with 17 Y-chromosomal short tandem repeats (STRs). The haplotype diversity was 0.99985 in the combined data. A total of 675 distinct haplotypes were observed, of which 649 were unique. Y-chromosome haplogroups in the five groups were also predicted from the Y-STR haplotypes. The genetic distances between the five studied ethnic groups and other published groups were analyzed by analysis of molecular variance (AMOVA) and visualized in a multi-dimensional scaling (MDS) plot. In conclusion, the 17 Y-STR loci are highly polymorphic markers in the five groups and hence are very useful in forensic application, population genetics and human evolution studies.
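Haplotype diversity of the kind reported here is commonly computed with Nei's unbiased estimator, HD = n/(n − 1) × (1 − Σ pᵢ²), where pᵢ is the frequency of haplotype i among n individuals. The sketch below assumes that estimator, which the abstract does not name, and uses made-up haplotype labels rather than the study's data.

```python
from collections import Counter

def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two haplotypes are required")
    counts = Counter(haplotypes)
    sum_squared = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_squared)

# Hypothetical haplotype labels for six males; two of them share haplotype "H1".
sample = ["H1", "H1", "H2", "H3", "H4", "H5"]
print(round(haplotype_diversity(sample), 4))
```

When every individual carries a different haplotype the estimator equals 1, which is why values very close to 1 indicate that most haplotypes in a sample are unique.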

Inter-laboratory evaluation of SNP-based forensic identification by massively parallel sequencing using the Ion PGM™

Article

Apr 2015
FORENSIC SCI INT-GEN

Evolution and the Genetics of Populations, Volume 4: Variability within and among Natural Populations.

Article

Mar 1979
