Human whole genome genotype and transcriptome data for Alzheimer’s and other neurodegenerative diseases

October 2016
Scientific Data 3(1):160089

DOI:10.1038/sdata.2016.89

License
CC BY 4.0

Authors:

Mariet Allen

Mayo Foundation for Medical Education and Research

Minerva M Carrasquillo

Mayo Foundation for Medical Education and Research

Cory C Funk

Institute for Systems Biology

Ben Heavner

Institute for Systems Biology

Previous genome-wide association studies (GWAS), conducted by our group and others, have identified loci that harbor risk variants for neurodegenerative diseases, including Alzheimer's disease (AD). Human disease variants are enriched for polymorphisms that affect gene expression, including some that are known to associate with expression changes in the brain. Postulating that many variants confer risk to neurodegenerative disease via transcriptional regulatory mechanisms, we have analyzed gene expression levels in the brain tissue of subjects with AD and related diseases. Herein, we describe our collective datasets comprised of GWAS data from 2,099 subjects; microarray gene expression data from 773 brain samples, 186 of which also have RNAseq; and an independent cohort of 556 brain samples with RNAseq. We expect that these datasets, which are available to all qualified researchers, will enable investigators to explore and identify transcriptional mechanisms contributing to neurodegenerative diseases.


Data Descriptor: Human whole genome genotype and transcriptome data for Alzheimer’s and other neurodegenerative diseases

Mariet Allen (1,*), Minerva M. Carrasquillo (1,*), Cory Funk, Benjamin D. Heavner, Fanggeng Zou, Curtis S. Younkin, Jeremy D. Burgess, High-Seng Chai, Julia Crook, James A. Eddy, Hongdong Li, Ben Logsdon, Mette A. Peters, Kristen K. Dang, Xue Wang, Daniel Serie, Chen Wang, Thuy Nguyen, Sarah Lincoln, Kimberly Malphrus, Gina Bisceglio, Ma Li, Todd E. Golde, Lara M. Mangravite, Yan Asmann, Nathan D. Price, Ronald C. Petersen, Neill R. Graff-Radford, Dennis W. Dickson, Steven G. Younkin & Nilüfer Ertekin-Taner (1,8)


Design Type: disease state design • individual genetic characteristics comparison design

Measurement Type(s): genetic sequence variation analysis • transcription profiling by array assay

Technology Type(s): Whole Genome Association Study • RNA-seq assay

Factor Type(s): regional part of brain • diagnosis

Sample Characteristic(s): Homo sapiens • cerebellum • temporal cortex

(1) Mayo Clinic, Department of Neuroscience, 4500 San Pablo Road, Jacksonville, Florida 32224, USA.

(2) Institute for Systems Biology, 401 Terry Ave N., Seattle, Washington 98109, USA.

(3) Mayo Clinic, Department of Health Sciences Research, 4500 San Pablo Road, Jacksonville, Florida 32224, USA.

(4) Mayo Clinic, Department of Health Sciences Research, 200 First Street, Rochester, Minnesota 55905, USA.

(5) Sage Bionetworks, 1100 Fairview Ave. N., Seattle, Washington 98109, USA.

(6) University of Florida, Center for Translational Research in Neurodegenerative Diseases, 1275 Center Dr, Gainesville, Florida 32611, USA.

(7) Mayo Clinic, Department of Neurology, 200 First Street, Rochester, Minnesota 55905, USA.

(8) Mayo Clinic, Department of Neurology, 4500 San Pablo Road, Jacksonville, Florida 32224, USA.

*These authors contributed equally to this work. Correspondence and requests for materials should be addressed to N.E.-T. (email: taner.nilufer@mayo.edu).

Subject categories: Neurodegeneration; Genetics of the nervous system; Genome-wide association studies; RNA sequencing

Received: 8 April 2016

Accepted: 31 August 2016

Published: 11 October 2016

www.nature.com/scientificdata

SCIENTIFIC DATA |3:160089 |DOI: 10.1038/sdata.2016.89 1

Background & Summary

In the past decade, GWAS identified risk loci for human diseases, including AD (refs 1–7) and other neurodegenerative diseases (refs 8,9). Despite this progress, a comprehensive understanding of the molecular mechanisms underlying these complex conditions remains elusive. This is partly due to the inability of the disease GWAS approach to identify the actual disease gene and the functional disease risk variants. We and others (refs 11,12) utilized combined gene expression GWAS (eGWAS) and disease GWAS to identify loci that harbor regulatory variants conferring disease risk and to nominate the actual disease genes at these loci. The underlying premise of these studies is that genetic variants that modulate expression levels of genes encoding critical members of disease molecular pathways will also influence disease risk. If this is correct, then there should be significant overlap between disease GWAS and eGWAS variants, especially if assessed in the disease-relevant tissue. Indeed, in an eGWAS of brain tissue from subjects with AD and from non-AD subjects, the latter largely with other neurodegenerative diagnoses, we identified significant enrichment for disease GWAS variants for AD and other diseases. We (refs 14–18) and others (refs 8,19–22) determined that many of the risk variants for AD and other neurodegenerative diseases influence brain levels of genes that are nearby in the genome. These studies implicate the genes that are likely to be involved in disease pathways, nominate regulatory variants as the functional disease risk factors and provide testable hypotheses for their downstream effects.

Most large-scale gene expression studies in human brains published to date (refs 10,19,20,23) utilize microarray-based gene or exon arrays. Despite the versatility, cost-effectiveness and large-scale utility, this approach has limitations, including restricted dynamic range, lack of probes for all known gene isoforms and confinement of assays to known transcripts. RNA sequencing (RNAseq) provides an attractive alternative that can surpass these limitations and provide much more in-depth information about the human transcriptome in a high-throughput manner. To expand our prior work on the human transcriptome based on microarray approaches and to evaluate gene/exon/isoform levels in a comparative fashion between AD and other neurodegenerative diseases, we have generated RNAseq data on brain samples from both a subset of the subjects that underwent microarray transcriptome studies and also an independent cohort. These datasets will be of utility in performing expression quantitative trait loci (eQTL), expression profiling and network analyses to facilitate interpretation of genetic associations and further understanding of disease-mediated changes in transcriptional regulation.

The present report is a description of the large-scale human genetic, and both microarray- and RNAseq-based transcriptome datasets we generated. The datasets described in this report have been made available to the research community through the Accelerating Medicines Partnership in Alzheimer’s Disease (AMP-AD) Knowledge Portal (Data Citation 1). The portal is hosted in the Synapse software platform from Sage Bionetworks as part of a series of datasets developed in support of the AMP-AD Target Identification and Preclinical Validation Project. The AMP-AD consortium includes six academic teams that will be generating genomic data from human brain or blood samples collected from more than 10 cohorts. Datasets are hosted in a common environment with standardized meta-data and annotations to facilitate cross-cohort query, access, and analysis. Each dataset provides a unique perspective on AD; therefore, datasets differ in types, generation protocols, and underlying patient characteristics. Together, these datasets represent the most comprehensive collection of human genomic data in the field to date and, as such, will be invaluable to a broad set of researchers.

The datasets described herein include the following: (1) late-onset AD GWAS (Mayo LOAD GWAS) on 2,099 subjects (Data Citation 2); (2) Mayo eGWAS on 773 samples from the cerebellum (CER) and temporal cortex (TCX) brain regions from a subset of Mayo LOAD GWAS participants (Data Citations 3,4); (3) Mayo Pilot RNAseq generated on a subset of 186 TCX samples from the Mayo eGWAS (Data Citation 5); (4) Mayo RNAseq on an independent cohort of 556 TCX (Data Citation 6) and CER (Data Citation 7) samples from subjects with AD, progressive supranuclear palsy (PSP), pathologic aging and elderly controls without neurodegenerative diseases. This report provides a comprehensive overview of these cohorts, a detailed description of subjects, samples, data generation and quality control (QC), as well as instructions for the scientific community to access these rich datasets.

Methods

The repository of human whole genome genotype and transcriptome data described herein (Table 1, Fig. 1) consists of the following resources, some of which have previously been published. Previously published datasets include whole genome genotype data from the Mayo LOAD GWAS (Data Citation 2) and microarray-based whole transcriptome data from the Mayo eGWAS (Data Citations 3,4). Next-generation RNA-sequencing (RNAseq) data from a subset of the patients from the Mayo Clinic eGWAS, referred to as the ‘Mayo Pilot RNAseq’ (Data Citation 5), was published in part. A non-overlapping cohort with RNAseq-based transcriptome data named ‘Mayo RNAseq’ (Data Citations 6,7) has also been published in part. For a comprehensive description of the overall repository, the data from the published studies are also described herein, albeit in an abbreviated fashion. These four study cohorts will henceforth be referred to by their names as mentioned above, preceded by letters A–D (Table 1).


Study Populations

All of this work was approved by the Mayo Clinic Institutional Review Board. All human subjects or their next of kin provided informed consent. The characteristics of the four study populations are as follows:

Mayo LOAD GWAS. The characteristics of the cohort for this study (Data Citation 2) were previously described in detail. Briefly, this is a LOAD case versus control study composed in total of 2,099 subjects sourced from three different series, namely the Mayo Clinic Jacksonville, Mayo Clinic Rochester and Mayo Clinic Brain Bank series. These series are respectively termed JS, RS and AUT in the GWAS publication (Table 1). Subjects in the Mayo Clinic Jacksonville and Mayo Clinic Rochester series were diagnosed clinically. These series consisted of 353 LOAD cases versus 331 controls, and 245 LOAD cases versus 701 controls, respectively. The Mayo Clinic Brain Bank series is a post-mortem cohort that consists of 246 LOAD cases versus 223 controls. All subjects were North American Caucasians. All clinical LOAD subjects were diagnosed as probable or possible AD, according to NINCDS-ADRDA criteria. All clinical controls had a clinical dementia rating score of 0. LOAD subjects in the Mayo Clinic Brain Bank series met neuropathologic criteria for definite AD and had a Braak score of ≥4.0 (ref. 28), while controls did not meet neuropathologic criteria for AD, and each had a Braak score of ≤2.5, which is an intermediary level of neurofibrillary tangle pathology between Braak scores of 2 and 3; however, most controls had neuropathologies unrelated to AD, including vascular dementia, frontotemporal dementia, dementia with Lewy bodies, multi-system atrophy, amyotrophic lateral sclerosis, and progressive supranuclear palsy. Ages, APOE ε4 genotype and sex distribution for the Mayo LOAD GWAS cohort are shown in Table 2.

This study only included subjects with ages between 60 and 80 years, based on the assumption that much of the genetic risk for LOAD will be concentrated in this age group, especially given the age-dependent effects of the strongest AD risk variant, apolipoprotein E ε4 (APOE4). Age for the clinically diagnosed LOAD cases is defined as age at first diagnosis of AD, since age at onset is not always available. Age at entry into the study is used for the clinically diagnosed controls. Age at death is utilized for the cases and controls in the postmortem Mayo Clinic Brain Bank series, given that for this cohort, age at clinical diagnosis/evaluation is not always available. Illumina Hap300 microarray genotypes from the subjects in these three case-control series were utilized to conduct a GWAS of LOAD risk.
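As a quick consistency check on the cohort sizes quoted above, the per-series case and control counts can be summed and compared with the stated total of 2,099 subjects. The sketch below is illustrative only; the dictionary layout and the function name `totals` are assumptions made for this example and are not part of the deposited data.

```python
# Case/control counts per series, as stated in the text for the Mayo LOAD GWAS.
SERIES = {
    "Mayo Clinic Jacksonville (JS)": {"cases": 353, "controls": 331},
    "Mayo Clinic Rochester (RS)": {"cases": 245, "controls": 701},
    "Mayo Clinic Brain Bank (AUT)": {"cases": 246, "controls": 223},
}


def totals(series):
    """Return (total cases, total controls, total subjects) across all series."""
    cases = sum(s["cases"] for s in series.values())
    controls = sum(s["controls"] for s in series.values())
    return cases, controls, cases + controls


cases, controls, n_subjects = totals(SERIES)
print(cases, controls, n_subjects)  # 844 1255 2099
```

The totals agree with the AD (n=844) and control (n=1,255) group sizes reported for study A in Table 2.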

Mayo eGWAS. This cohort was previously described in detail. All subjects in the Mayo eGWAS (Data Citations 3,4) are a subset of the Mayo Clinic Brain Bank series from the Mayo LOAD GWAS (Data Citation 2) (Fig. 1). The Mayo eGWAS is a whole transcriptome expression study in which brain samples from two different regions were analyzed, namely cerebellum (CER), which is relatively spared in AD, and temporal cortex (TCX), which is typically one of the first regions to be affected with AD neuropathology. Transcriptome measurements were obtained from TCX of 202 AD subjects and from CER of 197 AD subjects (Table 1). This study also included subjects without AD neuropathology, which are referred to as non-AD, given that many of these subjects had other neuropathologies. There were 197 non-AD subjects with TCX transcriptome measurements with the following neuropathologic diagnoses: progressive supranuclear palsy (PSP, n=107); Lewy body disease (LBD, n=25); corticobasal degeneration (CBD, n=22); frontotemporal lobar degeneration (FTLD, n=16); multiple system atrophy (MSA, n=11); vascular dementia (VaD, n=6); other (n=10). There were 177 non-AD subjects with CER transcriptome measurements that had the following neuropathologies: PSP (n=98); LBD (n=23); CBD (n=22); FTLD (n=15); MSA (n=7); VaD (n=4); other (n=8). Eighty-five percent of the subjects in the TCX cohort overlapped with those in the CER cohort. Demographics for the Mayo eGWAS subjects and samples, including RNA quality as assessed by RNA Integrity Numbers (RIN), are shown in Table 2.

Table 1. Meta-data for each of the four studies.

A. Mayo LOAD GWAS (Data Citation 2)
Brief description: LOAD case-control GWAS. Uses samples from 3 cohorts: total 2,099 subjects (post-QC). This data is used to identify loci associated with LOAD risk.
Datatype: LOAD GWAS genotypes, demographics
Platform: Illumina Hap 300
Reference: Carrasquillo et al., Nature Genetics
Study cohort/sample type: Mayo Clinic Jacksonville (JS)/Antemortem. N=353 cases, 331 controls. Clinical: AD cases and controls, collected at Mayo Clinic Jacksonville. Age at first diagnosis of AD or age at study entry: 60–80.
Study cohort/sample type: Mayo Clinic Rochester (RS)/Antemortem. N=245 cases, 701 controls. Clinical: AD cases and controls, collected at Mayo Clinic Rochester. Age at first diagnosis of AD or age at study entry: 60–80.
Study cohort/sample type: Mayo Clinic Brain Bank (AUT)/Postmortem. N=246 cases, 223 controls. Post-mortem: AD cases (Braak ≥4.0) and other pathologies (Braak ≤2.5). Age at death: 60–80.

B. Mayo eGWAS (Data Citations 3,4)
Brief description: WG-DASL gene expression measures for a subset of Mayo Brain Bank subjects that were included in the Mayo LOAD GWAS. RNA was isolated from two brain regions: TCX and CER. This data is utilized to identify loci associated with brain gene expression in subjects with AD, subjects with other brain pathologies that do not meet criteria for AD (non-AD), and the combined cohort.
Datatype: Gene expression phenotypes, eGWAS results, covariates
Platform: Illumina WG-DASL
Reference: Zou et al., PLoS Genetics
Cohort characteristics: Post-mortem: AD cases (Braak ≥4.0) and other pathologies (Braak ≤2.5). Age at death: 60–80.
Study cohort/sample type: Mayo Brain Bank/Temporal Cortex. N=202 AD, 197 non-AD controls.
Study cohort/sample type: Mayo Brain Bank/Cerebellum. N=197 AD, 177 non-AD controls.

C. Mayo Pilot RNAseq (Data Citation 5)
Brief description: RNAseq gene expression measures for a subset of Mayo Brain Bank subjects that were included in the Mayo LOAD GWAS. RNA was isolated from TCX. This data is utilized to identify loci associated with brain gene expression in subjects with AD and subjects with PSP.
Datatype: Gene expression phenotypes, covariates
Platform: Illumina HiSeq2000, 50 bp, paired end RNAseq
Reference: Allen et al., Neurology: Genetics
Study cohort/sample type: Mayo Brain Bank/Temporal Cortex. N=94 AD, 92 PSP. Post-mortem: AD cases (Braak ≥4.0) and pathologic diagnosis of PSP (Braak ≤2.5). Age at death: 60–80.

D. Mayo RNAseq (Data Citations 6,7)
Brief description: RNAseq gene expression measures for subjects from the Mayo Brain Bank non-overlapping with the Mayo LOAD GWAS, and also from Banner Sun Health Institute. RNA was isolated from two brain regions: TCX and CER. This data is utilized to compare brain gene expression between different pairwise diagnostic groups.
Datatype: Gene expression phenotypes, covariates
Platform: Illumina HiSeq2000, 101 bp, paired end RNAseq
Cohort characteristics: Post-mortem: AD cases (Braak ≥4.0), pathologic diagnoses of PSP (Braak ≤3), pathologic aging (Braak ≤3) and elderly control brains (Braak ≤3) without neurodegenerative diagnoses. Age at death ≥60.
Study cohort/sample type: Mayo Brain Bank and Banner Sun Health/Temporal Cortex. N=84 AD, 84 PSP, 30 pathologic aging, 80 controls.
Study cohort/sample type: Mayo Brain Bank and Banner Sun Health/Cerebellum. N=86 AD, 84 PSP, 28 pathologic aging, 80 controls.

Mayo Pilot RNAseq. All subjects in the Mayo Pilot RNAseq study (Data Citation 5) are a subset of the Mayo eGWAS (Data Citations 3,4), and are therefore also participants of the Mayo Clinic Brain Bank series that was included in the Mayo LOAD GWAS (Data Citation 2) (Fig. 1). The diagnostic categories in the Mayo Pilot RNAseq consist of 94 subjects with AD neuropathology and 92 PSP subjects, previously described (refs 18,26). PSP is a primary tauopathy characterized neuropathologically by neurofibrillary tangles (NFT) and tau-positive glial lesions (refs 29,30), and often presents clinically as a parkinsonian disorder. All PSP subjects were diagnosed neuropathologically by a single neuropathologist (DWD). For this study, only TCX samples were assessed (Table 2).

Figure 1. Overview of the relationship of the four genomic datasets herein described. Panels: A. Mayo LOAD GWAS (n=2,099; Data Citation 2), comprising 1 post-mortem cohort (Mayo Clinic Brain Bank, n=469) and 2 ante-mortem cohorts (Mayo Clinic Jacksonville, n=684, and Rochester, n=946); B. Mayo eGWAS (n=773; Data Citations 3,4), TCX and CER samples with WG-DASL gene expression (per-region counts in Table 1); C. Mayo Pilot RNAseq (n=94 AD, 92 PSP; Data Citation 5); D. Mayo RNAseq (n=556; Data Citations 6,7), TCX samples (n=84 AD, 84 PSP, 30 pathologic aging, 80 controls) and CER samples (n=86 AD, 84 PSP, 28 pathologic aging, 80 controls).

Mayo RNAseq. The subjects from this cohort are non-overlapping with the cohorts described above. The Mayo RNAseq cohort was utilized to generate RNAseq-based whole transcriptome data from 278 TCX (Data Citation 6) and 278 CER (Data Citation 7) samples. Two hundred thirty-eight subjects had both CER and TCX RNAseq and the rest had either CER or TCX RNAseq measurements based on tissue availability. CER samples were from the following diagnostic categories: 86 AD, 84 PSP, 28 pathologic aging and 80 controls without neurodegenerative diagnoses. TCX samples had the following diagnostic groups: 84 AD, 84 PSP, 30 pathologic aging and 80 controls. Control subjects each had Braak NFT stage of 3.0 or less, CERAD neuritic and cortical plaque densities of 0 (none) or 1 (sparse) and lacked any of the following pathologic diagnoses: AD, Parkinson’s disease (PD), DLB, VaD, PSP, motor neuron disease (MND), CBD, Pick’s disease (PiD), Huntington’s disease (HD), FTLD, hippocampal sclerosis (HipScl) or dementia lacking distinctive histology (DLDH). Subjects with pathologic aging also lacked the above diagnoses and had Braak NFT stage of 3.0 or less, but had CERAD neuritic and cortical plaque densities of 2 or more. None of the pathologic aging subjects had a clinical diagnosis of dementia or mild cognitive impairment. Given the presence of amyloid plaques, but not tangles, and the absence of dementia, pathologic aging is considered to be either a prodrome of AD or a condition in which there is resistance to the development of NFT and/or dementia.
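The control and pathologic aging definitions above amount to a small decision rule, which can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the function name, the argument names and the use of a single combined CERAD plaque density score (rather than separate neuritic and cortical scores) are all hypothetical simplifications.

```python
# Pathologic diagnoses that exclude a subject from both groups, per the text.
EXCLUDING_DIAGNOSES = {
    "AD", "PD", "DLB", "VaD", "PSP", "MND",
    "CBD", "PiD", "HD", "FTLD", "HipScl", "DLDH",
}


def classify_non_diseased(braak_stage, cerad_density, diagnoses,
                          has_dementia_or_mci=False):
    """Return 'control', 'pathologic aging', or None if neither definition fits.

    braak_stage: Braak NFT stage (both groups require 3.0 or less).
    cerad_density: CERAD plaque density, 0 (none) or 1 (sparse) for controls,
        2 or more for pathologic aging (hypothetical single-score simplification).
    diagnoses: iterable of pathologic diagnosis abbreviations for the subject.
    has_dementia_or_mci: clinical dementia or mild cognitive impairment.
    """
    if braak_stage > 3.0 or EXCLUDING_DIAGNOSES & set(diagnoses):
        return None
    if cerad_density in (0, 1):
        return "control"
    if cerad_density >= 2 and not has_dementia_or_mci:
        return "pathologic aging"
    return None


print(classify_non_diseased(2.0, 1, []))       # control
print(classify_non_diseased(3.0, 2, []))       # pathologic aging
print(classify_non_diseased(2.0, 0, ["PSP"]))  # None
```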

Within the Mayo RNAseq cohort (Data Citations 6,7), all AD and PSP subjects were from the Mayo Clinic Brain Bank, and all pathologic aging subjects were obtained from the Banner Sun Health Institute. Thirty-four control CER and 31 control TCX samples were from the Mayo Clinic Brain Bank, and the remaining control tissue was from the Banner Sun Health Institute. All subjects were North American Caucasians. All subjects except controls had ages at death ≥60; a more relaxed lower age cutoff of ≥50 was applied for normal controls to achieve sample sizes similar to those of AD and PSP subjects. No upper age limit was imposed on this cohort; however, when subjects had ages at death of ≥90, their ages were recorded as ‘90_or_above’ and shown as ‘90’ in Table 2 to protect patient confidentiality.
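Anyone computing age summaries from the covariate files will need to handle the censored ‘90_or_above’ value. A minimal sketch of one way to do so is given below, following the convention described above of treating it as 90; the function name is an assumption for this example, and the exact column name and formatting in the deposited files should be checked against the data.

```python
def age_for_summary(age_field):
    """Convert a recorded age-at-death value to a number for summary statistics.

    The censored label '90_or_above' is mapped to 90.0, mirroring how such
    ages are shown in Table 2. Means computed this way are therefore lower
    bounds on the true mean age.
    """
    value = str(age_field).strip()
    if value == "90_or_above":
        return 90.0
    return float(value)


ages = ["74", "90_or_above", 82]
print([age_for_summary(a) for a in ages])  # [74.0, 90.0, 82.0]
```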

Table 2 details the demographic characteristics of the Mayo RNAseq cohort (Data Citations 6,7). PSP subjects tended to be younger than the other diagnostic groups. As expected, there was a greater frequency of APOE4 positive subjects in the AD group, followed by pathologic aging, then PSP and control subjects. AD and pathologic aging subjects had the greatest female sex frequency (57%), followed by controls (49%), then PSP subjects (39%). All samples were selected to have RIN ≥5.0. Pathologic aging and control samples had slightly lower RINs than AD and PSP samples, due to limitations in availability of samples in these former diagnostic categories.

Molecular Data

Sample collection and processing. For the Mayo LOAD GWAS (A) (Data Citation 2), DNA samples were collected and processed as previously described. For the antemortem Mayo Clinic Jacksonville and Mayo Clinic Rochester series, whole blood samples were collected in 10 ml EDTA tubes followed by DNA extraction using the AutoGenFlex STAR instrument (AutoGen), whereas cerebellar tissue was used for DNA extraction from the postmortem Mayo Clinic Brain Bank series using the Wizard Genomic DNA purification kit (Promega). Given limited amounts of DNA from samples in the Mayo Clinic Rochester series and Mayo Clinic Brain Bank series, whole genome amplification (WGA) was applied using the Illustra GenomiPhi V2 DNA Amplification Kit (GE Healthcare Bio-Sciences), in four 5 µl reactions that utilized 5–15 ng genomic DNA as a template. Subsequent to the pooling of these reaction products, WGA DNA was subjected to quality control (QC) using SNP genotyping as previously described.

Table 2. Demographics for the cohorts included in each of the four studies.

Study | Region | Group (n) | Mean age ± s.d. (range) | APOE4 positive/negative/null (% APOE4 positive) | Female (%) | Mean RIN ± s.d. (range)
A. Mayo LOAD GWAS (Data Citation 2) | NA | AD (n=844) | 74.0 ± 4.8 (60–80) | 549/277/18 (65%) | 482 (57%) | NA
A. Mayo LOAD GWAS (Data Citation 2) | NA | CON (n=1,255) | 73.2 ± 4.4 (60–80) | 344/889/22 (27%) | 641 (51%) | NA
B. Mayo eGWAS (Data Citations 3,4) | TCX | AD (n=202) | 73.6 ± 5.5 (60–80) | 123/79/0 (61%) | 108 (53%) | 6.3 ± 0.9 (5–9)
B. Mayo eGWAS (Data Citations 3,4) | TCX | NON-AD (n=197) | 71.6 ± 5.6 (60–80) | 49/146/2 (25%) | 78 (40%) | 6.9 ± 1.0 (5–9.3)
B. Mayo eGWAS (Data Citations 3,4) | CER | AD (n=197) | 73.6 ± 5.6 (60–80) | 126/71/0 (64%) | 101 (51%) | 7.2 ± 1.0 (5–9.4)
B. Mayo eGWAS (Data Citations 3,4) | CER | NON-AD (n=177) | 71.7 ± 5.5 (60–80) | 45/130/2 (25%) | 63 (36%) | 7.2 ± 1.0 (5–9)
C. Mayo Pilot RNAseq (Data Citation 5) | TCX | AD (n=94) | 74.1 ± 5.7 (60–80) | 58/36/0 (62%) | 41 (44%) | 7.0 ± 0.7 (6.2–9)
C. Mayo Pilot RNAseq (Data Citation 5) | TCX | PSP (n=92) | 71.9 ± 5.4 (60–80) | 20/72/0 (22%) | 37 (40%) | 7.0 ± 0.9 (5.7–9.3)

Study | Region | Group (n) | Mean age ± s.d. (range) | APOE4 positive/negative (% APOE4 positive) | Female (%) | Mean RIN ± s.d. (range)
D. Mayo RNAseq (Data Citations 6,7) | TCX | AD (n=84) | 82.4 ± 7.7 (60–90) | 43/41 (51%) | 48 (57%) | 8.6 ± 0.5 (7.7–10.0)
D. Mayo RNAseq (Data Citations 6,7) | TCX | PSP (n=84) | 74.0 ± 6.5 (61–89) | 13/71 (15%) | 33 (39%) | 8.5 ± 0.5 (7.8–10.0)
D. Mayo RNAseq (Data Citations 6,7) | TCX | Path Aging (n=30) | 85.2 ± 4.3 (76–90) | 10/20 (33%) | 17 (57%) | 7.4 ± 1.0 (5.3–8.9)
D. Mayo RNAseq (Data Citations 6,7) | TCX | Control (n=80) | 82.6 ± 8.8 (53–90) | 10/70 (13%) | 39 (49%) | 7.6 ± 1.0 (5.3–9.7)
D. Mayo RNAseq (Data Citations 6,7) | CER | AD (n=86) | 82.5 ± 7.7 (60–90) | 43/43 (50%) | 49 (57%) | 8.3 ± 0.8 (5.7–10.0)
D. Mayo RNAseq (Data Citations 6,7) | CER | PSP (n=84) | 74.0 ± 6.5 (61–89) | 13/71 (15%) | 33 (39%) | 8.4 ± 0.9 (5.5–10.0)
D. Mayo RNAseq (Data Citations 6,7) | CER | Path Aging (n=28) | 84.7 ± 4.3 (76–90) | 9/19 (32%) | 16 (57%) | 7.5 ± 1.0 (5.7–9.0)
D. Mayo RNAseq (Data Citations 6,7) | CER | Control (n=80) | 82.5 ± 8.3 (58–90) | 11/69 (14%) | 39 (49%) | 7.6 ± 1.0 (5.5–9.7)

RNA for the Mayo eGWAS (B) (Data Citations 3,4) and Mayo Pilot RNAseq (C) (Data Citation 5) studies was extracted using the Ambion RNAqueous kit (Life Technologies, Grand Island, NY) according to the manufacturer’s instructions. Brain samples for the Mayo RNAseq (D) (Data Citations 6,7) study underwent RNA extraction via the Trizol/chloroform/ethanol method, followed by DNase treatment and RNA cleanup using the Qiagen RNeasy Mini Kit and Qiagen RNase-Free DNase Set. The quantity and quality of all RNA samples were determined by the Agilent 2100 Bioanalyzer using the Agilent RNA 6000 Nano Chip (Agilent Technologies, Santa Clara, CA). Samples had to have an RNA Integrity Number (RIN) ≥5.0 for inclusion in any of these studies (Table 2).

Data generation. The genotype data for the Mayo LOAD GWAS (A) (Data Citation 2) were generated using HumanHap300-Duo Genotyping BeadChips, which were processed with an Illumina BeadLab station at the Mayo Clinic Genotyping Shared Resource (currently the Mayo Clinic Medical Genome Facility, MGF, Rochester, Minnesota) according to the manufacturer’s protocols. Two samples were genotyped per chip for 318,237 SNPs across the genome. Genotype calls were made using the auto-calling algorithm in Illumina’s BeadStudio 2.0 software.

For the Mayo eGWAS study (B) (Data Citations 3,4), transcript levels were measured using the Whole Genome DASL assay (Illumina, San Diego, CA) as previously described. Probe annotations were based on NCBI RefSeq, Build 36.2. The RNA samples were randomized across the chips and plates using a stratified approach to ensure balance with respect to diagnosis, age, gender, RIN and APOE genotype. Raw probe mRNA expression data were exported from GenomeStudio software (Illumina Inc.) and preprocessed for background correction, variance stabilizing transformation, quantile normalization and probe filtering using the lumi package of BioConductor.

Samples for both Mayo Pilot RNAseq (C) (Data Citation 5) and Mayo RNAseq (D) (Data Citations 6,7) studies were randomized prior to transfer to the Mayo Clinic MGF Gene Expression Core for library preparation and then the Sequencing Core for RNA sequencing. Mayo Pilot RNAseq (C) (Data Citation 5) AD and PSP samples were randomized across flowcells, taking into account age at death, sex and RIN. The AD and PSP samples underwent library preparation and sequencing at different times and therefore should be considered as separate datasets. Likewise, Mayo RNAseq (D) TCX and CER samples (Data Citations 6,7, respectively) underwent RNAseq at different times. These samples were randomized across flowcells, taking into account age at death, sex, RIN, Braak stage and diagnosis. The TruSeq RNA Sample Prep Kit (Illumina, San Diego, CA) was used for library preparation from all samples. The library concentration and size distribution were determined on an Agilent Bioanalyzer DNA 1000 chip. Samples were multiplexed using barcoding, with 3 samples per flowcell lane. For Mayo Pilot RNAseq (C) (Data Citation 5) samples, 50 bp, paired-end sequencing was done, whereas Mayo RNAseq (D) (Data Citations 6,7) samples underwent 101 bp, paired-end sequencing.

Data Processing. Mayo LOAD GWAS (A) (Data Citation 2) genotypes from Illumina BeadStudio 2.0 software were utilized to generate lgen, map and fam files that were imported into PLINK and converted to binary ped (.bed) and map (.bim) files, which are deposited together with PLINK format fam and covariate files (DOIs and descriptions for each of these files are provided in Table 3 (available online only)).

The Mayo eGWAS WG-DASL microarray expression dataset from TCX and CER (B) includes covariates and probe expression levels (Data Citation 3), which are preprocessed as published and described above. The Mayo eGWAS ‘eSNP Results’ (Data Citation 4) are the eQTL results from the test of association between the Mayo LOAD GWAS (Data Citation 2) genotypes and the WG-DASL gene expression measures analyzed by multivariable linear regression using an additive model in PLINK, as published previously (DOIs and descriptions for each of these files are provided in Table 3 (available online only)). These analyses used preprocessed probe transcript levels as traits, SNP minor allele dosage as the independent variable, and adjusted for the following covariates: APOE ε4 dosage (0, 1, 2), age at death, sex, PCR plate, RIN and adjusted RIN squared, (RIN − RINmean)². Analyses were limited to SNP-probe pairs that were in cis, defined as ±100 kb of the targeted gene according to NCBI Build 36. The AD and non-AD subjects were analyzed both separately and jointly. The joint analyses included diagnosis as an additional covariate (AD = 1, non-AD = 0). Results of analyses for both the genotyped SNPs and genotypes imputed to the HapMap2 reference are provided. HapMap2 imputations were done as described. The eGWAS results were previously made available through the NIAGADS repository (https://www.niagads.org/datasets/ng00025).
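Two of the definitions used in the eQTL analysis, the cis window and the adjusted RIN squared covariate, are simple enough to illustrate directly. The sketch below is not the authors' code (the analysis itself was run in PLINK); the function names are hypothetical, and it assumes that "±100 kb of the targeted gene" means within 100 kb upstream of the gene start or downstream of the gene end, with coordinates in base pairs on the same genome build.

```python
CIS_WINDOW_BP = 100_000  # +/-100 kb, as stated in the text


def is_cis(snp_chrom, snp_pos, gene_chrom, gene_start, gene_end,
           window=CIS_WINDOW_BP):
    """True if the SNP lies within `window` bp of the gene on the same chromosome.

    Assumes gene_start <= gene_end and a shared coordinate system.
    """
    if snp_chrom != gene_chrom:
        return False
    return gene_start - window <= snp_pos <= gene_end + window


def adjusted_rin_squared(rins):
    """Return (RIN - mean RIN)^2 for each sample in a non-empty list."""
    mean_rin = sum(rins) / len(rins)
    return [(rin - mean_rin) ** 2 for rin in rins]


print(is_cis("1", 950_000, "1", 1_000_000, 1_050_000))  # True
print(is_cis("1", 899_999, "1", 1_000_000, 1_050_000))  # False
print(adjusted_rin_squared([6.0, 8.0]))                 # [1.0, 1.0]
```

Centering RIN before squaring reduces the correlation between the linear and quadratic RIN terms when both are included as covariates.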

The Mayo Pilot RNAseq (Data Citation 5), Mayo RNAseq TCX and CER data (Data Citations 6,7, respectively) were processed using the same analytic pipeline. Read alignments were done using the SNAPR software, an RNA sequence aligner based on SNAP, using the GRCh38 reference and Ensembl v77 gene models. Outputs include per-sample gene and transcript counts, which are merged into a single file per data type (gene or transcript) that contains data for all samples across all genes/transcripts (DOI and descriptions for each of these files are provided in Table 3 (available online only)). Alignment with SNAPR starts with the creation of hash indices built from both a reference genome (GRCh38) and a transcriptome (GRCh38.77). SNAPR filters fastq reads by Phred score (>80% of the read must have a Phred score ≥20) and simultaneously aligns each read (or read pair) to both the genome and transcriptome. The best alignment is written to a sorted BAM file, with read counts simultaneously tabulated and written for each sample. Read counts are given by gene ID and transcript ID (two separate files). We have previously compared the read counts generated by SNAPR with those generated by HT-Seq and found them to be very comparable.
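The read-level quality filter described above can be illustrated with a short sketch. It is not SNAPR's implementation; it only restates the stated rule (more than 80% of bases with Phred ≥20) and assumes the common Phred+33 ASCII encoding of fastq quality strings.

```python
def passes_phred_filter(quality, min_phred=20, min_fraction=0.80, offset=33):
    """Return True if more than `min_fraction` of bases have Phred >= `min_phred`.

    `quality` is the fastq quality string; `offset=33` assumes Phred+33 encoding.
    """
    if not quality:
        return False
    good = sum(1 for ch in quality if ord(ch) - offset >= min_phred)
    return good / len(quality) > min_fraction
```

For paired reads, one plausible policy would be to require both mates to pass, but the text does not specify how pairs are handled.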

Post-processing was also performed using the same pipeline for these three RNAseq datasets, as follows. The individual read count files produced by SNAPR are merged into a single file using two scripts: merge_count_files.R and a dataset-specific read-count merge script. These scripts generate the corresponding _counts.txt.gz files. The merged count files are normalized with the normalize_readcounts.R script, which uses the edgeR implementation of the trimmed mean of M-values (TMM) normalization method to calculate counts per million (CPM). These normalized counts are saved for both gene and transcript levels (DOI and descriptions for each of these files are provided in Table 3 (available online only)).
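The merge and CPM steps can be sketched in a few lines. This is an illustration in Python rather than the R scripts named above, and it deliberately omits the TMM factor estimation itself: `norm_factors` stands in for factors that a TMM implementation such as edgeR would supply, and the data layout is hypothetical.

```python
def merge_counts(per_sample):
    """Merge {sample: {feature: count}} into (features, samples, matrix).

    The matrix has one row per feature and one column per sample;
    features missing from a sample are filled with 0.
    """
    samples = sorted(per_sample)
    features = sorted({f for counts in per_sample.values() for f in counts})
    matrix = [[per_sample[s].get(f, 0) for s in samples] for f in features]
    return features, samples, matrix


def cpm(matrix, norm_factors=None):
    """Counts per million, using effective library size = total count * factor."""
    n_samples = len(matrix[0])
    factors = norm_factors or [1.0] * n_samples
    lib_sizes = [sum(row[j] for row in matrix) * factors[j] for j in range(n_samples)]
    return [[row[j] / lib_sizes[j] * 1e6 for j in range(n_samples)] for row in matrix]
```

With all factors equal to 1 this reduces to plain CPM; a sample with zero total counts would raise a division error and should be removed beforehand.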

Code Availability. The R script called merge_count_files.R was used to merge the RNAseq read count files produced by SNAPR into a single file, and can be found at https://github.com/CoryFunk/AMP-AD-scripts/blob/master/combine_count_files.pl. Also, the R script used to normalize the merged RNAseq read counts, called normalize_readcounts.R, can be found at https://github.com/CoryFunk/AMP-AD-scripts/blob/master/tmm_normalization.R.

Data Records

Data available for studies A-D (Data Citations 2–7; Table 3 (available online only)) consists of a set of files that contain genomic, genetic or covariate data for a defined set of samples; analytic results are also provided when available. Data files can be found in the Sage Bionetworks AMP-AD Knowledge Portal (Data Citation 1) in study-specific folders (and subfolders). Users can identify and search for data files and data descriptions using the unique Synapse ID and corresponding DOI provided in Table 3 (available online only). Each sample within a study has a unique sample ID; this sample ID is consistent across all files within the study, and across files in other studies where applicable. The relationship between studies and sample overlaps is illustrated in Fig. 1. The samples in study C (Data Citation 5) are a subset of the samples in study B (Data Citation 3), which are likewise a subset of the samples in study A (Data Citation 2); the samples in study D (Data Citations 6,7) are independent of those in studies A-C. The Usage Notes section describes the data accession conditions and the steps for requesting access.
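Because sample IDs are consistent across studies, the stated relationships between studies A-D can be checked directly once the sample ID lists have been loaded. The sketch below assumes four plain lists of IDs; the function name and return structure are illustrative only.

```python
def check_study_overlap(a_ids, b_ids, c_ids, d_ids):
    """Check the stated relationships: C within B, B within A, D independent of A-C."""
    a, b, c, d = (set(ids) for ids in (a_ids, b_ids, c_ids, d_ids))
    return {
        "C_subset_of_B": c <= b,
        "B_subset_of_A": b <= a,
        "D_independent": d.isdisjoint(a | b | c),
        "B_missing_from_A": sorted(b - a),  # e.g. subjects without released genotypes
    }
```

Note that a check on released files may not hold exactly, since the Technical Validation section reports that some expression subjects lack GWAS genotypes after QC.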

Technical Validation

Data QC

Mayo LOAD GWAS (A) (Data Citation 2) QC methods were previously published. Briefly, using PLINK, subjects with genotyping call rates of <90%, duplicate genotyping and/or sex-mismatches between recorded and deduced sex were eliminated from the dataset. All SNPs with genotyping call rates <90%, minor allele frequencies <0.01, and/or Hardy-Weinberg p values <0.001 were also eliminated. Prior to QC, 318,237 SNPs were genotyped in 2,465 subjects. The available data includes the 313,504 SNP genotypes from 2,099 subjects that passed these QC parameters.
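The QC thresholds above amount to simple per-subject and per-SNP predicates, sketched here for illustration. The function and argument names are hypothetical and the published analysis used PLINK rather than code of this kind.

```python
def subject_passes_qc(call_rate, recorded_sex, deduced_sex, is_duplicate,
                      min_call_rate=0.90):
    """Keep a subject unless call rate < 90%, duplicated, or sex-mismatched."""
    return (
        call_rate >= min_call_rate
        and not is_duplicate
        and recorded_sex == deduced_sex
    )


def snp_passes_qc(call_rate, maf, hwe_p,
                  min_call_rate=0.90, min_maf=0.01, min_hwe_p=0.001):
    """Keep a SNP unless call rate < 90%, MAF < 0.01, or HWE p < 0.001."""
    return call_rate >= min_call_rate and maf >= min_maf and hwe_p >= min_hwe_p
```

In PLINK, thresholds of this kind correspond roughly to the `--mind 0.1`, `--geno 0.1`, `--maf 0.01` and `--hwe 0.001` options, although the exact commands used are not given in the text.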

The Mayo eGWAS (B) (Data Citations 3,4) data was generated as follows. We annotated probes for the presence of genetic variants by comparing their positions according to NCBI RefSeq, Build 36.3, to those of all variants within dbSNP131, and identified the list of probes that have ≥1 variant within their sequence. We depict this information in the files for the Mayo eGWAS, ‘eSNP Results’ (Data Citation 4) (Table 3 (available online only)), by including a ‘SNP-In-Probe’ column, which has ‘TRUE’ if the probe sequence harbors ≥1 SNP, and ‘FALSE’ otherwise. We also calculated, for each probe within each analytic group, the percent detection rate above background. Probes that are detected in >12.5%, >25%, >50% and >75% of the subjects in each analytic group are annotated by four separate columns within the ‘eSNP Results’ (Data Citation 4) from the eGWAS that included HapMap2 imputed genotypes, described below. The purpose of these annotation columns is to give others the flexibility to impose cutoffs based on presence/absence of variants within probe sequence and/or probe detection rates, while providing the full dataset for completeness. The Mayo eGWAS (Data Citations 3,4) also included replicate samples, as described, for QC and to estimate intraclass coefficients (ICC), which is the between-subject variance as a percentage of the total variance in probe expression. There were 4 AD and 4 non-AD temporal cortex samples that were measured in 5 replicates, and 10 AD and 5 non-AD cerebellar sample replicates across five plates. Universal human RNA (UHR) samples were also run on each PCR plate as part of QC. The expression phenotypes include results from only one of the replicate subjects selected randomly and exclude UHR results. It should be noted that 3 AD and 9 non-AD subjects for TCX, and 4 AD subjects for CER, do not have associated GWAS genotypes, as they did not pass ≥1 GWAS QC parameter described above.
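The four detection-rate annotation columns described above can be reproduced from per-subject detection calls, as in the following sketch. How a probe is called "detected above background" in a given subject is not restated here, so the input is simply assumed to be a list of booleans for one probe within one analytic group.

```python
DETECTION_THRESHOLDS = (12.5, 25.0, 50.0, 75.0)  # percent of subjects


def detection_flags(detected):
    """Return (percent detected, {threshold: rate > threshold}) for one probe.

    `detected` holds one boolean per subject in the analytic group.
    """
    rate = 100.0 * sum(detected) / len(detected)
    return rate, {t: rate > t for t in DETECTION_THRESHOLDS}
```

A user wishing to impose a cutoff could, for example, keep only probes whose 50% flag is true and whose ‘SNP-In-Probe’ value is ‘FALSE’.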

For the Mayo Pilot RNAseq, principal component analysis (PCA) identified 2 outliers in the AD and 4 in the PSP cohort. The covariates for these subjects were set to missing (=NA) in the respective covariate files (DOI and descriptions for these files are provided in Table 3 (available online only)). Hence, although 96 AD and 96 PSP subjects underwent sequencing in the Mayo Pilot RNAseq study, 94 AD and 92 PSP subjects were retained for analyses. It should be noted that of these subjects, 1 AD and 7 PSP subjects lack GWAS data due to either having genotype counts <90% or failing sex checks. PCA identified no outliers in the Mayo RNAseq (D) of TCX samples (Data Citation 6) but 2 such subjects in the CER analyses (Data Citation 7). The covariate data in the relevant CER files for these two subjects were set to missing. We likewise assessed the RNAseq data for sex discrepancies based on Y chromosome gene expression and documented sex, and identified 2 subjects with mismatched sex for both TCX and CER, plus a third subject in the CER cohort. These were also set to missing in the covariate files. At the time of this publication, the Mayo RNAseq subjects did not have GWAS genotypes deposited on Synapse.
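A sex check based on Y chromosome gene expression can be sketched as below. The text does not state which genes or cutoff were used, so the gene names in the usage note, the mean-expression rule and the threshold of 1 CPM are all hypothetical choices for illustration.

```python
def infer_sex(y_gene_expr, threshold=1.0):
    """Infer sex from mean expression of Y-linked genes (assumed rule and cutoff)."""
    mean_expr = sum(y_gene_expr.values()) / len(y_gene_expr)
    return "M" if mean_expr > threshold else "F"


def flag_sex_mismatches(samples, threshold=1.0):
    """Return IDs whose documented sex disagrees with the expression-based call.

    `samples` maps sample ID -> (documented sex, {gene: expression}).
    """
    return [
        sample_id
        for sample_id, (documented, y_expr) in samples.items()
        if infer_sex(y_expr, threshold) != documented
    ]
```

For example, a sample documented as female with high expression of Y-linked genes such as RPS4Y1 or DDX3Y would be flagged, and its covariates could then be set to missing as described above.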

Usage Notes

The data described herein is available for use by the research community and has been deposited in the AMP-AD Knowledge Portal (Data Citation 1). Table 3 (available online only) provides a detailed description of the files deposited for the four studies, their specific Synapse identifiers (IDs), DOIs, the types of files and definitions of the column headers. These files (Data Citations 2–7) and their assigned DOIs will be maintained in perpetuity in the AMP-AD Knowledge Portal (Data Citation 1). Access to all of these files is enabled through the Sage Bionetworks Synapse repository, and a subset of the files for the Mayo LOAD GWAS (Data Citation 2) and the Mayo eGWAS (Data Citations 3,4) is also available via NIAGADS (www.niagads.org).

The AMP-AD Knowledge Portal hosts data derived from multiple cohorts that were generated as part of, or used in support of, the AMP-AD Target Identification and Preclinical Validation project (Data Citation 1). The portal uses the Synapse software platform for backend support, providing users with both web-based and programmatic access to data files. All data files in the portal are annotated using a standard vocabulary to enable users to search for relevant content across the AMP-AD datasets using programmatic queries. Data is stored in the cloud, hosted by Amazon Web Services (AWS), which enables users to execute cloud-based compute. Detailed descriptions, including data processing, QC metrics, and assay- and cohort-specific variables, are provided for each file as applicable.

Access to the data described herein is controlled in a manner set forth by the institutional review

board (IRB) at the Mayo Clinic. All data use terms include: (1) maintenance of data in a secure and

conﬁdential manner, (2) respect for the privacy of study participants, (3) citation of the data contributors

in any publications resulting from data use, and (4) informing data contributors of resultant publications.

Speciﬁc data use terms are provided for each dataset (Data Citations 3–6) under the header ‘Terms of

use’; users must register for a Synapse account and provide electronic agreement to these terms prior to

accessing the study ﬁles. Access to the Mayo LOAD GWAS data (A) (Data Citation 2) requires a data use

certiﬁcate (doi:10.7303/syn2954402.2). User approvals are managed by the Synapse Access and

Compliance Team (ACT).

Data on the AMP-AD Knowledge Portal are annotated with a common dictionary of terms

(doi:10.7303/syn5478487.2) to enable querying of the data using the Synapse analytical clients (R client:

syn1834618, Python client: syn1768504, command line client: syn2375225). Fields, their allowable values

speciﬁc to the datasets described herein and the dictionary of annotations are shown in Table 3 (available

online only). These annotations can be used to identify ﬁles of interest within the available datasets and to

ﬁlter on any of the ﬁelds using the allowable values from the dictionary (an example is shown here:

doi:10.7303/syn5585666.1).
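Since each dataset is identified by both a DOI and a Synapse ID, a small helper can recover the Synapse ID from a DOI before handing it to one of the Synapse clients. The sketch below uses the DOIs listed under Data Citations; the helper name is hypothetical, and the pattern assumes DOIs of the form 10.7303/synNNN with an optional version suffix, as seen in this article.

```python
import re

# DOIs as listed under Data Citations 1-7 in this article.
DATA_CITATION_DOIS = {
    1: "10.7303/syn2580853",
    2: "10.7303/syn2910256",
    3: "10.7303/syn3157225",
    4: "10.7303/syn3157249",
    5: "10.7303/syn3157268",
    6: "10.7303/syn3163039",
    7: "10.7303/syn5049298",
}


def synapse_id_from_doi(doi):
    """Extract the Synapse ID (e.g. 'syn2910256') from a Synapse DOI string."""
    match = re.search(r"10\.7303/(syn\d+)(?:\.\d+)?$", doi.strip())
    if match is None:
        raise ValueError(f"not a Synapse DOI: {doi!r}")
    return match.group(1)
```

The resulting ID is what a Synapse client would take to look up the corresponding folder or file, subject to the access terms described above.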

References

1. Carrasquillo, M. M. et al. Genetic variation in PCDH11X is associated with susceptibility to late-onset Alzheimer's disease. Nat

Genet 41, 192–198 (2009).

2. Harold, D. et al. Genome-wide association study identiﬁes variants at CLU and PICALM associated with Alzheimer's disease.

Nature genetics 41, 1088–1093 (2009).

3. Lambert, J. C. et al. Genome-wide association study identiﬁes variants at CLU and CR1 associated with Alzheimer's disease.

Nature genetics 41, 1094–1099 (2009).

4. Seshadri, S. et al. Genome-wide analysis of genetic loci associated with Alzheimer disease. Jama 303, 1832–1840 (2010).

5. Naj, A. C. et al. Common variants at MS4A4/MS4A6E, CD2AP, CD33 and EPHA1 are associated with late-onset Alzheimer's

disease. Nature genetics 43, 436–441 (2011).

6. Hollingworth, P. et al. Common variants at ABCA7, MS4A6A/MS4A4E, EPHA1, CD33 and CD2AP are associated with

Alzheimer's disease. Nature genetics 43, 429–435 (2011).

7. Lambert, J. C. et al. Meta-analysis of 74,046 individuals identiﬁes 11 new susceptibility loci for Alzheimer's disease. Nature

genetics 45, 1452–1458 (2013).

8. Hoglinger, G. U. et al. Identiﬁcation of common variants inﬂuencing risk of the tauopathy progressive supranuclear palsy. Nature

genetics 43, 699–705 (2011).

9. Simon-Sanchez, J. et al. Genome-wide association study reveals genetic risk underlying Parkinson's disease. Nature genetics 41,

1308–1312 (2009).

10. Zou, F. et al. Brain expression genome-wide association study (eGWAS) identiﬁes human disease-associated variants. PLoS Genet

8, e1002707 (2012).

11. Dixon, A. L. et al. A genome-wide association study of global gene expression. Nature genetics 39, 1202–1207 (2007).

12. Emilsson, V. et al. Genetics of gene expression and its effect on disease. Nature 452, 423–428 (2008).

13. Saykin, A. J. et al. Genetic studies of quantitative MCI and AD phenotypes in ADNI: Progress, opportunities, and plans.

Alzheimers Dement 11, 792–814 (2015).

14. Zou, F. et al. Gene expression levels as endophenotypes in genome-wide association studies of Alzheimer disease. Neurology 74,

480–486 (2010).

15. Allen, M. et al. Novel late-onset Alzheimer disease loci variants associate with brain gene expression. Neurology 79,

221–228 (2012).

16. Allen, M. et al. Glutathione S-transferase omega genes in Alzheimer and Parkinson disease risk, age-at-diagnosis and brain gene

expression: an association study with mechanistic implications. Mol Neurodegener 7, 13 (2012).

17. Allen, M. et al. Association of MAPT haplotypes with Alzheimer's disease risk and MAPT brain gene expression levels.

Alzheimers Res Ther 6, 39 (2014).

18. Allen, M. et al. Late-onset Alzheimer disease risk variants mark brain regulatory loci. Neurology: Genetics 1, e15 (2015).

19. Myers, A. J. et al. A survey of genetic human cortical gene expression. Nature genetics 39, 1494–1499 (2007).

20. Webster, J. A. et al. Genetic control of human brain transcript expression in Alzheimer disease. Am J Hum Genet 84,

445–458 (2009).

21. Chapuis, J. et al. Increased expression of BIN1 mediates Alzheimer genetic risk by modulating tau pathology. Mol Psychiatry 18,

1225–1234 (2013).

22. Hazrati, L. N. et al. Genetic association of CR1 with Alzheimer's disease: a tentative disease mechanism. Neurobiol Aging 33,

2949 e5–2949 e12 (2012).

23. Ramasamy, A. et al. Genetic variability in the regulation of gene expression in ten regions of the human brain. Nat Neurosci 17,

1418–1428 (2014).

24. Montgomery, S. B. et al. Transcriptome genetics using second generation sequencing in a Caucasian population. Nature 464,

773–777 (2010).

25. Derry, J. M. et al. Developing predictive molecular maps of human disease through community-based modeling. Nature genetics

44, 127–130 (2012).

26. Allen, M. et al. Gene expression, methylation and neuropathology correlations at progressive supranuclear palsy risk loci. Acta

Neuropathol 132, 197–211 (2016).

27. McKhann, G. et al. Clinical diagnosis of Alzheimer's disease: report of the NINCDS-ADRDA Work Group under the auspices of

Department of Health and Human Services Task Force on Alzheimer's Disease. Neurology 34, 939–944 (1984).

28. Farrer, L. A. et al. Effects of age, sex, and ethnicity on the association between apolipoprotein E genotype and Alzheimer disease.

A meta-analysis. APOE and Alzheimer Disease Meta Analysis Consortium. Jama 278, 1349–1356 (1997).

29. Braak, H. & Braak, E. Neuropathological stageing of Alzheimer-related changes. Acta Neuropathol (Berl) 82, 239–259 (1991).

30. Hauw, J. J. et al. Preliminary NINDS neuropathologic criteria for Steele-Richardson-Olszewski syndrome (progressive

supranuclear palsy). Neurology 44, 2015–2019 (1994).

31. Mirra, S. S. et al. Interlaboratory comparison of neuropathology assessments in Alzheimer's disease: a study of the Consortium to

Establish a Registry for Alzheimer's Disease (CERAD). J Neuropathol Exp Neurol 53, 303–315 (1994).

32. Wang, J., Dickson, D. W., Trojanowski, J. Q. & Lee, V. M. The levels of soluble versus insoluble brain Abeta distinguish

Alzheimer's disease from normal and pathologic aging. Exp Neurol 158, 328–337 (1999).

33. Du, P., Kibbe, W. A. & Lin, S. M. lumi: a pipeline for processing Illumina microarray. Bioinformatics 24, 1547–1548 (2008).

34. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81,

559–575 (2007).

35. Magis, A. T., Funk, C. C. & Price, N. D. SNAPR: A Bioinformatics Pipeline for Efﬁcient and Accurate RNA-Seq Alignment and

Analysis. IEEE Life Sciences Letters 1, 22–25 (2015).

36. Funk, C. AMP-AD-scripts: AMP-AD Fl-Mayo-ISB. in Zenodo https://dx.doi.org/10.5281/zenodo.56828 (2016).

Data Citations

1. Synapse http://dx.doi.org/10.7303/syn2580853 (2016).

2. Carrasquillo, M. M. et al. Synapse http://dx.doi.org/10.7303/syn2910256 (2016).

3. Zou, F. et al. Synapse http://dx.doi.org/10.7303/syn3157225 (2016).

4. Zou, F. et al. Synapse http://dx.doi.org/10.7303/syn3157249 (2016).

5. Allen, M. et al. Synapse http://dx.doi.org/10.7303/syn3157268 (2016).

6. Allen, M. et al. Synapse http://dx.doi.org/10.7303/syn3163039 (2016).

7. Allen, M. et al. Synapse http://dx.doi.org/10.7303/syn5049298 (2016).

Acknowledgements

We thank the patients and their families for the sample and tissue donations. Without their generosity,

this research would not be possible. The Mayo Clinic Alzheimer's Disease Genetic Studies were led by

Dr Nilüfer Ertekin-Taner and Dr Steven G. Younkin, Mayo Clinic, Jacksonville, FL using samples from

the Mayo Clinic Study of Aging, the Mayo Clinic Alzheimer's Disease Research Center, and the Mayo

Clinic Brain Bank. Data collection was supported through funding by NIA grants P50 AG016574, R01

AG032990, U01 AG046139, R01 AG018023, U01 AG006576, U01 AG006786, R01 AG025711, R01

AG017216, R01 AG003949, NINDS grant R01 NS080820, the GHR foundation, CurePSP Foundation,

and support from Mayo Foundation. Samples were also collected through the Sun Health Research Institute Brain

and Body Donation Program of Sun City, Arizona. The Brain and Body Donation Program is supported

by the National Institute of Neurological Disorders and Stroke (U24 NS072026 National Brain and Tissue

Resource for Parkinson’s Disease and Related Disorders), the National Institute on Aging (P30 AG19610

Arizona Alzheimer’s Disease Core Center), the Arizona Department of Health Services (contract 211002,

Arizona Alzheimer’s Research Center), the Arizona Biomedical Research Commission (contracts 4001,

0011, 05-901 and 1001 to the Arizona Parkinson's Disease Consortium) and the Michael J. Fox

Foundation for Parkinson’s Research. We thank Mrs. Kelly Viola for her assistance with revisions of this

manuscript.

Author Contributions

M.A. helped draft the manuscript, analyzed data, contributed to the Mayo eGWAS and oversaw the Mayo Pilot RNAseq and Mayo RNAseq studies; M.M.C. helped draft the manuscript, analyzed data, co-led the Mayo LOAD GWAS, and oversaw the Mayo Pilot RNAseq and Mayo RNAseq studies; C.F. analyzed data for Mayo Pilot RNAseq and Mayo RNAseq; B.D.H. analyzed data for Mayo Pilot RNAseq and Mayo RNAseq; F.Z. analyzed data and oversaw the Mayo eGWAS; C.S.Y. analyzed and databased data for all studies; J.D.B. analyzed data for Mayo eGWAS, Mayo Pilot RNAseq and Mayo RNAseq; H.-S.C. analyzed data for Mayo eGWAS; J.C. provided statistical support; J.A.E. analyzed data for Mayo Pilot RNAseq and Mayo RNAseq; H.L. analyzed data for Mayo Pilot RNAseq and Mayo RNAseq; B.L. architected the data repository, deposited these data into the public portal and managed data dissemination; M.A.P. architected the data repository, deposited these data into the public portal and managed data dissemination; K.K.D. architected the data repository, deposited these data into the public portal and managed data dissemination; X.W. analyzed data for Mayo Pilot RNAseq and Mayo RNAseq; D.S. analyzed data for Mayo eGWAS, Mayo Pilot RNAseq and Mayo RNAseq; C.W. analyzed data for Mayo eGWAS; T.N. generated data; S.L. generated data; K.M. generated data; G.B. generated data; M.L. generated data; T.E.G. provided comments for the manuscript; L.M.M. architected the data repository, deposited these data into the public portal and managed data dissemination; Y.A. analyzed data for Mayo Pilot RNAseq and Mayo RNAseq; N.P. oversaw bioinformatics analysis of Mayo Pilot RNAseq and Mayo RNAseq; R.C.P. provided patient material and data; N.R.G.-R. provided patient material and data; D.W.D. provided patient material and data; S.G.Y. analyzed data, designed and led the Mayo GWAS, and wrote the manuscript; N.E.-T. analyzed data, designed and led the Mayo eGWAS, Mayo Pilot RNAseq and Mayo RNAseq studies and wrote the manuscript.

Additional information

Table 3 is only available in the online version of this paper.

Competing ﬁnancial interests: Below are the disclosures for R.C.P.: Pﬁzer, Inc., and Janssen Alzheimer

Immunotherapy: Chair, Data Monitoring Committee. Hoffman-La Roche, Inc.: Consultant. Merck, Inc.:

Consultant. Genentech, Inc.: Consultant. Biogen, Inc.: Consultant. Eli Lilly & Co.: Consultant. N.R.G.-R.

has multicenter treatment study grants from Lilly and TauRx and consulted for Cytox. N.E.-T. has

consulted for Cytox. The remaining authors declare no competing ﬁnancial interests.

How to cite this article: Allen, M et al. Human whole genome genotype and transcriptome data for

Alzheimer's and other neurodegenerative diseases. Sci. Data 3:160089 doi: 10.1038/sdata.2016.89 (2016).

This work is licensed under a Creative Commons Attribution 4.0 International License. The

images or other third party material in this article are included in the article’s Creative

Commons license, unless indicated otherwise in the credit line; if the material is not included under the

Creative Commons license, users will need to obtain permission from the license holder to reproduce the

material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0

Metadata associated with this Data Descriptor is available at http://www.nature.com/sdata/ and is released

under the CC0 waiver to maximize reuse.

Supplementary Material

Data

October 2016

Mariet Allen · Minerva M Carrasquillo · Cory C Funk · Ben Heavner · Nilüfer Ertekin-Taner

The impact of astrocytic NF-κB on healthy and Alzheimer’s disease brains

Article

Full-text available

Jun 2024

Astrocytes play a role in healthy cognitive function and Alzheimer’s disease (AD). The transcriptional factor nuclear factor-κB (NF-κB) drives astrocyte diversity, but the mechanisms are not fully understood. By combining studies in human brains and animal models and selectively manipulating NF-κB function in astrocytes, we deepened the understanding of the role of astrocytic NF-κB in brain health and AD. In silico analysis of bulk and cell-specific transcriptomic data revealed the association of NF-κB and astrocytes in AD. Confocal studies validated the higher level of p50 NF-κB and phosphorylated-p65 NF-κB in glial fibrillary acidic protein (GFAP)⁺-astrocytes in AD versus non-AD subjects. In the healthy mouse brain, chronic activation of astrocytic NF-κB disturbed the proteomic milieu, causing a loss of mitochondrial-associated proteins and the rise of inflammatory-related proteins. Sustained NF-κB signaling also led to microglial reactivity, production of pro-inflammatory mediators, and buildup of senescence-related protein p16INK4A in neurons. However, in an AD mouse model, NF-κB inhibition accelerated β-amyloid and tau accumulation. Molecular biology studies revealed that astrocytic NF-κB activation drives the increase in GFAP and inflammatory proteins and aquaporin-4, a glymphatic system protein that assists in mitigating AD. Our investigation uncovered fundamental mechanisms by which NF-κB enables astrocytes' neuroprotective and neurotoxic responses in the brain.

Activation of the muscle-to-brain axis ameliorates neurocognitive deficits in an Alzheimer disease mouse model via enhancing neurotrophic and synaptic signaling

Preprint

Full-text available

Jun 2024

INTRODUCTION Skeletal muscle regulates central nervous system (CNS) function and health, activating the muscle-to-brain axis through the secretion of skeletal muscle originating factors (‘myokines’) with neuroprotective properties. However, the precise mechanisms underlying these benefits in the context of Alzheimer’s disease (AD) remain poorly understood. METHODS To investigate muscle-to-brain axis signaling in response to amyloid β (Aβ)- induced toxicity, we generated 5xFAD transgenic female mice with enhanced skeletal muscle function (5xFAD;cTFEB;HSACre) at prodromal (4-months old) and late (8-months old) symptomatic stages. RESULTS Skeletal muscle TFEB overexpression reduced Aβ plaque accumulation in the cortex and hippocampus at both ages and rescued behavioral neurocognitive deficits in 8- months-old 5xFAD mice. These changes were associated with transcriptional and protein remodeling of neurotrophic signaling and synaptic integrity, partially due to the CNS-targeting myokine prosaposin (PSAP). DISCUSSION Our findings implicate the muscle-to-brain axis as a novel neuroprotective pathway against amyloid pathogenesis in AD.

Alteration of gene expression and protein solubility of the PI 5-phosphatase SHIP2 are correlated with Alzheimer’s disease pathology progression

Article

Full-text available

Jun 2024
ACTA NEUROPATHOL

A recent large genome-wide association study has identified EGFR (encoding the epidermal growth factor EGFR) as a new genetic risk factor for late-onset AD. SHIP2, encoded by INPPL1, is taking part in the signalling and interactome of several growth factor receptors, such as the EGFR. While INPPL1 has been identified as one of the most significant genes whose RNA expression correlates with cognitive decline, the potential alteration of SHIP2 expression and localization during the progression of AD remains largely unknown. Here we report that gene expression of both EGFR and INPPL1 was upregulated in AD brains. SHIP2 immunoreactivity was predominantly detected in plaque-associated astrocytes and dystrophic neurites and its increase was correlated with amyloid load in the brain of human AD and of 5xFAD transgenic mouse model of AD. While mRNA of INPPL1 was increased in AD, SHIP2 protein undergoes a significant solubility change being depleted from the soluble fraction of AD brain homogenates and co-enriched with EGFR in the insoluble fraction. Using FRET-based flow cytometry biosensor assay for tau-tau interaction, overexpression of SHIP2 significantly increased the FRET signal while siRNA-mediated downexpression of SHIP2 significantly decreased FRET signal. Genetic association analyses suggest that some variants in INPPL1 locus are associated with the level of CSF pTau. Our data support the hypothesis that SHIP2 is an intermediate key player of EGFR and AD pathology linking amyloid and tau pathologies in human AD. Supplementary Information The online version contains supplementary material available at 10.1007/s00401-024-02745-7.

A common Alu insertion in the 3'UTR of TMEM106B is associated with risk of dementia

Article

Full-text available

Jun 2024
ALZHEIMERS DEMENT

INTRODUCTION Sequence variants in TMEM106B have been associated with an increased risk of developing dementia. METHODS As part of our efforts to generate a set of mouse lines in which we replaced the mouse Tmem106b gene with a human TMEM106B gene comprised of either a risk or protective haplotype, we conducted an in‐depth sequence analysis of these alleles. We also analyzed transcribed TMEM106B sequences using RNA‐seq data (AD Knowledge portal) and full genome sequences (1000 Genomes). RESULTS We identified an AluYb8 insertion in the 3' untranslated region (3'UTR) of the TMEM106B risk haplotype. We found this AluYb8 insertion in every risk haplotype analyzed, but not in either protective haplotypes or in non‐human primates. DISCUSSION We conclude that this risk haplotype arose early in human development with a single Alu‐insertion event within a unique haplotype context. This AluYb8 element may act as a functional variant in conferring an increased risk of developing dementia. Highlights We conducted an in‐depth sequence analysis of (1) a risk and (2) a protective haplotype of the human TMEM106B gene. We also analyzed transcribed TMEM106B sequences using RNA‐seq data (AD Knowledge Portal) and full genome sequences (1000 Genomes). We identified an AluYb8 insertion in the 3' untranslated region (3'UTR) of the TMEM106B risk haplotype. We found this AluYb8 insertion in every risk haplotype analyzed, but not in either protective haplotypes or in non‐human primates. This AluYb8 element may act as a functional variant in conferring an increased risk of developing dementia.

A Personalized Metabolic Modelling Approach through Integrated Analysis of RNA-Seq-Based Genomic Variants and Gene Expression Levels in Alzheimer's Disease

Preprint

Apr 2024

Motivation: Alzheimer's disease (AD) is known to cause alterations in brain metabolism. Furthermore, genomic variants in enzyme-coding genes may exacerbate AD-linked metabolic changes. Generating condition-specific metabolic models by mapping gene expression data to genome-scale metabolic models is a routine approach to elucidate disease mechanisms from a metabolic perspective. RNAseq data provides both gene expression and genomic variation information. Integrating variants that perturb enzyme functionality from the same RNAseq data may enhance model accuracy, offering insights into genome-wide AD metabolic pathology. Results: Our study pioneers the extraction of both transcriptomic and genomic data from the same RNA-seq data to reconstruct personalized metabolic models. We mapped genes with significantly higher load of pathogenic variants in AD onto a human genome-scale metabolic network together with the gene expression data. Comparative analysis of the resulting personalized patient metabolic models with the control models showed enhanced accuracy in detecting AD-associated metabolic pathways compared to the case where only expression data was mapped on the metabolic network. Besides, several otherwise would-be missed pathways were annotated in AD by considering the effect of genomic variants.

Metabolomics profiling reveals distinct, sex‐specific signatures in serum and brain metabolomes in mouse models of Alzheimer's disease

Article

Full-text available

Apr 2024
ALZHEIMERS DEMENT

INTRODUCTION Increasing evidence suggests that metabolic impairments contribute to early Alzheimer's disease (AD) mechanisms and subsequent dementia. Signals in metabolic pathways conserved across species can facilitate translation. METHODS We investigated differences in serum and brain metabolites between the early‐onset 5XFAD and late‐onset LOAD1 (APOE4.Trem2*R47H) mouse models of AD to C57BL/6J controls at 6 months of age. RESULTS We identified sex differences for several classes of metabolites, such as glycerophospholipids, sphingolipids, and amino acids. Metabolic signatures were notably different between brain and serum in both mouse models. The 5XFAD mice exhibited stronger differences in brain metabolites, whereas LOAD1 mice showed more pronounced differences in serum. DISCUSSION Several of our findings were consistent with results in humans, showing glycerophospholipids reduction in serum of apolipoprotein E (apoE) ε4 carriers and replicating the serum metabolic imprint of the APOE ε4 genotype. Our work thus represents a significant step toward translating metabolic dysregulation from model organisms to human AD. Highlights This was a metabolomic assessment of two mouse models relevant to Alzheimer's disease. Mouse models exhibit broad sex‐specific metabolic differences, similar to human study cohorts. The early‐onset 5XFAD mouse model primarily alters brain metabolites while the late‐onset LOAD1 model primarily changes serum metabolites. Apolipoprotein E (apoE) ε4 mice recapitulate glycerophospolipid signatures of human APOE ε4 carriers in both brain and serum.

A TrkB and TrkC partial agonist restores deficits in synaptic function and promotes activity‐dependent synaptic and microglial transcriptomic changes in a late‐stage Alzheimer's mouse model

Article

Full-text available

May 2024
ALZHEIMERS DEMENT

INTRODUCTION Tropomyosin related kinase B (TrkB) and C (TrkC) receptor signaling promotes synaptic plasticity and interacts with pathways affected by amyloid beta (Aβ) toxicity. Upregulating TrkB/C signaling could reduce Alzheimer's disease (AD)‐related degenerative signaling, memory loss, and synaptic dysfunction. METHODS PTX‐BD10‐2 (BD10‐2), a small molecule TrkB/C receptor partial agonist, was orally administered to aged London/Swedish‐APP mutant mice (APPL/S) and wild‐type controls. Effects on memory and hippocampal long‐term potentiation (LTP) were assessed using electrophysiology, behavioral studies, immunoblotting, immunofluorescence staining, and RNA sequencing. RESULTS In APPL/S mice, BD10‐2 treatment improved memory and LTP deficits. This was accompanied by normalized phosphorylation of protein kinase B (Akt), calcium‐calmodulin–dependent kinase II (CaMKII), and AMPA‐type glutamate receptors containing the subunit GluA1; enhanced activity‐dependent recruitment of synaptic proteins; and increased excitatory synapse number. BD10‐2 also had potentially favorable effects on LTP‐dependent complement pathway and synaptic gene transcription. DISCUSSION BD10‐2 prevented APPL/S/Aβ‐associated memory and LTP deficits, reduced abnormalities in synapse‐related signaling and activity‐dependent transcription of synaptic genes, and bolstered transcriptional changes associated with microglial immune response. Highlights Small molecule modulation of tropomyosin related kinase B (TrkB) and C (TrkC) restores long‐term potentiation (LTP) and behavior in an Alzheimer's disease (AD) model. Modulation of TrkB and TrkC regulates synaptic activity‐dependent transcription. TrkB and TrkC receptors are candidate targets for translational therapeutics. Electrophysiology combined with transcriptomics elucidates synaptic restoration. LTP identifies neuron and microglia AD‐relevant human‐mouse co‐expression modules.

mosGraphGen: a novel tool to generate multi-omic signaling graphs to facilitate integrative and interpretable graph AI model development

Preprint

May 2024

Multi-omic data (genomics, epigenomics, transcriptomics, and proteomics) characterize complex cellular signaling systems at multiple levels and from multiple views, and provide a holistic view of complex cellular signaling pathways. However, it remains challenging to integrate and interpret multi-omics data. Graph neural network (GNN) AI models have been widely used to analyze graph-structured datasets and are ideal for integrative multi-omics data analysis because they can naturally integrate and represent multi-omics data as a biologically meaningful multi-level signaling graph and interpret multi-omics data by node and edge ranking analysis for signaling flow/cascade inference. However, it is non-trivial for graph-AI model developers to pre-analyze multi-omics data and convert them into graph-structured data for individual samples that can be fed directly into graph-AI models. To resolve this challenge, we developed mosGraphGen (multi-omics signaling graph generator), a novel computational tool that generates multi-omics signaling graphs of individual samples by mapping the multi-omics data onto a biologically meaningful multi-level background signaling network. With mosGraphGen, AI model developers can directly apply and evaluate their models using these mos-graphs. We evaluated mosGraphGen using multi-omics datasets of both cancer and Alzheimer’s disease (AD) samples. The code of mosGraphGen is open-source and publicly available via GitHub: https://github.com/Multi-OmicGraphBuilder/mosGraphGen

Selenotranscriptome network in Alzheimer’s disease

Article

May 2024

The interplay between selenoproteins, oxidative stress, and cell death pathways holds promise in unravelling novel therapeutic targets for Alzheimer’s disease (AD) in the future. Nonetheless, further comprehensive investigations are warranted to fully comprehend the precise contributions of selenoproteins in the aetiology and potential therapeutic strategies for Alzheimer’s disease. Previous work into gene expression networks in AD has included analysis of the entire transcriptome and, as of yet, has not yielded consistent insight into pathological pathways. Despite the comprehensive assessment of the transcriptome enabled by current technologies, one drawback of the whole transcriptome analysis is the risk of overlooking subtle yet significant variations in metabolic pathways. Thus, we aimed to assess gene expression of known selenoprotein and selenium-containing pathways in two different brain regions (dorsolateral prefrontal cortex (DPC) and posterior cingulate cortex (PCC)) across the AD spectrum. We used RNA sequencing data from The Rush University’s Religious Orders Study and Memory and Aging Project (ROSMAP) cohort available in the AD Knowledge Portal (https://www.synapse.org/). This study included data available for a total of 889 DPC and 647 PCC samples. Four pathological phenotypes were determined based on pathology (CERAD) and clinical (CDR) status: AD [(+) pathology, (+) clinical]; prodromal disease, corresponding to donors that have not received a clinical diagnosis despite the presence of pathological alterations [(+) pathology, (−) clinical]; controls [(−) pathology, (−) clinical]; and non-AD dementia [(−) pathology, (+) clinical]. This last group was excluded from the analysis as it is assumed they may have been misdiagnosed or presented with non-AD dementia. Six selenium or AD-related pathways were assessed, accounting for 421 unique genes.
Group comparisons were performed using linear mixed modelling adjusted for age, sex, APOEe4 status and batch via DESeq2 package with Benjamini-Hochberg adjustment for multiple testing. A total of 18 genes significantly differed between AD and controls in both brain areas (same direction in both brain areas; P < 0.05), including eight selenoprotein genes or genes directly associated with selenoprotein synthesis. Fifteen of them were also different (same direction) in PCC (seven selenoprotein/selenoprotein synthesis genes), and four were different in DPC (four selenoprotein/selenoprotein synthesis genes) between AD and prodromal. Only three genes significantly differed between prodromal and control samples (DPC), including the selenoprotein DIO3 and the transcription factor SP3. Our findings indicate a progressive change in gene expression across the different stages of AD. These findings shed light on critical genes involved in selenoprotein synthesis that play a role in AD pathogenesis. Restricting the analysis to a subset of pathways enabled the detection of smaller alterations between groups, which is particularly appropriate in trace element homeostasis, where small alterations may have significant downstream effects.

In vivo validation of late-onset Alzheimer's disease genetic risk factors

Article

Apr 2024
ALZHEIMERS DEMENT

INTRODUCTION: Genome‐wide association studies have identified over 70 genetic loci associated with late‐onset Alzheimer's disease (LOAD), but few candidate polymorphisms have been functionally assessed for disease relevance and mechanism of action. METHODS: Candidate genetic risk variants were informatically prioritized and individually engineered into a LOAD‐sensitized mouse model that carries the AD risk variants APOE ε4/ε4 and Trem2*R47H. The potential disease relevance of each model was assessed by comparing brain transcriptomes measured with the Nanostring Mouse AD Panel at 4 and 12 months of age with human study cohorts. RESULTS: We created new models for 11 coding and loss‐of‐function risk variants. Transcriptomic effects from multiple genetic variants recapitulated a variety of human gene expression patterns observed in LOAD study cohorts. Specific models matched to emerging molecular LOAD subtypes. DISCUSSION: These results provide an initial functionalization of 11 candidate risk variants and identify potential preclinical models for testing targeted therapeutics. Highlights: A novel approach to validate genetic risk factors for late‐onset AD (LOAD) is presented. LOAD risk variants were knocked in to conserved mouse loci. Variant effects were assayed by transcriptional analysis. Risk variants in Abca7, Mthfr, Plcg2, and Sorl1 loci modeled molecular signatures of clinical disease. This approach should generate more translationally relevant animal models.

Gene expression, methylation and neuropathology correlations at progressive supranuclear palsy risk loci

Article

Aug 2016
ACTA NEUROPATHOL

To determine the effects of single nucleotide polymorphisms (SNPs) identified in a genome-wide association study of progressive supranuclear palsy (PSP), we tested their association with brain gene expression, CpG methylation and neuropathology. In 175 autopsied PSP subjects, we performed associations between seven PSP risk variants and temporal cortex levels of 20 genes in-cis, within ±100 kb. Methylation measures were collected using reduced representation bisulfite sequencing in 43 PSP brains. To determine whether SNP/expression associations are due to epigenetic modifications, CpG methylation levels of associated genes were tested against relevant variants. Quantitative neuropathology endophenotypes were tested for SNP associations in 422 PSP subjects. Brain levels of LRRC37A4 and ARL17B were associated with rs8070723; MOBP with rs1768208 and both ARL17A and ARL17B with rs242557. Expression associations for LRRC37A4 and MOBP were available in an additional 100 PSP subjects. Meta-analysis revealed highly significant associations for PSP risk alleles of rs8070723 and rs1768208 with higher LRRC37A4 and MOBP brain levels, respectively. Methylation levels of one CpG in the 3' region of ARL17B associated with rs242557 and rs8070723. Additionally, methylation levels of an intronic ARL17A CpG associated with rs242557 and that of an intronic MOBP CpG with rs1768208. MAPT and MOBP region risk alleles also associated with higher levels of neuropathology. Strongest associations were observed for rs242557/coiled bodies and tufted astrocytes; and for rs1768208/coiled bodies and tau threads. These findings suggest that PSP variants at MAPT and MOBP loci may confer PSP risk via influencing gene expression and tau neuropathology. MOBP, LRRC37A4, ARL17A and ARL17B warrant further assessment as candidate PSP risk genes. Our findings have implications for the mechanism of action of variants at some of the top PSP risk loci.

Late-onset Alzheimer disease risk variants mark brain regulatory loci

Article

Aug 2015

Objective: To investigate the top late-onset Alzheimer disease (LOAD) risk loci detected or confirmed by the International Genomics of Alzheimer's Project for association with brain gene expression levels to identify variants that influence Alzheimer disease (AD) risk through gene expression regulation. Methods: Expression levels from the cerebellum (CER) and temporal cortex (TCX) were obtained using Illumina whole-genome cDNA-mediated annealing, selection, extension, and ligation assay (WG-DASL) for ∼400 autopsied patients (∼200 with AD and ∼200 with non-AD pathologies). We tested 12 significant LOAD genome-wide association study (GWAS) index single nucleotide polymorphisms (SNPs) for cis association with levels of 34 genes within ±100 kb. We also evaluated brain levels of 14 LOAD GWAS candidate genes for association with 1,899 cis-SNPs. Significant associations were validated in a subset of TCX samples using next-generation RNA sequencing (RNAseq). Results: We identified strong associations of brain CR1, HLA-DRB1, and PILRB levels with LOAD GWAS index SNPs. We also detected other strong cis-SNPs for LOAD candidate genes MEF2C, ZCWPW1, and SLC24A4. MEF2C and SLC24A4, but not ZCWPW1 cis-SNPs, also associate with LOAD risk, independent of the index SNPs. The TCX expression associations could be validated with RNAseq for CR1, HLA-DRB1, ZCWPW1, and SLC24A4. Conclusions: Our results suggest that some LOAD GWAS variants mark brain regulatory loci, nominate genes under regulation by LOAD risk variants, and annotate these variants for their brain regulatory effects.

SNAPR: A Bioinformatics Pipeline for Efficient and Accurate RNA-Seq Alignment and Analysis

Article

Aug 2015

The process of converting raw RNA sequencing (RNA-seq) data to interpretable results can be circuitous and time-consuming, requiring multiple steps. We present an RNA-seq mapping algorithm that streamlines this process. Our algorithm utilizes a hash table approach to leverage the availability and the power of high memory machines. SNAPR, which can be run on a single library or thousands of libraries, can take compressed or uncompressed FASTQ and BAM files, and output a sorted BAM file, individual read counts, and gene fusions, and can identify exogenous RNA species in a single step. SNAPR also does native Phred score filtering of reads. SNAPR is also well suited for future sequencing platforms that generate longer reads. We show how we can analyze data from hundreds of TCGA samples in a matter of hours while identifying gene fusions and viral events at the same time. With the reference genome and transcriptome undergoing periodic updates and the need for uniform parameters when integrating multiple data sets, there is great need for a streamlined process for RNA-seq analysis. We demonstrate how SNAPR does this efficiently and accurately.

Genetic Studies of Quantitative MCI and AD Phenotypes in ADNI: Progress, Opportunities, and Plans

Article

Jul 2015
ALZHEIMERS DEMENT

Genetic data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) have been crucial in advancing the understanding of Alzheimer's disease (AD) pathophysiology. Here, we provide an update on sample collection, scientific progress and opportunities, conceptual issues, and future plans. Lymphoblastoid cell lines and DNA and RNA samples from blood have been collected and banked, and data and biosamples have been widely disseminated. To date, APOE genotyping, genome-wide association study (GWAS), and whole exome and whole genome sequencing data have been obtained and disseminated. ADNI genetic data have been downloaded thousands of times, and >300 publications have resulted, including reports of large-scale GWAS by consortia to which ADNI contributed. Many of the first applications of quantitative endophenotype association studies used ADNI data, including some of the earliest GWAS and pathway-based studies of biospecimen and imaging biomarkers, as well as memory and other clinical/cognitive variables. Other contributions include some of the first whole exome and whole genome sequencing data sets and reports in healthy controls, mild cognitive impairment, and AD. Numerous genetic susceptibility and protective markers for AD and disease biomarkers have been identified and replicated using ADNI data and have heavily implicated immune, mitochondrial, cell cycle/fate, and other biological processes. Early sequencing studies suggest that rare and structural variants are likely to account for significant additional phenotypic variation. Longitudinal analyses of transcriptomic, proteomic, metabolomic, and epigenomic changes will also further elucidate dynamic processes underlying preclinical and prodromal stages of disease. 
Integration of this unique collection of multiomics data within a systems biology framework will help to separate truly informative markers of early disease mechanisms and potential novel therapeutic targets from the vast background of less relevant biological processes. Fortunately, a broad swath of the scientific community has accepted this grand challenge.

Developing Predictive Molecular Maps of Human Disease through Community-based Modeling

Article

Apr 2011

Erratum: Interlaboratory comparison of neuropathology assessments in Alzheimer's disease: A study of the consortium to establish a registry for Alzheimer's disease (CERAD) (Journal of Neuropathology and Experimental Neurology (May 1994) 53:3 (303-15))

Article

Jan 1994

Alzheimer's Disease

Article

Dec 2007

Neuropathological stageing of Alzheimer-related neurofibrillary changes

Article

Jan 1991

Eighty-three brains obtained at autopsy from nondemented and demented individuals were examined for extracellular amyloid deposits and intraneuronal neurofibrillary changes. The distribution pattern and packing density of amyloid deposits turned out to be of limited significance for differentiation of neuropathological stages. Neurofibrillary changes occurred in the form of neuritic plaques, neurofibrillary tangles and neuropil threads. The distribution of neuritic plaques varied widely not only within architectonic units but also from one individual to another. Neurofibrillary tangles and neuropil threads, in contrast, exhibited a characteristic distribution pattern permitting the differentiation of six stages. The first two stages were characterized by an either mild or severe alteration of the transentorhinal layer Pre-alpha (transentorhinal stages I-II). The two forms of limbic stages (stages III-IV) were marked by a conspicuous affection of layer Pre-alpha in both transentorhinal region and proper entorhinal cortex. In addition, there was mild involvement of the first Ammon's horn sector. The hallmark of the two isocortical stages (stages V-VI) was the destruction of virtually all isocortical association areas. The investigation showed that recognition of the six stages required qualitative evaluation of only a few key preparations.

Meta-analysis in more than 74,000 individuals identifies 11 new susceptibility loci for Alzheimer's disease

Article

Jul 2013

Jean-Charles Lambert

Genetic variability in the regulation of gene expression in ten regions of the human brain

Article

Aug 2014

Germ-line genetic control of gene expression occurs via expression quantitative trait loci (eQTLs). We present a large, exon-specific eQTL data set covering ten human brain regions. We found that cis-eQTL signals (within 1 Mb of their target gene) were numerous, and many acted heterogeneously among regions and exons. Co-regulation analysis of shared eQTL signals produced well-defined modules of region-specific co-regulated genes, in contrast to standard coexpression analysis of the same samples. We report cis-eQTL signals for 23.1% of catalogued genome-wide association study hits for adult-onset neurological disorders. The data set is publicly available via public data repositories and via http://www.braineac.org/. Our study increases our understanding of the regulation of gene expression in the human brain and will be of value to others pursuing functional follow-up of disease-associated variants.
