“Pscore”: A Novel Percentile-Based Metric to Accurately Assess Individual Deviations in Non-Gaussian Distributions of Quantitative MRI Metrics

January 2024
Journal of Magnetic Resonance Imaging

DOI:10.1002/jmri.29248

License
CC BY-NC-ND 4.0

Authors:

Rakibul Hafiz

National Institutes of Health

Mustafa Okan Irfanoglu

National Institutes of Health

Amritha Nayak

National Institutes of Health

Carlo Pierpaoli

National Institutes of Health

RESEARCH ARTICLE

“Pscore”: A Novel Percentile-Based Metric to Accurately Assess Individual Deviations in Non-Gaussian Distributions of Quantitative MRI Metrics

Rakibul Hafiz, PhD,* M. Okan Irfanoglu, PhD, Amritha Nayak, ME,1,2,3 and Carlo Pierpaoli, MD, PhD

Background: Quantitative magnetic resonance imaging (MRI) metrics could be used in personalized medicine to assess individuals against normative distributions. Conventional Zscore analysis is inadequate in the presence of non-Gaussian distributions. Therefore, if quantitative MRI metrics deviate from normality, an alternative is needed.

Purpose: To confirm non-Gaussianity of diffusion MRI (dMRI) metrics on a publicly available dataset, and to propose a novel percentile-based method, “Pscore,” to address this issue.

Study Type: Retrospective cohort.

Population: Nine hundred and sixty-one healthy young adults (age: 22–35 years, females: 53%) from the Human Connectome Project.

Field Strength/Sequence: 3-T, spin-echo diffusion echo-planar imaging, T1-weighted: MPRAGE.

Assessment: The dMRI data were preprocessed using the TORTOISE pipeline. Forty-eight regions of interest (ROIs) from the JHU atlas were redrawn on a study-specific diffusion tensor (DT) template and average values were computed from various DT and mean apparent propagator (MAP) metrics. For each ROI, percentile ranks across participants were computed to generate “Pscores,” which normalized the difference between the median and a participant’s value with the corresponding difference between the median and the 5th/95th percentile values.

Statistical Tests: ROI-wise distributions were assessed using log transformations, Zscore, and the “Pscore” methods. The percentages of extreme values above the 95th and below the 5th percentile boundaries (PEV>95 (%), PEV<5 (%)) were also assessed in the overall white matter. Bootstrapping was performed to test the reliability of Pscores in small samples (N = 100) using 100 iterations.

Results: The dMRI metric distributions were systematically non-Gaussian, including positively skewed (eg, mean and radial diffusivity) and negatively skewed (eg, fractional and propagator anisotropy) metrics. This resulted in unbalanced tails in Zscore distributions (PEV>95 ≠ 5%, PEV<5 ≠ 5%), whereas “Pscore” distributions were symmetric and balanced (PEV>95 = PEV<5 = 5%), even for small bootstrapped samples (average PEV>95 = PEV<5 = 5 ± 0% [SD]).

Data Conclusion: The inherent skewness observed for dMRI metrics may preclude the use of conventional Zscore analysis. The proposed “Pscore” method may help estimate individual deviations more accurately in skewed normative data, even from small datasets.

Level of Evidence: 1

Technical Efficacy: Stage 1

J. MAGN. RESON. IMAGING 2024.

View this article online at wileyonlinelibrary.com. DOI: 10.1002/jmri.29248

Received Dec 6, 2023, Accepted for publication Jan 9, 2024.

*Address reprint requests to: R.H., Building 13, Room 3W43, Bethesda, MD 20892, USA. E-mail: rakibul.hafiz@nih.gov

From the Laboratory on Quantitative Medical Imaging, National Institute of Biomedical Imaging and Bioengineering, Bethesda, Maryland, USA; Military Traumatic Brain Injury Initiative (MTBI2, formerly known as the Center for Neuroscience and Regenerative Medicine [CNRM]), Bethesda, Maryland, USA; and The Henry Jackson Foundation for the Advancement of Military Medicine, Bethesda, Maryland, USA.

Additional supporting information may be found in the online version of this article.

This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.

Neuroimaging studies in clinical research typically rely on group-level analyses, delineating summary outcomes that differentiate a patient cohort from a group of healthy controls. However, in clinical practice, there is a need to assess individual patients. This is typically done by building a model to evaluate the individual subject against a normative sample.¹⁻³ Quantitative magnetic resonance imaging (MRI) metrics can provide normative data against which individual deviations can be assessed. Therefore, once their accuracy and reliability are verified, quantitative MRI metrics could become highly relevant for clinical assessment of individual patients.

Individual assessments against a normative distribution are often performed using Zscores in clinical and neuroimaging paradigms.²⁻⁴ However, a thorough assessment of the distribution that generates the normative dataset itself is rarely performed. For example, Gaussianity and homogeneity of variance, which are fundamental assumptions of parametric models, are seldom checked. Deviation from these assumptions can result in biased statistical inferences, eg, in neuropsychological test scores. Quantile regressions have been adopted to mitigate this by generating normative distributions at specific percentiles. These can offer a solution based on distributions at selected percentile/quantile ranks and might avoid the use of conventional mean regression models that would be biased for asymmetric distributions.

The problem is pervasive even for quantitative MRI metrics. In a pilot study (N = 48), we recently showed prominent deviation from normality and heavy-tailed distributions for several diffusion tensor (DT) and mean apparent propagator (MAP) metrics.⁷⁻¹⁰ For example, mean diffusivity (MD) had a positively skewed distribution, while fractional anisotropy (FA) and propagator anisotropy (PA) showed negatively skewed distributions. In our small sample, these non-Gaussian features were inherently related to the diffusion characteristics of water in the brain, and did not originate from heterogeneity in the underlying population, as has been previously reported in large-scale public datasets.¹,³ For example, for the UK Biobank data, Fraza et al fit a warped Gaussian model to compensate for the skewness and kurtosis in the normative data before using Zscores to assess heterogeneous individuals.¹,¹¹ Another approach popularly used to address skewness is log transformation, but it can worsen skewness and lead to inaccurate and biased inferences.¹²,¹³ Our initial assessment showed that even when comparing individuals within the normative sample, Zscores may show an imbalance in extreme values at the tails: more negative extreme values for negatively skewed distributions and more positive extreme values for positively skewed distributions. Therefore, comparing patients against such distributions using Zscores might wrongly place some of them in the extremities of the distribution and could introduce false-positive findings.

In this study, we aimed to test whether the non-Gaussianity in diffusion MRI (dMRI) metrics can be reproduced in a large sample comprising a homogeneous group of healthy young participants; to assess whether a percentile-based “Pscore” could accurately estimate an individual’s deviation from the central tendencies of a normative distribution; and to compare its accuracy against two popular normalization methods: “Zscore” and “Log” transformation.

Materials and Methods

Participants

We used the Human Connectome Project Young Adult (HCP-YA) cohort from the S1200 series, which has at least 1113 participants with 3-T MRI data available from a pool of 1200 young adults (age range: 22–35 years), all of whom were recruited with written informed consent following the guidelines of the institutional review board (IRB). The data include structural MRI, functional MRI (fMRI), and dMRI data along with behavioral and genetic testing. For this study, our primary interest was the dMRI data. About 100 (out of 1200) participants either did not have dMRI data or had incomplete parameter information on the dMRI acquisition. From the remaining pool of 1100 subjects, 129 participants were excluded because, based on visual inspection, the distortions in their subject-space dMRI data were beyond the scope of correction. Ten of the remaining 971 participants were “36+” years of age, and they were excluded to keep the sample more homogeneous, within the 22–35 years range. Thus, the effective sample included neuroimaging data from 961 participants (53% females). All participants were scanned on the same equipment, using the same protocol.

Diffusion MRI Protocol

The Connectome 3 T Skyra scanner (Siemens Healthineers, Erlangen, Germany) was used to acquire dMRI data across six runs in a full session. Each run was approximately 9 minutes and 50 seconds long, with three different gradient tables, and each table was acquired with two phase-encoding directions: right-to-left and left-to-right. Each gradient table consisted of six b = 0 s/mm² acquisitions and approximately 90 diffusion weighting directions. The diffusion weighting was done across three b-shells (1000, 2000, and 3000 s/mm²), each with an equal number of acquisitions per run. The imaging parameters were as follows: spin-echo echo-planar imaging (EPI) sequence with repetition time (TR) = 5520 msec, echo time (TE) = 89.5 msec, flip angle = 78°, refocusing flip angle = 160°, field of view (FOV) = 210 × 180, matrix = 168 × 144, slice thickness = 1.25 mm, 111 slices acquired at 1.25 mm isotropic resolution, a multiband factor of 3, and echo spacing = 0.78 msec. The T1-weighted imaging sequence included 3D MPRAGE images acquired over a period of 7 minutes and 40 seconds at a TR = 2400 msec, TE = 2.14 msec, inversion time (TI) = 1000 msec, flip angle = 8°, and FOV = 224 × 224 at an isotropic voxel resolution of 0.7 mm.

Preprocessing

We used the TORTOISE V3 (version 3, www.tortoisedti.org) pipeline to process the dMRI data, because Irfanoglu et al had shown considerable improvement in the dMRI metrics using this pipeline compared to the released version of the HCP dataset.¹⁵,¹⁶ In the following, we briefly describe the different stages of preprocessing:

1) Denoising was performed using a model-free noise mapping technique proposed by Veraart et al, with a kernel radius of 3.
2) Gibbs-ringing correction was performed using the subvoxel-shift method, which showed improvements without introducing additional imperfections.
3) Inter-volume motion and eddy-current correction: we first applied a MAP-based model, as it is independent of shelled data and accurately extrapolates the unseen q-vector signals.¹⁶,¹⁹ All diffusion-weighted images (DWIs) were aligned to the ideal b = 0 s/mm² image. As a final step, step 3) was run once more, but in this instance, the synthesized and the slice-transformed real images were used. These steps were iterated until convergence was reached.
4) Susceptibility-induced distortions were considered as well, given that acquiring data at high resolution (as for the HCP) can cause severe EPI distortions.¹⁴,²⁰ We applied the DRBUDDI approach, which is a blip-up and blip-down distortion correction technique with excellent performance.²⁰,²¹ Besides using the b = 0 s/mm² image, DRBUDDI also incorporates DTs and relies on an undistorted T2-weighted (T2W) structural image. However, the T2W images from the HCP data were not compatible with DRBUDDI; therefore, we used a machine learning-based technique called SynB0-DisCo to generate a structural image that fit well into the DRBUDDI paradigm.
5) Gradient nonlinearity correction was used, as the HCP data come with “gradwarped” DWIs and gradient-deviation tensor images. Effects of inter-volume motion are not accounted for when a single gradient-deviation tensor is used for all DWIs. Therefore, we also computed the voxelwise B matrices to take these effects into consideration.
6) Signal drift correction: given the long scan time, signal drift was observed in the HCP data and was therefore corrected using a previously proposed method.
7) Normalization and template generation were performed on the processed data at both the HCP isotropic resolution of 1.25 mm and the 1 mm resolution of the processed T1-weighted (T1W) image. A DT-based registration was applied, and an atlas was generated using the 1 mm resolution data; the DWIs were warped onto the template space using nonlinear transformation.²⁵,²⁶

Diffusion MRI Metrics

We generated voxelwise maps for four DT metrics: FA, MD, axial diffusivity (AD), and radial diffusivity (RD). We also generated voxelwise maps for five MAP metrics: PA, return to axis probability (RTAP), return to origin probability (RTOP), return to plane probability (RTPP), and non-Gaussianity (NG). Each of these metrics carries useful quantitative information about the various diffusion behaviors of water in the brain.

Quality Control Assessment

Quality control assessments are vital for neuroimaging applications, especially those that involve complex interpolations and multiple preprocessing steps. We took several steps in assessing each dMRI metric map for each subject. All dMRI maps registered to the study template were first visually inspected. The maps from 961 subjects were checked for misregistration and abnormal warping using an in-house custom-built script based on modules from SPM12 (update revision number: 7771, http://www.fil.ion.ucl.ac.uk/spm/) within a MATLAB environment (version: R2022b, MathWorks Inc., Massachusetts, USA). The script generates contour lines of all maps over the corresponding subject’s T2W image in the template space. Subjects that failed the quality control assessment were checked again and appropriate steps were taken to correct the registration issues. The datasets that could not be salvaged despite this extensive correction pipeline were removed from further analysis, leaving an effective sample of 960 and 912 participants for the DT and MAP metrics, respectively.

Regions of Interest

To reduce the number of tests and focus on specific white matter (WM) regions, we used a set of regions of interest (ROIs) for the current study. The ROIs were inspired by the Johns Hopkins University (JHU) WM ROIs; however, they were manually redrawn (by A.N., over 13 years of experience in the field of neuroanatomy and neuroimaging) on an average DT brain template built from the HCP dataset to avoid issues with left/right structural asymmetry that have been reported for the original JHU ROIs.¹⁹,²⁵,²⁸⁻³⁰ Moreover, the JHU ROIs were defined in a scalar map and used scalar-based registration, which tends to produce misregistrations. We used a tensor-based registration, which provides better alignment. The JHU ROIs are quite large; therefore, careful steps were taken when the ROIs were redrawn to ensure they “were within” the tracts/structures to reduce partial volume effects. The ROI labels were created and the average DT and MAP metrics were computed for each subject across all ROIs using ITK-SNAP (version 3.6.0, www.itksnap.org). Figure 1 shows the ROIs in a detailed montage across all three orthogonal planes for better visualization and assessment. Table S1 in the Supplemental Material provides more details on these ROIs.

The Pscore Method

The first step in computing Pscores was to generate percentile ranks for each participant. They were computed within each ROI using the following formula:

$$p_{ij} = \frac{n_{NS \le x_{ij}}}{n_{NS}} \times 100; \quad i = 1, 2, 3, \ldots, N, \quad j = 1, 2, 3, \ldots, M, \tag{1}$$

where $x_{ij}$ represents the average dMRI metric value of the $i$th individual for the $j$th ROI, $i$ indexes the $1 \ldots N$ (960) participants, and $j$ indexes the $1 \ldots M$ (48) ROIs. Furthermore, $n_{NS \le x_{ij}}$ represents the number of participants within the normative sample having a value $\le x_{ij}$. The denominator $n_{NS}$ represents the total number of participants in the normative sample.
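
As a minimal sketch of Eq. 1 (not the authors' R code), the percentile rank can be computed in a few lines of Python. The function name `percentile_ranks` and the five sample values are hypothetical and serve only as an illustration.

```python
def percentile_ranks(values):
    """Percentile rank per Eq. 1: percentage of the normative sample with a value <= x."""
    n_ns = len(values)  # total number of participants in the normative sample
    return [100.0 * sum(v <= x for v in values) / n_ns for x in values]


# Hypothetical average metric values for one ROI (five participants).
roi_values = [0.42, 0.45, 0.47, 0.50, 0.51]
ranks = percentile_ranks(roi_values)
print(ranks)
```

Because the count uses "less than or equal to", tied values share the same rank and the largest value always receives a rank of 100.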

After the percentile computation, each participant’s position in the ROI-wise distribution was considered to assess which tail they were represented in. This was identified by the difference between the participant’s metric value and the median of the distribution. This difference was then normalized with either the difference between the median and the 5th or the 95th percentile edge value, depending on the participant’s position. The following equations depict how a Pscore was computed on either side of the median:

FIGURE 1: The regions of interest (ROIs) used in the current study. A 2 × 5 montage was created in all orthogonal planes to show the spatial extent of the ROIs overlaid on the average connectome diffusion tensor (FA) template. The 48 ROIs are shown using a colormap with a range of 64 colors. The slice labels with the correct direction are provided on the top-left of each image in the montage. The “A,” “P,” “I,” “S,” “L,” and “R” labels represent the anterior, posterior, inferior, superior, left, and right directions, respectively.

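$$P5_{ij} = (-)\,1.645 \times \frac{dx_{ij}}{d_5}; \quad dx_{ij} = x_{ij} - m_j \mid x_{ij} < m_j, \quad d_5 = m_j - x_{j5} \mid x_{ij} < m_j \tag{2}$$

$$P95_{ij} = (+)\,1.645 \times \frac{dx_{ij}}{d_{95}}; \quad dx_{ij} = x_{ij} - m_j \mid x_{ij} > m_j, \quad d_{95} = x_{j95} - m_j \mid x_{ij} > m_j \tag{3}$$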

where $dx_{ij}$ is the difference between $x_{ij}$ and the median of the distribution, $m_j$, for the $j$th ROI. If the metric value of the $i$th participant $x_{ij} < m_j$, then $dx_{ij} < 0$, which means that the participant was located on the left-hand tail between $m_j$ and the 5th percentile edge value, $x_{j5}$, of the distribution. The corresponding denominator $d_5$ was computed as the difference between $m_j$ and $x_{j5}$. On the other hand, $dx_{ij} > 0$ when $x_{ij} > m_j$. This indicated that the participant’s position was on the right-hand tail between $m_j$ and the 95th percentile edge value, $x_{j95}$, and the denominator $d_{95}$ was computed as the difference between $x_{j95}$ and $m_j$. Both $P5_{ij}$ and $P95_{ij}$ take on the polarity of $dx_{ij}$, generating the negative and positive scores, respectively. The ratios of these differences were then scaled by $|1.645|$, the Zscore value corresponding to the 5th and 95th percentiles of a normal distribution. This was done to bring the Pscores to the scale of Zscores and make them comparable.
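
The computation in Eqs. 2 and 3 can be sketched as follows. This is an illustrative Python version, not the authors' implementation. The function name `pscores`, the test sample, and the choice of linear-interpolation ("inclusive") percentile edges are assumptions, since the text does not state how the 5th and 95th percentile edge values were interpolated. The sketch multiplies the signed difference by 1.645, so that each score takes the sign of its difference from the median, as the text describes.

```python
import statistics

Z_EDGE = 1.645  # Zscore at the 5th/95th percentile of a standard normal distribution


def pscores(values):
    """Pscores for one ROI following Eqs. 2 and 3 (sign follows x - median)."""
    m = statistics.median(values)
    # Assumption: linear-interpolation ("inclusive") percentile edge values.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    x5, x95 = cuts[4], cuts[94]  # 5th and 95th percentile edge values
    d5, d95 = m - x5, x95 - m    # both positive for non-degenerate data
    scores = []
    for x in values:
        dx = x - m
        if dx < 0:
            scores.append(Z_EDGE * dx / d5)   # Eq. 2: negative score
        elif dx > 0:
            scores.append(Z_EDGE * dx / d95)  # Eq. 3: positive score
        else:
            scores.append(0.0)                # value exactly at the median
    return scores


# Hypothetical sample: the integers 1..101, so the median is 51 and the
# 5th/95th percentile edge values are 6 and 96.
sample = list(range(1, 102))
scores = pscores(sample)
print(scores[5], scores[50], scores[95])
```

By construction, a value sitting exactly on the 5th or 95th percentile edge maps to −1.645 or +1.645, regardless of how asymmetric the two halves of the distribution are.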

Statistical Analysis

We analyzed the data ROI-wise in R (R Core Team (2023); Version: 4.3.2; R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing, Vienna, Austria; https://www.R-project.org/) using three normalization techniques: log transformation, standardized Zscore, and the “Pscore” method proposed in this study. Histograms of the raw data were used as a reference to assess the distributions from these three methods. For each histogram plot, the mean and median lines were added to assess deviations from the mode and misalignment of these central tendency measures. A normal density curve was fit on top of the histograms to assess which normalization method was closer to a Gaussian distribution. This led to four figures per ROI for each dMRI metric, generating 1728 figures (4 figures × 48 ROIs × 9 dMRI metrics).
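
One reason standardization cannot repair an asymmetric distribution is that the Zscore is a linear transformation, which leaves skewness unchanged, while the logarithm is a concave transformation that makes a negatively skewed sample more negatively skewed. The Python sketch below illustrates this on hypothetical data. The deterministic sample, the helper names, and the moment-based skewness coefficient are assumptions for illustration and are not taken from the study.

```python
import math
import statistics


def skewness(values):
    """Moment-based skewness coefficient: m3 / m2**1.5."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5


def zscores(values):
    """Standardize values with the sample mean and standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical negatively skewed, strictly positive sample (anisotropy-like
# values), built deterministically from lognormal quantiles.
normal = statistics.NormalDist()
n = 1000
raw = [1.0 - 0.1 * math.exp(0.5 * normal.inv_cdf((k + 0.5) / n)) for k in range(n)]

skew_raw = skewness(raw)
skew_z = skewness(zscores(raw))
skew_log = skewness([math.log(v) for v in raw])
print("raw:", round(skew_raw, 3), "zscore:", round(skew_z, 3), "log:", round(skew_log, 3))
```

In this sketch the Zscore skewness matches the raw skewness, and the log-transformed skewness is more negative than both.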

To summarize and simplify, we took the Zscore and Pscore values from the entire sample across all ROIs and arranged them into two single-column vectors. For example, for a DT metric such as FA, each ROI had Zscores from 960 healthy individuals. This created a vector of 46,080 Zscores (960 Zscores × 48 ROIs), and the same was true for Pscores. For the MAP metrics, since data from 912 individuals survived the quality control step, the vector had 43,776 Zscores (912 Zscores × 48 ROIs). These vectors were then used to generate two distribution plots, one for Zscores and the other for Pscores. The process was repeated for each DT and MAP metric. The log transformation method was not included in this step, because of the issues with this approach mentioned above and because it performed very poorly at the ROI level (Fig. 2, Figs. S1 and S2 in the Supplemental Material). Additionally, log-transformed values were not on the same scale as Zscores and Pscores, making them incompatible for this comparison. Since the Pscores were on the same scale as Zscores, they could be compared and assessed together. This showed which of these two normalization techniques produced a statistical imbalance in the data spread across the entire WM and over the entire sample.

More importantly, for each metric, this step also helped quantify the imbalance in the number and percentage of extreme values present in the tails of these distributions. A normal Z-distribution would have 5% of extreme values above the 95th percentile (Z = 1.645) and 5% below the 5th percentile (Z = −1.645) boundaries. We quantified and tabulated the number and percentage of these extreme values for Zscores and Pscores for all dMRI metrics. In the presence of a non-Gaussian distribution, the balance of 5% would be altered in the tails. This helped show which of these normalization methods produced a systematic imbalance of extreme values that may lead to inaccurate assessment of individuals whose values were in the tails of the distribution.
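
The tail percentages described above can be sketched in Python as follows. The sample is a hypothetical, positively skewed set of values built from lognormal quantiles, and the helper names (`zscores`, `pscores`, `pev`) and the "inclusive" percentile interpolation are assumptions. None of this reproduces the study data or the authors' R code.

```python
import math
import statistics

Z_EDGE = 1.645  # Zscore at the 5th/95th percentile of a standard normal distribution


def zscores(values):
    """Standardize values with the sample mean and standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def pscores(values):
    """Percentile-based scores: signed distance from the median, scaled per tail."""
    m = statistics.median(values)
    cuts = statistics.quantiles(values, n=100, method="inclusive")  # assumed convention
    d5, d95 = m - cuts[4], cuts[94] - m
    return [Z_EDGE * (v - m) / (d5 if v < m else d95) for v in values]


def pev(scores):
    """Return (PEV>95 %, PEV<5 %): percentage of scores beyond +1.645 and -1.645."""
    n = len(scores)
    above = 100.0 * sum(s > Z_EDGE for s in scores) / n
    below = 100.0 * sum(s < -Z_EDGE for s in scores) / n
    return above, below


# Hypothetical positively skewed sample (diffusivity-like values).
normal = statistics.NormalDist()
n = 1000
sample = [math.exp(0.5 * normal.inv_cdf((k + 0.5) / n)) for k in range(n)]

z_above, z_below = pev(zscores(sample))
p_above, p_below = pev(pscores(sample))
print("Zscore PEV>95, PEV<5:", z_above, z_below)
print("Pscore PEV>95, PEV<5:", p_above, p_below)
```

For this skewed sample, the Zscore tails are unbalanced (more than 5% above, fewer than 5% below), whereas the Pscore tails each hold 5% because the score is anchored to the sample's own percentile edges.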

To test how Pscores performed on smaller samples, we performed bootstrapping on the pool of HCP participants (N = 960). We ran 100 iterations, each time randomly selecting 100 participants and repeating the Zscore vs. Pscore comparison at the overall WM level. We also performed 20 iterations independently on the dMRI metric that showed the strongest imbalance in extreme values across all WM ROIs. This was done for brevity and to showcase the reliability of the Pscore approach on the metric least expected to conform to Gaussianity.
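
A resampling loop of this kind might look like the Python sketch below. It is written under stated assumptions: the pool is a hypothetical skewed sample of 960 values for a single ROI, each iteration draws 100 participants without replacement (the text does not say whether sampling was with or without replacement), and all function names are illustrative.

```python
import math
import random
import statistics

Z_EDGE = 1.645  # Zscore at the 5th/95th percentile of a standard normal distribution


def zscores(values):
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def pscores(values):
    m = statistics.median(values)
    cuts = statistics.quantiles(values, n=100, method="inclusive")  # assumed convention
    d5, d95 = m - cuts[4], cuts[94] - m
    return [Z_EDGE * (v - m) / (d5 if v < m else d95) for v in values]


def pev(scores):
    """Return (PEV>95 %, PEV<5 %) for a list of scores."""
    n = len(scores)
    return (100.0 * sum(s > Z_EDGE for s in scores) / n,
            100.0 * sum(s < -Z_EDGE for s in scores) / n)


def mean_pair(runs):
    """Average the (PEV>95, PEV<5) pairs over all iterations."""
    return tuple(statistics.fmean(column) for column in zip(*runs))


def bootstrap_pev(pool, n_iter=100, n_sub=100, seed=0):
    """Mean tail percentages over random subsamples, for Zscores and Pscores."""
    rng = random.Random(seed)
    z_runs, p_runs = [], []
    for _ in range(n_iter):
        # Assumption: participants are drawn without replacement in each iteration.
        sub = rng.sample(pool, n_sub)
        z_runs.append(pev(zscores(sub)))
        p_runs.append(pev(pscores(sub)))
    return mean_pair(z_runs), mean_pair(p_runs)


# Hypothetical pool of 960 positively skewed values for one ROI.
normal = statistics.NormalDist()
pool = [math.exp(0.5 * normal.inv_cdf((k + 0.5) / 960)) for k in range(960)]
z_mean, p_mean = bootstrap_pev(pool)
print("Zscore mean PEV>95, PEV<5:", z_mean)
print("Pscore mean PEV>95, PEV<5:", p_mean)
```

Sampling without replacement keeps the values in each subsample distinct. This sketch relies on that, because ties at the percentile edges would shift the tail counts.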

To highlight the level of imbalance in extreme values in Zscores compared to Pscores, we generated heatmaps similar to the ones shown in our pilot study. Heatmaps are an intuitive way to assess individuals through a visual representation of the level and direction in which the extreme values tend to increase. The heatmaps were generated across all ROIs for the entire population, as well as for a single iteration of the random sampling consisting of 100 participants, to reduce the complexity of showcasing 92,160 observations (46,080 Zscores + 46,080 Pscores). Since the pattern was consistent even in small samples, it can be shown with much better clarity for 100 participants with 9600 observations (4800 Zscores + 4800 Pscores).

Results

ROI-Wise Comparison of Distributions Across Different Methods

Figure 2 shows some examples of distributions from the normalization techniques tested, within the body of the corpus callosum (BCC). Distributions from only four dMRI metrics are shown for one ROI; however, assessments were made across each ROI per dMRI metric. The FA and PA values showed a negative skew, while MD and RTPP showed a positive skew. In contrast, for all Pscore distributions, the mean, mode, and median appeared well aligned and they tended to fit closely to the normal density curve. Therefore, compared to the other normalization methods tested, and particularly the standardized Zscore approach, Pscores provided a more symmetric distribution per ROI. The Supplemental Material provides more examples from other dMRI metrics across other ROIs (Figs. S1 and S2).

FIGURE 2: Comparing distributions across different normalization methods for two DT and two MAP metrics from a representative ROI: body of the corpus callosum. The DT metrics are shown in the top row: FA (left) and MD (right); and the MAP metrics are on the bottom row: PA (left) and RTPP (right). For each metric, four distributions are shown. The “Raw” (light gray) distribution panel represents the average metric values. The “Log” (dark gray), “Zscore” (light red), and “Pscore” (light blue) distribution panels represent the logarithmic values, the standardized Z values, and the proposed Pscores, respectively. The “Raw” (light gray) distributions demonstrated the presence of skew in the dMRI metrics, which caused the mean, mode, and median to misalign. For instance, FA (top left) and PA (bottom left) were negatively skewed; however, PA was more heavy-tailed and showed greater skewness. The mean (dashed green) appeared separated from the median (dashed red) and mode (tallest bar). Comparing the histograms with a fitted normal density curve (dashed purple) also illustrated the deviation from a Gaussian distribution. These patterns were also clearly visible for the log-transformed (dark gray) and “Zscore” (light red) distributions at varying levels. For both FA and PA, the “mean” underestimated the most common values and appeared before the median and the mode of the distribution for the “Raw,” “Log,” and “Zscore” panels. However, the “Pscore” panel shows that all three central tendencies coincided well and attained a closer fit to the normal density curve (dashed purple). On the other hand, MD (top right) and RTPP (bottom right) were positively skewed and the “mean” overestimated the most common values for the “Raw,” “Log,” and “Zscore” panels; but the “Pscore” panel shows the consistent alignment of the mean, mode, and median and a closer fit to the normal density curve.

Zscore vs. Pscore Distributions Across All ROIs

Figure 3 shows the Zscore and Pscore distributions for each dMRI metric, generated from the data across all WM ROIs comprising the overall WM. In general, Zscores showed an overall imbalance of positive and negative values for all metrics (less prominent for AD). Zscore distributions for FA, PA, and NG showed an overall negative skew with <50% negative and >50% positive values. This pattern worsened progressively, with 54% positive values for NG and 59% for PA. However, Pscores from all three metrics maintained a balance of 50% negative and 50% positive values. The rest of the DT and MAP metrics showed the opposite trend for Zscores, with >50% of total observations being negative. For example, both MD and RTAP had 51% of all observations as negative values. Therefore, MD had at least 460 (1% of 46,080) and RTAP had at least 437 (1% of 43,776) more negative Zscores than positive Zscores in their respective distributions. It was worse for RD, RTOP, and RTPP, because each of their distributions had 52% negative values, indicating twice the excess of negative values seen for MD and RTAP. All Pscore distributions, on the other hand, maintained a balanced distribution of 50% positive and negative values for every single dMRI metric.

Assessing the Imbalance in Extreme Values at the Tails

Table 1 shows the imbalance of extreme values at the tail ends of the Zscore distributions and the correct balance of extreme values from the Pscore distributions. For each dMRI metric, it shows the percentage of extreme values above the 95th (PEV>95 (%), Z > 1.645) and below the 5th (PEV<5 (%), Z < −1.645) percentile boundaries for both these normalization methods. Except for AD, where the imbalance was negligible but still <5%, Zscores from every other dMRI metric showed an imbalance of extreme values in the left or right tails. For example, FA, NG, and PA had progressively larger imbalance between negative and positive values leading to negative skews (Fig. 3), and as a result, a subsequent increase in the percentage of negative extreme values. Compared to “PEV>95 (%),” the “PEV<5 (%)” values in these three metrics were higher by approximately 1.2 (4.9/4.1), 1.9 (5.4/2.8), and 5.8 (5.8/1) times, respectively. On the other hand, MD, RD, RTAP, RTOP, and RTPP showed positive skews for Zscores (Fig. 3), leading to an increase in positive extreme values. All “PEV>95 (%)” values for these metrics were >5%, while all “PEV<5 (%)” values were <5%. In contrast, for the same dMRI metrics, the “PEV<5 (%)” and “PEV>95 (%)” values for Pscores maintained a balanced 5% of extreme values in both tails.

Bootstrapping the HCP Sample

Figure 4 shows a heatmap of PA generated from one iteration of a random sampling of 100 HCP participants. The heatmap demonstrates any systematic increase in extreme values in individuals (columns) across multiple ROIs. It also highlights any increase in extreme values present in an ROI (rows) across the normative sample. The heatmap of Zscores (top) shows a large number of negative extreme values and very few positive extreme values, whereas the Pscore heatmap (bottom) shows a proper balance of positive and negative extreme values, as expected of a normative sample. For simplicity, we showcase a heatmap comprising only 100 participants.

Table 2 shows the same quantities as Table 1, but only for PA, over 20 iterations of randomly sampling 100 HCP participants. The rationale to highlight PA was that it showed the highest imbalance in extreme values at the tails among all dMRI metrics (Fig. 3, Table 1). For all 20 iterations, Zscores of PA showed large proportions of negative extreme values (all “PEV<5 (%)” > 5% and all “PEV>95 (%)” < 2%) with on average >4.3 (6.1/1.4) times more negative extreme values than positives. In contrast, Pscores robustly maintained 5% extreme values in both tails for all 20 iterations. Table S2 in the Supplemental Material provides a summary of the bootstrapping assessment of all dMRI metrics across 100 iterations.
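The resampling procedure described above can be sketched as follows. This is an illustration under stated assumptions: the synthetic "database" is a negatively skewed stand-in for an anisotropy-like metric, and drawing participants without replacement within each iteration is a choice made here, since the text does not specify the sampling scheme.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic normative "database": 900 participants x 48 ROIs, negatively skewed
# (hypothetical stand-in for an anisotropy-like metric; not real HCP values).
n_participants, n_rois = 900, 48
database = 1.0 - rng.lognormal(mean=-2.0, sigma=0.6, size=(n_participants, n_rois))

n_iterations, sample_size, edge = 20, 100, 1.645
pev_gt95, pev_lt5 = [], []
for _ in range(n_iterations):
    # Draw 100 participants at random (assumption: without replacement).
    idx = rng.choice(n_participants, size=sample_size, replace=False)
    sample = database[idx]
    # ROI-wise Zscores: each column is normalized by its own mean and SD.
    z = (sample - sample.mean(axis=0)) / sample.std(axis=0, ddof=1)
    pev_gt95.append(100.0 * np.mean(z > edge))
    pev_lt5.append(100.0 * np.mean(z < -edge))

print(f"mean PEV>95 = {np.mean(pev_gt95):.1f}% (SD {np.std(pev_gt95, ddof=1):.1f})")
print(f"mean PEV<5  = {np.mean(pev_lt5):.1f}% (SD {np.std(pev_lt5, ddof=1):.1f})")
```

Each iteration contributes 48 ROIs × 100 participants = 4800 Zscores, matching the N Total reported in Table 2.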

Discussion

Using the large-scale, high-resolution HCP dataset, we showed that the distributions of DT and MAP metrics derived from dMRI tend to be non-Gaussian. We also showed that this may lead to an imbalance in the extreme values at the tails of a normative distribution when Zscores are used. We proposed a novel percentile-based metric, the “Pscore,” which was less sensitive to these non-Gaussian distributions, both at the ROI level and for overall WM. We further documented the robustness of this method in smaller samples using bootstrapping. We replicated our previous findings from a pilot study and systematically validated this method using the HCP-YA cohort.

Our assessment of the metric distributions showed that diffusivity metrics (eg, MD, RD) tended to be positively skewed, whereas anisotropy metrics (eg, FA, PA) tended to be negatively skewed. It is difficult to unequivocally identify a primary source for these findings. However, given the narrow age range of the examined population, it is unlikely to be due to aging-related effects that have been invoked for large-scale public datasets.1,3,31 A possible explanation for these skewed distributions could be related to the presence of a small percentage of fast diffusing water molecules with isotropic diffusion behavior that have been reported in healthy brain parenchyma. This fast diffusing water compartment has the same diffusion signature as cerebrospinal fluid (CSF) partial volume contamination. Moreover, CSF contamination within a ROI can lead to higher MD (positive skew) because the diffusivity of CSF is at least four times higher than that of the brain parenchyma. On the other hand, the anisotropy of CSF is virtually 0, which would cause the anisotropy of the WM tracts to be lower (negative skew). These considerations suggest that skewed values may be inherently driven by underlying biological characteristics of the brain and not be necessarily related to heterogeneity in the demographics. One of our rationales for choosing the HCP young-adult sample was its well-balanced homogeneity in demographics (eg, age and sex).

FIGURE 3: Comparing distributions of Zscores against Pscores across all WM ROIs per dMRI metric. For each dMRI metric, a pair of distributions is shown: one with Zscore (top, light red) histograms and one with Pscore (bottom, light blue) histograms. The values in these distributions came from concatenating all the scores across all ROIs into a single vector. This was done to show whether there was an overall imbalance in the distribution from the entire WM. A normal density curve (solid red and blue) was also fit on top of each distribution, respectively, to assess which normalization method attained a closer fit to a Gaussian distribution. Except for AD, where the Zscore distribution was approximately Gaussian, an imbalance of negative and positive scores was observed for the distributions from all other dMRI metrics. The percentage of values above/below the zero line is provided on either side of each distribution plot. An imbalance was indicated by a “>50%” and “<50%” value on either side of the zero line. An increase above 50% is indicated in bold red and a decrease below 50% is indicated in bold blue numbers. The Zscore distribution of FA (top left, light red), for instance, showed an overall imbalance with 51% of the total observations above 0 and 49% below it. This means there were at least 460 (1% of 46,080) more positive Zscores than negative Zscores in the distribution. The mean (green vertical dashed line) and the median (red vertical dashed line) also appeared misaligned. This pattern was also observed for PA and NG distributions. On the contrary, Zscores from MD, RD, RTAP, RTOP, and RTPP showed an overall positive skew, with more negative (>50%) and fewer positive (<50%) values around the zero line. Pscores, however, consistently maintained an equal distribution of 50% negative and positive values on either side. The mean and median lines aligned, and the histograms attained a closer fit to the normal density curve (solid blue) compared to the Zscores (solid red).

Our findings may be of particular interest to investigators assessing individual deviations against a skewed normative database. Normative models are typically built from very large samples and often incorporate Gaussian process models.1,3 This can be counterintuitive, because scaling becomes a big challenge as sample sizes get much bigger. Some methods based on machine learning have been proposed to address this issue, but with some caveats of pre-tuning and modeling complexity.33–35 Indeed, one can simply assess a quantity, eg, an individual’s height, by comparing it against the percentiles generated from a sufficiently large population. However, aside from other confounds, neuroimaging studies are often limited by the sample size (median N = 23, according to reference 36) and typically consist of <100 participants. In a healthy cohort of 48 controls, we had previously shown that normative data generated from dMRI metrics deviate from normality and suffer the consequences of having unbalanced tail ends when Zscores are used. The study also underscored a key advantage of the “Pscore” approach: its ability to reliably estimate individual deviations in small samples. Therefore, Pscores offer a practical solution to studies with more realistic sample goals (eg, N = 50–150). They can help investigators to accurately assess individuals by building site-specific normative databases from neuroimaging data that do not conform to Gaussianity.

TABLE 1. Assessing the Extreme Value Imbalance in Zscores Compared to Pscores

Metric  Method  NEV>95  NEV<5  N Total  PEV>95 (%)  PEV<5 (%)
FA      Zscore  1873    2255   46,080   4.1         4.9
        Pscore  2304    2304   46,080   5.0         5.0
MD      Zscore  2383    2020   46,080   5.2         4.4
        Pscore  2304    2303   46,080   5.0         5.0
AD      Zscore  2117    2183   46,080   4.6         4.7
        Pscore  2305    2304   46,080   5.0         5.0
RD      Zscore  2423    1907   46,080   5.3         4.1
        Pscore  2304    2304   46,080   5.0         5.0
PA      Zscore  450     2527   43,776   1.0         5.8
        Pscore  2208    2208   43,776   5.0         5.0
RTAP    Zscore  2255    2030   43,776   5.2         4.6
        Pscore  2208    2208   43,776   5.0         5.0
RTOP    Zscore  2348    1831   43,776   5.4         4.2
        Pscore  2208    2208   43,776   5.0         5.0
RTPP    Zscore  2419    1753   43,776   5.5         4.0
        Pscore  2208    2208   43,776   5.0         5.0
NG      Zscore  1244    2364   43,776   2.8         5.4
        Pscore  2208    2208   43,776   5.0         5.0

The NEV>95 and PEV>95 (%) show the number and percentage of extreme values, respectively, above the 95th percentile edge value of Z = 1.645 from a normal distribution. On the other hand, the NEV<5 and PEV<5 (%) show the number and percentage of extreme values below the 5th percentile edge at Z = −1.645. N Total is the total number of values across all ROIs in the entire sample.
FA = fractional anisotropy; MD = mean diffusivity; AD = axial diffusivity; RD = radial diffusivity; PA = propagator anisotropy; RTAP = return to axis probability; RTOP = return to origin probability; RTPP = return to plane probability; NG = non-Gaussianity.

FIGURE 4: Heatmap comparing Zscores and Pscores from a single iteration of a randomly selected sample of 100 participants for the propagator anisotropy. The horizontal black line separates the Zscores (top) from the Pscores (bottom). The columns represent individual participants, and the rows represent the regions of interest (ROIs). The x-axis shows the labels of the 100 participants randomly selected from the HCP. The range of Pscores is shown in the colorbar. The parenthesis “(” in a colorbar tile means that the corresponding shade is exclusive of the value next to it, whereas the bracket “]” means the value is inclusive. For example, the “[1.65, 1)” label on top of the lightest shade of red means it represents values in the range “1 < Z ≤ 1.65” and so on for others. Darker shades of blue and red represented more extreme negative and positive values, respectively. An individual with a dark blue tile with the range label (−3, −∞) would correspond to Z < −3 from a normal distribution and vice versa for a dark red shade. The large number of negative extreme values in Zscores (top) was quite conspicuous, and the balance of positive and negative extreme values in the Pscores (bottom) was evident just from visual inspection.

TABLE 2. Comparing the Extreme Value Imbalance in Zscores vs. Pscores of Propagator Anisotropy for 20 Iterations of Bootstrapping 100 HCP Participants

Metric                 Iteration  Method  NEV>95  NEV<5  N Total  PEV>95 (%)  PEV<5 (%)
Propagator anisotropy  1          Zscore  56      297    4800     1.2         6.2
                                  Pscore  240     240    4800     5.0         5.0
Propagator anisotropy  2          Zscore  80      295    4800     1.7         6.1
                                  Pscore  240     240    4800     5.0         5.0
Propagator anisotropy  3          Zscore  60      294    4800     1.3         6.1
                                  Pscore  240     240    4800     5.0         5.0
Propagator anisotropy  4          Zscore  83      258    4800     1.7         5.4
                                  Pscore  240     240    4800     5.0         5.0
Propagator anisotropy  5          Zscore  78      267    4800     1.6         5.6
                                  Pscore  240     240    4800     5.0         5.0
Propagator anisotropy  6          Zscore  63      291    4800     1.3         6.1
                                  Pscore  240     240    4800     5.0         5.0
Propagator anisotropy  7          Zscore  70      295    4800     1.5         6.1
                                  Pscore  240     240    4800     5.0         5.0
Propagator anisotropy  8          Zscore  80      276    4800     1.7         5.8
                                  Pscore  240     240    4800     5.0         5.0
Propagator anisotropy  9          Zscore  75      318    4800     1.6         6.6
                                  Pscore  240     240    4800     5.0         5.0
Propagator anisotropy  10         Zscore  55      320    4800     1.1         6.7
                                  Pscore  240     240    4800     5.0         5.0
Propagator anisotropy  11         Zscore  57      281    4800     1.2         5.9
                                  Pscore  240     240    4800     5.0         5.0
Propagator anisotropy  12         Zscore  60      285    4800     1.3         5.9
                                  Pscore  240     240    4800     5.0         5.0
Propagator anisotropy  13         Zscore  49      303    4800     1.0         6.3
                                  Pscore  240     240    4800     5.0         5.0
Propagator anisotropy  14         Zscore  80      279    4800     1.7         5.8
                                  Pscore  240     240    4800     5.0         5.0
Propagator anisotropy  15         Zscore  74      305    4800     1.5         6.4
                                  Pscore  240     240    4800     5.0         5.0
Propagator anisotropy  16         Zscore  70      293    4800     1.5         6.1
                                  Pscore  240     240    4800     5.0         5.0
Propagator anisotropy  17         Zscore  81      290    4800     1.7         6.0
                                  Pscore  240     240    4800     5.0         5.0
Propagator anisotropy  18         Zscore  69      282    4800     1.4         5.9

Deviation from Gaussianity leads to biases in the peripheral centiles of a normative distribution and inaccurate inferences. Zscores are commonly used for such inferences, and we have established the issues with extreme values that manifest if the issue of Gaussianity is not addressed. There are approaches that have been adopted to overcome non-Gaussian traits due to sample heterogeneity, implementing various modes of Bayesian linear regressions.1,37 The “Pscore” approach addresses the issue of Gaussianity at the level of the data distribution itself. By leveraging the median as reference and using percentile ranks, it avoids the imbalance issue at the tails arising from the asymmetry in the distribution. Therefore, it can be an important addition to the arsenal of contemporary normalization techniques. Pscores provide a simple platform to normalize raw data and scale them to standardized Zscores derived from a normal distribution. This is consequential because conforming to normally distributed Zscores facilitates statistical relevance to inferences made on individuals. It allows a meaningful interpretation of an individual’s position, especially when considering the extreme ends of a distribution.
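To make the median-referenced normalization concrete, the sketch below shows one plausible reading of the approach. It is not the authors' implementation: the piecewise form, the scaling by 1.645 (so that the 5th and 95th percentile edges map to Z = ∓1.645), and the function name `pscore` are assumptions made for illustration, and the exact definition in the Methods should take precedence.

```python
import numpy as np

def pscore(values, edge=1.645):
    """Illustrative percentile-based score (assumed form, not the published code).

    Values at or above the median are scaled by the median-to-95th-percentile
    distance; values below the median by the 5th-percentile-to-median distance.
    """
    values = np.asarray(values, dtype=float)
    p5, median, p95 = np.percentile(values, [5, 50, 95])
    upper = (values - median) / (p95 - median)  # side at or above the median
    lower = (values - median) / (median - p5)   # side below the median
    return edge * np.where(values >= median, upper, lower)

# Synthetic, positively skewed sample (hypothetical; not HCP data).
rng = np.random.default_rng(1)
values = rng.lognormal(mean=0.0, sigma=0.5, size=1000)

ps = pscore(values)
pct_above = 100.0 * np.mean(ps > 1.645)
pct_below = 100.0 * np.mean(ps < -1.645)
pct_positive = 100.0 * np.mean(ps > 0)
print(f"above edge: {pct_above:.1f}%, below edge: {pct_below:.1f}%, positive: {pct_positive:.1f}%")
```

Because each half of the distribution is scaled by its own percentile distance, the two tails hold about 5% of the scores each regardless of the skew in the raw values, which is the balance the text attributes to Pscores.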

When an individual, such as a patient, is expected to be at the extremities of a normative distribution, it can be considered as a rare event, as most individuals are not expected to be positioned there. Extreme value theory and its subtypes can prove useful in such cases by fitting asymmetric curves on extreme value distributions derived from such rare events. This has been applied in neuroimaging applications to assess individuals from highly heterogeneous clinical cohorts using normative modeling.1,3 The process involves generating normative probability maps (NPMs) and computing a Zscore per brain region/voxel that normalizes the difference between the true normative and predicted individual response with their corresponding variances. These Zscores are then usually used to run univariate statistical tests with multiple comparison adjustments. Our expectation is that Pscores can prove very useful in these applications and provide more accurate estimations for extreme events compared to Zscores.
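For orientation, the NPM Zscore described above is commonly written in the following form. This is a sketch of the formulation we understand reference 3 to use; the symbols are introduced here for illustration and should be checked against that reference.

```latex
z_{ij} = \frac{y_{ij} - \hat{y}_{ij}}{\sqrt{\sigma_{ij}^{2} + \sigma_{nj}^{2}}}
```

Here $y_{ij}$ is the observed response of participant $i$ at region/voxel $j$, $\hat{y}_{ij}$ is the response predicted by the normative model, $\sigma_{ij}^{2}$ is the predictive variance, and $\sigma_{nj}^{2}$ is the variance of the normative data at that region/voxel. Because the denominator is a symmetric scale, this score inherits the same tail imbalance as a conventional Zscore when the residuals are skewed.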

Limitations

The Pscore computation relies on percentile ranks of every participant in the normative sample. Therefore, the sample size may be a limiting factor on the confidence and precision of the percentile edges. For example, in our pilot analysis of 48 controls, the smallest resolution of each percentile was 2.083%, or approximately 2% (100/48). Since the 5th and 95th percentile edges were used, there was some statistical uncertainty in the exact values representing these percentiles. However, the Pscores still maintained a balanced number of extreme values in the two tails. This issue recedes as sample sizes get bigger, as we see in the current context (N = 961), where the percentile resolutions were much finer (0.1%) and allowed very precise edge measurements. Furthermore, the limitation can be mitigated even with smaller samples if precise and even percentile edge measurements can be performed. Regarding the bootstrapping step, for instance, the percentile resolutions were exactly even at 1% (100/100) and Pscores robustly maintained 5% extreme values over 100 iterations for all dMRI metrics.
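The percentile resolutions quoted above follow directly from 100/N. A minimal check, with the helper name `percentile_resolution` introduced here purely for illustration:

```python
def percentile_resolution(n):
    """Smallest percentile step (in %) resolvable with n participants: 100 / n."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return 100.0 / n

# Sample sizes mentioned in the text: pilot study, bootstrap samples, full HCP sample.
for n in (48, 100, 961):
    print(f"N = {n:3d}: resolution = {percentile_resolution(n):.3f}%")
```

For N = 48 the step is 2.083%, so the 5th percentile edge falls between the second and third ranked participants; for N = 100 the step is exactly 1%, and for N = 961 it is about 0.104%.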

Another limitation for Pscores is the selection of the percentile edges. We used the 5th and 95th percentile edges because they are statistically relevant and allow at least 10% of extreme values in the whole distribution. Choice of a more extreme percentile edge may affect the normalization accuracy. An appropriate alternative could be to look at the area under the curve, instead of point values at percentile edges. That way, the computation would be done on a continuous scale while preserving the ability of Pscores to address the issue at the level of the distribution, without making any prior assumption of Gaussianity. This is part of our future project using various samples of multimodal neuroimaging data.

Conclusion

Although the “Pscore” approach was tested on dMRI metrics, the method itself is not data selective. The “Pscore” method instead looks at the data by virtue of its distribution. Pscores can control the imbalance arising from the non-Gaussianity in the distribution and more accurately estimate individual deviations from a normative database. The robustness of Pscores observed for small samples in this study indicates their potential application to assess individuals in studies with smaller databases. The application of this method on data from clinical and other modalities, such as structural volumetric and functional MRI, may warrant future investigation because metrics from these paradigms also suffer from small sample limitations and non-Gaussian distributions.

TABLE 2. Continued

Metric                 Iteration  Method  NEV>95  NEV<5  N Total  PEV>95 (%)  PEV<5 (%)
                                  Pscore  240     240    4800     5.0         5.0
Propagator anisotropy  19         Zscore  72      294    4800     1.5         6.1
                                  Pscore  240     240    4800     5.0         5.0
Propagator anisotropy  20         Zscore  75      303    4800     1.6         6.3
                                  Pscore  240     240    4800     5.0         5.0

The NEV>95 and PEV>95 (%) show the number and percentage of extreme values, respectively, above the 95th percentile edge value of Z = 1.645 from a normal distribution. On the other hand, the NEV<5 and PEV<5 (%) show the number and percentage of extreme values below the 5th percentile edge at Z = −1.645. N Total is the total number of values across all ROIs in the entire sample, i.e., 48 ROIs × 100 participants = 4800 values. For Zscores, the mean PEV<5 (%) = 6.1 ± 0.3% and that of PEV>95 (%) = 1.4 ± 0.2%. For Pscores, the PEV<5 (%) = PEV>95 (%) = 5.0 ± 0%.

Acknowledgments

Data were provided (in part) by the Human Connectome Project, WU-Minn Consortium (principal investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research, and by the McDonnell Center for Systems Neuroscience at Washington University. This study was supported by the Intramural Research Program, NIH. The authors have no conflicts of interest to disclose. The views, information or content, and conclusions presented do not necessarily represent the official position or policy of, nor should any official endorsement be inferred on the part of the Uniformed Services University, the Department of Defense, the U.S. Government or the Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc.

References

1. Fraza CJ, Dinga R, Beckmann CF, Marquand AF. Warped Bayesian lin-

ear regression for normative modelling of big data. Neuroimage 2021;

245:118715.

2. Kjelkenes R, Wolfers T, Alnæs D, et al. Deviations from normative brain

white and gray matter structure are associated with psychopathology in

youth. Dev Cogn Neurosci 2022;58:101173.

3. Marquand AF, Rezek I, Buitelaar J, Beckmann CF. Understanding het-

erogeneity in clinical cohorts using normative models: Beyond case-

control studies. Biol Psychiatry 2016;80(7):552-561.

4. Shirk SD, Mitchell MB, Shaughnessy LW, et al. A web-based normative

calculator for the uniform data set (UDS) neuropsychological test bat-

tery. Alzheimers Res Ther 2011;3(6):32.

5. Sherwood B, Zhou AX, Weintraub S, Wang L. Using quantile regression

to create baseline norms for neuropsychological tests. Alzheimers

Dement (Amst) 2016;2:12-18.

6. Koenker R, Bassett G. Regression quantiles. Econometrica 1978;46(1):

33-50.

7. Basser PJ, Mattiello J, LeBihan D. MR diffusion tensor spectroscopy

and imaging. Biophys J 1994;66(1):259-267.

8. Pierpaoli C, Jezzard P, Basser PJ, Barnett A, Di Chiro G. Diffusion ten-

sor MR imaging of the human brain. Radiology 1996;201(3):637-648.

9. Özarslan E, Koay CG, Shepherd TM, et al. Mean apparent propagator

(MAP) MRI: A novel diffusion imaging method for mapping tissue

microstructure. Neuroimage 2013;78:16-32.

10. Hafiz R, Nayak A, Irfanoglu MO, Chan L, Pierpaoli C. Using ‘P-scores’: A novel percentile-based normalization method to accurately assess individual deviation in heavily skewed neuroimaging data. In: 2023 ISMRM & ISMRT Annual Meeting & Exhibition, Toronto, Canada, Program Abstract Number #3781; 19 May 2023.

11. Sudlow C, Gallacher J, Allen N, et al. UK biobank: An open access

resource for identifying the causes of a wide range of complex diseases

of middle and old age. PLoS Med 2015;12(3):e1001779.

12. Feng C, Wang H, Lu N, et al. Log-transformation and its implications

for data analysis. Shanghai Arch Psychiatry 2014;26(2):105-109.

13. Robert CP, Casella G. Monte Carlo statistical methods. New York: Springer; 1999.

14. Van Essen DC, Ugurbil K, Auerbach E, et al. The Human Connectome

Project: A data acquisition perspective. Neuroimage 2012;62(4):2222-

2231.

15. Irfanoglu MO, Nayak A, Jenkins J, Pierpaoli C. TORTOISE v3: Improve-

ments and new features of the NIH diffusion MRI processing pipeline.

ISMRM 2018;2018.

16. Irfanoglu MO, Nayak A, Taylor P, Pierpaoli C. TORTOISE V4:

ReImagining the NIH diffusion MRI processing pipeline. In: 2023

ISMRM & ISMRT Annual Meeting & Exhibition, Toronto, Canada, Pro-

gram Abstract Number #0080.

17. Veraart J, Fieremans E, Novikov DS. Diffusion MRI noise mapping using

random matrix theory. Magn Reson Med 2016;76(5):1582-1593.

18. Kellner E, Dhital B, Kiselev VG, Reisert M. Gibbs-ringing artifact

removal based on local subvoxel-shifts. Magn Reson Med 2016;76(5):

1574-1581.

19. Irfanoglu MO, Beyh A, Catani M, Dell’Acqua F, Pierpaoli C.

ReImagining the young adult Human Connectome Project (HCP) diffu-

sion MRI dataset. Proc Int Soc Magn Reson Med 2022;30.

20. Irfanoglu MO, Modi P, Nayak A, Hutchinson EB, Sarlls J, Pierpaoli C.

DR-BUDDI (diffeomorphic registration for blip-up blip-down diffusion

imaging) method for correcting echo planar imaging distortions.

Neuroimage 2015;106:284-299.

21. Gu X, Eklund A. Evaluation of six phase encoding based susceptibility

distortion correction methods for diffusion MRI. Front Neuroinform

2019;13:76.

22. Schilling KG, Blaber J, Huo Y, et al. Synthesized b0 for diffusion distor-

tion correction (Synb0-DisCo). Magn Reson Imaging 2019;64:62-70.

23. Vos SB, Tax CM, Luijten PR, Ourselin S, Leemans A, Froeling M. The

importance of correcting for signal drift in diffusion MRI. Magn Reson

Med 2017;77(1):285-299.

24. Rudrapatna U, Parker GD, Roberts J, Jones DK. A comparative study of

gradient nonlinearity correction strategies for processing diffusion data

obtained with ultra-strong gradient MRI scanners. Magn Reson Med

2021;85(2):1104-1113.

25. Irfanoglu MO, Nayak A, Jenkins J, et al. DR-TAMAS: Diffeomorphic

registration for tensor accurate alignment of anatomical structures.

Neuroimage 2016;132:439-454.

26. Nayak A, Irfanoglu MO, Pierpaoli C. Diffusion MRI atlases from the

Human Connectome Project data. Proc Int Soc Magn Reson Med 2020;

3751.

27. Basser PJ, Pierpaoli C. Microstructural and physiological features of tis-

sues elucidated by quantitative-diffusion-tensor MRI. J Magn Reson B

1996;111(3):209-219.

28. Oishi K, Zilles K, Amunts K, et al. Human brain white matter atlas: Identification and assignment of common anatomical structures in superficial white matter. Neuroimage 2008;43(3):447-457.

29. Yushkevich PA, Piven J, Hazlett HC, et al. User-guided 3D active contour segmentation of anatomical structures: Significantly improved efficiency and reliability. Neuroimage 2006;31(3):1116-1128.

30. Nayak A, Walker L, Pierpaoli C, The Brain Development Cooperative Group. Evaluation of pre-defined atlas based ROIs for the analysis of DTI data in normal brain development. Proc Int Soc Mag Reson Med 2012;20:1872.


31. Cox SR, Ritchie SJ, Tucker-Drob EM, et al. Ageing and brain white mat-

ter structure in 3,513 UK biobank participants. Nat Commun 2016;7:

13629.

32. Pierpaoli C, Jones DK. Removing CSF contamination in brain DT-MRIs

by using a two-compartment tensor model. Proc Int Soc Magn Reson

Med 2004;11.

33. Filippone M, Engler R. Enabling scalable stochastic gradient-based

inference for Gaussian processes by employing the Unbiased LInear

System SolvEr (ULISSE). Int Conf Mach Learn. Volume 37: PMLR; 2015.

p. 1015-1024.

34. Snelson E, Ghahramani Z. Sparse Gaussian processes using pseudo-

inputs. Neural Information Processing Systems. 2005.

35. Saatçi Y. Scalable inference for structured Gaussian process models.

Citeseer. 2012.

36. Marek S, Tervo-Clemmens B, Calabro FJ, et al. Reproducible brain-

wide association studies require thousands of individuals. Nature 2022;

603(7902):654-660.

37. Bishop CM. Pattern recognition and machine learning: All “just the facts 101” Material. New Delhi: Springer (India) Private Limited; 2013.

38. Casari A, Zheng A. Feature engineering for machine learning.

Sebastopol, CA: O’Reilly Media, Inc; 2018. p 218.

39. Davison AC. Extreme values. Encyclopedia of Biostatistics. 2005, 4.

Journal of Magnetic Resonance Imaging

Hafiz_R_et_al_Supplementary_Material.docx

Data

January 2024

Rakibul Hafiz · Mustafa Okan Irfanoglu · Amritha Nayak · Carlo Pierpaoli

Download

Neuroimaging Findings in US Government Personnel and Their Family Members Involved in Anomalous Health Incidents

Article

Mar 2024
J Am Med Assoc

Importance US government personnel stationed internationally have reported anomalous health incidents (AHIs), with some individuals experiencing persistent debilitating symptoms. Objective To assess the potential presence of magnetic resonance imaging (MRI)–detectable brain lesions in participants with AHIs, with respect to a well-matched control group. Design, Setting, and Participants This exploratory study was conducted at the National Institutes of Health (NIH) Clinical Center and the NIH MRI Research Facility between June 2018 and November 2022. Eighty-one participants with AHIs and 48 age- and sex-matched control participants, 29 of whom had similar employment as the AHI group, were assessed with clinical, volumetric, and functional MRI. A high-quality diffusion MRI scan and a second volumetric scan were also acquired during a different session. The structural MRI acquisition protocol was optimized to achieve high reproducibility. Forty-nine participants with AHIs had at least 1 additional imaging session approximately 6 to 12 months from the first visit. Exposure AHIs. Main Outcomes and Measures Group-level quantitative metrics obtained from multiple modalities: (1) volumetric measurement, voxel-wise and region of interest (ROI)–wise; (2) diffusion MRI–derived metrics, voxel-wise and ROI-wise; and (3) ROI-wise within-network resting-state functional connectivity using functional MRI. Exploratory data analyses used both standard, nonparametric tests and bayesian multilevel modeling. Results Among the 81 participants with AHIs, the mean (SD) age was 42 (9) years and 49% were female; among the 48 control participants, the mean (SD) age was 43 (11) years and 42% were female. Imaging scans were performed as early as 14 days after experiencing AHIs with a median delay period of 80 (IQR, 36-544) days. After adjustment for multiple comparisons, no significant differences between participants with AHIs and control participants were found for any MRI modality. 
At an unadjusted threshold ( P < .05), compared with control participants, participants with AHIs had lower intranetwork connectivity in the salience networks, a larger corpus callosum, and diffusion MRI differences in the corpus callosum, superior longitudinal fasciculus, cingulum, inferior cerebellar peduncle, and amygdala. The structural MRI measurements were highly reproducible (median coefficient of variation <1% across all global volumetric ROIs and <1.5% for all white matter ROIs for diffusion metrics). Even individuals with large differences from control participants exhibited stable longitudinal results (typically, <±1% across visits), suggesting the absence of evolving lesions. The relationships between the imaging and clinical variables were weak (median Spearman ρ = 0.10). The study did not replicate the results of a previously published investigation of AHIs. Conclusions and Relevance In this exploratory neuroimaging study, there were no significant differences in imaging measures of brain structure or function between individuals reporting AHIs and matched control participants after adjustment for multiple comparisons.

Editorial for "Pscore": A Novel Percentile-Based Metric to Accurately Assess Individual Deviations in Non-Gaussian Distributions of Quantitative MRI Metrics

Article

Feb 2024
J MAGN RESON IMAGING

Deviations from normative brain white and gray matter structure are associated with psychopathology in youth

Article

Full-text available

Dec 2022

Combining imaging modalities and metrics that are sensitive to various aspects of brain structure and maturation may help identify individuals that show deviations in relation to same-aged peers, and thus benefit early-risk-assessment for mental disorders. We used one timepoint multimodal brain imaging, cognitive, and questionnaire data from 1280 eight-to twenty-one-year-olds from the Philadelphia Neurodevelopmental Cohort. We estimated age-related gray and white matter properties and estimated individual deviation scores using normative modeling. Next, we tested for associations between the estimated deviation scores, and with psy-chopathology domain scores and cognition. More negative deviations in DTI-based fractional anisotropy (FA) and the first principal eigenvalue of the diffusion tensor (L1) were associated with higher scores on psychosis positive and prodromal symptoms and general psychopathology. A more negative deviation in cortical thickness (CT) was associated with a higher general psychopathology score. Negative deviations in global FA, surface area, L1 and CT were also associated with poorer cognitive performance. No robust associations were found between the deviation scores based on CT and DTI. The low correlations between the different multimodal magnetic resonance imaging-based deviation scores suggest that psychopathological burden in adolescence can be mapped onto partly distinct neurobiological features.

Reproducible brain-wide association studies require thousands of individuals

Article

Full-text available

Mar 2022
NATURE

Magnetic resonance imaging (MRI) has transformed our understanding of the human brain through well-replicated mapping of abilities to specific structures (for example, lesion studies) and functions1–3 (for example, task functional MRI (fMRI)). Mental health research and care have yet to realize similar advances from MRI. A primary challenge has been replicating associations between inter-individual differences in brain structure or function and complex cognitive or mental health phenotypes (brain-wide association studies (BWAS)). Such BWAS have typically relied on sample sizes appropriate for classical brain mapping⁴ (the median neuroimaging study sample size is about 25), but potentially too small for capturing reproducible brain–behavioural phenotype associations5,6. Here we used three of the largest neuroimaging datasets currently available—with a total sample size of around 50,000 individuals—to quantify BWAS effect sizes and reproducibility as a function of sample size. BWAS associations were smaller than previously thought, resulting in statistically underpowered studies, inflated effect sizes and replication failures at typical sample sizes. As sample sizes grew into the thousands, replication rates began to improve and effect size inflation decreased. More robust BWAS effects were detected for functional MRI (versus structural), cognitive tests (versus mental health questionnaires) and multivariate methods (versus univariate). Smaller than expected brain–phenotype associations and variability across population subsamples can explain widespread BWAS replication failures. In contrast to non-BWAS approaches with larger effects (for example, lesions, interventions and within-person), BWAS reproducibility requires samples with thousands of individuals.

Warped Bayesian linear regression for normative modelling of big data

Article

Full-text available

Dec 2021
NEUROIMAGE

Normative modelling is becoming more popular in neuroimaging due to its ability to make predictions of deviation from a normal trajectory at the level of individual participants. It allows the user to model the distribution of several neuroimaging modalities, giving an estimation for the mean and centiles of variation. With the increase in the availability of big data in neuroimaging, there is a need to scale normative modelling to big data sets. However, the scaling of normative models has come with several challenges. So far, most normative modelling approaches used Gaussian process regression, and although suitable for smaller datasets (up to a few thousand participants) it does not scale well to the large cohorts currently available and being acquired. Furthermore, most neuroimaging modelling methods that are available assume the predictive distribution to be Gaussian in shape. However, deviations from Gaussianity can be frequently found, which may lead to incorrect inferences, particularly in the outer centiles of the distribution. In normative modelling, we use the centiles to give an estimation of the deviation of a particular participant from the ‘normal’ trend. Therefore, especially in normative modelling, the correct estimation of the outer centiles is of utmost importance, which is also where data are sparsest. Here, we present a novel framework based on Bayesian linear regression with likelihood warping that allows us to address these problems, that is, to correctly model non-Gaussian predictive distributions and scale normative modelling elegantly to big data cohorts. In addition, this method provides likelihood-based statistics, which are useful for model selection. To evaluate this framework, we use a range of neuroimaging-derived measures from the UK Biobank study, including image-derived phenotypes (IDPs) and whole-brain voxel-wise measures derived from diffusion tensor imaging. 
We show good computational scaling and improved accuracy of the warped BLR for certain IDPs and voxels when the residuals of these parameters deviated from normality. The present results indicate the advantage of a warped BLR in terms of computational scalability and the flexibility to incorporate non-linearity and non-Gaussianity of the data, giving a wider range of neuroimaging datasets that can be correctly modelled.

A comparative study of gradient nonlinearity correction strategies for processing diffusion data obtained with ultra‐strong gradient MRI scanners

Article

Oct 2020
MAGN RESON MED

Purpose The analysis of diffusion data obtained under large gradient nonlinearities necessitates corrections during data reconstruction and analysis. While two such preprocessing pipelines have been proposed, no comparative studies assessing their performance exist. Furthermore, both pipelines neglect the impact of subject motion during acquisition, which, in the presence of gradient nonlinearities, induces spatio‐temporal B‐matrix variations. Here, spatio‐temporal B‐matrix tracking (STB) is proposed and its performance compared to established pipelines. Methods Diffusion tensor MRI (DT‐MRI) was performed using a 300 mT/m gradient system. Data were acquired with volunteers positioned in regions with pronounced gradient nonlinearities, and used to compare the performance of six different processing pipelines, including STB. Results Up to 30% errors were observed in DT‐MRI parameter estimates when neglecting gradient nonlinearities. Moreover, the order in which B0 inhomogeneity, eddy current and gradient nonlinearity corrections were performed was found to impact the consistency of parameter estimates significantly. Although no pipeline emerged as a clear winner, the STB approach seemed to yield the most consistent parameter estimates under large gradient nonlinearities. Conclusions Under large gradient nonlinearities, the choice of preprocessing pipeline significantly impacts the estimated diffusion parameters. Motion‐induced spatio‐temporal B‐matrix variations can lead to systematic bias in the parameter estimates, which can be ameliorated using the proposed STB framework.

Evaluation of Six Phase Encoding Based Susceptibility Distortion Correction Methods for Diffusion MRI

Article

Dec 2019

Purpose: Susceptibility distortions impact diffusion MRI data analysis and are typically corrected during preprocessing. Correction strategies involve three classes of methods: registration to a structural image, the use of a fieldmap, or the use of images acquired with opposing phase encoding directions. It has been demonstrated that phase encoding based methods outperform the other two classes, but unfortunately, the choice of which phase encoding based method to use is still an open question due to the absence of any systematic comparisons. Methods: In this paper we quantitatively evaluated six popular phase encoding based methods for correcting susceptibility distortions in diffusion MRI data. We employed a framework that allows for the simulation of realistic diffusion MRI data with susceptibility distortions. We evaluated the ability of the methods to correct distortions by comparing the corrected data with the ground truth. Four diffusion tensor metrics (FA, MD, eigenvalues and eigenvectors) were calculated from the corrected data and compared with the ground truth. We also validated two popular indirect metrics using both simulated data and real data. The two indirect metrics are the difference between the corrected LR and AP data, and the FA standard deviation over the corrected LR, RL, AP, and PA data. Results: We found that DR-BUDDI and TOPUP offered the most accurate and robust correction compared to the other four methods using both direct and indirect evaluation metrics. EPIC and HySCO performed well in correcting b0 images but produced poor corrections for diffusion weighted volumes, and they also produced large errors for the four diffusion tensor metrics. We also demonstrate that the indirect metric (the difference between corrected LR and AP data) gives a different ordering of correction quality than the direct metric. Conclusion: We suggest that researchers use DR-BUDDI or TOPUP for susceptibility distortion correction. 
The two indirect metrics (the difference between corrected LR and AP data, and the FA standard deviation) should be interpreted together as a measure of distortion correction quality. The performance ranking of the various tools inferred from direct and indirect metrics differs slightly. However, across all tools, the results of direct and indirect metrics are highly correlated indicating that the analysis of indirect metrics may provide a good proxy of the performance of a correction tool if assessment using direct metrics is not feasible.

Ageing and brain white matter structure in 3,513 UK Biobank participants

Article

Dec 2016

Quantifying the microstructural properties of the human brain's connections is necessary for understanding normal ageing and disease. Here we examine brain white matter magnetic resonance imaging (MRI) data in 3,513 generally healthy people aged 44.64–77.12 years from the UK Biobank. Using conventional water diffusion measures and newer, rarely studied indices from neurite orientation dispersion and density imaging, we document large age associations with white matter microstructure. Mean diffusivity is the most age-sensitive measure, with negative age associations strongest in the thalamic radiation and association fibres. White matter microstructure across brain tracts becomes increasingly correlated in older age. This may reflect an age-related aggregation of systemic detrimental effects. We report several other novel results, including age associations with hemisphere and sex, and comparative volumetric MRI analyses. Results from this unusually large, single-scanner sample provide one of the most extensive characterizations of age associations with major white matter tracts in the human brain.

DR-TAMAS: Diffeomorphic Registration for Tensor Accurate Alignment of Anatomical Structures

Article

Feb 2016
NEUROIMAGE

In this work, we propose DR-TAMAS (Diffeomorphic Registration for Tensor Accurate alignMent of Anatomical Structures), a novel framework for intersubject registration of Diffusion Tensor Imaging (DTI) data sets. This framework is optimized for brain data and its main goal is to achieve an accurate alignment of all brain structures, including white matter (WM), gray matter (GM), and spaces containing cerebrospinal fluid (CSF). Currently most DTI-based spatial normalization algorithms emphasize alignment of anisotropic structures. While some diffusion-derived metrics, such as diffusion anisotropy and tensor eigenvector orientation, are highly informative for proper alignment of WM, other tensor metrics such as the trace or mean diffusivity (MD) are fundamental for a proper alignment of GM and CSF boundaries. Moreover, it is desirable to include information from structural MRI data, e.g., T1-weighted or T2-weighted images, which are usually available together with the diffusion data. The fundamental property of DR-TAMAS is to achieve global anatomical accuracy by incorporating in its cost function the most informative metrics locally. Another important feature of DR-TAMAS is a symmetric time-varying velocity-based transformation model, which enables it to account for potentially large anatomical variability in healthy subjects and patients. The performance of DR-TAMAS is evaluated with several data sets and compared with other widely-used diffeomorphic image registration techniques employing both full tensor information and/or DTI-derived scalar maps. Our results show that the proposed method has excellent overall performance in the entire brain, while being equivalent to the best existing methods in WM.

The importance of correcting for signal drift in diffusion MRI

Article

Jan 2016
MAGN RESON MED

Purpose: To investigate previously unreported effects of signal drift as a result of temporal scanner instability on diffusion MRI data analysis and to propose a method to correct this signal drift. Methods: We investigated the signal magnitude of non-diffusion-weighted EPI volumes in a series of diffusion-weighted imaging experiments to determine whether signal magnitude changes over time. Different scan protocols and scanners from multiple vendors were used to verify this on phantom data, and the effects on diffusion kurtosis tensor estimation in phantom and in vivo data were quantified. Scalar metrics (eigenvalues, fractional anisotropy, mean diffusivity, mean kurtosis) and directional information (first eigenvectors and tractography) were investigated. Results: Signal drift, a global signal decrease with subsequently acquired images in the scan, was observed in phantom data on all three scanners, with varying magnitudes up to 5% in a 15-min scan. The signal drift has a noticeable effect on the estimation of diffusion parameters. All investigated quantitative parameters as well as tractography were affected by this artifactual signal decrease during the scan. Conclusion: By interspersing the non-diffusion-weighted images throughout the session, the signal decrease can be estimated and compensated for before data analysis, minimizing the detrimental effects on subsequent MRI analyses.

Synthesized b0 for diffusion distortion correction (Synb0-DisCo)

Article

May 2019
MAGN RESON IMAGING

Diffusion magnetic resonance images typically suffer from spatial distortions due to susceptibility induced off-resonance fields, which may affect the geometric fidelity of the reconstructed volume and cause mismatches with anatomical images. State-of-the-art susceptibility correction (for example, FSL's TOPUP algorithm) typically requires data acquired twice with reverse phase encoding directions, referred to as blip-up blip-down acquisitions, in order to estimate an undistorted volume. Unfortunately, not all imaging protocols include a blip-up blip-down acquisition, and those that do not cannot take advantage of the state-of-the-art susceptibility and motion correction capabilities. In this study, we aim to enable TOPUP-like processing with historical and/or limited diffusion imaging data that include only a structural image and a single blip diffusion image. We utilize deep learning to synthesize an undistorted non-diffusion weighted image from the structural image, and use the non-distorted synthetic image as an anatomical target for distortion correction. We evaluate the efficacy of this approach (named Synb0-DisCo) and show that our distortion correction process results in better matching of the geometry of undistorted anatomical images, reduces variation in diffusion modeling, and is practically equivalent to having both blip-up and blip-down non-diffusion weighted images.

Regression quantiles

Article

Jan 1978
ECONOMETRICA
