“Pscore”: A Novel Percentile-Based Metric to Accurately Assess Individual Deviations in Non-Gaussian Distributions of Quantitative MRI Metrics

Abstract

Background: Quantitative magnetic resonance imaging (MRI) metrics could be used in personalized medicine to assess individuals against normative distributions. Conventional Zscore analysis is inadequate in the presence of non-Gaussian distributions. Therefore, if quantitative MRI metrics deviate from normality, an alternative is needed.
Purpose: To confirm non-Gaussianity of diffusion MRI (dMRI) metrics on a publicly available dataset, and to propose a novel percentile-based method, “Pscore,” to address this issue.
Study Type: Retrospective cohort.
Population: Nine hundred and sixty-one healthy young adults (age: 22–35 years, females: 53%) from the Human Connectome Project.
Field Strength/Sequence: 3-T, spin-echo diffusion echo-planar imaging, T1-weighted: MPRAGE.
Assessment: The dMRI data were preprocessed using the TORTOISE pipeline. Forty-eight regions of interest (ROIs) from the JHU atlas were redrawn on a study-specific diffusion tensor (DT) template and average values were computed from various DT and mean apparent propagator (MAP) metrics. For each ROI, percentile ranks across participants were computed to generate “Pscores,” which normalized the difference between the median and a participant's value by the corresponding difference between the median and the 5th/95th percentile values.
Statistical Tests: ROI-wise distributions were assessed using log transformations, Zscore, and the “Pscore” methods. The percentages of extreme values above-95th and below-5th percentile boundaries (PEV>95 (%), PEV<5 (%)) were also assessed in the overall white matter. Bootstrapping was performed to test the reliability of Pscores in small samples (N = 100) using 100 iterations.
Results: The dMRI metric distributions were systematically non-Gaussian, including positively skewed (eg, mean and radial diffusivity) and negatively skewed (eg, fractional and propagator anisotropy) metrics. This resulted in unbalanced tails in Zscore distributions (PEV>95 ≠ 5%, PEV<5 ≠ 5%) whereas “Pscore” distributions were symmetric and balanced (PEV>95 = PEV<5 = 5%), even for small bootstrapped samples (average PEV>95 = PEV<5 = 5 ± 0% [SD]).
Data Conclusion: The inherent skewness observed for dMRI metrics may preclude the use of conventional Zscore analysis. The proposed “Pscore” method may help estimate individual deviations more accurately in skewed normative data, even from small datasets.
Level of Evidence: 1
Technical Efficacy: Stage 1
RESEARCH ARTICLE
Rakibul Hafiz, PhD,¹* M. Okan Irfanoglu, PhD,¹ Amritha Nayak, ME,¹,²,³ and Carlo Pierpaoli, MD, PhD¹
J. MAGN. RESON. IMAGING 2024.
View this article online at wileyonlinelibrary.com. DOI: 10.1002/jmri.29248
Received Dec 6, 2023, Accepted for publication Jan 9, 2024.
*Address reprint requests to: R.H., Building 13, Room 3W43, Bethesda, MD 20892, USA. E-mail: rakibul.hafiz@nih.gov
From the ¹Laboratory on Quantitative Medical Imaging, National Institute of Biomedical Imaging and Bioengineering,
Bethesda, Maryland, USA; ²Military Traumatic Brain Injury Initiative (MTBI2, formerly known as the Center for
Neuroscience and Regenerative Medicine [CNRM]), Bethesda, Maryland, USA; and ³The Henry Jackson Foundation for the
Advancement of Military Medicine, Bethesda, Maryland, USA.
Additional supporting information may be found in the online version of this article.
This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License,
which permits use and distribution in any medium, provided the original work is properly cited, the use is
non-commercial and no modifications or adaptations are made.
Neuroimaging studies in clinical research typically rely on group-level analyses, delineating summary
outcomes that differentiate a patient cohort from a group of healthy controls. However, in clinical
practice, there is a need to assess individual patients. This is typically done by building a model to
evaluate the individual subject against a normative sample [1–3]. Quantitative magnetic resonance imaging
(MRI) metrics can provide normative data against which individual deviations can be assessed. Therefore,
once their accuracy and reliability are verified, quantitative MRI metrics could become highly relevant
for clinical assessment of individual patients.
Individual assessments against a normative distribution are often performed using Zscores in clinical and
neuroimaging paradigms [2–4]. However, a thorough assessment of the distribution that generates the
normative dataset itself is rarely performed. For example, Gaussianity and homogeneity of variance,
which are fundamental assumptions of parametric models, are seldom checked. Deviation from these
assumptions can result in biased statistical inferences, eg, in neuropsychological test scores [5].
Quantile regressions have been adopted to mitigate this by generating normative distributions at specific
percentiles [6]. These can offer a solution based on distributions at selected percentile/quantile ranks
and might avoid the use of conventional mean regression models that would be biased for asymmetric
distributions.
The problem is pervasive even for quantitative MRI metrics. In a pilot study (N = 48), we recently showed
prominent deviation from normality and heavy-tailed distributions of several diffusion tensor (DT) and
mean apparent propagator (MAP) metrics [7–10]. For example, mean diffusivity (MD) had a positively skewed
distribution, while fractional anisotropy (FA) and propagator anisotropy (PA) showed a negatively skewed
distribution [10]. In our small sample, these non-Gaussian features were inherently related to the
diffusion characteristics of water in the brain, and did not originate from heterogeneity in the
underlying population, as has been previously reported in large-scale public datasets [1,3]. For example,
for the UK Biobank data, Fraza et al fit a warped Gaussian model to compensate for the skewness and
kurtosis in the normative data before using Zscores to assess heterogeneous individuals [1,11]. Another
approach popularly used to address skewness is log transformation, but it can worsen skewness and lead
to inaccurate and biased inferences [12,13]. Our initial assessment showed that even when comparing
individuals within the normative sample, Zscores may show an imbalance in extreme values at the tails:
more negative extreme values for negatively skewed distributions and more positive extreme values for
positively skewed distributions [10]. Therefore, comparing patients against such distributions using
Zscores might wrongly place some of them in the extremities of the distribution and could introduce
false-positive findings.
In this study, we aimed to 1) test whether the non-Gaussianity in diffusion MRI (dMRI) metrics can be
reproduced in a large sample comprising a homogeneous group of healthy young participants; 2) assess
whether a percentile-based “Pscore” could accurately estimate an individual's deviation from the central
tendencies of a normative distribution; and 3) compare its accuracy against two popular normalization
methods: “Zscore” and “Log” transformation.
Materials and Methods
Participants
We used the Human Connectome Project Young Adult (HCP-YA) cohort from the S1200 series, which has at
least 1113 participants with 3-T MRI data available from a pool of 1200 young adults (age range:
22–35 years), all of whom were recruited with written informed consent following the guidelines of the
institutional review board (IRB) [14]. The data include structural MRI, functional MRI (fMRI), and dMRI
data along with behavioral and genetic testing. For this study, our primary interest was the dMRI data.
About 100 (out of 1200) participants either did not have dMRI data or the parameter information on the
dMRI acquisition was incomplete. From the remaining pool of 1100 subjects, 129 participants were excluded
because, based on visual inspection, the distortions in their subject-space dMRI data were beyond the
scope of correction. Ten of the remaining 971 participants were 36+ years of age, and they were excluded
to keep the sample more homogeneous, within the 22–35 years range. Thus, the effective sample included
neuroimaging data from 961 participants (53% females). All participants were scanned on the same
equipment, using the same protocol.
Diffusion MRI Protocol
The Connectome 3 T Skyra scanner (Siemens Healthineers, Erlangen, Germany) was used to acquire dMRI data
across six runs in a full session. Each run was approximately 9 minutes and 50 seconds long, with three
different gradient tables, and each table was acquired with two phase-encoding directions: right-to-left
and left-to-right. Each gradient table consisted of six b = 0 s/mm² acquisitions and approximately
90 diffusion weighting directions. The diffusion weighting was done across three b-shells (1000, 2000,
and 3000 s/mm²), each with an equal number of acquisitions per run. The imaging parameters were as
follows: spin-echo echo-planar imaging (EPI) sequence with repetition time (TR) = 5520 msec, echo time
(TE) = 89.5 msec, flip angle = 78°, refocusing flip angle = 160°, field of view (FOV) = 210 × 180,
matrix = 168 × 144, slice thickness = 1.25 mm, 111 slices acquired at 1.25 mm isotropic resolution, a
multiband factor of 3, and echo spacing = 0.78 msec. The T1-weighted imaging sequence included
3D MPRAGE images acquired over a period of 7 minutes and 40 seconds at a TR = 2400 msec,
TE = 2.14 msec, inversion time (TI) = 1000 msec, flip angle = 8°, and FOV = 224 × 224 at an isotropic
voxelwise resolution of 0.7 mm.
Preprocessing
We used the TORTOISE V3 (version 3, www.tortoisedti.org) pipeline to process the dMRI data, because
Irfanoglu et al had shown considerable improvement in the dMRI metrics using this pipeline
compared to the released version of the HCP dataset [15,16]. In the following, we briefly describe the
different stages of preprocessing:
1) Denoising was performed using a model-free noise mapping technique proposed by Veraart et al, with a
kernel radius of 3 [17].
2) Gibbs-ringing correction was performed using the subvoxel-shift method, which showed improvements
without introducing additional imperfections [18].
3) For inter-volume motion and eddy current correction, we first applied a MAP-based model, as it is
independent of shelled data and accurately extrapolates the unseen q-vector signals [16,19]. All
diffusion-weighted images (DWIs) were aligned to the ideal b = 0 s/mm² image. As a final step, step 3)
was run once more, but in this instance, the synthesized and the slice-transformed real images were used.
These steps were iterated until convergence was reached.
4) Susceptibility-induced distortions were considered as well, given that acquiring data at high
resolution (as for the HCP) can cause severe EPI distortions [14,20]. We applied the DRBUDDI approach,
which is a blip-up and blip-down distortion correction technique with excellent performance [20,21].
Besides using the b = 0 s/mm² image, DRBUDDI also incorporates DTs and relies on an undistorted
T2-weighted (T2W) structural image [20]. However, the T2W images from the HCP data were not compatible
with DRBUDDI; therefore, we used a machine learning-based technique called SynB0-DisCo to generate a
structural image that fit well into the DRBUDDI paradigm [22].
5) Gradient nonlinearity correction was used, as the HCP data come with “gradwarped” DWIs and
gradient-deviation tensor images [14]. Effects of inter-volume motion were not considered when a single
gradient-deviation tensor was used for all DWIs. Therefore, we also computed the voxelwise B matrices to
take these effects into consideration [23].
6) Signal drift correction was applied because, given the long scan time, signal drift was observed in
the HCP data; it was corrected using a previously proposed method [24].
7) Normalization and template generation were performed on the processed data at both the HCP isotropic
resolution of 1.25 mm and at the 1 mm resolution of the processed T1-weighted (T1W) image. A DT-based
registration was applied, and an atlas was generated using the 1 mm resolution data; the DWIs were then
warped onto the template space using non-linear transformation [25,26].
Diffusion MRI Metrics
We generated voxelwise maps for four DT metrics: FA, MD, axial diffusivity (AD), and radial diffusivity
(RD) [27]. We also generated voxelwise maps for five MAP metrics: PA, return to axis probability (RTAP),
return to origin probability (RTOP), return to plane probability (RTPP), and non-Gaussianity (NG) [9].
Each of these metrics carries useful quantitative information about the various diffusion behaviors of
water in the brain.
Quality Control Assessment
Quality control assessments are vital for neuroimaging applications, especially those that involve
complex interpolations and multiple preprocessing steps. We took several steps in assessing each dMRI
metric map for each subject. All dMRI maps registered to the study template were first visually
inspected. The maps from 961 subjects were checked for misregistration and abnormal warping using an
in-house custom-built script based on modules from SPM12 (update revision number: 7771,
http://www.fil.ion.ucl.ac.uk/spm/) within a MATLAB environment (version: R2022b, MathWorks Inc.,
Massachusetts, USA). The script generates contour lines of all maps over the corresponding subject's T2W
image in the template space. Subjects that failed the quality control assessment were checked again and
appropriate steps were taken to correct the registration issues. The datasets that could not be salvaged
despite this extensive correction pipeline were removed from further analysis, leaving an effective
sample of 960 and 912 participants for the DT and MAP metrics, respectively.
Regions of Interest
To reduce the number of tests and focus on specific white matter (WM) regions, we used a set of regions
of interest (ROIs) for the current study. The ROIs were inspired by the Johns Hopkins University (JHU) WM
ROIs; however, they were manually redrawn (by A.N., over 13 years of experience in the field of
neuroanatomy and neuroimaging) on an average DT brain template built from the HCP dataset to avoid issues
with left/right structural asymmetry that have been reported for the original JHU ROIs [19,25,28–30].
Moreover, the JHU ROIs were defined in a scalar map and used scalar-based registration, which tends to
produce misregistrations [25]. We used a tensor-based registration, which provides better alignment [25].
The JHU ROIs are quite large; therefore, careful steps were taken when the ROIs were redrawn to ensure
they were “within” the tracts/structures to reduce partial volume effects. The ROI labels were created
and the average DT and MAP metrics were computed for each subject across all ROIs using ITK-SNAP
(version 3.6.0, www.itksnap.org) [29]. Figure 1 shows the ROIs in a detailed montage across all three
orthogonal planes for better visualization and assessment. Table S1 in the Supplemental Material
provides more details on these ROIs.
The Pscore Method
The first step in computing Pscores was to generate percentile ranks for each participant. They were
computed within each ROI using the following formula:
\[ p_{ij} = \frac{n_{NS \le x_{ij}}}{n_{NS}} \times 100; \quad i = 1, 2, 3, \ldots, N, \; j = 1, 2, 3, \ldots, M, \tag{1} \]
where $x_{ij}$ represents the average dMRI metric value of the $i$th individual for the $j$th ROI, $i$
indexes the $1 \ldots N$ (960) participants, and $j$ indexes the $1 \ldots M$ (48) ROIs. Furthermore,
$n_{NS \le x_{ij}}$ represents the number of participants within the normative sample having a value
$\le x_{ij}$. The denominator $n_{NS}$ represents the total number of participants in the normative
sample.
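As an illustration only, Equation 1 can be sketched in a few lines of Python (the study itself was analyzed in R). The function name `percentile_ranks`, the toy values, and the choice to count participants "at or below" a value are assumptions made for this sketch, not details confirmed by the text.

```python
from bisect import bisect_right

def percentile_ranks(values):
    """Percentile rank of each value: share of the sample at or below it, times 100."""
    ordered = sorted(values)
    n = len(ordered)
    # bisect_right returns how many sorted entries are <= v (assumed counting rule).
    return [100.0 * bisect_right(ordered, v) / n for v in values]

# Made-up average metric values of five participants in one ROI.
roi_values = [0.61, 0.58, 0.66, 0.49, 0.63]
ranks = percentile_ranks(roi_values)
print(ranks)
```

In the study this computation would be repeated for each of the 48 ROIs, with the full normative sample as input.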
After the percentile computation, each participant's position in the ROI-wise distribution was considered
to assess which tail they fell in. This was identified by the difference between the participant's metric
value and the median of the distribution. This difference was then normalized by the difference between
the median and either the 5th or the 95th percentile edge value, depending on the participant's position.
The following equations depict how a Pscore was computed on either side of the median:
FIGURE 1: The regions of interest (ROIs) used in the current study. A 2 × 5 montage was created in all orthogonal planes to show the
spatial extent of the ROIs overlaid on the average connectome diffusion tensor (FA) template. The 48 ROIs are shown using a colormap
with a range of 64 colors. The slice labels with the correct direction are provided on the top-left of each image in the montage. The
“A,” “P,” “I,” “S,” “L,” and “R” labels represent the anterior, posterior, inferior, superior, left, and right directions, respectively.
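\[ P5_{ij} = 1.645 \times \frac{dx_{ij}}{d_{5}}; \quad \begin{cases} dx_{ij} = x_{ij} - m_j, & x_{ij} < m_j \\ d_{5} = m_j - x_{j5}, & x_{ij} < m_j \end{cases} \tag{2} \]
\[ P95_{ij} = 1.645 \times \frac{dx_{ij}}{d_{95}}; \quad \begin{cases} dx_{ij} = x_{ij} - m_j, & x_{ij} > m_j \\ d_{95} = x_{j95} - m_j, & x_{ij} > m_j \end{cases} \tag{3} \]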
where $dx_{ij}$ is the difference between $x_{ij}$ and the median of the distribution, $m_j$, for the
$j$th ROI. If the metric value of the $i$th participant $x_{ij} < m_j$, then $dx_{ij} < 0$, which means
that the participant was located on the left-hand side of $m_j$, toward the 5th percentile edge value,
$x_{j5}$, of the distribution. The corresponding denominator $d_{5}$ was computed as the difference
between $m_j$ and $x_{j5}$. On the other hand, $dx_{ij} > 0$ when $x_{ij} > m_j$. This indicated that the
participant's position was on the right-hand side of $m_j$, toward the 95th percentile edge value,
$x_{j95}$, and the denominator $d_{95}$ was computed as the difference between $x_{j95}$ and $m_j$. Both
$P5_{ij}$ and $P95_{ij}$ take on the polarity of $dx_{ij}$, generating negative and positive scores,
respectively. The ratios of these differences were then scaled by $|1.645|$, the magnitude of the Zscore
value corresponding to the 5th and 95th percentiles of a normal distribution. This was done to bring the
Pscores to the scale of Zscores and make them comparable.
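A minimal Python sketch of Equations 2 and 3 follows, under stated assumptions: the helper names `normative_edges` and `pscore` are hypothetical, the edge values in the example are made up, and the percentile estimator (`statistics.quantiles` with `method="inclusive"`) is one possible choice because the text does not specify an interpolation rule.

```python
import statistics

Z95 = 1.645  # Zscore magnitude at the 5th/95th percentile of a normal distribution

def normative_edges(values):
    """Median, 5th and 95th percentile edge values of one ROI's normative sample."""
    # n=20 yields cut points at 5%, 10%, ..., 95%; the interpolation rule is an assumption.
    cuts = statistics.quantiles(values, n=20, method="inclusive")
    return statistics.median(values), cuts[0], cuts[-1]

def pscore(x, median, x5, x95):
    """Pscore of one value; assumes x5 < median < x95 so the denominators are non-zero."""
    dx = x - median
    if dx < 0:
        return Z95 * dx / (median - x5)   # left of the median: normalize by d5
    if dx > 0:
        return Z95 * dx / (x95 - median)  # right of the median: normalize by d95
    return 0.0                            # exactly at the median

# Hypothetical, positively skewed ROI: the right tail is five times longer than the left.
median, x5, x95 = 10.0, 8.0, 20.0
left = pscore(9.0, median, x5, x95)    # halfway between the median and the 5th percentile edge
right = pscore(15.0, median, x5, x95)  # halfway between the median and the 95th percentile edge
print(left, right)
```

In this toy case both values sit halfway to their respective percentile edge, so they receive Pscores of equal magnitude and opposite sign even though their raw distances from the median differ by a factor of five.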
Statistical Analysis
We analyzed the data ROI-wise in R (R Core Team (2023); Version: 4.3.2; R: A Language and Environment for
Statistical Computing; R Foundation for Statistical Computing, Vienna, Austria;
https://www.R-project.org/) using three normalization techniques: log transformation, standardized
Zscore, and the “Pscore” method proposed in this study. Histogram distributions of the raw data were used
as a reference to assess the distributions from these three methods. For each histogram plot, the mean
and median lines were added to assess deviations from the mode and misalignment of these central tendency
measures. A normal density curve was fit on top of the histograms to assess which normalization method
was closer to a Gaussian distribution. This led to four figures per ROI for each dMRI metric, generating
1728 figures (4 figures × 48 ROIs × 9 dMRI metrics).
To summarize and simplify, we took the Zscore and Pscore values from the entire sample across all ROIs
and concatenated them into two single-column vectors. For example, for a DT metric such as FA, each ROI
had Zscores from 960 healthy individuals. This would create a vector of 46,080 Zscores (960 Zscores ×
48 ROIs), and the same was true for Pscores. For the MAP metrics, since data from 912 individuals
survived the quality control step, the vector had 43,776 Zscores (912 Zscores × 48 ROIs). These vectors
were then used to generate two distribution plots: one for Zscores and the other for Pscores. The process
was repeated for each DT and MAP metric. The log transformation method was not included in this step, as
we already mentioned the issues with this approach, and it performed very poorly at the ROI level (Fig. 2,
Figs. S1 and S2 in the Supplemental Material). Additionally, log-transformed values were not on the same
scale as Zscores and Pscores, making them incompatible for this comparison. Since the Pscores were on the
same scale as Zscores, they could be compared and assessed together. This was to showcase which of these
two normalization techniques showed a statistical imbalance in the data spread across the entire WM and
over the entire sample.
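The pooling step can be sketched as follows. This is an illustrative Python fragment, not the authors' R code; the scores are synthetic placeholders drawn from a standard normal distribution, and the variable names are hypothetical.

```python
import random

rng = random.Random(0)
n_subjects, n_rois = 960, 48  # DT metrics; the MAP metrics had 912 subjects

# Placeholder scores: one list of per-subject values for each ROI (synthetic, not HCP data).
scores_by_roi = [[rng.gauss(0.0, 1.0) for _ in range(n_subjects)] for _ in range(n_rois)]

# Concatenate the ROI-wise scores into one long vector for the overall white matter.
pooled = [score for roi_scores in scores_by_roi for score in roi_scores]

# Share of negative and positive values in the pooled vector, in percent.
share_negative = 100.0 * sum(s < 0 for s in pooled) / len(pooled)
share_positive = 100.0 * sum(s > 0 for s in pooled) / len(pooled)
print(len(pooled), round(share_negative, 1), round(share_positive, 1))
```

The pooled vector has 960 × 48 entries for a DT metric; with 912 subjects the same construction gives 912 × 48 entries for a MAP metric.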
More importantly, for each metric, this step also helped to quantify the imbalance in the number and
percentage of extreme values present in the tails of these distributions. A normal Z-distribution would
have 5% of extreme values above the 95th percentile boundary (Z = 1.645) and 5% below the 5th percentile
boundary (Z = −1.645). We quantified and tabulated the number and percentage of these extreme values for
Zscores and Pscores for all dMRI metrics. In the presence of a non-Gaussian distribution, the balance of
5% would be altered in the tails. This helped showcase which of these normalization methods produced a
systematic imbalance of extreme values that may lead to inaccurate assessment of individuals whose values
were in the tails of the distribution.
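To make the tail comparison concrete, the sketch below computes the two percentages of extreme values for Zscores and Pscores of a synthetic, positively skewed sample. It is a hedged illustration in Python: the log-normal sample stands in for a skewed metric and is not dMRI data, the function names are hypothetical, and the percentile estimator is an assumed choice.

```python
import random
import statistics

Z95 = 1.645

def zscores(values):
    mean, sd = statistics.fmean(values), statistics.stdev(values)
    return [(x - mean) / sd for x in values]

def pscores(values):
    m = statistics.median(values)
    # 5th and 95th percentile edges; the interpolation rule is an assumption.
    cuts = statistics.quantiles(values, n=20, method="inclusive")
    d5, d95 = m - cuts[0], cuts[-1] - m
    return [Z95 * (x - m) / (d5 if x < m else d95) for x in values]

def pev(scores):
    """Percentage of scores below -1.645 (lower tail) and above +1.645 (upper tail)."""
    n = len(scores)
    below = 100.0 * sum(s < -Z95 for s in scores) / n
    above = 100.0 * sum(s > Z95 for s in scores) / n
    return below, above

rng = random.Random(42)
# Synthetic, positively skewed "metric" (log-normal); not real dMRI data.
sample = [rng.lognormvariate(0.0, 0.5) for _ in range(10000)]

z_below, z_above = pev(zscores(sample))
p_below, p_above = pev(pscores(sample))
print(f"Zscore: PEV<5 = {z_below:.1f}%, PEV>95 = {z_above:.1f}%")
print(f"Pscore: PEV<5 = {p_below:.1f}%, PEV>95 = {p_above:.1f}%")
```

For a positively skewed sample such as this one, the Zscore tails are expected to be unbalanced (more values above +1.645 than below −1.645), whereas the Pscore tails stay close to 5% each by construction.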
To test how Pscores performed on smaller samples, we performed bootstrapping on the pool of HCP
participants (N = 960). We ran 100 iterations, each time randomly selecting 100 participants and
repeating the Zscore vs. Pscore comparison at the overall WM level. We also performed 20 iterations
independently on the dMRI metric that showed the strongest imbalance in extreme values across all WM
ROIs. This was done for brevity and to showcase the reliability of the Pscore approach on the metric
least expected to conform to Gaussianity.
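The resampling procedure can be sketched as below. This Python fragment is only an approximation of the described analysis: the pool is synthetic (a negatively skewed stand-in for one metric), participants are drawn without replacement within each iteration (the text does not state the sampling scheme), and all names are hypothetical.

```python
import random
import statistics

Z95 = 1.645

def tail_percentages(values):
    """PEV<5 and PEV>95 (in %) for the Zscores and the Pscores of one sample."""
    n = len(values)
    mean, sd = statistics.fmean(values), statistics.stdev(values)
    m = statistics.median(values)
    cuts = statistics.quantiles(values, n=20, method="inclusive")  # assumed estimator
    d5, d95 = m - cuts[0], cuts[-1] - m
    z = [(x - mean) / sd for x in values]
    p = [Z95 * (x - m) / (d5 if x < m else d95) for x in values]

    def pct(scores, lower_tail):
        hits = sum((s < -Z95) if lower_tail else (s > Z95) for s in scores)
        return 100.0 * hits / n

    return {"z_below": pct(z, True), "z_above": pct(z, False),
            "p_below": pct(p, True), "p_above": pct(p, False)}

rng = random.Random(7)
# Synthetic, negatively skewed pool standing in for one metric in 960 participants.
pool = [1.0 - rng.lognormvariate(0.0, 0.5) for _ in range(960)]

# 100 iterations, each drawing 100 participants (without replacement; an assumption).
runs = [tail_percentages(rng.sample(pool, 100)) for _ in range(100)]
summary = {key: (statistics.fmean([r[key] for r in runs]),
                 statistics.stdev([r[key] for r in runs])) for key in runs[0]}
for key, (avg, sd) in summary.items():
    print(f"{key}: {avg:.1f} ± {sd:.1f}%")
```

Each entry of `summary` holds the average and standard deviation of one tail percentage across iterations, which is the form of summary reported for the bootstrapped samples.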
To highlight the level of the imbalance in extreme values in Zscores compared to Pscores, we generated
heatmaps, similar to the ones shown in our pilot study [10]. A heatmap is an intuitive way to assess
individuals through a visual representation of the level and direction in which the extreme values tend
to increase. The heatmaps were generated across all ROIs for the entire population, as well as for a
single iteration of the random sampling consisting of 100 participants, to reduce the complexity of
showcasing 92,160 observations (46,080 Zscores + 46,080 Pscores). Since the pattern was consistent even
in small samples, it can be shown with much better clarity for 100 participants with 9600 observations
(4800 Zscores + 4800 Pscores).
Results
ROI-Wise Comparison of Distributions Across Different Methods
Figure 2 shows some examples of distributions from the normalization techniques tested, within the body
of the corpus callosum (BCC). Distributions from only four dMRI metrics are shown for one ROI; however,
assessments were made across each ROI per dMRI metric. The FA and PA values showed a negative skew, while
MD and RTPP showed a positive skew. In contrast, for all Pscore distributions, the mean, mode, and median
appeared well aligned and they tended to fit closely to the normal density curve. Therefore, compared to
the other normalization methods tested, and particularly the standardized Zscore approach, Pscores
provided a more symmetric distribution per ROI. The Supplemental Material provides more examples from
other dMRI metrics across other ROIs (Figs. S1 and S2).
FIGURE 2: Comparing distributions across different normalization methods for two DT and two MAP metrics from a representative ROI: body of the corpus callosum. The DT metrics
are shown in the top row: FA (left) and MD (right); and the MAP metrics are on the bottom row: PA (left) and RTPP (right). For each metric, four distributions are shown. The “Raw”
(light gray) distribution panel represents the average metric values. The “Log” (dark gray), “Zscore” (light red), and “Pscore” (light blue) distribution panels represent the logarithmic,
standardized Z values and the proposed Pscores, respectively. The “raw” (light gray) distributions demonstrated the presence of skew in the dMRI metrics, which caused the mean,
mode, and median to misalign. For instance, FA (top left) and PA (bottom left) were negatively skewed; however, PA was more heavy-tailed and showed greater skewness. The mean
(dashed green) appeared separated from the median (dashed red) and mode (tallest bar). Comparing the histograms with a fitted normal density curve (dashed purple) also illustrated
the deviation from a Gaussian distribution. These patterns were also clearly visible for the log-transformed (dark gray) and “Zscore” (light red) distributions at varying levels. For both
FA and PA, the “mean” underestimated the most common values and appeared before the median and the mode of the distribution for “Raw,” “Log,” and “Zscore” panels. However,
the “Pscore” panel shows that all three central tendencies coincided well and attained a closer fit to the normal density curve (dashed purple). On the other hand, MD (top right) and
RTPP (bottom right) were positively skewed and the “mean” overestimated the most common values for “Raw,” “Log,” and “Zscore” panels; but the “Pscore” panel shows the
consistent alignment of the mean, mode, and median and a closer fit to the normal density curve.
Zscore vs. Pscore Distributions Across All ROIs
Figure 3 shows the Zscore and Pscore distributions for each dMRI metric, generated from the data across
all WM ROIs comprising the overall WM. In general, Zscores showed an overall imbalance of positive and
negative values for all metrics (less prominent for AD). Zscore distributions for FA, PA, and NG showed
an overall negative skew with <50% negative and >50% positive values. This pattern progressively worsened
for NG and PA, with 54% and 59% positive values, respectively. However, Pscores from all three metrics
maintained a balance of 50% negative and 50% positive values. The rest of the DT and MAP metrics showed
the opposite trend for Zscores, with >50% of total observations being negative. For example, both MD and
RTAP had 51% of all the observations as negative values. Therefore, MD had at least 460 (1% of 46,080)
and RTAP had at least 437 (1% of 43,776) more negative Zscores than positive Zscores in their respective
distributions. It was worse for RD, RTOP, and RTPP, because each of their distributions had 52% negative
values, an excess over 50% that was twice that of MD and RTAP. All Pscore distributions, on the other
hand, maintained a balanced distribution of 50% positive and negative values for every single dMRI
metric.
Assessing the Imbalance in Extreme Values at the Tails
Table 1 shows the imbalance of extreme values at the tail ends of the Zscore distributions and the correct balance of extreme values from the Pscore distributions. For each dMRI metric, it shows the percentage of extreme values above the 95th (PEV >95 (%), Z > 1.645) and below the 5th (PEV <5 (%), Z < −1.645) percentile boundaries for both these normalization methods. Except for AD, where the imbalance was negligible but both tails still held <5% of the values, Zscores from every other dMRI metric showed an imbalance of extreme values in the left or right tails. For example, FA, NG, and PA had progressively larger imbalance between negative and positive values leading to negative skews (Fig. 3), and as a result, a subsequent increase in the percentage of negative extreme values. Compared to PEV >95 (%), the PEV <5 (%) values in these three metrics were higher by approximately 1.2 (4.9/4.1), 1.9 (5.4/2.8), and 5.8 (5.8/1.0) times, respectively. On the other hand, MD, RD, RTAP, RTOP, and RTPP showed positive skews for Zscores (Fig. 3), leading to an increase in positive extreme values. All PEV >95 (%) values for these metrics were >5%, while all PEV <5 (%) values were <5%. In contrast, for the same dMRI metrics, the PEV <5 (%) and PEV >95 (%) values for Pscores maintained a balanced 5% of extreme values in both tails.
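The tail percentages reported above can be illustrated with a minimal sketch. It assumes a synthetic, negatively skewed sample (a reflected gamma distribution standing in for an anisotropy-like metric, not HCP data); the helper names `zscore` and `tail_percentages` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def zscore(values):
    """Conventional Zscore: centre on the mean, scale by the sample SD."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std(ddof=1)

def tail_percentages(scores, edge=1.645):
    """Return (PEV>95, PEV<5): percent of scores above +edge and below -edge."""
    scores = np.asarray(scores, dtype=float)
    pev_hi = 100.0 * np.mean(scores > edge)
    pev_lo = 100.0 * np.mean(scores < -edge)
    return pev_hi, pev_lo

# Synthetic negatively skewed values (assumption: illustrative only, not real dMRI data).
rng = np.random.default_rng(0)
values = 1.0 - rng.gamma(shape=2.0, scale=0.05, size=5000)

pev_hi, pev_lo = tail_percentages(zscore(values))
print(f"Zscore: PEV>95 = {pev_hi:.1f}%, PEV<5 = {pev_lo:.1f}%")
```

For a negatively skewed sample such as this one, the lower tail holds more than 5% of the Zscores and the upper tail fewer, which mirrors the direction of the imbalance described for FA, NG, and PA.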
Bootstrapping the HCP Sample
Figure 4 shows a heatmap of PA generated from one iteration of a random sampling of 100 HCP participants. The heatmap reveals any systematic increase in extreme values in individuals (columns) across multiple ROIs. It also highlights any increase in extreme values present in an ROI (rows) across the normative sample. The heatmap of Zscores (top) shows a large number of negative extreme values and very few positive extreme values, whereas the Pscore heatmap (bottom) shows a proper balance of positive and negative extreme values, as expected of a normative sample. For simplicity, we showcase a heatmap comprising only 100 participants.
Table 2 shows similar quantities as Table 1, but only for PA, across 20 iterations of randomly sampling 100 HCP participants. The rationale to highlight PA was that it showed the highest imbalance in extreme values at the tails, among all dMRI metrics (Fig. 3, Table 1). For all 20 iterations, Zscores of PA showed large proportions of negative extreme values (all PEV <5 (%) > 5% and all PEV >95 (%) < 2%) with on average >4.3 (6.1/1.4) times more negative extreme values than positives. In contrast, Pscores robustly maintained 5% extreme values in both tails for all 20 iterations. Table S2 in the Supplemental Material provides a summary of the bootstrapping assessment of all dMRI metrics across 100 iterations.
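The resampling assessment described above might be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the data are synthetic (a participants × ROIs array of negatively skewed values), sampling is assumed to be without replacement, and `subsample_pev` is a hypothetical helper name.

```python
import numpy as np

def subsample_pev(data, n_sub=100, n_iter=20, edge=1.645, seed=0):
    """Draw n_sub participants n_iter times; return (n_iter, 2) array of
    [PEV>95, PEV<5] for ROI-wise Zscores pooled over all ROIs."""
    rng = np.random.default_rng(seed)
    out = np.empty((n_iter, 2))
    for i in range(n_iter):
        # Assumption: participants are drawn without replacement.
        idx = rng.choice(data.shape[0], size=n_sub, replace=False)
        sub = data[idx]
        # Zscore each ROI (column) within the subsample.
        z = (sub - sub.mean(axis=0)) / sub.std(axis=0, ddof=1)
        out[i, 0] = 100.0 * np.mean(z > edge)
        out[i, 1] = 100.0 * np.mean(z < -edge)
    return out

# Synthetic stand-in for 961 participants x 48 ROIs (not HCP data).
rng = np.random.default_rng(1)
data = 1.0 - rng.gamma(shape=2.0, scale=0.05, size=(961, 48))

results = subsample_pev(data)
print("mean PEV>95, PEV<5:", results.mean(axis=0))
print("SD   PEV>95, PEV<5:", results.std(axis=0, ddof=1))
```

Each iteration pools 48 × 100 = 4800 scores, the same N Total as in Table 2; the mean and SD across iterations summarize how stable the tail imbalance is in small samples.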
Discussion
Using the large-scale, high-resolution HCP dataset, we showed that the distributions of DT and MAP metrics derived from dMRI tend to be non-Gaussian. We also showed that this may lead to an imbalance in the extreme values at the tails of a normative distribution, when Zscores are used. We proposed a novel percentile-based metric, the “Pscore,” which was less sensitive to these non-Gaussian distributions, both at the ROI level and for overall WM. We further documented the robustness of this method in smaller samples using bootstrapping. We replicated our previous findings from a pilot study and systematically validated this method using the HCP-YA cohort.
Our assessment of the metric distributions showed that diffusivity (eg, MD, RD) tended to be positively skewed, whereas anisotropy (eg, FA, PA) tended to be negatively skewed. It is difficult to unequivocally identify a primary source for these findings. However, given the narrow age range of the examined population, it is unlikely to be due to aging-related effects that have been invoked for large-scale public datasets.¹,³,³¹ A possible explanation for these skewed distributions could be related to the presence of a small percentage of fast diffusing water molecules with isotropic diffusion behavior that have been reported in healthy brain parenchyma.³² This fast diffusing water compartment has the same diffusion signature as cerebrospinal fluid (CSF) partial volume contamination.³² Moreover, CSF contamination within a ROI can lead to higher MD (positive skew) because the diffusivity of CSF is at least four times higher than that of
FIGURE 3: Comparing distributions of Zscores against Pscores across all WM ROIs per dMRI metric. For each dMRI metric, a pair of distributions is shown: one with Zscore (top, light red) histograms and one with Pscore (bottom, light blue) histograms. The values in these distributions came from concatenating all the scores across all ROIs into a single vector. It was done to show if there was an overall imbalance in the distribution from the entire WM. A normal density curve (solid red and blue) was also fit on top of each distribution, respectively, to assess which normalization method attained a closer fit to a Gaussian distribution. Except for AD, where the Zscore distribution was approximately Gaussian, an imbalance of negative and positive scores was observed for the distributions from all other dMRI metrics. The percentage of values above/below the zero line is provided on either side of each distribution plot. An imbalance was indicated by a >50% and <50% value on either side of the zero line. An increase above 50% is indicated in bold red and a decrease below 50% is indicated in bold blue numbers. The Zscore distribution of FA (top left, light red), for instance, showed an overall imbalance with 51% of the total observations above 0 and 49% below it. This means there were at least 460 (1% of 46,080) more positive Zscores than negative Zscores in the distribution. The mean (green vertical dashed line) and the median (red vertical dashed line) also appeared misaligned. This pattern was also observed for PA and NG distributions. On the contrary, Zscores from MD, RD, RTAP, RTOP, and RTPP showed an overall positive skew, with more negative (>50%) and less positive (<50%) values around the zero line. Pscores, however, consistently maintained an equal distribution of 50% negative and positive values on either side. The mean and median lines aligned, and the histograms attained a closer fit to the normal density curve (solid blue) compared to the Zscores (solid red).
the brain parenchyma.⁸ On the other hand, the anisotropy of CSF is virtually 0, which would cause the anisotropy of the WM tracts to be lower (negative skew). These considerations suggest that skewed values may be inherently driven by underlying biological characteristics of the brain and not be necessarily related to heterogeneity in the demographics. One of our rationales for choosing the HCP young-adult sample was its well-balanced homogeneity in demographics (eg, age and sex).
Our findings may be of particular interest to investigators assessing individual deviations against a skewed normative database. Normative models are typically built from very large samples and often incorporate Gaussian process models.¹,³ This can be counterintuitive, because scaling becomes a major challenge as sample sizes grow.³ Some methods based on machine learning have been proposed to address this issue, but with caveats of pre-tuning and modeling complexity.³³–³⁵ Indeed, one can simply assess a quantity, eg, an individual's height, by comparing it against the percentiles generated from a sufficiently large population. However, aside from other confounds, neuroimaging studies are often limited by the sample size (median N = 23, according to reference 36) and typically consist of <100 participants. In a healthy cohort of 48 controls, we had previously shown that normative data generated from dMRI metrics deviate from normality and suffer the consequences of having unbalanced tail ends when Zscores are used.¹⁰ The study also underscored a key advantage of the “Pscore” approach: its ability to reliably estimate individual deviations in small samples.¹⁰ Therefore, Pscores offer a practical solution to studies with more realistic sample goals (eg, N = 50–150). They can help investigators to accurately assess individuals by building site-specific normative databases from neuroimaging data that do not conform to Gaussianity.
Deviation from Gaussianity leads to biases in the peripheral centiles of a normative distribution and inaccurate inferences.¹ Zscores are commonly used for such inferences,
TABLE 1. Assessing the Extreme Value Imbalance in Zscores Compared to Pscores
Metric  Method  NEV >95  NEV <5  N Total  PEV >95 (%)  PEV <5 (%)
FA      Zscore  1873     2255    46,080   4.1          4.9
        Pscore  2304     2304    46,080   5.0          5.0
MD      Zscore  2383     2020    46,080   5.2          4.4
        Pscore  2304     2303    46,080   5.0          5.0
AD      Zscore  2117     2183    46,080   4.6          4.7
        Pscore  2305     2304    46,080   5.0          5.0
RD      Zscore  2423     1907    46,080   5.3          4.1
        Pscore  2304     2304    46,080   5.0          5.0
PA      Zscore  450      2527    43,776   1.0          5.8
        Pscore  2208     2208    43,776   5.0          5.0
RTAP    Zscore  2255     2030    43,776   5.2          4.6
        Pscore  2208     2208    43,776   5.0          5.0
RTOP    Zscore  2348     1831    43,776   5.4          4.2
        Pscore  2208     2208    43,776   5.0          5.0
RTPP    Zscore  2419     1753    43,776   5.5          4.0
        Pscore  2208     2208    43,776   5.0          5.0
NG      Zscore  1244     2364    43,776   2.8          5.4
        Pscore  2208     2208    43,776   5.0          5.0
The NEV >95 and PEV >95 (%) show the number and percentage of extreme values, respectively, above the 95th percentile edge value of Z = 1.645 from a normal distribution. On the other hand, the NEV <5 and PEV <5 (%) show the number and percentage of extreme values below the 5th percentile edge at Z = −1.645. N Total is the total number of values across all ROIs in the entire sample.
FA = fractional anisotropy; MD = mean diffusivity; AD = axial diffusivity; RD = radial diffusivity; PA = propagator anisotropy; RTAP = return to axis probability; RTOP = return of origin probability; RTPP = return to plane probability; NG = non-Gaussianity.
FIGURE 4: Heatmap comparing Zscores and Pscores from a single iteration of a randomly selected sample of 100 participants for the propagator anisotropy. The horizontal black line separates the Zscores (top) from the Pscores (bottom). The columns represent individual participants, and the rows represent the regions of interest (ROIs). The x-axis shows the labels of the 100 participants randomly selected from the HCP. The range of Pscores is shown in the colorbar. The parenthesis “(” in a colorbar tile means that the corresponding shade is exclusive of the value next to it, whereas the bracket “]” means the value is inclusive. For example, the “(1, 1.65]” label on top of the lightest shade of red means it represents values in the range 1 < Z ≤ 1.65, and so on for the others. Darker shades of blue and red represent more extreme negative and positive values, respectively. An individual with a dark blue tile labeled with the range below −3 would correspond to Z < −3 from a normal distribution, and vice versa for a dark red shade. The large number of negative extreme values in Zscores (top) was quite conspicuous, and the balance of positive and negative extreme values in the Pscores (bottom) was evident just from visual inspection.
TABLE 2. Comparing the Extreme Value Imbalance in Zscores vs. Pscores of Propagator Anisotropy for 20 Iterations of Bootstrapping 100 HCP Participants
Metric                 Iteration  Method  NEV >95  NEV <5  N Total  PEV >95 (%)  PEV <5 (%)
Propagator anisotropy  1          Zscore  56       297     4800     1.2          6.2
                                  Pscore  240      240     4800     5.0          5.0
Propagator anisotropy  2          Zscore  80       295     4800     1.7          6.1
                                  Pscore  240      240     4800     5.0          5.0
Propagator anisotropy  3          Zscore  60       294     4800     1.3          6.1
                                  Pscore  240      240     4800     5.0          5.0
Propagator anisotropy  4          Zscore  83       258     4800     1.7          5.4
                                  Pscore  240      240     4800     5.0          5.0
Propagator anisotropy  5          Zscore  78       267     4800     1.6          5.6
                                  Pscore  240      240     4800     5.0          5.0
Propagator anisotropy  6          Zscore  63       291     4800     1.3          6.1
                                  Pscore  240      240     4800     5.0          5.0
Propagator anisotropy  7          Zscore  70       295     4800     1.5          6.1
                                  Pscore  240      240     4800     5.0          5.0
Propagator anisotropy  8          Zscore  80       276     4800     1.7          5.8
                                  Pscore  240      240     4800     5.0          5.0
Propagator anisotropy  9          Zscore  75       318     4800     1.6          6.6
                                  Pscore  240      240     4800     5.0          5.0
Propagator anisotropy  10         Zscore  55       320     4800     1.1          6.7
                                  Pscore  240      240     4800     5.0          5.0
Propagator anisotropy  11         Zscore  57       281     4800     1.2          5.9
                                  Pscore  240      240     4800     5.0          5.0
Propagator anisotropy  12         Zscore  60       285     4800     1.3          5.9
                                  Pscore  240      240     4800     5.0          5.0
Propagator anisotropy  13         Zscore  49       303     4800     1.0          6.3
                                  Pscore  240      240     4800     5.0          5.0
Propagator anisotropy  14         Zscore  80       279     4800     1.7          5.8
                                  Pscore  240      240     4800     5.0          5.0
Propagator anisotropy  15         Zscore  74       305     4800     1.5          6.4
                                  Pscore  240      240     4800     5.0          5.0
Propagator anisotropy  16         Zscore  70       293     4800     1.5          6.1
                                  Pscore  240      240     4800     5.0          5.0
Propagator anisotropy  17         Zscore  81       290     4800     1.7          6.0
                                  Pscore  240      240     4800     5.0          5.0
Propagator anisotropy  18         Zscore  69       282     4800     1.4          5.9
and we have established the issues with extreme values that manifest if non-Gaussianity is not addressed. There are approaches that have been adopted to overcome non-Gaussian traits due to sample heterogeneity, implementing various modes of Bayesian linear regressions.¹,³⁷ The Pscore approach addresses the issue of non-Gaussianity at the level of the data distribution itself. By leveraging the median as reference and using percentile ranks, it avoids the imbalance at the tails arising from the asymmetry in the distribution. Therefore, it can be an important addition to the arsenal of contemporary normalization techniques.³⁸ Pscores provide a simple platform to normalize raw data and scale them to standardized Zscores derived from a normal distribution. This is consequential because conforming to normally distributed Zscores lends statistical relevance to inferences made on individuals. It allows a meaningful interpretation of an individual's position, especially when considering the extreme ends of a distribution.
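The median-referenced scaling described in this paragraph can be sketched as below. This is a hedged illustration, not the authors' reference implementation: it follows the description of normalizing the distance from the median by the distance between the median and the 5th/95th percentile values, and it assumes a scaling constant of 1.645 so that the percentile edges map onto the Z = ±1.645 boundaries used in Tables 1 and 2. The function name `pscore` and the synthetic data are hypothetical.

```python
import numpy as np

Z_EDGE = 1.645  # assumed scaling: 95th percentile of a standard normal

def pscore(values, reference=None):
    """Sketch of a percentile-based score: median-centred, with each side
    scaled by its own distance from the median to the 5th/95th percentile."""
    values = np.asarray(values, dtype=float)
    ref = values if reference is None else np.asarray(reference, dtype=float)
    p5, med, p95 = np.percentile(ref, [5, 50, 95])
    # Note: degenerate samples (p95 == med or med == p5) would divide by zero.
    upper = Z_EDGE * (values - med) / (p95 - med)   # used at or above the median
    lower = Z_EDGE * (values - med) / (med - p5)    # used below the median
    return np.where(values >= med, upper, lower)

# Synthetic negatively skewed sample (illustrative only, not dMRI data).
rng = np.random.default_rng(0)
values = 1.0 - rng.gamma(shape=2.0, scale=0.05, size=1000)

scores = pscore(values)
pev_hi = 100.0 * np.mean(scores > Z_EDGE)
pev_lo = 100.0 * np.mean(scores < -Z_EDGE)
print(f"Pscore: PEV>95 = {pev_hi:.1f}%, PEV<5 = {pev_lo:.1f}%")
```

Because each side of the median is scaled separately, the two tails are balanced by construction regardless of the skew; the optional `reference` argument is one way an individual could be scored against a separate normative sample.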
When an individual, such as a patient, is expected to be at the extremities of a normative distribution, it can be considered as a rare event, as most individuals are not expected to be positioned there. Extreme value theorem and its subtypes can prove useful in such cases by fitting asymmetric curves on extreme value distributions derived from such rare events.³⁹ This has been applied in neuroimaging applications to assess individuals from highly heterogeneous clinical cohorts using normative modeling.¹,³ The process involves generating normative probability maps (NPMs) and computing a Zscore per brain region/voxel that normalizes the difference between the true normative and predicted individual response with their corresponding variances. These Zscores are then usually used to run univariate statistical tests with multiple comparison adjustments. Our expectation is that Pscores can prove very useful in these applications and provide more accurate estimations for extreme events compared to Zscores.
Limitations
The Pscore computation relies on percentile ranks of every participant in the normative sample. Therefore, the sample size may be a limiting factor on the confidence and precision of the percentile edges. For example, in our pilot analysis of 48 controls,¹⁰ the smallest resolution of each percentile was 2.083%, or approximately 2% (100/48). Since the 5th and 95th percentile edges were used, there was some statistical uncertainty in the exact values representing these percentiles. However, the Pscores still maintained a balanced number of extreme values in the two tails. This issue recedes as sample sizes get bigger, as we see in the current context (N = 961), where the percentile resolutions were much finer (0.1%) and allowed very precise edge measurements. Furthermore, the limitation can be mitigated even with smaller samples if precise and even percentile edge measurements can be performed. Regarding the bootstrapping step for instance, the percentile resolutions were exactly even at 1% (100/100) and Pscores robustly maintained 5% extreme values over 100 iterations for all dMRI metrics.
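The percentile resolutions quoted above follow from simple arithmetic, shown here as a short sketch. The helper names are hypothetical; the second function is an added illustration of how many participants nominally fall beyond a 5% edge for each sample size.

```python
def percentile_resolution(n_participants):
    """Smallest percentile step (in %) that one participant represents."""
    return 100.0 / n_participants

def nominal_tail_count(n_participants, tail_percent=5.0):
    """Nominal number of participants beyond a given percentile edge."""
    return n_participants * tail_percent / 100.0

# Sample sizes mentioned in the text: pilot (48), bootstrap (100), HCP (961).
for n in (48, 100, 961):
    print(n, round(percentile_resolution(n), 3), round(nominal_tail_count(n), 2))
```

With N = 48 the 5% edge falls between the second and third participant (2.4 participants), whereas with N = 100 it lands on a whole number of participants, which is consistent with the "exactly even" resolution noted for the bootstrapping step.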
Another limitation for Pscores is the selection of the per-
centile edges. We used the 5th and 95th percentile edges
because they are statistically relevant and allow at least 10% of
extreme values in the whole distribution. Choice of a more
extreme percentile edge may affect the normalization accuracy.
An appropriate alternative could be to look at the area under
the curve, instead of point values at percentile edges. That way
the computation will be done on a continuous scale, and it also
preserves the feature of Pscores to address the issue at the level
of the distribution, without making any prior assumption of
Gaussianity. This is part of our future project using various
samples of multimodal neuroimaging data.
Conclusion
Although the “Pscore” approach was tested on dMRI metrics, the method itself is not data selective. The “Pscore” method
TABLE 2. Continued
Metric                 Iteration  Method  NEV >95  NEV <5  N Total  PEV >95 (%)  PEV <5 (%)
                                  Pscore  240      240     4800     5.0          5.0
Propagator anisotropy  19         Zscore  72       294     4800     1.5          6.1
                                  Pscore  240      240     4800     5.0          5.0
Propagator anisotropy  20         Zscore  75       303     4800     1.6          6.3
                                  Pscore  240      240     4800     5.0          5.0
The NEV >95 and PEV >95 (%) show the number and percentage of extreme values, respectively, above the 95th percentile edge value of Z = 1.645 from a normal distribution. On the other hand, the NEV <5 and PEV <5 (%) show the number and percentage of extreme values below the 5th percentile edge at Z = −1.645. N Total is the total number of values across all ROIs in the entire sample, i.e., 48 ROIs × 100 participants = 4800 values. For Zscores, the mean PEV <5 (%) = 6.1 ± 0.3% and that of PEV >95 (%) = 1.4 ± 0.2%. For Pscores, the PEV <5 (%) = PEV >95 (%) = 5.0 ± 0%.
instead looks at the data by virtue of its distribution. Pscores
can control the imbalance arising from the non-Gaussianity in
the distribution and more accurately estimate individual devia-
tions from a normative database. The robustness of Pscores
observed for small samples in this study indicates its potential
application to assess individuals in studies with smaller data-
bases. The application of this method on data from clinical
and other modalities, such as structural volumetric and func-
tional MRI, may warrant future investigation because metrics
from these paradigms also suffer from small sample limitations
and non-Gaussian distributions.
Acknowledgments
Data were provided (in part) by the Human Connectome Project, WU-Minn Consortium (principal investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research, and by the McDonnell Center for Systems Neuroscience at Washington University. This study was supported by the Intramural Research Program, NIH. The authors have no conflicts of interest to disclose. The views, information or content, and conclusions presented do not necessarily represent the official position or policy of, nor should any official endorsement be inferred on the part of the Uniformed Services University, the Department of Defense, the U.S. Government or the Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc.
References
1. Fraza CJ, Dinga R, Beckmann CF, Marquand AF. Warped Bayesian lin-
ear regression for normative modelling of big data. Neuroimage 2021;
245:118715.
2. Kjelkenes R, Wolfers T, Alnæs D, et al. Deviations from normative brain
white and gray matter structure are associated with psychopathology in
youth. Dev Cogn Neurosci 2022;58:101173.
3. Marquand AF, Rezek I, Buitelaar J, Beckmann CF. Understanding het-
erogeneity in clinical cohorts using normative models: Beyond case-
control studies. Biol Psychiatry 2016;80(7):552-561.
4. Shirk SD, Mitchell MB, Shaughnessy LW, et al. A web-based normative
calculator for the uniform data set (UDS) neuropsychological test bat-
tery. Alzheimers Res Ther 2011;3(6):32.
5. Sherwood B, Zhou AX, Weintraub S, Wang L. Using quantile regression
to create baseline norms for neuropsychological tests. Alzheimers
Dement (Amst) 2016;2:12-18.
6. Koenker R, Bassett G. Regression quantiles. Econometrica 1978;46(1):
33-50.
7. Basser PJ, Mattiello J, LeBihan D. MR diffusion tensor spectroscopy
and imaging. Biophys J 1994;66(1):259-267.
8. Pierpaoli C, Jezzard P, Basser PJ, Barnett A, Di Chiro G. Diffusion ten-
sor MR imaging of the human brain. Radiology 1996;201(3):637-648.
9. Özarslan E, Koay CG, Shepherd TM, et al. Mean apparent propagator
(MAP) MRI: A novel diffusion imaging method for mapping tissue
microstructure. Neuroimage 2013;78:16-32.
10. Hafiz R, Nayak A, Irfanoglu MO, Chan L, Pierpaoli C. Using P-scores:
A novel percentile-based normalization method to accurately assess
individual deviation in heavily skewed neuroimaging data. In: 2023
ISMRM & ISMRT Annual Meeting & Exhibition, Toronto, Canada,
Program Abstract Number #3781; 19 May 2023.
11. Sudlow C, Gallacher J, Allen N, et al. UK biobank: An open access
resource for identifying the causes of a wide range of complex diseases
of middle and old age. PLoS Med 2015;12(3):e1001779.
12. Feng C, Wang H, Lu N, et al. Log-transformation and its implications
for data analysis. Shanghai Arch Psychiatry 2014;26(2):105-109.
13. Robert CP, Casella G. Monte Carlo statistical methods. New
York: Springer; 1999.
14. Van Essen DC, Ugurbil K, Auerbach E, et al. The Human Connectome
Project: A data acquisition perspective. Neuroimage 2012;62(4):2222-
2231.
15. Irfanoglu MO, Nayak A, Jenkins J, Pierpaoli C. TORTOISE v3: Improve-
ments and new features of the NIH diffusion MRI processing pipeline.
ISMRM 2018;2018.
16. Irfanoglu MO, Nayak A, Taylor P, Pierpaoli C. TORTOISE V4:
ReImagining the NIH diffusion MRI processing pipeline. In: 2023
ISMRM & ISMRT Annual Meeting & Exhibition, Toronto, Canada, Pro-
gram Abstract Number #0080.
17. Veraart J, Fieremans E, Novikov DS. Diffusion MRI noise mapping using
random matrix theory. Magn Reson Med 2016;76(5):1582-1593.
18. Kellner E, Dhital B, Kiselev VG, Reisert M. Gibbs-ringing artifact
removal based on local subvoxel-shifts. Magn Reson Med 2016;76(5):
1574-1581.
19. Irfanoglu MO, Beyh A, Catani M, Dell'Acqua F, Pierpaoli C.
ReImagining the young adult Human Connectome Project (HCP)
diffusion MRI dataset. Proc Int Soc Magn Reson Med 2022;30.
20. Irfanoglu MO, Modi P, Nayak A, Hutchinson EB, Sarlls J, Pierpaoli C.
DR-BUDDI (diffeomorphic registration for blip-up blip-down diffusion
imaging) method for correcting echo planar imaging distortions.
Neuroimage 2015;106:284-299.
21. Gu X, Eklund A. Evaluation of six phase encoding based susceptibility
distortion correction methods for diffusion MRI. Front Neuroinform
2019;13:76.
22. Schilling KG, Blaber J, Huo Y, et al. Synthesized b0 for diffusion distor-
tion correction (Synb0-DisCo). Magn Reson Imaging 2019;64:62-70.
23. Vos SB, Tax CM, Luijten PR, Ourselin S, Leemans A, Froeling M. The
importance of correcting for signal drift in diffusion MRI. Magn Reson
Med 2017;77(1):285-299.
24. Rudrapatna U, Parker GD, Roberts J, Jones DK. A comparative study of
gradient nonlinearity correction strategies for processing diffusion data
obtained with ultra-strong gradient MRI scanners. Magn Reson Med
2021;85(2):1104-1113.
25. Irfanoglu MO, Nayak A, Jenkins J, et al. DR-TAMAS: Diffeomorphic
registration for tensor accurate alignment of anatomical structures.
Neuroimage 2016;132:439-454.
26. Nayak A, Irfanoglu MO, Pierpaoli C. Diffusion MRI atlases from the
Human Connectome Project data. Proc Int Soc Magn Reson Med 2020;
3751.
27. Basser PJ, Pierpaoli C. Microstructural and physiological features of tis-
sues elucidated by quantitative-diffusion-tensor MRI. J Magn Reson B
1996;111(3):209-219.
28. Oishi K, Zilles K, Amunts K, et al. Human brain white matter atlas:
Identification and assignment of common anatomical structures in
superficial white matter. Neuroimage 2008;43(3):447-457.
29. Yushkevich PA, Piven J, Hazlett HC, et al. User-guided 3D active
contour segmentation of anatomical structures: Significantly improved
efficiency and reliability. Neuroimage 2006;31(3):1116-1128.
30. Nayak A, Walker L, Pierpaoli C, The Brain Development Cooperative
Group. Evaluation of pre-defined atlas based ROIs for the analysis of
DTI data in normal brain development. Proc Int Soc Mag Reson Med
2012;20:1872.
31. Cox SR, Ritchie SJ, Tucker-Drob EM, et al. Ageing and brain white mat-
ter structure in 3,513 UK biobank participants. Nat Commun 2016;7:
13629.
32. Pierpaoli C, Jones DK. Removing CSF contamination in brain DT-MRIs
by using a two-compartment tensor model. Proc Int Soc Magn Reson
Med 2004;11.
33. Filippone M, Engler R. Enabling scalable stochastic gradient-based
inference for Gaussian processes by employing the Unbiased LInear
System SolvEr (ULISSE). Int Conf Mach Learn. Volume 37: PMLR; 2015.
p. 1015-1024.
34. Snelson E, Ghahramani Z. Sparse Gaussian processes using pseudo-
inputs. Neural Information Processing Systems. 2005.
35. Saatçi Y. Scalable inference for structured Gaussian process models.
Citeseer. 2012.
36. Marek S, Tervo-Clemmens B, Calabro FJ, et al. Reproducible brain-
wide association studies require thousands of individuals. Nature 2022;
603(7902):654-660.
37. Bishop CM. Pattern recognition and machine learning: All just the
facts 101Material. New Delhi: Springer (India) Private Limited; 2013.
38. Casari A, Zheng A. Feature engineering for machine learning.
Sebastopol, CA: O'Reilly Media, Inc; 2018. p 218.
39. Davison AC. Extreme values. Encyclopedia of Biostatistics. 2005, 4.
Article
Importance US government personnel stationed internationally have reported anomalous health incidents (AHIs), with some individuals experiencing persistent debilitating symptoms. Objective To assess the potential presence of magnetic resonance imaging (MRI)–detectable brain lesions in participants with AHIs, with respect to a well-matched control group. Design, Setting, and Participants This exploratory study was conducted at the National Institutes of Health (NIH) Clinical Center and the NIH MRI Research Facility between June 2018 and November 2022. Eighty-one participants with AHIs and 48 age- and sex-matched control participants, 29 of whom had similar employment as the AHI group, were assessed with clinical, volumetric, and functional MRI. A high-quality diffusion MRI scan and a second volumetric scan were also acquired during a different session. The structural MRI acquisition protocol was optimized to achieve high reproducibility. Forty-nine participants with AHIs had at least 1 additional imaging session approximately 6 to 12 months from the first visit. Exposure AHIs. Main Outcomes and Measures Group-level quantitative metrics obtained from multiple modalities: (1) volumetric measurement, voxel-wise and region of interest (ROI)–wise; (2) diffusion MRI–derived metrics, voxel-wise and ROI-wise; and (3) ROI-wise within-network resting-state functional connectivity using functional MRI. Exploratory data analyses used both standard, nonparametric tests and bayesian multilevel modeling. Results Among the 81 participants with AHIs, the mean (SD) age was 42 (9) years and 49% were female; among the 48 control participants, the mean (SD) age was 43 (11) years and 42% were female. Imaging scans were performed as early as 14 days after experiencing AHIs with a median delay period of 80 (IQR, 36-544) days. After adjustment for multiple comparisons, no significant differences between participants with AHIs and control participants were found for any MRI modality. 
At an unadjusted threshold ( P < .05), compared with control participants, participants with AHIs had lower intranetwork connectivity in the salience networks, a larger corpus callosum, and diffusion MRI differences in the corpus callosum, superior longitudinal fasciculus, cingulum, inferior cerebellar peduncle, and amygdala. The structural MRI measurements were highly reproducible (median coefficient of variation <1% across all global volumetric ROIs and <1.5% for all white matter ROIs for diffusion metrics). Even individuals with large differences from control participants exhibited stable longitudinal results (typically, <±1% across visits), suggesting the absence of evolving lesions. The relationships between the imaging and clinical variables were weak (median Spearman ρ = 0.10). The study did not replicate the results of a previously published investigation of AHIs. Conclusions and Relevance In this exploratory neuroimaging study, there were no significant differences in imaging measures of brain structure or function between individuals reporting AHIs and matched control participants after adjustment for multiple comparisons.
Article
Full-text available
Combining imaging modalities and metrics that are sensitive to various aspects of brain structure and maturation may help identify individuals that show deviations in relation to same-aged peers, and thus benefit early-risk-assessment for mental disorders. We used one timepoint multimodal brain imaging, cognitive, and questionnaire data from 1280 eight-to twenty-one-year-olds from the Philadelphia Neurodevelopmental Cohort. We estimated age-related gray and white matter properties and estimated individual deviation scores using normative modeling. Next, we tested for associations between the estimated deviation scores, and with psy-chopathology domain scores and cognition. More negative deviations in DTI-based fractional anisotropy (FA) and the first principal eigenvalue of the diffusion tensor (L1) were associated with higher scores on psychosis positive and prodromal symptoms and general psychopathology. A more negative deviation in cortical thickness (CT) was associated with a higher general psychopathology score. Negative deviations in global FA, surface area, L1 and CT were also associated with poorer cognitive performance. No robust associations were found between the deviation scores based on CT and DTI. The low correlations between the different multimodal magnetic resonance imaging-based deviation scores suggest that psychopathological burden in adolescence can be mapped onto partly distinct neurobiological features.
Article
Full-text available
Magnetic resonance imaging (MRI) has transformed our understanding of the human brain through well-replicated mapping of abilities to specific structures (for example, lesion studies) and functions1–3 (for example, task functional MRI (fMRI)). Mental health research and care have yet to realize similar advances from MRI. A primary challenge has been replicating associations between inter-individual differences in brain structure or function and complex cognitive or mental health phenotypes (brain-wide association studies (BWAS)). Such BWAS have typically relied on sample sizes appropriate for classical brain mapping⁴ (the median neuroimaging study sample size is about 25), but potentially too small for capturing reproducible brain–behavioural phenotype associations5,6. Here we used three of the largest neuroimaging datasets currently available—with a total sample size of around 50,000 individuals—to quantify BWAS effect sizes and reproducibility as a function of sample size. BWAS associations were smaller than previously thought, resulting in statistically underpowered studies, inflated effect sizes and replication failures at typical sample sizes. As sample sizes grew into the thousands, replication rates began to improve and effect size inflation decreased. More robust BWAS effects were detected for functional MRI (versus structural), cognitive tests (versus mental health questionnaires) and multivariate methods (versus univariate). Smaller than expected brain–phenotype associations and variability across population subsamples can explain widespread BWAS replication failures. In contrast to non-BWAS approaches with larger effects (for example, lesions, interventions and within-person), BWAS reproducibility requires samples with thousands of individuals.
Normative modelling is becoming more popular in neuroimaging due to its ability to make predictions of deviation from a normal trajectory at the level of individual participants. It allows the user to model the distribution of several neuroimaging modalities, giving an estimation for the mean and centiles of variation. With the increase in the availability of big data in neuroimaging, there is a need to scale normative modelling to big data sets. However, the scaling of normative models has come with several challenges. So far, most normative modelling approaches used Gaussian process regression, and although suitable for smaller datasets (up to a few thousand participants) it does not scale well to the large cohorts currently available and being acquired. Furthermore, most neuroimaging modelling methods that are available assume the predictive distribution to be Gaussian in shape. However, deviations from Gaussianity can be frequently found, which may lead to incorrect inferences, particularly in the outer centiles of the distribution. In normative modelling, we use the centiles to give an estimation of the deviation of a particular participant from the ‘normal’ trend. Therefore, especially in normative modelling, the correct estimation of the outer centiles is of utmost importance, which is also where data are sparsest. Here, we present a novel framework based on Bayesian linear regression with likelihood warping that allows us to address these problems, that is, to correctly model non-Gaussian predictive distributions and scale normative modelling elegantly to big data cohorts. In addition, this method provides likelihood-based statistics, which are useful for model selection. To evaluate this framework, we use a range of neuroimaging-derived measures from the UK Biobank study, including image-derived phenotypes (IDPs) and whole-brain voxel-wise measures derived from diffusion tensor imaging. 
We show good computational scaling and improved accuracy of the warped BLR for certain IDPs and voxels whose residuals deviated from normality. The present results indicate the advantage of a warped BLR in terms of computational scalability and the flexibility to incorporate non-linearity and non-Gaussianity of the data, giving a wider range of neuroimaging datasets that can be correctly modelled.
Purpose The analysis of diffusion data obtained under large gradient nonlinearities necessitates corrections during data reconstruction and analysis. While two such preprocessing pipelines have been proposed, no comparative studies assessing their performance exist. Furthermore, both pipelines neglect the impact of subject motion during acquisition, which, in the presence of gradient nonlinearities, induces spatio‐temporal B‐matrix variations. Here, spatio‐temporal B‐matrix tracking (STB) is proposed and its performance compared to established pipelines. Methods Diffusion tensor MRI (DT‐MRI) was performed using a 300 mT/m gradient system. Data were acquired with volunteers positioned in regions with pronounced gradient nonlinearities, and used to compare the performance of six different processing pipelines, including STB. Results Up to 30% errors were observed in DT‐MRI parameter estimates when neglecting gradient nonlinearities. Moreover, the order in which B0 inhomogeneity, eddy current and gradient nonlinearity corrections were performed was found to impact the consistency of parameter estimates significantly. Although no pipeline emerged as a clear winner, the STB approach seemed to yield the most consistent parameter estimates under large gradient nonlinearities. Conclusions Under large gradient nonlinearities, the choice of preprocessing pipeline significantly impacts the estimated diffusion parameters. Motion‐induced spatio‐temporal B‐matrix variations can lead to systematic bias in the parameter estimates, which can be ameliorated using the proposed STB framework.
Purpose: Susceptibility distortions impact diffusion MRI data analysis and are typically corrected during preprocessing. Correction strategies involve three classes of methods: registration to a structural image, the use of a fieldmap, or the use of images acquired with opposing phase encoding directions. It has been demonstrated that phase encoding based methods outperform the other two classes, but unfortunately, the choice of which phase encoding based method to use is still an open question due to the absence of any systematic comparisons. Methods: In this paper we quantitatively evaluated six popular phase encoding based methods for correcting susceptibility distortions in diffusion MRI data. We employed a framework that allows for the simulation of realistic diffusion MRI data with susceptibility distortions. We evaluated the ability of the methods to correct distortions by comparing the corrected data with the ground truth. Four diffusion tensor metrics (FA, MD, eigenvalues and eigenvectors) were calculated from the corrected data and compared with the ground truth. We also validated two popular indirect metrics using both simulated data and real data. The two indirect metrics are the difference between the corrected LR and AP data, and the FA standard deviation over the corrected LR, RL, AP, and PA data. Results: We found that DR-BUDDI and TOPUP offered the most accurate and robust correction compared to the other four methods using both direct and indirect evaluation metrics. EPIC and HySCO performed well in correcting b0 images but produced poor corrections for diffusion weighted volumes, and they also produced large errors for the four diffusion tensor metrics. We also demonstrate that the indirect metric (the difference between corrected LR and AP data) gives a different ordering of correction quality than the direct metric. Conclusion: We suggest that researchers use DR-BUDDI or TOPUP for susceptibility distortion correction. 
The two indirect metrics (the difference between corrected LR and AP data, and the FA standard deviation) should be interpreted together as a measure of distortion correction quality. The performance ranking of the various tools inferred from direct and indirect metrics differs slightly. However, across all tools, the results of direct and indirect metrics are highly correlated, indicating that the analysis of indirect metrics may provide a good proxy of the performance of a correction tool if assessment using direct metrics is not feasible.
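The two indirect metrics described in this abstract can be illustrated with a minimal sketch. It assumes each corrected volume has already been flattened to a list of voxel values inside a common mask; the function names, the use of the mean absolute difference, and the choice of the sample standard deviation are illustrative assumptions, not the evaluation code used in the study.

```python
from statistics import mean, stdev


def mean_abs_difference(vol_a, vol_b):
    """Mean absolute voxel-wise difference between two corrected volumes
    (for example, corrected LR versus corrected AP data)."""
    return mean(abs(a - b) for a, b in zip(vol_a, vol_b))


def mean_fa_std(fa_maps):
    """Per-voxel sample standard deviation of FA across the corrected
    acquisitions (for example, LR, RL, AP, PA), averaged over voxels."""
    return mean(stdev(voxel_values) for voxel_values in zip(*fa_maps))


# Toy data (hypothetical values): three voxels of signal intensity,
# and two voxels of FA for each of four phase-encoding directions.
corrected_lr = [100.0, 200.0, 300.0]
corrected_ap = [102.0, 198.0, 303.0]
fa_maps = [
    [0.50, 0.30],  # LR
    [0.52, 0.30],  # RL
    [0.48, 0.30],  # AP
    [0.50, 0.30],  # PA
]

lr_ap_difference = mean_abs_difference(corrected_lr, corrected_ap)
fa_variability = mean_fa_std(fa_maps)
```

Lower values of both quantities suggest more consistent corrections across phase-encoding directions, which is why the abstract recommends reading the two together.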
Quantifying the microstructural properties of the human brain's connections is necessary for understanding normal ageing and disease. Here we examine brain white matter magnetic resonance imaging (MRI) data in 3,513 generally healthy people aged 44.64–77.12 years from the UK Biobank. Using conventional water diffusion measures and newer, rarely studied indices from neurite orientation dispersion and density imaging, we document large age associations with white matter microstructure. Mean diffusivity is the most age-sensitive measure, with negative age associations strongest in the thalamic radiation and association fibres. White matter microstructure across brain tracts becomes increasingly correlated in older age. This may reflect an age-related aggregation of systemic detrimental effects. We report several other novel results, including age associations with hemisphere and sex, and comparative volumetric MRI analyses. Results from this unusually large, single-scanner sample provide one of the most extensive characterizations of age associations with major white matter tracts in the human brain.
In this work, we propose DR-TAMAS (Diffeomorphic Registration for Tensor Accurate alignMent of Anatomical Structures), a novel framework for intersubject registration of Diffusion Tensor Imaging (DTI) data sets. This framework is optimized for brain data and its main goal is to achieve an accurate alignment of all brain structures, including white matter (WM), gray matter (GM), and spaces containing cerebrospinal fluid (CSF). Currently most DTI-based spatial normalization algorithms emphasize alignment of anisotropic structures. While some diffusion-derived metrics, such as diffusion anisotropy and tensor eigenvector orientation, are highly informative for proper alignment of WM, other tensor metrics such as the trace or mean diffusivity (MD) are fundamental for a proper alignment of GM and CSF boundaries. Moreover, it is desirable to include information from structural MRI data, e.g., T1-weighted or T2-weighted images, which are usually available together with the diffusion data. The fundamental property of DR-TAMAS is to achieve global anatomical accuracy by incorporating in its cost function the most informative metrics locally. Another important feature of DR-TAMAS is a symmetric time-varying velocity-based transformation model, which enables it to account for potentially large anatomical variability in healthy subjects and patients. The performance of DR-TAMAS is evaluated with several data sets and compared with other widely-used diffeomorphic image registration techniques employing both full tensor information and/or DTI-derived scalar maps. Our results show that the proposed method has excellent overall performance in the entire brain, while being equivalent to the best existing methods in WM.
Purpose: To investigate previously unreported effects of signal drift as a result of temporal scanner instability on diffusion MRI data analysis and to propose a method to correct this signal drift. Methods: We investigated the signal magnitude of non-diffusion-weighted EPI volumes in a series of diffusion-weighted imaging experiments to determine whether signal magnitude changes over time. Different scan protocols and scanners from multiple vendors were used to verify this on phantom data, and the effects on diffusion kurtosis tensor estimation in phantom and in vivo data were quantified. Scalar metrics (eigenvalues, fractional anisotropy, mean diffusivity, mean kurtosis) and directional information (first eigenvectors and tractography) were investigated. Results: Signal drift, a global signal decrease with subsequently acquired images in the scan, was observed in phantom data on all three scanners, with varying magnitudes up to 5% in a 15-min scan. The signal drift has a noticeable effect on the estimation of diffusion parameters. All investigated quantitative parameters as well as tractography were affected by this artifactual signal decrease during the scan. Conclusion: By interspersing the non-diffusion-weighted images throughout the session, the signal decrease can be estimated and compensated for before data analysis, minimizing the detrimental effects on subsequent MRI analyses. Magn Reson Med, 2016. © 2016 The Authors Magnetic Resonance in Medicine published by Wiley Periodicals, Inc. on behalf of International Society for Magnetic Resonance in Medicine.
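The idea of estimating drift from interspersed non-diffusion-weighted volumes can be sketched as follows. This is a simplified illustration that assumes a linear drift model and a single mean signal value per b0 volume; the function names and the toy numbers are hypothetical, and the abstract does not specify which model the authors fitted.

```python
def fit_linear_drift(b0_indices, b0_signals):
    """Least-squares line through mean b0 signal versus volume index.

    Returns (slope, intercept); the linear model is an assumption."""
    n = len(b0_indices)
    mean_x = sum(b0_indices) / n
    mean_y = sum(b0_signals) / n
    sxx = sum((x - mean_x) ** 2 for x in b0_indices)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(b0_indices, b0_signals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def drift_scale_factors(n_volumes, slope, intercept):
    """Multiplicative factor per volume that rescales the fitted signal
    back to its value at volume 0."""
    return [intercept / (intercept + slope * i) for i in range(n_volumes)]


# Toy example: a 5% linear signal loss over 61 volumes,
# with one b0 volume acquired every 10 volumes.
b0_indices = [0, 10, 20, 30, 40, 50, 60]
b0_signals = [1000.0 * (1 - 0.05 * i / 60) for i in b0_indices]

slope, intercept = fit_linear_drift(b0_indices, b0_signals)
factors = drift_scale_factors(61, slope, intercept)

# Applying the factors to the b0 volumes themselves should flatten them.
corrected_b0 = [s * factors[i] for i, s in zip(b0_indices, b0_signals)]
```

In practice the same per-volume factors would be applied to every diffusion-weighted volume before model fitting, and a higher-order fit could replace the line if the drift is visibly curved.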
Diffusion magnetic resonance images typically suffer from spatial distortions due to susceptibility-induced off-resonance fields, which may affect the geometric fidelity of the reconstructed volume and cause mismatches with anatomical images. State-of-the-art susceptibility correction (for example, FSL's TOPUP algorithm) typically requires data acquired twice with reverse phase encoding directions, referred to as blip-up blip-down acquisitions, in order to estimate an undistorted volume. Unfortunately, not all imaging protocols include a blip-up blip-down acquisition, and those that lack one cannot take advantage of the state-of-the-art susceptibility and motion correction capabilities. In this study, we aim to enable TOPUP-like processing with historical and/or limited diffusion imaging data that include only a structural image and a single blip diffusion image. We utilize deep learning to synthesize an undistorted non-diffusion weighted image from the structural image, and use the non-distorted synthetic image as an anatomical target for distortion correction. We evaluate the efficacy of this approach (named Synb0-DisCo) and show that our distortion correction process results in better matching of the geometry of undistorted anatomical images, reduces variation in diffusion modeling, and is practically equivalent to having both blip-up and blip-down non-diffusion weighted images.