BrainIAK tutorials: user-friendly learning materials for advanced fMRI analysis

*Manoj Kumar1, Cameron T. Ellis2, Qihong Lu1, Hejia Zhang3, Mihai Capota4, Theodore L. Willke4, Peter J. Ramadge1,3, Nicholas B. Turk-Browne2, Kenneth A. Norman1,5

1Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
2Department of Psychology, Yale University, New Haven, CT, USA
3Center for Statistics and Machine Learning, Princeton University, NJ, USA
4Brain-Inspired Computing Lab, Intel Corporation, Portland, OR, USA
5Department of Psychology, Princeton University, Princeton, NJ, USA

* Corresponding author
E-mail: mk35@princeton.edu (MK)

Contributions
Conceptualization: M.K., C.T.E., P.J.R., N.B.T.-B., K.A.N.; Software: M.K., C.T.E., Q.L., H.Z., M.C., T.L.W.; Supervision: T.L.W., P.J.R., N.B.T.-B., K.A.N.; Writing - Original Draft Preparation: M.K.; Writing - Review & Editing: M.K., C.T.E., Q.L., H.Z., M.C., P.J.R., N.B.T.-B., K.A.N.

Abstract

Advanced brain imaging analysis methods, including multivariate pattern analysis (MVPA), functional connectivity, and functional alignment, have become powerful tools in cognitive neuroscience over the past decade. These tools are implemented in custom code and separate packages, often requiring different software and language proficiencies to deploy for data analysis. Although these tools are usable by expert researchers, novice users face a steep learning curve. The difficulties stem from having to use new programming languages (e.g., Python), having to learn how to apply machine-learning methods to high-dimensional fMRI data, and the minimal documentation and training materials available. Furthermore, most standard fMRI analysis packages (e.g., AFNI, FSL, SPM) focus on preprocessing and univariate analyses, leaving a gap in how to integrate them with advanced tools. To address these needs, we developed BrainIAK (brainiak.org), an open-source Python software package that seamlessly integrates several cutting-edge, computationally efficient techniques with other Python packages (e.g., Nilearn, Scikit-learn) for file handling, visualization, and machine learning. To disseminate these tools, we have developed user-friendly tutorials (in Jupyter format) and exercises for learning BrainIAK and advanced fMRI analysis in Python more generally. These materials cover techniques including: MVPA (pattern classification and representational similarity analysis); parallelized searchlight analysis; background connectivity; full correlation matrix analysis; inter-subject correlation; inter-subject functional connectivity; shared response modeling; event segmentation using hidden Markov models; and real-time fMRI. For long-running jobs with large memory consumption, we have provided detailed guidance on using high-performance computing clusters. These notebooks were successfully tested at multiple sites, including as problem sets for courses at Yale and Princeton universities and at hackathons at Princeton, Yale, and Virginia Tech. These materials are freely shared, with the hope that they become part of a pool of open-source software and educational materials for large-scale, reproducible fMRI analysis and accelerated discovery.

Introduction
The latest methods for analyzing brain activity recorded via functional magnetic resonance imaging (fMRI) are complex to learn and execute. This is particularly true for multivariate pattern analysis (MVPA) methods, which focus on extracting information about a person’s cognitive state (i.e., percepts, thoughts, memories) from spatially and/or temporally distributed patterns of fMRI activity. Beginners and even intermediate users face a steep learning curve and uncertainty in using these complex techniques because of the relative paucity of documentation and guidance about their practical use. Even expert users are hesitant to add new, more advanced techniques to their existing pipelines and can also face significant software and hardware challenges. These difficulties continue despite the use of MVPA techniques for almost two decades and their wide usefulness and success for a variety of questions in cognitive neuroscience.

MVPA encompasses a wide range of analyses, from pattern classifiers that map between distributed brain patterns and cognitive states [1-4] to techniques that explore the similarity structure exploited by classifiers (e.g., representational similarity analysis, RSA; [5,6]). There are also related multivariate techniques for functional connectivity and functional alignment, including: full correlation matrix analysis (FCMA; [7]), inter-subject correlation (ISC; [8,9]), inter-subject functional connectivity (ISFC; [10]), shared response modeling (SRM; [11]), and event segmentation [12]. Finally, these analyses can be run after data collection is completed, or in real-time for neurofeedback training or adaptive design optimization [13-16].

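To make the notion of similarity structure concrete, the following is a minimal NumPy sketch of the core RSA computation: a representational dissimilarity matrix (RDM) built from correlation distance and compared against a model RDM. The data are simulated, and the function names (rdm, upper_triangle) and all sizes are illustrative assumptions rather than code taken from the tutorials.

```python
import numpy as np


def rdm(patterns):
    """Representational dissimilarity matrix (1 - Pearson r).

    patterns: array of shape (n_conditions, n_voxels).
    """
    return 1.0 - np.corrcoef(patterns)


def upper_triangle(matrix):
    """Off-diagonal upper triangle of a square matrix, as a vector."""
    return matrix[np.triu_indices_from(matrix, k=1)]


# Hypothetical data: six conditions drawn from two categories, 200 voxels.
rng = np.random.default_rng(0)
category = np.array([0, 0, 0, 1, 1, 1])
prototypes = rng.normal(size=(2, 200))
patterns = prototypes[category] + 0.5 * rng.normal(size=(6, 200))

neural_rdm = rdm(patterns)

# Model RDM: conditions from the same category are similar (0), others not (1).
model_rdm = (category[:, None] != category[None, :]).astype(float)

# Compare the two RDMs on their off-diagonal entries only.
model_fit = np.corrcoef(upper_triangle(neural_rdm), upper_triangle(model_rdm))[0, 1]
print(f"neural-model RDM correlation: {model_fit:.2f}")
```

Only the off-diagonal entries are compared because the diagonal of an RDM is zero by construction and would otherwise inflate the agreement between the neural and model matrices.
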
One barrier to increasing the accessibility of these techniques was that they were generally created as custom code within individual labs and not part of standard fMRI software packages. To address this, we implemented them in an open-source Python package called BrainIAK. The present tutorials provide structured guidance for learning how to use these techniques.

In a typical fMRI analysis pipeline, the data are first pre-processed, a general linear model (GLM) might be fit, and then MVPA or other more advanced analyses are performed. For pre-processing and GLM analysis of fMRI data, a number of tutorials and bootcamps are available to learn software packages such as AFNI, FSL, SPM, and fmriprep [17-20]. In contrast, for MVPA and more advanced analysis techniques, fewer learning resources are available. We have designed the present learning materials to make it easy for the novice user to learn MVPA and more advanced techniques. An expert user can also use our learning materials to understand BrainIAK’s implementation of these techniques, to train other researchers, and to teach research methods classes.

There are three main steps to learning and implementing BrainIAK methods: (1) learning to write code and scripts, (2) understanding machine learning algorithms and how to apply them to cognitive neuroscience data, and (3) executing jobs on high-performance compute clusters. We elaborate on each of these steps below.

First, one needs to learn a programming language; BrainIAK, for example, uses Python. This can present a significant challenge to a beginner, as learning to program and to apply these skills to scientific computing is a time-consuming process. Such skills have only recently been added to the curriculum in some psychology and neuroscience departments, and have been included as components of hackathons and summer schools. As instructors tend to teach in the language they are most familiar with, different programming languages are often used to teach various techniques, making it difficult for users to switch flexibly between different methods.

Second, the analysis techniques in BrainIAK involve significant use of machine learning algorithms that may be unfamiliar to cognitive neuroscientists. Multiple tutorials on machine learning exist; however, only a few cover the use of machine learning in cognitive neuroscience: the documentation for Nilearn [21], lectures from the MIND summer school, lectures from the Organization for Human Brain Mapping education section and hackathons, and blogs such as MVPA Meanderings. For some of the more cutting-edge techniques in BrainIAK, no tutorials exist or they are taught only as a part of special workshops. Furthermore, general-purpose machine learning algorithms typically assume independent samples, so applying them to cognitive neuroscience data needs to be done with care, as not all data points are independent of each other in space and time; this has led to the insidious problem of circular inference or “double dipping” [22].

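The effect of circular inference can be illustrated with a small simulation. The sketch below is an assumption-laden toy example, not code from the tutorials: it uses pure-noise data, a simple nearest-centroid classifier, and hypothetical helper names. Selecting voxels using all of the data (including the test labels) yields above-chance accuracy even though no signal exists, whereas selecting voxels within each training fold does not.

```python
import numpy as np


def select_voxels(X, y, k):
    """Indices of the k voxels with the largest absolute class-mean difference."""
    diff = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))
    return np.argsort(diff)[-k:]


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Classify each test sample by its closest class centroid."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = ((X_test - c0) ** 2).sum(axis=1)
    d1 = ((X_test - c1) ** 2).sum(axis=1)
    predictions = (d1 < d0).astype(int)
    return (predictions == y_test).mean()


def cross_validate(X, y, folds, k, circular):
    """Leave-one-fold-out accuracy with voxel selection done circularly or not."""
    if circular:
        keep_all = select_voxels(X, y, k)  # sees the test labels: double dipping
    scores = []
    for fold in np.unique(folds):
        train, test = folds != fold, folds == fold
        keep = keep_all if circular else select_voxels(X[train], y[train], k)
        scores.append(nearest_centroid_accuracy(
            X[train][:, keep], y[train], X[test][:, keep], y[test]))
    return float(np.mean(scores))


# Pure noise: there is nothing to decode, so honest accuracy should be near 0.5.
rng = np.random.default_rng(0)
n_samples, n_voxels = 40, 5000
X = rng.normal(size=(n_samples, n_voxels))
y = np.tile([0, 1], n_samples // 2)
folds = np.repeat(np.arange(4), n_samples // 4)

circular_acc = cross_validate(X, y, folds, k=20, circular=True)
clean_acc = cross_validate(X, y, folds, k=20, circular=False)
print(f"circular selection: {circular_acc:.2f}, within-fold selection: {clean_acc:.2f}")
```

The only difference between the two conditions is where voxel selection happens, which is why keeping every data-dependent step inside the training fold is the standard safeguard.
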
Third, the execution of these programs on high-performance compute clusters is non-trivial even for advanced practitioners who are proficient at executing code on individual machines. Using clusters can accelerate analyses dramatically, but sizing the memory needed and enabling parallel code execution for optimal run-times requires an understanding of how data are processed in a cluster environment. It is a challenge to find training materials on how to run fMRI analyses on a cluster.

We have created learning materials (herein referred to as tutorials) that address each of the above challenges, making it easier for novice users to learn MVPA and for expert users to learn more advanced BrainIAK analyses, such as FCMA and SRM. To aid learning to code, the tutorials provide an interactive environment to read, write, and execute code. Specifically, for the novice user, a simple way to learn to code is to work through small snippets of code that come with detailed explanations and a clear goal of what is being accomplished by the code. Our use of Jupyter notebooks allows for detailed explanations of the code with text and figures embedded in-line. The user can execute the code step-by-step and interact with data at each step using plotting functions. In order to ease users into the use of these techniques, we first introduce them to a fully working, simplified version of the code. Once users have mastered this version, we encourage them to delve deeper and learn more about helper functions and input/output variables. The expert user, who may wish to examine the details of how the data are being processed, or even modify the code to suit their needs, can readily do so using the open-source Python code contained in the Jupyter notebooks. For all users, we embed background material and references, prompts for further self-study, and programmatic exercises to help them learn how to generate and adapt code.

To help learn how to apply machine learning algorithms to cognitive neuroscience data, we take advantage of several open-source machine learning tools that are available in Python. For data loading/handling and basic machine learning, we use: Nilearn [21], Nibabel [23], and Scikit-learn [24]. We have included detailed instructions and exercises in the tutorials on avoiding problems of circular inference and double-dipping. We also use tools native to BrainIAK for applying cutting-edge machine learning to fMRI data, including searchlight analysis [25]. An important consideration is how to prepare the data in a suitable format. Publicly available datasets are often in a raw state and need to be pre-processed (e.g., motion correction, registration, and masking) before they can be used for advanced analyses. The pre-processing can take a significant amount of time and add to the burden on the learner. To circumvent this problem, we provide pre-processed data along with our tutorials, making it significantly easier for a novice user to get started and quickly perform a successful analysis.

Having made it easy to access code and use machine learning algorithms, we encounter the third challenge: running the code efficiently using compute clusters. It can be difficult to take code that works on a laptop and modify it to efficiently leverage the resources of a cluster and scale performance to meet the demands of large datasets. This can be a significant burden on the user and requires specialized expertise to write efficient, properly parallelized code. BrainIAK has built-in tools for making the most of clusters to scale analyses easily. In fact, the same code works seamlessly from a laptop (with a few cores) to clusters (with thousands of cores). For example, searchlight analysis (see [26]) involves running the same MVPA thousands of times at different points in the brain, which can be extremely slow on a laptop or desktop. BrainIAK includes a searchlight function that distributes these jobs on a cluster to run them in parallel. This function can be invoked using a few lines of code and runs seamlessly on any computing hardware. The tutorials give example code for cluster computing that can easily be extended to novel datasets.

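The logic of a searchlight can be sketched serially in a few lines of NumPy. This is a conceptual illustration only and is not BrainIAK's searchlight function or its API: it runs on a single core, uses a cubic rather than spherical neighbourhood, and operates on simulated data. The names searchlight and pattern_distance are hypothetical.

```python
import numpy as np


def searchlight(data, mask, radius, kernel):
    """Run `kernel` on the cubic neighbourhood of every voxel inside `mask`.

    data: 4D array (x, y, z, samples); mask: 3D boolean array.
    Returns a 3D map with NaN wherever the mask is False.
    """
    result = np.full(mask.shape, np.nan)
    for x, y, z in np.argwhere(mask):
        cube = tuple(slice(max(c - radius, 0), c + radius + 1) for c in (x, y, z))
        local = data[cube][mask[cube]]  # shape: (voxels in searchlight, samples)
        result[x, y, z] = kernel(local)
    return result


# Hypothetical data: an 8x8x8 volume, 20 samples, two alternating conditions.
rng = np.random.default_rng(0)
labels = np.tile([0, 1], 10)
data = rng.normal(size=(8, 8, 8, 20))
data[2:5, 2:5, 2:5, labels == 1] += 2.0  # small informative region
mask = np.ones((8, 8, 8), dtype=bool)
mask[0] = False  # pretend the first slab lies outside the brain


def pattern_distance(local):
    """Euclidean distance between the two condition-mean patterns."""
    mean_1 = local[:, labels == 1].mean(axis=1)
    mean_0 = local[:, labels == 0].mean(axis=1)
    return np.linalg.norm(mean_1 - mean_0)


sl_map = searchlight(data, mask, radius=1, kernel=pattern_distance)
peak = np.unravel_index(np.nanargmax(sl_map), sl_map.shape)
print("peak searchlight location:", peak)
```

Because every searchlight is computed independently of the others, the loop over voxels is the part that a parallel implementation can divide among cores or cluster nodes.
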
In addition to parallelizing the code, cluster environments can present other complications for learners. In particular, the interactive nature of working on a laptop or desktop is absent when working on a cluster, making troubleshooting difficult. Cluster environments also demand resource allocations up front (i.e., number of cores and amount of memory); increasing memory or extending time during program execution is not permitted, so users need to have a clear understanding of the computational needs of their code. The tutorials use the SLURM scheduler [27] and provide instructions on how to determine the resources required to execute jobs and how to monitor running jobs.

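As an illustration of what requesting resources up front looks like, the sketch below renders the text of a SLURM batch script from Python. The #SBATCH directives are standard sbatch options, but the helper name render_sbatch, the job and script names, the launch line, and the resource values are all assumptions chosen for the example; they are not the batch scripts distributed with the tutorials.

```python
def render_sbatch(job_name, script, n_tasks, mem_per_cpu_gb, hours):
    """Return the text of a SLURM batch script that requests fixed resources."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --ntasks={n_tasks}",                # number of parallel tasks
        f"#SBATCH --mem-per-cpu={mem_per_cpu_gb}G",   # memory per allocated core
        f"#SBATCH --time={hours:02d}:00:00",          # wall-clock limit, HH:MM:SS
        f"#SBATCH --output={job_name}-%j.out",        # %j expands to the job ID
        "",
        f"srun python {script}",                      # assumed launch line
    ]
    return "\n".join(lines) + "\n"


# Hypothetical job: the script name and resource values are placeholders.
batch_text = render_sbatch("searchlight", "searchlight.py",
                           n_tasks=4, mem_per_cpu_gb=8, hours=2)
print(batch_text)
```

A script like this would typically be submitted with sbatch and inspected with squeue while it runs; the exact launch line depends on how MPI is configured on a given cluster.
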
In summary, we present a set of tutorials created to enable users of all skill levels to learn and deploy advanced multivariate fMRI analysis techniques. In addition to covering the latest incarnation of MVPA [1,5] and associated conventions, we provide best-practice recommendations on optimizing classifiers with strategies to avoid double-dipping. We also cover a range of cutting-edge techniques available in BrainIAK, including searchlight analysis, FCMA, ISC, ISFC, SRM, real-time fMRI, and event segmentation using hidden Markov models. We have released these tutorials publicly and freely.

Results
Our goal was to create user-friendly educational materials that can be used by novice or expert practitioners to learn how to deploy advanced fMRI analyses in their research. The tutorials are written as Jupyter notebooks with detailed explanations provided in the form of text and figures for each section of the code. The execution of the notebooks on a cluster is also made simple. If the requisite software and data are installed on the cluster, a user simply needs to connect to the cluster from their laptop/desktop computer, open a web browser, and access the Jupyter notebooks. Each tutorial notebook has an overarching theme of a scientific question relevant to cognitive neuroscience. The accompanying notebook exercises help the user understand the method and its applicability to the scientific question by requiring that they generate answers or code. These questions are posed and answered in the context of a publicly available fMRI dataset. These datasets are distributed with the tutorials in a ready-to-use (pre-processed) state.

Once the user has acquired proficiency in executing the notebooks from a browser, we introduce running programs on clusters by submitting scripts as batch jobs. The commands to run such jobs on clusters are first covered in a notebook, and associated batch scripts are provided. Each of the notebooks can be run independently. For the beginning and intermediate user, we recommend starting at the first notebook and working through notebooks 1-7. After this, the user can choose to focus on a particular method in notebooks 8-13. An advanced user already familiar with Python and machine learning can start with any notebook in the sequence. For those who are new to using clusters but are otherwise proficient at fMRI analysis, the searchlight notebook is a useful starting point. We describe the contents of each notebook in more detail below.

Tutorial notebooks
At the time of writing there are 13 notebooks available. As time permits, we intend to produce more tutorials as needs or new methods demand.
1. Setup: An introductory notebook to help users learn how to work with Jupyter.
2. Data handling and normalization: We load fMRI datasets into a Python environment using Nilearn and Nibabel packages. The importance of normalizing the data is shown via an exercise using a simulated dataset.
3. Classification: Once the data have been loaded and normalized, the BOLD signal is extracted with a shift to account for hemodynamic lag and classification is performed using a linear classifier. The importance of separating training and test data is emphasized and cross-validation is introduced. The pitfalls of double-dipping are highlighted and the leave-one-run-out approach is covered. We use a category localizer dataset to examine modular vs. distributed processing in the visual system.
4. Dimensionality reduction: We introduce principal component analysis (PCA) and explore how to select the number of dimensions. We highlight the importance of using cross-validation to perform feature selection. This approach is used to determine the smallest number of components yielding the best decoding accuracy. We then show how other dimensionality reduction techniques can be substituted into this pipeline.
5. Classifier optimization: We use grid search and pipelines from Scikit-learn to tune hyperparameters and perform nested cross-validation. We cover how to handle mild forms of double-dipping (e.g., “peeking” at unlabeled test data by including it in z-scoring) that are often unavoidable, by performing permutation tests with randomized labels.
6. Representational similarity analysis (RSA): Using pattern similarity and representational dissimilarity matrices, we explore the neural representation of different categories of objects in a way that can be compared to behavioral judgments and computational models, and work out the identity of some unlabeled “mystery” objects.
7. Searchlights: We explore where in the brain local areas contain multivariate information that discriminates between faces and scenes. We begin with a small mask to build proficiency and end by running a whole-brain searchlight analysis. We demonstrate how to execute this computationally intensive analysis rapidly on a cluster using batch scripts, and cover resource planning and monitoring of large batch jobs.
8. Seed-based functional connectivity: To explore how large-scale brain networks, not just individual regions, contribute to cognitive processing, we examine the temporal correlation (functional connectivity) between regions. We show how connectivity changes during an attention task, and we show how to remove stimulus-evoked responses to isolate background connectivity.
9. Full correlation matrix analysis (FCMA): Rather than focus on connectivity with one or more seed regions of interest, we calculate and analyze an unbiased measure of connectivity: the correlation of every voxel in the brain with every other voxel. We highlight differences between FCMA (which classifies based on connectivity) compared to MVPA (which classifies based on activity), including brain regions that are equally active for faces and scenes but are differentially connected.
10. Inter-subject correlation (ISC): We examine what is common across people by measuring correlations over time in the activity of matching voxels in their brains in response to a common stimulus (e.g., story or movie). We can also measure functional connectivity across people (inter-subject functional connectivity, ISFC) by correlating non-matching voxels (e.g., between angular gyrus in one subject and hippocampus in another). We show how these techniques can reveal stimulus-driven variance in the brain by comparing listening to intact vs. scrambled stories.
11. Shared response model (SRM): A stimulus common to all subjects can be used to align subjects’ brains functionally, rather than relying on typical anatomical registration. SRM seeks to find shared variance in the fMRI data across subjects, in a reduced-dimension feature space. This results in weights that map between voxels and features, allowing other data to be projected into the aligned space. SRM can also be viewed as a technique for isolating reliable stimulus-related responses by removing responses that are either noise or idiosyncratic subject responses. We show the utility of this approach by improving time-segment matching in movie data and image classification with MVPA.
12. Event segmentation: We use hidden Markov models (HMMs) to identify a sequence of transitions between stable brain patterns in fMRI data. We illustrate how fitting HMMs to data from high-level brain regions (obtained during movie-watching) subdivides the time series into chunks that track events in the movie. We also explore whether retrieving events from memory leads to similar neural transitions.
13. Real-time fMRI: Most fMRI studies involve collecting data and analyzing them days or weeks later. By analyzing data on the fly, real-time fMRI makes new kinds of experiments possible, such as neurofeedback training and adaptive designs. We demonstrate the use of an fMRI data simulator, which generates brain images at the rate of an fMRI study (every 1-2 s), and then address how to pre-process data online and how to complete MVPA or other advanced analyses incrementally, before the next brain image.

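Several of the ideas in the classification and classifier optimization notebooks (leave-one-run-out cross-validation, pipelines, grid search, and nested cross-validation) can be combined in one short Scikit-learn sketch. The example below runs on simulated data; the choice of logistic regression as the linear classifier, the grid of C values, and all data dimensions are assumptions made for illustration and do not reproduce the notebook code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, LeaveOneGroupOut
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 4 runs x 20 trials, 50 voxels, signal in the first 10 voxels.
rng = np.random.default_rng(0)
n_runs, n_trials, n_voxels = 4, 20, 50
runs = np.repeat(np.arange(n_runs), n_trials)
labels = np.tile([0, 1], n_runs * n_trials // 2)
X = rng.normal(size=(n_runs * n_trials, n_voxels))
X[labels == 1, :10] += 1.5

# Scaling lives inside the pipeline, so it is always fit on training data only.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}

outer_scores = []
for train, test in LeaveOneGroupOut().split(X, labels, groups=runs):
    # Inner loop: choose C using only the training runs (leave-one-run-out again).
    search = GridSearchCV(pipe, param_grid, cv=LeaveOneGroupOut())
    search.fit(X[train], labels[train], groups=runs[train])
    # Outer loop: score the tuned model on the run that was held out throughout.
    outer_scores.append(search.score(X[test], labels[test]))

mean_accuracy = float(np.mean(outer_scores))
print(f"nested leave-one-run-out accuracy: {mean_accuracy:.2f}")
```

Splitting by run rather than by trial keeps temporally adjacent, correlated samples on the same side of each train/test split, and tuning within the training runs keeps the held-out run untouched until the final score.
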
Cluster computing
Analyses that require either a long run-time or large memory need to be run in batch mode. The Jupyter notebooks for these jobs serve as a template and may be used as the starting point for a batch script. Once the contents of the notebook have been learned, the user is directed to execute batch scripts associated with the notebook on the cluster.

Executing batch jobs on clusters is non-trivial, as it involves requesting the correct amount of memory, number of tasks, and run time. Given the non-interactive nature of most clusters, debugging performance issues can be challenging. In the Searchlight notebook (#7) we have provided step-by-step instructions for cluster execution. To make the transition to running on clusters easier, we provide recommendations such as running small samples of the analyses and extrapolating to make memory and time estimates for the analysis of the entire dataset. We also provide batch scripts with parameters that can be changed to fit the needs of the user. Finally, we provide some basic tips on how to monitor the status of batch jobs on the clusters.

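The recommendation to run a small sample and extrapolate can be sketched as follows. This is a rough illustration under stated assumptions: it supposes that the cost per unit of work (e.g., per searchlight) is roughly constant, that work divides evenly across workers, and that a safety factor of 1.5 is adequate. The function names and numbers are hypothetical.

```python
import math
import time


def seconds_per_unit(func, sample_units):
    """Average wall-clock time of func(unit) over a small sample of units."""
    start = time.perf_counter()
    for unit in sample_units:
        func(unit)
    return (time.perf_counter() - start) / len(sample_units)


def estimate_runtime(sec_per_unit, n_units, n_workers, safety=1.5):
    """Linear extrapolation to the full job, padded by a safety factor."""
    return safety * sec_per_unit * n_units / n_workers


def array_gib(shape, bytes_per_value=8):
    """Memory needed to hold one dense array, in GiB (8 bytes = float64)."""
    return math.prod(shape) * bytes_per_value / 2**30


def toy_analysis(unit):
    """Stand-in for one unit of real work, such as a single searchlight."""
    return sum(i * i for i in range(10_000))


rate = seconds_per_unit(toy_analysis, range(20))
runtime = estimate_runtime(rate, n_units=50_000, n_workers=16)
memory = array_gib((64, 64, 36, 300))
print(f"estimated run time: {runtime:.1f} s, one 4D volume: {memory:.2f} GiB")
```

Estimates of this kind give a starting point for the time and memory requests in a batch script; measured usage from a first completed job is a better guide for later submissions.
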
Other resources
To use the tutorials, a user will need to interact with multiple software tools:
1. GitHub: The repository for the tutorials and scripts. The user is not expected to know all the features of GitHub to use the tutorials. We provide simple instructions on how to download the tutorials.
2. Python: All programs are written in Python.
3. Basic Unix/Linux: To navigate data folders and launch batch scripts.
To make it easier for a new user to navigate these tools, we have created a website, https://brainiak.org/tutorials/#resources, where a new user can access tutorials and become familiar with these topics.

Public release
These tutorials can be found at https://brainiak.org/tutorials, and the associated datasets may be downloaded from Zenodo (for more details see https://brainiak.org/tutorials/). To use the full set of tutorials, a number of other packages need to be installed. The Miniconda package installer is recommended for easy installation and use. We provide detailed installation instructions at the website above. The packages can be installed on compute clusters or on individual machines. The tutorials can also be run on the cloud for free via Google Colaboratory (see https://brainiak.org/tutorials/ for instructions).

Discussion
Our tutorials make it possible for novice users to learn and execute advanced fMRI analyses. Using open-source tools, pre-processed datasets, and detailed instructions for running each analysis technique on compute clusters, we provide a robust framework for learning these advanced methods. The Jupyter notebooks provide an easy interface to execute code in small chunks and thus enable users to learn the material in a step-by-step manner. The tutorials may be run on a cluster or on individual machines, with functions in BrainIAK scaling the execution based on resources available.

The notebooks and batch scripts can also be modified to run the analysis on a user’s dataset. All that is required is that the data are pre-processed, masks have been created, and, for classification analyses, that labels have been included for each volume. Once these inputs are ready, and the directory paths have been appropriately specified, the analysis can be run on any dataset.

The tutorials have undergone extensive development and field testing at multiple sites (Princeton University, Yale University, and Virginia Tech) by participants in BrainIAK workshops and hackathons. A version of these tutorials was also used as problem sets for formal courses on advanced fMRI analysis at Yale (Spring 2018, 2019) and Princeton (Fall 2018). The feedback provided by the instructors and students was used to enhance the tutorials and make them more user-friendly.

In sum, these tutorials contribute to the educational needs of the cognitive neuroscience community by providing robust resources for learning cutting-edge analyses that are dominating the latest discoveries and publications in the field. These tutorials will also be relevant to more advanced users, by introducing them to the comprehensive and optimized BrainIAK software package, by providing practical training on how to use high-performance computing to accelerate fMRI analyses, and by promoting good software development and open science tools and practices. We plan to develop and release additional tutorials for existing and future methods in BrainIAK, as the field progresses.

Methods

Tutorials
Our learning materials are built entirely using freely available tools. The tutorials are written in the Python programming language. They are presented as Jupyter notebooks with explanations and figures for each section of the code. Other open-source packages are integrated into these tutorials. For data loading, masking, and writing files in NIFTI format, we use Nibabel and Nilearn. A variety of functions useful for machine learning are called from Scikit-learn. We use functions in BrainIAK to cover the following advanced fMRI analysis techniques: FCMA, ISC, ISFC, SRM, and Event Segmentation. We use the searchlight function in BrainIAK to perform whole-brain multivariate analyses. Each notebook is paired with a publicly available dataset that is analyzed using the code (see Table 1). These datasets have already been pre-processed using standard steps and parameters, allowing the user to focus on learning the analyses rather than getting bogged down in the pre-processing of fMRI data. We have compiled a condensed version of these datasets, with a reduced number of subjects, to make them easier to use in the tutorials [28]. The results from the analyses are plotted using Matplotlib [29] and Seaborn [30]. For network connectivity diagrams, Networkx [31] and Nxviz were used. To load HDF5 files, the Deepdish package was used. The Watchdog package was used to indicate when new files were created.

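Of the techniques listed above, inter-subject correlation has a particularly compact definition, which the following NumPy sketch illustrates using the leave-one-out approach: each subject's time course is correlated with the average time course of all other subjects, voxel by voxel. This is not BrainIAK's implementation; the function name, the (time points, voxels, subjects) array layout, and the simulated data are assumptions made for the example.

```python
import numpy as np


def isc_leave_one_out(data):
    """Leave-one-out inter-subject correlation.

    data: array of shape (n_timepoints, n_voxels, n_subjects).
    Returns an array of shape (n_subjects, n_voxels) of Pearson r values.
    """
    n_subjects = data.shape[2]
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    isc = np.empty((n_subjects, data.shape[1]))
    for s in range(n_subjects):
        others = np.delete(data, s, axis=2).mean(axis=2)
        others_z = (others - others.mean(axis=0)) / others.std(axis=0)
        # Pearson r is the mean product of z-scored time courses.
        isc[s] = (z[:, :, s] * others_z).mean(axis=0)
    return isc


# Hypothetical data: 200 time points, 2 voxels, 5 subjects.
rng = np.random.default_rng(0)
shared = rng.normal(size=200)
data = rng.normal(size=(200, 2, 5))
data[:, 0, :] += shared[:, None]  # voxel 0 carries a stimulus-driven signal

isc = isc_leave_one_out(data)
print("mean ISC per voxel:", isc.mean(axis=0).round(2))
```

In this simulation only the first voxel shares a signal across subjects, so its ISC is clearly positive while the second voxel's ISC stays near zero.
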
Table 1. The datasets used in the tutorials.

     Datasets                    Source  Used in tutorials
1.   Faces, places, and objects  [32]    1-5, 7
2.   Ninety-six objects          [33]    6
3.   Faces and scenes            [34]    7, 9
4.   Lateralized attention       [35]    8
5.   Pieman story                [10]    10
6.   Raiders movie               [36]    11
7.   Raiders images              [36]    11
8.   Sherlock movie              [37]    12

BrainIAK
BrainIAK is a software library for advanced fMRI analysis co-designed by cognitive neuroscientists and computer scientists. BrainIAK offers a Python interface and is mostly written in Python, but contains optimized code written in Cython and C++. Many of the methods implemented in BrainIAK scale from a laptop to compute clusters using OpenMP [38] and MPI [39] parallel and distributed computing technologies. There are no pre-processing or plotting methods in BrainIAK. Data are exchanged in standard NIFTI and NumPy formats with existing tools such as Nibabel or Nilearn.

Hardware configurations
We provide detailed instructions on how to configure the tutorials on the computing platforms described below at https://brainiak.org/tutorials.

Compute clusters: All software packages and datasets may be installed on compute clusters. The clusters we used to test the tutorials used SLURM as the scheduler. We provide scripts to launch Jupyter notebooks on the clusters and to connect to the tutorials from a browser through an SSH tunnel. The sequence of steps is as follows. First, a command is executed on the cluster to launch a Jupyter notebook server, with all required software modules and packages, on a specific port. Second, using a laptop or a desktop computer, each user connects to the cluster through an SSH tunnel to this port. Third, once the SSH tunnel is established, the user launches a browser and connects to the port. The browser provides access to all of the Jupyter notebooks and enables step-by-step execution of each notebook with its associated dataset. No other installation is required on the user’s computer. For remote servers that do not use a scheduler, a small modification to the scheduler-based scripts is needed; we also provide bash scripts for running the tutorials on these remote servers. For long-running jobs that need large amounts of resources on the cluster, we use Python scripts that are submitted to the cluster as batch jobs instead of the more interactive Jupyter notebooks. These scripts are also provided along with the tutorials.
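The cluster workflow described above (launch a notebook server on a port, tunnel to that port over SSH, and submit long-running analyses as batch jobs) can be sketched as a few command builders. This is a minimal illustration under stated assumptions, not the scripts distributed with the tutorials: the function names, the user name, the host and node names, the port, the resource values, and the script name `run_searchlight.py` are all hypothetical. It also assumes the notebook server runs on a compute node that is reachable from the login host.

```python
import shlex


def jupyter_command(compute_node, port):
    """Command run on the cluster to serve notebooks on `port` (hypothetical helper)."""
    # --no-browser: do not try to open a browser on the cluster itself.
    # --ip: listen on the node's address so the tunnel can reach the server.
    return ["jupyter", "notebook", "--no-browser",
            f"--ip={compute_node}", f"--port={port}"]


def tunnel_command(user, login_host, compute_node, port):
    """SSH command run on the user's own computer (hypothetical helper)."""
    # -N: do not run a remote command, only forward ports.
    # -L: forward localhost:port to compute_node:port via the login host.
    return ["ssh", "-N", "-L", f"{port}:{compute_node}:{port}",
            f"{user}@{login_host}"]


def sbatch_script(python_script, hours, mem_gb, cpus):
    """Text of a SLURM batch script for a long-running, non-interactive job."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --time={hours:02d}:00:00",   # wall-clock limit (HH:MM:SS)
        f"#SBATCH --mem={mem_gb}G",            # memory per node
        f"#SBATCH --cpus-per-task={cpus}",     # cores for the analysis
        f"python {shlex.quote(python_script)}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All values below are placeholders chosen for illustration.
    print(shlex.join(jupyter_command("node042", 8888)))
    print(shlex.join(tunnel_command("alice", "cluster.example.edu", "node042", 8888)))
    print(sbatch_script("run_searchlight.py", hours=12, mem_gb=64, cpus=8))
```

With the tunnel open, pointing a local browser at `http://localhost:8888` would reach the notebook server in this sketch. The batch script text would be saved to a file and submitted with `sbatch`; the directives a given cluster requires (partition, account, module loading) vary by site and are omitted here.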
Laptop or desktop: The tutorials and related packages may also be installed and run on individual laptops or standalone desktop workstations. Depending on the system specifications and the processing/memory requirements of the notebook, performance may be slow.
Classroom deployment
These notebooks were initially developed for research methods courses taught at an advanced undergraduate/graduate level at Yale and Princeton. Each notebook was intentionally designed to be a suitable length for a weekly problem set that would take students between three and twelve hours, depending on the skill level of the student and complexity of the topic. To implement these tutorials in a classroom setting, we configured cluster resources for the class and distributed and collected assigned notebooks using GitHub Classroom. Another feature of GitHub Classroom is that it keeps students’ responses private from other students.
Acknowledgments
We would like to thank the following people for help with many aspects of the tutorials: David Turner for helping us with high performance computing; Benjamin Singer for help with software installations and data management; Grant Wallace for cloud configurations and testing; and Daniel Suo for website management. Several individuals contributed to specific tutorials, as listed in the contributions section for each tutorial; we especially thank Chris Baldassano for creating the initial HMM notebook and writing an example script to compute ISC, and Po-Hsuan (Cameron) Chen for providing initial code for the SRM notebook. We would also like to thank the students (at Yale and Princeton Universities) and workshop and hackathon participants (at Yale, Princeton, and Virginia Tech) for their participation and feedback on these materials, Ed Clayton for organizing logistics for the workshops and hackathons, and Jonathan D. Cohen for his overall project oversight. The author order for N.B.T.-B. and K.A.N. was determined by a coin flip.
Financial Disclosure Statement
Funding for this project was provided by Intel Labs (P.J.R., N.B.T.-B., and K.A.N.).
References
1. Norman KA, Polyn SM, Detre GJ, Haxby JV. Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences. 2006;10: 424–430. doi:10.1016/j.tics.2006.07.005
2. Chadwick MJ, Bonnici HM, Maguire EA. Decoding information in the human hippocampus: A user’s guide. Neuropsychologia. 2012;50: 3107–3121. doi:10.1016/j.neuropsychologia.2012.07.007
3. Kriegeskorte N, Kreiman G. Visual Population Codes: Toward a Common Multivariate Framework for Cell Recording and Functional Imaging. MIT Press; 2012.
4. Kaplan JT, Man K, Greening SG. Multivariate cross-classification: applying machine learning techniques to characterize abstraction in neural representations. Front Hum Neurosci. 2015;9. doi:10.3389/fnhum.2015.00151
5. Kriegeskorte N, Mur M, Bandettini P. Representational Similarity Analysis – Connecting the Branches of Systems Neuroscience. Front Syst Neurosci. 2008;2. doi:10.3389/neuro.06.004.2008
6. Nili H, Wingfield C, Walther A, Su L, Marslen-Wilson W, Kriegeskorte N. A Toolbox for Representational Similarity Analysis. Prlic A, editor. PLoS Computational Biology. 2014;10: e1003553. doi:10.1371/journal.pcbi.1003553
7. Wang Y, Anderson MJ, Cohen JD, Heinecke A, Li K, Satish N, et al. Full correlation matrix analysis of fMRI data on Intel® Xeon Phi™ coprocessors. ACM Press; 2015. pp. 1–12. doi:10.1145/2807591.2807631
8. Hasson U, Nir Y, Levy I, Fuhrmann G, Malach R. Intersubject Synchronization of Cortical Activity During Natural Vision. Science. 2004;303: 1634–1640. doi:10.1126/science.1089506
9. Nastase SA, Gazzola V, Hasson U, Keysers C. Measuring shared responses across subjects using intersubject correlation. bioRxiv. 2019; 600114. doi:10.1101/600114
10. Simony E, Honey CJ, Chen J, Lositsky O, Yeshurun Y, Wiesel A, et al. Dynamic reconfiguration of the default mode network during narrative comprehension. Nature Communications. 2016;7: 12141. doi:10.1038/ncomms12141
11. Chen P-H (Cameron), Chen J, Yeshurun Y, Hasson U, Haxby J, Ramadge PJ. A Reduced-Dimension fMRI Shared Response Model. In: Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R, editors. Advances in Neural Information Processing Systems 28. Curran Associates, Inc.; 2015. pp. 460–468. Available: http://papers.nips.cc/paper/5855-a-reduced-dimension-fmri-shared-response-model.pdf
12. Baldassano C, Chen J, Zadbood A, Pillow JW, Hasson U, Norman KA. Discovering Event Structure in Continuous Narrative Perception and Memory. Neuron. 2017;95: 709–721.e5. doi:10.1016/j.neuron.2017.06.041
13. deBettencourt MT, Cohen JD, Lee RF, Norman KA, Turk-Browne NB. Closed-loop training of attention with real-time brain imaging. Nature Neuroscience. 2015;18: 470–475. doi:10.1038/nn.3940
14. Wang Y, Keller B, Capota M, Anderson MJ, Sundaram N, Cohen JD, et al. Real-time full correlation matrix analysis of fMRI data. 2016 IEEE International Conference on Big Data (Big Data). 2016. pp. 1242–1251. doi:10.1109/BigData.2016.7840728
15. Sitaram R, Ros T, Stoeckel L, Haller S, Scharnowski F, Lewis-Peacock J, et al. Closed-loop brain training: the science of neurofeedback. Nature Reviews Neuroscience. 2017;18: 86–100. doi:10.1038/nrn.2016.164
16. Lorenz R, Hampshire A, Leech R. Neuroadaptive Bayesian Optimization and Hypothesis Testing. Trends in Cognitive Sciences. 2017;21: 155–167. doi:10.1016/j.tics.2017.01.006
17. Cox RW. AFNI: Software for Analysis and Visualization of Functional Magnetic Resonance Neuroimages. Computers and Biomedical Research. 1996;29: 162–173. doi:10.1006/cbmr.1996.0014
18. Jenkinson M, Beckmann CF, Behrens TEJ, Woolrich MW, Smith SM. FSL. NeuroImage. 2012;62: 782–790. doi:10.1016/j.neuroimage.2011.09.015
19. Friston KJ, Ashburner J, Kiebel SJ, Nichols TE, Penny WD, editors. Statistical Parametric Mapping: The Analysis of Functional Brain Images [Internet]. Academic Press; 2007. Available: http://store.elsevier.com/product.jsp?isbn=9780123725608
20. Esteban O, Markiewicz CJ, Blair RW, Moodie CA, Isik AI, Erramuzpe A, et al. fMRIPrep: a robust preprocessing pipeline for functional MRI. Nature Methods. 2019;16: 111–116. doi:10.1038/s41592-018-0235-4
21. Abraham A, Pedregosa F, Eickenberg M, Gervais P, Mueller A, Kossaifi J, et al. Machine learning for neuroimaging with scikit-learn. Front Neuroinform. 2014;8. doi:10.3389/fninf.2014.00014
22. Kriegeskorte N, Simmons WK, Bellgowan PSF, Baker CI. Circular analysis in systems neuroscience: the dangers of double dipping. Nature Neuroscience. 2009;12: 535–540. doi:10.1038/nn.2303
23. Matthew Brett, Michael Hanke, Chris Markiewicz, Marc-Alexandre Côté, Paul McCarthy, Chris Cheng, et al. nipy/nibabel: 2.3.1 [Internet]. Zenodo; 2018. doi:10.5281/zenodo.1464282
24. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. 2011;12: 2825–2830.
25. Anderson MJ, Capota M, Turek JS, Zhu X, Willke TL, Wang Y, et al. Enabling factor analysis on thousand-subject neuroimaging datasets. IEEE; 2016. pp. 1151–1160. doi:10.1109/BigData.2016.7840719
26. Kriegeskorte N, Goebel R, Bandettini P. Information-based functional brain mapping. Proceedings of the National Academy of Sciences of the United States of America. 2006;103: 3863–3868. Available: http://www.pnas.org/content/103/10/3863.short
27. Jette MA, Yoo AB, Grondona M. SLURM: Simple Linux Utility for Resource Management. In Lecture Notes in Computer Science: Proceedings of Job Scheduling Strategies for Parallel Processing (JSSPP) 2003. Springer-Verlag; 2002. pp. 44–60.
28. Kumar M, Ellis CT, Lu Q, Zhang H, Ramadge PJ, Turk-Browne NB, et al. BrainIAK Tutorials: Condensed Datasets [Internet]. Zenodo; 2019. doi:10.5281/zenodo.2598755
29. Droettboom M, Caswell TA, John Hunter, Eric Firing, Jens Hedegaard Nielsen, Antony Lee, et al. matplotlib/matplotlib v2.2.2 [Internet]. Zenodo; 2018. doi:10.5281/zenodo.1202077
30. Michael Waskom, Olga Botvinnik, Drew O’Kane, Paul Hobson, Joel Ostblom, Saulius Lukauskas, et al. mwaskom/seaborn: v0.9.0 (July 2018) [Internet]. Zenodo; 2018. doi:10.5281/zenodo.1313201
31. Proceedings of the Python in Science Conference (SciPy): Exploring Network Structure, Dynamics, and Function using NetworkX [Internet]. [cited 4 May 2019]. Available: http://conference.scipy.org/proceedings/SciPy2008/paper_2/
32. Kim G, Norman KA, Turk-Browne NB. Neural Overlap In Item Representations Across Episodes Impairs Context Memory. 2017; doi:10.1101/125971
33. Kriegeskorte N, Mur M, Ruff DA, Kiani R, Bodurka J, Esteky H, et al. Matching Categorical Object Representations in Inferior Temporal Cortex of Man and Monkey. Neuron. 2008;60: 1126–1141. doi:10.1016/j.neuron.2008.10.043
34. Turk-Browne NB, Simon MG, Sederberg PB. Scene Representations in Parahippocampal Cortex Depend on Temporal Context. J Neurosci. 2012;32: 7202–7207. doi:10.1523/JNEUROSCI.0942-12.2012
35. Hutchinson JB, Pak SS, Turk-Browne NB. Biased Competition during Long-term Memory Formation. J Cogn Neurosci. 2016;28: 187–197. doi:10.1162/jocn_a_00889
36. Haxby JV, Guntupalli JS, Connolly AC, Halchenko YO, Conroy BR, Gobbini MI, et al. A common, high-dimensional model of the representational space in human ventral temporal cortex. Neuron. 2011;72: 404–416. doi:10.1016/j.neuron.2011.08.026
37. Chen J, Leong YC, Honey CJ, Yong CH, Norman KA, Hasson U. Shared memories reveal shared structure in neural activity across individuals. Nature Neuroscience. 2017;20: 115–125. doi:10.1038/nn.4450
38. Dagum L, Menon R. OpenMP: An Industry-Standard API for Shared-Memory Programming. IEEE Comput Sci Eng. 1998;5: 46–55. doi:10.1109/99.660313
39. Forum MP. MPI: A Message-Passing Interface Standard. Knoxville, TN, USA: University of Tennessee; 1994.