BrainIAK tutorials: User-friendly learning materials for advanced fMRI analysis

May 2019

DOI:10.31219/osf.io/j4sbc

Authors:

Manoj Kumar

Princeton University

Qihong Lu

Columbia University

Hejia Zhang

Princeton University


BrainIAK tutorials: user-friendly learning materials for advanced fMRI analysis

*Manoj Kumar1, Cameron T. Ellis2, Qihong Lu1, Hejia Zhang3, Mihai Capota4, Theodore L. Willke4, Peter J. Ramadge1,3, Nicholas B. Turk-Browne2, Kenneth A. Norman1,5

1Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA

2Department of Psychology, Yale University, New Haven, CT, USA

3Center for Statistics and Machine Learning, Princeton University, NJ, USA

4Brain-Inspired Computing Lab, Intel Corporation, Portland, OR, USA

5Department of Psychology, Princeton University, Princeton, NJ, USA

* Corresponding author

E-mail: mk35@princeton.edu (MK)

Contributions

Conceptualization: M.K., C.T.E., P.J.R., N.B.T.-B., K.A.N.; Software: M.K., C.T.E., Q.L., H.Z., M.C., T.L.W.; Supervision: T.L.W., P.J.R., N.B.T.-B., K.A.N.; Writing – Original Draft Preparation: M.K.; Writing – Review & Editing: M.K., C.T.E., Q.L., H.Z., M.C., P.J.R., N.B.T.-B., K.A.N.

Abstract

Advanced brain imaging analysis methods, including multivariate pattern analysis (MVPA), functional connectivity, and functional alignment, have become powerful tools in cognitive neuroscience over the past decade. These tools are implemented in custom code and separate packages, often requiring different software and language proficiencies to deploy for data analysis. Although these tools are usable by expert researchers, novice users face a steep learning curve. The difficulties stem from the need to learn new programming languages (e.g., Python), the need to learn how to apply machine-learning methods to high-dimensional fMRI data, and minimal documentation and training materials. Furthermore, most standard fMRI analysis packages (e.g., AFNI, FSL, SPM) focus on preprocessing and univariate analyses, leaving a gap in how to integrate them with advanced tools. To address these needs, we developed BrainIAK (brainiak.org), an open-source Python software package that seamlessly integrates several cutting-edge, computationally efficient techniques with other Python packages (e.g., Nilearn, Scikit-learn) for file handling, visualization, and machine learning. To disseminate these powerful tools, we have developed user-friendly tutorials (in Jupyter format) and exercises for learning BrainIAK and advanced fMRI analysis in Python more generally. These materials cover techniques including MVPA (pattern classification and representational similarity analysis); parallelized searchlight analysis; background connectivity; full correlation matrix analysis; inter-subject correlation; inter-subject functional connectivity; shared response modeling; event segmentation using hidden Markov models; and real-time fMRI. For long-running jobs with large memory consumption, we have provided detailed guidance on using high-performance computing clusters. These notebooks were successfully tested at multiple sites, including as problem sets for courses at Yale and Princeton universities and at hackathons at Princeton, Yale, and Virginia Tech. These materials are freely shared, with the hope that they become part of a pool of open-source software and educational materials for large-scale, reproducible fMRI analysis and accelerated discovery.

Introduction

The latest methods for analyzing brain activity recorded via functional magnetic resonance imaging (fMRI) are complex to learn and execute. This is particularly true for multivariate pattern analysis (MVPA) methods, which focus on extracting information about a person’s cognitive state (i.e., percepts, thoughts, memories) from spatially and/or temporally distributed patterns of fMRI activity. Beginners and even intermediate users face a steep learning curve and uncertainty in using these complex techniques because of the relative paucity of documentation and guidance about their practical use. Even expert users are hesitant to add new, more advanced techniques to their existing pipelines, and they can also face significant software and hardware challenges. These difficulties persist even though MVPA techniques have been in use for almost two decades and have proven widely useful for a variety of questions in cognitive neuroscience.

MVPA encompasses a wide range of analyses, from pattern classifiers that map between distributed brain patterns and cognitive states [1–4] to techniques that explore the similarity structure exploited by classifiers (e.g., representational similarity analysis, RSA; [5,6]). There are also related multivariate techniques for functional connectivity and functional alignment, including full correlation matrix analysis (FCMA; [7]), inter-subject correlation (ISC; [8,9]), inter-subject functional connectivity (ISFC; [10]), shared response modeling (SRM; [11]), and event segmentation [12]. Finally, these analyses can be run after data collection is completed, or in real time for neurofeedback training or adaptive design optimization [13–16]. One barrier to increasing the accessibility of these techniques was that they were generally created as custom code within individual labs and were not part of standard fMRI software packages. To address this, we implemented them in an open-source Python package called BrainIAK. The present tutorials provide structured guidance for learning how to use these techniques.
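
To make the notion of a similarity structure concrete, the following is a minimal sketch (not code taken from the tutorials) of how a representational dissimilarity matrix could be computed with NumPy. The function name `compute_rdm`, the array shapes, and the synthetic patterns are illustrative assumptions.

```python
import numpy as np


def compute_rdm(patterns):
    """Return a condition-by-condition matrix of 1 - Pearson correlation.

    patterns: array of shape (n_conditions, n_voxels), one row per condition.
    """
    # np.corrcoef treats each row as a variable, so this correlates conditions.
    return 1.0 - np.corrcoef(patterns)


# Synthetic example (hypothetical data): 4 conditions x 50 voxels,
# where conditions 0-1 share one base pattern and conditions 2-3 another.
rng = np.random.default_rng(0)
base_a = rng.standard_normal(50)
base_b = rng.standard_normal(50)
patterns = np.vstack([
    base_a + 0.1 * rng.standard_normal(50),
    base_a + 0.1 * rng.standard_normal(50),
    base_b + 0.1 * rng.standard_normal(50),
    base_b + 0.1 * rng.standard_normal(50),
])
rdm = compute_rdm(patterns)
print(np.round(rdm, 2))
```

In this toy case, conditions built from the same base pattern should be less dissimilar to each other than to conditions built from the other base pattern; with real data the rows would be condition-wise voxel patterns from a region of interest.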

In a typical fMRI analysis pipeline, the data are first pre-processed, a general linear model (GLM) might be fit, and then MVPA or other more advanced analyses are performed. For pre-processing and GLM analysis of fMRI data, a number of tutorials and bootcamps are available for learning software packages such as AFNI, FSL, SPM, and fMRIPrep [17–20]. In contrast, for MVPA and more advanced analysis techniques, fewer learning resources are available. We have designed the present learning materials to make it easy for the novice user to learn MVPA and more advanced techniques. An expert user can also use our learning materials to understand BrainIAK’s implementation of these techniques, to train other researchers, and to teach research methods classes.

There are three main steps to learning and implementing BrainIAK methods: (1) learning to

write code and scripts, (2) understanding machine learning algorithms and how to apply them to

cognitive neuroscience data, and (3) executing jobs on high-performance compute clusters. We

elaborate on each of these steps below.

First, one needs to learn a programming language; BrainIAK, for example, uses Python. This can present a significant challenge to a beginner, as learning to program and to apply these skills to scientific computing is a time-consuming process. Such skills have only recently been added to the curriculum in some psychology and neuroscience departments, and have been included as components of hackathons and summer schools. As instructors tend to teach in the language they are most familiar with, different programming languages are often used to teach various techniques, making it difficult for users to switch flexibly between different methods.

Second, the analysis techniques in BrainIAK involve significant use of machine learning algorithms that may be unfamiliar to cognitive neuroscientists. Multiple tutorials on machine learning exist; however, only a few cover the use of machine learning in cognitive neuroscience: the documentation for Nilearn [21], lectures from the MIND summer school, lectures from the Organization for Human Brain Mapping education section and hackathons, and blogs such as MVPA Meanderings. For some of the more cutting-edge techniques in BrainIAK, no tutorials exist or they are taught only as a part of special workshops. Furthermore, general-purpose machine learning algorithms, which are usually applied to independently collected data, need to be applied to cognitive neuroscience data with care, as not all data points are independent of each other in space and time; this has led to the insidious problem of circular inference or “double dipping” [22].
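
As an illustration of why circular inference matters, here is a small self-contained simulation, offered as a sketch rather than as code from the tutorials. It uses pure-noise data, a simple nearest-mean classifier, and leave-one-run-out cross-validation; the only difference between the two analyses is whether voxel selection sees the held-out run. All function names, the number of voxels, and the choice of classifier are assumptions made for the example.

```python
import numpy as np


def select_top_voxels(X, y, k):
    """Indices of the k voxels with the largest absolute class-mean difference."""
    diff = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))
    return np.argsort(diff)[-k:]


def nearest_mean_predict(X_train, y_train, X_test):
    """Assign each test pattern to the class with the closer mean pattern."""
    mean0 = X_train[y_train == 0].mean(axis=0)
    mean1 = X_train[y_train == 1].mean(axis=0)
    d0 = np.linalg.norm(X_test - mean0, axis=1)
    d1 = np.linalg.norm(X_test - mean1, axis=1)
    return (d1 < d0).astype(int)


def leave_one_run_out_accuracy(X, y, runs, k, select_inside_fold):
    """Cross-validated accuracy with voxel selection inside or outside the folds."""
    # Circular variant: selection uses all data, including the test labels.
    global_voxels = None if select_inside_fold else select_top_voxels(X, y, k)
    correct = 0
    for run in np.unique(runs):
        train, test = runs != run, runs == run
        if select_inside_fold:
            voxels = select_top_voxels(X[train], y[train], k)
        else:
            voxels = global_voxels
        pred = nearest_mean_predict(
            X[train][:, voxels], y[train], X[test][:, voxels]
        )
        correct += np.sum(pred == y[test])
    return correct / len(y)


# Pure-noise data: 4 runs x 10 volumes, 2000 voxels; labels carry no signal.
rng = np.random.default_rng(0)
runs = np.repeat(np.arange(4), 10)
y = np.tile([0, 1], 20)
X = rng.standard_normal((40, 2000))

circular_acc = leave_one_run_out_accuracy(X, y, runs, k=20, select_inside_fold=False)
proper_acc = leave_one_run_out_accuracy(X, y, runs, k=20, select_inside_fold=True)
print(f"circular: {circular_acc:.2f}, proper: {proper_acc:.2f}")
```

Because the data contain no real signal, any accuracy well above chance in the circular variant is produced entirely by the selection step having seen the test data; selecting voxels inside each training fold removes that bias.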

Third, the execution of these programs on high-performance compute clusters is non-trivial even for advanced practitioners who are proficient at executing code on individual machines. Using clusters can accelerate analyses dramatically, but sizing the memory needed and enabling parallel code execution for optimal run-times requires an understanding of how data are processed in a cluster environment. It is a challenge to find training materials on how to run fMRI analyses on a cluster.

We have created learning materials (herein referred to as tutorials) that address each of the above challenges, making it easier for novice users to learn MVPA and for expert users to learn more advanced BrainIAK analyses, such as FCMA and SRM. To aid learning to code, the tutorials provide an interactive environment to read, write, and execute code. Specifically, for the novice user, a simple way to learn to code is to work through small snippets of code that come with detailed explanations and a clear goal of what is being accomplished by the code. Our use of Jupyter notebooks allows for detailed explanations of the code with text and figures embedded in-line. The user can execute the code step-by-step and interact with data at each step using plotting functions. In order to ease users into the use of these techniques, we first introduce them to a fully working, simplified version of the code. Once users have mastered this simplified version, we encourage them to delve deeper and learn more about helper functions and input/output variables. The expert user, who may wish to examine the details of how the data are being processed, or even modify the code to suit their needs, can readily do so using the open-source Python code contained in the Jupyter notebooks. For all users, we embed background material and references, prompts for further self-study, and programmatic exercises to help them learn how to generate and adapt code.

To help users learn how to apply machine learning algorithms to cognitive neuroscience data, we take advantage of several open-source machine learning tools that are available in Python. For data loading/handling and basic machine learning, we use Nilearn [21], Nibabel [23], and Scikit-learn [24]. We have included detailed instructions and exercises in the tutorials on avoiding problems of circular inference and double-dipping. We also use tools native to BrainIAK for applying cutting-edge machine learning to fMRI data, including searchlight analysis [25]. An important consideration is how to prepare the data in a suitable format. Publicly available datasets are often in a raw state and need to be pre-processed (e.g., motion correction, registration, and masking) before they can be used for advanced analyses. The pre-processing can take a significant amount of time and add to the burden on the learner. To circumvent this problem, we provide pre-processed data along with our tutorials, making it significantly easier for a novice user to get started and quickly perform a successful analysis.

Having made it easy to access code and use machine learning algorithms, we encounter the third challenge: running the code efficiently using compute clusters. It can be difficult to take code that works on a laptop and modify it to efficiently leverage the resources of a cluster and scale performance to meet the demands of large datasets. Writing efficient, properly parallelized code requires specialized expertise and can be a significant burden on the user. BrainIAK has built-in tools for making the most of clusters to scale analyses easily. In fact, the same code works seamlessly from a laptop (with a few cores) to clusters (with thousands of cores). For example, searchlight analysis (see [26]) involves running the same MVPA thousands of times at different points in the brain, which can be extremely slow on a laptop or desktop. BrainIAK includes a searchlight function that distributes these jobs on a cluster to run them in parallel. This function can be invoked using a few lines of code and runs seamlessly on any computing hardware. The tutorials give example code for cluster computing that can easily be extended to novel datasets.

In addition to parallelizing the code, cluster environments can present other complications for learners. In particular, the interactive nature of working on a laptop or desktop is absent when working on a cluster, making troubleshooting difficult. Cluster environments also demand resource allocations up front (i.e., number of cores and amount of memory); increasing memory or extending time during program execution is not permitted, so users need to have a clear understanding of the computational needs of their code. The tutorials use the SLURM scheduler [27] and provide instructions on how to determine the resources required to execute jobs and how to monitor running jobs.

In summary, we present a set of tutorials created to enable users of all skill levels to learn and deploy advanced multivariate fMRI analysis techniques. In addition to covering the latest incarnation of MVPA [1,5] and associated conventions, we provide best-practice recommendations on optimizing classifiers with strategies to avoid double-dipping. We also cover a range of cutting-edge techniques available in BrainIAK, including searchlight analysis, FCMA, ISC, ISFC, SRM, real-time fMRI, and event segmentation using hidden Markov models. We have released these tutorials publicly and freely.

Results

Our goal was to create user-friendly educational materials that can be used by novice or expert practitioners to learn how to deploy advanced fMRI analyses in their research. The tutorials are written as Jupyter notebooks with detailed explanations provided in the form of text and figures for each section of the code. The execution of the notebooks on a cluster is also made simple. If the requisite software and data are installed on the cluster, a user simply needs to connect to the cluster from their laptop/desktop computer, open a web browser, and access the Jupyter notebooks. Each tutorial notebook has an overarching theme of a scientific question relevant to cognitive neuroscience. The accompanying notebook exercises help the user understand the method and its applicability to the scientific question by requiring that they generate answers or code. These questions are posed and answered in the context of a publicly available fMRI dataset. These datasets are distributed with the tutorials in a ready-to-use (pre-processed) state.

Once the user has acquired proficiency in executing the notebooks from a browser, we introduce running programs on clusters by submitting scripts as batch jobs. The commands to run such jobs on clusters are first covered in a notebook, and associated batch scripts are provided. Each of the notebooks can be run independently. For the beginning and intermediate user, we recommend starting at the first notebook and working through notebooks 1-7. After this, the user can choose to focus on a particular method in notebooks 8-13. An advanced user who is already familiar with Python and machine learning can start with any notebook in the sequence. For those who are new to using clusters but are otherwise proficient at fMRI analysis, the searchlight notebook is a useful starting point. We describe the contents of each notebook in more detail below:

Tutorial notebooks

At the time of writing there are 13 notebooks available. As time permits, we intend to produce more tutorials as needs or new methods demand.

1. Setup: An introductory notebook to help users learn how to work with Jupyter.
2. Data handling and normalization: We load fMRI datasets into a Python environment using the Nilearn and Nibabel packages. The importance of normalizing the data is shown via an exercise using a simulated dataset.
3. Classification: Once the data have been loaded and normalized, the BOLD signal is extracted with a shift to account for hemodynamic lag and classification is performed using a linear classifier. The importance of separating training and test data is emphasized and cross-validation is introduced. The pitfalls of double-dipping are highlighted and the leave-one-run-out approach is covered. We use a category localizer dataset to examine modular vs. distributed processing in the visual system.
4. Dimensionality reduction: We introduce principal component analysis (PCA) and explore how to select the number of dimensions. We highlight the importance of using cross-validation to perform feature selection. This approach is used to determine the smallest number of components yielding the “best” decoding accuracy. We then show how other dimensionality reduction techniques can be substituted into this pipeline.
5. Classifier optimization: We use grid search and pipelines from Scikit-learn to tune hyperparameters and perform nested cross-validation. We cover how to handle mild forms of double-dipping (e.g., “peeking” at unlabeled test data by including it in z-scoring) that are often unavoidable, by performing permutation tests with randomized labels.
6. Representational similarity analysis (RSA): Using pattern similarity and representational dissimilarity matrices we explore the neural representation of different categories of objects in a way that can be compared to behavioral judgments and computational models, and solve the identity of some unlabeled “mystery” objects.
7. Searchlights: We explore where in the brain local areas contain multivariate information that discriminates between faces and scenes. We begin with a small mask to build proficiency and end by running a whole-brain searchlight analysis. We demonstrate how to execute this computationally intensive analysis rapidly on a cluster using batch scripts, and cover resource planning and monitoring of large batch jobs.

8. Seed-based functional connectivity: To explore how large-scale brain networks, not just individual regions, contribute to cognitive processing, we examine the temporal correlation (functional connectivity) between regions. We show how connectivity changes during an attention task, and we show how to remove stimulus-evoked responses to isolate background connectivity.
9. Full correlation matrix analysis (FCMA): Rather than focus on connectivity with one or more seed regions of interest, we calculate and analyze an unbiased measure of connectivity: the correlation of every voxel in the brain with every other voxel. We highlight differences between FCMA (which classifies based on connectivity) and MVPA (which classifies based on activity), including brain regions that are equally active for faces and scenes but are differentially connected.
10. Inter-subject correlation (ISC): We examine what is common across people by measuring correlations over time in the activity of matching voxels in their brains in response to a common stimulus (e.g., story or movie). We can also measure functional connectivity across people (ISFC) by correlating non-matching voxels (e.g., between angular gyrus in one subject and hippocampus in another). We show how these techniques can reveal stimulus-driven variance in the brain by comparing listening to intact vs. scrambled stories.
11. Shared response model (SRM): A common stimulus across subjects can be used to align subject brains functionally, rather than relying on typical anatomical registration. SRM seeks to find shared variance in the fMRI data across subjects, in a reduced-dimension feature space. This results in weights that map between voxels and features, allowing other data to be projected into the aligned space. SRM can also be viewed as a technique for isolating reliable stimulus-related responses by removing responses that are either noise or idiosyncratic subject responses. We show the utility of this approach by improving time-segment matching in movie data and image classification with MVPA.
12. Event segmentation: We use hidden Markov models (HMMs) to identify a sequence of transitions between stable brain patterns in fMRI data. We illustrate how fitting HMMs to data from high-level brain regions (obtained during movie-watching) subdivides the time series into chunks that track events in the movie. We also explore whether retrieving events from memory leads to similar neural transitions.
13. Real-time fMRI: Most fMRI studies involve collecting data and analyzing them days or weeks later. By analyzing data on the fly, real-time fMRI makes new kinds of experiments possible, such as neurofeedback training and adaptive designs. We demonstrate the use of an fMRI data simulator, which generates brain images at the rate of an fMRI study (every 1-2 s), and then address how to pre-process data online and how to complete MVPA or other advanced analyses incrementally, before the next brain image.

Cluster computing

Analyses that require either a long run-time or large memory need to be run in batch mode. The Jupyter notebooks for these jobs serve as a template and may be used as the starting point for a batch script. Once the contents of the notebook have been learned, the user is directed to execute batch scripts associated with the notebook on the cluster.

Executing batch jobs on clusters is non-trivial as it involves requesting the correct amount of memory, the number of tasks, and the time required. Given the non-interactive nature of most clusters, debugging performance issues can be challenging. In the Searchlight notebook (#7) we have provided step-by-step instructions for cluster execution. To make the transition to running on clusters easier, we provide recommendations such as running small samples of the analyses and extrapolating to make memory and time estimates for the analysis of the entire dataset. We also provide batch scripts with parameters that can be changed to fit the needs of the user. Finally, we provide some basic tips on how to monitor the status of batch jobs on the clusters.

Other resources

To use the tutorials, a user will need to interact with multiple software tools:
1. GitHub: The repository for the tutorials and scripts. The user is not expected to know all the features of GitHub to use the tutorials. We provide simple instructions on how to download the tutorials.
2. Python: All programs are written in Python.
3. Basic Unix/Linux: To navigate data folders and launch batch scripts.
To make it easier for a new user to navigate these tools, we have created a website, https://brainiak.org/tutorials/#resources, where a new user can access tutorials and become familiar with these topics.

Public release

These tutorials can be found at https://brainiak.org/tutorials, and the associated datasets may be downloaded from Zenodo (for more details, see https://brainiak.org/tutorials/). To use the full set of tutorials, a number of other packages need to be installed. The Miniconda package installer is recommended for easy installation and use. We provide detailed installation instructions at the website above. The packages can be installed on compute clusters or on individual machines. The tutorials can also be run on the cloud for free via Google Colaboratory (see https://brainiak.org/tutorials/ for instructions).

Discussion

Our tutorials make it possible for novice users to learn and execute advanced fMRI analyses. Using open-source tools, pre-processed datasets, and detailed instructions for running each analysis technique on compute clusters, we provide a robust framework for learning these advanced methods. The Jupyter notebooks provide an easy interface to execute code in small chunks and thus enable users to learn the material in a step-by-step manner. The tutorials may be run on a cluster or on individual machines, with functions in BrainIAK scaling the execution based on resources available.

The notebooks and batch scripts can also be modified to run the analysis on a user’s dataset. All that is required is that the data are pre-processed, masks have been created, and, for classification analyses, that labels have been included for each volume. Once these inputs are ready, and the directory paths have been appropriately specified, the analysis can be run on any dataset.

The tutorials have undergone extensive development and field testing at multiple sites (Princeton University, Yale University, and Virginia Tech) by participants in BrainIAK workshops and hackathons. A version of these tutorials was also used as problem sets for formal courses on advanced fMRI analysis at Yale (Spring 2018, 2019) and Princeton (Fall 2018). The feedback provided by the instructors and students was used to enhance the tutorials and make them more user-friendly.

In sum, these tutorials contribute to the educational needs of the cognitive neuroscience community by providing robust resources for learning cutting-edge analyses that are dominating the latest discoveries and publications in the field. These tutorials will also be relevant to more advanced users, by introducing them to the comprehensive and optimized BrainIAK software package, by providing practical training on how to use high-performance computing to accelerate fMRI analyses, and by promoting good software development and open science tools and practices. We plan to develop and release additional tutorials for existing and future methods in BrainIAK, as the field progresses.

Methods


Tutorials

Our learning materials are built entirely using freely available tools. The tutorials are written in the Python programming language. They are presented as Jupyter notebooks with explanations and figures for each section of the code. Other open-source packages are integrated into these tutorials. For data loading, masking, and writing files in NIFTI format, we use Nibabel and Nilearn. A variety of functions useful for machine learning are called from Scikit-learn. We use functions in BrainIAK to cover the following advanced fMRI analysis techniques: FCMA, ISC, ISFC, SRM, and Event Segmentation. We use the searchlight function in BrainIAK to perform whole-brain multivariate analyses. Each notebook is paired with a publicly available dataset that is analyzed using the code (see Table 1). These datasets have already been pre-processed using standard steps and parameters, allowing the user to focus on learning the analyses rather than getting bogged down in the pre-processing of fMRI data. We have compiled a condensed version of these datasets, by reducing the number of subjects to make it easy for tutorial use [28]. The results from the analyses are plotted using Matplotlib [29] and Seaborn [30]. For network connectivity diagrams, Networkx [31] and Nxviz were used. To load hdf5 files, the Deepdish package was used. The Watchdog package was used to indicate when new files were created.

Table 1. The datasets used in the tutorials.

Datasets                      Source    Used in tutorials
Faces, places, and objects    [32]      1-5, 7
Ninety-six objects            [33]
Faces and scenes              [34]      7, 9
Lateralized attention         [35]
Pieman story                  [10]
Raiders movie                 [36]
Raiders images                [36]
Sherlock movie                [37]

BrainIAK

BrainIAK is a software library for advanced fMRI analysis co-designed by cognitive neuroscientists and computer scientists. BrainIAK offers a Python interface and is mostly written in Python, but contains optimized code written in Cython and C++. Many of the methods implemented in BrainIAK scale from a laptop to compute clusters using OpenMP [38] and MPI [39] parallel and distributed computing technologies. There are no pre-processing or plotting methods in BrainIAK. Data are exchanged in standard NIFTI and NumPy formats with existing tools such as Nibabel or Nilearn.

Hardware configurations

Detailed instructions for configuring the tutorials on the computing platforms described below are available at https://brainiak.org/tutorials

Compute clusters: All software packages and datasets may be installed on compute clusters. The clusters we used to test the tutorials used SLURM as the scheduler. We provide scripts to launch Jupyter notebooks on the clusters and connect to the tutorials from a browser through an ssh-tunnel. The sequence of steps is as follows. First, a command is executed on the cluster to launch a Jupyter notebook, with all required software modules and packages, on a specific port. Second, using a laptop or desktop computer, each user connects to the cluster using an ssh-tunnel to this port. Third, once the ssh-tunnel is established, the user launches a browser and connects to the port. The browser provides access to all of the Jupyter notebooks and enables step-by-step execution of each notebook with its associated dataset. No other installation is required on the user’s computer. For remote servers that do not use a scheduler, a small modification to the scheduler-based scripts is needed; we also provide bash scripts for running the tutorials on these remote servers. For long-running jobs that need large amounts of resources on the cluster, we use Python scripts that are submitted to the cluster as batch jobs instead of the more interactive Jupyter notebooks. These scripts are also provided along with the tutorials.
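The two access paths described above, an interactive notebook reached through an ssh-tunnel and a batch job submitted to SLURM, can be sketched as follows. This is an illustrative sketch and not the scripts distributed with the tutorials: the helper names, user name, host names, port, and script name are all made up, and the notebook is assumed to run on a compute node reached through a login node (use `localhost` as `compute_host` if it runs on the login node itself). Only standard `ssh` options (`-N`, `-L`) and `sbatch` directives are used.

```python
import shlex


def ssh_tunnel_command(user, login_host, compute_host, port):
    """Build an ssh command that forwards a local port to the notebook port.

    -N opens the connection without running a remote command;
    -L local_port:host:remote_port sets up local port forwarding.
    """
    return ["ssh", "-N", "-L", f"{port}:{compute_host}:{port}",
            f"{user}@{login_host}"]


def sbatch_script(job_name, script, hours, mem_gb, cpus):
    """Return the text of a SLURM batch script that runs a Python script."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --time={hours:02d}:00:00",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --cpus-per-task={cpus}",
        f"python {shlex.quote(script)}",
    ]
    return "\n".join(lines) + "\n"


# Usage with placeholder values (every name below is hypothetical).
tunnel = ssh_tunnel_command("student", "login.cluster.example.edu",
                            "node042", 8888)
tunnel_text = shlex.join(tunnel)          # command line to run on the laptop
notebook_url = "http://localhost:8888"    # address to open in the browser
batch_text = sbatch_script("searchlight", "run_searchlight.py", 4, 32, 8)
```

The batch text would be saved to a file and submitted with `sbatch`; the resource values shown are placeholders and depend on the analysis and the cluster.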

Laptop or desktop: The tutorials and related packages may also be installed and run on individual laptops or standalone desktop workstations. Depending on the system specifications and the processing and memory requirements of the notebook, performance may be slow.

Classroom deployment

These notebooks were initially developed for research methods courses taught at an advanced undergraduate/graduate level at Yale and Princeton. Each notebook was intentionally designed to be a suitable length for a weekly problem set that would take students between three and twelve hours, depending on the skill level of the student and the complexity of the topic. To implement these tutorials in a classroom setting, we configured cluster resources for the class and distributed and collected assigned notebooks using GitHub Classroom. GitHub Classroom also keeps each student’s responses private from other students.

Acknowledgments

We would like to thank the following people for help with many aspects of the tutorials: David Turner for helping us with high-performance computing; Benjamin Singer for help with software installations and data management; Grant Wallace for cloud configurations and testing; and Daniel Suo for website management. Several individuals contributed to specific tutorials, as listed in the contributions section for each tutorial; we especially thank Chris Baldassano for creating the initial HMM notebook and writing an example script to compute ISC, and Po-Hsuan (Cameron) Chen for providing initial code for the SRM notebook. We would also like to thank the students (at Yale and Princeton Universities) and workshop and hackathon participants (at Yale, Princeton, and Virginia Tech) for their participation and feedback on these materials, Ed Clayton for organizing logistics for the workshops and hackathons, and Jonathan D. Cohen for his overall project oversight. The author order for N.B.T.-B. and K.A.N. was determined by a coin flip.

Financial Disclosure Statement

Funding for this project was provided by Intel Labs (P.J.R., N.B.T.-B., and K.A.N.).

References

1. Norman KA, Polyn SM, Detre GJ, Haxby JV. Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences. 2006;10: 424–430. doi:10.1016/j.tics.2006.07.005
2. Chadwick MJ, Bonnici HM, Maguire EA. Decoding information in the human hippocampus: A user’s guide. Neuropsychologia. 2012;50: 3107–3121. doi:10.1016/j.neuropsychologia.2012.07.007
3. Kriegeskorte N, Kreiman G. Visual Population Codes: Toward a Common Multivariate Framework for Cell Recording and Functional Imaging. MIT Press; 2012.
4. Kaplan JT, Man K, Greening SG. Multivariate cross-classification: applying machine learning techniques to characterize abstraction in neural representations. Front Hum Neurosci. 2015;9. doi:10.3389/fnhum.2015.00151
5. Kriegeskorte N, Mur M, Bandettini P. Representational Similarity Analysis – Connecting the Branches of Systems Neuroscience. Front Syst Neurosci. 2008;2. doi:10.3389/neuro.06.004.2008
6. Nili H, Wingfield C, Walther A, Su L, Marslen-Wilson W, Kriegeskorte N. A Toolbox for Representational Similarity Analysis. Prlic A, editor. PLoS Computational Biology. 2014;10: e1003553. doi:10.1371/journal.pcbi.1003553
7. Wang Y, Anderson MJ, Cohen JD, Heinecke A, Li K, Satish N, et al. Full correlation matrix analysis of fMRI data on Intel® Xeon Phi™ coprocessors. ACM Press; 2015. pp. 1–12. doi:10.1145/2807591.2807631
8. Hasson U, Nir Y, Levy I, Fuhrmann G, Malach R. Intersubject Synchronization of Cortical Activity During Natural Vision. Science. 2004;303: 1634–1640. doi:10.1126/science.1089506
9. Nastase SA, Gazzola V, Hasson U, Keysers C. Measuring shared responses across subjects using intersubject correlation. bioRxiv. 2019; 600114. doi:10.1101/600114
10. Simony E, Honey CJ, Chen J, Lositsky O, Yeshurun Y, Wiesel A, et al. Dynamic reconfiguration of the default mode network during narrative comprehension. Nature Communications. 2016;7: 12141. doi:10.1038/ncomms12141
11. Chen P-H (Cameron), Chen J, Yeshurun Y, Hasson U, Haxby J, Ramadge PJ. A Reduced-Dimension fMRI Shared Response Model. In: Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R, editors. Advances in Neural Information Processing Systems 28. Curran Associates, Inc.; 2015. pp. 460–468. Available: http://papers.nips.cc/paper/5855-a-reduced-dimension-fmri-shared-response-model.pdf
12. Baldassano C, Chen J, Zadbood A, Pillow JW, Hasson U, Norman KA. Discovering Event Structure in Continuous Narrative Perception and Memory. Neuron. 2017;95: 709-721.e5. doi:10.1016/j.neuron.2017.06.041
13. deBettencourt MT, Cohen JD, Lee RF, Norman KA, Turk-Browne NB. Closed-loop training of attention with real-time brain imaging. Nature Neuroscience. 2015;18: 470–475. doi:10.1038/nn.3940
14. Wang Y, Keller B, Capota M, Anderson MJ, Sundaram N, Cohen JD, et al. Real-time full correlation matrix analysis of fMRI data. 2016 IEEE International Conference on Big Data (Big Data). 2016. pp. 1242–1251. doi:10.1109/BigData.2016.7840728
15. Sitaram R, Ros T, Stoeckel L, Haller S, Scharnowski F, Lewis-Peacock J, et al. Closed-loop brain training: the science of neurofeedback. Nature Reviews Neuroscience. 2017;18: 86–100. doi:10.1038/nrn.2016.164
16. Lorenz R, Hampshire A, Leech R. Neuroadaptive Bayesian Optimization and Hypothesis Testing. Trends in Cognitive Sciences. 2017;21: 155–167. doi:10.1016/j.tics.2017.01.006
17. Cox RW. AFNI: Software for Analysis and Visualization of Functional Magnetic Resonance Neuroimages. Computers and Biomedical Research. 1996;29: 162–173. doi:10.1006/cbmr.1996.0014
18. Jenkinson M, Beckmann CF, Behrens TEJ, Woolrich MW, Smith SM. FSL. NeuroImage. 2012;62: 782–790. doi:10.1016/j.neuroimage.2011.09.015
19. Friston KJ, Ashburner J, Kiebel SJ, Nichols TE, Penny WD, editors. Statistical Parametric Mapping: The Analysis of Functional Brain Images [Internet]. Academic Press; 2007. Available: http://store.elsevier.com/product.jsp?isbn=9780123725608
20. Esteban O, Markiewicz CJ, Blair RW, Moodie CA, Isik AI, Erramuzpe A, et al. fMRIPrep: a robust preprocessing pipeline for functional MRI. Nature Methods. 2019;16: 111–116. doi:10.1038/s41592-018-0235-4
21. Abraham A, Pedregosa F, Eickenberg M, Gervais P, Mueller A, Kossaifi J, et al. Machine learning for neuroimaging with scikit-learn. Front Neuroinform. 2014;8. doi:10.3389/fninf.2014.00014
22. Kriegeskorte N, Simmons WK, Bellgowan PSF, Baker CI. Circular analysis in systems neuroscience: the dangers of double dipping. Nature Neuroscience. 2009;12: 535–540. doi:10.1038/nn.2303
23. Matthew Brett, Michael Hanke, Chris Markiewicz, Marc-Alexandre Côté, Paul McCarthy, Chris Cheng, et al. nipy/nibabel: 2.3.1 [Internet]. Zenodo; 2018. doi:10.5281/zenodo.1464282
24. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. 2011;12: 2825–2830.
25. Anderson MJ, Capota M, Turek JS, Zhu X, Willke TL, Wang Y, et al. Enabling factor analysis on thousand-subject neuroimaging datasets. IEEE; 2016. pp. 1151–1160. doi:10.1109/BigData.2016.7840719
26. Kriegeskorte N, Goebel R, Bandettini P. Information-based functional brain mapping. Proceedings of the National Academy of Sciences of the United States of America. 2006;103: 3863–3868. Available: http://www.pnas.org/content/103/10/3863.short
27. Jette MA, Yoo AB, Grondona M. SLURM: Simple Linux Utility for Resource Management. In Lecture Notes in Computer Science: Proceedings of Job Scheduling Strategies for Parallel Processing (JSSPP) 2003. Springer-Verlag; 2002. pp. 44–60.
28. Kumar M, Ellis CT, Lu Q, Zhang H, Ramadge PJ, Turk-Browne NB, et al. BrainIAK Tutorials: Condensed Datasets [Internet]. Zenodo; 2019. doi:10.5281/zenodo.2598755
29. Droettboom M, Caswell TA, John Hunter, Eric Firing, Jens Hedegaard Nielsen, Antony Lee, et al. matplotlib/matplotlib v2.2.2 [Internet]. Zenodo; 2018. doi:10.5281/zenodo.1202077
30. Michael Waskom, Olga Botvinnik, Drew O’Kane, Paul Hobson, Joel Ostblom, Saulius Lukauskas, et al. mwaskom/seaborn: v0.9.0 (July 2018) [Internet]. Zenodo; 2018. doi:10.5281/zenodo.1313201
31. Proceedings of the Python in Science Conference (SciPy): Exploring Network Structure, Dynamics, and Function using NetworkX [Internet]. [cited 4 May 2019]. Available: http://conference.scipy.org/proceedings/SciPy2008/paper_2/
32. Kim G, Norman KA, Turk-Browne NB. Neural Overlap In Item Representations Across Episodes Impairs Context Memory. 2017; doi:10.1101/125971
33. Kriegeskorte N, Mur M, Ruff DA, Kiani R, Bodurka J, Esteky H, et al. Matching Categorical Object Representations in Inferior Temporal Cortex of Man and Monkey. Neuron. 2008;60: 1126–1141. doi:10.1016/j.neuron.2008.10.043
34. Turk-Browne NB, Simon MG, Sederberg PB. Scene Representations in Parahippocampal Cortex Depend on Temporal Context. J Neurosci. 2012;32: 7202–7207. doi:10.1523/JNEUROSCI.0942-12.2012
35. Hutchinson JB, Pak SS, Turk-Browne NB. Biased Competition during Long-term Memory Formation. J Cogn Neurosci. 2016;28: 187–197. doi:10.1162/jocn_a_00889
36. Haxby JV, Guntupalli JS, Connolly AC, Halchenko YO, Conroy BR, Gobbini MI, et al. A common, high-dimensional model of the representational space in human ventral temporal cortex. Neuron. 2011;72: 404–416. doi:10.1016/j.neuron.2011.08.026
37. Chen J, Leong YC, Honey CJ, Yong CH, Norman KA, Hasson U. Shared memories reveal shared structure in neural activity across individuals. Nature Neuroscience. 2017;20: 115–125. doi:10.1038/nn.4450
38. Dagum L, Menon R. OpenMP: An Industry-Standard API for Shared-Memory Programming. IEEE Comput Sci Eng. 1998;5: 46–55. doi:10.1109/99.660313
39. Forum MP. MPI: A Message-Passing Interface Standard. Knoxville, TN, USA: University of Tennessee; 1994.
