Emmanuel Barillot
Institut Curie · Department of Computational Systems Biology of Cancer
PhD
480
Publications
71,953
Reads
20,317
Citations
Publications (480)
Background
Computational models in systems biology are becoming more important with the advancement of experimental techniques to query the mechanistic details responsible for leading to phenotypes of interest. In particular, Boolean models are well fit to describe the complexity of signaling networks while being simple enough to scale to a very la...
Simple Summary
Topic modeling, widely used in natural language processing, categorizes text documents into themes based on word frequency analysis. It has found success in various biological data analyses, including the accurate prediction of cancer subtypes and the simultaneous identification of genes, enhancers, and cell types from sparse single-...
Computational models in systems biology are becoming more important with the advancement of experimental techniques to query the mechanistic details responsible for leading to phenotypes of interest. In particular, Boolean models are well fit to describe the complexity of signaling networks while being simple enough to scale to a very large number...
Introduction
The COVID-19 Disease Map project is a large-scale community effort uniting 277 scientists from 130 Institutions around the globe. We use high-quality, mechanistic content describing SARS-CoV-2-host interactions and develop interoperable bioinformatic pipelines for novel target identification and drug repurposing.
Methods
Extensive com...
In systems biology, mathematical models and simulations play a crucial role in understanding complex biological systems. Different modelling frameworks are employed depending on the nature and scales of the system under study. For instance, signalling and regulatory networks can be simulated using Boolean modelling, whereas multicellular systems ca...
Motivation:
Mathematical models of biological processes altered in cancer are built using the knowledge of complex networks of signaling pathways, detailing the molecular regulations inside different cell types, such as tumor cells, immune and other stromal cells. If these models mainly focus on intracellular information, they often omit a descrip...
[This corrects the article DOI: 10.1016/j.csbj.2022.10.003.].
The COVID-19 Disease Map project is a large-scale community effort uniting 277 scientists from 130 Institutions around the globe. We use high-quality, mechanistic content describing SARS-CoV-2-host interactions and develop interoperable bioinformatic pipelines for novel target identification and drug repurposing. Community-driven and highly interdi...
Mathematical models of biological processes implicated in cancer are built using the knowledge of complex networks of signaling pathways, describing the molecular regulations inside different cell types, such as tumor cells, immune and other stromal cells. If these models mainly focus on intracellular information, they often omit a description of t...
As a result of the development of experimental technologies and the accumulation of data, biological and molecular processes can be described as complex networks of signaling pathways. These networks are often directed and signed, where nodes represent entities (genes/proteins) and arrows interactions. They are translated into mathematical models b...
Single-cell RNA sequencing is a powerful tool to explore cancer heterogeneity. However, the expression of lncRNAs in single cells is still to be studied extensively and methods to deal with the sparsity of this type of data are lacking. Here, we propose a topic modeling approach to investigate the transcriptional heterogeneity of luminal and triple...
AMoNet (Artificial Molecular Networks) is a tool that aims to predict cancer patients’ survival when only targeted gene sequencing data are available. Outcome predictions from sparse data can benefit from new methods including deep learning. Our approach optimizes large recurrent directed molecular networks built from prior knowledge supported by s...
We developed BIODICA, an integrated computational environment for application of Independent Component Analysis (ICA) to bulk and single-cell molecular profiles, interpretation of the results in terms of biological functions and correlation with metadata. The computational core is the novel Python package stabilized-ica which provides interface to...
Mathematical modeling aims at understanding the effects of biological perturbations, suggesting ways to intervene and to reestablish proper cell functioning in diseases such as cancer or in autoimmune disorders. This is a difficult task for obvious reasons: the level of details needed to describe the intra-cellular processes involved, the numerous...
Prostate cancer is the second most occurring cancer in men worldwide. To better understand the mechanisms of tumorigenesis and possible treatment responses, we developed a mathematical model of prostate cancer which considers the major signalling pathways known to be deregulated. We personalised this Boolean model to molecular data to reflect the h...
The cell cycle is a biological process underlying the existence and propagation of life in time and space. It has long been an object of mathematical modeling, with several alternative mechanistic modeling principles suggested, describing the known molecular mechanisms in more or less detail. Recently, the cell cycle has been investigated at single c...
Motivation
Cancer progression is a complex phenomenon that spans multiple scales from molecular to cellular and intercellular. Simulations can be used to perturb the underlying mechanisms of those systems and to generate hypotheses on novel therapies. We present a new version of PhysiBoSS, a multiscale modelling framework designed to cover multiple...
WebMaBoSS is an easy-to-use web interface for conversion, storage, simulation and analysis of Boolean models that lets users gain insight from these models without any specific knowledge of modeling or coding. It relies on existing software, MaBoSS, which simulates Boolean models using a stochastic approach: it applies continuous time Markov proce...
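To illustrate what a continuous-time Markov simulation of a Boolean model can look like, here is a minimal sketch in Python. It is not the MaBoSS implementation or its API. The two-node network, the rate values, and the function names `simulate` and `final_state_frequencies` are all hypothetical, chosen only to show the idea of Gillespie-style sampling over Boolean states.

```python
import random

# Hypothetical toy network (not taken from any cited model):
# A and B inhibit each other.
RULES = {
    "A": lambda s: not s["B"],
    "B": lambda s: not s["A"],
}
# Hypothetical rates (per unit time) for switching a node on or off.
RATE_UP = {"A": 1.0, "B": 1.0}
RATE_DOWN = {"A": 1.0, "B": 1.0}


def simulate(initial, max_time, rng):
    """Run one Gillespie trajectory and return a list of (time, state) pairs."""
    state = dict(initial)
    t = 0.0
    trajectory = [(t, dict(state))]
    while True:
        # A node can flip only if its rule disagrees with its current value.
        rates = {}
        for node, rule in RULES.items():
            target = bool(rule(state))
            if target != state[node]:
                rates[node] = RATE_UP[node] if target else RATE_DOWN[node]
        total = sum(rates.values())
        if total == 0.0:
            break  # fixed point: no transition is possible
        # Waiting time to the next flip is exponential with the total rate.
        t += rng.expovariate(total)
        if t > max_time:
            break
        # Pick the node to flip with probability proportional to its rate.
        node = rng.choices(list(rates), weights=list(rates.values()))[0]
        state[node] = not state[node]
        trajectory.append((t, dict(state)))
    return trajectory


def final_state_frequencies(initial, max_time, runs, seed=0):
    """Estimate final-state probabilities from repeated trajectories."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(runs):
        final = simulate(initial, max_time, rng)[-1][1]
        key = tuple(sorted(name for name, on in final.items() if on))
        counts[key] = counts.get(key, 0) + 1
    return {key: count / runs for key, count in counts.items()}
```

Starting from both nodes off, `final_state_frequencies({"A": False, "B": False}, max_time=50.0, runs=1000)` should split roughly evenly between the two fixed points of this toy network, one with only A active and one with only B active.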
We need to effectively combine the knowledge from surging literature with complex datasets to propose mechanistic models of SARS-CoV-2 infection, improving data interpretation and predicting key targets of intervention. Here, we describe a large-scale community effort to build an open access, interoperable and computable repository of COVID-19 mole...
Retinoblastoma is the most frequent intraocular malignancy in children, originating from a maturing cone precursor in the developing retina. Little is known on the molecular basis underlying the biological and clinical behavior of this cancer. Here, using multi-omics data, we demonstrate the existence of two retinoblastoma subtypes. Subtype 1, of e...
The identification of miRNAs’ targets and associated regulatory networks might allow the definition of new strategies using drugs whose association mimics a given miRNA’s effects. Based on this assumption we devised a multi-omics approach to precisely characterize miRNAs’ effects. We combined miR-491-5p target affinity purification, RNA microarray,...
Prostate cancer is the second most occurring cancer in men worldwide. To better understand the mechanisms of tumorigenesis and possible treatment responses, we developed a mathematical model of prostate cancer which considers the major signalling pathways known to be deregulated.
We personalised this Boolean model to molecular data to reflect the h...
Abstract
The cell cycle is the most fundamental biological process underlying the existence and propagation of life in time and space. It has long been an object of mathematical modeling, with several alternative mechanistic modeling principles suggested, describing the known molecular mechanisms in more or less detail. Recently, cell cycle has...
The study of response to cancer treatments has benefited greatly from the contribution of different omics data, but their interpretation is sometimes difficult. Some mathematical models based on prior biological knowledge of signaling pathways facilitate this interpretation but often require fitting of their parameters using perturbation data. We p...
Single‐nucleotide polymorphisms (SNPs) in over 180 loci have been associated with breast cancer (BC) through genome‐wide association studies involving mostly unselected population‐based case‐control series. Some of them modify BC risk of women carrying a BRCA1 or BRCA2 (BRCA1/2) mutation and may also explain BC risk variability in BC‐prone families...
Simple Summary
The future of cancer immunotherapy relies on a combination of individually targeted therapies. However, a lot of experiments are needed to define the most effective combinations of drugs. A computational and modelling approach could help reduce the number of experiments and suggest optimal treatments to test. This article presents a...
Background
Large observational clinical datasets are becoming increasingly available for mining associations between various disease traits and administered therapy. These datasets can be considered as representations of the landscape of all possible disease conditions, in which a concrete disease state develops through stereotypical routes, charac...
In this work we present PhysiBoSS-COVID, an effort to integrate MaBoSS, a stochastic Boolean modelling software, into PhysiCell-COVID so that cell- and pathway-specific Boolean models can be leveraged in this framework. To obtain these COVID-19-specific models, we have taken advantage of CaSQ's ability to convert all COVID-19 Disease Maps into SBML-qu...
As opposed to the standard tolerogenic apoptosis, immunogenic cell death (ICD) constitutes a type of cellular demise that elicits an adaptive immune response. ICD has been characterized in malignant cells following cytotoxic interventions, such as chemotherapy or radiotherapy. Briefly, ICD of cancer cells releases some stress/danger signals that at...
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Large observational clinical datasets become increasingly available for mining associations between various disease traits and administered therapy. These datasets can be considered as representations of the landscape of all possible disease conditions, in which a concrete pathology develops through a number of stereotypical routes, characterized b...
Background:
Solutions to stochastic Boolean models are usually estimated by Monte Carlo simulations, but as the state space of these models can be enormous, there is an inherent uncertainty about the accuracy of Monte Carlo estimates and whether simulations have reached all attractors. Moreover, these models have timescale parameters (transition r...
One of the aims of mathematical modeling is to understand and simulate the effects of biological perturbations and suggest ways to intervene and reestablish proper cell functioning. However, it remains a challenge, especially when considering the dynamics at the level of a cell population, with cells dying, dividing and interacting. Here, we introd...
The study of response to cancer treatments has benefited greatly from the contribution of different omics data, but their interpretation is sometimes difficult. Some mathematical models based on prior biological knowledge of signalling pathways facilitate this interpretation but often require fitting of their parameters using perturbation data. We...
Researchers around the world join forces to reconstruct the molecular processes of the virus-host interactions aiming to combat the cause of the ongoing pandemic.
The processes leading to, or avoiding cell death are widely studied, because of their frequent perturbation in various diseases. Cell death occurs in three highly interconnected steps: Initiation, signaling and execution. We used a systems biology approach to gather information about all known modes of regulated cell death (RCD). Based on the exper...
CCCTC-binding factor (CTCF) is a conserved architectural protein that plays crucial roles in gene regulation and three-dimensional (3D) chromatin organization. To better understand mechanisms and evolution of vertebrate genome organization, we analyzed genome occupancy of CTCF in zebrafish utilizing an endogenously epitope-tagged CTCF knock-in alle...
Multidimensional datapoint clouds representing large datasets are frequently characterized by non-trivial low-dimensional geometry and topology which can be recovered by unsupervised machine learning approaches, in particular, by principal graphs. Principal graphs approximate the multivariate data by a graph injected into the data space with some c...
English Wikipedia, containing more than five million articles, has approximately eleven thousand web pages devoted to proteins or genes, most of which were generated by the Gene Wiki project. These pages contain information about interactions between proteins and their functional relationships. At the same time, they are interconnected with other...
EWSR1-FLI1, the chimeric oncogene specific for Ewing sarcoma (EwS), induces a cascade of signaling events leading to cell transformation. However, it remains elusive how genetically homogeneous EwS cells can drive the heterogeneity of transcriptional programs. Here, we combine independent component analysis of single-cell RNA sequencing data from d...
Motivation:
CellDesigner is a well-established biological map editor used in many large-scale scientific efforts (Funahashi et al., 2007). However, the interoperability between the Systems Biology Graphical Notation (SBGN) Markup Language (SBGN-ML) and the CellDesigner's proprietary Systems Biology Markup Language (SBML) extension formats remains...
Cancer driver gene alterations influence cancer development, occurring in oncogenes, tumor suppressors, and dual role genes. Discovering dual role cancer genes is difficult because of their elusive context-dependent behavior. We define oncogenic mediators as genes controlling biological processes. With them, we classify cancer driver genes, unveili...
ACSN (https://acsn.curie.fr) is a web-based resource of multi-scale biological maps depicting molecular processes in cancer cell and tumor microenvironment. The core of the Atlas is a set of interconnected cancer-related signaling and metabolic network maps. Molecular mechanisms are depicted on the maps at the level of biochemical interactions, for...
Background:
Sequencing technologies give access to a precise picture of the molecular mechanisms acting upon genome regulation. One of the biggest technical challenges with sequencing data is to map millions of reads to a reference genome. This problem is exacerbated when dealing with repetitive sequences such as transposable elements that occupy...
Motivation
Matrix factorization (MF) methods are widely used in order to reduce dimensionality of transcriptomic datasets to the action of few hidden factors (metagenes). MF algorithms have never been compared based on the between-datasets reproducibility of their outputs in similar independent datasets. Lack of this knowledge might have a crucial...
The lack of integrated resources depicting the complexity of the innate immune response in cancer represents a bottleneck for high-throughput data interpretation. To address this challenge, we perform a systematic manual literature mining of molecular mechanisms governing the innate immune response in cancer and represent it as a signalling network...
Motivation
Solutions to stochastic Boolean models are usually estimated by Monte Carlo simulations, but as the state space of these models can be enormous, there is an inherent uncertainty about the accuracy of Monte Carlo estimates and whether simulations have reached all asymptotic solutions. Moreover, these models have timescale parameters (tran...
Background
Deep learning (DL) is one of the best approaches to predict nonlinear behaviors from high dimensional data. Nevertheless predicting the outcome of patients affected by cancers from transcriptomic data has shown limited performance, even with DL (C-index usually <0.65). Transfer learning is a DL two-step method where a model is pre-traine...
Independent component analysis (ICA) is a matrix factorization approach where the signals captured by the individual matrix factors are optimized to become as mutually independent as possible. Initially suggested for solving blind source separation problems in various fields, ICA was shown to be successful in analyzing functional magnetic resonanc...
Matrix factorization (MF) is an established paradigm for large-scale biological data analysis with tremendous potential in computational biology. Here, we challenge MF in depicting the molecular bases of epidemiologically described disease–disease (DD) relationships. As a use case, we focus on the inverse comorbidity association between Alzheimer’s...
Matrix Factorization (MF) is an established paradigm for large-scale biological data analysis with tremendous potential in computational biology.
Here, we challenge MF in depicting the molecular bases of epidemiologically described Disease-Disease (DD) relationships. As a use case, we focus on the inverse comorbidity association between Alzheimer’s di...
EWSR1-FLI1, the chimeric oncogene specific for Ewing sarcoma (EwS), induces a cascade of signaling events leading to cell transformation. However, it remains elusive how genetically homogeneous EwS cells can drive heterogeneity of transcriptional programs. Here, we combined independent component analysis of single cell RNA-sequencing data from dive...
English Wikipedia, containing more than five million articles, has approximately eleven thousand pages devoted to proteins or genes, most of which were generated by the Gene Wiki project. These pages contain information about interactions between proteins and their functional relationships. At the same time, they are interconnected with other Wiki...
Background
The interplay between metabolic processes and signalling pathways remains poorly understood. Global, detailed and comprehensive reconstructions of human metabolism and signalling pathways exist in the form of molecular maps, but they have never been integrated together. We aim to fill this gap by integrating both signalling and...
Cancer initiation and progression are associated with multiple molecular mechanisms. The knowledge of these mechanisms is expanding and should be converted into guidelines for tackling the disease. Here, we discuss the formalization of biological knowledge into a comprehensive resource: the Atlas of Cancer Signalling Network (ACSN) and the Google M...
Logical models of cancer pathways are typically built by mining the literature for relevant experimental observations. They are usually generic as they apply for large cohorts of individuals. As a consequence, they generally do not capture the heterogeneity of patient tumors and their therapeutic responses. We present here a novel framework, referr...
The development of computational approaches in systems biology has reached a state of maturity that allows their transition to systems medicine. Despite this progress, intuitive visualisation and context-dependent knowledge representation still present a major bottleneck. In this paper, we describe the Disease Maps Project, an effort towards a comm...
The current consensus recognizes four main medulloblastoma subgroups (wingless, Sonic hedgehog, group 3 and group 4). While medulloblastoma subgroups have been characterized extensively at the (epi-)genomic and transcriptomic levels, the proteome and phosphoproteome landscape remain to be comprehensively elucidated. Using quantitative (phospho)-pro...
Background:
Normalization is essential to ensure accurate analysis and proper interpretation of sequencing data, and chromosome conformation capture data such as Hi-C have particular challenges. Although several methods have been proposed, the most widely used type of normalization of Hi-C data usually casts estimation of unwanted effects as a mat...
Mathematical modeling of biological networks is a promising approach to understand the complexity of cancer progression, which can be understood as accumulated abnormalities in the kinetics of cellular biochemistry. Two major modeling formalisms (languages) have been used for this purpose in the last couple of decades: one is based on the applicati...
Motivation:
Due to the complexity and heterogeneity of multicellular biological systems, mathematical models that take into account cell signalling, cell population behaviour and the extracellular environment are particularly helpful. We present PhysiBoSS, an open source software which combines intracellular signalling using Boolean modelling (MaB...
Independent Component Analysis (ICA) can be used to model gene expression data as an action of a set of statistically independent hidden factors. The ICA analysis with a downstream component analysis was successfully applied to transcriptomic data previously in order to decompose bulk transcriptomic data into interpretable hidden factors. Some of t...
Motivation
Matrix factorization methods are widely used to reduce the dimensionality of transcriptomic datasets to the action of a few hidden factors (metagenes). Applying such methods to similar independent datasets should yield reproducible inter-series outputs, though this has never been demonstrated.
Results
We systematically test state-o...
The Disease Maps Project builds on a network of scientific and clinical groups that exchange best practices, share information and develop systems biomedicine tools. The project aims for an integrated, highly curated and user-friendly platform for disease-related knowledge. The primary focus of disease maps is on interconnected signaling, metabolic...
We present ElPiGraph, a method for approximating data distributions having non-trivial topological features such as the existence of excluded regions or branching structures. Unlike many existing methods, ElPiGraph is not based on the construction of a k-nearest neighbour graph, a procedure that can perform poorly in the case of multidimensional an...