José Luís Oliveira
University of Aveiro | UA
Publications: 292
Reads: 67,110
Citations: 2,712
Publications (292)
Deep learning techniques have recently yielded remarkable results across various fields. However, the quality of these results depends heavily on the quality and quantity of data used during the training phase. One common issue in multi-class and multi-label classification is class imbalance, where one or several classes make up a substantial porti...
Background: Multimodal histology image registration is a process that transforms into a common coordinate system two or more images obtained from different microscopy modalities. The combination of information from various modalities can contribute to a comprehensive understanding of tissue specimens, aiding in more accurate diagnoses, and improved...
Single Sign-On (SSO) methods are the primary solution to authenticate users across multiple web systems. These mechanisms streamline the authentication procedure by avoiding duplicate developments of authentication modules for each application. Besides, these mechanisms also provide convenience to the end-user by keeping the user authenticated when...
Neural networks have achieved remarkable success in various applications such as image classification, speech recognition, and natural language processing. However, the growing size of neural networks poses significant challenges in terms of memory usage, computational cost, and deployment on resource-constrained devices. Pruning is a popular techn...
Heart failure with preserved ejection fraction (HFpEF) represents a global health challenge, with limited therapies proven to enhance patient outcomes. This makes the elucidation of disease mechanisms and the identification of novel potential therapeutic targets a priority. Here, we performed RNA sequencing on ventricular myocardial biopsies from p...
The design of compounds that target specific biological functions with relevant selectivity is critical in the context of drug discovery, especially due to the polypharmacological nature of most existing drug molecules. In recent years, in silico-based methods combined with deep learning have shown promising results in the de novo drug design chall...
Rare diseases affect over 350 million individuals worldwide. However, studying such diseases is challenging due to the lack of individuals compliant with the study protocols. This unavailability of information raises some challenges when defining the best treatments or diagnosing patients in the early stages. Multiple organization...
The identification of genetic variations in large cohorts is a critical issue to identify patient cohorts, disease risks, and to develop more effective treatments. To help this analysis, we improved a variant calling pipeline for the human genome using state-of-the-art tools, including GATK (Hard Filter/VQSR) and DeepVariant. The pipeline was teste...
Genomics has significantly impacted the field of medicine, with advances in DNA sequencing leading to personalized medicine and a deeper understanding of the genomic basis of various diseases. The ability to share genomic data is crucial for advancing this field and developing new approaches to understanding the genome. However, the sensitive natur...
A vast number of microarray datasets have been produced as a way to identify differentially expressed genes and gene expression signatures. A better understanding of these biological processes can help in the diagnosis and prognosis of diseases, as well as in the therapeutic response to drugs. However, most of the available datasets are composed of...
Nicotinamide adenine dinucleotide (NAD) levels are essential for the normal physiology of the cell and are strictly regulated to prevent pathological conditions. NAD functions as a coenzyme in redox reactions, as a substrate of regulatory proteins, and as a mediator of protein-protein interactions. The main objectives of this study were to identify...
The taxonomic and functional composition of microbial communities from environmental, agricultural, and therapeutic settings is increasingly being studied using metagenomic methodologies in large-scale genomic applications. This has led to exponential growth in the field and has impacted on healthcare, pharmacology and biotechnology. However, with...
As software applications continue to become more complex and attractive to cyber-attackers, enhancing resilience against cyber threats becomes essential. Aiming to provide more robust solutions, different approaches were proposed for vulnerability detection in different stages of the application life-cycle. This article explores three main approach...
Objectives: Existing individual-level human data cover large populations on many dimensions such as lifestyle, demography, laboratory measures, clinical parameters, etc. Recent years have seen large investments in data catalogues to FAIRify data descriptions to capitalise on this great promise, i.e. make catalogue contents more Findable, Accessible...
Background: Secondary use of health data is a valuable source of knowledge that boosts observational studies, leading to important discoveries in the medical and biomedical sciences. The fundamental guiding principle for performing a successful observational study is the research question and the approach in advance of executing a study. However, in...
Biomedical databases often have restricted access policies and governance rules. Thus, an adequate description of their content is essential for researchers who wish to use them for medical research. A strategy for publishing information without disclosing patient-level data is through database fingerprinting and aggregate characterisations. Howeve...
Background: Many scientific studies have sought to obtain a better understanding of specific medical conditions. Concerning Alzheimer’s Disease, there is a lack of reliable diagnostics and this can be related to the availability of only small-scale ongoing biomarker studies and longitudinal cohorts including these subjects. Aiming to generate more s...
Nicotinamide adenine dinucleotide (NAD) is an essential metabolite in normal cellular physiology and its deregulation may lead to several pathological conditions. NAD interacts with a vast number of proteins, acting as a coenzyme, as a substrate and regulating the interaction between proteins. The goals of this study were to characterize the protei...
This paper describes the work conducted by the Bioinformatics group of the Institute of Electronics and Engineering Informatics of the University of Aveiro through several participations in the CLEF eRisk shared tasks related to the estimation of the level of depression. The eRisk initiative fosters Natural Language Processing research for the automati...
Anonymisation is currently one of the biggest challenges when sharing sensitive personal information. Its importance depends largely on the application domain, but when dealing with health information, this becomes a more serious issue. A simpler approach to avoid inadequate disclosure is to ensure that all data that can be associated directly with...
The stigma related to mental health continues to be present in online newspapers, where mental diseases are often used metaphorically to refer to entities or situations outside the clinical context of mental health. This project explores the implementation of Artificial Intelligence and Natural Language Processing techniques for the task of automatically c...
Drug design is an important area of study for pharmaceutical businesses. However, low efficacy, off-target delivery, time consumption, and high cost are challenges and can create barriers that impact this process. Deep Learning models are emerging as a promising solution to perform de novo drug design, i.e., to generate drug-like molecules tailored...
The accurate identification of Drug-Target Interactions (DTIs) remains a critical turning point in drug discovery and understanding of the binding process. Despite recent advances in computational solutions to overcome the challenges of in vitro and in vivo experiments, most of the proposed in silico-based methods still focus on binary classificati...
Background: Several computational advances have been achieved in the drug discovery field, promoting the identification of novel drug-target interactions and new leads. However, most of these methodologies have been overlooking the importance of providing explanations to the decision-making process of Deep Learning architectures. In this research s...
At the end of the twentieth century, new technology was developed that allowed an entire tissue section to be scanned on an objective slide. Originally called virtual microscopy, this technology is now known as Whole Slide Imaging (WSI). WSI presents new challenges for reading, visualization, storage, and analysis. For this reason, several techn...
Many clinical studies are greatly dependent on an efficient identification of relevant datasets. This selection can be performed in existing health data catalogues, by searching for available metadata. The search process can be optimised through questioning-answering interfaces, to help researchers explore the available data present. However, when...
The nicotinate phosphoribosyltransferase (NAPRT) gene has gained relevance in the research of cancer therapeutic strategies due to its main role as a NAD biosynthetic enzyme. NAD metabolism is an attractive target for the development of anti-cancer therapies, given the high energy requirements of proliferating cancer cells and NAD-dependent signali...
Background: Many scientific studies have sought to obtain better understanding of specific medical conditions. Concerning Alzheimer’s Disease, there is a lack of reliable diagnostics and this can be related to the availability of only small-scale ongoing biomarker studies and longitudinal cohorts including these subjects. Aiming to generate more su...
The continuous growth of new sources of information has led to an unprecedented increase in the data collected. The dimensionality and heterogeneity of these data require efficient strategies for searching, accessing and integrating from multiple repositories. The techniques underlying this goal are usually known as Extraction, Transformation and...
Many clinical trials and scientific studies have been conducted aiming for better understanding of specific medical conditions. However, these studies are often based on a small number of participants due to the difficulty in finding people with similar medical characteristics and available to participate in the studies. This is particularly critic...
Motivation: The process of placing new drugs into the market is time-consuming, expensive and complex. The application of computational methods for designing molecules with bespoke properties can contribute to saving resources throughout this process. However, the fundamental properties to be optimized are often not considered or conflicting with ea...
Abstract Over the years, a growing number of semantic data repositories have been made available on the web. However, this has created new challenges in exploiting these resources efficiently. Querying services require knowledge beyond the typical user’s expertise, which is a critical issue in adopting semantic information solutions. Several propos...
Background: The content of the clinical notes that have been continuously collected throughout patients' health history has the potential to provide relevant information about treatments and diseases, and to increase the value of structured data available in Electronic Health Records (EHR) databases. EHR databases are currently being used in observatio...
The process of refining the research question in a medical study depends greatly on the current background of the investigated subject. The information found in prior works can directly impact several stages of the study, namely the cohort definition stage. Besides previously published methods, researchers could also leverage other materials, such...
With the continuous increase in the use of social networks, social mining is steadily becoming a powerful component of digital phenotyping. In this paper we explore social mining for the classification of self-diagnosed depressed users of the Reddit social network. We conduct a cross evaluation study based on two public datasets in order to understa...
Personalised treatment is usually needed for hospitalised patients afflicted by secondary illnesses that demand daily medication. Even though clinical guidelines designed to consider those circumstances exist, current decision-support features fail to assimilate detailed relevant patient information. This creates opportunities for the developm...
Clinical treatments are mostly the result of consecutive successes of medical procedures. The patterns in those procedures lead to the creation of clinical guidelines, which are currently essential for better health treatments. The use of electronic health record (EHR) systems helps with patient management, but it fails in treatment guidance due to...
Social media writings have been explored over the last years, in the context of mental health, as a potential source of information for extending the so-called digital phenotyping of a person. In this paper we present a computational approach for the classification of depressed social media users. We conducted a cross evaluation study based on two...
Privacy issues limit the analysis and cross-exploration of most distributed and private biobanks, often raised by the multiple dimensionality and sensitivity of the data associated with access restrictions and policies. These characteristics prevent collaboration between entities, constituting a barrier to emergent personalized and public health ch...
This paper describes the participation of the Bioinformatics group of the Institute of Electronics and Engineering Informatics of the University of Aveiro in the ImageCLEF lifelog task, more specifically in the Lifelog Moment Retrieval (LMRT) sub-task. In our first participation last year we tackled the LMRT challenge with an automatic approach. Follow...
Society is becoming increasingly dependent on digital data sources. However, our trust in the sources and their contents is only ensured if we can also rely on robust methods that prevent fraudulent forgery. As digital forensic experts are continually dealing with the detection of forged data, new fraudulent approaches are emerging, making it dif...
The Semantic Web and Linked Data concepts and technologies have empowered the scientific community with solutions to take full advantage of the increasingly available distributed and heterogeneous data in distinct silos. Additionally, FAIR Data principles established guidelines for data to be Findable, Accessible, Interoperable, and Reusable, and t...
Background: Heart disease is the leading cause of death worldwide. Knowing a gene expression signature in heart disease can lead to the development of more efficient diagnosis and treatments that may prevent premature deaths. A large amount of microarray data is available in public repositories and can be used to identify differentially expressed...
Next-generation sequencing triggered the production of a massive volume of publicly available data and the development of new specialised tools. These tools are dispersed over different frameworks, making the management and analyses of the data a challenging task. Additionally, new targeted tools are needed, given the dynamics and specificities of...
This study investigated the feasibility of a postpartum depression predictor based on social media writings. The current broad use of social media networks generates a large amount of digital data, which, when coupled with artificial intelligence methods, have the potential to disclose significant health related insights. In this paper we explore t...
Aiming to better understand the genetic and environmental associations of Alzheimer's disease, many clinical trials and scientific studies have been conducted. However, these studies are often based on a small number of participants. To address this limitation, there is an increasing demand for multi-cohort studies, which can provide higher statist...
The FAIR guiding Principles for scientific data management and stewardship are a fundamental enabler for digital transformation and transparent research. They were designed with the purpose of improving data quality, by making it Findable, Accessible, Interoperable and Reusable. While these principles have been endorsed by both data owners and regu...
The World Health Organization reports that half of all mental illnesses begin by the age of 14. Most of these cases go undetected and untreated. The expanding use of social media has the potential to leverage the early identification of mental health diseases. As data gathered via social media are already digital, they have the ability to power up...
The increasing number of mobile and wearable devices is dramatically changing the way we collect data about a person’s life. These devices allow recording our daily activities and behavior in several forms, e.g., text, images, bio-signals, or video. However, the collected data often include low-quality or irrelevant content, feeding lifeloggi...
Protein-protein interactions (PPIs) can be conveniently represented as networks, allowing the use of graph theory for their study. Network topology studies may reveal patterns associated with specific organisms. Here, we propose a new methodology to denoise PPI networks and predict missing links solely based on the network topology, the organizatio...
During the last decades, most European countries dedicated huge efforts to collecting and maintaining Electronic Health Records (EHR). With the continuous growth of these datasets, it became obvious that their secondary use for research may lead to new insights about diseases and treatment outcomes.
Background: Many healthcare databases have been routinely collected over the past decades, to support clinical practice and administrative services. However, their secondary use for research is often hindered by restricted governance rules. Furthermore, health research studies typically involve many participants with complementary roles and respon...
Background: The global shift from paper health records to electronic ones has led to an impressive growth of biomedical digital data over the past two decades. Exploring and extracting knowledge from these data has the potential to enhance translational research and lead to positive outcomes for the population's health and healthcare. Objective:...
Objective: The collaboration and knowledge exchange between researchers are often hindered by the nonexistence of accurate information about which databases may support research studies. Even though a considerable amount of patient health information does exist, it is usually distributed and hidden in many institutions. The goal of this project is...
Protein-protein interactions (PPI) can be conveniently represented as networks, allowing the use of graph theory in their study. Network topology studies may reveal patterns associated with specific organisms. Here we propose a new methodology to denoise PPI networks and predict missing links solely based on the network topology, the Organization Mea...
Background: With today’s lifestyle, people have become more concerned and have started seeking solutions that may help them monitor their health conditions. Traditional monitoring systems present some limitations and today’s smartphones appear to be a good tool as they are unobtrusive and discreet. Additionally, they can continuously colle...
Background: Technological advancements, together with the decrease in both price and size of a large variety of sensors, have expanded the role and capabilities of regular mobile phones, turning them into powerful yet ubiquitous monitoring systems. At present, smartphones have the potential to continuously collect information about the users, monit...
Sensing health and well-being parameters from citizens and patients has been an increasing concern in our society. However, since traditional data collection methods rely mostly on dedicated and expensive equipment, the potential of smartphones has been widely investigated in recent years because of their unobtrusiveness and embedded senso...
The assignment of ICD-9-CM codes to patients' clinical reports is a costly and tedious process done manually by medical personnel, estimated to cost about $25 billion per year in the United States. Developing a system that automates this process has been an ambition of researchers, but it is still an unsolved problem due to the inherent difficulties in...
Background and Objective: Data catalogues are a common form of capturing and presenting information about a specific kind of entity (e.g. products, services, professionals, datasets, etc.). However, the construction of a web-based catalogue for a particular scenario normally implies the development of a specific and dedicated solution. In this paper...
The proliferation of electronic health databases has resulted in the existence of a wide collection of diversified clinical digital data. These data are fragmented over dispersed databases in different clinical silos around the world. The exploration of these electronic health records (EHRs) is essential for clinical and pharmaceutical research and...
Task management systems are crucial tools in modern organizations, simplifying the coordination of teams and their work. Those tools were developed mainly for task scheduling, assignment, follow-up and accountability. Then again, scientific workflow systems also appeared to help put together a set of computational processes through the pipel...
Biomedical data integration and processing is a very sensitive issue and a main barrier for research, since it normally implies dealing with private clinical information. To overcome this problem, we propose a solution based on multiple levels of data visibility, combined with a fine-grained access control over the shared data. Through our proposal...
The ever-increasing number of publicly available bioinformatics software tools is leading to greater expectations about their regular use in clinical practice. However, end-users often face the challenge of choosing the right tool for each task from a panoply of solutions that have been developed over the ye...
Publishing, analysing or properly accessing the abundant information resulting largely from experimental studies in the biomedical domain are current challenges for the research community. Problems with the extraction of relevant information, redundant data, and lack of associations or provenance are good examples of the main concerns. The innovati...
Computational annotation of textual information has taken on an important role in knowledge extraction from the biomedical literature, since most of the relevant information from scientific findings is still maintained in text format. In this endeavour, annotation tools can assist in the identification of biomedical concepts and their relationships...
The worldwide surge of multiresistant microbial strains has propelled the search for alternative treatment options. The study of Protein-Protein Interactions (PPIs) has been a cornerstone in the clarification of complex physiological and pathogenic processes, thus being a priority for the identification of vital components and mechanisms in pathoge...
Patient registries are an essential tool to increase current knowledge regarding rare diseases. Understanding these data is a vital step to improve patient treatments and to create the most adequate tools for personalized medicine. However, the growing number of disease-specific patient registries brings also new technical challenges. Usually, thes...
Most bioinformatics tools available today were not written by professional software developers, but by people that wanted to solve their own problems, using computational solutions and spending the minimum time and effort possible, since these were just the means to an end. Consequently, a vast number of software applications are currently availabl...
Clinical data sharing between healthcare institutions, and between practitioners is often hindered by privacy protection requirements. This problem is critical in collaborative scenarios where data sharing is fundamental for establishing a workflow among parties. The anonymization of patient information burned in DICOM images requires elaborate pro...
The production of medical imaging data has grown tremendously in the last decades. Nowadays, even small institutions produce a considerable amount of studies. Furthermore, the general trend in new imaging modalities is to produce more data per examination. As a result, the design and implementation of tomorrow's storage and communication systems mu...