Anne Thessen
Ronin Institute

Ph.D. Biological Oceanography

About

98
Publications
39,587
Reads
2,897
Citations
Introduction
I am a biologist who has become involved in numerous "big data" projects.
Additional affiliations
May 2013 - October 2013
Ronin Institute
Position
  • Researcher
June 2012 - June 2013
Arizona State University
Position
  • Assistant Research Professor
August 2009 - July 2012
Position
  • Data Conservancy
Education
September 2002 - December 2007
University of Maryland, College Park
Field of study
  • Biological Oceanography
September 1997 - December 2001
Nicholls State University
Field of study
  • Marine Biology

Publications (98)
Preprint
Full-text available
This study presents an innovative approach for understanding the genetic underpinnings of two key phenotypes in Sorghum bicolor: maximum canopy height and maximum growth rate. Genome-Wide Association Studies (GWAS) are widely used to decipher the genetic basis of traits in organisms, but the challenge lies in selecting an appropriate statistically...
Article
Full-text available
The exposome refers to all of the internal and external life-long exposures that an individual experiences. These exposures, either acute or chronic, are associated with changes in metabolism that will positively or negatively influence the health and well-being of individuals. Nutrients and other dietary compounds modulate similar biochemical proc...
Article
Over the last couple of decades, there has been a rapid growth in the number and scope of agricultural genetics, genomics and breeding databases and resources. The AgBioData Consortium (https://www.agbiodata.org/) currently represents 44 databases and resources (https://www.agbiodata.org/databases) covering model or crop plant and animal GGB data,...
Article
Full-text available
Knowledge graphs have become a common approach for knowledge representation. Yet, the application of graph methodology is elusive due to the sheer number and complexity of knowledge sources. In addition, semantic incompatibilities hinder efforts to harmonize and integrate across these diverse sources. As part of The Biomedical Translator Consortium...
Preprint
Full-text available
Over the last several decades, there has been rapid growth in the number and scope of agricultural genetics, genomics and breeding (GGB) databases and resources. The AgBioData Consortium (https://www.agbiodata.org/) currently represents 44 databases and resources covering model or crop plant and animal GGB data, ontologies, pathways, genetic variat...
Article
Full-text available
Motivation: Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of knowledge graphs is lacking. Results: Here we present KG-Hub, a platform that enables standardized cons...
Article
Full-text available
Introduction Climate change is already affecting ecosystems around the world and forcing us to adapt to meet societal needs. The speed with which climate change is progressing necessitates a massive scaling up of the number of species with understood genotype-environment-phenotype (G×E×P) dynamics in order to increase ecosystem and agriculture resi...
Article
Full-text available
Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or...
Article
Full-text available
Background: Evaluating the impact of environmental exposures on organism health is a key goal of modern biomedicine and is critically important in an age of greater pollution and chemicals in our environment. Environmental health utilizes many different research methods and generates a variety of data types. However, to date, no comprehensive data...
Preprint
Full-text available
Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of knowledge graphs is lacking. Here we present KG-Hub, a platform that enables standardized construction, exchange, and...
Preprint
Full-text available
Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or...
Article
Full-text available
Within clinical, biomedical, and translational science, an increasing number of projects are adopting graphs for knowledge representation. Graph‐based data models elucidate the interconnectedness among core biomedical concepts, enable data structures to be easily updated, and support intuitive queries, visualizations, and inference algorithms. Howe...
Article
Full-text available
Toxicological evaluation of chemicals using early-life stage zebrafish (Danio rerio) involves the observation and recording of altered phenotypes. Substantial variability has been observed among researchers in phenotypes reported from similar studies, as well as a lack of consistent data annotation, indicating a need for both terminological and dat...
Preprint
Full-text available
Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be...
Article
Full-text available
Decades of reductionist approaches in biology have achieved spectacular progress, but the proliferation of subdisciplines, each with its own technical and social practices regarding data, impedes the growth of the multidisciplinary and interdisciplinary approaches now needed to address pressing societal challenges. Data integration is key to a rein...
Article
Full-text available
People are one of the best known and most stable entities in the biodiversity knowledge graph. The wealth of public information associated with people and the ability to identify them uniquely open up the possibility to make more use of these data in biodiversity science. Person data are almost always associated with entities such as specimens, mol...
Article
Full-text available
The rapidly decreasing cost of gene sequencing has resulted in a deluge of genomic data from across the tree of life; however, outside a few model organism databases, genomic data are limited in their scientific impact because they are not accompanied by computable phenomic data. The majority of phenomic data are contained in countless small, heter...
Article
Full-text available
Research collections are an important tool for understanding the Earth, its systems, and human interaction. Despite the importance of collections, many are not maintained or curated as thoroughly as we would like. Part of the reason for this is the lack of professional reward for collection, curation, or maintenance. To address this gap in attribut...
Article
Full-text available
In biology and biomedicine, relating phenotypic outcomes with genetic variation and environmental factors remains a challenge: patient phenotypes may not match known diseases, candidate variants may be in genes that haven't been characterized, research organisms may not recapitulate human or veterinary diseases, environmental factors affecting dise...
Article
Full-text available
Research collections are an important tool for understanding the Earth, its systems, and human interaction. Despite the importance of collections, many are not maintained or curated as thoroughly as we would like. Part of the reason for this is the lack of professional reward for collection, curation, or maintenance. To address this gap in attribut...
Method
Full-text available
Annotation of Texts - Preparation of Resources for NLP in the Earth Sciences
Poster
Full-text available
To develop new semantic software tools and resources for the earth science fields of geology, biology and cryology, specifically earthquakes, ecology, sea-ice. Achieve this with high efficiency and effectiveness by porting resources, tools and methods from the biomedical field.
Presentation
Full-text available
Explanation of the ClearEarth project
Technical Report
Full-text available
Annotation Methods for Creation of Training Data for Natural Language Processing in the Earth Sciences
Conference Paper
Full-text available
Logical definitions, in particular those following the Entity-Quality approach, are increasingly used to drive automated classification of phenotypes and integrate phenotypes across species semantically. Over the years, the lack of consistent and widespread use of common standards resulted in conceptually equivalent or similar phenotypes with logic...
Article
Full-text available
Background: When phenotypic characters are described in the literature, they may be constrained or clarified with additional information such as the location or degree of expression; these terms are called “modifiers”. With effort underway to convert narrative character descriptions to computable data, ontologies for such modifiers are needed. Such...
Article
Full-text available
Biodiversity information is made available through numerous databases that each have their own data models, web services, and data types. Combining data across databases leads to new insights, but is not easy because each database uses its own system of identifiers. In the absence of stable and interoperable identifiers, databases are often linked...
Article
Full-text available
The institutions of science are in a state of flux. Declining public funding for basic science, the increasingly corporatized administration of universities, increasing “adjunctification” of the professoriate and poor academic career prospects for postdoctoral scientists indicate a significant mismatch between the reality of the market economy and...
Article
Full-text available
The cTAKES package (using the ClearTK Natural Language Processing toolkit Bethard et al. 2014, http://cleartk.github.io/cleartk/) has been successfully used to automatically read clinical notes in the medical field (Albright et al. 2013, Styler et al. 2014). It is used on a daily basis to automatically process clinical notes and extract relevant inf...
Preprint
Full-text available
Biodiversity information is made available through numerous databases that each have their own data models, web services, and data types. Combining data across databases leads to new insights, but is not easy because each database uses its own system of identifiers. In the absence of stable and interoperable identifiers, databases are often linked...
Preprint
Full-text available
Biodiversity information is made available through numerous databases that each have their own data models, web services, and data types. Combining data across databases leads to new insights, but is not easy because each database uses its own system of identifiers. In the absence of stable and interoperable identifiers, databases are often linked...
Preprint
Full-text available
The institutions of science are in a state of flux. Declining public funding for basic science, the increasingly corporatized administration of universities, increasing “adjunctification” of the professoriate and poor academic career prospects for postdoctoral scientists indicate a significant mismatch between the reality of the market economy and...
Preprint
The institutions of science are in a state of flux. Declining public funding for basic science, the increasingly corporatized administration of universities, increasing “adjunctification” of the professoriate and poor academic career prospects for postdoctoral scientists indicate a significant mismatch between the reality of the market economy and...
Preprint
Full-text available
The institutions of science are in a state of flux. Declining public funding for basic science, the increasingly corporatized administration of universities, increasing “adjunctification” of the professoriate and poor academic career prospects for postdoctoral scientists indicate a significant mismatch between the reality of the market economy and...
Chapter
Full-text available
Biodiversity informatics, the application of informatics techniques to biodiversity data, is rooted in physical objects and nomenclatural codes. Through two user stories, one from wildlife conservation and another from agriculture, we demonstrate the importance and process of biodiversity informatics. We discuss the importance and integration of ta...
Preprint
Full-text available
Natural Language Processing (NLP) is an important field of study dedicated to improving automated reading and understanding of human text by machines through the development of specialized algorithms. These algorithms need a large corpus of annotated text in order to learn the semantics and syntax of human language, which is often specific and nuan...
Article
Full-text available
The size of biodiversity data sets, and the size of people’s questions around them, are outgrowing the capabilities of desktop applications, single computers, and single developers. Numerous articles in the corporate sector (Delgado 2016) have been written on how much time professionals spend manipulating and formatting large data sets compared to...
Conference Paper
Full-text available
Report on Results of a Hackathon to Progress with the Training Resources for Natural Language Processing (NLP) in Ecology
Article
Biodegradation is an important process for hydrocarbon weathering that influences its fate and transport, yet little is known about in situ biodegradation rates of specific hydrocarbon compounds in the deep ocean. Using data collected in the Gulf of Mexico below 700 m during and after the Deepwater Horizon oil spill, we calculated first-order degra...
Poster
Full-text available
This project is funded by NSF-Award ACI 1443085. ClearEarth aims to bring semantic technologies from the biomedical field into the earth-surface earth, ice and life sciences. The products will be applied to operations such as query and reasoning.
Article
Full-text available
Scientists have unprecedented access to a wide variety of high-quality datasets. These datasets, which are often independently curated, commonly use unstructured spreadsheets to store their data. Standardized annotations are essential to perform synthesis studies across investigators, but are often not used in practice. Therefore, accurately combin...
Data
Cancer and ecology datasets. (ZIP)
Article
Full-text available
Background The natural sciences, such as ecology and earth science, study complex interactions between biotic and abiotic systems in order to understand and make predictions. Machine-learning-based methods have an advantage over traditional statistical methods in studying these systems because the former do not impose unrealistic assumptions (such...
Article
Full-text available
The need for a names-based cyber-infrastructure for digital biology is based on the argument that scientific names serve as a standardized metadata system that has been used consistently and near universally for 250 years. As we move towards data-centric biology, name-strings can be called on to discover, index, manage, and analyze accessible digit...
Article
Full-text available
Today's low cost digital data provides unprecedented opportunities for scientific discovery from synthesis studies. For example, the medical field is revolutionizing patient care by creating personalized treatment plans based upon mining electronic medical records, imaging, and genomics data. Standardized annotations are essential to subsequent ana...
Article
Full-text available
Understanding the interplay between environmental conditions and phenotypes is a fundamental goal of biology. Unfortunately, data that include observations on phenotype and environment are highly heterogeneous and thus difficult to find and integrate. One approach that is likely to improve the status quo involves the use of ontologies to standardiz...
Article
Process studies and coupled-model validation efforts in geosciences often require integration of multiple data types across time and space. For example, improved prediction of hydrocarbon fate and transport is an important societal need which fundamentally relies upon synthesis of oceanography and hydrocarbon chemistry. Yet, there are no publicly...
Article
Full-text available
Holistic understanding of estuarine and coastal environments across interacting domains with high-dimensional complexity can profitably be approached through data-centric synthesis studies. Synthesis has been defined as “the inferential process whereby new models are developed from analysis of multiple data sets to explain observed patterns across...
Conference Paper
The difficult job market for PhD scientists has forced many from more traditional academic paths to look for opportunities in industry positions. This workshop will include talks from entrepreneurs and others describing their journey from academia to industry and general advice from scientists entering the private work force.
Article
Full-text available
A better understanding of oil droplet formation, degradation, and dispersal in deep waters is needed to enhance prediction of the fate and transport of subsurface oil spills. This research evaluates the influence of initial droplet size and rates of biodegradation on the subsurface transport of oil droplets, specifically those from the Deepwater Ho...
Article
Full-text available
Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bot...
Article
Full-text available
Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bot...
Article
Full-text available
Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bot...
Article
Full-text available
Background. Mexico has the world’s fifth largest population of amphibians and the second highest number of threatened amphibian species. About 10% of Mexican amphibians lack enough data to be assigned to a risk category by the IUCN, so in this paper we want to test a statistical tool that, in the absence of specific demographic d...
Preprint
Full-text available
Background: Mexico is the fourth richest country in amphibians and has the second highest number of threatened amphibian species, and this number could be higher as many species are too poorly known to be accurately assigned to a risk category. The absence of a risk status or an unknown population trend can slow or halt conservation...
Preprint
Background: Mexico is the fourth richest country in amphibians and has the second highest number of threatened amphibian species, and this number could be higher as many species are too poorly known to be accurately assigned to a risk category. The absence of a risk status or an unknown population trend can slow or halt conservation...
Preprint
Full-text available
Background: Mexico is the fourth richest country in amphibians and has the second highest number of threatened amphibian species, and this number could be higher as many species are too poorly known to be accurately assigned to a risk category. The absence of a risk status or an unknown population trend can slow or halt conservation...
Article
Full-text available
The role that ontologies play or can play in designing and employing semantic technologies has been widely acknowledged by the Semantic Web and Linked Data communities. But the level of collaboration between these communities and the Applied Ontology community has been much less than expected. Also, ontologies and ontological techniques appear to be...
Technical Report
Full-text available
+++ UPDATED VERSION PUBLISHED IN APPLIED ONTOLOGY VOL.9, ISSUE 2, 2014+++ This version 1.0.0 (2014.04.29-10:45) of the OntologySummit2014_Communique was adopted by the community at the Ontology Summit 2014 Symposium (Arlington, Virginia, USA). It summarizes the activity of 4+ months of discussions of the Ontology Community (IAOA) and its collabora...
Article
Full-text available
Numerous digitization and ontological initiatives have focused on translating biological knowledge from narrative text to machine-readable formats. In this paper, we describe two workflows for knowledge extraction and semantic annotation of text data objects featured in an online biodiversity aggregator, the Encyclopedia of Life. One workflow tags...
Article
Synthesis science requires significant investment in data discovery, access, and integration, which can be difficult when the data have not been published or deposited. The modeling efforts of the Gulf Integrated Spill Research Consortium (GISR) required an integrated database of oceanographic and hydrocarbon field measurements collected from the G...
Article
Full-text available
Among the key services that institutional data management infrastructures must provide are provenance and lineage tracking and the ability to associate data with contextual information needed for understanding and use. These functionalities are critical for addressing a number of key issues faced by data collectors and users, including trust in dat...
Chapter
Full-text available
Synergy between science and informatics is required to develop a more robust understanding of the earth as a system of systems. Interaction of these systems is recorded in both geological and biological data, yet the capability to integrate across disciplines is hampered by diverse social and technological approaches to research and communication....
Article
Data sharing has become an important issue in modern biodiversity research to address large scale questions. Despite the steadily growing scientific demand, data are not easily accessed. Why is this the case? This study explores the reasons for the reluctance to share data on the one hand and the motivations for sharing on the other by summarising...
Article
Full-text available
Taxonomists have been tasked with cataloguing and quantifying the Earth's biodiversity. Their progress is measured in code-compliant species descriptions that include text, images, type material and molecular sequences. It is from this material that other researchers are to identify individuals of the same species in future observations. It has bee...
Data
Names of species of Gymnodinium and their synonym groups. (DOCX)
Data
Names of Gymnodinium no longer associated with the genus [309]–[337]. The current name and/or the reason for rejecting the name is given. A name is listed as not code compliant if it is used without the existence of an original description. A name is listed as erroneous if it is an incorrect combination of genus name and species epithet. (DOCX)
Data
Names associated with extinct species of Gymnodinium [304]–[308]. (DOCX)
Data
List of species of Gymnodinium following removal of oncers that do not meet the selection criteria used here. (DOCX)
Data
List of rejected Gymnodinium names. (DOCX)
Data
Internet search results for each species of Gymnodinium. Websites searched were Biodiversity Heritage Library (BHL, www.biodiversitylibrary.org), Global Biodiversity Information Facility (GBIF, www.gbif.org), GenBank (www.ncbi.nlm.nih.gov/genbank/), Google Scholar (scholar.google.com) and the Web of Science (ISI, www.webofknowledge.com). Numbers in...
Article
Full-text available
Centuries of biological knowledge are contained in the massive body of scientific literature, written for human-readability but too big for any one person to consume. Large-scale mining of information from the literature is necessary if biology is to transform into a data-driven science. A computer can handle the volume but cannot make sense of th...
Article
Full-text available
Over the last decade, our understanding of the environmental controls on Pseudo-nitzschia blooms and domoic acid (DA) production has matured. Pseudo-nitzschia have been found along most of the world’s coastlines, while the impacts of its toxin, DA, are most persistent and detrimental in upwelling systems. However, Pseudo-nitzschia and DA have recen...
Article
Full-text available
We review technical and sociological issues facing the Life Sciences as they transform into more data-centric disciplines - the "Big New Biology". Three major challenges are: 1) lack of comprehensive standards; 2) lack of incentives for individual scientists to share data; 3) lack of appropriate infrastructure and support. Technological advances wi...
Conference Paper
Background/Question/Methods Preservation of historical data is essential toward an understanding of the cumulative effects of anthropogenic impacts on marine ecosystems, such as the shifting baselines syndrome or trophic cascades. Historic data on marine organisms are very scattered (much in hard copy only) and in many cases highly susceptible to...
Article
Full-text available
This report summarizes the proceedings of the one-day BioSharing meeting held at the Intelligent Systems for Molecular Biology (ISMB) 2010 conference in Boston, MA, USA. This inaugural BioSharing event was hosted by the Genomic Standards Consortium as part of its M3 & BioSharing special interest group (SIG) workshop. The BioSharing event included in...
Article
Full-text available
In this chapter we provide a brief history of what is known about marine microbial diversity, summarize our achievements in performing a global census of marine microbes, and reflect on the questions and priorities for the future of the marine microbial census.
Conference Paper
Full-text available
This paper presents a novel architecture that brings together Information Extraction (IE) with Event Processing (EP) research areas to globally monitor human activities and biodiversity dynamics and measure their impact on ecosystems. The two areas (IE and EP) are rich on their own and we believe their integration will achieve a much more comprehens...
Article
Full-text available
Despite high abundances of toxic Pseudo-nitzschia spp. over Louisiana oyster beds (Crassostrea virginica; eastern oyster) there have been no documented cases of amnesic shellfish poisoning (ASP) in the state. Two possible explanations are that oysters do not readily feed on long pointed chains of Pseudo-nitzschia cells or they discriminate against...
Article
Full-text available
Clonal cultures of plankton are widely used in laboratory experiments and have contributed greatly to knowledge of microbial systems. However, many physiological characteristics vary drastically between strains of the same species, calling into question our ability to make ecologically relevant inferences about populations based on studying one or...
Article
Links between eutrophication, plankton community structure, microzooplankton grazing and dinoflagellate abundance were investigated in two tributaries of the Chesapeake Bay, the Choptank and Patuxent Rivers (MD, USA). Sampling and experiments were conducted during the spring of consecutive dry (below average freshwater flow) and wet (above average...
Article
Full-text available
Very little research has been conducted on mid-Atlantic estuarine populations of the diatom Pseudo-nitzschia despite recent evidence of toxicity in regional isolates. We collected field samples from the Chesapeake Bay region from 2002 to 2007 for Pseudo-nitzschia enumeration and toxin analysis. Abundances of Pseudo-nitzschia were highest in the win...
Article
Domoic acid (DA) is a potent algal neurotoxin produced primarily by members of the diatom genus Pseudo-nitzschia, most of which are considered cosmopolitan and can produce harmful blooms in estuarine and coastal waters. Many of these habitats are subject to extreme fluctuations in salinity and are extensively utilised as shellfish growing/harvestin...
Article
The first recorded bloom of Karenia spp., resulting in brevetoxin in oysters, in the low salinity waters of the Northern Gulf of Mexico (NGOMEX) occurred in November 1996. It raised questions about the salinity tolerance of Karenia spp., previously considered unlikely to occur at salinities <24 psu, and the likelihood that the bloom would reoccur i...
Article
Salinity varies widely in coastal areas that often have a high abundance of Pseudo-nitzschia H. Peragallo. Pseudo-nitzschia is abundant in Louisiana waters, and high cellular domoic acid has been observed in natural samples but no human illness has been reported. To assess the threat of amnesic shellfish poisoning (ASP), we examined the effect of s...