Scalability test results (run on a machine with 448 GiB of RAM and eight 8-core 64-bit Intel Xeon™ X7560 processors at 2.26 GHz).

Source publication

Web Interface and Collection for Mathematical Retrieval: WebMIaS and MREC

Article

Jan 2011

We demonstrate searching of mathematical expressions in technical digital libraries on the MREC collection of 439,423 real scientific documents with more than 158 million mathematical formulae. Our solution, the WebMIaS system, allows the retrieval of mathematical expressions written in TeX or MathML. TeX queries are converted on-the-fly into tree repr...

Context 1

... is shown in Table 2 on page 83, the performance of the system scales linearly. This gives feasible response times even for our billions of indexed subformulae. ...
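The gap between 158 million formulae and "billions of indexed subformulae" follows from indexing subexpressions, not just whole formulae. As a rough illustration only, assuming a simplified operator tree written as hypothetical `(label, children)` tuples and assuming every node's subtree is indexed as one entry, a formula with n nodes contributes n index entries. MIaS's actual MathML-based decomposition, weighting, and unification of subformulae are not modelled here.

```python
from typing import Iterator, Tuple

# Hypothetical simplified operator tree: (label, tuple_of_children).
Node = Tuple[str, tuple]


def serialize(node: Node) -> str:
    """Render a tree as a compact prefix string, e.g. '+(^(x,2),1)'."""
    label, children = node
    if not children:
        return label
    return f"{label}({','.join(serialize(c) for c in children)})"


def subformulae(node: Node) -> Iterator[str]:
    """Yield the serialized subtree rooted at every node, in pre-order."""
    yield serialize(node)
    for child in node[1]:
        yield from subformulae(child)


# x^2 + 1 has 5 nodes, hence 5 subformula entries under this assumption.
expr: Node = ("+", (("^", (("x", ()), ("2", ()))), ("1", ())))
print(list(subformulae(expr)))
```

Because each formula yields one entry per node, index size grows with the total number of nodes across the collection rather than with the number of formulae.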

Models of Digital Educational Resources Indexing and Dynamic User Profile Evolution

Article

Feb 2016

The modelling of the user profile and its integration into the search process is an effective approach to personalized information search within a repository of educational digital resources. It therefore gradually raises the issue of how this profile evolves dynamically as the information requester sets up queries. In our approach presente...

Figure 1. Facets generate counts for metadata categories

Figure 2. Spatial faceting enables heatmaps showing the distribution of...
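The Figure 1 caption notes that facets generate counts for metadata categories. A minimal sketch of that idea, assuming hypothetical catalogue records stored as plain dictionaries (the field names `type` and `keywords` are illustrative, not taken from the paper), counts how many records carry each value of each faceted field; multi-valued fields contribute one count per value. A production catalogue would compute these counts inside the search engine over the current result set.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def facet_counts(records: Iterable[Mapping[str, object]],
                 fields: Iterable[str]) -> Dict[str, Counter]:
    """Count occurrences of each value per faceted metadata field."""
    fields = list(fields)
    counts = {f: Counter() for f in fields}
    for rec in records:
        for f in fields:
            value = rec.get(f)
            if value is None:
                continue  # record has no value for this facet
            # Multi-valued fields (lists, tuples, sets) count each value once.
            values = value if isinstance(value, (list, tuple, set)) else [value]
            counts[f].update(values)
    return counts


# Hypothetical metadata records.
records = [
    {"type": "dataset", "keywords": ["soil", "water"]},
    {"type": "service", "keywords": ["water"]},
    {"type": "dataset"},
]
print(facet_counts(records, ["type", "keywords"]))
```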

Implementing an open source spatio-temporal search platform for Spatial Data Infrastructures

Article

Oct 2016

A Spatial Data Infrastructure (SDI) is a framework of geospatial data, metadata, users and tools intended to provide an efficient and flexible way to use spatial information. One of the key software components of an SDI is the catalogue service which is needed to discover, query, and manage the metadata. Catalogue services in an SDI are typically b...

SEUPD@CLEF: Team INTSEG on Argument Retrieval for Controversial Questions. Notebook for the Touché Lab on Argument Retrieval at CLEF 2022

Conference Paper

Mar 2023

Search engines play an important role in helping users rapidly retrieve relevant information. The technology underlying search engines has improved in recent years in terms of both hardware capabilities and software. However, they are still affected by many issues due to the continuously growing amount of data and the various f...

piNET: a versatile web platform for downstream analysis and visualization of proteomics data

Article

May 2020

Rapid progress in proteomics and large-scale profiling of biological systems at the protein level necessitates the continued development of efficient computational tools for the analysis and interpretation of proteomics data. Here, we present the piNET server that facilitates integrated annotation, analysis and visualization of quantitative proteom...

Fig. 2: Schematic diagram

Algorithm 1: Calculating Relevance Score...

Answering Follow-up Questions on Bug Reports with Structured Information Retrieval and Deep Learning

Preprint

Apr 2023

Software bug reports submitted to bug-tracking systems often lack crucial information that developers need to resolve them promptly, costing companies billions of dollars. There has been significant research on effectively eliciting information from bug reporters in bug-tracking systems using different templates that bug reporters need to use. However...

Leveraging Formulae and Text for Improved Math Retrieval

Thesis

Jul 2022

Behrooz Mansouri

Large collections containing millions of math formulas are available online. Retrieving math expressions from these collections is challenging. Users can express their math information needs with a formula, a formula plus text, or a math question. The structural complexity of formulas requires specialized processing. Despite the existence of math search systems and online community question-answering websites for math, little is known about mathematical information needs. This research first explores the characteristics of math searches made with a general-purpose search engine; the findings show how math searches differ from general searches. Then, test collections for math-aware search are introduced. The ARQMath test collections have two main tasks: 1) finding answers to math questions and 2) contextual formula search. Each test collection (ARQMath-1 to -3) uses the same corpus, Math Stack Exchange posts from 2010 to 2018, but introduces different topics for each task. Compared with previous test collections, ARQMath has a much larger and more diverse set of topics and an improved evaluation protocol. Another key contribution of this research is leveraging text and math information for improved math information retrieval. Three formula search models that use only the formula, with no context, are introduced. The first is an n-gram embedding model that uses both symbol layout tree and operator tree representations. The second uses tree-edit distance to re-rank the results of the first model. The third is a learning-to-rank model that leverages full-tree, sub-tree, and vector similarity scores. To use context, Math Abstract Meaning Representation (MathAMR) is introduced, which generalizes AMR trees to include math formula operations and arguments. MathAMR is then used for contextualized formula search with a fine-tuned Sentence-BERT model.
The experiments show that tree-edit distance ranking achieves the current state-of-the-art results on the contextual formula search task, and that the MathAMR model can be beneficial for re-ranking. This research also addresses the answer retrieval task, introducing a two-step retrieval model in which similar questions are first found and then the answers previously given to those questions are ranked. The proposed model fine-tunes two Sentence-BERT models, one for finding similar questions and another for ranking the answers. For the Sentence-BERT models, both raw text and MathAMR are used.
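The thesis abstract describes re-ranking first-stage formula results by tree-edit distance. A minimal sketch of the general technique, not the author's implementation: it computes the classic ordered tree edit distance with unit costs for deletion, insertion, and relabeling via the rightmost-root forest recursion (memoized), then sorts candidates by distance to the query. The `(label, children)` operator trees and the unit cost model are assumptions made for illustration; the thesis may use different tree representations and costs.

```python
from functools import lru_cache
from typing import List, Tuple

# Hypothetical operator tree: (label, tuple_of_children); tuples keep it hashable.
Node = Tuple[str, tuple]


def leaf(label: str) -> Node:
    return (label, ())


@lru_cache(maxsize=None)
def forest_distance(f: tuple, g: tuple) -> int:
    """Unit-cost ordered edit distance between two forests (tuples of trees)."""
    if not f and not g:
        return 0
    if not f:
        w = g[-1]  # insert g's rightmost root; its children join the forest
        return forest_distance(f, g[:-1] + w[1]) + 1
    if not g:
        v = f[-1]  # delete f's rightmost root; its children join the forest
        return forest_distance(f[:-1] + v[1], g) + 1
    v, w = f[-1], g[-1]
    delete_v = forest_distance(f[:-1] + v[1], g) + 1
    insert_w = forest_distance(f, g[:-1] + w[1]) + 1
    # Map v to w: match their child forests, match the remaining forests,
    # and pay 1 if the labels differ.
    match = (forest_distance(v[1], w[1])
             + forest_distance(f[:-1], g[:-1])
             + (0 if v[0] == w[0] else 1))
    return min(delete_v, insert_w, match)


def tree_distance(a: Node, b: Node) -> int:
    return forest_distance((a,), (b,))


def rerank(query: Node, candidates: List[Node]) -> List[Node]:
    """Sort candidates by increasing edit distance to the query (stable)."""
    return sorted(candidates, key=lambda c: tree_distance(query, c))


q = ("+", (("^", (leaf("x"), leaf("2"))), leaf("1")))    # x^2 + 1
c1 = ("+", (("^", (leaf("y"), leaf("2"))), leaf("1")))   # y^2 + 1
c2 = ("^", (leaf("x"), leaf("2")))                       # x^2
c3 = ("+", (leaf("x"), leaf("1")))                       # x + 1
c4 = ("sin", (leaf("x"),))                               # sin(x)
print([tree_distance(q, c) for c in (c1, c2, c3, c4)])
```

The recursion is exponential-looking but memoization keeps it practical for formula-sized trees; for large trees the Zhang-Shasha algorithm computes the same distance more efficiently.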

Learning to Match Mathematical Statements with Proofs

Preprint

Feb 2021

We introduce a novel task that consists of assigning a proof to a given mathematical statement. The task is designed to improve the processing of research-level mathematical texts. Applying Natural Language Processing (NLP) tools to research-level mathematical articles is both challenging, since it is a highly specialized domain that mixes natural language and mathematical formulae, and an important requirement for developing tools for mathematical information retrieval and computer-assisted theorem proving. We release a dataset for the task, consisting of over 180k statement-proof pairs extracted from mathematical research articles. We carry out preliminary experiments to assess the difficulty of the task. We first experiment with two bag-of-words baselines. We show that considering the assignment problem globally and using weighted bipartite matching algorithms substantially helps in tackling the task. Finally, we introduce a self-attention-based model that can be trained either locally or globally and outperforms the baselines by a wide margin.
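The abstract contrasts local decisions with treating the statement-proof assignment globally as weighted bipartite matching. A minimal sketch of that contrast, assuming a hypothetical square score matrix where `scores[i][j]` is a model's score for pairing statement i with proof j: the local rule lets each statement take its best proof (so two statements can claim the same one), while the global rule picks the one-to-one assignment with maximum total score. The global solver here brute-forces permutations, which is only viable for tiny examples and is not the paper's method.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def local_assignment(scores: Sequence[Sequence[float]]) -> List[int]:
    """Each statement independently takes its highest-scoring proof."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def global_assignment(scores: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """One-to-one assignment maximizing total score (max-weight bipartite
    matching), solved by brute force over permutations: tiny inputs only."""
    n = len(scores)
    best_perm, best_total = (), float("-inf")
    for perm in permutations(range(n)):
        total = sum(scores[i][perm[i]] for i in range(n))
        if total > best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total


# Hypothetical scores: rows are statements, columns are proofs.
scores = [
    [0.9, 0.8, 0.1],
    [0.85, 0.2, 0.1],
    [0.3, 0.4, 0.5],
]
print(local_assignment(scores))   # statements 0 and 1 both pick proof 0
print(global_assignment(scores))  # a valid one-to-one assignment
```

For realistic sizes, a polynomial-time solver such as SciPy's `scipy.optimize.linear_sum_assignment` (with `maximize=True`) solves the same matching problem.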

MIaS: Math-Aware Retrieval in Digital Mathematical Libraries

Preprint

Aug 2018

Digital mathematical libraries (DMLs) such as arXiv, Numdam, and EuDML contain mainly documents from STEM fields, where mathematical formulae are often more important than text for understanding. Conventional information retrieval (IR) systems are unable to represent formulae and are therefore ill-suited for math information retrieval (MIR). To fill the gap, we have developed and open-sourced the MIaS MIR system. MIaS is based on the full-text search engine Apache Lucene. On top of text retrieval, MIaS also incorporates a set of tools for preprocessing mathematical formulae. We describe the design of the system and present speed and quality evaluation results. We show that MIaS is both efficient and effective, as evidenced by our victory in the NTCIR-11 Math-2 task.

Classifying MathML Expressions by Multilayer Perceptron

Article

Jul 2018
IEICE Transactions on Information and Systems

MathML is a standard markup language for describing math expressions. MathML consists of two sets of elements: Presentation Markup and Content Markup. The former is widely used to display math expressions in Web pages, while the latter is more suited to the calculation of math expressions. In this letter, we focus on the former and consider classifying Presentation MathML expressions. Identifying the classes of given Presentation MathML expressions is helpful for several applications, e.g., Presentation to Content MathML conversion, text-to-speech, and so on. We propose a method for classifying Presentation MathML expressions by using multilayer perceptron. Experimental results show that our method classifies MathML expressions with high accuracy.

Variable Typing: Assigning Meaning to Variables in Mathematical Text

Conference Paper

Jan 2018

Classification of MathML Expressions Using Multilayer Perceptron

Conference Paper

Aug 2017

MathML consists of two sets of elements: Presentation Markup and Content Markup. The former is more widely used to display math expressions in Web pages, while the latter is more suited to the calculation of math expressions. In this paper, we consider classifying math expressions in Presentation Markup. In general, a math expression in Presentation Markup cannot be uniquely converted into the corresponding expression in Content Markup. If the class of a given math expression can be identified automatically, such conversions can be done more appropriately. Moreover, identifying the class of a given math expression is useful for text-to-speech conversion of math expressions. In this paper, we propose a method for classifying math expressions in Presentation Markup by using a multilayer perceptron, a kind of deep learning. Experimental results show that our method classifies math expressions with high accuracy.

Presenting and Searching Mathematics in Digital Repositories

Article

Sep 2015

The paper presents an overview of the current development of tools for searching mathematical formulae and their implementation in Digital Mathematical Libraries and reference databases for mathematical scholarly literature, such as zbMATH, MathSciNet, and EuDML.

A Survey on Retrieval of Mathematical Knowledge

Conference Paper

May 2015

We present a short survey of the literature on indexing and retrieval of mathematical knowledge, with pointers to 72 papers and tentative taxonomies of both retrieval problems and recurring techniques.

Retrieval of Research-level Mathematical Information Needs: A Test Collection and Technical Terminology Experiment

Conference Paper

Jan 2015

Math Indexer and Searcher Web Interface: Towards Fulfillment of Mathematicians' Information Needs

Article

Apr 2014

We are designing and developing a web user interface for digital mathematics libraries called WebMIaS. It allows mathematicians to express queries through a faceted search interface. Users can combine standard autocompleted text keywords with keywords in the form of mathematical formulae in LaTeX or MathML format. Formulae are rendered on the fly by the web browser to give users feedback. We describe WebMIaS design principles and our experience deploying it in the European Digital Mathematics Library (EuDML). We further describe the issues addressed by formulae canonicalization and by extending the MIaS indexing engine with Content MathML support.
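The abstract mentions formulae canonicalization without listing its rules. As a rough illustration only, the sketch below applies two hypothetical normalization rules to Presentation MathML with Python's standard `xml.etree.ElementTree`: it strips formatting whitespace and unwraps any `<mrow>` that wraps exactly one element, so that markup variants of the same formula compare equal. These rules are assumptions chosen for the example, not the canonicalizer used by MIaS or WebMIaS; note that trimming text would also trim intentional spaces inside `<mtext>`.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop an '{namespace}' prefix so namespaced MathML also matches."""
    return tag.rsplit("}", 1)[-1]


def canonicalize(elem: ET.Element) -> ET.Element:
    """Apply two illustrative normalization rules in place; return elem."""
    # Rule 1: trim text, drop whitespace-only text, and drop all tails.
    if elem.text is not None:
        elem.text = elem.text.strip() or None
    elem.tail = None
    children = []
    for child in list(elem):
        canonicalize(child)
        # Rule 2: an <mrow> wrapping exactly one element and no text is
        # replaced by that element.
        while (local_name(child.tag) == "mrow" and len(child) == 1
               and child.text is None):
            child = child[0]
        children.append(child)
    for old in list(elem):
        elem.remove(old)
    elem.extend(children)
    return elem


src = "<math><mrow><mrow><mi>x</mi></mrow><mo>+</mo><mn>1</mn></mrow></math>"
print(ET.tostring(canonicalize(ET.fromstring(src)), encoding="unicode"))
```

Canonicalizing both indexed formulae and queries the same way is what lets syntactically different but equivalent markup hit the same index entries.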
