Article

Taxicab Geometry: An Adventure in Non-Euclidean Geometry

... The two main research problems tackled in this work are the design and implementation of a large and reproducible experimental survey on sentence similarity measures for the biomedical domain, and the evaluation of a set of unexplored methods based on adaptations from previous methods used in the general language domain. Our main contributions are as follows: (1) the largest, and for the first time, reproducible experimental survey on biomedical sentence similarity; (2) the first collection of self-contained and reproducible benchmarks on biomedical sentence similarity; (3) the evaluation of a set of previously unexplored methods, such as a new string-based sentence similarity method, based on Li et al. [54] and Block distance [55], eight variants of the current ontology-based methods from the literature based on the work of Sogancioglu et al. [30], and a new pre-trained Word Embedding (WE) model based on FastText [56] and trained on the full-text of articles in the PMC-BioC corpus [19]; (4) the evaluation for the first time of an unexplored benchmark, called CTR [51]; (5) the study on the impact of the pre-processing stage and Named Entity Recognition (NER) tools on the performance of the sentence similarity methods; (6) the integration for the first time of most sentence similarity methods for the biomedical domain into the same software library, called HESML-STS, which is available both on GitHub and in a reproducible dataset [42]; (7) a detailed reproducibility protocol together with a collection of software tools and datasets provided as supplementary material to allow the exact replication of all our experiments and results; and finally, (8) an analysis of the drawbacks and limitations of the current state-of-the-art methods. ...
... This section introduces a new string-based sentence similarity method based on the aggregation of the Li et al. [54] similarity and Block distance [55] measures, called LiBlock, as well as eight new variants of the ontology-based methods proposed by Sogancioglu et al. [30], and a new pre-trained word embedding model based on FastText [56] and trained on the full-text of the articles in the PMC-BioC corpus [19]. ...
... To overcome the drawbacks and limitations of the string-based and ontology-based methods detailed above, we propose here a new aggregated string-based measure called LiBlock and denoted by sim_LiBk henceforth, which is based on the combination of a similarity measure derived from the Block distance [55] and an adaptation of the ontology-based similarity measure introduced by Li et al. [54] that removes the use of ontologies, such as WordNet [58] or Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT) [59]. The LiBlock similarity measure obtains the best results in combination with the cTAKES NER tool [60], which allows the detection of synonyms of CUI concepts. ...
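As a rough illustration of the Block-distance component described in these excerpts, the sketch below computes a Block (L1) distance between bag-of-words count vectors and rescales it into a [0, 1] similarity. The tokenization, the rescaling, and the function names are assumptions for illustration only; the actual LiBlock measure also aggregates an adapted Li et al. component and relies on NER pre-processing as described in the paper.

```python
# A minimal sketch of a Block-distance-based sentence similarity, assuming
# simple whitespace tokenization; NOT the paper's LiBlock definition.
from collections import Counter

def block_distance(tokens_a, tokens_b):
    """Block (Manhattan/L1) distance between bag-of-words count vectors."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    vocab = set(ca) | set(cb)
    return sum(abs(ca[w] - cb[w]) for w in vocab)

def block_similarity(sent_a, sent_b):
    """Map the Block distance into a [0, 1] similarity score (hypothetical)."""
    ta, tb = sent_a.lower().split(), sent_b.lower().split()
    d = block_distance(ta, tb)
    return 1.0 - d / (len(ta) + len(tb))  # 1 for identical bags, 0 if disjoint

if __name__ == "__main__":
    print(block_similarity("the cell expresses the gene",
                           "the gene is expressed by the cell"))
```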
Preprint
Full-text available
This registered report introduces the largest, and for the first time, reproducible experimental survey on biomedical sentence similarity with the following aims: (1) to elucidate the state of the art of the problem; (2) to solve some reproducibility problems preventing the evaluation of most current methods; (3) to evaluate several unexplored sentence similarity methods; (4) to evaluate an unexplored benchmark, called Corpus-Transcriptional-Regulation; (5) to carry out a study on the impact of the pre-processing stages and Named Entity Recognition (NER) tools on the performance of the sentence similarity methods; and finally, (6) to bridge the lack of reproducibility resources for methods and experiments in this line of research. Our experimental survey is based on a single software platform that is provided with a detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments. In addition, we introduce a new aggregated string-based sentence similarity method, called LiBlock, together with eight variants of current ontology-based methods and a new pre-trained word embedding model trained on the full-text articles in the PMC-BioC corpus. Our experiments show that our novel string-based measure sets the new state of the art on the sentence similarity task in the biomedical domain and significantly outperforms all the methods evaluated herein, except one ontology-based method. Likewise, our experiments confirm that the pre-processing stages and the choice of the NER tool have a significant impact on the performance of the sentence similarity methods. We also detail some drawbacks and limitations of current methods, and warn of the need to refine the current benchmarks. Finally, a noticeable finding is that our new string-based method significantly outperforms all state-of-the-art Machine Learning models evaluated herein.
... In particular, d_∞ is also known as the Chebyshev distance [2], d_1 is known as the taxicab metric [18], and d_2 is known as the Euclidean distance. ...
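To make the three named members of the Minkowski family concrete for readers skimming these excerpts, here is a generic textbook sketch; it is not code from any cited paper, and the sample points are illustrative.

```python
# The Minkowski family d_p: p = 1 is the taxicab metric, p = 2 the Euclidean
# distance, and the limit p -> infinity the Chebyshev distance.
import math

def minkowski(x, y, p):
    if p == math.inf:
        return max(abs(a - b) for a, b in zip(x, y))  # d_inf (Chebyshev)
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

x, y = (0.0, 0.0), (3.0, 4.0)
print(minkowski(x, y, 1))         # 7.0  taxicab
print(minkowski(x, y, 2))         # 5.0  Euclidean
print(minkowski(x, y, math.inf))  # 4.0  Chebyshev
```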
... In particular, if U(α,β) := T_d(f_α)_β^{-1}(0), then (21) defines a bi-filtration of sets. (Figure 4: a toy example of the distance transform bi-filtration (18) of images of size 800 × 600 pixels; in this bi-filtration, we consider the sets of black pixels.) ...
... Ultimately, the firn layers reach the density of glacial ice at the bottom of the firn column, and the interconnected pore space is transformed into individual closed-off bubbles. These bubbles trap a direct sample of atmospheric air and make up an important paleoclimate record of past atmospheres. (Figure 5: illustration of the distance transform bi-filtration (18); in this bi-filtration, we consider the sets of black pixels.) ...
Conference Paper
Full-text available
The distance transform of a binary image is a classic tool in computer vision, and it has been widely used in the field of Topological Data Analysis (TDA) to study porous media. A common practice is to convert grayscale images to binary ones to apply the distance transform. In this work, by considering the threshold decomposition of a grayscale image, we prove that threshold decomposition and the distance transform together formulate a two-parameter filtration. This offers the TDA community a concrete example for applying multi-parameter persistence to digital image analysis. We demonstrate our method on the firn dataset.
... The image blocks have area-normalized histograms H_i and H_j, respectively, where the vectors H represent the values of each image block. By area-normalized we mean that the sum of their entries is 1, that is, ‖H‖_1 = 1, where ‖·‖_1 is the taxicab l_1-norm [19]. Each histogram vector, H_i and H_j, has 2^b entries, where b is the bit-depth of the image (e.g., 8-bit). ...
... where ‖·‖_1 is the taxicab l_1-norm [19]. Each histogram vector, H_i and H_j, has 2^b entries, where b is the bit-depth of the image (e.g., 8-bit). ...
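A small sketch of what an area-normalized histogram with unit taxicab (l1) norm looks like in code, assuming 8-bit grayscale blocks given as NumPy integer arrays; the block contents and sizes are random stand-ins, not data from the cited study.

```python
# Area-normalized histograms: 2**b bins for bit depth b, entries summing to 1.
import numpy as np

def area_normalized_histogram(block, bit_depth=8):
    hist = np.bincount(block.ravel(), minlength=2 ** bit_depth).astype(float)
    return hist / hist.sum()  # now ||hist||_1 == 1

rng = np.random.default_rng(0)
block_i = rng.integers(0, 256, size=(16, 16))
block_j = rng.integers(0, 256, size=(16, 16))
h_i = area_normalized_histogram(block_i)
h_j = area_normalized_histogram(block_j)
print(h_i.sum(), np.abs(h_i - h_j).sum())  # l1 norm of h_i, l1 difference
```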
Article
Full-text available
Contrast is not uniquely defined in the literature. There is a need for a contrast measure that scales linearly and monotonically with the optical scattering depth of a translucent scattering layer that covers an object. Here, we address this issue by proposing an image contrast metric, which we call the Haziness contrast metric. In essence, the Haziness contrast compares normalized histograms of multiple blocks of the image, a pair at a time. Subsequently, we test several prominent contrast metrics from the literature, as well as the new one, by using milk as a scattering medium in front of an object to simulate a decline in image contrast. Compared to other contrast metrics in the literature, the Haziness contrast metric is monotonic and close to linear for increasing density of the scattering material. The Haziness contrast has a wider dynamic range, and it correctly predicts the order of scattering depth for all the channels in the RGB image. Utilization of the metric to evaluate the performance of dehazing algorithms is also suggested.
... More recently Smarandache Geometry appeared, "A Smarandache Geometry is a geometry which has at least one Smarandachely denied axiom, i.e., an axiom behaves in at least two different ways within the same space, i.e., validated and invalidated, or only invalidated but in multiple distinct ways and a Smarandache n-manifold is an n-manifold that support a Smarandache Geometry" [12,17,19]. ...
... Thus, Taxicab Geometry can be a solution to this problem [19]. This is a geometry in which the Euclidean equation for the distance between two points is changed in order to model urban zones. ...
Article
NeutroGeometries are those geometric structures where at least one definition, axiom, property, theorem, among others, is only partially satisfied. In AntiGeometries at least one of these concepts is never satisfied. Smarandache Geometry is a geometric structure where at least one axiom or theorem behaves differently in the same space, either partially true and partially false, or totally false but with its negation done in many ways. This paper offers examples, in images of nature, everyday objects, and celestial bodies, where the existence of Smarandachean or NeutroGeometric structures in our universe is revealed. On the other hand, a practical study of surfaces with characteristics of NeutroGeometry is carried out, based on the properties, or more specifically NeutroProperties, of the famous quadrilaterals of Saccheri and Lambert on these surfaces. The article contributes to demonstrating the importance of building a theory such as NeutroGeometries or Smarandache Geometries, because it would allow us to study geometric structures where the well-known Euclidean, Hyperbolic or Elliptic geometries are not enough to capture properties of elements that are part of the universe but make sense only within a NeutroGeometric framework. It also offers an axiomatic option to the Riemannian idea of Two-Dimensional Manifolds. In turn, we prove some properties of the NeutroGeometries and the materialization of the symmetric triad ⟨Geometry⟩, ⟨NeutroGeometry⟩, and ⟨AntiGeometry⟩.
... In Fig. 1, the comparison of the Manhattan and Euclidean distances is indicated, where the green line is the Euclidean distance and the others are Manhattan distances (MD) [4]. The equation to compute the Manhattan distance is stated in (2). ...
... A significant variation of linear regression is logistic regression, where the output (q) is assigned a binary outcome instead of a set of continuous numerals. The illustrative logistic regression equation is indicated in (4). ...
Article
Full-text available
An essential type of TS analysis is classification, which can, for instance, advance energy load forecasting in smart grids by discovering the types of electronic gadgets based on their energy expenditure profiles recorded by automated sensors. Such applications are very often characterised by (a) very long TS and (b) large TS datasets needing classification. However, current techniques for time series classification (TSC) cannot deal with such data volumes at acceptable accuracy. WEASEL (Word ExtrAction for time SEries cLassification) is a novel TSC method that is both fast and accurate. Like other state-of-the-art TSC techniques, WEASEL transforms time series into feature vectors using a sliding-window approach, which are then passed through a machine learning classifier. Our approach here is the amalgamation of distance-based approaches such as DTW with feature-based approaches, namely SAX and WEASEL, and hence this method may be easily extended for use in combination with other techniques. In particular, we show that when combined with distance measures such as the Minkowski distances, DTW, SAX and PAA, it outperforms the previously known methods.
... The theory presented so far is a generalization of already known geometries. For example, Minkowski's Taxicab geometry and the distance function that bears this name [28,29]: ...
Article
Full-text available
NeutroGeometry is one of the most recent approaches to geometry. In NeutroGeometry models, the main condition is to satisfy an axiom, definition, property, operator and so on, that is neither entirely true nor entirely false. When one of these concepts is not satisfied at all it is called AntiGeometry. One of the problems that this new theory has had is the scarcity of models. Another open problem is the definition of angle and distance measurements within the framework of NeutroGeometry. This paper aims to introduce a general theory of distance measures in any NeutroGeometry. We also present an algorithm for distance measurement in real-life problems.
... In this paper, we will define the ZCZ in the Manhattan metric. The Manhattan metric is also known as the taxicab metric [29]. In a 2-dimensional plane, the Manhattan distance D between two dots at positions a = (x_1, y_1) and b = (x_2, y_2) is defined by ...
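The definition is truncated in the excerpt; the standard planar Manhattan distance it refers to is D(a, b) = |x_1 − x_2| + |y_1 − y_2|, which the following one-function sketch (with illustrative coordinates) computes.

```python
# The 2-D Manhattan (taxicab) distance between a = (x1, y1) and b = (x2, y2).
def manhattan(a, b):
    (x1, y1), (x2, y2) = a, b
    return abs(x1 - x2) + abs(y1 - y2)

print(manhattan((0, 0), (2, 3)))  # 5
```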
Article
Full-text available
In this paper, we propose the zero-correlation-zone (ZCZ) of radius r on two-dimensional m×n sonar sequences and define the (m,n,r) ZCZ sonar sequences. We also define some new optimality of an (m,n,r) ZCZ sonar sequence which has the largest r for given m and n. Because of the ZCZ for perfect autocorrelation, we are able to relax the distinct difference property of the conventional sonar sequences, and hence, the autocorrelation of ZCZ sonar sequences outside ZCZ may not be upper bounded by 1. We may sometimes require such an ideal autocorrelation outside ZCZ, and we define ZCZ-DD sonar sequences, indicating that it has an additional distinct difference (DD) property. We first derive an upper bound on the ZCZ radius r in terms of m and n≥m. We next propose some constructions for (m,n,r) ZCZ sonar sequences, which leads to some very good constructive lower bound on r. Furthermore, this construction suggests that for m and r, the parameter n can be as large as possible indefinitely. We present some exhaustive search results on the existence of (m,n,r) ZCZ sonar sequences for some small values of r. For ZCZ-DD sonar sequences, we prove that some variations of Costas arrays construct some ZCZ-DD sonar sequences with ZCZ radius r=2. We also provide some exhaustive search results on the existence of (m,n,r) ZCZ-DD sonar sequences. Lots of open problems are listed at the end.
... An important detail is that there are two main ways to compute the distance between two points used in the dispersion calculation presented in (2). The first way is to compute the norm ‖·‖ using the Euclidean distance (Deza and Deza, 2009); alternatively, the second way is to compute it using the Manhattan distance (Krause, 1975). ...
... A distance matrix was created by standardizing the data using Gower and Manhattan similarity distances for the physical and mixed datasets, respectively 59,60 . The number of clusters was determined with the help of scree and silhouette plots. ...
Article
Full-text available
European rivers are disconnected by more than one million man-made barriers that physically limit aquatic species migration and contribute to modification of freshwater habitats. Here, a Conceptual Habitat Alteration Model for Ponding is developed to aid in evaluating the effects of impoundments on fish habitats. Fish communities present in rivers with low human impact and their broad environmental settings enable classification of European rivers into 15 macrohabitat types. These classifications, together with the estimated fish sensitivity to alteration of their habitat are used for assessing the impacts of six main barrier types (dams, weirs, sluices, culverts, fords, and ramps). Our results indicate that over 200,000 km or 10% of previously free-flowing river habitat has been altered due to impoundments. Although they appear less frequently, dams, weirs and sluices cause much more habitat alteration than the other types. Their impact is regionally diverse, which is a function of barrier height, type and density, as well as biogeographical location. This work allows us to foresee what potential environmental gain or loss can be expected with planned barrier management actions in rivers, and to prioritize management actions.
... Thus, it is more useful to define the distance between two points as the sum of the horizontal and vertical differences between them. This results in the so-called Taxicab Geometry, which is described by, among others, KRAUSE (1982) and KRAUSE (1986)
Book
Full-text available
Black Hair, Blue Eyes and Other Features of Pure and Applied Mathematics (in Portuguese). 2nd edition, revised and amplified. Nine articles dealing, implicitly or explicitly, with mathematical thinking. Topics: 1. conceptualization of pure and applied mathematics; 2. Euclid’s demonstration of the Pythagorean Theorem; 3. the golden ratio; 4. negative numbers; 5. a trio of algebraists; 6. an isoperimetric taxicab metric; 7. exceptions and rule proving; 8. mathematical induction; 9. on note-taking.
... Cluster analysis groups a set of acquirers in such a way that acquirers in the same cluster are more similar to each other than to those in other clusters (Jain and Dubes 1988). The kth median cluster algorithm identifies k centers such that the clusters formed by them are the most compact (Krause 1986). The median for each attribute is computed in each single dimension in the rectilinear-distance formulation of the kth median problem, so the individual attributes are determined from the data set. ...
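As a hedged illustration of the rectilinear-distance property the excerpt relies on, the toy computation below shows that the coordinate-wise median minimizes the summed L1 (taxicab) distance to the cluster's points; the data are invented, not the study's acquirer attributes.

```python
# Under the L1 distance, the per-attribute median is the optimal 1-median,
# so the cluster center is the coordinate-wise median of its points.
import numpy as np

points = np.array([[1.0, 10.0], [2.0, 12.0], [9.0, 11.0], [2.0, 30.0]])
center = np.median(points, axis=0)          # median in each dimension
cost = np.abs(points - center).sum()        # summed rectilinear distance
print(center, cost)
```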
Article
Full-text available
Using a novel typology of serial acquirers, we examine several puzzles documented in the prior literature. We show that acquisitions by different types of acquirers are driven by different factors, they acquire different sizes of targets, and subsequent acquisitions by acquirers are predictable ex ante. Controlling for market anticipation, the most frequent serial acquirers do not earn declining returns as they continue acquiring, while less frequent acquirers do. Our methodology enhances our understanding of serial acquisition dynamics, anticipation, and economic value adjustments. The methodology is likely to be relevant to topics related to event anticipation beyond those covered in this study. (JEL G14, G34, G35) Received April 18, 2023; editorial decision June 14, 2023 by Editor Isil Erel
... Taxicab geometry, on the other hand, is one of the geometries that can only be explained through the coordinate system on the plane, and it is recommended as an engaging study for secondary and high school students. This geometry was introduced by [12]. The core notions of this geometry will be demonstrated. ...
Article
Full-text available
Global trends in mathematics education show that modern teaching methods are rapidly evolving at the national, regional, and global levels. In this regard, teaching non-Euclidean geometry concepts in schools can be essential in developing students' spatial imagination and enhancing scientific inquiry competencies. This paper aims to engage students and increase their interest in geometry by introducing the fundamental concepts of several non-Euclidean geometries. With this aim, we first give a modern definition of geometry and then move toward the exciting and fun world of non-Euclidean geometry. Of course, remember that the target audience for these topics is talented students in secondary and high schools.
... The first group was a kind of control group of approaches (the baseline): Term Frequency-Inverse Document Frequency (Tf-Idf) (Salton & Buckley, 1988), Latent Semantic Indexing or Latent Semantic Analysis (LSI) (Deerwester et al., 1990), Latent Dirichlet Allocation (LDA) (Blei et al., 2003), Hierarchical Dirichlet Process (HDP) (Teh et al., 2006), Random Projections or Random Indexing (RP) (Sahlgren, 2005), LogEntropy (LE) (Lee et al., 2005), Jaccard (Jaccard, 1912), Levenshtein (Levenshtein, 1966) and Soft Cosine as similarity measures were used, and Manhattan (Krause, 1986), Euclidean (Huang, 2008), and Word Mover's Distance (WMD) (Kusner et al., 2015) as distance measures. ...
Article
Full-text available
Automatic detection of concealed plagiarism in the form of paraphrases is a difficult task, and finding a successful unsupervised approach for paraphrase detection is a necessary precondition to change that. This comparative study identified the most efficient methods for unsupervised paraphrased document detection using similarity measures alone or combined with Deep Learning (DL) models. It proved the hypothesis that some DL models are more successful than the best statistically based methods in that task. Many experiments were carried out, and their results were compared. The text similarities between documents are obtained from 60 different methods using five paraphrase corpora, including a new one made by the authors as an important original contribution. Some DL models achieved significantly better results than those obtained by the best statistical methods, especially pre-trained transformer-based language models, with average values of Accuracy and F1 of 85.8% and 88.3%, respectively, and top values of 99.9% and 98.4% for Accuracy and F1 on some corpora. These results are even better than those of supervised and combined approaches. Therefore, the results presented here prove that detecting concealed plagiarism is becoming an attainable goal. This study highlighted the language models with the best overall results for paraphrase detection as best suited for further research. The study also discussed the choice of similarity/distance measure paired with embeddings produced by DL models and some advantages of using cosine similarity as the fastest measure. For the 60 different methods, complexity has been defined in O notation. The times needed for their implementation have also been presented. The article's results and conclusions are a firm base for future semantic similarity, paraphrasing, and plagiarism detection studies, clearly marking state-of-the-art tools and methods.
... A key element of the proposed method is the inter-string distance function, which directly affects the quality of merging vector representations from different sources. Popular inter-string distance functions are the Levenshtein distance [12], Jaccard similarity [13], Manhattan distance [14], Hamming distance [15] and Dice coefficient [16]. Quantitative indicators of how the choice of the inter-string distance function in the proposed method affects the classification result can be seen in Table 2. ...
Article
This article presents a method for merging multimodal word vector representations in a low-resource setting. Unlike other methods for merging word vector representations, this method takes into account the constraints of a low-resource environment and allows word vectors from different sources, such as documents and dictionaries, to be combined. The method relies on computing inter-string distances instead of building full syntactic and morphological models, which is often impossible for low-resource languages. It can be used at intermediate stages of building natural language processing and machine learning systems when solving practical problems such as machine translation or document classification. In addition, an analysis of various methods for merging multimodal word vector representations in a low-resource environment is carried out. The article describes the advantages, disadvantages and limitations of each approach, considering the task of building a unified vector representation of a text combined with data from additional sources. In the study, the classification of petitions to the Kyiv City Council written in Ukrainian was chosen as the example of a low-resource task. The large number of inter-string distance functions complicates their choice when solving practical problems. We propose a set of recommendations in the context of low-resource environments, as well as a methodology for choosing the best function for the tasks at hand. The analyzed inter-string distance functions include the Levenshtein distance, Jaccard similarity, Manhattan distance, Hamming distance and Dice coefficient. Our results demonstrate that the Levenshtein-distance-based method increases document classification quality more than the alternatives. These findings have practical significance for various fields, including natural language processing, text analysis and information retrieval.
... Since SMILES is represented by a string, many string-based distance algorithms are adaptable to SMILES. Some of these were introduced in [11], including an edit distance [12], normalized longest common subsequence (LCS) [13], a combination of LCS models [13], a SMILES representation-based string kernel [14], the SMILES fingerprint [15] with city block distance [16] or the Tanimoto coefficient [4], LINGOsim [17], LINGO-based TF [18], TF-IDF [19], and a combination of SIMCOMP with TF-IDF or LINGOsim [11]. ...
Article
Full-text available
A method for developing new drugs is the ligand-based approach, which requires intermolecular similarity computation. The simplified molecular input line entry system (SMILES) is primarily used to represent the molecular structure in one dimension. In this representation of molecular structure, the properties can be completely different even if only one character is changed. Applying the conventional edit distance method makes it difficult to obtain optimal results, because the insertion, deletion, and substitution operations are considered the same in calculating the distance. This study proposes a novel edit distance using an optimal weight set for the three operations. To determine the optimal weight set, we present a genetic algorithm with suitable hyperparameters. To emphasize the impact of the proposed genetic algorithm, we compare it with an exhaustive search algorithm. The experiments performed with four well-known datasets showed that the weighted edit distance optimized with the genetic algorithm resulted in an average performance improvement of approximately 20%.
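A compact sketch of an edit distance with separate insertion, deletion and substitution weights, in the spirit of the abstract above; the weight values and the SMILES-like strings are illustrative assumptions, not the paper's GA-optimized parameters.

```python
# Dynamic-programming edit distance with per-operation weights.
def weighted_edit_distance(s, t, w_ins=1.0, w_del=1.0, w_sub=1.0):
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * w_del
    for j in range(1, n + 1):
        d[0][j] = j * w_ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if s[i - 1] == t[j - 1] else w_sub
            d[i][j] = min(d[i - 1][j] + w_del,     # delete from s
                          d[i][j - 1] + w_ins,     # insert into s
                          d[i - 1][j - 1] + sub)   # substitute / match
    return d[m][n]

print(weighted_edit_distance("CCO", "CCN"))             # 1.0
print(weighted_edit_distance("CCO", "CCN", w_sub=2.5))  # 2.0: delete+insert wins
```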
... where the symbols denote the sequences under comparison and 16 corresponds to the total number of words of size 2 (the 4² dinucleotides). Other metrics have also been used for this purpose, such as the Pearson correlation distance [129], the DSSIM [130], the Manhattan distance [131], or the approximated information distance [39]. From this characterization, the succession of nucleotides along a sequence follows a zero-order Markov chain, i.e., the probability of finding a given nucleotide does not depend on its neighbor composition. ...
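To make the word-frequency comparison concrete, the sketch below builds size-2 word (dinucleotide) frequency vectors and compares them with a Manhattan distance, one of the metric options the excerpt lists; the sequences are toy examples, not data from the cited review.

```python
# Genomic-signature sketch: 16 words of size 2 over the alphabet ACGT,
# compared by the Manhattan (taxicab) distance between frequency vectors.
from collections import Counter
from itertools import product

WORDS = ["".join(p) for p in product("ACGT", repeat=2)]  # the 16 dinucleotides

def dinucleotide_freqs(seq):
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = max(sum(counts[w] for w in WORDS), 1)
    return [counts[w] / total for w in WORDS]

def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

print(manhattan(dinucleotide_freqs("ACGTACGTAA"),
                dinucleotide_freqs("GGGGCCCCGG")))
```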
Article
Full-text available
Simple Summary In a broad sense, genomic signature refers to characteristics associated to DNA sequences. Many studies analyze genotype–phenotype patterns in a group of genes, thus targeting genomic signatures associated to a given disease or identifying a gene expression profile. However, some studies in comparative genomics and evolutionary biology refer to genomic signature as the statistical properties of DNA sequences, such as the distribution of k-words. In these fields of study, genomic signatures are species-specific and can be informative about phylogenetic relationships. In this review, we identify the main genomic signatures in a large collection of articles by performing a bibliometric analysis and then rename each signature according to its conceptual meaning. Among the different signatures, we use the term organismal signature to denote the DNA patterns able to infer evolutionary relationships and go on to review its formulation and applications in the second part of the article. Abstract Organisms are unique physical entities in which information is stored and continuously processed. The digital nature of DNA sequences enables the construction of a dynamic information reservoir. However, the distinction between the hardware and software components in the information flow is crucial to identify the mechanisms generating specific genomic signatures. In this work, we perform a bibliometric analysis to identify the different purposes of looking for particular patterns in DNA sequences associated with a given phenotype. This study has enabled us to make a conceptual breakdown of the genomic signature and differentiate the leading applications. On the one hand, it refers to gene expression profiling associated with a biological function, which may be shared across taxa. This signature is the focus of study in precision medicine. On the other hand, it also refers to characteristic patterns in species-specific DNA sequences. This interpretation plays a key role in comparative genomics, identifying evolutionary relationships. Looking at the relevant studies in our bibliographic database, we highlight the main factors causing heterogeneities in genome composition and how they can be quantified. All these findings lead us to reformulate some questions relevant to evolutionary biology.
... Among the systems that could be brought into schools, it is worth recalling a geometry created for didactic purposes by the American professor Eugene Krause and designated in English as Taxicab Geometry, according to Krause (1975). In Portuguese, this geometric system is designated "geometria do motorista do táxi", "geometria pombalina" as in Jorge et al. (1999), "geometria do taxista" as in Bigode (2002), or, as we call it, "geometria do táxi" (GT). ...
Book
Full-text available
What is presented in this work is the result of incessant, high-quality teaching during the pandemic of the novel coronavirus (Covid-19). The authors who wrote up their research in this scientific work represent the strength and competence of Brazilian teaching professionalism, promoting remote actions during the pandemic with mobile technologies and with adapted, playful materials to attract the attention of thousands of students who had to study at home, many of them with extreme difficulty connecting to the internet, but always with the support and dedication of their teachers and families. The book consists of four chapters, as follows. In the first chapter, Geometry in the Pandemic, the authors André Luiz Souza Silva, José Carlos Gonçalves Gaspar and Vilmar Gomes da Fonseca present an innovative didactic proposal for teaching axial symmetry, based on a didactic sequence of four exploratory tasks using folding, cutting and digital technologies. The authors present an annotated description of the didactic proposal and its objectives, and conclude by proposing reflections on possibilities for creating teaching environments that promote meaningful learning. In the second chapter, entitled Interactive and Assessment-Oriented Didactic Practices in Remote Teaching, the author, starting from the Emergency Remote Teaching (ERE) that emerged in 2020 as an alternative for continuing school activities while preserving the health of students and teachers during the pandemic caused by the Sars-CoV-2 virus, discusses how many teachers initially opposed ERE because they believed quality remote teaching would not be possible in basic education. However, as the days passed and through the study and dedication of thousands of teachers around the world, promising paths were opened for education in the face of the challenges that arose. This text presents some didactic practices developed and used by teachers of the Colégio de Aplicação of UFRJ, demonstrating the potential of digital resources through positive evaluations from students, guardians, student teachers and teachers of different subjects. In the third chapter, on the theme Elementary concepts of non-Euclidean Geometries in basic school: why? What for?, the author hopes to answer some questions she believes are pertinent to this time of the pandemic caused by the novel coronavirus, as well as to the future, with hybrid and remote teaching. She presents a set of questions related to non-Euclidean geometries (GNE) that may lead the reader to think they have no connection with the pandemic. To answer the "why?" of the title, the author presents her understanding of the creation of new scientific knowledge and the logics involved in the actions related to the scientific thinking that underpins it, to Education and to Mathematics. The text of the fourth chapter, Problematizing the real numbers with Dynamic Geometry resources: Discovering gaps in the number line, introduces a set of educational activities designed for problematizing and learning the real numbers using technological resources chosen to provide an interactive and dynamic approach to the topic. There are three electronic constructions, called applets, that encourage the exploration of important didactic orientations.
... Hausdorff distance, Huttenlocher et al., 1993). At the moment, the AptaMat algorithm implements a metric based on the Manhattan distance, which was chosen for its simplicity, as it is expressed as the sum of the absolute differences between the coordinates of the compared points (Krause, 1988). However, other distances can be easily implemented. ...
Article
Full-text available
Motivation: Comparing single-stranded nucleic acids (ssNAs) secondary structures is fundamental when investigating their function and evolution and predicting the effect of mutations on their structures. Many comparison metrics exist, although they are either too elaborate or not sensitive enough to distinguish close ssNAs structures. Results: In this context, we developed AptaMat, a simple and sensitive algorithm for ssNAs secondary structures comparison based on matrices representing the ssNAs secondary structures and a metric built upon the Manhattan distance in the plane. We applied AptaMat to several examples and compared the results to those obtained by the most frequently used metrics, namely the Hamming distance and the RNAdistance, and by a recently developed image-based approach. We showed that AptaMat is able to discriminate between similar sequences, outperforming all the other here considered metrics. In addition, we showed that AptaMat was able to correctly classify 14 RFAM families within a clustering procedure. Supplementary information: Supplementary data are available at Bioinformatics online.
... There are different ways to compute the distance between two points. In our article, we work with the simple Euclidean distance in a plane as well as with the grid-based distance, which is more suitable for urban environments (see also [38]). These distances are approximations of the distance in the real world. ...
Article
Full-text available
In this article, we aim to develop the theoretical background for the possible application of Economic-Geographical metrics in the field of population protection. We deal with various options for analyzing the availability of "safety" for citizens using the studied metrics. Among others, we apply well-known metrics such as the Gini coefficient and the Hoover index, and even establish their generalizations. We develop a theoretical background and evaluate our findings on generated and actual data. We find that the metrics used can have an opposite interpretation depending on the scenario we are considering. We also discover that some scenarios demand a modification of the usual metric. We conclude that Economic-Geographical metrics give valuable tools to address specific security challenges. The metrics' generalizations could serve as a potent tool for other authors working in the field of population protection. Nevertheless, we must keep in mind that metrics also have drawbacks.
... Token-based similarity is used for handling term reorganization by breaking the string into substrings. Examples of the token-based approach are Jaccard similarity [27], Dice's coefficient [16], cosine similarity [8], Manhattan distance [34], and Euclidean distance [21]. This paper is organized into six sections; Section 1 introduces the proposed approach. ...
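A brief sketch of two of the token-based measures named in the excerpt (Jaccard and Dice), assuming whitespace tokenization into token sets; this is the generic formulation, not the cited paper's implementation.

```python
# Token-based set similarities over whitespace-tokenized strings.
def jaccard(a, b):
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def dice(a, b):
    sa, sb = set(a.split()), set(b.split())
    return 2 * len(sa & sb) / (len(sa) + len(sb)) if sa or sb else 1.0

s1, s2 = "spelling error detection", "error detection in spelling"
print(jaccard(s1, s2), dice(s1, s2))  # 0.75 and ~0.857
```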
Article
Full-text available
Spelling errors are fundamental errors in text writing. The digital era has added another dimension, the keyboard layout, to this problem. Memorization, language orthography, and keyboard layout are sources of spelling errors in electronic texts. English being the link language of the world, a good quantum of work on spelling error detection and plausible suggestions has been done for the English language. But this is not the case for digital-resource-scarce languages like the Indian languages. Marathi, which is the official language of the Maharashtra State in India and the world's 10th most spoken language, is no exception. Various computational approaches for spelling error detection and correction have been advocated in the literature. Amongst these, similarity-based measures have proven to be the prominent ones. This paper presents a detailed contrastive study of two popular similarity measures, viz. the minimum edit distance and cosine similarity, in the context of mis-spelled Marathi words. The philosophical and empirical aspects of these methods are also presented. For experimentation purposes we chose a dataset of 9,29,663 (i.e., 929,663) unique Marathi words harvested from various sources. We obtained an accuracy of 85.88% and 86.76% for the minimum edit distance algorithm and the cosine similarity algorithm, respectively.
... Note that (ν_r + 1 − q) + (p − r) is the taxicab (or Manhattan) distance from the cell (r, ν_r + 1) to the cell τ = (p, q) [Kra86]. Therefore, set ...
Preprint
We provide a combinatorial formula for the expansion of immaculate noncommutative symmetric functions into complete homogeneous noncommutative symmetric functions. To do this, we introduce generalizations of Ferrers diagrams which we call GBPR diagrams. We define tunnel hooks, which play a role similar to that of the special rim hooks appearing in the Eğecioğlu-Remmel formula for the symmetric inverse Kostka matrix. We extend this interpretation to skew shapes and fully generalize to define immaculate functions indexed by integer sequences skewed by integer sequences. Finally, as an application of our combinatorial formula, we extend Campbell's results on ribbon decompositions of immaculate functions to a larger class of shapes.
... When interpreting this grid as a graph, it is possible to analyse the spatial distribution of valid and invalid configurations in the search spaces. To interpret the search space as a graph, two nodes are connected if the Manhattan distance [10] between their indices in the grid is exactly one. ...
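A minimal sketch of the grid-as-graph rule described above: two nodes are connected exactly when the Manhattan distance between their grid indices is one. The helper name and bounds checking are illustrative, not from the cited work.

```python
# Neighbors of a grid cell under the "Manhattan distance == 1" adjacency rule.
def neighbors(idx, shape):
    i, j = idx
    rows, cols = shape
    cand = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]  # |di| + |dj| == 1
    return [(r, c) for r, c in cand if 0 <= r < rows and 0 <= c < cols]

print(neighbors((0, 0), (3, 3)))  # [(1, 0), (0, 1)]
```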
Preprint
The process of optimizing the latency of DNN operators with ML models and hardware-in-the-loop, called auto-tuning, has established itself as a pervasive method for the deployment of neural networks. From a search space of loop-optimizations, the candidate providing the best performance has to be selected. Performance of individual configurations is evaluated through hardware measurements. The combinatorial explosion of possible configurations, together with the cost of hardware evaluation makes exhaustive explorations of the search space infeasible in practice. Machine Learning methods, like random forests or reinforcement learning are used to aid in the selection of candidates for hardware evaluation. For general purpose hardware like x86 and GPGPU architectures impressive performance gains can be achieved, compared to hand-optimized libraries like cuDNN. The method is also useful in the space of hardware accelerators with less wide-spread adoption, where a high-performance library is not always available. However, hardware accelerators are often less flexible with respect to their programming which leads to operator configurations not executable on the hardware target. This work evaluates how these invalid configurations affect the auto-tuning process and its underlying performance prediction model for the VTA hardware. From these results, a validity-driven initialization method for AutoTVM is developed, only requiring 41.6% of the necessary hardware measurements to find the best solution, while improving search robustness.
... A distance matrix was created by standardizing the data using Gower and Manhattan similarity distances for the physical and mixed datasets, respectively 52,53 . The number of clusters was determined with the help of scree and silhouette plots. ...
Preprint
Full-text available
We estimate that over 200,000 km, or 10%, of previously free-flowing river habitat length has been lost due to impoundments, an amount equivalent to the entire length of rivers in Italy. This loss strongly depends on the biogeographical location and the type of impounding barrier. European rivers are disconnected by more than one million man-made barriers that physically limit or completely block aquatic species migration and contribute to the loss of freshwater habitats [8]. One of the pervasive effects of barriers is that caused by impoundment, which directly modifies lotic (flowing) stretches of river into lentic (lake-like) habitats [5]. Depending on the structure and composition of the fish communities expected at the barrier location, the biological consequences may vary. An EU-wide analysis of fish communities observed at river sections with low human-induced alteration resulted in a macrohabitat classification of European rivers into 15 river types with expected fish community structure. This set a baseline for assessing the impacts of six main barrier types (dams, weirs, sluices, culverts, ramps, and fords) on river fish habitats across Europe. The largest habitat losses are caused by dams, weirs and sluices in mountainous areas, where the fish most sensitive to ponding are expected. Although many impoundments there are smaller than in lowlands, their individual impacts are the greatest. Hence, regional variation in the magnitude of impoundment impact is not only a function of barrier height and density, but to a large extent of biogeographical location and barrier type. Strategies for enhancing European riverine biodiversity should focus on prioritization of the most sensitive regions and the barrier types causing a high degree of habitat fragmentation. This work is based on four novel methodological approaches: fish community grouping into habitat-use guilds, a continental river reference model for ecologically sound river management, landscape-scale application of physical habitat models, and a conceptual model of impoundment impacts on fish habitat.
... The Minkowski metric with p = 2 is used in this paper; its equation is given in Eq. (5). In Eq. (5), the city block metric [39] with p = 1, the Euclidean metric with p = 2, and the Chebyshev metric [40] with p = ∞ are calculated. ...
Preprint
Full-text available
Density-based spatial clustering of applications with noise (DBSCAN) has been used to cluster data with arbitrary shapes, where clustering is done based on the density among objects in the data. Given that DBSCAN is a proper tool for identifying outliers and clustering non-convex data, it can be used for automatic clustering of non-convex data, covering the weakness of most automatic clustering algorithms in not recognizing non-convex clusters. So, in this paper, a new automatic clustering algorithm is introduced which is a combination of DBSCAN and the grouper fish-octopus (GFO) algorithm. GFO-DBSCAN finds the best number of clusters in two main steps in an iterative manner. In the first step, the values of eps and minpts are generated by the GFO algorithm, and in the second step, the clustering of the data is performed using the DBSCAN algorithm with the eps and minpts generated in the previous step. After each clustering, using the correct data labels and the cluster centroids, the Calinski-Harabasz (CH) index is calculated. Finally, after some iterations of GFO, the best number of clusters is reported. In this study, three categories of data are used to measure the performance of the GFO-DBSCAN algorithm. Also, GFO-DBSCAN is compared with the ACDE, DCPSO, and GCUK algorithms. According to the results, GFO-DBSCAN achieves the optimal number of clusters on most data and outperforms other well-known algorithms.
... However, the actual motion plan cost was not explicitly calculated. In [89], the authors considered multiple goal configurations arranged in the ascending order of Manhattan distance [160] from the start configuration and sequentially calculated the motion plans for each goal configuration in the list. Then, the best goal configuration was selected if the actual cost of the motion plan was less than the Manhattan distance cost of the next goal configuration in the list. ...
Article
Full-text available
One of the fundamental fields of research is motion planning. Mobile manipulators present a unique set of challenges for the planning algorithms, as they are usually kinematically redundant and dynamically complex owing to the different dynamic behavior of the mobile base and the manipulator. The purpose of this article is to systematically review the different planning algorithms specifically used for mobile manipulator motion planning. Depending on how the two subsystems are treated during planning, sampling-based, optimization-based, search-based, and other planning algorithms are grouped into two broad categories. Then, planning algorithms are dissected and discussed based on common components. The problem of dealing with the kinematic redundancy in calculating the goal configuration is also analyzed. While planning separately for the mobile base and the manipulator provides convenience, the results are sub-optimal. Coordinating between the mobile base and manipulator while utilizing their unique capabilities provides better solution paths. Based on the analysis, challenges faced by the current planning algorithms and future research directions are presented.
... Finally, the clusters have to be calculated based on the distance between the attack instances. In this work, distance is calculated based on the Manhattan metric [29]. ...
Preprint
Full-text available
Recently, advances in cyber-physical systems and IoT have led to an increase in devices connected to the internet. This rise in functionality also comes with an increased attack surface for cyber criminals. A proven method for forensic investigation of trends and developments in crimes conducted in the virtual world is the honeypot. We set up a medium-interaction honeypot offering telnet and SSH services. With this honeypot we captured data from attack sessions. This data was used for statistical and behavioural analysis, such as distributions of attacks and of different attacker IPs, originating countries, employed anonymisation services, the skill level of an adversary, and commonly targeted embedded devices. Furthermore, machine learning techniques that are capable of identifying unique types of sessions based on issued commands and provided credentials are presented in this work. There are strong indicators that most of the traffic captured during our research was caused by botnet activities, which corresponds to findings of other research activities.
... The Manhattan distance, also known as the taxicab metric, is the sum of the absolute differences of the Cartesian coordinates between two points [28]. ...
Article
Full-text available
Watermarking techniques in a wide range of digital media have been utilized as a host cover to hide or embed a piece of message information in such a way that it is invisible to a human observer. This study aims to develop an enhanced, rapid and blind method for producing a watermarked 3D object using QR code images with high imperceptibility and transparency. The proposed method is based on the spatial domain, and it starts with converting the 3D object triangles from the three-dimensional Cartesian coordinate system to the two-dimensional coordinate domain using the corresponding transformation matrix. Then, it applies a direct modification to the third vertex point of each triangle. Each triangle's coordinates in the 3D object can be used to embed one pixel from the QR code image. In the extraction process, the QR code pixels can be successfully extracted without the need for the original image. The imperceptibility and transparency performance of the proposed watermarking algorithm were evaluated using Euclidean distance, Manhattan distance, cosine distance, and correlation distance values. The proposed method was tested under various filtering attacks, such as rotation, scaling, and translation. The proposed watermarking method improved the robustness and visibility of extracting the QR code image. The results reveal that the proposed watermarking method yields watermarked 3D objects with excellent execution time, imperceptibility, and robustness to common filtering attacks.
... In addition, the class results were significantly influenced by the choice of the dissimilarity or distance measurement method. In the present study, the Manhattan distance, or taxicab distance, was used; the distance between two points corresponds to the sum of the absolute differences of their Cartesian coordinates [58], which was defined as ...
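A hedged sketch of hierarchical clustering under Manhattan distances with SciPy, analogous in spirit to the analysis the excerpt describes; the random profiles, the average linkage, and the cluster count are illustrative assumptions, not the study's sequencing data or settings.

```python
# Hierarchical clustering on pairwise Manhattan (cityblock) distances.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(42)
profiles = rng.random((10, 5))               # 10 samples x 5 features (stand-ins)
dists = pdist(profiles, metric="cityblock")  # condensed Manhattan distance matrix
tree = linkage(dists, method="average")      # agglomerative clustering
labels = fcluster(tree, t=3, criterion="maxclust")
print(labels)
```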
Article
Full-text available
Microorganisms play a vital role in the decomposition of vertebrate remains in natural nutrient cycling, and the postmortem microbial succession patterns during decomposition remain unclear. The present study used hierarchical clustering based on Manhattan distances to analyze the similarities and differences among postmortem intestinal microbial succession patterns based on microbial 16S rDNA sequences in a mouse decomposition model. Based on the similarity, seven different classes of succession patterns were obtained. Generally, the normal intestinal flora in the cecum was gradually decreased with changes in the living conditions after death, while some facultative anaerobes and obligate anaerobes grew and multiplied upon oxygen consumption. Furthermore, a random forest regression model was developed to predict the postmortem interval based on the microbial succession trend dataset. The model demonstrated a mean absolute error of 20.01 h and a squared correlation coefficient of 0.95 during 15-day decomposition. Lactobacillus, Dubosiella, Enterococcus, and the Lachnospiraceae NK4A136 group were considered significant biomarkers for this model according to the ranked list. The present study explored microbial succession patterns in terms of relative abundances and variety, aiding in the prediction of postmortem intervals and offering some information on microbial behaviors in decomposition ecology.
... Remark: to keep the discussion concise, the choice was made to focus on certain specific criteria that are the most evident within the framework of this work. However, other criteria not presented here could be exploited, such as an association criterion based on the normals to the surface of the points [107] or on the Manhattan distance [108]. They could be the subject of targeted studies in later work to measure their contribution relative to the association method currently used. ...
Thesis
The autonomous vehicle is one of today's major technological challenges in the automotive sector. Current vehicles are becoming more complex and integrate new systems based on key functionalities such as perception. By allowing the vehicle to apprehend the environment in which it operates, perception is exploited in various ways to guarantee safer mobility. Given the essential role of perception in the proper behavior of an autonomous vehicle, it is necessary to ensure that the perception solutions used perform well enough to guarantee safe traffic. The evaluation of such perception solutions nevertheless remains a complex and little-explored task. One of the critical points is the difficulty of producing and having sufficient reference data to carry out relevant evaluations. The objective of this thesis is to develop a new validation tool for evaluating the performance and error levels of different perception solutions while using a minimum of manual processing. With this tool, it then becomes possible to compare different solutions based on common criteria. The development of this tool breaks down into two main parts: the automated generation of reference data, and the method for evaluating the perception solutions under test.
... It takes string co-occurrence and repetition degree as the similarity measure. Some of the measures are based on characters, such as the edit distance, Hamming distance, longest common substring (LCS), Jaro-Winkler [5] and N-gram [6]; others are based on terms, like cosine similarity, Euclidean distance, Manhattan distance [7], Jaccard similarity [8] and Dice's coefficient [9]. Although this method is easy to implement, it does not consider the meaning of words and their relationships. ...
... where D(t, t′) = Σ_{i=1}^{d} min{ |t_i − t′_i|, L_i − |t_i − t′_i| } is the Manhattan distance [29] defined over the periodic lattice of size L_i in each dimension (in our case d = 2 and L_i = 1), ...
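A direct transcription of the periodic-lattice distance above into Python, with d = 2 and L_i = 1 as in the excerpt; the sample points are illustrative.

```python
# Manhattan distance on a periodic (toroidal) lattice: each coordinate
# difference wraps around when the detour across the boundary is shorter.
def periodic_manhattan(t, s, periods=(1.0, 1.0)):
    return sum(min(abs(a - b), L - abs(a - b))
               for a, b, L in zip(t, s, periods))

print(periodic_manhattan((0.1, 0.9), (0.9, 0.1)))  # 0.4: wraps on both axes
```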
Preprint
Group convolutions and cross-correlations, which are equivariant to the actions of group elements, are commonly used in mathematics to analyze or take advantage of symmetries inherent in a given problem setting. Here, we provide efficient quantum algorithms for performing linear group convolutions and cross-correlations on data stored as quantum states. Runtimes for our algorithms are logarithmic in the dimension of the group thus offering an exponential speedup compared to classical algorithms when input data is provided as a quantum state and linear operations are well conditioned. Motivated by the rich literature on quantum algorithms for solving algebraic problems, our theoretical framework opens a path for quantizing many algorithms in machine learning and numerical methods that employ group operations.
... Definition 1 Maximum Manhattan distance: "Manhattan distance" is also termed "city block distance" and "taxicab distance" (Krause, 1986). It can roughly depict the distance along a road network. ...
Preprint
Full-text available
The movement of humans and goods in cities can be represented by constrained flow, which is defined as the movement of objects between origin and destination in road networks. Flow aggregation, namely origins and destinations aggregated simultaneously, is one of the most common patterns; for example, aggregated origin-to-destination flows between two transport hubs may indicate great traffic demand between the two sites. Developing a clustering method for constrained flows is crucial for determining urban flow aggregation. Among existing methods for identifying flow aggregation, the L-function of flows is the major one. Nevertheless, this method depends on the aggregation scale, the key parameter detected by the Euclidean L-function, and it does not adapt to road networks. The extracted aggregation may be overestimated and dispersed. Therefore, we propose a clustering method based on the L-function of Manhattan space, which consists of three major steps. The first is to detect aggregation scales by the Manhattan L-function. The second is to determine core flows possessing the highest local L-function values at different scales. The final step is to take the intersection of core flows' neighbourhoods, the extent of which depends on the corresponding scale. By setting the number of core flows, we can concentrate the aggregation and thus highlight the Aggregation Artery Architecture (AAA), which depicts road sections that contain the projection of key flow clusters on the road networks. Experiments using taxi flows showed that AAA can clarify the resident movement types of identified aggregated flows. Our method also helps in selecting locations for distribution sites, thereby supporting accurate analysis of urban interactions.
... Is there a parabola in Figure 6? The answer is yes: by suppressing the circled respondents, who make up less than 6% of the data, we see an inverted-V-shaped band of points, which represents a taxicab parabola with a lot of dispersion; see for instance Krause (1986). ...
... [Map figure with north arrow and 5 km scale bar [25]. Footnote 2: considering the smallest displacement (71% of ground distance), taxicab geometry [43]. Footnote 3: the search perimeter is restricted to a focus area as indicated in Figure 8.] ...
Article
Full-text available
Green spaces have a positive influence on human well-being. Therefore, an accurate evaluation of public green space provision is crucial for administrations to achieve decent urban environmental quality for all. Whereas inequalities in green space access have been studied in relation to income, the relation between neighbourhood affluence and remediation difficulty remains insufficiently investigated. A methodology is proposed for co-creating scenarios for green space development through green space proximity modelling. For Brussels, a detailed analysis of potential interventions allows for classification according to relative investment scales. This resulted in three scenarios of increasing ambition. Results of scenario modelling are combined with socio-economic data to analyse the relation between average income and green space proximity. The analysis confirms the generally accepted hypothesis that non-affluent neighbourhoods are on average underserved. The proposed scenarios reveal that a very high standard of green space proximity could be reached throughout the study area if authorities were willing to allocate budgets for green space development that go beyond the regular construction costs of urban green spaces, and that the required types of interventions demand a higher financial investment per area of realised green space in non-affluent neighbourhoods.
... This distance can be generalized to take into consideration the differences between the variables' domains. This generalized distance is called the Manhattan distance [119], and is defined as follows. Nevertheless, the HD (and other domain-independent metrics) has problems caused by being a generalist measure (see Example 13). ...
Thesis
Full-text available
Scheduling problems are common in many applications that range from factories and transports to universities. Most times, these problems are optimization problems for which we want to find the best way to manage scarce resources. Reality is dynamic, and thus unexpected disruptions can make the original solution invalid. Many methods for dealing with disruptions are well described in the literature. These methods can be divided into two main approaches: (i) create robust solutions for the most common disruptions, and (ii) solve the problem again from scratch extended with new constraints. The goal of creating robust solutions is to ensure their validity even after the most common disruptions occur. For this reason, it requires a detailed study of the most likely disruptive scenarios. The main disadvantage of creating a robust solution is a possible reduction in the overall quality (e.g., financial cost, customer satisfaction) to support the most likely disruptive scenarios, which may never occur. Regardless of the robustness of the solution, we may need to solve the problem again. Most of the methods developed to recover solutions after disruptions occur consist of re-solving the problem from scratch with an additional cost function. This cost function ensures that the new solution is close to the original. In other words, these methods solve the Minimal Perturbation Problem (MPP). However, all these methods require more execution time than the original problem to find a new solution. This can be explained by the fact that we are solving a different problem (with more optimization criteria). One can mitigate this issue by re-using the search. Moreover, these methods use generic cost functions (e.g., Hamming distance) that may have little significance in practice. In this work, we propose novel algorithms to solve the MPP applied to two domains: university course timetabling and train scheduling. We tested our algorithms on university timetabling problems with data sets obtained from Instituto Superior Técnico and the 2019 International Timetabling Competition. One of these algorithms was ranked in the top 5 of the competition. For the train scheduling case study, we tested our algorithms with data from the Swiss Federal Railways and from PESPLib. The evaluation shows that the new algorithms are more efficient than the ones described in the literature. Summing up, the proposed algorithms show a significant improvement over the state of the art in re-solving scheduling problems under disruptions.
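The Hamming-distance cost function mentioned above simply counts how many variables changed value between the original and the repaired solution; a minimal sketch with hypothetical timetable assignments (illustrative only, not the thesis's algorithms):

```python
def hamming_cost(original: dict, repaired: dict) -> int:
    """Minimal Perturbation cost: number of variables whose assigned
    value differs between the original and the repaired solution."""
    return sum(1 for var in original if repaired.get(var) != original[var])

# Hypothetical timetable assignments (course -> time slot):
original = {"algebra": "Mon9", "physics": "Mon11", "chemistry": "Tue9"}
repaired = {"algebra": "Mon9", "physics": "Tue11", "chemistry": "Tue9"}
print(hamming_cost(original, repaired))  # 1 -- only physics was moved
```

The critique quoted above is visible even in this toy case: moving one lecture by two days and moving it by one hour both cost 1, which is why such generic cost functions may have little significance in practice.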
... = 1, where ‖·‖₁ is the taxicab ℓ₁-norm [14]. Each of the two histogram vectors has 2^b entries, where b is the bit-depth of the image (e.g., 8-bit). ...
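Read this way, each image histogram is normalised so that its taxicab (ℓ₁) norm equals 1; a short sketch under that reading (not the cited paper's code):

```python
import numpy as np

def normalized_histogram(image: np.ndarray, bit_depth: int = 8) -> np.ndarray:
    """Intensity histogram with 2**bit_depth bins, normalised so that
    its taxicab (l1) norm equals 1."""
    hist = np.bincount(image.ravel(), minlength=2 ** bit_depth).astype(float)
    return hist / np.abs(hist).sum()   # l1 normalisation

img = np.random.randint(0, 256, size=(64, 64))
h = normalized_histogram(img)
print(len(h), np.isclose(np.abs(h).sum(), 1.0))  # 256 True
```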
Article
Pascal’s triangle arises by counting the number of shortest paths from (0, 0) to (n, k) on a square street grid. The length of the shortest path is the Manhattan distance from (0, 0) to (n, k). We consider the case of a square street grid of one-way streets, with successive parallel streets being oppositely directed. We investigate the associated distance function q (which is only a quasi-metric, since q(A, B) may differ from q(B, A)) and the arithmetic triangle obtained by counting shortest routes on the one-way grid from (0, 0) to (n, k).
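The one-way quasi-metric q is the paper's contribution; the sketch below only illustrates the familiar two-way case, where the number of shortest taxicab paths to (n, k) satisfies Pascal's recurrence:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def shortest_path_count(n: int, k: int) -> int:
    """Number of shortest (monotone) taxicab paths from (0, 0) to (n, k)
    on a two-way square grid; every shortest path has length n + k, the
    Manhattan distance, and the counts reproduce Pascal's triangle."""
    if n == 0 or k == 0:
        return 1
    # A shortest path enters (n, k) either from (n-1, k) or from (n, k-1).
    return shortest_path_count(n - 1, k) + shortest_path_count(n, k - 1)

print(shortest_path_count(2, 2))                           # 6 == C(4, 2)
print([shortest_path_count(4 - k, k) for k in range(5)])   # row 4: 1 4 6 4 1
```

On the paper's one-way grid this symmetry breaks: routes from A to B and from B to A can differ in length, which is exactly why q is only a quasi-metric.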
Article
Objectives : The purpose of this study is to use independent datasets to externally validate the three symptom clusters of unipolar depression identified by Chekroud, to evaluate personalized treatment trajectories and outcomes based on these symptom clusters, and to verify predictors. Methods : The Quick Inventory of Depressive Symptomatology-Self Report (QIDS-SR16)¹ and Hamilton Rating Scale for Depression (Ham-D)² data from two placebo-controlled, double-blind clinical trials (Dual Therapy and Duloxetine) were used for external validation. Machine learning methods were applied to replicate the three symptom clusters and to produce treatment trajectories. Penalized logistic regressions were conducted to identify the baseline variables that best predicted treatment outcomes. Results : The variables Chekroud identified as comprising the sleep, atypical and core emotional clusters were replicated. Treatment trajectories demonstrate that dual treatment (escitalopram and bupropion) performed best across all symptom clusters but did not outperform escitalopram monotherapy over time. For each symptom cluster, there were differences in treatment efficacy among antidepressants. Conclusion : By using different treatment trajectories based on a patient's symptom cluster profile, clinicians could potentially select best-fit antidepressants to achieve the greatest benefit. Our results showed that total baseline QIDS, Ham-D score, anxiety disorder diagnosis and course of depressive illness were the best baseline predictors. Results could enhance personalized depression treatment plans and help to improve outcomes. Clinical Trials Registration : NCT00519428, NCT00360724.
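A penalized logistic regression of the kind mentioned (here with an L1 penalty, an assumption, since the abstract does not state the penalty type) shrinks uninformative coefficients to zero, which is how top baseline predictors can be identified; a generic scikit-learn sketch on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))            # 10 hypothetical baseline variables
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# The L1 penalty shrinks irrelevant coefficients to exactly zero.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
selected = np.flatnonzero(model.coef_[0])
print(selected)  # indices of retained predictors; features 0 and 1 should dominate
```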
Article
Full-text available
This registered report introduces the largest, and for the first time, reproducible experimental survey on biomedical sentence similarity with the following aims: (1) to elucidate the state of the art of the problem; (2) to solve some reproducibility problems preventing the evaluation of most current methods; (3) to evaluate several unexplored sentence similarity methods; (4) to evaluate for the first time an unexplored benchmark, called Corpus-Transcriptional-Regulation (CTR); (5) to carry out a study on the impact of the pre-processing stages and Named Entity Recognition (NER) tools on the performance of the sentence similarity methods; and finally, (6) to bridge the lack of software and data reproducibility resources for methods and experiments in this line of research. Our reproducible experimental survey is based on a single software platform, which is provided with a detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results. In addition, we introduce a new aggregated string-based sentence similarity method, called LiBlock, together with eight variants of current ontology-based methods, and a new pre-trained word embedding model trained on the full-text articles in the PMC-BioC corpus. Our experiments show that our novel string-based measure establishes the new state of the art in sentence similarity analysis in the biomedical domain and significantly outperforms all the methods evaluated herein, with the only exception of one ontology-based method. Likewise, our experiments confirm that the pre-processing stages, and the choice of the NER tool for ontology-based methods, have a very significant impact on the performance of the sentence similarity methods. We also detail some drawbacks and limitations of current methods, and highlight the need to refine the current benchmarks. Finally, a notable finding is that our new string-based method significantly outperforms all state-of-the-art Machine Learning (ML) models evaluated herein.
Article
Full-text available
Computer-assisted design of small molecules has experienced a resurgence in academic and industrial interest due to the widespread use of data-driven techniques such as deep generative models. While the ability...
Preprint
Full-text available
Computer-assisted design of small molecules has experienced a resurgence in academic and industrial interest due to the widespread use of data-driven techniques such as deep generative models. While the ability to generate molecules that fulfill required chemical properties is encouraging, the use of deep learning models requires significant, if not prohibitive, amounts of data and computational power. At the same time, open-sourcing of more traditional techniques such as graph-based genetic algorithms for molecular optimisation [Jensen, Chem. Sci., 2019, 12, 3567-3572] has shown that simple and training-free algorithms can be efficient and robust alternatives. Further research alleviated the common genetic algorithm issue of evolutionary stagnation by enforcing molecular diversity during optimisation [Van den Abeele, Chem. Sci., 2020, 42, 11485-11491]. The crucial lesson distilled from the simultaneous development of deep generative models and advanced genetic algorithms has been the importance of chemical space exploration [Aspuru-Guzik, Chem. Sci., 2021, 12, 7079-7090]. For single-objective optimisation problems, chemical space exploration had to be discovered as a usable resource, but in multi-objective optimisation problems, an exploration of trade-offs between conflicting objectives is inherently present. In this paper we provide state-of-the-art and open-source implementations of two generations of graph-based non-dominated sorting genetic algorithms (NSGA-II, NSGA-III) for molecular multi-objective optimisation. In addition, we provide the results of a series of benchmarks for the inverse design of small molecule drugs for both the NSGA-II and NSGA-III algorithms.
Book
This book constitutes the refereed post-conference proceedings of the Fourth IFIP TC 12 International Conference on Computational Intelligence in Data Science, ICCIDS 2021, held in Chennai, India, in March 2021. The 20 revised full papers presented were carefully reviewed and selected from 75 submissions. The papers cover topics such as computational intelligence for text analysis; computational intelligence for image and video analysis; blockchain and data science.
Chapter
Full-text available
According to the India Brand Equity Foundation (IBEF), 32% of the global food market depends on the Indian agricultural sector. Due to urbanisation, fertile land has been utilised for non-agricultural purposes. The loss of agricultural land impacts productivity and results in diminishing yields. Soil is the most important factor for thriving agriculture, since it contains the essential nutrients. Food production could be improved through the viable usage of soil nutrients. To identify the soil nutrients, the physical, chemical and biological parameters were examined using many machine learning algorithms. However, environmental factors such as sunlight, temperature, humidity and rainfall play a major role in improving soil nutrients, since they are responsible for the processes of photosynthesis, germination and saturation. The objective is to determine the soil nutrient level by assessing the associative properties, including the environmental variables. The proposed system, termed the Agrarian application, recommends crops for a particular plot of land using classification algorithms and predicts the yield rate by employing regression techniques. The application will help farmers select crops based on soil nutrient content and environmental factors, and predicts the yield rate for the same.
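A rough sketch of such a two-stage pipeline (entirely illustrative: synthetic data and hypothetical feature names, not the Agrarian application's actual models), where a classifier recommends a crop and a regressor predicts the yield rate:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(1)
# Hypothetical features: [N, P, K, temperature, humidity, rainfall]
X = rng.uniform(size=(300, 6))
crop = rng.integers(0, 3, size=300)          # 3 hypothetical crop labels
yield_rate = X @ rng.uniform(size=6) + rng.normal(scale=0.1, size=300)

recommender = RandomForestClassifier(n_estimators=100).fit(X, crop)
yield_model = RandomForestRegressor(n_estimators=100).fit(X, yield_rate)

field = rng.uniform(size=(1, 6))             # one new field's measurements
print("recommended crop:", recommender.predict(field)[0])
print("predicted yield:", yield_model.predict(field)[0])
```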
Chapter
Character differences are one of the most common problems that can occur when students answer fill-in-the-gap or one-word-answer questions, where the expected answer is usually a single word. To improve the evaluation of student answers using the Hamming distance, we propose a modified Hamming model that addresses the drawbacks of the standard model by applying stemming, to capture derivational lexical similarity, and space padding, to deal with texts of unequal length.
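A minimal sketch of the space-padding step (the stemming step, e.g. with a standard stemmer, is omitted; this is an illustration, not the chapter's exact model):

```python
def padded_hamming(a: str, b: str) -> int:
    """Hamming distance with space padding: the shorter string is padded
    with trailing spaces so both have equal length, then differing
    positions are counted (the standard Hamming distance is only
    defined for strings of equal length)."""
    width = max(len(a), len(b))
    a, b = a.ljust(width), b.ljust(width)
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))

print(padded_hamming("running", "runs"))  # 4: 'n' != 's' plus 3 padded positions
```

Applying stemming first would reduce "running" and "runs" to a common stem, so the padded distance on the stems would be 0, which is the derivational similarity the modified model aims for.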
Chapter
The Virtual Math Teams project is exploring how to create, structure, support, and assess an online chat-based collaborative community devoted to mathematics discourse. It is analyzing the forms of group cognition that emerge from the use of shared cognitive tools with specific functionalities. Centered on a case study of a synchronous online interchange, this Investigation discusses the use of a graphical referencing tool in coordination with text chat to achieve a group orientation to a particular mathematical object in a shared whiteboard. Deictic referencing is seen to be a critical foundation of intersubjective cognitive processes that index objects of shared attention. The case study suggests that cognitive tools to support group referencing can be important in supporting group alignment, intentionality, and cognition in online communities such as this one for collaborative mathematics.
Article
Full-text available
In this paper, we analyze several local referendums on land development and land-use regulation in the City of Erlangen (Germany) between 2011 and 2018. To identify the positive influence of the travel distance on approval for land development, we control for distance to the city center and density, employ a two-way fixed-effects model, and use spatial instruments. We also analyze the heterogeneity of city dwellers’ preferences for the development of residential and commercial areas. In particular, we examine the differences between homeowners and tenants in this regard. This article is protected by copyright. All rights reserved.
Article
In this project, we explore distance in the context of metrics. More specifically, we take a look at an integral metric that is used to determine the distance between sets. The motivation behind this project is to determine whether the integral metric we use is meaningful in the context of our data set. The data set we use consists of trajectories of cars along a portion of the I5 highway. Through training, testing, and evaluating this model on the data set, we can reach conclusions about the structure of the data and the success of this integral metric as a classifier. In the end, we can both determine whether or not this integral metric fits the chosen data set and explore other areas where the metric could succeed or fail.
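One common instance of such an integral metric (an assumption for illustration; the abstract does not specify its exact form) is the discretised L1 distance between two trajectories sampled at the same times, i.e. a Riemann-sum approximation of the integral of |f(t) − g(t)| dt:

```python
import numpy as np

def integral_l1_distance(f: np.ndarray, g: np.ndarray, dt: float) -> float:
    """Discretised integral metric between two trajectories sampled at the
    same times: approximates the integral of |f(t) - g(t)| dt by a Riemann sum."""
    return float(np.sum(np.abs(f - g)) * dt)

t = np.linspace(0.0, 1.0, 101)
f, g = t ** 2, t            # two hypothetical trajectories
print(integral_l1_distance(f, g, dt=t[1] - t[0]))  # ~1/6, the integral of t - t^2
```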