Article

Taxicab Geometry: An Adventure in Non-Euclidean Geometry

... The two main research problems tackled in this work are the design and implementation of a large and reproducible experimental survey on sentence similarity measures for the biomedical domain, and the evaluation of a set of unexplored methods based on adaptations from previous methods used in the general language domain. Our main contributions are as follows: (1) the largest, and for the first time, reproducible experimental survey on biomedical sentence similarity; (2) the first collection of self-contained and reproducible benchmarks on biomedical sentence similarity; (3) the evaluation of a set of previously unexplored methods, such as a new string-based sentence similarity method, based on Li et al. [54] and Block distance [55], eight variants of the current ontology-based methods from the literature based on the work of Sogancioglu et al. [30], and a new pre-trained Word Embedding (WE) model based on FastText [56] and trained on the full-text of articles in the PMC-BioC corpus [19]; (4) the evaluation for the first time of an unexplored benchmark, called CTR [51]; (5) the study on the impact of the pre-processing stage and Named Entity Recognition (NER) tools on the performance of the sentence similarity methods; (6) the integration for the first time of most sentence similarity methods for the biomedical domain into the same software library, called HESML-STS, which is available both on GitHub and in a reproducible dataset [42]; (7) a detailed reproducibility protocol together with a collection of software tools and datasets provided as supplementary material to allow the exact replication of all our experiments and results; and finally, (8) an analysis of the drawbacks and limitations of the current state-of-the-art methods. ...
... This section introduces a new string-based sentence similarity method based on the aggregation of the Li et al. [54] similarity and Block distance [55] measures, called LiBlock, as well as eight new variants of the ontology-based methods proposed by Sogancioglu et al. [30], and a new pre-trained word embedding model based on FastText [56] and trained on the full-text of the articles in the PMC-BioC corpus [19]. ...
... To overcome the drawbacks and limitations of the string-based and ontology-based methods detailed above, we propose here a new aggregated string-based measure called LiBlock and denoted by sim_LiBk henceforth, which is based on the combination of a similarity measure derived from the Block distance [55] and an adaptation of the ontology-based similarity measure introduced by Li et al. [54] that removes the use of ontologies, such as WordNet [58] or Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT) [59]. The LiBlock similarity measure obtains the best results in combination with the cTAKES NER tool [60], which allows the detection of synonyms of CUI concepts. ...
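As a rough illustration of the Block-distance component described in these excerpts, the sketch below computes a Block (L1) distance between bag-of-words count vectors and rescales it into a [0, 1] similarity. The tokenization, the rescaling, and the function names are assumptions for illustration only; the actual LiBlock measure also aggregates an adapted Li et al. component and relies on NER pre-processing as described in the paper.

```python
# A minimal sketch of a Block-distance-based sentence similarity, assuming
# simple whitespace tokenization; NOT the paper's LiBlock definition.
from collections import Counter

def block_distance(tokens_a, tokens_b):
    """Block (Manhattan/L1) distance between bag-of-words count vectors."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    vocab = set(ca) | set(cb)
    return sum(abs(ca[w] - cb[w]) for w in vocab)

def block_similarity(sent_a, sent_b):
    """Map the Block distance into a [0, 1] similarity score (hypothetical)."""
    ta, tb = sent_a.lower().split(), sent_b.lower().split()
    d = block_distance(ta, tb)
    return 1.0 - d / (len(ta) + len(tb))  # 1 for identical bags, 0 if disjoint

if __name__ == "__main__":
    print(block_similarity("the cell expresses the gene",
                           "the gene is expressed by the cell"))
```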
Preprint
Full-text available
This registered report introduces the largest, and for the first time, reproducible experimental survey on biomedical sentence similarity with the following aims: (1) to elucidate the state of the art of the problem; (2) to solve some reproducibility problems preventing the evaluation of most current methods; (3) to evaluate several unexplored sentence similarity methods; (4) to evaluate an unexplored benchmark, called Corpus-Transcriptional-Regulation; (5) to carry out a study on the impact of the pre-processing stages and Named Entity Recognition (NER) tools on the performance of the sentence similarity methods; and finally, (6) to bridge the lack of reproducibility resources for methods and experiments in this line of research. Our experimental survey is based on a single software platform that is provided with a detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments. In addition, we introduce a new aggregated string-based sentence similarity method, called LiBlock, together with eight variants of current ontology-based methods and a new pre-trained word embedding model trained on the full-text articles in the PMC-BioC corpus. Our experiments show that our novel string-based measure sets the new state of the art on the sentence similarity task in the biomedical domain and significantly outperforms all the methods evaluated herein, except one ontology-based method. Likewise, our experiments confirm that the pre-processing stages and the choice of the NER tool have a significant impact on the performance of the sentence similarity methods. We also detail some drawbacks and limitations of current methods, and warn of the need to refine the current benchmarks. Finally, a noticeable finding is that our new string-based method significantly outperforms all state-of-the-art Machine Learning models evaluated herein.
... In particular, d_∞ is also known as the Chebyshev distance [2], d_1 is known as the taxicab metric [18], and d_2 is known as the Euclidean distance. ...
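To make the three named members of the Minkowski family concrete for readers skimming these excerpts, here is a generic textbook sketch; it is not code from any cited paper, and the sample points are illustrative.

```python
# The Minkowski family d_p: p = 1 is the taxicab metric, p = 2 the Euclidean
# distance, and the limit p -> infinity the Chebyshev distance.
import math

def minkowski(x, y, p):
    if p == math.inf:
        return max(abs(a - b) for a, b in zip(x, y))  # d_inf (Chebyshev)
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

x, y = (0.0, 0.0), (3.0, 4.0)
print(minkowski(x, y, 1))         # 7.0  taxicab
print(minkowski(x, y, 2))         # 5.0  Euclidean
print(minkowski(x, y, math.inf))  # 4.0  Chebyshev
```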
... In particular, if U(α,β) := T_d(f_α)_β^{-1}(0), then (21) defines a bi-filtration of sets. (Figure 4: a toy example of the distance transform bi-filtration (18) of images of size 800 × 600 pixels; in this bi-filtration, we consider the sets of black pixels.) ...
... Ultimately, the firn layers reach the density of glacial ice at the bottom of the firn column, and the interconnected pore space is transformed into individual closed-off bubbles. These bubbles trap a direct sample of atmospheric air and make up an important paleoclimate record of past atmospheres. (Figure 5: illustration of the distance transform bi-filtration (18); in this bi-filtration, we consider the sets of black pixels.) ...
Conference Paper
Full-text available
The distance transform of a binary image is a classic tool in computer vision, and it has been widely used in the field of Topological Data Analysis (TDA) to study porous media. A common practice is to convert grayscale images to binary ones to apply the distance transform. In this work, by considering the threshold decomposition of a grayscale image, we prove that threshold decomposition and the distance transform together formulate a two-parameter filtration. This offers the TDA community a concrete example for applying multi-parameter persistence to digital image analysis. We demonstrate our method on the firn dataset.
... The image blocks have area-normalized histograms H_i and H_j, respectively, where the vectors H represent the values of each image block. By area-normalized we mean that the sum of their entries is 1, that is, ‖H‖_1 = 1, where ‖·‖_1 is the taxicab l_1-norm [19]. Each histogram vector, H_i and H_j, has 2^b entries, where b is the bit-depth of the image (e.g., 8-bit). ...
... where ‖·‖_1 is the taxicab l_1-norm [19]. Each histogram vector, H_i and H_j, has 2^b entries, where b is the bit-depth of the image (e.g., 8-bit). ...
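A small sketch of what an area-normalized histogram with unit taxicab (l1) norm looks like in code, assuming 8-bit grayscale blocks given as NumPy integer arrays; the block contents and sizes are random stand-ins, not data from the cited study.

```python
# Area-normalized histograms: 2**b bins for bit depth b, entries summing to 1.
import numpy as np

def area_normalized_histogram(block, bit_depth=8):
    hist = np.bincount(block.ravel(), minlength=2 ** bit_depth).astype(float)
    return hist / hist.sum()  # now ||hist||_1 == 1

rng = np.random.default_rng(0)
block_i = rng.integers(0, 256, size=(16, 16))
block_j = rng.integers(0, 256, size=(16, 16))
h_i = area_normalized_histogram(block_i)
h_j = area_normalized_histogram(block_j)
print(h_i.sum(), np.abs(h_i - h_j).sum())  # l1 norm of h_i, l1 difference
```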
Article
Full-text available
Contrast is not uniquely defined in the literature. There is a need for a contrast measure that scales linearly and monotonically with the optical scattering depth of a translucent scattering layer that covers an object. Here, we address this issue by proposing an image contrast metric, which we call the Haziness contrast metric. In essence, the Haziness contrast compares normalized histograms of multiple blocks of the image, a pair at a time. Subsequently, we test several prominent contrast metrics from the literature, as well as the new one, by using milk as a scattering medium in front of an object to simulate a decline in image contrast. Compared to other contrast metrics in the literature, the Haziness contrast metric is monotonic and close to linear for increasing density of the scattering material. The Haziness contrast has a wider dynamic range, and it correctly predicts the order of scattering depth for all the channels in the RGB image. Utilization of the metric to evaluate the performance of dehazing algorithms is also suggested.
... More recently Smarandache Geometry appeared, "A Smarandache Geometry is a geometry which has at least one Smarandachely denied axiom, i.e., an axiom behaves in at least two different ways within the same space, i.e., validated and invalidated, or only invalidated but in multiple distinct ways and a Smarandache n-manifold is an n-manifold that support a Smarandache Geometry" [12,17,19]. ...
... Thus, Taxicab Geometry can be a solution to this problem [19]. This is a geometry in which the Euclidean equation for the distance between two points is changed in order to model urban zones. ...
Article
NeutroGeometries are those geometric structures where at least one definition, axiom, property, theorem, among others, is only partially satisfied. In AntiGeometries at least one of these concepts is never satisfied. Smarandache Geometry is a geometric structure where at least one axiom or theorem behaves differently in the same space, either partially true and partially false, or totally false but with its negation done in many ways. This paper offers examples, in images of nature, everyday objects, and celestial bodies, where the existence of Smarandachean or NeutroGeometric structures in our universe is revealed. On the other hand, a practical study of surfaces with characteristics of NeutroGeometry is carried out, based on the properties, or more specifically NeutroProperties, of the famous quadrilaterals of Saccheri and Lambert on these surfaces. The article contributes to demonstrating the importance of building a theory such as NeutroGeometries or Smarandache Geometries, because it would allow us to study geometric structures where the well-known Euclidean, Hyperbolic or Elliptic geometries are not enough to capture properties of elements that are part of the universe but make sense only within a NeutroGeometric framework. It also offers an axiomatic option to the Riemannian idea of Two-Dimensional Manifolds. In turn, we prove some properties of the NeutroGeometries and the materialization of the symmetric triad ⟨Geometry⟩, ⟨NeutroGeometry⟩, and ⟨AntiGeometry⟩.
... In Fig. 1, the comparison of the Manhattan and Euclidean distances is indicated, where the green line is the Euclidean distance and the others are Manhattan distances (MD) [4]. The equation to compute the Manhattan distance is stated in (2). ...
... A significant variation of linear regression is logistic regression, where the output (q) is assigned a binary outcome instead of a set of continuous numerals. The illustrative logistic regression equation is indicated in (4). ...
Article
Full-text available
An essential type of TS analysis is classification, which can, for instance, advance energy load forecasting in smart grids by discovering the types of electronic gadgets based on their energy expenditure profiles recorded by automated sensors. Such applications are very often characterised by (a) very long TS and (b) large TS datasets needing classification. However, current techniques for time series classification (TSC) cannot deal with such data volumes at acceptable accuracy. WEASEL (Word ExtrAction for time SEries cLassification) is a novel TSC method that is both fast and accurate. Like other state-of-the-art TSC techniques, WEASEL transforms time series into feature vectors using a sliding-window approach, which are then passed through a machine learning classifier. Our approach here is the amalgamation of distance-based approaches such as DTW with feature-based approaches, namely SAX and WEASEL, and hence this method may be easily extended for use in combination with other techniques. In particular, we show that when combined with distance measures such as the Minkowski distances, DTW, SAX and PAA, it outperforms the previously known methods.
... The theory presented so far is a generalization of already known geometries. For example, Minkowski's Taxicab geometry and the distance function that bears this name [28,29]: ...
Article
Full-text available
NeutroGeometry is one of the most recent approaches to geometry. In NeutroGeometry models, the main condition is to satisfy an axiom, definition, property, operator and so on, that is neither entirely true nor entirely false. When one of these concepts is not satisfied at all it is called AntiGeometry. One of the problems that this new theory has had is the scarcity of models. Another open problem is the definition of angle and distance measurements within the framework of NeutroGeometry. This paper aims to introduce a general theory of distance measures in any NeutroGeometry. We also present an algorithm for distance measurement in real-life problems.
... In this paper, we will define the ZCZ in the Manhattan metric. The Manhattan metric is also known as the taxicab metric [29]. In a 2-dimensional plane, the Manhattan distance D between two dots at positions a = (x_1, y_1) and b = (x_2, y_2) is defined by ...
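The definition is truncated in the excerpt; the standard planar Manhattan distance it refers to is D(a, b) = |x_1 − x_2| + |y_1 − y_2|, which the following one-function sketch (with illustrative coordinates) computes.

```python
# The 2-D Manhattan (taxicab) distance between a = (x1, y1) and b = (x2, y2).
def manhattan(a, b):
    (x1, y1), (x2, y2) = a, b
    return abs(x1 - x2) + abs(y1 - y2)

print(manhattan((0, 0), (2, 3)))  # 5
```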
Article
Full-text available
In this paper, we propose the zero-correlation-zone (ZCZ) of radius r on two-dimensional m×n sonar sequences and define the (m,n,r) ZCZ sonar sequences. We also define some new optimality of an (m,n,r) ZCZ sonar sequence which has the largest r for given m and n. Because of the ZCZ for perfect autocorrelation, we are able to relax the distinct difference property of the conventional sonar sequences, and hence, the autocorrelation of ZCZ sonar sequences outside ZCZ may not be upper bounded by 1. We may sometimes require such an ideal autocorrelation outside ZCZ, and we define ZCZ-DD sonar sequences, indicating that it has an additional distinct difference (DD) property. We first derive an upper bound on the ZCZ radius r in terms of m and n≥m. We next propose some constructions for (m,n,r) ZCZ sonar sequences, which leads to some very good constructive lower bound on r. Furthermore, this construction suggests that for m and r, the parameter n can be as large as possible indefinitely. We present some exhaustive search results on the existence of (m,n,r) ZCZ sonar sequences for some small values of r. For ZCZ-DD sonar sequences, we prove that some variations of Costas arrays construct some ZCZ-DD sonar sequences with ZCZ radius r=2. We also provide some exhaustive search results on the existence of (m,n,r) ZCZ-DD sonar sequences. Lots of open problems are listed at the end.
... An important detail is that there are two main ways to compute the distance between two points used in the dispersion calculation presented in (2). The first way is to compute the norm ‖·‖ using the Euclidean distance (Deza and Deza, 2009); alternatively, the second way is to compute it using the Manhattan distance (Krause, 1975). ...
... A distance matrix was created by standardizing the data using Gower and Manhattan similarity distances for the physical and mixed datasets, respectively 59,60 . The number of clusters was determined with the help of scree and silhouette plots. ...
Article
Full-text available
European rivers are disconnected by more than one million man-made barriers that physically limit aquatic species migration and contribute to modification of freshwater habitats. Here, a Conceptual Habitat Alteration Model for Ponding is developed to aid in evaluating the effects of impoundments on fish habitats. Fish communities present in rivers with low human impact and their broad environmental settings enable classification of European rivers into 15 macrohabitat types. These classifications, together with the estimated fish sensitivity to alteration of their habitat are used for assessing the impacts of six main barrier types (dams, weirs, sluices, culverts, fords, and ramps). Our results indicate that over 200,000 km or 10% of previously free-flowing river habitat has been altered due to impoundments. Although they appear less frequently, dams, weirs and sluices cause much more habitat alteration than the other types. Their impact is regionally diverse, which is a function of barrier height, type and density, as well as biogeographical location. This work allows us to foresee what potential environmental gain or loss can be expected with planned barrier management actions in rivers, and to prioritize management actions.
... Thus, it is more useful to define the distance between two points as the sum of the horizontal and vertical differences between them. This results in the so-called Taxicab Geometry, which is described by, among others, KRAUSE (1982) and KRAUSE (1986)
Book
Full-text available
Black Hair, Blue Eyes and Other Features of Pure and Applied Mathematics (in Portuguese). 2nd edition, revised and amplified. Nine articles dealing, implicitly or explicitly, with mathematical thinking. Topics: 1. conceptualization of pure and applied mathematics; 2. Euclid’s demonstration of the Pythagorean Theorem; 3. the golden ratio; 4. negative numbers; 5. a trio of algebraists; 6. an isoperimetric taxicab metric; 7. exceptions and rule proving; 8. mathematical induction; 9. on note-taking.
... Cluster analysis groups a set of acquirers in such a way that acquirers in the same cluster are more similar to each other than to those in other clusters (Jain and Dubes 1988). The kth median cluster algorithm identifies k centers such that the clusters formed by them are the most compact (Krause 1986). The median for each attribute is computed in each single dimension in the rectilinear-distance formulation of the kth median problem, so the individual attributes are determined from the data set. ...
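As a hedged illustration of the rectilinear-distance property the excerpt relies on, the toy computation below shows that the coordinate-wise median minimizes the summed L1 (taxicab) distance to the cluster's points; the data are invented, not the study's acquirer attributes.

```python
# Under the L1 distance, the per-attribute median is the optimal 1-median,
# so the cluster center is the coordinate-wise median of its points.
import numpy as np

points = np.array([[1.0, 10.0], [2.0, 12.0], [9.0, 11.0], [2.0, 30.0]])
center = np.median(points, axis=0)          # median in each dimension
cost = np.abs(points - center).sum()        # summed rectilinear distance
print(center, cost)
```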
Article
Full-text available
Using a novel typology of serial acquirers, we examine several puzzles documented in the prior literature. We show that acquisitions by different types of acquirers are driven by different factors, they acquire different sizes of targets, and subsequent acquisitions by acquirers are predictable ex ante. Controlling for market anticipation, the most frequent serial acquirers do not earn declining returns as they continue acquiring, while less frequent acquirers do. Our methodology enhances our understanding of serial acquisition dynamics, anticipation, and economic value adjustments. The methodology is likely to be relevant to topics related to event anticipation beyond those covered in this study. (JEL G14, G34, G35) Received April 18, 2023; editorial decision June 14, 2023 by Editor Isil Erel
... Taxicab geometry, on the other hand, is one of the geometries that can only be explained through the coordinate system on the plane, and it is recommended as an engaging study for secondary and high school students. This geometry was introduced by [12]. The core notions of this geometry will be demonstrated. ...
Article
Full-text available
Global trends in mathematics education show that modern teaching methods are rapidly evolving at the national, regional, and global levels. In this regard, teaching non-Euclidean geometry concepts in schools can be essential in developing students' spatial imagination and enhancing scientific inquiry competencies. This paper aims to engage students and increase their interest in geometry by introducing the fundamental concepts of several non-Euclidean geometries. With this aim, we first give a modern definition of geometry and then move toward the exciting and fun world of non-Euclidean geometry. Of course, remember that the target audience for these topics is talented students in secondary and high schools.
... The first group was a kind of control group of approaches (the baseline): Term Frequency-Inverse Document Frequency (Tf-Idf) (Salton & Buckley, 1988), Latent Semantic Indexing or Latent Semantic Analysis (LSI) (Deerwester et al., 1990), Latent Dirichlet Allocation (LDA) (Blei et al., 2003), Hierarchical Dirichlet Process (HDP) (Teh et al., 2006), Random Projections or Random Indexing (RP) (Sahlgren, 2005), LogEntropy (LE) (Lee et al., 2005), Jaccard (Jaccard, 1912), Levenshtein (Levenshtein, 1966) and Soft Cosine as similarity measures were used, and Manhattan (Krause, 1986), Euclidean (Huang, 2008), and Word Mover's Distance (WMD) (Kusner et al., 2015) as distance measures. ...
Article
Full-text available
Automatic detection of concealed plagiarism in the form of paraphrases is a difficult task, and finding a successful unsupervised approach for paraphrase detection is a necessary precondition to change that. This comparative study identified the most efficient methods for unsupervised paraphrased document detection using similarity measures alone or combined with Deep Learning (DL) models. It proved the hypothesis that some DL models are more successful than the best statistically based methods in that task. Many experiments were carried out, and their results were compared. The text similarities between documents are obtained from 60 different methods using five paraphrase corpora, including a new one made by the authors as an important original contribution. Some DL models achieved significantly better results than those obtained by the best statistical methods, especially pre-trained transformer-based language models, with average values of Accuracy and F1 of 85.8% and 88.3%, respectively, and top values of 99.9% and 98.4% for Accuracy and F1 on some corpora. These results are even better than those of supervised and combined approaches. Therefore, the results presented here prove that detecting concealed plagiarism is becoming an attainable goal. This study highlighted the language models with the best overall results for paraphrase detection as best suited for further research. The study also discussed the choice of similarity/distance measure paired with embeddings produced by DL models and some advantages of using cosine similarity as the fastest measure. For the 60 different methods, complexity has been defined in O notation. The times needed for their implementation have also been presented. The article's results and conclusions are a firm base for future semantic similarity, paraphrasing, and plagiarism detection studies, clearly marking state-of-the-art tools and methods.
... A key element of the proposed method is the inter-string distance function, which directly affects the quality of merging vector representations from different sources. Popular inter-string distance functions are the Levenshtein distance [12], Jaccard similarity [13], Manhattan distance [14], Hamming distance [15] and Dice coefficient [16]. Quantitative indicators of how the choice of the inter-string distance function in the proposed method affects the classification result can be seen in Table 2. ...
Article
This article presents a method for merging multimodal word vector representations in a low-resource setting. Unlike other methods for merging word vector representations, this method takes into account the constraints of a low-resource environment and allows word vectors from different sources, such as documents and dictionaries, to be combined. The method relies on computing inter-string distances instead of building full syntactic and morphological models, which is often impossible for low-resource languages. It can be used at intermediate stages of building natural language processing and machine learning systems when solving practical problems such as machine translation or document classification. In addition, an analysis of various methods for merging multimodal word vector representations in a low-resource environment is carried out. The article describes the advantages, disadvantages and limitations of each approach, considering the task of building a unified vector representation of a text combined with data from additional sources. In the study, the classification of petitions to the Kyiv City Council written in Ukrainian was chosen as the example of a low-resource task. The large number of inter-string distance functions complicates their choice when solving practical problems. We propose a set of recommendations in the context of low-resource environments, as well as a methodology for choosing the best function for the tasks at hand. The analyzed inter-string distance functions include the Levenshtein distance, Jaccard similarity, Manhattan distance, Hamming distance and Dice coefficient. Our results demonstrate that the Levenshtein-distance-based method increases document classification quality more than the alternatives. These findings have practical significance for various fields, including natural language processing, text analysis and information retrieval.
... Since SMILES is represented by a string, many string-based distance algorithms are adaptable to SMILES. Some of these were introduced in [11], including an edit distance [12], normalized longest common subsequence (LCS) [13], a combination of LCS models [13], a SMILES representation-based string kernel [14], the SMILES fingerprint [15] with city block distance [16] or the Tanimoto coefficient [4], LINGOsim [17], LINGO-based TF [18], TF-IDF [19], and a combination of SIMCOMP with TF-IDF or LINGOsim [11]. ...
Article
Full-text available
A method for developing new drugs is the ligand-based approach, which requires intermolecular similarity computation. The simplified molecular input line entry system (SMILES) is primarily used to represent the molecular structure in one dimension. In this representation of molecular structure, the properties can be completely different even if only one character is changed. Applying the conventional edit distance method makes it difficult to obtain optimal results, because the insertion, deletion, and substitution operations are considered the same in calculating the distance. This study proposes a novel edit distance using an optimal weight set for the three operations. To determine the optimal weight set, we present a genetic algorithm with suitable hyperparameters. To emphasize the impact of the proposed genetic algorithm, we compare it with an exhaustive search algorithm. The experiments performed with four well-known datasets showed that the weighted edit distance optimized with the genetic algorithm resulted in an average performance improvement of approximately 20%.
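A compact sketch of an edit distance with separate insertion, deletion and substitution weights, in the spirit of the abstract above; the weight values and the SMILES-like strings are illustrative assumptions, not the paper's GA-optimized parameters.

```python
# Dynamic-programming edit distance with per-operation weights.
def weighted_edit_distance(s, t, w_ins=1.0, w_del=1.0, w_sub=1.0):
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * w_del
    for j in range(1, n + 1):
        d[0][j] = j * w_ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if s[i - 1] == t[j - 1] else w_sub
            d[i][j] = min(d[i - 1][j] + w_del,     # delete from s
                          d[i][j - 1] + w_ins,     # insert into s
                          d[i - 1][j - 1] + sub)   # substitute / match
    return d[m][n]

print(weighted_edit_distance("CCO", "CCN"))             # 1.0
print(weighted_edit_distance("CCO", "CCN", w_sub=2.5))  # 2.0: delete+insert wins
```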
... where the symbols denote the sequences under comparison and 16 corresponds to the total number of words of size 2 (the 4² dinucleotides). Other metrics have also been used for this purpose, such as the Pearson correlation distance [129], the DSSIM [130], the Manhattan distance [131], or the approximated information distance [39]. From this characterization, the succession of nucleotides along a sequence follows a zero-order Markov chain, i.e., the probability of finding a given nucleotide does not depend on its neighbor composition. ...
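To make the word-frequency comparison concrete, the sketch below builds size-2 word (dinucleotide) frequency vectors and compares them with a Manhattan distance, one of the metric options the excerpt lists; the sequences are toy examples, not data from the cited review.

```python
# Genomic-signature sketch: 16 words of size 2 over the alphabet ACGT,
# compared by the Manhattan (taxicab) distance between frequency vectors.
from collections import Counter
from itertools import product

WORDS = ["".join(p) for p in product("ACGT", repeat=2)]  # the 16 dinucleotides

def dinucleotide_freqs(seq):
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = max(sum(counts[w] for w in WORDS), 1)
    return [counts[w] / total for w in WORDS]

def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

print(manhattan(dinucleotide_freqs("ACGTACGTAA"),
                dinucleotide_freqs("GGGGCCCCGG")))
```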
Article
Full-text available
Simple Summary In a broad sense, genomic signature refers to characteristics associated to DNA sequences. Many studies analyze genotype–phenotype patterns in a group of genes, thus targeting genomic signatures associated to a given disease or identifying a gene expression profile. However, some studies in comparative genomics and evolutionary biology refer to genomic signature as the statistical properties of DNA sequences, such as the distribution of k-words. In these fields of study, genomic signatures are species-specific and can be informative about phylogenetic relationships. In this review, we identify the main genomic signatures in a large collection of articles by performing a bibliometric analysis and then rename each signature according to its conceptual meaning. Among the different signatures, we use the term organismal signature to denote the DNA patterns able to infer evolutionary relationships and go on to review its formulation and applications in the second part of the article. Abstract Organisms are unique physical entities in which information is stored and continuously processed. The digital nature of DNA sequences enables the construction of a dynamic information reservoir. However, the distinction between the hardware and software components in the information flow is crucial to identify the mechanisms generating specific genomic signatures. In this work, we perform a bibliometric analysis to identify the different purposes of looking for particular patterns in DNA sequences associated with a given phenotype. This study has enabled us to make a conceptual breakdown of the genomic signature and differentiate the leading applications. On the one hand, it refers to gene expression profiling associated with a biological function, which may be shared across taxa. This signature is the focus of study in precision medicine. On the other hand, it also refers to characteristic patterns in species-specific DNA sequences. This interpretation plays a key role in comparative genomics, identifying evolutionary relationships. Looking at the relevant studies in our bibliographic database, we highlight the main factors causing heterogeneities in genome composition and how they can be quantified. All these findings lead us to reformulate some questions relevant to evolutionary biology.
... Among the systems that could be brought into schools, it is worth recalling a geometry created for didactic purposes by the American professor Eugene Krause and designated in English as Taxicab Geometry, according to Krause (1975). In Portuguese, this geometric system is designated "geometria do motorista do táxi", "geometria pombalina" as in Jorge et al. (1999), "geometria do taxista" as in Bigode (2002), or, as we call it, "geometria do táxi" (GT). ...
Book
Full-text available
What is presented in this work is the result of incessant, high-quality teaching during the pandemic of the novel coronavirus (Covid-19). The authors who wrote up their research in this scientific work represent the strength and competence of Brazilian teaching professionalism, promoting remote actions during the pandemic with mobile technologies and with adapted, playful materials to attract the attention of thousands of students who had to study at home, many of them with extreme difficulty connecting to the internet, but always with the support and dedication of their teachers and families. The book consists of four chapters, as follows. In the first chapter, Geometry in the Pandemic, the authors André Luiz Souza Silva, José Carlos Gonçalves Gaspar and Vilmar Gomes da Fonseca present an innovative didactic proposal for teaching axial symmetry, based on a didactic sequence of four exploratory tasks using folding, cutting and digital technologies. The authors present an annotated description of the didactic proposal and its objectives, and conclude by proposing reflections on possibilities for creating teaching environments that promote meaningful learning. In the second chapter, entitled Interactive and Assessment-Oriented Didactic Practices in Remote Teaching, the author, starting from the Emergency Remote Teaching (ERE) that emerged in 2020 as an alternative for continuing school activities while preserving the health of students and teachers during the pandemic caused by the Sars-CoV-2 virus, discusses how many teachers initially opposed ERE because they believed quality remote teaching would not be possible in basic education. However, as the days passed and through the study and dedication of thousands of teachers around the world, promising paths were opened for education in the face of the challenges that arose. This text presents some didactic practices developed and used by teachers of the Colégio de Aplicação of UFRJ, demonstrating the potential of digital resources through positive evaluations from students, guardians, student teachers and teachers of different subjects. In the third chapter, on the theme Elementary concepts of non-Euclidean Geometries in basic school: why? What for?, the author hopes to answer some questions she believes are pertinent to this time of the pandemic caused by the novel coronavirus, as well as to the future, with hybrid and remote teaching. She presents a set of questions related to non-Euclidean geometries (GNE) that may lead the reader to think they have no connection with the pandemic. To answer the "why?" of the title, the author presents her understanding of the creation of new scientific knowledge and the logics involved in the actions related to the scientific thinking that underpins it, to Education and to Mathematics. The text of the fourth chapter, Problematizing the real numbers with Dynamic Geometry resources: Discovering gaps in the number line, introduces a set of educational activities designed for problematizing and learning the real numbers using technological resources chosen to provide an interactive and dynamic approach to the topic. There are three electronic constructions, called applets, that encourage the exploration of important didactic orientations.
... Hausdorff distance, Huttenlocher et al., 1993). At the moment, the AptaMat algorithm implements a metric based on the Manhattan distance, which was chosen for its simplicity, as it is expressed as the sum of the absolute differences between the coordinates of the compared points (Krause, 1988). However, other distances can be easily implemented. ...
Article
Full-text available
Motivation: Comparing single-stranded nucleic acids (ssNAs) secondary structures is fundamental when investigating their function and evolution and predicting the effect of mutations on their structures. Many comparison metrics exist, although they are either too elaborate or not sensitive enough to distinguish close ssNAs structures. Results: In this context, we developed AptaMat, a simple and sensitive algorithm for ssNAs secondary structures comparison based on matrices representing the ssNAs secondary structures and a metric built upon the Manhattan distance in the plane. We applied AptaMat to several examples and compared the results to those obtained by the most frequently used metrics, namely the Hamming distance and the RNAdistance, and by a recently developed image-based approach. We showed that AptaMat is able to discriminate between similar sequences, outperforming all the other here considered metrics. In addition, we showed that AptaMat was able to correctly classify 14 RFAM families within a clustering procedure. Supplementary information: Supplementary data are available at Bioinformatics online.
... There are different ways to compute the distance between two points. In our article, we work with the simple Euclidean distance in a plane as well as with the grid-based distance, which is more suitable for urban environments (see also [38]). These distances are approximations of the distance in the real world. ...
Article
Full-text available
In this article, we aim to develop the theoretical background for the possible application of Economic-Geographical metrics in the field of population protection. We deal with various options for analyzing the availability of "safety" for citizens using the studied metrics. Among others, we apply well-known metrics such as the Gini coefficient and the Hoover index, and even establish their generalizations. We develop a theoretical background and evaluate our findings on generated and actual data. We find that the metrics used can have an opposite interpretation depending on the scenario we are considering. We also discover that some scenarios demand a modification of the usual metric. We conclude that Economic-Geographical metrics give valuable tools to address specific security challenges. The metrics' generalizations could serve as a potent tool for other authors working in the field of population protection. Nevertheless, we must keep in mind that metrics also have drawbacks.
... Token-based similarity is used for handling term reorganization by breaking the string into substrings. Examples of the token-based approach are Jaccard similarity [27], Dice's coefficient [16], cosine similarity [8], Manhattan distance [34], and Euclidean distance [21]. This paper is organized into six sections; Section 1 introduces the proposed approach. ...
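A brief sketch of two of the token-based measures named in the excerpt (Jaccard and Dice), assuming whitespace tokenization into token sets; this is the generic formulation, not the cited paper's implementation.

```python
# Token-based set similarities over whitespace-tokenized strings.
def jaccard(a, b):
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def dice(a, b):
    sa, sb = set(a.split()), set(b.split())
    return 2 * len(sa & sb) / (len(sa) + len(sb)) if sa or sb else 1.0

s1, s2 = "spelling error detection", "error detection in spelling"
print(jaccard(s1, s2), dice(s1, s2))  # 0.75 and ~0.857
```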
Article
Full-text available
Spelling errors are fundamental errors in text writing. The digital era has added another dimension, the keyboard layout, to this problem. Memorization, language orthography, and keyboard layout are sources of spelling errors in electronic texts. English being the link language of the world, a good quantum of work on spelling error detection and plausible suggestions has been done for the English language. But this is not the case for digital-resource-scarce languages like the Indian languages. Marathi, which is the official language of the Maharashtra State in India and the world's 10th most spoken language, is no exception. Various computational approaches for spelling error detection and correction have been advocated in the literature. Amongst these, similarity-based measures have proven to be the prominent ones. This paper presents a detailed contrastive study of two popular similarity measures, viz. the minimum edit distance and cosine similarity, in the context of mis-spelled Marathi words. The philosophical and empirical aspects of these methods are also presented. For experimentation purposes we chose a dataset of 9,29,663 (i.e., 929,663) unique Marathi words harvested from various sources. We obtained an accuracy of 85.88% and 86.76% for the minimum edit distance algorithm and the cosine similarity algorithm, respectively.
... Note that (ν_r + 1 − q) + (p − r) is the taxicab (or Manhattan) distance from the cell (r, ν_r + 1) to the cell τ = (p, q) [Kra86]. Therefore, set ...
Preprint
We provide a combinatorial formula for the expansion of immaculate noncommutative symmetric functions into complete homogeneous noncommutative symmetric functions. To do this, we introduce generalizations of Ferrers diagrams which we call GBPR diagrams. We define tunnel hooks, which play a role similar to that of the special rim hooks appearing in the Eğecioğlu-Remmel formula for the symmetric inverse Kostka matrix. We extend this interpretation to skew shapes and fully generalize to define immaculate functions indexed by integer sequences skewed by integer sequences. Finally, as an application of our combinatorial formula, we extend Campbell's results on ribbon decompositions of immaculate functions to a larger class of shapes.
... When interpreting this grid as a graph, it is possible to analyse the spatial distribution of valid and invalid configurations in the search spaces. To interpret the search space as a graph, two nodes are connected if the Manhattan distance [10] between their indices in the grid is exactly one. ...
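A minimal sketch of the grid-as-graph rule described above: two nodes are connected exactly when the Manhattan distance between their grid indices is one. The helper name and bounds checking are illustrative, not from the cited work.

```python
# Neighbors of a grid cell under the "Manhattan distance == 1" adjacency rule.
def neighbors(idx, shape):
    i, j = idx
    rows, cols = shape
    cand = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]  # |di| + |dj| == 1
    return [(r, c) for r, c in cand if 0 <= r < rows and 0 <= c < cols]

print(neighbors((0, 0), (3, 3)))  # [(1, 0), (0, 1)]
```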
Preprint
The process of optimizing the latency of DNN operators with ML models and hardware-in-the-loop, called auto-tuning, has established itself as a pervasive method for the deployment of neural networks. From a search space of loop-optimizations, the candidate providing the best performance has to be selected. Performance of individual configurations is evaluated through hardware measurements. The combinatorial explosion of possible configurations, together with the cost of hardware evaluation makes exhaustive explorations of the search space infeasible in practice. Machine Learning methods, like random forests or reinforcement learning are used to aid in the selection of candidates for hardware evaluation. For general purpose hardware like x86 and GPGPU architectures impressive performance gains can be achieved, compared to hand-optimized libraries like cuDNN. The method is also useful in the space of hardware accelerators with less wide-spread adoption, where a high-performance library is not always available. However, hardware accelerators are often less flexible with respect to their programming which leads to operator configurations not executable on the hardware target. This work evaluates how these invalid configurations affect the auto-tuning process and its underlying performance prediction model for the VTA hardware. From these results, a validity-driven initialization method for AutoTVM is developed, only requiring 41.6% of the necessary hardware measurements to find the best solution, while improving search robustness.
... A distance matrix was created by standardizing the data using Gower and Manhattan similarity distances for the physical and mixed datasets, respectively 52,53 . The number of clusters was determined with the help of scree and silhouette plots. ...
Preprint
Full-text available
We estimate that over 200,000 km, or 10%, of previously free-flowing river habitat length has been lost due to impoundments, an amount equivalent to the entire length of rivers in Italy. This loss strongly depends on the biogeographical location and the type of impounding barrier. European rivers are disconnected by more than one million man-made barriers that physically limit or completely block aquatic species migration and contribute to the loss of freshwater habitats [8]. One of the pervasive effects of barriers is that caused by impoundment, which directly modifies lotic (flowing) stretches of river into lentic (lake-like) habitats [5]. Depending on the structure and composition of the fish communities expected at the barrier location, the biological consequences may vary. An EU-wide analysis of fish communities observed at river sections with low human-induced alteration resulted in a macrohabitat classification of European rivers into 15 river types with expected fish community structure. This set a baseline for assessing the impacts of six main barrier types (dams, weirs, sluices, culverts, ramps, and fords) on river fish habitats across Europe. The largest habitat losses are caused by dams, weirs and sluices in mountainous areas, where the fish most sensitive to ponding are expected. Although many impoundments there are smaller than in lowlands, their individual impacts are the greatest. Hence, regional variation in the magnitude of impoundment impact is not only a function of barrier height and density, but to a large extent of biogeographical location and barrier type. Strategies for enhancing European riverine biodiversity should focus on prioritization of the most sensitive regions and the barrier types causing a high degree of habitat fragmentation. This work is based on four novel methodological approaches: fish community grouping into habitat-use guilds, a continental river reference model for ecologically sound river management, landscape-scale application of physical habitat models, and a conceptual model of impoundment impacts on fish habitat.
... The Minkowski metric with p = 2 is used in this paper; its equation is given in Eq. (5). In Eq. (5), the city block metric [39] with p = 1, the Euclidean metric with p = 2, and the Chebyshev metric [40] with p = ∞ are calculated. ...
Preprint
Full-text available
Density-based spatial clustering of applications with noise (DBSCAN) has been used to cluster data with arbitrary shapes, where clustering is done based on the density among objects in the data. Given that DBSCAN is a proper tool for identifying outliers and clustering non-convex data, it can be used for automatic clustering of non-convex data, covering the weakness of most automatic clustering algorithms in not recognizing non-convex clusters. So, in this paper, a new automatic clustering algorithm is introduced which is a combination of DBSCAN and the grouper fish-octopus (GFO) algorithm. GFO-DBSCAN finds the best number of clusters in two main steps in an iterative manner. In the first step, the values of eps and minpts are generated by the GFO algorithm, and in the second step, the clustering of the data is performed using the DBSCAN algorithm with the eps and minpts generated in the previous step. After each clustering, using the correct data labels and the cluster centroids, the Calinski-Harabasz (CH) index is calculated. Finally, after some iterations of GFO, the best number of clusters is reported. In this study, three categories of data are used to measure the performance of the GFO-DBSCAN algorithm. Also, GFO-DBSCAN is compared with the ACDE, DCPSO, and GCUK algorithms. According to the results, GFO-DBSCAN achieves the optimal number of clusters on most data and outperforms other well-known algorithms.
... However, the actual motion plan cost was not explicitly calculated. In [89], the authors considered multiple goal configurations arranged in the ascending order of Manhattan distance [160] from the start configuration and sequentially calculated the motion plans for each goal configuration in the list. Then, the best goal configuration was selected if the actual cost of the motion plan was less than the Manhattan distance cost of the next goal configuration in the list. ...
Article
Full-text available
One of the fundamental fields of research is motion planning. Mobile manipulators present a unique set of challenges for the planning algorithms, as they are usually kinematically redundant and dynamically complex owing to the different dynamic behavior of the mobile base and the manipulator. The purpose of this article is to systematically review the different planning algorithms specifically used for mobile manipulator motion planning. Depending on how the two subsystems are treated during planning, sampling-based, optimization-based, search-based, and other planning algorithms are grouped into two broad categories. Then, planning algorithms are dissected and discussed based on common components. The problem of dealing with the kinematic redundancy in calculating the goal configuration is also analyzed. While planning separately for the mobile base and the manipulator provides convenience, the results are sub-optimal. Coordinating between the mobile base and manipulator while utilizing their unique capabilities provides better solution paths. Based on the analysis, challenges faced by the current planning algorithms and future research directions are presented.
... Finally, the clusters have to be calculated based on the distance between the attack instances. In this work, distance is calculated based on the Manhattan metric [29]. ...
Preprint
Full-text available
Recently, advances in cyber-physical systems and IoT have led to an increase in devices connected to the internet. This rise in functionality also comes with an increased attack surface for cyber criminals. A proven method for forensic investigation of trends and developments in crimes conducted in the virtual world is the honeypot. We set up a medium-interaction honeypot offering telnet and SSH services. With this honeypot we captured data from attack sessions. This data was used for statistical and behavioural analysis, such as distributions of attacks and of different attacker IPs, originating countries, employed anonymisation services, the skill level of an adversary, and commonly targeted embedded devices. Furthermore, machine learning techniques that are capable of identifying unique types of sessions based on issued commands and provided credentials are presented in this work. There are strong indicators that most of the traffic captured during our research was caused by botnet activities, which corresponds to findings of other research activities.
... The Manhattan distance, also known as the taxicab metric, is the sum of the absolute differences of the Cartesian coordinates between two points [28]. ...
Article
Full-text available
Watermarking techniques in a wide range of digital media have been utilized as a host cover to hide or embed a piece of message information in such a way that it is invisible to a human observer. This study aims to develop an enhanced, rapid and blind method for producing a watermarked 3D object using QR code images with high imperceptibility and transparency. The proposed method is based on the spatial domain, and it starts with converting the 3D object triangles from the three-dimensional Cartesian coordinate system to the two-dimensional coordinate domain using the corresponding transformation matrix. Then, it applies a direct modification to the third vertex point of each triangle. Each triangle's coordinates in the 3D object can be used to embed one pixel from the QR code image. In the extraction process, the QR code pixels can be successfully extracted without the need for the original image. The imperceptibility and transparency performance of the proposed watermarking algorithm were evaluated using Euclidean distance, Manhattan distance, cosine distance, and correlation distance values. The proposed method was tested under various filtering attacks, such as rotation, scaling, and translation. The proposed watermarking method improved the robustness and visibility of extracting the QR code image. The results reveal that the proposed watermarking method yields watermarked 3D objects with excellent execution time, imperceptibility, and robustness to common filtering attacks.
... In addition, the class results were significantly influenced by the choice of the dissimilarity or distance measurement method. In the present study, the Manhattan distance, or taxicab distance, was used; the distance between two points corresponds to the sum of the absolute differences of their Cartesian coordinates [58], which was defined as ...
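A hedged sketch of hierarchical clustering under Manhattan distances with SciPy, analogous in spirit to the analysis the excerpt describes; the random profiles, the average linkage, and the cluster count are illustrative assumptions, not the study's sequencing data or settings.

```python
# Hierarchical clustering on pairwise Manhattan (cityblock) distances.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(42)
profiles = rng.random((10, 5))               # 10 samples x 5 features (stand-ins)
dists = pdist(profiles, metric="cityblock")  # condensed Manhattan distance matrix
tree = linkage(dists, method="average")      # agglomerative clustering
labels = fcluster(tree, t=3, criterion="maxclust")
print(labels)
```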
Article
Full-text available
Microorganisms play a vital role in the decomposition of vertebrate remains in natural nutrient cycling, and the postmortem microbial succession patterns during decomposition remain unclear. The present study used hierarchical clustering based on Manhattan distances to analyze the similarities and differences among postmortem intestinal microbial succession patterns based on microbial 16S rDNA sequences in a mouse decomposition model. Based on the similarity, seven different classes of succession patterns were obtained. Generally, the normal intestinal flora in the cecum was gradually decreased with changes in the living conditions after death, while some facultative anaerobes and obligate anaerobes grew and multiplied upon oxygen consumption. Furthermore, a random forest regression model was developed to predict the postmortem interval based on the microbial succession trend dataset. The model demonstrated a mean absolute error of 20.01 h and a squared correlation coefficient of 0.95 during 15-day decomposition. Lactobacillus, Dubosiella, Enterococcus, and the Lachnospiraceae NK4A136 group were considered significant biomarkers for this model according to the ranked list. The present study explored microbial succession patterns in terms of relative abundances and variety, aiding in the prediction of postmortem intervals and offering some information on microbial behaviors in decomposition ecology.
... Remark: to keep the discussion concise, the choice was made to focus on certain specific criteria that are the most evident within the framework of this work. However, other criteria not presented here could be exploited, such as an association criterion based on the normals to the surface of the points [107] or on the Manhattan distance [108]. They could be the subject of targeted studies in later work to measure their contribution relative to the association method currently used. ...
Thesis
The autonomous vehicle is one of today's major technological challenges in the automotive sector. Current vehicles are becoming more complex and integrate new systems based on key functionalities such as perception. By allowing the vehicle to apprehend the environment in which it operates, perception is exploited in various ways to guarantee safer mobility. Given the essential role of perception in the proper behavior of an autonomous vehicle, it is necessary to ensure that the perception solutions used perform well enough to guarantee safe traffic. The evaluation of such perception solutions nevertheless remains a complex and little-explored task. One of the critical points is the difficulty of producing and having sufficient reference data to carry out relevant evaluations. The objective of this thesis is to develop a new validation tool for evaluating the performance and error levels of different perception solutions while using a minimum of manual processing. With this tool, it then becomes possible to compare different solutions based on common criteria. The development of this tool breaks down into two main parts: the automated generation of reference data, and the method for evaluating the perception solutions under test.
... It takes string co-occurrence and repetition degree as the similarity measure. Some of the measures are based on characters, such as the edit distance, Hamming distance, longest common substring (LCS), Jaro-Winkler [5] and N-gram [6]; others are based on terms, like cosine similarity, Euclidean distance, Manhattan distance [7], Jaccard similarity [8] and Dice's coefficient [9]. Although this method is easy to implement, it does not consider the meaning of words and their relationships. ...
... where D(t, t′) = Σ_{i=1}^{d} min{ |t_i − t′_i|, L_i − |t_i − t′_i| } is the Manhattan distance [29] defined over the periodic lattice of size L_i in each dimension (in our case d = 2 and L_i = 1), ...
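A direct transcription of the periodic-lattice distance above into Python, with d = 2 and L_i = 1 as in the excerpt; the sample points are illustrative.

```python
# Manhattan distance on a periodic (toroidal) lattice: each coordinate
# difference wraps around when the detour across the boundary is shorter.
def periodic_manhattan(t, s, periods=(1.0, 1.0)):
    return sum(min(abs(a - b), L - abs(a - b))
               for a, b, L in zip(t, s, periods))

print(periodic_manhattan((0.1, 0.9), (0.9, 0.1)))  # 0.4: wraps on both axes
```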
Preprint
Group convolutions and cross-correlations, which are equivariant to the actions of group elements, are commonly used in mathematics to analyze or take advantage of symmetries inherent in a given problem setting. Here, we provide efficient quantum algorithms for performing linear group convolutions and cross-correlations on data stored as quantum states. Runtimes for our algorithms are logarithmic in the dimension of the group thus offering an exponential speedup compared to classical algorithms when input data is provided as a quantum state and linear operations are well conditioned. Motivated by the rich literature on quantum algorithms for solving algebraic problems, our theoretical framework opens a path for quantizing many algorithms in machine learning and numerical methods that employ group operations.
... Definition 1 Maximum Manhattan distance: "Manhattan distance" is also termed "city block distance" and "taxicab distance" (Krause, 1986). It can roughly depict the distance along a road network. ...
Preprint
Full-text available
The movement of humans and goods in cities can be represented by constrained flow, which is defined as the movement of objects between origin and destination in road networks. Flow aggregation, namely origins and destinations aggregated simultaneously, is one of the most common patterns; for example, aggregated origin-to-destination flows between two transport hubs may indicate great traffic demand between the two sites. Developing a clustering method for constrained flows is crucial for determining urban flow aggregation. Among existing methods for identifying flow aggregation, the L-function of flows is the major one. Nevertheless, this method depends on the aggregation scale, the key parameter detected by the Euclidean L-function, and it does not adapt to road networks. The extracted aggregation may be overestimated and dispersed. Therefore, we propose a clustering method based on the L-function of Manhattan space, which consists of three major steps. The first is to detect aggregation scales by the Manhattan L-function. The second is to determine core flows possessing the highest local L-function values at different scales. The final step is to take the intersection of core flows' neighbourhoods, the extent of which depends on the corresponding scale. By setting the number of core flows, we can concentrate the aggregation and thus highlight the Aggregation Artery Architecture (AAA), which depicts road sections that contain the projection of key flow clusters on the road networks. Experiments using taxi flows showed that AAA can clarify the resident movement types of identified aggregated flows. Our method also helps in selecting locations for distribution sites, thereby supporting accurate analysis of urban interactions.
... Is there a parabola in Figure 6? The answer is yes: by suppressing the circled respondents, who make up less than 6% of the data, we see an inverted-V-shaped band of points, which represents a taxicab parabola with a lot of dispersion; see for instance Krause (1986). ...
... [Map figure with north arrow and 5 km scale bar [25]. Footnote 2: considering the smallest displacement (71% of ground distance), taxicab geometry [43]. Footnote 3: the search perimeter is restricted to a focus area as indicated in Figure 8.] ...
Article
Full-text available
Green spaces have a positive influence on human well-being. Therefore, an accurate evaluation of public green space provision is crucial for administrations to achieve decent urban environmental quality for all. Whereas inequalities in green space access have been studied in relation to income, the relation between neighbourhood affluence and remediation difficulty remains insufficiently investigated. A methodology is proposed for co-creating scenarios for green space development through green space proximity modelling. For Brussels, a detailed analysis of potential interventions allows for classification according to relative investment scales. This resulted in three scenarios of increasing ambition. Results of scenario modelling are combined with socio-economic data to analyse the relation between average income and green space proximity. The analysis confirms the generally accepted hypothesis that non-affluent neighbourhoods are on average underserved. The proposed scenarios reveal that a very high standard of green space proximity could be reached throughout the study area if authorities were willing to allocate budgets for green space development that go beyond the regular construction costs of urban green spaces, and that the required types of interventions demand a higher financial investment per area of realised green space in non-affluent neighbourhoods.
... This distance can be generalized to take into consideration the differences between the variables' domains. This generalized distance is called the Manhattan distance [119], and is defined as follows. Nevertheless, the HD (and other domain-independent metrics) has problems caused by being a generalist measure (see Example 13). ...
Thesis
Full-text available
Scheduling problems are common in many applications that range from factories and transports to universities. Most times, these problems are optimization problems for which we want to find the best way to manage scarce resources. Reality is dynamic, and thus unexpected disruptions can make the original solution invalid. Many methods for dealing with disruptions are well described in the literature. These methods can be divided into two main approaches: (i) create robust solutions for the most common disruptions, and (ii) solve the problem again from scratch extended with new constraints. The goal of creating robust solutions is to ensure their validity even after the most common disruptions occur. For this reason, it requires a detailed study of the most likely disruptive scenarios. The main disadvantage of creating a robust solution is a possible reduction in the overall quality (e.g., financial cost, customer satisfaction) to support the most likely disruptive scenarios, which may never occur. Regardless of the robustness of the solution, we may need to solve the problem again. Most of the methods developed to recover solutions after disruptions occur consist of re-solving the problem from scratch with an additional cost function. This cost function ensures that the new solution is close to the original. In other words, these methods solve the Minimal Perturbation Problem (MPP). However, all these methods require more execution time than the original problem to find a new solution. This can be explained by the fact that we are solving a different problem (with more optimization criteria). One can mitigate this issue by re-using the search. Moreover, these methods use generic cost functions (e.g., Hamming distance) that may have little significance in practice. In this work, we propose novel algorithms to solve the MPP applied to two domains: university course timetabling and train scheduling. We tested our algorithms on university timetabling problems with data sets obtained from Instituto Superior Técnico and the 2019 International Timetabling Competition. One of these algorithms was ranked in the top 5 of the competition. For the train scheduling case study, we tested our algorithms with data from the Swiss Federal Railways and from PESPLib. The evaluation shows that the new algorithms are more efficient than the ones described in the literature. Summing up, the proposed algorithms show a significant improvement over the state of the art in re-solving scheduling problems under disruptions.
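The Hamming-distance cost function mentioned above simply counts how many variables changed value between the original and the repaired solution; a minimal sketch with hypothetical timetable assignments (illustrative only, not the thesis's algorithms):

```python
def hamming_cost(original: dict, repaired: dict) -> int:
    """Minimal Perturbation cost: number of variables whose assigned
    value differs between the original and the repaired solution."""
    return sum(1 for var in original if repaired.get(var) != original[var])

# Hypothetical timetable assignments (course -> time slot):
original = {"algebra": "Mon9", "physics": "Mon11", "chemistry": "Tue9"}
repaired = {"algebra": "Mon9", "physics": "Tue11", "chemistry": "Tue9"}
print(hamming_cost(original, repaired))  # 1 -- only physics was moved
```

The critique quoted above is visible even in this toy case: moving one lecture by two days and moving it by one hour both cost 1, which is why such generic cost functions may have little significance in practice.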
... = 1, where ‖·‖₁ is the taxicab ℓ₁-norm [14]. Each of the two histogram vectors has 2^b entries, where b is the bit-depth of the image (e.g., 8-bit). ...
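Read this way, each image histogram is normalised so that its taxicab (ℓ₁) norm equals 1; a short sketch under that reading (not the cited paper's code):

```python
import numpy as np

def normalized_histogram(image: np.ndarray, bit_depth: int = 8) -> np.ndarray:
    """Intensity histogram with 2**bit_depth bins, normalised so that
    its taxicab (l1) norm equals 1."""
    hist = np.bincount(image.ravel(), minlength=2 ** bit_depth).astype(float)
    return hist / np.abs(hist).sum()   # l1 normalisation

img = np.random.randint(0, 256, size=(64, 64))
h = normalized_histogram(img)
print(len(h), np.isclose(np.abs(h).sum(), 1.0))  # 256 True
```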
Article
Pascal’s triangle arises by counting the number of shortest paths from (0, 0) to (n, k) on a square street grid. The length of the shortest path is the Manhattan distance from (0, 0) to (n, k). We consider the case of a square street grid of one-way streets, with successive parallel streets being oppositely directed. We investigate the associated distance function q (which is only a quasi-metric, since q(A, B) may differ from q(B, A)) and the arithmetic triangle obtained by counting shortest routes on the one-way grid from (0, 0) to (n, k).
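The one-way quasi-metric q is the paper's contribution; the sketch below only illustrates the familiar two-way case, where the number of shortest taxicab paths to (n, k) satisfies Pascal's recurrence:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def shortest_path_count(n: int, k: int) -> int:
    """Number of shortest (monotone) taxicab paths from (0, 0) to (n, k)
    on a two-way square grid; every shortest path has length n + k, the
    Manhattan distance, and the counts reproduce Pascal's triangle."""
    if n == 0 or k == 0:
        return 1
    # A shortest path enters (n, k) either from (n-1, k) or from (n, k-1).
    return shortest_path_count(n - 1, k) + shortest_path_count(n, k - 1)

print(shortest_path_count(2, 2))                           # 6 == C(4, 2)
print([shortest_path_count(4 - k, k) for k in range(5)])   # row 4: 1 4 6 4 1
```

On the paper's one-way grid this symmetry breaks: routes from A to B and from B to A can differ in length, which is exactly why q is only a quasi-metric.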
Article
Objectives : The purpose of this study is to use independent datasets to externally validate the three symptom clusters of unipolar depression identified by Chekroud, to evaluate personalized treatment trajectories and outcomes based on these symptom clusters, and to verify predictors. Methods : The Quick Inventory of Depressive Symptomatology-Self Report (QIDS-SR16)¹ and Hamilton Rating Scale for Depression (Ham-D)² data from two placebo-controlled, double-blind clinical trials (Dual Therapy and Duloxetine) were used for external validation. Machine learning methods were applied to replicate the three symptom clusters and to produce treatment trajectories. Penalized logistic regressions were conducted to identify the baseline variables that best predicted treatment outcomes. Results : The variables Chekroud identified as comprising the sleep, atypical and core emotional clusters were replicated. Treatment trajectories demonstrate that dual treatment (escitalopram and bupropion) performed best across all symptom clusters but did not outperform escitalopram monotherapy over time. For each symptom cluster, there were differences in treatment efficacy among antidepressants. Conclusion : By using different treatment trajectories based on a patient's symptom cluster profile, clinicians could potentially select best-fit antidepressants to achieve the greatest benefit. Our results showed that total baseline QIDS, Ham-D score, anxiety disorder diagnosis and course of depressive illness were the best baseline predictors. Results could enhance personalized depression treatment plans and help to improve outcomes. Clinical Trials Registration : NCT00519428, NCT00360724.
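A penalized logistic regression of the kind mentioned (here with an L1 penalty, an assumption, since the abstract does not state the penalty type) shrinks uninformative coefficients to zero, which is how top baseline predictors can be identified; a generic scikit-learn sketch on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))            # 10 hypothetical baseline variables
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# The L1 penalty shrinks irrelevant coefficients to exactly zero.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
selected = np.flatnonzero(model.coef_[0])
print(selected)  # indices of retained predictors; features 0 and 1 should dominate
```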
Article
Full-text available
This registered report introduces the largest, and for the first time, reproducible experimental survey on biomedical sentence similarity with the following aims: (1) to elucidate the state of the art of the problem; (2) to solve some reproducibility problems preventing the evaluation of most current methods; (3) to evaluate several unexplored sentence similarity methods; (4) to evaluate for the first time an unexplored benchmark, called Corpus-Transcriptional-Regulation (CTR); (5) to carry out a study on the impact of the pre-processing stages and Named Entity Recognition (NER) tools on the performance of the sentence similarity methods; and finally, (6) to bridge the lack of software and data reproducibility resources for methods and experiments in this line of research. Our reproducible experimental survey is based on a single software platform, which is provided with a detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results. In addition, we introduce a new aggregated string-based sentence similarity method, called LiBlock, together with eight variants of current ontology-based methods, and a new pre-trained word embedding model trained on the full-text articles in the PMC-BioC corpus. Our experiments show that our novel string-based measure establishes the new state of the art in sentence similarity analysis in the biomedical domain and significantly outperforms all the methods evaluated herein, with the only exception of one ontology-based method. Likewise, our experiments confirm that the pre-processing stages, and the choice of the NER tool for ontology-based methods, have a very significant impact on the performance of the sentence similarity methods. We also detail some drawbacks and limitations of current methods, and highlight the need to refine the current benchmarks. Finally, a notable finding is that our new string-based method significantly outperforms all state-of-the-art Machine Learning (ML) models evaluated herein.
Article
Full-text available
Computer-assisted design of small molecules has experienced a resurgence in academic and industrial interest due to the widespread use of data-driven techniques such as deep generative models. While the ability...
Preprint
Full-text available
Computer-assisted design of small molecules has experienced a resurgence in academic and industrial interest due to the widespread use of data-driven techniques such as deep generative models. While the ability to generate molecules that fulfill required chemical properties is encouraging, the use of deep learning models requires significant, if not prohibitive, amounts of data and computational power. At the same time, open-sourcing of more traditional techniques such as graph-based genetic algorithms for molecular optimisation [Jensen, Chem. Sci., 2019, 12, 3567-3572] has shown that simple and training-free algorithms can be efficient and robust alternatives. Further research alleviated the common genetic algorithm issue of evolutionary stagnation by enforcing molecular diversity during optimisation [Van den Abeele, Chem. Sci., 2020, 42, 11485-11491]. The crucial lesson distilled from the simultaneous development of deep generative models and advanced genetic algorithms has been the importance of chemical space exploration [Aspuru-Guzik, Chem. Sci., 2021, 12, 7079-7090]. For single-objective optimisation problems, chemical space exploration had to be discovered as a usable resource, but in multi-objective optimisation problems, an exploration of trade-offs between conflicting objectives is inherently present. In this paper we provide state-of-the-art and open-source implementations of two generations of graph-based non-dominated sorting genetic algorithms (NSGA-II, NSGA-III) for molecular multi-objective optimisation. In addition, we provide the results of a series of benchmarks for the inverse design of small molecule drugs for both the NSGA-II and NSGA-III algorithms.
Book
This book constitutes the refereed post-conference proceedings of the Fourth IFIP TC 12 International Conference on Computational Intelligence in Data Science, ICCIDS 2021, held in Chennai, India, in March 2021. The 20 revised full papers presented were carefully reviewed and selected from 75 submissions. The papers cover topics such as computational intelligence for text analysis; computational intelligence for image and video analysis; blockchain and data science.
Chapter
Full-text available
According to the India Brand Equity Foundation (IBEF), 32% of the global food market depends on the Indian agricultural sector. Due to urbanisation, fertile land has been utilised for non-agricultural purposes. The loss of agricultural land impacts productivity and results in diminishing yields. Soil is the most important factor for thriving agriculture, since it contains the essential nutrients. Food production could be improved through the viable usage of soil nutrients. To identify the soil nutrients, the physical, chemical and biological parameters were examined using many machine learning algorithms. However, environmental factors such as sunlight, temperature, humidity and rainfall play a major role in improving soil nutrients, since they are responsible for the processes of photosynthesis, germination and saturation. The objective is to determine the soil nutrient level by assessing the associative properties, including the environmental variables. The proposed system, termed the Agrarian application, recommends crops for a particular plot of land using classification algorithms and predicts the yield rate by employing regression techniques. The application will help farmers select crops based on soil nutrient content and environmental factors, and predicts the yield rate for the same.
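A rough sketch of such a two-stage pipeline (entirely illustrative: synthetic data and hypothetical feature names, not the Agrarian application's actual models), where a classifier recommends a crop and a regressor predicts the yield rate:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(1)
# Hypothetical features: [N, P, K, temperature, humidity, rainfall]
X = rng.uniform(size=(300, 6))
crop = rng.integers(0, 3, size=300)          # 3 hypothetical crop labels
yield_rate = X @ rng.uniform(size=6) + rng.normal(scale=0.1, size=300)

recommender = RandomForestClassifier(n_estimators=100).fit(X, crop)
yield_model = RandomForestRegressor(n_estimators=100).fit(X, yield_rate)

field = rng.uniform(size=(1, 6))             # one new field's measurements
print("recommended crop:", recommender.predict(field)[0])
print("predicted yield:", yield_model.predict(field)[0])
```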
Chapter
Character differences are one of the most common problems that can occur when students answer fill-in-the-gap or one-word-answer questions, where the expected answer is usually a single word. To improve the evaluation of student answers using the Hamming distance, we propose a modified Hamming model that addresses the drawbacks of the standard model by applying stemming, to capture derivational lexical similarity, and space padding, to deal with texts of unequal length.
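A minimal sketch of the space-padding step (the stemming step, e.g. with a standard stemmer, is omitted; this is an illustration, not the chapter's exact model):

```python
def padded_hamming(a: str, b: str) -> int:
    """Hamming distance with space padding: the shorter string is padded
    with trailing spaces so both have equal length, then differing
    positions are counted (the standard Hamming distance is only
    defined for strings of equal length)."""
    width = max(len(a), len(b))
    a, b = a.ljust(width), b.ljust(width)
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))

print(padded_hamming("running", "runs"))  # 4: 'n' != 's' plus 3 padded positions
```

Applying stemming first would reduce "running" and "runs" to a common stem, so the padded distance on the stems would be 0, which is the derivational similarity the modified model aims for.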
Chapter
The Virtual Math Teams project is exploring how to create, structure, support, and assess an online chat-based collaborative community devoted to mathematics discourse. It is analyzing the forms of group cognition that emerge from the use of shared cognitive tools with specific functionalities. Centered on a case study of a synchronous online interchange, this Investigation discusses the use of a graphical referencing tool in coordination with text chat to achieve a group orientation to a particular mathematical object in a shared whiteboard. Deictic referencing is seen to be a critical foundation of intersubjective cognitive processes that index objects of shared attention. The case study suggests that cognitive tools to support group referencing can be important in supporting group alignment, intentionality, and cognition in online communities such as this one for collaborative mathematics.
Article
Full-text available
In this paper, we analyze several local referendums on land development and land-use regulation in the City of Erlangen (Germany) between 2011 and 2018. To identify the positive influence of the travel distance on approval for land development, we control for distance to the city center and density, employ a two-way fixed-effects model, and use spatial instruments. We also analyze the heterogeneity of city dwellers’ preferences for the development of residential and commercial areas. In particular, we examine the differences between homeowners and tenants in this regard. This article is protected by copyright. All rights reserved.
Article
In this project, we explore distance in the context of metrics. More specifically, we take a look at an integral metric that is used to determine the distance between sets. The motivation behind this project is to determine whether the integral metric we use is meaningful in the context of our data set. The data set we use consists of trajectories of cars along a portion of the I5 highway. Through training, testing, and evaluating this model on the data set, we can reach conclusions about the structure of the data and the success of this integral metric as a classifier. In the end, we can both determine whether or not this integral metric fits the chosen data set and explore other areas where the metric could succeed or fail.
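One common instance of such an integral metric (an assumption for illustration; the abstract does not specify its exact form) is the discretised L1 distance between two trajectories sampled at the same times, i.e. a Riemann-sum approximation of the integral of |f(t) − g(t)| dt:

```python
import numpy as np

def integral_l1_distance(f: np.ndarray, g: np.ndarray, dt: float) -> float:
    """Discretised integral metric between two trajectories sampled at the
    same times: approximates the integral of |f(t) - g(t)| dt by a Riemann sum."""
    return float(np.sum(np.abs(f - g)) * dt)

t = np.linspace(0.0, 1.0, 101)
f, g = t ** 2, t            # two hypothetical trajectories
print(integral_l1_distance(f, g, dt=t[1] - t[0]))  # ~1/6, the integral of t - t^2
```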