Figure 14: Collapsed representation of the tree's red leaves

Source publication
Article
Full-text available
Motivated by the need for flexible, intuitive, reusable, and normalized terminology for guiding search and building ontologies, we present a general approach for generating sets of such terminologies from natural language documents. The terms that this approach generates are root- and rule-based terms, generated by a series of rules designed to be...

Citations

... Given a corpus of text, such as a collection of papers, the task of identifying these central concepts is sometimes known as term extraction, and there are many generic toolkits for performing this task. In this paper we study four examples: TextRank (Mihalcea and Tarau, 2004), DyGIE++, OpenTapioca (Delpeuch, 2020), and Parmenides (Bhat et al., 2018). ...
... Parmenides: Parmenides (Bhat et al., 2018) takes a linguistic approach to terminology extraction. It uses spaCy to identify syntactic structures, then normalizes the syntactic structure and identifies phrases for extraction. ...
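The pipeline that snippet describes (parse with spaCy, normalize, extract phrases) can be sketched in a few lines. The normalization rules below (drop determiners, lemmatize, lowercase) are illustrative assumptions, not the actual Parmenides rule set:

```python
# Illustrative spaCy-based phrase extraction with simple normalization.
# The normalization rules here are assumptions for the sketch, not the
# actual Parmenides rule set.
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline

def extract_terms(text):
    """Return normalized noun-phrase candidates from raw text."""
    doc = nlp(text)
    terms = set()
    for chunk in doc.noun_chunks:
        # Keep content tokens; lemmatize and lowercase them.
        tokens = [t.lemma_.lower() for t in chunk if t.pos_ != "DET"]
        if tokens:
            terms.add(" ".join(tokens))
    return terms

print(extract_terms("Flexible terminologies guide the search and the ontologies."))
# e.g. {'flexible terminology', 'search', 'ontology'} (exact output depends on the model)
```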
Preprint
Full-text available
We investigate different systems for extracting mathematical entities from English texts in the mathematical field of category theory as a first step for constructing a mathematical knowledge graph. We consider four different term extractors and compare their results. This small experiment showcases some of the issues with the construction and evaluation of terms extracted from noisy domain text. We also make available two open corpora in research mathematics, in particular in category theory: a small corpus of 755 abstracts from the journal TAC (3188 sentences), and a larger corpus from the nLab community wiki (15,000 sentences).
... Automatic content analysis through text mining provides a convenient alternative to manual analysis for gathering domain knowledge and creating domain ontologies (Collard et al., 2018). Repeating content analysis through time allows examining changes in the concept networks and tracking modifications of the important terms. ...
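A minimal sketch of what repeating content analysis through time can look like: compare term counts from two corpus snapshots to see which concepts are rising or falling. The terms, years, and counts below are invented for illustration:

```python
# Compare term counts from two corpus snapshots to track concept drift.
# Terms and counts are invented for illustration.
from collections import Counter

snapshot_2018 = Counter({"ontology": 40, "text mining": 25, "taxonomy": 10})
snapshot_2020 = Counter({"ontology": 35, "knowledge graph": 30, "text mining": 20})

for term in sorted(set(snapshot_2018) | set(snapshot_2020)):
    delta = snapshot_2020[term] - snapshot_2018[term]
    print(f"{term:16s} {delta:+d}")  # positive: rising prominence; negative: falling
```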
Chapter
Full-text available
Although common vulnerabilities and exposures (CVE) data is widely known and used to record vulnerability descriptions, it lacks enough classifiers to make it fully usable. This results in focusing on some well-known vulnerabilities and leaving others out during security tests. Better classification of this dataset would help find solutions to a larger set of vulnerabilities/exposures. In this research, vulnerability and exposure (CVE) data is examined in detail using both manual and computerized content analysis techniques. Graph-theoretical techniques are then used to scrutinize the CVE data. The computerized content analysis identified 94 concepts associated with the CVE records. The author was able to relate these concepts to 11 logical groups. Using the network of relationships among these 94 concepts in the graph-theoretical analysis made it possible to discover groups of contents, and thus CVE items, which have similarities. Moreover, missing concepts pointed out problems related to CVE, such as delays in the CVE review process or the database not being preferred by some user groups.
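The graph-theoretical step the chapter describes, grouping related concepts from a network of their relationships, can be approximated with off-the-shelf community detection. The concept names and co-occurrence weights below are invented; the chapter's actual 94 concepts and 11 groups come from its CVE content analysis:

```python
# Build a concept co-occurrence network and group densely connected concepts.
# Concept names and weights are invented for illustration.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

cooccurrences = [
    ("buffer overflow", "memory corruption", 12),
    ("buffer overflow", "denial of service", 5),
    ("sql injection", "input validation", 9),
    ("cross-site scripting", "input validation", 7),
]

G = nx.Graph()
G.add_weighted_edges_from(cooccurrences)

# Greedy modularity maximization yields groups of closely related concepts.
for i, group in enumerate(greedy_modularity_communities(G, weight="weight"), 1):
    print(f"group {i}: {sorted(group)}")
```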
... For 1), we are primarily applying text mining (extracting and clustering noun phrases [3]), concept clustering and mapping, and ontological analysis to identify and track concepts. The text mining results will provide continuous input for the concept clustering, mapping, and sentiment analysis phase, and together they will provide results for inclusion in an evolving ontology. ...
... The characterization differentiates, and the classification relates, the individuals, groups, communities and organizations, the systems and applications, and the processes, methods and techniques involved. The processes are: 1) mining security data sources using phrasal parsing, automated terminology construction, statistical analysis and clustering to determine the most salient concepts [3] in the corpora being analyzed and tracking their changes through time; ...
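As a rough sketch of "statistical analysis and clustering to determine the most salient concepts", one can weight phrases with TF-IDF and cluster the results. The toy corpus, the choice of TfidfVectorizer and KMeans, and the cluster count are assumptions for illustration, not the cited pipeline:

```python
# TF-IDF weighting over a toy corpus followed by k-means clustering; the
# top-weighted terms of each cluster stand in for its "salient concepts".
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "buffer overflow allows remote code execution",
    "heap overflow leads to remote code execution",
    "sql injection in the login form",
    "blind sql injection via the search parameter",
]

vec = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
X = vec.fit_transform(docs)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for c in range(km.n_clusters):
    top = [terms[i] for i in km.cluster_centers_[c].argsort()[::-1][:3]]
    print(f"cluster {c}: {top}")
```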
Conference Paper
Full-text available
The number and variety of cyber-attacks is rapidly increasing, and the rate of new software vulnerabilities is also rising dramatically. The cybersecurity community typically reacts to attacks after they occur. Being reactive is costly and can be fatal where attacks threaten lives, important data, or mission success. Taking a proactive approach, we are: (I) identifying potential attacks before they come to fruition; and, based on this identification, (II) developing preventive countermeasures. We describe a Proactive Cybersecurity System (PCS), a layered, modular service platform that applies big data collection and processing tools to a wide variety of unstructured data sources to identify potential attacks and develop countermeasures. The PCS provides security analysts a holistic, proactive, and systematic approach to cybersecurity. Here we describe our research vision and progress towards that vision.
Chapter
Information and Communication Technologies (ICT) have revolutionized our lives and transformed the world into a knowledge-centric one, where information is available within a few clicks. This advancement has introduced different challenges and problems. One big challenge of today's world is cybersecurity and privacy. With every passing day, the number of cyber-attacks increases. Legacy security solutions such as firewalls, antivirus, and intrusion detection and prevention systems are not equipped with the right technologies to neutralize advanced attacks. Recent developments in machine learning and deep learning have shown great potential to deal with modern attack vectors. In this chapter, we present: (1) the current state of cyber-attacks; (2) an overview of intrusion detection systems and their taxonomy; and (3) recent machine/deep learning techniques being used to detect and defend against novel intrusions.
Article
Full-text available
This short paper describes a web resource, the NIST CORD-19 Web Resource, for community explorations of the COVID-19 Open Research Dataset (CORD-19). The tools for exploration in the web resource make use of the NIST-developed Root- and Rule-based method, which exploits underlying linguistic structures to create terms that represent phrases in a corpus. The method allows for auto-suggesting related terms, helping users discover terms that refine searches over a heterogeneous COVID-19 document base. The method also produces taxonomic structures in the target domain, as well as semantic information about the relationships between terms. This term structure can serve as a basis for creating topic modeling and trend analysis tools. In this paper, we describe the use of a novel search engine to demonstrate some of the capabilities above.
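One plausible reading of "auto-suggesting related terms" is to index multi-word terms by the root words they contain and suggest terms with overlapping roots. This is a hypothetical sketch of that behavior, with made-up terms, not NIST's actual Root- and Rule-based implementation:

```python
# Hypothetical related-term suggestion: index multi-word terms by their root
# words and suggest terms sharing a root with the query. Not the actual
# NIST Root- and Rule-based implementation.
from collections import defaultdict

terms = ["viral transmission", "airborne transmission", "viral load",
         "antibody test", "antibody response"]

index = defaultdict(set)
for term in terms:
    for root in term.split():
        index[root].add(term)

def suggest(query):
    """Return terms sharing at least one root word with the query."""
    related = set()
    for root in query.split():
        related |= index[root]
    related.discard(query)
    return sorted(related)

print(suggest("viral transmission"))
# ['airborne transmission', 'viral load']
```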
Article
Diabetes mellitus has become a global threat, especially in emerging economies. In the United States, about 24 million people have diabetes mellitus. Diabetes represents a trove of physiologic and sociologic data that are only superficially understood by the health care system. Artificial intelligence can address many problems posed by the prevalence of diabetes mellitus and its impact on individual and societal health. We provide a brief overview of artificial intelligence and discuss case studies that illustrate how artificial intelligence can enhance diabetes care.