Figure 8 - uploaded by Eman Btoush
E-R Diagram for Library Management System 

Source publication
Article
Full-text available
An Entity Relationship (ER) data model is a high-level conceptual model that describes information as entities, attributes, and relationships. Entity relationship modeling is designed to facilitate database design. The abstract nature of Entity Relationship diagrams can be a discouraging task for designers and students alike. This paper deals with th...

Context in source publication

Context 1
... tokens set with its POS. Chunking usually selects a subset of the tokens together to indicate its type; it is an intermediate step towards full parsing. Natural language grammar is ambiguous and admits multiple possible analyses, so each sentence may have many potential parse trees. Most of them will seem easy to a human; however, it is difficult to decide which of them matches the specification. The parsing process therefore determines the parse tree of a given sentence. Sequences of words are transformed into structures that indicate how the sentence's units relate to each other. This step helps in identifying the main parts of a given sentence, such as object, subject, etc. Parsing examples are shown in Figure 6 and Figure 7. Some parsers assume the existence of a set of grammar rules in order to parse given sentences; recent parsers, however, are smart enough to infer the parse trees directly using complex statistical models [17]. Parsing analysis can extract nouns that play the role of entities or attributes, and verbs that act as relationships between entities. Cardinality and multiplicity information may also be extracted from determiners, adjectives, modal verbs and quantifiers. This paper used the Memory-Based Shallow Parser (MBSP) [8] as the parsing method. MBSP is a text analysis system that provides tools for tokenization, sentence splitting, part-of-speech tagging, chunking and relation finding. The proposed methodology is based on a set of identification rules that combine concepts from other works, as follows:
1. A common noun may indicate an entity type [5, 9].
2. A proper noun may indicate an entity [5, 9].
3. In the case of consecutive nouns, check the last noun: if it is not one of the words in the set J = [number, no, code, date, type, volume, birth, id, address, name], it may be an entity type; otherwise it may indicate an attribute type [22].
4. A gerund may indicate an entity type [5].
5. A specialization relationship, i.e., an "A is a B" sentence structure, can relate two nouns [23].
6. A noun such as "database", "record", "system", "information", "organization" or "detail" may not be considered a candidate entity type because it describes the business environment [22].
7. Every proper noun, such as a location name or person name, is ignored [21].
Attributes are nouns mentioned along with their entity; they may be preceded by the verbs has, have, or includes, which indicate that an entity is attributed with a property. For example, in "employee has id, name, and address", employee is detected as an entity, and id, name and address are detected as attributes. The following rules identify attributes in specifications:
1. A noun phrase with a genitive case may indicate an attribute [9].
2. If a noun is followed by another noun and the latter belongs to the set S = [number, no, code, date, type, volume, birth, id, address, name], this may indicate that both nouns together form an attribute; otherwise it may be an entity [22].
3. A noun such as "vehicle no", "group no", "person id" or "room type" refers to an attribute [24].
4. The possessive case usually shows ownership and may indicate an attribute type [9].
5. A noun phrase following "has/have" may indicate attribute types [24].
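The noun-based rules above lend themselves to a compact implementation. Below is a minimal illustrative sketch, assuming NLTK's tokenizer and POS tagger as a stand-in for the MBSP parser the paper actually uses; the function name classify_nouns and the set name ATTRIBUTE_WORDS are invented for this sketch, not taken from the paper.

```python
# Sketch of the consecutive-noun heuristic (entity rule 3 / attribute rule 2),
# assuming NLTK instead of MBSP. Requires: nltk.download('punkt') and
# nltk.download('averaged_perceptron_tagger').
import nltk

ATTRIBUTE_WORDS = {"number", "no", "code", "date", "type",
                   "volume", "birth", "id", "address", "name"}

def classify_nouns(sentence):
    """If the last noun of a consecutive-noun run is in ATTRIBUTE_WORDS,
    treat the run as an attribute; otherwise treat it as an entity."""
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    entities, attributes = [], []
    run = []                                # current run of consecutive nouns
    for word, pos in tagged + [("", ".")]:  # sentinel flushes the final run
        if pos.startswith("NN"):
            run.append(word.lower())
        elif run:
            target = attributes if run[-1] in ATTRIBUTE_WORDS else entities
            target.append(" ".join(run))    # e.g. "room type" vs. "employee"
            run = []
    return entities, attributes

# Reproduces the paper's example: employee is the entity;
# id, name and address are its attributes.
print(classify_nouns("The employee has an id, a name, and an address."))
```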
The main verb that occurs between two entities is most likely a relationship. Two entities can be separated by the main verb only, by a main verb and an auxiliary verb, or by a main verb and a modal verb. For example, in "The bank is branched into many branches", branched is detected as a relationship. The relationship rules are:
1. A transitive verb can indicate a relationship type [5].
2. A verb followed by a preposition such as "by", "to", "on" or "in" can indicate a relationship type [9].
3. If the verb is in the list {include, involve, consists of, contain, comprise, divided to, embrace}, this indicates a relationship of aggregation or composition [21].
4. An adverb can indicate an attribute of a relationship [5].
5. A verb followed by a preposition such as {on, in, by, to} could be a relationship, for example, "Persons work on projects". Other examples include "assigned to" and "managed by" [22].
Key attributes are identified by two further rules:
1. The adverb "uniquely" indicates the primary key of an entity [18].
2. If the sentence has the form {"Subject" + "Possessive verb" + "Adjective" + "Object"}, then the object is a key attribute [25].
The ER generator is a rule-based system that identifies ER relationships, ER entities and ER attributes [18]. Once all words have been assigned to their ER element types, the relevant information, namely which words are entities, relationships, cardinalities and attributes, is stored in text files. These text files are then used to generate the ER diagram. Figure 5 shows the prototype editor for the ER generating process; currently, the prototype is in the design stage. The ER generator is easy to use and understand, and the tool aims to require minimal human intervention during the process. Figure 8 shows the E-R diagram for the library management system example. Linguistic variation (incomplete knowledge) and ambiguity are the main problems in using NL. Part-of-speech tagging is also harder than just having a list of words and their parts of speech, since some parts of speech are complex or unspoken. The difficulty of accessing information in a given text stems from the complexity of natural language. NLP technologies are still some way from being able to understand information in unrestricted text. The heuristics approach suits small application domains rather than large ones; applying NLP to specific domain problems is more efficient and could make significant progress. There is also no standard approach for automatically recognizing objects and classes in English sentences. Moreover, the analysis process is the most critical and difficult task because most input scenarios are written in a natural language such as English or Arabic [20]. Generating multi-document text related to a domain problem using NLP is more difficult than generating a single document [19]. The most challenging task is parsing languages such as Arabic or Chinese, which have different linguistic properties compared to English. Generally, natural language processing is successful in meeting the syntactic challenges, but it still has a long way to go in the areas of semantics and pragmatics. Entity relationship modeling is a high-level data modeling technique that helps designers create useful and accurate conceptual models. Much research has attempted to apply Natural Language Processing (NLP) to extract knowledge from requirement specifications. Heuristics-based rules are used to parse the specifications. This approach pays particular attention to natural language processing techniques such as tokenization, POS tagging, chunking and parsing based on syntactic heuristic rules. The parsing result is a set of words with their parts of speech (POS); this result is fed into the ER generator, which identifies suitable data modeling elements according to the heuristics-based rules.
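As a companion to the rules above, the entity-verb-entity scan might be approximated as follows. This is a sketch under the assumption that entities have already been identified by the noun rules; find_relationship and AGGREGATION_VERBS are illustrative names, not the paper's actual ER generator code.

```python
# Sketch of the relationship heuristics: the main verb between two known
# entity mentions is taken as the relationship, and verbs from rule 3 mark
# aggregation/composition. Input is a list of (word, POS) pairs.
AGGREGATION_VERBS = {"include", "involve", "consist", "contain",
                     "comprise", "divide", "embrace"}

def find_relationship(tagged, entities):
    subject = verb = None
    for word, pos in tagged:
        w = word.lower()
        if w in entities:
            if subject is None:
                subject = w                 # first entity mention
            elif verb is not None:          # second entity after a verb
                kind = ("aggregation" if verb in AGGREGATION_VERBS
                        else "relationship")
                return (subject, verb, w, kind)
        elif pos.startswith("VB") and subject is not None:
            verb = w    # the last verb before the second entity wins,
                        # so auxiliaries like "is" are overwritten

tagged = [("The", "DT"), ("bank", "NN"), ("is", "VBZ"), ("branched", "VBN"),
          ("into", "IN"), ("many", "JJ"), ("branches", "NNS")]
print(find_relationship(tagged, {"bank", "branches"}))
# -> ('bank', 'branched', 'branches', 'relationship'), as in the bank example
```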
This approach gives the database designer an overview of the output of natural language processing. Moreover, it provides designers with detailed modeling information that helps them during database design. As future work, the use of NLP could be extended from structural analysis to semantic analysis in order to infer further elements such as composite attributes, cardinalities, weak attributes, etc. In addition, artificial intelligence (AI) techniques such as support vector machines (SVM) and neural networks could be applied for a better understanding of requirement specifications. Eman Btoush (emanbtoush26@gmail.com) received the B.S. in computer science from Mutah University, Mutah, Jordan, in 2002. She is currently an M.S. student in the Computer Science Department, Mutah University, Jordan. Mustafa Hammad is an Assistant Professor in the Information Technology Department at Mu'tah University, Al Karak, Jordan. He received his PhD in computer science from New Mexico State University, USA, in 2010. He received his Master's degree in computer science from Al-Balqa Applied University, Jordan, in 2005 and his B.Sc. in computer science from The Hashemite University, Jordan, in 2002. His research interest is Software Engineering with a focus on static and dynamic analysis and software ...

Similar publications

Article
Full-text available
Motivated by the need for flexible, intuitive, reusable, and normalized terminology for guiding search and building ontologies, we present a general approach for generating sets of such terminologies from natural language documents. The terms that this approach generates are root- and rule-based terms, generated by a series of rules designed to be...

Citations

... Therefore, ERD design is very important in system design because it can describe the data requirements of a system [12], [13]. Some of the components in this diagram that must be considered are (1) entities, which are the parts of the ERD that describe an object in the database; (2) attributes, which are characteristics possessed by entities; and (3) relationships, which are the parts that describe the relationships between entities [14], [15]. Based on Figure 6, several data must be recorded in the student professional education monitoring system, namely (a) lecturer data, (b) supervisor data, (c) student data, (d) announcement data, (e) logbook data, (f) SOAP data, (h) case data, and (i) report data. ...
... Eman S. Btoush and Mustafa M. Hammad come to the conclusion that natural language documents are the best source of information for creating the ER data model, because ER elements can be extracted from natural language requirements using Natural Language Processing (NLP). This method demonstrates how to extract composite attributes, cardinalities, weak attributes, etc. using natural language processing methods including tokenization, POS tagging, chunking, and parsing based on syntactic heuristic rules (Btoush & Hammad, 2015). Kurtanović tested an F/NFR binary supervised classifier that automatically classifies requirements as functional (FR) or nonfunctional (NFR). ...
Chapter
Full-text available
Artificial intelligence has the capacity to store, retrieve, and discover patterns in order to better inform action and process data. Writing quality requirements is crucial, since the majority of software projects entail rework and faults discovered during requirement gathering; poor requirement gathering leads to unsuccessful software projects. NLP is a term for the technology that allows computers to understand, analyze, and interpret human language. This review paper describes an approach for converting software requirements expressed in plain language into formal specifications. In software engineering, the classification of functional and nonfunctional requirements has become essential. Discovering and classifying functional (F) and nonfunctional (NF) requirements, and their subclasses, are the goals of this work. The purpose of this paper is to help people who produce application software, such as software designers, testers, etc. Faster software development and delivery are other benefits of this review paper. This paper identifies and addresses the difficulties requirements engineering faces while developing complicated AI systems.
... Eman S. Btoush and Mustafa M. Hammad conclude that ER elements can be extracted from natural language specifications using Natural Language Processing (NLP); natural language documents provide the source of knowledge for generating the ER data model. This approach shows that natural language processing techniques such as tokenization, POS tagging, chunking and parsing based on syntactic heuristic rules can be used to extract composite attributes, cardinalities, weak attributes, etc. [8]. The use of natural language, together with ambiguity detection and mistake identification in requirements elicitation methods such as interviews and requirements documents, as demonstrated by Habib, is effective [9]. ...
Article
In application development software, we need to model the data which is stored in a database. ER diagrams show what types of data are to be stored in the tables of a database, and an ER diagram can express the logical structure of the database graphically. A graphical representation of a system is more comprehensible and easier to understand. The purpose of this paper is to identify entities and their attributes from user requirements using NLP, to combine requirement gathering techniques, and to create a tool that elicits entities and their attributes. In this study, we use NLP and its components for collecting entities and their attributes. The main aim of this paper is to find entities and their attributes in order to generate diagrams such as ERDs. The purpose of this paper is to assist the business analyst and stakeholder in avoiding unnecessary work and complexity during the requirement analysis phase.
... used a slightly different approach, transforming NL into Object Constraint Language, a formal specification language used to define constraints for UML modelling. The method proposed by Btoush and Hammad (2015) uses a combined approach of Natural Language Processing (NLP) and rules to transform NL into entity-relationship diagrams. The extracted part-of-speech (POS) results are mapped onto entities, attributes, and relations to produce ER diagrams with a rule set. ...
Article
Full-text available
An interesting challenge in software requirements engineering is converting textual requirements to Business Process Model and Notation (BPMN) diagrams. In this study, the BPMN diagram is used as an intermediate representation before measuring the functional software size from Natural Language (NL) input. The methods currently used for converting NL input to BPMN diagrams are not able to generate complete BPMN diagrams, nor can they handle complex and compound-complex sentences in the NL input. This study proposes conversion from textual requirements to a BPMN diagram for improving the weaknesses of existing methods. The proposed method has two stages: 1) analyzing the textual requirements using natural language processing and 2) generating the BPMN diagram. The output of the first stage is fact types as the basis for generating the BPMN diagram in the second phase. The BPMN diagram is generated using a set of informal mapping rules that were created in this study. The proposed method was applied to ten textual requirements of an enterprise application, which involved simple, compound, complex, and compound-complex sentences. The experiments resulted in a suitable BPMN diagram with higher accuracy than obtained by other methods.
... 2) Technologies Used: We have manually extracted the technologies used from each of the papers in our primary studies. The extracted output types (with their studies, from a flattened table) include: [11], [14], [20], [45]; comparison [8], [54]-[56]; code [11], [17], [35]; use case [37], [57]; test case [35], [57]; processed SRS [58], [59]; activity diagram [7], [60]; meta model [57], [61]. Other outputs: traceability [44]; graph [62]; sbvr [63]; er-diagram [64]; processed named entity [65]; b-spec [66]; owl class [67]; proposal [68]; collaboration diagram [13]; test cases [57]; natural language [69]; feature diagram [70]; gui prototype [71] ...
... Studies (technologies and the works using them, from a flattened table): nlp [5], [8], [11], [12], [14], [15], [17]-[19], [25]-[31], [33], [36], [40], [42]-[47], [49], [51]-[53], [56]-[58], [60], [62], [63], [65], [68], [71]; rule [1], [15], [17], [21], [23], [24], [29]-[31], [33], [44], [49]; pos tagger [1], [5], [6], [15], [19], [25]-[28], [40]-[42], [53], [64]; parse [1], [13], [21], [28], [41], [42], [45], [46], [54], [64], [68]; stanford corenlp [1], [7], [11], [13]-[15], [17], [43], [45], [54], [62], [68]; ontology [1], [5], [12], [18], [27]-[29], [42], [65]; parser [13], [21], [28], [42], [45], [46], [54], [68]; wordnet [5], [11]-[13], [27], [28], [54], [59]; tree [34], [41], [42], [54], [64], [69]; gui [5], [18], [25], [47], [52], [59]; open nlp [12], [25], [27], [28], [68]; dependency [16], [22], [34], [45]; sbvr [15], [24]; graph [22], [34], [62]; ml [20], [36], [70]; ocl [20], [34], [57]; traceability [25], [26]. Other technologies: brill [5]; treetagger [42]; bayes [40]; featureide [70]; gkps [16]; javarap [13]; reasoning [19]; sharpnlp [52]; spacy [53]; spider [5]; verbnet [59]; semnet [38]; lolita [38]; rtool [37]. The POS parser and Stanford's parser were most commonly used for part-of-speech (POS) tagging. WordNet was most common for resolving ambiguity, but WSD and VerbNet were also used. ...
Preprint
Context: Processing Software Requirement Specifications (SRS) manually takes a much longer time for requirement analysts in software engineering. Researchers have been working on making an automatic approach to ease this task. Most of the existing approaches require some intervention from an analyst or are challenging to use. Some automatic and semi-automatic approaches were developed based on heuristic rules or machine learning algorithms. However, there are various constraints to the existing approaches of UML generation, such as restriction on ambiguity, length or structure, anaphora, incompleteness, atomicity of input text, requirements of domain ontology, etc. Objective: This study aims to better understand the effectiveness of existing systems and provide a conceptual framework with further improvement guidelines. Method: We performed a systematic literature review (SLR). We conducted our study selection into two phases and selected 70 papers. We conducted quantitative and qualitative analyses by manually extracting information, cross-checking, and validating our findings. Result: We described the existing approaches and revealed the issues observed in these works. We identified and clustered both the limitations and benefits of selected articles. Conclusion: This research upholds the necessity of a common dataset and evaluation framework to extend the research consistently. It also describes the significance of natural language processing obstacles researchers face. In addition, it creates a path forward for future research.
... Thus, we analysed the related works from 1999 onward. Some of them use translation rules based on lexical string patterns, such as ER generator [14], CABSYDD [15], ER-Converter [16], DBDT [17], ERD generator [18] and ER generator using NLP [19]. Some of the tools provide graphical solution support, such as ABCM [20] and KERMIT [21]. ...
Article
Full-text available
Automated creation of a conceptual data model based on user requirements expressed in the textual form of a natural language is a challenging research area. The complexity of natural language requires deep insight into the semantics buried in words, expressions, and string patterns. For the purpose of natural language processing, we created a corpus of business descriptions and an adherent lexicon containing all the words in the corpus. Thus, it was possible to define rules for the automatic translation of business descriptions into the entity–relationship (ER) data model. However, since the translation rules could not always lead to accurate translations, we created an additional classification process layer—a classifier which assigns to each input sentence some of the defined ER method classes. The classifier represents a formalized knowledge of the four data modelling experts. This rule-based classification process is based on the extraction of ER information from a given sentence. After the detailed description, the classification process itself was evaluated and tested using the standard multiclass performance measures: recall, precision and accuracy. The accuracy in the learning phase was 96.77% and in the testing phase 95.79%.
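For readers unfamiliar with the multiclass measures mentioned above, the sketch below shows one standard way to compute per-class precision and recall plus overall accuracy. The labels are invented examples; the cited study's actual classes and data are not reproduced here.

```python
# Per-class precision/recall and overall accuracy for a multiclass
# classifier, computed from paired true/predicted label lists.
from collections import Counter

def multiclass_metrics(y_true, y_pred):
    pairs = Counter(zip(y_true, y_pred))          # (true, predicted) counts
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    metrics = {}
    for c in set(y_true) | set(y_pred):
        tp = pairs[(c, c)]
        fp = sum(n for (t, p), n in pairs.items() if p == c and t != c)
        fn = sum(n for (t, p), n in pairs.items() if t == c and p != c)
        metrics[c] = {"precision": tp / (tp + fp) if tp + fp else 0.0,
                      "recall": tp / (tp + fn) if tp + fn else 0.0}
    return accuracy, metrics

# Invented toy labels, purely for illustration:
acc, per_class = multiclass_metrics(["entity", "relation", "entity"],
                                    ["entity", "entity", "entity"])
print(acc, per_class)
```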
... There are already systems that can generate an ERD from requirement specifications by applying Natural Language Processing (NLP); these ease ERD modeling for system analysts, database administrators, and other members of software development teams, because an ERD generator can draw the ERD based on the requirement specification [2]. Study [3] generated ERDs from requirement specifications in English, labeling word classes using Part of Speech (POS) tagging, chunking and parsing. Study [2] generated ERDs from specifications in English within a single domain, performing the ERD generation stages using NLP: text preprocessing, POS tagging, and an SVM classifier. ...
... Finally, tokenization is performed to split each sentence into small pieces, called words or tokens. Tokenization also identifies the words and numbers in each sentence [3]. Tokenization requires a word separator, usually a space, so that punctuation characters do not introduce ambiguity. ...
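As a minimal illustration of the tokenization step just described (not the tokenizer used in the cited study), a regex-based splitter that separates words, numbers, and punctuation might look like this:

```python
# Split a sentence into word, number, and punctuation tokens so that
# punctuation attached to words does not cause ambiguity.
import re

def tokenize(sentence):
    # \w+ captures words and numbers; [^\w\s] captures punctuation marks
    return re.findall(r"\w+|[^\w\s]", sentence)

print(tokenize("Employee has 2 attributes: id, name."))
# -> ['Employee', 'has', '2', 'attributes', ':', 'id', ',', 'name', '.']
```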
Article
Full-text available
Modeling an Entity Relationship Diagram (ERD) can be done manually, but obtaining an ERD model manually generally takes a long time, so an ERD generator working from requirement specifications is needed to ease ERD modeling. This study aims to develop a system that generates an ERD from requirement specifications written in Indonesian by applying several stages of Natural Language Processing (NLP), as required by the study. The requirement specifications used by the research team were collected using a document analysis technique. The NLP stages used by the researchers are: case folding, sentence segmentation, tokenization, POS tagging, chunking and parsing. The researchers then identified, using a rule-based method, the words in the processed text that qualify as ERD components, such as entities, attributes, primary keys and relationships. The ERD is then drawn using Graphviz based on the ERD components obtained. The generated ERDs were evaluated using an expert judgement method. From the evaluation over several case studies, the average precision, recall and F1 score obtained from each expert were, respectively: expert 1: 91%, 90%, 90%; expert 2: 90%, 90%, 90%; expert 3: 98%, 94%, 96%; expert 4: 93%, 93%, 93%; and expert 5: 98%, 83%, 90%.
... The tool automatically generates a relational database schema by converting the ER diagram. Further, in [4], E. S. Btoush and M. M. Hammad proposed a method to generate ER diagrams from requirement specifications based on natural language processing techniques. This approach opens the possibility of using natural language documents as a source of knowledge for generating ER data models. ...
Article
Aims: Database creation is the most critical component of the design and implementation of any software application. Generally, the process of creating the database from the requirement specification of a software application is believed to be extremely hard. This study presents a method to automatically generate database scripts from a given scenario description of the requirement specification. Study Design: The method is developed based on a set of natural language processing (NLP) techniques and a few algorithms. Standard database scenario descriptions presented in popular textbooks on Database Design are used for the validation of the method. Place and Duration of Study: Department of Statistics and Computer Science, Faculty of Science, University of Peradeniya, Sri Lanka, between December 2019 and December 2020. Methodology: The description of the problem scenario is processed using NLP operations such as tokenization, complex word handling, basic group handling, complex phrase handling, structure merging, and template construction to extract the necessary information required for the entity relational model. New algorithms are proposed to automatically convert the entity relational model to the logical schema and finally to the database script. The system can generate scripts for relational databases (RDB), object relational databases (ORDB) and Not Only SQL (NoSQL) databases. The proposed method is integrated into a web application where users can type the scenario in natural or free text. The user can select the type of database (i.e., one of RDB, ORDB, NoSQL) considered in their system, and accordingly the application generates the SQL scripts. Results: The proposed method was evaluated using 10 scenario descriptions connected to 10 different domains, such as company, university, airport, etc., for all three types of databases. The method performed with impressive accuracies of 82.5%, 84.0% and 83.5% for RDB, ORDB and NoSQL scripts, respectively. Conclusion: This study is mainly focused on the automatic generation of SQL scripts from scenario descriptions of the requirement specification of a software system. Overall, the developed method helps to speed up the database development process. Further, the developed web application provides a learning environment for people who are novices in database technology.
... Database Management Systems (DBMS), on the other hand, are specifically developed computer management systems that communicate with users and other systems and create database applications as a data store and data access tool that includes a strong database system (Amran et al., 2019). A DBMS also lets designers create practical and realistic conceptual models (Btoush & Hammad, 2015). Musa, Idowu & Zemba (2014) further stated that, in addition to the basic functionality of the database system, a GIS database has the ability to store, view, download and display spatial data (i.e. ...
... Chronologically, the ER model originates from Chen (1976), as cited by Bagui and Earp (2003), who introduced the model as a combination of three (3) basic components, namely entities, attributes, and relationships. According to Btoush & Hammad (2015), ER models are used to manage and track the databases of a system. An Entity Relationship Diagram is a common diagram that helps to show the database structure in a visualized format (Almasree, 2015). ...
... The extended ER diagram is presented in [8], introducing some new concepts of generalization and abstraction. Developing an ER diagram from the requirements is very often the first step in designing a database system, which is an important step of the Software Development Life Cycle (SDLC) [6,9,10]. ...
... Manual extraction of conceptual models is a tedious, time-consuming and error-prone task [10]. Automated transformation helps to maintain the traceability of the requirements [12]. ...
... Btoush and Hammad [10] presented an approach to extract the ER model. The first step is sentence segmentation. ...
Article
Context: Extracting conceptual models, e.g., the entity relationship model or the business process model, from a software requirement document is an essential task in the software development life cycle. A business process model presents a clear picture of the required system's functionality. Operations in the business process model, together with the data entities consumed, help software developers understand the database design and the operations to be implemented. Researchers have been aiming at the automatic extraction of these artefacts from the requirement document. Objective: In this paper, we present an automated approach to extract the entity relationship and business process models from requirements, which may be in different formats such as general requirements, use case specifications and user stories. Our approach is based on efficient natural language processing techniques. Method: It is an iterative approach of Models Extraction from the Requirements (iMER). iMER has multiple iterations, where each iteration addresses a sub-problem. In the first iteration, iMER extracts the data entities and attributes. The second iteration finds the relationships between data entities, while extracting cardinalities is the third step. The business process model is generated in the fourth iteration, containing the external (actors') and internal (system's) operations. Evaluation: To evaluate the performance and accuracy of iMER, experiments are conducted on various formats of requirement documents. Additionally, we have evaluated our approach using requirement documents that have been modified by shuffling the sentences and by merging with other requirements. A comparative study is also performed, and the preliminary results show a noticeable improvement. Conclusion: iMER is an efficient automated iterative approach that is able to extract the conceptual models from various formats of requirements.