Figure 8 - uploaded by Eman Btoush
E-R Diagram for Library Management System 

Source publication
Article
Full-text available
An Entity Relationship (ER) data model is a high-level conceptual model that describes information as entities, attributes, and relationships. Entity relationship modeling is designed to facilitate database design. The abstract nature of Entity Relationship diagrams can be a discouraging task for designers and students alike. This paper deals with th...

Context in source publication

Context 1
... tokens set with its POS. Chunking usually selects a subset of the tokens together to indicate its type; it is an intermediate step towards full parsing. Natural language grammar is ambiguous and admits multiple possible analyses, so each sentence may have many potential parse trees. Most of them will seem easy to a human; however, it is difficult to decide which of them matches the specification. The parsing process therefore determines the parse tree of a given sentence. Sequences of words are transformed into structures that indicate how the sentence's units relate to each other. This step helps in identifying the main parts of a given sentence, such as object, subject, etc. Parsing examples are shown in Figure 6 and Figure 7. Some parsers assume the existence of a set of grammar rules in order to parse given sentences; recent parsers, however, are smart enough to infer the parse trees directly using complex statistical models [17]. Parsing analysis can extract nouns that play the role of entities or attributes, and verbs that act as relationships between entities. Cardinality and multiplicity information may also be extracted from determiners, adjectives, modal verbs and quantifiers. This paper used the Memory-Based Shallow Parser (MBSP) [8] as the parsing method. MBSP is a text analysis system that provides tools for tokenization, sentence splitting, part-of-speech tagging, chunking and relation finding. The proposed methodology is based on a set of identification rules that combine concepts from other works, as follows:
1. A common noun may indicate an entity type [5, 9].
2. A proper noun may indicate an entity [5, 9].
3. In the case of consecutive nouns, check the last noun: if it is not one of the words in the set J = [number, no, code, date, type, volume, birth, id, address, name], it may be an entity type; otherwise it may indicate an attribute type [22].
4. A gerund may indicate an entity type [5].
5. A specialization relationship, i.e., an "A is a B" sentence structure, can relate two nouns [23].
6. A noun such as "database", "record", "system", "information", "organization" or "detail" may not be considered a candidate entity type because it describes the business environment [22].
7. Every proper noun, such as a location name or person name, is ignored [21].
Attributes are nouns mentioned along with their entity; they may be preceded by the verbs has, have, or includes, which indicate that an entity is attributed with a property. For example, in "employee has id, name, and address", employee is detected as an entity, and id, name and address are detected as attributes. The following rules identify attributes in specifications:
1. A noun phrase with a genitive case may indicate an attribute [9].
2. If a noun is followed by another noun and the latter belongs to the set S = [number, no, code, date, type, volume, birth, id, address, name], this may indicate that both nouns together form an attribute; otherwise it may be an entity [22].
3. A noun such as "vehicle no", "group no", "person id" or "room type" refers to an attribute [24].
4. The possessive case usually shows ownership and may indicate an attribute type [9].
5. A noun phrase following "has/have" may indicate attribute types [24].
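The noun-based rules above lend themselves to a compact implementation. Below is a minimal illustrative sketch, assuming NLTK's tokenizer and POS tagger as a stand-in for the MBSP parser the paper actually uses; the function name classify_nouns and the set name ATTRIBUTE_WORDS are invented for this sketch, not taken from the paper.

```python
# Sketch of the consecutive-noun heuristic (entity rule 3 / attribute rule 2),
# assuming NLTK instead of MBSP. Requires: nltk.download('punkt') and
# nltk.download('averaged_perceptron_tagger').
import nltk

ATTRIBUTE_WORDS = {"number", "no", "code", "date", "type",
                   "volume", "birth", "id", "address", "name"}

def classify_nouns(sentence):
    """If the last noun of a consecutive-noun run is in ATTRIBUTE_WORDS,
    treat the run as an attribute; otherwise treat it as an entity."""
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    entities, attributes = [], []
    run = []                                # current run of consecutive nouns
    for word, pos in tagged + [("", ".")]:  # sentinel flushes the final run
        if pos.startswith("NN"):
            run.append(word.lower())
        elif run:
            target = attributes if run[-1] in ATTRIBUTE_WORDS else entities
            target.append(" ".join(run))    # e.g. "room type" vs. "employee"
            run = []
    return entities, attributes

# Reproduces the paper's example: employee is the entity;
# id, name and address are its attributes.
print(classify_nouns("The employee has an id, a name, and an address."))
```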
The main verb that occurs between two entities is most likely a relationship. Two entities can be separated by the main verb only, by a main verb and an auxiliary verb, or by a main verb and a modal verb. For example, in "The bank is branched into many branches", branched is detected as a relationship. The relationship rules are:
1. A transitive verb can indicate a relationship type [5].
2. A verb followed by a preposition such as "by", "to", "on" or "in" can indicate a relationship type [9].
3. If the verb is in the list {include, involve, consists of, contain, comprise, divided to, embrace}, this indicates a relationship of aggregation or composition [21].
4. An adverb can indicate an attribute of a relationship [5].
5. A verb followed by a preposition such as {on, in, by, to} could be a relationship, for example, "Persons work on projects". Other examples include "assigned to" and "managed by" [22].
Key attributes are identified by two further rules:
1. The adverb "uniquely" indicates the primary key of an entity [18].
2. If the sentence has the form {"Subject" + "Possessive verb" + "Adjective" + "Object"}, then the object is a key attribute [25].
The ER generator is a rule-based system that identifies ER relationships, ER entities and ER attributes [18]. Once all words have been assigned to their ER element types, the relevant information, namely which words are entities, relationships, cardinalities and attributes, is stored in text files. These text files are then used to generate the ER diagram. Figure 5 shows the prototype editor for the ER generating process; currently, the prototype is in the design stage. The ER generator is easy to use and understand, and the tool aims to require minimal human intervention during the process. Figure 8 shows the E-R diagram for the library management system example. Linguistic variation (incomplete knowledge) and ambiguity are the main problems in using NL. Part-of-speech tagging is also harder than just having a list of words and their parts of speech, since some parts of speech are complex or unspoken. The difficulty of accessing information in a given text stems from the complexity of natural language. NLP technologies are still some way from being able to understand information in unrestricted text. The heuristics approach suits small application domains rather than large ones; applying NLP to specific domain problems is more efficient and could make significant progress. There is also no standard approach for automatically recognizing objects and classes in English sentences. Moreover, the analysis process is the most critical and difficult task because most input scenarios are written in a natural language such as English or Arabic [20]. Generating multi-document text related to a domain problem using NLP is more difficult than generating a single document [19]. The most challenging task is parsing languages such as Arabic or Chinese, which have different linguistic properties compared to English. Generally, natural language processing is successful in meeting the syntactic challenges, but it still has a long way to go in the areas of semantics and pragmatics. Entity relationship modeling is a high-level data modeling technique that helps designers create useful and accurate conceptual models. Much research has attempted to apply Natural Language Processing (NLP) to extract knowledge from requirement specifications. Heuristics-based rules are used to parse the specifications. This approach pays particular attention to natural language processing techniques such as tokenization, POS tagging, chunking and parsing based on syntactic heuristic rules. The parsing result is a set of words with their parts of speech (POS); this result is fed into the ER generator, which identifies suitable data modeling elements according to the heuristics-based rules.
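As a companion to the rules above, the entity-verb-entity scan might be approximated as follows. This is a sketch under the assumption that entities have already been identified by the noun rules; find_relationship and AGGREGATION_VERBS are illustrative names, not the paper's actual ER generator code.

```python
# Sketch of the relationship heuristics: the main verb between two known
# entity mentions is taken as the relationship, and verbs from rule 3 mark
# aggregation/composition. Input is a list of (word, POS) pairs.
AGGREGATION_VERBS = {"include", "involve", "consist", "contain",
                     "comprise", "divide", "embrace"}

def find_relationship(tagged, entities):
    subject = verb = None
    for word, pos in tagged:
        w = word.lower()
        if w in entities:
            if subject is None:
                subject = w                 # first entity mention
            elif verb is not None:          # second entity after a verb
                kind = ("aggregation" if verb in AGGREGATION_VERBS
                        else "relationship")
                return (subject, verb, w, kind)
        elif pos.startswith("VB") and subject is not None:
            verb = w    # the last verb before the second entity wins,
                        # so auxiliaries like "is" are overwritten

tagged = [("The", "DT"), ("bank", "NN"), ("is", "VBZ"), ("branched", "VBN"),
          ("into", "IN"), ("many", "JJ"), ("branches", "NNS")]
print(find_relationship(tagged, {"bank", "branches"}))
# -> ('bank', 'branched', 'branches', 'relationship'), as in the bank example
```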
This approach gives the database designer an overview of the output of natural language processing. Moreover, it provides designers with detailed modeling information that helps them during database design. As future work, the use of NLP could be extended from structural analysis to semantic analysis in order to infer further elements such as composite attributes, cardinalities, weak attributes, etc. In addition, artificial intelligence (AI) techniques such as support vector machines (SVM) and neural networks could be applied for a better understanding of requirement specifications. Eman Btoush (emanbtoush26@gmail.com) received the B.S. in computer science from Mutah University, Mutah, Jordan, in 2002. She is currently an M.S. student in the Computer Science Department, Mutah University, Jordan. Mustafa Hammad is an Assistant Professor in the Information Technology Department at Mu'tah University, Al Karak, Jordan. He received his PhD in computer science from New Mexico State University, USA, in 2010. He received his Master's degree in computer science from Al-Balqa Applied University, Jordan, in 2005 and his B.Sc. in computer science from The Hashemite University, Jordan, in 2002. His research interest is Software Engineering with a focus on static and dynamic analysis and software ...

Similar publications

Article
Full-text available
Motivated by the need for flexible, intuitive, reusable, and normalized terminology for guiding search and building ontologies, we present a general approach for generating sets of such terminologies from natural language documents. The terms that this approach generates are root- and rule-based terms, generated by a series of rules designed to be...

Citations

... Therefore, ERD design is very important in system design because it can describe the data requirements of a system [12], [13]. Some of the components in this diagram that must be considered are (1) entities, which are the parts of the ERD that describe an object in the database; (2) attributes, which are characteristics possessed by entities; and (3) relationships, which are the parts that describe the relationships between entities [14], [15]. Based on Figure 6, several data must be recorded in the student professional education monitoring system, namely (a) lecturer data, (b) supervisor data, (c) student data, (d) announcement data, (e) logbook data, (f) SOAP data, (h) case data, and (i) report data. ...
... Eman S. Btoush and Mustafa M. Hammad come to the conclusion that natural language documents are the best source of information for creating the ER data model, because ER elements can be extracted from natural language requirements using Natural Language Processing (NLP). This method demonstrates how to extract composite attributes, cardinalities, weak attributes, etc. using natural language processing methods including tokenization, POS tagging, chunking, and parsing based on syntactic heuristic rules (Btoush & Hammad, 2015). Kurtanović tested an F/NFR binary supervised classifier that automatically classifies requirements as functional (FR) or nonfunctional (NFR). ...
Chapter
Full-text available
Artificial intelligence has the capacity to store, retrieve, and discover patterns in order to better inform action and process data. Writing quality requirements is crucial, since the majority of software projects entail rework and faults discovered during requirement gathering; poor requirement gathering leads to unsuccessful software projects. NLP is a term for the technology that allows computers to understand, analyze, and interpret human language. This review paper describes an approach for converting software requirements expressed in plain language into formal specifications. In software engineering, the classification of functional and nonfunctional requirements has become essential. Discovering and classifying functional (F) and nonfunctional (NF) requirements, and their subclasses, are the goals of this work. The purpose of this paper is to help people who produce application software, such as software designers, testers, etc. Faster software development and delivery are other benefits of this review paper. This paper identifies and addresses the difficulties requirements engineering faces while developing complicated AI systems.
... Eman S. Btoush and Mustafa M. Hammad conclude that ER elements can be extracted from natural language specifications using Natural Language Processing (NLP); natural language documents provide the source of knowledge for generating the ER data model. This approach shows that natural language processing techniques such as tokenization, POS tagging, chunking and parsing based on syntactic heuristic rules can be used to extract composite attributes, cardinalities, weak attributes, etc. [8]. The use of natural language, together with ambiguity detection and mistake identification in requirements elicitation methods such as interviews and requirements documents, as demonstrated by Habib, is effective [9]. ...
Article
In application development software, we need to model the data which is stored in a database. ER diagrams show what types of data are to be stored in the tables of a database, and an ER diagram can express the logical structure of the database graphically. A graphical representation of a system is more comprehensible and easier to understand. The purpose of this paper is to identify entities and their attributes from user requirements using NLP, to combine requirement gathering techniques, and to create a tool that elicits entities and their attributes. In this study, we use NLP and its components for collecting entities and their attributes. The main aim of this paper is to find entities and their attributes in order to generate diagrams such as ERDs. The purpose of this paper is to assist the business analyst and stakeholder in avoiding unnecessary work and complexity during the requirement analysis phase.
... used a slightly different approach, transforming NL into Object Constraint Language, a formal specification language used to define constraints for UML modelling. The method proposed by Btoush and Hammad (2015) uses a combined approach of Natural Language Processing (NLP) and rules to transform NL into entity-relationship diagrams. The extracted part-of-speech (POS) results are mapped onto entities, attributes, and relations to produce ER diagrams with a rule set. ...
Article
Full-text available
An interesting challenge in software requirements engineering is converting textual requirements to Business Process Model and Notation (BPMN) diagrams. In this study, the BPMN diagram is used as an intermediate representation before measuring the functional software size from Natural Language (NL) input. The methods currently used for converting NL input to BPMN diagrams are not able to generate complete BPMN diagrams, nor can they handle complex and compound-complex sentences in the NL input. This study proposes conversion from textual requirements to a BPMN diagram for improving the weaknesses of existing methods. The proposed method has two stages: 1) analyzing the textual requirements using natural language processing and 2) generating the BPMN diagram. The output of the first stage is fact types as the basis for generating the BPMN diagram in the second phase. The BPMN diagram is generated using a set of informal mapping rules that were created in this study. The proposed method was applied to ten textual requirements of an enterprise application, which involved simple, compound, complex, and compound-complex sentences. The experiments resulted in a suitable BPMN diagram with higher accuracy than obtained by other methods.
... 2) Technologies Used: We have manually extracted the technologies used from each of the papers in our primary studies. The extracted output types (with their studies, from a flattened table) include: [11], [14], [20], [45]; comparison [8], [54]-[56]; code [11], [17], [35]; use case [37], [57]; test case [35], [57]; processed SRS [58], [59]; activity diagram [7], [60]; meta model [57], [61]. Other outputs: traceability [44]; graph [62]; sbvr [63]; er-diagram [64]; processed named entity [65]; b-spec [66]; owl class [67]; proposal [68]; collaboration diagram [13]; test cases [57]; natural language [69]; feature diagram [70]; gui prototype [71] ...
... Studies (technologies and the works using them, from a flattened table): nlp [5], [8], [11], [12], [14], [15], [17]-[19], [25]-[31], [33], [36], [40], [42]-[47], [49], [51]-[53], [56]-[58], [60], [62], [63], [65], [68], [71]; rule [1], [15], [17], [21], [23], [24], [29]-[31], [33], [44], [49]; pos tagger [1], [5], [6], [15], [19], [25]-[28], [40]-[42], [53], [64]; parse [1], [13], [21], [28], [41], [42], [45], [46], [54], [64], [68]; stanford corenlp [1], [7], [11], [13]-[15], [17], [43], [45], [54], [62], [68]; ontology [1], [5], [12], [18], [27]-[29], [42], [65]; parser [13], [21], [28], [42], [45], [46], [54], [68]; wordnet [5], [11]-[13], [27], [28], [54], [59]; tree [34], [41], [42], [54], [64], [69]; gui [5], [18], [25], [47], [52], [59]; open nlp [12], [25], [27], [28], [68]; dependency [16], [22], [34], [45]; sbvr [15], [24]; graph [22], [34], [62]; ml [20], [36], [70]; ocl [20], [34], [57]; traceability [25], [26]. Other technologies: brill [5]; treetagger [42]; bayes [40]; featureide [70]; gkps [16]; javarap [13]; reasoning [19]; sharpnlp [52]; spacy [53]; spider [5]; verbnet [59]; semnet [38]; lolita [38]; rtool [37]. The POS parser and Stanford's parser were most commonly used for part-of-speech (POS) tagging. WordNet was most common for resolving ambiguity, but WSD and VerbNet were also used. ...
Preprint
Context: Processing Software Requirement Specifications (SRS) manually takes a much longer time for requirement analysts in software engineering. Researchers have been working on making an automatic approach to ease this task. Most of the existing approaches require some intervention from an analyst or are challenging to use. Some automatic and semi-automatic approaches were developed based on heuristic rules or machine learning algorithms. However, there are various constraints to the existing approaches of UML generation, such as restriction on ambiguity, length or structure, anaphora, incompleteness, atomicity of input text, requirements of domain ontology, etc. Objective: This study aims to better understand the effectiveness of existing systems and provide a conceptual framework with further improvement guidelines. Method: We performed a systematic literature review (SLR). We conducted our study selection into two phases and selected 70 papers. We conducted quantitative and qualitative analyses by manually extracting information, cross-checking, and validating our findings. Result: We described the existing approaches and revealed the issues observed in these works. We identified and clustered both the limitations and benefits of selected articles. Conclusion: This research upholds the necessity of a common dataset and evaluation framework to extend the research consistently. It also describes the significance of natural language processing obstacles researchers face. In addition, it creates a path forward for future research.
... Thus, we analysed the related works from 1999 onward. Some of them use translation rules based on lexical string patterns, such as ER generator [14], CABSYDD [15], ER-Converter [16], DBDT [17], ERD generator [18] and ER generator using NLP [19]. Some of the tools provide graphical solution support, such as ABCM [20] and KERMIT [21]. ...
Article
Full-text available
Automated creation of a conceptual data model based on user requirements expressed in the textual form of a natural language is a challenging research area. The complexity of natural language requires deep insight into the semantics buried in words, expressions, and string patterns. For the purpose of natural language processing, we created a corpus of business descriptions and an adherent lexicon containing all the words in the corpus. Thus, it was possible to define rules for the automatic translation of business descriptions into the entity–relationship (ER) data model. However, since the translation rules could not always lead to accurate translations, we created an additional classification process layer—a classifier which assigns to each input sentence some of the defined ER method classes. The classifier represents a formalized knowledge of the four data modelling experts. This rule-based classification process is based on the extraction of ER information from a given sentence. After the detailed description, the classification process itself was evaluated and tested using the standard multiclass performance measures: recall, precision and accuracy. The accuracy in the learning phase was 96.77% and in the testing phase 95.79%.
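For readers unfamiliar with the multiclass measures mentioned above, the sketch below shows one standard way to compute per-class precision and recall plus overall accuracy. The labels are invented examples; the cited study's actual classes and data are not reproduced here.

```python
# Per-class precision/recall and overall accuracy for a multiclass
# classifier, computed from paired true/predicted label lists.
from collections import Counter

def multiclass_metrics(y_true, y_pred):
    pairs = Counter(zip(y_true, y_pred))          # (true, predicted) counts
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    metrics = {}
    for c in set(y_true) | set(y_pred):
        tp = pairs[(c, c)]
        fp = sum(n for (t, p), n in pairs.items() if p == c and t != c)
        fn = sum(n for (t, p), n in pairs.items() if t == c and p != c)
        metrics[c] = {"precision": tp / (tp + fp) if tp + fp else 0.0,
                      "recall": tp / (tp + fn) if tp + fn else 0.0}
    return accuracy, metrics

# Invented toy labels, purely for illustration:
acc, per_class = multiclass_metrics(["entity", "relation", "entity"],
                                    ["entity", "entity", "entity"])
print(acc, per_class)
```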
... There are already systems that can generate an ERD from requirement specifications by applying Natural Language Processing (NLP); these ease ERD modeling for system analysts, database administrators, and other members of software development teams, because an ERD generator can draw the ERD based on the requirement specification [2]. Study [3] generated ERDs from requirement specifications in English, labeling word classes using Part of Speech (POS) tagging, chunking and parsing. Study [2] generated ERDs from specifications in English within a single domain, performing the ERD generation stages using NLP: text preprocessing, POS tagging, and an SVM classifier. ...
... Finally, tokenization is performed to split each sentence into small pieces, called words or tokens. Tokenization also identifies the words and numbers in each sentence [3]. Tokenization requires a word separator, usually a space, so that punctuation characters do not introduce ambiguity. ...
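As a minimal illustration of the tokenization step just described (not the tokenizer used in the cited study), a regex-based splitter that separates words, numbers, and punctuation might look like this:

```python
# Split a sentence into word, number, and punctuation tokens so that
# punctuation attached to words does not cause ambiguity.
import re

def tokenize(sentence):
    # \w+ captures words and numbers; [^\w\s] captures punctuation marks
    return re.findall(r"\w+|[^\w\s]", sentence)

print(tokenize("Employee has 2 attributes: id, name."))
# -> ['Employee', 'has', '2', 'attributes', ':', 'id', ',', 'name', '.']
```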
Article
Full-text available
Modeling an Entity Relationship Diagram (ERD) can be done manually, but obtaining an ERD model manually generally takes a long time, so an ERD generator working from requirement specifications is needed to ease ERD modeling. This study aims to develop a system that generates an ERD from requirement specifications written in Indonesian by applying several stages of Natural Language Processing (NLP), as required by the study. The requirement specifications used by the research team were collected using a document analysis technique. The NLP stages used by the researchers are: case folding, sentence segmentation, tokenization, POS tagging, chunking and parsing. The researchers then identified, using a rule-based method, the words in the processed text that qualify as ERD components, such as entities, attributes, primary keys and relationships. The ERD is then drawn using Graphviz based on the ERD components obtained. The generated ERDs were evaluated using an expert judgement method. From the evaluation over several case studies, the average precision, recall and F1 score obtained from each expert were, respectively: expert 1: 91%, 90%, 90%; expert 2: 90%, 90%, 90%; expert 3: 98%, 94%, 96%; expert 4: 93%, 93%, 93%; and expert 5: 98%, 83%, 90%.
... The tool automatically generates a relational database schema by converting the ER diagram. Further, in [4], E. S. Btoush and M. M. Hammad proposed a method to generate ER diagrams from requirement specifications based on natural language processing techniques. This approach opens the possibility of using natural language documents as a source of knowledge for generating ER data models. ...
Article
Aims: Database creation is the most critical component of the design and implementation of any software application. Generally, the process of creating the database from the requirement specification of a software application is believed to be extremely hard. This study presents a method to automatically generate database scripts from a given scenario description of the requirement specification. Study Design: The method is developed based on a set of natural language processing (NLP) techniques and a few algorithms. Standard database scenario descriptions presented in popular textbooks on Database Design are used for the validation of the method. Place and Duration of Study: Department of Statistics and Computer Science, Faculty of Science, University of Peradeniya, Sri Lanka, between December 2019 and December 2020. Methodology: The description of the problem scenario is processed using NLP operations such as tokenization, complex word handling, basic group handling, complex phrase handling, structure merging, and template construction to extract the necessary information required for the entity relational model. New algorithms are proposed to automatically convert the entity relational model to the logical schema and finally to the database script. The system can generate scripts for relational databases (RDB), object relational databases (ORDB) and Not Only SQL (NoSQL) databases. The proposed method is integrated into a web application where users can type the scenario in natural or free text. The user can select the type of database (i.e., one of RDB, ORDB, NoSQL) considered in their system, and accordingly the application generates the SQL scripts. Results: The proposed method was evaluated using 10 scenario descriptions connected to 10 different domains, such as company, university, airport, etc., for all three types of databases. The method performed with impressive accuracies of 82.5%, 84.0% and 83.5% for RDB, ORDB and NoSQL scripts, respectively. Conclusion: This study is mainly focused on the automatic generation of SQL scripts from scenario descriptions of the requirement specification of a software system. Overall, the developed method helps to speed up the database development process. Further, the developed web application provides a learning environment for people who are novices in database technology.
... Database Management Systems (DBMS), on the other hand, are specifically developed computer management systems that communicate with users and other systems and create database applications as a data store and data access tool that includes a strong database system (Amran et al., 2019). A DBMS also lets designers create practical and realistic conceptual models (Btoush & Hammad, 2015). Musa, Idowu & Zemba (2014) further stated that, in addition to the basic functionality of the database system, a GIS database has the ability to store, view, download and display spatial data (i.e. ...
... Chronologically, the ER model originates from Chen (1976), as cited by Bagui and Earp (2003), who introduced the model as a combination of three (3) basic components, namely entities, attributes, and relationships. According to Btoush & Hammad (2015), ER models are used to manage and track the databases of a system. An Entity Relationship Diagram is a common diagram that helps to show the database structure in a visualized format (Almasree, 2015). ...
... The extended ER diagram is presented in [8], introducing some new concepts of generalization and abstraction. Developing an ER diagram from the requirements is very often the first step in designing a database system, which is an important step of the Software Development Life Cycle (SDLC) [6,9,10]. ...
... Manual extraction of conceptual models is a tedious, time-consuming and error-prone task [10]. Automated transformation helps to maintain the traceability of the requirements [12]. ...
... Btoush and Hammad [10] presented an approach to extract the ER model. The first step is sentence segmentation. ...
Article
Context: Extracting conceptual models, e.g., the entity relationship model or the business process model, from a software requirement document is an essential task in the software development life cycle. A business process model presents a clear picture of the required system's functionality. Operations in the business process model, together with the data entities consumed, help software developers understand the database design and the operations to be implemented. Researchers have been aiming at the automatic extraction of these artefacts from the requirement document. Objective: In this paper, we present an automated approach to extract the entity relationship and business process models from requirements, which may be in different formats such as general requirements, use case specifications and user stories. Our approach is based on efficient natural language processing techniques. Method: It is an iterative approach of Models Extraction from the Requirements (iMER). iMER has multiple iterations, where each iteration addresses a sub-problem. In the first iteration, iMER extracts the data entities and attributes. The second iteration finds the relationships between data entities, while extracting cardinalities is the third step. The business process model is generated in the fourth iteration, containing the external (actors') and internal (system's) operations. Evaluation: To evaluate the performance and accuracy of iMER, experiments are conducted on various formats of requirement documents. Additionally, we have evaluated our approach using requirement documents that have been modified by shuffling the sentences and by merging with other requirements. A comparative study is also performed, and the preliminary results show a noticeable improvement. Conclusion: iMER is an efficient automated iterative approach that is able to extract the conceptual models from various formats of requirements.