Fig 4 - uploaded by Rosmayati Mohemad
Types of Information on Construction Tender Documents: a) Unstructured Information, b) Semi-Structured Information, and c) Structured Information

Source publication
Conference Paper
Extracting potentially relevant information from unstructured, semi-structured, or structured information in construction tender documents is paramount to improving decision-making in tender evaluation. However, the various forms of information in tender documents make the information extraction process non-trivial. Manually...

Similar publications

Article
The main goal of this research was to propose a new method of polarimetric SAR data decomposition that will extract additional polarimetric information from the Synthetic Aperture Radar (SAR) images compared to other existing decomposition methods. Most of the current decomposition methods are based on scattering, covariance or coherence matrices d...

Citations

... A study on this subject that classified clauses, entire lines, and sentences in legal text analytics work on contracts used smaller datasets and focused on fewer classes [13,14]. In another study, a method for extracting specific entities related to market analysis was presented using a domain ontology [15]. ...
Article
With the recent growth of the Internet, the volume of data has also increased. In particular, the growing amount of unstructured data makes data difficult to manage. Classification is also needed so that the data can be used for various purposes. Since it is difficult to manually classify the ever-increasing volume of data for various types of analysis and evaluation, automatic classification methods are needed. In addition, achieving good performance in imbalanced, multi-class classification is challenging: as the number of classes increases, so does the number of decision boundaries a learning algorithm has to learn. Therefore, this paper proposes an improved model that uses the WordNet lexical ontology and BERT to learn deeper features of the text, thereby improving classification performance. Classification success was observed to increase when using the 11 general WordNet lexicographer files, which are organized by synonym sets (synsets), syntactic categories, and logical groupings. WordNet was used for feature dimension reduction. In the experimental studies, word embedding methods were first used without dimension reduction, and Random Forest (RF), Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP) algorithms were then employed for classification. These experiments were repeated with dimension reduction performed by WordNet. In addition to the machine learning models, experiments were also conducted with the pretrained BERT model, with and without WordNet. The experimental results showed that, on an unstructured, seven-class, imbalanced dataset, the proposed model achieved the highest accuracy, 93.77%.
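The dimension-reduction idea can be illustrated with a toy example: instead of one feature per word, each word is mapped to a coarse WordNet lexicographer category (such as `noun.artifact` or `verb.creation`), so a document vector has one slot per category. This is a minimal sketch under stated assumptions, not the authors' model: the `TOY_LEXNAMES` dictionary is hypothetical and stands in for a real WordNet lookup (in NLTK, `Synset.lexname()` returns such a label), and no BERT or classifier stage is included.

```python
from collections import Counter

# Hypothetical word -> lexicographer-file mapping; a stand-in for a real
# WordNet lookup (e.g. the lexname of a word's first synset in NLTK).
TOY_LEXNAMES = {
    "contractor": "noun.person",
    "engineer": "noun.person",
    "bridge": "noun.artifact",
    "crane": "noun.artifact",
    "build": "verb.creation",
    "deliver": "verb.possession",
}

# Fixed, ordered feature space: one dimension per category, not per word.
CATEGORIES = sorted(set(TOY_LEXNAMES.values()))


def lexname_features(tokens):
    """Count tokens per category; words missing from the mapping are ignored."""
    counts = Counter(TOY_LEXNAMES[t] for t in tokens if t in TOY_LEXNAMES)
    return [counts[c] for c in CATEGORIES]


doc = "the contractor will build the bridge with a crane".split()
print(CATEGORIES)             # ['noun.artifact', 'noun.person', 'verb.creation', 'verb.possession']
print(lexname_features(doc))  # [2, 1, 1, 0]
```

Six distinct vocabulary words collapse into four features here; vectors of this kind could then be passed to an RF, SVM, or MLP classifier in place of per-word features.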
... In [10], the authors proposed an ontology-based mechanism for extracting information from construction tender documents. In the first step of this mechanism, tender documents are collected in a repository; they are then processed by an information extraction tool, transformed into a machine-readable format, and added to a knowledge base. ...
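The collect, extract, transform, and store steps described for [10] can be sketched as a small pipeline. This is a hypothetical illustration, not the mechanism from [10]: the three-concept `ONTOLOGY`, the keyword matching, and the `(document, concept, sentence)` triple format are all assumptions chosen to keep the example self-contained.

```python
from dataclasses import dataclass, field

# Hypothetical mini-ontology: concept -> surface terms that signal it.
ONTOLOGY = {
    "ProjectDuration": ["completion period", "duration"],
    "BidSecurity": ["bid bond", "bid security"],
    "Location": ["site location", "located at"],
}


@dataclass
class KnowledgeBase:
    # Machine-readable facts stored as (document, concept, sentence) triples.
    triples: list = field(default_factory=list)

    def add(self, triple):
        self.triples.append(triple)


def collect(documents):
    """Step 1: gather tender documents into a repository (here, a dict)."""
    return {doc_id: text for doc_id, text in documents}


def extract(text):
    """Step 2: tag each sentence with any concept whose term it mentions.
    Splitting on '.' is a simplification that would break on decimals."""
    for sentence in text.split("."):
        lowered = sentence.lower()
        for concept, terms in ONTOLOGY.items():
            if any(term in lowered for term in terms):
                yield concept, sentence.strip()


def build_kb(documents):
    """Steps 3-4: transform matches into triples and add them to the knowledge base."""
    kb = KnowledgeBase()
    for doc_id, text in collect(documents).items():
        for concept, sentence in extract(text):
            kb.add((doc_id, concept, sentence))
    return kb


kb = build_kb([("T-001", "The completion period is 18 months. A bid bond of 2% is required.")])
for triple in kb.triples:
    print(triple)
```

A real system would replace the keyword lookup with ontology-driven extraction and the list with a proper triple store; the sketch only shows how the stages hand data to one another.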
... The construction bidding process involves a large amount of text-based information in documents (Mohemad et al. 2011). Moreover, because the construction bid period is usually too short to review complicated documents thoroughly, bidders can easily fail to resolve, within the limited time, many of the uncertainties and risks associated with the project that are hidden in the bid documents. ...
... However, it is not easy to review vast amounts of bid documents during a short bidding period. The current approach to reviewing bid documents is impractical and time-consuming when evaluators need to identify, aggregate, and synthesize salient information on these criteria manually (Mohemad et al. 2011). ...
Chapter
This study aimed to develop a technology to recognize and warehouse text data from tables in technical documents relating to construction and plant projects. To this end, a table optical character recognition (OCR) technology was proposed that tags text data by recognizing the structure of the table and the context of its content. For analysis, table formats were first classified into two patterns, T1 and T2: T1 refers to a table with a single header level, and T2 to a table with a two-level header. The table OCR model extracts the header data first and then extracts text cell by cell using the OpenCV engine. A trained model based on the long short-term memory (LSTM)-based Tesseract OCR engine improves the text recognition rate. The data extracted from the tables were stored in a database (DB) and output in CSV format. A confusion matrix was applied to verify the recognition accuracy of the extracted data; the resulting F-measure was 96% for T1 and 87% for T2. Based on these results, automating tasks that have hitherto relied solely on the engineer's experience is expected to reduce the workload and improve the productivity of the engineers in charge.
Keywords: Automatic conversion of table contents; Table OCR; OpenCV; Tesseract OCR
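The difference between T1 and T2 tables matters most when the recognized cells are written out as CSV, because a two-level header has to be flattened into single column names. The sketch below assumes the OCR step (OpenCV cell detection plus Tesseract) has already produced a grid of cell strings; it does not reproduce that step. The `table_to_csv` function and the convention that an empty top-header cell is merged with the cell to its left are hypothetical choices, not taken from the study.

```python
import csv
import io


def table_to_csv(grid, header_levels=1):
    """Turn a grid (list of rows) of OCR'd cell texts into CSV text.

    header_levels=1 models a T1 table (one header row). header_levels=2 models
    a T2 table: an empty cell in the top header row is assumed to be merged
    with its left neighbour, and an empty cell in the second row is assumed
    to sit under a vertically merged top cell.
    """
    if header_levels == 1:
        header, body = grid[0], grid[1:]
    elif header_levels == 2:
        parents, children = grid[0], grid[1]
        filled, current = [], ""
        for cell in parents:
            current = cell or current  # carry a merged parent cell rightward
            filled.append(current)
        # Combine "Parent / Child"; fall back to whichever part is present.
        header = [f"{p} / {c}" if p and c and p != c else (c or p)
                  for p, c in zip(filled, children)]
        body = grid[2:]
    else:
        raise ValueError("only one- or two-level headers are handled")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(body)
    return out.getvalue()


t2_grid = [
    ["Item", "Dimensions", ""],        # top header row; "" = merged with left
    ["", "Length (m)", "Width (m)"],   # second header row
    ["Beam B1", "6.0", "0.3"],
]
print(table_to_csv(t2_grid, header_levels=2))
```

For this input the CSV header becomes `Item,Dimensions / Length (m),Dimensions / Width (m)`. Real scanned tables would also need explicit merged-cell detection, since an empty cell can be either a merge or a genuinely blank entry.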
Article
As construction projects have grown significantly in size and complexity, the number of claims and disputes between participating parties during construction has continued to increase. To prevent such claims and disputes, participants need to be assured of their contractual positions and rights based on contract facts. For this reason, writing and reviewing construction contracts is crucial. Most international construction projects require contract management teams to review all possible risks in the contracts during the bidding period. However, it is very difficult to review a vast number of contracts in a short time. Therefore, in this study, we proposed an automatic contract-risk extraction model based on natural language processing (NLP) that can automatically detect poisonous clauses in contracts, in order to support contract management for construction companies (contractors). In validating the model, we found that its precision and recall were both 81.8% compared with manual review. This study is meaningful in that it provides a model that can carry out a preemptive contract-risk review.
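Precision and recall "compared with manual review" treat the manually flagged clauses as ground truth. With $F_1 = \frac{2PR}{P+R}$, equal precision and recall give $F_1 = \frac{2P^2}{2P} = P$, so the reported figures also correspond to an $F_1$ of 81.8%. The sketch below shows the calculation; the clause IDs are hypothetical and do not come from the study.

```python
def precision_recall_f1(predicted, reference):
    """Score automatically flagged clause IDs against a manual review."""
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)  # clauses flagged by both model and reviewer
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical clause IDs, for illustration only.
model_flags = {"4.1", "7.3", "12.2", "15.6", "20.1"}
manual_flags = {"4.1", "7.3", "12.2", "15.6", "18.4"}
p, r, f = precision_recall_f1(model_flags, manual_flags)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")  # precision=0.800 recall=0.800 f1=0.800
```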
Conference Paper
A huge number of documents are available that may cause information overload problems. Text document annotation provides a solution to this type of problem. Text annotation is the process of attaching comments, notes, or explanations to text documents.
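One common way to attach notes to a document without altering its text is stand-off annotation, where each note records the character span it refers to. The sketch below is a hypothetical illustration of that idea; the `Annotation` class and the `annotate` helper are assumptions, not the representation used in the publication above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int  # character offset where the annotated span begins
    end: int    # offset one past the end of the span
    note: str   # comment, note, or explanation attached to the span


def annotate(text, phrase, note):
    """Attach a note to the first occurrence of `phrase`; `text` is left unchanged."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found")
    return Annotation(start, start + len(phrase), note)


doc = "Tenders must be submitted before the closing date."
ann = annotate(doc, "closing date", "Check the exact date in the tender notice.")
print(doc[ann.start:ann.end], "->", ann.note)
```

Keeping the notes separate from the text allows several independent annotation layers over the same document.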