Types of Information on Construction Tender Documents: a) Unstructured Information, b) Semi-Structured Information, and c) Structured Information

Source publication

Ontological-Based Information Extraction of Construction Tender Documents

Conference Paper

Full-text available

Jan 2011

Extracting potentially relevant information from unstructured, semi-structured, or structured information in construction tender documents is paramount for improving decision-making processes in tender evaluation. However, the various forms of information in tender documents make the information extraction process non-trivial. Manually...

Figure 1. Basic scattering types (a)-single-bounce, (b)-double-bounce,...

Figure 2. (a) Trihedral, (b) co-pol and (c) cross-pol polarimetric...

Figure 7. Average execution times of the SA procedure using CPU and GPU...

Figure 8. Results of the proposed SA-based and Arii decomposition: (a)...

Results of SA-based decomposition for synthetic polarimetric signatures...

Information Extraction from Satellite-Based Polarimetric SAR Data Using Simulated Annealing and SIRT Methods and GPU Processing

Article

Full-text available

Dec 2021

The main goal of this research was to propose a new method of polarimetric SAR data decomposition that will extract additional polarimetric information from the Synthetic Aperture Radar (SAR) images compared to other existing decomposition methods. Most of the current decomposition methods are based on scattering, covariance or coherence matrices d...

Instruct and Extract: Instruction Tuning for On-Demand Information Extraction

Conference Paper

Full-text available

Jan 2023

Information Extraction of Domain-Specific Business Documents with Limited Data

Conference Paper

Full-text available

Jun 2021

Semantic Frame Parsing for Information Extraction : the CALOR corpus

Conference Paper

Full-text available

May 2018

Understanding and Improving Drilled-Down Information Extraction from Online Data Visualizations for Screen-Reader Users

Conference Paper

Full-text available

Apr 2023

Multi-Class Document Classification Using Lexical Ontology-Based Deep Learning

Article

Full-text available

May 2023

With the recent growth of the Internet, the volume of data has also increased. In particular, the increase in the amount of unstructured data makes data difficult to manage. Classification is also needed in order to use the data for various purposes. Since it is difficult to manually classify the ever-increasing volume of data for various types of analysis and evaluation, automatic classification methods are needed. In addition, imbalanced and multi-class classification is a challenging task. As the number of classes increases, so does the number of decision boundaries a learning algorithm has to solve. Therefore, in this paper, an improved model is proposed that uses the WordNet lexical ontology and BERT to perform deeper learning on the features of text, thereby improving the classification performance of the model. It was observed that classification success increased when using WordNet 11 general lexicographer files based on synthesis sets, syntactic categories, and logical groupings. WordNet was used for feature dimension reduction. In experimental studies, word embedding methods were first used without dimension reduction. Afterwards, Random Forest (RF), Support Vector Machine (SVM) and Multi-Layer Perceptron (MLP) algorithms were employed to perform classification. These studies were then repeated with dimension reduction performed by WordNet. In addition to the machine learning models, experiments were also conducted with the pretrained BERT model with and without WordNet. The experimental results showed that, on an unstructured, seven-class, imbalanced dataset, the highest accuracy value of 93.77% was obtained when using our proposed model.

Tools for Internet Competitive Intelligence Based on Ontology

Conference Paper

Oct 2018

Viacheslav Lanin

Pre-Bid Clarification for Construction Project Risk Identification Using Unstructured Text Data Analysis

Conference Paper

Jul 2017

Automatic Conversion of Table Contents from PDF Technical Specification Documents into Database Using AI Optical Character Recognition (OCR)

Chapter

Sep 2022

This study aimed to develop a technology to recognize and warehouse text data from a table containing technical documents relating to construction and plant projects. To this end, a table optical character recognition (OCR) technology was proposed to tag text data by recognizing the structure of the table and the context of the content in the table. For analysis, the table format was first classified into two patterns: T1 and T2. T1 refers to a table with only one step of the header, and T2 refers to a table with two phases of the header. The table OCR model extracts text in cell units of the table using the OpenCV engine after extracting data from the headers. A training model improves text recognition rate through a long short-term memory (LSTM)-based Tesseract OCR engine. Extracted data from the table were stored in the DB and output in CSV format. The confusion matrix was applied to verify the recognition accuracy of the extracted data, and as a result of the verification, the F-measure value of T1 was 96%, and T2 was 87%. Therefore, from the outcome of this study, it is expected that the automated management of tasks that hitherto relied solely on the engineer’s experience will subsequently contribute to reducing the workload and improving the productivity of the engineer in charge.

Keywords: Automatic conversion of table contents; Table OCR; OpenCV; Tesseract OCR

Development of Automatic-Extraction Model of Poisonous Clauses in International Construction Contracts Using Rule-Based NLP

Article

May 2019
J COMPUT CIVIL ENG

As construction projects have significantly increased in size and become more complicated, the number of claims and dispute cases between participating parties during construction work has been continuously increasing. To prevent such claims and disputes, the participants need to be assured of their contractual positions and rights based on contract facts. For this reason, the process of writing and reviewing the contracts for construction work is crucial. Most international construction projects require contract management teams to review all the possible risks in the contracts during the bidding period. However, it is very difficult to review a vast number of contracts in a short period of time. Therefore, in this study, we proposed an automatic model of contract-risk extraction based on natural language processing (NLP) that can automatically detect the poisonous clauses of a contract in order to support contract management for construction companies (contractors). In validating the performance of the automatic model developed in this study, we found that the precision and recall were both 81.8% compared with manual review. This study is meaningful since a model has been developed that can carry out a preemptive contract-risk review.
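As a reading aid for the precision and recall figures quoted in this abstract, the following is a minimal sketch of how the two metrics are computed when extracted clauses are compared against a manual review. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive
    and false-negative counts, guarding against division by zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts (NOT the paper's data): 9 clauses flagged correctly,
# 2 flagged wrongly, and 2 risky clauses missed by the model.
p, r = precision_recall(tp=9, fp=2, fn=2)
print(f"precision={p:.1%} recall={r:.1%}")
```

Precision and recall are equal whenever the number of false positives equals the number of false negatives, which is one way a model can report the same value for both.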

Text Document Annotation Methods: Stat of Art

Conference Paper

Full-text available

Dec 2015

A huge number of documents are available that may cause information overload problems. Text document annotation provides a solution to such type of problems. Text annotation is the process of attaching comments, notes, or explanations to text documents.

