Table 2 - uploaded by Tanik Saikh
Different sets of features and the corresponding models.

Source publication
Conference Paper
In this paper we propose a novel approach to determining the Textual Entailment (TE) relation between a pair of text expressions. Different machine translation and summary evaluation metrics, along with a polarity feature, are used as features for different machine learning classifiers to make the entailment decision in this study. We consider three ma...

Context in source publication

Context 1
... use the implementation available in the Weka toolkit. The classifiers are trained with the features discussed earlier and summarized in Table 2. The classifier assigns a prediction class to each T-H pair in the test dataset, whose classes are unknown. ...
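The pipeline in this snippet (evaluation-metric scores as features, a trained classifier assigning a class to each T-H pair) can be sketched as follows. The feature values, the label set, and the use of scikit-learn in place of Weka are all illustrative assumptions, not the paper's actual setup:

```python
# Hypothetical sketch of the setup described above, with scikit-learn
# standing in for Weka. All values below are made up for illustration.
from sklearn.svm import SVC

# Each T-H pair is represented by a vector of metric scores,
# e.g. [BLEU, METEOR, TER, ROUGE, polarity] (assumed feature order).
train_features = [
    [0.42, 0.55, 0.30, 0.61, 1.0],  # entailing pair
    [0.10, 0.18, 0.75, 0.22, 0.0],  # non-entailing pair
    [0.38, 0.49, 0.35, 0.58, 1.0],
    [0.05, 0.12, 0.80, 0.15, 0.0],
]
train_labels = ["YES", "NO", "YES", "NO"]

clf = SVC(kernel="linear")
clf.fit(train_features, train_labels)

# The trained classifier assigns a prediction class to an unseen T-H pair.
test_features = [[0.40, 0.50, 0.33, 0.60, 1.0]]
print(clf.predict(test_features)[0])
```

Any off-the-shelf classifier (naive Bayes, decision trees, SVM) fits this pattern; only the feature extraction is task-specific.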

Citations

... Methods such as BLEU, ROUGE, METEOR, and CIDEr are mainly used to analyze the performance of generative models; these methods evaluate a model by comparing the similarity between texts [67]. They are widely used in machine translation, question generation, and other fields [68,69]. The model proposed in this study is a discriminative model, so precision, recall, and the F1 score are often used as indicators to evaluate model performance [46,70]: ...
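The precision, recall, and F1 indicators mentioned here reduce to simple ratios over a classifier's confusion counts; a minimal sketch with made-up counts:

```python
# Precision / recall / F1 from raw confusion counts of a binary classifier.
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Example: 80 true positives, 20 false positives, 10 false negatives.
p, r, f = precision_recall_f1(80, 20, 10)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.8 0.889 0.842
```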
Article
Safety hazards are a key consideration in construction management. The efficient recognition of safety hazard information can help managers formulate safety hazard management measures and improve the efficiency of construction safety management. However, construction site safety hazard data are stored in semistructured and unstructured text formats, which cannot be directly converted into understandable and usable information. Moreover, safety hazard text contains many fuzzy expressions, thereby increasing the difficulty of text semantic analysis; thus, how to accurately mine safety hazard information from complex and diverse text data is an urgent problem that must be solved. In consideration of this problem, we propose a bidirectional long short-term memory (BiLSTM) method with a fuzzy word vector and self-attention mechanism (FSABiLSTM) to automatically recognize safety hazard information. This method adopts TextRank and Word2vec to calculate the fuzzy word vector and process fuzzy expressions in safety hazard text. The safety hazard text semantic features are deeply extracted based on BiLSTM and a fuzzy word vector, and the extracted semantic features are analyzed via a self-attention mechanism. Actual construction safety hazard text is used to verify the reliability and applicability of the method, and the results indicate that the accuracy of this method, which outperforms existing machine learning methods, is 91.70%. In addition, the FSABiLSTM method can be used to automatically evaluate the risk degree of safety hazards; this use is beneficial to managing and controlling safety hazards. Concerning safety hazard text data, this study provides a new deep mining approach that can enhance safety management efficiency.
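The self-attention step described in this abstract (scoring the BiLSTM's hidden states and pooling them into a single text representation) can be illustrated with a small NumPy sketch. The shapes, random weights, and additive scoring form are illustrative assumptions, not the actual FSABiLSTM parameters:

```python
import numpy as np

# Illustrative stand-in for BiLSTM hidden states: 6 time steps, size 8.
rng = np.random.default_rng(0)
T, d = 6, 8
H = rng.standard_normal((T, d))

# Assumed additive attention parameters (random for the sketch).
W = rng.standard_normal((d, d))
v = rng.standard_normal(d)

scores = np.tanh(H @ W) @ v                     # one score per time step
alpha = np.exp(scores) / np.exp(scores).sum()   # softmax attention weights
context = alpha @ H                             # weighted sum of states

print(alpha.shape, context.shape)  # (6,) (8,)
```

The resulting `context` vector would then feed a classification layer; the fuzzy word vectors of the paper would enter upstream, at the embedding stage.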
... This can be done through rating individual responses (Radziwill and Benton, 2017; Christensen et al., 2018; Sordoni et al., 2015), scoring aggregated qualities across the multi-turn exchange, or through ranking overall conversational experiences (Deriu et al., 2020; Li et al., 2019; Shieber, 1994). Alternatively, text-overlap-based metrics such as BLEU, METEOR and ROUGE (Sordoni et al., 2015; Saikh et al., 2018) and F-score metrics (Cuayáhuitl et al., 2019) have also been proposed to assess chatbots. Other assessment methods (Yang et al., 2022) include sentence perplexity (Dhyani and Kumar, 2021; John et al., 2017; Higashinaka et al., 2014), entities per exchange (Finch and Choi, 2020), number of questions raised, specificity (Li et al., 2016), turns per conversation (Shum et al., 2018), inconsistency detection, and relevance to history. ...
Preprint
The emergence of pretrained large language models has led to the deployment of a range of social chatbots for chitchat. Although these chatbots demonstrate language ability and fluency, they are not guaranteed to be engaging and can struggle to retain users. This work investigates the development of social chatbots that prioritize user engagement to enhance retention, specifically examining the use of human feedback to efficiently develop highly engaging chatbots. The proposed approach uses automatic pseudo-labels collected from user interactions to train a reward model that can be used to reject low-scoring sample responses generated by the chatbot model at inference time. Intuitive evaluation metrics, such as mean conversation length (MCL), are introduced as proxies to measure the level of engagement of deployed chatbots. A/B testing on groups of 10,000 new daily chatbot users on the Chai Research platform shows that this approach increases the MCL by up to 70%, which translates to a more than 30% increase in user retention for a GPT-J 6B model. Future work aims to use the reward model to realise a data fly-wheel, where the latest user conversations can be used to alternately fine-tune the language model and the reward model.
... The works in [33,34] posed machine learning based approaches using conventional similarity metrics (cosine similarity, Jaccard, Dice, etc.), along with machine translation (MT) evaluation metrics (BLEU [35] and METEOR [36,37]) as features, on the RTE-1 to RTE-3 datasets and on Indian languages (Tamil, Telugu, Hindi, and Punjabi), respectively. The work defined in [38] made use of three MT evaluation metrics (BLEU, METEOR, and TER [39]) and one summary evaluation metric (ROUGE [40]), along with a polarity (negation) feature, on RTE-1, RTE-2, RTE-3, RTE-4, and RTE-5, and obtained reasonable output compared to the best-performing results in those tracks. The takeaway was that there is a correlation between MT evaluation and TE. ...
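The conventional similarity metrics named here (Jaccard and Dice over the word sets of a T-H pair) are simple set-overlap ratios; a minimal sketch with an illustrative pair:

```python
# Jaccard and Dice coefficients over the word sets of a text/hypothesis pair.
def jaccard(a, b):
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def dice(a, b):
    a, b = set(a.lower().split()), set(b.lower().split())
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0

# Illustrative T-H pair (not from the cited datasets).
text = "a man is playing a guitar"
hyp = "a man plays the guitar"
print(round(jaccard(text, hyp), 3), round(dice(text, hyp), 3))  # 0.429 0.6
```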
Chapter
In this paper, we describe a hybrid approach to Recognizing Textual Entailment (RTE) that makes use of dependency parsing and semantic similarity measures. Dependency triplet matching is performed between the dependency-parsed Text (T) and Hypothesis (H). In the case of a dependency relation match, we also consider partial matching, and the semantic similarity between the associated words is calculated with the help of various semantic similarity measures. The importance of the various dependency relations with respect to the TE task is computed in terms of their information gain, and the dependency relations are weighted accordingly. This paper reports our experiments carried out on the RTE-1, RTE-2 and RTE-3 benchmark datasets using three approaches, namely a greedy approach, an exhaustive search, and a greedy approach with weighted dependency relations. Experimental results show that weighted dependency relations significantly improve TE performance over the baseline.
... However, given that these metrics are easy to use, they are still widely implemented to evaluate chatbots. The evaluation metrics used to measure accuracy are standard metrics used for Machine Translation and other Natural Language Processing tasks, such as BLEU, METEOR and TER, as they have been used by [33,65]. Although these evaluation metrics are considered more suitable for Machine Translation problems, they can still provide valuable information regarding the Textual Entailment of the chatbot output [65]. ...
... Simply put, the BLEU metric counts the number of words that overlap between a translation and a reference translation, giving a higher score to sequential words (KantanMT, a cloud-based machine translation platform). The authors of [33,38,65,70] are among those who used BLEU scores to evaluate chatbots and other NLP tasks. However, BLEU does present some issues. ...
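The overlap counting behind BLEU can be illustrated with its clipped unigram precision: each candidate word is credited at most as many times as it appears in the reference. A minimal sketch (full BLEU also combines higher-order n-grams and a brevity penalty):

```python
from collections import Counter

# Clipped unigram precision, the word-overlap count at the core of BLEU.
def clipped_unigram_precision(candidate, reference):
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Each candidate word is clipped to its count in the reference.
    overlap = sum(min(n, ref[w]) for w, n in cand.items())
    return overlap / sum(cand.values())

# Degenerate candidate: only two of the seven "the" tokens are credited.
print(clipped_unigram_precision("the the the the the the the",
                                "the cat is on the mat"))  # 2/7 ≈ 0.286
```

The clipping is exactly what penalizes the degenerate repeated-word candidate, one of the failure modes naive overlap counting would miss.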
Article
Chatbots are intelligent conversational computer systems designed to mimic human conversation to enable automated online guidance and support. The increased benefits of chatbots led to their wide adoption by many industries in order to provide virtual assistance to customers. Chatbots utilize methods and algorithms from two Artificial Intelligence domains: Natural Language Processing and Machine Learning. However, there are many challenges and limitations in their application. In this survey we review recent advances on chatbots, where Artificial Intelligence and Natural Language processing are used. We highlight the main challenges and limitations of current work and make recommendations for future research investigation.
... TE was extensively studied in the pre-deep learning era [4,5,12,24,23]. With the recent advancement of deep learning techniques, researchers have also started to explore these techniques for TE [1,17,2]. However, as already mentioned, CLTE [3] has only recently drawn the community's interest. ...
Conference Paper
Recognizing Textual Entailment (RTE) between two pieces of text is a crucial problem in Natural Language Processing (NLP), and it poses further challenges when two different languages are involved, i.e., in the cross-lingual scenario. The paucity of large datasets for this problem has become the key bottleneck for research in this direction. In this paper, we provide a deep neural framework for cross-lingual textual entailment involving English and Hindi. As no large dataset is available for this task, we first create one by translating the premise and hypothesis pairs of the Stanford Natural Language Inference (SNLI) dataset (https://nlp.stanford.edu/projects/snli/) into Hindi. We develop a Bidirectional Encoder Representations from Transformers (BERT) based baseline on this newly created dataset. We perform experiments in both mono-lingual and cross-lingual settings. For the mono-lingual setting, we obtain accuracy scores of 83% and 72% for English and Hindi, respectively. In the cross-lingual setting, we obtain accuracy scores of 69% and 72% for the English-Hindi and Hindi-English language pairs, respectively. We hope this dataset can serve as a valuable resource for the research and evaluation of Cross-Lingual Textual Entailment (CLTE) models.