Conference Paper

Comparing the Minimum Description Length Principle and Boosting in the Automatic Analysis of Discourse.

... When labeling sentences, coders were instructed not to label them with abstract discourse relations such as logical, sequence, and elaboration, but to choose from a list of pre-determined connective expressions. We expected that coders would be able to identify a discourse relation with more confidence when working with explicit cues than with abstract concepts of discourse relations. Moreover, since 93% of the sentences considered for labeling in the corpus did not contain any of the pre-determined relation cues, the annotation task was in effect one of guessing a possible connective cue that might go with a sentence. ...
... Further, we provided an Emacs-based software aid that helps the coder with tagging and is also capable of preventing the coder from making moves inconsistent with the coding instructions. See Table 4-2 for examples (given in the rightmost column); the connectives there are all among those given in Ichikawa (1990). ...
Article
Full-text available
http://library.naist.jp/mylimedio/dllimedio/show.cgi?bookid=83858 — Doctoral dissertation, Doctor of Engineering, Thesis No. 451
... MDL ranks, along with the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), as a standard criterion in machine learning and statistics for choosing among candidate (statistical) models. As shown empirically in Nomoto and Matsumoto (2000) for the discourse domain, pruning a decision tree with MDL significantly reduces the size of the tree without compromising performance. ...
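The MDL idea behind such pruning can be sketched as a two-part code-length comparison: a split is kept only if the bits spent describing the extra tree structure are repaid by a shorter encoding of the class labels in the children. The sketch below is illustrative only (the exact coding scheme, and the one-bit structure cost, are assumptions, not the scheme used in the paper):

```python
import math

def log2_binomial(n, k):
    # log2 of C(n, k), computed exactly via lgamma
    return (math.lgamma(n + 1) - math.lgamma(k + 1)
            - math.lgamma(n - k + 1)) / math.log(2)

def leaf_cost(pos, neg):
    """Code length (bits) for the class labels at a leaf: state the
    number of positives, then identify which examples they are."""
    n = pos + neg
    return math.log2(n + 1) + log2_binomial(n, pos)

def split_cost(left, right, structure_bits=1.0):
    """Cost of a split: pay structure_bits (an assumed constant) for
    the extra tree node, then encode each child's labels separately."""
    return structure_bits + leaf_cost(*left) + leaf_cost(*right)

# A node with 40 positives / 10 negatives, and a candidate split that
# separates them into (38, 2) and (2, 8):
keep_leaf = leaf_cost(40, 10)
do_split = split_cost((38, 2), (2, 8))
print(f"leaf: {keep_leaf:.2f} bits, split: {do_split:.2f} bits")
# MDL pruning keeps the split only if it shortens the total code;
# a split that barely changes the class mix would be pruned away.
```

Here the clean split costs fewer bits than the single leaf, so it survives; a noisy, uninformative split would not repay its structure cost and would be pruned.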
Conference Paper
Full-text available
The paper proposes and empirically motivates an integration of supervised learning with unsupervised learning to deal with human biases in summarization. In particular, we explore the use of probabilistic decision trees within a clustering framework to account for the variation as well as the regularity in human-created summaries. A corpus of human-created extracts is built from a newspaper corpus and used as a test set. We build probabilistic decision trees of different flavors and integrate each of them with the clustering framework. Experiments with the corpus demonstrate that the mixture of the two paradigms generally gives a significant boost in performance compared to cases where either of the two is used alone.
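One simple way such a mixture can work is to let the unsupervised step group sentences into clusters and let the supervised model's probability estimates choose a representative from each cluster. The helper below is a hypothetical illustration of that scheme, not the paper's actual model:

```python
def diverse_extract(cluster_ids, include_probs):
    """Pick, per cluster, the sentence the classifier rates highest.

    cluster_ids[i]   : cluster of sentence i (from the unsupervised step)
    include_probs[i] : P(sentence i belongs in the summary), e.g. a
                       probabilistic decision tree's leaf estimate
    Returns the chosen sentence indices in document order.
    """
    best = {}  # cluster -> index of its highest-probability sentence
    for i, (c, p) in enumerate(zip(cluster_ids, include_probs)):
        if c not in best or p > include_probs[best[c]]:
            best[c] = i
    return sorted(best.values())

# Five sentences in two clusters: the extract takes the strongest
# candidate from each cluster, so it stays both relevant and diverse.
print(diverse_extract([0, 0, 1, 1, 0], [0.2, 0.9, 0.4, 0.7, 0.1]))
# -> [1, 3]
```

The clustering supplies diversity (one pick per cluster); the supervised probabilities supply relevance (the best pick within each cluster).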
Article
The paper introduces a novel approach to unsupervised text summarization, which in principle should work for any domain or genre. The novelty lies in exploiting the diversity of concepts in text for summarization, which has not received much attention in the summarization literature. We propose, in addition, what we call the information-centric approach to evaluation, where the quality of summaries is judged not in terms of how well they match human-created summaries but in terms of how well they represent their source documents in IR tasks such as document retrieval and text categorization. To find the effectiveness of our approach under the proposed evaluation scheme, we set out to examine how a system with the diversity functionality performs against one without, using the test data known as BMIR-J2. The results demonstrate a clear superiority of the diversity-based approach over a non-diversity-based approach. The paper also addresses the question of how closely the diversity approach models human judgments on summarization. We have created a relatively large volume of data annotated for relevance to summarization by human subjects. We have trained a decision tree-based summarizer using the data, and examined how the diversity method compares with the supervised method in performance when tested on the data. It was found that the diversity approach performs as well as, and in some cases better than, the supervised method.
Conference Paper
The paper presents a direct comparison of supervised and unsupervised approaches to text summarization. As a representative supervised method, we use the C4.5 decision tree algorithm, extended with the minimum description length principle (MDL), and compare it against several unsupervised methods. It is found that a particular unsupervised method, based on an extension of the K-means clustering algorithm, performs comparably to, and in some cases better than, the decision tree-based method.
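An unsupervised, K-means-style extractive summarizer can be sketched as follows: cluster sentence vectors, then take the sentence nearest each centroid, so the extract covers k distinct topics. This is a minimal illustration under assumed inputs (pre-computed sentence vectors), not the paper's exact extension of K-means:

```python
import random

def dist2(a, b):
    # squared Euclidean distance between two vectors
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd-style K-means: returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # assignment step: nearest centroid for each vector
        for i, v in enumerate(vectors):
            assign[i] = min(range(k), key=lambda c: dist2(v, centroids[c]))
        # update step: recompute each non-empty centroid as the mean
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assign[i] == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assign

def summarize(sent_vecs, k):
    """Extract one sentence per cluster: the one nearest its centroid."""
    centroids, assign = kmeans(sent_vecs, k)
    picked = []
    for c in range(k):
        members = [i for i in range(len(sent_vecs)) if assign[i] == c]
        if members:
            picked.append(min(members, key=lambda i: dist2(sent_vecs[i], centroids[c])))
    return sorted(picked)  # indices in document order
```

With sentence vectors that fall into well-separated groups, `summarize(vecs, 3)` returns at most three indices, one representative per recovered topic cluster.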