Context in source publication

Context 1
... compared these types across different proficiency levels (Figure 5). We found that underuse errors were the most frequent at all proficiency levels. The reason for this might be that the Japanese language does not have an article system. ...

Similar publications

Article
Full-text available
Open domain question answering (OpenQA) tasks have recently been attracting more and more attention from the natural language processing (NLP) community. In this work, we present the first free-form multiple-choice OpenQA dataset for solving medical problems, MedQA, collected from professional medical board exams. It covers three languages: Eng...
Thesis
Full-text available
Semantic role labelling (SRL) is a task in natural language processing which detects and classifies the semantic arguments associated with the predicates of a sentence. It is an important step towards understanding the meaning of natural language. There exist SRL systems for well-studied languages like English, Chinese or Japanese, but there is n...

Citations

... Table 1. Existing spoken learner corpora of English (columns: Corpus, L1, Corpus details), including SST [4] ...
Article
This paper describes the design and construction of PELECAN (Pronunciation Errors from Learners of English Corpus and Annotation). PELECAN is created primarily for collecting pronunciation errors from Thai learners of English in order to develop a more suitable pronunciation assessment tool for Thais. A 2-phase data collection process is used to balance recording effort against the coverage of the acoustic phenomena of interest. The data collected in the first phase contains 1.5 hours of speech from 30 Thai learners reading 2 English passages that cover all English phones. Recorded speech was annotated with 2 types of error annotation: phonetic transcription of incorrect pronunciation and level of correctness of each phone. A contrastive list was used to guide the error analysis process. We found that many pronunciation errors are influenced by L1 (Thai), e.g. incorrect pronunciations of suffixes and the deletion of /l/ and /r/ in consonant clusters. However, there are some errors that may not be predictable from contrastive analysis alone, such as the case of schwa. Hence, the data-driven approach can help identify errors that may not be foreseen from a linguistic point of view alone.
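The phone-level error annotation described above — comparing a canonical pronunciation against what the learner actually produced — can be sketched as a sequence alignment. This is only an illustration of the idea, not PELECAN's actual tooling; the ARPAbet-style phone symbols and the function name are hypothetical.

```python
import difflib

# Hedged sketch: align a canonical phone sequence with a learner's
# production to surface deletions and substitutions (e.g. /l/ or /r/
# dropped from a consonant cluster, as reported for Thai learners).
def phone_diff(canonical, produced):
    """Return the non-matching alignment operations between two phone lists."""
    ops = difflib.SequenceMatcher(None, canonical, produced).get_opcodes()
    return [(tag, canonical[i1:i2], produced[j1:j2])
            for tag, i1, i2, j1, j2 in ops if tag != "equal"]

# "play" /P L EY/ produced without the /L/ of the cluster:
print(phone_diff(["P", "L", "EY"], ["P", "EY"]))  # → [('delete', ['L'], [])]
```

An annotator (or an automatic pre-check) could use such an alignment to propose candidate error sites before assigning per-phone correctness levels by hand.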
... We based our error annotation scheme on that used in the NICT JLE corpus (Izumi et al., 2003a), whose detailed description is readily available, for example, in Izumi et al. (2005). In that annotation scheme and accordingly in ours, errors are tagged using an XML syntax; an error is annotated by tagging a word or phrase that contains it. ...
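The XML-style annotation described above — wrapping the erroneous word or phrase in a tag — can be illustrated with a small sketch. The tag name, the `crr` attribute, and the sentence below are hypothetical examples in the spirit of the NICT JLE scheme, not its actual tag set.

```python
import xml.etree.ElementTree as ET

# Hypothetical sentence annotated in an NICT-JLE-like style: an <err> tag
# wraps the erroneous span, and an attribute carries the corrected form
# (tag and attribute names here are illustrative, not the real scheme).
annotated = '<s>I bought <err type="article" crr="a">the</err> new car yesterday.</s>'

def extract_errors(xml_sentence):
    """Return (error_type, learner_form, correction) triples from one sentence."""
    root = ET.fromstring(xml_sentence)
    return [(e.get("type"), e.text, e.get("crr")) for e in root.iter("err")]

print(extract_errors(annotated))  # → [('article', 'the', 'a')]
```

Because the error tags are well-formed XML embedded in the sentence, standard XML tooling suffices to pull out error type, learner form, and correction for counting or training data extraction.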
Conference Paper
Full-text available
The availability of learner corpora, especially those which have been manually error-tagged or shallow-parsed, is still limited. This means that researchers do not have a common development and test set for natural language processing of learner English such as for grammatical error detection. Given this background, we created a novel learner corpus that was manually error-tagged and shallow-parsed. This corpus is available for research and educational purposes on the web. In this paper, we describe it in detail together with its data-collection method and annotation schemes. Another contribution of this paper is that we take the first step toward evaluating the performance of existing POS-tagging/chunking techniques on learner corpora using the created corpus. These contributions will facilitate further research in related areas such as grammatical error detection and automated essay scoring.
... The common practice is, however, to use an editor which aids in computer-assisted insertion of tags. Some documented tools of this type are the Université Catholique de Louvain Error Editor (UCLEE) (Hutchinson 1996), the TagEditor (Izumi et al. 2003) and, more recently, a pilot editor under construction at the University of Jaén (Díaz-Negrillo & García-Cumbreras forthcoming). They include tag-associated error categories arranged on a menu-driven interface, which the user can select and insert in the text as he/she revises the learner material and decides on the nature of the error confronted. ...
Article
Full-text available
Learner corpora are used to investigate computerised learner language so as to gain insights into foreign language learning. One of the methodologies that can be applied to this type of research is computer-aided error analysis (CEA), which, in general terms, consists in the study of learner errors as contained in a learner corpus. Surveys of current learner corpora and of issues in learner corpus research contain some information on CEA research, although it is usually limited. This article is centred on CEA research and is intended as a review of error tagging systems, including error categorizations, dimensions and levels of description. KEYWORDS: second language acquisition, learner corpus research, computer-aided error analysis.
... In this paper we describe the mal-rules strategy and the types of mal-rules we encountered (§2) and the implementation of aligned generation as a kind of best-first generation (§3). We then evaluate the usefulness of the mal-rules strategy against the range and frequency of error types in an error-tagged learner corpus (SST: Izumi et al. (2003)) (§4) and evaluate the effectiveness of the aligned-generation strategy based on a sample of the corpus. Finally, in §6, we consider how the error detection problem can be addressed using existing stochastic parse selection techniques without requiring training treebanks of learner English. ...
Article
Full-text available
We present a tutorial system for language learners, using a computational grammar augmented with mal-rules for analysis, error diagnosis, and semantics-centered generation of corrected forms.
Conference Paper
This paper proposes a method for detecting determiner errors, which are highly frequent in learner English. To augment conventional methods, the proposed method exploits a strong tendency displayed by learners in determiner usage, namely that they mistakenly omit determiners most of the time. The basic idea is simple and applicable to almost any conventional method. Combining this idea with countability prediction, the proposed method outperforms the conventional methods, achieving an F-measure of 0.613.
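The core idea above — learners predominantly omit determiners, so a detector can bias toward flagging a singular countable noun with no preceding determiner — can be sketched as follows. The tiny countability lookup and the function name are hypothetical stand-ins for the paper's trained countability-prediction component.

```python
# Hedged sketch of omission-biased determiner-error detection.
# A real system would predict noun countability with a trained model;
# a toy lookup stands in for that component here.
COUNTABLE = {"car": True, "dog": True, "information": False, "water": False}
DETERMINERS = {"a", "an", "the", "this", "that", "my", "your", "some"}

def flag_missing_determiner(tokens):
    """Return indices of singular countable nouns lacking a preceding determiner."""
    flags = []
    for i, tok in enumerate(tokens):
        if COUNTABLE.get(tok):                   # singular countable noun
            prev = tokens[i - 1] if i > 0 else None
            if prev not in DETERMINERS:          # omission: the most frequent learner error
                flags.append(i)
    return flags

print(flag_missing_determiner("i bought car yesterday".split()))  # → [2]
```

Uncountable nouns such as "information" are deliberately not flagged, which is where countability prediction does the real work in the proposed method.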
Conference Paper
A learner corpus is a computerized textual database of the language produced by foreign language learners. Annotated learner corpora contain invaluable meta-information about learners and the errors they make. With proper feature extractions and machine learning techniques, it is possible to extract implicit and explicit knowledge from learner corpora and develop useful applications to support effective foreign language teaching and learning, such as automatic proficiency level checking, error-driven and personalized learning etc. In this paper, we use a learner corpus and experiment with feature extraction and machine learning techniques to explore such applications. In particular, we reported our experimental results in automatic proficiency checking with ID3 and C4.5 decision tree algorithms, Bayesian Net and SVM. We also briefly outline other potential applications of learner corpora such as in error-driven learning by using implicit and explicit features along with machine learning.