Article

Transcription and manual treatment of spontaneous speech for its automatic recognition


Abstract

Our research is based on the EPAC project, whose context we develop in the first chapter. The transcription task is then presented: some important dates and people are mentioned, along with an inventory of available speech corpora, and assisted and manual transcription are evaluated and compared. A comparative study of eight transcription tools is developed in the third chapter; it shows that, depending on the transcription context (data size, type of annotations ...), some are more useful than others. Data encoding is the next step of our work: is it really easy to exchange transcriptions? We show that interoperability must become much more effective than it currently is before transcribed data can be shared easily. Finally, what we call spontaneous speech is analysed in detail: drawing on several points of view, definitions and experiments, we try to pin down the precise meaning of this expression.

Article
Full-text available
Spontaneous spoken language (SSL) presents many differences compared to written language. These differences are observed both in terms of grammatical and extragrammatical phenomena, such as repetitions, self-corrections, false starts, etc. This thesis addresses the problem of parsing SSL in the context of human-machine dialogue from two points of view: theory and application. First, a corpus study of spoken-language extragrammaticalities is carried out and a linguistic formalism (Sm-TAG) is proposed. The results of this theoretical work are then used in the implementation of the systems Corrector, Oasis and Navigator. These systems are evaluated using quantitative and qualitative methods.
Article
Full-text available
How can we draw on old oral corpora to identify current changes in the morphosyntax of spoken French? What does the term "old" mean when applied to oral corpora, and what kind of work can be done with them? This article examines some potential lexical and morphosyntactic changes that have appeared recently. Two kinds of corpora are used: the Phonothèque corpora (recorded between 1958 and 1970) and some contemporary spoken corpora.
Article
Full-text available
Abstract: I propose a global reflection on the multidisciplinary research currently conducted at the interface of prosody and discourse. To do so, I show what kind of dialogue has begun in recent years between scientific communities around themes concerning the structural, semantico-pragmatic and intersubjective dimensions of the spoken message. Since science is cumulative by nature, I then present some epistemological foundations of current research, insisting on the landmark contribution of a few pioneering figures who, in a linguistic context of general wariness towards discourse and prosody, opened lines of inquiry that still permeate today's work. Finally, I open a critical discussion of the different angles from which the relations between discourse and prosody have been approached, the methods and viewpoints that can be observed and, beyond individual specificities or points of convergence, the issues that remain problematic because they are contradictory or call for terminological, methodological and conceptual clarification.
Article
Full-text available
This paper presents two corpora of French spoken dialogue (OTG and ECOLE_MASSY) which are freely available to the scientific community and constitute the first delivery of the Parole Publique (Public Speech) project led by the VALORIA laboratory. This project aims at building a collection of spoken dialogue corpora enriched with morpho-syntactic annotation and freely distributed on the WWW, primarily intended for research on man-machine communication. Keywords: French linguistic resources; spoken dialogue; man-machine communication.
Article
Full-text available
This study examines the behaviour of two markers loosely called "hesitation" markers in unscripted spoken French: euh and vowel lengthening. We hypothesise that these two markers are in complementary distribution (i.e. are combinatory variants): vowel lengthening would fall almost exclusively on the (C)V syllables of function words, whereas euh would far more often be distributed after content words or after the (C)VC syllables of function words.
Article
Full-text available
We propose a theoretical reflection on the place of a phenomenon such as disfluencies within a grammar. The fine-grained descriptions available lead to the question of what status disfluencies should be given in a complete linguistic theory, while keeping a global representational perspective, that is, without compromising its overall coherence and homogeneity. We introduce a formal representation of the phenomenon, and then propose some parsing mechanisms for handling it.
Article
Full-text available
This article briefly presents the Elicop collection of spoken French corpora and describes the work carried out to make it easier for non-specialist users to access and query. It successively addresses the normalisation of formats and annotations, the enrichment of the corpus through part-of-speech tagging and lemmatisation, and finally the targeted querying of the corpora by lexical, morphological and syntactic properties. These query tools are accessible over the Internet.
Article
Full-text available
In recent years, processing large collections of multimedia documents has become a crucial issue for applications such as indexing or information retrieval. Among the targeted information, speaker identity can be very useful for such applications. A huge collection of documents cannot be processed manually at a reasonable cost: automatic systems are the only relevant solution. In this paper, we consider the extraction of speaker identity (first name and last name) from audio recordings of broadcast news. Using a rich transcription system, we present a method for extracting speaker identities from automatic transcripts and assigning them to speaker turns. Experiments are carried out on French broadcast news recordings from the ESTER 1 phase II evaluation campaign.
Article
Full-text available
This paper describes the development of a speech recognition system for the processing of conversational speech, starting with a state-of-the-art broadcast news transcription system. We identify major changes and improvements in acoustic and language modeling, as well as decoding, which are required to achieve good performance on conversational speech. Major changes on the acoustic side include the use of speaker normalizations (VTLN and SAT), the need for better pronunciation modeling and the use of discriminative training (MMIE). On the linguistic side the primary challenge is to cope with the limited amount of language model training data. To address this issue we make use of a data selection technique, and a smoothing technique based on a neural network language model. At the decoding level, lattice rescoring and minimum word error decoding are applied. On the development data, the improvements yield an overall word error rate of about 21% whereas the original BN transcription system had a word error rate of about 50% on the same data.
Article
Full-text available
The spoken/written opposition is analysed through variation in the proportions of grammatical categories across twenty-one spoken and written corpora of Italian, Dutch, French and French interlanguage. A factor analysis extracts a single dimension on which two groups of grammatical categories are opposed. Nouns, articles, adjectives and prepositions lie near the negative pole, while pronouns, verbs, adverbs, interjections and conjunctions lie near the positive pole of this dimension, which reflects the deixis continuum. Spoken corpora lie near the deictic/implicit pole, in opposition to written corpora, which approach the explicit pole of the continuum. Spoken utterances are more anchored in the speaker's spatio-temporal context, whereas written discourse is generally more independent of it, which explains its more nominal character and its larger proportion of essentially non-deictic, explicit words.
Conference Paper
Full-text available
The LUNA corpus is a multi-domain multilingual dialogue corpus currently under development. The corpus will be annotated at multiple levels to include annotations of syntactic, semantic and discourse information, and used to develop a robust natural spoken language understanding toolkit for multilingual dialogue services.
Article
Full-text available
The MultiModal Interface Language formalism (MMIL) has been selected as the High Level Semantic (HLS) formalism for annotating the French MEDIA dialogue corpus. This corpus is composed of human-machine dialogues in the domain of hotel reservation and tourist information. Utterances in dialogues have previously been annotated with a concept-value flat semantics for studying and evaluating spoken language understanding modules in dialogue systems. We are now interested in investigating the use of more complex representations to improve understanding capability. The MMIL intermediate language is a high-level semantic formalism that bears relevant linguistic information, from syntax up to discourse. This representation should increase the expressivity of the current annotation, though at the expense of annotation process complexity. In this paper we present our first attempt at defining the annotation guidelines for the HLS annotation of the MEDIA corpus and its effect on the annotation process itself, revealed by annotators' disagreements due to the different levels of hierarchy and the granularity of the features defined in MMIL.
Article
Full-text available
Abstract: A particularity of oral corpora is that primary data (the recordings) must be supplemented by secondary data (the transcriptions) in order to be exploited. This article examines: the consequences, for the transcription conventions to be adopted, of the evolution of computerised tools for exploiting oral corpora; the implications transcription conventions have for the types of queries and analyses that can be run on the data; and how synchronised transcription, by aligning text and sound, eases access to the sound but introduces an artefact due to the choice of a temporal alignment unit. More specifically, the article traces the evolution of the transcription and alignment conventions used in the VALIBEL database (alignment done with the Praat software) and analyses the problems and consequences of these conventions for annotation and corpus-based linguistic research. No linguistic analysis of oral productions is possible from the sound source alone: however often the researcher replays the recordings, they cannot be grasped through sound alone, and only become full objects of study once put into writing. Speech remains fluid and essentially fleeting, even once captured on tape: "One cannot study the oral through the oral, trusting the memory one keeps of it. One cannot, without the aid of visual representation, traverse the oral in all directions and compare its pieces." (Blanche-Benveniste 2000: 24)
Article
Full-text available
Traditional experimental phonetics laboratories are made somewhat obsolete by popular software tools such as Praat [7]. Indeed, these tools provide most of the acoustic analysis engines needed for prosodic research, in particular fundamental frequency trackers and prosodic speech-morphing synthesizers. Still, their usage is not always intuitive, and considerable training must sometimes be provided to ensure a reasonable degree of success and efficiency in a research project. In this perspective, new-generation acoustic analysis software such as WinPitchPro puts the emphasis on reliability of measurements and ease of use.
Article
Full-text available
This paper reports on an experiment aimed at measuring the quality of automatic and human phonetic transcriptions of different speech styles that were produced within the framework of a large speech corpus project for Dutch, the Spoken Dutch Corpus (Corpus Gesproken Nederlands, CGN). The results indicate that the procedure adopted in the CGN to improve the quality of phonetic transcriptions does indeed contribute to achieving this aim. However, better transcriptions of spontaneous speech could probably be obtained by resorting to ASR techniques for pronunciation variation modeling. Our research indicates how this could be achieved.
Article
Full-text available
The software program WinPitch Corpus addresses these concerns directly, allowing two modes of operation to handle the data. In the first mode, text is not available and is generated by the user speech segment by speech segment (as was the case when only analog tape recorders were available). In the second mode, speech has already been transcribed into text, but the text units are not aligned, i.e. a bi-univocal relationship between units of text and units of speech has not been established. Although some existing software programs operate in the first mode, establishing implicit text-speech alignment in the process, few allow operation in commonly found (difficult) recording conditions such as overlapping voices or the presence of noise. This paper briefly introduces some of the important features of WinPitch Corpus as an efficient tool for the transcription and analysis of speech data: slowed speech rate for easier transcription, dynamic adjustment of segments with simultaneous display of spectrograms for precise alignment, etc. Numerous speech analysis tools (fundamental frequency tracker, spectrogram, LPC formant analysis, etc.) are available, with quasi-instantaneous display of results. Simultaneous acoustic analysis of both channels of stereo recordings is also supported. The program has already been used extensively for the analysis of large Romance-language corpora of spontaneous speech (more than 1,200,000 words, C-ORAL-ROM, 2003), as well as for the phonetic and phonological description of Parkatêjê, an endangered language of the Amazon spoken by about 300 people (Araújo and Martin, 2003). WinPitch Corpus is available from the www.winpitch.com web site, under the name WinPitchPro.
Article
Full-text available
In this paper the Spoken Dutch Corpus project is presented, a joint Flemish-Dutch undertaking aimed at the compilation and annotation of a corpus of 1,000 hours of spoken Dutch. Upon completion, the corpus will constitute a valuable resource for research in the fields of (computational) linguistics and language and speech technology. Although the corpus will contain a fair amount of read speech (mainly to train initial acoustic models for speech recognizers), the lion's share of the data will consist of spontaneous speech, ranging from lectures to unobtrusively recorded conversations. The corpus is unique in that all speech recordings will be made available together with several levels of high quality annotations, from verbatim orthographic transcriptions to syntactic analyses and prosodic labeling.
Article
Full-text available
This paper deals with spontaneous speech, considering first its specificities, and then its transcription, both diachronically and synchronically. Through several experiments, carried out notably within the EPAC project, we identify the main problems spontaneous speech causes for automatic speech recognition systems, and suggest some optimizations to help solve them. Keywords: spontaneous speech, manual transcription, automatic transcription, specificities of speech, automatic speech recognition system.
Conference Paper
Full-text available
This paper gives the final results of the ESTER evaluation campaign, which started in 2003 and ended in January 2005. The aim of this campaign was to evaluate automatic broadcast news rich transcription systems for the French language. The evaluation tasks were divided into three main categories: orthographic transcription, event detection and tracking (e.g. speech vs. music, speaker tracking), and information extraction. The last one, limited to named entity detection in this evaluation, was a preliminary test. The paper reports on protocols and gives the results obtained in the campaign.
Conference Paper
Full-text available
Spontaneous conversation is optimized for human-human communication, but differs in some important ways from the types of speech for which human language technology is often developed. This overview describes four fundamental properties of spontaneous speech that present challenges for spoken language applications because they violate assumptions often applied in automatic processing technology.
Conference Paper
Full-text available
This paper presents the system used by the LIUM to participate in ESTER, the French broadcast news evaluation campaign. This system is based on the CMU Sphinx 3.3 (fast) decoder. Some tools are presented which have been added at different steps of the Sphinx recognition process: segmentation, acoustic model adaptation, word-lattice rescoring. Several experiments have been conducted studying the effects of signal segmentation on the recognition process, injecting automatically transcribed data into training corpora, and testing different approaches to acoustic model adaptation; the results are presented in this paper. With very few modifications and a simple MAP acoustic model estimation, the Sphinx 3.3 decoder reached a word error rate of 28.2%. The entire system developed by LIUM obtained 23.6% as the official word error rate for the ESTER evaluation, and 23.4% with an unsubmitted system.
Conference Paper
Full-text available
We present a preliminary analysis of transcriber consistency in labeling and segmentation of words and phones in the Buckeye corpus of spontaneous, informal speech. We find that pairwise inter-transcriber agreement on exact phone label match was 76%, and segmentation agreement within 20% of phone pair length was 75%, though longer phones are more consistently segmented than shorter phones. Patterns of consistency variation in labeling are observed as a function of phonetic categories that are similar to patterns reported for read speech. More agreement is seen on consonants than on vowels, and on fricatives and labials than on other consonant classes. In general, we find that shorter, more reduced words and phones result in more transcriber disagreement.
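The agreement figures reported above are simple label-match rates between pairs of transcribers. As a toy illustration (the phone strings below are invented, not Buckeye data), pairwise agreement can be computed as:

```python
# Pairwise inter-transcriber agreement on phone labels: the share of
# positions where two transcribers chose the same label. Illustrative
# sketch only; the phone sequences are made up.
def pairwise_agreement(labels_a, labels_b):
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

t1 = ["dh", "ah", "k", "ae", "t"]   # transcriber 1
t2 = ["dh", "ax", "k", "ae", "t"]   # transcriber 2: one vowel disagreement
print(pairwise_agreement(t1, t2))   # 0.8
```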
Conference Paper
Full-text available
Our paper focuses on the gain which can be achieved in the human transcription of spontaneous and prepared speech by using the assistance of an ASR system. This experiment has shown interesting results, first about the duration of the transcription task itself: even with the combination of prepared speech + ASR, an experienced annotator needs approximately 4 hours to transcribe 1 hour of audio data. Using an ASR system is thus mostly time-saving, although the gain is much more significant on prepared speech: assisted transcriptions are up to 4 times faster than manual ones. This ratio falls to 2 with spontaneous speech, because of ASR limits on these data. Detailed results reveal interesting correlations between the transcription task and phenomena such as word error rate, telephone or non-native speech turns, and the number of fillers or proper nouns. The latter make spelling correction very time-consuming with prepared speech because of their frequency; as a consequence, watching for low averages of proper nouns may be a way to detect spontaneous speech.
Chapter
Keywords: communicative knowledge; relationships; textual domains; languages; semantics
Article
Computer tools and formats for linguistic transcription, and more generally for the annotation of speech corpora, are reviewed. Standardization of these tools and formats facilitates the coding, exchange, and dissemination of information. As an illustration, a method of annotation for spoken-language corpora, developed as part of a program for archiving linguistic field recordings, is presented. The method relies as far as possible on emerging standards for structured text (XML, Unicode). Data formats for both sound and annotation, and the tools for processing them (parsers, editors, browsers, etc.), are discussed.
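As a sketch of the standards-based encoding this line of work advocates, the example below builds and walks a small time-aligned XML transcript with Python's standard library. The element and attribute names (`transcript`, `turn`, `seg`, `start`, `end`) are invented for illustration, not the paper's actual schema:

```python
# Minimal time-aligned speech annotation encoded in XML with Unicode text.
# Each segment carries its own time anchors, so tools can retrieve the
# audio span behind any piece of transcribed text.
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0" encoding="UTF-8"?>
<transcript lang="fr">
  <turn speaker="S1" start="0.00" end="2.35">
    <seg start="0.00" end="1.10">bonjour à tous</seg>
    <seg start="1.10" end="2.35">euh… on commence</seg>
  </turn>
</transcript>"""

# Parse from bytes (a str with an encoding declaration is rejected by ET).
root = ET.fromstring(doc.encode("utf-8"))
segments = [(seg.get("start"), seg.get("end"), seg.text)
            for seg in root.iter("seg")]
print(segments[0])  # ('0.00', '1.10', 'bonjour à tous')
```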
Article
Our paper focuses on the gain which can be achieved in the transcription of spontaneous and prepared speech by using an ASR system. This experiment has shown interesting results, first about the duration of the transcription task itself: even with the combination of prepared speech + ASR, an experienced annotator needs approximately 8 hours to transcribe 2 hours of audio data. Using an ASR system is thus mostly time-saving, although the gain is much more significant on prepared speech: assisted transcriptions are up to four times faster than manual ones. This ratio falls to two with spontaneous speech, because of ASR limits on these data. Lastly, spelling correction is very time-consuming with prepared speech, because it contains many proper nouns that have to be checked; their frequency may be a way to detect spontaneous speech.
Article
We describe in this paper a syntactic parser for spontaneous speech geared towards the identification of verbal subcategorization frames. The parser proceeds in two stages. The first stage is based on generic syntactic resources for French. The second stage is a reranker which is specially trained for a given application. The parser is evaluated on the MEDIA corpus. Keywords: syntactic parsing, automatic speech recognition.
Article
The aim of this study is to elaborate a disfluent speech model by comparing different types of audio transcripts. The study makes use of 10 hours of French radio interview archives, involving journalists and personalities from political or civil society. A first type of transcript is press-oriented: most disfluencies are discarded. For 10% of the corpus, we produced exact audio transcripts: all audible phenomena and overlapping speech segments are transcribed manually. In these transcripts, about 14% of the words correspond to disfluencies and discourse markers. The audio corpus was then transcribed using the LIMSI speech recogniser. While making up 8% of the corpus, disfluency words explain 12% of the overall error rate. This shows that disfluencies have no major effect on neighbouring speech segments. Restarts are the most error-prone, with a 36.9% within-class error rate.
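The error figures above rest on the standard word error rate. Below is a minimal sketch of its usual definition (Levenshtein distance over words, divided by the reference length); this is the textbook formulation, not code from the study:

```python
# Word error rate (WER): minimum number of substitutions, deletions and
# insertions needed to turn the hypothesis into the reference, divided by
# the number of reference words.
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# A restart ("je je veux") counts like any other error: one insertion
# against a three-word reference.
print(wer("je veux partir", "je je veux partir"))
```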
Article
This contribution aims at giving an overview of present automatic speech recognition in French, highlighting typical transcription problems for this language. Explanations for errors can be partially obtained by examining the acoustics of the speech data. Such investigations do not only inform about system and/or speech modeling limitations; they also contribute to discovering, describing and quantifying specificities of spoken language as opposed to written language, and of speakers' speech performance. To automatically transcribe different speech genres (e.g. broadcast news vs conversations), specific acoustic corpora are used for training, suggesting that frequency of occurrence and acoustic realisations of phonemes vary significantly across genres. Some examples of corpus studies are presented describing phoneme frequencies and segment durations on different corpora. In the future, large-scale corpus studies may contribute to increasing our knowledge of spoken language as well as the performance of automatic processing.
Conference Paper
Automatic speech recognition (ASR) systems are used in a large number of applications, in spite of inevitable recognition errors. In this study we propose a pragmatic approach to automatically repair ASR outputs by taking into account linguistic and acoustic information, using formal rules or stochastic methods. The proposed strategy consists in developing a specific correction solution for each specific kind of error. In this paper, we apply this strategy to two case studies specific to the French language. We show that it is possible, on automatic transcriptions of French broadcast news, to decrease the error rate of a specific error by 11.4% in one of the two case studies, and by 86.4% in the other. These results are encouraging and show the interest of developing more specific solutions to cover a wider set of errors in future work.
Conference Paper
Processing spontaneous speech is one of the many challenges that automatic speech recognition (ASR) systems have to deal with. The main markers characterizing spontaneous speech are disfluencies (filled pauses, repetitions, repairs and false starts), and many studies have focused on the detection and correction of these disfluencies. In this study we define spontaneous speech as unprepared speech, in opposition to prepared speech, where utterances contain well-formed sentences close to those that can be found in written documents. Disfluencies are of course very good indicators of unprepared speech, but they are not the only ones: ungrammaticality and language register are also important, as are prosodic patterns. This paper proposes a set of acoustic and linguistic features that can be used for characterizing and detecting spontaneous speech segments in large audio databases. Moreover, we introduce a strategy that takes advantage of a global classification process using a probabilistic model, which significantly improves spontaneous speech detection.
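To make the two-step idea concrete, here is a toy sketch: a local score per segment from cues of unprepared speech, followed by a crude global smoothing step. The feature names, weights and smoothing coefficients are invented for illustration and stand in for the paper's trained probabilistic model:

```python
# Toy spontaneous-speech detector: local linear score per segment,
# then global smoothing toward the document average.
def segment_score(features, weights):
    """Linear score: higher means more 'spontaneous-looking'."""
    return sum(weights[k] * v for k, v in features.items())

def classify(segments, weights, threshold=0.5):
    # Local decision per segment...
    scores = [segment_score(f, weights) for f in segments]
    # ...then shift every score toward the document mean, mimicking
    # a global classification step over the whole recording.
    mean = sum(scores) / len(scores)
    return [0.7 * s + 0.3 * mean > threshold for s in scores]

weights = {"filled_pauses": 0.6, "repetitions": 0.5, "speech_rate_dev": 0.3}
segments = [
    {"filled_pauses": 0.9, "repetitions": 0.8, "speech_rate_dev": 0.5},  # disfluent
    {"filled_pauses": 0.1, "repetitions": 0.0, "speech_rate_dev": 0.1},  # prepared
]
print(classify(segments, weights))
```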
Article
We present "Transcriber", a tool for assisting in the creation of speech corpora, and describe some aspects of its development and use. Transcriber was designed for the manual segmentation and transcription of long-duration broadcast news recordings, including annotation of speech turns, topics and acoustic conditions. It is highly portable, relying on the scripting language Tcl/Tk with extensions such as Snack for advanced audio functions and tcLex for lexical analysis, and has been tested on various Unix systems and Windows. The data format follows the XML standard with Unicode support for multilingual transcriptions. Distributed as free software in order to encourage the production of corpora, ease their sharing, increase user feedback and motivate software contributions, Transcriber has been in use for over a year in several countries. As a result of this collective experience, new requirements arose to support additional data formats, video control, and better management of conversational speech. Using the recently formalized annotation graphs framework, adaptation of the tool towards new tasks and support of different data formats will become easier.
Article
Human speech is peppered with ums and uhs, among other signs of hesitation in the planning process. But are these so-called fillers (or filled pauses) intentionally uttered by speakers, or are they side-effects of difficulties in the planning process? And how do listeners respond to them? In the present paper, we review evidence concerning the production and comprehension of fillers such as um and uh, in an attempt to determine whether they can be said to be ‘words’ with ‘meanings’ that are understood by listeners. We conclude that, whereas listeners are highly sensitive to hesitation disfluencies in speech, there is little evidence to suggest that they are intentionally produced, or should be considered to be words in the conventional sense.
Article
A six-step, iterative, empirical human factors design methodology was used to develop CAL, a natural language computer application to help computer-naive business professionals manage their personal calendars. Input language is processed by a simple, nonparsing algorithm with limited storage requirements and a quick response time. CAL allows unconstrained English inputs from users with no training (except for a five-minute introduction to the keyboard and display) and no manual (except for a two-page overview of the system). In a controlled test of performance, CAL correctly responded to between 86 percent and 97 percent of the storage and retrieval requests it received, according to various criteria. This level of performance could never have been achieved with such a simple processing model were it not for the empirical approach used in the development of the program and its dictionaries. The tools of the engineering psychologist are clearly invaluable in the development of user-friendly software, if that software is to accommodate the unruly language of computer-naive, first-time users. The key is to elicit the cooperation of such users as partners in an iterative, empirical development process.
Conference Paper
WebTranscribe is a platform independent and extensible web-based annotation framework for speech research and spoken language technology. The framework consists of an annotation editor front-end running as a Java Web Start application on a client computer, and a DBMS on a server. The framework implements a “select – annotate – save” annotation workflow. The annotation capabilities are determined by annotation editors, implemented as plug-ins to the general framework. An annotation configuration generally consists of an editor, editing buttons, a signal display and a quality assessment panel. A configuration file determines which plug-ins to use for a given annotation project. WebTranscribe has been used in numerous projects at BAS and has reached a mature state now. The software is freely available [19].