Figure 6 - uploaded by Philippe Martin
Assisted alignment.  

Source publication
Article
Full-text available
The software program WinPitch Corpus addresses these concerns directly, offering two modes of operation for handling the data. In the first mode, no text is available, and the user produces it one speech segment at a time (as was the case when only analog tape recorders were available). In the second mode, speech has already been transcribe...

Citations

... These are corpora that consist not only of a sound file, but also of the (orthographic/phonetic) transcription of the sound file, as well as an alignment table indicating which fragment of the sound file a given sentence or phrase in the transcription file corresponds to. There are various tools around for the creation of time-aligned corpora, such as WaveSurfer (Sjölander and Beskow, 2000), ELAN (Brugman and Russell, 2004), PRAAT (Boersma, 2001), WinPitch (Martin, 2003), and Transcriber (Barras et al., 2001). Many of these tools provide a rich set of features to explore both the sound file and the transcription, and most of them are free and readily available. ...
Conference Paper
Full-text available
Spock is an open source tool for the easy deployment of time-aligned corpora. It is fully web-based, and has very limited server-side requirements. It allows the end user to search the corpus in a text-driven manner, obtaining both the transcription and the corresponding sound fragment in the result page. Spock has an administration environment to help manage the sound files and their respective transcription files, and also provides statistical data about the files at hand. Spock uses a proprietary file format for storing the alignment data, but the integrated admin environment allows you to import files from a number of common file formats. Spock is not intended as a transcriber program: it is not meant as an alternative to programs such as ELAN, WaveSurfer, or Transcriber, but rather to make corpora created with these tools easily available online. For the end user, Spock provides a very easy way of accessing spoken corpora, without the need to install any special software, which might make time-aligned corpora accessible to a large group of users who might otherwise never look at them.
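As a rough illustration of the idea behind a time-aligned corpus and a text-driven search over it, the sketch below models an alignment table as a list of timed transcription segments. The `Segment` class, the `search` function, and the sample data are all hypothetical; they do not reflect Spock's own file format or the formats of any of the tools named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One row of a hypothetical alignment table."""
    start: float  # offset in seconds from the start of the sound file
    end: float    # offset in seconds from the start of the sound file
    text: str     # transcription of this stretch of audio


def search(segments, query):
    """Return the segments whose transcription contains `query` (case-insensitive)."""
    needle = query.lower()
    return [seg for seg in segments if needle in seg.text.lower()]


# Invented example data, not taken from any real corpus.
ALIGNMENT = [
    Segment(0.00, 1.80, "good morning everyone"),
    Segment(1.80, 4.25, "today we look at prosody"),
    Segment(4.25, 6.10, "and at prosodic boundaries"),
]

hits = search(ALIGNMENT, "prosod")
for seg in hits:
    # The time span is what a player would need to cut out the sound fragment.
    print(f"{seg.start:.2f}-{seg.end:.2f}s: {seg.text}")
```

Each hit carries both the matching text and the time span of the corresponding audio, which is the pairing a result page would need in order to show the transcription next to a playable sound fragment.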
Article
Our research is based on the EPAC project. We describe this context in the first chapter. The transcription task is then presented. Some important dates and people are mentioned, along with an inventory of available speech corpora. Assisted and manual transcription are also evaluated and compared. A comparative study of eight transcription tools is developed in the third chapter. It shows that, depending on the transcription context (data size, type of annotations ...), some tools are more useful than others. Encoding the data is the next step of our work. Is it really easy to exchange transcriptions? We demonstrate that interoperability must become much more efficient than it currently is if transcribed data are to be shared easily. Finally, what we call spontaneous speech is analysed in detail. Drawing on several points of view, definitions, and experiences, we try to establish the precise meaning of this expression.
Article
Full-text available
ABSTRACT The C-ORAL-ROM project has produced a comparable resource of spontaneous speech for the four main Romance languages (Italian, French, Spanish, and Portuguese), comprising approximately 300,000 word tokens per language. The article describes the design criteria used in representing spontaneous speech and the multimedia formats adopted (data, metadata, and text-sound synchronization). In particular, it discusses the perception-based prosodic tagging criteria used in the resource, which will be fully tagged for terminal and non-terminal prosodic boundaries. It will be shown that the tagging of terminal prosodic boundaries marks the reference units of speech (utterances) and makes it possible to derive a database of utterances from the corpora. The measures undertaken to validate this prosodic tagging will be presented, together with empirical arguments in favour of prosodic criteria for identifying reference units above the word level in spontaneous speech.