Fig 7 - uploaded by Rodolfo Delmonte
Syllable Level Prosodic Activities. The main Activity Window for "Parole e Sillabe"/Words and Syllables is divided into three main sections. In the upper portion of the screen the student is presented with the orthographic and phonetic transcription (in Arpabet) of the word, which is spoken aloud by a native speaker's voice. This section of the screen can be activated or deactivated according to which level of Interlingua the student belongs to; we use six different levels (Delmonte R., Cristea D. et al. 1996; Delmonte R. et al. 1996). In particular, the stressed syllable is highlighted between a pair of dots. The central portion of the screen contains the buttons corresponding to each single syllable, which the student may click on. The system then waits for the student's performance, which is dynamically analysed and compared to the master's. The result is shown in the central section by aligning the student's performance with the master's: according to the duration computed for each syllable, the result will be a perfect alignment or a misalignment by defect or by excess. Syllables exceeding the master's duration will be drawn longer, whereas syllables shorter in duration will be drawn shorter. The difference in duration is thus evaluated proportionally, as a percentage of the master's duration, and this value is applied to the parameters governing the drawing of the related button by HyperCard™. At the same time, in the section below the central one, two warnings are activated in yellow and red, informing the student that the performance was wrong: prosodic information concerns the placement of word stress on a given syllable, as well as the overall duration (see Bannert 1987; Batliner et al. 1998).
In case of error, the student practising at word level first hears an unpleasant sound, which is then followed by a visual indication of the error by means of a red blinking syllable button, the one to which he/she wrongly assigned word stress. This is followed by a rehearsal of the right syllable, which always appears in green. A companion exercise takes care of the unstressed portion(s) of the word: in this case, the student will focus on
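The per-syllable duration comparison described above can be sketched as follows. This is a hypothetical illustration of the proportional evaluation, not the original HyperCard code; the function name and the 10% alignment tolerance are assumptions.

```python
# Hypothetical sketch of the duration comparison described in the caption:
# each student syllable is measured against the master's, and the
# proportional difference drives the button width and the warning verdict.
# The names and the 10% tolerance are assumptions, not the original code.

def compare_syllables(student_ms, master_ms, tolerance=0.10):
    """Return, per syllable, the ratio to the master's duration and a verdict."""
    results = []
    for s, m in zip(student_ms, master_ms):
        ratio = s / m                      # proportion of the master's duration
        if abs(ratio - 1.0) <= tolerance:
            verdict = "aligned"            # drawn at the master's width
        elif ratio > 1.0:
            verdict = "too long"           # button drawn proportionally longer
        else:
            verdict = "too short"          # button drawn proportionally shorter
        results.append((round(ratio, 2), verdict))
    return results

# Example: master durations [220, 180, 140] ms, student [230, 260, 90] ms
print(compare_syllables([230, 260, 90], [220, 180, 140]))
```

The returned ratio is exactly the "percentage of the master's duration" that the caption says governs the drawing of each syllable button.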


Citations

... To date, research on teaching English phonemic assimilation as part of connected speech instruction has been limited. Although several studies on training the elements of English connected speech have been conducted, most of them have focused on prosodic features (Euler, 2014; Delmonte, 2011; Lengeris, 2012). Popular courses on teaching English as a second or foreign language only touch upon this aspect, providing scarce information supported with little practice. ...
Article
Full-text available
Phonology represents an important part of the English language; however, in the course of English language acquisition, it is rarely treated with proper attention. Connected speech is one of the aspects essential for successful communication, which comprises effective auditory perception and speech production. In this paper, I explored phonemic assimilation, which results in successive sounds at word boundaries influencing each other, as an element of connected speech, and studied how teaching it can be supported with mobile-assisted language learning. The research conducted revealed that elements of phonemic assimilation are found frequently in all styles of speech, and thus knowledge of it is necessary for developing proper listening and speaking skills. I advocate the use of the Internet as one of the best possible resources of listening and speaking materials for learning phonemic variations, and show how various web tools and mobile technologies can be used for preparing, presenting and storing educational materials. I believe this article makes a contribution to the corpus of research and instruction on connected speech as part of the standard British accent and will have an impact in raising general awareness of its significant role in the process of English language learning.
... Various efforts have been made to incorporate speech processing technologies and prosody teaching applications into language learning environments (e. g., [12,13]), or to implement prosody visual and audio-visual learning environments such as [14,15], including our own project originally introduced in [16] and described in more details in [17][18][19]. ...
Article
Full-text available
In tonal languages, tones are associated with both phonological and lexical domains. Accurate tone articulation is required in order to convey the correct meaning. Learning tones at both word and phrase levels is often challenging for L2 learners from a non-tonal language background, because of possible subtle differences between close tones. In this paper, we discuss an adaptation of the StudyIntonation CAPT tools to the case of the Vietnamese language, a good example of a register tonal language with a complex system of tones comprising such features as tone pitch, length, contour melody, intensity and phonation. The particular focus of this contribution is to assess the adoption of the StudyIntonation course toolkit and its pitch processing and visualization algorithms, in order to evaluate how the combined use of audio and visual perception mechanisms supported by StudyIntonation may help learners to improve the accuracy of their pronunciation and intonation with respect to tonal languages.
... Similar systems, which address teaching segmental and prosodic speech phenomena, have been developed for different languages and groups of learners. Systems which are most similar to StudyIntonation in terms of architecture and functionality were developed for suprasegmental training in English [25], for children with profound hearing loss to learn Hungarian [26], and for Italian adults learning English as a foreign language [27]. This study, in particular, exemplifies and explains the technological impact on CAPT development. ...
... Pronunciation teaching covers both segmental and suprasegmental aspects of speech. In natural speech, tonal and temporal prosodic properties are co-produced [39]; and, therefore, to characterize and evaluate non-native pronunciation, modern CAPT systems should have the means to collectively represent both segmentals and suprasegmentals [27]. Segmental (phonemic and syllabic) activities work out the correct pronunciation of single phonemes and co-articulation of phonemes into higher phonological units. ...
Article
Full-text available
This article contributes to the discourse on how contemporary computer and information technology may help in improving foreign language learning, not only by supporting a better and more flexible workflow and digitizing study materials, but also by creating completely new use cases made possible by technological improvements in signal processing algorithms. We discuss an approach and propose a holistic solution to teaching the phonological phenomena which are crucial for correct pronunciation, such as the phonemes; the energy and duration of syllables and pauses, which construct the phrasal rhythm; and the tone movement within an utterance, i.e., the phrasal intonation. The working prototype of the StudyIntonation Computer-Assisted Pronunciation Training (CAPT) system is a tool for mobile devices, which offers a set of tasks based on a "listen and repeat" approach and gives audio-visual feedback in real time. The present work summarizes the efforts taken to enrich the current version of this CAPT tool with two new functions: the phonetic transcription and rhythmic patterns of model and learner speech. Both are designed on the basis of the third-party automatic speech recognition (ASR) library Kaldi, which was incorporated into the StudyIntonation signal processing software core. We also examine the scope of automatic speech recognition applicability within the CAPT system workflow and evaluate the Levenshtein distance between the transcription made by human experts and that obtained automatically by our code. We developed an algorithm of rhythm reconstruction using acoustic and language ASR models. It is also shown that even with sufficiently correct production of phonemes, learners do not produce a correct phrasal rhythm and intonation; therefore, the joint training of sounds, rhythm and intonation within a single learning environment is beneficial.
To mitigate recording imperfections, voice activity detection (VAD) is applied to all the speech records processed. The try-outs showed that StudyIntonation can create transcriptions and process rhythmic patterns, but some specific problems with connected-speech transcription were detected. The learners' feedback in the sense of pronunciation assessment was also updated: a conventional mechanism based on dynamic time warping (DTW) was combined with a cross-recurrence quantification analysis (CRQA) approach, which resulted in better discriminating ability. The CRQA metrics combined with those of DTW were shown to add to the accuracy of learner performance estimation. The major implications for computer-assisted English pronunciation teaching are discussed.
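The DTW-based comparison mentioned in the abstract can be illustrated with a minimal sketch. This is an assumed textbook formulation of dynamic time warping over two pitch contours, not StudyIntonation's actual implementation; the function name and toy contours are invented for illustration.

```python
# Minimal dynamic time warping (DTW) sketch for comparing a learner's
# pitch contour against a model contour, in the spirit of the feedback
# mechanism described above. Illustrative only, not the system's code.

def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of pitch values (Hz)."""
    n, m = len(a), len(b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])          # local frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

# Identical contours align at zero cost; a deviating contour costs more.
model   = [120, 130, 150, 140, 125]
learner = [118, 128, 155, 138, 120]
print(dtw_distance(model, model))    # 0.0
print(dtw_distance(model, learner))
```

Because DTW tolerates local timing differences, it scores what the learner said rather than exactly when they said it, which is why it is a conventional baseline for pronunciation assessment.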
... The application, which includes a variety of exercises (see Barrios, 2013, for a detailed description of version 9 of Tell Me More online), incorporates sophisticated speech recognition technology that makes it possible, among other functionalities, to visualize the quality of one's own oral production and to contrast it with a native model (Campillos, 2010; Blake, 2011; Delmonte, 2011). In addition, its audiovisual and interactive components are abundant (Godwin-Jones, 2010), and it is recognized for its ease of use and the friendliness of its interface (Lafford, 2004). ...
Article
Full-text available
This article explores the influence of three variables, namely sex, level of English proficiency and degree of motivation, on the perceptions held by users of the online application Tell Me More regarding their progress in communicative skills and linguistic competences through the application, and regarding the contribution of its different activities to their progress in English. The ultimate aim is to identify individual characteristics associated with more or less favourable perceptions of this application as a web-based English self-learning programme. The study, in which 75 lecturers from the University of Málaga took part, used a questionnaire as the data-collection strategy; depending on the case, the Mann-Whitney U and Kruskal-Wallis statistical tests were applied, and contingency tables and Kendall's correlation coefficient were computed. The findings indicate that the sex variable influences the perception of progress in English in three of the five communicative skills the participants were surveyed about. It is also found that the lower the level of English proficiency, the greater the degree to which users consider the activities to have contributed to their learning; likewise, of the three variables studied, the degree of motivation is the most influential factor, both in the perception of progress in skills and competences and in the favourable view of the contribution of the different activities to language learning. These findings, together with the observation that the lower the level of English, the greater the motivation generated by Tell Me More, lead to the conclusion that the platform is more advisable for students with an elementary level of English than for those at or above level B1 of the Common European Framework of Reference for Languages.
... The increased computing power and the rapid progress in computer-based speech processing, along with the creation of advanced methods for speech recognition (including dialects and accents), now make it possible to apply modern speech technologies to Computer-Aided Pronunciation Teaching (CAPT) (Delmonte, 2011). There are currently several systems available that can measure the pronunciation quality of students by analysing a few minutes of their speech, and these have been shown to be as reliable as trained human experts (Bernstein, 2010). While such high-level global pronunciation scores might be sufficient for oral proficiency and pronunciation assessment purposes, in general they are not detailed enough for training purposes (Olov, 2012). ...
Article
Full-text available
In this paper we introduce the SpeakCorrect system, a Computer-Aided Pronunciation Training (CAPT) system for native Arabic students of English. The system is designed with optimized performance for the target user group. It is an L1-dependent system, and only the frequent pronunciation errors of native Arabic speakers are examined. Several adaptation techniques, such as Speaker Adaptive Training (SAT), Speaker Clustering (SC) and Maximum Likelihood Linear Regression (MLLR), are used to boost the performance of the SpeakCorrect system. The decision reached by the SpeakCorrect system is accompanied by a posterior-based confidence score to reduce the effect of misleading system feedback. Evaluation results for the system are promising and show significant improvements in the users' pronunciation proficiency.
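The posterior-based confidence score mentioned above can be sketched in a minimal form: compute the posterior of the hypothesized phone from per-phone acoustic scores, and only emit error feedback when that posterior clears a threshold. This is a generic illustration under assumed names and a made-up 0.7 threshold, not the actual SpeakCorrect scoring.

```python
# Hypothetical sketch of posterior-based confidence scoring: feedback is
# suppressed when the recognizer's decision is not confident enough.
# The names, toy scores and 0.7 threshold are assumptions for illustration.
import math

def phone_posterior(scores, phone):
    """Softmax posterior of `phone` given per-phone acoustic log-scores."""
    z = sum(math.exp(s) for s in scores.values())
    return math.exp(scores[phone]) / z

def confident(scores, phone, threshold=0.7):
    """Only give error feedback when the decision clears the threshold."""
    return phone_posterior(scores, phone) >= threshold

scores = {"p": 2.0, "b": 0.1, "f": -1.0}   # toy acoustic log-likelihoods
print(round(phone_posterior(scores, "p"), 2))
print(confident(scores, "p"))
```

Gating feedback on the posterior trades a little recall for precision: the system stays silent when it might be wrong, which matters because misleading corrections can reinforce errors.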
... References [11], [12] and [13] provide a very thorough and in-depth overview of the work up to 2009. Since pronunciation error detection and teaching in its entirety is a difficult problem, past work has often addressed only components of this field, such as phoneme-level pronunciation error detection or prosodic error detection. ...
... A very detailed discussion of CAPT systems that provide prosodic feedback can be found in [12]. Bernstein et al. [38] have shown that there appears to be a linear relationship between fluency measures (such as those listed in Table 1) and human judgments of proficiency. ...
Conference Paper
Full-text available
This presentation gives a review of the large amount of research on automatic pronunciation error detection that has been conducted over the past 10-15 years. The goal is to provide a linkage between the various research approaches and work streams in order to aid development of the next generation of algorithms. A vision of an ideal pronunciation error detection system is presented and used as a reference to determine current challenges and possible next steps in research efforts. Lastly, an extensive list of references on the field is provided.