Bernd J. Kröger

RWTH Aachen University · Phoniatrics Clinic, Neurophonetics Group

Ph.D.

About

179
Publications
37,980
Reads
1,679
Citations
Introduction
Expertise: neurobiological models of speech production, speech perception, and speech learning; voice signal generation; voice production; voice perception; vocal tract acoustics; vocal tract movement generation.
Additional affiliations
November 2001 - present
University Hospital RWTH Aachen
Position
  • Senior Researcher
November 2001 - August 2017
RWTH Aachen University
Position
  • Senior Researcher
January 2001 - October 2017
RWTH Aachen University
Position
  • Professor

Publications

Publications (179)
Article
Full-text available
Computer-implemented neural speech processing models can simulate patients suffering from neurogenic speech and language disorders like aphasia, dysarthria, apraxia of speech, and neurogenic stuttering. Speech production and perception tasks simulated by using quantitative neural models uncover a variety of speech symptoms if neural dysfunctions ar...
Article
Full-text available
Background: The computer-based simulation of the whole processing route for speech production and speech perception in a neurobiologically inspired way remains a challenge. Only a few neural based models of speech production exist, and these models either concentrate on the cognitive-linguistic component or the lower-level sensorimotor component o...
Article
Full-text available
A broad sketch for a model of speech production is outlined which describes developmental aspects of its cognitive-linguistic and sensorimotor components. A description of the emergence of phonological knowledge is a central point in our model sketch. It will be shown that the phonological form level emerges during speech acquisition and becomes an...
Article
Full-text available
Modeling speech production and speech articulation is still an evolving research topic. Some current core questions are: What is the underlying (neural) organization for controlling speech articulation? How to model speech articulators like lips and tongue and their movements in an efficient but also biologically realistic way? How to develop high-...
Presentation
Full-text available
Summary of my work on neurobiologically inspired modeling of speech production and speech perception using a spiking neuron model; describing and modeling normal and disordered speech; simulation of speech production and speech perception tasks (picture naming, word comprehension, word repetition).
Article
Abstract: For the treatment of speech sound disorders in children, the diagnostic differentiation between childhood apraxia of speech (KSAX) and phonological speech sound disorder (PAS) is important. While diagnostic materials are available for diagnosing PAS, expert judgment is the gold standard in diagnosing KSAX. The aim of this article...
Article
Full-text available
Our understanding of the neurofunctional mechanisms of speech production and their pathologies is still incomplete. In this paper, a comprehensive model of speech production based on the Neural Engineering Framework (NEF) is presented. This model is able to activate sensorimotor plans based on cognitive-functional processes (i.e., generation of the...
Article
Full-text available
Background: To produce and understand words, humans access the mental lexicon. From a functional perspective, the long-term memory component of the mental lexicon is comprised of three levels: the concept level, the lemma level, and the phonological level. At each level, different kinds of word information are stored. Semantic as well as phonologica...
Article
Full-text available
Many medical screenings used for the diagnosis of neurological, psychological or language and speech disorders access the language and speech processing system. Specifically, patients are asked to fulfill a task (perception) and then requested to give answers verbally or by writing (production). To analyze cognitive or higher-level linguistic impai...
Chapter
In this section models of speech production, perception, and learning are discussed. First, we present theoretical models based on gross brain activity data and behavioral data. We then describe quantitative computational models involving simulated brain activity or behavior.
Chapter
This chapter presents the “neural engineering framework” (NEF), which is a well-documented and easy-to-use framework from the computer programming point of view. In particular, we show how to use this framework to build a neural model for word generation and apply that model to simulate a picture naming test. The NEF can use neuron models that clos...
Chapter
In this chapter we discuss why children are interested in speaking and what drives them to learn spoken language. We also look at why children can effortlessly recognize, understand, and even produce novel words. Specifically, we examine the role of communication partners and certain communication scenarios between a child and their communication p...
Chapter
In this chapter we introduce semantic networks, mental lexicon, mental syllabary, articulation, and how the acoustic speech signal is generated. We detail the types of information associated with lexical items (concept, lemma, and phonological form) and syllables (motor form or motor plan, auditory form, somatosensory form), and discuss how motor p...
Chapter
This section provides an introduction to computer-implemented connectionist neural models. It explains how sensory, motor, and cognitive states are represented at the neural level and how these states can be processed in neural networks. Supervised learning is illustrated through a sensorimotor association example and unsupervised learning through...
Chapter
In this chapter we explain how the recognition of sound features works on the acoustic-auditory level and how recognition of sound features leads to the activation of symbolic-cognitive variables such as sounds, syllables, and words. We describe how the speech information signal is compressed from a detailed acoustic-auditory representation to an e...
Chapter
What brain areas and other parts of the nervous system are involved in speech processing? How do the organization and function of those neural resources enable speech processing? In order to answer these two questions, we introduce the broad categories of neurons (sensory neurons, motoneurons, and central neurons) and the functional neuroanatomy of...
Chapter
This section presents an approach for modeling speech processing and speech learning. Parts of this simulation model are implemented in the STAA approach, with other parts already in the NEF. The model described here comprises cognitive and sensory-motor components of speech production and perception. Additionally, we simulate the emergence of the...
Article
Full-text available
A comprehensive model of speech processing and speech learning has been established. The model comprises a mental lexicon, an action repository and an articulatory-acoustic module for executing motor plans and generating auditory and somatosensory feedback information (Kröger and Cao, 2015). In this study a “model language” based on three auditory...
Book
This book explores the processes of spoken language production and perception from a neurobiological perspective. After presenting the basics of speech processing and speech acquisition, a neurobiologically-inspired and computer-implemented neural model is described, which simulates the neural processes of speech processing and speech acquisition....
Article
Full-text available
The past decades have seen an explosion of research into the psychological, cognitive, neural, biological, and technical mechanisms of voice perception. These mechanisms refer to the general ability to extract information from voices expressed by other living beings or by technical systems. Voice perception research is now a lively area of research...
Article
Full-text available
Background: Parkinson's disease affects many motor processes including speech. Besides drug treatment, deep brain stimulation (DBS) in the subthalamic nucleus (STN) and globus pallidus internus (GPi) has developed as an effective therapy. Goal: We present a neural model that simulates a syllable repetition task and evaluate its performance when var...
Book
Speech production, speech perception, speech acquisition, speech learning, neural modelling of speech processing, ... 300 pages. In German. The book offers a thorough introduction to speech production, speech perception, and speech acquisition, and it explains a neurobiologically based, computer-implemented overall model of speech proc...
Article
Full-text available
Production and comprehension of speech are closely interwoven. For example, the ability to detect an error in one's own speech, halt speech production, and finally correct the error can be explained by assuming an inner speech loop which continuously compares the word representations induced by production to those induced by perception at various c...
Conference Paper
Full-text available
Background: Currently, there exists no comprehensive and biologically inspired model of speech production that utilizes spiking neurons. Goal: We introduce a speech production model based on a spiking neuron approach called the Neural Engineering Framework (NEF). Using the NEF to model temporal behavior at the neural level in a biologically plausibl...
Article
Full-text available
Background: Reduction of dopamine in basal ganglia is a common cause of Parkinson's disease (PD). If dopamine-producing cells die in the substantia nigra, as seen in PD, a typical symptom is freezing of articulatory movements during speech production. Goal: It is the goal of this study to simulate syllable sequencing tasks by computer modelling of...
Article
Abstract: Because speech acquisition begins with sensorimotor activity (i.e. babbling and imitation of speech items in order to learn articulatory-acoustic relations) as well as with semantic cognitive processing (i.e. linking phonetic items with concepts), distinctiveness as well as phonetic-phonological features emerge early in speech acquisition...
Conference Paper
Full-text available
Born with an innate neural architecture built specially for language learning, young children have the ability to distinguish sounds in a variety of languages. As they are exposed to native language environment, perception reorganization occurs, and native language system gradually establishes. Phonology knowledge, which is language-specific, e...
Conference Paper
Full-text available
We present a model of imitative vocal learning consisting of two stages. First, the infant is exposed to the ambient language and forms auditory knowledge of the speech items to be acquired. Second, the infant attempts to imitate these speech items and thereby learns to control the articulators for speech production. We model these processes using...
Article
Full-text available
Vocal emotions are signaled by specific patterns of prosodic parameters, most notably pitch, phone duration, intensity, and phonation type. Phonation type was so far the least accessible parameter in emotion research, because it was difficult to extract from speech signals and difficult to manipulate in natural or synthetic speech. The present stud...
Article
Full-text available
Background: Quantitative neural models of speech acquisition and speech processing are rare. Methods: In this paper, we describe a neural model for simulating speech acquisition, speech production, and speech perception. The model is based on two important neural features: associative learning and self-organization. The model describes an SOM-based...
Conference Paper
Full-text available
A neurobiologically plausible model of speech production is introduced here using the Neural Engineering Framework (NEF). This approach allows detailed modeling of temporal aspects of action selection and action execution in speech production at the level of single spiking neurons. A preliminary architecture of our NEF speech production model is...
Article
E-learning is widely used as a didactic tool for conveying knowledge. For dysarthria, a disorder treated in speech-language therapy, a learning application was developed and evaluated that combines practical relevance through patient videos with basic knowledge and was designed as an online-based learning scenario. 74 vocational school students and 11 university students evaluated...
Article
Full-text available
Speech production is complex for the brain to control, since it involves many neural processes such as speech planning, motor control, and auditory and somatosensory feedback. Those functions are thought to work both in cascade and in parallel, and the control signals are transformed from one brain area to others with 'one-to-many' relations. To describe...
Conference Paper
Full-text available
Modeling speech processing in a biologically inspired way can be done by using growing self-organizing maps. But this approach is highly abstract because each "node" here represents an ensemble of "real" neurons, in our interpretation a cortical column. Moreover, neural spike trains are not modeled in this approach. Rather a mean rate of neural...
Conference Paper
Full-text available
Cognitive goals – i.e. the intention to utter a sentence and to produce co-speech facial and hand-arm gestures – as well as the sensorimotor realization of the intended speech, co-speech facial, and co-speech hand-arm actions are modulated by the emotional state of the speaker. In this review paper it will be illustrated how cognitive goals and sen...
Article
Full-text available
Based on the incremental nature of knowledge acquisition, in this study we propose a growing self-organizing neural network approach for modeling the acquisition of auditory and semantic categories. We introduce an Interconnected Growing Self-Organizing Maps (I-GSOM) algorithm, which takes associations between auditory information and semantic info...
Article
Full-text available
Background: Quantitative neural models of speech acquisition and speech processing are rare. Methods: In this paper, we describe a neural model for simulating speech acquisition, speech production, and speech perception. The model is based on two important neural features: associative learning and self-organization. The model describes an SOM-based...
Conference Paper
Full-text available
Based on the incremental nature of knowledge learning, in this study a growing self-organizing neural network approach for modeling the acquisition process of semantic features is proposed. The Growing Self-Organizing Map (GSOM) algorithm is extended and applied to the problem of language acquisition. Based on that algorithm, experiments are conduc...
Conference Paper
Full-text available
Speech motor learning is still an under-discussion process in neural computational modeling. In this paper we focus on the relationship between vowel articulation and its muscle activation patterns, propose a neural understanding of speech motor learning and elucidate the neural strategy for speech learning of infants. An existing physiological mod...
Conference Paper
Full-text available
Recent neural models are capable of generating quantitative patterns of speech articulation and speech acoustics. Five models are discussed here: the DIVA model, the task dynamics model, the ACT model, the Warlaumont model and the Hickok model. These models have a more or less strong background in neuroscience. Directions are identified in this pap...
Article
Full-text available
A speech–action-repository (SAR) or “mental syllabary” has been proposed as a central module for sensorimotor processing of syllables. In this approach, syllables occurring frequently within language are assumed to be stored as holistic sensorimotor patterns, while non-frequent syllables need to be assembled from sub-syllabic units. Thus, frequent...
Conference Paper
Full-text available
The speech-action-repository (SAR) is a neurofunctional and neurocomputational model of syllable processing. The model is capable of storing sensorimotor representations of high-frequent syllables by a supramodal hub and its connections to unimodal sensorimotor state maps. In order to support the notion of the SAR, a functional imaging study was...
Article
Full-text available
Background: Children with apraxia of speech have difficulty planning and programming speech motor units. The etiology, definition, and diagnostic criteria of this disorder are controversial. Childhood apraxia of speech (CAS) is difficult to distinguish from phonetic-articulatory disorders in children as well as from...
Conference Paper
Full-text available
A computational model has been proposed which is capable of simulating early phases of speech acquisition, speech production, and speech perception. The model comprises two main modules, i.e. mental lexicon and action repository (Fig. 1). The mental lexicon activates semantic and phonological representations of words (cognitive level, Li et al. 200...
Conference Paper
Full-text available
Relevant literature of early language development is reviewed in the context of our neurocomputational model of speech acquisition. This literature confirms our hypothesis that phonological knowledge acquisition depends on phonetic in combination with semantic learning. It has been shown that phonetic learning starts at birth, followed by semanti...
Conference Paper
Full-text available
A comprehensive approach for describing the functional and behavioral aspects of communicative actions, e.g. facial, manual, and vocal tract actions, has been established for face-to-face-communication (Cogn Process 11:187-205, 2010). Within the speech domain, this approach will now be extended in two ways: (i) by introducing level actions such as...
Conference Paper
Full-text available
No quantitative specifications are known for neurobiologically motivated motor plans of speech actions (i.e. syllables, words, or utterances). This paper is motivated by the notion that quantitative parameters – as they can be estimated using a three-parameter model of movement trajectory approximation – are valuable candidates for specifying each...
Article
Full-text available
The importance of bodily movements in the production and perception of communicative actions has been shown for the spoken language modality and accounted for by a theory of communicative actions (Cogn. Process. 2010;11:187–205). In this study, the theory of communicative actions was adapted to the sign language modality; we tested the hypothesis t...
Conference Paper
Full-text available
Two-mass models of the vocal folds and their variants are valuable tools for voice synthesis and analysis, but are not able to produce breathy voice qualities. The produced voice qualities usually lie between normal and pressed. The reason for this property is that the mass elements are aligned parallel to the dorso-ventral axis. Thereby, the glott...
Article
Full-text available
While we are capable of modeling the shape (e.g. face, arms) of humanoid robots in a nearly natural or humanlike way, it is much more difficult to generate human-like facial or body movements and human-like behavior such as speaking and co-speech gesturing. This paper argues for a developmental robotics approach for learning to...
Article
Full-text available
We present a novel quantitative model for the generation of articulatory trajectories based on the concept of sequential target approximation. The model was applied for the detailed reproduction of movements in repeated consonant-vowel syllables measured by electromagnetic articulography (EMA). The trajectories for the constrictor (lower lip, tongu...
Conference Paper
Full-text available
Speech production and speech perception are important human capabilities comprising cognitive as well as sensorimotor functions. This paper summarizes our work developing a neurophonetic model for speech processing, called ACT, which was carried out over the last seven years. The function modes of the model are production, perception, and acquisiti...
Article
Full-text available
A modified two-mass model of the vocal folds is introduced and applied to the articulatory synthesis of words in six voice qualities. The modified two-mass model uses mass elements that are inclined, instead of parallel, with respect to the dorso-ventral axis as a function of the degree of abduction. This makes it possible to produce the continuum of voice qu...
Article
Full-text available
The tongue often shows some forward movement during the closure phase of velar consonants, resulting in elliptical trajectories (loops) in symmetrical VCV utterances. Several hypotheses exist for the cause of these loops, but none can consistently explain all observations. This study demonstrates that loops also occur in V1-V2-V1 sequences with no...
Article
Full-text available
This paper presents a distinctive phonetic features (DPFs) based phoneme recognition method by incorporating syllable language models (LMs). The method comprises three stages. The first stage extracts three DPF vectors of 15 dimensions each from local features (LFs) of an input speech signal using three multilayer neural networks (MLNs). The second...
Article
Full-text available
Communication problems are a frequent symptom for people with Parkinson's disease (PD) which can have a significant impact on their quality-of-life. Deciding on the right management approach can be problematic though, as, with the exception of LSVT, very few studies have been published demonstrating the effectiveness of treatment techniques. The ai...
Conference Paper
Full-text available
Virtual articulatory targets are a concept to explain the different trajectories of primary and secondary articulators during consonant production, as well as the different places of the tongue-palate contact depending on the context vowel, for example in [igi] vs. [ugu]. The virtual targets for the tongue tip and the tongue body in apical and dors...
Article
Full-text available
Using a computer-implemented neural model of speech production, an attempt was made to shed light on the relationship between neural dysfunctions and typical symptoms of apraxia of speech. To this end, the failure of defined neural associations and functional modules was simulated in the model, and the resulting speech errors were...
Chapter
Full-text available
This paper reviews interactive methods for improving the phonetic competence of subjects in the case of second language learning as well as in the case of speech therapy for subjects suffering from hearing-impairments or articulation disorders. As an example our audiovisual feedback software “SpeechTrainer” for improving the pronunciation quality o...
Conference Paper
Full-text available
Neural models of speech production are highly valuable for our understanding of normal as well as of disordered speech production processes. However, only few quantitative neural models of speech production are available. This paper introduces an action-based neurocomputational model of speech production (called ACT) comprising a high-quality artic...
Conference Paper
Full-text available
While a mental lexicon stores phonological, grammatical and semantic features of words, a vocal tract action repository is assumed to store inner motor and sensory representations of speech items (i.e. the sounds, syllables and words) of the speaker's native language. On the basis of a neural model of speech processing, which comprises important co...
Conference Paper
Full-text available
While the behavioral side of categorical perception in speech is already well investigated, little is known concerning its underlying neural mechanisms. In this study, a computer-implemented neurophonetic model of speech production and perception is used in order to elucidate the functional neural mechanisms responsible for categorical perception...
Chapter
Full-text available
Modeling natural sounding voice qualities – for example the pressed-modal-breathy voice quality continuum which widely occurs during normal speech production – is a crucial point in speech synthesis. A parametric voice source model using prescribed sinusoidal vocal fold vibration patterns (i.e. extended Titze model) is introduced in this paper. Thi...
Article
Full-text available
Depictions of speech sound articulation (sound images) are commonly used as visual aids in the treatment of speech disorders. This article presents a computer-implemented program for generating midsagittal sound images as well as articulatory movement sequences of whole syllables and words. It is explained that this...
Article
Full-text available
Computer-aided, acoustic-based biofeedback programmes are increasingly employed in speech therapy. The aim of this study is to examine whether the computer-based speech corrector CoKo can differentiate between children with and without sigmatism (sensitivity study). Furthermore, the possibility to integrate and apply this biofeedback programme in s...
Conference Paper
This paper presents a phoneme recognition method based on distinctive phonetic features (DPFs). The method comprises three stages. The first stage extracts 3 DPF vectors of 15 dimensions each from local features (LFs) of an input speech signal using three multilayer neural networks (MLNs). The second stage incorporates an Inhibition/Enhancement (In...
Article
Full-text available
The concept of action as basic motor control unit for goal-directed movement behavior has been used primarily for private or non-communicative actions like walking, reaching, or grasping. In this paper, literature is reviewed indicating that this concept can also be used in all domains of face-to-face communication like speech, co-verbal facial exp...
Conference Paper
Full-text available
Background: Communicative actions are specific movements or gestures which are accomplished by vocal tract articulators (lips, tongue, velum etc.) for speech, by facial articulators (eye brows, eye lids etc.) for co-verbal facial expression and by other bodily articulators (hands, arms etc.) for co-verbal gesturing. While action-based approaches al...
Conference Paper
Full-text available
Abstract: Articulatory speech synthesis targets the entire process of speech production, starting with the generation of speech movements from a phonological-phonetic specification of the utterance to be generated (control model), continuing through the detailed description of the positioning, shaping, and function of all articulators (...
Article
The limitation in performance of current speech synthesis and speech recognition systems may result from the fact that these systems are not designed with respect to the human neural processes of speech production and perception. A neurocomputational model of speech production and perception is introduced which is organized with respect to human ne...
Article
Full-text available
Background: The prelinguistic developmental trajectories of children who received cochlear implants early have so far been little studied in German-speaking countries. The aim of the study was therefore to illustrate early auditory-speech development using a single case. Method: The study examined a girl who, at the age of 8;3 months, bilateral...
Conference Paper
Full-text available
In this study the gesture duration and articulator velocity in consonant-vowel transitions were analysed using electromagnetic articulography (EMA). The receiver coils were placed on the tongue, lips and teeth. We found onset and offset durations which are statistically significant for a special articulator. The duration of the offset is affe...
Article
Full-text available
Early vocal development of German-speaking cochlear implant recipients has rarely been assessed so far. Therefore the purpose of this study was to describe the early vocal development following successful implantation. A case study was designed to assess the temporal progression of early vocal development in a young cochlear implant recipient who...

Questions

Questions (2)
Question
I found a publication (Steiner, Birkholz) in which Ingmar Steiner applies old gestural phasing rules that I formulated in 1998 to an implemented synthesis model whose gestural approach was only developed after 2005. In addition, the gestural approach used in this synthesis model is qualitatively quite different from the approach I developed in 1998. So my phase values cannot be implemented at all in the model Steiner uses. Nevertheless, Steiner does this and says that my rules do not give good results for complex syllables. Of course not: he cannot use my phase values in VTL at all, and my model was not designed for complex syllables either. Mr. Steiner, what exactly are you doing? Perhaps you should publish a little less and spend a little more time on individual publications. Quantity is not quality!!!
Bernd Kröger
