Dominik Wagner's research while affiliated with Technische Hochschule Nürnberg Georg Simon Ohm and other places

Publications (26)

Preprint
With the advancement of multimedia technologies, news documents and user-generated content are often represented as multiple modalities, making Multimedia Event Extraction (MEE) an increasingly important challenge. However, recent MEE methods employ weak alignment strategies and data augmentation with simple classification models, which ignore the...
Preprint
In this work, we optimize speculative sampling for parallel hardware accelerators to improve sampling speed. We notice that substantial portions of the intermediate matrices necessary for speculative sampling can be computed concurrently. This allows us to distribute the workload across multiple GPU threads, enabling simultaneous operations on matr...
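The token-acceptance test at the core of speculative sampling can be sketched in a few lines; the function name, tensor shapes, and variable names below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def accept_draft_tokens(p_target, q_draft, draft_tokens, rng):
    """Standard speculative-sampling acceptance test (illustrative sketch).

    p_target: (k, V) target-model probabilities at each draft position
    q_draft:  (k, V) draft-model probabilities at the same positions
    draft_tokens: (k,) token ids proposed by the draft model
    Returns the number of draft tokens accepted (length of the kept prefix).
    """
    k = len(draft_tokens)
    # The acceptance ratios for all k positions are independent, so they
    # can be computed concurrently -- the kind of intermediate computation
    # the paper distributes across GPU threads.
    p = p_target[np.arange(k), draft_tokens]
    q = q_draft[np.arange(k), draft_tokens]
    ratios = np.minimum(1.0, p / q)
    u = rng.random(k)
    rejected = np.nonzero(u > ratios)[0]
    return k if rejected.size == 0 else int(rejected[0])
```

When draft and target distributions agree, every ratio is 1 and all k tokens are accepted; the first rejection truncates the accepted prefix.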
Preprint
This paper explores the improvement of post-training quantization (PTQ) after knowledge distillation in the Whisper speech foundation model family. We address the challenge of outliers in weights and activation tensors, known to impede quantization quality in transformer-based language and vision models. Extending this observation to Whisper, we de...
Preprint
Accurately detecting dysfluencies in spoken language can help to improve the performance of automatic speech and language processing components and support the development of more inclusive speech and language technologies. Inspired by the recent trend towards the deployment of large language models (LLMs) as universal learners and processors of no...
Preprint
Cycle-consistent generative adversarial networks have been widely used in non-parallel voice conversion (VC). Their ability to learn mappings between source and target features without relying on parallel training data eliminates the need for temporal alignments. However, most methods decouple the conversion of acoustic features from synthesizing t...
Preprint
Most stuttering detection and classification research has viewed stuttering as a multi-class classification problem or a binary detection task for each dysfluency type; however, this does not match the nature of stuttering, in which one dysfluency seldom comes alone but rather co-occurs with others. This paper explores multi-language and cross-corp...
Article
Vocal fatigue refers to the feeling of tiredness and weakness of voice due to extended utilization. This paper investigates the effectiveness of neural embeddings for the detection of vocal fatigue. We compare x-vectors, ECAPA-TDNN, and wav2vec 2.0 embeddings on a corpus of academic spoken English. Low-dimensional mappings of the data reveal that n...
Preprint
This work adapts two recent architectures of generative models and evaluates their effectiveness for the conversion of whispered speech to normal speech. We incorporate the normal target speech into the training criterion of vector-quantized variational autoencoders (VQ-VAEs) and MelGANs, thereby conditioning the systems to recover voiced speech fr...
Preprint
We analyze the impact of speaker adaptation in end-to-end architectures based on transformers and wav2vec 2.0 under different noise conditions. We demonstrate that the proven method of concatenating speaker vectors to the acoustic features and supplying them as an auxiliary model input remains a viable option to increase the robustness of end-to-en...
Preprint
Current findings show that pre-trained wav2vec 2.0 models can be successfully used as feature extractors to discriminate on speaker-based tasks. We demonstrate that latent representations extracted at different layers of a pre-trained wav2vec 2.0 system can be effectively used for binary classification of various types of pathologic speech. We exam...
Preprint
Specially adapted speech recognition models are necessary to handle stuttered speech. For these to be used in a targeted manner, stuttered speech must be reliably detected. Recent works have treated stuttering as a multi-class classification problem or viewed detecting each dysfluency type as an isolated task, which does not capture the nature of st...

Preprint
The detection of pathologies from speech features is usually defined as a binary classification task with one class representing a specific pathology and the other class representing healthy speech. In this work, we train neural networks, large margin classifiers, and tree boosting machines to distinguish between four different pathologies: Parkins...
Chapter
This paper empirically investigates the influence of different data splits and splitting strategies on the performance of dysfluency detection systems. For this, we perform experiments using wav2vec 2.0 models with a classification head as well as support vector machines (SVM) in conjunction with the features extracted from the wav2vec 2.0 model to...
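The SVM-on-embeddings setup described above can be sketched as follows. This is a minimal illustration using synthetic feature vectors in place of real wav2vec 2.0 embeddings; the function name and parameters are assumptions for the sketch, not the paper's actual pipeline.

```python
import numpy as np
from sklearn.svm import SVC

def classify_dysfluency(features, labels, test_features):
    """Train an SVM on utterance-level embeddings (illustrative sketch).

    In a setup like the paper's, `features` would be pooled wav2vec 2.0
    hidden states, one vector per utterance; here any (n_samples, dim)
    array works.
    """
    clf = SVC(kernel="rbf", C=1.0)
    clf.fit(features, labels)
    return clf.predict(test_features)
```

With well-separated synthetic clusters standing in for dysfluent vs. fluent embeddings, the classifier recovers the two classes; the point of the sketch is only the feature-extractor-plus-SVM structure, not any claimed accuracy.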
Preprint
This work aims to automatically evaluate whether the language development of children is age-appropriate. Validated speech and language tests are used for this purpose to test the auditory memory. In this work, the task is to determine whether spoken nonwords have been uttered correctly. We compare different approaches that are motivated to model s...
Preprint
This paper empirically investigates the influence of different data splits and splitting strategies on the performance of dysfluency detection systems. For this, we perform experiments using wav2vec 2.0 models with a classification head as well as support vector machines (SVM) in conjunction with the features extracted from the wav2vec 2.0 model to...
Preprint
Vocal fatigue refers to the feeling of tiredness and weakness of voice due to extended utilization. This paper investigates the effectiveness of neural embeddings for the detection of vocal fatigue. We compare x-vectors, ECAPA-TDNN, and wav2vec 2.0 embeddings on a corpus of academic spoken English. Low-dimensional mappings of the data reveal that n...
Preprint
Stuttering is a varied speech disorder that harms an individual's communication ability. Persons who stutter (PWS) often use speech therapy to cope with their condition. Improving speech recognition systems for people with such non-typical speech or tracking the effectiveness of speech therapy would require systems that can detect dysfluencies whil...

Citations

... Most studies employ spectral features like MFCCs or coefficients derived from linear predictive coding methods [7,8]. Recent work has explored neural representations obtained from acoustic encoder models [9,10,11,12], in particular wav2vec 2.0 [13]. These representations are then used as input features to various classifiers ranging from support vector machines to neural networks. ...
... It has been shown that speech representations capture human perceptual understanding [11] and preserve consistent attributes within the speech, such as speaker, language, emotion, and age. As speech contains rich information about the condition of several important organs, the rise of these models has prompted several works exploring and evaluating their potential for identifying disease [12,13]. However, deep learning models lack interpretability, which hinders their application in healthcare settings. ...
... The study of speech biomarkers in mental health holds great potential, offering a non-invasive and easily accessible avenue to capture significant motor, cognitive, and behavioral changes due to mental health disorders such as depression and anxiety [10-13]. Clinical evidence and research studies have increasingly linked specific automatically extracted speech features, such as prosody, articulation, and fluency, with various mental health conditions and states, including depression [10,14], anxiety [15], suicide risk [16], fatigue [17,18], and sleep deprivation [19]. The complexity of human speech extends beyond the intricate motor coordination involved. ...
... However, these approaches model cepstral and F0 features with separate GANs and require additional vocoders for speech synthesis. Other methods for whispered speech conversion use the attention mechanism [23] to learn time alignments between utterance pairs and train models on data that was aligned via dynamic time warping (DTW) prior to training [24,25]. Some studies avoid time alignment by generating artificial whispers from voiced input speech [26] or by representing the speech content in latent space before decoding it back to a mel-spectrogram representation [27]. ...
... Examined metadata includes age, gender, recording location, accent, and spoken text. Furthermore, based on previous works [9,20,21], we assume that the lower encoder blocks contain low-level features, while higher layers contain more content and phonetic information. For this reason, we aggregate three layers at a time, calculating the average over the lower layers (1-3), the middle layers (4-6 and 7-9), and the higher layers (10-12). ...
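The layer-group averaging described in this excerpt can be sketched as follows; the function name and array shapes are illustrative assumptions.

```python
import numpy as np

def aggregate_layer_groups(hidden_states, group_size=3):
    """Average encoder layers in consecutive groups of `group_size`.

    hidden_states: (num_layers, time, dim) stacked encoder outputs,
    e.g. layers 1-12 of a wav2vec 2.0 model. Returns an array of shape
    (num_layers // group_size, time, dim), i.e. for 12 layers the
    averages over groups (1-3), (4-6), (7-9), and (10-12).
    """
    L, T, D = hidden_states.shape
    n_groups = L // group_size
    # Drop any trailing layers that do not fill a complete group.
    trimmed = hidden_states[: n_groups * group_size]
    return trimmed.reshape(n_groups, group_size, T, D).mean(axis=1)
```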