Fig 1 - uploaded by Alexander Gutkin
Simple three-stream template representation of phones /p/ and /b/ over a three symbol alphabet


Source publication
Conference Paper
This paper deals with the formulation of an alternative structural approach to the speech recognition problem. In this approach, we require both the representation and the learning algorithms defined on it to be linguistically meaningful, which allows the speech recognition system to discover the nature of the linguistic classes of speech patterns cor...

Contexts in source publication

Context 1
... a single token, each stream is a string of symbols from one of the corresponding alphabets. Figure 1 shows a simple representation for the two-class problem consisting of /p/ and /b/ consonants, for each of which two realizations are available. Each template consists of three independent distinctive feature streams (over a three-symbol alphabet) from the SPE features system defined in [9]. ...
Context 2
... an example representation in Fig. 1, and defining the weighted Levenshtein distance to act on the templates, we obtain a simple metric space where the set P consists of four templates and the metric is defined as a linear combination of three independent per-stream weighted Levenshtein edit distances over three different ...
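The distance described in Context 2 can be illustrated with a minimal sketch. It assumes a three-symbol alphabet written as `+`, `-` and `0`, one insertion, deletion and substitution weight per stream, and made-up templates and mixing coefficients; none of these values are taken from Fig. 1 or from the paper, and the names `StreamWeights`, `weighted_levenshtein` and `template_distance` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class StreamWeights:
    """Hypothetical per-stream edit costs (not the learned weights from the paper)."""
    ins: float = 1.0
    dele: float = 1.0
    sub: float = 1.0


def weighted_levenshtein(a: str, b: str, w: StreamWeights) -> float:
    """Cheapest cost of turning string a into string b under the weights w."""
    # prev[j] holds the cost of turning the processed prefix of a into b[:j].
    prev = [j * w.ins for j in range(len(b) + 1)]
    for ca in a:
        cur = [prev[0] + w.dele]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + w.dele,                            # delete ca
                cur[j - 1] + w.ins,                          # insert cb
                prev[j - 1] + (0.0 if ca == cb else w.sub),  # match or substitute
            ))
        prev = cur
    return prev[-1]


def template_distance(t1: Sequence[str], t2: Sequence[str],
                      weights: Sequence[StreamWeights],
                      mix: Sequence[float]) -> float:
    """Linear combination of independent per-stream weighted edit distances."""
    if not (len(t1) == len(t2) == len(weights) == len(mix)):
        raise ValueError("templates, weights and mix must have one entry per stream")
    return sum(c * weighted_levenshtein(s1, s2, w)
               for s1, s2, w, c in zip(t1, t2, weights, mix))


# Two hypothetical three-stream templates over the alphabet {+, -, 0}.
template_a = ("+-", "0", "++")
template_b = ("+", "0", "+-")
unit = StreamWeights()
print(template_distance(template_a, template_b, [unit] * 3, (0.5, 0.25, 0.25)))
```

With equal insertion and deletion costs each per-stream distance is symmetric; whether the combination is a true metric in general depends on the chosen weights, which is an assumption to check for any concrete weight set.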
Context 3
... the set of weights $\{\Delta\hat{\omega}\}$ from $\hat{\Omega}$. The set of transformations $\hat{O}$ is necessary since the concept of a distance can properly be defined only in terms of these operations. Figure 2 shows the non-trivial stream-specific transformations discovered during the learning process for the two-class phone problem of Fig. 1. These operations (the corresponding optimal sets of weights $\hat{\Omega}_{/p/}$ and $\hat{\Omega}_{/b/}$ are not shown) together with the trivial one-symbol transformations form the optimal set of transformations for each class. Together with the corresponding sets of reference objects $\hat{C}^{+}_{/p/}$ and $\hat{C}^{+}_{/b/}$ ...

Similar publications

Article
This study examines the nature of the phonological representations mediating spoken word recognition by means of phonetic priming in which primes and targets are phonetically similar but share no phonemes (GUESS - CAGE). We found an inhibitory priming effect of similar size for words and non-word primes in both a shadowing and a same-different jud...
Article
Phonological variation in speech production can neutralize phonemic distinctions. In some cases, the alternations also create lexical ambiguity, as in the sentence “A quick rum picks you up,” where the underlined sequence could be interpreted as either rum or as a place assimilated form of run. Three cross-modal priming experiments examined the per...
Article
Phonological alternations pose challenges for models of spoken word recognition in how surface information is mapped onto stored representations in the lexicon. In the current study, an auditory-auditory priming lexical decision experiment was conducted to investigate the alternating representations of Mandarin Tone 3 in both half-third and third t...
Conference Paper
This paper is a follow-up to Müller (2006). It contains some comments on suggestions about the interaction of phrasal Constructions with constituent order that Adele Goldberg made on various occasions. In addition, the paper discusses various HPSG analyses of particle verbs that assume lexical representations including phonologically specified parts...
Article
In Brazilian Portuguese, neoclassical elements (NCEs) may combine with both independent lexical words (e.g., 'psico' in 'psicolinguística' ‘psycholinguistics’) and non-lexical words (e.g., 'psico' in 'psicologia' ‘psychology’). This has led to the proposal that they have distinct prosodic representations depending on the type of structure that they...

Citations

... The learning is then carried out by matching an unknown pattern with each of these reference templates and selecting the one that best matches the unknown input. This technique has been applied to different problems in different fields, such as robotics for grasp selection (Herzog et al. 2012), spoken language learning (Gutkin & King 2005), detection and recognition of objects in images (Brunelli 2009, Cacciola et al. 2008), face recognition (Yuen et al. 2010), geostatistical modeling (Tahmasebi et al. 2012), and continuous speech recognition (De Wachter et al. 2007), among many others. ...
Footnote 1: NP-complete and NP-hard problems, where NP indicates that the problem has a Non Polynomial solution either in terms of computational time or of memory occupancy, or both.
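The template-matching scheme described in this snippet can be sketched as a nearest-template classifier. The sketch assumes string-valued patterns, an unweighted Levenshtein distance as the match score, and invented reference templates for two classes labelled `p` and `b`; the cited works use their own representations and similarity measures, and the names `levenshtein` and `classify` are hypothetical.

```python
from typing import Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Unweighted edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]


def classify(pattern: str, references: Dict[str, List[str]]) -> Tuple[str, int]:
    """Return the label of the closest reference template and its distance."""
    if not any(references.values()):
        raise ValueError("at least one reference template is required")
    distance, label = min(
        (levenshtein(pattern, template), label)
        for label, templates in references.items()
        for template in templates
    )
    return label, distance


# Hypothetical reference templates, two per class.
references = {"p": ["--+", "--0"], "b": ["+-+", "+-0"]}
print(classify("+-+", references))
print(classify("--", references))
```

Ties are broken here by the lexicographic order of the labels, which is an arbitrary choice made only to keep the sketch deterministic.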
Chapter
In the attempt to implement applications of public utility which simplify user access to future, remote and nearby social services, new mathematical models and new psychological and computational approaches from existing cognitive frameworks and algorithmic solutions have been developed. The nature of these instruments is either deterministic, probabilistic, or both. Their use depends upon their contribution to the conception of new ICT functionalities and evaluation methods for modelling concepts of learning, reasoning, and data interpretation. This introductory chapter provides a brief overview of the theoretical and computational issues of such artificial intelligence methods and how they are applied to several research problems.
... In this chapter, we address the issue of learning in phonological pseudo-metric spaces. The particular approach we pursue and some of the experimental results were previously reported by us in (Gutkin and King, 2005b). Informally, we can approach the learning problem by spelling out the following requirements, which are the main objectives of the work we describe in this chapter: ...
... Gutkin and King (2005b) (Chapter 4), Gutkin et al. (2004), Gutkin and Gay (2005c), Gutkin and King (2005a) (Chapter 5). The author has been involved in the development of the ETS formalism as a member of the Inductive Informatics Group run by Lev Goldfarb. ...
Thesis
Speech recognition has been a very active area of research over the past twenty years. Despite evident progress, practitioners in the field generally agree that the performance of current speech recognition systems is rather suboptimal and that new approaches are needed. The motivation behind the undertaken research is the observation that the notion of representation of objects and concepts, once considered central in the early days of pattern recognition, has been largely marginalised by the advent of statistical approaches. As a consequence of a predominantly statistical approach to the speech recognition problem, and because of the numeric, feature vector-based nature of the representation, the classes inductively discovered from real data using decision-theoretic techniques have little meaning outside the statistical framework. This is because decision surfaces or probability distributions are difficult to analyse linguistically. Because of the latter limitation, it is doubtful that the gap between speech recognition and linguistic research can be bridged by numeric representations. This thesis investigates an alternative, structural approach to spoken language representation and categorisation. The approach pursued in this thesis is based on a consistent program, known as the Evolving Transformation System (ETS), motivated by the development and clarification of the concept of structural representation in pattern recognition and artificial intelligence from both theoretical and applied points of view. This thesis consists of two parts. In the first part, a similarity-based approach to the structural representation of speech is presented. First, a linguistically well-motivated structural representation of phones based on distinctive phonological features recovered from speech is proposed. The representation consists of string templates representing phones together with a similarity measure. The set of phonological templates together with a similarity measure defines a symbolic metric space. Representation and ETS-inspired categorisation in the symbolic metric spaces corresponding to the phonological structural representation are then investigated by constructing appropriate symbolic space classifiers and evaluating them on a standard corpus of read speech. In addition, a similarity-based isometric transition from phonological symbolic metric spaces to the corresponding non-Euclidean vector spaces is investigated. The second part of this thesis deals with the formal approach to the structural representation of spoken language. Unlike the approach adopted in the first part, the representation developed in the second part is based on the mathematical language of the ETS formalism. This formalism has been specifically developed for structural modelling of dynamic processes. In particular, it allows the representation of both objects and classes in a uniform event-based hierarchical framework. In this thesis, the latter property of the formalism allows the adoption of a more physiologically concrete approach to structural representation. The proposed representation is based on gestural structures and encapsulates speech processes at the articulatory level. Algorithms for deriving the articulatory structures from the data are presented and evaluated.