Tomoko Matsui
The Institute of Statistical Mathematics

Doctor of Engineering

About

Publications: 160

Reads: 19,786

Citations: 2,224

November 1998 - December 2002

Advanced Telecommunications Research Institute

Kyoto, Japan

Position

Senior Researcher

April 1988 - December 2002

NTT

Japan

Position

Researcher

Publications

Evaluation results for the baseline systems and our original proposed...

Spatial–Temporal Temperature Forecasting Using Deep-Neural-Network-Based Domain Adaptation

Article

Full-text available

Jan 2024

Accurate temperature forecasting is critical for various sectors, yet traditional methods struggle with complex atmospheric dynamics. Deep neural networks (DNNs), especially transformer-based DNNs, offer potential advantages, but face challenges with domain adaptation across different geographical regions. We evaluated the effectiveness of DNN-base...

Figure 3. Structure of main algorithm for estimating hidden control...

Figure 9. Dendrogram illustrating clustering of countries on the basis...

Candidate weekly predictor covariates for five Japanese prefectures.

Main categorical levels consisting of three groups of policy measures.

Data-Driven Framework for Uncovering Hidden Control Strategies in Evolutionary Analysis

Article

Full-text available

Oct 2023

We devised a data-driven framework for uncovering hidden control strategies used by an evolutionary system described by an evolutionary probability distribution. This innovative framework enables deciphering of the concealed mechanisms that contribute to the progression or mitigation of such situations as the spread of COVID-19. Novel algorithms ar...

Generalised hyperbolic state space models with application to spatio-temporal heat wave prediction

Article

Sep 2023

A Dynamic Stochastic Integrated Climate–Economic Spatiotemporal Model for Agricultural Insurance Products

Article

Jun 2023

Chart of daily tweet count vs. reported number of COVID-19 infections...

LSTM model for COVID-19 case prediction with time lag δ configured to...

Social media (Twitter) reaction in Japan measured by total tweet count...

Anomalous scores of top-3 used emoji with corresponding tweet counts...

COVID-19 case prediction using emotion trends via Twitter emoji analysis: A case study in Japan

Article

Full-text available

Mar 2023

The worldwide COVID-19 pandemic, which began in December 2019 and has lasted for almost 3 years now, has undergone many changes and has changed public perceptions and attitudes. Various systems for predicting the progression of the pandemic have been developed to help assess the risk of COVID-19 spreading. In a case study in Japan, we...

Spatio-Temporal Generalised Hyperbolic Models with Application to Heatwave Prediction

Article

Jan 2023

Results of application to the Coat and the Yahoo! R3 datasets

Rating Proportion-Aware Binomial Matrix Factorization for Collaborative Filtering

Article

Full-text available

Jan 2023

Addressing biases in observed data is a major challenge in statistical and machine learning applications. This challenge also exists in recommendation systems, and various methods based on causal inference are being investigated. We investigate a collaborative filtering technique that robustly predicts ratings from biased observation. Utilizing the...

Daily chart of tweet counts vs. reported COVID-19 infections in Japan...

Mobility trends reports for Tokyo (23 districts), Japan. Reports are...

Logarithm of increasing rate of the day of the week for reported...

COVID-19 epidemic simulation system (t marks end timing of observable...

Daily chart of tweet counts vs. reported COVID-19 infections in the 6th...

Tweet Analysis for Enhancement of COVID-19 Epidemic Simulation: A Case Study in Japan

Article

Full-text available

Mar 2022

The COVID-19 pandemic, which began in December 2019, progressed in a complicated manner and thus caused problems worldwide. Seeking clues to the reasons for the complicated progression is necessary but challenging in the fight against the pandemic. We sought clues by investigating the relationship between reactions on social media and the COVID-19...

Fig. 2 k-means clustering for 53 cities in Tokyo by month based on...

R-squared (adjusted) values for response variables of distances of χ²...

Analysis of COVID-19 evolution based on testing closeness of sequential data

Article

Full-text available

Jan 2022

A practical algorithm has been developed for closeness analysis of sequential data that combines closeness testing with algorithms based on the Markov chain tester. It was applied to reported sequential data for COVID-19 to analyze the evolution of COVID-19 during a certain time period (week, month, etc.).

Fig 1. RMSE of the regression coefficients in cases without...

Fig 2. Bias of the regression coefficients in cases without...

Fig 3. RMSE of the regression coefficients in cases with overdispersion...

Fig 4. Bias of the regression coefficients in cases with overdispersion...

Fig 5. Means of the coefficient standard errors (N = 200) (x−axis: β 0...

Improved log-Gaussian approximation for over-dispersed Poisson regression: Application to spatial analysis of COVID-19

Article

Full-text available

Jan 2022

In the era of open data, Poisson and other count regression models are increasingly important. Still, conventional Poisson regression has remaining issues in terms of identifiability and computational efficiency. In particular, due to an identification problem, Poisson regression can be unstable for small samples with many zeros. Given this, we dev...

FIGURE 2. Graphical model (left) and VAE architecture (right) of our...

FIGURE 3. Neural network architecture comprising VAE in a CVAE and...

FIGURE 4. Performance of each method on simulation data (J = 1500),...

Variational Autoencoder-Based Hybrid Recommendation With Poisson Factorization for Modeling Implicit Feedback

Article

Full-text available

Jan 2022

Hybrid recommendation, which is based on collaborative filtering and supplemented with auxiliary content information, is being actively researched due to its ability to overcome the cold-start problem. Many proposed hybrid methods make recommendations using Gaussian distribution-based collaborative filtering even though they handle variables that t...

Impact of COVID-19 type events on the economy and climate under the stochastic DICE model

Article

Nov 2021

The classical DICE model is a widely accepted integrated assessment model for the joint modeling of economic and climate systems, where all model state variables evolve over time deterministically. We reformulate and solve the DICE model as an optimal control dynamic programming problem with six state variables (related to the carbon concentration,...

Impact of COVID-19 type events on the economy and climate under the stochastic DICE model

Preprint

Full-text available

Nov 2021

Penalized Least Square in Sparse Setting with Convex Penalty and Non Gaussian Errors

Article

Nov 2021

This paper considers the penalized least squares estimators with convex penalties or regularization norms. We provide sparsity oracle inequalities for the prediction error for a general convex penalty and for the particular cases of Lasso and Group Lasso estimators in a regression setting. The main contribution is that our oracle inequalities are es...

Tweet Analysis for Enhancement of COVID-19 Epidemic Simulation: A Case Study in Japan

Preprint

Oct 2021

Figure 1. Comparison of the standard Gaussian distribution to the Tukey...

Figure 5. USA map with sensor locations measuring precipitation in 4...

Spatial Warped Gaussian Processes: Estimation and Efficient Field Reconstruction

Article

Full-text available

Oct 2021

A class of models for non-Gaussian spatial random fields is explored for spatial field reconstruction in environmental and sensor network monitoring. The family of models explored utilises a class of transformation functions known as Tukey g-and-h transformations to create a family of warped spatial Gaussian process models which can support various...

FIGURE 1. Proposed biometric speech cyber risk mitigation system.

FIGURE 3. Top panels represent s̃(t) = sin(4πt) +...

FIGURE 4. Top panel: signal s̃(t) = sin(4πt)I[t ≤ t1]...

FIGURE 5. Diagram of our proposed methodology characterising EMD-MFCC...

FIGURE 6. The Mel filter bank structure for 40 filters. Each peak...

Machine Learning Mitigants for Speech Based Cyber Risk

Article

Full-text available

Oct 2021

Statistical analysis of speech is an emerging area of machine learning. In this paper, we tackle the biometric challenge of Automatic Speaker Verification (ASV) of differentiating between samples generated by two distinct populations of utterances, those of an authentic human voice and those generated by a synthetic one. Solving such an issue throu...

Multi-Source Domain Adaptation with Sinkhorn Barycenter

Conference Paper

Aug 2021

Spatial Warped Gaussian Processes: Estimation and Efficient Field Reconstruction

Preprint

Aug 2021

A class of models for non-Gaussian spatial random fields is explored for spatial field reconstruction in environmental and sensor network monitoring. The family of models explored utilises a class of transformation functions known as the Tukey g-and-h transformations to create a family of warped spatial Gaussian process models which can support var...

Figure 2: k-means clustering for 53 cities in Tokyo by month based on...

Figure 3: For Shinjuku (left) and Tachikawa (right), 13 months ×...

Figure 4: For all cities in Tokyo, 57 weeks × 57 weeks matrices of...

Figure 6: Acceptance probabilities and distance of χ²-type statistic...

Key factor candidates as predictor variables.

Analysis of COVID-19 evolution based on testing closeness of sequential data

Preprint

Full-text available

Jun 2021

Compositionally-warped additive mixed modeling for a wide variety of non-Gaussian spatial data

Article

Full-text available

May 2021

With the advancement of geographical information systems, non-Gaussian spatial data sets are getting larger and more diverse. This study develops a general framework for fast and flexible non-Gaussian regression, especially for spatial/spatiotemporal modeling. The developed model, termed the compositionally-warped additive mixed model (CAMM), co...

Improved log-Gaussian approximation for over-dispersed Poisson regression: application to spatial analysis of COVID-19

Preprint

Full-text available

Apr 2021

In the era of open data, Poisson and other count regression models are increasingly important. Provided this, we develop a closed-form inference for an over-dispersed Poisson regression, especially for (over-dispersed) Bayesian Poisson wherein the exact inference is unobtainable. The approach is derived via mode-based log-Gaussian approximation. Un...

Compositionally-warped additive mixed modeling for a wide variety of non-Gaussian spatial data

Preprint

Full-text available

Jan 2021

With the advancement of geographical information systems, non-Gaussian spatial data is getting larger and more diverse. Considering this background, this study develops a general framework for fast and flexible non-Gaussian regression, especially for spatial/spatiotemporal modeling. The developed model, termed the compositionally-warped additive...

Impact of COVID-19 Type Events on the Economy and Climate Under the Stochastic DICE Model

Article

Jan 2021

Study on spacial-temporal analysis of ground surface temperature for global warming countermeasures

Article

Nov 2020

Tomoko Matsui

Extreme weather events can arrive unannounced and cause immense harm for communities. Especially in cities where many people live in close proximity, events like flash flooding, windstorms or even heat waves can cause property damage, overworking of the emergency infrastructure and death. Unfortunately, because climate change continues to alter wea...

FIGURE 1: Air temperature monitoring stations in greater Tokyo...

FIGURE 4: Example boxplots of estimated t-values of regression...

FIGURE 5: Emulated air temperatures (top) and ground temperatures...

FIGURE 6: Land cover map for target area showing land cover type...

FIGURE 7: Boxplots of estimated t-values of regression coefficients for...

Spatio-Temporal Analysis of Urban Heatwaves Using Tukey g-and-h Random Field Models

Article

Full-text available

Jul 2020

Real-time heatwave risk management with fine-grained spatial resolution is important for analysis of urban heat island (UHI) effects and local heatwaves. This study analyzed the spatio-temporal behavior of ground temperatures and developed methods for modeling them. The developed models consider two higher-order stochastic spatial properties (skewn...

Spatiotemporal analysis of urban heatwaves using Tukey g-and-h random field models

Preprint

Full-text available

Apr 2020

The statistical quantification of temperature processes for the analysis of urban heat island (UHI) effects and local heatwaves is an increasingly important application domain in smart city dynamic modelling. This leads to the increased importance of real-time heatwave risk management at a fine-grained spatial resolution. This study attempts to an...

Which Risk Factors Drive Oil Futures Price Curves?

Article

Jan 2020

We develop extensions that introduce regression structure to the multi-factor stochastic models of commodity futures price term structure dynamics. We demonstrate the accuracy with which these models can be calibrated to oil futures data and how they improve on existing models both in model fit and in model interpretation. We found leading observab...

Spatiotemporal Analysis of Urban Heatwaves Using Tukey G-and-H Random Field Models

Article

Full-text available

Jan 2020

A Robust High-Dimensional Bayesian Filter: THE Stochastic GH-GEnKF

Conference Paper

Dec 2019

A sequential prediction method of quasi-periodicity based on Gaussian process state space model

Conference Paper

Nov 2019

Spatiotemporal Heatwave Risk Modeling Combining Multiple Observations

Conference Paper

Jul 2019

A GPS-Based Simple Evaluation Simulation Approach: Case Study in Joso, Japan

Conference Paper

Jul 2019

Optimization of local microgrid model for energy sharing considering daily variations in supply and demand

Article

Full-text available

Feb 2019

This study develops an approach for optimizing the size/scale of microgrids used in electricity sharing around each residence by considering the uncertainty between the electricity supply from photovoltaics and electricity demand. Uncertainties are quantified using simulations that consider actual daily variations in supply and demand. The develope...

Articulatory and Spectrum Information Fusion Based on Deep Recurrent Neural Networks

Article

Jan 2019

Supplementary Material: Which Risk Factors Drive Oil Futures Price Curves?

Article

Jan 2019

A Weighted Spatio-Temporal Model for County Yields

Article

Jan 2019

Figure 1. Study area: Sumida ward, Tokyo, Japan.

Table 1. Estimated total carbon emissions in Sumida ward (kt).

Figure 4. GPS classification results on 7 November 2016 as an example:...

Figure 5. Carbon mapping of total, direct, and indirect emissions at...

Seasonal Urban Carbon Emission Estimation Using Spatial Micro Big Data

Article

Full-text available

Nov 2018

The objective of this study is to map direct and indirect seasonal urban carbon emissions using spatial micro Big Data, regarding building and transportation energy-use activities in Sumida, Tokyo. Building emissions were estimated by considering the number of stories, composition of use (e.g., residence and retail), and other factors associated wi...

Data augmentation with moment-matching networks for i-vector based speaker verification

Conference Paper

Nov 2018

I-vector-based speaker identification with extremely short utterances for both training and testing

Conference Paper

Oct 2017

Counting public transport passenger using WiFi signatures of mobile devices

Conference Paper

Oct 2017

Forecasting covariance for optimal carry trade portfolio allocations

Conference Paper

Mar 2017

Articulatory and Spectrum Features Integration using Generalized Distillation Framework

Conference Paper

Full-text available

Sep 2016

It has been shown that by combining acoustic and articulatory information, significant performance improvements in automatic speech recognition (ASR) tasks can be achieved. In practice, however, articulatory information is not available during recognition and the general approach is to estimate it from the acoustic signal. In this paper, we prop...

Robust Speech Recognition Using Generalized Distillation Framework

Conference Paper

Full-text available

Sep 2016

Voice Liveness Detection for Speaker Verification based on a Tandem Single/Double-channel Pop Noise Detector

Conference Paper

Full-text available

Jun 2016

Participatory Sensing Data Tweets for Micro-Urban Real-Time Resiliency Monitoring and Risk Management

Article

Full-text available

Jan 2016

Real-time urban climate monitoring provides useful information that can be utilized to help urban management personnel to monitor and adapt their precautionary measures to extreme events, including urban heatwaves. Fortunately, recently created social media platforms, such as Twitter, furnish real-time and high-resolution spatial information that m...

Which Risk Factors Drive Oil Futures Price Curves? Speculation and Hedging in the Short and Long-Term

Article

Jan 2016

Voice liveness detection algorithms based on pop noise caused by human breath for automatic speaker verification

Conference Paper

Full-text available

Sep 2015

Dynamic Speech Emotion Recognition with State-Space Models

Article

Full-text available

Sep 2015

Automatic emotion recognition from speech has been focused mainly on identifying categorical or static affect states, but the spectrum of human emotion is continuous and time-varying. In this paper, we present a recognition system for dynamic speech emotion based on state-space models (SSMs). The prediction of the unknown emotion trajectory in the...

Road condition classification using a new global alignment kernel

Conference Paper

Sep 2015

The development of the so-called intelligent tire has changed the role of the tire. Here we discuss a real-time road condition classification system that employs monitoring tire acceleration. Because the tire acceleration is non-stationary and is warped non-linearly in the time domain, we applied the time alignment algorithm to it similarly to spee...

Dynamic speech emotion recognition with state-space models

Conference Paper

Aug 2015

1505.06188

Data

Full-text available

Jul 2015

Wind Storm Estimation using a Heterogeneous Sensor Network with High and Low Resolution Sensors

Conference Paper

Jun 2015

A Spatiotemporal Analysis of Participatory Sensing Data 'Tweets' and Extreme Climate Events Toward Real-Time Urban Risk Management

Article

May 2015

Real-time urban climate monitoring provides useful information that can be utilized to help monitor and adapt to extreme events, including urban heatwaves. Typical approaches to the monitoring of climate data include weather station monitoring and remote sensing. However, climate monitoring stations are very often distributed spatially in a sparse...

Upper panel: realization from a 2-D Gaussian process. The black markers...

Wind speed prediction of the Dagmar storm. The rectangle in the upper...

Wind speed prediction of the Ulli storm. The normalized mean squared...

Wind speed detection vs. false alarm probabilities of Dagmar and Ulli...

Estimation of Spatially Correlated Random Fields in Heterogeneous Wireless Sensor Networks

Article

Full-text available

May 2015

We develop new algorithms for spatial field reconstruction, exceedance level estimation and classification in heterogeneous (mixed analog & digital sensors) Wireless Sensor Networks (WSNs). We consider spatial physical phenomena which are observed by a heterogeneous WSN, meaning that it consists partially of sparsely deployed high-quality sensors a...

Speech and Music Emotion Recognition Using Gaussian Processes

Chapter

Full-text available

Jan 2015

Gaussian Processes (GPs) are Bayesian nonparametric models that are becoming more and more popular for their superior capabilities to capture highly nonlinear data relationships in various tasks ranging from classical regression and classification to dimension reduction, novelty detection and time series analysis. Here, we introduce Gaussian proces...

How to Utilize Sensor Network Data to Efficiently Perform Model Calibration and Spatial Field Reconstruction

Chapter

Jan 2015

This chapter provides a tutorial overview of some modern applications of the statistical modeling that can be developed based upon spatial wireless sensor network data. We then develop a range of new results relating to two important problems that arise in spatial field reconstructions from wireless sensor networks. The first new result allows one...

Modern Methodology and Applications in Spatial-Temporal Modeling

Book

Jan 2015

This book provides a modern introductory tutorial on specialized methodological and applied aspects of spatial and temporal modeling. The areas covered involve a range of topics which reflect the diversity of this domain of research across a number of quantitative disciplines. For instance, the first chapter deals with non-parametric Bayesian infer...

Theoretical Aspects of Spatial-Temporal Modeling

Book

Jan 2015

This book provides a modern introductory tutorial on specialized theoretical aspects of spatial and temporal modeling. The areas covered involve a range of topics which reflect the diversity of this domain of research across a number of quantitative disciplines. For instance, the first chapter provides up-to-date coverage of particle association me...

Communications Meets Copula Modeling: Non-Standard Dependence Features in Wireless Fading Channels

Article

Dec 2014

Copula models have started to be explored in wireless communications; however, to date the properties they offer have not been proven or verified in real data experiments. In this paper we provide the first real evidence that the features they offer will provide beneficial modeling capabilities in wireless channel models, which are not just theoreti...

Evaluation of invalid input discrimination using BOW for speech-oriented guidance system

Conference Paper

Aug 2014

We investigate a discrimination method for invalid and valid inputs, received by a speech-oriented guidance system operating in a real environment. Invalid inputs include background voices, which are not directly uttered to the system, and nonsense utterances. Such inputs should be rejected beforehand. We have reported methods using not only the li...

Topic Classification of Spoken Inquiries Using Transductive Support Vector Machine

Chapter

Aug 2014

In this work, we address the topic classification of spoken inquiries in Japanese that are received by a guidance system operating in a real environment, with a semi-supervised learning approach based on a transductive support vector machine (TSVM). Manual data labeling, which is required for supervised learning, is a costly process, and unlabeled...

Modelling threshold exceedence levels for spatial stochastic processes observed by sensor networks

Conference Paper

Apr 2014

We develop a new framework for explicitly modelling the threshold exceedence levels of the spatial stochastic process being monitored by a sensor network. Our framework also allows incorporating additional observed features as explanatory factors for the behaviour of the spatial stochastic process, and in particular the probability of exceedence of...

Music Genre and Emotion Recognition Using Gaussian Processes

Article

Full-text available

Jan 2014

Gaussian Processes (GPs) are Bayesian nonparametric models that are becoming more and more popular for their superior capabilities to capture highly nonlinear data relationships in various tasks, such as dimensionality reduction, time series analysis, novelty detection, as well as classical regression and classification tasks. In this paper, we inv...

Dynamic music emotion recognition using state-space models

Article

Jan 2014

This paper describes the temporal music emotion recognition system developed at the University of Aizu for the Emotion in Music task of the MediaEval 2014 benchmark evaluation campaign. The arousal-valence trajectory prediction is cast as a time series filtering task and is modeled by state-space models. These models include standard linear...

High level feature extraction for the self-taught learning algorithm Sparse modeling for speech and audio processing

Article

Full-text available

Dec 2013

Availability of large amounts of raw unlabeled data has sparked the recent surge in semi-supervised learning research. In most works, however, it is assumed that labeled and unlabeled data come from the same distribution. This restriction is removed in the self-taught learning algorithm where unlabeled data can be different, but nevertheless have s...

Bayesian semi-supervised audio event transcription based on Markov Indian buffet process

Conference Paper

Oct 2013

We present a novel generative model for audio event transcription that recognizes “events” on audio signals including multiple kinds of overlapping sounds. In the proposed model, firstly, the overlapping audio events are modeled based on nonnegative matrix factorization into which Bayesian nonparametric approaches: the Markov Indian buffet process...

Modeling head-related transfer functions via spatial-temporal Gaussian process

Conference Paper

Oct 2013

We propose a novel application of a family of non-parametric statistical models to estimate head-related transfer functions (HRTFs) using spatial-temporal Gaussian processes (GPs). In this approach, we model the head-related impulse response (HRIR) utilizing non-parametric regression via a GP. The challenge posed by this problem involves accurate m...

Music Genre Classification using Gaussian Process Models

Conference Paper

Full-text available

Sep 2013

In this paper we introduce Gaussian Process (GP) models for music genre classification. Gaussian Processes are widely used for various regression and classification tasks, but there are relatively few studies where GPs are applied in the audio signal processing systems. The GP models are non-parametric discriminative classifiers similar to the well...

Modeling room impulse response via composites of spatial-temporal Gaussian processes

Article

May 2013

We develop a novel algorithm to estimate a spatial-temporal transfer function of a time-domain room impulse response for reverberation in closed environments. This novel approach involves developing two non-parametric models, one for the early phase and the other for the late phase for reverberation. These models are based on a composite of two Gau...

Comparison of Methods for Topic Classification of Spoken Inquiries

Article

Apr 2013

In this work, we address the topic classification of spoken inquiries in Japanese that are received by a speech-oriented guidance system operating in a real environment. The classification of spoken inquiries is often hindered by automatic speech recognition (ASR) errors, the sparseness of features and the shortness of spontaneous speech utterances...

Evaluation Framework Design of Spoken Term Detection Study at the NTCIR-9 IR for Spoken Documents Task

Article

Full-text available

Dec 2012

This paper describes the design of spoken term detection (STD) studies and their evaluation framework at the STD sub-task of the NTCIR-9 IR for Spoken Documents (SpokenDoc) task. STD is one of the information access technologies for spoken documents. The goal of the STD sub-task is to rapidly detect the presence of a given query term, consisting of word...

Spoken inquiry discrimination using bag-of-words for speech-oriented guidance system

Conference Paper

Sep 2012

Nonnegative matrix factorization based self-taught learning with application to music genre classification

Conference Paper

Full-text available

Sep 2012

Availability of large amounts of raw unlabeled data has sparked the recent surge in semi-supervised learning research. In most works, however, it is assumed that labeled and unlabeled data come from the same distribution. This restriction is removed in the self-taught learning approach where unlabeled data can be different, but nevertheless have s...

Designing an Evaluation Framework for Spoken Term Detection and Spoken Document Retrieval at the NTCIR-9 SpokenDoc Task

Conference Paper

Full-text available

May 2012

Music Genre Classification using Self-Taught Learning via Sparse Coding

Conference Paper

Full-text available

Mar 2012

Availability of large amounts of raw unlabeled data has sparked the recent surge in semi-supervised learning research. In most works, however, it is assumed that labeled and unlabeled data come from the same distribution. This restriction is removed in the self-taught learning approach where unlabeled data can be different, but nevertheless have si...

Interactive visualization and search system for speech corpora

Article

Oct 2011

We have already reported a corpus similarity visualization method based on the corpus attribute using multidimensional scaling that makes it easy for users to utilize various speech corpora. In this paper, we present a revised visualization method that is based on a ring structure like a planisphere. By using only a mouse, a user can choose appropr...

Extraction of User Interaction Patterns for Low-Usability Web Pages

Conference Paper

Jul 2011

Our goal is to point out usability problems in web pages in order to improve web usability. We investigate the relation between user interaction behaviors during web viewing and subjects' evaluations of web usability. We also extract discriminative patterns of user interaction behaviors in visited web pages with low usability by using the...

Gradient-based musical feature extraction based on scale-invariant feature transform

Article

Full-text available

Jan 2011

We investigate a novel gradient-based musical feature extracted using a scale-invariant feature transform. This feature enables dynamic information in music data to be effectively captured time-independently and frequency-independently. It will be useful for various music applications such as genre classification, music mood classification, and...

Out-of-Task Utterance Detection Based on Bag-of-Words Using Automatic Speech Recognition Results

Article

Full-text available

Jan 2011

Example-based question answering (QA) is an effective approach for real-world spoken dialogue systems. A limitation of example-based QA is that a system cannot appropriately respond to a user's question if a similar question-answer pair does not exist in the question and answer database (QADB). For a robust spoken dialogue system, it is import...

Topic classification of spoken inquiries based on stacked generalization

Article

Jan 2011

Stacked generalization is a method that allows combining output of multiple classifiers using a second-level classification, minimizing the generalization error of first-level classifiers and achieving greater predictive accuracy. In a previous work, we compared the performance of support vector machine (SVM) with radial basis function (RBF) kernel...

Discriminant analysis for detection of low usability web pages

Article

Jan 2011

The purpose of this work is to reduce the cost of evaluating web usability through usability testing. The cost can be reduced by detecting low-usability web pages. We conducted an empirical analysis to find detectable metrics in quantitative data, including eye movement. We investigate the relation between the quantitative data about the behavior of users an...

Constructing Japanese test collections for spoken term detection

Conference Paper

Full-text available

Sep 2010

Comparison of methods for topic classification in a speech-oriented guidance system

Conference Paper

Full-text available

Sep 2010

This work addresses the topic classification of Japanese utterances received by a speech-oriented guidance system operating in a real environment. To this end, we compare the performance of Support Vector Machine and PrefixSpan Boosting against a conventional Maximum Entropy classification method. We are interested in evaluating their strengt...

Penalized Logistic Regression With HMM Log-Likelihood Regressors for Speech Recognition

Article

Sep 2010

Hidden Markov models (HMMs) are powerful generative models for sequential data that have been used in automatic speech recognition for more than two decades. Despite their popularity, HMMs make inaccurate assumptions about speech signals, thereby limiting the achievable performance of the conventional speech recognizer. Penalized logistic regressio...

Semi-supervised speaker identification under covariate shift

Article

Aug 2010

In this paper, we propose a novel semi-supervised speaker identification method that can alleviate the influence of non-stationarity such as session dependent variation, the recording environment change, and physical conditions/emotions. We assume that the voice quality variants follow the covariate shift model, where only the voice feature distrib...

Speech classification using penalized logistic regression with hidden Markov model log-likelihood regressors

Article

Mar 2010

Penalized logistic regression (PLR) is a well-founded discriminative classifier with long roots in the history of statistics. Speech classification with PLR is possible with an appropriate choice of map from the space of feature vector sequences into the Euclidean space. In this talk, one such map is presented, namely, the one that maps into vector...

Acceleration of sequence kernel computation for real-time speaker identification

Conference Paper

Full-text available

Jan 2010

The sequence kernel has been shown to be a promising kernel function for learning from sequential data such as speech and DNA. However, it is not scalable to massive datasets due to its high computational cost. In this paper, we propose a method of approximating the sequence kernel that is shown to be computationally very efficient. More specifical...

A Proposal for Standardizing Catalogue Specifications of Speech Corpora

Article

Full-text available

Jan 2010

Speech corpora are indispensable to speech research. There are several data centers in the world that serve as repositories for various speech corpora. However, they use different specification items for their corpora, and so it is difficult to compare their corpora. It would be more convenient for corpus users if the data centers were to use a com...

Training data size requirements for topic classification in a speech-oriented guidance system

Article

Jan 2010

In this work, we address the topic classification of Japanese utterances received by a speech-oriented guidance system operating in a real environment. Implementing this kind of system requires the collection and manual labeling of actual users' utterances, which is a costly process. Because of this, we are interested in evaluating...

Utilization of acoustical feature in visualization of multiple speech corpora

Article

Aug 2009

The purpose of this study is to visualize the similarities among multiple speech corpora. To help users easily utilize various speech corpora, we previously reported a visualization method based on corpus attributes using MDS. We had proposed eight attributes as speech corpus features. However, these attributes contained no acoustical featu...

Covariate shift adaptation for semi-supervised speaker identification

Conference Paper

Apr 2009

In this paper, we propose a novel semisupervised speaker identification method that can alleviate the influence of non-stationarity such as session dependent variation, the recording environment change, and physical condition/emotion. We assume that the utterance variation follows the covariate shift model, where only the utterance sample distribut...

High-level feature extraction using SVM with walk-based graph kernel

Conference Paper

Full-text available

Apr 2009

We investigate a method using support vector machines (SVMs) with walk-based graph kernels for high-level feature extraction from images. In this method, each image is first segmented into a finite set of homogeneous segments and then represented as a segmentation graph where each vertex is a segment and edges connect adjacent segments. Given a set...

Automatic Speech Recognition via N-Best Rescoring using Logistic Regression

Chapter

Full-text available

Nov 2008

A two-step approach to continuous speech recognition using logistic regression on speech segments has been presented. In the first step, a set of hidden Markov models (HMMs) is used in conjunction with the Viterbi algorithm in order to generate an N-best list of sentence hypotheses for the utterance to be recognized. In the second step, each senten...

MDS-based visualization method for multiple speech corpora

Conference Paper

Sep 2008

Speaker verification with non-audible murmur segments by combining global alignment kernel and penalized logistic regression machine

Conference Paper

Full-text available

Sep 2008

We investigate a novel method for speaker verification with non-audible murmur (NAM) segments. NAM is recorded using a special microphone placed on the neck and is hard for other people to hear. We have already reported a support vector machine (SVM)-based method that uses NAM segments to exploit a keyword phrase effectively. To further exploit keywor...

ISM TRECVID2008 High-level Feature Extraction

Conference Paper

Jan 2008

A Fast Sequence Kernel for Sequential Data Classification

Article

Full-text available

Jan 2008

In this paper, we propose a sequence kernel with fast computation. The kernel is approximately calculated by using a mean vector in feature space. We further study log normalization of the sequence kernel to avoid the diagonal dominance problem. In text-independent speaker identification experiments with 10 male speakers, our appr...

Study on speaker verification with non-audible murmur segments

Conference Paper

Aug 2007

Using newly collected data, we investigated a speaker verification method based on non-audible murmur (NAM) segments and obtained several findings that will be useful when building practical speaker verification systems. NAM is recorded using a special microphone placed on the surface of the body, so it includes almost no external noise and is ha...

A Kernel for Time Series Based on Global Alignments

Conference Paper

Full-text available

May 2007

We propose in this paper a new family of kernels to handle time series, notably speech data, within the framework of kernel methods which includes popular algorithms such as the support vector machine. These kernels elaborate on the well known dynamic time warping (DTW) family of distances by considering the same set of elementary operations, namel...

Network

Y. Bengio
Université de Montréal
Björn Schuller
Technische Universität München
Stephanie Seneff
Massachusetts Institute of Technology
Ganggang Xu
University of Miami
Nicholas Evans
EURECOM

Abdel-rahman Al-Qawasmi
Majmaah University
Hemant Patil
Dhirubhai Ambani Institute of Information and Communication Technology
Md Sahidullah
TCG CREST
Khaled Daqrouq
King Abdulaziz University
Pierre-Francois Marteau
Université Bretagne Sud, Vannes, France