RES system architecture.

Source publication

ECESS Platform for Web Based TTS Modules and Systems Evaluation

Chapter

Dec 2008

The paper presents a platform for web-based evaluation of TTS modules and systems, named RES (Remote Evaluation System). It is being developed within the European Centre of Excellence for Speech Synthesis (ECESS, www.ecess.eu). The presented platform will be used for web-based online evaluation of various text-to-speech (TTS) modules, and even complete T...

Context 1

... main goal of ECESS (European Centre of Excellence on Speech Synthesis) is to push TTS technology forward while speeding up the path from basic research to product. To achieve this goal, clearly defined procedures are needed for evaluating individual TTS components and for assessing complete TTS systems (Perez et al., 2006). To establish a common test-bed for development and evaluation, a web-based distributed system has been developed and named RES (Remote Evaluation System). The idea is that each ECESS partner can place one or more TTS components on the web locally, and can access the TTS components that other partners make available via the web. Consequently, any institution able to access the modules can evaluate them remotely, without installing them locally. All modules included in the RES platform are accessible via TCP/IP, and the MRCP protocol is used for data exchange. The platform integrates an RES server with one or more RES module servers and RES clients. Each of these modules performs a specific sequence of actions, defined by XML-based protocol descriptions and implemented as finite-state machines. A proprietary RES protocol, composed of requests and responses, is implemented between the RES server and the RES module servers. The TC-STAR XML format is used as the protocol data format.

Within the TC-STAR project, a complete TTS system was constructed by adapting the UIMA system from IBM (UIMA, 2007). Nevertheless, the ECESS idea of splitting a TTS system into more specialized modules, together with the related evaluation of those modules, is new. It allows the different algorithms within a TTS system to be tested in more detail.

The remainder of this paper is organized as follows. Section 2 describes the basic architecture of the RES TTS system. Section 3 describes protocols used in the RES system.
The following sections describe the RES system modules from the user's perspective: Section 4 presents the RES client, Section 5 the RES server, and Section 6 the RES module server. RES tools are presented in Section 7. The last section draws the conclusions.

The basic architecture of the RES system is shown in Figure 1. It consists of an RES client, an RES server, and one or more RES module servers. The RES clients and RES module servers are located at different partners' sites worldwide. All modules are connected to the internet using TCP/IP and UDP connections. With this architecture, partners can make their modules (tools) available on the web while keeping them installed locally. Via a remote-access mechanism, each partner can then access any of these modules. The RES server can interact with arbitrary RES clients and can communicate with several clients at the same time. RES module servers can likewise communicate with several clients at the same time.

The whole RES system is organized as a complex client/server architecture. First, an RES client establishes a connection with the RES server; the RES server then establishes a client/server connection with the RES module server on the web. Within ECESS, some partners have installed only RES clients, while others have installed one or more RES module servers. Each partner wanting to have its module evaluated must also install a dedicated RES module server.

The RES system is dedicated to distributed web-based online evaluation of various ECESS TTS components running at different institutes and universities worldwide. With it, a partial or complete TTS system, composed of selectable ECESS TTS components, can be run on the web. An evaluation institution can perform different evaluation tasks by sending test data from the RES client and receiving results from the selected RES module servers, which run the corresponding partners' TTS components.
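The three-tier arrangement described above (client, central server, module servers) can be illustrated with a minimal in-process sketch. It is not the RES implementation: the class names, the `evaluate` method, and the two toy components are hypothetical, network transport is replaced by plain method calls, and chaining the selected components so that each one's output feeds the next is an assumption about how a "partial or complete TTS system" would be composed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ModuleServer:
    """Stand-in for an RES module server wrapping one partner's component."""
    name: str
    tool: Callable[[str], str]  # hypothetical: the wrapped component as a function

    def handle(self, data: str) -> str:
        return self.tool(data)


@dataclass
class ResServer:
    """Stand-in for the RES server, the central managing unit."""
    modules: Dict[str, ModuleServer] = field(default_factory=dict)

    def register(self, module: ModuleServer) -> None:
        self.modules[module.name] = module

    def available(self) -> List[str]:
        return sorted(self.modules)

    def run(self, selected: List[str], data: str) -> str:
        # Assumption: selected components are chained in the given order.
        for name in selected:
            if name not in self.modules:
                raise ValueError(f"unknown component: {name}")
            data = self.modules[name].handle(data)
        return data


class ResClient:
    """Stand-in for an RES client; it only ever talks to the RES server."""

    def __init__(self, server: ResServer) -> None:
        self.server = server

    def list_components(self) -> List[str]:
        return self.server.available()

    def evaluate(self, selected: List[str], data: str) -> str:
        return self.server.run(selected, data)


# Toy components standing in for partners' TTS modules.
server = ResServer()
server.register(ModuleServer("normalize", lambda text: text.lower()))
server.register(ModuleServer("tokenize", lambda text: "|".join(text.split())))
client = ResClient(server)
result = client.evaluate(["normalize", "tokenize"], "Hello World")
```

The point of the sketch is the routing: the client never holds a reference to a module server, so components can be added or replaced at the server without any change on the client side.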
The RES system can be used by partners who need the ECESS TTS modules or TTS components of other partners in order to test and improve the performance of their own modules or tools. Partners can evaluate their own modules using any additional modules and/or language resources they need from other partners, which remain installed at those partners' sites. Partners can integrate their own modules into a complete web-based TTS system using any remaining modules they need from other partners. Using the RES system, partners can also build a complete TTS system via the web without using any of their own modules.

How is it used? Partners use the RES client to feed input data into the RES system. They then select one of the available XML scenarios for performing a specific evaluation task. The selected scenario is automatically transferred to the RES server and the RES module server(s). Scenarios define the behaviour of the RES system, which is closely tied to the desired evaluation task. In accordance with the selected scenario, users can then select the desired TTS component(s) available on the web and run the RES system. Next, the input data are transferred from the RES client to the RES server (the main managing unit). The RES server establishes connection(s) with the selected RES module server(s) and forwards the received data. The RES module server accepts the data sent by the user and runs the partner's tool. The generated output is sent back to the RES server and on to the user's RES client, where it is automatically shown in the GUI.

To make a TTS component available to other partners, a partner needs to install the RES module server. The partner has to configure the RES module server with the proper IP/port settings, specify the command-line syntax used for running the TTS component, and finally run the RES module server.
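The module-server setup described above (IP/port settings plus a command line for the partner's tool) can be sketched as follows. This is an illustrative assumption, not the RES configuration format: the `ModuleServerConfig` fields and the `run_component` helper are hypothetical names, and the tool is assumed to read its input from standard input and write its result to standard output.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class ModuleServerConfig:
    """Hypothetical settings for one module server."""
    host: str            # address the module server would listen on
    port: int            # port the module server would listen on
    command: List[str]   # command line that starts the partner's tool


def run_component(config: ModuleServerConfig, input_data: str) -> str:
    """Run the configured tool once, feeding input on stdin and
    returning whatever the tool wrote to stdout."""
    completed = subprocess.run(
        config.command,
        input=input_data,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the tool exits non-zero
    )
    return completed.stdout


# Stand-in tool: a Python one-liner that upper-cases its input.
demo_config = ModuleServerConfig(
    host="127.0.0.1",
    port=5000,
    command=[
        sys.executable,
        "-c",
        "import sys; sys.stdout.write(sys.stdin.read().upper())",
    ],
)
output = run_component(demo_config, "hello")
```

Passing the command as a list rather than a single shell string avoids shell quoting problems when data or paths contain spaces.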
Once this is done, all ECESS partners can run, test, or evaluate that TTS component. Information about which TTS components are available on the web is kept at the RES server. Every time an RES client establishes a connection with the RES server, the list of available partners' TTS components is automatically sent to the user before a TTS component can be selected. In contrast to TC-STAR, no manual effort is required from the developers of a TTS component to take part in the evaluation.

Figure 2 shows the detailed distributed architecture of the RES system. The RES clients open Real Time Streaming Protocol (RTSP) sessions with the RES server; each session is closed after the specific task on the corresponding RES module server is finished. As RTSP is based on TCP, a reliable connection-oriented protocol, there is no need for the RES client or RES server to implement any additional error-correction mechanisms. RTSP serves as support for the Media Resource Control Protocol (MRCP) used between the RES client and the RES server. MRCP can be used to control speech synthesizers and recognizers, to provide speech recognition, and to stream audio from a common location to a user. When the RES server accepts an RES client's request in a dedicated thread, it opens a client/server connection with the specified RES module servers, on which the specific task is performed and the results are obtained. Results ...
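The connection behaviour described above (one thread per client, with the list of available components sent before any selection) can be sketched with Python's standard `socketserver` module. This is a simplified stand-in: it uses a newline-delimited text exchange over TCP rather than RTSP or MRCP, and the component names, the `OK`/`ERROR` replies, and the helper functions are all hypothetical.

```python
import json
import socket
import socketserver
import threading

# Hypothetical component names; in RES this list would be kept at the RES server.
AVAILABLE_COMPONENTS = ["partner-a/tokenizer", "partner-b/prosody"]


class ClientHandler(socketserver.StreamRequestHandler):
    """Handles one client connection in its own thread."""

    def handle(self) -> None:
        # Step 1: send the component list before anything can be selected.
        self.wfile.write((json.dumps(AVAILABLE_COMPONENTS) + "\n").encode("utf-8"))
        # Step 2: read the client's selection and acknowledge or reject it.
        choice = self.rfile.readline().decode("utf-8").strip()
        if choice in AVAILABLE_COMPONENTS:
            reply = "OK " + choice
        else:
            reply = "ERROR unknown component"
        self.wfile.write((reply + "\n").encode("utf-8"))


def start_server() -> socketserver.ThreadingTCPServer:
    """Start the sketch server on a free local port, serving in the background."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), ClientHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def connect_and_select(address, choice: str):
    """Connect, receive the component list, then send one selection."""
    with socket.create_connection(address, timeout=5) as sock:
        with sock.makefile("rw", encoding="utf-8", newline="\n") as stream:
            components = json.loads(stream.readline())
            stream.write(choice + "\n")
            stream.flush()
            reply = stream.readline().strip()
    return components, reply


server = start_server()
components, reply = connect_and_select(server.server_address, "partner-b/prosody")
_, rejected = connect_and_select(server.server_address, "unknown/module")
server.shutdown()
server.server_close()
```

`ThreadingTCPServer` starts a new thread for every accepted connection, which mirrors the "specific thread" per client request mentioned in the text; because TCP already delivers bytes reliably and in order, the sketch needs no retransmission logic of its own.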

A Text-to-Speech-based Digital Public Address System for Campus Broadcasting and Language Listening Training

Conference Paper

Jun 2013

This paper proposes a digital public address (PA) system capable of multi-zone and text-to-speech (TTS) broadcasting for campus broadcasting and language listening training/exams. The proposed digital PA system can meet the environmental broadcasting requirement, meaning different broadcasts for different zones at the same time....

Server-based Speech Technologies for Mobile Robotic Applications

Article

May 2013

The paper proposes server-based technologies and an overall solution for a multimodal interface (speech and touchscreen) usable for mobile applications in robotics as well as in other domains. A server-based automatic speech recognition server, able to handle several audio input streams, has been designed, developed and connected to the Android...
