Figure 3 - uploaded by Haejoong Lee
ODBC connect string dialog 

Source publication
Article
Full-text available
Four diverse tools built on the Annotation Graph Toolkit are described. Each tool associates linguistic codes and structures with time-series data. All are based on the same software library and tool architecture. TableTrans is for observational coding, using a spreadsheet whose rows are aligned to a signal. MultiTrans is for transcribing multi-par...

Context in source publication

Context 1
... Play and Stop
The F1 key toggles playback and pausing of the recording. If the current region is chosen, playback covers that region. If a single point is selected in the waveform, F1 starts playback from that point, and the annotations corresponding to the audio cursor are highlighted (aligned playback).

Sort Annotations
Double-clicking the left mouse button on a feature name (a column heading) of the spreadsheet sorts the annotations according to the values in that column. For example, double-clicking on the heading titled 'f1' sorts all the annotations by the values in that column. Sorting by start and end times is also available from the menu items Trans->Sort->Sort Annotations by Start Time and Trans->Sort->Sort Annotations by End Time.

Find
The menu item Trans->Find or the Control-f key combination brings up a dialog window to search for a string in the spreadsheet. If a matching string is found in a cell, it is highlighted and that annotation row becomes the current annotation.

Control View of Annotations
The menu item Trans->View->Show Select Rows lets the user specify a feature and a value so that only the annotations having this value for the feature are displayed. The other menu items under Trans->View display or hide all annotations in the spreadsheet.

Open Sound/Movie File
The menu item File->Open Sound File loads a sound file into the waveform panel and automatically adjusts the number of waveforms to the number of channels in the sound file. All sound file formats supported by WaveSurfer are supported. In the video version of TableTrans, the menu item File->Open Movie File opens a movie file.

Open Annotation File
The menu item File->Open Annotation File provides the user with options to load an annotation file in the following formats: XML (AIF), Table Format (CSV, etc.) and LCF (the LDC Callhome Format). The application opens an appropriate dialog window for each format.
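The column-sort behavior just described amounts to ordering annotation rows by the value in one feature column. A minimal Python sketch of that idea (the feature names and sample rows are invented, not TableTrans's internal representation):

```python
# Sketch of sorting annotation rows by a feature column, as TableTrans
# does when a column heading is double-clicked.
# Feature names and sample values below are hypothetical.

def sort_annotations(rows, feature, header):
    """Return rows ordered by the value in the named feature column."""
    col = header.index(feature)
    return sorted(rows, key=lambda row: row[col])

header = ["start", "end", "f1"]
rows = [
    (0.0, 1.2, "b"),
    (1.2, 2.5, "a"),
    (2.5, 3.1, "c"),
]

by_f1 = sort_annotations(rows, "f1", header)       # sort by the 'f1' column
by_start = sort_annotations(rows, "start", header)  # sort by start time
```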
First, the user is prompted for the file. Then, for the Table format, a window for specifying the feature names and a delimiter is opened. The names specified here are used as feature names and column headers.

Save Annotations in File
If a file name and a format type have already been chosen, the menu item File->Save saves all the annotations to the file. If they have not been chosen, the user needs to use one of the File->Save Annotations As menu items.

Save Current Sound Region in File
The menu item Sound->Save Current Region in File lets the user save the current region of the sound data to a file.

Column configurations
Column headers and widths may be changed interactively, or may be specified in a configuration file.

Database support
TableTrans supports access to the database component of AGTK. Figure 3 shows the dialog window for entering an ODBC connect string. Table 1 shows some of the parameters used in a connect string. For a complete list, please see [ y/MyODBC_connect_parameters.html ].

MultiTrans is a transcription tool for transcribing multi-party conversations in multi-channel audio signals. The user interface is similar to Transcriber (Barras et al., 2001), but MultiTrans has one transcription panel corresponding to each channel in the signal. Figure 4 contains a screenshot of MultiTrans with a two-channel audio signal. The left transcription panel corresponds to the first channel in the audio signal, and the right transcription panel corresponds to the second channel. The boxes labeled A to E in the figure illustrate the following points.
A: The text panel for speaker 2. The channel associated with speaker 2 is the current channel.
B: The second annotation for speaker 2. It is also the current annotation, and is highlighted.
C: A segment for speaker 1. This shows some of the transcript for this annotation and the associated region of the waveform.
D: The highlighted region of the waveform for the second annotation.
This is the current annotation, and its entire waveform region is highlighted.
E: The hollow play button. This button plays the current channel only, muting the other speech channels.

There are two ways to create an annotation, both explained below. All annotations are inserted into the text panel, sorted by their starting position.

Create Annotation (Specific)
The first way to create an annotation is to explicitly highlight a region in the waveform and press the Return key. This creates a bullet in the appropriate text panel that designates the created annotation. A region is also created below the waveform itself to designate the created annotation. Only once an annotation has been created can the transcription of that region begin.

Create Annotation (Non-Specific)
Another way to create an annotation is during speech playback. When playback of speech has begun, the user may press the Return key to insert an anchor in the current channel. When Return is pressed, a small black bar appears below the waveform, designating the starting position of the current annotation. When Return is pressed a second time, the end anchor for the annotation is inserted and the annotation is created. Ending speech playback destroys any start anchors that do not yet have an associated end anchor.

Delete Current Annotation
An annotation can be deleted using Control-d. To delete an annotation, one must first select the annotation to be deleted. The current annotation can easily be distinguished from others because its transcription and waveform regions are highlighted. When an annotation is deleted, its transcription and its association with the waveform region are also deleted.

Change Current Annotation
Once an annotation is created, its region in the waveform can be changed without deleting the entire annotation.
This is done by selecting the desired annotation (with a click in the text panel or on the segment below the waveform), moving either the end point or the start point of the annotation, and pressing the Return key to register the change.

Split Current Annotation
Large annotations can be split into smaller annotations using the split-current-annotation command. First, the area in the text transcription where the split is to occur is selected. Next, the area in the waveform where the split is to occur is selected. Finally, the Return key is pressed and the old annotation is split into two new annotations, each associated with a different waveform region.

Join Current Annotation
Joining an annotation is the opposite of splitting one. This is done by selecting an annotation and pressing the Shift-BackSpace key combination, which merges the currently selected annotation's region and transcription with those of the annotation immediately before it.

Squeeze Current Annotation
When an annotation is squeezed, its starting boundary is pushed back to the ending boundary of the previous annotation. This is used when annotations are meant to be separate, but one is to begin as soon as the other ends. This is done by selecting an annotation and pressing the Control-Shift-BackSpace key combination.

Toggle Speech Playback
There are several ways to begin speech playback. Pressing the Tab key toggles playback of the current annotation, or of the entire speech file if there is no current annotation. Playback can also be initiated by pressing the solid play button in the waveform panel. Either of these commands plays all channels in the speech file. Pressing the hollow play button in the waveform panel plays the current channel only, muting all other channels. This is useful when there are several channels that make speaker distinction difficult.

InterTrans is an interlinear text editor. Interlinear text is a kind of text in which each word is annotated ...
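The split, join, and squeeze commands described above are, at bottom, interval manipulations on (start, end) pairs. A sketch of that logic in Python (the Annotation class and function names are illustrative, not MultiTrans's implementation):

```python
# Sketch of the interval logic behind MultiTrans's split, join, and
# squeeze commands. The Annotation class is illustrative only.

from dataclasses import dataclass

@dataclass
class Annotation:
    start: float
    end: float
    text: str

def split(ann, split_time, *, text_split):
    """Split one annotation into two at a chosen time and text position."""
    left = Annotation(ann.start, split_time, ann.text[:text_split])
    right = Annotation(split_time, ann.end, ann.text[text_split:])
    return left, right

def join(prev, curr):
    """Merge the current annotation with the one immediately before it."""
    return Annotation(prev.start, curr.end, prev.text + curr.text)

def squeeze(prev, curr):
    """Push the current annotation's start back to the previous one's end."""
    return Annotation(prev.end, curr.end, curr.text)

a = Annotation(0.0, 4.0, "hello world")
left, right = split(a, 2.0, text_split=6)  # two adjacent annotations
merged = join(left, right)                 # undoes the split
```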

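Returning to TableTrans's database support: an ODBC connect string of the kind entered in the Figure 3 dialog is a semicolon-separated list of key=value pairs. A small Python sketch that assembles one (all parameter names and values here are placeholder examples, not the complete parameter list from Table 1):

```python
# Sketch: building an ODBC-style connect string from parameters.
# All names and values are placeholders; consult the MyODBC documentation
# referenced in the text for the full parameter list.

def make_connect_string(params):
    """Join key=value pairs with semicolons, the usual ODBC convention."""
    return ";".join(f"{key}={value}" for key, value in params.items())

conn = make_connect_string({
    "DSN": "agdb",          # data source name (placeholder)
    "SERVER": "localhost",  # database host (placeholder)
    "DATABASE": "annotations",
    "UID": "user",          # user id (placeholder)
    "PWD": "secret",        # password (placeholder)
})
```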
Similar publications

Conference Paper
Full-text available
Since mathematics really is about what mathematicians do, in this paper we will look at the mathematical practice of framing, in which an object of interest is viewed in terms of well-understood mathematical structures. The new perspective not only allows one to deepen the understanding of the respective object, it also facilitates new insights. We propo...
Article
Full-text available
The goal of this project is to implement and further improve the idea of a support graph as an extension to a spreadsheet application, as presented by Peter Sestoft [CORE]. A support graph is used to perform as few recalculations as possible when data in a spreadsheet is altered. This is done by building a support graph that contains the dependencies...
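The support-graph idea in the abstract above, recomputing only the cells that depend on a changed cell, can be sketched briefly; the cell names and dependencies below are invented for illustration:

```python
# Sketch of a spreadsheet support graph: each cell lists the cells it
# supports (its dependents), so a change triggers only the necessary
# recalculations. Cell names and dependencies here are invented.

def dependents_of(cell, supports):
    """Collect every cell transitively supported by `cell`."""
    seen, stack = set(), [cell]
    while stack:
        for dep in supports.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

# A1 supports B1; B1 supports C1. Changing A1 must recompute B1 and C1;
# changing B1 must recompute only C1.
supports = {"A1": ["B1"], "B1": ["C1"]}

touched_by_a1 = dependents_of("A1", supports)
touched_by_b1 = dependents_of("B1", supports)
```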
Article
Full-text available
It has been known for some time that understanding the content of spreadsheet software used by many enterprises and organizations and discovering errors in that content is a difficult task. In particular, it is known that it takes time to create a spreadsheet, and many errors are included in the spreadsheet due to mistakes of the creator. In respon...
Article
Full-text available
This paper presents a project management game that evolves based on the participant's skills in time, cost and quality management. The game concerns the construction of a refinery unit whose activities are divided into three phases, in which the player acts as the project manager, decides the resource allocation, and ensures that the project meets...

Citations

... However, a temporal database does not capture information important for data provenance, such as the activities performed on the data, the agents acting on the data, and the relationships that the different versions of artifacts have to each other at various points in time. Annotation techniques [8] [13] [21] represent another perspective for modeling temporal relationships. In this technique, system entities are (optionally) labeled with time offsets. ...
Article
Provenance refers to the documentation of an object's lifecycle. This documentation (often represented as a graph) should include all the information necessary to reproduce a certain piece of data or the process that led to it. In a dynamic world, as data changes, it is important to be able to get a piece of data as it was, and its provenance graph, at a certain point in time. Supporting time-aware provenance querying is challenging and requires: (i) explicitly representing the time information in the provenance graphs, and (ii) providing abstractions and efficient mechanisms for time-aware querying of provenance graphs over an ever growing volume of data. The existing provenance models treat time as a second class citizen (i.e. as an optional annotation). This makes time-aware querying of provenance data inefficient and sometimes inaccessible. We introduce an extended provenance graph model to explicitly represent time as an additional dimension of provenance data. We also provide a query language, novel abstractions and efficient mechanisms to query and analyze timed provenance graphs. The main contributions of the paper include: (i) proposing a Temporal Provenance Model (TPM) as a timed provenance model; and (ii) introducing two concepts of timed folder, as a container of related set of objects and their provenance relationship over time, and timed paths, to represent the evolution of objects tracing information over time, for analyzing and querying TPM graphs. We have implemented the approach on top of FPSPARQL, a query engine for large graphs, and have evaluated for querying TPM models. The evaluation shows the viability and efficiency of our approach.
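As a rough illustration of treating time as a first-class dimension of provenance, one can attach a validity interval to each provenance edge and query the graph at a point in time. This is a generic sketch under that assumption, not the paper's TPM or FPSPARQL implementation:

```python
# Generic sketch: provenance edges annotated with validity intervals,
# queried at a point in time. Not the TPM/FPSPARQL implementation;
# the example data is invented.

def snapshot(edges, t):
    """Return the provenance edges valid at time t."""
    return [(src, dst) for (src, dst, start, end) in edges
            if start <= t < end]

# (source, target, valid_from, valid_to)
edges = [
    ("doc_v1", "author", 0, 10),   # doc_v1 attributed to author from t=0
    ("doc_v2", "doc_v1", 5, 10),   # doc_v2 derived from doc_v1 at t=5
]

at_3 = snapshot(edges, 3)  # before doc_v2 existed
at_7 = snapshot(edges, 7)  # both edges valid
```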
... LDC has expertise in the transcription and annotation of spoken corpora, maintaining a well-trained staff of transcribers, annotators and experts in these fields. In addition, LDC has in-house software developers who are well experienced in designing and developing customized annotation tools (Maeda et al., 2006; Bird et al., 2002), such as the auditing and segmentation tool used in the creation of the SCC corpus. The tool displays a spectrogram which allows the auditor to identify words and word boundaries visually, as shown in Figure 2. ...
Article
Full-text available
Speech technology applications, such as speech recognition, speech synthesis, and speech dialog systems, often require corpora based on highly customized specifications. Existing corpora available to the community, such as TIMIT and other corpora distributed by LDC and ELDA, do not always meet the requirements of such applications. In such cases, the developers need to create their own corpora. The creation of a highly customized speech corpus, however, could be a very expensive and time-consuming task, especially for small organizations. It requires multidisciplinary expertise in linguistics, management and engineering as it involves subtasks such as the corpus design, human subject recruitment, recording, quality assurance, and in some cases, segmentation, transcription and annotation. This paper describes LDC's recent involvement in the creation of a low-cost yet highly-customized speech corpus for a commercial organization under a novel data creation and licensing model, which benefits both the particular data requester and the general linguistic data user community.
... Similar tools have been implemented with the same aim as AbarHitz; the Annotation Graph Toolkit (AGTK) (Bird et al., 2002) and the TREPIL Treebanking Interface (Rosén et al., 2005) are some examples. It is important to emphasize that the design of AbarHitz follows the general annotation schema we established for representing linguistic information, and it is part of a general environment we have developed in which general processors and resources have been integrated. ...
Conference Paper
Full-text available
This paper deals with theoretical problems found in the work that is being carried out for annotating semantic roles in the Basque Dependency Treebank (BDT). We will present the resources used and the way the annotation is being done. Following the model proposed in the PropBank project, we will show the problems found in the annotation process and decisions we have taken. The representation of the semantic tag has been established and detailed guidelines for the annotation process have been defined, although it is a task that needs continuous updating. Besides, we have adapted AbarHitz, a tool used in the construction of the BDT, to this task.
... AbarHitz has been implemented to assist in the definition of dependencies among the words of a sentence. Similar tools have been implemented with the same aim as AbarHitz; the Annotation Graph Toolkit (AGTK) (Bird et al., 2002) and the TREPIL Treebanking Interface (Rosén et al., 2005) are some examples. It is important to emphasize that the design of AbarHitz follows the general annotation schema we established for representing linguistic information, and it is part of a general environment we have developed in which general processors and resources have been integrated. ...
Conference Paper
Full-text available
This paper presents the work that has been carried out to annotate semantic roles in the Basque Dependency Treebank (BDT) (Aldezabal et al., 2009). In this paper we will present the resources we have used and the way the annotation of 100 verbs has been done. We have followed the model proposed in the PropBank project (Palmer et al., 2005). In addition, we have adapted AbarHitz (Díaz de Ilarraza et al., 2004), a tool used in the construction of the Basque Dependency Treebank (BDT), for the task of annotating semantic roles.
... These data were entered into a Microsoft Excel spreadsheet using Unicode 5.1 characters [http://www.unicode.org], converted to a comma-delimited (CSV) file, and imported into TableTrans v. 1.2 software (Bird et al. 2002), where they were time-aligned to the original audio recording by Judy Kuntz at the ILC on November 12–13, 2007. This annotation was exported as an XML annotation graph [http://www.w3.org/XML/] and transformed into an XML descriptive wordlist format using an XSLT script. ...
Article
Full-text available
This paper presents a 204-item digital wordlist of Mono, an Ubangian language spoken in the Democratic Republic of the Congo. The wordlist includes orthographic and broad phonetic transcriptions of each word, French and English glosses, an individual WAV recording of each item, GIF images of the original field transcriptions, and metadata for resource discovery. An archival form of the wordlist was deposited into an institutional archive (the SIL Language and Culture Archives) and includes the original WAV digital recording, descriptive markup encoding of the wordlist in XML employing Unicode 5.1 transcription, TIFF images of the original field transcriptions, and the metadata record. The presentation form was then generated directly from the archival form.
... The programming staff at LDC has created the tools and technical infrastructures to support the data creation efforts for both these programs as well as all other LDC projects. The majority of the annotated data was created with highly customized annotation tools (Maeda et al., 2006; Maeda and Strassel, 2004; Bird et al., 2002). In addition to annotation tools, LDC's programming staff creates tools and technical infrastructures for all aspects of data creation projects: data scouting, data collection, data selection, annotation, search, data tracking and workflow management. ...
Conference Paper
Full-text available
The Linguistic Data Consortium (LDC) creates a variety of linguistic resources – data, annotations, tools, standards and best practices – for many sponsored projects. The programming staff at LDC has created the tools and technical infrastructures to support the data creation efforts for these projects, creating tools and technical infrastructures for all aspects of data creation projects: data scouting, data collection, data selection, annotation, search, data tracking and workflow management. This paper introduces a number of samples of the LDC programming staff's work, with particular focus on the recent additions and updates to the suite of software tools developed by LDC. Tools introduced include the GScout Web Data Scouting Tool, LDC Data Selection Toolkit, ACK - Annotation Collection Kit, XTrans Transcription and Speech Annotation Tool, GALE Distillation Toolkit, and the GALE MT Post Editing Workflow Management System.
... Bickford (1997); Kew and McConnel (1997); Maeda and Bird (2000); Bird et al. (2002); Maeda et al. (2002) ...
Conference Paper
Full-text available
Interlinear text has long been considered a valuable format in the presentation of multilingual data, and a variety of software tools have facilitated the creation and processing of such texts by researchers. Despite the diversity of tools, a common core of editorial functionality is provided. Identifying these core functions has important implications for software engineers who seek to efficiently build tools that support interlinear text editing. While few applications are specifically designed for the creation or manipulation of interlinear text, a number of tools offer varying degrees of incidental support for this modality. In this paper we provide a comprehensive set of criteria upon which the derivation of functional criteria can be based. We describe the basis on which a group of tools was selected for investigation, along with the evaluation criteria. Finally we consolidate our findings into a functional specification for the development of software applications for the editing of interlinear text.
... This form of transcription is ideal for datasets in which annotations are fully linked to media. Using the Annotation Graph ToolKit (AGTK) we have developed four useful sample applications: MultiTrans, TableTrans, TreeTrans, and InterTrans (Bird et al., 2002). With TableTrans (Figure 1), we have annotated a corpus of vervet monkey vocalizations (80 open-reel tapes) for variables such as call type, caller and recipient. ...
Article
Full-text available
The goal of the TalkBank project (http://talkbank.org) is to support data-sharing and direct, community-wide access to naturalistic recordings and transcripts of human and animal communication. Toward this end, we have constructed a web accessible database of transcripts linked to audio and video media within fields such as conversation analysis, classroom discourse, animal communication, gesture, meetings, second language acquisition, first language acquisition, bilingualism, tutoring, and legal oral argumentation. We discuss how we have taken discrepant databases from dozens of individual projects and merged them together into a well-structured uniform database in which transcripts can be opened online through browsers, allowing direct multimedia playback. To achieve translation across corpora, we have defined a general XML schema. The validity of this schema is checked by bidirectional conversion from alternative input formats to XML and back. The resultant transcripts are then linked to hinted media and XSLT is used to format web readable browsable multimedia transcripts playable through SMIL. A parallel pathway is used to support collaborative commentary and publication of PDF linked to media through special issues of journals in the relevant fields.
... This form of transcription is ideal for datasets in which annotations are fully linked to media. Using the Annotation Graph ToolKit (AGTK) we have developed four useful sample applications: MultiTrans, TableTrans, TreeTrans, and InterTrans (Bird et al., 2002). With TableTrans (Figure 1), we have annotated a corpus of vervet monkey vocalizations (80 open-reel tapes) for variables such as call type, caller and recipient. ...
Conference Paper
Full-text available
The goal of the TalkBank project (http://talkbank.org) is to support data-sharing and direct, community-wide access to naturalistic recordings and transcripts of human and animal communication. Toward this end, we have constructed a web accessible database of transcripts linked to audio and video media within fields such as conversation analysis, classroom discourse, animal communication, gesture, meetings, second language acquisition, first language acquisition, bilingualism, tutoring, and legal oral argumentation. We discuss how we have taken discrepant databases from dozens of individual projects and merged them together into a well-structured uniform database in which transcripts can be opened online through browsers, allowing direct multimedia playback. To achieve translation across corpora, we have defined a general XML schema. The validity of this schema is checked by bidirectional conversion from alternative input formats to XML and back. The resultant transcripts are then linked to hinted media
... The standardisation work includes a content-independent method of specifying regions and anchors in linear linguistic signals, and a query language over those regions and anchors. Similar work, with greater implemented functionality, is being done at the Linguistic Data Consortium [21, 20]. This work does not use either of these because they were insufficiently developed when the work began. ...
Article
This thesis develops work on using Hidden Markov Models to insert tags into natural language text. A taxonomy of tags is developed, unifying the fields of text segmentation tagging, part-of-speech tagging, proper noun extraction and hierarchical entity extraction. The search spaces for inserting tags are examined from both a theoretical and an experimental point of view, across the taxonomy and on four corpora. An analysis of different correctness measures for different types of tag insertion problem is undertaken, and a technique is presented to determine whether tag-insertion errors are the result of a modelling failure or a searching failure.