Figure 2: An example of audio annotation using the annotation tool. On the left, the waveform of a sound retrieved from Freesound, reproducing a dripping faucet. On the right, the annotation tool front-end, where the user is linking the Dripping category of Liquid sounds to a temporal interval of the sound. The categories are provided by the Sound Producing Events Ontology, which is loaded from the Web.

Source publication
Conference Paper
Music and sound have a rich semantic structure that is clear to the composer and the listener but remains mostly hidden to computing machinery. Nevertheless, in recent years the introduction of software tools for music production has enabled new opportunities for migrating this knowledge from humans to machines. A new generation of thes...

Context in source publication

Context 1
... client-side component is a graphical user interface consisting of boxes, menus and input fields that lets the user navigate the classes provided by the ontology. It also allows the user to choose one or more classes and to specify values for the attributes of a class, if present. Following the separation of concerns (SoC) guidelines, we developed our tool using HTML for page markup, CSS for graphical style and JavaScript for the program logic and user interactions. The jQuery framework 4 was used to manipulate the Document Object Model (DOM), and jQuery UI 5 provided GUI components such as autocomplete and datepickers, as well as complex behaviour handlers such as draggable and droppable. The tool was developed with attention to modular programming. To allow other developers to reuse the code, the annotation tool is divided into three reusable modules: owl.js, owl-ui.js and owl-ui.audio.js.

· owl.js: requests an interpretation of a specified ontology from the server-side component and converts it into an internal data model.
· owl-ui.js: is responsible for creating the annotation tool panel, composed of menus and dynamic text boxes. It requires the owl.js library to populate the user interface widgets with the information retrieved from the ontology.
· owl-ui.audio.js: creates an interface for annotating audio files. It allows the user to listen to a file and, using the audio waveform image, to select a sub-part of a sound in order to annotate it. It then opens the annotation tool panel generated by the owl-ui.js library so that the sound can be annotated with ontology classes.

These libraries can be embedded into any web page, making it particularly easy for a developer to add the annotation feature to their own web application. Furthermore, it would be relatively easy to develop dedicated user interface components for annotating other types of documents, such as video or text.

The second part of our tool is the server-side component. It is a SPARQL Protocol and RDF Query Language (SPARQL) endpoint that queries the ontology and retrieves all classes, properties and attributes. The response is generated in JavaScript Object Notation (JSON) format (although other output formats, such as raw text and XML, can be requested) and returned to the client side. Example 4.2 provides a sample response. JSON is used because it is a lightweight, human-readable data-interchange format that can easily be converted from text into a JavaScript object. The SPARQL endpoint runs on a Linux machine with the Apache 2 web server and the PHP language available. Furthermore, the endpoint uses the Redland RDF libraries to interpret the data from the ontologies; these libraries are a key component of our framework.

Figure 1 shows the annotation tool flow chart. When the annotation tool is initialized, it makes a synchronous call to the SPARQL endpoint hosted on a server machine, sending three main parameters: the URL of the ontology to query, the SPARQL query to execute and the format of the response. The annotation tool receives a response, by default a JSON object, containing every class and subclass of the ontology. Processing this data, the tool creates a JavaScript structure of objects representing the complete class hierarchy of the ontology. At this point the annotation tool creates the user interface, populated with the data retrieved from the ontology.
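As a rough illustration of this initialization flow, the sketch below issues the request with the three parameters described above and folds the JSON bindings into a nested class structure, in the spirit of what owl.js does. The endpoint path, the ontology URL, the query and the function names are our own illustrative choices, not the tool's actual API.

// Illustrative sketch only: URLs, query and function names are hypothetical.
var ONTOLOGY_URL = "http://example.org/sound-producing-events.owl"; // stand-in for the published ontology URL
var ENDPOINT_URL = "/sparql-endpoint.php";                          // hypothetical endpoint path

// A query of this kind retrieves every class with its label and direct superclass.
var QUERY =
  "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> " +
  "PREFIX owl: <http://www.w3.org/2002/07/owl#> " +
  "SELECT ?class ?label ?parent WHERE { " +
  "  ?class a owl:Class . " +
  "  OPTIONAL { ?class rdfs:label ?label } " +
  "  OPTIONAL { ?class rdfs:subClassOf ?parent } " +
  "}";

// Synchronous request with the three parameters: ontology URL, query and response format.
function loadOntology() {
  var roots = [];
  $.ajax({
    url: ENDPOINT_URL,
    async: false,            // the tool initializes synchronously, as described above
    dataType: "json",
    data: { ontology: ONTOLOGY_URL, query: QUERY, format: "json" },
    success: function (response) {
      roots = buildHierarchy(response.results.bindings);
    }
  });
  return roots;
}

// Fold the flat SPARQL bindings into a tree of { uri, label, children } nodes.
function buildHierarchy(bindings) {
  var nodes = {};
  bindings.forEach(function (b) {
    var uri = b["class"].value;
    if (!nodes[uri]) {
      nodes[uri] = { uri: uri, label: b.label ? b.label.value : uri, children: [] };
    }
  });
  var roots = [];
  bindings.forEach(function (b) {
    var node = nodes[b["class"].value];
    if (b.parent && nodes[b.parent.value]) {
      nodes[b.parent.value].children.push(node);   // attach to its superclass
    } else if (roots.indexOf(node) === -1) {
      roots.push(node);                            // no known parent: treat as a root concept
    }
  });
  return roots;
}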
The resulting GUI widget includes a text box in which dynamic suggestions are provided by means of the autocomplete feature. The user can also traverse the class hierarchy through a tree menu to choose a concept related to the resource being annotated. When the user selects a class from the menu or from the text box, a new widget appears in which a value can be assigned to each attribute of the class; the tool chooses the appropriate widget for each attribute type. The annotations are collected into a stack and, when the annotation process is complete, the user confirms them. The tool then generates an RDF representation of the annotations and sends it to a server, where it can be stored in a triple store and retrieved later. Thanks to namespaces and URIs that uniquely identify a resource, the generated RDF/XML annotation holds the complete semantic description and the information the user has associated with the resource.

In this section we illustrate the widgets developed to allow users to annotate audio resources. In this specific case, we retrieve a dripping faucet sound and its waveform image from the Freesound repository. As shown in Figure 2, a user may play back the entire sound and select a part of it by clicking on the audio waveform. This way the annotator can link several different concepts to the same sound, or even to particular events that occur within the sound recording. In this case we loaded the Sound Producing Events Ontology, based on the work of W. W. Gaver [3]. In "What in the world do we hear? An ecological approach to auditory event perception" he proposed a framework for describing sound in terms of audible source attributes. We formalized a possible ontology based on Gaver's work using the Web Ontology Language (OWL) and published it on the Web in the form of an RDF graph. The ontology can easily be substituted by the developer by specifying the URL of another ontology, so that classes from the new repository become available to the widgets, ready to be linked to a digital resource.

The annotation tool front-end is composed of a text field with autocompletion (as shown in Figure 3), so that while typing the user receives suggestions about the available concepts. Alternatively, a user can traverse the complete concept hierarchy through a tree menu (as shown in Figure 4), which illustrates the concept relationships and appears when clicking on the root concepts (in this case Vibrating objects, Liquid sounds and Gasses). The lower part of the user interface contains the stack of concepts the user has already linked to the resource. It allows editing the attributes of a concept, shows the annotated sound parts and also permits deleting a previously created annotation. When the Confirm button is clicked, the annotation tool generates an RDF/XML graph which stores all the links and information between the OWL classes and the resources. This graph can easily be stored and retrieved later.

In order to test the capabilities of our annotation tool in terms of usability and reliability, we developed a web-based audio sequencer, where users can work with sounds, mixing and annotating them in a production environment. The tool is available at the test project web page 6. This site implements the annotation tool technology described above and works as a hub, since it refers to sound files hosted by the Freesound website.
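Returning to the front-end described above, the following minimal sketch shows how the autocomplete field could be wired to the class hierarchy with jQuery UI, in the spirit of owl-ui.js. The field id, the flattenLabels helper and the select callback are hypothetical names that only stand in for the actual widget logic, which also opens the attribute panel and updates the annotation stack.

// Illustrative sketch only: not the actual owl-ui.js API.
function flattenLabels(nodes, out) {
  out = out || [];
  nodes.forEach(function (node) {
    out.push({ label: node.label, value: node.label, uri: node.uri });
    flattenLabels(node.children, out);             // descend into subclasses
  });
  return out;
}

var hierarchy = loadOntology();                    // class tree built in the previous sketch
$("#concept-input").autocomplete({
  source: flattenLabels(hierarchy),                // jQuery UI matches suggestions on the label field
  select: function (event, ui) {
    // In the real tool the selection opens the attribute widget for the chosen class
    // and pushes the concept onto the annotation stack shown in the lower part of the UI.
    console.log("Selected class:", ui.item.uri);
  }
});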
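For completeness, this is a sketch of what an RDF/XML annotation graph of this kind could look like, assuming a hypothetical annotation vocabulary and example URIs; the actual namespaces, property names and the encoding of the temporal interval used by the tool may differ.

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:anno="http://example.org/annotation#">
  <!-- Hypothetical annotation: a 2.0-4.5 s segment of a Freesound recording labelled as Dripping -->
  <anno:Annotation rdf:about="http://example.org/annotations/1">
    <anno:target rdf:resource="http://www.freesound.org/sounds/12345/"/>
    <anno:startTime rdf:datatype="http://www.w3.org/2001/XMLSchema#float">2.0</anno:startTime>
    <anno:endTime rdf:datatype="http://www.w3.org/2001/XMLSchema#float">4.5</anno:endTime>
    <anno:hasConcept rdf:resource="http://example.org/sound-producing-events#Dripping"/>
  </anno:Annotation>
</rdf:RDF>

Serialized in this way, the annotations can be sent to the server and stored in a triple store for later retrieval.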
We chose to build the web-based audio sequencer entirely with standard web languages: HTML, the standard mark-up language of the Web, for page structure; CSS stylesheets for graphical customization; and JavaScript for the business logic and user interactions. We also tried to exploit the multimedia capabilities of the new version of the HTML standard, but our project requires advanced audio synchronization features that HTML5 Audio does not yet provide. We therefore had to fall back on Adobe Flash technology, which is responsible for handling audio playback. Figure 6 shows three different layers of our ...
