Jônatas Wehrmann

Pontifícia Universidade Católica do Rio Grande do Sul | PUCRS · Programa de Pós-Graduação em Ciência da Computação

Doctor of Philosophy

About

33 Publications
81,852 Reads
1,140 Citations
Additional affiliations
August 2016 - present
PUCRS/Motorola
Position
  • Researcher
Description
  • Deep Learning for image and text understanding.
January 2015 - August 2016
PUCRS/Motorola
Position
  • Researcher
Description
  • Deep learning for sentiment analysis, video classification, adult content detection.
Education
August 2016 - August 2020
Pontifícia Universidade Católica do Rio Grande do Sul
Field of study
  • Machine Learning / Deep Learning
January 2015 - July 2016
Pontifícia Universidade Católica do Rio Grande do Sul
Field of study
  • Machine Learning / Deep Learning

Publications (33)
Article
Full-text available
The amount of adult content on the Internet grows daily. Much of the pornographic content is unconstrained and freely available to all users, requiring parents to make use of parental control strategies to protect their children. Current parental control devices depend on human intervention, and hence there is a need for computational approac...
Article
Full-text available
This paper provides a very simple yet effective character-level architecture for learning bidirectional retrieval models. Aligning multimodal content is particularly challenging considering the difficulty in finding semantic correspondence between images and descriptions. We introduce an efficient character-level inception module, designed to learn...
Article
Full-text available
This paper proposes a framework for training language-invariant cross-modal retrieval models. We also introduce a novel character-based word-embedding approach, allowing the model to project similar words across languages into the same word-embedding space. In addition, by performing cross-modal retrieval at the character level, the storage require...
Conference Paper
Full-text available
One of the most challenging machine learning problems is a particular case of data classification in which classes are hierarchically structured and objects can be assigned to multiple paths of the class hierarchy at the same time. This task is known as hierarchical multi-label classification (HMC), with applications in text classification, image a...
Preprint
Full-text available
In this paper, we propose MAGICSTYLEGAN and MAGICSTYLEGAN-ADA - both incarnations of the state-of-the-art models StyleGAN2 and StyleGAN2-ADA - to experiment with their capacity for transfer learning into a rather different domain: creating new illustrations for the vast universe of the game "Magic: The Gathering" cards. This is a challenging task es...
Conference Paper
We propose a framework for training language-invariant cross-modal retrieval models. We introduce four novel text encoding approaches, as well as a character-based word-embedding approach, allowing the model to project similar words across languages into the same word-embedding space. In addition, by performing cross-modal retrieval at the characte...
Preprint
Full-text available
Text-to-image synthesis is the task of generating images from text descriptions. Image generation, by itself, is a challenging task. When we combine image generation and text, we bring complexity to a new level: we need to combine data from two different modalities. Most recent works in text-to-image synthesis follow a similar approach when it c...
Preprint
Full-text available
Recent research advances in Computer Vision and Natural Language Processing have introduced novel tasks that are paving the way for solving AI-complete problems. One of those tasks is called Visual Question Answering (VQA). A VQA system must take an image and a free-form, open-ended natural language question about the image, and produce a natural l...
Conference Paper
Full-text available
In this paper, we introduce a novel approach for training image-text alignment models, namely ADAPT. Image-text alignment methods are often used for cross-modal retrieval, i.e., to retrieve an image given a query text, or captions that successfully label an image. ADAPT is designed to adjust an intermediate representation of instances from a modali...
Experiment Findings
Full-text available
We choose hyper-parameters based on the results over validation data. We employ Adam for optimization, with an initial learning rate of 6 × 10⁻⁴, which is further decreased 10× at the 15th epoch for Flickr and M30K datasets, and at the 10th epoch for COCO and YJ Captions datasets. We use a batch size of 128, leading us to select the hard-contra...
Conference Paper
Full-text available
Current state-of-the-art approaches for Natural Language Processing tasks such as text classification are either based on Recurrent or Convolutional Neural Networks. Notwithstanding, those approaches often require a long time to train, or large amounts of memory to store the entire trained models. In this paper, we introduce a novel neural network...
Conference Paper
Full-text available
The amount of digital pornographic content on the Internet grows daily, and accessing such content has become increasingly easy. Hence, there is a real need for mechanisms that can protect particularly vulnerable audiences (e.g., children) from browsing the web. Recently, object detection methods based on deep neural networks such as CNNs have...
Conference Paper
Full-text available
Easy and widespread access to the Internet makes it easier than ever to reach content of any kind at any moment, and while that offers several advantages, sensitive audiences may also be inadvertently exposed to nudity content they did not ask for. Virtually every work on nudity and pornography censorship focuses solely on p...
Conference Paper
Full-text available
Multimodal bidirectional retrieval is a very challenging task that consists of semantically aligning two distinct modalities such as images and textual descriptions, allowing the retrieval of content from one of the modalities given the other. The goal is to find a common semantic space for both modalities in order to discover the correspondences b...
Conference Paper
Full-text available
In this paper, we propose a novel approach for classifying both the sentiment and the language of tweets. Our proposed architecture comprises a convolutional neural network (ConvNet) with two distinct outputs, each of which is designed to minimize the classification error of either sentiment assignment or language identification. Results show that our...
Article
The task of labeling movies according to their corresponding genre is a challenging classification problem, given that genre is an immaterial feature that cannot be directly pinpointed in any of the movie frames. Hence, off-the-shelf image classification approaches are not capable of handling this task in a straightforward fashion. Moreove...
Article
Full-text available
With the novel and fast advances in the area of deep neural networks, several challenging image-based tasks have been recently approached by researchers in pattern recognition and computer vision. In this paper, we address one of these tasks, which is to match image content with natural language descriptions, sometimes referred to as multimodal conten...
Conference Paper
Full-text available
Sentiment analysis of tweets is often monolingual and the models provided by machine learning classifiers are usually not applicable across distinct languages. Cross-language sentiment classification usually relies on machine translation strategies in which a source language is translated to the desired target language. Machine translation is costl...
Conference Paper
Full-text available
In classification tasks, an object usually belongs to one class within a set of disjoint classes. In more complex tasks, an object can belong to more than one class, in what is conventionally termed multi-label classification. Moreover, there are cases in which the set of classes is organised in a hierarchical fashion, and an object must be associ...
Conference Paper
In this paper, we explore the suitability of employing Convolutional Neural Networks (ConvNets) for multi-label movie trailer genre classification. Assigning genres to movies is a particularly challenging task because genre is an immaterial feature that is not physically present in a movie frame, so off-the-shelf image detection models cannot be ea...
Conference Paper
Learning content from videos is not an easy task and traditional machine learning approaches for computer vision have difficulties in doing it satisfactorily. However, in the past couple of years the machine learning community has seen the rise of deep learning methods that significantly improve the accuracy of several computer vision applications,...
