Figure 7: Structure of the multi-font character recognition CNN used in the legibility evaluation.
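The figure itself is not reproduced here. As a rough, hypothetical sketch of what such a multi-font character recognition CNN could look like (the layer sizes, input resolution, and 26-class output are all assumptions, not the paper's exact configuration):

```python
# Hypothetical sketch of a multi-font character recognition CNN used for
# legibility evaluation; architecture details are assumptions.
import torch
import torch.nn as nn

class CharRecognitionCNN(nn.Module):
    def __init__(self, num_classes: int = 26):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 64x64 -> 32x32
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 32x32 -> 16x16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, 256), nn.ReLU(),
            nn.Linear(256, num_classes),          # one logit per character class
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Legibility of generated glyphs can be scored as this CNN's recognition accuracy.
model = CharRecognitionCNN()
logits = model(torch.randn(8, 1, 64, 64))        # 8 grayscale 64x64 glyphs
print(logits.argmax(dim=1))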

Source publication
Article
Full-text available
In this paper, we propose GlyphGAN: style-consistent font generation based on generative adversarial networks (GANs). GANs are a framework for learning a generative model using a system of two neural networks competing with each other. One network generates synthetic images from random input vectors, and the other discriminates between synthetic an...
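As a minimal illustration of the two-network competition the abstract describes, here is a generic GAN training sketch; the architectures, sizes, and data below are placeholder assumptions, not GlyphGAN's actual design:

```python
# Minimal GAN skeleton: a generator maps random vectors to images and a
# discriminator scores real vs. synthetic; all sizes are illustrative.
import torch
import torch.nn as nn

latent_dim, img_dim = 100, 64 * 64

G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, img_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1))

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.rand(16, img_dim)                   # stand-in for real glyph images
z = torch.randn(16, latent_dim)

# Discriminator step: push real images toward 1 and synthetic ones toward 0.
fake = G(z).detach()
loss_d = bce(D(real), torch.ones(16, 1)) + bce(D(fake), torch.zeros(16, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: try to fool the discriminator into outputting 1 on fakes.
loss_g = bce(D(G(z)), torch.ones(16, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```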

Similar publications

Article
Full-text available
In recent years, the convolutional neural network (CNN) has become one of the most popular deep learning frameworks and has been widely used in hyperspectral image classification tasks. Convolution (Conv) in a CNN uses filter weights to extract features in a local receptive field, and the weight parameters are shared globally, which focuses more on the high‐fr...
Article
Full-text available
Filter banks transferred from a pre-trained deep convolutional network exhibit significant performance in heightening the inter-class separability for hyperspectral image feature extraction, but weakening the intra-class consistency simultaneously. In this paper, we propose a new superpixel-based relational auto-encoder for cohesive spectral–spatia...
Article
Full-text available
Recently, single image super-resolution (SISR) has made great progress due to the rapid development of deep convolutional neural networks (CNNs), and the application of Generative Adversarial Networks (GANs) has made super-resolution networks even more effective. However, GAN-based methods have many problems such as lengthy and unstable convergence....
Preprint
Full-text available
Gradient-based attribution methods can aid in the understanding of convolutional neural networks (CNNs). However, the redundancy of attribution features and the gradient saturation problem, which weaken the ability to identify significant features and cause an explanation focus shift, are challenges that attribution methods still face. In this work...
Article
Full-text available
Recently, convolutional neural networks have shown good performance in many counterfeit detection tasks. However, accurate counterfeit detection is still challenging due to the following three issues: (1) fine-grained classification, (2) class imbalance, and (3) high imitation samples. To address these issues, we propose a hybrid attention network...

Citations

... In addition, FETGAN [24] employs adaptive instance-normalized font style migration for few-shot learning, converting existing fonts to a new style from only a few samples of that style while keeping the text content unchanged. GlyphGAN [25] utilizes the GAN framework to create new fonts with a consistent style across all 26 letters of the English alphabet. In other words, this method can generate an infinite variety of fonts based on existing fonts. ...
Article
Deep learning techniques are used to transform the style of images and produce diverse images. In the field of text style transformation, many previous studies attempted to generate stylized text using deep learning networks. However, to achieve multiple style transformations for text images, the methods proposed in previous studies must either learn multiple networks or cannot be guided by style images. Thus, in this study we focus on the multi-style transformation of text images, using style images to guide the generation of results. We propose a multiple-style transformation network for text style transfer, which we refer to as the Multi-Style Shape Matching GAN (Multi-Style SMGAN). The proposed method generates multiple styles of text images with a single model trained only once, and allows users to control the text style according to style images. The method introduces conditions into the network such that all styles can be distinguished effectively within the network, and the generation of each styled text can be controlled according to these conditions. The network is optimized such that the conditional information is transmitted effectively throughout the network. The proposed method was evaluated experimentally on a large number of text images, and the results show that the trained model can generate multiple-style text in real time according to the style image. In addition, the results of a user survey indicate that the proposed method produces higher-quality results than existing methods.
... One variation of the ANN is the deep learning algorithm. Some basic and widely used deep learning networks are the Convolutional Neural Network, the Recurrent Neural Network, Long Short-Term Memory, Generative Adversarial Networks (GANs), and Autoencoders; their applications are [29][30][31][32][33], respectively. ...
Article
Full-text available
Optical Character Recognition of handwritten documents has been a research topic for the last few decades. Different types of classification schemes, from template matching and structural analysis to deep neural networks, have been proposed by researchers. In this paper, a novel holistic approach is proposed for the recognition of handwritten words. The approach is a hybrid model combining CNN and BLSTM layers, which are responsible for extracting spatial and temporal features, respectively, from the word images. The two feature sets are combined by compact bilinear pooling (CBP). The CBP layer highlights fine-grained details, which in turn helps to achieve high recognition accuracy. The extracted features are recognized by a connectionist temporal classification layer. The weights are learned from the database using the backpropagation algorithm. The hybrid model is trained on three public databases, CMATERdb2.1.2, IIIT-HW-Dev, and IIIT-HW-Telugu, containing Bengali, Devanagari, and Telugu handwritten words, respectively. The proposed model achieved 96.42%, 94.79%, and 95.07% accuracy on the Bengali, Devanagari, and Telugu databases, respectively.
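As a hedged sketch of such a hybrid CNN + BLSTM + CTC pipeline (omitting the compact bilinear pooling step, with layer sizes chosen purely for illustration, not the cited model's exact configuration):

```python
# Hypothetical CNN + BLSTM + CTC word recognizer: the CNN extracts spatial
# features, the BLSTM models the width axis as a sequence, and a CTC-style
# head emits per-step character log-probabilities.
import torch
import torch.nn as nn

class CNNBLSTMWordRecognizer(nn.Module):
    def __init__(self, num_chars: int):
        super().__init__()
        self.cnn = nn.Sequential(                  # spatial features
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.blstm = nn.LSTM(128 * 8, 256,         # temporal features
                             bidirectional=True, batch_first=True)
        self.head = nn.Linear(512, num_chars + 1)  # +1 for the CTC blank label

    def forward(self, x):                          # x: (B, 1, 32, W)
        f = self.cnn(x)                            # (B, 128, 8, W/4)
        b, c, h, w = f.shape
        seq = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # width as time axis
        out, _ = self.blstm(seq)
        return self.head(out).log_softmax(-1)      # feed to nn.CTCLoss

model = CNNBLSTMWordRecognizer(num_chars=60)
log_probs = model(torch.randn(4, 1, 32, 128))      # 4 word images of width 128
print(log_probs.shape)                             # (4, 32, 61)
```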
... The multi-scale discriminator is an improved approach that captures image information more comprehensively by introducing discriminators at different scales. In image generation tasks, discriminators can be trained simultaneously on the original image and on reduced-size versions of it [28][29][30]. Such a multi-scale discriminator can capture features at different levels, from local details to global structure, improving the authenticity and diversity of generated samples. ...
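The multi-scale idea can be sketched as follows; the discriminator architecture and the number of scales are assumptions for illustration:

```python
# Sketch of a multi-scale discriminator: the same image is scored at the
# full resolution and at progressively downsampled resolutions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_disc():
    return nn.Sequential(
        nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(128, 1, 4, stride=1, padding=1),   # patch-level real/fake scores
    )

class MultiScaleDiscriminator(nn.Module):
    def __init__(self, num_scales: int = 3):
        super().__init__()
        self.discs = nn.ModuleList(make_disc() for _ in range(num_scales))

    def forward(self, x):
        scores = []
        for d in self.discs:
            scores.append(d(x))                      # score the current scale
            x = F.avg_pool2d(x, 2)                   # halve resolution for the next
        return scores                                # fine-to-coarse feedback

disc = MultiScaleDiscriminator()
outs = disc(torch.randn(2, 3, 128, 128))
print([o.shape for o in outs])
```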
Article
Full-text available
This paper explores the application possibilities and optimization of Generative Adversarial Networks (GANs) in spatial computing, aiming to improve design efficiency and creativity and to achieve a more intelligent design process. A method for icon generation is proposed, and a basic architecture for icon generation is constructed. By introducing the concept of interactive design and the characteristics of requirement conditions, a system with generation and optimization capabilities is built to meet various requirements in spatial design. By integrating multi-feature recognition modules into the discriminator and optimizing the structure of the conditional features, the generated icons effectively maintain diversity and innovation while satisfying the conditional features. The experiments use publicly available icon datasets, LLD-Icon and Icons-50. The icon shapes generated by the proposed model are more distinct, and the colors of colored icons can be controlled more finely. Comparing Inception Score (IS) values across models shows that the IS value of the proposed model is 7.05, higher than that of other GAN models. The multi-feature icon generation model based on Auxiliary Classifier GANs performs well in presenting multiple feature representations of icons. After the multi-feature recognition modules are introduced into the network model, the peak error of the recognition network in the initial stage is only 2.000, whereas the initial error of an ordinary GAN without these modules is as high as 5.000. This indicates that the improved model helps the discriminative network recognize the core information of icon images more quickly. The results provide a reference for achieving more efficient and innovative interactive space design.
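A minimal sketch of an AC-GAN-style discriminator with an auxiliary recognition head, in the spirit of the multi-feature recognition module described above (the layer sizes and feature count are assumptions, not the paper's configuration):

```python
# AC-GAN-style discriminator: one head scores real vs. fake, an auxiliary
# head classifies which conditioned feature/class the icon carries.
import torch
import torch.nn as nn

class ACDiscriminator(nn.Module):
    def __init__(self, num_features: int = 50):      # e.g. 50 icon categories
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.adv_head = nn.Linear(128, 1)            # real vs. fake
        self.cls_head = nn.Linear(128, num_features) # feature recognition

    def forward(self, x):
        h = self.backbone(x)
        return self.adv_head(h), self.cls_head(h)

d = ACDiscriminator()
adv, cls = d(torch.randn(4, 3, 32, 32))
# Train with an adversarial loss on `adv` plus cross-entropy on `cls`, so the
# discriminator also learns to recognize the conditioned icon features.
print(adv.shape, cls.shape)
```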
... (excerpt from a task/model table; all rows concern text output)
- Translate text from one language to another: Text-to-Text Transfer Transformer (T5) [39]; Convolutional Sequence to Sequence Learning (ConvS2S) [132]; Sequence to Sequence (Seq2Seq) [108]
- Generate handwritten characters in a target/new font style using text: GlyphGAN [154]
- Generate accurate and meaningful corrections for code issues: TFix [45]
- Explain the given input statements: WT5 (Why, T5?) [121]
- Perform tasks like translation, question answering, classification, and summarization using input texts: Text-To-Text Transformer (T5) [39]
- Generate or crack passwords: PassGAN [155]
- Chat with users, answer follow-up questions, challenge incorrect premises, and reject inappropriate requests: InstructGPT (GPT-3) [111]
- Operate as a conversational AI system to chat with users and answer follow-up questions: Language Models for Dialog Applications (LaMDA) [33] ...
... The underlying process of text-to-text generation encompasses both transformer-based architectures and models that harness the power of generative adversarial networks (GANs). For example, models like PassGAN [155] employ GANs to generate or crack passwords, and GlyphGAN [154] utilizes GANs to create handwritten characters in different font styles based on text input. This fusion of transformer-based approaches and GAN-based models expands the horizons of text-to-text generation, unlocking exciting possibilities in natural language processing. ...
... Recognition Accuracy: Recognition accuracy assesses the ability of the generated handwritten characters to be recognized correctly by optical character recognition (OCR) systems or other recognition models. It measures how well models like GlyphGAN [154] generate characters that resemble the original symbols and can be accurately identified. ...
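The metric itself is straightforward to sketch; the recognizer and data below are stand-ins for illustration, not the models used in the cited work:

```python
# Recognition accuracy for generated glyphs: feed them to a pretrained
# recognizer and compare predictions against the intended character classes.
import torch

def recognition_accuracy(recognizer, generated, intended_labels):
    """Fraction of generated characters the recognizer identifies correctly."""
    with torch.no_grad():
        preds = recognizer(generated).argmax(dim=1)
    return (preds == intended_labels).float().mean().item()

# Stand-in recognizer and data; in practice `recognizer` would be a
# pretrained OCR / character-classification model.
recognizer = torch.nn.Linear(64 * 64, 26)
generated = torch.randn(32, 64 * 64)           # 32 generated glyphs (flattened)
labels = torch.randint(0, 26, (32,))           # intended character classes
print(recognition_accuracy(recognizer, generated, labels))
```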
Article
Full-text available
Generative artificial intelligence (AI) has emerged as a powerful technology with numerous applications in various domains. There is a need to identify the requirements and evaluation metrics for generative AI models designed for specific tasks. The research aims to investigate the fundamental aspects of generative AI systems, including their requirements, models, input–output formats, and evaluation metrics. The study addresses key research questions and presents comprehensive insights to guide researchers, developers, and practitioners in the field. Firstly, the requirements necessary for implementing generative AI systems are examined and categorized into three distinct categories: hardware, software, and user experience. Furthermore, the study explores the different types of generative AI models described in the literature by presenting a taxonomy based on architectural characteristics, such as variational autoencoders (VAEs), generative adversarial networks (GANs), diffusion models, transformers, language models, normalizing flow models, and hybrid models. A comprehensive classification of input and output formats used in generative AI systems is also provided. Moreover, the research proposes a classification system based on output types and discusses commonly used evaluation metrics in generative AI. The findings contribute to advancements in the field, enabling researchers, developers, and practitioners to effectively implement and evaluate generative AI models for various applications. The significance of the research lies in understanding that generative AI system requirements are crucial for effective planning, design, and optimal performance. A taxonomy of models aids in selecting suitable options and driving advancements. Classifying input–output formats enables leveraging diverse formats for customized systems, while evaluation metrics establish standardized methods to assess model quality and performance.
... Nevertheless, this paper gave us insight into how neural networks can be incorporated in a project such as ours. [5] Alex Graves et al. show how Long Short-Term Memory recurrent neural networks can be used to generate complex sequences with long-range structure, simply by predicting one data point at a time. ...
Article
Full-text available
Personalized font generation is an emerging technology that aims to create unique and customized fonts based on an individual’s preferences and characteristics. This process involves designing a font from scratch, including selecting the appropriate style, weight, and size. Personalized features such as letter shapes, ligatures, and kerning are then implemented to create a font that is unique to the individual. Personalized font generation has the potential to revolutionize the way we think about typography. By creating fonts that are tailored to an individual’s preferences, we can enhance the user experience and create more engaging and personalized content. This technology is particularly relevant in today’s digital age, where the use of typography is increasingly important in everything from social media posts to marketing materials. In summary, personalized font generation is a promising new technology that offers a unique and tailored typography experience. By combining traditional font design techniques with advanced technologies such as machine learning and artificial intelligence, we can create fonts that are truly one-of-a-kind and help to elevate our communication efforts.
... However, there is no large "real" ambigram dataset with sufficient style variations. Therefore, conventional font generation methodologies by GANs [1,5,27] cannot be used for ambigrams. Furthermore, computers will suffer from the above two difficulties, like human experts. ...
... For generating letter images (or font images), various image generation models, especially GANs [19,9,2,11,14], have been used. The GAN-based models can also generate letter images with conditions, such as class labels [16,5,12,10] or texts [28,18]. Recently, diffusion models have achieved high-quality photographic image generation [20,21,23]. ...
Preprint
Ambigrams are graphical letter designs that can be read not only from the original direction but also from a rotated direction (especially by 180 degrees). Designing ambigrams is difficult even for human experts because maintaining dual readability from both directions is hard. This paper proposes an ambigram generation model. As its generation module, we use a diffusion model, which has recently been used to generate high-quality photographic images. By specifying a pair of letter classes, such as 'A' and 'B', the proposed model generates various ambigram images which can be read as 'A' from the original direction and as 'B' from a direction rotated 180 degrees. Quantitative and qualitative analyses of experimental results show that the proposed model can generate high-quality and diverse ambigrams. In addition, we define ambigramability, an objective measure of how easy it is to generate ambigrams for each letter pair. For example, the pair of 'A' and 'V' shows high ambigramability (that is, it is easy to generate their ambigrams), and the pair of 'D' and 'K' shows lower ambigramability. The ambigramability measure gives various hints for ambigram generation, not only for computers but also for human experts. The code can be found at https://github.com/univ-esuty/ambifusion.
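The dual-readability requirement can be sketched as a simple joint score under a character classifier; the classifier below is a stand-in, and this is not the paper's actual generation or evaluation procedure:

```python
# Dual readability: an ambigram should be classified as one letter upright
# and as another letter after a 180-degree rotation.
import torch

def dual_readability(classifier, img, cls_a: int, cls_b: int) -> float:
    """Joint probability that `img` reads as cls_a upright and cls_b rotated."""
    p_up = classifier(img).softmax(-1)[0, cls_a]
    rotated = torch.rot90(img, k=2, dims=(-2, -1))   # rotate by 180 degrees
    p_rot = classifier(rotated).softmax(-1)[0, cls_b]
    return (p_up * p_rot).item()

# Stand-in 26-class letter classifier on 64x64 glyph images.
classifier = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(64 * 64, 26))
img = torch.randn(1, 1, 64, 64)
print(dual_readability(classifier, img, cls_a=0, cls_b=21))  # 'A' vs. 'V'
```

A high score for a pair like ('A', 'V') and a low one for ('D', 'K') would mirror the ambigramability ranking the abstract reports.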
... Benefiting from the development of image generation techniques, mainstream font synthesis methods [2,12,24,41,42] can generate pixelated glyph images. Despite the promising quality, glyph images incur aliasing artifacts on edges when discretely sampled, and thus are not suitable for high-quality rendering or printing at arbitrary resolutions. ...
... Benefiting from the development of image generation [10,14,16], both black-and-white [12,15,37,40] and artistic glyph image generation [2,23,41,42] have been well explored over the past decade. MC-GAN [2] synthesized ornamented glyphs for capital letters in an end-to-end manner from a small subset of the same style. ...
Preprint
Full-text available
Automatic generation of fonts can be an important aid to typeface design. Many current approaches regard glyphs as pixelated images, which present artifacts when scaling and inevitable quality losses after vectorization. On the other hand, existing vector font synthesis methods either fail to represent the shape concisely or require vector supervision during training. To push the quality of vector font synthesis to the next level, we propose a novel dual-part representation for vector glyphs, where each glyph is modeled as a collection of closed "positive" and "negative" path pairs. The glyph contour is then obtained by boolean operations on these paths. We first learn such a representation only from glyph images and devise a subsequent contour refinement step to align the contour with an image representation to further enhance details. Our method, named DualVector, outperforms state-of-the-art methods in vector font synthesis both quantitatively and qualitatively. Our synthesized vector fonts can be easily converted to common digital font formats like TrueType Font for practical use. The code is released at https://github.com/thuliu-yt16/dualvector.
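The boolean composition of "positive" and "negative" closed paths can be illustrated with ordinary polygon operations; the shapely-based example below is a conceptual sketch, not the paper's implementation:

```python
# Dual-part idea: a glyph contour as the boolean difference between the
# union of "positive" paths and the union of "negative" paths.
from shapely.geometry import Point
from shapely.ops import unary_union

# An 'O'-like glyph: a filled disc (positive) minus an inner disc (negative).
positive = [Point(0, 0).buffer(1.0)]
negative = [Point(0, 0).buffer(0.6)]

glyph = unary_union(positive).difference(unary_union(negative))
print(glyph.area)  # area of the resulting ring-shaped contour
```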
... GlyphGAN [32] is a GAN-based font generation model that accepts two input vectors: a character class vector carrying character class information and a style vector carrying style information. The GlyphGAN model generates diverse font characters while keeping a consistent design across all of them. ...
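A minimal sketch of this two-vector input scheme, where one style vector is held fixed while the character-class one-hot varies (all sizes are assumptions, not GlyphGAN's actual architecture):

```python
# Conditional generator input: one-hot character class + style vector.
import torch
import torch.nn as nn

num_classes, style_dim = 26, 100
G = nn.Sequential(nn.Linear(num_classes + style_dim, 256), nn.ReLU(),
                  nn.Linear(256, 64 * 64), nn.Tanh())

style = torch.randn(1, style_dim)                 # one style, held fixed
glyphs = []
for c in range(num_classes):                      # same style for all 26 letters
    one_hot = torch.zeros(1, num_classes)
    one_hot[0, c] = 1.0
    glyphs.append(G(torch.cat([one_hot, style], dim=1)).view(64, 64))
print(len(glyphs), glyphs[0].shape)               # a style-consistent alphabet
```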
... To guard against these issues, generative modeling approaches have been adopted in current practice. Many generative modeling approaches (e.g., Pix2Pix [87], CycleGAN [12], Least Squares CGAN [91], GlyphGAN [32], MCMS GAN [51]) have been used for character generation in languages such as Chinese, Bangla, and others. ...
Article
Full-text available
GANs play an important role in creating and generating new data from previously available content. GAN models achieve impressive results in image and video generation tasks. These models use convolutional neural networks for the generator and the discriminator. GAN models are progressively improving by adding more recent deep learning approaches. The GAN model has been implemented for both supervised and unsupervised learning in various applications such as image inpainting, image blending, video generation, and music generation. During the implementation of the GAN model for these applications, some issues arise during the training of discriminators, such as mode collapse and gradient penalties. This manuscript contains a detailed survey of GAN models, presented with varied classifications along with the challenges involved in GAN models. GANs are classified by the domains in which they are used, i.e., image, video, and audio. In addition, we describe several applications where the GAN model is used. This manuscript also presents the performance of various GAN models, with qualitative and quantitative evaluation metrics, to aid understanding of how they work.
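One widely used remedy for such discriminator training instabilities is the WGAN-GP gradient penalty; the sketch below shows the generic technique (not a method from the surveyed paper), with shapes chosen for illustration:

```python
# WGAN-GP gradient penalty: penalize the critic's gradient norm on points
# interpolated between real and fake samples, pushing it toward 1.
import torch

def gradient_penalty(critic, real, fake, lam: float = 10.0):
    eps = torch.rand(real.size(0), 1)                       # per-sample mix ratio
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads = torch.autograd.grad(critic(x_hat).sum(), x_hat,
                                create_graph=True)[0]
    return lam * ((grads.norm(2, dim=1) - 1) ** 2).mean()

critic = torch.nn.Linear(784, 1)                            # stand-in critic
gp = gradient_penalty(critic, torch.rand(8, 784), torch.rand(8, 784))
print(gp.item())
```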
... Yang et al. introduced a Generative Adversarial Network (GAN)-based approach to text style transfer [45]. Hayashi et al. proposed a GAN-based method to generate new fonts from seen fonts while maintaining consistency among the generated fonts and diversity from the seen fonts [12]. More recently, studies in font generation have focused on non-alphabetic languages such as Chinese, where the number of character classes is huge compared to alphabetic languages, so a font generator offers practical value in applications. ...
Preprint
Full-text available
Generating new fonts is time-consuming and labor-intensive, especially for a language with a huge number of characters such as Chinese. Various deep learning models have demonstrated the ability to efficiently generate new fonts from a few reference characters of the target style. This project aims to develop a few-shot cross-lingual font generator based on AGIS-Net and to improve the performance metrics mentioned. Our approaches include redesigning the encoder and the loss function. We validate our method on the multiple languages and datasets mentioned.
... To cope with these difficulties, fonts should be easier to search for and create. There has been active research on font retrieval [4,15,16,19,24] and on font style transfer and generation [1,11,40,42]. Font retrieval is a task that allows users to find similar-looking fonts. Users can browse the fonts in the latent space to find the font they want. ...
Preprint
Full-text available
Fonts can convey profound meanings of words in various forms of glyphs. Without typography knowledge, manually selecting an appropriate font or designing a new font is a tedious and painful task. To allow users to explore vast font styles and create new font styles, font retrieval and font style transfer methods have been proposed. These tasks increase the need for learning high-quality font representations. Therefore, we propose a novel font representation learning scheme to embed font styles into a latent space. To represent a font discriminatively from others, we propose a paired-glyph matching-based font representation learning model that attracts the representations of glyphs in the same font to one another, but pushes away those of other fonts. Through evaluations on font retrieval with query glyphs on new fonts, we show that our font representation learning scheme achieves better generalization performance than existing font representation learning techniques. Finally, on the downstream font style transfer and generation tasks, we confirm the benefits of transfer learning with the proposed method. The source code is available at https://github.com/junhocho/paired-glyph-matching.
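The attract/push-away objective can be sketched as a generic contrastive loss over font embeddings; this formulation is an assumption for illustration, not necessarily the paper's exact loss:

```python
# Paired-glyph matching as a contrastive loss: embeddings of glyphs from the
# same font (matching rows of emb_a and emb_b) attract, all others repel.
import torch
import torch.nn.functional as F

def paired_glyph_loss(emb_a, emb_b, temperature: float = 0.1):
    """emb_a[i] and emb_b[i] are glyphs of the same font; other rows differ."""
    emb_a, emb_b = F.normalize(emb_a, dim=1), F.normalize(emb_b, dim=1)
    logits = emb_a @ emb_b.t() / temperature        # pairwise font similarities
    targets = torch.arange(emb_a.size(0))           # matching pairs on the diagonal
    return F.cross_entropy(logits, targets)

loss = paired_glyph_loss(torch.randn(16, 128), torch.randn(16, 128))
print(loss.item())
```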