Figure 7: Structure of the multi-font character recognition CNN used in the legibility evaluation.
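The figure itself is not reproduced here. As a rough, hypothetical sketch of what such a multi-font character recognition CNN could look like (the layer sizes, input resolution, and 26-class output are all assumptions, not the paper's exact configuration):

```python
# Hypothetical sketch of a multi-font character recognition CNN used for
# legibility evaluation; architecture details are assumptions.
import torch
import torch.nn as nn

class CharRecognitionCNN(nn.Module):
    def __init__(self, num_classes: int = 26):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 64x64 -> 32x32
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 32x32 -> 16x16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, 256), nn.ReLU(),
            nn.Linear(256, num_classes),          # one logit per character class
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Legibility of generated glyphs can be scored as this CNN's recognition accuracy.
model = CharRecognitionCNN()
logits = model(torch.randn(8, 1, 64, 64))        # 8 grayscale 64x64 glyphs
print(logits.argmax(dim=1))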

Source publication
Article
Full-text available
In this paper, we propose GlyphGAN: style-consistent font generation based on generative adversarial networks (GANs). GANs are a framework for learning a generative model using a system of two neural networks competing with each other. One network generates synthetic images from random input vectors, and the other discriminates between synthetic an...
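As a minimal illustration of the two-network competition the abstract describes, here is a generic GAN training sketch; the architectures, sizes, and data below are placeholder assumptions, not GlyphGAN's actual design:

```python
# Minimal GAN skeleton: a generator maps random vectors to images and a
# discriminator scores real vs. synthetic; all sizes are illustrative.
import torch
import torch.nn as nn

latent_dim, img_dim = 100, 64 * 64

G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, img_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1))

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.rand(16, img_dim)                   # stand-in for real glyph images
z = torch.randn(16, latent_dim)

# Discriminator step: push real images toward 1 and synthetic ones toward 0.
fake = G(z).detach()
loss_d = bce(D(real), torch.ones(16, 1)) + bce(D(fake), torch.zeros(16, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: try to fool the discriminator into outputting 1 on fakes.
loss_g = bce(D(G(z)), torch.ones(16, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```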

Similar publications

Article
Full-text available
In recent years, the convolutional neural network (CNN) has become one of the most popular deep learning frameworks and has been widely used in hyperspectral image classification tasks. Convolution (Conv) in a CNN uses filter weights to extract features in a local receptive field, and the weight parameters are shared globally, which focuses more on the high‐fr...
Article
Full-text available
Filter banks transferred from a pre-trained deep convolutional network exhibit significant performance in heightening the inter-class separability for hyperspectral image feature extraction, but weakening the intra-class consistency simultaneously. In this paper, we propose a new superpixel-based relational auto-encoder for cohesive spectral–spatia...
Article
Full-text available
Recently, single image super-resolution (SISR) has made great progress due to the rapid development of deep convolutional neural networks (CNNs), and the application of Generative Adversarial Networks (GANs) has made super-resolution networks even more effective. However, GAN-based methods have many problems such as lengthy and unstable convergence....
Preprint
Full-text available
Gradient-based attribution methods can aid in the understanding of convolutional neural networks (CNNs). However, the redundancy of attribution features and the gradient saturation problem, which weaken the ability to identify significant features and cause an explanation focus shift, are challenges that attribution methods still face. In this work...
Article
Full-text available
Recently, convolutional neural networks have shown good performance in many counterfeit detection tasks. However, accurate counterfeit detection is still challenging due to the following three issues: (1) fine-grained classification, (2) class imbalance, and (3) high imitation samples. To address these issues, we propose a hybrid attention network...

Citations

... In addition, FETGAN [24] employs adaptive instance-normalized font style migration for few-shot learning, converting existing fonts to a new style from only a few samples of that style while keeping the text content unchanged. GlyphGAN [25] utilizes the GAN framework to create new fonts with a consistent style across all 26 letters of the English alphabet. In other words, this method can generate an infinite variety of fonts based on existing fonts. ...
Article
Deep learning techniques are used to transform the style of images and produce diverse images. In the field of text style transformation, many previous studies attempted to generate stylized text using deep learning networks. However, to achieve multiple style transformations for text images, the methods proposed in previous studies must either learn multiple networks or cannot be guided by style images. Thus, in this study we focus on the multi-style transformation of text images, using style images to guide the generation of results. We propose a multiple-style transformation network for text style transfer, which we refer to as the Multi-Style Shape Matching GAN (Multi-Style SMGAN). The proposed method generates multiple styles of text images with a single model trained only once, and allows users to control the text style according to style images. The method introduces conditions into the network such that all styles can be distinguished effectively within the network, and the generation of each styled text can be controlled according to these conditions. The network is optimized such that the conditional information is transmitted effectively throughout the network. The proposed method was evaluated experimentally on a large number of text images, and the results show that the trained model can generate multiple-style text in real time according to the style image. In addition, the results of a user survey indicate that the proposed method produces higher-quality results than existing methods.
... One variation of the ANN is the deep learning algorithm. Some basic and widely used deep learning networks are the Convolutional Neural Network, the Recurrent Neural Network, Long Short-Term Memory, Generative Adversarial Networks (GANs), and Autoencoders; their applications are [29][30][31][32][33], respectively. ...
Article
Full-text available
Optical Character Recognition of handwritten documents has been a research topic for the last few decades. Different types of classification schemes, from template matching and structural analysis to deep neural networks, have been proposed by researchers. In this paper, a novel holistic approach is proposed for the recognition of handwritten words. The approach is a hybrid model combining CNN and BLSTM layers, which are responsible for extracting spatial and temporal features, respectively, from the word images. The two feature sets are combined by compact bilinear pooling (CBP). The CBP layer highlights fine-grained details, which in turn helps to achieve high recognition accuracy. The extracted features are recognized by a connectionist temporal classification layer. The weights are learned from the database using the backpropagation algorithm. The hybrid model is trained on three public databases, CMATERdb2.1.2, IIIT-HW-Dev, and IIIT-HW-Telugu, containing Bengali, Devanagari, and Telugu handwritten words, respectively. The proposed model achieved 96.42%, 94.79%, and 95.07% accuracy on the Bengali, Devanagari, and Telugu databases, respectively.
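As a hedged sketch of such a hybrid CNN + BLSTM + CTC pipeline (omitting the compact bilinear pooling step, with layer sizes chosen purely for illustration, not the cited model's exact configuration):

```python
# Hypothetical CNN + BLSTM + CTC word recognizer: the CNN extracts spatial
# features, the BLSTM models the width axis as a sequence, and a CTC-style
# head emits per-step character log-probabilities.
import torch
import torch.nn as nn

class CNNBLSTMWordRecognizer(nn.Module):
    def __init__(self, num_chars: int):
        super().__init__()
        self.cnn = nn.Sequential(                  # spatial features
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.blstm = nn.LSTM(128 * 8, 256,         # temporal features
                             bidirectional=True, batch_first=True)
        self.head = nn.Linear(512, num_chars + 1)  # +1 for the CTC blank label

    def forward(self, x):                          # x: (B, 1, 32, W)
        f = self.cnn(x)                            # (B, 128, 8, W/4)
        b, c, h, w = f.shape
        seq = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # width as time axis
        out, _ = self.blstm(seq)
        return self.head(out).log_softmax(-1)      # feed to nn.CTCLoss

model = CNNBLSTMWordRecognizer(num_chars=60)
log_probs = model(torch.randn(4, 1, 32, 128))      # 4 word images of width 128
print(log_probs.shape)                             # (4, 32, 61)
```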
... The multi-scale discriminator is an improved approach that captures image information more comprehensively by introducing discriminators at different scales. In image generation tasks, discriminators can be trained simultaneously on the original image and on reduced-size versions of it [28][29][30]. Such a multi-scale discriminator can capture features at different levels, from local details to global structure, improving the authenticity and diversity of generated samples. ...
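The multi-scale idea can be sketched as follows; the discriminator architecture and the number of scales are assumptions for illustration:

```python
# Sketch of a multi-scale discriminator: the same image is scored at the
# full resolution and at progressively downsampled resolutions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_disc():
    return nn.Sequential(
        nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(128, 1, 4, stride=1, padding=1),   # patch-level real/fake scores
    )

class MultiScaleDiscriminator(nn.Module):
    def __init__(self, num_scales: int = 3):
        super().__init__()
        self.discs = nn.ModuleList(make_disc() for _ in range(num_scales))

    def forward(self, x):
        scores = []
        for d in self.discs:
            scores.append(d(x))                      # score the current scale
            x = F.avg_pool2d(x, 2)                   # halve resolution for the next
        return scores                                # fine-to-coarse feedback

disc = MultiScaleDiscriminator()
outs = disc(torch.randn(2, 3, 128, 128))
print([o.shape for o in outs])
```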
Article
Full-text available
This paper explores the application possibilities and optimization of Generative Adversarial Networks (GANs) in spatial computing, aiming to improve design efficiency and creativity and to achieve a more intelligent design process. A method for icon generation is proposed, and a basic architecture for icon generation is constructed. By introducing the concept of interactive design and the characteristics of requirement conditions, a system with generation and optimization capabilities is built to meet various requirements in spatial design. By integrating multi-feature recognition modules into the discriminator and optimizing the structure of the conditional features, the generated icons effectively maintain diversity and innovation while satisfying the conditional features. The experiments use publicly available icon datasets, LLD-Icon and Icons-50. The icon shapes generated by the proposed model are more distinct, and the colors of colored icons can be controlled more finely. Comparing Inception Score (IS) values across models shows that the IS value of the proposed model is 7.05, higher than that of other GAN models. The multi-feature icon generation model based on Auxiliary Classifier GANs performs well in presenting multiple feature representations of icons. After the multi-feature recognition modules are introduced into the network model, the peak error of the recognition network in the initial stage is only 2.000, whereas the initial error of an ordinary GAN without these modules is as high as 5.000. This indicates that the improved model helps the discriminative network recognize the core information of icon images more quickly. The results provide a reference for achieving more efficient and innovative interactive space design.
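A minimal sketch of an AC-GAN-style discriminator with an auxiliary recognition head, in the spirit of the multi-feature recognition module described above (the layer sizes and feature count are assumptions, not the paper's configuration):

```python
# AC-GAN-style discriminator: one head scores real vs. fake, an auxiliary
# head classifies which conditioned feature/class the icon carries.
import torch
import torch.nn as nn

class ACDiscriminator(nn.Module):
    def __init__(self, num_features: int = 50):      # e.g. 50 icon categories
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.adv_head = nn.Linear(128, 1)            # real vs. fake
        self.cls_head = nn.Linear(128, num_features) # feature recognition

    def forward(self, x):
        h = self.backbone(x)
        return self.adv_head(h), self.cls_head(h)

d = ACDiscriminator()
adv, cls = d(torch.randn(4, 3, 32, 32))
# Train with an adversarial loss on `adv` plus cross-entropy on `cls`, so the
# discriminator also learns to recognize the conditioned icon features.
print(adv.shape, cls.shape)
```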
... (excerpt from a task/model table; all rows concern text output)
- Translate text from one language to another: Text-to-Text Transfer Transformer (T5) [39]; Convolutional Sequence to Sequence Learning (ConvS2S) [132]; Sequence to Sequence (Seq2Seq) [108]
- Generate handwritten characters in a target/new font style using text: GlyphGAN [154]
- Generate accurate and meaningful corrections for code issues: TFix [45]
- Explain the given input statements: WT5 (Why, T5?) [121]
- Perform tasks like translation, question answering, classification, and summarization using input texts: Text-To-Text Transformer (T5) [39]
- Generate or crack passwords: PassGAN [155]
- Chat with users, answer follow-up questions, challenge incorrect premises, and reject inappropriate requests: InstructGPT (GPT-3) [111]
- Operate as a conversational AI system to chat with users and answer follow-up questions: Language Models for Dialog Applications (LaMDA) [33] ...
... The underlying process of text-to-text generation encompasses both transformer-based architectures and models that harness the power of generative adversarial networks (GANs). For example, models like PassGAN [155] employ GANs to generate or crack passwords, and GlyphGAN [154] utilizes GANs to create handwritten characters in different font styles based on text input. This fusion of transformer-based approaches and GAN-based models expands the horizons of text-to-text generation, unlocking exciting possibilities in natural language processing. ...
... Recognition Accuracy: Recognition accuracy assesses the ability of the generated handwritten characters to be recognized correctly by optical character recognition (OCR) systems or other recognition models. It measures how well models like GlyphGAN [154] generate characters that resemble the original symbols and can be accurately identified. ...
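The metric itself is straightforward to sketch; the recognizer and data below are stand-ins for illustration, not the models used in the cited work:

```python
# Recognition accuracy for generated glyphs: feed them to a pretrained
# recognizer and compare predictions against the intended character classes.
import torch

def recognition_accuracy(recognizer, generated, intended_labels):
    """Fraction of generated characters the recognizer identifies correctly."""
    with torch.no_grad():
        preds = recognizer(generated).argmax(dim=1)
    return (preds == intended_labels).float().mean().item()

# Stand-in recognizer and data; in practice `recognizer` would be a
# pretrained OCR / character-classification model.
recognizer = torch.nn.Linear(64 * 64, 26)
generated = torch.randn(32, 64 * 64)           # 32 generated glyphs (flattened)
labels = torch.randint(0, 26, (32,))           # intended character classes
print(recognition_accuracy(recognizer, generated, labels))
```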
Article
Full-text available
Generative artificial intelligence (AI) has emerged as a powerful technology with numerous applications in various domains. There is a need to identify the requirements and evaluation metrics for generative AI models designed for specific tasks. The research aims to investigate the fundamental aspects of generative AI systems, including their requirements, models, input–output formats, and evaluation metrics. The study addresses key research questions and presents comprehensive insights to guide researchers, developers, and practitioners in the field. Firstly, the requirements necessary for implementing generative AI systems are examined and categorized into three distinct categories: hardware, software, and user experience. Furthermore, the study explores the different types of generative AI models described in the literature by presenting a taxonomy based on architectural characteristics, such as variational autoencoders (VAEs), generative adversarial networks (GANs), diffusion models, transformers, language models, normalizing flow models, and hybrid models. A comprehensive classification of input and output formats used in generative AI systems is also provided. Moreover, the research proposes a classification system based on output types and discusses commonly used evaluation metrics in generative AI. The findings contribute to advancements in the field, enabling researchers, developers, and practitioners to effectively implement and evaluate generative AI models for various applications. The significance of the research lies in understanding that generative AI system requirements are crucial for effective planning, design, and optimal performance. A taxonomy of models aids in selecting suitable options and driving advancements. Classifying input–output formats enables leveraging diverse formats for customized systems, while evaluation metrics establish standardized methods to assess model quality and performance.
... Nevertheless, this paper gave us insight into how neural networks can be incorporated in a project such as ours. [5] Alex Graves et al. show how Long Short-Term Memory recurrent neural networks can be used to generate complex sequences with long-range structure, simply by predicting one data point at a time. ...
Article
Full-text available
Personalized font generation is an emerging technology that aims to create unique and customized fonts based on an individual’s preferences and characteristics. This process involves designing a font from scratch, including selecting the appropriate style, weight, and size. Personalized features such as letter shapes, ligatures, and kerning are then implemented to create a font that is unique to the individual. Personalized font generation has the potential to revolutionize the way we think about typography. By creating fonts that are tailored to an individual’s preferences, we can enhance the user experience and create more engaging and personalized content. This technology is particularly relevant in today’s digital age, where the use of typography is increasingly important in everything from social media posts to marketing materials. In summary, personalized font generation is a promising new technology that offers a unique and tailored typography experience. By combining traditional font design techniques with advanced technologies such as machine learning and artificial intelligence, we can create fonts that are truly one-of-a-kind and help to elevate our communication efforts.
... However, there is no large "real" ambigram dataset with sufficient style variations. Therefore, conventional font generation methodologies by GANs [1,5,27] cannot be used for ambigrams. Furthermore, computers will suffer from the above two difficulties, like human experts. ...
... For generating letter images (or font images), various image generation models, especially GANs [19,9,2,11,14], have been used. The GAN-based models can also generate letter images with conditions, such as class labels [16,5,12,10] or texts [28,18]. Recently, diffusion models have achieved high-quality photographic image generation [20,21,23]. ...
Preprint
Ambigrams are graphical letter designs that can be read not only from the original direction but also from a rotated direction (especially by 180 degrees). Designing ambigrams is difficult even for human experts because maintaining dual readability from both directions is hard. This paper proposes an ambigram generation model. As its generation module, we use a diffusion model, which has recently been used to generate high-quality photographic images. By specifying a pair of letter classes, such as 'A' and 'B', the proposed model generates various ambigram images which can be read as 'A' from the original direction and as 'B' from a direction rotated 180 degrees. Quantitative and qualitative analyses of experimental results show that the proposed model can generate high-quality and diverse ambigrams. In addition, we define ambigramability, an objective measure of how easy it is to generate ambigrams for each letter pair. For example, the pair of 'A' and 'V' shows high ambigramability (that is, it is easy to generate their ambigrams), and the pair of 'D' and 'K' shows lower ambigramability. The ambigramability measure gives various hints for ambigram generation, not only for computers but also for human experts. The code can be found at https://github.com/univ-esuty/ambifusion.
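The dual-readability requirement can be sketched as a simple joint score under a character classifier; the classifier below is a stand-in, and this is not the paper's actual generation or evaluation procedure:

```python
# Dual readability: an ambigram should be classified as one letter upright
# and as another letter after a 180-degree rotation.
import torch

def dual_readability(classifier, img, cls_a: int, cls_b: int) -> float:
    """Joint probability that `img` reads as cls_a upright and cls_b rotated."""
    p_up = classifier(img).softmax(-1)[0, cls_a]
    rotated = torch.rot90(img, k=2, dims=(-2, -1))   # rotate by 180 degrees
    p_rot = classifier(rotated).softmax(-1)[0, cls_b]
    return (p_up * p_rot).item()

# Stand-in 26-class letter classifier on 64x64 glyph images.
classifier = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(64 * 64, 26))
img = torch.randn(1, 1, 64, 64)
print(dual_readability(classifier, img, cls_a=0, cls_b=21))  # 'A' vs. 'V'
```

A high score for a pair like ('A', 'V') and a low one for ('D', 'K') would mirror the ambigramability ranking the abstract reports.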
... Benefiting from the development of image generation techniques, mainstream font synthesis methods [2,12,24,41,42] can generate pixelated glyph images. Despite the promising quality, glyph images incur aliasing artifacts on edges when discretely sampled, and thus are not suitable for high-quality rendering or printing at arbitrary resolutions. ...
... Benefiting from the development of image generation [10,14,16], both black-and-white [12,15,37,40] and artistic glyph image generation [2,23,41,42] have been well explored over the past decade. MC-GAN [2] synthesized ornamented glyphs for capital letters in an end-to-end manner from a small subset of the same style. ...
Preprint
Full-text available
Automatic generation of fonts can be an important aid to typeface design. Many current approaches regard glyphs as pixelated images, which present artifacts when scaling and inevitable quality losses after vectorization. On the other hand, existing vector font synthesis methods either fail to represent the shape concisely or require vector supervision during training. To push the quality of vector font synthesis to the next level, we propose a novel dual-part representation for vector glyphs, where each glyph is modeled as a collection of closed "positive" and "negative" path pairs. The glyph contour is then obtained by boolean operations on these paths. We first learn such a representation only from glyph images and devise a subsequent contour refinement step to align the contour with an image representation to further enhance details. Our method, named DualVector, outperforms state-of-the-art methods in vector font synthesis both quantitatively and qualitatively. Our synthesized vector fonts can be easily converted to common digital font formats like TrueType Font for practical use. The code is released at https://github.com/thuliu-yt16/dualvector.
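The boolean composition of "positive" and "negative" closed paths can be illustrated with ordinary polygon operations; the shapely-based example below is a conceptual sketch, not the paper's implementation:

```python
# Dual-part idea: a glyph contour as the boolean difference between the
# union of "positive" paths and the union of "negative" paths.
from shapely.geometry import Point
from shapely.ops import unary_union

# An 'O'-like glyph: a filled disc (positive) minus an inner disc (negative).
positive = [Point(0, 0).buffer(1.0)]
negative = [Point(0, 0).buffer(0.6)]

glyph = unary_union(positive).difference(unary_union(negative))
print(glyph.area)  # area of the resulting ring-shaped contour
```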
... GlyphGAN [32] is a GAN-based font generation model that accepts two input vectors: a character class vector carrying character class information and a style vector carrying style information. The GlyphGAN model generates diverse font characters while keeping a consistent design across all of them. ...
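A minimal sketch of this two-vector input scheme, where one style vector is held fixed while the character-class one-hot varies (all sizes are assumptions, not GlyphGAN's actual architecture):

```python
# Conditional generator input: one-hot character class + style vector.
import torch
import torch.nn as nn

num_classes, style_dim = 26, 100
G = nn.Sequential(nn.Linear(num_classes + style_dim, 256), nn.ReLU(),
                  nn.Linear(256, 64 * 64), nn.Tanh())

style = torch.randn(1, style_dim)                 # one style, held fixed
glyphs = []
for c in range(num_classes):                      # same style for all 26 letters
    one_hot = torch.zeros(1, num_classes)
    one_hot[0, c] = 1.0
    glyphs.append(G(torch.cat([one_hot, style], dim=1)).view(64, 64))
print(len(glyphs), glyphs[0].shape)               # a style-consistent alphabet
```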
... To guard against these issues, generative modeling approaches have been adopted in current practice. Many generative modeling approaches (e.g., Pix2Pix [87], CycleGAN [12], Least Squares CGAN [91], GlyphGAN [32], MCMS GAN [51]) have been used for character generation in languages such as Chinese, Bangla, and others. ...
Article
Full-text available
GANs play an important role in creating and generating new data from previously available content. GAN models achieve impressive results in image and video generation tasks. These models use convolutional neural networks for the generator and the discriminator. GAN models are progressively improving by adding more recent deep learning approaches. The GAN model has been implemented for both supervised and unsupervised learning in various applications such as image inpainting, image blending, video generation, and music generation. During the implementation of the GAN model for these applications, some issues arise during the training of discriminators, such as mode collapse and gradient penalties. This manuscript contains a detailed survey of GAN models, presented with varied classifications along with the challenges involved in GAN models. GANs are classified by the domains in which they are used, i.e., image, video, and audio. In addition, we describe several applications where the GAN model is used. This manuscript also presents the performance of various GAN models, with qualitative and quantitative evaluation metrics, to aid understanding of how they work.
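One widely used remedy for such discriminator training instabilities is the WGAN-GP gradient penalty; the sketch below shows the generic technique (not a method from the surveyed paper), with shapes chosen for illustration:

```python
# WGAN-GP gradient penalty: penalize the critic's gradient norm on points
# interpolated between real and fake samples, pushing it toward 1.
import torch

def gradient_penalty(critic, real, fake, lam: float = 10.0):
    eps = torch.rand(real.size(0), 1)                       # per-sample mix ratio
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads = torch.autograd.grad(critic(x_hat).sum(), x_hat,
                                create_graph=True)[0]
    return lam * ((grads.norm(2, dim=1) - 1) ** 2).mean()

critic = torch.nn.Linear(784, 1)                            # stand-in critic
gp = gradient_penalty(critic, torch.rand(8, 784), torch.rand(8, 784))
print(gp.item())
```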
... Yang et al. introduced a Generative Adversarial Network (GAN)-based approach to text style transfer [45]. Hayashi et al. proposed a GAN-based method to generate new fonts from seen fonts while maintaining consistency among the generated fonts and diversity from the seen fonts [12]. More recently, studies in font generation have focused on non-alphabetic languages such as Chinese, where the number of character classes is huge compared to alphabetic languages, so a font generator offers practical value in applications. ...
Preprint
Full-text available
Generating new fonts is time-consuming and labor-intensive, especially for a language with a huge number of characters such as Chinese. Various deep learning models have demonstrated the ability to efficiently generate new fonts from a few reference characters of the target style. This project aims to develop a few-shot cross-lingual font generator based on AGIS-Net and to improve the performance metrics mentioned. Our approaches include redesigning the encoder and the loss function. We validate our method on the multiple languages and datasets mentioned.
... To cope with these difficulties, fonts should be easier to search for and create. There has been active research on font retrieval [4,15,16,19,24] and on font style transfer and generation [1,11,40,42]. Font retrieval is a task that allows users to find similar-looking fonts. Users can browse the fonts in the latent space to find the font they want. ...
Preprint
Full-text available
Fonts can convey profound meanings of words in various forms of glyphs. Without typography knowledge, manually selecting an appropriate font or designing a new font is a tedious and painful task. To allow users to explore vast font styles and create new font styles, font retrieval and font style transfer methods have been proposed. These tasks increase the need for learning high-quality font representations. Therefore, we propose a novel font representation learning scheme to embed font styles into a latent space. To represent a font discriminatively from others, we propose a paired-glyph matching-based font representation learning model that attracts the representations of glyphs in the same font to one another, but pushes away those of other fonts. Through evaluations on font retrieval with query glyphs on new fonts, we show that our font representation learning scheme achieves better generalization performance than existing font representation learning techniques. Finally, on the downstream font style transfer and generation tasks, we confirm the benefits of transfer learning with the proposed method. The source code is available at https://github.com/junhocho/paired-glyph-matching.
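The attract/push-away objective can be sketched as a generic contrastive loss over font embeddings; this formulation is an assumption for illustration, not necessarily the paper's exact loss:

```python
# Paired-glyph matching as a contrastive loss: embeddings of glyphs from the
# same font (matching rows of emb_a and emb_b) attract, all others repel.
import torch
import torch.nn.functional as F

def paired_glyph_loss(emb_a, emb_b, temperature: float = 0.1):
    """emb_a[i] and emb_b[i] are glyphs of the same font; other rows differ."""
    emb_a, emb_b = F.normalize(emb_a, dim=1), F.normalize(emb_b, dim=1)
    logits = emb_a @ emb_b.t() / temperature        # pairwise font similarities
    targets = torch.arange(emb_a.size(0))           # matching pairs on the diagonal
    return F.cross_entropy(logits, targets)

loss = paired_glyph_loss(torch.randn(16, 128), torch.randn(16, 128))
print(loss.item())
```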