Fig 5 - uploaded by Costin-Anton Boiangiu
The difference between a normal letter (a) and a bold letter (b) regarding the ratio between the number of outline pixels and the total number of pixels.
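The ratio in the caption can be sketched directly: count the foreground pixels of a binarized glyph, count those that touch the background (the outline), and compare. The function name, the 4-neighbour outline test, and the toy glyphs below are assumptions of this illustration, not the paper's implementation:

```python
def outline_ratio(glyph):
    """Ratio of outline pixels to total foreground pixels.

    `glyph` is a list of rows of 0/1 values; 1 = ink. An ink pixel counts
    as an outline pixel if any 4-neighbour is background (or off-image).
    """
    h, w = len(glyph), len(glyph[0])
    total = outline = 0
    for y in range(h):
        for x in range(w):
            if glyph[y][x] != 1:
                continue
            total += 1
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or glyph[ny][nx] == 0:
                    outline += 1
                    break
    return outline / total if total else 0.0

# A 1-px stroke is all outline; a thicker stroke has interior pixels,
# so bold glyphs yield a lower outline-to-total ratio.
thin = [[1, 1, 1, 1, 1]]                 # 1-px-tall horizontal stroke
thick = [[1] * 5, [1] * 5, [1] * 5]      # 3-px-tall stroke
assert outline_ratio(thin) == 1.0
assert outline_ratio(thick) < 1.0
```

Thin (normal-weight) strokes have almost every ink pixel on the outline, while thickened (bold) strokes accumulate interior pixels, which is why the ratio separates the two cases in the figure.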


Source publication
Conference Paper
Full-text available
This paper describes an approach towards obtaining a normalized measure of text resemblance in scanned images, relying on the detection of standard character features and using a sequence of procedures and algorithms on input images, for automatic content conversion purposes. The approach relies solely on the geometrical characteristics of the characters, ignoring information regarding context or character recognition.

Similar publications

Conference Paper
Full-text available
Self-similarity based super-resolution (SR) algorithms are able to produce visually pleasing results without extensive training on external databases. Such algorithms exploit the statistical prior that patches in a natural image tend to recur within and across scales of the same image. However, the internal dictionary obtained from the given image...
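The patch-recurrence prior this abstract describes can be illustrated with a brute-force sketch: build a coarser scale of the image and look up a patch's best match by sum of squared differences. The `downscale` and `best_match` helpers below are assumptions of this illustration (a real SR algorithm would use an approximate nearest-neighbour search over an internal dictionary spanning many scales), not the paper's method:

```python
def downscale(img):
    """Halve resolution by averaging 2x2 blocks (a coarser scale)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2*y][2*x] + img[2*y][2*x+1] +
              img[2*y+1][2*x] + img[2*y+1][2*x+1]) / 4.0
             for x in range(w)] for y in range(h)]

def best_match(patch, img):
    """Position and SSD of the best match of `patch` in `img` --
    a brute-force stand-in for an internal-dictionary lookup."""
    ph, pw = len(patch), len(patch[0])
    best, best_pos = float("inf"), None
    for y in range(len(img) - ph + 1):
        for x in range(len(img[0]) - pw + 1):
            ssd = sum((img[y+dy][x+dx] - patch[dy][dx]) ** 2
                      for dy in range(ph) for dx in range(pw))
            if ssd < best:
                best, best_pos = ssd, (y, x)
    return best_pos, best

# A 2x2 block of 5s recurs at (1, 3) in an otherwise blank image,
# so the query patch finds an exact (zero-SSD) internal match there.
img = [[0] * 6 for _ in range(4)]
for y in (1, 2):
    for x in (3, 4):
        img[y][x] = 5
pos, ssd = best_match([[5, 5], [5, 5]], img)
assert pos == (1, 3) and ssd == 0
```

When such exact or near-exact matches recur across scales, the coarse-scale match and its fine-scale surroundings supply the high-frequency detail that self-similarity SR pastes into the upscaled result.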

Citations

... Boiangiu et al. [6] proposed another method for the classification of text based on geometrical characteristics. The method was tested on Roman-script newspapers. ...
... The components may be represented by their bounding boxes, by the pixels themselves, or by Voronoi diagrams, Delaunay triangulation or 3D meshes [70][71][72]. PRIMITIVE LEVEL PROCESSING: The aim of primitive level processing is to identify the nature of the elements that have become accessible following the preprocessing steps. Whereas for text documents the focus is the eventual recognition of characters, for non-textual documents the aim is recognizing basic geometric entities. ...
... Grouping the different elements is a matter of observing the distance conventions that are usually respected when writing text, and of finding and interpreting white spaces, lines [75] and any form of separator [76]. Methods that accurately detect font characteristics give valuable information [70]. Authors evaluate these distances relative to the text size [77], as the blank columns separating two text columns need to be wide enough to unambiguously split two lines that share the same vertical coordinates [42]. ...
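The excerpt's point — that a blank gap only separates columns when it is wide relative to the text size — can be sketched with a simple vertical projection profile. The helper below is a hypothetical illustration, not taken from any of the cited papers; `min_gap` stands in for a threshold derived from the estimated character width:

```python
def column_gaps(page, min_gap):
    """Return (start, width) for each run of ink-free pixel columns at
    least `min_gap` wide. `page` is a list of rows of 0/1 ink values;
    `min_gap` would typically be derived from the estimated text size."""
    ink_per_col = [sum(row[x] for row in page) for x in range(len(page[0]))]
    gaps, start = [], None
    for x, ink in enumerate(ink_per_col + [1]):   # sentinel closes last run
        if ink == 0 and start is None:
            start = x                             # blank run begins
        elif ink != 0 and start is not None:
            if x - start >= min_gap:              # wide enough to split?
                gaps.append((start, x - start))
            start = None
    return gaps

# Two text columns separated by a 3-px blank run: detected with
# min_gap=3, but rejected (treated as intra-column space) with min_gap=4.
page = [[1, 1, 0, 0, 0, 1, 1],
        [1, 1, 0, 0, 0, 1, 1]]
assert column_gaps(page, 3) == [(2, 3)]
assert column_gaps(page, 4) == []
```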
Article
Full-text available
This paper presents an overview of Document Image Analysis Systems, their composing modules, the approaches these modules use, as well as uses for these applications. One of the main goals is to present some of the most important technologies and methods behind the Document Image Analysis domain in order to evaluate the best approach when dealing with real-world documents. The other main goal is to ensure a foundation for those starting to build such complex software systems and to give an elaborate technical answer to the question: “How to make physical documents available to a large number of people?”
...
- The construction of independent analysis and image-processing modules (image digitization, image segmentation, skew detection and correction, layout analysis)
- Interfacing the various modules to generate a full processing pipeline (editing the interconnectivity graph, running multiple approaches for the same recognition problem and using voting to validate and improve the results)
- Editing of the results of the various intermediate steps in layout and hierarchy detection and OCR, thus allowing human operators to verify and correct (if necessary) the output of the different heuristically-based algorithms
- Import and export of document data

In order to effectively use the obtained results, the files will be generated in PDF format, structured in a multi-layer manner: the original image stays in the front layer (to ensure that the original of the document is preserved and seen by the user) and the meta-information in a background layer (to ensure the usability of the document in search, classification and text-retrieval operations). ...
...
- Input preparation - scan/import of documents
- Image preprocessing - skew detection and compensation, digitization and/or image segmentation (Srinivasan and Shobha, 2008)
- Layout analysis - clustering of the image segments resulting from the previous steps, according to the logical reading order of the document (Boiangiu, 2008)
- Page numbering - marker detection and renumbering, if necessary
- Hierarchy analysis - layout blocks are classified according to their importance and the logical table of contents for the document is constructed
- Optical text recognition - text areas are processed to extract the valid words/sentences and their formatting
- Output/export of the electronic document in various formats

Furthermore, the proposed service aims to combine the best of the research and industry worlds, assuring both the possibility to add components and user interfaces, specific to a commercial service that also considers usability, and a research-oriented open-source product, in a fully functional service. ...
Conference Paper
Full-text available
The main goal of this paper is to present the architecture and functionality of an e-Service-based platform. The project is structured along several dimensions that follow the development of complementary services, integrated to support everyday work experience, research and learning at the University POLITEHNICA of Bucharest (UPB). The platform support is represented by the Internet as a large-scale distributed environment. The current evolution of the Internet can be viewed from multiple perspectives: service-oriented (Internet of Services), user-centered (Internet of People), real-world integration over the Internet (Internet of Things), and the production and use of multimedia content over the Internet. The main services in the eUPB platform are: (i) data retrieval, aggregation and search service, (ii) communication service for heterogeneous networks, (iii) mobile services to support context-aware applications, (iv) secure data delivery infrastructure for wireless sensor networks, (v) 3DUPB - the 3D MMO virtual replica of UPB, (vi) analysis and content extraction of scanned documents, and (vii) collaboration service. This is a position paper presenting the general architecture of eUPB and a description of each service's design and functionality.
Article
Full-text available
This paper describes a collection of algorithms for detecting text areas in document images using morphological operators, for text clustering using geometrical text measurements, and for efficient image coding for fast remote correction in automatic content conversion systems. Text characteristics are automatically discovered and used to filter out all non-text areas in the image. All the algorithms were implemented and tested on a representative set of test images obtained by scanning newspapers, books and magazines. The document image page clustering uses a measure of normalized text font resemblance. The approach makes use solely of the geometrical characteristics of characters, ignoring information regarding context or character recognition.
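The morphological step this abstract mentions can be illustrated with a run-length-smoothing-style horizontal closing, which merges adjacent characters into text-line blobs that are easy to box and filter. This is a hedged sketch of the general technique, not the paper's actual operator; the function name and `gap` threshold are assumptions:

```python
def smear_row(row, gap):
    """Fill runs of at most `gap` background pixels between ink pixels,
    so neighbouring characters merge into one text-line blob -- a 1-D
    horizontal closing in the spirit of run-length smoothing (RLSA)."""
    out, last_ink = row[:], None
    for x, v in enumerate(row):
        if v == 1:
            if last_ink is not None and 0 < x - last_ink - 1 <= gap:
                for i in range(last_ink + 1, x):
                    out[i] = 1          # bridge the small gap
            last_ink = x
    return out

# Two 'characters' separated by a 2-px gap merge into one blob,
# while a 5-px gap (e.g. a word or column break) survives.
assert smear_row([1, 0, 0, 1, 0, 0, 0, 0, 0, 1], gap=3) == \
       [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
```

Applied row by row (and optionally column by column), such smearing turns scattered glyphs into connected regions whose size and shape can then be tested against the automatically discovered text characteristics.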
Article
Full-text available
Entity clustering is a vital feature of any automatic content conversion system. Such systems generate digital documents from hard copies of newspapers, books, etc. At application level, the system processes an image (usually in black and white color mode) and identifies the various content layout elements, such as paragraphs, tables, images, columns, etc. Here is where the entity clustering mechanism comes into play. Its role is to group atomic entities (characters, points, lines) into layout elements. To achieve this, the system takes on different approaches which rely on the geometrical properties of the enclosed items: their relative position, size, boundaries and alignment. This paper describes such an approach based on 3D mesh reduction.
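The geometric grouping this abstract describes — by relative position, size and alignment — can be sketched as single-link clustering of character bounding boxes (a simplified stand-in; the paper's own approach uses 3D mesh reduction). All names and thresholds below are assumptions of this illustration:

```python
def cluster_boxes(boxes, max_dist):
    """Greedy single-link clustering of bounding boxes (x0, y0, x1, y1):
    two boxes join the same group when their horizontal gap is at most
    `max_dist` and their vertical extents overlap."""
    def linked(a, b):
        h_gap = max(a[0], b[0]) - min(a[2], b[2])       # negative = overlap
        v_overlap = min(a[3], b[3]) - max(a[1], b[1])
        return h_gap <= max_dist and v_overlap > 0

    parent = list(range(len(boxes)))                    # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]               # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if linked(boxes[i], boxes[j]):
                parent[find(i)] = find(j)               # union the groups

    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())

# Three glyphs on one baseline: the first two sit 2 px apart (one word),
# the third is 20 px away and forms its own group.
boxes = [(0, 0, 5, 10), (7, 0, 12, 10), (32, 0, 37, 10)]
assert cluster_boxes(boxes, max_dist=3) == [[0, 1], [2]]
```

Running the same grouping with progressively larger thresholds yields the word, line and paragraph levels of the layout hierarchy.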
Article
Full-text available
In an image processing application there is often a need to identify and extract different morphological elements such as characters or lines. This paper studies one general method of identifying vertical or horizontal lines; however, the techniques described here can be used to detect other analytical objects such as circles or even ellipses. Line detection is an important add-on to an automatic content conversion system, which builds digital documents from scanned papers. After identifying lines, other layout elements can be extracted: columns, paragraphs, tables, headers. The present paper is a study of the Hough Transform, for which several new enhancements are introduced.
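The classical Hough Transform the paper builds on can be sketched in a few lines: every ink point (x, y) votes for all parameterized lines rho = x·cos(theta) + y·sin(theta) passing through it, and accumulator cells that collect many votes mark detected lines. This is the textbook transform, not the paper's enhancements; the discretization choices below are assumptions:

```python
import math

def hough_lines(points, n_theta=180, threshold=3):
    """Minimal Hough transform over (rho, theta) cells.

    Each point votes for every sampled angle; collinear points pile
    their votes into the same cell, which then crosses `threshold`.
    Returns a list of (rho, theta, votes) peaks."""
    acc = {}
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            acc[(rho, t)] = acc.get((rho, t), 0) + 1
    return [(rho, math.pi * t / n_theta, votes)
            for (rho, t), votes in acc.items() if votes >= threshold]

# Five collinear points on the vertical line x = 4 all vote into the
# cell (rho = 4, theta = 0), so that cell reaches the threshold.
pts = [(4, y) for y in range(5)]
peaks = hough_lines(pts, threshold=5)
assert any(rho == 4 and theta == 0.0 for rho, theta, _ in peaks)
```

For the horizontal/vertical lines the paper targets, the angle range can be restricted to a few samples around 0° and 90°, which makes the accumulator small and the voting fast.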
Article
Full-text available
This paper describes an approach towards obtaining a normalized measure of text resemblance in scanned images. The technique, aimed at automatic content conversion, relies on the detection of standard character features and uses a sequence of procedures and algorithms applied sequentially to the input document. The approach makes use solely of the geometrical characteristics of characters, ignoring information regarding context or character recognition.