A Method to Generate Synthetically Warped
Document Image
Arpan Garai1, Samit Biswas1, Sekhar Mandal1, and Bidyut B. Chaudhuri2,3
1Department of Computer Science and Technology, Indian Institute of Engineering
Science and Technology, Shibpur, Howrah, West Bengal, 711103, India
arpangarai@gmail.com, {samit,sekhar}@cs.iiests.ac.in
2Techno India University, Kolkata
3Computer Vision and Pattern Recognition Unit, Indian Statistical Institute,
Kolkata, India
bidyutbaranchaudhuri@gmail.com
Abstract. Digital-camera-captured document images are often warped and distorted due to different camera angles or document surfaces, and a robust technique is needed to rectify this kind of distortion. Research on document dewarping suffers from the limited availability of public benchmark datasets. In recent times, deep-learning-based approaches have been used to solve such problems accurately. Training most deep neural networks requires a large number of document images, and generating such a large volume of document images manually is difficult. In this paper, we propose a technique to generate synthetic warped images from a flat-bed scanned document image. It is done by calculating a warping factor for each pixel position using two warping position parameters (WPP) and eight warping control parameters (WCP). These parameters can be specified as needed depending upon the desired warping. The results are compared with similar real captured images both qualitatively and quantitatively.
Keywords: Synthetic image generation · Document image processing · Dewarping.
1 Introduction
People nowadays prefer to use digital gadgets like mobile-phone cameras for capturing documents. These images are often distorted due to camera angles and/or non-planar document surfaces. Various forms of distortion may arise in document images in these situations, and warping is one of them. The performance of OCR systems is not satisfactory when a highly warped document image is given as input, so a robust algorithm is needed to generate dewarped images from warped images. Recently, artificial neural networks like convolutional neural networks (CNN) [11] and GAN-based approaches [7] have been used for dewarping. To train such a network, a large number of warped images is required.
There are three publicly available datasets of warped document images: (i) the 'DFKI document image contest dataset' [19], (ii) the 'IUPR dataset' [3], and (iii) the 'dataset by Ke Ma' [11]. The 'DFKI document image contest dataset' consists of 102 warped images, and only ASCII text ground truth is available. There are 100 and 130 images in the 'IUPR dataset' and the 'Ke Ma dataset', respectively. Generally, such a number of images is not enough to train deep neural networks to solve the dewarping problem, and capturing a large number of images manually is a difficult task. Moreover, ground truth (images of the document taken on a flat surface) is needed for supervised training, and generating the ground truth for captured images is a non-trivial task. So, synthetic image generation is very necessary.
Some techniques have already been proposed to generate distortion in document images. Such methods can roughly be classified into three types [12]: (i) adding noise [10], (ii) degrading characters [13], and (iii) distorting the shape of document images [12]. Kieu et al. [12] proposed a mesh-based semi-synthetic method to generate geometric distortion in the image, where the mesh is generated using 'the Kron Aquilon laser 3D scanner'.
In this work, a technique is proposed to generate synthetic warped images from a flat-bed scanned document image. In this technique, a warping factor is specified for each pixel position of the image. The warping factors are estimated using cubic spline interpolation. The positions of the knot points are generated by warping position parameters (WPP), and the values of those knot points are calculated by warping control parameters (WCP). These parameters can be set as necessary to create various kinds of warping. The curvature of the document surface is modelled using the Cylindrical Surface Model [4]. The process is described in Section 2.
Various metrics have previously been used to measure the similarity between a ground truth image and a dewarped image [1], [14]; they are based on OCR accuracy, text-line features, etc. There is no evaluation method in which a pixel-wise comparison is made between the ground truth image and the dewarped image, because the resolution of the ground truth image differs from that of the real captured image. The proposed synthetic image generation can be applied to solve this problem and fill the gap. The proposed method takes a binary image as input and generates a warped image synthetically. Next, the warped image is dewarped using a dewarping technique. Finally, the dewarped image is compared with the input image using pixel-wise comparison measures such as F-Measure [17], pseudo F-Measure (Fps) [16], [17], PSNR [17], DRD [9], [17], Recall [17], Precision [17], etc.
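As an illustration, the following minimal Python/NumPy sketch computes a few of these pixel-wise measures (Precision, Recall, F-Measure and PSNR) for a pair of equally sized binary images; the function name and the foreground convention are our own assumptions, and DRD and pseudo F-Measure are omitted for brevity.

```python
import numpy as np

def pixelwise_scores(gt, dewarped):
    """Pixel-wise comparison of two binary images (foreground = 1).

    Both arrays must have the same shape, which is only possible when the
    ground truth and the dewarped image come from the same synthetic pipeline.
    Returns precision, recall, F-measure and PSNR.
    """
    gt = gt.astype(bool)
    dw = dewarped.astype(bool)

    tp = np.logical_and(gt, dw).sum()      # foreground correctly recovered
    fp = np.logical_and(~gt, dw).sum()     # spurious foreground
    fn = np.logical_and(gt, ~dw).sum()     # missed foreground

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall    = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)

    mse = np.mean((gt.astype(float) - dw.astype(float)) ** 2)
    psnr = float('inf') if mse == 0 else 10 * np.log10(1.0 / mse)
    return precision, recall, f_measure, psnr
```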
2 Proposed Method
A convolutional neural network (CNN) can be used to find the warping position parameters from a warped document image. However, a huge number of document images is required to train a CNN, and manually capturing such a large number of document images is difficult.
Fig. 1. Example of warped documents for different reasons (a) Distance difference, (b)
Foreshortening difference with camera at the middle, (c) Foreshortening difference with
camera at the bottom.
Hence, we generate the warped document images synthetically. Here, we developed a technique to generate several warped document images from a single scanned image. We use the cylindrical surface model (CSM) proposed in [4] to generate these images. According to the CSM, an image of a cylindrical surface gets warped due to two major reasons. The first is the variation of the distance from the camera to different regions of the surface (e.g., due to the curvature of the surface), as shown in Fig. 1(a, c). The other is the foreshortening difference that occurs when the surface is not perpendicular to the optical axis, as shown in Fig. 1(b). Here, we assume that the surface does not suffer from foreshortening differences.
Fig. 2. Model for warping generation.
Now, consider Fig. 2. Let abcd be a document having a flat surface. Its projection on the image plane is denoted by ABCD. Let p(x, y, z) be a point on the surface abcd. The z-value of all points on abcd is constant and is denoted by D(0). The projection of the point p on the image plane is represented by P(X, Y). The values of X and Y can be obtained using the equations $X = \frac{f}{D(0)}\,x$ and $Y = \frac{f}{D(0)}\,y$ [4], where f is the focal distance.
Consider another document (a'b'c'd') having a curved surface, whose projection on the image plane is represented by A'B'C'D'. Let p'(x', y', z') be a point on the surface a'b'c'd', and let its projection on the image plane be denoted by P'(X', Y'). Note that the z-value of a point in a'b'c'd' is a function of the x-coordinate and is denoted by D(x). We can obtain X' and Y' using the equations $X' = \frac{f}{D(x)}\,x'$ and $Y' = \frac{f}{D(x)}\,y'$.
Consider two points $p_1(x_1, y_1, D(0))$ and $p'_1(x_1, y_1, D(x))$ on the flat surface and the curved surface, respectively. The x- and y-values of these points are the same, and the difference of the corresponding y-values in the image plane is obtained as:

$$(Y'_1 - Y_1) = \frac{f y_1}{D(x)} - \frac{f y_1}{D(0)} = f y_1\,\frac{D(0) - D(x)}{D(x)\,D(0)} = f y_1\,\frac{D(0) - D(x)}{D(0)\,[(D(x) - D(0)) + D(0)]}.$$
While capturing a page of a book or a document on a lamp-post, notice-board, etc., the value of (D(x) − D(0)) [4] generally lies between 1 cm and 3 cm, whereas D(0) ranges from 50 cm to 100 cm. Hence, the value of [D(x) − D(0)] can be neglected in comparison with D(0), and
$$(Y'_1 - Y_1) \approx f y_1\,\frac{D(0) - D(x)}{D(0)^2} = f y_1\,\frac{d}{D(0)^2} = \frac{f y_1}{D(0)} \cdot \frac{d}{D(0)} = Y_1\,\frac{d}{D(0)} \qquad (1)$$
where D(0) − D(x) = d is the variation of the z-coordinates in the object plane. So, it is clear from Eq. 1 that the translation of the point P'(X', Y') in the y-direction depends on d. We can generate different warped images using different values of d. Here, d is referred to as the warping factor, and its value changes from pixel to pixel.
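A small numerical illustration of Eq. 1 (not part of the proposed pipeline; the variable names are hypothetical) is:

```python
def y_shift(Y, d, D0):
    """Vertical displacement predicted by Eq. (1).

    Y  : y-coordinate of the projected point in the image plane,
         measured from the principal point (pixels).
    d  : warping factor D(0) - D(x) for that point (same unit as D0).
    D0 : distance of the flat reference surface from the camera.
    """
    return Y * d / D0

# e.g. a point 400 px above the image centre on a page region that is
# 2 cm closer to the camera than a flat reference page held at 60 cm:
print(y_shift(400, 2.0, 60.0))   # ~13.3 px shift
```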
In our experiment, we distribute the warping factor over the image plane considering the surface of a book. The warping factor is distributed row-wise: the value of d at each cell within a row is determined using a smooth cubic spline interpolation function. Five knot points are used for the interpolation. The position of each knot point is determined by two warping position parameters (P1 and P2), and the amount of warping at each knot point is controlled by another eight parameters (P3, ..., P10). The aforesaid parameters are described in detail in the next section.
2.1 Warping Position Parameters
The parameters P1and P2are used to select the position of the document
warping. The value of P1is multiplied by the width of the first row of the image
and take its nearest integer to get the position of the cell (say Z1) where there
will be no distortion due to warping. Similarly, P2determines the position of
the column (Z2) in the bottom row where distortion will not take place due to
warping. The parameters P1and P2take any one of the of following real numbers
(0.1,0.2,0.3,...,0.9). Using P1and P2we select the position of the third knot
point. The position of second knot points are along the nearest point the middle
of straight line Z1Z2(blue line at Fig. 3) and left boundary (green line at
Fig. 3). The position of forth knot points are along the nearest point the middle
of straight line of Z1Z2and right boundary (yellow line at Fig. 3). Taking
different values of P1and P2, we can change the position of knot points and
hence, we can generate different warped documents from a single document.
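A minimal sketch of this knot placement, assuming a NumPy-based implementation with our own function and variable names, could look like:

```python
import numpy as np

def knot_columns(height, width, P1, P2):
    """Column positions of the five knots in every row (a sketch of Sec. 2.1).

    P1, P2 in {0.1, ..., 0.9} fix the undistorted column Z1 in the top row
    and Z2 in the bottom row; the exact rounding used here is an assumption.
    """
    Z1 = int(round(P1 * (width - 1)))      # undistorted column, top row
    Z2 = int(round(P2 * (width - 1)))      # undistorted column, bottom row

    cols = np.zeros((height, 5), dtype=int)
    for i in range(height):
        t = i / (height - 1)
        z = (1 - t) * Z1 + t * Z2          # 3rd knot: on the line Z1-Z2
        cols[i] = [0,                      # 1st knot: left boundary
                   int(round(z / 2)),      # 2nd knot: midway to the left edge
                   int(round(z)),
                   int(round((z + width - 1) / 2)),  # 4th knot: midway to the right edge
                   width - 1]              # 5th knot: right boundary
    return cols
```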
Fig. 3. Positions of the 2nd (green), 3rd (blue) and 4th (yellow) knot points
2.2 Warping Control Parameters
A smoothing cubic spline interpolation function is used to warp the document, with five knots used for this interpolation function in each row of the image. The values of the leftmost and rightmost knots of the first row of the image are denoted by P3 and P6, respectively. Similarly, P7 and P10 represent the values of the leftmost and rightmost knots of the last row of the image. Let D be the length of a diagonal of the image under test; then each of these parameters takes the value k × D. In our experiment, k varies within the range 0.04-0.06 with a step size of 0.01.
Similarly, P4 and P5 specify the values of the 2nd and 4th knot points of the top-most row, and P8 and P9 specify the values of the 2nd and 4th knots of the bottom-most row, respectively. In our experiment, we have set P4 = 0.5 × P3, P5 = 0.5 × P6, P8 = 0.5 × P7 and P9 = 0.5 × P10.
Let the values of the 1st, 2nd, 4th and 5th knots of the i-th row be $G^i_1$, $G^i_2$, $G^i_4$ and $G^i_5$, respectively. They are determined using the following equations, where R is the total number of rows present in the image:

$$G^i_1 = \frac{P_7 - P_3}{R - 1}\,(i - 1) + P_3, \qquad G^i_2 = \frac{P_8 - P_4}{R - 1}\,(i - 1) + P_4,$$
$$G^i_4 = \frac{P_9 - P_5}{R - 1}\,(i - 1) + P_5, \qquad G^i_5 = \frac{P_{10} - P_6}{R - 1}\,(i - 1) + P_6.$$

The third knot of the interpolation function lies on the line passing through the points Z1 and Z2 and hence its value is zero.
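A minimal sketch of this linear interpolation of knot values, again with illustrative names and 0-based row indexing, is:

```python
import numpy as np

def knot_values(R, P3, P4, P5, P6, P7, P8, P9, P10):
    """Values of the five knots for each of the R rows (a sketch of Sec. 2.2).

    Each row linearly interpolates between the top-row parameters
    (P3, P4, P5, P6) and the bottom-row parameters (P7, P8, P9, P10);
    the third knot is always zero.
    """
    i = np.arange(R).reshape(-1, 1)          # row index, 0 .. R-1
    top = np.array([P3, P4, 0.0, P5, P6])    # knot values of the first row
    bot = np.array([P7, P8, 0.0, P9, P10])   # knot values of the last row
    return top + (bot - top) * i / (R - 1)   # shape (R, 5)
```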
2.3 Calculating the warping factors
Using the values of the five knots at each row, we interpolate the values at the other points of the row. We have applied smoothing spline regression to interpolate the values. Examples of the warping factors for the top-most and bottom-most rows are shown in Fig. 4(a) and 4(b), respectively.

Fig. 4. (a) Warping factors for each pixel of the top-most row; (b) warping factors for each pixel of the bottom-most row; (c) three-dimensional plot of the warping factors.

At each location of the image we calculate a warping factor, and the factors are stored in a 2-D array F. A 3-D representation of F is shown in Fig. 4(c).
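A possible sketch of this per-row spline interpolation, using SciPy's smoothing spline and assuming the knot positions and values computed as in the previous sketches, is:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def warping_factor_map(knot_cols, knot_vals, width):
    """Build the 2-D array F of warping factors (a sketch of Sec. 2.3).

    knot_cols, knot_vals : (R, 5) arrays from the two previous sketches.
    A cubic spline through the five knots of every row gives the warping
    factor of each pixel in that row.
    """
    R = knot_cols.shape[0]
    F = np.zeros((R, width))
    x = np.arange(width)
    for i in range(R):
        # k=3 -> cubic; s=0 interpolates the knots exactly, a small positive
        # s would give the smoothing behaviour mentioned in the text
        spline = UnivariateSpline(knot_cols[i], knot_vals[i], k=3, s=0.0)
        F[i] = spline(x)
    return F
```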
Finally, each pixel is translated using its respective warping factor. Nearest-neighbour interpolation is used during the translation. An example of a scanned image and its corresponding synthetic warped image is shown in Fig. 5. Here, in Fig. 4 and 5, the values of P1 and P2 are 0.2 and 0.1, respectively. The diagonal of the image is D = 2521 pixels, and the values of the warping control parameters are: P3 = 0.045 × D = 113, P6 = 0.04 × D = 101, P7 = 0.06 × D = 151, P10 = 0.055 × D = 139, P4 = 113/2 = 56.5, P5 = 101/2 = 50.5, P8 = 151/2 = 75.5 and P9 = 139/2 = 69.5.
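A minimal sketch of this final translation step, assuming the warping factor F[i, j] is the vertical shift (in rows) applied to pixel (i, j) and using nearest-neighbour assignment, is:

```python
import numpy as np

def warp_image(flat, F, background=1):
    """Translate every pixel of a binary scanned page by its warping factor.

    flat : (R, W) binary image from the flat-bed scanner (e.g. 1 = white).
    F    : (R, W) warping factors; F[i, j] is the vertical shift (in rows)
           applied at pixel (i, j).  Nearest-neighbour assignment is used,
           mirroring the nearest-neighbour interpolation mentioned above.
    """
    R, W = flat.shape
    pad = int(np.ceil(np.abs(F).max()))    # room for the largest shift
    warped = np.full((R + 2 * pad, W), background, dtype=flat.dtype)
    rows = np.arange(R).reshape(-1, 1)
    target = np.rint(rows + F).astype(int) + pad
    cols = np.broadcast_to(np.arange(W), (R, W))
    warped[target, cols] = flat
    return warped
```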
Table 1. Values of WPP used to generate different types of warping

Type of warping    P1               P2
Type-I             0.1, 0.2, 0.3    0.1, 0.2, 0.3
Type-II            0.7, 0.8, 0.9    0.7, 0.8, 0.9
Type-III & IV      0.5              0.5
3 Experimental Results and Evaluation
The method is implemented and tested on a PC (Intel(R) Core(TM) i7-6700 3.4 GHz CPU, running Ubuntu 16.04). No specialised hardware is needed to generate the warping factors of each pixel. We have used an 'HP LaserJet M1005 MFP Multi-function Printer' to scan the documents. In our experiment, we have generated images with resolutions of 300 dpi and 600 dpi. Here, four different types of warping are considered: Type-I: a book page with the undistorted part at the left (as shown in Fig. 6(a, b)); Type-II: a book page with the undistorted part at the right (as shown in Fig. 6(c, d)); Type-III: a document pasted on a lamp-post (as shown in Fig. 6(e, f)); Type-IV: a document attached only at the top-middle of a notice board (as shown in Fig. 6(g, h)).
Fig. 5. (a) Scanned image, (b) synthetic warped image.
Fig. 6. Visual comparison. Book page with the undistorted part at the left: (a) pre-processed real captured warped image, (b) synthetic warped image; book page with the undistorted part at the right: (c) pre-processed real captured warped image, (d) synthetic warped image; document pasted on a cylindrical lamp-post: (e) pre-processed real captured warped image, (f) synthetic warped image; document hanging from a notice-board: (g) pre-processed real captured warped image, (h) synthetic warped image.
Table 2. Values of WCP used to generate different types of warping

Type of warping      P3, P6                        P7, P10
Type-I & II & III    (0.04 × D), ..., (0.06 × D)   (0.04 × D), ..., (0.06 × D)
Type-IV              (0.04 × D), ..., (0.06 × D)   (0.04 × D), ..., (0.06 × D)

The step size for P3, P6, P7, P10 is (0.005 × D), where D is the length of the diagonal of the image.
Here, 'undistorted part at the left' means that the perpendicular drawn from the optical centre to the document surface hits the surface to the left of the middle of the document. The parameter values used for creating the different types of images are shown in Tables 1 and 2.
To visualize the performance of the proposed synthetic warped image generation technique, we captured some images with a mobile camera; these images exhibit different varieties of warping. The images are binarized and the border noise is removed. Many techniques have recently been proposed to binarize a document image [17], [15], [21] and to remove border noise from document images [5], [18], [20], [6], [2]. Here we have used the recent, simple and robust binarization technique proposed by Su et al. [21]. Most of the border noise removal approaches work well only for document images with a flat surface, but the method proposed by Bukhari et al. [2] is specifically designed for camera-captured warped document images; so we have used this technique to remove the border noise. The same documents are scanned using a flat-bed scanner and binarized. These scanned images are used as inputs to the proposed warped image generation method, which attempts to generate warped images that look like the camera-captured ones. Fig. 6 shows a set of pre-processed camera-captured images and the corresponding synthetic images generated by the proposed method. It is evident from Fig. 6 that the synthetic warped images are almost similar to their corresponding camera-captured images.
Fig. 7. Calculation of curvature
To measure the performance of the proposed method, we calculate the curvatures of the 'headline' of text lines present in the camera-captured warped image and the corresponding synthetic warped image. Here, the curvature is calculated using three points: the two end points of the 'headline' and the point on the 'headline' where the slope changes its sign. Let Cr and Cs be the curvatures of the 'matra/headline' of text lines present in the real image and the synthetic image, respectively. An example of a 'headline' for a particular text line is shown in Fig. 7; the 'headline' is obtained using the method proposed in [8]. For each image (real and synthetic), we considered four text lines to evaluate the performance of the proposed method: two text lines from the top and two from the bottom of the image. The length of each considered text line is greater than 80% of the longest text line present in the image.
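Since the exact curvature formula is not spelled out above, the sketch below uses the curvature of the circle passing through the three chosen points as one reasonable reading; the function name is illustrative.

```python
import numpy as np

def curvature_three_points(p1, p2, p3):
    """Curvature (1/R) of the circle through three points of a 'headline'.

    p1, p3 : the two end points of the headline; p2 : the point where the
    slope changes sign.  Returns 0 for (nearly) collinear points.
    """
    p1, p2, p3 = map(np.asarray, (p1, p2, p3))
    a = np.linalg.norm(p2 - p3)
    b = np.linalg.norm(p1 - p3)
    c = np.linalg.norm(p1 - p2)
    # twice the triangle area via the 2-D cross product
    area2 = abs((p2[0] - p1[0]) * (p3[1] - p1[1])
                - (p2[1] - p1[1]) * (p3[0] - p1[0]))
    if area2 < 1e-9:
        return 0.0
    return 2.0 * area2 / (a * b * c)       # kappa = 4 * Area / (a * b * c)
```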
The root mean squared error (RMSE), $R_c$, is calculated using the formula

$$R_c = \sqrt{\frac{\sum_{i=1}^{N} [C_r(i) - C_s(i)]^2}{N}},$$

where N is the number of text lines under consideration. We have considered 10 images from each of the four types of warping (40 images in total). The average of all the values of $R_c$ is $3.78 \times 10^{-5}$.
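A short sketch of this computation (illustrative names only) is:

```python
import numpy as np

def curvature_rmse(C_r, C_s):
    """RMSE between real and synthetic text-line curvatures.

    C_r, C_s : equal-length sequences of curvatures of the N text lines
    considered in a pair of real and synthetic images.
    """
    C_r, C_s = np.asarray(C_r, float), np.asarray(C_s, float)
    return np.sqrt(np.mean((C_r - C_s) ** 2))
```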
4 Conclusion
The approach of generating synthetic images proposed here not only helps in training neural networks but also helps to measure the performance of dewarping approaches. The proposed method can be used to generate different types of warped images. The dewarping of non-text parts of a document, such as figures or tables, can also be evaluated along with the text part using the proposed technique.
Bibliography
[1] Document dewarping via text-line based optimization. Pattern Recognition
48(11), 3600 – 3614 (2015)
[2] Bukhari, S.S., Shafait, F., Breuel, T.M.: Border noise removal of camera-
captured document images using page frame detection. In: Iwamura, M.,
Shafait, F. (eds.) Camera-Based Document Analysis and Recognition. pp.
126–137. Springer Berlin Heidelberg, Berlin, Heidelberg (2012)
[3] Bukhari, S.S., Shafait, F., Breuel, T.M.: The IUPR Dataset of Camera-
Captured Document Images, pp. 164–171. Springer Berlin Heidelberg,
Berlin, Heidelberg (2012)
[4] Cao, H., Ding, X., Liu, C.: A cylindrical surface model to rec-
tify the bound document image. In: Proceedings Ninth IEEE Interna-
tional Conference on Computer Vision. pp. 228–233 vol.1 (Oct 2003).
https://doi.org/10.1109/ICCV.2003.1238346
[5] Dey, S., Mitra, B., Mukhopadhyay, J., Sural, S.: A comparative study
of margin noise removal algorithms on marnr: A margin noise dataset of
document images. In: 2017 14th IAPR International Conference on Docu-
ment Analysis and Recognition (ICDAR). vol. 04, pp. 35–39 (Nov 2017).
https://doi.org/10.1109/ICDAR.2017.310
[6] Dey, S., Mukhopadhyay, J., Sural, S., Bhowmick, P.: Margin noise removal
from printed document images. In: Proceeding of the Workshop on Docu-
ment Analysis and Recognition. pp. 86–93. DAR ’12, ACM, New York, NY,
USA (2012). https://doi.org/10.1145/2432553.2432570
[7] Dosovitskiy, A., Brox, T.: Generating images with perceptual similarity met-
rics based on deep networks. CoRR abs/1602.02644 (2016)
[8] Garai, A., Biswas, S., Mandal, S., Chaudhuri, B.B.: Automatic dewarping
of camera captured born-digital bangla document images. In: 2017 Ninth
International Conference on Advances in Pattern Recognition (ICAPR).
pp. 1–6 (Dec 2017). https://doi.org/10.1109/ICAPR.2017.8593157
[9] H.Lu, Kot, A.C., Shi, Y.Q.: Distance-reciprocal distortion measure for bi-
nary document images. IEEE Signal Processing Letters 11(2), 228–231 (Feb
2004). https://doi.org/10.1109/LSP.2003.821748
[10] Jian Zhai, Liu Wenyin, Dori, D., Qing Li: A line drawings degradation
model for performance characterization. In: Seventh International Confer-
ence on Document Analysis and Recognition, 2003. Proceedings. pp. 1020–
1024 (Aug 2003). https://doi.org/10.1109/ICDAR.2003.1227813
[11] Ke, M., Zhixin, S., Bai, X., Jue, W., Dimitris, S.: Docunet: Document im-
age unwarping via a stacked u-net. In: Proceedings of IEEE Conference on
Computer Vision and Pattern Recognition (2018)
[12] Kieu, V.C., Journet, N., Visani, M., Mullot, R., Domenger, J.P.:
Semi-synthetic document image generation using texture mapping on
scanned 3d document shapes. In: 2013 12th International Conference
on Document Analysis and Recognition. pp. 489–493 (Aug 2013).
https://doi.org/10.1109/ICDAR.2013.104
[13] Kieu, V.C., Visani, M., Journet, N., Mullot, R., Domenger, J.P.: An efficient
parametrization of character degradation model for semi-synthetic image
generation. In: Proceedings of the 2Nd International Workshop on Historical
Document Imaging and Processing. pp. 29–35. HIP ’13 (2013)
[14] Kil, T., Seo, W., Koo, H.I., Cho, N.I.: Robust document image dewarping
method using text-lines and line segments. In: 2017 14th IAPR International
Conference on Document Analysis and Recognition (ICDAR). vol. 01, pp.
865–870 (Nov 2017). https://doi.org/10.1109/ICDAR.2017.146
[15] Meng, G., Yuan, K., Wu, Y., Xiang, S., Pan, C.: Deep networks
for degraded document image binarization through pyramid reconstruc-
tion. In: 2017 14th IAPR International Conference on Document Anal-
ysis and Recognition (ICDAR). vol. 01, pp. 727–732 (Nov 2017).
https://doi.org/10.1109/ICDAR.2017.124
[16] Ntirogiannis, K., Gatos, B., Pratikakis, I.: Performance evalua-
tion methodology for historical document image binarization. IEEE
Transactions on Image Processing 22(2), 595–609 (Feb 2013).
https://doi.org/10.1109/TIP.2012.2219550
[17] Pratikakis, I., Zagoris, K., Barlas, G., Gatos, B.: Icdar2017 competition
on document image binarization (dibco 2017). In: 2017 14th IAPR Interna-
tional Conference on Document Analysis and Recognition (ICDAR). vol. 01,
pp. 1395–1403 (Nov 2017). https://doi.org/10.1109/ICDAR.2017.228
[18] Shafait, F., Breuel, T.M.: A simple and effective approach for border noise
removal from document images. In: 2009 IEEE 13th International Multi-
topic Conference. pp. 1–5 (2009)
[19] Shafait, F.: Document image dewarping contest. In: 2nd Int. Workshop
on Camera-Based Document Analysis and Recognition. pp. 181–188 (2007)
[20] Shafait, F., van Beusekom, J., Keysers, D., Breuel, T.M.: Document cleanup
using page frame detection. International Journal of Document Analysis and
Recognition (IJDAR) 11(2), 81–96 (Nov 2008)
[21] Su, B., Lu, S., Tan, C.L.: Robust document image binarization technique for
degraded document images. IEEE Transactions on Image Processing 22(4),
1408–1417 (April 2013). https://doi.org/10.1109/TIP.2012.2231089