A Method to Generate Synthetically Warped
Document Image
Arpan Garai1, Samit Biswas1, Sekhar Mandal1, and Bidyut B. Chaudhuri2,3
1Department of Computer Science and Technology, Indian Institute of Engineering
Science and Technology, Shibpur, Howrah, West Bengal, 711103, India
arpangarai@gmail.com, {samit,sekhar}@cs.iiests.ac.in
2Techno India University, Kolkata
3Computer Vision and Pattern Recognition Unit, Indian Statistical Institute,
Kolkata, India
bidyutbaranchaudhuri@gmail.com
Abstract. Digital-camera-captured document images are often warped and distorted due to different camera angles or document surfaces, and a robust technique is needed to rectify this kind of distortion. Research on document dewarping suffers from the limited availability of public benchmark datasets. In recent times, deep-learning-based approaches have been used to solve such problems accurately. Training most deep neural networks requires a large number of document images, and generating such a large volume of document images manually is difficult. In this paper, we propose a technique to generate synthetic warped images from a flat-bed scanned document image. It is done by calculating a warping factor for each pixel position using two warping position parameters (WPP) and eight warping control parameters (WCP). These parameters can be specified as needed depending upon the desired warping. The results are compared with similar real captured images both qualitatively and quantitatively.
Keywords: Synthetic image generation · Document image processing · Dewarping.
1 Introduction
People nowadays prefer to use digital gadgets like mobile-phone cameras for capturing documents. These images are often distorted due to camera angles and/or non-planar document surfaces. Various forms of distortion may arise in document images in these situations, and warping is one of them. The performance of OCR systems is not satisfactory when a highly warped document image is given as input, so a robust algorithm is needed to generate dewarped images from warped images. Recently, artificial neural networks like convolutional neural networks (CNN) [11] and GAN-based approaches [7] have been used for dewarping. To train such a network, a large number of warped images is required.
There are three publicly available datasets of warped document images: (i) the 'DFKI document image contest dataset' [19], (ii) the 'IUPR dataset' [3], and (iii) the 'dataset by Ke Ma' [11]. The 'DFKI document image contest dataset' consists of 102 warped images, and only ASCII text ground truth is available. There are 100 and 130 images in the 'IUPR dataset' and the 'Ke Ma dataset', respectively. Generally, such a number of images is not enough to train deep neural networks to solve the dewarping problem, and capturing a large number of images manually is a difficult task. Moreover, ground truth (images of the document taken on a flat surface) is needed for supervised training, and generating the ground truth for captured images is a non-trivial task. So, synthetic image generation is very necessary.
Some techniques have already been proposed to generate distortion in document images. Such methods can roughly be classified into three types [12]: (i) adding noise [10], (ii) degrading characters [13], and (iii) distorting the shape of document images [12]. Kieu et al. [12] proposed a mesh-based semi-synthetic method to generate geometric distortion in the image, where the mesh is generated using 'the Kron Aquilon laser 3D scanner'.
In this work, a technique is proposed to generate synthetic warped images from a flat-bed scanned document image. In this technique, a warping factor is specified for each pixel position of the image. The warping factors are estimated using cubic spline interpolation. The positions of the knot points are generated by warping position parameters (WPP), and the values of those knot points are calculated by warping control parameters (WCP). These parameters can be set as necessary to create various kinds of warping. The curvature of the document surface is modelled using the Cylindrical Surface Model [4]. The process is described in Section 2.
Various metrics have previously been used to measure the similarity between a ground truth image and a dewarped image [1], [14]; they are based on OCR accuracy, text-line features, etc. There is no evaluation method in which a pixel-wise comparison is made between the ground truth image and the dewarped image, because the resolution of the ground truth image differs from that of the real captured image. The proposed synthetic image generation can be applied to solve this problem and fill the gap. The proposed method takes a binary image as input and generates a warped image synthetically. Next, the warped image is dewarped using a dewarping technique. Finally, the dewarped image is compared with the input image using pixel-wise comparison measures such as F-Measure [17], pseudo F-Measure (Fps) [16], [17], PSNR [17], DRD [9], [17], Recall [17], Precision [17], etc.
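As an illustration, the following minimal Python/NumPy sketch computes a few of these pixel-wise measures (Precision, Recall, F-Measure and PSNR) for a pair of equally sized binary images; the function name and the foreground convention are our own assumptions, and DRD and pseudo F-Measure are omitted for brevity.

```python
import numpy as np

def pixelwise_scores(gt, dewarped):
    """Pixel-wise comparison of two binary images (foreground = 1).

    Both arrays must have the same shape, which is only possible when the
    ground truth and the dewarped image come from the same synthetic pipeline.
    Returns precision, recall, F-measure and PSNR.
    """
    gt = gt.astype(bool)
    dw = dewarped.astype(bool)

    tp = np.logical_and(gt, dw).sum()      # foreground correctly recovered
    fp = np.logical_and(~gt, dw).sum()     # spurious foreground
    fn = np.logical_and(gt, ~dw).sum()     # missed foreground

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall    = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)

    mse = np.mean((gt.astype(float) - dw.astype(float)) ** 2)
    psnr = float('inf') if mse == 0 else 10 * np.log10(1.0 / mse)
    return precision, recall, f_measure, psnr
```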
2 Proposed Method
A convolutional neural network (CNN) can be used to find the warping position parameters from a warped document image. However, a huge number of document images is required to train a CNN, and manually capturing such a large number of document images is difficult.
Fig. 1. Example of warped documents for different reasons (a) Distance difference, (b)
Foreshortening difference with camera at the middle, (c) Foreshortening difference with
camera at the bottom.
Hence, we generate the warped document images synthetically. Here, we developed a technique to generate several warped document images from a single scanned image. We use the cylindrical surface model (CSM) proposed in [4] to generate these images. According to the CSM, an image of a cylindrical surface gets warped due to two major reasons. The first is the variation of the distance from the camera to different regions of the surface (e.g., due to the curvature of the surface), as shown in Fig. 1(a, c). The other is the foreshortening difference that occurs when the surface is not perpendicular to the optical axis, as shown in Fig. 1(b). Here, we assume that the surface does not suffer from foreshortening differences.
Fig. 2. Model for warping generation.
Now, consider Fig. 2. Let abcd be a document having a flat surface. Its projection on the image plane is denoted by ABCD. Let p(x, y, z) be a point on the surface abcd. The z-value of all points on abcd is constant and is denoted by D(0). The projection of the point p on the image plane is represented by P(X, Y). The values of X and Y can be obtained using the equations $X = \frac{f}{D(0)}\,x$ and $Y = \frac{f}{D(0)}\,y$ [4], where f is the focal distance.
Consider another document (a'b'c'd') having a curved surface, whose projection on the image plane is represented by A'B'C'D'. Let p'(x', y', z') be a point on the surface a'b'c'd', and let its projection on the image plane be denoted by P'(X', Y'). Note that the z-value of a point in a'b'c'd' is a function of the x-coordinate and is denoted by D(x). We can obtain X' and Y' using the equations $X' = \frac{f}{D(x)}\,x'$ and $Y' = \frac{f}{D(x)}\,y'$.
Consider two points $p_1(x_1, y_1, D(0))$ and $p'_1(x_1, y_1, D(x))$ on the flat surface and the curved surface, respectively. The x- and y-values of these points are the same, and the difference of the corresponding y-values in the image plane is obtained as:

$$(Y'_1 - Y_1) = \frac{f y_1}{D(x)} - \frac{f y_1}{D(0)} = f y_1\,\frac{D(0) - D(x)}{D(x)\,D(0)} = f y_1\,\frac{D(0) - D(x)}{D(0)\,[(D(x) - D(0)) + D(0)]}.$$
While capturing a page of a book or a document on a lamp-post, notice-board, etc., the value of (D(x) − D(0)) [4] generally lies between 1 cm and 3 cm, whereas D(0) ranges from 50 cm to 100 cm. Hence, the value of [D(x) − D(0)] can be neglected in comparison with D(0), and
$$(Y'_1 - Y_1) \approx f y_1\,\frac{D(0) - D(x)}{D(0)^2} = f y_1\,\frac{d}{D(0)^2} = \frac{f y_1}{D(0)} \cdot \frac{d}{D(0)} = Y_1\,\frac{d}{D(0)} \qquad (1)$$
where D(0) − D(x) = d is the variation of the z-coordinates in the object plane. So, it is clear from Eq. 1 that the translation of the point P'(X', Y') in the y-direction depends on d. We can generate different warped images using different values of d. Here, d is referred to as the warping factor, and its value changes from pixel to pixel.
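A small numerical illustration of Eq. 1 (not part of the proposed pipeline; the variable names are hypothetical) is:

```python
def y_shift(Y, d, D0):
    """Vertical displacement predicted by Eq. (1).

    Y  : y-coordinate of the projected point in the image plane,
         measured from the principal point (pixels).
    d  : warping factor D(0) - D(x) for that point (same unit as D0).
    D0 : distance of the flat reference surface from the camera.
    """
    return Y * d / D0

# e.g. a point 400 px above the image centre on a page region that is
# 2 cm closer to the camera than a flat reference page held at 60 cm:
print(y_shift(400, 2.0, 60.0))   # ~13.3 px shift
```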
In our experiment, we distribute the warping factor over the image plane considering the surface of a book. The warping factor is distributed row-wise: the value of d at each cell within a row is determined using a smooth cubic spline interpolation function. Five knot points are used for the interpolation. The position of each knot point is determined by two warping position parameters (P1 and P2), and the amount of warping at each knot point is controlled by another eight parameters (P3, ..., P10). The aforesaid parameters are described in detail in the next section.
2.1 Warping Position Parameters
The parameters P1and P2are used to select the position of the document
warping. The value of P1is multiplied by the width of the first row of the image
and take its nearest integer to get the position of the cell (say Z1) where there
will be no distortion due to warping. Similarly, P2determines the position of
the column (Z2) in the bottom row where distortion will not take place due to
warping. The parameters P1and P2take any one of the of following real numbers
(0.1,0.2,0.3,...,0.9). Using P1and P2we select the position of the third knot
point. The position of second knot points are along the nearest point the middle
of straight line Z1Z2(blue line at Fig. 3) and left boundary (green line at
Fig. 3). The position of forth knot points are along the nearest point the middle
of straight line of Z1Z2and right boundary (yellow line at Fig. 3). Taking
different values of P1and P2, we can change the position of knot points and
hence, we can generate different warped documents from a single document.
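A minimal sketch of this knot placement, assuming a NumPy-based implementation with our own function and variable names, could look like:

```python
import numpy as np

def knot_columns(height, width, P1, P2):
    """Column positions of the five knots in every row (a sketch of Sec. 2.1).

    P1, P2 in {0.1, ..., 0.9} fix the undistorted column Z1 in the top row
    and Z2 in the bottom row; the exact rounding used here is an assumption.
    """
    Z1 = int(round(P1 * (width - 1)))      # undistorted column, top row
    Z2 = int(round(P2 * (width - 1)))      # undistorted column, bottom row

    cols = np.zeros((height, 5), dtype=int)
    for i in range(height):
        t = i / (height - 1)
        z = (1 - t) * Z1 + t * Z2          # 3rd knot: on the line Z1-Z2
        cols[i] = [0,                      # 1st knot: left boundary
                   int(round(z / 2)),      # 2nd knot: midway to the left edge
                   int(round(z)),
                   int(round((z + width - 1) / 2)),  # 4th knot: midway to the right edge
                   width - 1]              # 5th knot: right boundary
    return cols
```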
Fig. 3. Positions of the 2nd (green), 3rd (blue) and 4th (yellow) knot points
2.2 Warping Control Parameters
A smoothing cubic spline interpolation function is used to warp the document, with five knots used for this interpolation function in each row of the image. The values of the leftmost and rightmost knots of the first row of the image are denoted by P3 and P6, respectively. Similarly, P7 and P10 represent the values of the leftmost and rightmost knots of the last row of the image. Let D be the length of a diagonal of the image under test; then each of these parameters takes the value k × D. In our experiment, k varies within the range 0.04-0.06 with a step size of 0.01.
Similarly, P4 and P5 specify the values of the 2nd and 4th knot points of the top-most row, and P8 and P9 specify the values of the 2nd and 4th knots of the bottom-most row, respectively. In our experiment, we have set P4 = 0.5 × P3, P5 = 0.5 × P6, P8 = 0.5 × P7 and P9 = 0.5 × P10.
Let the values of the 1st, 2nd, 4th and 5th knots of the i-th row be $G^i_1$, $G^i_2$, $G^i_4$ and $G^i_5$, respectively. They are determined using the following equations, where R is the total number of rows present in the image:

$$G^i_1 = \frac{P_7 - P_3}{R - 1}\,(i - 1) + P_3, \qquad G^i_2 = \frac{P_8 - P_4}{R - 1}\,(i - 1) + P_4,$$
$$G^i_4 = \frac{P_9 - P_5}{R - 1}\,(i - 1) + P_5, \qquad G^i_5 = \frac{P_{10} - P_6}{R - 1}\,(i - 1) + P_6.$$

The third knot of the interpolation function lies on the line passing through the points Z1 and Z2 and hence its value is zero.
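A minimal sketch of this linear interpolation of knot values, again with illustrative names and 0-based row indexing, is:

```python
import numpy as np

def knot_values(R, P3, P4, P5, P6, P7, P8, P9, P10):
    """Values of the five knots for each of the R rows (a sketch of Sec. 2.2).

    Each row linearly interpolates between the top-row parameters
    (P3, P4, P5, P6) and the bottom-row parameters (P7, P8, P9, P10);
    the third knot is always zero.
    """
    i = np.arange(R).reshape(-1, 1)          # row index, 0 .. R-1
    top = np.array([P3, P4, 0.0, P5, P6])    # knot values of the first row
    bot = np.array([P7, P8, 0.0, P9, P10])   # knot values of the last row
    return top + (bot - top) * i / (R - 1)   # shape (R, 5)
```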
2.3 Calculating the warping factors
Using the values of the five knots at each row, we interpolate the values at the other points of the row. We have applied smoothing spline regression to interpolate the values. Examples of the warping factors for the top-most and bottom-most rows are shown in Fig. 4(a) and 4(b), respectively.

Fig. 4. (a) Warping factors for each pixel of the top-most row; (b) warping factors for each pixel of the bottom-most row; (c) three-dimensional plot of the warping factors.

At each location of the image we calculate a warping factor, and the factors are stored in a 2-D array F. A 3-D representation of F is shown in Fig. 4(c).
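A possible sketch of this per-row spline interpolation, using SciPy's smoothing spline and assuming the knot positions and values computed as in the previous sketches, is:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def warping_factor_map(knot_cols, knot_vals, width):
    """Build the 2-D array F of warping factors (a sketch of Sec. 2.3).

    knot_cols, knot_vals : (R, 5) arrays from the two previous sketches.
    A cubic spline through the five knots of every row gives the warping
    factor of each pixel in that row.
    """
    R = knot_cols.shape[0]
    F = np.zeros((R, width))
    x = np.arange(width)
    for i in range(R):
        # k=3 -> cubic; s=0 interpolates the knots exactly, a small positive
        # s would give the smoothing behaviour mentioned in the text
        spline = UnivariateSpline(knot_cols[i], knot_vals[i], k=3, s=0.0)
        F[i] = spline(x)
    return F
```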
Finally, each pixel is translated using its respective warping factor. Nearest-neighbour interpolation is used during the translation. An example of a scanned image and its corresponding synthetic warped image is shown in Fig. 5. Here, in Fig. 4 and 5, the values of P1 and P2 are 0.2 and 0.1, respectively. The diagonal of the image is D = 2521 pixels, and the values of the warping control parameters are: P3 = 0.045 × D = 113, P6 = 0.04 × D = 101, P7 = 0.06 × D = 151, P10 = 0.055 × D = 139, P4 = 113/2 = 56.5, P5 = 101/2 = 50.5, P8 = 151/2 = 75.5 and P9 = 139/2 = 69.5.
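A minimal sketch of this final translation step, assuming the warping factor F[i, j] is the vertical shift (in rows) applied to pixel (i, j) and using nearest-neighbour assignment, is:

```python
import numpy as np

def warp_image(flat, F, background=1):
    """Translate every pixel of a binary scanned page by its warping factor.

    flat : (R, W) binary image from the flat-bed scanner (e.g. 1 = white).
    F    : (R, W) warping factors; F[i, j] is the vertical shift (in rows)
           applied at pixel (i, j).  Nearest-neighbour assignment is used,
           mirroring the nearest-neighbour interpolation mentioned above.
    """
    R, W = flat.shape
    pad = int(np.ceil(np.abs(F).max()))    # room for the largest shift
    warped = np.full((R + 2 * pad, W), background, dtype=flat.dtype)
    rows = np.arange(R).reshape(-1, 1)
    target = np.rint(rows + F).astype(int) + pad
    cols = np.broadcast_to(np.arange(W), (R, W))
    warped[target, cols] = flat
    return warped
```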
Table 1. Values of WPP used to generate different types of warping

Type of warping    P1               P2
Type-I             0.1, 0.2, 0.3    0.1, 0.2, 0.3
Type-II            0.7, 0.8, 0.9    0.7, 0.8, 0.9
Type-III & IV      0.5              0.5
3 Experimental Results and Evaluation
The method is implemented and tested on a PC (Intel(R) Core(TM) i7-6700 3.4 GHz CPU, running Ubuntu 16.04). No specialised hardware is needed to generate the warping factors of each pixel. We have used an 'HP LaserJet M1005 MFP Multi-function Printer' to scan the documents. In our experiment, we have generated images with resolutions of 300 dpi and 600 dpi. Here, four different types of warping are considered: Type-I: a book page with the undistorted part at the left (as shown in Fig. 6(a, b)); Type-II: a book page with the undistorted part at the right (as shown in Fig. 6(c, d)); Type-III: a document pasted on a lamp-post (as shown in Fig. 6(e, f)); Type-IV: a document attached only at the top-middle of a notice board (as shown in Fig. 6(g, h)).
Fig. 5. (a) Scanned image, (b) synthetic warped image.
Fig. 6. Visual comparison. Book page with the undistorted part at the left: (a) pre-processed real captured warped image, (b) synthetic warped image; book page with the undistorted part at the right: (c) pre-processed real captured warped image, (d) synthetic warped image; document pasted on a cylindrical lamp-post: (e) pre-processed real captured warped image, (f) synthetic warped image; document hanging from a notice-board: (g) pre-processed real captured warped image, (h) synthetic warped image.
Table 2. Values of WCP used to generate different types of warping

Type of warping      P3, P6                        P7, P10
Type-I & II & III    (0.04 × D), ..., (0.06 × D)   (0.04 × D), ..., (0.06 × D)
Type-IV              (0.04 × D), ..., (0.06 × D)   (0.04 × D), ..., (0.06 × D)

The step size for P3, P6, P7, P10 is (0.005 × D), where D is the length of the diagonal of the image.
Here, 'undistorted part at the left' means that the perpendicular drawn from the optical centre to the document surface hits the surface to the left of the middle of the document. The parameter values used for creating the different types of images are shown in Tables 1 and 2.
To visualize the performance of the proposed synthetic warped image generation technique, we captured some images with a mobile camera; these images exhibit different varieties of warping. The images are binarized and the border noise is removed. Many techniques have recently been proposed to binarize a document image [17], [15], [21] and to remove border noise from document images [5], [18], [20], [6], [2]. Here we have used the recent, simple and robust binarization technique proposed by Su et al. [21]. Most of the border noise removal approaches work well only for document images with a flat surface, but the method proposed by Bukhari et al. [2] is specifically designed for camera-captured warped document images; so we have used this technique to remove the border noise. The same documents are scanned using a flat-bed scanner and binarized. These scanned images are used as inputs to the proposed warped image generation method, which attempts to generate warped images that look like the camera-captured ones. Fig. 6 shows a set of pre-processed camera-captured images and the corresponding synthetic images generated by the proposed method. It is evident from Fig. 6 that the synthetic warped images are almost similar to their corresponding camera-captured images.
Fig. 7. Calculation of curvature
To measure the performance of the proposed method, we calculate the curvatures of the 'headline' of text lines present in the camera-captured warped image and the corresponding synthetic warped image. Here, the curvature is calculated using three points: the two end points of the 'headline' and the point on the 'headline' where the slope changes its sign. Let Cr and Cs be the curvatures of the 'matra/headline' of text lines present in the real image and the synthetic image, respectively. An example of a 'headline' for a particular text line is shown in Fig. 7; the 'headline' is obtained using the method proposed in [8]. For each image (real and synthetic), we considered four text lines to evaluate the performance of the proposed method: two text lines from the top and two from the bottom of the image. The length of each considered text line is greater than 80% of the longest text line present in the image.
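Since the exact curvature formula is not spelled out above, the sketch below uses the curvature of the circle passing through the three chosen points as one reasonable reading; the function name is illustrative.

```python
import numpy as np

def curvature_three_points(p1, p2, p3):
    """Curvature (1/R) of the circle through three points of a 'headline'.

    p1, p3 : the two end points of the headline; p2 : the point where the
    slope changes sign.  Returns 0 for (nearly) collinear points.
    """
    p1, p2, p3 = map(np.asarray, (p1, p2, p3))
    a = np.linalg.norm(p2 - p3)
    b = np.linalg.norm(p1 - p3)
    c = np.linalg.norm(p1 - p2)
    # twice the triangle area via the 2-D cross product
    area2 = abs((p2[0] - p1[0]) * (p3[1] - p1[1])
                - (p2[1] - p1[1]) * (p3[0] - p1[0]))
    if area2 < 1e-9:
        return 0.0
    return 2.0 * area2 / (a * b * c)       # kappa = 4 * Area / (a * b * c)
```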
The root mean squared error (RMSE), $R_c$, is calculated using the formula

$$R_c = \sqrt{\frac{\sum_{i=1}^{N} [C_r(i) - C_s(i)]^2}{N}},$$

where N is the number of text lines under consideration. We have considered 10 images from each of the four types of warping (40 images in total). The average of all the values of $R_c$ is $3.78 \times 10^{-5}$.
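A short sketch of this computation (illustrative names only) is:

```python
import numpy as np

def curvature_rmse(C_r, C_s):
    """RMSE between real and synthetic text-line curvatures.

    C_r, C_s : equal-length sequences of curvatures of the N text lines
    considered in a pair of real and synthetic images.
    """
    C_r, C_s = np.asarray(C_r, float), np.asarray(C_s, float)
    return np.sqrt(np.mean((C_r - C_s) ** 2))
```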
4 Conclusion
The approach of generating synthetic images proposed here not only helps in training neural networks but also helps to measure the performance of dewarping approaches. The proposed method can be used to generate different types of warped images. The dewarping of non-text parts of a document, such as figures or tables, can also be evaluated along with the text part using the proposed technique.
Bibliography
[1] Document dewarping via text-line based optimization. Pattern Recognition
48(11), 3600 – 3614 (2015)
[2] Bukhari, S.S., Shafait, F., Breuel, T.M.: Border noise removal of camera-
captured document images using page frame detection. In: Iwamura, M.,
Shafait, F. (eds.) Camera-Based Document Analysis and Recognition. pp.
126–137. Springer Berlin Heidelberg, Berlin, Heidelberg (2012)
[3] Bukhari, S.S., Shafait, F., Breuel, T.M.: The IUPR Dataset of Camera-
Captured Document Images, pp. 164–171. Springer Berlin Heidelberg,
Berlin, Heidelberg (2012)
[4] Cao, H., Ding, X., Liu, C.: A cylindrical surface model to rec-
tify the bound document image. In: Proceedings Ninth IEEE Interna-
tional Conference on Computer Vision. pp. 228–233 vol.1 (Oct 2003).
https://doi.org/10.1109/ICCV.2003.1238346
[5] Dey, S., Mitra, B., Mukhopadhyay, J., Sural, S.: A comparative study
of margin noise removal algorithms on marnr: A margin noise dataset of
document images. In: 2017 14th IAPR International Conference on Docu-
ment Analysis and Recognition (ICDAR). vol. 04, pp. 35–39 (Nov 2017).
https://doi.org/10.1109/ICDAR.2017.310
[6] Dey, S., Mukhopadhyay, J., Sural, S., Bhowmick, P.: Margin noise removal
from printed document images. In: Proceeding of the Workshop on Docu-
ment Analysis and Recognition. pp. 86–93. DAR ’12, ACM, New York, NY,
USA (2012). https://doi.org/10.1145/2432553.2432570
[7] Dosovitskiy, A., Brox, T.: Generating images with perceptual similarity met-
rics based on deep networks. CoRR abs/1602.02644 (2016)
[8] Garai, A., Biswas, S., Mandal, S., Chaudhuri, B.B.: Automatic dewarping
of camera captured born-digital bangla document images. In: 2017 Ninth
International Conference on Advances in Pattern Recognition (ICAPR).
pp. 1–6 (Dec 2017). https://doi.org/10.1109/ICAPR.2017.8593157
[9] H.Lu, Kot, A.C., Shi, Y.Q.: Distance-reciprocal distortion measure for bi-
nary document images. IEEE Signal Processing Letters 11(2), 228–231 (Feb
2004). https://doi.org/10.1109/LSP.2003.821748
[10] Jian Zhai, Liu Wenyin, Dori, D., Qing Li: A line drawings degradation
model for performance characterization. In: Seventh International Confer-
ence on Document Analysis and Recognition, 2003. Proceedings. pp. 1020–
1024 (Aug 2003). https://doi.org/10.1109/ICDAR.2003.1227813
[11] Ke, M., Zhixin, S., Bai, X., Jue, W., Dimitris, S.: Docunet: Document im-
age unwarping via a stacked u-net. In: Proceedings of IEEE Conference on
Computer Vision and Pattern Recognition (2018)
[12] Kieu, V.C., Journet, N., Visani, M., Mullot, R., Domenger, J.P.:
Semi-synthetic document image generation using texture mapping on
scanned 3d document shapes. In: 2013 12th International Conference
on Document Analysis and Recognition. pp. 489–493 (Aug 2013).
https://doi.org/10.1109/ICDAR.2013.104
[13] Kieu, V.C., Visani, M., Journet, N., Mullot, R., Domenger, J.P.: An efficient
parametrization of character degradation model for semi-synthetic image
generation. In: Proceedings of the 2Nd International Workshop on Historical
Document Imaging and Processing. pp. 29–35. HIP ’13 (2013)
[14] Kil, T., Seo, W., Koo, H.I., Cho, N.I.: Robust document image dewarping
method using text-lines and line segments. In: 2017 14th IAPR International
Conference on Document Analysis and Recognition (ICDAR). vol. 01, pp.
865–870 (Nov 2017). https://doi.org/10.1109/ICDAR.2017.146
[15] Meng, G., Yuan, K., Wu, Y., Xiang, S., Pan, C.: Deep networks
for degraded document image binarization through pyramid reconstruc-
tion. In: 2017 14th IAPR International Conference on Document Anal-
ysis and Recognition (ICDAR). vol. 01, pp. 727–732 (Nov 2017).
https://doi.org/10.1109/ICDAR.2017.124
[16] Ntirogiannis, K., Gatos, B., Pratikakis, I.: Performance evalua-
tion methodology for historical document image binarization. IEEE
Transactions on Image Processing 22(2), 595–609 (Feb 2013).
https://doi.org/10.1109/TIP.2012.2219550
[17] Pratikakis, I., Zagoris, K., Barlas, G., Gatos, B.: Icdar2017 competition
on document image binarization (dibco 2017). In: 2017 14th IAPR Interna-
tional Conference on Document Analysis and Recognition (ICDAR). vol. 01,
pp. 1395–1403 (Nov 2017). https://doi.org/10.1109/ICDAR.2017.228
[18] Shafait, F., Breuel, T.M.: A simple and effective approach for border noise
removal from document images. In: 2009 IEEE 13th International Multi-
topic Conference. pp. 1–5 (2009)
[19] Shafait, F.: Document image dewarping contest. In: 2nd Int. Workshop
on Camera-Based Document Analysis and Recognition. pp. 181–188 (2007)
[20] Shafait, F., van Beusekom, J., Keysers, D., Breuel, T.M.: Document cleanup
using page frame detection. International Journal of Document Analysis and
Recognition (IJDAR) 11(2), 81–96 (Nov 2008)
[21] Su, B., Lu, S., Tan, C.L.: Robust document image binarization technique for
degraded document images. IEEE Transactions on Image Processing 22(4),
1408–1417 (April 2013). https://doi.org/10.1109/TIP.2012.2231089