Fig 2
Sample input scanned and back-lit images – part of a page. The back-lit image shows verso inscription and details within the paper – watermark features are in the right hand margin. 

Source publication
Article
Full-text available
We consider the problem of locating a watermark in pages of archaic documents that have been both scanned and back-lit: the problem is of interest to codicologists in identifying and tracking paper materials. Commonly, documents of interest are worn or damaged, and all information is victim to very unfavourable signal-to-noise ratios—this is especi...

Contexts in source publication

Context 1
... these, the most challenging by far is the first and it is on this document that we have done most development: an example is at Figure 2. The others provide regular verification that ‘easier’ scans are indeed more easily accessible. Each of these volumes carries different watermarks and in aggregate they represent over 800 sheets. We see merit in establishing benchmark datasets for the work we have done; since creation of such data may be non-trivial in time and library permissions, we have made these scans publicly available online [30].

In earlier work [31] we have located watermarks using a bottom-up image processing approach that deploys background estimation and morphology, and is comparable in power to others used in the literature, with the benefit of automatic parameter selection. On the very difficult data of the ‘Mahdiyya’ Qur’an we found these approaches less than successful and in some cases wholly unproductive as a result of the noise levels and very faint watermark evidence [29].

We chose instead to build a model of back-lighting to take a top-down view. This model is illustrated in simplified form in Figure 3. The RGB vector detected at a particular pixel is dependent on the paper properties (absence or presence of watermark or other manufactured feature), recto features and verso features. In an ideal world, blank featureless paper (labelled ‘A’ in the Figure) would always produce the same output, but we do not have to assume that the same is true of inked regions (e.g., ‘B’), paper features, or combinations thereof. For clarity, we shall define at this point a feature to be visible if it is visible on the recto – thus, recto writing and other paper features visible to the reader. Other features betrayed in the back-lit image (watermark, verso writing, dirt on the verso face, etc.) we shall collectively call hidden. Back-lit pixels at which no hidden data are evident we shall call uncorrupted. In fact, the noise and damage that we experience produce significant variations across all regions that we might wish to be internally homogeneous, as may be clear from Figure 2. This however is not critical – what we can exploit is the difference between pixels that represent just blank paper or recto features, and those representing verso or other features, such as internal ones.

Consider momentarily a blank, featureless page which we scan as image S and back-light as image B, and define an image D in which pixels are given by the difference between their detected back-lit intensity (in B) and the intensity we might expect given the corresponding location in S. In the ideal case this page will be of uniform intensity (r, g, b) in S and, say, (ρ, γ, β) in B. We hypothesise some transform T which describes the back-lighting, and subtract T(r, g, b) from the corresponding (ρ, γ, β) in B. We should see (0, 0, 0) at all locations. If there are paper or verso features (invisible in S), these will be revealed by this differencing process. In fact, of course, regions are not uniform in intensity and blank paper will scan and back-light as a range of (r, g, b), (ρ, γ, β) vectors – these may, however, be expected to cluster reasonably tightly, and to be related to each other. If we ...
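A minimal sketch of the differencing idea described in this excerpt is given below. It is not the authors' implementation: it simply assumes that the back-lighting transform T can be approximated by a per-channel affine map fitted by least squares over (nominally) uncorrupted pixels, and flags pixels where the difference image D = B - T(S) is large. The array names S, B and D follow the text; the affine form of T, the fitting mask and the threshold are illustrative assumptions.

```python
# Sketch only: per-channel affine approximation of the back-lighting transform T,
# followed by the difference image D = B - T(S). Large |D| marks candidate
# "hidden" features (watermark, verso writing). Not the paper's actual method.
import numpy as np


def fit_backlight_transform(S, B, mask=None):
    """Fit T per channel by least squares: B_c ~= a_c * S_c + b_c.

    S, B : float arrays of shape (H, W, 3), scanned and back-lit images.
    mask : optional boolean (H, W) array selecting 'uncorrupted' pixels
           (blank paper / recto-only regions) used to fit the transform.
    Returns (a, b), each of shape (3,).
    """
    if mask is None:
        mask = np.ones(S.shape[:2], dtype=bool)
    a, b = np.empty(3), np.empty(3)
    for c in range(3):
        s = S[..., c][mask]
        t = B[..., c][mask]
        A = np.stack([s, np.ones_like(s)], axis=1)
        (a[c], b[c]), *_ = np.linalg.lstsq(A, t, rcond=None)
    return a, b


def difference_image(S, B, a, b):
    """D = B - T(S); large |D| suggests hidden (verso/watermark) features."""
    return B - (a * S + b)


if __name__ == "__main__":
    # Synthetic example: blank paper plus a faint feature visible only when back-lit.
    rng = np.random.default_rng(0)
    S = 0.8 + 0.02 * rng.standard_normal((64, 64, 3))
    B = 1.1 * S + 0.05 + 0.02 * rng.standard_normal((64, 64, 3))
    B[20:40, 30:50, :] += 0.15            # hidden feature, absent from S
    a, b = fit_backlight_transform(S, B)
    D = difference_image(S, B, a, b)
    hidden = np.abs(D).mean(axis=2) > 0.08   # crude threshold, an assumption
    print("candidate hidden pixels:", int(hidden.sum()))
```

In practice the fit would be restricted to pixels believed to be uncorrupted, and the resulting difference image clustered or thresholded adaptively rather than with a fixed cut-off.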

Similar publications

Article
Full-text available
The current paper presents a novel scheme for biomedical image watermarking by hiding multiple copies of the same data in the cover image using bit replacement in the horizontal (HL) and vertical (LH) resolution approximation image components of the wavelet domain. The proposed scheme uses an approach for recovering the hidden information from the damaged copie...
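As a rough illustration of the embedding strategy summarised in this abstract (the same payload hidden redundantly in two first-level wavelet detail subbands by bit replacement), the sketch below uses PyWavelets. The choice of the Haar wavelet, least-significant-bit replacement of rounded coefficients, and majority-vote extraction are assumptions made for illustration, not details taken from the cited paper.

```python
# Hedged sketch: hide one bit string redundantly in the two first-level detail
# subbands (the LH/HL subbands mentioned in the abstract) by replacing the
# least-significant bit of rounded coefficients.
import numpy as np
import pywt  # PyWavelets


def embed_bits(cover, bits, wavelet="haar"):
    """Hide the same bit string in both first-level detail subbands of `cover`."""
    cA, (cH, cV, cD) = pywt.dwt2(cover.astype(float), wavelet)
    for band in (cH, cV):                           # two redundant copies
        coeffs = np.round(band.flat[: bits.size]).astype(np.int64)
        coeffs = (coeffs & ~1) | bits               # replace least-significant bit
        band.flat[: bits.size] = coeffs
    return pywt.idwt2((cA, (cH, cV, cD)), wavelet)


def extract_bits(stego, n_bits, wavelet="haar"):
    """Recover n_bits hidden bits, majority-voting over the two copies."""
    _, (cH, cV, _) = pywt.dwt2(stego.astype(float), wavelet)
    votes = [np.round(band.flat[:n_bits]).astype(np.int64) & 1 for band in (cH, cV)]
    return (np.mean(votes, axis=0) >= 0.5).astype(int)


if __name__ == "__main__":
    cover = (np.tile(np.arange(64), (64, 1)) * 3).astype(np.uint8)
    payload = np.random.default_rng(1).integers(0, 2, 128)
    stego = embed_bits(cover, payload)
    recovered = extract_bits(stego, payload.size)
    print("bit agreement:", float((recovered == payload).mean()))
```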

Citations

... The identification issue is further complicated by the presence of handwriting on the watermarks and various types of noise [4]. Hand tracing and backlit photography are two traditional methods of watermark retrieval [5]. A number of research works have emphasized localizing and extracting a watermark pattern from a backlit image, rarely incorporating aligned reflected-light images [6], [7], [8]. ...
Preprint
Full-text available
The identification and restoration of ancient watermarks have long been a major topic in codicology and history. Classifying historical documents based on watermarks can be difficult due to the diversity of watermarks, crowded and noisy samples, multiple modes of representation, and minor distinctions between classes and intra-class changes. This paper proposes a U-net-based conditional generative adversarial network (GAN) to translate noisy raw historical watermarked images into clean, handwriting-free images with just watermarks. Considering its ability to perform image translation from degraded (noisy) pixels to clean pixels, the proposed network is termed Npix2Cpix. Instead of directly employing degraded watermarked images, the proposed network uses image-to-image translation with adversarial learning to create clutter- and handwriting-free images for restoring and categorizing the watermarks for the first time. In order to learn the mapping from the input noisy image to the output clean image, the generator and discriminator of the proposed U-net-based GAN are trained using two separate loss functions, each of which is based on the distance between images. After using the proposed GAN to pre-process noisy watermarked images, Siamese-based one-shot learning is used to classify the watermarks. According to experimental results on a large-scale historical watermark dataset, extracting watermarks from tainted images can result in high one-shot classification accuracy. The qualitative and quantitative evaluation of the retrieved watermarks illustrates the effectiveness of the proposed approach.
... [5] is a web catalog containing watermarks obtained from backlit paper; ref. [6] used transmitted light images to locate watermarks; refs. [7,8] combined transmitted light images with surface images to study the papers of Leonardo and Dürer. ...
Article
Full-text available
This paper introduces the Watermark Imaging System (WImSy) which can be used to photograph, document, and study sheets of paper. The WImSy provides surface images, raking light images, and transmitted light images of the paper, all in perfect alignment. We develop algorithms that exploit this alignment by combining several images together in a process that mimics both the "surface image removal" technique and the method of "high dynamic range" photographs. An improved optimization criterion and an automatic parameter selection procedure streamline the process and make it practical for art historians and conservators to extract the relevant information to study watermarks. The effectiveness of the method is demonstrated in several experiments on images taken with the WImSy at the Metropolitan Museum of Art in New York and at the Getty Museum in Los Angeles, and the results are compared with manually optimized images.
... Just-noticeable-distortion is used to estimate the energy to remove the watermark [4]. Taking questions of scanned and back-lit pages in archaic documents into account, a known lexicon of fragments is exploited to find watermarks and remove them, to overcome the effects of damaged files [5]. To improve the effect of removing watermarks, the discrete cosine transform domain and a key-based matrix are fused to remove visible watermarking [6]. ...
Article
Full-text available
Convolutional neural networks (CNNs) with different layers have performed with excellent results in watermark removal. However, how to extract robust and effective features via black-box CNNs for watermark removal is very important. In this paper, we propose an improved watermark removal U-net (IWRU-net). Taking the robustness of the obtained information into account, a serial architecture is designed to facilitate useful information for guaranteeing performance in watermark removal. Taking the problem of long-term dependency into account, simple U-net-based components are integrated into the serial architecture to extract more salient hierarchical information for addressing watermark removal problems. To increase the adaptability of IWRU-net to the real world, we use randomly distributed blind watermarks to implement a blind watermark removal model. The experimental results illustrate that the proposed method is superior to other popular watermark removal methods in terms of quantitative and qualitative evaluations.
... After slipping a light sheet between each print and its mat as described in [27], one photograph was taken in ambient light and another with the print illuminated from below with the light sheet. A weighted subtraction of the two images was performed using the algorithm outlined in [28] in order to increase the visibility of watermark data within the sheets. Captures were made with a Canon EOS 50 Mark III DSLR, fitted with a Canon Zoom Lens EF 24-105 mm and with a Daylight Wafer 2 LED light table (model number D/E/U/A 35030). ...
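The weighted subtraction mentioned in this excerpt can be illustrated with a short sketch. The fragment below is a generic illustration, not the specific algorithm of reference [28]: the default weight and the percentile-based contrast stretch are assumptions.

```python
# Generic weighted subtraction: scale the ambient-light capture and subtract it
# from the transmitted-light (back-lit) capture so that content present in both
# (printed recto material) is suppressed, while structure seen mainly under
# transmitted light (watermarks, chain lines) is retained and stretched.
import numpy as np


def weighted_subtraction(backlit, ambient, weight=0.7):
    """Return a contrast-stretched difference emphasising transmitted-light detail."""
    diff = backlit.astype(float) - weight * ambient.astype(float)
    lo, hi = np.percentile(diff, (1, 99))
    return np.clip((diff - lo) / max(hi - lo, 1e-6), 0.0, 1.0)
```

In practice the weight would be tuned per sheet (or chosen by an optimisation criterion) so that the printed content cancels as far as possible while the watermark detail remains visible.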
Article
Full-text available
Northwestern University’s Charles Deering McCormick Library of Special Collections owns three hand-colored copperplate engravings that once belonged to an edition of Metamorphosis Insectorum Surinamensium by artist-naturalist Maria Sibylla Merian (1647–1717). Because early modern prints are often colored by early modern readers, or modern collectors, it was initially unclear whether the coloring on these prints should be attributed to the print maker, to subsequent owners or collectors, or to an art dealer. Such ambiguities posed challenges for the interpretation of these prints by art historians. Therefore, the prints underwent multi-modal, non-invasive technical analysis to assess the date and material composition of the prints’ coloring. The work combined several different non-invasive analytical techniques: hyperspectral imaging (HSI), macro X-ray fluorescence (MA-XRF) mapping, surface normal mapping with photometric stereo, visible light photography, and visual comparative art historical analysis. As a result, the prints and paper were attributed to a late eighteenth-century posthumous edition of Merian’s work while the colorants were dated to the early twentieth century. This information enables more thorough contextualization of these prints in their use as teaching and research tools in the University collection.
... A complete review of techniques developed to reproduce watermarks is outside the scope of this work and can be found for example in [8]. We will focus on simple approaches: manual tracing and back-lit photography. ...
Preprint
Full-text available
Historical watermark recognition is a highly practical, yet unsolved challenge for archivists and historians. With a large number of well-defined classes, cluttered and noisy samples, different types of representations, both subtle differences between classes and high intra-class variation, historical watermarks are also challenging for pattern recognition. In this paper, overcoming the difficulty of data collection, we present a large public dataset with more than 6k new photographs, allowing for the first time to tackle at scale the scenarios of practical interest for scholars: one-shot instance recognition and cross-domain one-shot instance recognition amongst more than 16k fine-grained classes. We demonstrate that this new dataset is large enough to train modern deep learning approaches, and show that standard methods can be improved considerably by using mid-level deep features. More precisely, we design both a matching score and a feature fine-tuning strategy based on filtering local matches using spatial consistency. This consistency-based approach provides important performance boost compared to strong baselines. Our model achieves 55% top-1 accuracy on our very challenging 16,753-class one-shot cross-domain recognition task, each class described by a single drawing from the classic Briquet catalog. In addition to watermark classification, we show our approach provides promising results on fine-grained sketch-based image retrieval.
... Transillumination or backlighting helps to mitigate this by illuminating from the reverse of the object and detecting only light which has passed through it. This approach is commonly used in conjunction with multispectral imaging when analysing documents, and has had particular success when examining watermarks [19]. A bespoke illuminator containing an array of 28 white LEDs (LUMEX SLX-LX5093UWC/C, Farnell, UK) was used with the PhaseOne camera system described above. ...
Article
Full-text available
Ancient Egyptian mummies were often covered with an outer casing, panels and masks made from cartonnage: a lightweight material made from linen, plaster, and recycled papyrus held together with adhesive. Egyptologists, papyrologists, and historians aim to recover and read extant text on the papyrus contained within cartonnage layers, but some methods, such as dissolving mummy casings, are destructive. The use of an advanced range of different imaging modalities was investigated to test the feasibility of non-destructive approaches applied to multi-layered papyrus found in ancient Egyptian mummy cartonnage. Eight different techniques were compared by imaging four synthetic phantoms designed to provide robust, well-understood, yet relevant sample standards using modern papyrus and replica inks. The techniques include optical (multispectral imaging with reflection and transillumination, and optical coherence tomography), X-ray (X-ray fluorescence imaging, X-ray fluorescence spectroscopy, X-ray micro computed tomography and phase contrast X-ray) and terahertz-based approaches. Optical imaging techniques were able to detect inks on all four phantoms, but were unable to significantly penetrate papyrus. X-ray-based techniques were sensitive to iron-based inks with excellent penetration but were not able to detect carbon-based inks. However, using terahertz imaging, it was possible to detect carbon-based inks with good penetration but with less sensitivity to iron-based inks. The phantoms allowed reliable and repeatable tests to be made at multiple sites on three continents. The tests demonstrated that each imaging modality needs to be optimised for this particular application: it is, in general, not sufficient to repurpose an existing device without modification. Furthermore, it is likely that no single imaging technique will be able to robustly detect and enable the reading of text within ancient Egyptian mummy cartonnage. However, by carefully selecting, optimising and combining techniques, text contained within these fragile and rare artefacts may eventually be open to non-destructive imaging, identification, and interpretation.
Article
Full-text available
The Dutch East India Company (VOC) has left behind an extensive collection of archival materials, approximately 20,000 item numbers, composed of various documents such as books and letters from the 17th to 19th centuries. Among this vast collection, a portion pertains to the VOC's trading posts in Hirado and Dejima; around 2,000 items of archives specifically related to Japan have been recognized, and paper analysis has been conducted on 201 items dating from the period between 1614 and 1830. This research mainly involved watermark examination, revealing that the VOC's recorded documents utilized a diverse range of paper types sourced from various countries and regions. These included France (such as the Angoumois region), the Netherlands (specifically the Zaan and Veluwe regions), the United Kingdom, the United States, Japan, and potentially China and Indonesia (the blue paper made in the VOC Batavia paper mill). To gain further insights into the origins of the paper used, a detailed analysis of watermarks and initials found on Western paper has been conducted, providing valuable information about the paper mills and paper makers employed. The results reveal a remarkable diversity of paper within the VOC's recorded documents, indicating notable changes over time. The reasons behind these variations in paper sourcing and usage are thoroughly discussed, shedding light on the historical context and trading practices of the VOC during this period. This article can be downloaded from the following site: Nouvelles Chroniques du manuscrit au Yémen, http://www.cdmy.org
Article
Historical paper often contains features embedded in its structure that are invisible under standard viewing conditions. These features (watermarks, laid lines, and chain lines) can provide valuable information about a sheet’s provenance. Standard methods of reproducing watermarks, such as beta-radiography and low-voltage x-rays, are costly and time intensive, and therefore inaccessible to many institutions or individuals. In this work we introduce an inexpensive prototype whose elements are a light table and a consumer-grade photographic camera. For a given document we acquire one image with the light of a light table passing through the document and two images of the front and back sides with ambient light. The images are then processed to suppress the printed elements and isolate the watermark. The proposed method is capable of recovering images of watermarks similar to the ones obtained with standard methods while being non-destructive, rapid, easy to operate, and inexpensive.
Book
In fourteen thoughtful essays this book reports and reflects on the many changes that a digital workflow brings to the world of original texts and textual scholarship, and the effect on scholarly communication practices. The spread of digital technology across philology, linguistics and literary studies suggests that text scholarship is taking on a more laboratory-like image. The ability to sort, quantify, reproduce and report text through computation would seem to facilitate the exploration of text as another type of quantitative scientific data. However, developing this potential also highlights text analysis and text interpretation as two increasingly separated sub-tasks in the study of texts. The implied dual nature of interpretation as the traditional, valued mode of scholarly text comparison, combined with an increasingly widespread reliance on digital text analysis as scientific mode of inquiry raises the question as to whether the reflexive concepts that are central to interpretation – individualism, subjectivity – are affected by the anonymised, normative assumptions implied by formal categorisations of text as digital data.