Example of homophonic key with variable length code (ARA Brus SEG inr.2chiffres1647-98 key3, 2018).

Source publication
Conference Paper
We present an empirical study on historical keys in their original form from Early Modern Times (1400-1800) in Europe. We describe the internal structure of keys, and specify what was encoded and how. We present some trends of the construction of historical keys over time. Some of these trends have been sensed but never systematically documented by...

Contexts in source publication

Context 1
... make decryption difficult, the most frequently occurring plaintext characters in a language might have several corresponding codes. Figure 1 illustrates a key based on homophonic substitution with nomenclature from the second half of the 17th century. Each letter in the alphabet has at least one ciphertext symbol represented as a two-digit number or a symbol, and the vowels and double consonants have one additional graphical sign (e.g. ...
Context 2
... the distribution of code types varies not only across but within a single key. In Figure 10, we show the code types for alphabets, nomenclatures as well as for nulls. Typically, while several characters in alphabets are often encoded with two or more codes resulting in a homophonic substitution, elements in nomenclatures tend to have one code only. ...
Context 3
... while several characters in alphabets are often encoded with two or more codes resulting in a homophonic substitution, elements in nomenclatures tend to have one code only. Given the various code types in a key, we analyze the type given their components, see Figure 11. Homophonic substitution is by far the most popular, either on its own or combined with simple substitution. ...
Context 4
... usage of the types of symbols that have been chosen for encoding varied over the centuries, as illustrated in Figure 12. While alphabetical characters, digits, and graphic signs were evenly distributed in the 15th century, we can see a clear increase in tendency to use digits as the main encoding at the expense of Latin letters or graphic signs, which we can hardly find in keys from the 18th century. ...
Context 5
... the 15th century, all three types of symbols were combined in almost all keys, but this eclectic symbol set was reduced in the 16th and 17th centuries in favor of digits in combination with Latin letters. The distribution of various symbol sets over centuries is shown in Figure 13. The usage of the length of the codes also varies over time, as illustrated in Figure 14. ...
Context 6
... distribution of various symbol sets over centuries is shown in Figure 13. The usage of the length of the codes also varies over time, as illustrated in Figure 14. The great majority of keys contain codes of variable length and the length typically differs between alphabetical elements, nomenclatures, as well as nulls. ...
Context 7
... of alphabetical signs were mostly homophonic, as shown in Figure 15. Quite surprisingly, however, we can see a decrease in favor of simple substitution which became more frequent in the 17th and 18th centuries. ...
Context 8
... might be due to the increase in the size of the nomenclatures over time. The usage of nulls in keys also varied over time, as illustrated in Figure 17. While nulls have been frequently occurring in keys, i.e. 96% of keys included nulls in the 15th century, we find nulls in 27% of the keys in the 18th century. ...
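The mechanism described in Context 1 (several codes for frequent plaintext letters, codes of differing length, plus nulls) can be illustrated with a minimal Python sketch. The key below is entirely hypothetical: the letters, two-digit numbers, graphic signs, and the null code are invented for illustration and are not taken from the key shown in the figure. Codes are kept as separate tokens, which sidesteps the segmentation problem that variable-length codes create in a real ciphertext.

```python
import random

# Hypothetical toy key (not the historical key from the figure):
# frequent letters get several homophones, and code length varies
# between two-digit numbers and single graphic signs.
KEY = {
    "a": ["12", "47", "+"],
    "e": ["15", "33", "#"],
    "n": ["21", "58"],
    "t": ["27"],
    "s": ["30"],
}
NULLS = ["99"]  # hypothetical null: a code with no plaintext meaning


def encrypt(plaintext, key=KEY, nulls=NULLS, null_rate=0.0, rng=None):
    """Replace each letter by one of its homophones, optionally adding nulls."""
    if rng is None:
        rng = random.Random()
    codes = []
    for ch in plaintext:
        codes.append(rng.choice(key[ch]))  # pick any homophone for this letter
        if nulls and rng.random() < null_rate:
            codes.append(rng.choice(nulls))  # insert a meaningless code
    return codes


def decrypt(codes, key=KEY, nulls=NULLS):
    """Invert the key: every homophone maps back to exactly one letter."""
    reverse = {code: ch for ch, homophones in key.items() for code in homophones}
    return "".join(reverse[c] for c in codes if c not in nulls)


if __name__ == "__main__":
    cipher = encrypt("senate", null_rate=0.3, rng=random.Random(1))
    print(cipher)
    print(decrypt(cipher))
```

Because the choice among homophones is random, the same plaintext can yield many different code sequences, which flattens the frequency profile of common letters; decryption stays unambiguous as long as no code is assigned to two plaintext elements.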

Similar publications

Conference Paper
In this paper, we present an empirical study on plaintext entities in historical cipher keys from the 15th to the 18th century to shed light on what linguistic entities have been chosen for encryption. We focus mainly on the nomenclature part of the keys describing longer elements than the plaintext alphabet. We show that the chosen plaintext entit...

Citations

... Many important publications on this subject are presented annually at the International Conference on Historical Cryptology (HistoCrypt). The design and structure of historical cipher keys were investigated in [3,11,14]. These publications are related to the two ongoing projects, namely: ...
Article
This paper deals with historical encrypted manuscripts and introduces an automated method for the detection and transcription of ciphertext symbols for subsequent cryptanalysis. Our database contains documents used in the past by aristocratic families living in the territory of Slovakia. They are encrypted using a nomenclator which is a specific type of substitution cipher. In our case, the nomenclator uses digits as ciphertext symbols. We have proposed a method for the detection, classification, and transcription of handwritten digits from the original documents. Our method is based on Mask R-CNN which is a deep convolutional neural network for instance segmentation. Mask R-CNN was trained on a manually collected database of digit annotations. We employ a specific strategy where the input image is first divided into small blocks. The image blocks are then passed to Mask R-CNN to obtain detections. This way we avoid problems related to the detection of a large number of small dense objects in a high-resolution image. Experiments have shown promising detection performance for all digit types with minimum false detections.
... An important conclusion of this article is that no easy typology of cipher keys can be offered, as various innovations and methods intermingled in the diplomatic practice, and thus the evolution of keys was far from being a linear improvement. Megyesi et al. (2021) undertook a task similar to that of the present article: a systematic analysis of 700 cipher keys collected at that time in the Decode database (Megyesi, Blomqvist, and Pettersson 2019) focusing on the symbol system, languages, nulls, and code types. A year later, the authors continued with studies of the plaintext entities in nomenclatures encoded in 1,384 keys (Megyesi et al. 2022). ...
... The two articles can be seen as previous steps of the present research in two senses: the first being carried out on a limited set of keys, and the second investigating only part of the components of the keys. To our knowledge, the studies by Megyesi et al. (2021, 2022) were the first that used large scale statistics to analyze the trends and the morphology of the cipher keys but none of them involved the entire nomenclatures of a significantly bigger dataset, making the present study more representative and exhaustive than the previous ones. ... [Footnote 1: https://www.cryptool.org/en/ct2/]
... The distribution of the above mentioned characteristics of keys over centuries in Europe shows similar results as the pilot study based on a smaller sample of 700 keys described in Megyesi et al. (2021). ...
Article
We give an overview of the development of European historical cipher keys originating from Early Modern times. We describe the nature and the structure of the keys with a special focus on the nomenclatures. We analyze what was encoded and how and take into account chronological and regional differences. The study is based on the analysis of over 1,600 cipher keys, collected from archives and libraries in 10 European countries. We show that historical cipher keys evolved over time and became more secure, shown by the symbol set used for encoding, the code length and the code types presented in the key, the size of the nomenclature, as well as the diversity and complexity of linguistic entities that are chosen to be encoded.
... It was first by Megyesi, Tudor, Láng and Lehofer (2021) that a quantitative analysis was made on a European sample of 700 cipher keys although this sample was not representative for the whole of European history (Megyesi et al., 2021). This quantitative analysis was possible thanks to the DECODE database (Megyesi et al., 2019) which is aimed for the storage and description of historical encrypted sources and has become the largest source for historical ciphers and keys by today. ...
Conference Paper
In this paper, we present an empirical study on plaintext entities in historical cipher keys from the 15th to the 18th century to shed light on what linguistic entities have been chosen for encryption. We focus mainly on the nomenclature part of the keys describing longer elements than the plaintext alphabet. We show that the chosen plaintext entities to be encoded varied over time. Nomenclatures developed from short lists consisting of names for persons and/or locations to longer, more advanced dictionaries and eventually to codebooks containing a highly diverse and advanced set of linguistic entities.
... Gaining new knowledge about how these ciphers were designed and used is necessary for a better understanding of these ciphers and for developing effective and sophisticated solving methods. It is therefore necessary to investigate the design and structure of historical cipher keys (Tudor et al., 2020; Megyesi et al., 2021). If a nomenclator system is correctly constructed and used, it is very hard (or impossible) to crack. ...
... In this section, we focus on the 78 cipher keys, which consist of at least two sub-cipher parts. The most commonly used symbol set in the cipher keys was numbers (with an increasing tendency through the centuries) (Megyesi et al., 2021). This also corresponds with our case: we identify 66 cipher keys from the 78 (84.615%), ...
Conference Paper
Nomenclator is a complex encryption system consisting of several different simpler encryption systems used together during the encryption. It is one of the main encryption systems used before the twentieth century. In some cases, there are large collections of historical ciphers preserved in archives. Those from a particular time period or geographic location are very valuable and can bring insights to the cipher design from a specific time/location. This paper provides the first detailed empirical analysis of historical cipher keys from the Thirty Years’ War deposited in Hessisches Staatsarchiv Marburg. We describe a large variety of analyzed keys with a focus on those properties (poorly designed keys) that can decrease the security of the cipher. We further show that these properties alone do not imply a bad design, sometimes a combination of several properties is needed and at the same time a bad use of the encryption key.
... Encrypted manuscripts contain a wide range of symbols, especially those from Early Modern Times. An investigation of 700 historical cipher keys shows that the usage of digits, Latin characters, and graphic signs were evenly distributed in keys from the 15th and 16th centuries, as illustrated in Figure 1 (Megyesi et al., 2021). In fact, 30% of the symbols were graphic signs representing a large variety of symbols taken from symbol sets including not only the Zodiac or alchemical signs, but also various unknown, fancy symbols. ...
Conference Paper
Historical ciphers contain a wide range of symbols from various symbol sets. Identifying the cipher alphabet is a prerequisite before decryption can take place and is a time-consuming process. In this work we explore the use of image processing for identifying the underlying alphabet in cipher images, and to compare alphabets between ciphers. The experiments show that ciphers with similar alphabets can be successfully discovered through clustering.
Conference Paper
In this article, we present encrypted documents and cipher keys from the 18th and 19th century, related to central-European aristocratic families Amade-Üchtritz, Esterházy, and Pálffy-Daun. In the first part of the article, we present an overview and analysis of the available documents from the archives with examples. We provide a short historical overview of the people related to the analyzed documents to provide a context for the research. In the second part of the article, we focus on the digital processing of these historical manuscripts. We developed new tools based on machine learning that can automate the transcription of encrypted parts of the documents, which contain only digits as cipher text alphabet. Our digit detection and segmentation are based on YOLOv7. YOLOv7 provided good detection precision and was able to cope with problems like noisy paper background and areas where digits collided with the text from the reverse side of the paper.