An Overview of Text Steganalysis

Yu Yang, Lei Zha, Ziwei Zhang, and Juan Wen
Abstract With the rapid development of the Internet, network information security
is increasingly under threat. Text steganography is one of the key factors affecting
information content security. It aims to hide confidential information in text carriers
within a covert communication system. If text steganography is used by criminals,
malicious information can easily be transmitted over the Internet without being
discovered by a third party. Text steganalysis counters this problem by detecting
whether a text carrier contains secret information. This paper presents an overview
of text steganalysis methods published since 2006. We discuss the basic text
steganalysis model and compare the pros and cons of these algorithms, hoping to offer
some insights and motivation for future research directions.
Keywords Information security · Text steganography · Text steganalysis
1 Introduction
With the rapid development of information technology, an open environment for
information transmission and sharing has quickly taken shape. While making
people’s lives more convenient, it also brings a series of security risks. For example,
network information is susceptible to malicious attacks, illegal access, falsification,
plagiarism, etc. [1]. How to ensure the security of multimedia data has become a
significant problem that urgently needs to be solved in the field of information
security.
Steganography is the art and science of concealing confidential information in
multimedia carriers. Modern steganography exploits the redundancy of human
perception, the statistical redundancy of multimedia data, and other characteristics
to embed secret information, through some form of coding or encryption, in a
public data carrier. The cover used to hide a secret message is generally a multimedia
file transmitted over the network, such as video, audio, images, or text. Among them,
text has become an important cover because of its fast transmission and convenient
access. In particular, with the rapid development of natural language processing (NLP),
text steganography has been greatly developed and refined. Currently, text
steganography is extensively used in confidential communication, copyright
protection, content identification, etc.
Y. Yang · L. Zha · Z. Zhang · J. Wen (B)
College of Information and Electrical Engineering, China Agricultural University, Beijing
100083, China
e-mail: wenjuan@cau.edu.cn
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
J. Yao et al. (eds.), The International Conference on Image, Vision and Intelligent Systems
(ICIVIS 2021), Lecture Notes in Electrical Engineering 813,
https://doi.org/10.1007/978-981-16-6963-7_82
Unlike text steganography, text steganalysis identifies whether a given text
contains hidden communication and, when possible, extracts the embedded secret
information. In recent years, text steganalysis has become a vital research
topic in information security, as one of the effective ways to prevent criminals from
maliciously using text steganography for illegal activities. In addition,
it further helps ensure safe and covert communication, and has important applications
in military, intelligence, and government security departments, such as detecting and
jamming enemy communication signals: it can effectively block information sources
and support information reconnaissance and disruption against an adversary. Almost all
information embedding algorithms inevitably change the statistical characteristics of
the carrier. The core idea of steganalysis is to use a statistical machine learning
algorithm to model and detect the subtle differences caused by information embedding,
so as to identify suspicious stego text.
The basic model of text steganography and text steganalysis is shown in Fig. 1.
Fig. 1 Basic model of text steganography and text steganalysis [2]
Owing to the significance of text steganalysis in Internet security, it is essential
to review and summarize the mainstream text steganalysis methods of recent years. This
paper outlines the state of research in text steganalysis starting from 2006.
Furthermore, we summarize, compare and analyse some of these algorithms.
The remainder of this paper is organized as follows. In Sect. 2, different types of
text steganalysis are reviewed, and the techniques and concepts involved in each type
are described in detail. In Sect. 3, a comparative analysis of the
techniques and approaches is made. Finally, the conclusion is drawn in Sect. 4.
2 Classification of Text Steganalysis Algorithms
In this section, the two classes of text steganalysis, targeted steganalysis
and blind steganalysis, are introduced in detail.
2.1 Targeted Text Steganalysis
Targeted text steganalysis is designed to detect one particular text steganography
algorithm. That is, the detection algorithm knows which
text steganography method was used to embed the secret information. Thus, targeted
steganalysis excels at detecting that specific text steganography algorithm. However,
it may fail badly when facing other steganography algorithms.
The common statistical features used for targeted steganalysis include word-
initial distribution, alphabetic cases, contextual information, evolutionary features,
and synonym frequency.
Distribution of First Letters of Words [3]. For the stego text generated by
context-free steganography, words occur randomly, and the probability that a word
appears in a given segment of the text depends only on its probability in the local
region. In contrast, in a natural text, words do not occur randomly, and the process
of word generation can be viewed as an nth-order Markov process [2]. As a result, the
probability distribution of word initials in natural texts is very different from that
of word initials in context-free texts, as shown in Fig. 2.
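The first-letter statistic can be illustrated with a minimal Python sketch. It is not the detector of [3]: the helper names `first_letter_distribution` and `l1_distance`, and the choice of an L1 distance to compare two distributions, are assumptions made here for illustration.

```python
from collections import Counter
import string


def first_letter_distribution(text):
    """Return the relative frequency of each initial letter (a-z) in text."""
    # Keep only tokens that start with an ASCII letter; lowercase the initial.
    initials = [w[0].lower() for w in text.split() if w[0] in string.ascii_letters]
    counts = Counter(initials)
    total = sum(counts.values())
    # Report all 26 letters so that two texts can be compared bin by bin.
    return {c: (counts[c] / total if total else 0.0) for c in string.ascii_lowercase}


def l1_distance(p, q):
    """Sum of absolute differences between two first-letter distributions."""
    return sum(abs(p[c] - q[c]) for c in string.ascii_lowercase)
```

A detector along these lines would compare the distribution of a suspicious text with one estimated from a natural corpus; where to place the decision threshold is left open here.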
Stego [4]. Stego is a text steganography tool that uses dictionaries to transform
a secret message into grammar-free text with a structure similar to normal text for
steganographic communication. By studying the mechanism of Stego, paper [4]
proposes a steganalysis method that targets Stego. When the dictionary
words used for steganography all start with lowercase letters, the stego text can be
detected by the steganalysis method based on sign features. Otherwise, the stego text
is detected by the steganalysis method based on statistical features.
Fig. 2 Distribution of probabilities of natural text and context-free text. (a) Probability
distribution of a natural text; (b) probability distribution of a context-free text
Context Information. Article [5] introduces the concept of context clustering
to estimate the contextual fitness of a text and shows how to distinguish ordinary
text from stego text by measuring the contextual fitness values of the text.
A Substitution-based Linguistic Steganography (SLS) system hides a message by
replacing an original element in the cover text with a replacement element from the
same substitution set. This substitution may result in the new element not fitting
the original context well. Exploiting this feature, the paper proposes a steganalysis
scheme for SLS; the specific process is shown in Fig. 3. Based on this scheme, a
steganalysis method targeting synonym replacement is then proposed. The average
accuracy of this text steganalysis approach is 98.86%.
Fig. 3 Steganalysis directed at substitution-based text steganography [5]. SI: Substitution
Information; CI: Context Information; λ: Context Maximum Rate; θ: Context Maximum Deviation
Article [6] proposes a word embedding-based approach to detect secret information
in a text. The method uses a continuous Skip-gram model to represent synonyms
and their contextual words as word embeddings, encoding the word semantics as
low-dimensional dense vectors. The embeddings of the synonyms are used to
estimate their contextual fitness, weighted by the TF-IDF scores
of the contextual words. By analysing the differences in the contextual fitness
scores of the synonyms within a synonym set, and the differences in the contextual
fitness values of synonyms in the cover text and the stego text, three features are
extracted and fed to a support vector machine (SVM) classifier for steganalysis. The
proposed steganalysis technique yields an improvement of more than 4.8%.
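A contextual fitness score of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exact formula of [6] is not given here, so the score below is a TF-IDF-weighted mean of cosine similarities, and the function names, the 2-dimensional toy embeddings, and the TF-IDF weights are all invented for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def context_fitness(word, context, embeddings, tfidf):
    """TF-IDF-weighted mean cosine similarity between word and its context words."""
    weighted, total_weight = 0.0, 0.0
    for c in context:
        if c not in embeddings:
            continue  # skip context words without an embedding
        w = tfidf.get(c, 0.0)
        weighted += w * cosine(embeddings[word], embeddings[c])
        total_weight += w
    return weighted / total_weight if total_weight else 0.0


# Toy 2-d vectors and weights, invented purely for illustration.
embeddings = {"big": [1.0, 0.0], "large": [0.9, 0.1],
              "house": [1.0, 0.2], "idea": [0.0, 1.0]}
tfidf = {"house": 2.0, "idea": 1.0}
```

With these toy values, `context_fitness("big", ["house"], embeddings, tfidf)` is higher than the score of the same word in the context `["idea"]`, which is the kind of gap a classifier could exploit after a poorly fitting synonym has been substituted.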
Evolution Algorithm [7]. Article [7] proposes an evolutionary detection steganalysis
system (EDSS) based on the evolutionary algorithm of the Java Genetic Algorithms
Package (JGAP). The results of the EDSS are classified as good fitness
or bad fitness according to the fitness value.
Synonym Frequency [8]. Article [8] proposes a steganalysis method for
synonym substitution (SS) steganography. First, attribute pairs of synonyms are
introduced to represent their positions in the ordered synonym set and the size of
that set. Because of synonym substitution, the number of high-frequency attribute
pairs decreases while the number of low-frequency attribute pairs increases. Based on
this, the changes in the statistical features of attribute pairs under SS steganography
are analysed theoretically, and secret information is detected using feature vectors
built on the relative frequency differences of the various attribute pairs. The paper
also analyses the impact of the synonym encoding strategy on feature vector extraction.
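The attribute-pair statistic can be made concrete with a short Python sketch. It is a simplified reading of the idea rather than the feature extractor of [8]: the function name `attribute_pair_frequencies`, the tiny synonym sets, and the assumption that each set is ordered from most to least frequent word are all hypothetical.

```python
from collections import Counter


def attribute_pair_frequencies(tokens, synonym_sets):
    """Relative frequency of (position, set size) pairs among synonym tokens."""
    # Map each synonym to its attribute pair; position 0 is assumed to be
    # the most frequent word of its set.
    pair_of = {}
    for syn_set in synonym_sets:
        for pos, word in enumerate(syn_set):
            pair_of[word] = (pos, len(syn_set))
    pairs = [pair_of[t] for t in tokens if t in pair_of]
    counts = Counter(pairs)
    total = len(pairs)
    return {pair: n / total for pair, n in counts.items()} if total else {}
```

For example, with the sets `[["big", "large", "huge"], ["quick", "fast"]]`, the cover sentence "the big dog is quick and big" only produces pairs at position 0, whereas "the huge dog is fast and large" shifts all of the mass to later positions, mirroring the drop in high-frequency pairs described above.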
2.2 Blind Text Steganalysis
Blind steganalysis does not depend on a specific steganographic algorithm. As a
result, it suits a wider range of applications and requirements. Embedding
secret information in normal text changes the content of the text to some degree,
which introduces statistical differences in normal textual features. Therefore, the
key step for blind steganalysis is to model these subtle differences [9]. As Fig. 4
shows, feature extraction and text classification are the two stages of blind text
steganalysis. Next, we introduce the current mainstream blind steganalysis algorithms,
grouped by model type.
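The two stages can be sketched as a tiny Python pipeline. Everything concrete in it is an assumption chosen for brevity: the two features (average word length and type-token ratio) and the nearest-centroid classifier are placeholders and do not correspond to any of the surveyed methods.

```python
def extract_features(text):
    """Stage 1: map a text to a small numeric feature vector."""
    words = text.lower().split()
    if not words:
        return [0.0, 0.0]
    avg_len = sum(len(w) for w in words) / len(words)
    type_token_ratio = len(set(words)) / len(words)
    return [avg_len, type_token_ratio]


def train_centroids(texts, labels):
    """Stage 2 (training): average the feature vectors of each class."""
    sums, counts = {}, {}
    for text, label in zip(texts, labels):
        feats = extract_features(text)
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, value in enumerate(feats):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(text, centroids):
    """Stage 2 (prediction): return the label of the nearest class centroid."""
    feats = extract_features(text)

    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(feats, centroid))

    return min(centroids, key=lambda label: sq_dist(centroids[label]))
```

The methods reviewed below differ mainly in what replaces `extract_features` (hand-crafted statistics or learned representations) and what replaces the centroid rule (AdaBoost, SVM, or a neural network).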
Text Steganalysis Based on AdaBoost [10]. Article [10] points out that embedding
secret information brings statistical changes to the text. Based on
this, a general detection algorithm built on AdaBoost is put forward, which extracts
text statistical features and distinguishes natural texts from stego texts.
The AdaBoost-based detector recognizes all texts at embedding rates of 2% and 4%, and
the recognition rate is also 100% under the other tested conditions. The experiment
shows that the detector is almost unaffected by the embedding rate, reflecting the
strong classification performance of AdaBoost.
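For readers unfamiliar with the classifier, the following is a minimal sketch of AdaBoost with one-feature threshold stumps in plain Python. It shows the generic algorithm only; the statistical features used in [10] are not specified here, so the inputs are assumed to be arbitrary numeric feature vectors with labels +1 (stego) and -1 (natural), and the function names are hypothetical.

```python
import math


def train_adaboost(X, y, rounds=10):
    """Train AdaBoost with one-feature threshold stumps; labels must be +1/-1."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, feature index, threshold, polarity)
    for _ in range(rounds):
        best = None
        # Exhaustively search for the stump with the lowest weighted error.
        for f in range(len(X[0])):
            for thr in sorted({x[f] for x in X}):
                for pol in (1, -1):
                    preds = [pol if x[f] >= thr else -pol for x in X]
                    err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
                    if best is None or err < best[0]:
                        best = (err, f, thr, pol, preds)
        err, f, thr, pol, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, thr, pol))
        # Increase the weight of misclassified samples, then renormalise.
        weights = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (pol if x[f] >= thr else -pol) for a, f, thr, pol in ensemble)
    return 1 if score >= 0 else -1
```

Each round reweights the training texts so that the next stump concentrates on the samples the previous ones misclassified, which is what lets a combination of weak single-feature rules form a strong detector.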
Fig. 4 Standard blind text steganalysis phases
Text Steganalysis Based on Statistical Language Model [11]. In article [11],
a text steganalysis algorithm based on a statistical language model is proposed to
classify a given text segment as natural text or stego text using its complexity. The
algorithm achieves 96.3% recognition accuracy for stego text segments and natural
text segments when the segment size is 5 K, and more than 93.9%
accuracy when the segment size is 2 K. The experiments also tested texts produced by
the NICETEXT system, the TEXTO system, and a Markov-chain-based generator, and
achieved good results.
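One way to score a segment with a statistical language model is sketched below. Two assumptions should be noted: the "complexity" of a segment is read here as its perplexity, and the model is a bigram model with add-one smoothing; neither detail is confirmed by the description above, and the function names are hypothetical.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Collect unigram and bigram counts from a natural-text token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams, len(set(tokens))


def perplexity(tokens, model):
    """Perplexity of a token sequence under the smoothed bigram model."""
    unigrams, bigrams, vocab = model
    pairs = list(zip(tokens, tokens[1:]))
    log_prob = 0.0
    for prev, cur in pairs:
        # Add-one smoothing; the extra +1 reserves mass for unseen words.
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab + 1)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(pairs)) if pairs else float("inf")
```

A segment whose perplexity under a model trained on natural text is unusually high would then be flagged as suspicious; the threshold would have to be chosen on held-out data.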
Text Steganalysis Based on SVM [12]. Article [12] proposes an SVM-based
hidden information detection algorithm. The SVM classifier is trained
on normal text and a small sample of text carrying secret information, and the good
generalization ability of the classifier is then used to classify unknown text. The
model generalizes well, and the SVM classifier achieves an excellent
classification effect.
Natural Frequency Zoned Word Distribution Analysis (NFZ-WDA) [13].
Translation-based steganography (TBS) is a secure text steganography technique that
encodes secret information using the noise generated by the translation of natural
language text. The NFZ-WDA method proposed in article [13] aims to detect TBS without
using any TBS-related information. The only resource this method relies on is a natural
frequency lexicon, a word frequency dictionary obtained from a large corpus. NFZ-WDA
uses natural frequency zones (NFZs) to refine word distribution features. Since the
refined word distribution features retain more structural information, the
improved method can analyse the stego text generated by TBS more effectively. To
verify the validity of the NFZ-WDA method, the paper carries out experiments with
two-class and multi-class SVM classifiers. The results show that the accuracy in both
settings is comparatively high and increases with text size. Thus,
this text steganalysis method has good application prospects.
Text Steganalysis Based on Convolutional Neural Network (CNN). Article
[14] proposes a CNN-based model for text steganalysis that captures complex
dependencies and automatically learns text feature representations. A decision strategy
for detecting long texts is also proposed to further boost performance.
First, the word embedding layer extracts the semantic and syntactic features of
words. Second, rectangular convolution kernels of different sizes learn sentence
features. The method is not only effective against different types of text
steganography algorithms but also achieves excellent results on texts of different
sizes.
Article [15] proposes a two-stage CNN-based method for text steganalysis. The
first stage is a sentence-level CNN, consisting of a convolutional layer containing
multiple convolutional kernels with different window sizes, a pooling layer, a fully
connected layer with dropout, and a softmax output. In this way, the network not only
handles variable-length sentences but also obtains two steganographic features per
sentence. The second stage is a text-level CNN that uses the output of the first stage
to determine whether the detected text is steganographic or not. The average accuracy
of this approach is 82.245%.
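The role of kernels with different window sizes can be illustrated without a deep learning framework. The sketch below is an assumption-laden toy, not the architecture of [14] or [15]: it applies hand-written kernels to a sentence given as a list of word vectors, uses ReLU and max-over-time pooling (common choices in text CNNs, assumed here), and omits embeddings, training, dropout, and the output layer.

```python
def conv_max_pool(embedded, kernel):
    """Slide a (height x dim) kernel over the word axis, apply ReLU, max-pool."""
    height = len(kernel)
    responses = []
    for start in range(len(embedded) - height + 1):
        window = embedded[start:start + height]
        s = sum(w * x
                for k_row, x_row in zip(kernel, window)
                for w, x in zip(k_row, x_row))
        responses.append(max(0.0, s))  # ReLU
    # Max-over-time pooling gives one value per kernel for any sentence length.
    return max(responses) if responses else 0.0


def sentence_features(embedded, kernels):
    """One pooled feature per kernel; kernels may have different heights."""
    return [conv_max_pool(embedded, k) for k in kernels]
```

Because each kernel is pooled to a single number, a sentence of any length maps to a vector whose size equals the number of kernels, which is how such a layer handles variable-length sentences.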
Text Steganalysis Based on Recurrent Neural Networks (RNN) [16]. In automatically
generated stego text, embedding hidden information distorts the conditional
probability distribution. Based on this, paper
[16] proposes a text steganalysis algorithm that uses an RNN to extract these
distribution differences as features and then classifies the text as cover text or
steganographic text. The experimental results show that the model not only has high
detection accuracy but can also use the subtle differences in text feature
distributions to estimate the amount of information embedded in the generated stego
text.
Text Steganalysis Based on Word2vec [17]. A Word2vec-based approach to text
steganalysis is proposed in [17]. First, a multi-dimensional word vector containing
rich semantic information is trained for each word using the distributed word
representation tool Word2vec. Then, to calculate the suitability of a synonym in a
particular context, the correlation between the synonym and each contextual word is
measured by the cosine distance between their word vectors, which yields the detection
features. Finally, the extracted detection features are input into a Bayesian
estimation model for training and testing. The average detection accuracy of the
approach reaches 97.71% for stego texts with different embedding rates, which
indicates very good detection performance.
Text Steganalysis Based on Convolutional Sliding Windows (TS-CSW) [18].
Word correlation features in the stego text are distorted after a confidential
message is embedded. TS-CSW is built on this observation and uses
convolutional sliding windows (CSW) of multiple sizes to extract the correlation
features of the text. Samples collected from the T-Steg dataset are used in the paper
to train and test the proposed steganalysis approach. The model not only performs well
in steganalysis but can also estimate the amount of secret information embedded in
the stego text.
Text Steganalysis Based on Long Short-Term Memory Networks (LSTM)
[19]. To strengthen the low-level features in the feature vector and make better use
of them when detecting steganographic information in generated text, paper [19]
introduces two components: dense connectivity and a feature pyramid. It proposes a
text steganalysis approach based on densely connected
long short-term memory networks with a feature pyramid. First, the words
in the text are mapped to a semantic space with hidden representations for better use
of semantic features; then the semantic features at different levels are extracted
using stacked bidirectional long short-term memory networks (Bi-LSTM); finally, the
semantic features at all levels are fused and a sigmoid layer decides whether the
text is steganographic or not. This approach achieves satisfying results.
Text Steganalysis Based on LSTM-CNN. In article [20], a hybrid text steganalysis
method (R-BILSTM-C) is proposed that combines the advantages of Bi-LSTM
and CNN. The method captures long-term semantic information of the text using Bi-LSTM
and extracts local relationships between words using asymmetric convolutional
kernels of different sizes. The detection accuracy is significantly improved.
Furthermore, the paper visualizes the high-dimensional semantic feature space. The
approach can be applied effectively to different text steganography algorithms.
Article [21] proposes an LSTM-CNN model for text steganalysis. First, the words
are mapped to a semantic space to make better use of the semantic features of the text;
then LSTM and CNN are combined to obtain local and long-range
contextual information in a stego text. In addition, the model employs an attention
mechanism to identify important cues in suspicious sentences. The model achieves
outstanding results in steganalysis tasks.
Text Steganalysis Based on Bi-LSTM-GNN [22]. A highly robust two-stage text
steganalysis model is proposed. In the first stage, Bi-LSTM is used
to obtain feature information for all words in a sentence while preserving the strong
correlations among them. In the second stage, the multi-sentence vectors are fed to a
graph neural network (GNN), which extracts anomalous features between sentences.
Moreover, article [22] adds adversarial examples to the training set to increase the
robustness and generalization of the steganalysis model. The experiments show that the
model not only has excellent robustness but is also quite effective at identifying
steganographic text.
Text Steganalysis Based on Capsule Network [23]. Capsule networks identify
the subtle differences between stego texts and normal texts by extracting and
preserving the semantic features of the texts. Article [23] uses capsule networks to
detect whether a text contains secret information: the text is vectorized
using word2vec, and steganographic text generated by RNNs and variable-length
encoding is used as the experimental dataset to enhance the generalization of the
method. Experiments show that the method reaches a 92% correct detection rate
for stego text at low embedding rates (1–3 bits/word), about 7% better
than other neural networks; at high embedding rates (4–5 bits/word), the
detection accuracy exceeds 94%.
3 Evaluation
From the above overview of text steganalysis since 2006, it can be seen that
text steganalysis has been steadily changing and improving, from
the early targeted steganalysis to the more versatile and effective blind steganalysis.
The advantages and disadvantages of five selected targeted text steganalysis methods
are listed in Table 1. From Table 1, it is clear that the algorithms based on
initial-letter probability distribution, contextual information, and synonym frequency
are simple and efficient; among them, the contextual information approach is simpler
and easier to implement than the other two. The Stego-based steganalysis
algorithm, however, relies on detecting the case of the initial letters of words,
which is more restrictive.
As for blind text steganalysis, it has developed and improved continuously, starting
from the CNN-based text steganalysis algorithm in [14]. As can be seen from Sect. 2.2,
blind steganalysis has kept improving, from the early use of machine
learning algorithms such as SVM to the use of deep learning algorithms such as
CNN, RNN, and LSTM, and the combinations of LSTM, CNN, and GNN that have
emerged in the last two years. The average detection accuracies of blind text
steganalysis for stego texts are listed in Table 2. Although deep learning improves
the performance of text steganalysis, the computational complexity and time cost of
the algorithms are also rising, which has become one of the issues to be solved in the
future.
Table 1 Comparative analysis of targeted text steganalysis [9]
No  Year  Method                                  Advantages                     Disadvantages
1   2006  Distribution of first letters of words  High recall and low error      Requires much time
2   2006  Stego                                   Simple                         Requires distribution of first letters
3   2011  Context information                     Simple and effective variants  Lack of vocabulary
4   2014  Evolution algorithm                     Supports text-based documents  Complexity of computation
5   2018  Synonym frequency                       High speed                     Complex
Table 2 Average detection accuracies of blind text steganalysis
No  Year  Method                                                 Accuracy
1   2007  Text steganalysis based on AdaBoost                    100%
2   2009  Text steganalysis based on statistical language model  Higher than 93.90%
3   2009  Text steganalysis based on SVM                         89.80%
4   2011  NFZ-WDA                                                Higher than 91.22%
5   2019  Text steganalysis based on CNN                         82.25%
6   2019  Text steganalysis based on RNN                         Higher than 90%
7   2019  Text steganalysis based on Word2vec                    97.71%
8   2020  TS-CSW                                                 Higher than 90%
9   2020  Text steganalysis based on LSTM                        90.61%
10  2020  Text steganalysis based on LSTM-CNN                    91.35%
11  2020  Text steganalysis based on Bi-LSTM-GNN                 Higher accuracy
12  2021  Text steganalysis based on capsule network             92% (1–3 bits/word); 94% (4–5 bits/word)
4 Conclusion
This paper reviews different types of text steganalysis algorithms since 2006,
including targeted steganalysis and blind steganalysis, and compares and analyses the
two categories. The study indicates that each steganalysis method has
its own advantages and disadvantages. We hope this paper can provide motivation
and assistance for future steganalysis research.
As far as current research trends are concerned, the development of NLP
has a significant impact on text steganography and text steganalysis, since most of
the latest algorithms are inspired by advanced NLP techniques. The most
pressing issue in text steganalysis is to enhance the effectiveness and robustness
of steganalysis while reducing model complexity. Therefore, in the near future,
building on a clear understanding of the development and current state of text
steganalysis, we will address its main problems, draw closely on the latest research
results in NLP, redesign the text steganalysis method, and strive to break through the
bottleneck of computational complexity and time cost mentioned in the previous section.
References
1. Ahvanooey, M., Li, Q., Hou, J., Rajput, A.R., Chen, Y.: Modern text hiding, text steganalysis,
and applications: a comparative analysis. Entropy 21, 355 (2019)
2. Chang, C., Clark, S.: Practical linguistic steganography using contextual synonym substitution
and a novel vertex coding method. Comput. Linguist. 40, 404–448 (2014)
3. Sui, X., Luo, H., Zhu, Z.: A steganalysis method based on the distribution of first letters of
words. IEEE Comput. Soc. 6, 369–372 (2006)
4. Wu, M., Jin, S.: Text steganalysis method—breaking steganographic utility of Stego. Computer
Eng. 32, 10–12 (2006)
5. Chen, Z., Huang, L., Miao, H., Yang, W., Meng, P.: Steganalysis against substitution-based
linguistic steganography based on context clusters. Comput. Electr. Eng. 37, 1071–1081 (2011)
6. Xiang, L., Yu, J., Yang, C., Zeng, D., Shen, X.: A word-embedding-based steganalysis method
for linguistic steganography via synonym substitution. IEEE Access 6, 64131–64141
(2018)
7. Puriwat, L.: A detection method for text steganalysis using evolution algorithm (EA) approach.
Adv. Comput. Sci., pp. 22–23 (2012)
8. Xiang, L., Sun, X., Luo, G., Xia, B.: Linguistic steganalysis using the features derived from
synonym frequency. Multimed. Tools Appl. 71, 1893–1911 (2014)
9. Lokman, S., Mustapha, A., Ismail, A., Din, R.: Analysis review on linguistic steganalysis.
Indones. J. Electr. Eng. Comput. Sci. 17, 950–956 (2019)
10. Sui, X., Shen, L., Yan, J., Zhu, Z.: Text steganalysis using AdaBoost. Tongxin Xuebao 28
(2007)
11. Meng, P., Hang, L., Yang, W., Chen, Z., Zheng, H.: Linguistic steganography detection
algorithm using statistical language model. Technol. Comput. Sci. 2, 540–543 (2009)
12. Xin, G., Hui, L., Zhong, Z.: Text steganalysis based on support vector machine. Comput. Eng.
35, 188–191 (2009)
13. Chen, Z., Huang, L., Meng, P., Yang, W.: Blind linguistic steganalysis against translation based
steganography. Lect. Notes Comput. Sci. 6526, 251–265 (2011)
14. Wen, J., Zhou, X., Zhong, P., Xue, Y.: Convolutional neural network based text steganalysis.
IEEE Signal Process. Lett. 26, 460–464 (2019)
15. Xiang, L., Guo, G., Yu, J., Sheng, V., Yang, P.: A convolutional neural network-based linguistic
steganalysis for synonym substitution steganography. Math. Biosci. Eng. 17, 1041–1058 (2020)
16. Yang, Z., Wang, K., Li, J., Huang, Y., Zhang, Y.: TS-RNN: text steganalysis based on recurrent
neural networks. IEEE Signal Process. Lett. 26, 1743–1747 (2019)
17. Yu, J., Xiang, L., Zeng, D.: Natural language steganalysis method based on Word2vec. Comput.
Eng. 45, 309–314 (2019)
18. Yang, Z., Huang, Y., Zhang, Y.: TS-CSW: text steganalysis and hidden capacity estimation
based on convolutional sliding windows. Multimed. Tools Appl. 79, 18293–18316 (2020)
19. Yang, H.: Linguistic steganalysis via densely connected LSTM with feature pyramid. 2020
Assoc. Comput. Mach. 20, 5–10 (2020)
20. Niu, Y., Wen, J., Zhong, P., Xue, Y.: A hybrid R-BILSTM-C neural network based text
steganalysis. IEEE Signal Process. Lett. 26, 1907–1911 (2019)
21. Bao, Y., Yang, H., Yang, Z., Liu, S., Huang, Y.: Text steganalysis with attentional LSTM-CNN.
In: 5th Int. Conf. Comput. Commun. Syst., pp. 138–142 (2020)
22. Li, E., Fu, Z., Chen, S., Chen, J.: A two-stage highly robust text steganalysis model. J. Cyber
Secur. 2, 183–190 (2020)
23. Li, H., Jin, S.: Text steganalysis based on capsule network with dynamic routing. IETE Tech.
Rev. 38, 72–81 (2021)
... Most of the previously used methods take the pixel LSB as the main target for the embedding and then make a specific change to suit the output. In our proposed method, we will study most of the attacks carried out by the hacker or the intruder and try to bypass the weaknesses and increase the gaps from which the stego image can be as innocent as possible [47]. Figure 3 shows the proposed method employs an eight-bit RGB cover image and uses a secret key while embedding. ...
Article
Full-text available
One of the highest priorities in the era of information technology is to achieve an accurate and effective system for hiding security data. One of the goals of steganography is imperceptability to intruder. So this paper work to increase the imperceptibility on image, which has weaknesses in previous studies, as well as to avoid statistical attacks such as chi-square. A method has been proposed that includes calculating the color contrasts in the homogeneous areas of the image and dividing them according to the color contrast and exploiting the data of pixels that have a high impact to embed on the two first and third bits of least significant bit (LSB) to increase the amount of embedded data, impact regions (IR) classify according to selected features extracted in advance by using the support vector machine (SVM) classifier. Work was done on standard images taken from a standard dataset (USC-SIPI) for two types of gray and color images. The results showed the worth of the proposed method through a high peak signal to noise ratio (PSNR) that reached 89.5 dB due to the distribution of data on pixels according to the proposed method
... One of the common methods to protect sensitive information against information destruction and attacks is steganography [1]. Steganography approach is the process of embedding a message in cover files such as voice, image, text, and video so that no significant visual damage is observed in the cover multimedia [46,49]. The main focus of the steganography algorithms is to achieve a high level of resistance versus attack. ...
Article
Full-text available
In this paper, a steganalysis algorithm is proposed based on Modified Graph Clustering Based Ant Colony Optimization (MGCACO) feature selection and Random Forest classifier. First, different features related to the steganalysis problem are extracted from each image, and then an optimal set of the extracted features is selected by using the MGCACO feature selection algorithm, and finally a trained classifier used to separate the clean images from the steganography images. Our proposed algorithm is compared with four steganography algorithms including least significant bit matching (LSB), highly undetectable steganography (HUGO), wavelet obtained weights (WOW) and spatial-universal relative wavelet distortion (S_UNIWARD) with different embedding rates such as 0.1, 0.2, 0.3 and 0.4. Moreover, as a new study, the types of steganography algorithms are identified by using the proposed algorithm. The results of the proposed algorithm show that our approach can distinguish between clean and steganography images acceptably and, in addition, this algorithm can detect the type of steganography algorithm with an average accuracy of 90%.
Conference Paper
Full-text available
The Most multimedia files, especially those containing private information are images. Since multimedia transmission takes place on public communication channels, it is more vulnerable to a wide range of threats as the internet community evolves. Every day, thousands of people upload and download millions of multimedia files. Steganography, the practice of concealing information inside another data stream so that only intended recipients can access it, is an alluring option for protecting the privacy of data transmissions. In this paper will be presented Different techniques for embedding and extracting a multimedia file, The most important methods that work on for improving secret data steganography that utilizes both steganographic and ciphering methods to create a highly secure system, making the data unreadable to hackers.
Article
With the rapid development of natural language processing (NLP) technology in the past few years, automatic steganographic text generation methods have advanced considerably. Benefiting from the powerful feature extraction and expression capabilities of neural networks, these methods can generate steganographic texts with both relatively high concealment and high hidden capacity at the same time. Against these steganographic methods, previous steganalysis models show unsatisfactory detection performance, which remains an unsolved problem and poses a great threat to the security of cyberspace. In this paper, we first collect a large text steganalysis (T-Steg) dataset, which contains a total of 396,000 texts with various embedding rates under various formats. Our analysis identifies three kinds of word correlation patterns in texts. We then propose a new text steganalysis model based on convolutional sliding windows (TS-CSW), which uses convolutional sliding windows (CSW) of multiple sizes to extract those correlation features. We observe that these word correlation features in the generated steganographic texts are distorted after secret information is embedded. These subtle changes in the distribution of correlation features can then be used for text steganalysis. We use the samples collected in the T-Steg dataset to train and test the proposed steganalysis method. Experimental results show that the proposed model not only achieves high steganalysis performance, but can even estimate the amount of secret information embedded in the generated steganographic texts, which represents state-of-the-art performance.
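The idea of sliding windows of multiple sizes can be sketched in a few lines. This is a minimal illustration only, not the TS-CSW model: the function names, the toy embeddings and the hand-written kernels are all hypothetical, and a real model would learn its kernels during training. Each kernel of height k is slid over k consecutive word vectors, and the strongest response is kept (max-over-time pooling).

```python
from typing import List, Sequence

Vector = List[float]


def conv_window_features(embeddings: Sequence[Vector],
                         kernel: Sequence[Vector]) -> List[float]:
    """Slide one kernel of height k over k consecutive word vectors."""
    k = len(kernel)
    outputs = []
    for start in range(len(embeddings) - k + 1):
        window = embeddings[start:start + k]
        # Element-wise product of kernel and window, summed, then ReLU.
        activation = sum(w * x
                         for w_row, x_row in zip(kernel, window)
                         for w, x in zip(w_row, x_row))
        outputs.append(max(0.0, activation))
    return outputs


def multi_size_features(embeddings: Sequence[Vector],
                        kernels: Sequence[Sequence[Vector]]) -> List[float]:
    """One max-pooled feature per kernel; kernels may differ in height."""
    features = []
    for kernel in kernels:
        activations = conv_window_features(embeddings, kernel)
        # A text shorter than the window yields no activations at all.
        features.append(max(activations) if activations else 0.0)
    return features


# Toy 2-dimensional "word embeddings" for a three-word text (hypothetical).
toy_text = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_kernels = [
    [[1.0, 1.0]],               # window size 1
    [[1.0, 0.0], [0.0, 1.0]],   # window size 2
]
features = multi_size_features(toy_text, toy_kernels)
```

The resulting feature vector has one entry per kernel regardless of text length, which is what lets a downstream classifier handle texts of different sizes.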
Article
Steganography and steganalysis are essential topics in information hiding. Steganography is a technique for concealing secret messages by transmitting data through different domains. Its objective is to avoid discovery of secret messages. Steganalysis, meanwhile, is a method for locating the secret messages contained in the stego text. The objective of steganalysis is to find concealed data and to break the security of its domains. Steganalysis can be categorized into two types: targeted steganalysis and blind steganalysis. Steganography and steganalysis both have domains that are split into natural (also known as linguistic) and digital media. There are three kinds of digital media: picture, video and audio. The aim of this paper is to provide a survey of different linguistic steganalysis techniques used to find secret messages. The paper also highlights two types of steganalysis methods that are used in research and real practice. The discussion includes findings on the most recent work on linguistic steganalysis techniques. This review is intended to help future research on improving and enhancing steganalytic capabilities.
Article
Modern text hiding is an intelligent programming technique which embeds a secret message/watermark into a cover text message/file in a hidden way to protect confidential information. Recently, text hiding in the form of watermarking and steganography has found broad applications in, for instance, covert communication, copyright protection and content authentication. In contrast to text hiding, text steganalysis is the process and science of identifying whether a given carrier text file/message has hidden information in it and, if possible, extracting/detecting the embedded hidden information. This paper presents an overview of the state of the art in text hiding and provides a comparative analysis of recent techniques, especially those focused on marking structural characteristics of a digital text message/file to hide secret bits. We also discuss different types of attacks and their effects to highlight the pros and cons of the recently introduced approaches. Finally, we recommend some directions and guidelines for future work.
Article
With the growth of natural language processing technology, coverless text steganography has attracted the attention of a large number of researchers. Most existing text steganalysis methods use traditional neural networks to extract and analyze the semantic features of automatically generated steganographic text. However, because traditional neural networks are limited in their ability to preserve subtle features, these methods cannot obtain satisfactory results when detecting the differences between steganographic text with a low embedding rate and natural text. This paper demonstrates that a capsule network can detect whether a text contains secret information with robust and accurate performance. The capsule network extracts and preserves the semantic features of text and analyzes the subtle differences between steganographic text and natural text. To strengthen the generalization of the method, we choose word2vec to vectorize text and use steganographic text generated based on RNN and variable-length coding as the data set for experiments. Experimental results show that the detection accuracy of our method reaches 92% on steganographic text with a low embedding rate (1–3 bit/word), which is about 7% higher than that of methods based on other neural networks; at high embedding rates (4–5 bit/word), the detection accuracy exceeds 94%.
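One reason capsule networks are said to preserve subtle features is that each capsule outputs a vector rather than a scalar, with the vector's length acting as a confidence and its direction carrying the feature content. The sketch below shows the standard "squash" non-linearity from the original capsule network formulation; whether the method summarized above uses exactly this form is an assumption, and the `eps` parameter is added here only to avoid division by zero.

```python
import math
from typing import List


def squash(s: List[float], eps: float = 1e-9) -> List[float]:
    """Shrink a capsule vector to length in [0, 1) while keeping its direction.

    v = (|s|^2 / (1 + |s|^2)) * (s / |s|)
    """
    sq_norm = sum(x * x for x in s)
    norm = math.sqrt(sq_norm)
    # eps guards the zero vector; it is an illustrative addition.
    scale = sq_norm / (1.0 + sq_norm) / (norm + eps)
    return [scale * x for x in s]


def length(v: List[float]) -> float:
    """Euclidean length, read as the capsule's activation strength."""
    return math.sqrt(sum(x * x for x in v))


strong = squash([3.0, 4.0])   # long input: length approaches 1
weak = squash([0.1, 0.0])     # short input: length stays near 0
```

Short vectors are pushed toward zero and long vectors toward unit length, so small but consistent differences in direction survive the non-linearity instead of being discarded by a pooling step.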
Article
In this paper, a linguistic steganalysis method based on two-level cascaded convolutional neural networks (CNNs) is proposed to improve the system's ability to detect stego texts, which are generated via synonym substitutions. The first-level network, sentence-level CNN, consists of one convolutional layer with multiple convolutional kernels in different window sizes, one pooling layer to deal with variable sentence lengths, and one fully connected layer with dropout as well as a softmax output, such that two final steganographic features are obtained for each sentence. The unmodified and modified sentences, along with their words, are represented in the form of pre-trained dense word embeddings, which serve as the input of the network. Sentence-level CNN provides the representation of a sentence, and can thus be utilized to predict whether a sentence is unmodified or has been modified by synonym substitutions. In the second level, a text-level CNN exploits the predicted representations of sentences obtained from the sentence-level CNN to determine whether the detected text is a stego text or cover text. Experimental results indicate that the proposed sentence-level CNN can effectively extract sentence features for sentence-level steganalysis tasks and reaches an average accuracy of 82.245%. Moreover, the proposed steganalysis method achieves greatly improved detection performance when distinguishing stego texts from cover texts.
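The data flow of the two-level cascade can be sketched as follows. This is an illustration under stated assumptions, not the authors' networks: the sentence-level CNN is replaced by pre-computed two-class logits, and the text-level CNN is replaced by a plain mean-and-threshold rule. All function names and the `threshold` parameter are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]


def softmax2(logits: Pair) -> Pair:
    """Numerically stable softmax over (unmodified, modified) logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return exps[0] / total, exps[1] / total


def sentence_level(sentence_logits: Sequence[Pair]) -> List[Pair]:
    """Level 1: two steganographic features (probabilities) per sentence."""
    return [softmax2(logits) for logits in sentence_logits]


def text_level(sentence_features: Sequence[Pair],
               threshold: float = 0.5) -> str:
    """Level 2 stand-in: average the 'modified' probability over sentences."""
    if not sentence_features:
        raise ValueError("text must contain at least one sentence")
    mean_modified = (sum(p_mod for _, p_mod in sentence_features)
                     / len(sentence_features))
    return "stego" if mean_modified > threshold else "cover"


# Hypothetical logits that a sentence-level network might emit.
suspicious_text = [(0.0, 2.0), (0.0, 1.0)]
clean_text = [(2.0, 0.0), (1.0, 0.0)]
verdicts = (text_level(sentence_level(suspicious_text)),
            text_level(sentence_level(clean_text)))
```

The point of the cascade is that the second level never sees raw words, only the per-sentence features produced by the first level.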
Article
With the emergence of generation-based steganography, traditional text steganalysis methods show unsatisfactory detection performance because their manually extracted features are simple and non-universal. Recently proposed deep learning-based text steganalysis methods can obtain high detection accuracy by extracting high-level features. In this paper, a hybrid text steganalysis method (R-BILSTM-C) is proposed that combines the advantages of the Bidirectional Long Short Term Memory Recurrent Neural Network (Bi-LSTM) and the Convolutional Neural Network (CNN). The proposed method can efficiently capture both local features and long-term semantic information from text to improve the detection accuracy. In the proposed method, the Bi-LSTM architecture is used to capture the long-term semantic information of texts, and asymmetric convolution kernels of different sizes are applied to extract the local relationships between words. In addition, the high-dimensional semantic feature space is visualized. Experimental results show that the proposed method adapts efficiently to different steganographic algorithms and achieves comparable or superior detection performance across various sentence lengths compared with other state-of-the-art text steganalysis methods.
Article
With the rapid development of natural language processing technologies, more and more text steganographic methods based on automatic text generation have appeared in recent years. These models use the powerful self-learning and feature extraction abilities of neural networks to learn the feature expression of massive amounts of normal text. They can then automatically generate dense steganographic texts that conform to the learned statistical distribution. In this paper, we observe that the conditional probability distribution of each word in the automatically generated steganographic texts is distorted after hidden information is embedded. We use Recurrent Neural Networks (RNNs) to extract these differences in feature distribution and then classify the features into cover text and stego text categories. Experimental results show that the proposed model can achieve high detection accuracy. In addition, the proposed model can use the subtle differences in the feature distribution of texts to estimate the amount of hidden information embedded in the generated steganographic text.
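The observation about distorted conditional probabilities can be made concrete with a much simpler model than the RNN used in the work summarized above. The sketch below swaps in an add-one-smoothed bigram model, estimated from a tiny hypothetical cover corpus, and scores a sentence by the mean log conditional probability of its words. Every name and the toy corpus are assumptions for illustration only.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Model = Tuple[Counter, Counter, int]
START = "<s>"


def train_bigram(corpus: Sequence[List[str]]) -> Model:
    """Count contexts and bigrams over sentences from normal (cover) text."""
    contexts: Counter = Counter()
    bigrams: Counter = Counter()
    vocab = set()
    for sentence in corpus:
        tokens = [START] + sentence
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    return contexts, bigrams, len(vocab)


def cond_probs(sentence: List[str], model: Model) -> List[float]:
    """P(word | previous word) for each word, with add-one smoothing."""
    contexts, bigrams, vocab_size = model
    tokens = [START] + sentence
    return [(bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
            for prev, cur in zip(tokens, tokens[1:])]


def mean_log_prob(sentence: List[str], model: Model) -> float:
    """Average log conditional probability; lower means less natural."""
    probs = cond_probs(sentence, model)
    if not probs:
        raise ValueError("sentence must contain at least one word")
    return sum(math.log(p) for p in probs) / len(probs)


cover_corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
model = train_bigram(cover_corpus)
natural_score = mean_log_prob(["the", "cat", "sat"], model)
distorted_score = mean_log_prob(["sat", "the", "cat"], model)
```

A sentence whose word choices were forced by embedded bits tends to receive lower conditional probabilities than natural text, so a score like this (or the full per-word probability sequence) can serve as a steganalysis feature.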