An Overview of Text Steganalysis

March 2022

DOI:10.1007/978-981-16-6963-7_82

In book: The International Conference on Image, Vision and Intelligent Systems (ICIVIS 2021) (pp.933-943)

Authors:

Juan Wen

China Agricultural University

Yu Yang, Lei Zha, Ziwei Zhang, and Juan Wen

Abstract With the rapid development of the Internet, network information security is increasingly under threat. Text steganography is one of the key factors affecting information content security. It aims to hide confidential information in text carriers within a covert communication system. If text steganography is used by criminals, malicious information can easily be transmitted over the Internet without being discovered by a third party. Text steganalysis counters this problem by detecting whether a text carrier contains secret information. This paper presents an overview of text steganalysis methods published since 2006. We discuss the basic text steganalysis model and compare the pros and cons of these algorithms, hoping to offer some insights and motivation for future research directions.

Keywords Information security · Text steganography · Text steganalysis

1 Introduction

With the rapid development of information technology, an open environment for information transmission and sharing has quickly taken shape. While making people's lives more convenient, it brings a series of security risks. For example, network information is susceptible to malicious attacks, illegal access, falsification, plagiarism, etc. [1]. How to ensure the security of multimedia data has become a significant topic that urgently needs to be resolved in the domain of information security.

Steganography is the art and technology of concealing confidential information in multimedia carriers. Modern steganography exploits the redundancy of human perception, the statistical redundancy of multimedia data, and other characteristics to embed secret information in a public data carrier by means of some coding scheme or encryption. The cover used to hide the secret message is generally a multimedia file transmitted over the network, such as video, audio, image, or text. Among them, text has become an important cover due to its fast transmission and convenient access. In particular, with the rapid development of natural language processing (NLP), text steganography has been greatly developed and refined. Currently, text steganography is extensively used in confidential communication, copyright protection, content identification, etc.

Y. Yang · L. Zha · Z. Zhang · J. Wen (corresponding author)

College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China

e-mail: wenjuan@cau.edu.cn

J. Yao et al. (eds.), The International Conference on Image, Vision and Intelligent Systems (ICIVIS 2021), Lecture Notes in Electrical Engineering 813, https://doi.org/10.1007/978-981-16-6963-7_82

Unlike text steganography, text steganalysis determines whether a given text contains hidden communication and, when possible, extracts the embedded secret information. In recent years, text steganalysis has become a vital research topic in information security, as one of the effective ways to prevent criminals from maliciously using text steganography for illegal activities. In addition, it helps safeguard secure and covert communication, and has important applications in military, intelligence, and government security departments, such as detecting and jamming enemy communication signals, blocking information sources, and conducting information reconnaissance and disruption against an adversary. Almost all information embedding algorithms inevitably change the statistical characteristics of the carrier. The core idea of steganalysis is to use a statistical machine learning algorithm to model and detect the subtle differences caused by information embedding, so as to identify suspicious stego text.

The basic model of text steganography and text steganalysis is shown in Fig. 1.

Fig. 1 Basic model of text steganography and text steganalysis [2]

Owing to the significance of text steganalysis in Internet security, it is essential to review and summarize the mainstream text steganalysis methods of recent years. This paper outlines the state of research in text steganalysis starting from 2006. Furthermore, we summarize, compare, and analyse some of these algorithms.

The rest of this article is organized as follows. In Sect. 2, different types of text steganalysis are reviewed, and the techniques and concepts involved in each type are described in detail. In Sect. 3, a comparative analysis of the techniques and approaches is made. Finally, the conclusion is drawn in Sect. 4.

2 Classification of Text Steganalysis Algorithms

In this section, the two classes of text steganalysis, targeted steganalysis and blind steganalysis, are introduced in detail.

2.1 Targeted Text Steganalysis

Targeted text steganalysis is designed to detect one specific text steganography algorithm. That is, the detection algorithm knows which text steganography method was used to embed the secret information. Thus, targeted steganalysis excels at detecting that specific algorithm. However, its performance may degrade sharply when it faces other steganography algorithms.

The common statistical features used for targeted steganalysis include word-initial distribution, alphabetic case, contextual information, evolutionary features, and synonym frequency.

Distribution of First Letters of Words [3]. In stego text generated by context-free steganography, words occur randomly, and the probability of a word appearing in each segment of the text depends only on the probabilities in the local region. In contrast, in natural text, words do not occur randomly, and the process of word generation can be viewed as an nth-order Markov process [2]. As a result, the probability distribution of word initials in natural texts differs markedly from that in context-free texts, as shown in Fig. 2.
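The comparison of word-initial distributions can be sketched in a few lines of Python. This is only an illustration of the idea, not the detector of [3]: the helper names, the toy sentences, and the use of a smoothed KL divergence as the distance measure are all assumptions made here.

```python
import math
import string
from collections import Counter


def first_letter_distribution(text):
    """Return the relative frequency of each initial letter a-z in `text`."""
    initials = [w[0].lower() for w in text.split() if w and w[0].isalpha()]
    counts = Counter(c for c in initials if c in string.ascii_lowercase)
    total = sum(counts.values())
    if total == 0:
        return {c: 0.0 for c in string.ascii_lowercase}
    return {c: counts[c] / total for c in string.ascii_lowercase}


def kl_divergence(p, q, eps=1e-9):
    """Smoothed KL divergence D(p || q) between two letter distributions."""
    return sum(
        (p[c] + eps) * math.log((p[c] + eps) / (q[c] + eps))
        for c in string.ascii_lowercase
    )


# Hypothetical reference profile built from a sample assumed to be natural text.
reference = first_letter_distribution(
    "the cat sat on the mat and the dog ran to the tree by the house"
)
# A word sequence with unusual initials, standing in for context-free output.
suspect = first_letter_distribution("zebra quilt xylophone jazz quartz vortex")
# Another sample assumed to be natural text.
same = first_letter_distribution("the dog sat by the tree and the cat ran")

print(kl_divergence(suspect, reference) > kl_divergence(same, reference))
```

In practice the reference profile would come from a large natural-language corpus, and the decision threshold on the distance would have to be tuned on labelled data.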

Stego [4]. Stego is a text steganography tool that uses dictionaries to transform a secret message into grammar-free text whose layout resembles normal text for steganographic communication. By studying the mechanism of Stego, the paper [4] proposes a steganalysis method targeting Stego. When the dictionary words used for steganography all start with lowercase letters, the stego text can be detected by a steganalysis method based on sign features. Otherwise, the stego text is detected by a steganalysis method based on statistical features.

Fig. 2 Distribution of probabilities of natural text and context-free text. (a) Probability distribution of a natural text. (b) Probability distribution of a context-free text

Context Information. Article [5] introduces the concept of context clustering to estimate the contextual fitness of a text and shows how to distinguish ordinary text from stego text by measuring the contextual fitness values of the text. A Substitution-based Linguistic Steganography (SLS) system embeds a message by replacing an original element in the cover text with a substitute element from the same substitution set. This substitution may result in the new element not fitting the original context well. Based on this observation, the paper proposes a steganalysis scheme for SLS; the specific process is shown in Fig. 3. Following that, a steganalysis method for synonym-substitution steganography is built on this scheme. The average accuracy of this text steganalysis approach is 98.86%.

Fig. 3 Steganalysis directed at substitution-based text steganography [5]. SI: Substitution Information; CI: Context Information; λ: Context Maximum Rate; θ: Context Maximum Deviation

Article [6] proposes a word embedding-based approach to detect secret information in a text. The method uses a continuous Skip-gram model to represent synonyms and their contextual words as word embeddings, encoding word semantics as low-dimensional dense vectors; the embeddings of the synonyms are used to estimate how well each synonym fits its context, weighted by the TF-IDF scores of the contextual words. By analysing the differences among the contextual fitness scores of the synonyms within a synonym set, and the differences between the contextual fitness values of synonyms in cover text and stego text, three features are extracted and then fed to a support vector machine (SVM) classifier for steganalysis. The reported improvement of the proposed technique exceeds 4.8%.
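A TF-IDF-weighted contextual fitness score of the kind described above might look as follows. This is a hedged sketch, not the feature definition of [6]: the function names, the two-dimensional toy vectors, and the TF-IDF weights are invented for illustration, and a real system would use trained Skip-gram embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def context_fitness(word, context, embeddings, tfidf):
    """TF-IDF-weighted mean cosine similarity between `word` and its context."""
    pairs = [(c, tfidf.get(c, 0.0)) for c in context if c in embeddings]
    total_weight = sum(w for _, w in pairs)
    if word not in embeddings or total_weight == 0:
        return 0.0
    return sum(
        w * cosine(embeddings[word], embeddings[c]) for c, w in pairs
    ) / total_weight


# Toy 2-d vectors and weights, invented for illustration only.
embeddings = {
    "big": [0.9, 0.1], "large": [0.8, 0.2], "grown": [0.1, 0.9],
    "house": [0.85, 0.15], "garden": [0.7, 0.3],
}
tfidf = {"house": 2.0, "garden": 1.0}
context = ["house", "garden"]

# Score every member of a hypothetical synonym set against the same context.
scores = {s: context_fitness(s, context, embeddings, tfidf)
          for s in ["big", "large", "grown"]}
print(scores)
```

A synonym whose score is far below that of its set-mates (here "grown") is the kind of poorly fitting substitute that such features are meant to expose.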

Evolution Algorithm [7]. Article [7] proposes an evolutionary detection steganalysis system (EDSS) based on the evolutionary algorithm of the Java Genetic Algorithm Package (JGAP). The results of the EDSS can be classified into good adaptation and bad adaptation according to the adaptation value.

Synonym Frequency [8]. Article [8] proposes a steganalysis method for text steganography based on synonym substitution (SS). First, attribute pairs are introduced for synonyms, representing each synonym's position in its ordered synonym set and the size of that set. As a result of synonym substitution, the number of high-frequency attribute pairs decreases whereas the number of low-frequency attribute pairs increases. Based on this, the changes in the statistical features of attribute pairs caused by SS steganography are analysed theoretically, and secret information is detected using feature vectors built on the relative frequency differences of the various attribute pairs. The paper also analyses the impact of the synonym encoding strategy on feature vector extraction.
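The attribute-pair idea can be illustrated with a small sketch. It assumes that each synonym set is ordered from most to least frequent and that an attribute pair is (position in set, set size); the synonym sets, sentences, and function name are hypothetical and do not reproduce the exact features of [8].

```python
from collections import Counter

# Hypothetical synonym sets, each ordered from most to least frequent.
SYNSETS = [
    ["big", "large", "sizable"],
    ["buy", "purchase"],
]
# Map each synonym to its attribute pair (position in set, size of set).
ATTRIBUTE = {
    word: (pos, len(synset))
    for synset in SYNSETS
    for pos, word in enumerate(synset)
}


def attribute_pair_frequencies(text):
    """Relative frequency of each (position, set size) pair among synonyms."""
    pairs = [ATTRIBUTE[w] for w in text.lower().split() if w in ATTRIBUTE]
    counts = Counter(pairs)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}


cover = attribute_pair_frequencies("we buy a big house and a big car")
stego = attribute_pair_frequencies("we purchase a sizable house and a big car")
print(cover)
print(stego)
```

In this toy case the share of the most frequent pair (0, 3) drops after substitution while less frequent pairs such as (2, 3) appear, which mirrors the shift in relative frequencies that the method relies on.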

2.2 Blind Text Steganalysis

Blind steganalysis does not depend on a specific steganographic algorithm. As a result, it suits a wider range of applications and requirements. Embedding secret information in normal text more or less changes the content of the text, introducing statistical differences in normal textual features. Therefore, the key step in blind steganalysis is to model these subtle differences [9]. As Fig. 4 shows, feature extraction and text classification are the two stages of blind text steganalysis. Next, we introduce the current mainstream blind steganalysis algorithms, grouped by model type.

Text Steganalysis Based on AdaBoost [10]. Article [10] points out that embedding secret information introduces statistical changes into the text. Based on this, a general detection algorithm based on AdaBoost is put forward, which extracts statistical features of the text and distinguishes natural texts from stego texts.

AdaBoost correctly recognizes all texts at embedding rates of 2% and 4%, and the recognition rate is also 100% under the other conditions. The experiment shows that AdaBoost is almost unaffected by the embedding rate, reflecting its superior classification performance.
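For readers unfamiliar with the classifier, the following is a minimal AdaBoost sketch with one-feature threshold stumps as weak learners. It is not the detector of [10]: the two-dimensional feature vectors stand in for unspecified text statistics, and the data, labels (+1 for stego, -1 for natural), and function names are assumptions.

```python
import math


def train_adaboost(X, y, rounds=5):
    """Train AdaBoost over one-feature threshold stumps; labels are +1 / -1."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []  # entries: (alpha, feature index, threshold, polarity)
    for _ in range(rounds):
        best = None
        # Exhaustively search for the stump with the lowest weighted error.
        for j in range(len(X[0])):
            for thr in sorted({row[j] for row in X}):
                for pol in (1, -1):
                    preds = [pol if row[j] >= thr else -pol for row in X]
                    err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, preds)
        err, j, thr, pol, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, thr, pol))
        # Raise the weight of misclassified samples, then renormalize.
        weights = [w * math.exp(-alpha * t * p)
                   for w, t, p in zip(weights, y, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (pol if row[j] >= thr else -pol)
                for a, j, thr, pol in ensemble)
    return 1 if score >= 0 else -1


# Toy feature vectors, invented for illustration only.
X = [[0.10, 0.8], [0.15, 0.3], [0.20, 0.6], [0.60, 0.4], [0.70, 0.9], [0.80, 0.2]]
y = [-1, -1, -1, 1, 1, 1]
model = train_adaboost(X, y)
print([predict(model, row) for row in X])
```

The design choice worth noting is the reweighting step: each round concentrates on the samples the previous stumps got wrong, which is what lets a combination of very weak rules reach high accuracy.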

Fig. 4 Standard blind text steganalysis phases

Text Steganalysis Based on Statistical Language Model [11]. In article [11], a text steganalysis algorithm based on a statistical language model is proposed, which classifies a given text segment as natural text or stego text according to its complexity. The algorithm achieves 96.3% recognition accuracy for stego and natural text segments when the segment size is 5 K, and more than 93.9% accuracy when the segment size is 2 K. In addition, the experiments covered the NICETEXT system, the TEXTO system, and text generated with a Markov chain, and achieved good results on all of them.
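One common way to score a text with a statistical language model is perplexity, and the sketch below assumes that the "complexity" mentioned above can be approximated this way. The add-one smoothed bigram model, the tiny training corpus, and the function names are illustrative choices, not the model used in [11].

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Count unigrams and bigrams from a token list assumed to be natural text."""
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))


def perplexity(tokens, unigrams, bigrams):
    """Perplexity of `tokens` under an add-one smoothed bigram model."""
    vocab = len(unigrams) + 1  # one extra slot for unseen words
    log_prob = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab)
        log_prob += math.log(p)
    n = max(len(tokens) - 1, 1)
    return math.exp(-log_prob / n)


# Tiny stand-in for a large natural-language training corpus.
corpus = "the cat sat on the mat and the dog sat on the rug".split()
uni, bi = train_bigram(corpus)

natural = "the dog sat on the mat".split()
shuffled = "mat the on sat dog the".split()  # same words, unnatural order
print(perplexity(natural, uni, bi) < perplexity(shuffled, uni, bi))
```

A detector built on this idea would compare the score of a segment against a threshold learned from labelled natural and stego segments; longer segments give more stable estimates, which is consistent with the higher accuracy reported for 5 K segments.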

Text Steganalysis Based on SVM [12]. Article [12] proposes an SVM-based hidden information detection algorithm. The SVM classifier is trained on normal text and a small sample of text carrying hidden information, and its generalization ability is then used to classify unknown text. The model shows good generalization performance, and the SVM classifier achieves an excellent classification result.

Natural Frequency Zoned Word Distribution Analysis (NFZ-WDA) [13]. Translation-based steganography (TBS) is a secure text steganography technique that encodes secret information using the noise generated by the translation of natural language text. The NFZ-WDA method proposed in article [13] aims to detect TBS without using any TBS-related information. The only resource the method relies on is a natural frequency lexicon, a word frequency dictionary obtained from a large corpus. NFZ-WDA uses natural frequency zones (NFZs) to refine word distribution features. Since the refined word distribution features retain more structural information, the improved method can analyse stego text generated by TBS more effectively. To verify the validity of the NFZ-WDA method, the paper carries out experiments with two-class and multi-class SVM classifiers. The results show that the accuracy of both detections is comparatively high and increases with text size. Thus, this text steganalysis method has good application prospects.
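The zoning step can be pictured with the loose sketch below, which only shows how a frequency lexicon might partition words into zones and turn a text into a fixed-length feature vector. The lexicon values, the zone cut-offs, and the choice of "share of tokens per zone" as the feature are all assumptions made for illustration; the actual word distribution features of [13] are not reproduced here.

```python
from collections import Counter

# Hypothetical natural frequency lexicon: occurrences per million words.
LEXICON = {"the": 60000, "of": 30000, "house": 500, "garden": 120, "gazebo": 3}
ZONE_BOUNDS = [1000, 100]  # assumed cut-offs: high >= 1000 > mid >= 100 > low


def zone_of(word):
    """Map a word to a frequency zone index; unknown words form the last zone."""
    if word not in LEXICON:
        return len(ZONE_BOUNDS) + 1
    freq = LEXICON[word]
    for zone, bound in enumerate(ZONE_BOUNDS):
        if freq >= bound:
            return zone
    return len(ZONE_BOUNDS)


def zone_features(text):
    """Share of tokens falling in each zone, as a fixed-length feature vector."""
    tokens = text.lower().split()
    counts = Counter(zone_of(t) for t in tokens)
    n_zones = len(ZONE_BOUNDS) + 2
    return [counts[z] / len(tokens) if tokens else 0.0 for z in range(n_zones)]


print(zone_features("the garden of the house"))
```

Vectors like this could then be passed to an SVM, in line with the two-class and multi-class experiments described above.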

Text Steganalysis Based on Convolutional Neural Network (CNN). Article [14] proposes a CNN-based model for text steganalysis that captures complex dependencies and automatically learns text feature representations. A decision strategy for detecting long texts is also proposed to further boost performance. First, the word embedding layer extracts the semantic and syntactic features of words. Second, rectangular convolution kernels of different sizes are used to learn sentence features. The method is not only effective against different types of text steganography algorithms but also achieves excellent results on texts of different sizes.
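The role of rectangular kernels of different heights can be shown with a dependency-free sketch of a convolution followed by ReLU and max-over-time pooling. It illustrates the general text-CNN building block only, not the architecture of [14]: the embeddings and kernel weights are made-up numbers, whereas in a real model both are learned.

```python
def conv_max_pool(embedded, kernel):
    """Slide `kernel` (h x d weights) over the sentence and max-pool the result."""
    h = len(kernel)
    responses = []
    for start in range(len(embedded) - h + 1):
        window = embedded[start:start + h]
        act = sum(w * x for k_row, e_row in zip(kernel, window)
                  for w, x in zip(k_row, e_row))
        responses.append(max(act, 0.0))  # ReLU
    return max(responses) if responses else 0.0


# Toy 2-d embeddings for a 4-word sentence; all numbers are illustrative.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
kernels = [
    [[1.0, -1.0], [1.0, 1.0]],             # height 2: spans two words
    [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],  # height 3: spans three words
]
# One pooled feature per kernel, regardless of sentence length.
features = [conv_max_pool(sentence, k) for k in kernels]
print(features)
```

Because each kernel covers the full embedding width and a fixed number of words, kernels of different heights respond to word patterns of different lengths, and pooling makes the resulting feature vector independent of sentence length.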

Article [15] proposes a two-stage CNN-based method for text steganalysis. The first stage is a sentence-level CNN, consisting of a convolutional layer containing multiple convolutional kernels with different window sizes, a pooling layer, a fully connected layer with dropout, and a softmax output. In this way, the network not only handles variable-length sentences but also obtains two steganographic features per sentence. The second stage is a text-level CNN that uses the output of the first stage to determine whether the detected text is steganographic or not. The average accuracy of this approach is 82.245%.

Text Steganalysis Based on Recurrent Neural Networks (RNN) [16]. In automatically generated stego text, embedding hidden information distorts the conditional probability distribution. Based on this, paper [16] proposes a text steganalysis algorithm that uses an RNN to extract these differences in feature distribution and then classifies the text as cover text or steganographic text. The experimental results show that the model not only has high detection accuracy but can also use the subtle differences in text feature distributions to estimate the amount of information embedded in the generated stego text.

Text Steganalysis Based on Word2vec [17]. A Word2vec-based approach to text steganalysis is proposed in [17]. First, a multi-dimensional word vector containing rich semantic information is trained for each word using the distributed word representation tool Word2vec. Then, to measure how well a synonym suits a particular context, the correlation between two words is measured by the cosine distance between the vector of the synonym and the vectors of its context words, which yields the detection features. Finally, the extracted detection features are fed into a Bayesian estimation model for training and testing. The average detection accuracy of the approach reaches 97.71% for stego texts with different embedding rates, indicating very good detection performance.

Text Steganalysis Based on Convolutional Sliding Windows (TS-CSW) [18]. Word association features in stego text are distorted after a confidential message is embedded. TS-CSW exploits this change, using convolutional sliding windows (CSW) of multiple sizes to extract the relevant features of the text. Samples collected from the T-Steg dataset are used in the paper to train and test the proposed steganalysis approach. The model not only performs well in steganalysis but can also estimate the amount of secret information embedded in the stego text.

Text Steganalysis Based on Long Short-Term Memory Networks (LSTM) [20]. To enhance the low-level features in the feature vector and make better use of them when detecting steganographic information in generated text, paper [20] introduces two components, dense connectivity and a feature pyramid, and proposes a text steganalysis approach based on densely connected long short-term memory networks with a feature pyramid. First, the words in the text are mapped to a semantic space with hidden representations for better use of semantic features; then, semantic features at different levels are extracted using stacked bidirectional long short-term memory networks (Bi-LSTM); finally, the semantic features at all levels are fused and a sigmoid layer decides whether the text is steganographic or not. This approach achieves satisfactory results.

Text Steganalysis Based on LSTM-CNN. In article [21], a hybrid text steganalysis method (R-BILSTM-C) is proposed by combining the advantages of Bi-LSTM and CNN. The method captures long-term semantic information of text using Bi-LSTM and extracts local relationships between words using asymmetric convolutional kernels of different sizes, which considerably increases detection accuracy. Furthermore, the paper visualizes the high-dimensional semantic feature space. The approach can be applied effectively to different text steganography algorithms.

Article [22] proposes an LSTM-CNN model for text steganalysis. First, the words are mapped to a semantic space to better utilize the semantic features of the text; then LSTM and CNN are combined to obtain both local and long-range contextual information in a stego text. In addition, the model employs an attention mechanism to identify important cues in suspicious sentences. The model achieves outstanding results in steganalysis tasks.

Text Steganalysis Based on Bi-LSTM-GNN [23]. A highly robust two-stage text steganalysis model is proposed. In the first stage, Bi-LSTM is used to obtain feature information for all words in a sentence while preserving strong correlation among them. In the second stage, multi-sentence vectors are fed into a graph neural network (GNN), from which anomalous features between sentences are extracted. Moreover, article [23] adds adversarial examples to the training set to increase the robustness and generalization of the steganalysis model. The experiments reveal that the model not only has excellent robustness but is also quite effective at identifying steganographic text.

Text Steganalysis Based on Capsule Network [19]. Capsule networks identify the subtle differences between stego texts and normal texts by extracting and preserving the semantic features of the texts. Article [19] uses capsule networks to detect whether a text contains secret information: the text is vectorized using word2vec, and steganographic text generated by RNNs with variable-length encoding is used as the experimental dataset to improve the generalization of the method. Experiments reveal that the method reaches a 92% correct detection rate for stego text at a lower embedding rate (1–3 bits/word), about 7% better than other neural networks; at a high embedding rate (4–5 bits/word), the detection accuracy exceeds 94%.

3 Evaluation

From the above overview of text steganalysis since 2006, it can be seen that the field has been steadily changing and improving, from the early targeted steganalysis to the more versatile and effective blind steganalysis.

The advantages and disadvantages of five selected targeted text steganalysis methods are listed in Table 1. From Table 1, it is clear that the algorithms based on initial letter probability distribution, contextual information, and synonym frequency are simple and efficient; among them, the contextual information approach is simpler and easier to implement than the other two. The Stego-based steganalysis algorithm, however, relies on detecting the case of the initial letters of words, which is more restrictive.

As for blind text steganalysis, it has developed and improved continuously, in particular since the CNN-based text steganalysis algorithm in [14]. As can be seen from Sect. 2.2 of this paper, blind steganalysis has been getting better, from the early use of machine learning algorithms such as SVM to the use of deep learning algorithms such as CNN, RNN, and LSTM, and the combinations of LSTM, CNN, and GNN that have emerged in the last two years. The average detection accuracies of blind text steganalysis for stego texts are listed in Table 2. Although deep learning improves the performance of text steganalysis, the computational complexity and time cost of the algorithms are also rising, which has become one of the issues to be solved in the future.

Table 1 Comparative analysis of targeted text steganalysis [9]

| No | Year | Method | Advantages | Disadvantages |
|----|------|--------|------------|---------------|
| 1 | 2006 | Distribution of first letters of words | High recall and low error | Require much time |
| 2 | 2006 | Stego | Simple | Require distribution of first letters |
| 3 | 2011 | Context information | Simple and effective variants | Lack of vocabulary |
| 4 | 2014 | Evolution algorithm | Support the text-based document | Complexity of computation |
| 5 | 2018 | Synonym frequency | High speed | Complex |

Table 2 Average detection accuracies of blind text steganalysis

| No | Year | Method | Accuracy |
|----|------|--------|----------|
| 1 | 2007 | Text steganalysis based on AdaBoost | 100% |
| 2 | 2009 | Text steganalysis based on statistical language model | Higher than 93.90% |
| 3 | 2009 | Text steganalysis based on SVM | 89.80% |
| 4 | 2011 | NFZ-WDA | Higher than 91.22% |
| 5 | 2019 | Text steganalysis based on CNN | 82.25% |
| 6 | 2019 | Text steganalysis based on RNN | Higher than 90% |
| 7 | 2019 | Text steganalysis based on Word2vec | 97.71% |
| 8 | 2020 | TS-CSW | Higher than 90% |
| 9 | 2020 | Text steganalysis based on LSTM | 90.61% |
| 10 | 2020 | Text steganalysis based on LSTM-CNN | 91.35% |
| 11 | 2020 | Text steganalysis based on Bi-LSTM-GNN | Higher accuracy |
| 12 | 2021 | Text steganalysis based on capsule network | 92% (1–3 bits/word); 94% (4–5 bits/word) |

4 Conclusion

This paper reviews different types of text steganalysis algorithms since 2006, including targeted steganalysis and blind steganalysis, and compares and analyses the two categories respectively. The study indicates that each steganalysis method has its own advantages and disadvantages. We hope this paper can provide motivation and assistance for future steganalysis research.

As far as current research trends are concerned, the development of NLP has a significant impact on text steganography and text steganalysis, since most of the latest algorithms are inspired by advanced NLP techniques. The most important issue in text steganalysis is to enhance the effectiveness and robustness of steganalysis while reducing model complexity. Therefore, in the near future, building on a clear picture of how text steganalysis has developed and where it stands, we will address its main problems, draw closely on the latest NLP research results, design new text steganalysis methods, and strive to break through the bottleneck described in the previous section.

References

1. Ahvanooey, M., Li, Q., Hou, J., Rajput, A.R., Chen, Y.: Modern text hiding, text steganalysis, and applications: a comparative analysis. Entropy 21, 355 (2019)
2. Chang, C., Clark, S.: Practical linguistic steganography using contextual synonym substitution and a novel vertex coding method. Comput. Linguist. 40, 404–448 (2014)
3. Sui, X., Luo, H., Zhu, Z.: A steganalysis method based on the distribution of first letters of words. IEEE Comput. Soc. 6, 369–372 (2006)
4. Wu, M., Jin, S.: Text steganalysis method: breaking steganographic utility of Stego. Comput. Eng. 32, 10–12 (2006)
5. Chen, Z., Huang, L., Miao, H., Yang, W., Meng, P.: Steganalysis against substitution-based linguistic steganography based on context clusters. Comput. Electr. Eng. 37, 1071–1081 (2011)
6. Xiang, L., Yu, J., Yang, C., Zeng, D., Shen, X.: A word-embedding-based steganalysis method for linguistic steganography via synonym substitution. In: 6th IEEE Access, pp. 64131–64141 (2018)
7. Puriwat, L.: A detection method for text steganalysis using evolution algorithm (EA) approach. Adv. Comput. Sci., pp. 22–23 (2012)
8. Xiang, L., Sun, X., Luo, G., Xia, B.: Linguistic steganalysis using the features derived from synonym frequency. Multimed. Tools Appl. 71, 1893–1911 (2014)
9. Lokman, S., Mustapha, A., Ismail, A., Din, R.: Analysis review on linguistic steganalysis. Indones. J. Electr. Eng. Comput. Sci. 17, 950–956 (2019)
10. Sui, X., Shen, L., Yan, J., Zhu, Z.: Text steganalysis using AdaBoost. Tongxin Xuebao 28 (2007)
11. Meng, P., Hang, L., Yang, W., Chen, Z., Zheng, H.: Linguistic steganography detection algorithm using statistical language model. Technol. Comput. Sci. 2, 540–543 (2009)
12. Xin, G., Hui, L., Zhong, Z.: Text steganalysis based on support vector machine. Comput. Eng. 35, 188–191 (2009)
13. Chen, Z., Huang, L., Meng, P., Yang, W.: Blind linguistic steganalysis against translation based steganography. Lect. Notes Comput. Sci. 6526, 251–265 (2011)
14. Wen, J., Zhou, X., Zhong, P., Xue, Y.: Convolutional neural network based text steganalysis. IEEE Signal Process. Lett. 26, 460–464 (2019)
15. Xiang, L., Guo, G., Yu, J., Sheng, V., Yang, P.: A convolutional neural network-based linguistic steganalysis for synonym substitution steganography. Math. Biosci. Eng. 17, 1041–1058 (2020)
16. Yang, Z., Wang, K., Li, J., Huang, Y., Zhang, Y.: TS-RNN: text steganalysis based on recurrent neural networks. IEEE Signal Process. Lett. 26, 1743–1747 (2019)
17. Yu, J., Xiang, L., Zeng, D.: Natural language steganalysis method based on Word2vec. Comput. Eng. 45, 309–314 (2019)
18. Yang, Z., Huang, Y., Zhang, Y.: TS-CSW: text steganalysis and hidden capacity estimation based on convolutional sliding windows. Multimed. Tools Appl. 79, 18293–18316 (2020)
19. Li, H., Jin, S.: Text steganalysis based on capsule network with dynamic routing. IETE Tech. Rev. 38, 72–81 (2021)
20. Yang, H.: Linguistic steganalysis via densely connected LSTM with feature pyramid. 2020 Assoc. Comput. Mach. 20, 5–10 (2020)
21. Niu, Y., Wen, J., Zhong, P., Xue, Y.: A hybrid R-BILSTM-C neural network based text steganalysis. IEEE Signal Process. Lett. 26, 1907–1911 (2019)
22. Bao, Y., Yang, H., Yang, Z., Liu, S., Huang, Y.: Text steganalysis with attentional LSTM-CNN. In: 5th Int. Conf. Comput. Commun. Syst., pp. 138–142 (2020)
23. Li, E., Fu, Z., Chen, S., Chen, J.: A two-stage highly robust text steganalysis model. J. Cyber Secur. 2, 183–190 (2020)

Hiding text using the least significant bit technique to improve cover image in the steganography system

Article

Full-text available

Dec 2022

One of the highest priorities in the era of information technology is to achieve an accurate and effective system for hiding security data. One of the goals of steganography is imperceptability to intruder. So this paper work to increase the imperceptibility on image, which has weaknesses in previous studies, as well as to avoid statistical attacks such as chi-square. A method has been proposed that includes calculating the color contrasts in the homogeneous areas of the image and dividing them according to the color contrast and exploiting the data of pixels that have a high impact to embed on the two first and third bits of least significant bit (LSB) to increase the amount of embedded data, impact regions (IR) classify according to selected features extracted in advance by using the support vector machine (SVM) classifier. Work was done on standard images taken from a standard dataset (USC-SIPI) for two types of gray and color images. The results showed the worth of the proposed method through a high peak signal to noise ratio (PSNR) that reached 89.5 dB due to the distribution of data on pixels according to the proposed method

Image steganalysis using modified graph clustering based ant colony optimization and Random Forest

Article

Full-text available

Aug 2022
MULTIMED TOOLS APPL

In this paper, a steganalysis algorithm is proposed based on Modified Graph Clustering Based Ant Colony Optimization (MGCACO) feature selection and Random Forest classifier. First, different features related to the steganalysis problem are extracted from each image, and then an optimal set of the extracted features is selected by using the MGCACO feature selection algorithm, and finally a trained classifier used to separate the clean images from the steganography images. Our proposed algorithm is compared with four steganography algorithms including least significant bit matching (LSB), highly undetectable steganography (HUGO), wavelet obtained weights (WOW) and spatial-universal relative wavelet distortion (S_UNIWARD) with different embedding rates such as 0.1, 0.2, 0.3 and 0.4. Moreover, as a new study, the types of steganography algorithms are identified by using the proposed algorithm. The results of the proposed algorithm show that our approach can distinguish between clean and steganography images acceptably and, in addition, this algorithm can detect the type of steganography algorithm with an average accuracy of 90%.

Text Steganography Methods and their Influence in Malware: A Comprehensive Overview and Evaluation

Conference Paper

Jun 2024

A reversible natural language watermarking for sensitive information protection

Article

May 2024
INFORM PROCESS MANAG

Deep learning for steganalysis of diverse data types: A review of methods, taxonomy, challenges and future directions

Article

Mar 2024
NEUROCOMPUTING

Information Security: A Review on Steganography with Cryptography for Genetic Algorithm (GA) Transaction

Conference Paper

Oct 2023

Most multimedia files, especially those containing private information, are images. Since multimedia transmission takes place on public communication channels, it becomes more vulnerable to a wide range of threats as the internet community evolves. Every day, thousands of people upload and download millions of multimedia files. Steganography, the practice of concealing information inside another data stream so that only intended recipients can access it, is an alluring option for protecting the privacy of data transmissions. This paper presents different techniques for embedding and extracting a multimedia file, along with the most important methods for improving secret-data steganography, which combine steganographic and ciphering methods to create a highly secure system that makes the data unreadable to hackers.

A Two-Stage Highly Robust Text Steganalysis Model

Article

Jan 2020

TS-CSW: text steganalysis and hidden capacity estimation based on convolutional sliding windows

Article

Jul 2020
MULTIMED TOOLS APPL

With the rapid development of natural language processing (NLP) technology in the past few years, automatic steganographic text generation methods have advanced greatly. Benefiting from the powerful feature extraction and expression capabilities of neural networks, these methods can generate steganographic texts with both relatively high concealment and high hidden capacity at the same time. For these steganographic methods, previous steganalysis models show unsatisfactory detection performance, which remains an unsolved problem and poses a great threat to the security of cyberspace. In this paper, we first collect a large text steganalysis (T-Steg) dataset, which contains a total of 396,000 texts with various embedding rates under various formats. Our analysis identifies three kinds of word correlation patterns in texts. We then propose a new text steganalysis model based on convolutional sliding windows (TS-CSW), which uses convolutional sliding windows (CSW) of multiple sizes to extract those correlation features. We observed that these word correlation features in the generated steganographic texts are distorted after secret information is embedded. These subtle changes in the distribution of correlation features can then be used for text steganalysis. We use the samples collected in the T-Steg dataset to train and test the proposed steganalysis method. Experimental results show that the proposed model not only achieves high steganalysis performance, but can even estimate the amount of secret information embedded in the generated steganographic texts, which shows state-of-the-art performance.
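The idea of sliding windows of several sizes over a word sequence can be illustrated with a minimal sketch. This is not the TS-CSW model: it only enumerates and counts word windows, whereas the paper applies learned convolutional filters to them. The function names `sliding_windows` and `window_features` and the window sizes are assumptions made for illustration.

```python
from collections import Counter


def sliding_windows(tokens, size):
    """Return every contiguous window of `size` tokens, in order."""
    return [tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def window_features(tokens, sizes=(1, 2, 3)):
    """Count windows of several sizes.

    A crude, hypothetical stand-in for learned convolutional filters:
    each window size captures word correlations at a different range.
    """
    features = Counter()
    for size in sizes:
        features.update(sliding_windows(tokens, size))
    return features


tokens = "the cat sat on the mat".split()
features = window_features(tokens)
```

In a real detector the windows would be taken over word embeddings and the resulting features fed to a classifier; counting is used here only to keep the sketch dependency-free.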

Analysis review on linguistic steganalysis

Article

Feb 2020

Steganography and steganalysis are essential topics for hiding information. Steganography is a technique for concealing secret messages by transmitting data through different domains. Its objective is to avoid discovery of secret messages. Steganalysis, meanwhile, is a method for locating the secret messages contained in the stego text. The objective of steganalysis is to find concealed data and to break the security of its domains. Steganalysis can be categorized into two types: targeted steganalysis and blind steganalysis. Steganography and steganalysis both have domains that are split into natural (also known as linguistic) and digital media. There are three kinds of digital media: picture, video and audio. The aim of this paper is to provide a survey of the different linguistic steganalysis techniques used to find secret messages. This paper also highlights two types of steganalysis methods that are used in research and real practice. The discussion includes findings on the most recent work on linguistic steganalysis techniques. This review is intended to help future research improve and enhance steganalytic capabilities.

Modern Text Hiding, Text Steganalysis, and Applications: A Comparative Analysis

Article

Apr 2019
Entropy

Modern text hiding is an intelligent programming technique which embeds a secret message/watermark into a cover text message/file in a hidden way to protect confidential information. Recently, text hiding in the form of watermarking and steganography has found broad applications in, for instance, covert communication, copyright protection, and content authentication. In contrast to text hiding, text steganalysis is the process and science of identifying whether a given carrier text file/message has hidden information in it, and, if possible, extracting/detecting the embedded hidden information. This paper presents an overview of the state of the art in text hiding, and provides a comparative analysis of recent techniques, especially those focused on marking structural characteristics of a digital text message/file to hide secret bits. We also discuss different types of attacks and their effects to highlight the pros and cons of the recently introduced approaches. Finally, we recommend some directions and guidelines for future work.

Text Steganalysis Based on Capsule Network with Dynamic Routing

Article

Jun 2020

With the growth of natural language processing technology, coverless text steganography has attracted the attention of a large number of researchers. Most existing text steganalysis methods use traditional neural networks to extract and analyze the semantic features of automatically generated steganographic text. However, because traditional neural networks are limited in their ability to preserve subtle features, these methods cannot obtain satisfactory results when detecting the differences between steganographic text with a low embedding rate and natural text. This paper demonstrates that a capsule network can detect whether text contains secret information with robust and accurate performance. The capsule network extracts and preserves the semantic features of text and analyzes the subtle differences between steganographic text and natural text. To strengthen the generalization of the method, we choose word2vec to vectorize text and use steganographic text generated with an RNN and variable-length coding as the data set for experiments. Experimental results show that the detection accuracy of our method reaches 92% on steganographic text with a low embedding rate (1–3 bit/word), which is about 7% higher than that of methods based on other neural networks; at a high embedding rate (4–5 bit/word), the detection accuracy can reach more than 94%.

Linguistic Steganalysis via Densely Connected LSTM with Feature Pyramid

Conference Paper

Jun 2020

Text Steganalysis with Attentional LSTM-CNN

Conference Paper

May 2020

A convolutional neural network-based linguistic steganalysis for synonym substitution steganography

Article

Jan 2020

In this paper, a linguistic steganalysis method based on two-level cascaded convolutional neural networks (CNNs) is proposed to improve the system's ability to detect stego texts, which are generated via synonym substitutions. The first-level network, sentence-level CNN, consists of one convolutional layer with multiple convolutional kernels in different window sizes, one pooling layer to deal with variable sentence lengths, and one fully connected layer with dropout as well as a softmax output, such that two final steganographic features are obtained for each sentence. The unmodified and modified sentences, along with their words, are represented in the form of pre-trained dense word embeddings, which serve as the input of the network. Sentence-level CNN provides the representation of a sentence, and can thus be utilized to predict whether a sentence is unmodified or has been modified by synonym substitutions. In the second level, a text-level CNN exploits the predicted representations of sentences obtained from the sentence-level CNN to determine whether the detected text is a stego text or cover text. Experimental results indicate that the proposed sentence-level CNN can effectively extract sentence features for sentence-level steganalysis tasks and reaches an average accuracy of 82.245%. Moreover, the proposed steganalysis method achieves greatly improved detection performance when distinguishing stego texts from cover texts.

A Hybrid R-BILSTM-C Neural Network Based Text Steganalysis

Article

Nov 2019

With the emergence of generation-based steganography, traditional text steganalysis methods show unsatisfactory detection performance because the manually extracted features are simple and non-universal. The recently proposed deep learning-based text steganalysis methods can obtain high detection accuracy by extracting high-level features. In this paper, a hybrid text steganalysis method (R-BILSTM-C) is proposed that combines the advantages of the Bidirectional Long Short Term Memory Recurrent Neural Network (Bi-LSTM) and the Convolutional Neural Network (CNN). The proposed method can efficiently capture both local features and long-term semantic information from text to improve the detection accuracy. In the proposed method, the Bi-LSTM architecture is used to capture the long-term semantic information of texts, and asymmetric convolution kernels of different sizes are applied to extract the local relationships between words. In addition, the high dimensional semantic feature space is visualized. Experimental results show that the proposed method adapts efficiently to different steganographic algorithms and achieves comparable or superior detection performance across various sentence lengths compared with other state-of-the-art text steganalysis methods.

TS-RNN: Text Steganalysis Based on Recurrent Neural Networks

Article

Jun 2019

With the rapid development of natural language processing technologies, more and more text steganographic methods based on automatic text generation technology have appeared in recent years. These models use the powerful self-learning and feature extraction ability of neural networks to learn the feature expression of massive normal texts. They can then automatically generate dense steganographic texts which conform to such a statistical distribution based on the learned statistical patterns. In this paper, we observe that the conditional probability distribution of each word in the automatically generated steganographic texts is distorted after hidden information is embedded. We use Recurrent Neural Networks (RNNs) to extract these feature distribution differences and then classify those features into cover text and stego text categories. Experimental results show that the proposed model can achieve high detection accuracy. Besides, the proposed model can even make use of the subtle differences in the feature distribution of texts to estimate the amount of hidden information embedded in the generated steganographic text.
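The observation that embedding distorts per-word conditional probabilities can be illustrated with a much simpler model than an RNN. The sketch below is an assumption-laden stand-in, not the TS-RNN method: it estimates bigram probabilities P(word | previous word) from a tiny cover corpus with add-one smoothing and scores a sentence by its average log-probability. The names `train_bigram` and `avg_log_prob`, the toy corpus, and the smoothing choice are all hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Collect context and bigram counts from whitespace-tokenized cover sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        vocab.update(tokens[1:])
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    # Reserve one extra vocabulary slot for unseen words (an assumption).
    return contexts, bigrams, len(vocab) + 1


def avg_log_prob(sentence, model):
    """Average add-one-smoothed log P(word | previous word); expects a non-empty sentence."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split()
    pairs = list(zip(tokens[:-1], tokens[1:]))
    total = sum(
        math.log((bigrams[pair] + 1) / (contexts[pair[0]] + vocab_size))
        for pair in pairs
    )
    return total / len(pairs)


model = train_bigram(["the cat sat", "the dog sat"])
natural_score = avg_log_prob("the cat sat", model)
distorted_score = avg_log_prob("sat cat the", model)
```

A sentence whose word transitions match the cover corpus scores higher than one whose transitions are unusual. A detector along these lines would feed such scores, or richer learned features, to a classifier that separates cover text from stego text.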
