Business and government operations generate large volumes of documents to be categorized through machine learning techniques before dissemination and storage. One prerequisite in such classification is to properly choose training documents. Active learning emerges as a technique to achieve better accuracy with fewer training documents by choosing data to learn and querying oracles for unknown labels. In practice, such oracles are usually human analysts who are likely to make mistakes or, in some cases, even intentionally introduce erroneous labels for malicious purposes. We propose a risk-factor based strategy to defend active-learning-based document classification against human mistakes or adversarial inputs. We show that the proposed strategy can substantially alleviate the damage caused by malicious labeling. Our experimental results demonstrate the effectiveness of our defense strategy in terms of maintaining accuracy against adversaries.
Lei Pi, Zhuo Lu, Yalin Sagduyu, Su Chen
University of Memphis, TN 38152. Emails: {lpi,schen4}
University of South Florida, Tampa FL 33620. Email:
Intelligent Automation Inc., Rockville, MD 20855, Email:
Daily routine operations in business and government produce
a large number of documents, which must be properly
categorized or labeled, then disseminated to authorized per-
sonnel and stored in appropriate places. For example, doc-
uments in government operations may be labeled as public
information or a classified level may be assigned according to
national security requirements. Machine learning techniques,
such as the Naive Bayes classifier and the Support Vector Machine
(SVM) [1], have been extensively used to assist
automated document classification [2, 3].
To facilitate processing training data sets, active learn-
ing [4] has been used to achieve better accuracy with smaller
training sets for document classification. The essential idea
behind active learning is to let the learning system choose
data to learn from and query an oracle for a label. In practice,
such an oracle is usually a human analyst who is tasked to
identify and classify given documents. For example, in government
operations, security analysts are assigned to classify
each document into a security classification level for proper
information control and dissemination.
On one hand, active learning can significantly reduce the
number of training documents needed to train an underlying
machine learning model [4, 5]. On the other hand,
however, it also introduces risks that could lead to less ac-
curate classification. Specifically, active learning usually in-
volves the inputs from human analysts who can sometimes
make mistakes. More severely, due to insider threats or account
hacking, such inputs can even be malicious, with intent
to sabotage the entire active learning process. Many poten-
tial vulnerabilities in active learning can make such attacks
possible [6]: 1) the attacker (i.e., a human analyst with malicious
intent) can fabricate data that are less significant but appealing
for the learner to choose; 2) the attacker can leverage existing
machine learning vulnerabilities inherited by active learning;
3) the attacker can provide incorrect results when the learner
queries for labels. Therefore, it should never be taken for
granted that the inputs from human analysts are always cor-
rect, and it is critical to make active learning resilient to erro-
neous inputs due to human errors or malicious attacks.
In this paper, we aim at designing a robust active learning
defense strategy. In particular, we focus on the scenario of
SVM-based active learning under a malicious attacker that
gives erroneous inputs during learning queries, since SVM is an
extensively used method in classification and active learning
[4, 5, 7–9]. Our defense strategy is to design a risk factor based
mechanism to guide whether we should accept or reject the
input from active learning. By examining the distance of a
newly labeled document to the current separating hyperplane
of the SVM model, the mechanism will decide if it is too risky
to accept the input depending on whether the distance is larger
than a given threshold. Our method is shown to substantially
alleviate the damage caused by malicious attacks.
In this section, we briefly introduce SVM and active learning.
2.1. SVM and Active Learning
SVM is a widely-used classification method [1] to find a hy-
perplane that separates the training data into desirable sub-
sets with different categories/labels based on support vectors,
which are a set of instances from the training data closest to
the hyperplane. In SVM-based document classification, an instance
is a feature vector of the counts of words
extracted from a document.
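A minimal sketch of how such a word-count feature vector could be built. The whitespace tokenizer, the helper names `build_vocabulary` and `to_feature_vector`, and the two sample sentences are illustrative assumptions, not the preprocessing used in this paper.

```python
from collections import Counter

def build_vocabulary(documents):
    """Map each distinct lowercase token to a fixed column index."""
    vocab = {}
    for text in documents:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

def to_feature_vector(text, vocab):
    """Return the word-count vector of `text` over `vocab`."""
    counts = Counter(text.lower().split())
    vector = [0] * len(vocab)
    for token, n in counts.items():
        if token in vocab:  # tokens outside the vocabulary are ignored
            vector[vocab[token]] = n
    return vector

# Hypothetical two-document corpus.
docs = ["oil prices rise", "grain prices fall as grain exports rise"]
vocab = build_vocabulary(docs)
vectors = [to_feature_vector(d, vocab) for d in docs]
```

Each document becomes one instance whose dimension equals the vocabulary size, which is why real corpora lead to thousands of features.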
To perform accurate classification, SVM requires train-
ing based on a substantial number of instances with labels
already given as the ground truth. However, labeling many in-
stances for training a classifier could be cumbersome in prac-
tice. Hence, active learning [4,10] has been designed as an ad-
vanced process, in which only a subset of unlabeled instances
is chosen to be labeled and added to the training set.
Active learning involves two parties: the learner (usually
a machine that builds an accurate classifier) and the oracle
(usually a human analyst in practice), and consists
of three components [4]: $(f, q, X)$, where $f$ is a classifier
mapping a document into a label, $X$ is the training set, and
$q$ is the query function, which chooses the next
instance from all unlabeled instances and queries the oracle for
the corresponding label. After each query, the learner updates
$X$ and returns a new classifier.
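The loop over the three components can be sketched as follows. This is a toy illustration under stated assumptions: the instances are one-dimensional numbers, the classifier is a simple threshold rather than an SVM, and the names `active_learning`, `train`, `query`, and `oracle` are hypothetical.

```python
def active_learning(train, query, oracle, labeled, pool, num_queries):
    """Run the (f, q, X) loop: query, ask the oracle, update X, retrain f."""
    X = list(labeled)   # training set of (instance, label) pairs
    pool = list(pool)   # unlabeled instances still available
    f = train(X)
    for _ in range(min(num_queries, len(pool))):
        instance = query(f, pool)  # q picks the next instance to label
        pool.remove(instance)
        X.append((instance, oracle(instance)))
        f = train(X)               # the learner returns a new classifier
    return f, X

def train(X):
    """Toy classifier: a threshold midway between the two classes."""
    lo = max(x for x, y in X if y == 0)
    hi = min(x for x, y in X if y == 1)
    threshold = (lo + hi) / 2
    classify = lambda x: int(x > threshold)
    classify.threshold = threshold
    return classify

def query(f, pool):
    """Choose the unlabeled instance nearest the current threshold."""
    return min(pool, key=lambda x: abs(x - f.threshold))

oracle = lambda x: int(x > 5.0)  # stands in for an honest human analyst

f, X = active_learning(train, query, oracle,
                       labeled=[(0.0, 0), (10.0, 1)],
                       pool=[1.0, 4.0, 4.8, 5.5, 9.0], num_queries=3)
```

With only three queries the threshold moves from 5.0 to about 5.15, close to the oracle's true boundary, which illustrates why choosing informative instances reduces the labeling effort.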
2.2. Adversarial Active Learning
Since active learning relies on oracles that are usually human
analysts in practice, it is subject to common security vulnerabilities
and exposed to potential risks associated with or due to
human analysts. A list of possible vulnerabilities was summarized
in [6] with a focus on the query strategies, leaving the
risks due to human analysts less discussed.
As the human analyst is an essential part of active learning,
we have to consider the active learning scenario in a security
sense: the inputs from a human expert should not be trusted,
but carefully examined to ensure security. During document
classification, an analyst can maliciously label a document,
which can in fact be hard to detect. When there is a fairly
large number of malicious labels, the inaccuracy introduced
to the resulting classifier becomes significant enough to
reduce or even destroy the usability of an application. The work
in [11] proved that an adversary does not even need
perfect knowledge of the classifier to launch such attacks
against active learning. It is imperative to at least alleviate,
if not completely eliminate, the damage due to malicious
labeling in active learning for document classification.
In this section, we first present the models and then describe
our design to protect active learning from malicious inputs.
To maintain simplicity without loss of generality, if a document
set $D$ can be separated into two disjoint subsets $D_0$ and
$D_1$, we say a document $d \in D$ is labeled 0 if $d \in D_0$, and say
$d$ is labeled 1 if $d \in D_1$.
3.1. Active Learning under Attacks
As mentioned above, the effectiveness of active learning relies
on the outside inputs that may be manipulated by an adver-
sary. Therefore, we focus on providing a defense strategy
to combat such an attack to protect active learning from ac-
cepting malicious inputs. Specifically, as Fig. 1 shows, we
consider an active learning process for document classifica-
tion including a learner (i.e., the machine that performs active
learning), a malicious human analyst that randomly gives ma-
licious inputs, and queries from the learner to human analysts.
[Figure: diagram with the labels "Select document", "label ($l_i$)", "might be malicious", "train classifier", and "SVM classifier".]
Fig. 1. The scenario of active learning under attacks.
As shown in Fig. 1, in the $i$-th query, the learner already
has a labeled document set $D_{i-1}$, and sends a query $q_i$ containing
document $d_i$ to the analyst, who then gives a (potentially
wrong) label $l_i$ to the learner. The problem is whether
the learner should accept the label to form a new labeled document
set $D_i = D_{i-1} \cup \{d_i\}$, or reject or even revert the label to
keep the original labeled document set $D_i = D_{i-1}$.
3.2. Risk Factor based Defense Strategy
In what follows, we design a risk factor based defense strategy
to protect active learning. The intuition behind modeling the risk is as
follows: in SVM, data close to the hyperplane are more
likely to be misclassified; if an attacker has no knowledge of or
access to the entire training data set, there is no way for the attacker
to know exactly where the hyperplane is; therefore,
mislabeled data may lie at a larger distance from the hyperplane.
Consequently, if a document $d_i$ comes from the analyst
with a label $l_i$, we define the risk factor $r_i$ for this document
in active learning as
$$r_i = a \, \delta_i^{\max}, \qquad (1)$$
where $\delta_i^{\max}$ is the maximum distance from the current support
vectors to the separating hyperplane based upon the existing
document set $D_{i-1}$, and $a > 0$ is a constant threshold.
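As a worked illustration, assume that the risk factor scales the largest support-vector distance by the threshold, i.e., $r_i = a\,\delta_i^{\max}$ (with $\delta$ used here as a stand-in symbol for distance), and take a hypothetical value $\delta_i^{\max} = 0.8$. The two thresholds used later in the experiments would then give:

```latex
\begin{align*}
a = 0.5 &: \quad r_i = 0.5 \times 0.8 = 0.4, \\
a = 1.5 &: \quad r_i = 1.5 \times 0.8 = 1.2.
\end{align*}
```

Under these assumed numbers, a label that disagrees with the current prediction for a document at distance 0.9 from the hyperplane would be rejected in the first setting ($0.9 > 0.4$) and accepted in the second ($0.9 \le 1.2$).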
Then, our method works as follows. When a query $q_i$
containing document $d_i$ is made, the learner is offered a
label $l_i$ by the analyst. We first use the current model built
upon document set $D_{i-1}$ to predict the label of document $d_i$,
and get the predicted label $l'_i$. If $l'_i \neq l_i$, we calculate the
distance $\delta_i$ between $d_i$ and the current separating hyperplane,
and compare it to the risk factor $r_i$. If $\delta_i > r_i$, we consider
the label mistakenly provided and reject it.
Algorithm 1 The risk-factor based defense algorithm
Given: current set $D_{i-1}$, query document $d_i$, input label $l_i$
1: $l'_i$ = predict_using_current_set($D_{i-1}$, $d_i$)
2: if $l'_i \neq l_i$:
3:     $\delta_i$ = compute_distance($d_i$)
4:     if $\delta_i > r_i$:
5:         return FALSE_LABEL
6: return TRUE_LABEL
The defense process is given in Algorithm 1. In Algorithm 1, the
function predict_using_current_set accepts the current document
set $D_{i-1}$ and a document $d_i$ as parameters, and outputs
the predicted label of $d_i$; the function compute_distance accepts
the document $d_i$ as a parameter and calculates the distance
between the corresponding feature vector and the current
separating hyperplane in the SVM algorithm.
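The accept/reject decision of Algorithm 1 can be sketched in a few lines. This is only an illustration under simplifying assumptions: the hyperplane is linear with hypothetical weights `w` and bias `b` (the experiments in this paper use an RBF kernel), distance is the geometric distance to that hyperplane, and the function names are not from the paper.

```python
import math

def decision_value(x, w, b):
    """Signed score w.x + b; its sign gives the predicted label."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def accept_label(x, label, w, b, risk):
    """Return True to accept the analyst's label, False to reject it."""
    score = decision_value(x, w, b)
    predicted = 1 if score > 0 else 0
    if predicted != label:
        distance = abs(score) / math.sqrt(sum(wi * wi for wi in w))
        if distance > risk:
            return False  # confident disagreement: treat as a false label
    return True           # agreement, or disagreement near the boundary

# Hypothetical model and risk factor.
w, b, risk = [1.0, 0.0], -2.0, 1.0
decisions = [
    accept_label([5.0, 0.0], 1, w, b, risk),  # agrees with the prediction
    accept_label([5.0, 0.0], 0, w, b, risk),  # disagrees, distance 3.0
    accept_label([2.5, 0.0], 0, w, b, risk),  # disagrees, distance 0.5
]
```

Only the second label is rejected: it contradicts the model for a document far from the boundary, whereas the third contradicts it in the region where the model itself is uncertain.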
This approach relies on the correctness of the
initial training set $D_0$, which is assumed to be accurate in this
paper. By design, this strategy leverages the statistical properties
initially derived from $D_0$; therefore, it is not designed to prevent
all attacks but only to identify and correct a subset of
mislabels based on the initial and inherited statistical properties
during the active learning process.
3.3. Choosing the Risk Factor
We propose to design benchmark tests on the initial training
set $D_0$ to adequately choose the risk factor. We use the following
accuracy score $S$ to evaluate and compare
the effectiveness of classification:
$$S = \frac{\text{total number of accurate classifications}}{\text{total number of classifications}}. \qquad (2)$$
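For concreteness, the accuracy score can be computed as below. The function name `accuracy_score` and the four sample labels are illustrative assumptions.

```python
def accuracy_score(predicted, actual):
    """Fraction of test documents whose predicted label is correct."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == t for p, t in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical predictions on four test documents: three are correct.
score = accuracy_score([1, 0, 1, 1], [1, 0, 0, 1])
```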
In each benchmark test (i.e., the function benchmark_test
in Algorithm 2), we train the classifier using active learning
with a given set of parameters, including the risk factor $r$ and whether
the defense strategy is enabled, and record the accuracy score on the testing
data set as the number of queries increases. When the training
set is mixed with malicious labels without any defense,
the score is called the affected score $S_a$. When the querying is
protected with our defense strategy, the corresponding score
is called $S_d$. We evaluate the effectiveness of our defense
strategy by comparing $S_d$ with $S_a$. Our goal is to choose
a risk factor that makes the defense strategy effective, i.e.,
$S_d \geq S_a$. With the benchmark tests, our heuristic approach
to searching for the risk factor is shown in Algorithm 2.
In Algorithm 2, the labeling error rate $r_e$ is the probability
that an input is wrongly labeled in benchmark tests. In practice,
it should be small, as a large value is likely to be
noticeable and raise suspicion. For example, when the government
uses document analysts to classify documents, administrative
approaches such as internal review and sample
checking techniques can be effective in detecting such errors.
Algorithm 2 Risk Factor Search Algorithm
Given: risk factor $r$, search step $\Delta$, labeling error rate $r_e$
1: $S_a$ = benchmark_test($r$, $r_e$, defense=False)
2: $S_d$ = benchmark_test($r$, $r_e$, defense=True)
3: if $S_d \geq S_a$:
4:     return $r$
5: else
6:     $r = r + \Delta$
7:     goto 1
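The search for a risk factor can be sketched as a simple loop. Several points here are assumptions rather than details from the paper: the acceptance test is written as `s_d >= s_a`, an upper bound `r_max` is added so the loop always terminates, and `toy_benchmark` is a made-up stand-in for a real benchmark run.

```python
def search_risk_factor(benchmark_test, r, step, error_rate, r_max):
    """Increase r by `step` until the defended score reaches the affected one."""
    while r <= r_max:
        s_a = benchmark_test(r, error_rate, defense=False)
        s_d = benchmark_test(r, error_rate, defense=True)
        if s_d >= s_a:
            return r
        r += step
    return None  # no acceptable risk factor found up to r_max

def toy_benchmark(r, error_rate, defense):
    """Hypothetical scores: the defended accuracy peaks at r = 1.0."""
    if not defense:
        return 0.80
    return 0.90 - abs(r - 1.0)

best = search_risk_factor(toy_benchmark, r=0.5, step=0.25,
                          error_rate=0.05, r_max=3.0)
```

Because the search stops at the first acceptable value, the result depends on the starting point and the step size, so it is at best locally optimal.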
In this section, we present experimental setups and results.
4.1. Experiment Setups and Parameters
We build a data set of 1264 instances with 10233 features extracted
from real documents in the Reuters-21578 data set [12].
Instances in the data set are evenly distributed between
two categories. Three fourths of the data set is used for training
and the rest is used for testing. Within the training set, four
fifths is labeled data and the rest is the query pool.
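The resulting partition sizes can be worked out as follows. Four fifths of the training set is not a whole number, so the rounding down used here is an assumption; the paper does not state how the split was rounded.

```python
total = 1264                           # instances in the data set
train_size = total * 3 // 4            # three fourths for training
test_size = total - train_size         # the rest for testing
labeled_size = train_size * 4 // 5     # four fifths of training, rounded down
pool_size = train_size - labeled_size  # remaining instances form the query pool
```

Under this rounding, the query pool holds 190 instances, which bounds the number of queries available in one run.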
For the SVM algorithm, we use the radial basis function
(RBF) kernel with parameters $\gamma = 1.0/1264$ and $C = 1.0$.
We consider three test cases: 1) No attack: there are no labeling
errors and no defense strategy; 2) Attack without defense:
25% of the labels are erroneous due to the attack and there is no
defense strategy; 3) Attack with defense: 25% of the labels are
erroneous due to the attack and the defense strategy is applied.
During active learning, queries are first made with the random
sampling strategy in all three sets of tests, and then with the
uncertainty sampling strategy [10].
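The attack in test cases 2 and 3 can be simulated by an oracle that flips each binary label with a fixed probability. The sketch below assumes independent random flips; the name `make_malicious_oracle`, the seed, and the toy ground truth are hypothetical.

```python
import random

def make_malicious_oracle(true_label, error_rate, seed=0):
    """Wrap a ground-truth labeler so each answer is flipped with
    probability `error_rate`."""
    rng = random.Random(seed)

    def oracle(instance):
        label = true_label(instance)
        if rng.random() < error_rate:
            return 1 - label  # binary labels: flip 0 <-> 1
        return label

    return oracle

true_label = lambda x: int(x >= 0)  # hypothetical ground truth
attacker = make_malicious_oracle(true_label, error_rate=0.25)
instances = [i - 5000 for i in range(10000)]
flipped = sum(attacker(x) != true_label(x) for x in instances)
```

Over many queries, roughly one label in four is wrong, which matches the 25% error setting and leaves correct labels about three times as frequent as malicious ones.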
4.2. Experimental Results
We first consider the scenarios in which the risk factor is not
carefully chosen. The query strategy in active learning is random
sampling in these experiments.
Fig. 2 shows the accuracy scores of the three test cases when
the risk factor is too small. We can observe that the performance
of the attack with defense case is even worse than that
of the no attack case under a small risk factor, although the
number of correct inputs is 3 times that of malicious
ones. This is because the defense strategy cannot really
distinguish which input is due to the attacker, but can only detect
which label may be malicious by comparing the distance
of its instance in the SVM model with the risk factor. When
the risk factor is too small, the defense strategy has a very
small tolerance for accepting new inputs, so it
erroneously rejects many inputs with correct labels.
Fig. 2. Comparison of accuracy scores in three cases when $a = 0.5$. (Axes: number of queries vs. accuracy score; curves: No Attack, Attack With Defense, Attack Without Defense.)
Fig. 3. Comparison of accuracy scores in three cases when $a = 1.5$. (Axes: number of queries vs. accuracy score; curves: No Attack, Attack With Defense, Attack Without Defense.)
Fig. 4. Comparison of accuracy scores with optimal risk factor. (Axes: number of queries vs. accuracy score; curves: No Attack, Attack With Defense, Attack Without Defense.)
Fig. 3 shows the accuracy scores of the three test cases when
the risk factor has a large value, i.e., the threshold in the risk
factor is $a = 1.5$. As Fig. 3 depicts, the attack with defense
case yields the same performance as the attack without
defense case, which is substantially worse than the no attack
case. This means that the defense strategy neither improves
nor degrades the performance of the classifier compared with
the attack without defense case. This is because the risk factor
is chosen too large, so the distance of each instance
with an erroneous label to the separating hyperplane in the SVM
classifier is considered acceptable by the defense strategy.
Figs. 2 and 3 clearly show how strongly malicious inputs
can affect the accuracy of document classification. The two
figures also show how the value of the risk factor can affect
the effectiveness of the defense strategy: a very small risk factor
yields worse performance than the attack without defense
case, and a very large risk factor leads to the same performance
as the attack without defense case.
Then we use a risk factor that is locally optimal, as given by
Algorithm 2. Fig. 4 shows that when the risk factor is optimal, the
attack with defense case achieves performance close to
that of the no attack case, and outperforms the attack without
defense case. Admittedly, there are errors that the defense strategy
cannot detect. First, it ignores mistakes where an erroneous
label is the same as the prediction of the current classifier. Second,
it misses the cases in which the corresponding instance
of an erroneous label is within the distance margin allowed by the
risk factor. This explains why the attack with defense performance
is overall worse than the no attack case. From Figs. 3
and 4, we can conclude that the defense strategy is effective
when the risk factor is carefully chosen.
Finally, we evaluate the effectiveness of the defense when
the query strategy is uncertainty sampling and compare the
result with random sampling. We decrease the size of the training
set to half of the entire data set and the labeled set to half of the
training set, and increase the input error ratio to 1/2. Fig. 5
shows that under uncertainty sampling, where each queried sample is
closer to the current separating hyperplane than others, our
defense is still effective in defending against erroneous labeling.
With uncertainty sampling, the classifier should achieve
a higher accuracy with fewer queries compared to random sampling.
But under a heavy attack, Fig. 5 shows that the affected
classifier degrades to the random sampling case. With the proposed
defense strategy, the damage is largely reduced and the
classification accuracy is approximately equal to that of the
original classifier without attack.
Fig. 5. Comparison of different sampling strategies. (Axes: number of queries vs. accuracy score; curves: No Attack, Attack Without Defense, Attack With Defense, under uncertainty sampling and random sampling.)
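The two query strategies compared above differ only in how the next instance is picked from the pool. A minimal sketch, assuming a hypothetical one-dimensional decision function whose absolute value serves as the distance to the separating hyperplane:

```python
import random

def random_query(pool, rng):
    """Random sampling: every unlabeled instance is equally likely."""
    return rng.choice(pool)

def uncertainty_query(pool, decision_value):
    """Uncertainty sampling: pick the instance whose decision value is
    closest to zero, i.e., nearest the separating hyperplane."""
    return min(pool, key=lambda x: abs(decision_value(x)))

decision_value = lambda x: 2.0 * x - 3.0  # hypothetical model, boundary at 1.5
pool = [0.0, 1.0, 1.4, 2.5, 4.0]
chosen = uncertainty_query(pool, decision_value)
```

Because uncertainty sampling always queries documents near the boundary, a wrong label there falls in the region where the distance check is most lenient, which is one reason to evaluate the defense under this strategy separately.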
In this paper, we considered the scenario of protecting active
learning in document classification against adversarial inputs.
We proposed a risk-factor based defense strategy. We used
real data sets and experiments to show that, by adequately
adjusting the risk factor, the proposed defense strategy can
improve the classification accuracy and is therefore
effective in defending active-learning-based document
classification against adversarial inputs.
