Over the last few decades, the advent of telecommunication systems has enabled a growing exchange of electronic messages around the world. Unfortunately, irrelevant and/or unsolicited content accounts for the majority of this traffic, and deciding whether to keep or discard each message is a well-known challenge in machine learning. This paper proposes an anti-spam filtering approach based on linguistic techniques. The actual effect of each system parameter is evaluated through factorial design analysis using two different classifiers: a Support Vector Machine (SVM) and a Naive Bayesian (NB) classifier. This analysis is detailed and discussed, providing a step-by-step guide for developers and users of anti-spam filters. Based on different system metrics, multi-objective optimization is applied to obtain the optimal filter setup. Evaluation of the anti-spam filter under the optimal configuration showed that the SVM-based system achieved an accuracy above 98%, whereas the NB-based system reached 87%. The results also reveal that linguistic techniques are relevant for the NB classifier but do not improve the performance of the SVM-based system.
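The abstract does not spell out the factorial design or the optimization procedure. As a rough illustration only, the sketch below assumes a two-level full factorial design where each factor is switched off or on (coded -1/+1). It estimates each factor's main effect as the mean response at the high level minus the mean response at the low level. It then keeps the Pareto-optimal setups when several metrics must be maximized at once. The factor names (`stemming`, `stop_words`), the helper functions, and all numbers are hypothetical. They are not the paper's factors, method, or results, and the paper's multi-objective procedure may differ.

```python
from itertools import product


def full_factorial(factors):
    """Return every -1/+1 level combination for the named factors (a 2^k design)."""
    return [dict(zip(factors, levels)) for levels in product((-1, 1), repeat=len(factors))]


def main_effects(runs, responses):
    """Main effect of each factor: mean response at +1 minus mean response at -1."""
    effects = {}
    for factor in runs[0]:
        high = [y for run, y in zip(runs, responses) if run[factor] == 1]
        low = [y for run, y in zip(runs, responses) if run[factor] == -1]
        effects[factor] = sum(high) / len(high) - sum(low) / len(low)
    return effects


def pareto_front(configs, metrics):
    """Keep configurations that no other configuration dominates.

    A configuration is dominated if another one is at least as good on every
    metric and strictly better on at least one (higher is better everywhere).
    """
    front = []
    for i, a in enumerate(metrics):
        dominated = any(
            all(bj >= aj for aj, bj in zip(a, b)) and any(bj > aj for aj, bj in zip(a, b))
            for j, b in enumerate(metrics)
            if j != i
        )
        if not dominated:
            front.append(configs[i])
    return front


if __name__ == "__main__":
    # Hypothetical linguistic preprocessing factors, for illustration only.
    runs = full_factorial(["stemming", "stop_words"])
    # Made-up accuracy values (%), one per run, in the order produced above.
    accuracy = [80, 84, 86, 90]
    print(main_effects(runs, accuracy))  # {'stemming': 6.0, 'stop_words': 4.0}

    # Made-up (accuracy, precision) pairs for three hypothetical setups.
    print(pareto_front(["A", "B", "C"], [(90, 70), (85, 80), (80, 65)]))  # ['A', 'B']
```

A Pareto filter is only one way to handle several metrics. It leaves a set of trade-off setups rather than a single winner, so a final choice still needs a weighting or preference rule.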