Arabic Tweets Spam Detection Based on Various Supervised Machine Learning and Deep Learning Classifiers

Document Type: Original Article

Authors

1 Electrical Engineering Department, Faculty of Engineering at Shoubra, Benha University, Cairo, Egypt

2 Engineering Department, Nuclear Research Centre, Egyptian Atomic Energy Authority, Cairo, Egypt

Abstract

In this paper, different machine learning algorithms, ensemble algorithms,
and deep learning algorithms are applied to Arabic tweets to detect whether they
are human-generated or not. Each tweet is used twice, once preprocessed and once
non-preprocessed, to measure the effectiveness of Arabic preprocessing in the
classification process. The data is also tokenized with various methods, such as unigram,
trigram, and Term Frequency–Inverse Document Frequency (TF-IDF). The experiments
show that the support vector machine with the non-preprocessed tweets and
unigram tokenization performs best, with a performance of 83.11% and a precision of 0.9516,
while classifying a tweet as spam or not in a relatively short time.
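
The tokenization options named above can be illustrated with a minimal sketch. It assumes word-level n-grams over whitespace-split text and a plain count × log(N/df) form of TF-IDF; the abstract does not say which variants the authors used. The helper names `word_ngrams` and `tfidf` and the toy tweets are hypothetical and are not taken from the paper's data set or code.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams (as tuples) of a whitespace-tokenized text."""
    tokens = text.split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=1):
    """Compute TF-IDF weights per document over word n-grams.

    Assumption: TF is the raw count and IDF is log(N / df), one common variant.
    """
    counts_per_doc = [Counter(word_ngrams(d, n)) for d in docs]
    # Document frequency: number of documents containing each n-gram.
    df = Counter()
    for counts in counts_per_doc:
        df.update(counts.keys())
    total = len(docs)
    return [
        {gram: count * math.log(total / df[gram]) for gram, count in counts.items()}
        for counts in counts_per_doc
    ]


# Toy, made-up tweets for illustration only (not from the paper's data set).
tweets = [
    "اربح جائزة الآن",
    "اربح المال الآن",
    "صباح الخير يا أصدقاء",
]

unigrams = word_ngrams(tweets[0], 1)   # one tuple per word
trigrams = word_ngrams(tweets[0], 3)   # one tuple per run of three words
weights = tfidf(tweets, n=1)           # per-tweet dict: n-gram -> weight
```

In this sketch, a word shared by two of the three tweets gets a lower weight than a word that appears in only one. Feature dictionaries of this kind would then be vectorized and passed to a classifier such as a support vector machine.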

Keywords

Main Subjects