N-Gram-Based Machine Learning Approach for Bot or Human Detection from Text Messages
Article 2022 English
Authors
Durga Prasad Kavadi
Chandra Sekhar Sanaboina
Rizwan Patan
Abstract
Social bots are computer programs created to automate general human activities such as the generation of messages. The rise of bots on social network platforms has led to malicious activities, including content pollution such as spam, malware dissemination, and misinformation. Much research has focused on detecting bot accounts on social media platforms to limit the damage done to users' opinions. In this work, an n-gram-based approach is proposed for bot or human detection. The content-based features of character n-grams and word n-grams are used. Character and word n-grams have proved successful at improving accuracy in various authorship analysis tasks. A large number of n-grams are identified after applying different pre-processing techniques. The high dimensionality of the features is reduced by using the Relevant Discrimination Criterion feature selection technique. The text is represented as vectors by using the reduced set of features. Different term weight measures are used in the experiment to compute the weight of the n-gram features in the document vector representation. Two classification algorithms, Support Vector Machine and Random Forest, are used to train the model using the document vectors. The proposed approach was applied to the dataset provided in the PAN 2019 competition bot detection task. The Random Forest classifier obtained the best accuracy of 0.9456 for bot/human detection.
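The feature pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions: the function names are hypothetical, the n-gram sizes (character trigrams, word bigrams) are arbitrary, TF-IDF stands in for the unnamed term weight measures, and a plain document-frequency cutoff stands in for the Relevant Discrimination Criterion. The classification step is left out.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return all overlapping word n-grams, joined by a single space."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def extract_features(text, char_n=3, word_n=2):
    """Count character and word n-grams; prefixes keep the two kinds apart."""
    counts = Counter("c:" + g for g in char_ngrams(text.lower(), char_n))
    counts.update("w:" + g for g in word_ngrams(text, word_n))
    return counts


def tfidf_vectors(docs, vocab_size=1000):
    """Build TF-IDF vectors over the vocab_size most widespread n-grams.

    Selecting by document frequency is a placeholder for illustration only;
    it is not the Relevant Discrimination Criterion named in the abstract.
    """
    counts = [extract_features(d) for d in docs]
    # Iterating a Counter yields each distinct n-gram once per document.
    doc_freq = Counter(g for c in counts for g in c)
    vocab = [g for g, _ in doc_freq.most_common(vocab_size)]
    n_docs = len(docs)
    idf = {g: math.log(n_docs / doc_freq[g]) + 1.0 for g in vocab}
    vectors = []
    for c in counts:
        total = sum(c.values()) or 1  # guard against empty documents
        vectors.append([c[g] / total * idf[g] for g in vocab])
    return vocab, vectors
```

The resulting vectors, one per document, could then be passed to any Support Vector Machine or Random Forest implementation together with the bot/human labels.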