N-Gram-Based Machine Learning Approach for Bot or Human Detection from Text Messages

Durga Prasad Kavadi; Chandra Sekhar Sanaboina; Rizwan Patan; Amir Gandomi

doi:10.1145/3533050.3533063

Abstract

Social bots are computer programs created to automate common human activities, such as generating messages. The rise of bots on social network platforms has led to malicious activities such as content pollution (e.g., spam), malware distribution, and the dissemination of misinformation. Many researchers have focused on detecting bot accounts on social media platforms to limit the damage done to users' opinions. In this work, an n-gram-based approach is proposed for bot/human detection. Content-based features, namely character n-grams and word n-grams, are used; both have proven effective at improving accuracy in various authorship analysis tasks. A large number of n-grams is identified after applying different pre-processing techniques. This high dimensionality is reduced with a feature selection technique, the Relevant Discrimination Criterion (RDC), and the text is represented as vectors over the reduced feature set. Different term weight measures are used in the experiments to compute the weights of the n-gram features in the document vector representation. Two classification algorithms, Support Vector Machine (SVM) and Random Forest (RF), are trained on the document vectors. The proposed approach was applied to the dataset provided for the PAN 2019 bot detection task, where the Random Forest classifier obtained the best accuracy of 0.9456 for bot/human detection.
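The feature pipeline summarized in the abstract (n-gram extraction, feature selection, term weighting, document vectors) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the n-gram sizes, the lower-casing and whitespace tokenization, the toy messages, and all helper names are hypothetical. The selection score below is a simple class document-frequency difference used as a stand-in for RDC, whose exact formula is not given here, and TF-IDF is assumed as just one example of a term weight measure.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Overlapping character n-grams of the lower-cased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Contiguous word n-grams, assuming plain whitespace tokenization."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, char_n=3, word_n=2):
    """Count character and word n-grams; the prefixes keep the two kinds apart."""
    feats = Counter("c:" + g for g in char_ngrams(text, char_n))
    feats.update("w:" + g for g in word_ngrams(text, word_n))
    return feats


def select_features(docs, labels, k):
    """Keep the k n-grams whose document frequency differs most between
    class 1 (bot) and class 0 (human). Hypothetical stand-in, NOT RDC.
    Assumes both classes occur in `labels`."""
    df = {0: Counter(), 1: Counter()}
    n_docs = Counter(labels)
    for feats, y in zip(docs, labels):
        df[y].update(feats.keys())  # each n-gram counted once per document
    vocab = set(df[0]) | set(df[1])

    def score(t):
        return abs(df[1][t] / n_docs[1] - df[0][t] / n_docs[0])

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(vocab, key=lambda t: (-score(t), t))[:k]


def tfidf_vectors(docs, selected):
    """Weight the selected n-grams with smoothed TF-IDF (an assumed weight measure)."""
    n = len(docs)
    idf = {}
    for t in selected:
        df_t = sum(1 for d in docs if t in d)
        idf[t] = math.log((1 + n) / (1 + df_t)) + 1.0
    return [[d[t] * idf[t] for t in selected] for d in docs]


# Toy usage with made-up messages; 1 = bot, 0 = human.
messages = [
    "Check out this great deal now http://example.com",
    "Check out this great deal now http://example.com/2",
    "had a lovely walk with the dog this morning",
    "anyone else watching the match tonight?",
]
labels = [1, 1, 0, 0]
docs = [extract_features(m) for m in messages]
selected = select_features(docs, labels, k=20)
X = tfidf_vectors(docs, selected)
```

The rows of `X` could then be passed to a classifier such as scikit-learn's `LinearSVC` or `RandomForestClassifier`, mirroring the SVM and Random Forest models named in the abstract. Four toy messages say nothing about real accuracy; they only show the shape of the data flowing through each step.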
