Ensemble method of tweet, URL and other features classifiers in twitter bot detection

Sabri, Ahmad Syahmi Farhan (2023) Ensemble method of tweet, URL and other features classifiers in twitter bot detection. Project Report. Universiti Teknikal Malaysia Melaka, Melaka, Malaysia. (Submitted)


Abstract

Twitter has experienced a remarkable rise in popularity and influence. However, Twitter's popularity and open nature make it a desirable target for automated accounts known as Twitter bots. This research proposes an ensemble method for detecting Twitter bots. In the initial phase, four models are developed from Twitter features: a model that extracts tweet features using words, a model that extracts tweet features using n-grams, a model that extracts URL features using n-grams, and a model that extracts additional features. Information Gain feature selection is applied to every model and evaluated at multiple threshold values to find the most accurate representation. For each model, the threshold value that yields the highest accuracy on the training set is chosen, and the resulting model serves as an input to the ensemble. The final prediction is derived by combining the probability outputs of these four models. This ensemble strategy aims to improve the classifier's overall performance by capitalizing on the strengths of the four base models. To evaluate the effectiveness of the proposed method, extensive experiments are conducted on the Cresci-2017 dataset. The test results show that the proposed method using four models achieves a mean accuracy of 97.50%, which surpasses the accuracy of each individual model: 70.50% for the tweet word model, 85.00% for the tweet n-gram model, 93.50% for the URL n-gram model, and 82.00% for the other-features model. This demonstrates the efficacy of the ensemble approach and the importance of incorporating diverse features to achieve high Twitter bot detection accuracy.
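
The two steps described in the abstract, Information Gain feature selection at a threshold and combination of the four models' probability outputs, can be illustrated with a minimal Python sketch. It is not the report's implementation. The abstract does not say how the probabilities are combined, so simple averaging (soft voting) with a 0.5 decision cut-off is assumed here, and features are assumed to be binary (present or absent). The function names `entropy`, `information_gain`, `select_features`, and `combine_probabilities` are hypothetical.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature_present, labels):
    """Information gain of a binary feature with respect to the labels.

    Assumption: each feature is a 0/1 indicator (e.g. an n-gram occurs or not).
    """
    with_f = [y for f, y in zip(feature_present, labels) if f]
    without_f = [y for f, y in zip(feature_present, labels) if not f]
    n = len(labels)
    conditional = (len(with_f) / n) * entropy(with_f) \
        + (len(without_f) / n) * entropy(without_f)
    return entropy(labels) - conditional


def select_features(gains, threshold):
    """Keep the names of features whose gain is at least the threshold."""
    return [name for name, gain in gains.items() if gain >= threshold]


def combine_probabilities(model_probs):
    """Average per-model bot probabilities and label each sample.

    Assumption: plain averaging with a 0.5 cut-off; the report may use
    a different combination rule.
    model_probs holds one list of probabilities per model, all equally long.
    """
    n_models = len(model_probs)
    n_samples = len(model_probs[0])
    averaged = [
        sum(probs[i] for probs in model_probs) / n_models
        for i in range(n_samples)
    ]
    predictions = [1 if p >= 0.5 else 0 for p in averaged]
    return predictions, averaged
```

In use, each of the four base models would contribute one list of per-account bot probabilities, for example `combine_probabilities([word_probs, tweet_ngram_probs, url_ngram_probs, other_probs])`, where the four argument names are placeholders for the outputs of the respective classifiers.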

Item Type: Final Year Project (Project Report)
Uncontrolled Keywords: Twitter bot, Machine learning, Text classification, Support vector machine linear kernel classifier, Ensemble model
Subjects: Q Science > Q Science (General)
Divisions: Library > Final Year Project > FTMK
Depositing User: Sabariah Ismail
Date Deposited: 08 Jan 2024 03:36
Last Modified: 08 Jan 2024 03:36
URI: http://digitalcollection.utem.edu.my/id/eprint/31579
