Twitter spam detection using machine learning approach

Chan, Yan Shih (2021) Twitter spam detection using machine learning approach. Project Report. Universiti Teknikal Malaysia Melaka, Melaka, Malaysia. (Submitted)

Abstract

Social media has become part of our daily routine, and people can freely post and share content on it. As social media has grown, people have come to use it to build connections for business or personal gain. The popularity of Twitter has also attracted spammers, who use the platform for malicious purposes such as phishing real Twitter users, spreading malicious software through URLs shared in tweets, and hijacking topics to attract users' attention. The Internet is a boundless platform for information and data sharing. Detecting spam content on social media networks is an intriguing research topic because cyber forensic agencies need to detect how social media is used to broadcast malicious activities or attacks before offences are committed.

This research attempts to detect spam on the Twitter platform using three machine learning classifiers, namely Naïve Bayes, Support Vector Machine (SVM), and Random Forest, and proposes the model that produces the highest accuracy and precision in predicting spam by comparing the results of each model. The results of each model's analysis are explained and compared to achieve the objective of the study. The dataset is divided into training and testing sets, and the testing samples are grouped into five sizes: 100, 200, 300, 500, and 1000 sample tweets. The samples are divided into different sizes in order to analyse whether sample size affects the results. After comparing the results, we conclude that Naïve Bayes has the highest accuracy and precision in predicting spam, while Random Forest has the lowest accuracy.

Thus, this research covers the full process, from extracting content from a social media network such as Twitter, through applying different machine learning classifiers based on specific keywords such as URLs, to classifying the content as Spam or Ham and comparing the accuracy of the classifiers.
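
The comparison described in the abstract rests on two metrics, accuracy and precision, computed on test sets of different sizes. The following is a minimal sketch of that evaluation step, not the report's own code. It assumes that each test size is taken as the first n tweets of the test set (the abstract does not say how the samples were drawn), and `url_rule` is a hypothetical stand-in classifier used only to make the example runnable; in the study its place would be taken by the trained Naïve Bayes, SVM, or Random Forest model.

```python
from typing import Callable, Dict, Sequence, Tuple

SPAM, HAM = "spam", "ham"


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of tweets whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision(y_true: Sequence[str], y_pred: Sequence[str],
              positive: str = SPAM) -> float:
    """Of the tweets predicted as spam, the fraction that really are spam."""
    true_pos = sum(t == positive and p == positive
                   for t, p in zip(y_true, y_pred))
    predicted_pos = sum(p == positive for p in y_pred)
    # Return 0.0 when nothing was predicted as spam, to avoid dividing by zero.
    return true_pos / predicted_pos if predicted_pos else 0.0


def evaluate_by_sample_size(
    classify: Callable[[str], str],
    tweets: Sequence[str],
    labels: Sequence[str],
    sizes: Sequence[int] = (100, 200, 300, 500, 1000),
) -> Dict[int, Tuple[float, float]]:
    """Score one classifier on the first n test tweets for each n in sizes.

    Assumption: taking the first n tweets is an illustrative choice only.
    """
    results = {}
    for n in sizes:
        if n > len(tweets):
            continue  # skip sizes larger than the available test set
        preds = [classify(t) for t in tweets[:n]]
        results[n] = (accuracy(labels[:n], preds),
                      precision(labels[:n], preds))
    return results


def url_rule(tweet: str) -> str:
    """Hypothetical stand-in classifier: flag any tweet containing a link."""
    return SPAM if "http" in tweet.lower() else HAM
```

Calling `evaluate_by_sample_size` once per classifier and comparing the returned (accuracy, precision) pairs at each size gives the kind of side-by-side comparison the abstract describes.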

Item Type:	Final Year Project (Project Report)
Uncontrolled Keywords:	Support vector machine, Random forest, Spam, Naïve bayes, Hijack
Divisions:	Library > Final Year Project > FTMK
Depositing User:	Norfaradilla Idayu Ab. Ghafar
Date Deposited:	23 May 2023 08:29
Last Modified:	10 Dec 2024 03:48
URI:	http://digitalcollection.utem.edu.my/id/eprint/27356
