Deep fake audio detection based on deep learning

Wijaya, Athallah Ariq (2023) Deep fake audio detection based on deep learning. Project Report. Universiti Teknikal Malaysia Melaka, Melaka, Malaysia. (Submitted)


Abstract

With advances in technology, especially in the field of AI, a new threat has emerged: deep fake audio. Unlike deep fake video, which fabricates convincing footage, this threat relies on audio alone. Examples include social engineering over phone calls and fake audio propaganda broadcast over radio or audio-only streaming services. Although the threat relies only on audio, real and fake speech can be hard to tell apart because the listener cannot see the source of the audio or the speaker. Many previous studies have proposed detection approaches based on deep learning models, from simple ones such as CNN to more complex ones such as ResNet and combinations of two or more models. With so many approaches proposed, it is hard to choose a model that can be used efficiently and easily. Using the ASVspoof 2019 dataset, which many research papers on deep fake audio have used, we investigated three deep learning models (CNN, ResNet, and ResNet+LSTM) and how to implement them in Python with the PyTorch framework. Measured by macro-F1, the ResNet model achieved the best performance of the three, the CNN model the lowest, and the ResNet+LSTM model fell between them. Although the ResNet+LSTM model's macro-F1 is lower than that of the ResNet model, its accuracy on the evaluation dataset is higher because the ResNet+LSTM model tends to predict the spoof label, which also suggests that the model needs more training before it can reliably choose the correct label.
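The gap between accuracy and macro-F1 described in the abstract can be illustrated with a small sketch. This is not the report's evaluation code: the function name `macro_f1_and_accuracy`, the label strings, and the toy counts (10 bonafide and 90 spoof samples) are assumptions made up for illustration. It shows how a classifier that leans toward the majority "spoof" label can score higher accuracy yet lower macro-F1 than a more balanced one, assuming the evaluation data contains far more spoof than bonafide samples.

```python
def macro_f1_and_accuracy(y_true, y_pred, labels=("bonafide", "spoof")):
    """Return (macro-F1, accuracy) for two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # Per-class F1 = 2TP / (2TP + FP + FN); defined as 0 if the class never appears
        f1_scores.append(2 * tp / denom if denom else 0.0)
    macro_f1 = sum(f1_scores) / len(f1_scores)  # unweighted mean over classes
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return macro_f1, accuracy


# Hypothetical, imbalanced evaluation set: 10 bonafide, 90 spoof (made-up counts).
y_true = ["bonafide"] * 10 + ["spoof"] * 90

# Model A leans toward "spoof": only 2 of 10 bonafide are recognised, all spoof are.
pred_a = ["bonafide"] * 2 + ["spoof"] * 8 + ["spoof"] * 90

# Model B is more balanced: 8 of 10 bonafide are recognised, 78 of 90 spoof are.
pred_b = ["bonafide"] * 8 + ["spoof"] * 2 + ["bonafide"] * 12 + ["spoof"] * 78

f1_a, acc_a = macro_f1_and_accuracy(y_true, pred_a)
f1_b, acc_b = macro_f1_and_accuracy(y_true, pred_b)

print(f"Model A: macro-F1={f1_a:.3f}, accuracy={acc_a:.2f}")
print(f"Model B: macro-F1={f1_b:.3f}, accuracy={acc_b:.2f}")
```

In this toy case Model A has the higher accuracy and Model B the higher macro-F1, because macro-F1 weights the minority bonafide class equally with the majority spoof class. This is one plausible reading of why the report ranks the models by macro-F1.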

Item Type:	Final Year Project (Project Report)
Uncontrolled Keywords:	Deepfake, Deep fake, Deep learning, Deep fake audio, Deepfake audio
Subjects:	Q Science > Q Science (General)
Divisions:	Library > Final Year Project > FTMK
Depositing User:	Sabariah Ismail
Date Deposited:	08 Jan 2024 03:17
Last Modified:	21 Nov 2024 07:36
URI:	http://digitalcollection.utem.edu.my/id/eprint/31576
