An Enhanced Malay Named Entity Recognition Based Fuzzy Semi-Supervised Clustering For Unstructured Crime Textual Data Analysis

Asmai, Siti Azirah and Syed Ahmad, Sharifah Sakinah and Hussin, Burairah and Basiron, Halizah and Shibghatullah, Abdul Samad and Kamalrudin, Massila and Ahmad, Sabrina and Abdullah, Rosmiza Wahida and Salleh, Muhammad Sharilazlan (2018) An Enhanced Malay Named Entity Recognition Based Fuzzy Semi-Supervised Clustering For Unstructured Crime Textual Data Analysis. Project Report. UTeM, Melaka, Malaysia. (Submitted)

Text (24 Pages)
An Enhanced Malay Named Entity Recognition Based Fuzzy Semi-Supervised Clustering For Unstructured Crime Textual Data Analysis.pdf - Submitted Version
Restricted to Registered users only

Download (597kB)

Abstract

Named Entity Recognition (NER) is one of the tasks undertaken in information extraction. NER extracts and classifies words or entities in text data that belong to the proper noun category, such as person names, locations, organizations and dates. Today, social media and other online sources such as web pages, blogs, Facebook, Twitter, Instagram and online newspapers are among the major sources of data for information extraction. These resources contain various types of unstructured data such as text. However, the amount of work done to process this type of data is limited for Malay Named Entity Recognition (MNER). This deficiency in Malay textual analytics has led to difficulties in extracting information for decision making. This research presents a Malay Named Entity Recognition technique that focuses on the analysis of Malay-language crime data extracted from the Polis Diraja Malaysia (PDRM) news web page. The proposed MNER technique uses a multi-staged combination of clustering and classification methods, namely Fuzzy C-Means and the K-Nearest Neighbors algorithm. The methods involve multi-layer feature extraction to recognize entities such as person name, location, organization, date and crime type. The multi-staged technique obtained 95.24% accuracy in recognizing named entities for text analysis, particularly in Malay. The proposed technique can improve the accuracy of named entity recognition on crime data by using features selected for their suitability to the Malay language.

Item Type: Final Year Project (Project Report)
Uncontrolled Keywords: Artificial intelligence, Machine learning, Natural language processing
Subjects: Q Science > Q Science (General)
Q Science > QA Mathematics > QA76 Computer software
Divisions: Library > Long/ Short Term Research > FTMK
Depositing User: Mohd Hannif Jamaludin
Date Deposited: 31 Dec 2019 02:58
Last Modified: 31 Dec 2019 02:58
URI: http://digitalcollection.utem.edu.my/id/eprint/24137
