An Analysis Of Attribute Reduction Techniques For Breast Cancer Dataset

Wong, Hor Yan (2010) An Analysis Of Attribute Reduction Techniques For Breast Cancer Dataset. Project Report. UTeM, Melaka, Malaysia. (Submitted)

PDF (24 Pages), 7MB
An_Analysis_Of_Attribute_Reduction_Techniques_For_Breast_Canser_Dataset_Wong_Hor_Yan_R858.W66_2010_-_24_Pages.pdf - Submitted Version
PDF (Full Text), 75MB
An_Analysis_Of_Attribute_Reduction_Techniques_For_Breast_Canser_Dataset_Wong_Hor_Yan_R858.W66_2010.pdf - Submitted Version
Restricted to registered users only

Abstract

Breast cancer is a deadly disease that is common among women, but it is curable when detected at an early stage. However, the large number of disease markers in breast cancer data sets may affect the quality of prediction. Thus, this project's objectives are to analyse and benchmark attribute reduction techniques and to develop an attribute reduction tool for breast cancer data sets. CRISP-DM is used as the main methodology, whereas OOAD is used for the tool development. After the attribute reduction tool was completed, the RELIEF, SVM-RFE and CFS techniques were analysed on different data sets. The classification accuracy experiments use Naive Bayes as the classifier, 10-fold cross-validation as the evaluation mode and a random seed of 1, while ROC values and the percentage of reduction are used to compare classification performance. The experiments show that CFS achieved a high percentage of reduction and good ROC values in most experiments, while SVM-RFE's performance is considered tolerable, although it consumes more processing time than CFS and RELIEF. The experiments also show that RELIEF produced exceptional results on the Wisconsin Breast Cancer data. Thus, RELIEF is suggested for numeric-valued attributes and large or artificial data sets, while SVM-RFE is suited to data with mostly nominal-valued attributes and to real-world data with more training data than testing data. Finally, as CFS performs excellently, it is recommended for processing numeric-valued attributes and real-world data sets. Future work is recommended to compare more techniques on a wider range of data sets.
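The abstract names RELIEF and the "percentage of reduction" metric without defining them. The following is a minimal sketch, not the project's tool: it implements the basic two-class Relief weighting (Kira and Rendell), which is simpler than the ReliefF variant found in tools such as Weka, and it assumes percentage of reduction means the share of attributes removed. The function names `relief` and `percentage_of_reduction`, the Manhattan distance, and the toy data are hypothetical choices for illustration only.

```python
import random
from typing import List, Sequence


def _diff(a: int, x: Sequence[float], y: Sequence[float],
          lo: List[float], hi: List[float]) -> float:
    """Difference of attribute a between instances x and y, scaled to [0, 1]."""
    span = hi[a] - lo[a]
    return 0.0 if span == 0 else abs(x[a] - y[a]) / span


def relief(X: List[List[float]], y: List[int], m: int, seed: int = 1) -> List[float]:
    """Basic two-class Relief: one weight per attribute (hypothetical helper).

    Assumes numeric attributes and at least two instances per class.
    A higher weight means the attribute separates the classes better.
    """
    rng = random.Random(seed)
    n_attr = len(X[0])
    lo = [min(row[a] for row in X) for a in range(n_attr)]
    hi = [max(row[a] for row in X) for a in range(n_attr)]
    w = [0.0] * n_attr

    def dist(i: int, j: int) -> float:
        # Manhattan distance over the scaled attributes (an assumed choice)
        return sum(_diff(a, X[i], X[j], lo, hi) for a in range(n_attr))

    for _ in range(m):
        i = rng.randrange(len(X))
        hits = [j for j in range(len(X)) if j != i and y[j] == y[i]]
        misses = [j for j in range(len(X)) if y[j] != y[i]]
        near_hit = min(hits, key=lambda j: dist(i, j))
        near_miss = min(misses, key=lambda j: dist(i, j))
        for a in range(n_attr):
            # Reward attributes that differ from the nearest miss,
            # penalise attributes that differ from the nearest hit.
            w[a] += (_diff(a, X[i], X[near_miss], lo, hi)
                     - _diff(a, X[i], X[near_hit], lo, hi)) / m
    return w


def percentage_of_reduction(n_original: int, n_selected: int) -> float:
    """Assumed definition: percentage of attributes removed by the reduction."""
    return 100.0 * (n_original - n_selected) / n_original


if __name__ == "__main__":
    # Toy data (hypothetical): attribute 0 separates the classes, attribute 1 is noise.
    X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [0.9, 0.4], [1.0, 0.8], [0.8, 0.1]]
    y = [0, 0, 0, 1, 1, 1]
    print(relief(X, y, m=6))
    print(percentage_of_reduction(10, 4))
```

On the toy data the first attribute receives a positive weight and the noise attribute a negative one, so a threshold or top-k cut on the weights would keep only the first attribute. Ranking attributes by weight and then cutting is how Relief-style filters are typically turned into an attribute subset; the cut-off rule used in the project is not stated here.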

Item Type: Final Year Project (Project Report)
Uncontrolled Keywords: Breast -- Cancer, System analysis, Medical informatics, Management information systems -- Data processing
Subjects: R Medicine > R Medicine (General)
Divisions: Library > Final Year Project > FTMK
Depositing User: Mohd Syahrizal Mohd Razali
Date Deposited: 19 Sep 2012 00:49
Last Modified: 28 May 2015 03:37
URI: http://digitalcollection.utem.edu.my/id/eprint/5825