Wong, Hor Yan (2010) An Analysis Of Attribute Reduction Techniques For Breast Cancer Dataset. Project Report. UTeM, Melaka, Malaysia. (Submitted)
PDF (24 Pages)
An_Analysis_Of_Attribute_Reduction_Techniques_For_Breast_Canser_Dataset_Wong_Hor_Yan_R858.W66_2010_-_24_Pages.pdf - Submitted Version Download (7MB) |
PDF (Full Text)
An_Analysis_Of_Attribute_Reduction_Techniques_For_Breast_Canser_Dataset_Wong_Hor_Yan_R858.W66_2010.pdf - Submitted Version Restricted to Registered users only Download (75MB) |
Abstract
Breast cancer is a deadly disease common among women, but it is curable when detected at an early stage. However, the large number of disease markers in a breast cancer data set may affect the quality of prediction. This project therefore aims to analyse and benchmark attribute reduction techniques, and to develop an attribute reduction tool for breast cancer data sets. CRISP-DM is used as the main methodology, whereas OOAD is used for the tool development. After the attribute reduction tool was completed, the RELIEF, SVM-RFE and CFS techniques were analysed on different data sets. Classification accuracy was measured with Naive Bayes as the classifier, 10-fold cross-validation as the evaluation mode and a random seed of 1, while ROC values and percentage of reduction were used to compare classification performance. The experiments show that CFS achieved a high percentage of reduction and good ROC values in most experiments, while SVM-RFE's performance is considered tolerable although it consumes more processing time than CFS and RELIEF. The experiments also show that RELIEF produced exceptional results for the Wisconsin Breast Cancer data. RELIEF is therefore suggested for numeric-valued attributes and for large or artificial data sets, while SVM-RFE suits data with mostly nominal-valued attributes and real-world data with more training data and less testing data. CFS, which performed excellently, is recommended for numeric-valued attributes and real-world data sets. Future work should compare more techniques on a wider range of data sets.
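The evaluation quantities named in the abstract can be illustrated with a minimal sketch. It is not the project's tool, and every name in it is hypothetical. It assumes that percentage of reduction means the share of attributes removed, that the ROC value is the area under the ROC curve, and that the 10 folds come from a plain (unstratified) shuffle with seed 1; the report may define any of these differently.

```python
import random


def percentage_reduction(n_original, n_selected):
    """Share of attributes removed, in percent (assumed definition)."""
    if n_original <= 0 or not 0 <= n_selected <= n_original:
        raise ValueError("need n_original > 0 and 0 <= n_selected <= n_original")
    return 100.0 * (n_original - n_selected) / n_original


def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive example outscores a random negative one (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=10, seed=1):
    """Shuffle sample indices with a fixed seed and deal them into k test folds.
    Assumption: a simple unstratified split, chosen only for illustration."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]
```

For example, keeping 4 of 10 attributes gives `percentage_reduction(10, 4)`, a reduction of 60 percent under the assumed definition, and each list returned by `k_fold_indices` would serve once as the test set while the remaining samples train the classifier.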
Item Type: | Final Year Project (Project Report) |
---|---|
Uncontrolled Keywords: | Breast -- Cancer, System analysis, Medical informatics, Management information systems -- Data processing |
Subjects: | R Medicine > R Medicine (General) |
Divisions: | Library > Final Year Project > FTMK |
Depositing User: | Mohd Syahrizal Mohd Razali |
Date Deposited: | 19 Sep 2012 00:49 |
Last Modified: | 28 May 2015 03:37 |
URI: | http://digitalcollection.utem.edu.my/id/eprint/5825 |