Applying Scatter Search and Ensemble Architecture for Improving the Efficiency of Data Mining Techniques

Project: National Science and Technology Council, National Science and Technology Council Academic Grants

Project Details

Abstract

Data mining techniques are widely used for prediction and classification problems. Decision trees (DT), back-propagation networks (BPN), support vector machines (SVM), and support vector regression (SVR) are among the most popular and can be applied in many areas. However, different problems may require different parameter settings when applying DT, BPN, and SVM/SVR, and poorly chosen parameters lead to unsatisfactory results. Parameters are therefore usually set by rule of thumb or trial and error, but these methods may yield poor parameter values. In addition, a dataset may contain many features, not all of which are beneficial for classification. Some features may carry spurious correlations that hinder the detection process. Without feature selection, prediction/classification accuracy may suffer from noise or dirty data. Most studies use either parameter tuning or feature selection to improve classification accuracy. Some studies consider parameter tuning and feature selection simultaneously, but only for specific problems, and the specific data they use cannot be used for further comparison with other approaches. Therefore, scatter search (SS) is proposed to select a beneficial subset of features and to obtain better parameters, which together should yield better classification. Each of the above data mining techniques has its own advantages and disadvantages, and its suitability is influenced by the characteristics of the problem. If these techniques work together, a better result can be expected. This is the so-called ensemble architecture. The original concept of the ensemble comes from the committee machine proposed by Nilsson in 1965. The purpose of the committee machine is to integrate the opinions of many experts, rather than only one, to obtain a better result.
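As a minimal illustration of the committee idea, the sketch below combines the labels of several classifiers by majority vote. It is an assumption-laden example, not the combination rule used in the project: the function name `majority_vote` and the three prediction lists are hypothetical, and the project may weight or combine its base learners differently.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    predictions: list of lists, one inner list of labels per classifier.
    On a tie, the label voted first (earliest classifier) wins, because
    Counter.most_common keeps first-seen order among equal counts.
    """
    combined = []
    for votes in zip(*predictions):  # votes for one sample across classifiers
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical outputs of three base classifiers on four samples.
dt_pred = ["a", "b", "a", "b"]
bpn_pred = ["a", "a", "a", "b"]
svm_pred = ["b", "b", "a", "a"]

ensemble_pred = majority_vote([dt_pred, bpn_pred, svm_pred])
```

Each base classifier errs on a different sample here, so the committee can outvote any single wrong member.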
Therefore, this proposal plans to use the ensemble architecture to further enhance prediction/classification accuracy. UCI (University of California, Irvine) datasets will be used to evaluate the performance of the proposed approaches. It is expected that approaches that apply parameter tuning and feature selection simultaneously will obtain better classification rates and require less computational time than approaches that apply only parameter tuning or only feature selection. The proposed SS+DT, SS+BPN, SS+SVM, and SS+SVR are therefore expected to find the best parameters and feature subset for various problems and to provide higher classification accuracy.
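The joint search over feature subsets and parameters can be sketched as follows. This is a simplified, scatter-search-style loop under stated assumptions, not the project's implementation: each solution is a pair (feature mask, one numeric parameter), the `fitness` function is a toy stand-in for cross-validated accuracy, the constants `USEFUL` and `BEST_C` are invented for illustration, and the reference set is chosen by quality only, whereas a full scatter search would also keep diverse solutions.

```python
import random

N_FEATURES = 6
USEFUL = {0, 2, 3}   # hypothetical: indices of features that actually help
BEST_C = 4.0         # hypothetical: ideal value of the tuned parameter

def fitness(solution):
    """Toy stand-in for classifier accuracy (higher is better)."""
    mask, c = solution
    hits = sum(1 for i in USEFUL if mask[i])
    noise = sum(1 for i in range(N_FEATURES) if mask[i] and i not in USEFUL)
    return hits - 0.5 * noise - 0.1 * abs(c - BEST_C)

def random_solution(rng):
    """Diversification: a random feature mask and a random parameter."""
    mask = tuple(rng.random() < 0.5 for _ in range(N_FEATURES))
    return (mask, rng.uniform(0.1, 10.0))

def combine(a, b, rng):
    """Combination: mix the two masks bit by bit, interpolate the parameter."""
    mask = tuple(rng.choice(pair) for pair in zip(a[0], b[0]))
    w = rng.random()
    return (mask, w * a[1] + (1 - w) * b[1])

def improve(solution):
    """Improvement: flip each feature bit once and keep flips that help."""
    mask, c = list(solution[0]), solution[1]
    best = fitness((tuple(mask), c))
    for i in range(N_FEATURES):
        mask[i] = not mask[i]
        score = fitness((tuple(mask), c))
        if score > best:
            best = score
        else:
            mask[i] = not mask[i]  # revert a flip that did not help
    return (tuple(mask), c)

def scatter_search(seed=0, pool_size=20, ref_size=5, iterations=10):
    rng = random.Random(seed)
    pool = [improve(random_solution(rng)) for _ in range(pool_size)]
    ref_set = sorted(pool, key=fitness, reverse=True)[:ref_size]
    for _ in range(iterations):
        # Combine every pair of reference solutions, then improve each trial.
        trials = [improve(combine(a, b, rng))
                  for i, a in enumerate(ref_set) for b in ref_set[i + 1:]]
        candidates = list(dict.fromkeys(ref_set + trials))  # drop duplicates
        ref_set = sorted(candidates, key=fitness, reverse=True)[:ref_size]
    return ref_set[0]

best_mask, best_c = scatter_search()
```

In practice `fitness` would train DT, BPN, SVM, or SVR on the selected features with the candidate parameters and return a validation score; encoding both in one solution is what lets the search tune them simultaneously.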

Project IDs

Project ID: PF9709-0769
External Project ID: NSC97-2410-H182-020-MY2
Status: Finished
Effective start/end date: 01/08/08 to 31/07/09

Keywords

  • Scatter Search
  • Decision Tree
  • Back-propagation Network
  • Support Vector
