Project Details
Abstract
Data mining techniques have been used in prediction/classification problems. Decision trees
(DT), back-propagation networks (BPN), support vector machines (SVM) and support vector
regression (SVR) are popular among them and can be applied to various areas. Nevertheless,
different problems may require different parameters when applying DT, BPN and SVM/SVR. If
the parameters are not set well, the results will be unsatisfactory. Therefore, rules of thumb
or "trial and error" methods are usually used to determine them. However, these methods may
lead to poor parameter choices. On the other hand, a dataset may contain many features, but
not all features are beneficial for classification. Some features may contain false
correlations, which hinder the detection process. Without feature selection, the
prediction/classification accuracy rate may be worse due to noise or too much dirty data.
In most research, either parameter tuning or feature selection is used to improve the
classification accuracy rate. Some studies consider parameter tuning and feature selection
simultaneously for specific problems. However, the specific data used in those studies cannot
be used for further comparison with other approaches. Therefore, scatter search (SS) is
proposed to select a beneficial subset of features and to obtain better parameters at the same
time, which will result in better classification.
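The key idea above is that one SS solution carries both a feature subset and the model parameters, so both are searched together. The sketch below is a simplified, hypothetical scatter search following the usual five-part template (diversification, improvement, reference-set update, subset generation, combination); the encoding (a boolean mask plus log2 C and log2 gamma for an SVM), the search bounds, and `toy_fitness` are all illustrative assumptions, not the project's actual design. In real use the fitness would be the cross-validated accuracy of a classifier trained on the selected features with the given parameters.

```python
import random
from typing import Callable, List, Tuple

# One candidate encodes BOTH a feature subset and model parameters:
# mask[i] is True if feature i is selected; params is e.g. (log2 C, log2 gamma).
Solution = Tuple[Tuple[bool, ...], Tuple[float, ...]]


def scatter_search(fitness: Callable[[Solution], float], n_features: int,
                   bounds: List[Tuple[float, float]], pop_size: int = 20,
                   ref_size: int = 6, max_iter: int = 30, seed: int = 0) -> Solution:
    rng = random.Random(seed)

    def random_solution() -> Solution:
        # Diversification generator: random mask, random parameters within bounds.
        mask = tuple(rng.random() < 0.5 for _ in range(n_features))
        params = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        return mask, params

    def improve(sol: Solution) -> Solution:
        # Improvement method: one pass of bit-flip hill climbing on the mask.
        best, best_f = sol, fitness(sol)
        for i in range(n_features):
            mask = list(best[0])
            mask[i] = not mask[i]
            cand = (tuple(mask), best[1])
            f = fitness(cand)
            if f > best_f:
                best, best_f = cand, f
        return best

    def distance(a: Solution, b: Solution) -> float:
        # Hamming distance on masks plus range-normalised parameter distance.
        d = float(sum(x != y for x, y in zip(a[0], b[0])))
        return d + sum(abs(x - y) / (hi - lo)
                       for x, y, (lo, hi) in zip(a[1], b[1], bounds))

    def combine(a: Solution, b: Solution) -> Solution:
        # Uniform crossover on masks, clipped linear combination on parameters.
        mask = tuple(x if rng.random() < 0.5 else y for x, y in zip(a[0], b[0]))
        r = rng.random()
        params = tuple(max(lo, min(hi, x + r * (y - x)))
                       for x, y, (lo, hi) in zip(a[1], b[1], bounds))
        return mask, params

    # Reference set: the best half by quality, the rest chosen for diversity.
    pool = sorted((improve(random_solution()) for _ in range(pop_size)),
                  key=fitness, reverse=True)
    n_best = max(1, ref_size // 2)
    ref, rest = pool[:n_best], pool[n_best:]
    while len(ref) < ref_size and rest:
        far = max(rest, key=lambda s: min(distance(s, t) for t in ref))
        ref.append(far)
        rest.remove(far)

    # Subset generation (all pairs) + combination; stop when RefSet stops changing.
    for _ in range(max_iter):
        changed = False
        pairs = [(ref[i], ref[j]) for i in range(len(ref)) for j in range(i + 1, len(ref))]
        for a, b in pairs:
            child = improve(combine(a, b))
            worst = min(ref, key=fitness)
            if child not in ref and fitness(child) > fitness(worst):
                ref[ref.index(worst)] = child
                changed = True
        if not changed:
            break
    return max(ref, key=fitness)


# Hypothetical stand-in for cross-validated accuracy: features 0-2 are useful,
# the rest are noise, and the score peaks at log2 C = 3, log2 gamma = -5.
def toy_fitness(sol: Solution) -> float:
    mask, (log2c, log2g) = sol
    feat = sum((1 if i < 3 else -1) for i, m in enumerate(mask) if m)
    return feat - 0.1 * ((log2c - 3) ** 2 + (log2g + 5) ** 2)


best = scatter_search(toy_fitness, n_features=8, bounds=[(-5, 15), (-15, 3)])
print(best[0])  # (True, True, True, False, False, False, False, False)
```

Because every new solution passes through the improvement step before entering the reference set, the noisy features are dropped while the parameter values are refined by combination; a real fitness function would be far more expensive, so caching its results would be worthwhile.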
Each of the above data mining techniques has its own advantages and disadvantages, and its
suitability is influenced by the characteristics of the problem. If these techniques can work
together, better results are expected. This is the so-called ensemble architecture. The
original concept of an ensemble comes from the committee machine proposed by Nilsson in 1965.
The purpose of a committee machine is to integrate the opinions of many experts instead of
only one expert to obtain a better result. Therefore, this proposal plans to use the ensemble
architecture to further enhance the prediction/classification accuracy rate.
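The abstract does not say how the committee combines its members' outputs; plain majority voting is one common choice and is assumed in the sketch below. The three "experts" are hypothetical threshold rules standing in for trained DT, BPN and SVM models. For a regression member such as SVR, averaging the predictions would replace voting.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence

Expert = Callable[[Sequence[float]], Hashable]


def committee_predict(experts: List[Expert], x: Sequence[float]) -> Hashable:
    """Majority vote over the experts' labels; ties go to the earliest expert's label."""
    if not experts:
        raise ValueError("the committee needs at least one expert")
    votes = [expert(x) for expert in experts]
    counts = Counter(votes)
    top = max(counts.values())
    # Scan in expert order so the tie-break is deterministic.
    return next(v for v in votes if counts[v] == top)


# Hypothetical experts standing in for trained DT, BPN and SVM models.
def dt(x: Sequence[float]) -> int:
    return 1 if x[0] > 0.5 else 0


def bpn(x: Sequence[float]) -> int:
    return 1 if x[1] > 0.5 else 0


def svm(x: Sequence[float]) -> int:
    return 1 if x[0] + x[1] > 1.0 else 0


committee = [dt, bpn, svm]
print(committee_predict(committee, [0.7, 0.2]))  # 0 (DT votes 1, BPN and SVM vote 0)
print(committee_predict(committee, [0.7, 0.6]))  # 1 (all three agree)
```

A weighted vote, for example weighting each expert by its validation accuracy, is a natural variation of the same idea.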
To evaluate the proposed approaches, datasets from the UCI (University of California, Irvine)
repository will be used to measure their performance. It is expected that approaches which
apply parameter tuning and feature selection simultaneously will obtain better classification
rates and require less computational time than approaches which apply only parameter tuning or
only feature selection. The proposed SS+DT, SS+BPN, SS+SVM and SS+SVR are therefore expected
to find the best parameters and feature subsets for various problems and to provide higher
classification accuracy rates.
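The comparison above rests on two measurements per approach: classification rate and computational time. The abstract does not name an evaluation protocol, so the harness below assumes k-fold cross-validation and wall-clock timing; `one_nn` and the toy data are hypothetical placeholders, and loading an actual UCI dataset is not shown.

```python
import random
import time
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
Learner = Callable[[List[Row], List[int]], Callable[[Row], int]]


def kfold_evaluate(train: Learner, X: List[Row], y: List[int],
                   k: int = 10, seed: int = 0) -> Tuple[float, float]:
    """Return (mean accuracy over k folds, elapsed wall-clock seconds)."""
    if not 2 <= k <= len(X):
        raise ValueError("need 2 <= k <= number of samples")
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, near-equal folds
    start = time.perf_counter()
    accs = []
    for fold in folds:
        held_out = set(fold)
        tr = [i for i in idx if i not in held_out]
        model = train([X[i] for i in tr], [y[i] for i in tr])
        accs.append(sum(model(X[i]) == y[i] for i in fold) / len(fold))
    return sum(accs) / k, time.perf_counter() - start


# Hypothetical learner: 1-nearest-neighbour, used only to exercise the harness.
def one_nn(X_tr: List[Row], y_tr: List[int]) -> Callable[[Row], int]:
    def predict(x: Row) -> int:
        j = min(range(len(X_tr)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(X_tr[i], x)))
        return y_tr[j]
    return predict


# Two well-separated toy clusters (not a UCI dataset).
X = [[0.01 * i] for i in range(10)] + [[5.0 + 0.01 * i] for i in range(10)]
y = [0] * 10 + [1] * 10
acc, secs = kfold_evaluate(one_nn, X, y, k=5)
print(acc)  # 1.0
```

Running the same harness with identical folds (same `seed`) for each approach, for example SS+SVM against an SVM with default parameters and all features, keeps the accuracy and timing comparison fair.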
Project IDs
Project ID: PF9709-0769
External Project ID: NSC97-2410-H182-020-MY2
Field | Value |
---|---|
Status | Finished |
Effective start/end date | 1 Aug 2008 → 31 Jul 2009 |
Keywords
- Scatter Search
- Decision Tree
- Back-propagation Network
- Support Vector