Abstract
Feature selection is a process aimed at filtering out unrepresentative features from a given dataset, usually allowing the later data mining and analysis steps to produce better results. However, different feature selection algorithms use different criteria to select representative features, making it difficult to find the best algorithm for different domain datasets. The limitations of single feature selection methods can be overcome by the application of ensemble methods, combining multiple feature selection results. In the literature, feature selection algorithms are classified as filter, wrapper, or embedded techniques. However, to the best of our knowledge, there has been no study focusing on combining these three types of techniques to produce ensemble feature selection. Therefore, the aim here is to answer the question as to which combination of different types of feature selection algorithms offers the best performance for different types of medical data including categorical, numerical, and mixed data types. The experimental results show that a combination of filter (i.e., principal component analysis) and wrapper (i.e., genetic algorithms) techniques by the union method is a better choice, providing relatively high classification accuracy and a reasonably good feature reduction rate.
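As a rough illustration of the union method mentioned in the abstract, the sketch below combines the feature subsets chosen by two selectors and computes a feature reduction rate. It is only a minimal sketch and not the authors' implementation: the function names `combine_selections` and `reduction_rate`, the example feature indices, and the definition of reduction rate as the fraction of original features removed are all assumptions made for illustration.

```python
def combine_selections(selections, method="union"):
    """Combine feature-index subsets chosen by several selectors.

    `selections` is a list of iterables of feature indices (hypothetical
    outputs of, e.g., a filter and a wrapper selector).
    """
    sets = [set(s) for s in selections]
    if not sets:
        return []
    if method == "union":
        # Keep every feature picked by at least one selector.
        combined = set().union(*sets)
    elif method == "intersection":
        # Keep only features picked by every selector.
        combined = set.intersection(*sets)
    else:
        raise ValueError(f"unknown method: {method!r}")
    return sorted(combined)


def reduction_rate(n_selected, n_total):
    """Fraction of the original features removed (assumed definition)."""
    return 1.0 - n_selected / n_total


# Hypothetical selector outputs on a dataset with 10 features.
filter_selected = [0, 2, 5]
wrapper_selected = [2, 3, 5, 7]

union_features = combine_selections([filter_selected, wrapper_selected], "union")
intersection_features = combine_selections(
    [filter_selected, wrapper_selected], "intersection"
)
union_rate = reduction_rate(len(union_features), 10)
```

The union keeps more features than either selector alone, so it trades some feature reduction for a lower risk of discarding a feature that only one selector considers representative; the intersection makes the opposite trade.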
Original language | English |
---|---|
Article number | e12553 |
Journal | Expert Systems |
Volume | 37 |
Issue number | 5 |
DOIs | |
State | Published - 1 Oct 2020 |
Bibliographical note
Publisher Copyright: © 2020 John Wiley & Sons, Ltd
Keywords
- data mining
- dimensionality reduction
- ensemble
- feature selection
- medical datasets