Combining data discretization and missing value imputation for incomplete medical datasets

Min Wei Huang, Chih Fong Tsai, Shu Ching Tsui, Wei Chao Lin*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 citation (Scopus)

Abstract

Data discretization transforms continuous features into discrete ones, simplifying the representation of information and making it easier to understand, use, and explain. In practice, discretization can improve knowledge discovery and data analysis on medical datasets that contain continuous features. However, such datasets frequently have missing feature values, and many data-mining algorithms cannot handle incomplete data. In this study, we combined discretization and missing-value imputation to process incomplete medical datasets and examined how the order in which the two steps are applied affects performance. The experiments used seven medical datasets; two discretizers, namely the minimum description length principle (MDLP) and ChiMerge; three imputation methods, namely mean/mode, classification and regression tree (CART), and k-nearest neighbor (KNN); and two classifiers, namely support vector machines (SVM) and the C4.5 decision tree. The results show that performing discretization first and imputation second yields better performance than the reverse order. Furthermore, the highest classification accuracy was achieved by combining ChiMerge and KNN with SVM.
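The comparison at the core of the study, applying discretization before or after imputation, can be sketched for a single feature column. This is a minimal pure-Python sketch under several assumptions, not the authors' implementation. Missing values are encoded as `None`. ChiMerge stops at a fixed number of intervals (`max_intervals`) rather than at a chi-square significance threshold. Mean/mode imputation stands in for the KNN imputer that gave the best results in the study. All function names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter
from statistics import mean


def chi2(a, b, classes):
    """Chi-square statistic for two adjacent intervals (Counters of class labels)."""
    n = sum(a.values()) + sum(b.values())
    stat = 0.0
    for row in (a, b):
        r = sum(row.values())
        for c in classes:
            expected = r * (a[c] + b[c]) / n
            if expected == 0:  # class absent from both intervals: no contribution
                continue
            stat += (row[c] - expected) ** 2 / expected
    return stat


def chimerge_cuts(values, labels, max_intervals=2):
    """Learn ChiMerge cut points for one feature, ignoring missing (None) values.

    Assumption: merging stops at max_intervals instead of a chi-square threshold.
    """
    pairs = sorted(((v, y) for v, y in zip(values, labels) if v is not None),
                   key=lambda p: p[0])
    intervals = []  # [lower_bound, Counter of class labels], one per distinct value
    for v, y in pairs:
        if intervals and intervals[-1][0] == v:
            intervals[-1][1][y] += 1
        else:
            intervals.append([v, Counter({y: 1})])
    classes = set(labels)
    while len(intervals) > max_intervals:
        # Merge the adjacent pair with the lowest chi-square (most similar classes).
        scores = [chi2(intervals[i][1], intervals[i + 1][1], classes)
                  for i in range(len(intervals) - 1)]
        i = scores.index(min(scores))
        intervals[i][1].update(intervals[i + 1][1])
        del intervals[i + 1]
    return [lower for lower, _ in intervals[1:]]


def to_bin(v, cuts):
    """Map a value to its interval index; missing values stay None."""
    return None if v is None else bisect_right(cuts, v)


def impute_mean(col):
    """Replace None with the mean of the observed (continuous) values."""
    m = mean(v for v in col if v is not None)
    return [m if v is None else v for v in col]


def impute_mode(col):
    """Replace None with the most frequent observed (discrete) value."""
    m = Counter(v for v in col if v is not None).most_common(1)[0][0]
    return [m if v is None else v for v in col]


def impute_then_discretize(col, labels, max_intervals=2):
    filled = impute_mean(col)  # imputed values take part in learning the cuts
    cuts = chimerge_cuts(filled, labels, max_intervals)
    return [to_bin(v, cuts) for v in filled]


def discretize_then_impute(col, labels, max_intervals=2):
    cuts = chimerge_cuts(col, labels, max_intervals)  # cuts from observed values only
    return impute_mode([to_bin(v, cuts) for v in col])


if __name__ == "__main__":
    col = [1.0, 2.0, None, 8.0, 9.0, 3.0, 4.0]
    labels = ["a", "a", "b", "b", "b", "a", "a"]
    print(discretize_then_impute(col, labels))  # [0, 0, 0, 1, 1, 0, 0]
    print(impute_then_discretize(col, labels))  # [0, 0, 1, 1, 1, 0, 0]
```

On this toy column the two orders learn different cut points ([8.0] versus [4.5]). The mean-imputed value 4.5 takes part in learning the ChiMerge boundaries, so the row with the missing value lands in a different bin. This only shows that the order matters; which order performs better is what the study measures empirically across datasets and classifiers. In a real evaluation, cut points and imputation statistics would be fit on the training folds only.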

Original language: English
Article number: e0295032
Journal: PLoS ONE
Volume: 18
Issue number: 11 November
Publication status: Published - Nov 2023

Bibliographical note

Publisher Copyright:
Copyright: © 2023 Huang et al.
