Abstract
Missing values are a frequent problem in data analysis, and imputation is one way to handle them. Clinical data often contain multiple measurements, such as laboratory test results, taken at different time points. In this study, we compared three imputation methods and their effects on multiple-measurement data sets with different sampling time periods. Liver cancer data sets were used to classify liver cancer recurrence with two types of classification models, built with support vector machines (SVM) and random forests. The results identify combinations of imputation method and sampling time period that achieve better classification results than the other methods and periods. They also show that the leading imputation method with SVM differs significantly (P<0.001) from mean imputation with SVM, which is frequently applied to data sets with missing values.
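For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of column-wise mean imputation. It is not the authors' implementation: the function name `mean_impute`, the use of `None` to mark a missing value, and the sample laboratory values are all assumptions made for illustration, and the abstract does not name the other two imputation methods that were compared.

```python
from statistics import mean


def mean_impute(rows):
    """Return a copy of rows with each None replaced by its column mean.

    Hypothetical helper for illustration only: rows is a list of
    equal-length lists, and None marks a missing measurement.
    """
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        # Use only the observed (non-missing) values of column j.
        observed = [row[j] for row in rows if row[j] is not None]
        if not observed:
            raise ValueError(f"column {j} has no observed values")
        col_means.append(mean(observed))
    # Build new rows so the input is left unchanged.
    return [
        [col_means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


# Made-up laboratory values: one row per patient, one column per test.
labs = [
    [35.0, 1.2],
    [None, 0.8],
    [41.0, None],
]
imputed = mean_impute(labs)
```

When imputed data feed a classifier such as an SVM or a random forest, the column means are usually computed on the training split only and then reused for the test split, so that no information from the test data leaks into the model.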
Original language | English |
---|---|
Title of host publication | Intelligent Systems and Applications - Proceedings of the International Computer Symposium, ICS 2014 |
Editors | William Cheng-Chung Chu, Han-Chieh Chao, Stephen Jenn-Hwa Yang |
Publisher | IOS Press BV |
Pages | 1930-1939 |
Number of pages | 10 |
ISBN (Electronic) | 9781614994831 |
State | Published - 2015 |
Externally published | Yes |
Event | International Computer Symposium, ICS 2014 - Taichung, Taiwan. Duration: 12 Dec 2014 → 14 Dec 2014 |
Publication series
Name | Frontiers in Artificial Intelligence and Applications |
---|---|
Volume | 274 |
ISSN (Print) | 0922-6389 |
ISSN (Electronic) | 1879-8314 |
Conference
Conference | International Computer Symposium, ICS 2014 |
---|---|
Country/Territory | Taiwan |
City | Taichung |
Period | 12/12/14 → 14/12/14 |
Bibliographical note
Publisher Copyright: © 2015 The authors and IOS Press. All rights reserved.
Keywords
- Missing values
- imputation
- random forests
- support vector machine