Abstract
Background and objective: Univariate feature selection is one of the simplest and most commonly used techniques for developing a multigene predictor for survival. At present, no software is tailored to performing univariate feature selection and predictor construction.

Methods: We develop the compound.Cox R package, which implements univariate significance tests (Wald tests or score tests) for feature selection. We provide a cross-validation algorithm to measure the predictive capability of the selected genes and a permutation algorithm to assess the false discovery rate. We also provide three algorithms for constructing a multigene predictor (compound covariate, compound shrinkage, and copula-based methods), which are tailored to the subset of genes obtained from univariate feature selection. We demonstrate the package using survival data on lung cancer patients, and we examine the predictive capability of the developed algorithms using the lung cancer data and simulated data.

Results: The developed R package, compound.Cox, is available on the CRAN repository. The statistical tools in compound.Cox allow researchers to determine an optimal significance level for the tests, thus providing them with an optimal subset of genes for prediction. The package also allows researchers to compute the false discovery rate and to apply various prediction algorithms.
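As an illustration of the general idea behind univariate selection followed by a compound covariate predictor, the following is a minimal Python sketch and not the compound.Cox implementation. It computes the Cox score test at a zero coefficient for each gene separately, keeps the genes whose P-value falls below a chosen threshold, and sums the selected expression values weighted by their test statistics. The function names, the toy data, the threshold of 0.05, and the choice of the score Z-statistic as the weight are all assumptions made for this sketch; the package itself may weight genes differently.

```python
import math

def score_test(times, events, x):
    """Cox score test at beta = 0 for one covariate (ties handled Breslow-style).

    Returns the Z-statistic and its two-sided normal P-value.
    """
    u = 0.0  # score: observed minus risk-set mean, summed over events
    v = 0.0  # information: risk-set variance, summed over events
    for t_i, d_i, x_i in zip(times, events, x):
        if not d_i:
            continue  # censored subjects only contribute through risk sets
        risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
        mean = sum(risk) / len(risk)
        u += x_i - mean
        v += sum((r - mean) ** 2 for r in risk) / len(risk)
    if v == 0.0:
        return 0.0, 1.0  # no variation in any risk set: no evidence either way
    z = u / math.sqrt(v)
    return z, math.erfc(abs(z) / math.sqrt(2.0))

def select_features(times, events, X, alpha):
    """Return (column index, Z) for every column with P-value below alpha."""
    selected = []
    for j in range(len(X[0])):
        z, p = score_test(times, events, [row[j] for row in X])
        if p < alpha:
            selected.append((j, z))
    return selected

def compound_covariate(row, selected):
    """Prognostic index: selected covariates weighted by their Z (an assumption)."""
    return sum(z * row[j] for j, z in selected)

# Hypothetical toy data: six patients, two genes, no censoring.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
X = [[6, 1], [5, 2], [4, 1], [3, 2], [2, 1], [1, 2]]

selected = select_features(times, events, X, alpha=0.05)
index_first_patient = compound_covariate(X[0], selected)
```

In this toy data the first gene decreases steadily with survival time while the second alternates, so only the first is expected to pass the threshold. In practice the threshold would be tuned, for example by cross-validation as the abstract describes, rather than fixed in advance.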
Original language | English |
---|---|
Pages (from-to) | 21-37 |
Number of pages | 17 |
Journal | Computer Methods and Programs in Biomedicine |
Volume | 168 |
State | Published - Jan 2019 |
Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2018
Keywords
- Cancer prognosis
- Copula
- Cox regression
- Cross-validation
- Dependent censoring
- False discovery rate
- Gene expression
- High-dimensional data
- Multiple testing