Exploring Random Forest in Genetic Risk Score Construction

Vaishnavi Venkat, Kaylyn Clark, X. Jessie Jeng, Tsung Chieh Yao, Hui Ju Tsai, Tzu Pin Lu, Tzu Hung Hsiao, Ching Heng Lin, Shannon Holloway, Cathrine Hoyo, Shin Yi Chou, Hui Wang, Wan Ping Lee, Li San Wang, Jung Ying Tzeng*

*Corresponding author for this work

Research output: Contribution to journal › Journal Article › peer-review

Abstract

Genetic risk scores (GRS) are crucial tools for estimating an individual's genetic liability to various traits and diseases, computed as a weighted sum of trait-associated allele counts. Traditionally, GRS models assume additive, linear effects of risk variants. However, complex traits often involve nonadditive interactions, such as epistasis, which are not captured by these conventional methods. In this study, we investigate the use of random forest (RF) models as a model-free approach for constructing GRS, leveraging RF's capacity to capture complex, nonlinear interactions among genetic variants. Specifically, we introduce two new RF-based GRS strategies that boost RF performance and incorporate base data information when available: (1) ctRF, which optimizes linkage disequilibrium (LD) clumping and p-value thresholds within RF; and (2) wRF, which adjusts each SNP's chance of inclusion in tree nodes based on its association strength. Through simulation studies and real-data applications to Alzheimer's disease, body mass index, and atopy, we find that ctRF consistently outperforms other RF-based methods and classical additive models when traits exhibit complex genetic architectures. Additionally, incorporating informative base data into RF-GRS construction can enhance predictive accuracy. Our findings suggest that RF-based GRS can effectively capture intricate genetic interactions and offer a robust alternative to traditional GRS methods, especially for complex traits with nonlinear genetic effects.
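
To make the ideas in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It shows (a) the classical additive GRS as a weighted sum of allele counts, (b) a p-value threshold filter of the kind that clumping-and-thresholding tunes, and (c) one possible way to bias SNP selection toward stronger associations, in the spirit of wRF. The function names, the use of -log10(p) as the association strength, and the toy numbers are all assumptions made for illustration; the abstract does not specify the weighting scheme or the tuning procedure.

```python
import numpy as np


def additive_grs(genotypes, weights):
    """Classical additive GRS: weighted sum of allele counts per individual.

    genotypes: (n_individuals, n_snps) array of allele counts (0, 1, or 2).
    weights:   (n_snps,) array of per-SNP effect weights.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return genotypes @ weights


def pvalue_threshold_mask(pvalues, threshold):
    """Boolean mask of SNPs whose base-data p-value passes the threshold.

    In a thresholding scheme the threshold is a tuning parameter; how ctRF
    tunes it is not described in the abstract.
    """
    return np.asarray(pvalues, dtype=float) <= threshold


def sample_candidate_snps(pvalues, n_candidates, rng):
    """Draw candidate SNPs for a tree node with unequal probabilities.

    ASSUMPTION: sampling probability is proportional to -log10(p). This is a
    hypothetical choice used only to illustrate association-weighted sampling.
    """
    p = np.clip(np.asarray(pvalues, dtype=float), 1e-300, 1.0)
    strength = -np.log10(p)
    probs = strength / strength.sum()
    return rng.choice(len(probs), size=n_candidates, replace=False, p=probs)


if __name__ == "__main__":
    # Toy data: 2 individuals, 3 SNPs (hypothetical values).
    genotypes = [[0, 1, 2], [2, 0, 1]]
    weights = [0.5, -0.2, 0.1]
    print(additive_grs(genotypes, weights))

    pvalues = [1e-8, 0.03, 0.5, 0.9]
    print(pvalue_threshold_mask(pvalues, 0.05))

    rng = np.random.default_rng(0)
    print(sample_candidate_snps(pvalues, 2, rng))
```

In a full random forest, the weighted draw in `sample_candidate_snps` would replace the uniform feature subsampling performed at each split, so that strongly associated SNPs are considered more often without excluding weaker ones entirely.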

Original language: English
Article number: e70022
Journal: Genetic Epidemiology
Volume: 49
Issue number: 8
State: Published - December 2025

Bibliographical note

Publisher Copyright:
© 2025 The Author(s). Genetic Epidemiology published by Wiley Periodicals LLC.
