On Fair Performance Comparison between Random Survival Forest and Cox Regression: An Example of Colorectal Cancer Study

Sirin Cetin; Ayse Ulgen; Isa Dede; Wentian Li

doi:10.28991/SciMedJ-2021-0301-9

Authors

Sirin Cetin, Department of Biostatistics, Faculty of Medicine, Tokat GaziosmanPasa University, Turkey
Ayse Ulgen (ayshe.ulgen@global.t-bird.edu), Department of Biostatistics, Faculty of Medicine, Girne American University, Karmi, Cyprus
Isa Dede, Medical Oncology, Faculty of Medicine, Mustafa Kemal University, Antakya, Turkey
Wentian Li, The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, United States

Vol. 3 No. 1 (2021): March

Research Articles

Abstract

Random Forest (RF), a largely model-free and robust machine learning method, has been successfully applied to right-censored survival data under the name Random Survival Forest (RSF). However, RF/RSF has distinct strategies for classification and prediction. First, it is an ensemble classifier, and its performance is an average over multiple rounds of data fitting. Second, each training set is a bootstrap sample (drawn with replacement) that reuses roughly 2/3 of all samples, and the corresponding testing set consists of the samples not drawn (out-of-bag samples). Neither feature is intrinsic to Cox regression or other single classifiers. Ignoring these two features could lead to an unfair comparison between the performance of the two methods. Using a colorectal cancer survival dataset, we illustrate the problems that arise, when comparing Cox regression with RSF, from using k-fold cross-validation, using only one resampling without an ensemble average, or using the whole dataset for both fitting and testing in Cox regression. We provide more accessible R code for simple calculation of the discordance index (D-index) and unweighted integrated Brier score (IBS) for Cox regression, and of the unweighted IBS for RSF.
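The evaluation scheme described in the abstract can be sketched in a few lines. The following is a minimal illustration in Python, not the authors' R code: it draws a bootstrap sample, scores a model on the out-of-bag samples, and averages the result over many rounds. It assumes that the D-index is the fraction of comparable pairs that are ordered wrongly (one minus Harrell's concordance index). The function names, the toy data, and the `fit_and_score` callback are all hypothetical; in a real comparison the callback would refit a Cox model on the in-bag samples in each round.

```python
import random


def bootstrap_split(n, rng):
    """Draw n indices with replacement; return (in_bag, out_of_bag) index lists."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in drawn]
    return in_bag, out_of_bag


def discordance_index(times, events, risk):
    """Fraction of comparable pairs that the risk score orders wrongly.

    Assumption: a pair (i, j) is comparable when subject i has the shorter
    time and that time is an observed event (events[i] == 1). The pair is
    discordant when subject i nevertheless has the lower risk score; ties
    in risk count as one half.
    """
    discordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] < risk[j]:
                    discordant += 1.0
                elif risk[i] == risk[j]:
                    discordant += 0.5
    return discordant / comparable if comparable else float("nan")


def oob_averaged_d_index(times, events, fit_and_score, rounds, seed=0):
    """Average the out-of-bag D-index over several bootstrap rounds.

    fit_and_score(in_bag, oob) is a hypothetical callback that fits a model
    on the in-bag indices and returns one risk score per out-of-bag index.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(rounds):
        in_bag, oob = bootstrap_split(len(times), rng)
        oob_risk = fit_and_score(in_bag, oob)
        d = discordance_index(
            [times[i] for i in oob], [events[i] for i in oob], oob_risk
        )
        if d == d:  # skip rounds with no comparable pair (NaN)
            scores.append(d)
    return sum(scores) / len(scores) if scores else float("nan")


# Toy data (made up): larger x means a higher hazard, hence shorter survival.
rng = random.Random(1)
x = [rng.random() for _ in range(200)]
times = [rng.expovariate(1.0 + 2.0 * xi) for xi in x]
events = [1 if rng.random() < 0.7 else 0 for _ in x]


def covariate_as_risk(in_bag, oob):
    # Stand-in for "fit on in_bag, predict on oob": it ignores in_bag and
    # uses the covariate itself as the risk score.
    return [x[i] for i in oob]


_, oob = bootstrap_split(len(x), random.Random(0))
print("out-of-bag fraction:", len(oob) / len(x))
print("OOB-averaged D-index:", oob_averaged_d_index(times, events, covariate_as_risk, rounds=100))
```

The expected out-of-bag fraction is about 1/e (roughly 0.37), so about 63% of distinct samples land in each training set, which is the "roughly 2/3" mentioned above. Evaluating a single classifier with this same resampling and averaging is one way to put it on the same footing as RSF.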

