HYPERPARAMETER TUNING ON RANDOM FOREST FOR DIAGNOSE COVID-19

Anna Baita; Inggar Adi Prasetyo; Nuri Cahyono

doi:10.33387/jiko.v6i2.6389

Abstract

Diagnosing Covid-19 with the RT-PCR (Reverse Transcription Polymerase Chain Reaction) test is costly and time-consuming. For this reason, another method is needed that can diagnose Covid-19 quickly and accurately. Random Forest is a popular classification algorithm for building predictive models. It involves many hyperparameters that control the structure of each tree, the structure of the forest, and its randomness. Random Forest is very sensitive to hyperparameter values: its prediction accuracy can increase significantly when the hyperparameters are first predefined and then adjusted through a tuning procedure. The purpose of hyperparameter tuning on the Random Forest algorithm in this study is to increase the accuracy of Covid-19 diagnosis. Optimal hyperparameter values are searched for with the Grid Search and Random Search methods. The results show that Random Forest can diagnose Covid-19 with an accuracy of 94%, and that hyperparameter tuning increases this accuracy by 2%.
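
The difference between the two search strategies named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the search space, the helper names (`grid_candidates`, `random_candidates`, `search`) and the `toy_score` function are all hypothetical, and `toy_score` is only a stand-in for the cross-validated accuracy of a random forest trained with the given hyperparameters.

```python
import itertools
import random

# Hypothetical search space. The names mirror common random forest
# hyperparameters, but the values are illustrative, not the paper's.
SEARCH_SPACE = {
    "n_estimators": [50, 100, 200],
    "max_depth": [4, 8, 16],
    "max_features": ["sqrt", "log2"],
}


def grid_candidates(space):
    """Grid Search: yield every combination of values in the space."""
    names = sorted(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def random_candidates(space, n_iter, seed=0):
    """Random Search: yield n_iter combinations sampled uniformly."""
    rng = random.Random(seed)
    names = sorted(space)
    for _ in range(n_iter):
        yield {name: rng.choice(space[name]) for name in names}


def search(candidates, evaluate):
    """Return the (params, score) pair with the highest score."""
    best_params, best_score = None, float("-inf")
    for params in candidates:
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Placeholder for cross-validated accuracy; NOT a real model."""
    score = 0.80
    score += 0.02 if params["max_depth"] == 8 else 0.0
    score += 0.01 if params["n_estimators"] == 200 else 0.0
    score += 0.01 if params["max_features"] == "sqrt" else 0.0
    return score


# Grid Search evaluates all 3 * 3 * 2 = 18 combinations.
grid_best, grid_score = search(grid_candidates(SEARCH_SPACE), toy_score)

# Random Search evaluates only a fixed budget of sampled combinations.
rand_best, rand_score = search(
    random_candidates(SEARCH_SPACE, n_iter=5), toy_score
)
```

Grid Search is exhaustive, so on this space it always finds the best combination, at a cost that grows multiplicatively with each added hyperparameter. Random Search keeps the cost fixed at `n_iter` evaluations but may miss the optimum. In practice, `toy_score` would be replaced by a function that trains and validates a random forest for each candidate.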
