HYPERPARAMETER TUNING ON RANDOM FOREST FOR DIAGNOSING COVID-19
Abstract
Diagnosing Covid-19 with the RT-PCR (Reverse Transcription Polymerase Chain Reaction) test is costly and time-consuming. For this reason, another method is needed that can diagnose Covid-19 quickly and accurately. Random Forest is a popular classification algorithm for building predictive models. It involves many hyperparameters that control the structure of each tree, the structure of the forest, and its randomness. Random Forest is very sensitive to hyperparameter values: its prediction accuracy can increase significantly when predefined hyperparameters are then tuned through a systematic procedure. The purpose of hyperparameter tuning on the Random Forest algorithm in this study is to increase accuracy in the diagnosis of Covid-19. Optimal hyperparameter values are searched for with the Grid Search and Random Search methods. The results show that Random Forest can be used to diagnose Covid-19 with an accuracy of 94%, and that hyperparameter tuning increases the accuracy of the Random Forest by 2%.
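The two search strategies named in the abstract can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' pipeline: the data is synthetic (generated with `make_classification`) rather than the Covid-19 symptom dataset, and the search space, fold count, and `n_iter` value are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import (
    GridSearchCV,
    RandomizedSearchCV,
    train_test_split,
)

# Synthetic stand-in data (assumption): NOT the paper's Covid-19 dataset.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Illustrative search space (assumption); the paper's grid is not stated here.
param_grid = {
    "n_estimators": [25, 50],          # number of trees in the forest
    "max_depth": [None, 5],            # controls the structure of each tree
    "max_features": ["sqrt", "log2"],  # controls per-split randomness
}

# Baseline: Random Forest with default hyperparameters.
baseline = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Grid Search: evaluates every combination in param_grid (2 * 2 * 2 = 8).
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
grid.fit(X_train, y_train)

# Random Search: evaluates only n_iter randomly sampled combinations.
rand = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_grid,
    n_iter=4,
    cv=3,
    scoring="accuracy",
    random_state=0,
)
rand.fit(X_train, y_train)

# Compare held-out accuracy of the three models.
results = {
    "baseline": baseline.score(X_test, y_test),
    "grid search": grid.score(X_test, y_test),
    "random search": rand.score(X_test, y_test),
}
for name, acc in results.items():
    print(f"{name}: test accuracy = {acc:.3f}")
print("best grid params:", grid.best_params_)
print("best random params:", rand.best_params_)
```

Grid Search is exhaustive, so its cost grows with the product of the candidate values, while Random Search caps the cost at `n_iter` evaluations. Whether tuning improves on the baseline depends on the data, so the accuracies printed here say nothing about the 94% figure reported in the abstract.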
DOI: https://doi.org/10.33387/jiko.v6i2.6389