AN LSTM-BASED APPROACH FOR INDONESIAN NEWS CATEGORIZATION: PERFORMANCE ANALYSIS OF HYPERPARAMETER TUNING AND PREPROCESSING

Iwan La Udin

Abstract


News published on online portals is generally grouped into categories such as politics, sports, economy, entertainment, technology, and health. This categorization is currently performed manually, which requires reading each article in full. To address this inefficiency, an automatic classification system is needed that assigns Indonesian news articles to predetermined categories. This research takes a Natural Language Processing (NLP) approach and implements the Long Short-Term Memory (LSTM) architecture. Three testing scenarios were examined: (1) tuning the learning rate hyperparameter, with values of 0.01 and 0.001; (2) applying or omitting stemming; and (3) varying the dataset split ratio across 60:40, 70:30, 80:20, and 90:10. The evaluation used a dataset of 10,000 articles in 5 categories and measured accuracy, precision, recall, and F-measure. The three scenarios produced seven trained models. The second model, with a learning rate of 0.001, no stemming, and a 90:10 split, achieved the highest accuracy of 90.7%, with average precision, recall, and F-measure of 91%. The third and fourth models, which applied stemming, showed no improvement, each reaching an accuracy of 89%. The fifth model, with a 60:40 split, reached an accuracy of 90%, while the sixth and seventh models, with 70:30 and 80:20 splits, reached 79% and 88%, respectively.
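
As a rough illustration of the evaluation described in the abstract, the sketch below computes the article counts implied by a dataset ratio and then accuracy with averaged precision, recall, and F-measure for a multi-class classifier. It is not the author's code: the function names `split_sizes` and `macro_metrics`, the toy labels, and the choice of macro averaging are assumptions, as is reading the dataset ratios as training:test splits.

```python
from collections import Counter


def split_sizes(n_articles, train_pct):
    """Return (train, test) counts for a ratio such as 90:10 (assumed train:test)."""
    n_train = n_articles * train_pct // 100
    return n_train, n_articles - n_train


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F-measure.

    Macro averaging (an unweighted mean over classes) is an assumption;
    the abstract only says the scores are averages.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f_measures = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f_measures.append(f_measure)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_measure": sum(f_measures) / n,
    }


# Toy labels for illustration only; these are not the study's data.
y_true = ["politics", "sports", "economy", "sports"]
y_pred = ["politics", "sports", "sports", "sports"]
scores = macro_metrics(y_true, y_pred)

# Article counts for the four ratios applied to 10,000 articles.
sizes = {pct: split_sizes(10_000, pct) for pct in (60, 70, 80, 90)}
```

Under this reading, the 90:10 ratio leaves 9,000 articles for training and 1,000 for testing, while 60:40 leaves 6,000 and 4,000.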



DOI: https://doi.org/10.33387/jiko.v8i3.10783
