SENTIMENT ANALYSIS OF PUBLIC HEALTH APP REVIEWS USING INDOBERT AND XLM-ROBERTA: A STUDY ON SATUSEHAT MOBILE APP

Dimas Ananda, Indra Budi, Aris Budi Santoso, Ali Adil Qureshi

Abstract


Sentiment analysis is a key method for deriving insights from user-generated content, particularly for evaluating public satisfaction with digital health services. This study conducts a comparative analysis of sentiment polarity classification models on 34,178 Indonesian-language reviews of SATUSEHAT Mobile, a national health application developed by the Indonesian Ministry of Health. The reviews were manually annotated with positive, neutral, and negative labels. Three model categories were evaluated: classical machine learning (Support Vector Machine, XGBoost), baseline neural networks (Multilayer Perceptron, Convolutional Neural Network), and pretrained transformer-based models (IndoBERT, XLM-RoBERTa). All models were trained using stratified 5-fold cross-validation and tested on a held-out set. Results show that the transformer-based models significantly outperform the other models on all metrics. IndoBERT achieved the highest weighted F1-score (0.8555), followed closely by XLM-RoBERTa (0.8552). With average performance this close, XLM-RoBERTa's lower performance variance across folds, the lowest of all models, makes it the most stable and effective model overall. Friedman and Nemenyi tests confirmed that these differences are statistically significant. However, all models struggled to detect neutral sentiment because of class imbalance in the data. Although computationally more expensive than IndoBERT, XLM-RoBERTa offers superior robustness for sentiment classification of Indonesian health-related text. These findings support integrating transformer-based sentiment monitoring into public health dashboards to enable timely, data-driven service improvements.
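The evaluation protocol summarized above (stratified 5-fold splits, weighted F1 per fold, the fold-to-fold spread used to judge stability, and a Friedman test followed by a Nemenyi critical difference) can be illustrated with a small, dependency-free Python sketch. This is not the authors' code. All function names are hypothetical. The split does not shuffle, the Friedman statistic omits the tie correction, and the Nemenyi critical values are the commonly tabulated two-tailed values at alpha = 0.05. In practice one would more likely use scikit-learn's `StratifiedKFold` and `f1_score(average="weighted")` together with `scipy.stats.friedmanchisquare`.

```python
import math
from collections import Counter, defaultdict
from statistics import mean, stdev


def stratified_folds(labels, k=5):
    """Deal each class's sample indices round-robin into k folds (no shuffling),
    so every fold keeps roughly the overall class proportions."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for idxs in by_class.values():
        for j, i in enumerate(idxs):
            folds[(j + offset) % k].append(i)
        offset += len(idxs)  # next class continues where this one stopped
    return folds


def per_class_f1(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN) for every class present in y_true."""
    scores = {}
    for c in Counter(y_true):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with class-support weights, so a small class
    (e.g. neutral) moves the total only in proportion to its size."""
    support = Counter(y_true)
    f1 = per_class_f1(y_true, y_pred)
    return sum(f1[c] * n for c, n in support.items()) / len(y_true)


def fold_stability(fold_scores):
    """Mean and sample standard deviation of one model's per-fold scores."""
    return mean(fold_scores), stdev(fold_scores)


def average_ranks(scores_by_fold):
    """scores_by_fold[i][j] = score of model j on fold i (higher is better).
    Returns each model's mean rank over folds (1 = best); ties share a rank."""
    k = len(scores_by_fold[0])
    totals = [0.0] * k
    for row in scores_by_fold:
        order = sorted(range(k), key=lambda j: -row[j])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            for j in order[pos:end + 1]:
                totals[j] += (pos + end) / 2 + 1  # mean of ranks pos+1..end+1
            pos = end + 1
    return [t / len(scores_by_fold) for t in totals]


def chi2_sf(x, df):
    """Upper-tail probability of chi-square(df), using Q(x; 0) = 0,
    Q(x; 1) = erfc(sqrt(x/2)) and
    Q(x; v + 2) = Q(x; v) + (x/2)**(v/2) * exp(-x/2) / Gamma(v/2 + 1)."""
    q, v = (0.0, 0) if df % 2 == 0 else (math.erfc(math.sqrt(x / 2)), 1)
    while v < df:
        q += (x / 2) ** (v / 2) * math.exp(-x / 2) / math.gamma(v / 2 + 1)
        v += 2
    return q


def friedman(scores_by_fold):
    """Friedman chi-square over blocks (here: folds), without tie correction.
    Returns (statistic, p-value with k - 1 degrees of freedom, average ranks)."""
    n, k = len(scores_by_fold), len(scores_by_fold[0])
    ranks = average_ranks(scores_by_fold)
    stat = 12 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4)
    return stat, chi2_sf(stat, k - 1), ranks


# Two-tailed Nemenyi critical values q_0.05 for k compared models (Demšar, 2006).
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
               7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def nemenyi_cd(k, n):
    """Critical difference: two models differ at alpha = 0.05 when their
    average ranks differ by more than this value (n = number of blocks)."""
    return Q_ALPHA_005[k] * math.sqrt(k * (k + 1) / (6 * n))
```

As a usage sketch, each model's per-fold weighted F1 scores would form one column of `scores_by_fold` (one row per fold). `fold_stability` summarizes each column's spread. `friedman` tests whether the models' ranks differ, and a pair of models is judged different when their average ranks differ by more than `nemenyi_cd(k, n)`. The per-class output also shows how a high weighted F1 can coexist with weak neutral-class detection. A minority class whose F1 is zero lowers the weighted average only in proportion to its share of the data.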


DOI: https://doi.org/10.33387/jiko.v8i3.10083