COMPARISON OF NAÏVE BAYES CLASSIFIER AND K-NEAREST NEIGHBOR ALGORITHMS IN SENTIMENT ANALYSIS ON SOCIAL MEDIA X WITH VADER LEXICON
Abstract
The increasing use of social media as a platform for expressing public opinion has established platform X (formerly Twitter) as an important data source for sentiment analysis. However, the ever-growing volume of data and the lack of sentiment labels make manual analysis inefficient and time-consuming. This research addresses the problem of selecting an effective algorithm for accurate and efficient sentiment classification on large-scale unlabeled data. The study compares the performance of the Naïve Bayes Classifier and K-Nearest Neighbor (KNN) algorithms in classifying sentiment about the Value Added Tax (VAT) increase on platform X. Because the collected data are unlabeled, sentiment labels are assigned automatically using the VADER Lexicon. The research methodology consists of data scraping, automatic sentiment labeling, implementation and training of the classification models, and performance evaluation using a Confusion Matrix and ROC curve. The results show that the KNN algorithm with k = 1 achieved the best performance, with an accuracy of 93.19%, precision of 94.07%, recall of 92.96%, misclassification error of 6.81%, and an AUC of 0.95. In contrast, the Naïve Bayes Classifier achieved an accuracy of 88.29%, precision of 87.43%, recall of 86.67%, misclassification error of 11.71%, and an AUC of 0.93. Therefore, KNN outperformed the Naïve Bayes Classifier on every reported metric for this dataset.
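The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes a binary confusion matrix (positive versus negative sentiment), which may differ from the class setup used in the study. The class name `ConfusionMatrix` and the counts are hypothetical and chosen only for illustration; they are not the study's data. The sketch also shows why the reported misclassification error is the complement of accuracy (for example, 100% − 93.19% = 6.81%).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion-matrix counts; the positive class is positive sentiment."""
    tp: int  # positive posts predicted positive
    fp: int  # negative posts predicted positive
    fn: int  # positive posts predicted negative
    tn: int  # negative posts predicted negative

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def accuracy(self) -> float:
        # Share of all predictions that are correct.
        return (self.tp + self.tn) / self.total

    def precision(self) -> float:
        # Share of predicted positives that are truly positive.
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Share of true positives that the model found.
        return self.tp / (self.tp + self.fn)

    def misclassification_error(self) -> float:
        # Share of all predictions that are wrong; equals 1 - accuracy.
        return (self.fp + self.fn) / self.total


# Hypothetical counts for illustration only; NOT taken from the study.
cm = ConfusionMatrix(tp=45, fp=5, fn=3, tn=47)

print(f"accuracy  = {cm.accuracy():.2%}")
print(f"precision = {cm.precision():.2%}")
print(f"recall    = {cm.recall():.2%}")
print(f"error     = {cm.misclassification_error():.2%}")
```

With more than two sentiment classes, precision and recall would be computed per class and then averaged; the abstract does not state which averaging scheme was used.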
DOI: https://doi.org/10.33387/jiko.v8i2.9865