COMPARISON OF SPELL CORRECTION IN BAHASA INDONESIA: PETER NORVIG, LSTM, AND N-GRAM

Anggasta Tirta Adi Kusuma, Chanifah I Ratnasari

Abstract


This study compares three spelling-correction methods for Bahasa Indonesia: Peter Norvig's method, Long Short-Term Memory (LSTM), and N-gram. The evaluation metric is accuracy in correcting spelling errors. Peter Norvig's method achieves the highest accuracy, N-gram follows closely, and LSTM trails behind. The evaluation uses SPECIL (Spell Error Corpus for Indonesian Language), which contains documents with four error types: insertion, deletion, transposition, and substitution. The test set consists of 150 words, matching the 150-word reference corpus drawn from the Leipzig Corpora Collection that the Peter Norvig and N-gram methods use. The LSTM method instead uses a 150-item reference dataset from SPECIL, and its test data covers insertion errors only. The results are intended to help researchers, developers, and language technology enthusiasts improve the accuracy of spell-checking tools for Bahasa Indonesia.
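The four error types named in the abstract map directly onto the candidate generation used by Peter Norvig's approach. The following is a minimal Python sketch of that idea, not the authors' implementation: the word list and its counts are hypothetical stand-ins for the Leipzig reference corpus, the function names are illustrative, and only candidates one edit away are considered.

```python
from collections import Counter

# Hypothetical toy frequency list (an assumption for illustration only).
# The study uses a 150-word reference corpus from the Leipzig Corpora
# Collection, which is not reproduced here.
WORD_COUNTS = Counter({"makan": 5, "minum": 4, "rumah": 6, "buku": 3, "sekolah": 2})

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Return every string that is exactly one edit operation away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    # Deleting a letter undoes an insertion error.
    deletes = [left + right[1:] for left, right in splits if right]
    # Swapping adjacent letters undoes a transposition error.
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    # Replacing a letter undoes a substitution error.
    replaces = [left + c + right[1:] for left, right in splits if right for c in ALPHABET]
    # Inserting a letter undoes a deletion error.
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def known(words):
    """Keep only the words that appear in the reference list."""
    return {w for w in words if w in WORD_COUNTS}


def correct(word):
    """Pick the most frequent known candidate, or return the word unchanged."""
    candidates = known([word]) or known(edits1(word)) or {word}
    return max(candidates, key=lambda w: WORD_COUNTS[w])


if __name__ == "__main__":
    for typo in ["makann", "mkan", "ruamh", "bugu"]:
        print(typo, "->", correct(typo))
```

A word already in the reference list is returned as is, and a word with no known candidate within one edit is left unchanged, so the quality of the reference corpus directly bounds the accuracy this kind of corrector can reach.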


DOI: https://doi.org/10.33387/jiko.v6i3.7072
