Multiclass Email Classification by Using Ensemble Bagging and Ensemble Voting

Ali Helmut; Danang Triantoro Murdiansyah

doi:10.33387/jiko.v6i2.6394

Abstract

Email is a common communication technology in modern life. The more emails we receive, the more difficult and time-consuming it is to sort them. One solution to this problem is to build a machine learning system that sorts emails. Each machine learning method and each data sampling method results in different performance. Ensemble learning combines several learning models into one model to get better performance. In this study we set out to build a multiclass email classification system by combining learning models, data sampling, and several data classes, in order to measure the effect of the Ensemble Bagging and Ensemble Voting methods on the macro average F1 score and to compare them with non-ensemble models. The results show that the sensitivity of Naïve Bayes to imbalanced data is helped by the Ensemble Bagging and Ensemble Voting methods, with ∆P (delta performance) in the range 0.0001–0.0018. Logistic Regression changes with Ensemble Bagging and Ensemble Voting by ∆P in the range 0.0001–0.00015. Decision Tree has the lowest performance compared to the others, with ∆P of -0.01.
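
To make the evaluation terms in the abstract concrete, the following is a minimal sketch of how a macro average F1 score, a hard majority vote over several models, and a ∆P value could be computed. It is not the authors' implementation: the class labels, the three sets of model predictions, and the function names macro_f1 and hard_vote are hypothetical, and ∆P is assumed here to mean the ensemble score minus a single model's score. Bagging (training each model on a bootstrap resample) is not shown; only the voting step and the metric are illustrated.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def hard_vote(predictions):
    """Majority vote per sample across models; ties go to the label seen first."""
    return [Counter(sample).most_common(1)[0][0] for sample in zip(*predictions)]


# Hypothetical labels and predictions, for illustration only.
y_true = ["work", "work", "personal", "promo", "promo", "personal"]
model_a = ["work", "personal", "personal", "promo", "work", "personal"]
model_b = ["work", "work", "personal", "promo", "promo", "work"]
model_c = ["work", "work", "promo", "promo", "promo", "personal"]

voted = hard_vote([model_a, model_b, model_c])

single_score = macro_f1(y_true, model_a)
ensemble_score = macro_f1(y_true, voted)

# Assumed definition: delta performance = ensemble score - single-model score.
delta_p = ensemble_score - single_score
print(f"single={single_score:.4f} ensemble={ensemble_score:.4f} delta_p={delta_p:.4f}")
```

Because the macro average gives every class equal weight regardless of how many emails it contains, it penalizes a model that performs poorly on rare classes, which is why it suits the imbalanced-data setting the abstract describes.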
