Performance Analysis of Machine Learning Model Combination for Spaceship Titanic Classification using Voting Classifier

Haria Wirawan, Robet Robet, Jackri Hendrik

Abstract


The Spaceship Titanic dataset is fictional but challenging: it mixes numerical and categorical features and contains missing values. This study evaluates three machine learning model scenarios for classifying whether a passenger was “Transported” or not. The scenarios are linear-like models, a combination of the Top 5 Diverse models, and tree-based/ensemble models, each combined through a voting classifier. Voting is used because it combines the strengths of multiple algorithms to reduce bias and variance, which can improve prediction accuracy and stability. The voting mechanism aggregates the predictions of several base classifiers using one of two strategies: hard voting, which selects the majority class, and soft voting, which averages the predicted class probabilities across models and selects the class with the highest average. The dataset was obtained from Kaggle and processed in four stages: data preprocessing, data splitting, model training, and evaluation. The tree-based/ensemble scenario achieved the highest accuracy at 90.38%, followed by the Top 5 Diverse combination at 87.31% and the linear-like scenario at 76.51%. The confusion matrices, ROC curves, and feature importance analysis are consistent with the conclusion that ensemble models are better at capturing complex classification patterns. These findings suggest that tree-based ensemble models are the most effective of the three approaches for classification on datasets like Spaceship Titanic.
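The difference between the two voting strategies can be illustrated with a minimal, dependency-free sketch. It is not the study's implementation: the function names `hard_vote` and `soft_vote` and the probability values are hypothetical, chosen only to show a case where the two strategies disagree.

```python
from collections import Counter


def hard_vote(labels):
    """Return the class predicted by the most base classifiers."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probas):
    """Average class probabilities across models.

    Returns the index of the class with the highest average
    probability, together with the averaged probabilities.
    """
    n_models = len(probas)
    n_classes = len(probas[0])
    avg = [sum(p[c] for p in probas) / n_models for c in range(n_classes)]
    winner = max(range(n_classes), key=avg.__getitem__)
    return winner, avg


# Hypothetical outputs of three base classifiers for one passenger.
# Column 0 = not transported, column 1 = Transported.
probas = [
    [0.45, 0.55],  # model A leans slightly towards Transported
    [0.40, 0.60],  # model B leans slightly towards Transported
    [0.95, 0.05],  # model C is very confident in "not transported"
]

# Each model's hard label is its most probable class.
labels = [max(range(len(p)), key=p.__getitem__) for p in probas]

hard = hard_vote(labels)
soft, avg = soft_vote(probas)

print("hard vote:", hard)
print("soft vote:", soft, "averaged probabilities:", avg)
```

In this made-up case two of the three models narrowly favour class 1, so hard voting picks class 1, while the single confident model pulls the averaged probabilities towards class 0, so soft voting picks class 0. Soft voting therefore depends on how well calibrated the base classifiers' probabilities are.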





DOI: https://doi.org/10.33387/jiko.v8i3.10866
