APPLICATION OF THE K-MEANS AND DECISION TREE ALGORITHMS IN DETERMINING STUDENT ACHIEVEMENT

Student achievement is influenced by many factors, both internal and external, which makes it difficult for teachers to assess the achievement of every student in a class. This research aims to determine student achievement in class among students at SDS Kartika X-6, an elementary school owned by the Indonesian Army, which also supplied the data. Knowing the factors that determine student learning achievement makes it possible to take steps to improve it at SDS Kartika X-6. The methods used in this research are the K-Means algorithm and the Decision Tree. The process begins by determining clusters with the K-Means algorithm; a classification process is then carried out with a Decision Tree. The dataset contains 28 records, and the criteria are gender, grades in mathematics, English, natural sciences, and religion, class achievement, and school achievement. The implementation results show that academic grades, class achievement, and school achievement play a role in determining student achievement at SDS Kartika X-6. Three clusters were formed: Fairly Good, Good, and Very Good. In the testing stage, the Decision Tree method reached a prediction accuracy of 71%, with an error of 29%.


INTRODUCTION
To keep up with the times, one must first strengthen oneself, and education is a valuable way to achieve this goal. Education is necessary to produce quality human resources because a nation's achievement is determined by the resources of its people [1]. In a previous study, an application using the K-Means algorithm processed input values for nahwu, shorof, morals, memorization, attendance, and honesty, and used them to group high-achieving students quickly and accurately [2]. In a recent study, classification with a decision tree algorithm found that parents' income did not affect student achievement, whereas the distance from home to school did [1].
Another study, entitled Implementation of K-Means and K-Nearest Neighbors in the Outstanding Student Category, used the K-Means and K-Nearest Neighbors algorithms to categorize outstanding students. Its authors hope that the results will lead to more attention being paid to producing the best graduates at SMKN 3 Karawang [3]. This research collected data at SDS Kartika X-6, a private elementary school belonging to the Army. SDS Kartika X-6 determines student achievement using class achievement scores and school achievement. The current practice has many shortcomings and problems, especially because a large amount of data influences how academic grades, class achievement, and school achievement are used to determine student achievement.
This research differs from earlier work in its method and in basing achievement on class achievement and school achievement, whereas previous research used parental income and the distance from home to school [1]. Most previous studies also used only the K-Means algorithm; using two methods together with academic grades, class achievement, and school achievement shows how influential these attributes are in determining student achievement. In this research, the K-Means and Decision Tree algorithms are used to solve this problem. The results can then help the school take steps to improve student achievement at SDS Kartika X-6 and show how much academic scores, class achievement, and school achievement contribute to determining it.
Jevntya, Darussalam and Abdullah, Application of The K-Means …

RESEARCH METHOD
This investigation used the Decision Tree method and the K-Means algorithm. The two are applied in the stages shown in Figure 1. The work begins with data cleaning on the student dataset, which was obtained as an Excel file after interviewing the principal of SDS Kartika X-6. Once data cleaning is complete, the data is modeled using the K-Means approach with parameter k = 3, so that the resulting clusters can be used as labels for further modeling. The dataset is labeled before the Decision Tree method is used for modeling. The final step is to evaluate the Decision Tree model using the classification report, confusion matrix, and accuracy, in order to determine student achievement from academic scores, class achievement, and school achievement.

K-Means Algorithm
Clustering is a process that organizes data according to the similarities between the data items in a cluster. It aims to make data within a cluster very similar to each other, while data in different clusters is very different [4], [5], [6]. A cluster is a group of objects or components that are connected, so clustering analysis results in the formation of several groups [2]. Well-known and widely used types of clustering algorithms include density-based, centroid-based, hierarchical, and K-Means [7].
The K-Means algorithm is widely used in clustering because of its simplicity, speed, and efficiency, and it was named one of the top 10 data mining algorithms by the IEEE International Conference on Data Mining (ICDM) [8], [9]. Researchers who adopt a clustering strategy often use the K-Means method [3], [10], because it is very effective at finding groups of data. K-Means clustering divides data into several groups, assigning similar data to the same group and dissimilar data to different groups [11], [12], [13].
The K-Means clustering method proceeds through the following stages [11]:
1. Determine k, the number of clusters to create.
2. Randomly choose the k initial centroids (cluster center points).
The initial centroids are determined by a formula whose terms denote the centroid and the iteration index i. d(x, y) is the Euclidean distance, that is, the distance between the data points x and y [14].
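The K-Means stages (choose k, pick random initial centroids, assign each object to its nearest centroid by Euclidean distance, recompute the centroids, and repeat until they stop moving) can be sketched in plain Python. This is a minimal illustration, not the code used in the study: the function names `euclidean` and `k_means`, the seed, and the sample scores are assumptions, and drawing the initial centroids from the data points is only one common choice.

```python
import math
import random


def euclidean(x, y):
    """Euclidean distance d(x, y) between two equal-length points."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def k_means(points, k, seed=0, max_iter=100):
    """Cluster `points` into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Stage 2: pick k distinct data points as the random initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iter):
        # Stages 3-4: assign every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Stage 5: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:  # centroids stopped moving
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical (MTK, IPA) score pairs, not taken from the study's dataset.
scores = [(60, 62), (62, 60), (75, 78), (78, 75), (92, 95), (95, 92)]
centroids, labels = k_means(scores, k=3)
print(labels)
```

Because the initial centroids are random, different seeds can converge to different groupings; libraries usually rerun the algorithm several times and keep the best result.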

Decision Tree
This modeling stage determines what kind of information can be extracted from the prepared data. The Decision Tree algorithm is applied at this stage to support decision making [1]. A Decision Tree trained with the C4.5 approach makes predictions using a collection of decision rules [15]. In a Decision Tree, the root and internal nodes are labeled with attribute names, edges are labeled with attribute values, and leaf nodes are labeled with classes. This makes the Decision Tree a fundamental classification technique that can be used for a finite set of classes [16], [17].
The modeling stage identifies patterns in the prepared data that can serve as useful information. The Decision Tree algorithm used at this stage supports decisions with a model in the form of a tree, and it is formulated in terms of entropy [1].
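A minimal sketch of the entropy calculation, assuming the standard definitions used by C4.5-style trees: Entropy(S) = -Σ p_i log2(p_i) over the classes in S, and Gain(S, A) = Entropy(S) - Σ (|S_i| / |S|) · Entropy(S_i) over the n partitions of attribute A. Whether [1] uses exactly this form is an assumption, and the function names and sample data below are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy(S) = -sum(p_i * log2(p_i)) over the classes present in S."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(values, labels):
    """Gain(S, A) = Entropy(S) - sum(|S_i| / |S| * Entropy(S_i))."""
    total = len(labels)
    # Partition the class labels by the value of attribute A.
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    remainder = sum(
        len(part) / total * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - remainder


# Hypothetical data: school achievement (0 = none, 1 = yes) and cluster label.
school_achievement = [0, 0, 1, 1]
cluster = ["Good", "Good", "Very Good", "Very Good"]
print(entropy(cluster))                               # 1.0
print(information_gain(school_achievement, cluster))  # 1.0
```

In this toy case the attribute separates the two classes perfectly, so the gain equals the full entropy of the labels; the tree would split on the attribute with the highest gain.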

1 Data Collection
The research data was collected at the SDS Kartika X-6 school. The attributes in the data are number, gender, Mathematics (MTK), Indonesian Language (BI), Natural Sciences (IPA), Religion, class achievement, and school achievement, for a total of 28 students. The data was collected by interviewing the Principal of SDS Kartika X-6 about the students.

3 K-Means using Google Colab
Once the data is ready (Figure 3), the first step is to develop a model from the student dataset, using the resources provided by Google Colab. K-Means modeling is used first to find labels for the existing data (Figure 4). The data is divided into three groups (k = 3), and these groupings are later used as labels in the student dataset. The clustering process is used to obtain the average value of the current dataset, which is why k is set to three. The resulting model is shown in Figures 5 and 6.
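One way to turn the three K-Means clusters into dataset labels is to rank the clusters and attach a name to each rank. The sketch below ranks clusters by the mean of their centroid scores, which is an assumption made for illustration; the helper `name_clusters`, the centroid values, and the assignments are all hypothetical.

```python
def name_clusters(centroids, names=("Fairly Good", "Good", "Very Good")):
    """Map each cluster index to a name, ranked by mean centroid score."""
    # Rank cluster indices from the lowest to the highest average score.
    ranked = sorted(
        range(len(centroids)),
        key=lambda c: sum(centroids[c]) / len(centroids[c]),
    )
    return {cluster: names[rank] for rank, cluster in enumerate(ranked)}


# Hypothetical centroids over (MTK, BI, IPA, Religion); not the study's values.
centroids = [
    [88.0, 90.0, 91.0, 93.0],  # cluster 0
    [70.0, 72.0, 69.0, 75.0],  # cluster 1
    [80.0, 81.0, 79.0, 84.0],  # cluster 2
]
cluster_names = name_clusters(centroids)
assignments = [1, 2, 0, 2]  # hypothetical K-Means output for four students
labels = [cluster_names[c] for c in assignments]
print(labels)  # ['Fairly Good', 'Good', 'Very Good', 'Good']
```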

4 Decision Tree using Google Colab
Next, the student dataset is labeled according to the results of the clustering model trained with K-Means. The labeled dataset (Figure 7) is then modeled with a classification method, namely the Decision Tree algorithm [1]. To make the information in the resulting Decision Tree model easier to obtain, the model is explained in Figure 10.

5 Decision Tree Testing
In Decision Tree modeling, the first step is to make predictions, which are then assessed using accuracy, the confusion matrix, and the classification report, all in Google Colab. The test results are listed in Figure 11.
C. Very Good: there are 2 students in the Very Good category.
The clusters Fairly Good, Good, and Very Good were determined by adding up the distance calculation values for each cluster member in the K-Means algorithm and then sorting the clusters from Fairly Good to Very Good. Decision Tree modeling is based on the IPA subject score, the Religion subject score, and the cluster. The Decision Tree approach achieves a prediction accuracy of approximately 71%. This research thus determines student achievement with the Decision Tree algorithm, as in earlier work, but with different attributes.
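The accuracy and confusion matrix used in testing can be computed as in the sketch below. The test set of 7 students is hypothetical: it was chosen only because 5 correct predictions out of 7 round to 71%, and it is not the study's actual split or predictions. The function names are likewise illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


classes = ["Fairly Good", "Good", "Very Good"]
# Hypothetical test set of 7 students; 5 of the 7 predictions are correct.
y_true = ["Good", "Good", "Fairly Good", "Very Good",
          "Good", "Fairly Good", "Good"]
y_pred = ["Good", "Fairly Good", "Fairly Good", "Very Good",
          "Good", "Good", "Good"]

acc = accuracy(y_true, y_pred)
print(f"accuracy = {acc:.0%}, error = {1 - acc:.0%}")  # accuracy = 71%, error = 29%
for row in confusion_matrix(y_true, y_pred, classes):
    print(row)
```

The diagonal of the matrix counts correct predictions per class, so the accuracy is the sum of the diagonal divided by the number of test cases.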
In previous research, the Decision Tree algorithm was found unsuitable. The results of this research differ in several ways: here the Decision Tree does have an influence in determining student achievement, owing to the method and the larger number of attributes used. Previous research used parental income and distance from home to school [1], and the majority used only the K-Means algorithm.

CONCLUSION
This research applied the K-Means clustering method and the Decision Tree classification method to a student dataset containing number, gender, MTK, BI, IPA, Religion, class achievement, and school achievement. Judging from the science and religion scores, the approach is suitable for identifying high-achieving students. Fairly Good, Good, and Very Good are the three separate categories obtained with the K-Means method. Next, the

Figure 1. Research Stages of the K-Means and Decision Tree Implementation

3. Calculate the distance from each object to each centroid using the Euclidean distance:

$$d(x, y) = \lVert x - y \rVert = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

4. Allocate each object to the nearest centroid.
5. Carry out the iteration: determine the position of the new centroids and, if the centroid locations have changed, repeat from stage 3 [1].

Where:
x: first data point
y: second data point
n: number of attributes

The remaining symbols belong to the Decision Tree formulas [1]:
n: number of partitions of attribute A
|S_i|: number of cases in partition i
|S|: number of cases in S

Figure 2. Research Dataset
Explanation of Figure 2:
Gender: 1 = Boy, 2 = Girl
Class Achievement: 0 = None, 1 = Yes
School Achievement: 0 = None, 1 = Yes

Figure 4. Dataset for K-Means Modeling By using MTK, BI, IPA, Religion, and school achievement data as a basis, clustering was carried out.

Figure 7. Labeled Dataset
The attributes used are MTK, BI, IPA, Religion, and school achievement, together with the cluster labels from K-Means (Figure 8).

Figure 8. Decision Tree Attributes
Modeling is done by splitting the labeled student dataset into training and test data with a fixed random state. The data used is MTK, BI, IPA, Religion, class achievement, and cluster; the model results are shown in Figure 9.
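Fixing the random state makes the split of the labeled dataset reproducible, so the same students land in the training and test sets on every run. The sketch below illustrates the idea in plain Python; the helper `split_dataset`, the 25% test fraction, and the seed value are assumptions, since the study does not state them.

```python
import random


def split_dataset(rows, test_fraction=0.25, random_state=0):
    """Shuffle rows with a fixed seed, then split into (train, test)."""
    shuffled = list(rows)
    random.Random(random_state).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# 28 hypothetical student row indices, matching the dataset size.
rows = list(range(28))
train, test = split_dataset(rows)
print(len(train), len(test))  # 21 7
```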

Figure 10. Explanation of the Decision Tree Model

Table 1. K-Means Algorithm Results