Evaluation of Random Forest and Support Vector Machine Models in Educational Data Mining

June 2024

DOI:10.1109/InCACCT61598.2024.10551110

Conference: 2nd International Conference on Advancement in Computation & Computer Technologies (InCACCT)
At: Gharuan, India

Authors:

Tsehay Admassu

Injibara University

Ayodeji Salau

Afe Babalola University

Gunjan Chhabra

Graphic Era Hill University

Keshav Kaushik

University of Petroleum & Energy Studies

2024 2nd International Conference on Advancement in Computation & Computer Technologies (InCACCT)

Evaluation of Random Forest and Support Vector Machine Models in Educational Data Mining

Tsehay Admassu Assegie

School of Electronics Engineering,

Kyungpook National University,

Daegu, Republic of Korea

tsehayadmassu2006@gmail.com

Ayodeji Olalekan Salau

Department of Electrical and Computer

Engineering,

Afe Babalola University,

Ado-Ekiti, Nigeria

ayodejisalau98@gmail.com

Gunjan Chhabra

Department of CSE, Graphic Era Hill

University,

Dehradun, Uttarakhand, India,

chhgunjan@gmail.com

Keshav Kaushik

School of Computer Science, University of

Petroleum and Energy Studies,

Dehradun, Uttarakhand, India

officialkeshavkaushik@gmail.com

Sepiribo Lucky Braide

Department of Electrical and Electronics

Engineering, Rivers State University,

Port Harcourt 5080, Nigeria

braidesepiribo@yahoo.com

Abstract—Machine learning (ML) has become popular in computer science for discriminating between low-achieving and high-achieving students. However, different ML methods perform differently in predicting student performance. An investigative analysis of their effectiveness in discriminating students by academic achievement has therefore become a major research concern. This study investigates the performance of the random forest (RF) and support vector machine (SVM) in predicting academic performance from a student grade score (SGS). The analysis is based on the classification capability of the two algorithms on the Portuguese SGS dataset. The study also analyzes the impact of the sigmoid and radial basis functions on the capability of the SVM for classifying SGS, and compares RF and SVM in identifying student performance based on the SGS. Demographic information (age, sex) and student assessment results (assignment, mid-term exam, and quiz) were used as training features. The results revealed that both the RF and SVM classifiers can predict student performance, and that the SVM was more accurate than the RF. We obtained high accuracy (75.72%) using the linear kernel. The results imply that SGS can be predicted from previous assessment results with the proposed SVM classifier.

Keywords—data mining, quality of education, education student performance, classification

I. INTRODUCTION

The past few years have seen extensive research on the analysis and classification of student academic achievement (SAA). ML systems have become significant in predicting SAA and providing early information about low-achieving students [1]. Additionally, the availability of large volumes of data in the educational landscape has paved the way for ML to discriminate between low and high achievers, possibly aiding the analysis and extraction of knowledge about the factors influencing SAA.

Hence, because of their highly accurate discriminative capability, ML systems have become prominent in the classification of high- and low-achieving students. Moreover, these systems also aid in investigating the factors that influence SAA. Thus, the evaluation of various ML algorithms has become one of the most important research topics in machine learning. In [2], the researchers investigated the effectiveness of SVM, K-Neighbors, Naïve Bayes (NB), Artificial Neural Network (ANN), and decision tree (DT) for the classification of the SAA. The study highlighted that the ANN model has higher performance than the other models.

The application of ML has gained much research attention for improving the SAA at higher education institutions. The authors in [3] applied machine-learning methods to predict student dropout at a higher education institution. The study analyzed the accuracy of RF, SVM, DT, and ANN on student dropout prediction. The result suggests that the RF classifier outperforms the SVM, DT, and ANN. The RF achieved an accuracy of 70.98% on the test data used in that research.

Lau et al. [4] employed ANN to develop a system for discriminating high- and low-achieving students. The study showed that the ANN has been one of the dominant SAA assessment techniques. Discriminating low- and high-performing students with ANN helps teachers identify low achievers before they fail and deliver compensatory and other supportive sessions to help them improve their academic achievement. The experiment revealed that ANN has an accuracy of 84.8% on SAA classification.

Similarly, a research study [5] conducted a literature survey on the application of ML in improving the discrimination of high- and low-achieving students by analyzing their score quality. The researchers also highlighted in their findings that ML has a wide range of applications in the educational sector for the discrimination of the SAA [6]. The study highlighted that ensemble learning methods are among the most commonly applied machine learning methods for predicting students' academic performance.

SVM has also been used for developing an ML system to identify the SAA by analyzing student performance data. It is used to classify student grade scores as pass or fail based on previous records [7]. I.K. Nti et al. [8] supported the claim that ML systems have the power to discriminate students based on performance by comparing SVM with linear regression. Their results showed that SVM had a lower mean square error in classifying students likely to drop out.

Another study [9] implemented a DT-based predictive model for SAA. The study [10] employed students' previous grades to predict the grade score of the student in future subjects. The experimental result demonstrates that the implemented DT model achieves 63.63% accuracy on SAA prediction. The study suggested that the implemented model helps the administration evaluate and assess student results in their decision-making.

Authorized licensed use limited to: University of Petroleum & Energy Studies. Downloaded on June 12,2024 at 03:14:04 UTC from IEEE Xplore. Restrictions apply.

Higher education institutions have their own standards for evaluating student success [11]. However, they lack proper procedures for handling student data and analyzing the academic achievement of their students [12]. Thus, machine-learning approaches are significant in implementing appropriate procedures for predicting student academic achievement in higher education institutions [13]. A research article [14] developed DT and RF-based models for predicting the SAA. The study compared the performance of DT with RF [15]. The comparison, with accuracy as the performance metric, demonstrates that the RF model achieves a higher accuracy of 69.9% on SAA prediction [16].

Comparative studies of different machine learning approaches for predicting SAA [17] suggest that supervised learning methods have become significant in discovering correlations among different student attributes. Applying machine learning approaches to student data assists in enhancing the academic performance of students at higher education institutions. Associative classification has become one of the significant tools for predicting SAA [18]. A comparative study [19] of different supervised learning methods such as DT, RF, NB, and deep learning shows promising results. The RF model gives an accuracy of 75.52% in predicting the SAA [20].

The literature survey in [21] shows that machine-learning approaches are widely applicable in the educational sector for analyzing educational data to improve education quality.

Thus, this study aims to investigate the RF and SVM models. The objectives of this study are as follows: (1) to study the performance of RF and SVM in predicting student success; (2) to study the effect of the SVM regularization parameters on the performance of SVM; and (3) to study the effect of the depth of the tree on the power of discriminating the SAA with the RF and SVM. The paper is organized as follows: Section II describes the method and the source of data, Section III focuses on the comparative result analysis, and Section IV covers the conclusion. The research focuses on investigating the discriminative power of RF and SVM for SAA using performance indicators such as accuracy, and on the SVM kernel parameters, which include the radial basis, polynomial, and sigmoid functions.

II. METHODOLOGY

The procedure followed in this analysis of the RF and SVM for discriminating students by their SGS involved the following steps, as suggested by a study [22]. In the first step, we gathered the SAA dataset, including demographic data, grade scores, attendance records, and related variables that have an impact on the SGS, as previously used in [23]. The dataset is the Portuguese SAA dataset obtained from the Kaggle data repository and employed by a previous study [24]. In the second step, we preprocessed the collected dataset by cleaning missing values, removing redundant data samples, removing outliers, and encoding categorical variables. In the third step, we split the dataset into training and testing sets to train and validate the SVM and RF. In the fourth step, the RF and SVM were trained on the training set with varying hyperparameters to obtain good discriminative power. Finally, validation measures such as accuracy and the confusion matrix were used to analyze the effectiveness of the SVM and RF on the test set; these measures help validate the predictive power of the ML systems [25].
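The splitting and validation steps can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names, the 20% test fraction, the fixed seed, and the pass/fail labels are all assumptions made for illustration, and the label lists are hypothetical rather than data from the study.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def confusion_matrix(y_true, y_pred, labels=("fail", "pass")):
    """Return a nested dict: matrix[actual][predicted] -> count."""
    matrix = {a: {p: 0 for p in labels} for a in labels}
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for a, p in zip(y_true, y_pred) if a == p)
    return correct / len(y_true)

# Hypothetical labels, for illustration only (not data from the study).
y_true = ["pass", "pass", "fail", "pass", "fail", "fail"]
y_pred = ["pass", "fail", "fail", "pass", "fail", "pass"]

cm = confusion_matrix(y_true, y_pred)
acc = accuracy(y_true, y_pred)

# Split ten hypothetical sample indices into training and test sets.
train, test = train_test_split(range(10), test_fraction=0.2)
```

Accuracy alone hides which class is being misclassified; the confusion matrix separates students wrongly predicted to pass from those wrongly predicted to fail, which is why the two measures are usually read together.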

After obtaining the SAA dataset, the sex (gender), parents' qualification, economic, and academic attributes were collected from the Kaggle repository. To implement the selected ML methods, SVM and RF, for discriminating students by their SGS, we used the Python 3.8 programming language on an Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz 2.00 GHz with 8GB RAM. We removed redundant and missing values from the collected dataset before training the SVM and RF. Additionally, we label-encoded the categorical features to feed the input data to the RF and SVM. Figure 1 shows the procedure we followed in implementing and testing the RF and SVM-based predictors for SAA, from data collection to validation. Finally, the model is validated against its prediction accuracy on SAA.

Fig. 1. The block diagram for the proposed system
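The preprocessing described above (dropping missing and redundant samples, then label-encoding categorical features) might look roughly like the following standard-library sketch. The helper names, the column names, and the sample records are hypothetical; the paper does not state which library routines were actually used.

```python
def drop_missing_and_duplicates(rows):
    """Keep the first copy of each complete row; skip rows with missing values."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(value is None or value == "" for value in row.values()):
            continue  # row has a missing value
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # redundant (duplicate) sample
        seen.add(key)
        cleaned.append(row)
    return cleaned

def label_encode(rows, column):
    """Replace each category in `column` with an integer code (sorted order)."""
    categories = sorted({row[column] for row in rows})
    codes = {category: index for index, category in enumerate(categories)}
    encoded = [{**row, column: codes[row[column]]} for row in rows]
    return encoded, codes

# Hypothetical records; the column names are illustrative only.
records = [
    {"sex": "F", "age": 17, "grade": 14},
    {"sex": "M", "age": 18, "grade": 9},
    {"sex": "F", "age": 17, "grade": 14},    # duplicate of the first row
    {"sex": "M", "age": None, "grade": 11},  # missing age
]

cleaned = drop_missing_and_duplicates(records)
encoded, sex_codes = label_encode(cleaned, "sex")
```

Label encoding assigns arbitrary integers to categories, which tree-based models tolerate well; for an SVM the implied ordering can matter, so the choice of encoding is itself a design decision.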

III. RESULTS AND DISCUSSION

The validation of SVM and RF in discriminating the SAA has produced good results across various types of research. A research article [26] presented some key validation tests of ML systems in SAA classification. The performance of SVM and RF for the identification of the SAA is validated here based on prediction accuracy.

The comparative investigation of the RF and the SVM (with sigmoid) has shown that the SVM classifier has better discriminative power than the RF classifier in identifying the SAA based on grade score. Moreover, the results also indicated that the discriminative ability of the SVM varies with the kernel function used, such as the sigmoid, polynomial, and radial basis functions. The highest discriminative power is achieved by training the SVM with the sigmoid function rather than the other kernels. The performance of the RF and SVM is presented in Sections III-A and III-B, respectively.

A. The Performance of the RF

RF is a popular machine learning algorithm that is widely used in Educational Data Mining (EDM) due to its ability to handle complex relationships in data, handle high-dimensional feature spaces, and provide robust prediction [27].

It typically performs well in predicting student outcomes, such as academic performance, dropout risk, or course completion. Its ensemble nature, combining multiple decision trees, helps reduce overfitting and improve prediction accuracy compared to individual decision trees [28]. While RF is considered a black-box model to some extent, it still offers some interpretability through feature importance analysis [29], so educators and researchers can gain insights into which variables are most influential in predicting student outcomes [30]. It can handle large datasets efficiently and is parallelizable, making it suitable for processing vast amounts of educational data. It can also handle imbalanced class distributions, which are common in educational datasets [30].
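The ensemble idea can be made concrete with a small sketch of majority voting over several trees. This is a simplification under stated assumptions: the three "trees" are hypothetical one-rule classifiers, the feature names and thresholds are invented for illustration, and a real random forest would also train each tree on a bootstrap sample with random feature subsets (some libraries average class probabilities instead of counting hard votes).

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Combine per-tree class predictions for one sample into a single label."""
    counts = Counter(tree_predictions)
    # most_common(1) returns [(label, count)] for the most frequent label.
    return counts.most_common(1)[0][0]

def forest_predict(trees, sample):
    """Ask every tree for a prediction and return the majority label."""
    return majority_vote([tree(sample) for tree in trees])

# Three hypothetical "trees", each reduced to a single threshold rule.
trees = [
    lambda s: "pass" if s["midterm"] >= 10 else "fail",
    lambda s: "pass" if s["assignment"] >= 12 else "fail",
    lambda s: "pass" if s["quiz"] >= 8 else "fail",
]

# Hypothetical student record, not taken from the dataset.
student = {"midterm": 11, "assignment": 9, "quiz": 10}
prediction = forest_predict(trees, student)
```

Because one tree's mistake can be outvoted by the others, the combined prediction varies less than any single tree, which is the variance-reduction effect attributed to the ensemble.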

Fig. 2 demonstrates the performance of the proposed RF in predicting the SAA. As Fig. 2 indicates, the random forest achieves a highest accuracy of 84.02%, a minimum accuracy of 59.72%, and an average accuracy of 71.84%. Thus, the model is effective in predicting students' learning outcomes, although it has scope for improvement, since even an accuracy of 84.02% leaves some students' learning outcomes incorrectly predicted.

Fig. 2. The performance of the RF

B. The Performance of the SVM

The performance of the SVM model is analyzed with different kernel functions. The SVM model achieves different accuracies with the sigmoid, polynomial, and radial basis function (RBF) kernels. The sigmoid SVM model achieves a higher accuracy of 83.33% compared to the polynomial and RBF kernels. Figure 3 demonstrates the maximum, minimum, and average accuracies of the SVM model using the sigmoid, polynomial, and RBF kernels for predicting SAA.

Fig. 3. The performance of SVM
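For readers unfamiliar with the three kernels compared here, the sketch below writes out their commonly used textbook forms. The parameter names `gamma`, `coef0`, and `degree` and their default values are assumptions chosen for illustration; the paper does not report the kernel hyperparameters used in the experiments.

```python
import math

def dot(x, y):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, gamma=1.0, coef0=0.0, degree=3):
    """K(x, y) = (gamma * <x, y> + coef0) ** degree"""
    return (gamma * dot(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=1.0):
    """K(x, y) = exp(-gamma * ||x - y||^2)"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    """K(x, y) = tanh(gamma * <x, y> + coef0)"""
    return math.tanh(gamma * dot(x, y) + coef0)

# Two hypothetical feature vectors (e.g. two scaled assessment scores).
x = [1.0, 2.0]
y = [0.5, -1.0]

k_poly = polynomial_kernel(x, y)
k_rbf = rbf_kernel(x, y)
k_sig = sigmoid_kernel(x, y)
```

Each kernel defines a different notion of similarity between two students' feature vectors, which is why swapping the kernel changes the decision boundary and, in turn, the accuracy.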

C. Comparison of the RF and SVM

Table I indicates the accuracy achieved by the RF and SVM models in predicting student academic performance. As indicated in Table I, the SVM model achieves higher average accuracy. The RF model reaches a comparable maximum accuracy, but its average accuracy is lower than that of the SVM model for predicting the learning outcomes of students. Overall, the SVM model achieved 75.72% average accuracy, while the RF model achieved 71.81%.

In terms of accuracy, the SVM has been shown to achieve high prediction accuracy in tasks such as predicting student academic performance, dropout, and learning styles. The experiment suggests that SVM has a better capability of handling noisy and highly correlated data, which makes it a good choice for a high-dimensional dataset such as the SAA. Furthermore, the SVM has also shown high accuracy in discriminating student outcomes, with the kernel capturing relationships between SAA and the input features and complex discrimination boundaries between good achievers and those with lower scores. The linear kernel allowed SVM to capture inherent patterns in the SAA data.

TABLE I. COMPARISON OF THE PERFORMANCE OF RF AND SVM.

Algorithm   Minimum accuracy   Maximum accuracy   Average accuracy
RF          59.72%             84.02%             71.81%
SVM         65.97%             84.72%             75.72%
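The differences between the two models can be read directly off Table I. The short sketch below only performs that arithmetic on the reported values; the variable names are illustrative, and no new results are introduced.

```python
# Accuracies (%) as reported in Table I.
table = {
    "RF":  {"min": 59.72, "max": 84.02, "avg": 71.81},
    "SVM": {"min": 65.97, "max": 84.72, "avg": 75.72},
}

# SVM minus RF, in percentage points, for each column of the table.
gap = {
    column: round(table["SVM"][column] - table["RF"][column], 2)
    for column in ("min", "max", "avg")
}

# Spread between the best and worst reported accuracy of each model.
spread = {
    model: round(values["max"] - values["min"], 2)
    for model, values in table.items()
}
```

On these figures the SVM leads by 3.91 percentage points on average and by 6.25 points in the worst case, and its accuracy range (18.75 points) is narrower than that of the RF (24.30 points).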

The experimental result also showed that RF is robust against overfitting, making it a good choice for predictive analysis tasks with high dimensionality or noisy data [31]. Moreover, as an ensemble method, it combines various decision trees, which reduces variance and improves the prediction of SAA.

Similarly, SVM is good for datasets with outliers, and it can handle high-dimensional data. The main objective of the SVM is to achieve good predictive accuracy by finding the optimal decision boundary, leading to good discriminative capability. Both SVM and RF have their strengths and weaknesses, and the selection between these classifiers should be based on the type of dataset being used, the task to be addressed, and the characteristics of the dataset used in classification. However, further research is needed to validate the potential of other ML classifiers for SAA prediction. Fig. 4 indicates the comparison of the performance of SVM and RF.

Fig. 4. The accuracy of RF and SVM for SAA
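The decision-boundary objective mentioned for the SVM can be stated in its standard soft-margin form. This is the textbook formulation rather than one given in the paper; the symbols $w$, $b$, $\xi_i$, $C$, and $\phi$ are notation introduced here, with $C$ playing the role of the regularization parameter and $\phi$ the feature map implied by the chosen kernel.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i \\
\text{subject to}\quad & y_i \left( w^{\top} \phi(x_i) + b \right) \ge 1 - \xi_i, \\
& \xi_i \ge 0, \qquad i = 1, \dots, n
\end{aligned}
```

Here $x_i$ is the feature vector of student $i$ and $y_i \in \{-1, +1\}$ the class label. A larger $C$ penalizes margin violations more heavily, so the boundary fits the training data more tightly; a smaller $C$ tolerates more violations in exchange for a wider margin.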

In conclusion, RF and SVM have been shown to have good predictive ability for the SAA, with each classifier offering unique strengths in terms of prediction capability, robustness, explainability, and computational complexity. The choice between these algorithms depends on the specific characteristics of the dataset, the type of value to be predicted, and the priorities of the researcher or investigator. By considering the trade-offs between these factors, researchers can select a good-performing ML system for their EDM application.

By following the procedures presented in Section II, we can effectively validate SVM and RF in predicting the SAA and support an informed selection of an ML classifier for SAA prediction. However, the choice of method largely depends on the specific characteristics of the SAA data, since this study validated performance only on SAA, as well as on the nature of the classification problem and the trade-offs between accuracy, transparency, and computational complexity. Researchers and investigators in EDM can benefit from considering the strengths and limitations of SVM and RF when choosing the better-performing classifier for their specific needs.

IV. CONCLUSION

This paper proposed an RF and SVM-based model for the identification of SAA. The study employed various data pre-processing approaches to improve the ability of the proposed classifiers to discriminate student learning outcomes. Moreover, the study investigated the parameters of SVM and how the performance of the SVM model is affected by the parameters used during training. The result demonstrates that the linear SVM model performs better than the RF model, achieving an overall prediction accuracy of 75.72%. In conclusion, the result of the experiment shows that supervised learning methods such as RF and SVM can assist in improving education quality at higher education institutions by providing higher predictive power. Overall, both RF and SVM have shown promising results in predicting students' CGPA. RF tends to perform well in handling noisy data and large datasets, while SVM is effective in high-dimensional spaces and on non-linear data. However, the choice between the two algorithms ultimately depends on the specific characteristics of the dataset and the goals of the prediction task.

The results show that both RF and SVM are effective in predicting student performance (predicting student cumulative grade point average). RF is an ensemble method that combines multiple decision trees, which reduces the risk of overfitting and improves the generalization of the model. SVM, on the other hand, is a powerful algorithm for classification tasks, but it may not perform as well as Random Forest when dealing with large datasets or noisy data. Overall, the study concludes that the choice of algorithm should be based on the specific problem being addressed and the characteristics of the dataset. Further research is needed to explore the potential of other machine-learning algorithms for educational data mining.

REFERENCES

[1] A. Almasri, E. Celebi, and R.S. Alkhawaldeh., “G. Nguyen et al., “Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey,” Hindawi Scientific Programming, vol. 2019, pp. 1–14, 2019, doi: https://doi.org/10.1155/2019/3610248.

[2] Y.A. Alsariera, “Assessment and Evaluation of Different Machine Learning Algorithms for Predicting Student Performance,” Hindawi Computational Intelligence and Neuroscience, vol. 2022, no. 1, pp. 1–11, 2022, doi: https://doi.org/10.1155/2022/4151487.

[3] K. Dake, and C. Buabeng-Andoh, “Using Machine Learning Techniques to Predict Learner Drop-out Rate in Higher Educational Institutions,” Hindawi Mobile Information Systems, vol. 2022, no. 1, 2022, doi: https://doi.org/10.1155/2022/2670562.

[4] E.T. Lau, L. Sun, and Q. Yang, “Modelling, prediction and classification of student academic performance using artificial neural networks,” SN Computer Science, 2019, doi: https://doi.org/10.1007/s42452-019-0884-7.

[5] P. Balaji et al., “Contributions of Machine Learning Models towards Student Academic Performance Prediction: A Systematic Review,” Applied Science, 2021, doi: https://doi.org/10.3390/app112110007.

[6] R. Hasan et al., “Student Academic Performance Prediction by Using Decision Tree Algorithm,” IEEE, 2018.

[7] M. Kamal et al., “Metaheuristics Method for Classification and Prediction of Student Performance Using Machine Learning Predictors,” Hindawi Mathematical Problems in Engineering, vol. 2022, pp. 1–5, 2022, doi: https://doi.org/10.1155/2022/2581951.

[8] I.K. Nti et al., “An empirical assessment of different kernel functions on the performance of support vector machines,” Bulletin of Electrical Engineering and Informatics, vol. 10, no. 6, pp. 3403-3411, 2021, doi: 10.11591/eei.v10i6.3046.

[9] M. Zaffar, and K.S. Savita, “A Study of Feature Selection Algorithms for Predicting Students Academic Performance,” International Journal of Advanced Computer Science and Applications, vol. 9, no. 5, pp. 541–549, 2019. [10] H. Gull et al., “Improving Learning Experience of Students by Early Prediction of Student Performance using Machine Learning,” IEEE, pp. 1-4, 2019.

[10] F.J Kaunang et al., “Students’ Academic Performance Prediction using Data Mining,” IEEE, 2019. [12] C.C Kiu et al., “Data Mining Analysis on Student's Academic Performance through Exploration of Student's Background and Social Activities,” IEEE, 2018, doi: 10.1109/ICACCAF.2018.8776809.

[11] S. Biju, A.O. Salau, J.N. Eneh, V. E. Sochima, I. T. Ozue, “A Novel Pre-Class Learning Content Approach for the Implementation of Flipped Classrooms,” International Journal of Advanced Computer Science and Applications (IJACSA), Vol. 11(7), pp. 131-136, 2020. DOI: 10.14569/IJACSA.2020.0110718

[12] J. Sadowski, “Predicting Student Academic Performance in Computer Science Courses: A Comparison of Neural Network Models,” International Journal of Modern Education and Computer Science, vol. 1, no. 1, pp. 1–9, 2018, doi: 10.5815/ijmecs.2018.06.01

[13] H. Karalar, C. Kapucu, and H. Gürüler, “Predicting students at risk of academic failure using ensemble model during the pandemic in a distance learning system,” International Journal of Education in Higher Education, vol. 18, no. 63, pp. 1–18, 2021, doi: https://doi.org/10.1186/s41239-021-00300-y.

[14] E. Alyahyan, and Dilek Düştegör, “Predicting academic success in higher education: literature review and best practices,” International Journal of Education in Higher Education, vol. 17, no. 3, pp. 1–21, 2020, doi: https://doi.org/10.1186/s41239-020-0177-7.

[15] D.S Maylawati et al., “Data science for digital culture improvement in higher education using K-means clustering and text analytics,” International Journal of Electrical and Computer Engineering, vol. 10, no. 5, 2020, pp. 4569-4580, doi: 10.11591/ijece.v10i5.pp4569-4580.

[16] A.S. Hashim, W.D. Awadh, and A.K. Hamoud, “Student Performance Prediction Model based on Supervised Machine Learning Algorithms,” Materials Science and Engineering, 2020, doi: 10.1088/1757-899X/928/3/032019.

[17] L. Cagliero et al., “Predicting Student Academic Performance using Associative Classification,” Applied Sciences, pp. 1–22, 2021, doi: https://doi.org/10.3390/app11041420.

[18] N.A. Yassein, R.G. Helali, and S.B. Mohomad, “Predicting Student Academic Performance in KSA using Data Mining Techniques,” Journal of Information Technology & Software Engineering, vol. 7, no. 5, pp. 1–5, 2017, doi: 10.4172/2165-7866.1000213

[19] L. Nanglae, “Determining patterns of student graduation using a bi-level learning framework,” Bulletin of Electrical Engineering and Informatics, vol. 10, no. 4, 2021, pp. 2201-2211, doi: 10.11591/eei.v10i4.2502.

[20] S.A. Alwarthan, N. Aslam, and I.U. Khan, “Predicting Student Academic Performance at Higher Education Using Data Mining: A Systematic Review,” Hindawi Applied Computational Intelligence and Soft Computing, 2022, doi: https://doi.org/10.1155/2022/8924028.

[21] S. Leonelli and N. Tempini, “Predicting Student Performance to Improve Academic Advising Using the Random Forest Algorithm,” International Journal of Distance Education Technologies, vol. 20, no. 1, pp. 1-17, doi: https://orcid.org/0000-0001-8440-5889.

[22] S. Huang, and J. Wei, “Student Performance Prediction in Mathematics Course Based on the Random Forest and Simulated Annealing,” Hindawi Scientific Programming, 2022, doi: https://doi.org/10.1155/2022/9340434.

[23] D.T. Ha et al., “An Empirical Study for Student Academic Performance Prediction Using Machine Learning Techniques,” International Journal of Computer Science and Information Security, vol. 18, no. 3, 2020.

[24] A. Asselman et al., “Enhancing the prediction of student performance based on the machine learning XGBoost algorithm,” Interactive Learning Environments, vol. 31, no. 6, 2023.

[25] A. Triayudi, and I Fitri, “Comparison of the feature selection algorithm in educational data mining,” TELKOMNIKA Telecommunication, Computing, Electronics and Control, vol. 19, no. 6, December 2021, pp. 1865-1871, DOI: 10.12928/TELKOMNIKA.v19i6.21594.

[26] S.T. Ahmed, R. Al-Hamdani, and M.S. Croock, “Enhancement of student performance prediction using modified K-nearest neighbor,” TELKOMNIKA Telecommunication, Computing, Electronics and Control, vol. 18, no. 4, August 2020, pp. 1777-1783, DOI: 10.12928/TELKOMNIKA.v18i4.13849.

[27] M. Yağcı, “Educational data mining: prediction of students' academic performance using machine learning algorithms,” Smart Learn. Environ. 9, 11 (2022). https://doi.org/10.1186/s40561-022-00192-z

[28] W. Xing, R. Guo, E. Petakovic & S. Goggins, “Participation-based student final performance prediction model through interpretable Genetic Programming: Integrating learning analytics, educational data mining and theory,” Computers in Human Behavior, 47, pp. 168–181, 2015.

[29] P. Dabhade, R. Agarwal, K.P. Alameen, A.T. Fathima, R. Sridharan, G. Gopakumar, “Educational data mining for predicting students’ academic performance using machine learning algorithms,” Materials Today: Proceedings, 2021. doi: 10.1016/j.matpr.2021.05.646

[30] P. Chaudhury, and H.K. Tripathy, “A novel academic performance estimation model using two-stage feature selection,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 19, no. 3, September 2020, pp. 1610-1619, DOI: 10.11591/ijeecs.v19.i3.pp1610-1619.

[31] S. Mohamed, and A. Ezzati, “A data mining process using classification techniques for employability prediction, and Future Opportunities,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 14, no. 2, May 2019, pp. 1025-1029, DOI: 10.11591/ijeecs.v14.i2.pp1025-1029.

ResearchGate has not been able to resolve any citations for this publication.

Using Machine Learning Techniques to Predict Learner Drop-out Rate in Higher Educational Institutions

Article

Full-text available

Nov 2022

Recently, students dropping out of school at the tertiary level without prior notice or permission has intrigued deep concern among academic authorities, instructors, and counsellors. It has therefore become necessary to understand factors that lead to high attrition rates among learners and identify at-risk students for urgent academic counselling. In providing a proactive response to learner attrition, the study deployed a machine learning algorithm with high model accuracy to predict students’ drop-out rates and identify dominant attributes that affect learner attrition and retention. An attrition model was built and validated among support vector machine, decision tree, multilayer perceptron, and random forest algorithms. The machine learning algorithms were tested for accuracy, precision, recall, F-measure, and ROC using the 10-fold and the 5-fold comparative cross-validation techniques. In addition to the cross-validation technique, the chi-square feature selection mechanism was implemented to understand the algorithms’ training time and accuracy. The random forest emerged as the best-performing algorithm, with an accuracy of 70.98% and 69.74% for the 10-fold and the 5-fold cross-validation implementations, respectively.

Predicting Student Academic Performance at Higher Education Using Data Mining A Systematic Review

Sep 2022

Recently, educational institutions faced many challenges. One of these challenges is the huge amount of educational data that can be used to discover new insights that have a significant contribution to students, teachers, and administrators. Nowadays, researchers from numerous domains are very interested in increasing the quality of learning in educational institutions in order to improve student success and learning outcomes. Several studies have been made to predict student achievement at various levels. Most of the previous studies were focused on predicting student performance at graduation time or at the level of a specific course. The main objective of this paper is to highlight the recently published studies for predicting student academic performance in higher education. Moreover, this study aims to identify the most commonly used techniques for predicting the student’s academic level. In addition, this study summarized the highest influential features used for predicting the student academic performance where identifying the most influential factors on student’s performance level will help the student as well as the policymakers and will give detailed insights into the problem. Finally, the results showed that the RF and ensemble model were the most accurate models as they outperformed other models in many previous studies. In addition, researchers in previous studies did not agree on whether the admission requirements have a strong relationship with students’ achievement or not, indicating the need to address this issue. Moreover, it has been noticed that there are few studies which predict the student academic performance using students’ data in arts and humanities major.

Metaheuristics Method for Classification and Prediction of Student Performance Using Machine Learning Predictors

Jul 2022
MATH PROBL ENG

Over the last few decades, there has been a gradual deterioration in higher education in all three areas: the academic setting (both staff and students), as well as research and development output (including graduates). All colleges and universities are essentially focused on improving management decision-making and educating pupils. High-quality higher education can be obtained through a variety of methods. One method is to accurately forecast pupils’ achievement in their chosen educational context. There are numerous prediction models from which to pick. While it is unclear whether there are any markers that can predict whether a student will be an academic genius, a dropout, or an average performer, the researcher reports student achievement. This article presents a metaheuristics and machine learning-based method for the classification and prediction of student performance. Firstly, features are selected using a relief algorithm. Machine learning classifiers such as BPNN, RF, and NB are used to classify student academic performance data. BPNN achieves better accuracy for classification and prediction of student academic performance.

Assessment and Evaluation of Different Machine Learning Algorithms for Predicting Student Performance

May 2022
Comput Intell Neurosci

Student performance is crucial to the success of tertiary institutions. Especially, academic achievement is one of the metrics used in rating top-quality universities. Despite the large volume of educational data, accurately predicting student performance becomes more challenging. The main reason for this is the limited research in various machine learning (ML) approaches. Accordingly, educators need to explore effective tools for modelling and assessing student performance while recognizing weaknesses to improve educational outcomes. The existing ML approaches and key features for predicting student performance were investigated in this work. Related studies published between 2015 and 2021 were identified through a systematic search of various online databases. Thirty-nine studies were selected and evaluated. The results showed that six ML models were mainly used: decision tree (DT), artificial neural networks (ANNs), support vector machine (SVM), K-nearest neighbor (KNN), linear regression (LinR), and Naive Bayes (NB). Our results also indicated that ANN outperformed other models and had higher accuracy levels. Furthermore, academic, demographic, internal assessment, and family/personal attributes were the most predominant input variables (e.g., predictive features) used for predicting student performance. Our analysis revealed an increasing number of research in this domain and a broad range of ML algorithms applied. At the same time, the extant body of evidence suggested that ML can be beneficial in identifying and improving various academic performance areas.

Student Performance Prediction in Mathematics Course Based on the Random Forest and Simulated Annealing

Mar 2022

Educational data mining is becoming a more and more popular research field in recent years, mainly with the help of cross research conducted by various disciplines, so as to solve various difficult problems in the teaching and education process. In this paper, we proposed a hybrid approach for student performance prediction. We collected the dataset, including 15 characteristics of students from three categories (individual basic information, individual education information, and individual behavior information). Based on the random forest (RF) and simulated annealing (SA) algorithms, we binary encode the relevant parameters (number of features, tree size, and tree decision weights) as the target variables for algorithm optimization, use the out-of-bag error as the optimization objective function, and then propose the IRFC (improved random forest classifier) algorithm in this paper. Compared with other mainstream improved random forest algorithms, the research results demonstrate that the proposed algorithm in this paper has higher generalization ability and smaller OOB error. This study provides a methodological reference for the prediction of student achievement and also makes a marginal contribution to student management work.

Educational data mining: prediction of students' academic performance using machine learning algorithms

Mar 2022

Mustafa Yagci

Educational data mining has become an effective tool for exploring the hidden relationships in educational data and predicting students' academic achievements. This study proposes a new model based on machine learning algorithms to predict the final exam grades of undergraduate students, taking their midterm exam grades as the source data. The performances of the random forests, nearest neighbour, support vector machines, logistic regression, Naïve Bayes, and k-nearest neighbour algorithms, which are among the machine learning algorithms, were calculated and compared to predict the final exam grades of the students. The dataset consisted of the academic achievement grades of 1854 students who took the Turkish Language-I course in a state University in Turkey during the fall semester of 2019–2020. The results show that the proposed model achieved a classification accuracy of 70–75%. The predictions were made using only three types of parameters; midterm exam grades, Department data and Faculty data. Such data-driven studies are very important in terms of establishing a learning analysis framework in higher education and contributing to the decision-making processes. Finally, this study presents a contribution to the early prediction of students at high risk of failure and determines the most effective machine learning methods.

Predicting students at risk of academic failure using ensemble model during pandemic in a distance learning system

Dec 2021

Predicting students at risk of academic failure is valuable for higher education institutions to improve student performance. During the pandemic, with the transition to compulsory distance learning in higher education, it has become even more important to identify these students and make instructional interventions to avoid leaving them behind. This goal can be achieved by new data mining techniques and machine learning methods. This study took both the synchronous and asynchronous activity characteristics of students into account to identify students at risk of academic failure during the pandemic. Additionally, this study proposes an optimal ensemble model predicting students at risk using a combination of relevant machine learning algorithms. Performances of over two thousand university students were predicted with an ensemble model in terms of gender, degree, number of downloaded lecture notes and course materials, total time spent in online sessions, number of attendances, and quiz score. Asynchronous learning activities were found more determinant than synchronous ones. The proposed ensemble model made a good prediction with a specificity of 90.34%. Thus, practitioners are suggested to monitor and organize training activities accordingly.

An empirical assessment of different kernel functions on the performance of support vector machines

Dec 2021

Artificial intelligence (AI) and machine learning (ML) have influenced every part of our day-to-day activities in this era of technological advancement, making living more comfortable. Among the several AI and ML algorithms, the support vector machine (SVM) has become one of the most generally used algorithms for data mining, prediction and other AI and ML activities in several domains. The SVM's performance depends significantly on the kernel function (KF); nonetheless, there is no universally accepted ground for selecting an optimal KF for a specific domain. In this paper, we investigate empirically different KFs on the SVM performance in various fields. We illustrated the performance of the SVM based on different KFs through extensive experimental results. Our empirical results show that no single KF is always suitable for achieving high accuracy and generalisation in all domains. However, the Gaussian radial basis function (RBF) kernel is often the default choice. Also, if the KF parameters of the RBF and exponential RBF are optimised, they outperform the linear and sigmoid KF based SVM method in terms of accuracy. Besides, the linear KF is more suitable for the linearly separable dataset.
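The linear, RBF, and sigmoid kernels compared in this abstract can be written out directly from their standard textbook definitions. The sketch below is illustrative only: the function names, the default `gamma`, and the default `coef0` are assumptions and are not values reported by the cited study.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    """K(x, y) = <x, y>."""
    return dot(x, y)


def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2); gamma is an illustrative default."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, y, gamma=0.5, coef0=0.0):
    """K(x, y) = tanh(gamma * <x, y> + coef0); both defaults are illustrative."""
    return math.tanh(gamma * dot(x, y) + coef0)
```

The RBF and sigmoid kernels each carry tunable parameters (`gamma`, `coef0`), which is why their accuracy depends on optimisation, whereas the linear kernel has none.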

Contributions of Machine Learning Models towards Student Academic Performance Prediction: A Systematic Review

Oct 2021

Machine learning is emerging nowadays as an important tool for decision support in many areas of research. In the field of education, both educational organizations and students are the target beneficiaries. It facilitates the educational sector in predicting the student’s outcome at the end of their course and for the students in deciding to choose a suitable course for them based on their performances in previous exams and other behavioral features. In this study, a systematic literature review is performed to extract the algorithms and the features that have been used in the prediction studies. Based on the search criteria, 2700 articles were initially considered. Using specified inclusion and exclusion criteria, quality scores were provided, and up to 56 articles were filtered for further analysis. The utmost care was taken in studying the features utilized, database used, algorithms implemented, and the future directions as recommended by researchers. The features were classified as demographic, academic, and behavioral features, and finally, only 34 articles with these features were finalized, whose details of study are provided. Based on the results obtained from the systematic review, we conclude that the machine learning techniques have the ability to predict the students’ performance based on specified features as categorized and can be used by students as well as academic institutions. A specific machine learning model identification for the purpose of student academic performance prediction would not be feasible, since each paper taken for review involves different datasets and does not include benchmark datasets. However, the application of the machine learning techniques in educational mining is still limited, and a greater number of studies should be carried out in order to obtain well-formed and generalizable results. We provide future guidelines to practitioners and researchers based on the results obtained in this work.

Comparison of the feature selection algorithm in educational data mining

Dec 2021
