Application of Machine Learning Algorithms to Predict Students Performance

June 2020
International Journal of Advanced Science and Technology 29(5):7249-7261

Authors:

Randhir Singh

Delhi Technological University

Saurabh Pal

Veer Bahadur Singh Purvanchal University

International Journal of Advanced Science and Technology

Vol. 29, No. 5, (2020), pp. 7249-7261

7249

ISSN: 2005-4238 IJAST

Randhir Singh1 and Saurabh Pal2

1Research Scholar, Computer Applications, VBS Purvanchal University, Jaunpur.

2Dept. of Computer Applications, VBS Purvanchal University, Jaunpur, UP, India

* Corresponding author’s Email: drsaurabhpal@yahoo.co.in

Abstract

Students' performance is a major concern for society. With the rapid growth of technology and the application of different machine learning methods in recent years, models for predicting students' performance have become increasingly accurate. The development of machine learning techniques that can effectively predict students' performance is therefore of great importance. In this research paper, we apply five data mining techniques, namely Passive Aggressive Classifier (PAC), Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), Radius Neighbour Classifier (RNC) and Extra Tree (ET), and compare their results to choose the best-performing algorithm. We use educational data to analyze how well these machine learning techniques evaluate student performance.

The results obtained by the different machine learning algorithms are discussed in this paper; the highest accuracy is obtained with the Support Vector Machine (SVM). Sensitivity, specificity and precision are also evaluated to verify the accuracy results. These results can be applied to incoming students to check whether they are likely to perform well. By identifying non-performing students, higher educational institutions can pay attention to improving their performance.

Keywords: Educational Data Mining, Support Vector Machines, Radius Neighbor Classifier, Linear Discriminant Analysis, Passive Aggressive Classifier.

1. Introduction

The quality of an academic institution depends on the performance of its students and on the dropout rate, that is, the gap between the number of students who enroll in a course and the number who finally complete it. The dropout rate is high because students often do not know whether the course they are about to join suits them. In India, parents frequently push students into engineering or other professional courses without regard to their interests, and this is the main reason for dropout and low performance.

Educational Data Mining (EDM) is an area that applies technology and data mining techniques to the teaching environment. EDM uses machine learning to identify hidden patterns in large volumes of academic data, and develops, studies and implements data mining and statistical methods that yield useful results for students. Data mining is a complicated, multi-step procedure for identifying hidden patterns. It is a cross-disciplinary field that requires merging knowledge from many areas [1]. Higher education institutions use this information, through classification, to focus on weak students who have a higher risk of failure [2-6]. With the knowledge discovered by mining techniques, institutions of higher learning can make better decisions in a variety of ways: they can plan instruction in a more advanced manner, predict individual behavior with high accuracy, and allocate resources and staff more effectively. This improves the effectiveness and efficiency of their processes [3]. Classification is a form of data analysis that extracts models describing important data classes, which can then be used to predict the class of future data. The classification process has two steps. In the first step, the learning step, a classification algorithm analyzes the training data; the result is a learned model or a set of classification rules. In the second step, the classification step, the model is applied to test data to estimate the accuracy of the classifier.

The key purpose of this research paper is to develop an efficient predictive model, with the help of Passive Aggressive Classifier (PAC), Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), Radius Neighbour Classifier (RNC) and Extra Tree (ET), that classifies students as performers or non-performers. The prediction is made from the students' dataset.

2. Related Work:

The literature on machine learning in higher education mainly focuses on applying techniques such as clustering, association rules, classification, regression and statistics for prediction, and on the performance of the resulting models. Educational data mining (EDM) researchers also address other aspects of academic activity, including identifying factors related to student success, failure and intention to drop out [7-9], institutional planning and strategies [10-11], and supporting teachers and administrative decision-making.

The application of machine learning techniques in higher education is still at an early stage and needs more attention. Educational data mining aims mainly to improve students' performance in the learning process by identifying, extracting and evaluating attributes related to students' characteristics [1]. Educational data mining can also improve decision making and help institutions of higher education implement policies for students [12-13].

Ashwin Satyanarayana and Mariusz Nuckowski studied a dataset of 788 school students, with 37 questions each. Multiple classifiers (Decision Trees, Naïve Bayes and Random Forest) were used to improve accuracy on the student data by eliminating noisy instances. Association rules that affect students' results were identified using rule-based techniques such as Apriori, Filtered Associator and Tertius. The authors observed that prior work had performed no filtering on student data and had focused on single classifiers. They therefore compared single filters with ensemble filters and concluded that ensemble filters work better for identifying and removing noisy instances [14].

Another comparative study was carried out by Bhrigu Kapur, Nakin Ahluwalia and Sathyaraj (2017). They compared six algorithms: J48, Random Forest, Naïve Bayes, Naïve Bayes Multinomial, K-Star and IBk. They used a dataset of 480 entries and implemented the algorithms with the Weka tool. The study was based on seven attributes and found that Random Forest was more accurate than the other algorithms [15].

Pal et al. [16-19] have carried out several earlier studies to improve prediction performance using different data mining techniques. They report improved results that can also be applied at various institutions to identify weak students.

K. Prasada Rao et al. [20] conducted a survey of 200 college students. Classification techniques were applied to the student database to predict students' learning behavior. The researchers identified slow learners so that action could be taken to address failures and help weaker students qualify. The study compared the performance of the J48, Naïve Bayes and Random Forest algorithms, and found that Random Forest gave the best accuracy when the dataset is large.

In another study [21] (2016), students' performance was predicted. Classification techniques were used to build the prediction module of a system that predicts future values. Parameters such as previous academic performance were considered in order to predict students' academic results and placement. A dashboard module presents an overview of the whole institution as a graphical representation of the data. The decision tree algorithms ID3 and C4.5 were implemented to generate reports from a structured database. In this study, ID3 gave the best accuracy, 95.33%.

3. Methods

Machine learning is the study of algorithms that give computers the capability to learn from previously stored information. In this research paper five machine learning classifiers are used: (i) PAC, (ii) SVM, (iii) LDA, (iv) RNC and (v) ET. A brief description of each classifier is given below.

• Passive Aggressive Classifier (PAC):

The passive-aggressive algorithms are a family of algorithms for large-scale learning. They are similar to the Perceptron in that they do not require a learning rate. However, contrary to the Perceptron, they include a regularization parameter C.

• Support Vector Machine (SVM):

A support vector machine is a discriminative classifier used in supervised learning problems: given labeled training data, it finds the line (or hyperplane) in a multidimensional space that separates the classes.

• Linear Discriminant Analysis (LDA):

LDA is a supervised machine learning technique that is also used for dimensionality reduction. It reduces the number of attributes in a dataset as far as possible without affecting the results of the classification. LDA uses the information from the features to create a new axis and projects the data onto it in such a way as to minimize the within-class variance and maximize the distance between the means of the two classes.

• Radius Neighbors Classifier (RNC):

RNC is a neighbors-based classifier closely related to the k-nearest neighbors classifier. Instead of using a fixed number of neighbors, it considers every training point lying in a ball of a given radius around the query point and assigns the class by a vote among them. Points lying on the boundary of the ball are included.

• Extra Tree (ET):

ET, which stands for Extremely Randomized Trees, is an ensemble method. Its main objective is to further randomize tree building in the context of numeric input features, where the choice of the optimal cut-point is responsible for a large proportion of the variance of the induced tree. This can lead to increased accuracy compared with an ordinary random forest.
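The five classifiers described above all have counterparts in scikit-learn, the package named later in the paper. The following is only a minimal sketch of how they might be instantiated and fitted. The synthetic data, the hyperparameter values, the radius and the `outlier_label` setting are assumptions made for illustration and are not the authors' configuration. `ExtraTreesClassifier` is used because the text describes ET as an ensemble method.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.neighbors import RadiusNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the student data (assumption: 21 inputs, binary target).
X, y = make_classification(n_samples=300, n_features=21, n_informative=8,
                           random_state=0)
X = StandardScaler().fit_transform(X)

# Hyperparameters below are illustrative guesses, not the paper's settings.
classifiers = {
    "PAC": PassiveAggressiveClassifier(C=1.0, random_state=0),
    "SVM": SVC(kernel="linear", C=1.0),
    "LDA": LinearDiscriminantAnalysis(),
    # Queries with no training point inside the radius get the majority class.
    "RNC": RadiusNeighborsClassifier(radius=5.0, outlier_label="most_frequent"),
    "ET": ExtraTreesClassifier(n_estimators=100, random_state=0),
}

predictions = {}
for name, clf in classifiers.items():
    clf.fit(X, y)                       # learn from the labeled data
    predictions[name] = clf.predict(X)  # predict a class for every row
    print(name, "training accuracy:", round(clf.score(X, y), 3))
```

The printed figures are training accuracies on synthetic data, so they say nothing about the results reported in the paper. They only confirm that each estimator exposes the same `fit` / `predict` / `score` interface.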

Fig. 1 shows the structure of the methodology used in this research paper.

Figure 1. Methodological approach for Performance Prediction

3.1 Dataset Analysis:

The data used in this study come from the Bachelor of Computer Applications (BCA) programme and were collected from the United Institute of Management, Prayagraj. The BCA course runs for three years with two semesters per year, so six semester examinations complete the course. In this research paper we consider only the final-semester results. The data were collected, with the permission of the examination and admission departments, for the years 2014 to 2019. A total of 1000 students passed from the institution in this period, so 1000 instances are available, each with 22 attributes taken from the registration and examination forms. The target and other variables discussed in this study are listed in Table 1.

Table 1: Student Dataset

| Feature | Attribute | Domain |
|---|---|---|
| S1 | Sex of Students | 1=Female, 2=Male |
| S2 | Students category | 1=General, 2=OBC, 3=SC, 4=ST, 5=Minority |
| S3 | Discussion at home | 1=Always, 2=Almost Always, 3=Sometimes, 4=Never |
| S4 | Own Computer/Laptop | 1=Yes, 2=No |
| S5 | Laptop shared with family | 1=Yes, 2=No |
| S6 | Study desk at home | 1=Yes, 2=No |
| S7 | Own mobile phone | 1=Yes, 2=No |
| S8 | Own Gaming system | 1=Yes, 2=No |
| S9 | Heating/Cooling systems at | 1=Yes, 2=No |
| S10 | Absent from school | 1=Once a week or more, 2=Once every two weeks, 3=Once a month, 4=Never or almost never |
| S11 | How often use computer/Laptop at home | 1=Every day or almost every day, 2=Once or twice a week, 3=Once or twice a month, 4=Never or almost never |
| S12 | How often use computer at School | 1=Every day or almost every day, 2=Once or twice a week, 3=Once or twice in fifteen days, 4=Once or twice a month, 5=Never or almost never |
| S13 | Access textbooks | 1=Yes, 2=No |
| S14 | Completed assignments | 1=Yes, 2=No |
| S15 | Collaborate with classmates | 1=Yes, 2=No |
| S16 | Communicate with teacher | 1=Yes, 2=No |
| S17 | Students grade in Senior Secondary education | 1=90%-100%, 2=80%-89%, 3=70%-79%, 4=60%-69%, 5=50%-59%, 6=40%-49%, 7=<40% |
| S18 | Father's Qualification | 1=elementary, 2=secondary, 3=graduate/post-graduate, 4=doctorate |
| S19 | Mother's Qualification | 1=elementary, 2=secondary, 3=graduate/post-graduate, 4=doctorate |
| S20 | Father's Occupation | 1=Service, 2=business, 3=not-applicable |
| S21 | Mother's Occupation | 1=House-wife, 2=Service, 3=business, 4=not-applicable |
| S22 | Grade obtained in B.C.A | 1= >60%, 2= >45% and <60%, 3= >36% and <45%, 4= <36% |

3.2 Data Preprocessing:

The methodology proposed in this research paper starts with data preprocessing. This step includes (i) a data-driven method for selecting students' records and the important variables for analysis, and (ii) data cleaning: the data collected from students' records are not clean and may include noise and incorrect, missing or inconsistent values, so different data cleaning methods are applied to remove such anomalies. The experimental investigation uses the students' dataset collected from the United Institute of Management, Prayagraj. The dataset has 1028 instances in 22 dimensions. Noise and missing values in the dataset may impair the predictive ability of the machine learning model. Hence the students' dataset is pre-processed using a standard scaler, defined by the equation

$$x_N = \frac{x - x_{mean}}{SD}$$

where $x_N$ is the normalized value of x, x is the original value of x, $x_{mean}$ is the mean value of x, and SD is the standard deviation of the given population.
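The scaling equation above can be sketched in a few lines of standard-library Python. The function name `standardize` and the sample column are hypothetical and serve only to illustrate the formula. The population standard deviation is used, as the definition of SD above states.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return (x - mean) / SD for each x, using the population SD."""
    mean = fmean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("constant column: standard deviation is zero")
    return [(x - mean) / sd for x in values]

# Hypothetical coded answers for one attribute (1=Always ... 4=Never).
column = [1, 2, 2, 3, 4, 4, 1, 3]
scaled = standardize(column)
print([round(v, 3) for v in scaled])
```

After scaling, the column has mean 0 and standard deviation 1. scikit-learn's `StandardScaler` applies the same transformation to every column of a feature matrix.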

4. Results and Discussion

Before conducting the experiment, we first visualize the values of the attributes, as shown in Fig. 2. The figure shows a histogram for each attribute of the student dataset, which consists of 1000 instances and 22 attributes. Each histogram shows the frequency of the different values of one attribute; the feature S22, which is the target attribute (Result), is excluded.

Figure 2. Dataset visualization using histogram

The experiment is performed on the students' dataset using Python with supporting packages such as Scikit-learn, Pandas and NumPy. The dataset is divided into an 80% training set and a 20% test set.
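An 80/20 split of this kind could be produced with scikit-learn's `train_test_split`, as in the sketch below. The random placeholder data, the `random_state` value and the use of `stratify` are assumptions; the paper does not say how the split was drawn.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Placeholder data: 1000 rows, 21 coded inputs and a binary target (assumptions).
X = rng.integers(1, 5, size=(1000, 21))
y = rng.integers(0, 2, size=1000)

# Hold out 20% of the rows for testing; stratify keeps the class ratio similar
# in both parts (an assumption, not stated in the paper).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)
```

With 1000 instances this gives 800 training rows and 200 test rows.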

Another diagram that helps summarize the observed distribution is the box and whisker plot. The plot draws a box from the 25th to the 75th percentile of the data, which captures the middle 50% of the observations. A line is drawn at the 50th percentile (the median), and whiskers are drawn above and below the box to summarize the general range of the observations. Points are drawn for outliers that lie beyond the whiskers. The box and whisker plot of the dataset is shown in Fig. 3.
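The quantities that such a plot draws can be computed directly, as the following sketch shows. The helper name `box_summary` and the sample values are hypothetical. The 1.5 × IQR rule for placing the whiskers is an assumption (it is Matplotlib's default), since the paper does not state which rule was used.

```python
import numpy as np

def box_summary(values, whisker=1.5):
    """Return the numbers a box and whisker plot draws for one attribute."""
    data = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(data, [25, 50, 75])
    iqr = q3 - q1  # height of the box: the middle 50% of the observations
    low_fence, high_fence = q1 - whisker * iqr, q3 + whisker * iqr
    inside = data[(data >= low_fence) & (data <= high_fence)]
    return {
        "q1": q1, "median": median, "q3": q3,
        # Whiskers end at the most extreme observations inside the fences.
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        # Anything beyond the fences is drawn as an individual outlier point.
        "outliers": data[(data < low_fence) | (data > high_fence)].tolist(),
    }

summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

For this sample the box runs from 3.25 to 7.75 with the median at 5.5, the whiskers reach 1 and 9, and the value 30 is reported as an outlier.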

Figure 3. Dataset visualization using Box and Whisker Plot

In this paper Python code is used to draw the different graphs and to evaluate the accuracy, precision, sensitivity (recall) and specificity of the different machine learning techniques. Python is chosen because implementations of the different classifiers are available as predefined modules.

The performance of the classifier is the most important consideration for any predictive model, especially when the model is built for performance prediction. A wrong prediction may carry a heavy cost for the student. Hence the choice of performance metrics plays a crucial role in performance prediction. There are a number of performance metrics, such as accuracy, sensitivity, specificity and precision, whose formulas are given in Table 2.

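Table 2: Formulas

| Sr. No. | Performance Metric | Formula |
|---|---|---|
| 1 | Accuracy | $\frac{TP + TN}{TP + FP + TN + FN}$ |
| 2 | Sensitivity | $\frac{TP}{TP + FN}$ |
| 3 | Specificity | $\frac{TN}{TN + FP}$ |
| 4 | Precision | $\frac{TP}{TP + FP}$ |

Here TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives.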
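The four metrics can be computed from the confusion-matrix counts as in the short sketch below. It uses the standard definition of specificity, TN / (TN + FP). The function name and the counts are hypothetical and are not taken from the paper's experiments.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, specificity and precision from counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # recall, true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),
    }

# Hypothetical counts for a test set of 200 students.
metrics = classification_metrics(tp=40, fp=10, tn=140, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the accuracy is 0.900, the sensitivity 0.800, the specificity 0.933 and the precision 0.800.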

The values obtained for the five classifiers are shown in Table 3.

Table 3: Output of Evaluating Algorithms

| Classifier | Accuracy (%) | Precision (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|
| PAC | 89.51 | 84.95 | 62.38 | 97.25 |
| SVM | 94.86 | 89.17 | 63.1 | 98.65 |
| LDA | 93.21 | 88.26 | 59.23 | 98.25 |
| RNC | 87.23 | 91.23 | 66.27 | 92.89 |
| ET | 91.27 | 90.04 | 70.23 | 93.44 |
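To read the table at a glance, the reported values can be ranked per metric, as in this small sketch. The numbers are transcribed from Table 3. PAC denotes the Passive Aggressive Classifier, and treating the fifth row as ET is an assumption based on the list of five classifiers used in the paper.

```python
# Values transcribed from Table 3 (percent). PAC = Passive Aggressive Classifier.
# Assumption: the fifth row of the table belongs to Extra Tree (ET).
table3 = {
    "PAC": {"accuracy": 89.51, "precision": 84.95, "sensitivity": 62.38, "specificity": 97.25},
    "SVM": {"accuracy": 94.86, "precision": 89.17, "sensitivity": 63.10, "specificity": 98.65},
    "LDA": {"accuracy": 93.21, "precision": 88.26, "sensitivity": 59.23, "specificity": 98.25},
    "RNC": {"accuracy": 87.23, "precision": 91.23, "sensitivity": 66.27, "specificity": 92.89},
    "ET":  {"accuracy": 91.27, "precision": 90.04, "sensitivity": 70.23, "specificity": 93.44},
}

# For every metric, pick the classifier with the largest reported value.
best = {
    metric: max(table3, key=lambda clf: table3[clf][metric])
    for metric in ("accuracy", "precision", "sensitivity", "specificity")
}
print(best)
```

No single classifier leads on every metric: SVM has the highest accuracy and specificity, RNC the highest precision, and ET the highest sensitivity.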

A high accuracy score does not ensure that a classifier correctly predicts the desired results. High accuracy may simply reflect a large number of correctly predicted true negative cases. Therefore, high accuracy alone cannot be considered a sufficient measure of a classification algorithm.
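As an illustration with made-up numbers (not taken from the paper's data), suppose a test set of 100 students contains 10 positive and 90 negative cases, and a classifier yields TP = 2, FN = 8, TN = 89 and FP = 1. Then

```latex
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + FP + TN + FN} = \frac{2 + 89}{100} = 0.91, \\
\text{Sensitivity} &= \frac{TP}{TP + FN} = \frac{2}{2 + 8} = 0.20 .
\end{aligned}
```

The accuracy is high even though only two of the ten positive cases are detected, because the 89 correctly predicted negatives dominate the count.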

In a predictive analysis system, a wrong prediction may be a false negative or a false positive. The cost of these two kinds of error may vary from one system to another; in some systems a false negative incurs more cost than a false positive. For example, in a system that predicts good performance, the classifier cannot afford to label a student who actually belongs to the positive class as a non-performer (a false negative). So we need a model in which the chance of both false positives and false negatives is low. In other words, its precision should be high, since high precision means few false positives, and its recall (sensitivity) should also be high, since high recall means few false negatives. When the cost of false positives is high, precision is the appropriate measure; when the cost of false negatives is high, recall is. Accuracy is a good measure if the costs of false positives and false negatives are nearly the same; when they differ, both precision and sensitivity should be considered.

Figure 4. Accuracy of different algorithms

5. Conclusion

The aim of the proposed work is to build an efficient framework that can substantially improve the accuracy of student performance prediction. Machine learning techniques are widely used in student performance prediction, and the knowledge gained with their help can be used to make effective decisions that improve students' performance. This paper describes different machine learning techniques for evaluating the performance of students. Five machine learning techniques, PAC, SVM, LDA, RNC and ET, are used to classify students. The best accuracy among these techniques is 94.86%, obtained with SVM. The second highest accuracy is 93.21%, obtained with LDA.

To our knowledge, this accuracy is among the highest reported in the available literature on student performance prediction. The machine learning-based method reduces generalization errors and obtains more information by using the first-stage prediction as a feature rather than as a separate training step. In addition, by using machine learning, the complex relationships between classifiers are learned automatically, so that the combined method can make better predictions.

These results can be used to pay more attention to non-performing students, improving both their performance and the quality of the higher educational institute.

References

[1]. B. Baradwaj, S. Pal, “Mining Educational Data to Analyze Students’ Performance”, (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 2, No. 6, 2011.

[2]. Paulo Cortez and Alice Maria Gonçalves Silva. 2008. Using data mining to predict secondary

school student performance. In: Proceedings of 5th Annual Future Business Technology

Conference, Porto, 5-12.

[3]. D. Michie, D.J. Spiegelhalter, and C.C. Taylor, "Machine Learning, Neural and Statistical

Classification", Ellis Horwood Series in Artificial Intelligence, 1994.

[4]. H. E. Erdem, “A cross-sectional survey in progress on factors affecting students’ academic

performance at a Turkish university,” Procedia-Social and Behavioral Sciences, vol. 70, pp.

691-695, 2013.

[5]. G. Elakia and N. J. Aarthi, “Application of data mining in educational database for predicting

behavioural patterns of the students,” International Journal of Computer Science and

Information Technologies, pp. 4649-4652, 2014.

[6]. S. Parack and F. Z. Zahid, “Application of data mining in educational databases for predicting academic trends and patterns,” in: Technology Enhanced Education (ICTEE), IEEE International Conference on, IEEE, pp. 1-4, 2012.

[7]. Cambruzzi, W.L., Rigo, S.J., Barbosa, J.L., 2015. Dropout prediction and reduction in

distance education courses with the learning analytics multitrail approach. J. UCS 21 (1), 23–

47.

[8]. Lykourentzou, I., Giannoukos, I., Nikolopoulos, V., Mpardis, G., Loumos, V., 2009. Dropout

prediction in e-learning courses through the combination of machine learning techniques.

Comput. Educ. 53 (3), 950–965.

[9]. Marquez-Vera, C., Cano, A., Romero, C., Noaman, A.Y.M., Mousa Fardoun, H., Ventura, S.,

2016. Early dropout prediction using data mining: a case study with high school students.

Exp. Syst. 33 (1), 107–124.

[10]. Caputi, V., Garrido, A., 2015. Student-oriented planning of e-learning contents for Moodle. J.

Network Comput. Appl. 53, 115–127.

[11]. Mankad, S.H., 2016. Predicting learning behaviour of students: Strategies for making the

course journey interesting. In: Paper presented at the Intelligent Systems and Control (ISCO),

2016 10th International Conference on.

[12]. Sacin, C.V., Agapito, J.B., Shafti, L., Ortigosa, A., 2009. Recommendation in higher

education using data mining techniques. In: Paper presented at the Educational Data Mining

2009.

[13]. Ji, H., Park, K., Jo, J., Lim, H., 2016. Mining students activities from a computer supported

collaborative learning system based on peer to peer network. Peer-to-Peer Netw. Appl. 9 (3),

465–476.

[14]. Ashwin Satyanarayana, Mariusz Nuckowski, “Data Mining using Ensemble Classifiers for Improved Prediction of Student Academic Performance”, Spring 2016 Mid-Atlantic ASEE Conference, April 8-9, 2016, GWU.

[15]. Bhrigu Kapur, Nakin Ahluwalia and Sathyaraj R, “Comparative Study on Marks Prediction

using Data Mining and Classification Algorithms”, International Journal of Advanced

Research in Computer Science, 8 (3), March-April 2017,632-636.

[16]. Pandey, U.K. and Pal, S., 2011. Data Mining: A prediction of performer or underperformer

using classification. (IJCSIT) International Journal of Computer Science and Information

Technologies, Vol. 2 (2), 2011, 686-690.

[17]. Bhardwaj, B.K. and Pal, S., 2012. Data Mining: A prediction for performance improvement

using classification. (IJCSIS) International Journal of Computer Science and Information

Security, Vol. 9, No. 4, 2011.

[18]. Yadav, S.K., Bharadwaj, B. and Pal, S., 2012. Data Mining Applications: A Comparative

Study for Predicting Student’s Performance. International Journal of Innovative Technology

& Creative Engineering (ISSN: 2045-711), Vol. 1, No.12.

[19]. Yadav, S.K. and Pal, S., 2012. Data mining: A prediction for performance improvement of

engineering students using classification. World of Computer Science and Information

Technology Journal (WCSIT). (ISSN: 2221-0741), Vol. 2, No. 2, 51-56, 2012.

[20]. Prasada Rao, K. , M. V.P. Chandra Sekhara, and B. Ramesh. "Predicting Learning Behavior

of Students using Classification Techniques." International Journal of Computer Applications

(0975 – 8887) Volume 139 – No.7, April 2016.

[21]. Siddhi Parekh, Ameya Nadkarni, and Riya Mehta (2016) "Results and Placement

Analysis and Prediction using Data Mining and Dashboard." International Journal of Computer

Applications (0975 – 8887) Volume 137 – No.13, March 2016 In Proceedings of the 22nd

International Conference on World Wide Web, pp. 413-418. ACM.

An Ensemble Modeling Approach to Enhance Grade Prediction in Academic Engineering Programming Courses

Article

Full-text available

Nov 2023

Predicting the future academic grades of students can play a pivotal role in enhancing their performance in specific courses, consequently yielding a positive impact on their prospective academic, professional, and personal achievements, as well as on society at large. The field of programming is rapidly gaining prominence as an essential profession spanning multiple domains, marked by abundant opportunities and financial rewards. To cater to the diverse interests of students, the recommended curriculum structure for engineering programs in computing adeptly combines theoretical knowledge with practical programming skills. This approach ensures that students acquire a comprehensive understanding of programming courses, allowing them to choose the path that aligns best with their envisioned careers as programmers This research endeavors to introduce ensemble prediction techniques aimed at identifying students who exhibit the potential for advancement, or conversely, those who may not excel in four university-level programming courses. The outcomes of this study are presented alongside valuable performance assessment metrics for five ensemble methodologies, namely AdaBoost, Bagging, Random Forest, Stacking, and Voting. This evaluation employs a 10-fold cross-validation methodology and incorporates the Principal Component Analysis (PCA) for feature ranking. The results unequivocally demonstrate that both the Stacking and Random Forest ensemble approaches have attained the highest level of accuracy when applied to two distinct datasets.

Human Intelligence Analysis through Perception of AI in Teaching and Learning

Article

Full-text available

Jun 2022
Comput Intell Neurosci

Instructional practices have undergone a drastic change as a result of the development of new educational technology. Artificial intelligence (AI) as a teaching and learning technology will be examined in this theoretical review study. To enhance the quality of teaching and learning, the use of artificial intelligence approaches is being studied. Artificial intelligence integration in educational institutions has been addressed, though. Students’ assistance, teaching, learning, and administration are also addressed in the discussion of students’ adoption of artificial intelligence. Artificial intelligence has the potential to revolutionize our social interactions and generate new teaching and learning methods that may be evaluated in a variety of contexts. New educational technology can help students and teachers better accomplish and manage their educational objectives. Artificial intelligence algorithms are used in a hybrid teaching mode in this work to examine students’ attributes and introduce predictions of future learning success. The teaching process may be carried out in a more efficient manner using the hybrid mode. Educators and scientists alike will benefit from artificial intelligence algorithms that may be used to extract useful information from the vast amounts of data collected on human behavior.

CRITICAL LITERATURE REVIEW ON CURRENT STATE-OF-THE ART IN PREDICTING STUDENTS' PERFORMANCE USING MACHINE LEARNING ALGORITHM IN BLENDED LEARNING ENVIRONMENT

Article

Full-text available

Aug 2023

Background of the study: Predicting and analyzing the performance of the student in a blended learning environment is important to help educators identify poor performing students and improve their academic score. Meanwhile, achieving accurate predictions require selecting machine learning techniques that can produce optimum score. However, there seems to be no critical literature review on current state of art in predicting students' performance using machine learning algorithms in blended learning environment. Methodology: This critical literature review focuses on, studies on the current state of the art in predicting students' performance in the blended learning for past 10 years, sources of dataset used by various authors and the machined learning algorithm with high prediction accuracy. Findings: Naïve Bayes was the most frequently used algorithm for predicting students' performance. Authors mostly used online data for their student's performance prediction. Finally, artificial neural network was found to give higher prediction accuracy of 98.7%.

Model for Prediction of Student Grades using Data Mining Algorithms

Article

Mar 2022

There has been a rapid growth in the educational domain since education has become an important need. Data is collected in this domain which can be put to meaningful use to derive a lot of benefits to the students. Predicting student performance can help students and their teachers keep track of student progress. Mining Educational data helps to uncover invisible patterns, relationships, or trends in the unstructured data and helps in delivering logical and meaningful recommendations. Several kinds of research are being conducted across the world to analyze the data regarding student learning to identify the factors affecting performance and to provide support to students to help them improve. It is the objective of the proposed research to conduct a detailed study in the Sultanate of Oman regarding the existing toolsets, systems, and mode of data collection that are used currently in the Education sector for the prediction of Student Grades. Taking this as the baseline, later a model that will feature different prediction algorithms which are more accurate in predicting the grades of a student will be developed. The objective of this research is to understand the various predictive methods used to predict student performance and to propose a machine learning model to predict student grades.

Future Prospects And Developments In Human Intelligence Identification Using Artificial Intelligence

Experiment Findings

Jun 2023

Human beings possess more than one intelligence, each of which can be defined as the expertise with which an individual functions or solves problems, or as the area of creativity, and is considered a basic inherent intelligence of the individual. Through his research, Dr. Howard Gardner has made evident the existence of multiple intelligences: linguistic, logical-mathematical, musical, spatial, bodily-kinesthetic, interpersonal, intrapersonal, and naturalist. Intelligence depends heavily on several factors such as age, understanding of the environment, and the extent of isolation of the brain. Various mechanisms, such as systematic review and meta-analysis using traditional paper-and-pencil isolated tasks across several activities, enabled the assessment of the intelligence inherently present in an individual in one-to-two-hour sessions, followed by further processing of the collected data. Using artificial intelligence algorithms, an attempt can be made to develop a framework to identify the multiple intelligences in an individual based on Multiple Intelligence Theory.

Artificial Intelligence Algorithms For Future Prospects And Developments In Human Intelligence Identification

Conference Paper

May 2023

Human beings possess not one but multiple intelligences, each defined as the ability to solve a specific problem or create a product that is perceived as valuable in one or more contexts. According to Dr. Howard Gardner, the existence of eight intelligences (linguistic, logical-mathematical, musical, spatial, bodily-kinesthetic, interpersonal, intrapersonal, and naturalist) depends heavily on several factors such as age, understanding of the environment, and the extent of isolation of the brain. Various mechanisms, such as systematic review and meta-analysis using traditional paper-and-pencil isolated tasks, consisted of several culturally meaningful activities, always related to professions, which enabled the assessment of the psychological processes inherent to each intelligence in one-to-two-hour sessions. These mechanisms would help demonstrate the existence of independent intelligences and, hence, identify the strengths and weaknesses of individuals in order to assess human intelligence. Using AI (Artificial Intelligence) algorithms, an attempt can be made to identify the multiple intelligences in an individual and to help the individual learn and succeed in life.

An Efficient Approach of Feature Selection and Metrics for Analyzing the Risk of the Students Using Machine Learning

Conference Paper

Oct 2021

Application of Machine Learning for Higher Education

Article

May 2021

Sampada Taralgatti

A Critical Review on Educational Data Mining Segment: A New Perspective

Conference Paper

Jan 2021

Nowadays, data mining is receiving more attention in research across various fields, such as the educational sector, disease prediction, customer behavior, and fraud detection. Data mining techniques are applied to the huge amounts of data generated in different domains to discover hidden patterns and support managerial decisions. Today, more attention is given to the application of data mining techniques to the analysis of educational data, which is known as educational data mining. The primary goal of educational data mining is to provide quality education to students by predicting student performance and estimating the drop-out ratio. Data mining is considered a stepping stone in the process of knowledge discovery in databases. Here, an analysis of the available literature on data mining is provided. The concept of data mining and its assortment of methodologies are summarized. Some applications, tasks, and issues associated with it are also illustrated.

Predict Student Performance using Sentiment Analysis based on Hybrid Approach

Research Proposal

Aug 2020

Denis Goh Ee Kin

My research proposal is about predicting student performance based on university student feedback and opinions. A hybrid approach (a combination of machine learning and lexicon-based methods) is proposed as a new solution. This paper is a proposal, and I would like feedback on the topic.

Data Mining using Ensemble Classifiers for Improved Prediction of Student Academic Performance

Conference Paper

Apr 2016

In the last decade Data mining (DM) has been applied in the field of education, and is an emerging interdisciplinary research field also known as Educational Data Mining (EDM). One of the goals of EDM is to better understand how to predict student academic performance given personal, socio-economic, psychological and other environmental attributes. Another goal is to identify factors and rules that influence educational academic outcomes. In this paper, we use multiple classifiers (Decision Trees-J48, Naïve Bayes and Random Forest) to improve the quality of student data by eliminating noisy instances, and hence improving predictive accuracy. We also identify association rules that influence student outcomes using a combination of rule based techniques (Apriori, Filtered Associator and Tertius). We empirically compare our technique with single model based techniques and show that using ensemble models not only gives better predictive accuracies on student performance, but also provides better rules for understanding the factors that influence better student outcomes.

Predicting Learning Behavior of Students using Classification Techniques

Article

Apr 2016

Early Dropout Prediction using Data Mining: A Case Study with High School Students

Article

Feb 2016
EXPERT SYST

Early prediction of school dropout is a serious problem in education, but it is not an easy issue to resolve. On the one hand, there are many factors that can influence student retention. On the other hand, the traditional classification approach used to solve this problem normally has to be applied at the end of the course, so that it can gather the maximum amount of information and achieve the highest accuracy. In this paper, we propose a methodology and a specific classification algorithm to discover comprehensible prediction models of student dropout as early as possible. We used data gathered from 419 high school students in Mexico. We carried out several experiments to predict dropout at different stages of the course, to select the best indicators of dropout, and to compare our proposed algorithm against some well-known classical and imbalanced-data classification algorithms. Results show that our algorithm was capable of predicting student dropout within the first four to six weeks of the course and was trustworthy enough to be used in an early warning system.

Mining students activities from a computer supported collaborative learning system based on peer to peer network

Article

Aug 2015

As Information & Communication Technology (ICT) has rapidly evolved, educational paradigms have been changing. The ultimate goal of education with the aid of ICT is to provide customized training that improves the effectiveness of learning anytime and anywhere. In an online learning environment that leverages the Internet, mobile devices, peer-to-peer (P2P) networks, and cloud technology, all the information in learning activities is converted into digital data and stored in the Computer Supported Collaborative Learning (CSCL) system. The data in the CSCL system contains various kinds of learner information, including learning objectives, learning preferences, competences, and achievements. Thus, by analyzing the activity information of learners in an online CSCL system, meaningful and useful information can be extracted and provided to learners, teachers, and administrators as feedback. In this paper, we propose a learner activity model that represents the learner's activity information stored in a CSCL system. For the proposed learner activity model, we classified the learning activities in a CSCL system into three categories: vivacity, learning, and relationship; we then created quotients to represent them. In addition, we developed a CSCL system, which we named COLLA, applied the proposed learner activity model to it, and analyzed the results.

Application of Data Mining in Educational Database for Predicting Behavioural Patterns of the Students

Article

Jun 2014

Abstract - Data mining refers to the technique of obtaining hidden, previously unknown, and possibly significant knowledge from very large amounts of data. Data mining uses a combination of a vast knowledge base, advanced analytical skills, and domain knowledge to unveil hidden trends and patterns, which can be applied in almost any sector, from business to medicine to engineering. Educational institutes can likewise employ data mining to discover valuable information from their databases, a practice known as Educational Data Mining (EDM). Educational data mining requires the adaptation of existing approaches, or the development of new ones, derived from statistics, machine learning, psychometrics, scientific computing, etc. The current system is designed to demonstrate that various data mining techniques, including classification, can be used on educational databases to suggest career options for high school students and to predict potentially violent behaviour among students by including parameters other than academic details. RapidMiner has been used as the data mining tool. Keywords - Data Mining, RapidMiner, C4.5 Classification.

Predicting learning behaviour of students: Strategies for making the course journey interesting

Conference Paper

Jan 2016

Sapan H Mankad

This paper focuses on improving higher education with the help of data mining techniques. The emphasis is on analysing specific attributes of students and predicting their learning behaviour. A student can be a slow, medium, or fast learner depending on his/her level of activity in educational activities. Experimental results in this study show that the decision tree classifier performs significantly better than the nearest neighbour and naïve Bayesian algorithms. This paper also describes some strategies for holding students' attention so that they stay engaged in classroom sessions and complete the course with full involvement.

Results and Placement Analysis and Prediction using Data Mining and Dashboard

Article

Mar 2016

Dropout Prediction and Reduction in Distance Education Courses with the Learning Analytics Multitrail Approach

Article

Jan 2015

Distance Education courses are offered by a large number of educational institutions. The development of Virtual Learning Environments contributes to this wide adoption of the Distance Education modality and enables new pedagogical methodologies. However, the dropout rates observed in these courses are very high, in both public and private educational institutions. This paper presents a Learning Analytics system developed to deal with the dropout problem in university-level Distance Education courses. Several complementary tools are available in the system, supporting data visualization, dropout prediction, pedagogical actions, and textual analysis, among others. The implementation of these tools is feasible due to the adoption of an approach called Multitrail to represent and manipulate data from several sources and formats. Results from experiments carried out with courses at a Brazilian university show dropout prediction with an average precision of 87%. A set of pedagogical actions targeting the students with the highest probabilities of dropout was implemented, and we observed an average reduction of 11% in dropout rates.

Student-oriented planning of e-learning contents for Moodle

Article

Apr 2015

We present a way to automatically plan student-oriented learning contents in Moodle. Rather than offering the same contents to all students, we provide personalized contents according to the students' background and learning objectives. Although curriculum personalization can be approached in several ways, we focus on artificial intelligence (AI) planning as a very useful formalism for mapping actions, i.e. learning contents, in terms of preconditions (precedence relationships) and causal effects, in order to find plans, i.e. learning paths, that best fit the needs of each student. A key feature is that the learning path is generated and shown in Moodle in a seamless way for both the teacher and the student. We also include some experimental results to demonstrate the scalability and viability of our approach.

Application of data mining in educational databases for predicting academic trends and patterns

Conference Paper

Jan 2012

Data mining is a process of identifying and extracting hidden patterns and information from databases and data warehouses. There are various algorithms and tools available for this purpose. Data mining has a vast range of applications, from business to medicine to engineering. In this paper, we discuss the application of data mining in education for student profiling and grouping. For student profiling we use the Apriori algorithm, one of the popular approaches for mining associations, i.e. discovering correlations among sets of items. The other algorithm, used for grouping students, is K-means clustering, which assigns a set of observations to subsets. In academics, data mining can be very useful for discovering valuable information that can be used to profile students based on their academic record. We apply the Apriori algorithm to a database containing the academic records of various students and try to extract association rules in order to profile students based on parameters such as exam scores, term work grades, attendance, and practical exams. We also apply K-means clustering to the same data set in order to group the students. The implemented algorithms offer an effective way of profiling students that can be used in educational systems.
