International Journal of Advanced Science and Technology
Vol. 29, No. 5, (2020), pp. 7249-7261
7249
ISSN: 2005-4238 IJAST
Copyright 2020 SERSC
Application of Machine Learning Algorithms to Predict
Students Performance
Randhir Singh1 and Saurabh Pal2
1Research Scholar, Computer Applications, VBS Purvanchal University, Jaunpur.
2Dept. of Computer Applications, VBS Purvanchal University, Jaunpur, UP, India
* Corresponding author’s Email: drsaurabhpal@yahoo.co.in
Abstract
Student performance is a major concern for society. With the rapid growth of technology and the
application of different machine learning methods in recent years, models for predicting
student performance have become more and more accurate.
Therefore, the development of machine learning techniques that can effectively predict student
performance is of vast importance. In this research paper, we apply five different data mining
techniques, Passive Aggressive Classifier (PAC), Support Vector Machine (SVM), Linear Discriminant
Analysis (LDA), Radius Neighbour Classifier (RNC) and Extra Tree (ET), and then compare the results
of the five machine learning algorithms to choose the best performing algorithm. We use educational data
to analyze different machine learning techniques for evaluating student performance.
The results obtained by the different machine learning algorithms are discussed in this paper, and we get
the highest accuracy in the case of Support Vector Machine (SVM). Various metrics, such as sensitivity,
specificity and precision, are also evaluated to verify the accuracy results. These results can be applied
to newly admitted students to check whether they will perform well or not, and by knowing the
non-performing students, higher educational institutions can pay attention to improving student
performance.
Keywords: Educational Data Mining; Support Vector Machines, Radius Neighbor Classifier, Linear
Discriminant Analysis, Passive Aggressive Classifier.
1. Introduction
The quality of an academic institution depends on the performance of its students and on the
dropout rate, that is, the gap between the number of students enrolled in a course and the
number who finally complete it. The dropout rate is high because students do not know whether
the course to which they are seeking admission suits them or not. In India, parents often force
students to take admission in engineering or professional courses
without regard to their interests, and this is the main reason for dropout and low
performance.
Educational Data Mining (EDM) is an area that focuses on using technologies and data
mining techniques in the teaching environment. EDM applies machine learning to
identify hidden patterns within large volumes of academic data and to develop, research
and implement data mining and statistical methods, which would provide fruitful results
for the students. Data mining is a complicated procedure and needs multiple steps to
identify hidden patterns. Data mining is a cross-disciplinary field that requires
merging knowledge from many areas [1]. Higher education institutions use this
information, through classification, to focus on weak students who have a higher risk of
failure [2-6]. With the knowledge discovered by mining techniques,
institutions of higher learning can make better decisions
in a variety of ways: they can plan instruction for students in advance,
predict individual behavior with high accuracy, and allocate
resources and staff more effectively. This results in improved effectiveness and
efficiency of the processes [3]. Classification is a form of data mining analysis
that can be used to extract models describing important data classes or to predict
future data. The classification process has two steps. The first is the learning
step, in which the training data are analyzed by a classification algorithm and the learned
model is represented as classification rules. The second is the
classification step, in which the model is applied to test data to estimate the
accuracy of the classifier.
The key purpose of this research paper is to develop an efficient predictive model with
the help of Passive Aggressive Classifier (PAC), Support Vector Machine (SVM),
Linear Discriminant Analysis (LDA), Radius Neighbour Classifier (RNC) and Extra
Tree (ET) to classify students as performers or non-performers.
The performer and non-performer students are predicted from the student dataset.
2. Related Work:
The literature on the use of machine learning in the field of higher
education mainly focuses on applying techniques such as clustering,
association rules, classification, regression and statistics for prediction, and on the
performance of the developed models. Educational data mining (EDM) researchers also address
other aspects related to academic activities, including identifying factors related to student
success, failure, and intention to drop out [7-9], institutional planning and strategies
[10-11], and understanding teacher support and administrative decision-making.
The application of machine learning techniques in the field of higher education is
still at an early stage and needs more attention. Educational data mining
mainly aims to improve students' performance through the
learning process by identifying, extracting and evaluating attributes related to
students' characteristics [1]. With the help of educational data mining, institutions of
higher education can improve decision making and implement policies that help
students [12-13].
Ashwin Satyanarayana and Mariusz Nuckowski conducted a study on a dataset of 788 school
students, each of whom answered 37 questions. In this research multiple classifiers,
such as Decision Trees, Naïve Bayes and Random Forest, were used to improve accuracy on
the student data by eliminating noisy instances. Association rules that affect students'
results were identified using a set of rule-based techniques such as Apriori,
Filtered Associator and Tertius. The authors noted that prior work
had performed no filtering on student data and had focused on single
classifiers. The study therefore compared single filters with ensemble filters
and concluded that ensemble filters work better for identifying and removing
noisy instances [14].
Another comparative study was done by Bhrigu Kapur, Nakin Ahluwalia and
Sathyaraj (2017). They compared six algorithms: J48, Random Forest, Naïve
Bayes, Naïve Bayes Multinomial, K-Star and IBK. They used a dataset of 480 entries
and implemented the algorithms with the Weka tool. The study was based on seven attributes
and found that the Random Forest algorithm provides higher accuracy than the other
algorithms [15].
Pal et al. [16-19] have carried out several previous studies to improve
prediction performance using different data mining techniques; these studies
give good results and can also be applied at various institutions to identify weak
students.
K. Prasada Rao et al. [20] conducted a survey of 200 college students. In this
research classification techniques were applied to a student database to predict the
learning behavior of students. The researchers identified the slow
learners so that appropriate action could be taken to correct failures and help
the weaker students qualify. The study compared the performance
of the J48, Naïve Bayes and Random Forest algorithms. The
researchers obtained the best accuracy with the Random Forest algorithm when the dataset
was large.
A study carried out by the team in [21] (2016) predicted the performance of students.
Classification techniques were used to create the prediction module of the
system to predict future values. Various parameters, such as previous academic
performance, were considered to predict students' academic results and placement.
The dashboard module gives an overview of the whole institution as
a graphical representation of the data. The decision tree algorithms ID3 and C4.5 were
implemented to generate reports based on a structured database. In this research the ID3
algorithm provided the best accuracy, 95.33%.
3. Methods
Machine learning is a technique for developing algorithms that
give a computer the capability to learn from previously stored information.
In this research paper five machine learning classifiers are used: (i) PAC,
(ii) SVM, (iii) LDA, (iv) RNC and (v) ET. A brief description of each classifier
is given below.
Passive Aggressive Classifier (PAC):
Passive-aggressive algorithms are a family of algorithms used for large-scale
learning. They are similar to the Perceptron in that they do not require a
learning rate. However, in contrast to the Perceptron,
passive-aggressive algorithms include a regularization parameter C.
Support Vector Machine (SVM):
The support vector machine is a discriminative classifier used in supervised
learning problems: given labeled training data, it finds the line (or
hyperplane) in a multidimensional space that separates the classes.
LinearDiscriminantAnalysis (LDA):
This algorithm is also known as an attribute reduction method. LDA is a
supervised machine learning technique. It reduces the number of attributes in
a dataset as far as possible without affecting the results of the classification.
Linear Discriminant Analysis, or LDA, uses the information from all
features to create a new axis and projects the data onto the new axis in such a
way as to minimize the variance within each class and maximize the distance between the
means of the two classes.
RadiusNeighborsClassifier (RNC):
RNC is a neighbors-based classifier, related to KNeighborsClassifier, that votes among
the training points lying within a fixed radius of each query point. The underlying
radius query returns the indices and distances of each data point of the dataset
lying in a ball of size radius around the points of the query array. Points
lying on the boundary are included in the results.
Extra Tree (ET):
Extra Trees, short for Extremely Randomized Trees, is an ensemble method.
The main objective of this algorithm is to further randomize tree
building in the context of numeric input features, where the choice of the
optimal cut-point is responsible for a large proportion of the variance of the
induced tree. It often leads to increased accuracy when compared to the
ordinary random forest.
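As a minimal sketch of how the five classifiers described above could be set up in scikit-learn, the following code fits each one on synthetic data. The synthetic dataset, the hyperparameter values (C, radius, n_estimators, the linear kernel) and the variable names are assumptions made for illustration; the paper does not report its settings, and class names may differ between scikit-learn versions.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import RadiusNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): this is NOT the student dataset of the paper.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Hyperparameters below are illustrative guesses, not the authors' settings.
classifiers = {
    "PAC": PassiveAggressiveClassifier(C=1.0, random_state=0),
    "SVM": SVC(kernel="linear", C=1.0),
    "LDA": LinearDiscriminantAnalysis(),
    # outlier_label gives a fallback class to query points with no neighbor inside the radius.
    "RNC": RadiusNeighborsClassifier(radius=5.0, outlier_label="most_frequent"),
    "ET": ExtraTreesClassifier(n_estimators=100, random_state=0),
}

# Fit every classifier on the training part and record its test accuracy.
scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)
```

Here scores maps each abbreviation to its test accuracy on the synthetic data; these numbers say nothing about the results reported in this paper.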
Fig. 1 shows the structure of the methodology used in this research paper.
Figure 1. Methodological approach for Performance Prediction
3.1 Dataset Analysis:
The data used in this study come from the Bachelor of Computer Applications (BCA) programme
and were collected from United Institute of Management, Prayagraj. The BCA
course runs for 3 years with two semesters per year, so a total of six
semester examinations complete the course. In this research paper we
take into account only the final semester results. The data were collected with the
permission of the examination and admission departments for the years 2014 to 2019.
The total number of students who passed from the institution is 1000, so 1000
instances are available with 22 attributes; these attributes were collected from the
registration and examination forms. The target and other variables discussed in
this study are listed in Table 1.
Table 1: Student Dataset
Feature | Attribute | Domain
S1 | Sex of student | 1=Female, 2=Male
S2 | Student's category | 1=General, 2=OBC, 3=SC, 4=ST, 5=Minority
S3 | Discussion at home | 1=Always, 2=Almost always, 3=Sometimes, 4=Never
S4 | Own computer/laptop | 1=Yes, 2=No
S5 | Laptop shared with family | 1=Yes, 2=No
S6 | Study desk at home | 1=Yes, 2=No
S7 | Own mobile phone | 1=Yes, 2=No
S8 | Own gaming system | 1=Yes, 2=No
S9 | Heating/Cooling systems at | 1=Yes, 2=No
S10 | Absent from school | 1=Once a week or more, 2=Once every two weeks, 3=Once a month, 4=Never or almost never
S11 | How often use computer/laptop at home | 1=Every day or almost every day, 2=Once or twice a week, 3=Once or twice a month, 4=Never or almost never
S12 | How often use computer at school | 1=Every day or almost every day, 2=Once or twice a week, 3=Once or twice in fifteen days, 4=Once or twice in a month, 5=Never or almost never
S13 | Access textbooks | 1=Yes, 2=No
S14 | Completed assignments | 1=Yes, 2=No
S15 | Collaborate with classmates | 1=Yes, 2=No
S16 | Communicate with teacher | 1=Yes, 2=No
S17 | Student's grade in Senior Secondary education | 1=90%-100%, 2=80%-89%, 3=70%-79%, 4=60%-69%, 5=50%-59%, 6=40%-49%, 7=<40%
S18 | Father's qualification | 1=elementary, 2=secondary, 3=graduate/post-graduate, 4=doctorate
S19 | Mother's qualification | 1=elementary, 2=secondary, 3=graduate/post-graduate, 4=doctorate
S20 | Father's occupation | 1=Service, 2=business, 3=not-applicable
S21 | Mother's occupation | 1=House-wife, 2=Service, 3=business, 4=not-applicable
S22 | Grade obtained in B.C.A | 1= >60%, 2= >45 & <60%, 3= >36 & <45%, 4= <36%
3.2 Data Preprocessing:
The methodology proposed in this research paper starts with data preprocessing.
The preprocessing step includes (i) a data-driven method for selecting students'
records and the important variables for analysis and (ii) data cleaning: the data
collected from students' records are not clean and may include noise, incorrect or missing
values, or inconsistent data, so different data cleaning methods have to be applied
to remove such anomalies. The experimental investigation uses the
student dataset collected from the United Institute of Management, Prayagraj.
The dataset contains 1028 instances described by 22 dimensions. The noise and
missing values present in the dataset may impair the predictive ability of the
machine learning model. Hence the student dataset is extensively pre-processed
and normalized with a standard scaler, using the equation
$$x_N = \frac{x - x_{mean}}{SD}$$
where
$x_N$ = normalized value of x,
$x$ = original value of x,
$x_{mean}$ = mean value of x, and
$SD$ = standard deviation of the given population.
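The normalization above can be sketched in a few lines using only the Python standard library. The function name and the sample column are hypothetical; the population standard deviation (pstdev) is used because the text defines SD over the given population, and the handling of a constant column is an added assumption.

```python
from statistics import mean, pstdev

def standardize(values):
    """Return z-scores (x - mean) / SD, using the population standard deviation."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # Assumption: a constant column has no spread, so map it to zeros
        # instead of dividing by zero.
        return [0.0 for _ in values]
    return [(x - mu) / sd for x in values]

# Hypothetical codes for one attribute on a 1-4 scale (not taken from the dataset).
column = [1, 2, 2, 3, 4, 4, 1, 3]
z = standardize(column)
```

After this transformation the column has mean 0 and standard deviation 1, which puts attributes with different code ranges on a comparable scale.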
4. Results and Discussion
Before conducting the experiment, we first visualize the values of the attributes, as shown in
Fig. 2. The figure shows histograms of the attributes of the student dataset, which
consists of 1000 instances and 22 attributes. Each histogram shows the frequency
of the different values of one attribute, excluding the feature S22 (which is the target
attribute, Result).
Figure 2. Dataset visualization using histogram
The experiment is performed on the student dataset using Python code along with
supporting packages such as scikit-learn, pandas and NumPy. The student dataset
is divided into an 80% training set and a 20% test set.
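An 80/20 division of this kind can be sketched with the standard library alone; scikit-learn's train_test_split provides the same functionality. The function name, the seed and the stand-in records are assumptions, and the paper does not say whether the split was shuffled or stratified.

```python
import random

def split_80_20(rows, seed=0):
    """Shuffle a copy of rows and return (training 80%, test 20%)."""
    shuffled = list(rows)
    # A fixed seed makes the shuffle, and therefore the split, reproducible.
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 8 // 10
    return shuffled[:cut], shuffled[cut:]

# Stand-in for 1000 student records (assumption): integer identifiers only.
rows = list(range(1000))
train, test = split_80_20(rows)
```

With 1000 records this yields 800 training and 200 test records, and every record appears in exactly one of the two parts.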
Another diagram that helps summarize the observed distribution is the box-and-whisker
plot. The plot draws a box from the 25th to the 75th percentile of the data, which captures the
middle 50% of the observations. A line is drawn at the 50th percentile (the median), and
whiskers are drawn above and below the box to summarize the general range of the observations.
Points are drawn for outliers that fall outside the whiskers. The box-and-whisker
plot of the dataset is shown in Fig. 3.
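The quantities that such a plot displays can be computed directly, as in the sketch below. The 1.5 x IQR rule for placing the whiskers is a common convention assumed here; the paper does not state which rule its plots use. The function name and sample values are hypothetical.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Compute the quartiles, whisker ends and outliers shown in a box plot."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }

# Hypothetical sample with one extreme value.
summary = box_plot_summary([1, 2, 2, 3, 3, 3, 4, 4, 5, 20])
```

For this sample the box spans 2.25 to 4 with the median at 3, the whiskers reach 1 and 5, and the value 20 is reported as an outlier.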
Figure 3. Dataset visualization using Box and Whisker Plot
In this paper Python code is used to draw the different graphs and to evaluate the
accuracy, precision, sensitivity (recall) and specificity of the different machine learning
techniques. Python is chosen because code for the different
classifiers is already available in the form of predefined modules.
The performance of a classifier is the most important consideration for any predictive model,
especially when the model is built for performance prediction. A wrong prediction
may impose a heavy cost on a student. Hence, the selection of performance
metrics plays a crucial role in performance prediction. In a data analysis system,
there are a number of performance metrics, such as accuracy, sensitivity, precision and
specificity, whose formulas are shown in Table 2.
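Table 2: Formulas
Sr. No. | Performance Metrics | Formula
1. | Accuracy | $(TP + TN) / (TP + FP + TN + FN)$
2. | Sensitivity | $TP / (TP + FN)$
3. | Specificity | $TN / (TN + FP)$
4. | Precision | $TP / (TP + FP)$
Here TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives.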
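The four metrics can be computed from the confusion-matrix counts as in the following sketch. Specificity is taken in its standard form, TN / (TN + FP), that is, the fraction of actual negatives that are correctly identified. The function name and the counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, specificity and precision from counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # also called recall
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),
    }

# Hypothetical counts for a 200-record test set.
m = classification_metrics(tp=40, fp=10, tn=130, fn=20)
```

With these counts the accuracy is 0.85 and the precision is 0.80, while the sensitivity is only about 0.67, which shows how the metrics can diverge on the same predictions.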
The values calculated for the five classifiers are shown in Table 3.
Table 3: Output of Evaluating Algorithms
Classifier | Precision (%) | Sensitivity (%) | Specificity (%)
PAC | 84.95 | 62.38 | 97.25
SVM | 89.17 | 63.1 | 98.65
LDA | 88.26 | 59.23 | 98.25
RNC | 91.23 | 66.27 | 92.89
ET | 90.04 | 70.23 | 93.44
A high accuracy score does not ensure that a classifier correctly
predicts the desired results. High accuracy may simply result from a large number of
correctly predicted true negative cases. Therefore, high accuracy alone
cannot be considered a sufficient measure of a classification algorithm.
In a predictive analysis system a wrong prediction may be a false negative or a false
positive. The cost of these two kinds of wrong prediction may vary from one system to
another. In one system, a false negative may incur a higher cost than a false
positive. For example, in a student performance prediction system, the classifier
cannot afford a wrong prediction about a student
who is actually a non-performer (an actual positive) but is predicted as a
performer (a false negative). So we need a model in which the chance of both false
positives and false negatives is low. In other words, its precision should be high, which
means that the number of false positives is low; similarly, its recall should be high, which
means that the number of false negatives is low. High precision and high recall together ensure
that the classifier produces few false-positive and false-negative results. When the cost of
false positives is high, precision is a good measure; when the cost of false negatives is
high, recall is. Accuracy is a good measure if the costs of false positives and false
negatives are nearly the same, but when they differ, both precision and sensitivity should be
considered.
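The effect of unequal error costs can be illustrated with a small worked example. The two classifiers, their error counts and the cost ratio below are all hypothetical; the paper does not assign numeric costs to either kind of error.

```python
def misclassification_cost(fp, fn, cost_fp, cost_fn):
    """Total cost of the errors, weighting each kind by its own cost."""
    return fp * cost_fp + fn * cost_fn

# Two hypothetical classifiers with identical accuracy on 200 test students:
# each makes 30 errors, but the errors are split differently.
model_a = {"fp": 25, "fn": 5}
model_b = {"fp": 5, "fn": 25}

# Assumed costs: a false negative is five times as costly as a false positive.
cost_a = misclassification_cost(**model_a, cost_fp=1, cost_fn=5)
cost_b = misclassification_cost(**model_b, cost_fp=1, cost_fn=5)
```

Both models have an accuracy of 85%, yet under these assumed costs model A incurs a total cost of 50 and model B a total cost of 130, so accuracy alone would not distinguish them.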
Figure 4. Accuracy of different algorithms
5. Conclusion
The aim of the proposed work is to build an efficient framework that can substantially
improve the accuracy of student performance prediction. Machine learning techniques are widely
used in student performance prediction. Knowledge gained with the help of machine
learning techniques can be used to make successful and effective decisions that
improve and develop students' performance. This paper describes different machine
learning techniques for evaluating the performance of students. Five machine learning
techniques, PAC, SVM, LDA, RNC and ET, are used to classify
students. The best accuracy found among these techniques is 94.86%, from
SVM. The second highest accuracy obtained is 93.21%, in the case of LDA.
The accuracy obtained is among the highest in the available literature on student performance
prediction. The machine learning-based method reduces generalization errors and obtains
more information by using the first-stage prediction as a feature rather than through separate
training. In addition, by using machine learning, the complex relationships between
classifiers are automatically learned, enabling the ensemble method to make better
predictions.
These results can be used to pay more attention to the non-performing students in order to
improve their performance and the quality of the higher educational institute.
[Figure 4: bar chart comparing Accuracy, Precision, Sensitivity and Specificity (axis 0 to 100) for PAC, SVM, LDA, RNC and ET]
References
[1]. B. Baradwaj, S. Pal, “Mining Educational Data to Analyze Students’ Performance”, (IJACSA)
International Journal of Advanced Computer Science and Applications, Vol. 2, No. 6, 2011.
[2]. Paulo Cortez and Alice Maria Gonçalves Silva. 2008. Using data mining to predict secondary
school student performance. In: Proceedings of 5th Annual Future Business Technology
Conference, Porto, 5-12.
[3]. D. Michie, D.J. Spiegelhalter, and C.C. Taylor, "Machine Learning, Neural and Statistical
Classification", Ellis Horwood Series in Artificial Intelligence, 1994.
[4]. H. E. Erdem, “A cross-sectional survey in progress on factors affecting students’ academic
performance at a Turkish university,” Procedia-Social and Behavioral Sciences, vol. 70, pp.
691-695, 2013.
[5]. G. Elakia and N. J. Aarthi, “Application of data mining in educational database for predicting
behavioural patterns of the students,” International Journal of Computer Science and
Information Technologies, pp. 4649-4652, 2014.
[6]. S. Parack and F. Z. Zahid, “Application of data mining in educational databases for predicting
academic trends and patterns, in: Technology Enhanced Education (ICTEE),” IEEE
International Conference on, IEEE, pp. 1-4, 2012.
[7]. Cambruzzi, W.L., Rigo, S.J., Barbosa, J.L., 2015. Dropout prediction and reduction in
distance education courses with the learning analytics multitrail approach. J. UCS 21 (1),
23-47.
[8]. Lykourentzou, I., Giannoukos, I., Nikolopoulos, V., Mpardis, G., Loumos, V., 2009. Dropout
prediction in e-learning courses through the combination of machine learning techniques.
Comput. Educ. 53 (3), 950-965.
[9]. Marquez-Vera, C., Cano, A., Romero, C., Noaman, A.Y.M., Mousa Fardoun, H., Ventura, S.,
2016. Early dropout prediction using data mining: a case study with high school students.
Exp. Syst. 33 (1), 107-124.
[10]. Caputi, V., Garrido, A., 2015. Student-oriented planning of e-learning contents for Moodle. J.
Network Comput. Appl. 53, 115-127.
[11]. Mankad, S.H., 2016. Predicting learning behaviour of students: Strategies for making the
course journey interesting. In: Paper presented at the Intelligent Systems and Control (ISCO),
2016 10th International Conference on.
[12]. Sacin, C.V., Agapito, J.B., Shafti, L., Ortigosa, A., 2009. Recommendation in higher
education using data mining techniques. In: Paper presented at the Educational Data Mining
2009.
[13]. Ji, H., Park, K., Jo, J., Lim, H., 2016. Mining students activities from a computer supported
collaborative learning system based on peer to peer network. Peer-to-Peer Netw. Appl. 9 (3),
465-476.
[14]. Ashwin Satyanarayana, Mariusz Nuckowski, “Data Mining using Ensemble Classifiers for
Improved Prediction of Student Academic Performance”, Spring 2016 Mid-Atlantic ASEE
Conference, April 8-9, 2016, GWU.
[15]. Bhrigu Kapur, Nakin Ahluwalia and Sathyaraj R, “Comparative Study on Marks Prediction
using Data Mining and Classification Algorithms”, International Journal of Advanced
Research in Computer Science, 8 (3), March-April 2017,632-636.
[16]. Pandey, U.K. and Pal, S., 2011. Data Mining: A prediction of performer or underperformer
using classification. (IJCSIT) International Journal of Computer Science and Information
Technologies, Vol. 2 (2), 2011, 686-690.
[17]. Bhardwaj, B.K. and Pal, S., 2012. Data Mining: A prediction for performance improvement
using classification. (IJCSIS) International Journal of Computer Science and Information
Security, Vol. 9, No. 4, 2011.
[18]. Yadav, S.K., Bharadwaj, B. and Pal, S., 2012. Data Mining Applications: A Comparative
Study for Predicting Student’s Performance. International Journal of Innovative Technology
& Creative Engineering (ISSN: 2045-711), Vol. 1, No.12.
[19]. Yadav, S.K. and Pal, S., 2012. Data mining: A prediction for performance improvement of
engineering students using classification. World of Computer Science and Information
Technology Journal (WCSIT). (ISSN: 2221-0741), Vol. 2, No. 2, 51-56, 2012.
[20]. Prasada Rao, K., M. V.P. Chandra Sekhara, and B. Ramesh. "Predicting Learning Behavior
of Students using Classification Techniques." International Journal of Computer Applications
(0975-8887), Volume 139, No. 7, April 2016.
[21]. Siddhi Parekh, Ameya Nadkarni, and Riya Mehta (2016) "Results and Placement
Analysis and Prediction using Data Mining and Dashboard." International Journal of Computer
Applications (0975 8887) Volume 137 No.13, March 2016 In Proceedings of the 22nd
International Conference on World Wide Web, pp. 413-418. ACM.
... In recent years, there has been a growing trend in the utilization of data mining and machine learning techniques to forecast the academic performance of students across various educational settings. A study, referenced as [11], employed five different data mining methods, including Extra Tree (ET), Linear Discriminant Analysis (LDA), Passive Aggressive Classifier (PAC), Radius Neighbor Classifier (RNC), and Support Vector Machine (SVM), to construct a practical framework aimed at enhancing the accuracy of predicting students' performance. Subsequently, the study compared the outcomes generated by these five algorithms to determine the most effective one. ...
... Classifier performance is the most critical metric for any predictive model, especially when designed to predict student performance. Consequently, performance metrics are crucial in predicting performance [11]. We have expected the grades of graduate students in four programming courses enrolled in computer engineering technology with 340 instances from two consecutive academic batches, i.e., batch 2016 and batch 2017. ...
Article
Full-text available
Predicting the future academic grades of students can play a pivotal role in enhancing their performance in specific courses, consequently yielding a positive impact on their prospective academic, professional, and personal achievements, as well as on society at large. The field of programming is rapidly gaining prominence as an essential profession spanning multiple domains, marked by abundant opportunities and financial rewards. To cater to the diverse interests of students, the recommended curriculum structure for engineering programs in computing adeptly combines theoretical knowledge with practical programming skills. This approach ensures that students acquire a comprehensive understanding of programming courses, allowing them to choose the path that aligns best with their envisioned careers as programmers This research endeavors to introduce ensemble prediction techniques aimed at identifying students who exhibit the potential for advancement, or conversely, those who may not excel in four university-level programming courses. The outcomes of this study are presented alongside valuable performance assessment metrics for five ensemble methodologies, namely AdaBoost, Bagging, Random Forest, Stacking, and Voting. This evaluation employs a 10-fold cross-validation methodology and incorporates the Principal Component Analysis (PCA) for feature ranking. The results unequivocally demonstrate that both the Stacking and Random Forest ensemble approaches have attained the highest level of accuracy when applied to two distinct datasets.
... First and foremost on the list is the student's ability to pay attention to and duplicate what they have learned in the classroom during the test. For a variety of reasons, marks/grades based on a student's understanding of the subject matter are at the top of the list [5]. ...
... Singh and Pal [5] stated that learners' performance is analyzed using a variety of machine learning approaches. PCA, SVM, LDA, RNC, and ET are five machine learning algorithms used to categorise students' predictions. ...
Article
Full-text available
Instructional practices have undergone a drastic change as a result of the development of new educational technology. Artificial intelligence (AI) as a teaching and learning technology will be examined in this theoretical review study. To enhance the quality of teaching and learning, the use of artificial intelligence approaches is being studied. Artificial intelligence integration in educational institutions has been addressed, though. Students’ assistance, teaching, learning, and administration are also addressed in the discussion of students’ adoption of artificial intelligence. Artificial intelligence has the potential to revolutionize our social interactions and generate new teaching and learning methods that may be evaluated in a variety of contexts. New educational technology can help students and teachers better accomplish and manage their educational objectives. Artificial intelligence algorithms are used in a hybrid teaching mode in this work to examine students’ attributes and introduce predictions of future learning success. The teaching process may be carried out in a more efficient manner using the hybrid mode. Educators and scientists alike will benefit from artificial intelligence algorithms that may be used to extract useful information from the vast amounts of data collected on human behavior.
... Research was conducted by Singh and Pal (2020) (RNC) and Extra Tree (ET) and then compare the results of five machine learning algorithms to choose the best performing algorithm. The results obtained showed that, Support Vector Machine (SVM) gave the highest accuracy of 94.86%. ...
Article
Full-text available
Background of the study: Predicting and analyzing student performance in a blended learning environment is important because it helps educators identify poorly performing students and improve their academic scores. Achieving accurate predictions requires selecting machine learning techniques that can produce optimum scores. However, there seems to be no critical literature review of the current state of the art in predicting students' performance using machine learning algorithms in blended learning environments. Methodology: This critical literature review focuses on studies of the current state of the art in predicting students' performance in blended learning over the past 10 years, the sources of the datasets used by various authors, and the machine learning algorithms with high prediction accuracy. Findings: Naïve Bayes was the most frequently used algorithm for predicting students' performance. Authors mostly used online data for their student performance predictions. Finally, an artificial neural network was found to give a higher prediction accuracy of 98.7%.
... Reference [11] use machine learning variants such as Passive-Aggressive Classifier (PAC), Linear Discriminant Analysis (LDA), RadiusNeighborsClassifier (RNC), Support Vector Machine (SVM), Extra Tree (ET). The best accuracy found between these different processes was 94.86 percent from SVM. ...
Article
Full-text available
There has been a rapid growth in the educational domain since education has become an important need. Data is collected in this domain which can be put to meaningful use to derive a lot of benefits to the students. Predicting student performance can help students and their teachers keep track of student progress. Mining Educational data helps to uncover invisible patterns, relationships, or trends in the unstructured data and helps in delivering logical and meaningful recommendations. Several kinds of research are being conducted across the world to analyze the data regarding student learning to identify the factors affecting performance and to provide support to students to help them improve. It is the objective of the proposed research to conduct a detailed study in the Sultanate of Oman regarding the existing toolsets, systems, and mode of data collection that are used currently in the Education sector for the prediction of Student Grades. Taking this as the baseline, later a model that will feature different prediction algorithms which are more accurate in predicting the grades of a student will be developed. The objective of this research is to understand the various predictive methods used to predict student performance and to propose a machine learning model to predict student grades.
Experiment Findings
Full-text available
Human beings possess more than one intelligence, each of which can be defined as the expertise with which an individual functions or solves problems, or as an area of creativity, and is considered a basic inherent intelligence of the individual. Through his research, Dr. Howard Gardner has made evident the existence of multiple intelligences (linguistic, logical-mathematical, musical, spatial, bodily-kinesthetic, interpersonal, intrapersonal, and naturalist), which depend heavily on several factors, such as age, understanding of the environment, and the extent of isolation of the brain. Various mechanisms, such as systematic review and meta-analysis using traditional paper-and-pencil isolated tasks across several activities, enabled the assessment of the intelligence inherently present in an individual in one-to-two-hour sessions, followed by further processing of the collected data. By using artificial intelligence algorithms, an attempt can be made to develop a framework that identifies the multiple intelligences in an individual based on Multiple Intelligence Theory.
Conference Paper
Full-text available
Human beings possess not one but multiple intelligences, each defined as the ability to solve a specific problem or create a product that is perceived as valuable in one or more contexts. According to Dr. Howard Gardner, the existence of eight intelligences (linguistic, logical-mathematical, musical, spatial, bodily-kinesthetic, interpersonal, intrapersonal, and naturalist) depends heavily on several factors, such as age, understanding of the environment, and the extent of isolation of the brain. Various mechanisms, such as systematic review and meta-analysis using traditional paper-and-pencil isolated tasks, consisted of several culturally meaningful activities, always related to professions, which enabled the assessment of the psychological processes inherent to each intelligence in one-to-two-hour sessions. These mechanisms would contribute to demonstrating the existence of independent intelligences and, hence, to identifying the strengths and weaknesses of individuals in order to assess human intelligence. By using artificial intelligence (AI) algorithms, an attempt can be made to identify the multiple intelligences in an individual and to help the individual learn and succeed in life.
Conference Paper
Nowadays, data mining is receiving more attention in research across various fields, such as the educational sector, disease prediction, customer behavior, and fraud detection. Data mining techniques are applied to the huge amounts of data generated by different domains to discover hidden patterns and support managerial decisions. Today, more attention is being given to applying data mining techniques to the analysis of educational data, which is known as educational data mining. The primary goal of educational data mining is to provide quality education to students by predicting student performance and estimating the drop-out ratio. Data mining is considered a stepping stone in the process of knowledge discovery in databases. Here, an analysis of the available literature on data mining is provided. Data mining and its assortment of methodologies are summarized. Some applications, tasks, and issues associated with it are also illustrated.
Research Proposal
My research proposal is about predicting student performance based on university student feedback/opinion. A hybrid approach (a combination of machine learning and lexicon-based methods) has been proposed as a new solution, but this paper is a proposal and I would like feedback on the topic.
Conference Paper
Full-text available
In the last decade Data mining (DM) has been applied in the field of education, and is an emerging interdisciplinary research field also known as Educational Data Mining (EDM). One of the goals of EDM is to better understand how to predict student academic performance given personal, socio-economic, psychological and other environmental attributes. Another goal is to identify factors and rules that influence educational academic outcomes. In this paper, we use multiple classifiers (Decision Trees-J48, Naïve Bayes and Random Forest) to improve the quality of student data by eliminating noisy instances, and hence improving predictive accuracy. We also identify association rules that influence student outcomes using a combination of rule based techniques (Apriori, Filtered Associator and Tertius). We empirically compare our technique with single model based techniques and show that using ensemble models not only gives better predictive accuracies on student performance, but also provides better rules for understanding the factors that influence better student outcomes.
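The abstract above does not say how the multiple classifiers decide that an instance is noisy. One common approach is a majority-vote filter: drop every instance that more than half of the classifiers misclassify. The sketch below assumes that rule, and assumes each classifier's predictions were obtained out-of-fold (for example by cross-validation). The function name and the toy labels are hypothetical and are not taken from the paper.

```python
def majority_vote_filter(labels, predictions):
    """Return indices of instances kept after majority-vote noise filtering.

    labels      -- recorded class label for each instance
    predictions -- one list of predicted labels per classifier, aligned with labels
    An instance is dropped when more than half of the classifiers disagree
    with its recorded label (assumed filtering rule, see lead-in).
    """
    n_models = len(predictions)
    kept = []
    for i, label in enumerate(labels):
        wrong = sum(1 for preds in predictions if preds[i] != label)
        if wrong * 2 <= n_models:  # at most half of the models disagree
            kept.append(i)
    return kept


# Hypothetical example: four students, three classifiers.
labels = ["pass", "fail", "pass", "fail"]
predictions = [
    ["pass", "fail", "fail", "fail"],
    ["pass", "fail", "fail", "pass"],
    ["pass", "pass", "fail", "fail"],
]
kept_indices = majority_vote_filter(labels, predictions)
```

Here only the third student is removed, because all three classifiers contradict the recorded label. A stricter consensus filter, which drops an instance only when every classifier disagrees, would remove fewer instances.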
Article
Full-text available
Early prediction of school dropout is a serious problem in education, but it is not an easy issue to resolve. On the one hand, there are many factors that can influence student retention. On the other hand, the traditional classification approach used to solve this problem normally has to be implemented at the end of the course to gather maximum information in order to achieve the highest accuracy. In this paper, we propose a methodology and a specific classification algorithm to discover comprehensible prediction models of student dropout as soon as possible. We used data gathered from 419 high school students in Mexico. We carried out several experiments to predict dropout at different steps of the course, to select the best indicators of dropout and to compare our proposed algorithm versus some classical and imbalanced well-known classification algorithms. Results show that our algorithm was capable of predicting student dropout within the first four to six weeks of the course and trustworthy enough to be used in an early warning system.
Article
Full-text available
As Information & Communication Technology (ICT) has rapidly evolved, educational paradigms have been changing. The ultimate goal of education with the aid of ICT is to provide customized training for learners to improve the effectiveness of their learning anytime and anywhere. In the online learning environment, where the Internet, mobile devices, peer-to-peer (P2P) and cloud technology are leveraged, all the information in learning activities is converted into digital data and stored in the Computer Supported Collaborative Learning (CSCL) system. The data in the CSCL system contains various learners’ information including the learning objectives, learning preferences, competences and achievements. Thus, by analyzing the activity information of learners in an online CSCL system, meaningful and useful information can be extracted and provided for learners, teachers and administrators as feedback. In this paper, we propose a learner activity model that represents the learner’s activity information stored in a CSCL system. For the proposed learner activity model, we classified the learning activities in a CSCL system into three categories: vivacity, learning and relationship; then we created quotients to represent them accordingly. In addition, we developed a CSCL system, which we termed COLLA, applied the proposed learner activity model and analyzed the results.
Article
Full-text available
Data mining refers to the technique of obtaining hidden, previously unknown and possibly significant knowledge from very large amounts of data. Data mining uses a combination of a vast knowledge base, advanced analytical skills, and domain knowledge to unveil hidden trends and patterns, and can be applied in almost any sector, from business to medicine to engineering. Educational institutes can likewise employ data mining to discover valuable information from their databases, an approach known as Educational Data Mining (EDM). Educational data mining requires adapting existing approaches, or devising new ones, derived from statistics, machine learning, psychometrics, scientific computing, etc. The current system is designed to show that various data mining techniques, including classification, can be used on educational databases to suggest career options for high school students and also to predict potentially violent behaviour among students by including extra parameters beyond academic details. RapidMiner has been used as the data mining tool. Keywords - Data Mining, Rapid Miner, C4.5 Classification.
Conference Paper
This paper focuses on improving higher education with the help of data mining techniques. The emphasis is on analysing specific attributes of students and predicting their learning behaviour. A student can be a slow, medium or fast learner depending on his/her activeness in educational activities. Experimental results in this study show that the decision tree classifier performs significantly better than the nearest neighbour and naïve Bayesian algorithms. This paper also describes some strategies for keeping students engaged in classroom sessions so that they complete the course with full involvement.
Article
Distance Education courses are offered in a large number of educational institutions. The development of Virtual Learning Environments contributes to this wide adoption of the Distance Education modality and allows new pedagogical methodologies. However, the dropout rates observed in these courses are very high, in both public and private educational institutions. This paper presents a Learning Analytics system developed to deal with the dropout problem in university-level Distance Education courses. Several complementary tools, allowing data visualization, dropout prediction, support for pedagogical actions and textual analysis, among others, are available in the system. The implementation of these tools is feasible due to the adoption of an approach called Multitrail to represent and manipulate data from several sources and formats. Results from experiments carried out with courses at a Brazilian university show dropout prediction with an average precision of 87%. A set of pedagogical actions targeting the students with the highest probabilities of dropout was implemented, and we observed an average reduction of 11% in dropout rates.
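For readers unfamiliar with the metric, precision is the share of students flagged as likely dropouts who actually drop out. The sketch below shows the calculation on hypothetical labels. The data and the function name are illustrative assumptions and do not come from the system described above.

```python
def precision(y_true, y_pred, positive="dropout"):
    """Fraction of instances predicted as `positive` that truly are positive."""
    # True labels of every instance the model flagged as positive.
    flagged = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not flagged:
        return 0.0  # no positive predictions, so precision is defined as 0 here
    return sum(1 for t in flagged if t == positive) / len(flagged)


# Hypothetical labels for eight students (not data from the paper).
actual = ["dropout", "stay", "dropout", "stay", "stay", "dropout", "stay", "stay"]
predicted = ["dropout", "stay", "dropout", "dropout", "stay", "stay", "stay", "stay"]
score = precision(actual, predicted)  # 2 of the 3 flagged students dropped out
```

Precision says nothing about dropouts the model failed to flag (the sixth student here), so it is usually reported alongside recall.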
Article
We present a way to automatically plan student-oriented learning contents in Moodle. Rather than offering the same contents for all students, we provide personalized contents according to the students' background and learning objectives. Although curriculum personalization can be faced in several ways, we focus on artificial intelligence (AI) planning as a very useful formalism for mapping actions, i.e. learning contents, in terms of preconditions (precedence relationships) and causal effects to find plans, i.e. learning paths that best fit the needs of each student. A key feature is that the learning path is generated and shown in Moodle in a seamless way for both the teacher and student, respectively. We also include some experimental results to demonstrate the scalability and viability of our approach.
Conference Paper
Data mining is a process of identifying and extracting hidden patterns and information from databases and data warehouses. There are various algorithms and tools available for this purpose. Data mining has a vast range of applications, from business to medicine to engineering. In this paper, we discuss the application of data mining in education for student profiling and grouping. For student profiling, we use the Apriori algorithm, one of the popular approaches for mining associations, i.e. discovering correlations among sets of items. The other algorithm, used for grouping students, is K-means clustering, which partitions a set of observations into subsets. In academics, data mining can be very useful in discovering valuable information that can be used for profiling students based on their academic record. We apply the Apriori algorithm to a database containing the academic records of various students and try to extract association rules in order to profile students based on parameters such as exam scores, term work grades, attendance and practical exams. We also apply K-means clustering to the same data in order to group the students. The implemented algorithms offer an effective way of profiling students which can be used in educational systems.
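To make the association-rule step concrete, here is a minimal Apriori-style sketch in plain Python. It finds frequent itemsets level by level and then derives rules with a single-item consequent. The student records, attribute names, and thresholds are hypothetical and are not taken from the paper, whose actual implementation is not described in the abstract.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level-1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep the candidates of size k that meet the support threshold.
        level = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset (the Apriori property).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for single-item consequents."""
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            antecedent = itemset - {item}
            confidence = support / frequent[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, frozenset([item]), confidence))
    return rules


# Hypothetical student records (illustrative only).
records = [
    {"attendance=high", "exam=pass"},
    {"attendance=high", "exam=pass", "termwork=good"},
    {"attendance=low", "exam=fail"},
    {"attendance=high", "exam=pass", "termwork=good"},
    {"attendance=low", "exam=fail", "termwork=poor"},
]
itemsets = frequent_itemsets(records, min_support=0.4)
rules = association_rules(itemsets, min_confidence=0.9)
```

On this toy data, high attendance and a passing exam occur together in three of the five records, so the rule from `attendance=high` to `exam=pass` is reported with confidence 1.0. The K-means grouping step mentioned in the abstract is not sketched here.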