ArticlePDF Available

Predicting Student Enrolments and Attrition Patterns in Higher Educational Institutions using Machine Learning

Authors:
  • Abu Dhabi School of Management
562 The International Arab Journal of Information Technology, Vol. 18, No. 4, July 2021
Predicting Student Enrolments and Attrition
Patterns in Higher Educational Institutions using
Machine Learning
Samar Shilbayeh and Abdullah Abonamah
Business Analytics Department, Abu Dhabi School of Management, UAE
Abstract: In higher educational institutions, student enrollment management and increasing student retention are fundamental
performance metrics to academic and financial sustainability. In many educational institutions, high student attrition rates are
due to a variety of circumstances, including demographic and personal factors such as age, gender, academic background,
financial abilities, and academic degree of choice. In this study, we will make use of machine learning approaches to develop
prediction models that can predict student enrollment behavior and the students who have a high risk of dropping out. This can
help higher education institutions develop proper intervention plans to reduce attrition rates and increase the probability of
student academic success. In this study, real data is taken from Abu Dhabi School of Management (ADSM) in the UAE. This
data is used in developing the student enrollment model and identifying the student’s characteristics who are willing to enroll in
a specific program, in addition to that, this research managed to find out the characteristics of the students who are under the
risk of dropout.
Keywords: Machine learning, predictive model, apriori algorithm, student retention, enrolment behaviour, association rule
mining, boosting, ensemble method.
Received March 10, 2020; accepted July 19, 2020
https://doi.org/10.34028/18/4/8
1. Introduction
Machine learning can play a critical role in building up
a strategic enrolment plan for academic institutions.
This includes detecting the enrolment trends over time,
which gives the institution a better ability to manage
student enrolments and align institutional resources
more efficiently and effectively.
Higher education institutions are increasingly
interested in identifying their enrolment pattern to
understand the factors that maximize their enrolment
numbers. These factors allow the student recruitment
department to identify, attract, engage and enrol the
right students. The direct result of this is to help the
institutions to improve their financial and academic
performance. In academic institutions, student attrition
continues to be a significant and costly challenge. It is
considered to be the main concern that negatively
impacts the institution's reputation, ranking, and
success. The attrition rate in the academic institutions is
inspired by churn analysis that is usually performed in
marketing analysis. It is found out that attracting a new
customer in an organization is much more expensive
than retaining existing ones. In the same way, for any
academic institution, maintaining the current students is
one of their higher concern. This can be accomplished
by understanding the student attrition pattern by
detecting the students who are under the risk of attrition,
then building up a general
model that can predict this incident in advance. In this
way, they can develop proper strategies to retain those
students and give them more support if needed.
This paper closely investigates the enrolment pattern
that helps in attracting the type of students who are most
likely to enrol in one of the Abu Dhabi School of
Management (ADSM) academic programs and also
detects the student attrition patterns.
2. Literature Review
There are many enrolment and attrition rate prediction
studies that have utilized machine learning approaches
to identify the student enrolment and attrition pattern.
Petkovski et al. [7] conducted a study focused on the
student's performance as an indicator for the student
dropout rate in their freshman year. This study uses
naïve Bayes, decision tree, and rule-based induction
machine learning algorithm to build the best model that
predicts the attrition rate with 80% accuracy. Yukselturk
et al. [12] examined on their study the dropouts rate in
an online program for 189 students who registered in the
online information technology certificate program. This
study uses four machine learning algorithms K Nearest
Neighbour KNN, Decision Tree (DT), Naive Bays (NB)
and Neural Network (NN), resulting in 87% accuracy
for NN, and 80% accuracy for DT and finding out that
the student demographics information plays a very
critical role in their dropout rate. Rai and Jain [8]
Predicting Student Enrolments and Attrition Patterns in Higher Educational... 563
performed a study for the aim of understanding the
student drop rate factors using machine learning
algorithms utilizes ID3 and J48 decision trees using
weka on 220 undergraduate students in Information
Technology courses. In this research; it is found out that
personal factors is the most important factor that effects
the student attrition and weights for 28% of dropout rate.
On the same study, the institutional factors such as the
university environment, and the course cost weight for
17% of dropout rate, also it is noticed that few students
are most likely to drop out due to homesickness and
adjustment problems [8]. In the same prospect; kemper
et al. [4] managed to predict the student dropout rate and
success factor using logistic regression and decision
trees on a resample examination data at the Karlsruhe
Institute of Technology (KIT) with 95% accuracy after
three semesters of the student enrolments.
On the other hand, multiple authors and studies use
machine learning predictive methods and data science
techniques to predict student behaviour and
performance in educational settings. The researchers
Mulugeta and Berhanu [5] performed three different
machine learning algorithms: mainly J48, naïve bayes,
and neural network to build three prediction models that
predict the student's enrolment at the department level
for higher education institutions in both private and
public universities. The developed model in [5] shows
that the highest model accuracy is reached by applying
neural network machine learning algorithm over J48
and naïve Bayes. In the same prospect; Nakhkob and
Khademi [6] conducted a study that utilized neural
network, bagging, boosting, and naïve bayes to predict
the rate of student enrolment in higher education in
different universities in Iran. In the same study, a
comparison between those four models is performed
using model accuracy, and ROC curve. The result shows
that bagging outperforms neural network, boosting and
naïve Bayes.
Another study conducted by Waters and
Miikkulainen [11] utilized machine learning algorithms
to predict how likely 588 Ph.D. applicants will be
admitted to their colleges based on their information
provided in their applications, the result is used to
reduce the admission process time.
Two machine learning approaches: logistic
regressions, and decision trees are performed to predict
student dropout at Karlsruhe Institute of Technology
(KIT), both used models yield high prediction up to 95%
after three semesters, this study is done by Kemper et al.
[4]. For the same aim; a study is done by Al-Shabandar
et al. [2]. This study utilized two machine learning
models to find out the students who under the risk of
failure and withdraw at the early stages of online
courses. Random forest on decision tree and gradient
boosting machine are used for this aim and result in 89
% and 95% accuracy respectively.
3. Research Methodology
3.1. Data Description and Understanding
Data understanding and description is an important
stage in data analysis, which includes gathering,
understanding, cleaning, describing and verifying the
data. More accurate, comprehensive and higher quality
data will result in more efficient output. The data used
for this research is taken from the Student Information
System (SIS) database of Abu Dhabi School of
Management (ADSM). This particular dataset is used to
capture information about the students who enrolled in
ADSM along with their demographics and academic
background information for 1600 students enrolled
from (2013-2018). Student dataset variables are
described in Table 1.
Data Cleansing process includes filling up the
missing values, deleting the outliers or incorrectly input
data to make sure that the models will be built on clean,
proper data that can be converted into interesting results.
Table 1. Student dataset variables.
Variable name
Description
Program
This includes four master programs available
at ADSM
Workplace
The student workplace
UGSChool
Student undergraduate school
UGMajor
Student undergraduate major
Status
Student enrolment status (active, inactive,
under process, admitted)
UGCountry
Student undergraduate country
Experience Year
Number work experience years
Age
Student age
WorkPlaceSector
Student workplace sector divided into
(private, public)
Position
Working student current position
UGGPA
Undergraduate GPA
Emirates
The emirate region (state)
Nationality
Student Nationality
ELR Score
English Test Score
Sponsor
The name of the organization sponsoring the
student (if any)
Start Year
The student starting year in the academic
program
3.2. Predictive Analysis
For the aim of developing the proposed models, this
research is divided into three stages, the following is the
description for each stage with its results:
Stage 1: For the aim of predicting the number of
enrolment for the coming four years (2019-2022), a
boosted regression tree algorithm is applied to six
years of data on 1600 students. The boosted tree is
compared with the single regression tree. The
boosting method initially is defined by Schapire and
Freund [10] as a classification supervised ensemble
method. Boosting is one type of ensemble
approaches and defined as the method of applying the
same machine learning algorithm (regression
decision tree in our case) in the whole dataset several
times. On each learning stage; the weight of correctly
classified instances is decreased, and the weight of
564 The International Arab Journal of Information Technology, Vol. 18, No. 4, July 2021
incorrectly classified instances is increased to give
those that are incorrectly classified in the previous
learning stage more attention in the next learning
process [3]. Figure 1 shows the stage 1 framework.
Figure 1. Stage 1 framework.
The final result of the boosted regression model
prediction is the average of those different created
models. In our work, 500 regression trees are created.
The final result of applying boosted regression model is
shown in Figure 3. The boosted regression model is
tested using 10 folds cross validation and gives 89%
accuracy which outperforms the single regression
decision tree that gives only 76% accuracy tested using
10 folds cross-validation as well.
Stage 2: The association rule Apriori algorithm is
applied to the given dataset for the aim of revealing
interesting patterns about the characteristics of the
students who are most likely to enrol in one of the
school academic programs. Association rules mining
is one of the machine learning techniques that is used
to reveal interesting and meaningful hidden patterns
between different dataset variables. The developed
rules are interestingly unexpected for business
stakeholders. Apriori algorithm is defined by
Agrawal et al. [1]. Apriori algorithm is used to
uncover interesting relations between different items
in the market basket analysis. In Apriori algorithm, a
predefined interestingness measurements called
support and confidence are identified to limit the
number of rules generated [9]. The generated rule
could be written as: If {X} Then {Y}.
The If part of the rule (the {X} above) is known as the
antecedent and the THEN part of the rule is known as
the consequent (the {Y} above). The antecedent is the
condition and the consequent is the result. The support
is defined as the number of transactions that include
items in the {X} and {Y} parts of the rule as a percentage
of the total number of transactions N. It is a measure of
how frequently the collection of items occur together as
a percentage of all transactions. Support formula is
shown in Equation (1)
 󰇛󰇜
Confidence is defined as is the ratio of the number of
transactions that include all items in {Y} as well as the
number of transactions that include all items in {X} to
the number of transactions that include all items in {X}.
Confidence formula is shown in Equation (2)
 󰇛󰇜
󰇛󰇜
The two measurements support and confidence reveal
the degree of interestingness for the generated rules. The
user should define a minimum threshold that meets
his/her requirements for every set of generated rules.
The generated rules should assure minimum support and
minimum confidence that is greater than the user
predefined thresholds. Figure 2 shows stage 2
framework.
Figure 2. Stage 2 framework.
Figure 3. Enrollments number prediction using boosted regression trees (2019 -2022).
ADSM
dataset
Data
Description &
Cleaning
Apply Algorithm
Boosted
Regression
Model
Boosted Trees
Output
Evaluation
10
folds
CV
Apriori
Algorith
m
ADSM
dataset
Data Description
& Cleaning
Apply Algorithm
Generate Rules
Output
(1)
(2)
Predicting Student Enrolments and Attrition Patterns in Higher Educational... 565
Table 2 shows the minimum thresholds defined for
each ADSM programs, the measurements vary
according to the number of the students enrolled in each
program.
Table 2. Minimum association rules thresholds defined for each
ADSM program in stage 2.
Program
Minimum
Support
Minimum
Confidence
Rule Generated
MBA (Master of Business
Administration)
0.5
0.95
20
MSBA (Master of Science of
Business Analytics)
0.05
0.90
10
MSLOD (Master of Science in
Leadership and Organizational
Development)
0.4
0.90
15
MSQBE (Master of Science in
Quality and Business Excellence)
0.3
0.95
10
Table 3 shows the sample of the association rules that
are revealed from our dataset after specifying the
previous threshold for the student’s characteristics in
MBA Program.
Table 3. Sample of rules generated using apriori algorithm for the
student's characteristics in the MBA Program.
LHS
RHS
Support
Confidence
{Status=Graduated,
workplaceSector=Government}
{MBA}
0.6708861
0.9814815
{UGCountry=United Arab Emirates,
workPlace=OilGasSector}
{MBA}
0.5063291
0.9756098
{UGSchool=UAE
UNIVERSITY,workplace=Education}
{MBA}
0.5000000
0.9753086
{gender=male, Nationality=United Arab
Emirates, position=Admin, age = [30-35] }
{MBA}
0.6582278
0.9904762
Figure 4 is the sample of the association rules that are
revealed from our dataset after specifying the previous
threshold for MBA.
Figure 4. Sample of rules generated using apriori algorithm for the
students in the master of Business Administration Program (MBA)
program.
Stage 3: For the aim of detecting the students who are
under the risk of attrition, association rule Apriori
algorithm is applied to the given dataset with the
following minimum threshold defined in Table 4.
The measurements vary according to the number of
the students drop out from each program in the past
five years. Stage 3 framework is shown in Figure 5
Table 4. Minimum association rules thresholds defined for each
ADSM program in stage 3 for attrition rate detection.
Program
Minimum
Support
Minimum
Confidence
Rule Generated
MBA (Master of Business
Administration)
0.01
0.8
28
MSBA (Master of Science of
Business Analytics)
0.05
0.90
10
MSLOD (Master of Science in
Leadership and Organizational
Development)
0.4
0.90
15
MSQBE (Master of Science in
Quality and Business
Excellence)
0.3
0.95
10
Figure 5. Stage 3 framework.
Table 5 shows the sample of the association rules that
are revealed from our dataset after specifying the
previous thresholds for the student’s attrition in MBA
program.
Table 5. Sample of rules generated using apriori algorithm for the
students attrition rate in MBA program.
LHS
RHS
Support
Confidence
{Age=30-40,Experience.Years=<
2,UGCountry=United Arab
Emirates,UGGPA=2.5-
3,ELR.Score=4-
5,UGcollege=College of Business
and Economics}
{AttritionStatus=
Attrition}
0.012
1.0
{UGGPA=2.5-3,ELR.Score=4-
5,UGSchool=UAE
UNIVERSITY,UGcollege=College
of Business and Economics}
{AttritionStatus=
Attrition}
0.012
1.0
{Programs=MBA,Age=30-
40,gender=Male,UGGPA=2.5-
3,ELR.Score=4-5,position=NOT
EMPLOYEE}
AttritionStatus=
Attrition}
0.014
0.818
3.3. Results Comparison
Given the fact that the admission cycle for year 2019
and 2020 is over. In this section the boosted regression
model accuracy is computed by comparing the value of
predicted and the actual enrolments that is done over
2019 and 2020 academic year. The boosted regression
model that is developed for the aim of predicting the
number of enrolments students for the academic years
(2019 to 2022) and explained in section 3.1 stage 1.
ADSM
dataset
Description &
Cleaning
Apply Algorithm
Generate Rules
for Attrition
Rate
Output
566 The International Arab Journal of Information Technology, Vol. 18, No. 4, July 2021
It is found out that the model results in 86% accuracy
in 2019, and 78% accuracy in 2020. The drop in the
model accuracy in 2020 may occurred because of
unexpected corona virus incident that effects the
student’s enrolment pattern.
4. Conclusions
The models we presented in this paper have broad
significance for the higher educational institutions.
They attempt to answer two important questions facing
almost every institution of higher education, these are:
What type of student the academic institution should
attract and maintain to keep their academic standards?”
and “What type of students are most likely to drop out
during their academic journey?”
In the context of our work, we managed to answer
these two questions using a set of machine learning
approaches. The work described in this paper was held
at ADSM and is aimed to identify and prioritize students
who are at risk of attrition. Although the work in this
paper is aimed to predicting students who are likely to
enrol at specific academic program, we believe that this
solution (problem formulation, the feature extraction
process, data pre-processing, predictive analysis, and
evaluation) applies and generalizes to other academic
outcomes as well, such as predicting student behaviour.
We differentiate our work in this paper from the
above other researches in enrolment and attrition rate
prediction by predicting the number of enrolment of
students in the coming three years and also we managed
to generalize interesting patterns for the student who is
interested in enrolling in a specific program. Besides,
the size of our dataset is significantly larger (1600) than
those used in previous studies, and the model
performance is significantly improved by using the
ensemble approach.
References
[1] Agrawal R., Mannila H., Srikant R., Toivonen H.,
and Verkamo A., Fast Discovery of Association
Rules,” Advances in Knowledge Discovery and
Data Mining, vol. 12, no. 1, pp. 307-328, 1996.
[2] Al-Shabandar R., Hussain A., Liatsis P., and
Keight R., “Detecting At-Risk Students with Early
Interventions Using Machine Learning
Techniques, IEEE Access, vol. 7, pp. 149464-
149478, 2019.
[3] Freund Y., Schapire R., and Abe N., “A Short
Introduction to Boosting,” Journal-Japanese
Society for Artificial Intelligence, vol. 14, pp.
1612, 1999.
[4] Kemper L., Vorhoff G., and Wigger B.,
“Predicting Student Dropout: A Machine
Learning Approach, European Journal of Higher
Education, vol. 10, no. 1, pp. 28-47, 2020.
[5] Mulugeta M. and Borena B., Higher Education
Students’ Enrolment Forecasting System Using
Data Mining Application in Ethiopia HiLCoE
Journal of Computer Science and
Technology, vol. 2, no. 2, pp. 37-43, 2013.
[6] Nakhkob B. and Khademi M., Predicted Increase
Enrolment In Higher Education Using Neural
Networks and Data Mining Techniques, Journal
of Advances in Computer Research, vol. 7, no. 4,
pp. 125-140, 2016.
[7] Petkovski A., Stojkoska B., Trivodaliev K., and
Kalajdziski S., Analysis of Churn Prediction: A
Case Study on Telecommunication services in
Macedonia, in Proceedings of 24th
Telecommunications Forum, Belgrade, pp. 1-4,
2016.
[8] Rai S. and Jain A., Students' Dropout Risk
Assessment in Undergraduate Courses of ICT at
Residential University-A Case Study,
International Journal of Computer
Applications, vol. 84, no. 14, pp. 31-36, 2013.
[9] Sadatrasoul S., Gholamian M., and Shahanaghi
K., Combination of Feature Selection and
Optimized Fuzzy Apriori Rules: The Case of
Credit Scoring, The International Arab Journal
of Information Technology, vol. 12, no. 2, pp. 138-
145, 2015.
[10] Schapire R. and Freund Y., Boosting:
Foundations and Algorithms,” Kybernetes, 2013.
[11] Waters A. and Miikkulainen R., Grade: Machine
Learning Support for Graduate Admissions, AI
Magazine, vol. 35, no. 1, pp. 64-75, 2014.
[12] Yukselturk E., Ozekes S., and Türel Y.,
Predicting Dropout Student: An Application of
Data Mining Methods in an Online Education
Program, European Journal of Open, Distance
and e-learning, vol. 17, no. 1, pp.118-133, 2014.
Predicting Student Enrolments and Attrition Patterns in Higher Educational... 567
Samar Shilbayeh is Assistant
Professor in Business Analytics
program in Abu Dhabi School of
Management, she worked as a senior
data scientist and head of research
Centre in Cognitro Analytics
company experienced in extracting
data, analyzing, findings and applying predictive
modeling techniques, assisted in the development of an
analytics framework for smart construction of a new
type of predictive indictors “Cognitive Indicators”.
Developed active learning algorithms. Developed a cost
effective, expert intelligent system, which guides the
data miner to optimize models selection. Her research
interests include machine learning and AI approaches,
cost sensitive machine learning algorithms, applying
machine learning solutions in health, finance, telecom
and insurance. Dr Samar has a PhD in Machine learning
and AI from the university of Salford, Computer science
and Engineering department, Manchester, UK.
Abdullah Abonamah is the
President and Provost of the Abu
Dhabi School of Management. From
August, 2000 to October, 2007, he
was a Professor and Director of the
Institute for Technological
Innovation at Zayed University and
the Assistant Dean of the College of
Information Systems. Before coming to the UAE, he
was a Professor of Computer Science and the Computer
Science Division Head at the University of Akron. Dr.
Abonamah has a PhD in Computer Science from the
Illinois Institute of Technology, Chicago, Illinois and an
Executive Management Graduate Degree from Yale
University School of Management. His research and
teaching interests include strategy, technology
management, and entrepreneurship and innovation.
With over fifty publications in international journals and
conferences and a US patent in reliable systems, Dr.
Abdullah remains a strong advocate of strategic
management, innovation, entrepreneurship, and proper
technology-business integration.
... In the late 2010s, higher education institutions around the world began using predictive analytics to identify students at risk of dropping out or falling behind (Seidel & Kutieleh, 2017). AI models analyze data such as attendance, grades, and engagement to support early interventions, enabling educators and academic service units to provide targeted support to struggling students (Shilbayeh & Abonamah, 2021). With the rise of online learning at the dawn of the 2020s (Chan et al., 2021), partly prompted by the pandemic's boost to distance learning (Singh et al., 2022), AI-powered proctoring tools were introduced to maintain exam integrity for remote proctored exams (Kurni et al., 2023). ...
Chapter
Full-text available
This chapter explores the nuanced implications of artificial intelligence (AI), spotlighting the use of ChatGPT and generative AI within the area of higher education. Specifically, this chapter explores AI's transformative potential across industries, positioning ChatGPT as a prime example. By tracing the historical integration of AI in higher education, from initial adoption to advanced applications such as online proctoring and content creation, this chapter addresses pertinent concerns related to data privacy, biases, and ethical considerations. Emphasizing the necessity of understanding faculty and student readiness for ChatGPT integration, this chapter proposes recommendations to address challenges with AI and foster effective utilization. This chapter also highlights the importance of continual dialog and research for asserting AI's pivotal role in shaping the current and future of global higher education. While AI, exemplified by ChatGPT, promises improved educational support, personalized learning, and administrative efficiency, it also prompts inquiries about the future of teaching, learning, and privacy. As such, this introductory chapter advocates ongoing discourse and research to navigate ethical considerations, technological developments, and the broader impact of AI in education, involving diverse stakeholders for responsible and beneficial integration into global higher education.
... Additionally, they propose that these methods can be applied more broadly to benefit other higher education institutions, aiding in enhancing their overall performance and sustainability. The boosted regression tree model for predicting student enrolments achieved 89% accuracy using 10-fold crossvalidation and outperformed the single regression tree model that achieved only 76% accuracy (Shilbayeh & Abonamah, 2021). ...
Article
This study aims to enhance business sustainability in the context of Higher Education Institutions (HEIs) by utilizing AI and forecasting techniques. It explores the development and comparison of prediction models, including the use of dashboard development, to support decision-making processes within HEIs. The study covers various aspects, including the background of forecasting and prediction models, the use of specific models such as the Prophet Model, Long Short-Term Memory (LSTM) Model, and Polynomial Regression Model, as well as the importance of dashboards for HEIs. The methodology section outlines the data collection and preparation process, model selection, approach, diagrams, functional and non-functional requirements, justification of tools, and libraries and models used. The implementation section delves into the system design and development of the dashboard, including the login page, homepage, forecast page, and insert data page. As for the findings, the LSTM Model has proven to be the most accurate and suitable model to be implemented for forecasting student enrolment data in this study. The dashboard's future enhancements involve adding more faculties, predictive features for resource allocation, refining the visual identity, improving user registration on the login page, and exploring better models for student enrolment predictions. Overall, the study provides valuable insights into the application of AI and forecasting techniques in HEIs, aiming to enhance business sustainability and decision-making processes. It contributes to the growing body of knowledge on the use of technology-enabled AI in higher education institutions, with a focus on forecasting student enrolment data and developing prediction models.
... Understanding existing gender differences in skill development is becoming increasingly important to gain valuable information on how to close the gender gap. The analysis of specific sociodemographic characteristics in machine learning studies in education is still in its infancy [48], [49]. Gender prediction using machine learning models mainly focused on gender prediction of educational leadership [50]; female models and reinforcement in STEM [51]; exploring gender differences in learning computational thinking [52]; the intersection of the academic gender gap [53]; and gender stereotyping in academic dropout [54]. ...
Article
Full-text available
This article aims to study the performance of machine learning models in forecasting gender based on the students' open education competency perception. Data were collected from a convenience sample of 326 students from 26 countries using the eOpen instrument. The analysis comprises 1) a study of the students' perceptions of knowledge, skills, and attitudes or values related to open education and its sub-competencies from a 30-item questionnaire using machine learning models to forecast participants' gender, 2) validation of performance through cross-validation methods, 3) statistical analysis to find significant differences between machine learning models, and 4) an analysis from explainable machine learning models to find relevant features to forecast gender. The results confirm our hypothesis that the performance of machine learning models can effectively forecast gender based on the student's perceptions of knowledge, skills, and attitudes or values related to open education competency.
... Data description is a relevant process in data analysis, including collection, determination, cleaning, consistency and verifications. Therefore, consistent, complete and accurate data will provide results more efficiently [13]. The dataset under scrutiny comprised 709 records pertaining to employees within the life insurance sector, encompassing a diverse array of roles and attributes. ...
... An AI virtual assistant at the University of Georgia answered more than 5,000 admissions questions, saving staff time [25]. Operationally, AI can predict student attrition, pattern enrolment, improve building efficiency, and improve campus security through pattern recognition in security footage [29]. The benefits of AI are presented as a table 1. Table 1 Benefits of AI. ...
Article
Full-text available
Artificial intelligence (AI) has the potential to transform higher education through its potential to improve learning, research, and leadership in colleges and universities. This study aims to identify key themes, challenges, and recommendations related to AI implementation in higher education and leadership. A literature review was supported by 2 LLMs. The results show that AI is being used to personalize teaching, provide formative feedback, identify at-risk students, accelerate research discovery, streamline administrative processes through chatbots, and optimize resource utilization. However, the findings also highlight significant technical, ethical, cultural and resource challenges that institutions must overcome in order to successfully harness AI's enormous potential while mitigating risks. Empowering management through knowledge, funding, and support structures that promote good governance and responsible AI adoption is fundamental to successfully harnessing the enormous but still largely untapped potential of AI.
Book
Full-text available
Studies on Social and Education Sciences 2023 Editors Ömer Bilen, Eman Shaaban In the ever-evolving landscape of education and social sciences, the contributions from scholars worldwide continue to shape our understanding of the complex dynamics within society. As we navigate through another year, this compilation stands as a testament to the dedication and intellect of researchers addressing critical issues and advancements. As we embark on the intellectual journey presented in these chapters, we anticipate that this collection will serve as a source of inspiration and knowledge for scholars, educators, and practitioners alike. May these diverse perspectives contribute to the ongoing dialogue in education and social sciences, fostering a global community of thinkers dedicated to positive change. Citation Bilen, Ö. & Shaaban, E. (2023). Preface. In Ö. Bilen & E. Shaaban (Eds.), Studies on Social and Education Sciences 2023. ISTES Organization.
Article
Full-text available
The timing of degree completion for students taking post-secondary courses has been a constant source of angst for administrators wanting the best outcomes for their students. Most methods for predicting student degree completion extensions are completed by analog methods using human effort to analyze data. The majority of data analysis reporting of degree completion extension variables and impacts has, for decades, been done manually. Administrators primarily forecast the factors based on their expertise and intuition to evaluate implications and repercussions. The variables are large, varied, and situational to each individual and complex. We used machine learning (automated processes using predictive algorithms) to predict undergraduate extensions for at least 2 years beyond a standard 4 years to complete a bachelor's degree. The study builds a machine learning-based education understanding XAI model (ED-XAI) to examine students’ dependent and independent variables and accurately predict/explain degree extension. The study utilized Random Forest, Support Vector Machines, and Deep Learning Machine learning algorithms. XAI used Information Fusion, SHapley Additive exPlanations (SHAP), and Local Interpretable Model-Agnostic Explanations (LIME) models to explain the findings of the Machine Learning models. The ED-XAI model explained multiple scenarios and discovered variables influencing students’ degree completion linked to their status and funding source. The Random Forest model gave supreme predictive results with 89.1% Mean ROC, 71.6% Overall Precision, 86% Overall Recall, and 71.6% In-class Precision. The educational information system introduced in this study has significant implications for accurate variables reporting and impacts on degree extensions leading to successful degree completions minimally reported in higher education marketing analytics research.
Conference Paper
Full-text available
The traditional educational systems in certain nations, such as those in the Arab world, use the previous year's scores to forecast academic achievement. Meanwhile, the science, technology, engineering, arts and mathematics (STEAM) educational system also considers the student's talents and interests, in addition to their test results, for each of the years of study used to forecast academic achievement. However, despite the numerous variables that could potentially impact a student's future profession, the STEAM educational system accurately predicts the academic achievement of students in five to seven broad majors. A smart model, data, generative, factors (DGF), has been proposed to assign a specific major for each student based on their academic background, interests, skills, and main influencing factors. The DGF model uses the supervised deep neural network and is trained by the generative adversarial network (GAN) algorithm. The DGF model has the capability to predict a specific major among 17 different majors for every student, with a high learning performance (1.4133), a plausible error rate (0.0046), a rational number of learning epochs (206), and an optimal accuracy of 99.99%.
Conference Paper
Student dropout is a worldwide problem that affects an entire society; thus, being of great concern for academic institutions that seek to retain their students through different strategies, machine learning is the most used for the early detection of students at risk. For this reason, in the present work, an exhaustive systematic literature review study of manuscripts related to the prediction of student dropout was carried out. The articles were obtained from six databases, which were searched using the PRISMA methodology. A total of 88 manuscripts were selected from which 4 questions were posed. Finally, we obtained as an answer to the questions that the most used model is the random forest, with an accuracy of between 73 and 99% for predicting student dropout. For this, aspects such as academic, demographic, economic, and health aspects must be considered. Meanwhile, the technological tool for the models was the Python language according to this systematic review.KeywordsModelMachine learningPredictionStudent dropoutSchool dropout
Chapter
Full-text available
The digitization and collection of big data by higher education institutions can help administrators make informed decisions about resource allocation, specifically in enrollment management. This study explores the use of machine learning and neural network algorithms to predict future student enrollments in courses. Real data from the Arab American University in Palestine (AAUP) was used. Eight machine learning algorithms, in addition to the Multilayer Perceptron (MLP) neural network model, were used to predict which students are most likely to enroll in a specific course. Results show that ensemble-based and bagging algorithms outperform other classifiers, including neural network models, for individual-level prediction of student enrollments. The Random Forest algorithm achieves the highest accuracy of 94% and an F1 score of 79% after applying under-sampling techniques on the highly imbalanced dataset. The study recommends future research to develop a generalized model for predicting enrollment in any course at AAUP and highlights the effectiveness of these techniques for improving resource allocation and student support.
Article
Full-text available
This study examined the prediction of dropouts through data mining approaches in an online program. The subject of the study was selected from a total of 189 students who registered to the online Information Technologies Certificate Program in 2007-2009. The data was collected through online questionnaires (Demographic Survey, Online Technologies Self-Efficacy Scale, Readiness for Online Learning Questionnaire, Locus of Control Scale, and Prior Knowledge Questionnaire). The collected data included 10 variables, which were gender, age, educational level, previous online experience, occupation, self efficacy, readiness, prior knowledge, locus of control, and the dropout status as the class label (dropout/not). In order to classify dropout students, four data mining approaches were applied based on k-Nearest Neighbour (k-NN), Decision Tree (DT), Naive Bayes (NB) and Neural Network (NN). These methods were trained and tested using 10-fold cross validation. The detection sensitivities of 3-NN, DT, NN and NB classifiers were 87%, 79.7%, 76.8% and 73.9% respectively. Also, using Genetic Algorithm (GA) based feature selection method, online technologies self-efficacy, online learning readiness, and previous online experience were found as the most important factors in predicting the dropouts.
Article
Full-text available
Massive Open Online Courses (MOOCs) have shown rapid development in recent years, allowing learners to access high-quality digital material. Because of facilitated learning and the flexibility of the teaching environment, the number of participants is rapidly growing. However, extensive research reports that the high attrition rate and low completion rate are major concerns. In this paper, the early identification of students who are at risk of withdrew and failure is provided. Therefore, two models are constructed namely at-risk student model and learning achievement model. The models have the potential to detect the students who are in danger of failing and withdrawal at the early stage of the online course. The result reveals that all classifiers gain good accuracy across both models, the highest performance yield by GBM with the value of 0.894, 0.952 for first, second model respectively, while RF yield the value of 0.866, in at-risk student framework achieved the lowest accuracy. The proposed frameworks can be used to assist instructors in delivering intensive intervention support to at-risk students.
Article
Full-text available
Advanced data mining techniques can be classified into university, finding valuable patterns in the successful design of an application or a successful method of teaching university and critical points of financial management are used. In this article, we present an algorithm to predict absorption and increase student enrollment coming years using these techniques on data sets of volunteers postgraduate Islamic Azad university entrance exam. In this regard, at first we are modeling data in 15 models of neural network and to increase the accuracy use collective Bagging and Boosting models. And then, the four models: neural networks, decision trees, Bayes simple logistic regression on the data set was used and re-apply three criteria to evaluate the accuracy, Matthews correlation ROC curve, and was compared the results. Finally, by using the T-test and the optimal model to predict Bagging model “Students who-where accepted”, will enroll approximately 96% was chosen carefully.
Conference Paper
Full-text available
Customer churn is one of the main problems in the telecommunications industry. Several studies have shown that attracting new customers is much more expensive than retaining existing ones. Therefore, companies are focusing on developing accurate and reliable predictive models to identify potential customers that will churn in the near future. The aim of this paper is investigating the main reasons for churn in telecommunication sector in Macedonia. The proposed methodology for analysis of churn prediction covers several phases: understanding the business; selection, analysis and data processing; implementing various algorithms for classification; evaluation of the classifiers and choosing the best one for prediction. The obtained results for the data from a telecommunication company in Macedonia, should be of great value for management and marketing departments of other telecommunication companies in the country and wider.
Article
We perform two approaches of machine learning, logistic regressions and decision trees, to predict student dropout at the Karlsruhe Institute of Technology (KIT). The models are computed on the basis of examination data, i.e. data available at all universities without the need of specific collection. Therefore, we propose a methodical approach that may be put in practice with relative ease at other institutions. We find decision trees to produce slightly better results than logistic regressions. However, both methods yield high prediction accuracies of up to 95% after three semesters. A classification with more than 83% accuracy is already possible after the first semester.
Article
This article describes GRADE, a statistical machine-learning system developed to support the work of the graduate admissions committee at the University of Texas at Austin Department of Computer Science (UTCS). In recent years, the number of applications to the UTCS Ph.D. program has become too large to manage with a traditional review process. GRADE uses historical admissions data to predict how likely the committee is to admit each new applicant. It reports each prediction as a score similar to those used by human reviewers, and accompanies each by an explanation of what applicant features most influenced its prediction. GRADE makes the review process more efficient by enabling reviewers to spend most of their time on applicants near the decision boundary and by focusing their attention on parts of each applicant's file that matter the most. An evaluation over two seasons of Ph.D. admissions indicates that the system leads to dramatic time savings, reducing the total time spent on reviews by at least 74 percent. Copyright © 2014, Association for the Advancement of Artificial Intelligence.
Article
Credit scoring is an important topic and banks collect different data from their loan applicants to make appropriate and correct decisions. Rule bases are favourite in credit decision making because of their ability to explicitly distinguish between good and bad applicants. This paper, uses four feature selection approaches as features pre-processing combined with fuzzy apriori. These methods are stepwise regression, Classification And Regression Tree (CARD, correlation matrix and Principle Component Analysis (PCA). Particle Swarm is applied to find the best fuzzy apriori rules by searching different support and confidence. Considering Australian and German University of California at Irvine (UCI) and an Iranian bank datasets, different feature selections methods are compared in terms of accuracy, number of rules and number of features. The results are compared using T test; it reveals that fuzzy apriori combined with PCA creates a compact rule base and shows better results than the single fuzzy apriori model and other combined feature selection methods.