All content in this area was uploaded by Iman Bagheri on Nov 26, 2023
Educational Computer Science Development
Based on Data Mining
Mina Fazlikhani 1,*, Iman Bagheri 2
1 Department of Computer Engineering, Iran University of Science & Technology, Tehran, Iran,
minafazlikhani1999@gmail.com
2 University Lecturer, Department of Electrical and Electronics Engineering, Montazeri Technical and
Vocational University
bagheri.ece@gmail.com
Abstract.
The growing use of technology in education generates massive amounts of data every day,
attracting scholars worldwide. Educational data mining (EDM) is growing quickly and can draw on
cutting-edge machine learning and data mining methods. Advanced data mining
techniques can help understand students, particularly computer science students, and their
learning environments, as well as extract novel, interesting, and meaningful insights from
educational data. EDM may be used to identify at-risk students, prioritize different student
groups, raise graduation rates, evaluate institutional performance, optimize campus resources,
and improve subject curricula. A comprehensive review of EDM studies showed its
widespread use and impact. This research examined the methods and data used in numerous
studies and identified the most popular and promising EDM applications.
Keywords: Data Mining, Educational Data Mining, Machine Learning, Data Analytics,
Computer Science Education
Introduction
The primary goal of any educational system is to equip students with the knowledge and skills
necessary to thrive in the workforce within a specified period of time. The effectiveness of these
systems in achieving this goal has a profound effect on both economic and social advancement.
In some countries, education is freely available to all citizens from primary school through
university [1][2].
Consequently, a significant number of students enroll in higher education each year. However,
with such a large student population, it is becoming increasingly challenging to provide top-
quality education and support. As a result, a considerable proportion of students struggle to
complete their degrees within the designated timeframe.
Data mining (DM) techniques can give educators valuable insights and help uncover the factors
contributing to student failure. The volume of data held in student databases exceeds what
people can analyze by hand, making automated techniques critical.
Through knowledge discovery, DM allows for the extraction of significant and previously
unknown information from vast databases, and by identifying patterns that align with specific
user needs, data mining is an essential tool in the pursuit of meaningful insights [3][4].
The use of technology in academic settings is steadily increasing, allowing access to a vast
amount of previously unavailable information. Educational data mining (EDM) draws on this
information to provide a more accurate understanding of students' individual learning
processes. More importantly, EDM offers valuable and relevant information [5] by applying
data mining strategies to educational challenges.
In a manner akin to other data mining approaches, EDM successfully extracts new, engaging,
and easily understandable information from educational data. However, EDM focuses
specifically on developing methodologies that can effectively utilize a variety of data in
educational settings, and as a result, these methods are now routinely employed [6][7].
In a nutshell, educational data mining (EDM) empowers users to extract valuable insights from
their students' data. These insights can be used in numerous ways, such as verifying and
assessing educational systems, enhancing teaching and learning processes, and creating a more
effective learning environment. Similar techniques have been applied successfully to other
kinds of data, for example to maximize sales revenue on e-commerce platforms [8].
Data mining has proven particularly fruitful in the business sector, and its success on business
data has contributed to its widespread acceptance in other fields of knowledge. Notably, data
mining has been applied effectively to educational data to support research goals such as
improving the quality of academic studies [9].
EDM is used in academic settings to address a variety of tasks and challenges. Baker [10] and
Baker and Yacef [11] identify four key areas in which EDM can be applied: conducting
empirical research on learning and learners, examining the effectiveness of educational
software, creating models of students and domains, and improving domain models.
Five families of methods are available for EDM: prediction, clustering, association mining,
data distillation for human assessment, and model-based discovery. As noted by Castro [12],
EDM tasks can be categorized into four main groups, among them evaluating student learning
performance, adapting courses, and providing personalized learning recommendations for
students.
With its ability to analyze multiple perspectives, sort through vast amounts of data, and identify
meaningful relationships within a database, DM emerges as a robust AI tool. Its insights
support informed decision-making. To achieve the best outcomes, DM solutions may rely on
individual algorithms or on a combination of them [13].
Some algorithms are designed to interpret data, while others harness it to achieve specific goals.
For instance, clustering algorithms can detect patterns in data and group them into numerous
categories. The resulting data within each group is highly cohesive, creating a comprehensive
understanding of the information [14][15].
Through the application of regression tree methodology, it may be possible to extract
association rules and financial predictions, thereby conducting a thorough market analysis. In
today's vast digital landscape, the sheer volume of data stored in databases makes it nearly
impossible for individuals to manually sift through and uncover relevant insights. This is where
automated analytical techniques come into play [16].
Knowledge discovery (KD) involves tapping into vast databases to unearth previously
unrecognized and valuable information. Within KD, data mining is used to discover patterns
that match a user's preferences.
An example provided in [17] illustrates this concept, showcasing how linguistic statements can
accurately describe a subset of data, referred to as "pattern description."
Every discovered pattern carries a degree of confidence, which depends on factors such as
sample size, data quality, and domain expert input. The accuracy of pattern discovery by DM
varies accordingly.
Although DM may uncover numerous patterns in a database, only a small portion will be
noteworthy. It is up to the user to carefully consider the level of confidence in a pattern before
deeming it valid, as patterns that initially capture attention may not always be reliable [18][19].
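One common way to quantify how far a pattern can be trusted is to compute its support and confidence, as is done for association rules. The sketch below is illustrative only: the student records, the item names, and the helper functions `support` and `confidence` are hypothetical and are not taken from any of the cited studies.

```python
# Hypothetical records: the set of observations logged for each student.
records = [
    {"missed_labs", "low_quiz", "failed"},
    {"missed_labs", "low_quiz", "failed"},
    {"missed_labs", "passed"},
    {"low_quiz", "failed"},
    {"missed_labs", "low_quiz", "passed"},
    {"passed"},
]


def support(itemset, records):
    """Fraction of records that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= record for record in records) / len(records)


def confidence(antecedent, consequent, records):
    """Estimate of P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    base = support(antecedent, records)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), records) / base


# Candidate pattern: students who miss labs and score low on quizzes fail.
rule_support = support({"missed_labs", "low_quiz", "failed"}, records)
rule_confidence = confidence({"missed_labs", "low_quiz"}, {"failed"}, records)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

With only six records, even a rule with high confidence rests on very few cases, which is why sample size has to be weighed alongside the confidence value itself.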
Data Mining in Educational Computer Science
Educational data mining in computer science is a rapidly evolving field that centers
on developing effective techniques for analyzing the specific data generated within educational
settings.
By utilizing these methods, researchers aim to gain a deeper understanding of both students and
their learning environments. Although the discipline is relatively new, it has been steadily
gaining momentum in recent years.
Unlike traditional data mining approaches, EDM explicitly acknowledges the complex
hierarchical structure of educational data and offers opportunities to exploit it. However,
its effectiveness is limited by the lack of access to independent educational data [6].
Educational data mining techniques are drawn from a wide range of literature spanning several
related fields, including data mining, machine learning, psychometrics, computational
modeling, statistics, and information visualization.
Within the realm of educational data mining (EDM), the two primary categories of work include
"web mining" and "statistics and visualization."
Both fields have garnered significant focus from both empirical research and theoretical
discourse. According to Baker's perspective, the tasks associated with EDM can be organized
under the following headings:
Prediction, including
1) Regression
2) Classification
3) Density estimation
Relationship Mining
1) Association Rule Mining
2) Correlation mining
3) Sequential Pattern Mining
4) Causal Data Mining
Clustering
Distillation of Data
Model Discovery
Most of the categories listed are generally regarded as DM. However, some do not classify
the distillation of data for human decision-making as DM.
In EDM research, the most heavily studied area has been relationship mining in all its forms.
Viewed through the lens of traditional DM, one of the most unusual categories in Baker's EDM
taxonomy is the use of models for discovery. In this method, a model of a process is first
built and verified through a set procedure. Afterward, the model is used as a component in
other analyses, such as prediction and relationship mining. However, the concept of model
discovery has not received much attention in the educational data mining community recently.
Such work has aimed to identify which subcategories of learning resources yield the greatest
advantages for students [20], the impact of different student behaviors on their learning
outcomes [21], and the influence of tutorial design on student learning. In recent years, the
field has placed a stronger emphasis on relationship mining strategies.
Data Mining, Prediction and Clustering
Prediction uses past data to infer the value of an unknown variable. The inputs, known as
predictor variables, may be used in their original form or recoded into new categories. The
choice of variables fed into the prediction model plays a crucial role in determining its
accuracy. It is equally important that the output variable is correctly labeled for a limited
set of the data.
This labeled data provides the historical record of the variable to be predicted. As the
prediction model is being developed, it is important to consider how the quality of the
training data will affect its performance. One way to address this is by segmenting the data
into distinct groups using a method called clustering.
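The prediction workflow described above (predictor variables, a labeled output variable, and a check against data the model has not seen) can be sketched with a nearest-centroid classifier. This is a deliberately simple stand-in chosen for illustration, not a method used in the studies reviewed here. The two predictors (attendance rate and mean quiz score, both scaled to 0..1), the data values, and the function names are all hypothetical.

```python
from math import dist
from statistics import mean

# Hypothetical labeled history: (attendance rate, mean quiz score) -> passed (1) or not (0).
train = [
    ((0.95, 0.85), 1), ((0.90, 0.78), 1), ((0.85, 0.88), 1), ((0.80, 0.70), 1),
    ((0.60, 0.55), 0), ((0.50, 0.40), 0), ((0.55, 0.62), 0), ((0.40, 0.35), 0),
]

# Hypothetical held-out students, used only to check the fitted model.
held_out = [
    ((0.88, 0.80), 1), ((0.45, 0.50), 0), ((0.75, 0.72), 1), ((0.58, 0.45), 0),
]


def fit_centroids(rows):
    """Average the predictor variables of each class in the training rows."""
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*points))
        for label, points in by_label.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = fit_centroids(train)
predictions = [predict(centroids, features) for features, _ in held_out]
correct = sum(p == label for p, (_, label) in zip(predictions, held_out))
accuracy = correct / len(held_out)
print(predictions, accuracy)
```

Keeping the held-out students separate from the training rows is what makes the accuracy figure meaningful. A model scored on the same rows it was fitted to would say little about how it performs on new students.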
The proliferation of the Internet of Things (IoT) continues to be a major concern, as many of
these devices were not designed with robust security measures in place, making them vulnerable
to cyber-attacks. Despite efforts to implement encryption and other intrusion protection
methods, their effectiveness is hindered by the limitations of these systems.
As such, safeguarding personal privacy is crucial, especially in environments like smart homes
where a large amount of sensitive information is collected. While there have been proposed
solutions for protecting privacy through device monitoring, these tactics are rarely utilized due
to device restrictions and other associated challenges.
The members of each group will likely share some common characteristics. In data analysis,
clustering differs from classification in that it does not rely on pre-defined labels
for the data. With clustering, users can gain a comprehensive understanding of the patterns
within the dataset.
It is often referred to as "unsupervised classification" [10], as it does not rely on provided
labels. Clustering identifies natural affinities between data points and groups
them into distinct clusters, providing a method for organizing and categorizing the data.
With some clustering methods, the number of resulting groups is fixed in advance. Clustering
is commonly employed when the main categories are not clearly defined within the dataset.
Furthermore, it can aid in reducing the scope of a study area. For instance, a multitude of schools
could be grouped together under a single category, based on shared characteristics and
distinctions.
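As a minimal sketch of clustering with a pre-determined number of groups, the example below runs plain k-means on six schools. The school data, the two features (mean class size and graduation rate, both scaled to 0..1), and the choice of the first k points as starting centroids are assumptions made for illustration. Practical implementations usually use more careful initialization.

```python
from math import dist


def k_means(points, k, max_iter=100):
    """Plain k-means that uses the first k points as starting centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clusters are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(
                    sum(column) / len(members) for column in zip(*members)
                )
    return labels, centroids


# Hypothetical schools: (mean class size, graduation rate), both scaled to 0..1.
schools = {
    "A": (0.20, 0.90), "B": (0.25, 0.85), "C": (0.30, 0.80),
    "D": (0.80, 0.40), "E": (0.85, 0.35), "F": (0.75, 0.45),
}

labels, centroids = k_means(list(schools.values()), k=2)
for name, label in zip(schools, labels):
    print(name, "-> cluster", label)
```

No labels are supplied at any point. The two groups emerge only from the distances between schools, which is what distinguishes clustering from classification.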
Conclusion
The ever-increasing integration of technology in education is generating a vast amount of data
on a daily basis, grabbing the interest of researchers across the globe. As a result, the realm of
educational data mining is rapidly expanding and presents the advantage of incorporating
cutting-edge algorithms and techniques derived from machine learning and other data mining
fields. Advanced data mining techniques offer the potential for
a deeper understanding of students, particularly computer science students, and
their learning environments, as well as the extraction of distinct, captivating, and meaningful
insights from educational data. EDM has many potential uses, from
identifying at-risk students to prioritizing the needs of diverse student groups, boosting
graduation rates, evaluating institutional performance, optimizing campus resources, and
improving subject curricula. In fact, a thorough review of EDM research revealed its
widespread implementation and impact. This study not only examined the methodology and
data used in various investigations but also highlighted the most popular EDM applications and
those with the greatest potential for growth.
References
[1] Anghel, A. G., Drăghicescu, L. M., Cristea, G. C., Gorghiu, G., Gorghiu, L. M., & Petrescu,
A. M. (2014). The social knowledge–a goal of the social sustainable development. Procedia-
Social and Behavioral Sciences, 149, 43-49.
[2] Novikov, S. G. (2019). THE GOAL OF EDUCATON IN KNOWLEDGE SOCIETY. In
The European Proceedings of Social & Behavioural Sciences EpSBS (pp. 590-597).
[3] Sarra, A., Fontanella, L., & Di Zio, S. (2019). Identifying students at risk of academic failure
within the educational data mining framework. Social Indicators Research, 146, 41-60.
[4] López Zambrano, J., Lara Torralbo, J. A., & Romero Morales, C. (2021). Early prediction
of student learning performance through data mining: A systematic review. Psicothema.
[5] J. Mostow and J. Beck, “Some useful tactics to modify, map and mine data from intelligent
tutors,” Natural Language Engineering, vol. 12, no. 02, pp. 195–208, 2006.
[6] S. K. Mohamad and Z. Tasir, “Educational data mining: A review,” Procedia-Social and
Behavioral Sciences, vol. 97, pp. 320–324, 2013.
[7] Fernandes, E., Holanda, M., Victorino, M., Borges, V., Carvalho, R., & Van Erven, G.
(2019). Educational data mining: Predictive analysis of academic performance of public school
students in the capital of Brazil. Journal of business research, 94, 335-343.
[8] C. Romero, S. Ventura, and P. De Bra, “Knowledge discovery with genetic programming
for providing feedback to courseware authors,” User Modeling and User-Adapted Interaction,
vol. 14, no. 5, pp. 425–464, 2004.
[9] N. S. Raghavan, “Data mining in e-commerce: A survey,” Sadhana, vol. 30, no. 2-3, pp.
275–289, 2005.
[10] R. Baker et al., “Data mining for education,” International encyclopedia of education, vol.
7, pp. 112–118, 2010.
[11] R. S. Baker and K. Yacef, “The state of educational data mining in 2009: A review and
future visions,” JEDM-Journal of Educational Data Mining, vol. 1, no. 1, pp. 3–17, 2009.
[12] F. Castro, A. Vellido, A. Nebot, and F. Mugica, “Applying data mining techniques to
e-learning problems,” in Evolution of teaching and learning paradigms in intelligent
environment, pp. 183–221, Springer, 2007.
[13] Guo, Y., Zhang, W., Qin, Q., Chen, K., & Wei, Y. (2023). Intelligent manufacturing
management system based on data mining in artificial intelligence energy-saving resources.
Soft Computing, 27(7), 4061-4076.
[14] Nguyen, G., Dlugolinsky, S., Bobák, M., Tran, V., López García, Á., Heredia, I., ... &
Hluchý, L. (2019). Machine learning and deep learning frameworks and libraries for large-scale
data mining: a survey. Artificial Intelligence Review, 52, 77-124.
[15] Li, L. (2023). Application of Machine Learning and Data Mining in Medicine:
Opportunities and Considerations.
[16] Khan, S. (2021). Study Factors for Student Performance Applying Data Mining Regression
Model Approach. International Journal of Computer Science & Network Security, 21(2), 188-
192.
[17] S.-T. Wu, “Knowledge discovery using pattern taxonomy model in text mining,” 2007.
[18] Aher, S. B., & Lobo, L. M. R. J. (2011, March). Data mining in educational system using
weka. In International conference on emerging technology trends (ICETT) (Vol. 3, pp. 20-25).
[19] Agarwal, S. (2013, December). Data mining: Data mining concepts and techniques. In
2013 international conference on machine intelligence and research advancement (pp. 203-
207). IEEE.
[20] J. E. Beck and J. Mostow, “How who should practice: Using learning decomposition to
evaluate the efficacy of different types of practice for different types of students,” in Intelligent
tutoring systems, pp. 353–362, Springer, 2008.
[21] M. Cocea, A. Hershkovitz, and R. S. Baker, “The impact of off-task and gaming behaviors
on learning: immediate or aggregate?,” 2009.