Conference Paper

Educational Computer Science Development Based on Data Mining

Mina Fazlikhani *, Iman Bagheri
1 Department of Computer Engineering, Iran University of Science & Technology, Tehran, Iran,
minafazlikhani1999@gmail.com
2 University Lecturer, Department of Electrical and Electronics Engineering, Montazeri Technical and
Vocational University
bagheri.ece@gmail.com
Abstract.
The growing use of technology in education generates massive amounts of data every day and
has attracted the attention of scholars worldwide. Educational data mining (EDM) is growing
quickly and can draw on state-of-the-art machine learning and data mining methods. Advanced
data mining techniques can help in understanding students, particularly computer science
students, and their learning environments, and in extracting novel, interesting, and meaningful
insights from educational data. EDM may be used to identify at-risk students, prioritize the
needs of different student groups, raise graduation rates, evaluate institutional performance,
optimize campus resources, and improve subject curricula. A comprehensive review of EDM
studies shows its widespread use and impact. This study examines the methods and data used
in numerous studies and identifies the most popular and promising EDM applications.
Keywords: Data Mining, Educational Data Mining, Machine Learning, Data Analytics,
Computer Science Education
Introduction
The primary goal of any educational system is to equip students with the knowledge and skills
necessary to thrive in the workforce within a specified period of time. The effectiveness of these
systems in achieving this goal has a profound effect on both economic and social advancement.
In some countries, education is freely available to all citizens from primary school through
university [1][2].
Consequently, a significant number of students enroll in higher education each year. However,
with such a large student population, it is becoming increasingly challenging to provide top-
quality education and support. As a result, a considerable proportion of students struggle to
complete their degrees within the designated timeframe.
Data mining (DM) techniques can give educators valuable insights and help uncover the factors
that contribute to student failure. The volume of data held in student databases exceeds what
humans can analyze comprehensively, which makes automated techniques critical.
Through knowledge discovery, DM extracts significant and previously unknown information
from large databases. By identifying patterns that match specific user needs, it becomes an
essential tool in the search for meaningful insights [3][4].
The use of technology in academic settings is steadily increasing, giving access to a large
amount of previously unavailable information. Educational data mining (EDM) builds on this
information to provide a more accurate understanding of students' individual learning
processes. More importantly, EDM yields valuable and relevant information [5] by applying
data mining strategies to educational challenges.
Like other data mining approaches, EDM extracts new, interesting, and interpretable
information from educational data. However, EDM focuses specifically on developing methods
that make effective use of the variety of data found in educational settings, and these methods
are now routinely employed [6][7].
In short, EDM enables users to extract valuable insights from their students' data. These
insights can be used in numerous ways, such as validating and assessing educational systems,
enhancing teaching and learning processes, and creating a more effective learning environment.
Similar techniques have been applied successfully to maximize sales revenue on other kinds of
data, including data from e-commerce platforms [8].
The success of data mining on business data has contributed to its widespread acceptance in
other fields of knowledge. Notably, data mining has been applied effectively to educational
data to support research initiatives, such as improving the quality of academic studies [9].
EDM is used in academic settings to address a variety of tasks and challenges. Baker [10] and
Baker and Yacef [11] identify four key areas in which EDM can be applied: conducting
empirical research on learning and learners, examining the effectiveness of educational
software, creating models of students and domains, and improving domain models.
Five classes of methods are available for EDM: prediction, clustering, association mining,
distillation of data for human assessment, and model-based discovery. According to Castro et
al. [12], EDM tasks can be categorized into four main groups, which include evaluating student
learning performance, adapting courses, and providing personalized learning recommendations
for students.
With its ability to analyze data from multiple perspectives, sort through vast amounts of data,
and identify meaningful relationships within a database, DM is a robust AI tool. Its insights
support informed decision-making. To achieve the best results, DM solutions may rely on
individual algorithms or on a combination of them [13].
Some algorithms are designed to interpret data, while others harness it to achieve specific goals.
For instance, clustering algorithms can detect patterns in data and group the records into
several categories. The records within each group are highly similar to one another, which
gives a compact overview of the data [14][15].
Regression tree methods, for example, may be used to derive rules and financial predictions in
support of market analysis. The sheer volume of data stored in databases today makes it nearly
impossible for individuals to sift through it manually and uncover relevant insights. This is
where automated analytical techniques come into play [16].
Knowledge discovery (KD) involves searching large databases for previously unrecognized and
valuable information. Within KD, data mining is the step that discovers patterns matching a
user's needs.
An example in [17] illustrates this concept, showing how linguistic statements can accurately
describe a subset of the data, referred to as a "pattern description."
The confidence placed in a discovered pattern depends on factors such as sample size, data
quality, and input from domain experts. These factors in turn affect the accuracy of pattern
discovery by DM.
Although DM may uncover numerous patterns in a database, only a small portion will be
noteworthy. It is up to the user to carefully consider the level of confidence in a pattern before
deeming it valid, as patterns that initially capture attention may not always be reliable [18][19].
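As a minimal sketch of how confidence in a pattern can be quantified, the following Python
example computes the support and confidence measures commonly used in association rule
mining. The student records and the attribute names (low_attendance, failed, and so on) are
hypothetical and chosen only for illustration; they do not come from any study cited here.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy records: each set lists attributes observed for one student.
records: List[FrozenSet[str]] = [
    frozenset({"low_attendance", "failed"}),
    frozenset({"low_attendance", "failed"}),
    frozenset({"low_attendance", "passed"}),
    frozenset({"high_attendance", "passed"}),
    frozenset({"high_attendance", "passed"}),
]


def support(itemset: Iterable[str], data: List[FrozenSet[str]]) -> float:
    """Fraction of records that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(items <= record for record in data) / len(data)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               data: List[FrozenSet[str]]) -> float:
    """Estimate of P(consequent | antecedent) from the records."""
    base = support(antecedent, data)
    if base == 0:
        raise ValueError("antecedent never occurs in the data")
    return support(frozenset(antecedent) | frozenset(consequent), data) / base


# Rule under test: low_attendance -> failed
rule_support = support({"low_attendance", "failed"}, records)
rule_confidence = confidence({"low_attendance"}, {"failed"}, records)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

With only five records, the rule holds in two of the three low-attendance cases. A sample this
small illustrates why an eye-catching pattern should not be accepted without considering
sample size and data quality.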
Data Mining in Educational Computer Science
Educational data mining in computer science is a rapidly evolving field that centers on
developing effective techniques for analyzing the specific data generated within educational
settings.
By utilizing these methods, researchers aim to gain a deeper understanding of both students and
their learning environments. Although the discipline is relatively new, it has been steadily
gaining momentum in recent years.
Unlike traditional data mining approaches, EDM explicitly acknowledges the complex
hierarchical structure of educational data and offers opportunities to exploit it. However,
its effectiveness is limited by the lack of access to independent educational data [6].
Educational data mining techniques are drawn from several related literatures, including data
mining, machine learning, psychometrics and other areas of statistics, information
visualization, and computational modeling.
Within the realm of educational data mining (EDM), the two primary categories of work include
"web mining" and "statistics and visualization."
Both fields have garnered significant focus from both empirical research and theoretical
discourse. According to Baker's perspective, the tasks associated with EDM can be organized
under the following headings:
Prediction, including
1) Regression
2) Classification
3) Density Estimation
Relationship Mining
1) Association Rule Mining
2) Correlation Mining
3) Sequential Pattern Mining
4) Causal Data Mining
Clustering
Distillation of Data
Model Discovery
Most of the items listed are generally regarded as DM. However, some researchers do not
classify the distillation of data for human decision-making as DM.
In EDM research, the most heavily studied area is relationship mining in all its forms. Viewed
through the lens of traditional DM, one of the most distinctive categories in Baker's EDM
taxonomy is the use of models for discovery. Here, a model of a process is first built and
verified through a set procedure.
Afterward, the model is used as a component of other analyses, such as prediction and
relationship mining. However, model discovery has not received much attention in the
educational data mining community recently. Studies using this approach have examined
which subcategories of learning resources yield the greatest advantages for students [20], the
impact of different student behaviors on their learning outcomes [21], and the influence of
tutorial design on student learning. In recent years, the EDM community has placed a stronger
emphasis on relationship mining strategies.
Data Mining, Prediction and Clustering
Prediction uses past data to infer the value of an unknown variable. The inputs, known as
predictor variables, may be used in their original form or recoded into new categories. The
selection of variables fed into the prediction model plays a crucial role in determining its
accuracy. Prediction also requires a set of records in which the output variable has been
properly labeled.
This labeled data provides a historical record of the quantity to be predicted. When the
prediction model is developed, it is important to consider how the quality of the training data
will affect its effectiveness. One way to address this is by segmenting the data into distinct
groups using a method called clustering.
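The prediction workflow described above can be sketched in a few lines of Python. This is an
illustrative example under stated assumptions, not a method taken from the cited studies. The
labeled history (weekly study hours paired with a final grade) is hypothetical, a single
predictor variable is used, and ordinary least squares stands in for whichever regression
technique a real study would select.

```python
from statistics import mean

# Hypothetical labeled history: (weekly study hours, final grade).
history = [(2, 52), (4, 58), (6, 66), (8, 71), (10, 80), (12, 85)]


def fit_line(pairs):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict(model, x):
    """Predicted output variable for predictor value x."""
    slope, intercept = model
    return slope * x + intercept


# Hold out the last two labeled records to check the model on data
# it has not seen during fitting.
train, held_out = history[:4], history[4:]
model = fit_line(train)
errors = [abs(predict(model, x) - y) for x, y in held_out]
mean_abs_error = mean(errors)
print(f"slope={model[0]:.2f}, intercept={model[1]:.2f}, MAE={mean_abs_error:.2f}")
```

Holding out the last records is a simplification. A real study would normally use a random or
cross-validated split, because the quality and representativeness of the training data
determine how well the model generalizes.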
The proliferation of the Internet of Things (IoT) continues to be a major concern, as many of
these devices were not designed with robust security measures in place, making them vulnerable
to cyber-attacks. Despite efforts to implement encryption and other intrusion protection
methods, their effectiveness is hindered by the limitations of these systems.
As such, safeguarding personal privacy is crucial, especially in environments like smart homes
where a large amount of sensitive information is collected. While there have been proposed
solutions for protecting privacy through device monitoring, these tactics are rarely utilized due
to device restrictions and other associated challenges.
The members of a cluster are likely to share common characteristics. Clustering differs from
classification in that it does not rely on pre-defined labels for the data. With clustering, users
can gain an overview of the patterns within the dataset.
It is often referred to as "unsupervised classification" [10], as it does not rely on provided
labels. The approach identifies natural affinities between data points and groups them into
distinct clusters, which provides a way to organize and categorize the data.
With some clustering methods, the number of resulting groups can be set in advance.
Clustering is commonly employed in situations where the main categories are not clearly
defined within the dataset.
Furthermore, it can help reduce the scope of a study. For instance, schools with shared
characteristics could be grouped together under a single category.
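The school-grouping example can be sketched with k-means, one common clustering method in
which the number of groups k is fixed in advance. The text does not name a specific algorithm,
so k-means is an assumption made here for illustration. The school profiles (average grade and
dropout rate) are hypothetical, and taking the first k points as starting centroids is a
simplification chosen to keep the example deterministic.

```python
from math import dist
from statistics import mean

# Hypothetical school profiles: (average grade, dropout rate in percent).
schools = {
    "A": (82.0, 4.0), "B": (79.0, 5.0), "C": (85.0, 3.0),
    "D": (61.0, 18.0), "E": (58.0, 21.0), "F": (64.0, 16.0),
}


def k_means(points, k, iterations=20):
    """Plain k-means; the number of clusters k is fixed in advance."""
    centroids = list(points[:k])  # deterministic start: the first k points
    assignment = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid (no labels are used).
        assignment = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        updated = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                updated.append(tuple(mean(axis) for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster in place
        if updated == centroids:
            break  # centroids stopped moving, so the grouping is stable
        centroids = updated
    return assignment, centroids


names = list(schools)
labels, centers = k_means([schools[n] for n in names], k=2)
clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(label, []).append(name)
print(clusters)
```

On this toy data the procedure separates schools A, B, and C from schools D, E, and F without
any labels being supplied. In practice the features would usually be rescaled first so that no
single attribute dominates the distance.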
Conclusion
The ever-increasing integration of technology in education generates a vast amount of data
every day and has drawn the interest of researchers across the globe. As a result, educational
data mining is expanding rapidly and can incorporate state-of-the-art algorithms and
techniques from machine learning and other data mining fields. Advanced data mining
techniques offer the potential for a deeper understanding of students, specifically computer
science students, and their learning environments, as well as the extraction of novel,
interesting, and meaningful insights from educational data. The possible uses of EDM are
broad: identifying at-risk students, prioritizing the needs of diverse student groups, boosting
graduation rates, evaluating institutional performance, optimizing campus resources, and
improving subject curricula. A thorough review of EDM research revealed its widespread
implementation and impact. This study not only examined the methodology and data used in
various investigations but also highlighted the most popular EDM applications and those with
the greatest potential for growth.
References
[1] Anghel, A. G., Drăghicescu, L. M., Cristea, G. C., Gorghiu, G., Gorghiu, L. M., & Petrescu,
A. M. (2014). The social knowledge - a goal of the social sustainable development.
Procedia-Social and Behavioral Sciences, 149, 43-49.
[2] Novikov, S. G. (2019). The goal of education in knowledge society. In The European
Proceedings of Social & Behavioural Sciences EpSBS (pp. 590-597).
[3] Sarra, A., Fontanella, L., & Di Zio, S. (2019). Identifying students at risk of academic failure
within the educational data mining framework. Social Indicators Research, 146, 41-60.
[4] López Zambrano, J., Lara Torralbo, J. A., & Romero Morales, C. (2021). Early prediction
of student learning performance through data mining: A systematic review. Psicothema.
[5] Mostow, J., & Beck, J. (2006). Some useful tactics to modify, map and mine data from
intelligent tutors. Natural Language Engineering, 12(2), 195-208.
[6] Mohamad, S. K., & Tasir, Z. (2013). Educational data mining: A review. Procedia-Social and
Behavioral Sciences, 97, 320-324.
[7] Fernandes, E., Holanda, M., Victorino, M., Borges, V., Carvalho, R., & Van Erven, G.
(2019). Educational data mining: Predictive analysis of academic performance of public school
students in the capital of Brazil. Journal of business research, 94, 335-343.
[8] Romero, C., Ventura, S., & De Bra, P. (2004). Knowledge discovery with genetic
programming for providing feedback to courseware authors. User Modeling and User-Adapted
Interaction, 14(5), 425-464.
[9] Raghavan, N. S. (2005). Data mining in e-commerce: A survey. Sadhana, 30(2-3), 275-289.
[10] Baker, R., et al. (2010). Data mining for education. International Encyclopedia of
Education, 7, 112-118.
[11] Baker, R. S., & Yacef, K. (2009). The state of educational data mining in 2009: A review
and future visions. JEDM-Journal of Educational Data Mining, 1(1), 3-17.
[12] Castro, F., Vellido, A., Nebot, A., & Mugica, F. (2007). Applying data mining techniques to
e-learning problems. In Evolution of teaching and learning paradigms in intelligent
environment (pp. 183-221). Springer.
[13] Guo, Y., Zhang, W., Qin, Q., Chen, K., & Wei, Y. (2023). Intelligent manufacturing
management system based on data mining in artificial intelligence energy-saving resources.
Soft Computing, 27(7), 4061-4076.
[14] Nguyen, G., Dlugolinsky, S., Bobák, M., Tran, V., López García, Á., Heredia, I., ... &
Hluchý, L. (2019). Machine learning and deep learning frameworks and libraries for large-scale
data mining: a survey. Artificial Intelligence Review, 52, 77-124.
[15] Li, L. (2023). Application of Machine Learning and Data Mining in Medicine:
Opportunities and Considerations.
[16] Khan, S. (2021). Study Factors for Student Performance Applying Data Mining Regression
Model Approach. International Journal of Computer Science & Network Security, 21(2), 188-
192.
[17] Wu, S.-T. (2007). Knowledge discovery using pattern taxonomy model in text mining.
[18] Aher, S. B., & Lobo, L. M. R. J. (2011, March). Data mining in educational system using
weka. In International conference on emerging technology trends (ICETT) (Vol. 3, pp. 20-25).
[19] Agarwal, S. (2013, December). Data mining: Data mining concepts and techniques. In
2013 international conference on machine intelligence and research advancement (pp. 203-
207). IEEE.
[20] Beck, J. E., & Mostow, J. (2008). How who should practice: Using learning decomposition
to evaluate the efficacy of different types of practice for different types of students. In
Intelligent tutoring systems (pp. 353-362). Springer.
[21] Cocea, M., Hershkovitz, A., & Baker, R. S. (2009). The impact of off-task and gaming
behaviors on learning: Immediate or aggregate?