Educational Computer Science Development Based on Data Mining

December 2023

Conference: 5th international Conference on Science, Engineering, and role of Technology in new Businesses
At: Copenhagen, Denmark

Authors:

St. Mary's University

Educational Computer Science Development

Based on Data Mining

Mina Fazlikhani *, Iman Bagheri

1 Department of Computer Engineering, Iran University of Science & Technology, Tehran, Iran,

minafazlikhani1999@gmail.com

2 University Lecturer, Department of Electrical and Electronics Engineering, Montazeri Technical and

Vocational University

bagheri.ece@gmail.com

Abstract.

The growing use of technology in education generates massive amounts of data every day, attracting scholars worldwide. Educational data mining (EDM) is growing quickly and can draw on cutting-edge machine learning and data mining methods. Advanced data mining techniques can help in understanding students, particularly computer science students, and their learning environments, and in extracting novel, interesting, and meaningful insights from educational data. EDM may be used to identify at-risk students, prioritize different student groups, raise graduation rates, evaluate institutional performance, optimize campus resources, and improve subject curricula. A comprehensive review of EDM studies showed its widespread use and effect. This research investigated the methods and data used in numerous studies and identified the most popular and promising EDM applications.

Keywords: Data Mining, Educational Data Mining, Machine Learning, Data Analytics,

Computer Science Education

Introduction

The primary goal of any educational system is to equip students with the knowledge and skills necessary to thrive in the workforce within a specified period of time. The effectiveness of these systems in achieving this goal has a profound effect on both economic and social advancement. In some countries, education is freely available to all citizens from primary school through university [1][2].

Consequently, a significant number of students enroll in higher education each year. However, with such a large student population, it is becoming increasingly challenging to provide top-quality education and support. As a result, a considerable proportion of students struggle to complete their degrees within the designated timeframe.

Data mining (DM) techniques can give educators valuable insights and help uncover the factors contributing to student failure. The volume of data held in student databases exceeds what humans can analyze comprehensively, making automated techniques critical.

Through knowledge discovery, DM allows for the extraction of significant and previously unknown information from vast databases. By identifying patterns that align with specific user needs, data mining becomes an essential tool in the pursuit of meaningful insights [3][4].

The use of technology in academic settings is steadily increasing, allowing access to a vast

amount of previously unavailable information. This has been made possible through the method

of Educational Data Mining or EDM for short, which provides a more accurate understanding

of students' individual learning processes. More importantly, EDM offers valuable and relevant

information [5], utilizing data mining strategies to tackle educational challenges.

In a manner akin to other data mining approaches, EDM successfully extracts new, engaging,

and easily understandable information from educational data. However, EDM focuses

specifically on developing methodologies that can effectively utilize a variety of data in

educational settings, and as a result, these methods are now routinely employed [6][7].

In a nutshell, educational data mining (EDM) empowers users to extract valuable insights from their students' data. These insights can be utilized in numerous ways, such as verifying and assessing educational systems, enhancing teaching and learning processes, and creating a more effective learning environment. Similar tactics have been applied successfully to maximize sales revenue across different datasets, including those from e-commerce platforms [8].

This has proven particularly fruitful in the business sector. As a result, the effectiveness of data mining techniques on business data has contributed to their widespread acceptance in various fields of knowledge. Notably, data mining has been effectively applied to educational data to support research initiatives, such as improving the quality of academic studies [9].

EDM is a valuable tool used in academic settings to manage and solve a variety of tasks and challenges. Baker [10] and Baker and Yacef [11] have identified four key areas in which EDM can be applied: conducting empirical research on learning and learners, examining the effectiveness of educational software, creating models of students and domains, and improving domain models.

Five broad classes of methods are available for EDM: prediction, clustering, relationship mining, distillation of data for human judgment, and discovery with models. As noted by Castro et al. [12], EDM tasks can also be grouped by application, including evaluating student learning performance, adapting courses, and providing personalized learning recommendations for students.

With its ability to analyze data from multiple perspectives, sort through vast amounts of data, and identify meaningful relationships within a database, DM emerges as a robust AI tool whose insights support informed decision-making. To achieve optimal outcomes, DM solutions may rely on individual algorithms or a combination of them [13].

Some algorithms are designed to interpret data, while others harness it to achieve specific goals.

For instance, clustering algorithms can detect patterns in data and group them into numerous

categories. The resulting data within each group is highly cohesive, creating a comprehensive

understanding of the information [14][15].

Through the application of regression tree methodology, it may be possible to extract

association rules and financial predictions, thereby conducting a thorough market analysis. In

today's vast digital landscape, the sheer volume of data stored in databases makes it nearly

impossible for individuals to manually sift through and uncover relevant insights. This is where

automated analytical techniques come into play [16].

Knowledge discovery (KD) involves tapping into vast databases to unearth previously unrecognized and valuable information. The use of data mining within KD has led to the discovery of patterns that match a user's preferences.

An example provided in [17] illustrates this concept, showcasing how linguistic statements can

accurately describe a subset of data, referred to as "pattern description."

Successfully unraveling patterns requires confidence, influenced by various factors such as

sample size, data quality, and domain expert input. The accuracy of pattern discovery by DM

is consequently impacted.

Although DM may uncover numerous patterns in a database, only a small portion will be

noteworthy. It is up to the user to carefully consider the level of confidence in a pattern before

deeming it valid, as patterns that initially capture attention may not always be reliable [18][19].
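One common way to quantify how much confidence a discovered pattern deserves is to compute the support and confidence of an association rule. The following is a minimal sketch, not a method taken from the cited studies; the student records, the item names, and the helper functions `support` and `confidence` are hypothetical and serve only as illustration.

```python
from typing import FrozenSet, List

# Hypothetical records: the set of events observed for each student.
RECORDS: List[FrozenSet[str]] = [
    frozenset({"missed_labs", "low_quiz", "failed"}),
    frozenset({"missed_labs", "low_quiz", "failed"}),
    frozenset({"missed_labs", "passed"}),
    frozenset({"low_quiz", "failed"}),
    frozenset({"passed"}),
]


def support(itemset: FrozenSet[str], records: List[FrozenSet[str]]) -> float:
    """Fraction of records that contain every item in `itemset`."""
    hits = sum(1 for record in records if itemset <= record)
    return hits / len(records)


def confidence(antecedent: FrozenSet[str],
               consequent: FrozenSet[str],
               records: List[FrozenSet[str]]) -> float:
    """Estimate P(consequent | antecedent) from the records."""
    base = support(antecedent, records)
    if base == 0:
        raise ValueError("antecedent never occurs in the records")
    return support(antecedent | consequent, records) / base


# Candidate rule (illustrative): missed_labs -> failed
rule_if = frozenset({"missed_labs"})
rule_then = frozenset({"failed"})
print("support:", support(rule_if | rule_then, RECORDS))
print("confidence:", confidence(rule_if, rule_then, RECORDS))
```

In this toy data the rule holds for two of the three students who missed labs, so its confidence is roughly 0.67 on a sample of only five records. A user would normally require both measures to exceed chosen thresholds, and a larger sample, before treating the pattern as reliable.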

Data Mining in Educational Computer Science

Educational data mining in computer science is a rapidly evolving area that centers on developing effective techniques for analyzing the specific data generated within educational settings.

By utilizing these methods, researchers aim to gain a deeper understanding of both students and

their learning environments. Although the discipline is relatively new, it has been steadily

gaining momentum in recent years.

Unlike traditional data mining approaches, EDM explicitly acknowledges the complex hierarchical structure of educational data and offers opportunities to exploit it. However, its effectiveness is limited by the lack of access to independent educational data [6].

Educational data mining techniques draw on literature from several related fields, including data mining, machine learning, psychometrics, computational modeling, statistics, and information visualization.

Within the realm of educational data mining (EDM), the two primary categories of work include

"web mining" and "statistics and visualization."

Both fields have garnered significant focus from both empirical research and theoretical

discourse. According to Baker's perspective, the tasks associated with EDM can be organized

under the following headings:

• Prediction, including

1) Regression

2) Classification

3) Density estimation

• Relationship Mining

1) Association Rule Mining

2) Correlation Mining

3) Sequential Pattern Mining

4) Causal Data Mining

• Clustering

• Distillation of Data

• Model Discovery

Most of the categories listed are generally regarded as DM methods. However, some researchers do not classify the act of extracting information for the purpose of human decision-making as DM.

In terms of EDM research, the area that has been most heavily studied is relationship mining in all its forms. When viewed through the lens of traditional DM, the most distinctive category in Baker's EDM taxonomy is the use of models for discovery. This method is commonly used to simulate processes that can be mapped out and verified through a set procedure.

Afterward, the model is integrated into other analyses, such as prediction and relationship mining. However, the concept of model discovery has not received much attention in the educational data mining community recently. Work of this kind has sought to identify which subcategories of learning resources yield the greatest advantages for students [20], the impact of different student behaviors on their learning outcomes [21], and the influence of tutorial design on student learning. In the field of educational data mining, there has been a stronger emphasis on relationship mining strategies in recent years.

Data Mining, Prediction and Clustering

Prediction uses past data to infer the value of an unknown variable. The inputs, known as predictor variables, may be used in their original form or recoded into new categories. The selection of variables fed into the prediction model plays a crucial role in determining its accuracy. It is therefore vital that the output variable is correctly labeled for at least a limited portion of the data.

This tagged data provides a historical perspective on the factors that require projection. As the

model for prediction is being developed, it is important to consider how the quality of the

training data will impact its effectiveness. One way to address this is by segmenting the data

into distinct groups using a method called clustering.
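The prediction setup described above, in which labeled historical data is used to infer an unknown output from a predictor variable, can be sketched with a one-variable least-squares regression. This is an illustrative sketch only: the study-hours and grade values are invented, and `fit_line` and `predict` are hypothetical helper names rather than part of any cited study or library.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model: Tuple[float, float], x: float) -> float:
    """Apply the fitted line to a new predictor value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical labeled history: weekly study hours -> final grade.
hours = [2.0, 4.0, 6.0, 8.0]
grades = [55.0, 65.0, 75.0, 85.0]

model = fit_line(hours, grades)
print(predict(model, 5.0))  # grade predicted for a student studying 5 hours
```

The labeled pairs play the role of the tagged historical data: without known grades for some students, the slope and intercept could not be estimated. Real studies typically use several predictors and hold out part of the labeled data to check accuracy.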

The proliferation of the Internet of Things (IoT) continues to be a major concern, as many of

these devices were not designed with robust security measures in place, making them vulnerable

to cyber-attacks. Despite efforts to implement encryption and other intrusion protection

methods, their effectiveness is hindered by the limitations of these systems.

As such, safeguarding personal privacy is crucial, especially in environments like smart homes

where a large amount of sensitive information is collected. While there have been proposed

solutions for protecting privacy through device monitoring, these tactics are rarely utilized due

to device restrictions and other associated challenges.

As these groups interact, they will likely display some common characteristics. In data analysis,

clustering is a distinct method from classification in that it does not rely on pre-defined labels

for the data. With clustering, users can gain a comprehensive understanding of the patterns

within the dataset.

Clustering is often referred to as "unsupervised classification" [10], as it does not rely on provided labels. The approach involves identifying natural affinities between data points and grouping them into distinct clusters, providing a method for organizing and categorizing the data.

With some clustering methods, the number of resulting groups can be fixed in advance. Clustering is commonly employed in situations where the dominant categories are not clearly defined within the dataset.

Furthermore, it can aid in reducing the scope of a study area. For instance, a multitude of schools

could be grouped together under a single category, based on shared characteristics and

distinctions.
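As an illustration of clustering with a pre-determined number of groups, the sketch below implements a basic k-means procedure (Lloyd's algorithm) in plain Python. It is a minimal sketch under stated assumptions: the source does not name a specific clustering algorithm, the school data is hypothetical (two pre-scaled indicators per school), and the function names are chosen for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty collection of points."""
    return tuple(sum(dim) / len(points) for dim in zip(*points))


def k_means(points: Sequence[Point], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's algorithm; returns a cluster index for each point.

    Centroids start at the first k points so the sketch is deterministic;
    practical implementations usually use random or k-means++ seeding.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [points[i] for i in range(k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean_point(members)
    return labels


# Hypothetical schools described by two pre-scaled indicators each.
schools = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
print(k_means(schools, k=2))
```

No labels are supplied: the only choice made in advance is `k`, the number of groups. On this toy data the schools separate into a low-valued and a high-valued group, which could then be studied as two categories rather than six individual cases.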

Conclusion

The ever-increasing integration of technology in education generates a vast amount of data on a daily basis, attracting the interest of researchers across the globe. As a result, the field of educational data mining is rapidly expanding and can incorporate cutting-edge algorithms and techniques from machine learning and other data mining fields. Advanced data mining techniques offer the potential for a deeper understanding of students, specifically computer science students, and their learning environments, as well as the extraction of novel, interesting, and meaningful insights from educational data. EDM has a wide range of possible uses, from identifying at-risk students to prioritizing the needs of diverse student groups, boosting graduation rates, evaluating institutional performance, optimizing campus resources, and improving subject curricula. A thorough review of EDM research revealed its widespread implementation and impact. This study not only examined the methodology and data used in various investigations but also highlighted the most popular EDM applications and those with the greatest potential for growth.

References

[1] Anghel, A. G., Drăghicescu, L. M., Cristea, G. C., Gorghiu, G., Gorghiu, L. M., & Petrescu, A. M. (2014). The social knowledge–a goal of the social sustainable development. Procedia-Social and Behavioral Sciences, 149, 43-49.

[2] Novikov, S. G. (2019). THE GOAL OF EDUCATON IN KNOWLEDGE SOCIETY. In

The European Proceedings of Social & Behavioural Sciences EpSBS (pp. 590-597).

[3] Sarra, A., Fontanella, L., & Di Zio, S. (2019). Identifying students at risk of academic failure

within the educational data mining framework. Social Indicators Research, 146, 41-60.

[4] López Zambrano, J., Lara Torralbo, J. A., & Romero Morales, C. (2021). Early prediction

of student learning performance through data mining: A systematic review. Psicothema.

[5] J. Mostow and J. Beck, “Some useful tactics to modify, map and mine data from intelligent

tutors,” Natural Language Engineering, vol. 12, no. 02, pp. 195–208, 2006.

[6] S. K. Mohamad and Z. Tasir, “Educational data mining: A review,” Procedia-Social and

Behavioral Sciences, vol. 97, pp. 320–324, 2013.

[7] Fernandes, E., Holanda, M., Victorino, M., Borges, V., Carvalho, R., & Van Erven, G.

(2019). Educational data mining: Predictive analysis of academic performance of public school

students in the capital of Brazil. Journal of business research, 94, 335-343.

[8] C. Romero, S. Ventura, and P. De Bra, “Knowledge discovery with genetic programming for providing feedback to courseware authors,” User Modeling and User-Adapted Interaction, vol. 14, no. 5, pp. 425–464, 2004.

[9] N. S. Raghavan, “Data mining in e-commerce: A survey,” Sadhana, vol. 30, no. 2-3, pp.

275–289, 2005.

[10] R. Baker et al., “Data mining for education,” International encyclopedia of education, vol.

7, pp. 112–118, 2010.

[11] R. S. Baker and K. Yacef, “The state of educational data mining in 2009: A review and

future visions,” JEDM-Journal of Educational Data Mining, vol. 1, no. 1, pp. 3–17, 2009.

[12] F. Castro, A. Vellido, A. Nebot, and F. Mugica, “Applying data mining techniques to e-learning problems,” in Evolution of teaching and learning paradigms in intelligent environment, pp. 183–221, Springer, 2007.

[13] Guo, Y., Zhang, W., Qin, Q., Chen, K., & Wei, Y. (2023). Intelligent manufacturing

management system based on data mining in artificial intelligence energy-saving resources.

Soft Computing, 27(7), 4061-4076.

[14] Nguyen, G., Dlugolinsky, S., Bobák, M., Tran, V., López García, Á., Heredia, I., ... &

Hluchý, L. (2019). Machine learning and deep learning frameworks and libraries for large-scale

data mining: a survey. Artificial Intelligence Review, 52, 77-124.

[15] Li, L. (2023). Application of Machine Learning and Data Mining in Medicine:

Opportunities and Considerations.

[16] Khan, S. (2021). Study Factors for Student Performance Applying Data Mining Regression Model Approach. International Journal of Computer Science & Network Security, 21(2), 188-192.

[17] S.-T. Wu, “Knowledge discovery using pattern taxonomy model in text mining,” 2007.

[18] Aher, S. B., & Lobo, L. M. R. J. (2011, March). Data mining in educational system using

weka. In International conference on emerging technology trends (ICETT) (Vol. 3, pp. 20-25).

[19] Agarwal, S. (2013, December). Data mining: Data mining concepts and techniques. In 2013 international conference on machine intelligence and research advancement (pp. 203-207). IEEE.

[20] J. E. Beck and J. Mostow, “How who should practice: Using learning decomposition to

evaluate the efficacy of different types of practice for different types of students,” in Intelligent

tutoring systems, pp. 353–362, Springer, 2008.

[21] M. Cocea, A. Hershkovitz, and R. S. Baker, “The impact of off-task and gaming behaviors

on learning: immediate or aggregate?,” 2009.
