Evaluating Students Performance by Artificial Neural Network using WEKA

International Journal of Computer Applications (0975 8887)
Volume 119 No.23, June 2015
Sumam Sebastian
M-Tech Computer and Information Science
College of Engineering Poonjar
Jiby J Puthiyidam
Assistant Professor
Dept. of Computer Science and Engineering
College of Engineering Poonjar
ABSTRACT
Data mining, the process of extracting hidden patterns and
useful information from large sets of data, is becoming part
of many current innovations. It can now be applied to
different fields such as marketing, education and health. Data
mining in the field of education is called educational data
mining. Educational data mining can help institutions to
predict the performance of their students so as to improve
their academic results. In this paper an artificial neural
network is used to predict student performance. A Multilayer
Perceptron neural network implements the prediction
strategy. Experiments are conducted using WEKA on a real
dataset.
Keywords
Data Mining; Educational Data Mining; Artificial Neural
Network; Multilayer Perceptron Neural Network (MLP);
Association Rule Mining
1. INTRODUCTION
Data mining [1] is the process of analyzing data from different
perspectives and summarizing it into useful information so
as to find hidden patterns in a large data set. Data mining
[2] refers to the discovery of implicit, previously unknown
and practically useful information from the data in
databases. It uses machine learning, statistical and
visualization techniques to discover and present knowledge
in a form that is easily understandable. The rapid growth of
the data mining discipline comes from its large variety of
research areas. Data mining applications adopt different
kinds of parameters to examine the data. Educational Data
Mining [3] is a newly emerged field that develops methods for
exploring the unique types of data found in education
databases and helps to predict students' academic performance.
Figure 1: Educational Data mining System
An institution needs to maintain good academic results, and
for that its students' academic performance has to be kept at
a good level. A strategy for continuous evaluation of student
performance is therefore needed. Different data mining
techniques can be used for this, such as association rule
mining, K-means clustering and artificial neural networks.
Among these methods, evaluation using an artificial neural
network is regarded here as the most advanced and accurate.
2. OBJECTIVES
The main objectives are, first, to determine the personal and
academic factors that affect student performance; second, to
transform these factors into a form suitable for system
coding; and third, to model a neural network that can predict
performance from student data. The main concept used in this
paper is the artificial neural network.
3. THE ARTIFICIAL NEURAL NETWORKS
An artificial neural network (ANN) [4], often called a
"neural network" (NN), is a computational model based on
biological neural networks; in other words, it is a
representation and emulation of the human neural system. It
consists of an interconnected group of artificial neurons and
processes information using a connectionist approach to
computation. In practical terms, neural networks are
non-linear statistical data modeling tools [5]. They can be
used to model complex relationships between inputs and outputs
or to find patterns in data. Using neural networks as a tool,
data warehousing firms harvest information from datasets in
the process known as data mining [6].
Multilayer Perceptron
The most popular form of neural network architecture is the
multilayer perceptron (MLP). A multilayer perceptron:
- Has any number of inputs.
- Has one or more hidden layers with any number of units.
- Generally uses sigmoid activation functions in the hidden
  layers.
- Has connections between the input layer and the first
  hidden layer, between the hidden layers, and between the
  last hidden layer and the output layer.
Figure 2: Feed forward neural network
An MLP is especially suitable for approximating a
classification function, which assigns an example described by
a vector of attribute values to one or more classes. An MLP
trained with the back propagation algorithm is used here for
data mining.
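The feed-forward computation of such a network can be sketched in a few lines. The following is a minimal illustration only, not the WEKA implementation: the network size and all weight values are made up for the example, and sigmoid activations are assumed in every layer.

```python
import math


def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Compute one fully connected layer: sigmoid(w . x + b) for each unit."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


def mlp_forward(inputs, layers):
    """Feed the input through each (weights, biases) layer in order."""
    activation = inputs
    for weights, biases in layers:
        activation = layer_forward(activation, weights, biases)
    return activation


# Hypothetical 2-input, 2-hidden-unit, 1-output network (weights are made up).
layers = [
    ([[0.5, -0.5], [0.25, 0.75]], [0.0, 0.0]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                     # output layer
]
output = mlp_forward([1.0, 1.0], layers)
```

Each layer only sees the activations of the layer before it, which is the connection pattern listed for the multilayer perceptron above.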
4. DATA COLLECTION AND PREPROCESSING
In this research the data was collected from classes 8 and 9
of Sacred Hearts Girls' High School, Bharananganam, Kerala. A
dataset of 300 students was used for the evaluation. A neural
network is used to predict student performance. The selected
attributes are of two types: academic attributes, related to
the academic details of the student, and personal attributes,
related to personal details that affect the student's study
and performance. The academic attributes selected are:
1. Interest of study of the student, categorized as low,
   average or good.
2. Unit test mark, the average mark of the unit tests
   conducted, divided into low, average and good.
3. Assignment mark, the average of all assignments, divided
   into average and good.
4. Attendance score, the student's average attendance, divided
   into average and good.
5. Extracurricular activities performance, the student's
   performance in activities other than studies, grouped into
   low, average and good.
6. Residence, categorized as either hostler or day scholar.
The personal attributes selected are:
1. Parents' education, divided into poor, average and good.
2. Family status, divided into low, average and good.
Data pre-processing is applied to the dataset to identify
noisy data, missing attribute values, and irrelevant and
redundant data.
Table 1: Attributes and Their Possible Values
5. METHODOLOGY
In this research WEKA was used for the entire implementation.
Simple training and testing using a multilayer perceptron
neural network was done first. For this, the entire data set
was divided into two separate sets: half for the training set
and the other half for the test set. Training was repeated
with different learning and momentum rates, and the best of
the training results was taken for analysis. Second, MLP
training was done after association rule mining. Association
rule mining extracts the important rules and thereby helps to
identify the important attributes. Unnecessary attributes are
removed from the data set and MLP neural network training is
then done. This gives a better result than the simple train &
test method. For the fairest evaluation of the results, MLP
training with K-fold cross validation was used. Here the data
set is not divided into two fixed sets as before; instead the
whole data set is given as input to the system. 10-fold cross
validation is used: the entire data set is divided into 10
subsets, and in each round one subset is used as the test set
and the remaining nine are used as the training set. The
evaluation steps are depicted in the following figure.
Figure 3: Work Methodology
5.1 WEKA Environment
WEKA [8] stands for Waikato Environment for Knowledge
Analysis. It was developed by the University of Waikato,
New Zealand. WEKA supports many data mining tasks such as
data preprocessing, classification, clustering, regression and
feature selection, to name a few.
The supported data formats are ARFF, CSV, C4.5 and binary.
Alternatively, data can be imported from a URL or an SQL
database. After loading the data, preprocessing filters can be
used for adding or removing attributes, discretization,
sampling, randomizing, etc. WEKA is a collection of machine
learning algorithms for data mining and machine learning
tasks. It is open source software issued under the GNU
General Public License.
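As an illustration of the ARFF format, the attributes of this study could be written out as nominal ARFF attributes. The sketch below is an assumption-laden example, not the file used in the experiments: the attribute identifiers, the relation name, the sample row, and in particular the class attribute `performance` with its three values are hypothetical, since the paper does not list them.

```python
# Nominal attributes following Table 1; identifiers are illustrative only.
ATTRIBUTES = [
    ("interest_of_study", ["Low", "Average", "Good"]),
    ("unit_test_mark", ["Low", "Average", "Good"]),
    ("assignment", ["Average", "Good"]),
    ("attendance", ["Average", "Good"]),
    ("extracurricular_activities", ["Low", "Average", "Good"]),
    ("residence", ["Non-Hostler", "Hostler"]),
    ("parent_education", ["Poor", "Average", "Good"]),
    ("family_status", ["Low", "Average", "Good"]),
    # Assumed class attribute: its name and values are not given in the paper.
    ("performance", ["Low", "Average", "Good"]),
]


def to_arff(relation, attributes, rows):
    """Render nominal attributes and data rows as ARFF text."""
    lines = [f"@relation {relation}", ""]
    for name, values in attributes:
        lines.append(f"@attribute {name} {{{','.join(values)}}}")
    lines += ["", "@data"]
    for row in rows:
        # Reject rows with the wrong length or a value outside its domain.
        if len(row) != len(attributes):
            raise ValueError("row length does not match attribute count")
        for value, (name, allowed) in zip(row, attributes):
            if value not in allowed:
                raise ValueError(f"{value!r} is not a valid value for {name}")
        lines.append(",".join(row))
    return "\n".join(lines) + "\n"


# One made-up student record, in the same order as ATTRIBUTES.
sample_row = ["Good", "Good", "Good", "Good", "Average",
              "Hostler", "Average", "Average", "Good"]
arff_text = to_arff("student_performance", ATTRIBUTES, [sample_row])
```

By default WEKA treats the last attribute of a loaded file as the class, which is why the assumed class attribute is listed last.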
5.2 MLP Training with WEKA
Two sets are used for MLP neural network training in WEKA:
the training set and the test set [9].
ATTRIBUTES                  DESCRIPTION                                   VALUES
--------------------------  --------------------------------------------  -------------------
INTEREST OF STUDY           Interest of student in studying               Low, Average, Good
UNIT TEST MARK              Average mark of student in unit tests         Low, Average, Good
ASSIGNMENT                  Average of marks of assignments               Average, Good
ATTENDANCE                  Average attendance of student in the class    Average, Good
EXTRACURRICULAR ACTIVITIES  Performance of student in extracurricular     Low, Average, Good
                            activities
RESIDENCE                   Residence of student while studying, i.e.     Non-Hostler, Hostler
                            in hostel or not
PARENT EDUCATION            Education of the parents of student           Poor, Average, Good
FAMILY STATUS               The total family status of the student        Low, Average, Good
- Training set: a set of examples used for learning, that is,
  to fit the parameters (i.e., weights) of the classifier.
- Test set: a set of examples used only to assess the
  performance (generalization) of a fully specified
  classifier.
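The back propagation algorithm is used for network training.
Back Propagation algorithm
Initialize all weights to small random numbers.
Until satisfied, do:
For each training example, do:
1. Input the training example to the network and
compute the network outputs.
2. For each output unit k, compute its error term. For
sigmoid units this is
\[ \delta_k = o_k (1 - o_k)(t_k - o_k) \]
3. For each hidden unit h, compute its error term. For
sigmoid units this is
\[ \delta_h = o_h (1 - o_h) \sum_{k \in \text{outputs}} w_{kh} \delta_k \]
4. Update each network weight
\[ w_{ji} \leftarrow w_{ji} + \Delta w_{ji} \]
where
\[ \Delta w_{ji} = \eta \, \delta_j \, x_{ji} \]
Here o is the output of a unit, t_k is the target value of
output unit k, w_{ji} is the weight from input i to unit j,
x_{ji} is the corresponding input and \eta is the learning
rate. With a momentum \alpha, a fraction of the previous
weight change is added to each update:
\[ \Delta w_{ji}(n) = \eta \, \delta_j \, x_{ji} + \alpha \, \Delta w_{ji}(n-1) \]
Here we adjust the learning rate and momentum to get a better
training result.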
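The role of the learning rate and momentum can be made concrete with a small from-scratch sketch. It assumes the standard textbook form of back propagation for sigmoid units (output error term o(1-o)(t-o), hidden error term o(1-o) times the weighted sum of downstream error terms) and is not WEKA's code. The class name `TinyMLP`, the toy logical-OR data and the network size are illustrative assumptions; only the values 0.3, 0.2 and 500 mirror the learning rate, momentum and epoch count used in this paper's WEKA settings.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One-hidden-layer sigmoid network trained by stochastic back propagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Small random initial weights; the last weight of each unit is its bias.
        self.w_hidden = [[rng.uniform(-0.05, 0.05) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-0.05, 0.05) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]
        # Previous weight changes, needed for the momentum term.
        self.dw_hidden = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.dw_out = [[0.0] * (n_hidden + 1) for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]  # append constant bias input
        hidden = [sigmoid(sum(w * v for w, v in zip(ws, xb)))
                  for ws in self.w_hidden]
        hb = hidden + [1.0]
        out = [sigmoid(sum(w * v for w, v in zip(ws, hb))) for ws in self.w_out]
        return xb, hb, out

    def train_example(self, x, target, lr=0.3, momentum=0.2):
        xb, hb, out = self.forward(x)
        # Error term of each output unit k.
        delta_out = [o * (1 - o) * (t - o) for o, t in zip(out, target)]
        # Error term of each hidden unit h, computed before any weight changes.
        delta_hidden = [
            hb[h] * (1 - hb[h]) * sum(self.w_out[k][h] * delta_out[k]
                                      for k in range(len(delta_out)))
            for h in range(len(self.w_hidden))
        ]
        self._update(self.w_out, self.dw_out, delta_out, hb, lr, momentum)
        self._update(self.w_hidden, self.dw_hidden, delta_hidden, xb, lr, momentum)

    @staticmethod
    def _update(weights, prev, deltas, inputs, lr, momentum):
        for j, delta in enumerate(deltas):
            for i, value in enumerate(inputs):
                # Gradient step plus a fraction of the previous change.
                change = lr * delta * value + momentum * prev[j][i]
                weights[j][i] += change
                prev[j][i] = change


def sum_squared_error(net, data):
    return sum((t - o) ** 2
               for x, target in data
               for t, o in zip(target, net.forward(x)[2]))


# Toy data (logical OR), not the student dataset of this paper.
data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [1])]
net = TinyMLP(n_in=2, n_hidden=2, n_out=1, seed=1)
error_before = sum_squared_error(net, data)
for _ in range(500):  # 500 passes over the training examples
    for x, target in data:
        net.train_example(x, target)
error_after = sum_squared_error(net, data)
```

A larger learning rate makes each step bigger, while momentum carries part of the previous step forward, which can speed up progress along directions where successive updates agree.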
Main parameters for learning: hiddenLayers, learningRate,
momentum, trainingTime (epochs) and seed. The parameter
setting in WEKA is given as
weka.classifiers.functions.MultilayerPerceptron -L 0.3 -M 0.2 -N 500 -V 0 -S 0 -E 20 -H a
hiddenLayers -- This defines the hidden layers of the neural
network. It is a comma-separated list of positive whole
numbers, one for each hidden layer. To have no hidden layers,
put a single 0 here. This will only be used if autobuild is
set. There are also wildcard values: 'a' = (attribs + classes)
/ 2, 'i' = attribs, 'o' = classes, 't' = attribs + classes.
learningRate -- The amount the weights are updated.
momentum -- Momentum applied to the weights during
updating.
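The wildcard rules for hiddenLayers can be expressed as a small helper. This is a sketch that follows the description above rather than WEKA's source: the function name `resolve_hidden_layers` is hypothetical, and rounding the 'a' value down to a whole number is an assumption.

```python
def resolve_hidden_layers(spec, n_attribs, n_classes):
    """Turn a hiddenLayers string such as 'a' or '4,t' into a list of layer sizes."""
    wildcards = {
        "a": (n_attribs + n_classes) // 2,  # assumed to round down
        "i": n_attribs,
        "o": n_classes,
        "t": n_attribs + n_classes,
    }
    sizes = []
    for token in spec.split(","):
        token = token.strip()
        sizes.append(wildcards[token] if token in wildcards else int(token))
    # A single 0 means "no hidden layers".
    return [] if sizes == [0] else sizes


# Example with 8 input attributes and a hypothetical 3-class target.
default_layers = resolve_hidden_layers("a", n_attribs=8, n_classes=3)
```

With these example counts the default setting `-H a` would give one hidden layer of (8 + 3) // 2 = 5 units; the actual size depends on how many inputs and classes the loaded data has.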
For fair evaluation, the cross-validation scheme is used.
K-fold Cross Validation
- The data set is randomly divided into k subsets.
- One of the k subsets is used as the test set and the
  other k-1 subsets are put together to form a training
  set.
- This is repeated k times, so that each subset serves once
  as the test set.
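The splitting scheme can be sketched as follows. This is a plain illustration of k-fold partitioning, not WEKA's routine: the function name `k_fold_splits` is hypothetical, the shuffle seed is arbitrary, and no stratification by class is attempted.

```python
import random


def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs, one pair per fold."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)        # random division of the data
    folds = [indices[i::k] for i in range(k)]   # k disjoint subsets
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 300 records and 10 folds, matching the data set size and k used here.
splits = list(k_fold_splits(300, k=10))
```

With 300 records and k = 10, every round trains on 270 records and tests on the remaining 30, and each record is used for testing exactly once.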
5.3 Association Rule Mining
In EDM, association rule learning is a conventional and well
researched method for determining interesting relations
between attributes in large databases [10]. Association rule
mining is mainly intended to identify strong rules in
databases using the measures of support and confidence.
Support (s) and confidence (c) are two measures of rule
interestingness. They respectively reflect the usefulness and
certainty of a discovered rule.
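Both measures are simple ratios over the records of the data set: the support of an item set is the fraction of records that contain it, and the confidence of a rule A => B is support(A and B) divided by support(A). The sketch below uses four made-up records written as "attribute=value" items; the records and the helper names are illustrative assumptions, not data from this study.

```python
def support(records, itemset):
    """Fraction of records that contain every item of the item set."""
    itemset = set(itemset)
    hits = sum(1 for record in records if itemset <= set(record))
    return hits / len(records)


def confidence(records, antecedent, consequent):
    """Confidence of the rule antecedent => consequent."""
    base = support(records, antecedent)
    if base == 0:
        return 0.0  # the rule never applies, so report zero confidence
    return support(records, set(antecedent) | set(consequent)) / base


# Made-up records for illustration only.
records = [
    {"unit_test=Good", "attendance=Good", "performance=Good"},
    {"unit_test=Good", "attendance=Good", "performance=Good"},
    {"unit_test=Good", "attendance=Average", "performance=Average"},
    {"unit_test=Low", "attendance=Good", "performance=Average"},
]
rule_support = support(records, {"unit_test=Good", "performance=Good"})
rule_confidence = confidence(records, {"unit_test=Good"}, {"performance=Good"})
```

In this toy data the rule unit_test=Good => performance=Good holds in 2 of 4 records (support 0.5) and in 2 of the 3 records where the antecedent occurs (confidence about 0.67).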
Apriori Algorithm
Apriori is a seminal algorithm proposed by R. Agrawal and R.
Srikant in 1994 for mining frequent item sets for Boolean
association rules.
The following steps generate the frequent item sets in the
Apriori algorithm [11].
Let Ck be the candidate item sets of size k and Lk the
frequent item sets of size k. The main steps of an iteration
are:
- Find the frequent set Lk-1.
- Join step: Ck is generated by joining Lk-1 with itself
  (Cartesian product Lk-1 x Lk-1).
- Prune step (Apriori property): any (k-1)-size item set
  that is not frequent cannot be a subset of a frequent
  k-size item set, and hence should be removed.
- The frequent set Lk has been achieved [11].
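The join and prune steps can be sketched compactly. This is a simplified illustration rather than WEKA's Apriori: the join is written as "all unions of two frequent (k-1)-item sets that have exactly k items", which yields the same candidates after pruning as the classical prefix-based join, and the transactions are toy data.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(item set): support} for all frequent item sets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # Keep only candidates whose support reaches the threshold.
        result = {}
        for candidate in candidates:
            s = sum(1 for t in transactions if candidate <= t) / n
            if s >= min_support:
                result[candidate] = s
        return result

    singletons = {frozenset([item]) for t in transactions for item in t}
    current = frequent(singletons)  # L1
    all_frequent = dict(current)
    k = 2
    while current:
        previous = set(current)  # L(k-1)
        # Join step: combine frequent (k-1)-item sets into k-item candidates.
        candidates = {a | b for a in previous for b in previous if len(a | b) == k}
        # Prune step: drop candidates with an infrequent (k-1)-item subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in previous for s in combinations(c, k - 1))
        }
        current = frequent(candidates)  # Lk
        all_frequent.update(current)
        k += 1
    return all_frequent


# Toy transactions for illustration only.
transactions = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
frequent_sets = apriori(transactions, min_support=0.5)
```

In this toy example every single item and every pair is frequent, while the triple {A, B, C} survives the prune step as a candidate but is then rejected because its support (0.4) is below the threshold.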
The parameter setting in WEKA for association rule mining is
weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1
car -- If enabled class association rules are mined instead of
(general) association rules.
classIndex -- Index of the class attribute. If set to -1, the last
attribute is taken as class attribute.
delta -- Iteratively decrease support by this factor. Reduces
support until min support is reached or required number of
rules has been generated.
6. PERFORMANCE EVALUATION
To evaluate the performance of the above methods of neural
network training, different measures are available, such as
accuracy, precision, recall, F-measure and Kappa score.
Here accuracy, precision and recall are considered.
Accuracy (percent correct)
Accuracy is the percentage of correctly classified instances,
i.e. how close the predicted values are to the actual (true)
values.
\[ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \]
Precision
Precision is the fraction of the instances predicted as a
given class that actually belong to that class.
\[ \text{Precision} = \frac{TP}{TP + FP} \]
Recall
Recall is a measure of the ability of a prediction model to
select instances of a certain class from a data set. It is
also called sensitivity and corresponds to the true positive
rate.
\[ \text{Recall} = \frac{TP}{TP + FN} \]
where
TP = True Positive
TN = True Negative
FP = False Positive
FN = False Negative
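As a worked example, the three measures can be computed directly from the four counts. The counts used below are made up for illustration and are not results from this study; the zero-denominator handling is a convenience of the sketch.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all instances that were classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual positives that were found (true positive rate)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts for one class out of 100 test instances.
tp, tn, fp, fn = 40, 45, 5, 10
example_accuracy = accuracy(tp, tn, fp, fn)   # (40 + 45) / 100
example_precision = precision(tp, fp)         # 40 / 45
example_recall = recall(tp, fn)               # 40 / 50
```

For these counts accuracy is 0.85, precision is about 0.89 and recall is 0.80, which shows that a model can be fairly precise while still missing a fifth of the positive instances.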
7. RESULTS AND DISCUSSION
The table below shows the values obtained by various
performance evaluation parameters.
Table 2: Results of different neural network training
No.  Method of training                                 Momentum rate  Precision  Recall
---  -------------------------------------------------  -------------  ---------  ------
1    Simple Train & Test                                0.6            0.5        0.35
2    Simple Train & Test after Association Rule Mining  0.8            0.76       0.62
3    K-Fold Cross validation of Train & Test            0.3            0.95       0.91
In this table the parameters considered are the method of
training, the learning rate, the momentum rate, and the
accuracy, precision and recall obtained at the specified
learning and momentum rates. For each method, the learning and
momentum rates that give the best training result are
reported. The values given to the learning rate and momentum
can range from 0.2 to 1.0. The three training methods give
three different results, which makes it easier to select the
better method. Accuracy indicates how accurate the training
method is. Here MLP training with k-fold cross validation
performs better than the simple train-and-test MLP training.
8. CONCLUSION
This paper presented one use of data mining in the educational
field: predicting students' performance. An artificial neural
network was used for prediction. K-fold cross validation gives
a more accurate result than both the basic training method and
training after association rule mining. Association rule
mining retrieved the most important attributes that affect
student performance: unit test mark, assignment mark and
attendance in class. MLP training with these attributes gave
a far better result than simple training. 10-fold cross
validation was used for the training. The data set considered
is a real dataset of marks of 300 students. As future work,
fuzzy logic can be applied to improve the results of student
performance evaluation.
9. REFERENCES
[1] Han, J. and Kamber, M. "Data Mining: Concepts and
Techniques," Morgan Kaufmann Publishers, San
Francisco, 2000.
[2] Anwar, M. A. and Ahmed, N. "Knowledge Mining in
Supervised and Unsupervised Assessment Data of
Students' Performance." 2011 2nd International
Conference on Networking and Information Technology,
IPCSIT vol. 17, 2011.
[3] http://www.educationaldatamining.org/JEDM/index.php/JEDM
[4] Oladokun, V. O., Adebanjo, A. T. and Charles-Owaba, O. E.
"Predicting Students' Academic Performance using
Artificial Neural Network: A Case Study of an
Engineering Course."
[5] Refaat, M. "Data Preparation for Data Mining Using SAS,"
Elsevier, 2007.
[6] Kamruzzaman, S. M. and Jehad Sarkar, A. M. "A New
Data Mining Scheme Using Artificial Neural
Networks," 2011.
[7] Kumar, A. "Artificial Neural Network for Data
Mining."
[8] WEKA Manual.
[9] Ha, J.-W. "Classification using Weka (Brain,
Computation, and Neural Learning)."
[10] Kabakchieva, D. "Predicting Student Performance by Using
Data Mining Methods for Classification," Sofia
University "St. Kl. Ohridski", Sofia 1000.
[11] Tanna, P. and Ghodasara, Y. "Using Apriori
with WEKA for Frequent Pattern Mining."
[12] Sen, B. and Ucar, E. "Evaluating the achievements of
computer engineering department of distance education
students with data mining methods." Procedia Technology
1, 262-267, 2012.
IJCATM : www.ijcaonline.org
... Some of the most important of these researches will be shown. In [13] authors presented a new model for evaluating and forecasting the performance of students, they used Artificial Neural Networks (ANNs) depending on types of attributes; academic attributes like; unit test marks, attendance, interest in the study, and assignment mark, and personal attributes like; parents' education and family status. The NNs model produces an accuracy of 91%. ...
... The models on the dataset in a GPA grade and in a percentage, grade is applied and converted from GPA to percentage using the following equation: Percentage = GPA * 12.5 + 50 (13) Also, the same equation was used to convert the final result from percentage to GPA for calculating accuracy. Based on the result of GPA and percentage, a percentage will be chosen because it is more accurate, and this is a sample selected randomly: ...
Article
Artificial intelligence techniques can be applied in forecasting the academic performance of university students, with aim of detecting the factors that influence their learning process which allows instructors and university administration to take more effective actions to increase the university student's performance. Identifying the students' performance will improve the quality of education which will be through analyzing and forecasting the students' performance at the course level and degree level. This research focuses on first-year students' performance in two university-requirement courses, depending on features such as attendance, assessment marks, exams, assignments, and projects. Forecasting the students' performance in the whole degree will depend on these features; high school average, Grade Point Average (GPA) for each semester, drop courses, selected core courses in the degree, period of study, and final GPA. A hybrid Adaptive Neuro-Fuzzy Inference System (ANFIS) model was used toperform the forecasting process. In this way, based on the datasets collected from the selected courses, or the whole degree, the future results can be forecasted and suggestions can be made to carry out corrective steps to improve the final results. The experiments result of the applied models performed that ANFIS-Grid outperforms the ANFIS-Cluster, wherein each model produces the lowest error of 0.7%, where it just fails in one sample from thirteen samples, while the ANFISCluster after modification produces an error equal to 0.15%. Keywords:University Student Performance, Forecasting, Fuzzy logic, Neural Network, Adaptive Neuro-Fuzzy Inference System.
... A dataset of 300 students at Sacred Hearts Girls High School in Kerala and the results were evaluated using neural network (multi-layer perceptron (MLP) training using K fold cross validation) with the help of WEKA data mining tool and association rule mining (Sebastian & Puthiyidam, 2015). Attributes were related to academic details (interest of study, unit-test marks, attendance, and assignment) and personal attributes (residence, parent education, family status). ...
Article
Full-text available
Students’ academic performance is a critical issue as it decides his/her career. It is pivotal for the educational institutes to track the performance record because it can help to enhance the standard of their quality education. Thus, the role of the academic result prediction system comes into existence which uses semester grade point average (SGPA) as a metric. The proposed work aims to create a model that can forecast the SGPA of students based on certain traits. It predicts the result in the form of SGPA of computer science students considering their past academic performance, study, and personal habits during their academic semester using different machine learning models, and to compare them based on different accuracy parameters. Some models that are widely used and are found effective in this field are regression algorithms, classification algorithms, and deep learning techniques. The results conclude that deep learning techniques are the most effective in the proposed work because of their high accuracy and performance, depending upon the attributes used in the prediction.
... In the backward step, or backpropagation, this error propagates backward via the layers to update the weights. The cycle of forward-step and backpropagation continues until the error is sufficiently reduced [8]. ...
Article
Full-text available
This article presents a statistical approach using entropy and classification-based analysis to detect anomalies in industrial control systems traffic. Several statistical techniques have been proposed to create baselines and measure deviation to detect intrusion in enterprise networks with a centralized intrusion detection approach in mind. Looking at traffic volume alone to find anomalous deviation may not be enough—it may result in increased false positives. The near real-time communication requirements, coupled with the lack of centralized infrastructure in operations technology and limited resources of the sensor motes, require an efficient anomaly detection system characterized by these limitations. This paper presents extended results from our previous work by presenting a detailed cluster-based entropy analysis on selected network traffic features. It further extends the analysis using a classification-based approach. Our detailed entropy analysis corroborates with our earlier findings that, although some degree of anomaly may be detected using univariate and bivariate entropy analysis for Denial of Service (DOS) and Man-in-the-Middle (MITM) attacks, not much information may be obtained for the initial reconnaissance, thus preventing early stages of attack detection in the Cyber Kill Chain. Our classification-based analysis shows that, overall, the classification results of the DOS attacks were much higher than the MITM attacks using two Modbus features in addition to the three TCP/IP features. In terms of classifiers, J48 and random forest had the best classification results and can be considered comparable. For the DOS attack, no resampling with the 60–40 (training/testing split) had the best results (average accuracy of 97.87%), but for the MITM attack, the 80–20 non-attack vs. attack data with the 75–25 split (average accuracy of 82.81%) had the best results.
... As RNAs são sistemas de processamento paralelo, inspirados no funcionamento do cérebro humano, capazes de lidar com problemas complexos. São capazes de apreender, por meio da modificação dos pesos sinápticos, e generalizar para dados desconhecidos (SEBASTIAN, 2016;ABRAHAM et al., 2019, BASTIANI et al., 2018HAYKIN, 2001;PINHEIRO et al., 2020;SANTOS, 2021 Apesar da importância da produção de energia eólica, para o Brasil, muito poucos são os trabalhos que utilizam redes híbridas, como a BIGRU-CNN, na previsão da produção de energia eólica. ...
Article
Full-text available
O presente trabalho tem como objetivo comparar modelos, de redes neurais artificiais, para previsão de geração eólica. A base de dados fornecida pelo Operador Nacional do Sistema Elétrico (ONS) apresenta uma série histórica de geração de energia, do parque eólico de Icaraizinho, no Ceará, no período entre 2010 e 2020. Modelos de previsão, baseados em redes neurais LSTM (Bidirectional Long Short Term Memory), GRU (Gated Recurrent Units), CNN (Convolutional neural network) e BIGRU-CNN (Bidirectional Gated Recurrent Units - Convolutional neural network), foram implementados na linguagem python, utilizando o framework Keras. Resultados obtidos, dos quatro modelos, foram comparados por meio das métricas RSME (Root Mean Squared Error), MAPE (Mean Absolute Percent Error) e MAE (Mean Absolute Error). Verificou-se, para um horizonte de curto prazo, que o modelo híbrido BIGRU-CNN apresentou melhor desempenho.
... As redes neurais são capazes de memorizar, analisar e processar um grande número de dados obtidos de um experimento. É uma técnica de modelagem que pode resolver muitos problemas não lineares e complexos (SEBASTIAN, 2016;ABRAHAM, 2019, BASTIANI et al., 2018. ...
Article
Full-text available
Atualmente, o setor agrícola enfrenta o desafio de crescer, de modo competitivo, para atender a demanda interna e manter o espaço conquistado no mercado externo. Produtores, no mercado competitivo da soja, precisam de ferramentas de previsão de preço. As previsões de preço incorporam informações cruciais no momento da comercialização da safra. Neste contexto, este trabalho tem como objetivo aplicar modelos, baseados em redes neurais artificiais, para previsão do preço da saca de soja no estado do Paraná. A base de dados, disponibilizada pelo Centro de Estudos Avançados em Economia Aplicada (CEPEA), apresenta uma série de preços mensal compreendida ente Janeiro/2000 e Agosto/2020. Modelos de previsão, baseados em Redes Neurais LSTM e BLSTM, foram implementados na linguagem Python. Os resultados obtidos, para um horizonte de curto prazo, mostram que os dois modelos de previsão fornecem estimativas confiáveis para o preço da saca de soja.
... Gamification gives children the opportunity to collaborate and develop valuable academic and life skills; however, teaching through gamification is not without difficulties, e.g., the expected results are not guaranteed, and the difficulty of supporting the needs of infants when their ideas are very creative and difficult to understand. Another difficulty is when documenting the teaching through photos, videos, etc., and converting this material into visible products [35]. In addition to ensuring that the entire team has a solid understanding of play-based teaching, a positive mindset, and a willingness to change their practices. ...
Article
Full-text available
This research was aimed at designing an image recognition system that can help increase children’s interest in learning natural numbers between 0 and 9. The research method used was qualitative descriptive, observing early childhood learning in a face-to-face education model, especially in the learning of numbers, with additional data from literature studies. For the development of the system, the cascade method was used, consisting of three stages: identification of the population, design of the artificial intelligence architecture, and implementation of the recognition system. The method of the system sought to replicate a mechanic that simulates a game, whereby the child trains the artificial intelligence algorithm such that it recognizes the numbers that the child draws on a blackboard. The system is expected to help increase the ability of children in their interest to learn numbers and identify the meaning of quantities to help improve teaching success with a fun and engaging teaching method for children. The implementation of learning in this system is expected to make it easier for children to learn to write, read, and conceive the quantities of numbers, in addition to exploring their potential, creativity, and interest in learning, with the use of technologies.
Conference Paper
Online social networks have become a popular platform for communication and information exchange, but also a target for various cybercriminal activities. Digital forensics is essential for investigating and combating these crimes, but it faces challenges in handling the vast amount of data generated on social network platforms. This paper proposes a machine learning-based digital forensic framework for online social networks, consisting of five stages: data identification, preparation, acquisition, examination, and reporting. The framework employs machine learning algorithms such as Artificial Neural Networks, Decision Trees, and Support Vector Machines for data analysis and evidence extraction in the examination stage. The framework aims to automate data processing and analysis, enabling investigators to focus on understanding crime dynamics and reporting. The paper presents the theoretical analysis of the framework, addressing challenges in digital forensics for online social networks. It also discusses theoretical limitations and future research directions to enhance the framework’s capabilities. A proof-of-concept implementation using a real-world dataset is planned to validate its practicality in solving actual digital forensic investigations. The paper contributes to the advancement of digital forensics for a safer online environment.
Conference Paper
Full-text available
Framed buildings built on hill slopes exhibit structural behaviour that differs from those built on flat ground. Because these structures are unsymmetrical in nature, they draw a high quantity of shear pressures and torsional moments, and their distribution is uneven owing to different column lengths. Because analysing complex structures takes a significant amount of time and effort. The aim of this paper is to create a model to predict seismic analysis parameters using machine learning methods and evaluate the outcomes of various techniques utilised.
Article
Full-text available
Knowledge exploration from the large set of data,generated as a result of the various data processing activities due to data mining only. Frequent Pattern Mining is a very important undertaking in data mining. Apriori approach applied to generate frequent item set generally espouse candidate generation and pruning techniques for the satisfaction of the desired objective. This paper shows how the different approaches achieve the objective of frequent mining along with the complexities required to perform the job. This paper demonstrates the use of WEKA tool for association rule mining using Apriori algorithm.
Article
Full-text available
Recently, the internet technology has become an indispensable part of life, a very useful application that cannot be earlier have made it possible. One of these is distance learning technologies. Due to limitations of traditional learning-teaching methods in classroom activities and practitioners who intend to conduct training activities in the absence of the possibility of communication and interaction among learners with special education units are prepared and provided a wide range of media center through a certain method of teaching. According to a further recognition of Distance Education, although far away from each other with the student who teaches the same time (synchronous) or different time (asynchronous) communications with a tool as training system established. The aim of this study is to compare the achievements of Computer Engineering Department students in Karabük University according to criteria such as age, gender, type of high school graduation and whether the students studying in distance education or regular education using data mining techniques. Also discussing the differences of the techniques according to the results and to make suggestions for which technique would be more effective.
Article
Full-text available
The observed poor quality of graduates of some Nigerian Universities in recent times has been partly traced to inadequacies of the National University Admission Examination System. In this study an Artificial Neural Network (ANN) model, for predicting the likely performance of a candidate being considered for admission into the university was developed and tested. Various factors that may likely influence the performance of a student were identified. Such factors as ordinary level subjects' scores and subjects' combination, matriculation examination scores, age on admission, parental background, types and location of secondary school attended and gender, among others, were then used as input variables for the ANN model. A model based on the Multilayer Perceptron Topology was developed and trained using data spanning five generations of graduates from an Engineering Department of University of Ibadan, Nigeria's first University. Test data evaluation shows that the ANN model is able to correctly predict the performance of more than 70% of prospective students.
Article
Classification is one of the data mining problems receiving enormous attention in the database community. Although artificial neural networks (ANNs) have been successfully applied in a wide range of machine learning applications, they are however often regarded as black boxes, i.e., their predictions cannot be explained. To enhance the explanation of ANNs, a novel algorithm to extract symbolic rules from ANNs has been proposed in this paper. ANN methods have not been effectively utilized for data mining tasks because how the classifications were made is not explicitly stated as symbolic rules that are suitable for verification or interpretation by human experts. With the proposed approach, concise symbolic rules with high accuracy, that are easily explainable, can be extracted from the trained ANNs. Extracted rules are comparable with other methods in terms of number of rules, average number of conditions for a rule, and the accuracy. The effectiveness of the proposed approach is clearly demonstrated by the experimental results on a set of benchmark data mining classification problems.
Book
Are you a data mining analyst who spends up to 80% of your time assuring data quality, then preparing that data for developing and deploying predictive models? Do you find lots of literature on data mining theory and concepts, but little how-to information when it comes to practical advice on developing good mining views? And are you, like most analysts, preparing the data in SAS? This book is intended to fill this gap as your source of practical recipes. It introduces a framework for the process of data preparation for data mining, and presents the detailed implementation of each step in SAS. In addition, business applications of data mining modeling require you to deal with a large number of variables, typically hundreds if not thousands. Therefore, the book devotes several chapters to the methods of data transformation and variable selection.
FEATURES
* A complete framework for the data preparation process, including implementation details for each step.
* The complete SAS implementation code, which is readily usable by professional analysts and data miners.
* A unique and comprehensive approach for the treatment of missing values, optimal binning, and cardinality reduction.
* Assumes minimal proficiency in SAS and includes a quick-start chapter on writing SAS macros.
* CD includes dozens of SAS macros plus the sample data and the program for the book's case study.
It is easy to write books that address broad topics and ideas, leaving the reader with the question "Yes, but how?" By combining a comprehensive guide to data preparation for data mining along with specific examples in SAS, Mamdouh's book is a rare find: a blend of theory and the practical at the same time. As anyone who has mined data will confess, 80% of the problem is in data preparation; Mamdouh addresses this difficult subject with strong practical techniques and methods. If you are working on an SAS data mining project, this book is a must! If you are working on any data mining project, the techniques and methods will be a guiding light! --Frank Byrum, Cormine Intelligent Data, LLC.
Conference Paper
Neural networks have been successfully applied in a wide range of supervised and unsupervised learning applications. Neural-network methods are not commonly used for data-mining tasks, however, because they often produce incomprehensible models and require long training times. In this article, we describe neural-network learning algorithms that are able to produce comprehensible models and that do not require excessive training times. Specifically, we discuss two classes of approaches for data mining with neural networks. The first type of approach, often called rule extraction, involves extracting symbolic models from trained neural networks. The second approach is to directly learn simple, easy-to-understand networks. We argue that, given the current state of the art, neural-network methods deserve a place in the tool boxes of data-mining specialists.
Classification using Weka (Brain, Computation, and Neural Learning)
  • Jung-Woo Ha
Jung-Woo Ha. "Classification using Weka (Brain, Computation, and Neural Learning)".