Machine Learning and
Deep Learning in Real-
Time Applications
Mehul Mahrishi
Swami Keshvanand Institute of Technology, India
Kamal Kant Hiran
Aalborg University, Denmark
Gaurav Meena
Central University of Rajasthan, India
Paawan Sharma
Pandit Deendayal Petroleum University, India
A volume in the Advances in Computer and
Electrical Engineering (ACEE) Book Series
Published in the United States of America by
IGI Global
Engineering Science Reference (an imprint of IGI Global)
701 E. Chocolate Avenue
Hershey PA, USA 17033
Tel: 717-533-8845
Fax: 717-533-8661
E-mail: cust@igi-global.com
Web site: http://www.igi-global.com
Copyright © 2020 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in
any form or by any means, electronic or mechanical, including photocopying, without written permission from the publisher.
Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or
companies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark.
Library of Congress Cataloging-in-Publication Data
British Cataloguing in Publication Data
A Cataloguing in Publication record for this book is available from the British Library.
All work contributed to this book is new, previously-unpublished material. The views expressed in this book are those of the
authors, but not necessarily of the publisher.
For electronic access to this publication, please contact: eresources@igi-global.com.
Names: Mahrishi, Mehul, 1986- editor. | Hiran, Kamal Kant, 1982- editor. |
Meena, Gaurav, 1987- editor. | Sharma, Paawan, 1983- editor.
Title: Machine learning and deep learning in real-time applications / Mehul
Mahrishi, Kamal Kant Hiran, Gaurav Meena, and Paawan Sharma, editors.
Description: Hershey, PA : Engineering Science Reference, an imprint of IGI
Global, [2020] | Includes bibliographical references and index. |
Summary: “This book examines recent advancements in deep learning
libraries, frameworks and algorithms. It also explores the
multidisciplinary applications of machine learning and deep learning in
real world”-- Provided by publisher.
Identifiers: LCCN 2019048558 (print) | LCCN 2019048559 (ebook) | ISBN
9781799830955 (hardcover) | ISBN 9781799830962 (paperback) | ISBN
9781799830979 (ebook)
Subjects: LCSH: Machine learning. | Real-time data processing.
Classification: LCC Q325.5 .M3216 2020 (print) | LCC Q325.5 (ebook) | DDC
006.3/1--dc23
LC record available at https://lccn.loc.gov/2019048558
LC ebook record available at https://lccn.loc.gov/2019048559
This book is published in the IGI Global book series Advances in Computer and Electrical Engineering (ACEE) (ISSN:
2327-039X; eISSN: 2327-0403)
Copyright © 2020, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

Chapter 9

Deep Learning in Engineering Education: Performance Prediction Using Cuckoo-Based Hybrid Classification

Deepali R. Vora
Vidyalankar Institute of Technology, India

Kamatchi R. Iyer
Amity University, India

DOI: 10.4018/978-1-7998-3095-5.ch009

ABSTRACT

The goodness measure of any institute lies in minimising dropouts and targeting good placements, so predicting students' performance is an interesting and important task for educational information systems. Machine learning and deep learning are emerging areas that truly entice more research practice. This research focuses on applying deep learning methods to educational data for classification and prediction. The educational data of students from the engineering domain, with cognitive and non-cognitive parameters, is considered. A hybrid model with a support vector machine (SVM) and a deep belief network (DBN) is devised: the SVM predicts class labels from preprocessed data, and these class labels, together with the actual class labels, act as input to the DBN to perform the final classification. The hybrid model is further optimised using cuckoo search with Lévy flight. The results clearly show that the proposed model, SVM-LCDBN, gives better performance than both the simple hybrid model and the hybrid model with traditional cuckoo search.

INTRODUCTION

Nowadays, Educational Data Mining (EDM) is a novel trend in the Knowledge Discovery in Databases (KDD) and Data Mining (DM) field, concerned with mining valuable patterns and practical knowledge from educational systems. One important goal of the educational system among many is tracking the performance of the student. Many techniques and algorithms are used to track the progress of students. However, evaluating the educational performance of students is challenging, as their academic performance pivots on varied constraints. This domain has gained importance with the increase in data volume and the development of new algorithms.
Data generated from various educational sources is explored using different methods and techniques in EDM; the multidisciplinary research that develops such methods and techniques is the focus of the field. Analysis of educational data can provide information about students' behaviours, based on which education policies can be enhanced further (Sukhija, Jindal, & Aggarwal, 2015, October). EDM encompasses the techniques, tools, and research intended for automatically extracting meaning from the large data repositories of educational systems.
According to Davies (Davis, 1998), “Education has become a commodity in which people seek to invest for their own personal gain, to ensure equality of opportunity and as a route to a better life.” Because of this, higher education providers compete mainly for students, funding, research, and recognition within the wider society.
It is important to study the data of students in professional courses, as producing better professionals is key to the growth of any nation. The higher education system faces two main challenges: finding placements and students dropping out. Analysis of educational data can help answer both challenges satisfactorily: predicting performance leads to better placements and minimises dropouts.
Predictive modelling is a statistical technique for predicting future behaviour. Predictive analytics is used widely in product management and recommendation; it is a powerful tool for understanding the data at hand and drawing useful insights from it. Figure 1 represents predictive analytics in education.
Machine learning (ML) is one of the most popular methods for predictive analytics. With the plethora of algorithms available, it is always interesting to discover which algorithm or technique is most suitable for the data under consideration. EDM is an area of research where predictive modelling is especially useful.
ML has become very popular among researchers because of the astonishing results its algorithms give for diverse data and applications. But when data grows enormously, simple ML models are neither efficient nor beneficial. Meanwhile, advances in hardware and software have made it possible to build more complex, hybrid architectural models for various DM and Big Data tasks. Big data is already challenging traditional ML models on efficiency and accuracy; various hybrid models have been proposed and tested in many domains to tackle these challenges, and have proved useful. Thus, applying a hybrid model in the education domain should also be useful.
ML is evolving to tackle new-age data, and one such advance is Deep Learning (DL). Nonlinear data analysis can be done effectively using deep learning, and the characteristics of the data can be analysed effectively using the layers of a deep learning model. DL is being applied in many domains, predominantly image processing and natural language processing (Deng & Yu, 2014). Thus, it is interesting to apply deep learning in the field of education.
This chapter addresses three main objectives:
1. Identifying areas, such as EDM, where deep learning is applicable and useful
2. Applying a hybrid classification method using deep learning on educational data
3. Improving the hybrid model by applying the cuckoo search with Lévy flight optimisation technique
BACKGROUND
Deep Learning
Hinton and colleagues proposed the concept of Deep Learning in 2006. Deep Learning (DL) learns through a nonlinear network structure and scales to very large data sets. A deep network normally has more than four hidden layers, together with one input and one output layer. Such a network can transform raw features, such as image pixels, into superior features, thereby making classification and prediction better (Bengio, 2009) (Najafabadi, et al., 2015).
DL differs from ML in many ways. In terms of accuracy, DL performs much better than normal ML: when data increases, DL learns quickly from the ever-increasing data, thereby increasing accuracy. In contrast, ML algorithms are restricted by the representation of data, which hampers the response time and accuracy of systems using them. Consider email spam filtering: to identify whether an email is spam, an ML algorithm is given various representations of good and bad emails, using which incoming emails are categorised. Without any representations, the ML algorithm cannot decide anything.
Here, DL comes to the rescue. DL easily identifies important features and learns from them: its algorithms can extract features from raw data and create their own representations for learning. DL is a family of ML algorithms that attempt to model high-level abstractions in data through architectures composed of many non-linear transformations.
Figure 1. Predictive Modelling
Deep architectures can be modelled using any combination of network layers, but there is nevertheless a set of traditional architectures, such as Stacked Autoencoders, Deep Boltzmann Machines, Deep Convolutional Networks, and Deep Belief Networks. Figure 3 shows this set of predefined DL models.
In general, deep learning models can be classified into discriminative, generative, and hybrid models (Alwaisi & Baykan, 2017).
Figure 2. A Deep Architecture
Figure 3. Deep Learning Models
Discriminative models are used to model the dependency of an unobserved (target) variable Y on observed variables X, whereas generative models learn the joint probability distribution. The generative model learns the full relationship between input X (features) and label Y, giving maximum flexibility at testing time. Discriminative models learn from X only, predicting Y through the conditional probability; by using fewer modelling assumptions, these models can use existing data more efficiently. CNNs, deep neural networks, and recurrent neural networks are discriminative models, while DBNs, restricted Boltzmann machines, and regularised autoencoders are generative models. Hybrid deep models are a combination of discriminative and generative models.
These DL models are used in various application areas to gain better accuracy or output. Table 1 summarizes the work done in several such areas. In addition, there are many applications in other domains where DL algorithms or deep networks are used very effectively.
From the study of various articles, it is evident that DL is applied widely in many areas, and improvements in hardware have made its application feasible. In many articles, DL algorithms are compared with traditional machine learning algorithms and are observed to be more accurate. Still, there are domains where DL may prove beneficial but remains little explored, one of them being the educational system.
Also, the review of articles suggests that applying a deep learning algorithm together with other generalised algorithms may give better results in classification and prediction tasks. Through the survey, it is observed that hybrid models are more popular than models based on a plain DL algorithm (Vora & Iyer, A Survey of Inferences from Deep Learning Algorithms, 2017). In many applications, standard dimensionality reduction algorithms are used to reduce the features, and DL algorithms are then applied to improve accuracy.
Educational Data Mining
EDM is a popular research area, and an ample amount of research articles are available for study. These articles document the experimentation and algorithms used in EDM for various tasks. For performance prediction in particular, various new techniques and ML algorithms have been tried. Many factors, or features, have a significant effect on predicting students' performance; these are classified as cognitive and non-cognitive factors, and non-cognitive factors play an important role in various EDM goals.
Wattana & Nachirat (Punlumjeak & Rachburee, 2015, October) used techniques such as K-Nearest Neighbours, Naïve Bayes, and neural networks to classify students' data. The features considered were very few, and the majority of attributes related to students' marks.
Norlida, Usamah & Pauziah (Buniyamin, Mat, & Arshad, 2015, November) used neuro-fuzzy algorithms to predict the performance of engineering students; only six linguistic parameters were used for prediction.
Camilo, Elizabeth & Fabio (Guarín, Guzmán, & González, 2015) used a decision tree and a Bayesian classifier to predict students' performance. Students' admission test scores and academic information were used, along with a few socio-economic parameters; the major stress was on the admission parameters.
Phung, Chau & Phung (Phung, Chau, & Phung, 2015, November) used a rule-extraction algorithm for classification in EDM. The algorithm can handle discrete and continuous data, but has a major challenge in creating compact rules: the numerous rules formed made the system difficult to use with more parameters.

Table 1. Deep learning application areas

Malware Detection — DBN (Davidt & Netanyahu, 2015, July)
• The dropout method was used while training the network.
• Various layers were used to detect malware signatures.
• The network was trained using a GPU to detect 30 signatures.

Intrusion Detection — DBN (Gao, Gao, Gao, & Wang, 2014, November)
• DBN proved more accurate than SVM and artificial neural networks (ANN).
• DBNs with 4 different configurations were used; the performance of a shallow DBN is the same as SVM and ANN.
• DBNs with 2 or more hidden layers gave better output. The DBN is used for multiclass classification.

Spam Filtering — DBN (Tzortzis & Likas, 2007, October)
• DBN used for spam filtering.
• Its performance was compared with SVM and found more accurate.

Image Processing — Deep Convolutional DBN (Nguyen, Fookes, & Sridharan, 2015, September)
• Deep convolutional DBN used for classification of images.
• Accuracy improved and training time for the deep network reduced.

Image Processing — DBN & DAE (Vincent, Larochelle, Lajoie, Bengio, & Manzagol, 2010)
• DBN and DAE (denoising autoencoder) used for analysing images.
• Experimental results show that the DAE was helpful for learning higher-level representations.
• DBN and DAE gave better accuracy for image classification when combined with SVM.

Classification — Deep SVM (Kim, Minho, & Shen, 2015, July)
• Experimented with a new model created by combining an autoencoder, deep SVM, and GMM.
• The input was fed to the SVM and then to the GMM, forming one layer.
• Deep layers were thus constructed for feature extraction, and a Naïve Bayes algorithm was then used for classification.

Image Processing — SVR with ReLU (Kuwata & Shibasaki, 2015, July)
• Used SVR (support vector regression) with rectified linear units for estimating crop yields from remotely sensed data.
• Described Illinois crop-yield estimation using deep learning and machine learning algorithms.
• Experimentation was done using the Caffe tool.
• An SVM with a Gaussian radial basis function was used for the same experimentation, showing that traditional SVM overfits the regression model, making accuracy low.

Regression Analysis — Deep SVM (M. A. Wiering, Millea, Meijster, & Schomaker, 2016)
• Used a deep SVM for regression analysis; the deep model was constructed by stacking two layers of SVM.
• Initial layers were used for extracting the important features and the final layer was used for classification.

Finance — Deep SVM with Fuzzy (Deng, Zhiquan Ren, Kong, Bao, & Dai, 2017)
• Used a fuzzy deep neural network for the classification of financial trading data.
• The deep network was given a high-level representation of data, generated by the fuzzy model and the neural network model.

Education — SAE (Guo, Zhang, Guang, Shi, & Yang, 2015, July)
• Used sparse autoencoders for classification and prediction.
• The network was trained using a backpropagation algorithm.
• The experimentation was done on data collected from 9th-grade high-school children, on both GPU and CPU.
• The observed accuracy of the DL algorithm was higher than SVM and Naïve Bayes.

Music Classification — Deep feed-forward network with ReLU (Rajanna, Aryafar, Shokoufandeh, & Ptucha, 2015)
• A rectified linear unit (ReLU) activation was used in a deep neural network with 2 hidden layers.
• The accuracy of the classifier improved significantly.
Wen and Patrick (Shiau & Chau, 2016) and Sadaf & Eydgahi (Ashtari & Eydgahi, 2017) used statistical modelling for EDM. Statistical methods, however, cannot accommodate changes in population and sample size, and lead-time bias was difficult to handle.
Fernando et al. (Koch, Assunção, Cardonha, & Netto, 2016) used the partial least squares method and showed it to be cost-effective, though the method was sensitive to the choice of parameters, and the parameters used were few.
Janice et al. (Gobert, Kim, Pedro, Kennedy, & Betts, 2015) and Anjana, Smija & Kizhekkethottam (Pradeep, Das, & J, 2015) used decision trees in EDM. Limited features were used for prediction, the tree structure was prone to sampling error, and accuracy was affected by imbalanced data.
Evandro B. Costa et al. (Costa, Fonseca, Santana, Araújo, & Rego, 2017) used Naïve Bayes, decision trees, SVM, and neural networks to predict students' performance. The data was collected from distance-learning and on-campus students; weekly performance data over four weeks was collected and analysed for the effectiveness of the algorithms.
Wanli et al. (Xing, Guo, Petakovic, & Goggins, 2015) used genetic programming for predicting students' performance. The genetic algorithm produced an optimised prediction rate, though qualitative aspects received less consideration. They monitored closed classroom learning and identified the factors affecting performance, chiefly students' participation in various activities.
Xin Chen et al. (Chen, Vorvoreanu, & Madhavan, 2013) studied social data to identify factors affecting students' behaviour or performance, such as study-life balance, lack of sleep, lack of social engagement, and lack of diversity.
Michail N. Giannakos et al. (Giannakos, et al., 2017, April) identified various cognitive factors, such as academic performance and attendance, and their effect on students' performance.
Hijazi & Naqvi (Hijazi & Naqvi, 2006) and Shoukat (Shoukat, 2013) studied the impact of various cognitive and non-cognitive factors on students' performance.
Mushtaq & Khan (Mushtaq & Khan, 2012) showed that communication, learning facilities, proper guidance, and family stress have a direct impact on students' performance. Similarly, Omar & Dennis (2015) used many factors for their study and identified which played a vital role in students' performance.
Suryawan & Putra (Suryawan & Putra, 2016) conducted a detailed survey to identify the factors affecting students' GPA, with regression tests and correlation analysis on various factors. It showed that the entrance exam and class attendance were important factors, and that lecturer quality also affects GPA.
In an interesting article in 2016, Pooja Mondal (Mondal) identified intellectual, learning, physical, mental, social, and economic factors as affecting students' behaviour and performance.
Most of the research centres on applying data mining and machine learning techniques to the classification of students' performance. Classification and prediction tasks widely use methods based on learning from examples, such as decision trees, artificial neural networks, and support vector machines. Although hybrid algorithms have gained popularity for solving complex problems, they are not cited as commonly as the other methods in students' performance classification and prediction (Vora & Kamatchi, EDM Survey of Performance Factors and Algorithms Applied, 2018).
Optimization
Optimization is the selection of the best element using some criterion from some set of available al-
ternatives. The goal of optimization is to provide near perfect, effective and all possible alternatives.
Maximizing or minimizing some function related to application or features of the application is the
process of optimization. For example, minimising the cost function or minimising the mean square
error is the goal of optimization in typical ML algorithms. Machine learning often uses optimization
algorithms to enhance their learning performance. Training optimization improves the generalization
ability of any model.
Figure 4 shows the taxonomy of optimization techniques.
It is difficult to train a deep network effectively, and classification accuracy improves as training improves; hence, the training of deep learning algorithms can be improved using optimization techniques. Among the many optimization techniques, recent work favours metaheuristic algorithms for training optimization, since they can be applied in any generalised problem domain. Among the many nature-inspired metaheuristics, such as ant and bee algorithms, particle swarm optimization, genetic algorithms, firefly algorithms, and harmony search, the Cuckoo Search (CS) algorithm (Yang & Deb, 2014) was preferred. The primary advantages of CS are as follows:
• CS has been applied in a wide variety of problems, such as face recognition, engineering optimization, and the medical domain, and has proved beneficial.
Figure 4. Taxonomy of Optimization
• Cuckoo search has better global convergence properties than other popular variants such as genetic algorithms, particle swarm optimization, and ant colony optimization.
• For its random walk, cuckoo search uses Lévy flights, which allow wider exploration of the search space and promote early, reliable global convergence.
• Hybrid models constructed with CS have proved beneficial.
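To make the mechanism concrete, the points above can be sketched in a minimal, illustrative implementation of cuckoo search minimising a toy sphere function. Lévy-distributed steps are drawn via Mantegna's algorithm; the bounds, step scale `alpha`, and abandonment fraction `pa` are illustrative assumptions, not the chapter's actual configuration.

```python
import math
import numpy as np

def levy_step(dim, beta=1.5, rng=None):
    """Draw a Levy-distributed step vector via Mantegna's algorithm."""
    if rng is None:
        rng = np.random.default_rng(0)
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, dim)
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / beta)

def cuckoo_search(f, dim=2, n_nests=15, iters=300, pa=0.25, alpha=0.01, seed=0):
    """Minimise f over [-5, 5]^dim with cuckoo search and Levy flights."""
    rng = np.random.default_rng(seed)
    nests = rng.uniform(-5, 5, (n_nests, dim))
    fit = np.array([f(n) for n in nests])
    for _ in range(iters):
        # A cuckoo lays an egg: Levy flight from a randomly chosen nest.
        i = rng.integers(n_nests)
        new = np.clip(nests[i] + alpha * levy_step(dim, rng=rng), -5, 5)
        j = rng.integers(n_nests)            # compare against a random nest
        if f(new) < fit[j]:
            nests[j], fit[j] = new, f(new)
        # A fraction pa of the worst nests is abandoned and rebuilt at random.
        n_drop = int(pa * n_nests)
        worst = np.argsort(fit)[-n_drop:]
        nests[worst] = rng.uniform(-5, 5, (n_drop, dim))
        fit[worst] = [f(n) for n in nests[worst]]
    best = int(np.argmin(fit))
    return nests[best], float(fit[best])

best_x, best_f = cuckoo_search(lambda x: float(np.sum(x ** 2)))
```

The heavy tail of the Lévy distribution occasionally produces very long jumps, which is what gives cuckoo search wider exploration than a Gaussian random walk.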
PERFORMANCE PREDICTION USING HYBRID MODEL
Experimental Setup
The experimental setup is divided into three parts: (i) data collection and preparation, (ii) design and implementation of the model, and (iii) evaluation of the model for the identified problem. Figure 5 shows these steps clearly.
For any experimentation, input data plays a vital role, so one of the important steps of experimental design is the collection of relevant data. Collecting data alone is not sufficient; making it ready as per the requirements of the model is equally important, which makes data preprocessing a quintessential step in the experimental setup. Once data is prepared or preprocessed as required, the existing models and the newly suggested models are implemented, and the comparative results provide new insights into the usefulness and accuracy of the models.
Figure 5. Experimental Setup
Algorithms
Dimensionality reduction using Principal Component Analysis (PCA)
In the proposed prediction model, PCA (Maćkiewicz & Ratajczak, 1993) is used to reduce the vast data. There are many reasons to reduce the dimensionality of input data: the complexity of a model often depends on the number of dimensions in the data as well as the sample size; when dimensions are reduced, the cost of extracting the unneeded dimensions is removed; models are often more robust, and give accurate results, on smaller datasets; and data with few dimensions can be visualised properly to spot outliers.
Consider a p-dimensional random variable U with dispersion matrix Σ, let λ₁ ≥ … ≥ λ_p be its eigenvalues, and let P₁, …, P_p be the corresponding eigenvectors of Σ. Then one can write:

$$\Sigma = \lambda_1 P_1 P_1' + \dots + \lambda_p P_p P_p' \qquad (1)$$

$$P_i' P_i = 1, \quad 1 \le i \le p \qquad (2)$$

$$P_i' P_j = 0, \quad 1 \le i, j \le p,\ i \ne j \qquad (3)$$

The transformed random variables can be represented as:

$$Y_i = P_i' U, \quad i = 1, \dots, p \qquad (4)$$

Here Y is the new random variable vector and P the orthogonal matrix whose rows are the P_i', so Y is obtained from U by the orthogonal transformation Y = PU. The random variable Y_i is called the ith principal component of U.
Only the basic steps of PCA are followed here. These basic steps of PCA are given in Algorithm 1.
Algorithm 1: Steps of PCA
Step 1: Standardize the input data
Step 2: Evaluate the covariance of the data
Step 3: Deduce Eigenvectors and Eigenvalues
Step 4: Re-orient data with respect to Principal Components
Step 5: Plot re-oriented data
Step 6: Bi-plot
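As a hedged illustration, the first four steps of Algorithm 1 (standardise, covariance, eigendecomposition, re-orientation) can be sketched with NumPy; the synthetic 12-feature matrix and the choice of 6 components mirror the example in the text but are otherwise arbitrary.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its leading principal components (Algorithm 1, Steps 1-4)."""
    # Step 1: standardise the input data (zero mean, unit variance per feature)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: evaluate the covariance of the data
    C = np.cov(Z, rowvar=False)
    # Step 3: deduce eigenvectors and eigenvalues of the dispersion matrix
    vals, vecs = np.linalg.eigh(C)               # returned in ascending order
    order = np.argsort(vals)[::-1]               # sort so lambda_1 >= ... >= lambda_p
    P = vecs[:, order[:n_components]]
    # Step 4: re-orient the data with respect to the principal components, Y = ZP
    return Z @ P, vals[order]

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 12))                   # e.g. 12 features, as in the text
Y, eigvals = pca(X, n_components=6)              # reduced to 6 components
```

Because the projection uses eigenvectors of the covariance matrix, the resulting components are mutually uncorrelated, which is what makes the reduced features well suited as classifier input.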
Support Vector Machine (SVM)
PCA acts as the dimensionality reduction technique: the features are given as input to PCA, and the reduced, extracted features are carried forward for further computation. If there are 12 features, for example, PCA may reduce them to 6, and so on.
The reduced dimensions and class labels are given to the SVM (Yuan, et al., 2017) for prediction of the class. Because the SVM works on reduced but important data, its predictions are more accurate.
The data considered here consists of 35 features and one class label. Providing this many features directly to the DBN makes it computationally intensive, and the results may not be accurate. So an intermediate SVM, with a linear kernel, is used to generate near-accurate class labels, which are fed to the DBN.
The SVM's tuning alone is not accurate enough for the resultant prediction (the class into which the performance falls). Hence, the class labels resulting from the SVM are treated as features for the DBN classifier, which classifies the students' overall performance.
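This PCA-then-SVM stage can be sketched with scikit-learn. The real 35-feature student dataset is not reproduced here, so a synthetic stand-in is generated; the feature count, class count, and 12-component reduction are illustrative assumptions rather than the chapter's actual settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Synthetic stand-in for the 35-feature student dataset described in the text.
X, y = make_classification(n_samples=300, n_features=35, n_informative=10,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

X_red = PCA(n_components=12).fit_transform(X)    # dimensionality reduction
svm = SVC(kernel="linear").fit(X_red, y)         # linear-kernel SVM, as in the text
svm_labels = svm.predict(X_red)

# The SVM's predicted labels, together with the actual labels, form the
# feature input handed on to the DBN classifier.
dbn_input = np.column_stack([svm_labels, y])
```

The design choice here is that the SVM acts as a noisy label generator rather than the final classifier; the DBN then learns to correct its systematic mistakes.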
Deep Belief Network (DBN)
Generally, a DBN includes multiple layers. Every layer has visible neurons, which establish the input layer, and hidden neurons, which form the output layer. Visible and hidden neurons are fully interconnected, but there are no connections among the hidden neurons and none among the visible neurons. The connections between visible and hidden neurons are symmetric and exclusive, and this neuron model defines a definite output for each input.
Since the output of the stochastic neurons in a Boltzmann network is probabilistic, Eq. (5) denotes the sigmoid-shaped probability of activation, where q is the neuron's total input and t_P indicates the pseudo-temperature:

$$P(q) = \frac{1}{1 + e^{-q/t_P}} \qquad (5)$$

Eq. (6) specifies the resulting binary output O:

$$O = \begin{cases} 1 & \text{with probability } P(q) \\ 0 & \text{with probability } 1 - P(q) \end{cases} \qquad (6)$$

The deterministic model of the stochastic approach is obtained in the limit of vanishing pseudo-temperature, Eq. (7):

$$\lim_{t_P \to 0^{+}} P(q) = \begin{cases} 1 & q > 0 \\ 0 & q < 0 \end{cases} \qquad (7)$$
The diagrammatic representation of the DBN model is in Figure 6, in which feature extraction takes place through a set of RBM layers and classification takes place via an MLP.
The arithmetic model expresses the energy of the Boltzmann machine for a configuration of binary neuron states b, defined in Eq. (8), where W_{a,l} indicates the weights among neurons and θ_a indicates the biases:

$$EN(b) = -\sum_{a<l} W_{a,l}\, b_a b_l - \sum_{a} \theta_a b_a \qquad (8)$$
The energy of a joint configuration of visible and hidden neurons (x, y) is defined in Eq. (9), where x_a is the binary state of visible unit a, y_l the binary state of hidden unit l, k_a the bias of visible unit a, and B_l the bias of hidden unit l:

$$EN(x, y) = -\sum_{a,l} W_{a,l}\, x_a y_l - \sum_{a} k_a x_a - \sum_{l} B_l y_l \qquad (9)$$

The corresponding energy gradients give the total input arriving at each unit, Eq. (10) and Eq. (11):

$$-\frac{\partial EN(x, y)}{\partial x_a} = \sum_{l} W_{a,l}\, y_l + k_a \qquad (10)$$

$$-\frac{\partial EN(x, y)}{\partial y_l} = \sum_{a} W_{a,l}\, x_a + B_l \qquad (11)$$
The probability distribution of the input data is encoded into the weights (parameters), which constitutes the RBM's learning pattern. RBM training seeks the weight assignment under which the model reproduces the distribution of the training set N, i.e. the weights that maximise the probability assigned to the training vectors, Eq. (12):

$$W^{*} = \arg\max_{W} \prod_{x \in N} c(x) \qquad (12)$$

where c(x) denotes the marginal probability the RBM assigns to a visible vector x.
Figure 6. Architecture of DBN in the proposed model
For a pair of visible and hidden vectors (x, y), the probability assigned by the RBM is given in Eq. (13), where PR_F specifies the partition function as in Eq. (14):

$$c(x, y) = \frac{e^{-EN(x, y)}}{PR_F} \qquad (13)$$

$$PR_F = \sum_{x, y} e^{-EN(x, y)} \qquad (14)$$
The DBN is trained using the Contrastive Divergence (CD) algorithm (Goodfellow, Bengio, & Courville, 2016). The steps of CD training are as follows:

Step 1: Choose the training samples x and clamp them onto the visible neurons.
Step 2: Evaluate the hidden-neuron probabilities c_y from the product of the weight matrix W and the visible vector.
Step 3: Sample the hidden states y from the probabilities c_y.
Step 4: Evaluate the outer product of the vectors x and c_y; this is the positive gradient ⟨x c_y⟩.
Step 5: Reconstruct the visible states x′ from the hidden states y; then evaluate the hidden states y′ from the reconstruction x′.
Step 6: Evaluate the outer product of x′ and y′; this is the negative gradient ⟨x′ y′⟩.
Step 7: Compute the weight update as defined in Eq. (15), where η indicates the learning rate:

$$\Delta W = \eta \left( \langle x\, c_y \rangle - \langle x'\, y' \rangle \right) \qquad (15)$$

Step 8: Update the weights with the new values.
The following steps define the progression of DBN training, combining RBM pre-training with normal MLP training:
Step 1: Initialize the DBN model with randomly selected weights, biases and other associated parameters.
Step 2: The first RBM is initialized with the input data, which serves as the potentials of its visible neurons, and performs unsupervised learning.
Step 3: The input to the subsequent layer is obtained by sampling the potentials processed in the hidden neurons of the preceding layer; this layer again follows unsupervised learning.
Step 4: The above steps are repeated for the corresponding number of layers, so the RBM pre-training stage proceeds until it reaches the MLP layer.
Step 5: The MLP phase performs supervised learning and continues until the target error rate is attained.
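The pre-training progression above can be sketched in NumPy: each RBM is trained with one epoch of CD-1, and its hidden potentials become the input of the next layer before the supervised MLP phase would begin. The layer sizes, learning rate and random data here are illustrative assumptions, not the chapter's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
sigm = lambda z: 1.0 / (1.0 + np.exp(-z))

def cd1_epoch(X, W, u, c, lr=0.1):
    """One epoch of CD-1 over the rows of X for a single RBM
    (W: weights, u: visible offsets, c: hidden offsets)."""
    for x1 in X:
        q1 = sigm(c + W.T @ x1)                  # hidden probabilities given x1
        s1 = (rng.random(q1.shape) < q1) * 1.0   # sample hidden states
        x2 = sigm(u + W @ s1)                    # reconstruction of the visible units
        q2 = sigm(c + W.T @ x2)                  # hidden probabilities given x2
        W += lr * (np.outer(x1, q1) - np.outer(x2, q2))  # positive - negative gradient
        u += lr * (x1 - x2)
        c += lr * (q1 - q2)
    return W, u, c

# Greedy layer-wise pre-training (Steps 2-4): train each RBM on the hidden
# potentials of the previous one.
X = (rng.random((20, 6)) < 0.5) * 1.0    # 20 samples, 6 binary inputs
sizes = [6, 4, 3]                        # illustrative layer sizes
for n_vis, n_hid in zip(sizes, sizes[1:]):
    W = 0.01 * rng.standard_normal((n_vis, n_hid))
    u, c = np.zeros(n_vis), np.zeros(n_hid)
    W, u, c = cd1_epoch(X, W, u, c)
    X = sigm(c + X @ W)                  # potentials fed to the next layer
```

After the loop, X holds the top-layer potentials that would be passed to the supervised MLP stage (Step 5).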
Finally, the classifier predicts the students’ performance with increased accuracy rate. The predictions
are evaluated on the basis of various evaluation measures identified.
Dataset
To predict the performance of students, students of private engineering colleges were chosen as the population, specifically the engineering colleges affiliated to Mumbai University. There are more than 50 engineering colleges under Mumbai University, spread across the Mumbai, Navi Mumbai and Thane regions. At the first stage, it was decided to collect samples from Mumbai; at the second stage, to concentrate on the geographical centre of the city. The samples were therefore collected from engineering colleges centrally located in Mumbai City.
The data collected is primary data, gathered through a questionnaire; questionnaires are the most popular collection method for large enquiries.
Various parameters were identified which have a direct or indirect effect on the performance of students. Careful selection of questions was important, keeping in mind that the questionnaire should not burden respondents. Cognitive and non-cognitive parameters were identified, and their effect on performance prediction was understood by studying various articles. The parameters identified are shown in Figure 7 and Figure 8.
Cognitive factors are characteristics of a person that affect performance and learning directly; these factors are measurable. Non-cognitive factors are parameters that are not directly linked to performance and learning but may still affect them, and they are not directly measurable. Studies have shown that non-cognitive parameters have an equivalent effect on performance and learning. Keeping in mind the scenario of engineering students and colleges, a few non-cognitive factors which may have an indirect effect on the performance of students were selected.
Based on the parameters identified, the class label is derived from the CGPA (Cumulative Grade Point Average) score of the 5th semester. The parameter ‘class’ indicates the CGPA score of the student in Semester 5, calculated on a scale of 1 to 10. This parameter indicates the performance of the student in the coming semester; the implemented system predicts the performance of the student as a CGPA score range.
Figure 7. Cognitive Parameters
There are 6% and 8% of the total samples in classes 1 and 4 respectively, and 36% and 50% of the total samples in classes 2 and 3 respectively.
Evaluation Measures
To evaluate the effectiveness of the machine learning algorithms, basic measures like Accuracy, Precision, Recall and F1-Measure (Han & Kamber, 2012) were adopted. Squared-error-based cost functions are inconsistent for solving classification problems. Also, these measures are widely used in information retrieval, machine learning and other domains that involve classification (Olson & Delen, 2008). A confusion matrix is the basis for determining these measures.
Figure 8. Non-cognitive Parameters
Table 2. Output class label

Class (CGPA Score)     Class Label
Less than 5            1
Between 5 and 7        2
Between 7 and 9        3
More than 9            4
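Table 2's mapping can be stated directly in code. The table does not say which class receives a CGPA falling exactly on a boundary (5, 7 or 9), so the half-open intervals below are an assumption:

```python
def cgpa_class(cgpa):
    """Map a Semester-5 CGPA (scale 1-10) to the output class label of Table 2.
    Boundary handling (exact 5, 7, 9) is assumed, not specified in the table."""
    if cgpa < 5:
        return 1
    elif cgpa < 7:
        return 2
    elif cgpa < 9:
        return 3
    else:
        return 4
```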
Confusion Matrix: The confusion matrix can be represented as follows:
Where –
True Positive (TP) = Number of positive instances correctly classified as positive.
False Positive (FP) = Number of negative instances incorrectly classified as positive.
True Negative (TN) = Number of negative instances correctly classified as negative.
False Negative (FN) = Number of positive instances incorrectly classified as negative.
Accuracy: Accuracy indicates the closeness of a predicted or classified value to its real value. The
state of being correct is called Accuracy. It can be calculated as:
Accuracy= (TP+TN)/(TP+TN+FP+FN)
Precision: Precision can be defined as the number of relevant items selected out of the total num-
ber of items selected. It represents the probability that an item is relevant. It can be calculated as:
Precision = TP/(FP+TP)
Precision is the measure of exactness.
Recall: The Recall can be defined as the ratio of relevant items selected to relevant items avail-
able. The recall represents a probability that a relevant item is selected. It can be calculated as:
Recall = TP/(FN+TP)
The recall is the measure of completeness.
F1-Measure: F1-Measure is the harmonic mean between Precision and Recall as described
below:
F1-Measure= 2 * (Precision * Recall) / (Precision +Recall)
It creates a balance between precision and recall. Accuracy may be affected by class imbalance, whereas the F1-measure is far less sensitive to it; hence the F1-measure is used alongside accuracy for evaluating the classification algorithms.
                         Predicted/Classified
                         Negative                 Positive
Actual     Negative      True Negative (TN)       False Positive (FP)
           Positive      False Negative (FN)      True Positive (TP)
Sensitivity: Sensitivity is used to find out the proportion of positive samples that are correctly
identified also called a true positive rate. It is calculated as:
Sensitivity=TP/P
Where,
P = Total Number of Positive Samples
N = Total number of Negative Samples
Specificity: Specificity is used to find out the proportion of negative samples that are correctly
identified and also called a true negative rate. It is calculated as:
Specificity=TN/N
False Positive Rate (FPR): FPR is used to find out the proportion of negative samples that are
misclassified as positive samples. It is calculated as:
FPR=FP/N
False Negative Rate (FNR): FNR is used to find out the proportion of positive samples which are
misclassified as negative samples. It is calculated as:
FNR=FN/P
Negative Predictive Value (NPV): NPV is the proportion of samples predicted negative that are truly negative. It is calculated as:
NPV=TN/(TN+FN)
False Discovery Rate (FDR): FDR is also called an error rate. It is used to find out a proportion
of false positive among all the samples that are classified as positive. It is calculated as:
FDR=FP/(FP+TP)
Matthews’s correlation coefficient (MCC): It is calculated as:
MCC=((TP*TN)-(FP*FN)) / SQRT((TP+FP)(TP+FN)(TN+FP)(TN+FN))
MCC is a balanced measure based on a confusion matrix. This measure is used even if the classes
are of different sizes. It is a correlation coefficient between the actual classes and predicted classes. The
value of MCC lies between -1 to 1. The value near to +1 indicates the prediction is perfect. The value 0
indicates random prediction. The value -1 indicates a total disagreement between the actual and predicted
values. MCC score above zero indicates balanced classification. MCC is a good measure when the data
have varying classes, unbalanced dataset and random data (Jurman, Riccadonna, & Furlanello, 2012).
With F1-score the MCC guides in a better way to determine the suitable algorithm for classification.
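All of the measures above derive from the four confusion-matrix counts, so they can be computed together. A small sketch with illustrative counts (for the multi-class setting of this chapter they would be computed per class and averaged):

```python
import math

def evaluate(tp, fp, tn, fn):
    """Compute the evaluation measures described above from confusion-matrix counts."""
    p, n = tp + fn, tn + fp              # total positive / negative samples
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy":    (tp + tn) / (p + n),
        "precision":   precision,
        "recall":      recall,           # equals sensitivity = TP/P
        "f1":          2 * precision * recall / (precision + recall),
        "specificity": tn / n,
        "fpr":         fp / n,
        "fnr":         fn / p,
        "npv":         tn / (tn + fn),
        "fdr":         fp / (fp + tp),
        "mcc":         (tp * tn - fp * fn)
                       / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }

m = evaluate(tp=40, fp=10, tn=35, fn=15)   # illustrative counts, not the chapter's data
```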
Results
The results for various evaluation measures for the various training percentages are indicated in Figure
9, 10 and 11
The specificity of the hybrid model remains almost constant at 0.85 for all training percentages. The sensitivity score is 0.80. The precision is also almost constant at 0.78, and increases for 60% training. The FPR changes slightly from 0.18 to 0.20, while the FNR stays quite low, from 0.03 to 0.06, for all training percentages. The NPV score is good, with an average value of 0.80. The FDR score is constant at 0.21 across training percentages.
The accuracy graph shows a variation from 69% to 75% for different training percentages. The ac-
curacy is good with 50% training. The accuracy has improved and is better than pure SVM and DBN
for the considered data. The F1-score graph shows no variation in F1-score and it has a score of 0.81.
The MCC score also shows a variation with the values ranging from 0.13 to 0.18. The value of MCC
score is far better than the MCC scores of SVM and DBN. The good and positive MCC score suggests
that the proposed model is better suited for the data under consideration. There is a considerable improve-
ment in MCC score indicating the suitability of the model for the educational data.
Figure 9. Results for Hybrid Model: Sensitivity, Specificity, Accuracy and F1_Score
Figure 10. Results for Hybrid Model: FDR, FNR, NPV and FPR
Figure 11. Results for Hybrid Model: Precision and MCC
It is important to understand if the hybrid model is better than other models. It is necessary to look
at evaluation measures to find the performance and suitability of the hybrid classification method for
the collected educational data.
Table 3 shows the overall performance of the hybrid classification model over other models.
From this, it is observed that the hybrid prediction model is superior to the other methods with respect to all measures. In particular, the specificity of the proposed SVM with Deep Learning model is better than that of DBN and SVM.
The accuracy of the hybrid model is 5.06% and 6.69% higher than that of DBN and SVM respectively. The hybrid model also attained greater precision than the other methods. Similarly, the FPR of the hybrid model is 2.26% and 76.62% lower than that of DBN and SVM respectively.
The F1-Score of the hybrid method is 49.27% and 59.64% better than DBN and SVM. From this analysis, it is evident that the hybrid prediction model is highly efficient compared to the other conventional methods.
The Graph in Figure 12 shows the overall performance of the proposed model. The hybrid model
has better accuracy, F1 score and MCC indicating that the proposed hybrid model created using SVM
and DBN is able to classify the educational data in a better way.
Discussion
Table 4(a), (b), (c) and (d) shows the performance score of the hybrid model for evaluation parameters
Accuracy, F1 Measure, FPR and MCC for different classes. The training percentage is 60%.
Accuracy for the various classes is improved drastically for the hybrid model, mainly for the classes with fewer data samples. The F1 and MCC scores of the hybrid model are also better. The low FPR indicates that the hybrid model's class predictions improve even though the data is imbalanced.
Table 3. Overall performance of the hybrid classification model over other methods

Measure        SVM    DBN    SVM with DBN
Specificity    0.48   0.84   0.87
Sensitivity    0.18   0.78   0.82
Accuracy       0.65   0.70   0.76
Precision      0.36   0.38   0.79
FPR            0.83   0.23   0.19
FNR            0.53   0.17   0.04
NPV            0.18   0.78   0.82
FDR            0.65   0.63   0.22
F1-Score       0.41   0.54   0.82
MCC            0.04   0.09   0.27
The accuracy has increased to 75%. Still, there is scope for improvement: the model can be further optimised to gain better accuracy, in particular by improving its training. The scores of the evaluation measures for the various training percentages indicate that the model can be improved with improved training experience.
Figure 12. Performance of hybrid model over other algorithms
Table 4. Scores of Accuracy, FPR, F1-Score and MCC for different classes for the hybrid model

(a) Accuracy                          (b) FPR
Class  SVM   DBN   SVM-DBN            Class  SVM   DBN   SVM-DBN
1      0.47  0.77  0.77               1      0.75  0.09  0.01
2      0.36  0.76  0.80               2      0.84  0.09  0.01
3      0.37  0.79  0.79               3      0.88  0.09  0.03
4      0.15  0.37  0.43               4      0.09  0.01  0.01

(c) F1-Score                          (d) MCC
Class  SVM   DBN   SVM-DBN            Class  SVM   DBN   SVM-DBN
1      0.50  0.58  0.50               1      0.05  0.17  0.19
2      0.49  0.50  0.50               2      0.05  0.18  0.18
3      0.50  0.53  0.50               3      0.06  0.17  0.21
4      0.03  0.50  0.00               4      0.13  0.17  0.29
SOLUTIONS AND RECOMMENDATIONS
The hybrid DL model gave better performance than the advanced ML models. In such cases, it is always interesting to investigate combinatory models and examine their classification behaviour. The performance of combinatory models may be improved by optimizing the learning of the model.
After evaluating the performance of the hybrid model including Deep Learning, an optimized hybrid model is implemented using the Cuckoo Search optimization method. The optimization is achieved in a better way when Cuckoo Search with Levy Flight is used.
Algorithms
The Optimized model - LCDBN
The input educational data is first given to PCA for dimensionality reduction. The important features extracted are fed to the SVM for a generalized prediction. These predictions, which are not very accurate on their own, are then fed to the DBN for final classification and prediction. Here the training of the DBN model is changed: the RBM units in the DBN model are trained with Cuckoo Search with Levy Flight. Since the training of the DBN is improved using CS with Levy Flight, the model is named LCDBN.
Algorithm 2 depicts the training of the proposed LCDBN model. Here the RBM is trained using the Levy flights of Cuckoo Search: instead of the simple random walk, Levy flights are used to further improve the Cuckoo Search algorithm. In the traditional CS algorithm, the random walk was performed using Gaussian processes.
A Levy flight can be thought of as a random walk where the step size follows a heavy-tailed Levy probability distribution. The weight of the DBN model is updated as shown in Eq. (16): the Levy-search term of the CSA, α · W_old(t), is added to the RBM weight-matrix update. In Eq. (16), W_old refers to Y_i^t of the CSA, Er denotes the error and α indicates the scaling factor.

W = W_{old} + Er\left(s_1 x_1' - Q(s_2 = 1 \mid X_2)\, x_2'\right) + \alpha \cdot W_{old}(t) (16)
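The Levy-flight move can be sketched generically. The step generator below uses Mantegna's algorithm, the standard construction for Levy steps in Cuckoo Search, and the cuckoo loop minimises a toy quadratic; the parameters (alpha, pa, nest count) and the objective are illustrative assumptions, not the chapter's DBN-training setup:

```python
import math
import numpy as np

rng = np.random.default_rng(42)

def levy_step(dim, lam=1.5):
    """Levy-distributed step via Mantegna's algorithm; lam is the Levy
    exponent (the CS parameter lambda tuned in the chapter)."""
    sigma_u = (math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
               / (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    u = rng.normal(0.0, sigma_u, dim)
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / lam)

def cuckoo_search(f, dim=3, n_nests=10, alpha=0.2, pa=0.25, iters=200):
    """Minimise f: Levy-flight moves replace the Gaussian random walk, and a
    fraction pa of the worst nests is abandoned each iteration."""
    nests = rng.uniform(-2.0, 2.0, (n_nests, dim))
    fitness = np.apply_along_axis(f, 1, nests)
    for _ in range(iters):
        i = rng.integers(n_nests)
        cand = nests[i] + alpha * levy_step(dim)   # new solution: W_old + alpha * Levy
        if f(cand) < fitness[i]:
            nests[i], fitness[i] = cand, f(cand)
        worst = np.argsort(fitness)[-int(pa * n_nests):]
        nests[worst] = rng.uniform(-2.0, 2.0, (len(worst), dim))
        fitness[worst] = np.apply_along_axis(f, 1, nests[worst])
    best = nests[np.argmin(fitness)]
    return best, float(fitness.min())

best, val = cuckoo_search(lambda w: float(np.sum((w - 1.0) ** 2)))
```

In the SVM-LCDBN model this Levy move perturbs the RBM weight matrices during training, as in Eq. (16), rather than a toy objective.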
System Model
Figure 13 shows the optimized hybrid model.
The input parameters are split and fed to PCA blocks to reduce the parameters. The output from each PCA is given to an individual SVM for prediction of the class label. The class labels predicted by the SVMs act as the input to the DBN. Using this input and the actual class labels, the DBN predicts the classes for the data.
The DBN is constructed with 2 layers of RBM; each RBM comprises a visible layer and a hidden layer. The RBM layers are constructed with 3 neurons each, and the activation function used is the sigmoid function. The number of input neurons is 3. After the RBM layers, an MLP layer is added for prediction of the class. The MLP layer has 3 neurons and logistic regression is used as the activation function. The output layer has one neuron to predict the class label. The learning of the DBN is optimized using Cuckoo Search with Levy Flight.
The model follows a parallel framework. If the number of features increases in future, more PCA and SVM components can be introduced. The vertical fragmentation suggests the model can easily be adapted to the MapReduce framework (Maitrey & Jha, 2015) for Big Data processing. Horizontal fragmentation is also possible, which further suggests suitability for Big Data applications; here, horizontal fragmentation may issue multiple calls to a single PCA block.
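The parallel framework can be sketched with scikit-learn. Since scikit-learn provides no DBN, a small MLPClassifier stands in for the DBN stage here; the random data, the two-block feature split and the component sizes are illustrative assumptions, not the chapter's configuration:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 12))     # 12 input parameters (illustrative)
y = rng.integers(1, 5, 200)            # class labels 1..4 as in Table 2

# Vertical fragmentation: one PCA + SVM per feature block
svm_labels = []
for Xb in (X[:, :6], X[:, 6:]):
    Zb = PCA(n_components=3).fit_transform(Xb)    # dimensionality reduction
    svm = SVC().fit(Zb, y)
    svm_labels.append(svm.predict(Zb))            # generalized class-label predictions

# The SVM class labels become the features of the final (DBN-stage) classifier
F = np.column_stack(svm_labels)
final = MLPClassifier(hidden_layer_sizes=(3, 3), max_iter=500).fit(F, y)
pred = final.predict(F)
```

The design point is the same as Figure 13: each feature block is compressed and classified independently, and only the resulting class labels are fused by the final deep stage.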
Algorithm 2. Modified RBM learning model

RBMupdate(X1, ε, W, u, c)
This is the RBM update procedure for binomial units; it can be adapted to other kinds of units.
X1 denotes a sample from the training distribution for the RBM
ε denotes the learning rate for stochastic gradient descent in contrastive divergence
W denotes the RBM weight matrix
u denotes the RBM offset vector for input units
c denotes the RBM offset vector for hidden units
Q(s2 = 1|X2) denotes the vector with components Q(s2i = 1|X2)

for the entire hidden units i do
    Evaluate Q(s1i = 1|X1) (for binomial units, sigm(ci + Σj Wij x1j))
    Sample s1i ∈ {0,1} from Q(s1i|X1)
end for
for the entire input units j do
    Evaluate P(x2j = 1|s1) (for binomial units, sigm(uj + Σi Wij s1i))
    Sample x2j ∈ {0,1} from P(x2j = 1|s1)
end for
for the entire hidden units i do
    Evaluate Q(s2i = 1|X2) (for binomial units, sigm(ci + Σj Wij x2j))
end for
Update the weight using Eq. (4)
u ← u + ε(x1 − x2)
c ← c + ε(s1 − Q(s2 = 1|X2))
Results
The overall performance analysis regarding the students' performance prediction using the proposed SVM-LCDBN model is given in Figure 14. From the analysis, better accuracy, specificity, sensitivity, precision, FPR, FNR, FDR, NPV, F1-score and MCC were determined for the adopted scheme, revealing its superiority when compared with other schemes.
From the simulation, the accuracy of the presented approach is 49.32% better than NN, 17.63% better
than SVM, 11.32% better than DBN, 3.5% better than SVM-DBN and 2.33% better than SVM-CDBN
models.
Effect of λ on results
Figure 15 shows the performance of the proposed SVM-LCDBN model for various values of the CS parameter λ. The graph is plotted for a training percentage of 70. One can clearly see that the performance of the proposed model is best for λ=1: the model was tested for various values, and it gives the best scores on the various evaluation parameters for the value 1.
It is also seen that the scores of the various evaluation parameters are good when the training percentage is 70.
Effect of α on Prediction
The scaling parameter is represented by α. The performance of the classifier depends on this parameter, which is why various values of α are tested in the performance evaluation.
Figure 13. System Model
Table 5 describes the effect of the scaling factor on the various performance measures for predicting the students' performance, for a training percentage of 70. Accordingly, the value of α is varied over α=0.2, α=0.4, α=0.6 and α=0.8, and the measures are evaluated.
Figure 14. Performance of Optimized Model over other Algorithms
Figure 15. Effect of λ on Performance of Optimized Model
Discussion
In the DBN, the LC (Levy-flight Cuckoo search) model was adopted for weight computation. Consequently, the prediction model SVM-LCDBN was proposed, which makes a deep connection with the hybrid classifier to obtain a more precise output. The proposed SVM-LCDBN model was then compared with traditional schemes, and the following results were attained.
From the simulation, the accuracy of the presented approach was 17.63% better than SVM, 11.32%
better than DBN, 3.5% better than SVM-DBN and 2.33% better than SVM-CDBN models. Thus better
enhancements were obtained by the proposed SVM-LCDBN model for predicting the student’s performance.
Table 6 shows the scores of the evaluation measures Accuracy, F1-Score, MCC and FPR for the optimised model, calculated for λ=1 and α=0.2. The accuracy percentage for a class (for example, CGPA < 5) indicates the percentage of the total samples in that class that the model classifies correctly.
Table 5. Effect of α on prediction (SVM-LCDBN)

Parameter      α=0.2   α=0.4   α=0.6   α=0.8
Specificity    0.88    0.86    0.80    0.80
Sensitivity    0.86    0.86    0.83    0.88
Accuracy       0.79    0.75    0.75    0.72
Precision      0.80    0.73    0.86    0.77
FPR            0.19    0.15    0.10    0.13
FNR            0.03    0.15    0.20    0.30
NPV            0.82    0.86    0.83    0.88
FDR            0.15    0.28    0.15    0.24
F1-Score       0.83    0.82    0.85    0.85
MCC            0.30    0.24    0.23    0.19
Table 6. Evaluation measures for the model SVM-LCDBN

Class                          Accuracy   F1-Score   MCC    FPR
CGPA Score < 5                 76         0.79       0.10   0.11
CGPA Score between 5 and 7     77         0.81       0.23   0.11
CGPA Score between 7 and 9     80         0.85       0.24   0.12
CGPA Score more than 9         78         0.74       0.13   0.11
Overall Score                  79         0.83       0.28   0.13
The results show that the scores are improved for all classes. The accuracy and MCC scores indicate that the optimised model is better; the model can be used to predict the class label effectively. The motivation behind the research work is to predict the performance of students at an early stage. The implemented model is able to predict students' performance by identifying the appropriate class label with good accuracy. The improved MCC score further suggests that the model is suitable for the educational data.
FUTURE RESEARCH DIRECTIONS
The chapter represents one of the ways to analyse the Educational Data using an optimised hybrid model.
There are many other ways to work in the area of EDM and DL together. Some improvement in the DL
model is also beneficial to improve the accuracy of prediction in Educational Domain.
The model considered here is a hybrid model for classification using SVM and DBN, applied to performance prediction. There are many other tasks in EDM, such as course recommendation, where such hybrid models may be effective. SVM (a discriminative model) and DBN (a generative model) are each applicable in many domains.
The performance of the DBN can be further improved if training is improved. There are many opti-
mization techniques which can be combined to improve the training of the DBN. It is interesting to find
out how other optimization techniques will be beneficial to improve the accuracy of the model.
CONCLUSION
A Deep Learning model for the Performance Prediction of Students in Educational Information System
is implemented. The work started with the motivation to implement a hybrid model with better accuracy
in Educational Domain and extended to optimised hybrid model.
ML and DL are the fields of Artificial Intelligence where algorithms learn by themselves. These
algorithms can be applied in many emerging areas where they may be effective. These algorithms are
found useful in many areas with an increase in data size. The main aim of using DL model is to increase
accuracy.
Before devising the hybrid model, ML algorithms are applied to the data collected from the educa-
tional domain. The ML algorithms like SVM and pure DL algorithm - DBN are applied on the collected
data. Additional evaluation measures are used to test the algorithms. Balanced evaluation measure like
MCC particular for the ML domain is used with traditional F1-score. The results show that pure DL
and advanced ML algorithms give similar accuracy. Hence a new hybrid model for performance prediction of students, aimed at better accuracy, is implemented.
Optimization is the technique of improving the accuracy of the model by tuning the learning weights. There are many popular metaheuristic techniques, but Cuckoo Search optimization is chosen because of its many advantages.
A new students’ performance prediction hybrid model is proposed and is improved by using the
Cuckoo Search with Levy flight optimization technique. The proposed model uses a new hybridized
classifier to predict the performance, which hybridizes the SVM and DBN classifiers. The data is trained by the SVM; as its tuning alone yields inaccurate predictions, the resultant class labels from the SVM are considered as the features to the DBN, which performs the classification. The performance of the
proposed prediction model is further improved by optimizing the training of the DBN. From the results,
it was evident that the results of the optimized hybrid prediction model are better than traditional ML,
advanced ML and pure DL algorithms.
REFERENCES
Alpaydin, E. (2004). Introduction to Machine Learning. Cambridge: MIT Press.
Alwaisi, S., & Baykan, O. K. (2017). Training Of Artificial Neural Network Using Metaheuristic Algo-
rithm. International Journal of Intelligent Systems and Applications in Engineering.
Ashtari, S., & Eydgahi, A. (2017). Student perceptions of cloud applications effectiveness in higher
education. Journal of Computational Science, 23, 173–180. doi:10.1016/j.jocs.2016.12.007
Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends in Machine Learning,
2(1), 1–127. doi:10.1561/2200000006
Buniyamin, N., Mat, U. b., & Arshad, P. M. (2015, November). Educational Data Mining for Predic-
tion and Classification of Engineering Students Achievement. In IEEE 7th International Conference on
Engineering Education (ICEED), (pp. 49-53). Kanazawa: IEEE.
Chen, X., Vorvoreanu, M., & Madhavan, K. (2013). Mining Social Media Data for Understanding Stu-
dents’ Learning Experiences. IEEE Transactions on Learning Technologies.
Chui, K. T., Fung, D. C., Lytras, M. D., & Lam, T. M. (2017). Predicting at-risk university students in a
virtual learning environment via a machine learning algorithm. Computers in Human Behavior.
Costa, E., Fonseca, B., Santana, M. A., Araújo, F., & Rego, J. (2017). Evaluating the effectiveness of
educational data mining techniques for early prediction of students’ academic failure in introductory
programming courses. Computers in Human Behavior, 73, 247–256. doi:10.1016/j.chb.2017.01.047
David, O. E., & Netanyahu, N. S. (2015, July). DeepSign: Deep Learning for Automatic Malware Detec-
tion. International Joint Conference on Neural Networks (IJCNN), 1-8. 10.1109/IJCNN.2015.7280815
Davis, D. (1998). The virtual university: A learning university. Journal of Workplace Learning, 10(4),
175–213. doi:10.1108/13665629810213935
Deng, L., & Yu, D. (2014). Deep Learning for Signal and Information Processing. Redmond, WA:
Microsoft Research.
Deng, Y., Ren, Z., Kong, Y., Bao, F., & Dai, Q. (2017). A Hierarchical Fused Fuzzy Deep Neural
Network for data classification. IEEE Transactions on Fuzzy Systems, 25(4), 1006–1012. doi:10.1109/
TFUZZ.2016.2574915
Gao, N., Gao, L., Gao, Q., & Wang, H. (2014, November). An Intrusion Detection Model Based on
Deep Belief Networks. Second International Conference on Advanced Cloud and Big Data, 247-252.
Giannakos, M. N., Aalberg, T., Divitini, M., Jaccheri, L., Mikalef, P., Pappas, I. O., & Sindre, G. (2017,
April). Identifying Dropout Factors in Information Technology Education: A Case Study. IEEE Global
Engineering Education Conference (EDUCON), 1187-1194. 10.1109/EDUCON.2017.7942999
Gobert, J. D., Kim, Y. J., Pedro, M. A., Kennedy, M., & Betts, C. G. (2015). Using educational data
mining to assess students’ skills at designing and conducting experiments within a complex systems
microworld. Thinking Skills and Creativity, 18, 81–90. doi:10.1016/j.tsc.2015.04.008
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
Guarín, C., Guzmán, E., & González, F. (2015). A Model to Predict Low Academic Performance at a
Specific Enrollment Using Data Mining. IEEE Journal of Latin-American Learning Technologies, 10(3).
Guo, B., Zhang, R., Guang, X., Shi, C., & Yang, L. (2015, July). Predicting Students performance in
educational data mining. International Symposium on Educational Technology (ISET), 125-128. 10.1109/
ISET.2015.33
Han, J., & Kamber, M. (2012). Data Mining: Concepts and Techniques (3rd ed.). Morgan Kaufmann.
Hijazi, S. T., & Naqvi, S. R. (2006). Factors Affecting Students’ Performance. Bangladesh e-Journal
of Sociology, 3.
Jurman, G., Riccadonna, S., & Furlanello, C. (2012). A comparison of MCC and CEN error measures
in multi-class prediction. PLoS One, 7(8), e41882. doi:10.1371/journal.pone.0041882 PMID:22905111
Kim, S. M. L., & Shen, J. (2015, July). A novel deep learning by combining discriminative model
with generative model. International Joint Conference on Neural Networks (IJCNN), 1-6. 10.1109/
IJCNN.2015.7280589
Koch, F., Assunção, M. D., Cardonha, C., & Netto, M. A. (2016). Optimising resource costs of cloud com-
puting for education. Future Generation Computer Systems, 55, 473–479. doi:10.1016/j.future.2015.03.013
Kuwata, K., & Shibasaki, R. (2015, July). Estimating crop yields with deep learning and remotely
sensed data. International Geoscience and Remote Sensing Symposium (IGARSS), 858-861. 10.1109/
IGARSS.2015.7325900
Maćkiewicz, A., & Ratajczak, W. (1993). Principal Components Analysis (PCA). Computers & Geosci-
ences, 19(3), 303–342. doi:10.1016/0098-3004(93)90090-R
Maitrey, S., & Jha, C. K. (2015). MapReduce: Simplified Data Analysis of Big Data. Procedia Computer
Science, 57, 563–571. doi:10.1016/j.procs.2015.07.392
Mondal, P. (n.d.). 7 Important Factors that May Affect the Learning Process. Retrieved from http://
www.yourarticlelibrary.com/learning/7-important-factors-that-may-affect-the-learning-process/6064/
Mushtaq, I., & Khan, S. N. (2012). Factors Affecting Students’ Academic Performance. Global Journal
of Management and Business Research, 12(9).
Najafabadi, M. M., Villansutre, F., Khoshgoftaar, T. M., Seliya, N., Wald, R., & Muharemagic, E. (2015).
Deep learning applications and challenges in big data analytics. Springer Open Journal of Big Data.
doi:10.1186/s40537-014-0007-7
Nguyen, K., Fookes, C., & Sridharan, S. (2015, September). Improving Deep Convolutional Neural
Networks with Unsupervised Feature Learning. IEEE International Conference on Image Processing
(ICIP), 2270-2271. 10.1109/ICIP.2015.7351206
Olson, D. L., & Delen, D. (2008). Advanced data mining techniques (1st ed.). Springer Publishing
Company.
Phung, L. T., Chau, V. T., & Phung, N. H. (2015, November). Extracting Rule RF in Educational Data
Classification from a Random Forest to Interpretable Refined Rules. International Conference on Ad-
vanced Computing and Applications (ACOMP), 20-27. 10.1109/ACOMP.2015.13
Pradeep, A., & Das, S., & J, J. (2015). Students Dropout Factor Prediction Using EDM Techniques.
International Conference on Soft-Computing and Network Security. 10.1109/ICSNS.2015.7292372
Punlumjeak, W., & Rachburee, N. (2015, October). A Comparative Study of Feature Selection Tech-
niques for Classify Student Performance. 7th International Conference on Information Technology and
Electrical Engineering (ICITEE), 425-429. 10.1109/ICITEED.2015.7408984
Rajanna, A. R., Aryafar, K., Shokoufandeh, A., & Ptucha, R. (2015). Deep Neural Networks: A Case
Study for Music Genre Classification. 14th International Conference on Machine Learning and Applications (ICMLA), 655-660. 10.1109/ICMLA.2015.160
Shiau, W.-L., & Chau, P. Y. (2016). Understanding behavioral intention to use a cloud computing
classroom: A multiple model comparison approach. Information & Management, 53(3), 355–365.
doi:10.1016/j.im.2015.10.004
Shoukat, A. (2013). Factors Contributing to the Students’ Academic Performance: A Case Study of
Islamia University Sub-Campus. American Journal of Educational Research, 283–289.
Sukhija, K., Jindal, D. M., & Aggarwal, D. N. (2015, October). The Recent State of Educational Data
Mining: A Survey and Future Visions. IEEE 3rd International Conference on MOOCs, Innovation and
Technology in Education, 354-359. doi: 10.1109/MITE.2015.7375344
Suryawan, A., & Putra, E. (2016). Analysis of Determining Factors for Successful Student’s GPA
Achievement. 11th International Conference on Knowledge, Information and Creativity Support Systems
(KICSS), 1-7.
Tzortzis, G., & Likas, A. (2007, October). Deep Belief Networks for Spam Filtering. IEEE International
Conference on Tools with Artificial Intelligence (ICTAI 2007), 306-309.
Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., & Manzagol, P.-A. (2010). Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion. Journal of Machine Learning Research, 11, 3371–3408.
Vora, D., & Iyer, K. (2017). A Survey of Inferences from Deep Learning Algorithms. Journal of Engineering and Applied Sciences, 12(SI), 9467-9472.
Deep Learning in Engineering Education
Vora, D., & Kamatchi, R. (2018). EDM Survey of Performance Factors and Algorithms Applied.
International Journal of Engineering & Technology, 7(2.6), 93-97.
Wiering, M., Schutten, M., Millea, A., Meijster, A., & Schomaker, L. (2013, October). Deep Learning
using Linear Support Vector Machines. International Conference on Machine Learning: Challenges in
Representation Learning Workshop.
Wiering, M. A., Schutten, M., Millea, A., Meijster, A., & Schomaker, L. (2016). Deep Support Vector Machines for Regression Problems. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.718.987&rep=rep1&type=pdf
Xing, W., Guo, R., Petakovic, E., & Goggins, S. (2015). Participation-based student final performance
prediction model through interpretable Genetic Programming: Integrating learning analytics, educational
data mining and theory. Computers in Human Behavior, 47, 168–181. doi:10.1016/j.chb.2014.09.034
Yang, X.-S., & Deb, S. (2014). Cuckoo Search: Recent Advances and Applications. Neural Computing & Applications, 24(1), 169–174. doi:10.1007/s00521-013-1367-1
Yuan, Y., Zhang, M., Luo, P., Ghassemlooy, Z., Lang, L., Wang, D., ... Han, D. (2017). SVM-based
detection in visible light communications. Optik (Stuttgart), 151, 55–64. doi:10.1016/j.ijleo.2017.08.089
ADDITIONAL READING
Asif, R., Merceron, A., Ali, S. A., & Haider, N. (2017, October). Analyzing undergraduate students’ performance using educational data mining. Computers & Education, 113, 177–194. doi:10.1016/j.compedu.2017.05.007
Baker, R., & Siemens, G. (2013). Educational Data Mining and Learning Analytics. Cambridge Handbook of the Learning Sciences.
Echegaray-Calderon, O. A., & Barrios-Aranibar, D. (2015, October). Optimal selection of factors using Genetic Algorithms and Neural Networks for the prediction of students’ academic performance. IEEE Latin America Congress on Computational Intelligence (LA-CCI), 1-6.
Fu, J., Chang, J., Huang, Y., & Chao, H. (2012). A Support Vector Regression-Based Prediction of Students’ School Performance. International Symposium on Computer, Consumer and Control. 10.1109/IS3C.2012.31
Guo, X., Huang, H., & Zhang, J. (2014). Comparison of Different Variants of Restricted Boltzmann Machines. 2nd International Conference on Information Technology and Electronic Commerce (ICITEC). 10.1109/ICITEC.2014.7105610
KEY TERMS AND DEFINITIONS
Cognitive Factors: Characteristics of the student that have a direct effect on the student’s learning and performance.
Educational Data Mining: Tools and techniques to extract meaningful patterns from educational data.
Non-Cognitive Factors: Characteristics of the student that do not directly affect learning and performance but may influence them indirectly.
Optimization: The technique of improving a model’s accuracy by tuning its learning weights.
Predictive Analytics: Exploration of data to predict future outcomes using methods such as statistics and machine learning.
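The Optimization entry above can be made concrete with a minimal sketch: tuning a single learning weight to reduce prediction error on toy data. Note that this sketch uses plain gradient descent for simplicity, not the cuckoo search with Levy flight employed in the chapter; the data and variable names are purely illustrative.

```python
# Minimal sketch of optimization as weight tuning (illustrative only):
# gradient descent fits one weight w of a linear model y = w * x
# to toy data generated by the true relation y = 2x.

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # hypothetical (x, y) pairs

w = 0.0    # initial learning weight
lr = 0.05  # learning rate

for _ in range(200):
    # gradient of the mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad  # weight update: step against the gradient

print(round(w, 3))  # → 2.0, the weight that minimises the error
```

Metaheuristics such as cuckoo search pursue the same goal (finding weights that minimise error) but explore the weight space with randomised candidate solutions instead of following a gradient, which lets them escape local minima.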