PREDICTING UNIVERSITY DROPOUT BY USING CONVOLUTIONAL
NEURAL NETWORKS
Mauro Mezzini, Gianmarco Bonavolontà, Francesco Agrusti
Roma Tre University, Department of Education (ITALY)
Abstract
Current trends in graduation rates show that 39% of young adults on average across OECD countries
are expected to complete tertiary-type A (university level) education during their lifetime. According to
Eurostat in 2017, an average of 10.6% of young people (aged 18-24) in the EU-28 were early leavers
from education and training. Over 3 million young people in the European Union had been to university
or college but had discontinued their studies at some point in their life, according to a survey of 2016.
Therefore, the dropout level could potentially represent one of the major issues to be faced in the near
future in the European Union.
The main aim of the research is to predict, as early as possible, which kind of student is more likely to
drop out of Higher Education (HE). This information would allow one to carry out effective targeted
actions in order to limit the incidence of the phenomenon.
Today, Artificial Intelligence (AI) is being employed to replace repetitive human activities, e.g. in the
autonomous driving field or for the image classification task. In these areas AI competes with humans
with quite satisfactory results and, in the case of HE dropout, it is extremely unlikely that an expert
teacher would be able to "predict" a student's educational success based only on the data provided by
administrative offices.
The recent breakthrough on Neural Networks with the use of Convolutional Neural Networks (CNN)
architectures has become disruptive in AI. By stacking together tens or hundreds of convolutional neural
layers, a deep network structure is obtained, which has been proved very effective in producing high
accuracy models.
In this study the administrative data of approximately 6,000 students enrolled from 2009 onwards in the
Department of Education at Roma Tre University were used to train the CNNs. Then, the trained network
provides a probabilistic model that indicates, for each student, the probability of dropping out. We used
several types of state-of-the-art CNNs, and their variants, in order to build the most accurate model for
the dropout prediction. The accuracy of the obtained models ranged from 67.1% for the students at the
beginning of the first year up to 88.7% for the students at the end of the second year of their academic
career. With the use of more data, for example students’ career data, we could develop more accurate
dropout prediction models.
Keywords: university dropout; convolutional neural networks; artificial intelligence
1 INTRODUCTION
There are several variables that influence the decision of students to leave their studies at university
level [1]. This phenomenon is known as college dropout and it has been defined by Larsen and other
researchers [2] as the “withdrawal from a university degree program before it has been completed” (p.
18). This notion also includes dropout from single courses of study, but not withdrawals due to
pregnancy, illness, etc. Higher Education (HE) student dropout has several negative effects: it has
consequences on a personal level and on a family level and, from a systemic point of view, low
completion rates can lead to a skills bottleneck, with consequences on the economic and social level
that decrease competitiveness, innovation and productivity.
A comparative study on higher education dropout and completion in Europe found that study success
is considered important in 28 out of 35 participating countries [3]. Early recognition of the dropout
phenomenon is the prerequisite for reducing dropout rates: several studies highlight the importance of
monitoring students' individual and social characteristics, since these have a strong impact on students'
probability of success in HE. Therefore, a strategic goal of the Europe 2020 strategy is to reduce
university dropout, with the aim of having at least 40% of 30-34-year-olds complete higher education [3].
As reported in the literature, students generally leave Higher Education institutions (HEI) during their
first year of college [4,5], just after upper secondary school: in this period, they have to develop their
sense of responsibility and self-regulation [6,7]. Individual skills and dispositions are investigated in
several psychological and pedagogical models in relation to the early dropout phenomenon, in terms
of personality characteristics [8,9]. The impact of students' economic and social status (e.g. race or
income) and of the organizational services provided to students by HEIs (e.g. faculty-student ratios) has
also been explored in numerous studies [9,10,11,12,13].
For decades, one of the most used and discussed models has been Tinto's student integration model,
which underlines the importance of students' academic and social integration in forecasting the dropout
phenomenon [14,15,16]. Another major model is the one proposed by Bean [17], the student attrition
model, based on attitude-behaviour interactions [18]. In all these models, the relation between
students and institutions is of crucial importance in reducing dropout rates [19,20], and several variables
have been identified to improve student retention [2,21].
In Italy, due to the very high rates of university student dropout [22], several specific studies have been
conducted [23,24,25].
Relatively recent advances in neural networks (NN) have shown that Artificial Intelligence (AI) may be
able to compete with (or even surpass) human ability in the tasks of classification and recognition.
Some of the most relevant applications of AI to dropout prediction are presented below.
2 RELATED WORKS
Several research projects using data mining to predict or discover patterns of dropout have been
developed. In this section, we therefore discuss previous works that investigated university student
dropout and performance prediction using Educational Data Mining (EDM) techniques, a relatively new
approach that applies computerized methods to analyse collections of educational data that would
otherwise be impossible to examine due to their enormous volume [26,27].
Decision Tree (DT) algorithms are widely used for the classification problem of predicting university
dropout. A research project conducted at the University of Chittagong (Bangladesh) investigated whether
enrolment data alone can be used to predict the study outcome of newly enrolled students [28]. The models
were developed with the Classification And Regression Tree (CART) and Chi-squared Automatic Interaction
Detector (CHAID) algorithms and evaluated using cross-validation and misclassification errors to decide
which model outperforms the others in terms of classification accuracy.
Another research project that aimed to identify patterns of student dropout from 6870 records and 62
attributes (data of students of cohorts 2004, 2005 and 2006) belonging to socio-economic, academic,
and institutional data using DT (J48 algorithm), was conducted in Latin America and was funded by the
Ministry of Education in Colombia and counterpart funds from the University of Nariño and CESMAG
University Institution [29]. The cross-validation method was used to evaluate the quality and prediction
accuracy of the discovered patterns, with a resulting confidence greater than 80%.
Similarly, a research project conducted in India developed an improved DT based on ID3 able to
predict which university students will drop out [30]. A dataset of 240 samples with 32 variables, collected
randomly through a survey at the university, was used for this study. The performance of the two models
(ID3, and ID3 improved by using Renyi entropy) was evaluated using accuracy, precision, recall and
F-measure. The result shows an accuracy of 97.50% for the improved ID3 algorithm.
In 2018 a study presented a classification based on DT (the C4.5 algorithm) with optimized parameters to
predict university student dropout [31]. The study analysed 5288 cases of students belonging
to a Chilean public university (cohorts of students from 44 undergraduate programs in the areas
of humanities, arts, education, engineering, and health). The attributes selected for the analysis were
related to the demographic variables of the student prior to admission to university, to their economic
situation and to data on academic performance. The result is an accuracy of 87.27%.
Other researchers used specific methodologies, like CRISP-DM (Cross Industry Standard Process for
Data Mining), to develop a model that can predict, at the end of the first semester, students at risk of
dropout. The model types used by this research are DT, Artificial Neural Network (ANN) and Logistic
Regression (LR), with a dataset spanning 8 years, from 1999 to 2006 (over 25,000 students
and 39 variables for each student), and an accuracy of 81.19% for the ANN model [32].
Another study used an ANN for detecting students at risk of dropout [33]. The population consists of 810
students enrolled for the first time in a health care professions degree course at the University of Genoa
in the academic year 2008-09; the data came from administrative sources, an ad hoc survey and
telephone interviews. Other studies carried out comparative analyses of several classification methods
in order to develop models to predict student dropout.
An example is the work conducted at the Colleges of Technology of the Federal Institute of Education,
Science and Technology of Mato Grosso [34]. It presented a model using a Fuzzy-ARTMAP Neural
Network and only the enrolment data collected over a seven-year period, from 2004 to 2011. The results
show an accuracy rate of over 85%.
At the Universidade Federal do Rio de Janeiro, in Brazil, a research project was conducted whose goal
was to predict the risk of students dropping out of their undergraduate education [35]. The study compared
DT, SimpleCart (SC), Support Vector Machine (SVM), Naïve Bayes (NB), and ANN, and the dataset used
came from 14,000 students. The SVM classifier presented the highest true positive rate for all datasets
used in the experiments.
Another study, at the Budapest University of Technology and Economics [36], used data of 15,285
undergraduate students regarding both their secondary school and university performance, and employed
and evaluated several algorithms (DT, Random Forest, Gradient Boosted Trees, LR, Generalized Linear
Model, Deep Learning) to identify students at risk of dropout. The accuracy, recall, precision and AUC
(area under the curve) of the ROC were used to evaluate the models, and the results highlighted that the
best model was the one developed with Deep Learning, with an accuracy rate of 73.5%.
A similar work [37] used five classification algorithms (LR, Gaussian Naive Bayes, SVM, Random Forest
and Adaptive Boosting) to predict dropout on data from 4432 students of the degree studies in Law,
Computer Science and Mathematics of the University of Barcelona between the years 2009 and 2014. It
found that all the machine learning algorithms reached an accuracy of around 90%.
Another model to predict dropout was developed at the Instituto Tecnológico de Costa Rica using
Random Forest, SVM, ANN and LR [38]. The dataset used in this study was gathered from 16,807 students
enrolled between 2011 and 2016. The results showed that the best algorithm for classifying dropouts
was Random Forest.
The studies mentioned above use different data, algorithms, performance metrics and methodologies;
due to this heterogeneity it is not possible to say that one model is better than another, but all
the studies confirm the effectiveness of the data mining approach for analysing and predicting university
dropout. The key difference between these related studies and our approach is that we apply
Convolutional Neural Networks (CNNs), originally introduced to classify images, to the analysis of
educational data.
3 METHODOLOGY
One of the most important problems in the field of AI is the classification problem [39]. In the classification
problem we have an object, which can be an image, a sound or a written sentence, and we want to
associate to each object a class taken from a finite set of predefined classes. For example, in the
image classification problem the solution consists in associating to each image a proper class according
to some interpretation rule. In this case, a natural interpretation rule would be to label each image with
the subject it contains. Put in more mathematical terms, if we represent each object as an
$n$-dimensional vector of real numbers $x \in \mathbb{R}^n$, the solution of the classification problem consists in
finding a function $c$ that associates to each object $x$ its class $c(x)$.
A NN can in fact be viewed as a function $N$ that takes as input an $n$-dimensional vector $x$ and produces
a value $N(x)$, called a prediction on $x$. The prediction is correct when $N(x) = c(x)$ and incorrect otherwise.
To obtain a NN that produces correct predictions, the NN must undergo a training process. It
consists of feeding the NN with a set of objects, called the training set and denoted as $\{(x_i, c(x_i))\}_{i=1}^{m}$,
where $m$ is the number of elements of the training set. The class $c(x_i)$ of each object in the
training set is already known. For each object $x_i$ in the training set, the value $c(x_i)$ is compared to the
prediction $N(x_i)$ of the NN. If the prediction $N(x_i)$ differs from the class $c(x_i)$, the NN is
modified according to some optimization rule [40] in order to correct the error. This process is
repeated hundreds of times over all elements of the training set, or until no improvement in the error
rate is achieved. The error rate is the fraction of incorrect predictions out of the
total number of objects in the training set. This process is also called supervised learning, since it is
similar to the training process that is usually employed with humans or animals.
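As a minimal illustration of the error rate defined above (a sketch of our own; the array contents are illustrative and not taken from the paper's code), the fraction of incorrect predictions can be computed as follows:

```python
import numpy as np

def error_rate(predictions, classes):
    """Fraction of objects whose predicted class N(x) differs from the known class c(x)."""
    predictions = np.asarray(predictions)
    classes = np.asarray(classes)
    return float(np.mean(predictions != classes))

# Example: 2 wrong predictions out of 5 objects -> error rate 0.4
print(error_rate([1, 0, 1, 1, 0], [1, 1, 1, 0, 0]))
```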
Among the different types of NN, Convolutional Neural Networks (CNN) have gained much popularity
recently, when cutting-edge breakthroughs were obtained in the image classification task [41]. In
their work, Krizhevsky et al. [41] trained a deep convolutional model on the 1.2 million images of the
Imagenet challenge training set, with 1000 different classes, reducing the error by almost 50% with
respect to the previous state of the art in image classification. Since then, much research and many
advances on CNNs have been accomplished.
In a classical NN the input object $x$ is fed, at the beginning, to a set of neurons, called the first layer.
Every neuron $j$ receives in input a copy of the object $x$, then performs some computation using an
$n$-dimensional vector $w_j$ of weights and gives in output a real number called an activation.
The output of a first layer of $k$ neurons can therefore be seen as a $k$-dimensional vector of reals, which
can in turn be fed to a second layer of neurons. In this way layers of neurons can be stacked
together to form a more complex and powerful network. The problem is that the number of weights of
the network, and thus the memory and computational resources required, can be very high as the number of
layers and the number of neurons per layer increase. For example, consider an image whose resolution
is quite small even for outdated smartphones: its numerical encoding still results in a vector with a very
large number of components, so that with just a few neurons in the first layer the number of weights for
the first layer alone already reaches 3 million. A key difference
between a CNN and a classical NN is that not all components of the input object are fed to each neuron, but
only a portion of them [42]. In this way the number of weights of each neuron can be reduced drastically
while, at the same time, each layer can still produce a very large number of activations in output.
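To make the weight-count argument concrete, the following sketch compares the number of parameters of a small fully connected layer on a flattened image with that of a convolutional layer on the same image. It is our own illustration: the 1000x1000 RGB image size and the layer sizes are assumptions chosen for the example, and only the order of magnitude matters here.

```python
import tensorflow as tf

# Fully connected: 3 neurons, each connected to every component of a
# flattened 1000x1000x3 image -> 3 * 3,000,000 weights (plus 3 biases).
dense = tf.keras.Sequential([
    tf.keras.layers.Dense(3, input_shape=(1000 * 1000 * 3,)),
])
print(dense.count_params())  # 9000003

# Convolutional: 32 filters of size 3x3; each neuron sees only a 3x3x3
# patch of the image -> 3*3*3*32 weights (plus 32 biases).
conv = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=3, input_shape=(1000, 1000, 3)),
])
print(conv.count_params())  # 896
```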
We employed three different CNN architectures in order to test their effectiveness for our predictive
model. The first two architectures represent the state of the art of CNNs and performed the best, or among
the best, against industrial benchmarks as of 2017 [43, 44]. The third architecture was built by
us by making several modifications to the ResNet [43] and VGG [45] architectures.
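The paper does not detail the layer configurations of the three architectures, so the following is only a rough, hypothetical sketch of the kind of adaptation described: a small ResNet-style residual block [43] applied to a 1D encoding of the student record instead of an image. All layer sizes and names are our own illustrative choices, not the authors' actual models.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block_1d(x, filters, kernel_size=3):
    """ResNet-style block [43] adapted to 1D input (illustrative only)."""
    shortcut = x
    y = layers.Conv1D(filters, kernel_size, padding="same", activation="relu")(x)
    y = layers.Conv1D(filters, kernel_size, padding="same")(y)
    if shortcut.shape[-1] != filters:  # project the shortcut so the channel counts match
        shortcut = layers.Conv1D(filters, 1, padding="same")(shortcut)
    return layers.Activation("relu")(layers.Add()([y, shortcut]))

def build_model(num_fields, num_classes=2):
    """Tiny residual CNN over the vector of administrative fields (hypothetical sizes)."""
    inputs = tf.keras.Input(shape=(num_fields, 1))  # one channel per encoded field value
    x = residual_block_1d(inputs, filters=32)
    x = residual_block_1d(x, filters=64)
    x = layers.GlobalAveragePooling1D()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```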
We collected, from the administration office of Roma Tre University (R3U), a dataset of students enrolled
in the Department of Education (DE). The years of enrollment range from 2009 to 2014, for a total of
6078 students. Of these, 649 students were still active at the time we acquired the dataset (August 2018),
that is, they were still in the course of their studies, while the remaining 5429 had closed the course of
their studies, either because they graduated, because they dropped out or for other reasons explained
later. We refer to this set of students as the non-active students. The administrative rules of R3U
establish a time limit of 9 years for the completion of the studies, that is, a student can be enrolled for
up to 9 years. If a student does not graduate within this time, the course of studies of the student is
closed without graduation. We regarded these cases in the same manner as those in which the student
dropped out.
In general, each non-active student is classified in one of three different classes: Graduated,
Dropout, Other. The class Other, totaling 118 students, contains students who neither dropped out
nor graduated in the DE, for example students changing faculty within R3U or moving to another
university. The number of graduated students is 2833, while the number of students who dropped out is
2478.
We obtained, from the R3U administrative office, most of the administrative fields available for all
students. Table 1 reports the list of administrative fields that were used.
Table 1. List of administrative attributes.

List 1 (attributes that do not change during the academic career): Year of beginning of studies; Year of
birth; Gender; Country of birth; High school type; High school exit score; High school maximum exit
score; Year ending high school; Transferred from other university; CFU from other university; Faculty.

List 2 (attributes that may change from year to year): Academic year; Course code; Course name;
Course year; Family income class; Working status; Exemption from taxes; Type of exemption from
taxes; Handicap; Part time status; Part time CFU; Type of renew of enrolment.
Due to privacy reasons, some information, even though available, was not disclosed. In fact,
it was not possible to know the city of birth or the city of residence of the students. Other attributes, such
as the working status, were not collected systematically and accurately by the administrative office: in
the dataset we found only a few students with the status of worker, while it is well known that a
considerable number of students enrolled in the R3U DE already work as educators.
Note that the values of the attributes in List 2 of Table 1 may change for the same student from year to
year during his/her academic career, while the values of the attributes in List 1 do not change during
the whole academic career.
In order to construct the training set, we need to associate to each student a numeric representation.
All the domains of the dataset are converted, using an arbitrary bijective function, to a
nonnegative integer domain. For example, the domain of the field gender, which consists of the two strings
{"male", "female"}, was converted to the domain {0, 1}, where 0 corresponds to "male" and 1 to "female".
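As an illustration of this encoding step (a sketch with a made-up column, not the authors' actual code), each categorical field can be mapped to consecutive nonnegative integers, for example with pandas:

```python
import pandas as pd

students = pd.DataFrame({"gender": ["male", "female", "female", "male"]})

# Build an arbitrary bijective mapping from each distinct value to an integer code.
codes, categories = pd.factorize(students["gender"])
students["gender"] = codes           # e.g. "male" -> 0, "female" -> 1
print(dict(enumerate(categories)))   # {0: 'male', 1: 'female'}
```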
We create a table student whose schema S (that is, its field list) contains all fields of List 1 of Table 1.
For each field f of List 2, we added to S nine fields, denoted as $f_y$ where $y = 0, \dots, 8$, that is, one for each
of the possible 9 years in which the student can be enrolled. If a student ends her/his career at year $t$,
then $f_y$ is set equal to a fixed placeholder value for all $y > t$. This value, which was chosen arbitrarily,
should be considered as a NULL value and does not appear in the domain of any field in the dataset.
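A hedged sketch of how such a "wide" record can be built (the field names, the example values and the placeholder standing in for the NULL value are illustrative assumptions; the paper does not report the actual placeholder):

```python
FILL = -1        # stands in for the paper's unreported NULL placeholder
MAX_YEARS = 9    # administrative limit on the length of a career

def spread_over_years(record, field, yearly_values, career_length):
    """Spread a List 2 field over 9 per-year columns, padding the years after the career ends."""
    for y in range(MAX_YEARS):
        record[f"{field}_{y}"] = yearly_values[y] if y < career_length else FILL
    return record

record = {"gender": 1, "high_school_exit_score": 85}   # List 1 fields (illustrative)
record = spread_over_years(record, "course_year", [1, 2, 2], career_length=3)
```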
We partitioned the set of non-active students into three mutually disjoint sets. We randomly chose
4532 students to form the training set, 450 students to form the validation set and the remaining 447 to
form the test set. We explain the use of these sets in detail in the following.
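A minimal sketch of the random partition (the seed is our own choice; the sizes are those reported above):

```python
import numpy as np

rng = np.random.default_rng(seed=0)     # seed chosen only for reproducibility of the sketch
indices = rng.permutation(5429)         # one index per non-active student
train_idx = indices[:4532]              # training set
val_idx = indices[4532:4982]            # validation set (450 students)
test_idx = indices[4982:]               # test set (remaining 447 students)
```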
If we want to make a prediction on a student at the moment of enrollment, we have only the data of this
student up to year 0. In general, for a student beginning academic year $y$ we know only the data up to
year $y$. Therefore, we created 9 tables, denoted as student_y, by keeping all fields $f_t$ with $t \le y$ and
discarding all other fields. For example, the schema of table student_0 contains only the fields in List 1 of
Table 1 and, for each field $f$ of List 2 of Table 1, only $f_0$. Each table student_y, $y = 0, \dots, 8$, is
used as the training set for year $y$.
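Continuing the sketches above, each table student_y can be obtained by keeping the List 1 columns plus, for every List 2 field, only the per-year columns up to year y (the column-naming convention is our own):

```python
import pandas as pd

def project_to_year(students, year, list1_fields, list2_fields):
    """Build the student_y table: List 1 fields plus f_0 ... f_year for each List 2 field f."""
    keep = list(list1_fields)
    for field in list2_fields:
        keep += [f"{field}_{y}" for y in range(year + 1)]
    return students[keep]

# Example: student_0 keeps only the information known at enrollment time.
# student0 = project_to_year(students, 0,
#                            ["gender", "high_school_exit_score"], ["course_year"])
```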
We used these training sets to train the three architectures mentioned before, for each year up to year 3.
In fact, after year 3 the number of students who drop out becomes a tiny fraction
of the students who exit with a graduation. Furthermore, as explained below, the validation set becomes
smaller and smaller as the year of enrollment increases, making the evaluation statistics less
significant. We ran the training for 100 epochs, where an epoch denotes that the whole training set
has been fed to the CNN once. In other words, we fed the whole training set to the CNN 100 times and
then stopped because, in general, beyond 100 epochs we detected no significant improvement in the
accuracy. We selected the model which gave the best accuracy on the validation set. The validation set
in turn changes depending on the year: for year 0 all the students in the validation set are considered,
while for year $y$ only students still active in that year are considered. This reduces the validation
set as the year increases, since the number of students still active at year $y + 1$ is always less than the
number of students active in year $y$.
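A sketch of this training procedure in Keras (assuming encoded student arrays x_train, y_train, x_val, y_val and a build_model helper like the illustrative one sketched earlier in this section; the file name, optimizer and batch size are our assumptions, not reported in the paper):

```python
import tensorflow as tf

# x_train/x_val: encoded student records, shape (num_students, num_fields, 1)
# y_train/y_val: 0/1 labels (graduated vs dropped out)
model = build_model(num_fields=x_train.shape[1])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Keep only the weights that achieve the best accuracy on the validation set.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_model_year0.h5", monitor="val_accuracy",
    save_best_only=True, mode="max")

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=100, batch_size=64,
          callbacks=[checkpoint])
```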
4 RESULTS
The training of CNNs is a time-consuming task even for powerful CPUs. Therefore, we used a GPU
to run all the experiments. We used state-of-the-art open source software libraries to
implement the CNNs: all the experiments were implemented using TensorFlow with the Keras library [46, 47].
Python and MySQL were used, respectively, as the programming language and as the database system.
On an NVIDIA QUADRO P2000, each epoch took between 40 and 73 seconds to complete, depending on
the architecture used in the training. On average the training of a single model took
about 1 hour.
Table 2. Accuracy on the validation set.

year    arch. 1    arch. 2
0       67.1%      66.7%
1       78.1%      77.8%
2       86.0%      88.7%
3       83.1%      82.3%
In Table 2 we report, for each year, the best accuracy obtained by the architectures mentioned in the
previous section. We note that the accuracy of the models on the validation set is low
in years 0 and 1 but increases dramatically in years 2 and 3. This is likely due to the fact that
most of the dropouts occur between the beginning of the course of studies and the beginning of year
2 and, at the same time, the validation set contains more information about each student.
Figure 2. Graphs of the accuracy on the training and validation sets during the training process of the
architecture 1. Left: training set and validation set taken from the table student0. Right: training set and
validation set taken from the table student1.
In Fig. 2 we report the plot of the accuracy on the training set and on the validation set during the training
process. We report the data from the training of architecture 1 for year 0 and year 1; for the
other architectures the plots follow a similar pattern. Unfortunately, we can observe that for
the students at the beginning of their course of studies (i.e. at year 0) the model quickly overfits and, at
the same time, the accuracy on the validation set decreases. Furthermore, the maximum value of the
accuracy is attained in the first epochs, while at the same time the variance can be very high.
Table 3. Accuracy on the test set.

year    arch. 1    arch. 2
0       62.2%      56.4%
1       71.1%      72.0%
This is reflected in the accuracy on the test set, the results of which, for the first two years, are reported
in Table 3. We can observe a great variability due to random fluctuations in the models tested during the
early phase of training. This in turn suggests that the use of an ensemble of models would reduce the
uncertainty and increase the accuracy of the prediction process.
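A minimal sketch of the ensembling idea suggested here (our own illustration, not part of the reported experiments): average the class probabilities produced by several independently trained models and take the most probable class.

```python
import numpy as np

def ensemble_predict(models, x):
    """Average the class probabilities of several trained Keras models and pick the argmax."""
    probs = np.mean([m.predict(x, verbose=0) for m in models], axis=0)
    return np.argmax(probs, axis=1)
```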
5 CONCLUSIONS AND FUTURE RESEARCH
We employed state-of-the-art Convolutional Neural Networks to build a model which predicts whether a
student will drop out of higher studies. We implemented several models, using real data
of students of the DE at R3U enrolled between 2009 and 2014. The results obtained are quite
encouraging, given that the dataset provided was limited due to privacy constraints and that some
important information about the students was not (accurately) collected, or even considered, by the
administration office of R3U. The data we obtained from several tests confirm that the more data is
collected, the more accurate the model and the prediction can be.
Much work remains to be done. First, we can integrate the dataset with data from the academic
career, like the number of exams, test scores and so on. Then, we can incorporate in the data the fields not
included due to privacy censoring. We tested only three different architectures, but many other
CNN architectures exist in the literature (e.g. [48]). Furthermore, many parameters of these architectures
can be modified, and much could be explored in order to increase the effectiveness of the models. A
method of data augmentation [49, 50, 51] should be introduced in order to regularize the training process
and to increase the size of the training, validation and test sets, which are very small due to the scarcity of
data. Since the prediction process is not required to run in real time, we can train hundreds (perhaps
thousands) of models and make multiple predictions in order to reduce the random variation found in the
early phase of training. Clearly the system can be made finer by introducing a prediction model every
semester, or even every trimester, or it can be extended to other faculties or other types of students.
ACKNOWLEDGEMENTS
The research group is composed of the authors of the contribution, who edited the following sections:
Mauro Mezzini (§§ 3-4-5), Gianmarco Bonavolontà (§ 2), Francesco Agrusti (§ 1).
REFERENCES
[1] K.-L. Krause, ‘Serious thoughts about dropping out in first year: trends, patterns and implications
for higher education’, Studies in Learning, Evaluation, Innovation and Development, vol. 2, no.
3, pp. 55-68, 2005.
[2] M. Søgaard Larsen and Dansk Clearinghouse for Uddannelsesforskning, Dropout phenomena at
universities: what is dropout? Why does dropout occur? What can be done by the universities to
prevent or reduce it?: a systematic review. Danish Clearinghouse for Educational Research,
2013.
[3] J. J. Vossensteyn et al., Dropout and completion in higher education in Europe: main report.
European Union, 2015.
[4] L. Harvey, S. Drew, and M. Smith, ‘The first-year experience: A review of literature for the Higher
Education Academy’, York: The Higher Education Academy, 2006.
[5] M. R. Larsen, H. B. Sommersel, and M. S. Larsen, Evidence on dropout phenomena at
universities. Danish Clearinghouse for educational research Copenhagen, 2013.
[6] A. Moè and R. De Beni, ‘Strategie di autoregolazione e successo scolastico: Uno studio con
ragazzi di scuola superiore e universitari’, Psicologia dell’Educazione e della Formazione, vol. 2,
no. 1, pp. 31-44, 2000.
[7] P. R. Pintrich and A. Zusho, ‘The development of academic self-regulation: The role of cognitive
and motivational factors’, in Development of achievement motivation, Elsevier, pp. 249-284,
2002.
[8] E. Marks, ‘Student perceptions of college persistence, and their intellective, personality and
performance correlates.’, Journal of Educational Psychology, vol. 58, no. 4, p. 210, 1967.
[9] F. Pincus, ‘The false promises of community colleges: Class conflict and vocational education’,
Harvard Educational Review, vol. 50, no. 3, pp. 332-361, 1980.
[10] D. H. Kamens, ‘The college" charter" and college size: Effects on occupational choice and college
attrition’, Sociology of education, pp. 270-296, 1971.
[11] S. I. Iwai and W. D. Churchill, ‘College attrition and the financial support systems of students’,
Research in Higher Education, vol. 17, no. 2, pp. 105-113, 1982.
[12] J. O. Stampen and A. F. Cabrera, ‘The targeting and packaging of student aid and its effect on
attrition’, Economics of Education Review, vol. 7, no. 1, pp. 29-46, 1988.
[13] J. M. Braxton and E. M. Brier, ‘Melding organizational and interactional theories of student
attrition: A path analytic study’, The Review of Higher Education, vol. 13, no. 1, pp. 47-61,
1989.
[14] V. Tinto, ‘Dropout from higher education: A theoretical synthesis of recent research’, Review of
educational research, vol. 45, no. 1, pp. 89-125, 1975.
[15] V. Tinto, Leaving college: Rethinking the causes and cures of student attrition. ERIC, 1987.
[16] V. Tinto, ‘From theory to action: Exploring the institutional conditions for student retention’, in
Higher education: Handbook of theory and research, Springer, 2010, pp. 51-89.
[17] J. P. Bean, Leaving college: Rethinking the causes and cures of student attrition. Taylor &
Francis, 1988.
[18] P. M. Bentler and G. Speckart, ‘Models of attitude–behavior relations.’, Psychological review, vol.
86, no. 5, p. 452, 1979.
[19] A. F. Cabrera, M. B. Castaneda, A. Nora, and D. Hengstler, ‘The convergence between two
theories of college persistence’, The journal of higher education, vol. 63, no. 2, pp. 143-164,
1992.
[20] G. Hede and L. Wikander, ‘Interruptions and Study Delays in Legal Educations’, A Follow-Up of
the Admissions round Autumn Term, 1983.
[21] A. Siri, Predicting students’ academic dropout using artificial neural networks. Nova, 2014.
[22] ANVUR, Rapporto sullo stato del sistema universitario e della ricerca 2018. 2018.
[23] G. Moretti, M. Burgalassi, and A. Giuliani, ‘ENHANCE STUDENTS’ ENGAGEMENT TO
COUNTER DROPPING-OUT: A RESEARCH AT ROMA TRE UNIVERSITY’, presented at the
International Technology, Education and Development Conference, Valencia, Spain, 2017, pp. 305-313.
[24] M. Burgalassi, V. Biasi, R. Capobianco, and G. Moretti, ‘Il fenomeno dell’abbandono universitario
precoce. Uno studio di caso sui corsi di laurea del Dipartimento di Scienze della Formazione
dell’Università «Roma Tre»’, Giornale Italiano di Ricerca Didattica/Italian Journal of Educational
Research, vol. 17, pp. 131-152, 2016.
[25] V. Carbone and G. Piras, ‘Palomar Project: Predicting School Renouncing Dropouts, Using the
Artificial Neural Networks as a Support for Educational Policy Decisions’, Substance Use &
Misuse, vol. 33, no. 3, pp. 717-750, Jan. 1998.
[26] M. Bala and D. D. B. Ojha, “STUDY OF APPLICATIONS OF DATA MINING TECHNIQUES IN
EDUCATION,” Vol. No., no. 1, p. 10, 2012.
[27] K. R. Koedinger, S. D'Mello, E. A. McLaughlin, Z. A. Pardos, and C. P. Rose. Data mining and
education. Wiley Interdisciplinary Reviews: Cognitive Science, 6(4): 333-353, 2015.
[28] M. N. Mustafa, L. Chowdhury, and M. S. Kamal, “Students dropout prediction for intelligent
system from tertiary level in developing country,” presented at the 2012 International
Conference on Informatics, Electronics and Vision, ICIEV 2012, pp. 113-118, 2012.
[29] R. T. Pereira, A. C. Romero, and J. J. Toledo, “Extraction student dropout patterns with data
mining techniques in undergraduate programs,” presented at the IC3K 2013; KDIR 2013 - 5th
International Conference on Knowledge Discovery and Information Retrieval and KMIS 2013 -
5th International Conference on Knowledge Management and Information Sharing, Proc., pp. 136-142, 2013.
[30] S. Sivakumar, S. Venkataraman, and R. Selvaraj, ‘Predictive modeling of student dropout
indicators in educational data mining using improved decision tree’, Indian Journal of Science
and Technology, vol. 9, no. 4, pp. 1-5, 2016.
[31] P. E. Ramírez and E. E. Grandón, ‘Prediction of student dropout in a Chilean public university
through classification based on decision trees with optimized parameters’, Formacion
Universitaria, vol. 11, no. 3, pp. 3-10, 2018.
[32] Dursun Delen, ‘Predicting Student Attrition with Data Mining Methods’, Journal of College
Student Retention: Research, Theory & Practice, vol. 13, no. 1, pp. 17-35, Jan. 2011.
[33] A. Siri. Predicting Students’ Dropout at University Using Artificial Neural Networks. Italian Journal
of Sociology of Education, 7(2), 225-247, 2015.
[34] V. R. C. Martinho, C. Nunes, and C. R. Minussi, ‘An intelligent system for prediction of school
dropout risk group in higher education classroom based on artificial neural networks’, presented
at the Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI, pp. 159-166, 2013.
[35] L. M. B. Manhães, S. M. S. Da Cruz, and G. Zimbrão, ‘The impact of high dropout rates in a large
public brazilian university a quantitative approach using educational data mining’, presented at
the CSEDU 2014 - Proceedings of the 6th International Conference on Computer Supported
Education, vol. 3, pp. 124-129, 2014.
[36] M. Nagy and R. Molontay, ‘Predicting Dropout in Higher Education Based on Secondary School
Performance’, presented at the INES 2018 - IEEE 22nd International Conference on Intelligent
Engineering Systems, Proceedings, pp. 000389-000394, 2018.
[37] S. Rovira, E. Puertas, and L. Igual, ‘Data-driven system to predict academic grades and dropout’,
PLoS ONE, vol. 12, no. 2, 2017.
[38] M. Solis, T. Moreira, R. Gonzalez, T. Fernandez, and M. Hernandez, ‘Perspectives to Predict
Dropout in University Students with Machine Learning’, presented at the 2018 IEEE
International Work Conference on Bioinspired Intelligence, IWOBI 2018 - Proceedings, 2018.
[39] Y. LeCun, Y. Bengio, G. E. Hinton, “Deep learning”, Nature 521 (7553), pp. 436-444, 2015.
[40] N. Qian, “On the momentum term in gradient descent learning algorithms”, Neural Networks,
Volume 12, Issue 1, pp. 145-151, 1999.
[41] A. Krizhevsky, I. Sutskever, G. E. Hinton, “Imagenet classification with deep convolutional neural
networks”, Advances in Neural Information Processing Systems 25, pp. 1097–1105, 2012.
[42] Vincent Dumoulin, Francesco Visin, “A guide to convolution arithmetic for deep learning” CoRR,
abs/ 1603.07285, 2016.
[43] K. He, X. Zhang, S. Ren, J. Sun, “Deep residual learning for image recognition”, IEEE Conference
on Computer Vision and Pattern Recognition (CVPR), pp. 770-778, 2016.
[44] C. Szegedy, S. Ioffe, V. Vanhoucke, A. A. Alemi, “Inception-v4, inception-resnet and the impact of
residual connections on learning”, Proceedings of the Thirty-First AAAI Conference on Artificial
Intelligence, pp. 4278-4284, 2017.
[45] Simonyan, K. & Zisserman, A. “Very Deep Convolutional Networks for Large-Scale Image
Recognition”. CoRR, abs/1409.1556, 2014.
[46] M. Abadi, et al., “Tensorflow: A system for large-scale machine learning”. CoRR, abs/1605.08695,
2016.
[47] F. Chollet, et al., “Keras”, https://github.com/keras-team/keras, 2015.
[48] J. Hu, L. Shen, G. Sun, Squeeze-and-excitation networks, CoRR abs/1709.01507, 2018.
[49] J. Salamon, J. P. Bello, “Deep convolutional neural networks and data augmentation for
environmental sound classification”, IEEE Signal Processing Letters 24 (3), pp. 279-283, 2017.
[50] L. Perez, J. Wang, The effectiveness of data augmentation in image classification using deep
learning, CoRR abs/1712.04621, 2017.
[51] P. Y. Simard, D. Steinkraus and J. C. Platt, "Best practices for convolutional neural networks
applied to visual document analysis," Seventh International Conference on Document Analysis
and Recognition, pp. 958-963, 2003.
The central building block of convolutional neural networks (CNNs) is the convolution operator, which enables networks to construct informative features by fusing both spatial and channel-wise information within local receptive fields at each layer. A broad range of prior research has investigated the spatial component of this relationship, seeking to strengthen the representational power of a CNN by enhancing the quality of spatial encodings throughout its feature hierarchy. In this work, we focus instead on the channel relationship and propose a novel architectural unit, which we term the “Squeeze-and-Excitation” (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels. We show that these blocks can be stacked together to form SENet architectures that generalise extremely effectively across different datasets. We further demonstrate that SE blocks bring significant improvements in performance for existing state-of-the-art CNNs at slight additional computational cost. Squeeze-and-Excitation Networks formed the foundation of our ILSVRC 2017 classification submission which won first place and reduced the top-5 error to 2.251 percent, surpassing the winning entry of 2016 by a relative improvement of ${\sim }$ 25 percent. Models and code are available at https://github.com/hujie-frank/SENet .