Exploring Student Approaches to Learning through Sequence
Analysis of Reading Logs
Gökhan Akçapınar
Hacettepe University
Ankara, Turkey
gokhana@hacettepe.edu.tr
Brendan Flanagan
Kyoto University
Kyoto, Japan
flanagan.brendanjohn.4n@kyoto-u.ac.jp
Mei-Rong Alice Chen
Kyoto University
Kyoto, Japan
chen.meirong.6s@kyoto-u.ac.jp
Hiroaki Ogata
Kyoto University
Kyoto, Japan
ogata.hiroaki.3e@kyoto-u.ac.jp
Rwitajit Majumdar
Kyoto University
Kyoto, Japan
majumdar.rwitajit.4a@kyoto-u.ac.jp
ABSTRACT
In this paper, we aim to explore students’ study approaches (e.g.,
deep, strategic, surface) from the logs collected by an electronic
textbook (eBook) system. Data on reading activities both in and out
of class were collected from 89 students in a Freshman English
course. Students were given a task to study reading materials
through the eBook system, highlight the text related to the main or
supporting ideas, and answer questions prepared to measure their
level of comprehension. Students’ in-class and out-of-class reading
times and their usage of the marker feature were used as proxies to
understand their study approaches. We combined theory-driven and
data-driven approaches to model the study approaches of students.
Our results showed that three groups of students with different
study approaches could be identified. Relationships between
students’ reading behaviors and their academic performance were
also investigated using association rule mining. The results are
discussed in terms of monitoring, feedback, predicting learning
outcomes, and identifying problems with the content design.
CCS CONCEPTS
• Information systems~Data mining • Computing methodologies~
Machine learning • Applied computing~Interactive learning
environments • Applied computing~E-learning
KEYWORDS
Study approaches, sequence analysis, reading logs, clustering,
association rule mining, learning analytics
ACM Reference format:
Akçapınar, G., Chen, M. R. A., Majumdar, R., Flanagan, B. and Ogata, H.
2020. Exploring Student Approaches to Learning through Sequence
Analysis of Reading Logs. In Proceedings of the 10th International
Conference on Learning Analytics & Knowledge (LAK’20). ACM, New
York, NY, USA, 6 pages. https://doi.org/10.1145/3375462.3375492
1 Introduction
Students use different study approaches to achieve a specific
learning task [1-3]. Understanding these approaches is important
for designing further interventions, particularly for low-performing
students [4]. It is also a challenging task for researchers for several
reasons. First of all, study approaches are a dynamic phenomenon
and may vary depending on many variables (e.g., subject, task
difficulty, etc.). Therefore, in many cases, it might not be
convenient to capture students’ study approaches by using self-
report methods. On the other hand, previous studies showed that
students’ learning traces (observable behaviors) in online learning
environments can be used as a proxy to understand latent constructs
such as students’ cognitive and metacognitive strategies [4],
learning strategies [5], and study patterns [6]. Although written
materials are the core of education, there is still limited research
that analyzes reading logs to understand students’ learning
processes.
Thanks to digital textbook systems, now it is possible to collect
detailed data regarding the students’ reading processes which is not
possible with traditional textbooks. Previous studies that analyzed
students’ digital textbook interaction data indicate that the course
outcome is directly related to textbook reading [7, 8]. Junco
and Clem [8] found that students who were in the top 10th percentile
in the number of highlights had significantly higher course grades
than those in the lower 90th percentile. They also found that
students who spent more time reading textbooks earned higher
grades in the course than those who spent less time. Huang,
et al. [9] proposed a Knowledge Tracing model that measures
students’ level of knowledge on the underlying concept by looking
at the amount of time s/he has spent on the related pages (e.g.,
read/skimmed).
Permission to make digital or hard copies of part or all of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full
citation on the first page. Copyrights for third-party components of this work must
be honored. For all other uses, contact the owner/author(s).
LAK '20, March 23–27, 2020, Frankfurt, Germany
© 2020 Copyright is held by the owner/author(s). Publication rights licensed to
ACM.
ACM ISBN 978-1-4503-7712-6/20/03$15.00
https://doi.org/10.1145/3375462.3375492
In this study, we aim to explore the study approaches of students
while they are studying the content in or out of class. To this end,
we combine theory-driven and data-driven approaches.
Based on the existing Student Approaches to Learning (SAL)
theory literature, we formed our research questions and
selected features that can be used as proxies to understand students’
study approaches. The data-driven approach, in turn,
helped us to identify students who use similar learning
approaches. We also analyzed relationships between study
approaches and learning outcomes.
2 Background
2.1 Student Approaches to Learning Theory
The origins of the Student Approaches to Learning (SAL) theory
date back to the 1970s. In one of the early attempts, Marton and
Säljö [2] asked students to read passages within a given time
limit. The students were then asked to answer a series of
questions measuring their level of understanding. They were
also asked open-ended questions about how they approached
the reading tasks. When the researchers compared the students’ level
of understanding with the approach they used, they found that the
students used either a surface or a deep approach while
performing the reading task.
Later studies confirmed these findings and also found that there
was a third approach in addition to deep and surface. This approach
was named achieving by Biggs [1] and strategic by Entwistle and
Ramsden [10]. Characteristics of each approach are given below [5,
11].
Deep approach: A deep approach to learning is characterized
by students' desires to understand, learn with meaning, and
recognize underlying principles and connections among related
principles.
Surface approach: A surface approach to learning often
involves students' memorizing information and doing only what is
necessary to succeed on an upcoming assessment. Students with a
surface approach prefer teaching that directs learning towards
assessment requirements even if this leads to a lack of both
understanding and purpose.
Strategic approach: A strategic approach to learning is
accompanied by students' close attention to details such as expected
test format, the structure of the content as laid out in the text, and
close adherence to an instructor's guidelines for studying. Students
who show a strategic approach can discern and use the aspects of a
learning environment that will support their way of studying.
Previous studies also investigated the relationships between
study approaches and learning outcomes. The surface approach
has been linked to poor learning outcomes, while the deep approach
has been linked to better learning outcomes.
Questionnaires are mainly used to measure students' approaches
to learning. However, students may use different strategies at
different times. Moreover, students may not want to self-report
their approaches to learning accurately, especially if they are
surface learners [12]. Therefore, in this study, we aimed at
identifying students’ study approaches from the reading logs
collected by the eBook reader.
2.2 Measuring Latent Variables from Learning
Traces
Analyzing latent variables from the students’ learning traces has
recently gained attention in learning analytics and educational data
mining communities. Cicchinelli, et al. [4], tried to identify
students’ self-regulation strategies (e.g., metacognitive and
cognitive strategies) from their interactions with the learning
management system in a blended course setting. They also
compared the results with the self-report data. They found that
observable features (e.g. content access, question-solving, etc.)
could better explain self-regulated learning behavior and its effects
on academic performance than self-report data. In another study,
researchers investigated temporal characteristics of learning
strategies and their association with feedback, using three years
of logs collected from the online pre-class activities of a flipped
classroom [13]. After analyzing the data using clustering, sequence
mining, and process mining approaches, the researchers found a
positive association between personalized feedback and effective
strategies. Boroujeni and Dillenbourg [6] analyzed video viewing
and assignment submission behaviors of 7527 students in a MOOC
environment to find out temporal study patterns of the students
during assessment periods.
While most of the studies were conducted with data from
MOOC and learning management systems and often with video-
based learning materials, we focus on the reading-based learning
scenario for our study.
3 Research Questions
In this paper, we hypothesize that features extracted from digital
textbook reader logs can be used to identify students’ approaches
to learning (e.g., surface, deep and strategic). Specifically, the
following research questions were addressed:
RQ 1: Is it possible to identify surface, deep and strategic
learners from the reading logs?
RQ 2: What is the relationship between study approaches and
learning outcomes?
RQ 3: What are the characteristic association rules between
surface, deep, strategic learners’ reading behaviors and their
academic performance?
4 Method
4.1 Instructional Context & Data Collection
We analyzed more than 25,000 rows of click-stream data (see
Table 1 for details) collected from 89 students registered in a
Freshman English course at a university. The course was offered to
first-year undergraduate university students. Students used the
eBook system to access course materials that were uploaded by the
instructor. Data collection took place over two weeks. In the first
class, students were introduced to the reading material and were
instructed by the teacher on how to use the functions of the eBook
system. Students were asked to read the content through the eBook
system, highlight main ideas, and answer the questions that were
developed to assess their level of comprehension. In the second
week, the rest of the content was completed. All interactions (e.g.,
next, previous, jump, highlight, adding a memo, bookmark, etc.) with
the eBook system were recorded in a database.
Table 1: Number of logs in each event
Event Type      Number of Logs
Open            622
Next            7826
Previous        2992
Jump            648
Marker          4539
Memo            2888
Bookmark        102
Quiz attempt    2603
Other           3043
Total           25657
In this study, data was collected from an eBook system that is
currently being used in different universities in Asia. More than
10,000 university students are using this eBook system as their
main source of learning inside and outside of the classroom. The
eBook system is an integrated component of a learning analytics
framework. This framework makes it possible to collect all kinds
of interaction data related to students’ eBook reading while
ensuring their privacy. The eBook tool has a feature similar to red or
yellow markers for highlighting parts of the text. Students can
add memos to remember important points or bookmark pages to
access them quickly while they are reviewing the content.
4.2 Reading Pattern Extraction
At the beginning of data analysis, features from the click-stream
data were extracted. Extracted features were used as a proxy to
understand students’ study approaches. A brief description of
features is given below.
● Time IN: Total time spent on the content during class.
● Time OUT: Total time spent on the content outside of class.
● Marker: Number of yellow and red markers added by the
student.
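As an illustration only, the three features listed above could be aggregated per student and page along the following lines. This is a Python sketch, not the authors' extraction script: the log schema (student, page, timestamp in seconds, event type), the known class-session windows, the "MARKER" event name, the rule that the time until a student's next event counts as time on the current page, and the `max_dwell` cap for idle gaps are all assumptions that the paper does not specify.

```python
from collections import defaultdict


def extract_features(events, class_windows, max_dwell=600):
    """Aggregate Time IN, Time OUT and Marker counts per (student, page).

    events: iterable of (student, page, timestamp_seconds, event_type)
            tuples (hypothetical schema).
    class_windows: list of (start, end) timestamps of class sessions
            (assumed to be known).
    max_dwell: gaps longer than this many seconds are treated as idle
            time and ignored (assumed threshold).
    """
    feats = defaultdict(lambda: {"time_in": 0.0, "time_out": 0.0, "marker": 0})
    by_student = defaultdict(list)
    for student, page, ts, etype in events:
        by_student[student].append((ts, page, etype))

    def in_class(ts):
        return any(start <= ts < end for start, end in class_windows)

    for student, rows in by_student.items():
        rows.sort()  # chronological order per student
        for i, (ts, page, etype) in enumerate(rows):
            if etype == "MARKER":
                feats[(student, page)]["marker"] += 1
            if i + 1 < len(rows):
                # Time until the student's next event is credited to this page.
                dwell = rows[i + 1][0] - ts
                if dwell <= max_dwell:
                    key = "time_in" if in_class(ts) else "time_out"
                    feats[(student, page)][key] += dwell
    return dict(feats)
```

The result maps each (student, page) pair to its three raw values; pages that a student never touched simply have no entry and would be treated as zero activity.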
The content consists of 25 pages. Students’ level of
comprehension was assessed based on 11 questions located inside
the eBook system. With the help of an automated script, students’
in-class and out-of-class reading times and marker counts for each
page were extracted. After extracting the features, all the numerical
data were discretized into three levels. If a student did not have any
activity on a specific page, it was labeled as no activity (na). The rest
of the data were then split into low and high by using the median as
a cut-off. This process was repeated for each feature (i.e., Time IN,
Time OUT, and Marker) and each page of the content. At the end of
the feature extraction, a sequence of 75 columns (3 features x 25
pages) was obtained for each student (89 x 75 matrix).
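The discretization step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' script: "no activity" is taken to mean a raw value of zero, the median is computed over the active students only, and values equal to the median are assigned to "low" (the paper does not say how ties are handled). The function names are hypothetical.

```python
from statistics import median


def discretize(values):
    """Map one column (one feature on one page, all students) to na/low/high."""
    active = [v for v in values if v != 0]
    if not active:
        return ["na"] * len(values)
    cut = median(active)  # cut-off computed over students with activity only
    return ["na" if v == 0 else ("low" if v <= cut else "high") for v in values]


def discretize_matrix(raw):
    """raw: one row per student, one numeric column per (feature, page) pair
    (75 columns in the paper). Returns rows of the same shape holding labels."""
    columns = [discretize(list(col)) for col in zip(*raw)]
    return [list(row) for row in zip(*columns)]
```

For example, a column of reading times `[0, 5, 15, 25]` would be labeled `["na", "low", "low", "high"]`, because the median of the three active values is 15.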
4.3 Data Analysis
After transforming students’ click-stream data into page-level
categorical data, Agglomerative Hierarchical Clustering based on
Ward’s algorithm [14] was used to group students with similar
reading patterns. Optimal matching distance (OM distance) was
used to measure the dissimilarity between students’ sequences. The
number of clusters was decided based on the SAL theory. The
dendrogram of the hierarchical cluster analysis was also checked to
validate the theoretical decision. A similar approach has previously
been applied successfully for detecting students’ learning strategies
[4, 6, 13].
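To make the clustering step more concrete, the sketch below computes an optimal matching distance between two categorical sequences and then merges students with Ward's criterion until a requested number of clusters remains. It is a plain-Python illustration rather than the TraMineR-based pipeline used in the study. The insertion/deletion cost of 1 and constant substitution cost of 2 are assumed values (the paper does not report its cost settings), and the Ward step uses one common formulation, the Lance-Williams update on squared dissimilarities.

```python
def om_distance(a, b, indel=1.0, sub=2.0):
    """Optimal matching distance: minimum total cost of insertions, deletions
    and substitutions needed to turn sequence a into sequence b."""
    prev = [j * indel for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        cur = [i * indel]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + indel,        # delete x
                           cur[j - 1] + indel,     # insert y
                           prev[j - 1] + (0.0 if x == y else sub)))
        prev = cur
    return prev[-1]


def ward_clusters(dist, k):
    """Agglomerative clustering on a full distance matrix using Ward's
    criterion; merging stops when k clusters remain (k >= 1)."""
    n = len(dist)
    members = {i: [i] for i in range(n)}
    d2 = {(i, j): dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(members) > k:
        (i, j), dij = min(d2.items(), key=lambda kv: kv[1])  # closest pair
        ni, nj = len(members[i]), len(members[j])
        new = {}
        for m in members:
            if m in (i, j):
                continue
            nm = len(members[m])
            dim = d2[(min(i, m), max(i, m))]
            djm = d2[(min(j, m), max(j, m))]
            # Lance-Williams update for Ward's method
            new[m] = ((ni + nm) * dim + (nj + nm) * djm - nm * dij) / (ni + nj + nm)
        d2 = {key: v for key, v in d2.items() if i not in key and j not in key}
        members[next_id] = members.pop(i) + members.pop(j)
        for m, v in new.items():
            d2[(m, next_id)] = v  # next_id is larger than every existing id
        next_id += 1
    return sorted(sorted(c) for c in members.values())
```

With one discretized sequence per student, a pairwise matrix built from `om_distance` can be passed to `ward_clusters(dist, 3)` to obtain three groups of student indices, mirroring the theory-driven choice of three clusters.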
For labeling the obtained clusters, two graphs were checked. First,
we compared the visualization of page-level data for each cluster.
Then, we analyzed the distribution of aggregated raw data in each
cluster along with quiz results. To extract representative learning
patterns of each cluster, association rule mining analysis was
employed. Data analysis was conducted in R [15] with the following
packages: sequence analysis was conducted with TraMineR [16], and
association rules were extracted by using the arules [17] package.
5 Results
5.1 Cluster Analysis
Based on the SAL theory, we aimed to identify three clusters in the
data related to surface, strategic and deep learning approaches.
Therefore, after confirming with the dendrogram of the hierarchical
cluster analysis (see Fig. 1), we clustered the data into 3 groups.
Figure 1: Dendrogram of the hierarchical cluster analysis
Fig. 2 shows the distribution of students’ reading behaviors in each
cluster. Here, each row represents the data of a single student. The
x-axis shows the students’ reading behaviors related to different
features (i.e., Time IN, Time OUT, and Marker). The Time IN block
shows students’ reading times during the class for each page of the
content. The middle block shows students’ out-of-class reading times.
The last block shows students' marker activities.
It can be seen from Fig. 2 that students in Cluster 1 (n = 38) are
mostly inactive in terms of out-of-class activity and marker
usage. Regarding the time spent in class, most of them have low
activity in the first 5-10 pages of the content but no activity on
the other pages. Students in Cluster 2 (n = 26)
are highly active in class and in terms of marker usage. Although
some of them have low activity out of class, most of them have none.
Students in Cluster 3 (n = 25) show patterns similar to those of
the students in Cluster 2; however, almost all of the students
in this cluster also have high out-of-class activity across the
content.
Figure 2: Visualization of students’ reading behaviors in each
cluster
Before labeling the clusters, we also checked the distribution of
quiz scores in each cluster along with the total time spent in class,
the total time spent out of class, and the total number of markers
added. The distribution observed in Fig. 3 is consistent with the
page-level data. In terms
of quiz scores, students in Cluster 1 have the lowest scores and
students in Cluster 3 have the highest scores, whereas students in
Cluster 2 have both low and high scores. Finally, we labeled Cluster
1 as a surface approach, Cluster 2 as a strategic approach, and
Cluster 3 as a deep approach.
Figure 3: Box plots of aggregated data related to reading
behaviors and quiz scores by cluster
5.2 Association Rules
To see the representative patterns of each cluster and their relation
to academic performance, we conducted association rule
mining analysis. To make the obtained rules simple and easy to
understand, we used aggregated data instead of page-level data. We
calculated total values for the Time IN, Time OUT and Marker
features. Discretized quiz scores (quiz_low, quiz_high) were also
included in the data to see the relationship between students’ reading
behaviors and their academic performance. Rules were generated for
each cluster with a minimum support of 0.1 (10%) and a minimum
confidence of 0.8 (80%). Rules that are not related to academic
performance were filtered out. As a result, 12 rules were generated for
Cluster 1, 10 rules for Cluster 2, and 12 rules
for Cluster 3. The frequency of items in each cluster can be seen in
Fig. 4. The top 5 rules for each cluster, selected based on their support
values, are discussed below.
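As a rough illustration of how such rules are scored and filtered, the Python sketch below enumerates candidate rules whose consequent is a quiz outcome, keeps those that meet the minimum support and confidence, and sorts them by support. It is a brute-force stand-in for the Apriori-style mining provided by the arules package, which is workable here only because each student is described by a handful of items. The function name and the representation of each student as a set of item labels are assumptions. Support is the fraction of students in the cluster who match both sides of the rule, and confidence is that fraction divided by the fraction who match the left-hand side.

```python
from itertools import combinations


def mine_rules(transactions, min_support=0.1, min_confidence=0.8,
               targets=("quiz_low", "quiz_high")):
    """Return (antecedent, consequent, support, confidence) tuples for rules
    whose consequent is one of the target items, sorted by support."""
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]

    def support(items):
        # Fraction of students whose item set contains all the given items.
        return sum(items <= t for t in sets) / n

    antecedent_items = sorted({i for t in sets for i in t} - set(targets))
    rules = []
    for size in range(1, len(antecedent_items) + 1):
        for lhs in combinations(antecedent_items, size):
            lhs = frozenset(lhs)
            sup_lhs = support(lhs)
            if sup_lhs == 0 or sup_lhs < min_support:
                continue
            for rhs in targets:
                sup = support(lhs | {rhs})
                conf = sup / sup_lhs
                if sup >= min_support and conf >= min_confidence:
                    rules.append((sorted(lhs), rhs, sup, conf))
    rules.sort(key=lambda r: -r[2])  # highest support first
    return rules
```

Calling `mine_rules(cluster_transactions)[:5]` on the students of one cluster would give the five highest-support rules related to quiz performance, analogous to the per-cluster tables in this section.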
Cluster 1 - Surface Approach: The most frequent items in Cluster
1 are quiz_low, timein_low, and timeout_na (see Fig. 4). The first
rule in Table 2 means that if the student has no activity during out-
class time then s/he will get a low quiz score with a support value
of 76% and a confidence value of 94%. The support value shows that
this rule covers 76% of the students in Cluster 1, and the confidence
value means that the probability of getting a low quiz score given
no out-of-class activity is 0.94. Similar patterns can be observed for the
other rules given in Table 2. Most of the students in Cluster 1 have
no activity or low activity in terms of reading times and marker
usage. Most of them also have low quiz scores.
Figure 4: Frequent items in each cluster
Table 2: Top 5 Rules with the highest Support for Cluster 1
No  Pattern                                   SUP  CON
1   [timeout_na] => [quiz_low]                76%  94%
2   [timein_low] => [quiz_low]                76%  91%
3   [timein_low, timeout_na] => [quiz_low]    74%  97%
4   [marker_low, timein_low] => [quiz_low]    50%  90%
5   [marker_low] => [quiz_low]                50%  86%
Cluster 2 - Strategic Approach: The most frequent items in Cluster
2 are timein_high, quiz_low, and timeout_na (see Fig. 4). Unlike
the other clusters, this cluster has a similar number of low and high
performers. From the rules given in Table 3, it can be noted that
marker usage is key to separating low and high performers in this
cluster. If a student spends more time in class but his/her marker
usage is low, then s/he will get a low quiz score (Rule 1). On the
other hand, high marker usage can be related to high quiz scores
(e.g., Rule 4, Rule 5). Because the cluster is split between low and
high performers, the coverage (support) of the rules in this cluster
is lower than in the others. High confidence values also
indicate that different rules can be used to identify low and high
performers in this cluster.
Table 3: Top 5 Rules with the highest Support for Cluster 2
No  Pattern                                                  SUP  CON
1   [marker_low, timein_high] => [quiz_low]                  27%  78%
2   [timeout_low] => [quiz_low]                              27%  70%
3   [timein_high, timeout_low] => [quiz_low]                 23%  75%
4   [marker_high, timein_high, timeout_na] => [quiz_high]    19%  83%
5   [marker_high, timeout_na] => [quiz_high]                 19%  71%
Cluster 3 - Deep Approach: The most frequent items in Cluster 3
are marker_high, quiz_high, and timein_high (see Fig. 4). The first
rule in Table 4 means that if the student has high marker usage, then
s/he will get a high quiz score with a support value of 64% and a
confidence value of 76%. All the rules with the highest support values
are related to high quiz performance. High out-of-class time is also
related to high quiz scores for this group of students (e.g., Rule 4).
Table 4: Top 5 Rules with the highest Support for Cluster 3
No  Pattern                                       SUP  CON
1   [marker_high] => [quiz_high]                  64%  76%
2   [marker_high, timein_high] => [quiz_high]     56%  82%
3   [timein_high] => [quiz_high]                  56%  78%
4   [timeout_high] => [quiz_high]                 52%  72%
5   [marker_high, timeout_high] => [quiz_high]    48%  75%
6 Conclusions
In this study, we tried to determine students’ approaches to learning
from their reading behaviors exhibited while performing a given
reading task. For this purpose, a theoretical basis of students'
learning approaches from the SAL literature was considered.
Features from the reading log data were extracted that can then be
used as a proxy to understand the approaches. These features are
students' in and out of class reading times and the number of
markers they used. Obtained results showed that the students could
be divided into three clusters identified as surface, strategic and
deep study approaches. Further, the relationship between reading
behaviors and quiz performances of each cluster was examined by
association rule mining analysis.
The results highlighted that the majority of the students who
followed a surface approach did not use markers, had low content
completion rates, and did not use the tool outside
the class. Their quiz performance was also low. Students using a
deep approach showed high activity both within the class and out
of the class. They used markers actively while reading the content
and their quiz performances were also high. Students using a
strategic approach actively used the tool in the class but did
not use it outside the class. In terms of quiz performance, that
cluster contained both low- and high-performing students. These
findings are in accordance with those of the SAL studies [12, 18,
19]. While surface learners tend to complete the task with minimum
effort, deep learners tend to spend time on the content outside the
class and learn the information deeply by using the marker function.
The fact that strategic learners actively used the tool in class but
did not use it outside class can be interpreted as a desire to
succeed with minimum effort.
This study has some limitations. First, the sample size is
relatively small, which limits the generalizability of the obtained
results. Second, although the results of the clustering analysis and
association rule mining analysis support our initial hypothesis, we
cannot make a strong claim that these clusters are definitely
representing the three learning approaches. Further validation with
additional data is required to make a stronger claim.
The obtained association rules can be used to predict students'
learning approaches and, accordingly, to predict their academic
performance. The data showed that students who use a surface strategy
are mostly active only in the first few pages of the content, with
no activity on the following pages. Interventions to sustain
these students' reading could involve redesigning the
content. For instance, quiz questions at the beginning of the content
might have led to more marker activity in that part by strategic and
deep learners. The effect of such reflective questions
across the content on students' marker behaviors can be
examined in further studies.
ACKNOWLEDGMENTS
This work was partly supported by JSPS Grant-in-Aid for Scientific
Research (S) 16H06304, NEDO Special Innovation Program on AI
and Big Data 18102059-0, Hacettepe University Scientific
Research Projects Coordination Center Grant Number SBI-2017-
16268 and JSPS KAKENHI Research Activity Start-up Grant
Number 18H05746.
REFERENCES
[1] J. B. Biggs, "The Role of Metalearning in Study Processes," British
Journal of Educational Psychology, vol. 55, no. 3, pp. 185-212,
1985.
[2] F. Marton and R. Säljö, "On Qualitative Differences in Learning: I—
Outcome and Process," British Journal of Educational Psychology,
vol. 46, no. 1, pp. 4-11, 1976.
[3] F. Marton and R. Säljö, "On Qualitative Differences in Learning:
II—Outcome as a Function of the Learner's Conception of the Task,"
British Journal of Educational Psychology, vol. 46, no. 2, pp. 115-
127, 1976.
[4] A. Cicchinelli et al., "Finding traces of self-regulated learning in
activity streams," presented at the Proceedings of the 8th
International Conference on Learning Analytics and Knowledge,
Sydney, New South Wales, Australia, 2018.
[5] J. Jovanović, D. Gašević, S. Dawson, A. Pardo, and N. Mirriahi,
"Learning analytics to unveil learning strategies in a flipped
classroom," The Internet and Higher Education, vol. 33, pp. 74-85,
2017.
[6] M. S. Boroujeni and P. Dillenbourg, "Discovery and temporal
analysis of latent study patterns in MOOC interaction sequences,"
presented at the Proceedings of the 8th International Conference on
Learning Analytics and Knowledge, Sydney, New South Wales,
Australia, 2018.
[7] G. Akçapınar, M. N. Hasnine, R. Majumdar, B. Flanagan, and H.
Ogata, "Developing an Early-Warning System for Spotting At-Risk
Students by using eBook Interaction Logs," Smart Learning
Environments, vol. 6, no. 4, pp. 1-15, 2019.
[8] R. Junco and C. Clem, "Predicting course outcomes with digital
textbook usage data," The Internet and Higher Education, vol. 27,
pp. 54-63, 2015.
[9] Y. Huang, M. Yudelson, S. Han, D. He, and P. Brusilovsky, "A
Framework for Dynamic Knowledge Modeling in Textbook-Based
Learning," presented at the Proceedings of the 2016 Conference on
User Modeling Adaptation and Personalization, Halifax, Nova
Scotia, Canada, 2016.
[10] N. Entwistle and P. Ramsden, Understanding Student Learning.
New York: Nichols Publishing Company, 1982.
[11] D. Tomanek and L. Montplaisir, "Students' studying and approaches
to learning in introductory biology," Cell Biology Education,
vol. 3, no. 4, pp. 253-262, Winter 2004.
[12] G. Akçapınar, "Predicting Students' Approaches to Learning Based
on Moodle Logs," 8th International Conference on Education and
New Learning Technologies, pp. 2347-2352, 2016.
[13] W. Matcha, D. Gašević, N. A. A. Uzir, J. Jovanović, and A. Pardo,
"Analytics of Learning Strategies: Associations with Academic
Performance and Feedback," presented at the Proceedings of the 9th
International Conference on Learning Analytics & Knowledge,
Tempe, AZ, USA, 2019.
[14] A. Gabadinho, G. Ritschard, M. Studer, and N. S. Müller, "Mining
sequence data in R with the TraMineR package: A user’s guide,"
2009.
[15] R Core Team, "R: A language and environment for statistical
computing," ed: R Foundation for Statistical Computing, 2017.
[16] A. Gabadinho, G. Ritschard, N. S. Müller, and M. Studer,
"Analyzing and visualizing state sequences in R with TraMineR,"
Journal of Statistical Software, vol. 40, no. 4, pp. 1-37, 2011.
[17] M. Hahsler, B. Grün, and K. Hornik, "arules - A Computational
Environment for Mining Association Rules and Frequent Item Sets,"
Journal of Statistical Software, vol. 14, no. 15, pp. 1-25, 2005.
[18] R. A. Ellis, F. Han, and A. Pardo, "Improving Learning Analytics–
Combining Observational and Self-Report Data on Student
Learning," Journal of Educational Technology & Society, vol. 20,
no. 3, pp. 158-169, 2017.
[19] D. Gasevic, J. Jovanovic, A. Pardo, and S. Dawson, "Detecting
learning strategies with analytics: Links with self-reported measures
and academic performance," Journal of Learning Analytics, vol. 4,
no. 2, pp. 113–128, 2017.