Linguistic Reflections of Student Engagement in Massive Open Online Courses
Miaomiao Wen, Diyi Yang and Carolyn Penstein Rosé
Language Technology Institute
Carnegie Mellon University
Pittsburgh, PA 15213
{mwen,diyiy,cprose}@cs.cmu.edu
Abstract

While data from Massive Open Online Courses (MOOCs) offers the potential to gain new insights into the ways in which online communities can contribute to student learning, much of the richness of the data trace is yet to be mined. In particular, very little work has attempted fine-grained content analyses of the student interactions in MOOCs. Survey research indicates the importance of student goals and intentions in keeping them involved in a MOOC over time. Automated fine-grained content analyses offer the potential to detect and monitor evidence of student engagement and how it relates to other aspects of their behavior; ultimately these indicators reflect their commitment to remaining in the course. As a methodological contribution, in this paper we investigate using computational linguistic models to measure learner motivation and cognitive engagement from the text of forum posts. We validate our techniques using survival models that evaluate the predictive validity of these variables in connection with attrition over time. We conduct this evaluation in three MOOCs focusing on very different types of learning materials. Prior work demonstrates that any participation in the discussion forums is a strong indicator of student commitment. Our methodology allows us to differentiate better among these students, and to identify danger signs that a struggling student is in need of support within a population whose interaction with the course offers the opportunity for effective support to be administered. Theoretical and practical implications are discussed.
1 Introduction

The recent development of Massive Open Online Course (MOOC) websites such as Coursera¹, edX² and Udacity³ demonstrates the potential of distance learning and lifelong learning to reach the masses. However, one disappointment has been that only one in every 20 students who enroll in such courses actually finishes (Koller et al. 2013). In order to understand the attrition problem and work towards solutions, especially given the varied backgrounds and motivations of students who choose to enroll in a MOOC (DeBoer et al. 2013), we need to highlight and understand the value sought and obtained by the participants of MOOCs, including that reflected in their discussion posts, especially from the "non-completing" population (Koller et al. 2013).

Copyright © 2014, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
¹ https://www.coursera.org/
² https://www.edx.org/
³ https://www.udacity.com/
In this paper, we propose to gauge a student's engagement using linguistic analysis applied to the student's forum posts within the MOOC course. Based on the learning sciences literature, we quantify a student's level of engagement in a MOOC from two different angles: (1) the displayed level of motivation to continue with the course and (2) the level of cognitive engagement with the learning material. Student motivation to continue is important: without it, a student cannot regulate his or her effort to move forward productively in the course. At the same time, for learning it is necessary for the student to process the course content in a meaningful way. In other words, cognitive engagement is critical. Ultimately it is this grappling with the course content over time that will be the vehicle through which the student achieves the desired learning outcomes.
Conversation in the course forum is replete with terms that imply learner motivation. These terms may include those suggested by the literature on learner motivation or simply those from everyday language, for example "I tried very hard to follow the course schedule" and "I couldn't even finish the second lecture." In this paper, we attempt to automatically measure learner motivation based on such markers found in posts on the course discussion forums. Our analysis offers new insights into the relation between language use and underlying learner motivation in a MOOC context.
Besides student motivational state, the level of cognitive engagement is another important aspect of student participation (Carini et al. 2006). Consider, for example, "This week's video lecture is interesting, the boy in the middle seemed tired, yawning and so on." and "The video shows a classroom culture where the kids clearly understand the rules of conversation and acknowledge each others contribution." These two posts comment on the same video lecture, but the first post is more descriptive at a surface level while the second one is more interpretive, and displays more reflection. We measure this difference in cognitive engagement with an estimated level of language abstraction. We find that users whose posts show a higher level of cognitive engagement are more likely to continue participating in the forum discussion.
The distant nature and the sheer size of MOOCs require new approaches for providing student feedback and guiding instructor intervention (Ramesh et al. 2013). One big challenge is that MOOCs are far from uniform. In this paper, we test the generality of our measures in three Coursera MOOCs focusing on distinct subjects. We demonstrate that our measures of engagement are consistently predictive of student dropout from the course forum across the three courses. With this validation, our hope is that in the long run, our automatic engagement measures can help instructors target their attention to those who show serious intention of finishing the course, but nevertheless struggle due to dips in learner motivation. Our linguistic analysis provides further indicators that some students are going through the motions in a course but may need support in order to fully engage with the material. Again, such monitoring might aid instructors in using their limited human resources to the best advantage.
In the remainder of the paper we begin by describing our dataset and discussing related work. Next, we explain how we automatically measure student engagement from a user's forum posts from the two perspectives highlighted above. We then continue with a survival analysis that estimates the influence of our two measures of engagement on MOOC dropout rate. Finally, we conclude with a summary and possible future work.
2 Coursera dataset

In preparation for a partnership with an instructor team for a Coursera MOOC that was launched in Fall 2013, we were given permission by Coursera to crawl and study a small number of other courses. Our dataset consists of three courses: one social science course, "Accountable Talk: Conversation that works"⁴, offered in October 2013, with 1,146 active users (those who post at least one message in a course forum) and 5,107 forum posts; one literature course, "Fantasy and Science Fiction: the human mind, our modern world"⁵, offered in June 2013, with 771 active users who posted 6,520 posts in the course forum; and one programming course, "Learn to Program: The Fundamentals"⁶, offered in August 2013, with 3,590 active users and 24,963 forum posts. All three courses are officially seven weeks long. Each course has seven week-specific subforums and a separate general subforum for more general discussion about the course. Our analysis is limited to behavior within the discussion forums.

⁴ https://www.coursera.org/course/accountabletalk
⁵ https://www.coursera.org/course/fantasysf
⁶ https://www.coursera.org/course/programming1
3 Related Work

3.1 Learner Motivation

Most of the recent research on learner motivation in MOOCs is based on surveys and relatively small samples of hand-coded user-stated goals or reasons for dropout (e.g. Cheng et al. 2013; Christensen et al. 2013; DeBoer et al. 2013; Poellhuber et al. 2013). Poellhuber et al. (2013) find that user goals specified in the pre-course survey were the strongest predictors of later learning behaviors. Motivation is identified as an important determinant of engagement in MOOCs in the Milligan et al. (2013) study. However, different courses design different enrollment motivation questionnaire items, which makes it difficult to generalize conclusions from course to course. Another drawback is that learner motivation is volatile; in particular, distance learners can lose interest very fast even if they had been progressing well in the past (Keller & Suzuki, 2004). It is therefore important to monitor learner motivation and how it varies across the course weeks. We propose to automatically measure learner motivation based on linguistic cues in the forum posts.
3.2 Cognitive engagement

Research has consistently found that the cognitive processes involved in higher-order thinking lead to better knowledge acquisition (e.g. Chi & Bassock, 1989; Graham & Golan, 1991; Chi, 2000). Previous work has investigated students' cognitive engagement in both face-to-face (Corno & Mandinach, 1983; Newman et al. 1996) and computer-mediated communication (CMC) environments (Garrison et al. 1999; Zhu 2006). In this paper, we measure the cognitive engagement of a MOOC user based on how much personal interpretation is contained in the posts.
3.3 MOOC Analysis

The MOOC literature so far has focused on a summative view of user participation and dropout, trying to assess the rate at which different groups of users complete the course (e.g. Kizilcec et al. 2013; Brinton et al. 2013; Ramesh et al. 2013). There are mainly three types of information available about the participation patterns of MOOC users: survey information, clickstream behavioral data, and forum posts. From the survey information, we can partly understand the initial motivation at the time of enrollment for the subset of users who filled in the survey. Unfortunately, even for those users, this information does not help us understand the dynamics of motivational change. From the clickstream data of a user's online activities, we can see whether people are working hard or not, but we cannot tell why their level of activity changes from time to time. The discussion forums provide students with social learning opportunities while at the same time providing a portal into their minds. The activity level of a course's online forum negatively correlates with the volume of students who drop out of the course (Brinton et al. 2013). In our work, we utilize linguistic features in the forum posts that correlate with dropout to gain insights into the user experience that would be invisible from the survey information or clickstream data.

Despite the rich recent work on MOOC user dropout analysis, very little of it has attempted finer-grained content analysis of the course discussion forums. One exception is Ramesh et al. (2013), which uses sentiment and subjectivity of user posts to predict engagement/disengagement. However, neither sentiment nor subjectivity ended up being predictive of engagement in that work. One explanation is that engaged learners also post content with negative sentiment about the course, such as complaints about peer grading. Thus, the problem is more complex than the operationalization used in that work.
In our work, we use survival models to understand how attrition happens over time as students participate in a course. This approach has been applied to online medical support communities to quantify the impact of the receipt of emotional and informational support on user commitment to the community (Wang et al. 2012). Yang et al. (2013) and Rose et al. (2014) have also used survival models to measure the influence of social positioning factors on dropout from a MOOC. Our research contributes to this body of work.
4 Methods

4.1 Predicting Learner Motivation

The level of a student's motivation strongly influences the intensity of the student's participation in the course. Previous research has shown that it is possible to categorize learner motivation based on a student's description of planned learning actions (Ng & Bereiter, 1991; Dowson & McInerney, 2003). The identified motivation categorization has a substantial relationship to both learning behavior and learning outcomes. But the lab-based experimental techniques used in this prior work are impractical for the ever-growing size of MOOCs, and given the high student-to-instructor ratio it is difficult for instructors to personally identify students who lack motivation. To overcome these challenges, we build machine learning models to automatically identify the level of learner motivation based on posts to the course forum. We validate our measure in a domain-general way by not only testing on data from the same course, but also by training on one course and then testing on the other, in order to uncover course-independent motivation cues. The linguistic features that are predictive of learner motivation provide insights into what motivates the learners.
4.1.1 Creating the Human-Coded Dataset: MTurk

We used Amazon's Mechanical Turk (MTurk) to make it practical to construct a reliable annotated corpus for developing our automated measure of student motivation. Amazon's Mechanical Turk is an online marketplace for crowdsourcing: it allows requesters to post jobs and workers to choose jobs they would like to complete. Jobs are defined and paid in units of so-called Human Intelligence Tasks (HITs). Snow et al. (2008) have shown that the combined judgments of a small number (about five) of naive annotators on MTurk lead to ratings of texts that are very similar to those of experts. This applies to content such as the emotions expressed, the relative timing of events referred to in the text, word similarity, word sense disambiguation, and linguistic entailment or implication. As we show below, MTurk workers' judgments of learner motivation are also similar to those of coders who are familiar with the course content.
We randomly sampled 514 posts from the Accountable Talk course forums and 534 posts from the Fantasy and Science Fiction course forums. The non-English posts were manually filtered out. In order to construct a hand-coded dataset for later training machine learning models, we employed MTurk workers to rate the level of learner motivation towards the course expressed by the author of each post. We provided them with explicit definitions to use in making their judgment. For each post, the annotator indicated how motivated she perceived the post author to be towards the course on a 1-7 Likert scale ranging from "Extremely unmotivated" to "Extremely motivated". Each post was labeled by six different annotators. We paid $0.06 for rating each post. To encourage workers to take the numeric rating task seriously, we also asked them to highlight words and phrases in the post that provided evidence for their ratings. To further control annotation quality, we required that all workers have a United States location and have 98% or more of their previous submissions accepted. We monitored the annotation job and manually filtered out annotators who submitted uniform or seemingly random annotations.

We define the motivation score of a post as the average of the six scores assigned by the annotators. The distributions of the resulting motivation scores are shown in Figure 1. The following two examples from our final hand-coded dataset for the Accountable Talk class illustrate the scale: one shows high motivation, and the other demonstrates low motivation. The example posts shown in this paper are lightly disguised and shortened to protect user privacy.
Learner Motivation = 7.0 (Extremely motivated)
Referring to the last video on IRE impacts in our learning environments, I have to confess that I have been a victim of IRE and I can recall the silence followed by an exact and final received from a bright student.... Many ESL classes are like the cemetery of optional responses let alone engineering discussions. The Singing Man class is like a dream for many ESL teachers or even students if they have a chance to see the video! ...Lets practice this in our classrooms to share the feedbacks later!

Learner Motivation = 1.0 (Extremely unmotivated)
I have taken several coursera courses, and while I am willing to give every course a chance, I was underwhelmed by the presentation. I would strongly suggest you looking at other courses and ramping up the lectures. I'm sure the content is worthy, I am just not motivated to endure a bland presentation to get to it. All the best, XX.
4.1.2 Inter-annotator Agreement

To evaluate the reliability of the annotations, we calculate the intra-class correlation coefficient for the motivation annotation. Intra-class correlation (Koch, 1982) is appropriate for assessing the consistency of quantitative measurements when not all objects are rated by the same judges. The intra-class correlation coefficient for learner motivation is 0.74 for the Accountable Talk class and 0.72 for the Fantasy and Science Fiction course.

To assess the validity of the ratings, we also had the workers code 30 Accountable Talk forum posts which had previously been coded by experts. The correlation of the MTurkers' average ratings with the experts' average ratings for level of learner motivation was moderate (r = .74).
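For readers who wish to reproduce this reliability check, the intra-class correlation can be computed with standard statistical packages. The sketch below uses the pingouin library on toy long-format data (one row per post/rater pair); the library choice and all names are illustrative, not the authors' tooling.

```python
# A minimal sketch of the reliability check, assuming ratings in long
# format; pingouin is an illustrative choice, not the authors' tooling.
import pandas as pd
import pingouin as pg

ratings = pd.DataFrame({
    "post_id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "rater":   ["a", "b", "c", "a", "b", "c", "a", "b", "c"],
    "score":   [6, 7, 6, 2, 3, 2, 5, 5, 4],
})

icc = pg.intraclass_corr(data=ratings, targets="post_id",
                         raters="rater", ratings="score")
# The ICC1 row treats raters as random per target, matching a design
# where not all posts are rated by the same judges
print(icc[["Type", "ICC"]])
```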
[Figure 1: Annotated motivation score distributions (x-axis: motivation score; y-axis: number of posts) for the Accountable Talk course and the Fantasy and Science Fiction course.]

We acknowledge that the perception of motivation is highly subjective and that annotators may have inconsistent scales. In an attempt to mitigate this risk, instead of using the raw motivation scores from MTurk, for each course we break the set of annotated posts into two balanced groups based on the motivation scores: "motivated" posts and "unmotivated" posts.
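The aggregation and median split just described amount to a few lines of data wrangling; a minimal sketch follows, with hypothetical file and column names.

```python
# Average the six MTurk ratings per post, then split each course's posts
# into two balanced groups at the median; names are assumptions.
import pandas as pd

ratings = pd.read_csv("mturk_motivation_ratings.csv")  # post_id, rating (1-7)

# motivation score of a post = mean of its annotator ratings
post_scores = ratings.groupby("post_id")["rating"].mean()

# binarize at the median into "motivated" vs. "unmotivated"
median = post_scores.median()
labels = post_scores.gt(median).map({True: "motivated", False: "unmotivated"})
print(labels.value_counts())
```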
4.1.3 Linguistic Markers of Learner Motivation

In this section, we work to find domain-independent motivation cues so that a machine learning model can capture motivation expressed in posts reliably across different MOOCs. Building on the literature on learner motivation, we design five linguistic features, described below. The features are binary indicators of whether certain words appear in the post or not. Table 1 describes the distribution of the motivational markers in our annotated Accountable Talk dataset. We do not include the Fantasy and Science Fiction dataset in this analysis because it will serve as the test-domain dataset for our prediction task in the next section.
Apply words (Table 1, line 1): previous research on e-learning has found that motivation to learn can be expressed as the attention and effort required to complete a learning task and then apply the new material to the work site or to life (Esque & McCausland, 1997). Actively relating learning to potential application is a sign of a motivated learner (Moshinskie, 2001). We therefore hypothesize that words indicating application of new knowledge can be cues of learner motivation. The Apply lexicon we use consists of words that are synonyms of "apply" or "use": "apply", "try", "utilize", "employ", "practice", "use", "help", "exploit" and "implement".

Need words (Table 1, line 2) express the participant's needs, plans and goals: "hope", "want", "need", "will", "would like", "plan", "aim" and "goal". Previous research has shown that learners can be encouraged to identify and articulate clear aims and goals for the course to increase motivation (Locke & Latham, 2002; Milligan et al. 2013).

LIWC-cognitive words (Table 1, line 3): the cognitive mechanism dictionary in LIWC (Pennebaker & King, 1999) includes terms such as "thinking", "realized", "understand", "insight" and "comprehend".

First person pronouns (Table 1, line 4): using more first person pronouns may indicate that the user can relate the discussion to the self effectively.

Positive words (Table 1, line 5) from the sentiment lexicon of Liu et al. (2005) are also indicators of learner motivation. Learners with positive attitudes have been demonstrated to be more motivated in e-learning settings (Moshinskie, 2001). Note that negative words are not necessarily indicative of unmotivated posts, because an engaged learner may also post negative comments; this has also been reported in earlier work by Ramesh et al. (2013).

The features we use here are mostly indicators of high user motivation. Features indicative of low user motivation do not appear as frequently as the literature led us to expect. This may be partly because students who post in the forum have higher learner motivation in general.
Feature            In Motivated post set   In Unmotivated post set
Apply**            57%                     42%
Need**             54%                     37%
LIWC-cognitive**   56%                     38%
1st Person***      98%                     86%
Positive***        91%                     77%

Table 1: Features for predicting learner motivation. A binomial test is used to measure the difference in feature distribution between the motivated and unmotivated post sets (**: p < 0.01, ***: p < 0.001).
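To make the feature definitions concrete, the sketch below computes the five binary indicators for a single post. The Apply and Need lists are taken from the text above; the LIWC-cognitive, first-person, and positive lexicons shown are small stand-in subsets of the licensed and published resources the authors used, and the tokenization is an assumption.

```python
import re

# Apply and Need lists from the paper; the other three lexicons are
# illustrative subsets standing in for LIWC and the Liu et al. lexicon
APPLY = {"apply", "try", "utilize", "employ", "practice", "use", "help",
         "exploit", "implement"}
NEED = {"hope", "want", "need", "will", "plan", "aim", "goal"}  # plus "would like"
COGNITIVE = {"thinking", "realized", "understand", "insight", "comprehend"}
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our"}
POSITIVE = {"great", "interesting", "enjoy", "excellent", "inspiring"}

def motivation_features(post: str) -> dict:
    text = post.lower()
    tokens = set(re.findall(r"[a-z']+", text))
    return {
        "apply": int(bool(tokens & APPLY)),
        # "would like" is a two-word phrase, so check the raw text for it
        "need": int(bool(tokens & NEED) or "would like" in text),
        "liwc_cognitive": int(bool(tokens & COGNITIVE)),
        "first_person": int(bool(tokens & FIRST_PERSON)),
        "positive": int(bool(tokens & POSITIVE)),
    }

print(motivation_features("I hope to apply these ideas in my classroom."))
```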
4.1.4 Experimental Setup

To evaluate the robustness and domain-independence of the analysis from the previous section, we set up our motivation prediction experiments on the two courses. We treat Accountable Talk as a "development domain" since we use it for developing and identifying linguistic features. Fantasy and Science Fiction is thus our "test domain" since it was not used for identifying the features. We classify each post as "motivated" or "unmotivated". The amount of data from the two courses is balanced within each category: each category contains 257 posts from the Accountable Talk course and 267 posts from the Fantasy and Science Fiction course.
We compare three feature sets: a unigram feature representation as a baseline, a linguistic classifier (Ling.) using only the linguistic features described in the previous section, and a combined feature set (Unigram+Ling.). We use logistic regression for our binary classification task, employing liblinear (Fan et al. 2008) in Weka (Witten & Frank, 2005) to build the linear models. In order to prevent overfitting we use Ridge (L2) regularization.

                 In-domain                    Cross-domain
Train            Accountable   Fantasy        Accountable   Fantasy
Test             Accountable   Fantasy        Fantasy       Accountable
Unigram          71.1%         64.0%          61.0%         61.3%
Ling.            65.2%         60.1%          61.4%         60.8%
Unigram+Ling.    72.3%         66.7%          63.3%         63.7%

Table 2: Accuracies of our three classifiers for the Accountable Talk course (Accountable) and the Fantasy and Science Fiction course (Fantasy), in in-domain and cross-domain settings. The random baseline performance is 50%.
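A rough present-day analogue of this setup can be written with scikit-learn, whose LogisticRegression wraps the same LIBLINEAR solver with L2 regularization; the paper itself used LIBLINEAR inside Weka, so this is a sketch, not the authors' code.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# toy stand-ins for annotated posts and motivated(1)/unmotivated(0) labels
posts = ["I tried very hard to follow the course schedule",
         "I couldn't even finish the second lecture"]
labels = [1, 0]

# unigram baseline; the Unigram+Ling. model would append the five binary
# lexicon features to this representation
model = make_pipeline(
    CountVectorizer(binary=True),
    LogisticRegression(penalty="l2", solver="liblinear"),
)
model.fit(posts, labels)

# in-domain: 10-fold CV on one course, e.g.
#   cross_val_score(model, acc_posts, acc_labels, cv=10)
# cross-domain: train on one course, test on the other, e.g.
#   model.fit(acc_posts, acc_labels); model.score(fan_posts, fan_labels)
```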
4.1.5 Motivation Prediction

We now show how our feature-based analysis can be used in a machine learning model for automatically classifying forum posts according to learner motivation. To ensure that we capture course-independent learner motivation markers, we evaluate the classifiers both in an in-domain setting, with 10-fold cross-validation, and in a cross-domain setting, where we train on one course's data and test on the other (Table 2). For both our development (Accountable Talk) and our test (Fantasy and Science Fiction) domains, and in both the in-domain and cross-domain settings, the linguistic features give a 1-3% absolute improvement over the unigram model.

The experiments in this section confirm that our theory-inspired features are indeed effective in practice, and generalize well to new domains. The bag-of-words model is hard to apply across courses because the content of the courses differs. For example, many motivational posts in the Accountable Talk course discuss teaching strategies, so words such as "student" and "classroom" have high feature weights in the model. This is not necessarily true for other courses whose content has nothing to do with teaching.

In this section, we examined learner motivation where it can be perceived by a human. However, it is naive to assume that every forum post of a user can be regarded as a motivational statement; many posts do not contain markers of learner motivation. In the next section, we measure the cognitive engagement level of a student based on her posts, which may be detectable more broadly.
4.2 Level of Cognitive Engagement

Level of cognitive engagement captures the attention and effort in interpreting, analyzing and reasoning about the course material that is visible in discussion posts (Stoney & Oliver, 1999). Previous work uses manual content analysis to examine students' cognitive engagement in computer-mediated communication (CMC) (Fahy et al. 2001; Zhu 2006). In the MOOC forums, some posts are more descriptive of a particular scenario, while others reflect more higher-order thinking, such as deeper interpretations of the course material. Whether a post is more descriptive or more interpretive may reflect the post author's level of cognitive engagement. Recent work shows that the level of language abstraction reflects the level of cognitive inference (Beukeboom, 2014). In this section, we measure the level of cognitive engagement of a MOOC user via the level of language abstraction of her forum posts.
4.2.1 Measuring Level of Language Abstraction

Concrete words refer to things, events, and properties that we can perceive directly with our senses, such as "trees", "walking", and "red". Abstract words refer to ideas and concepts that are distant from immediate perception, such as "sense", "analysis", and "disputable" (Turney et al. 2011).

Previous work measures the level of language abstraction with Linguistic Inquiry and Word Count (LIWC) word categories (Gill & Oberlander, 2002; Pennebaker & King, 1999; Yarkoni, 2010; Beukeboom, 2013). For broader word coverage, we use the automatically generated abstractness dictionary from Turney et al. (2011), which is publicly available. This dictionary contains 114,501 words, each automatically assigned a numerical rating of its degree of abstractness on a scale from 0 (highly concrete) to 1 (highly abstract), based on feature vectors generated from the contexts in which the word has been found.

The mean level of abstraction was computed for each post by adding the abstractness scores of the words in the post and dividing by the total number of words. The following are two example posts from the Accountable Talk course Week 2 subforum, one with a high level of abstraction and one with a low level of abstraction. Based on the abstraction dictionary, abstract words are in italic and concrete words are underlined.
Level of abstraction = 0.85 (top 10%)
I agree. Probably what you just have to keep in mind is that you are there to help them learn by giving them opportunities to REASON out. In that case, you will not just accept the student's answer but let them explain how they arrived towards it. Keep in mind to appreciate and challenge their answers.
Level of abstraction = 0.13 (bottom 10%)
I teach science to gifted middle school students. The students learned to have conversations with me as a class and with the expert who wrote Chapter 8 of a text published in 2000. They are trying to design erosion control features for the building of a basketball court at the bottom of a hill in rainy Oregon.
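A minimal sketch of this post-level score follows, assuming the Turney et al. (2011) ratings have been loaded into a dict; the toy values and the treatment of out-of-dictionary words (counted as zero here) are assumptions about details the paper leaves open.

```python
import re

# toy excerpt of an abstractness dictionary: 0 = concrete, 1 = abstract
abstractness = {"agree": 0.74, "mind": 0.81, "reason": 0.85,
                "teach": 0.42, "school": 0.31, "basketball": 0.08}

def mean_abstraction(post: str) -> float:
    words = re.findall(r"[a-z']+", post.lower())
    if not words:
        return 0.0
    # sum each word's abstractness and divide by the total word count;
    # words missing from the dictionary contribute 0, which is one
    # possible reading of the description above
    return sum(abstractness.get(w, 0.0) for w in words) / len(words)

print(mean_abstraction("I agree, keep their reason in mind."))
```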
We believe that the level of language abstraction reflects the understanding that goes into using abstract words when creating a post. In the Learn to Program course forums, many discussion threads focus on solving actual programming problems, which is very different from the other two courses, where more subjective reflections on the course contents are shared. A higher level of language abstraction reflects understanding of a broader problem, while more concrete words are used when describing a particular bug a student encounters. Below are two examples.
Level of abstraction = 0.65 (top 10%)
I have the same problems here. Make sure that your variable names match exactly. Remember that under-bars connect words together. I know something to do with the read_board(board_file) function, but still need someone to explain more clearly.
Level of abstraction = 0.30 (bottom 10%)
>>> print('python', 'is')
('python', 'is')
>>> print('like', 'the', 'instructors', 'python')
It leaves the 'quotes' and commas, when the instructor does the same type of print in the example she gets not parenthesis, quotes, or commas. Does anyone know why?
5 Validation Experiments

We use survival analysis to validate that participants with a higher measured level of engagement stay active in the forums longer, controlling for other forum behaviors such as how many posts the user contributes. We apply the linguistic measures described in Section 4 to quantify student engagement. We use the in-domain learner motivation classifiers with both linguistic and unigram features (Section 4.1.5) for the Accountable Talk class and the Fantasy and Science Fiction class, and the classifier trained on the Accountable Talk dataset to assign motivated/unmotivated labels to the posts in the Learn to Program course.
5.1 Survival Model Design

Survival models can be regarded as a type of regression model that captures influences on time-related outcomes, such as whether or when an event occurs. In our case, we investigate our engagement measures' influence on when a course participant drops out of the course forum. More specifically, our goal is to understand whether our automatic measures of student engagement can predict a student's length of participation in the course forum. Survival analysis is known to provide less biased estimates than simpler techniques (e.g., standard least squares linear regression) that do not take into account the potentially truncated nature of time-to-event data (e.g., users who had not yet left the community at the time of the analysis but might at some point subsequently). From a more technical perspective, a survival model is a form of proportional odds logistic regression, where a prediction about the likelihood of a failure occurring is made at each time point based on the presence of some set of predictors. The estimated weights on the predictors are referred to as hazard ratios. The hazard ratio of a predictor indicates how the relative likelihood of the failure occurring increases or decreases with an increase or decrease in the associated predictor. We use the statistical software package Stata (Stata, 2001) and assume a Weibull distribution of survival times, which is generally appropriate for modeling survival.

For each of our three courses, we include all the active students, i.e. those who contributed one or more posts to the course forums. We define the time intervals as student participation weeks. We consider the timestamp of the first post by each student as the starting date of that student's participation in the course discussion forums, and the date of the last post as the end of participation unless it falls in the last course week.
Dependent Variable:
Dropout: the dependent measure, which is 1 on a student's last week of active participation unless it is the last course week (i.e. the seventh course week), and 0 on other weeks.

Control Variables:
Cohort1: a binary indicator of whether a user ever posted in the first course week (1) or not (0). Members who join the course in earlier weeks are more likely than others to continue participating in the discussion forums (Yang et al. 2013).
PostCountByUser: the number of messages a member posts in the forums in a week; a basic effort-based measure of a student's engagement.
CommentCount: the number of comments a user's posts receive in the forums in a week. Because this variable is highly correlated with PostCountByUser (r > .70 for all three courses), we include only PostCountByUser in the final models in order to avoid multicollinearity problems.

Independent Variables:
AvgMotivation: the percentage of an individual's posts in that week that are predicted as "motivated" by our model with both unigram and linguistic features (Section 4.1.4).
AvgCogEngagement: the average abstractness score per post each week.

We note that AvgMotivation and AvgCogEngagement are not correlated with PostCountByUser (r < .20 for all three courses), so they are orthogonal to the simpler measure of student engagement. AvgMotivation is also not correlated with AvgCogEngagement (r < .10 for all three courses). Thus, it is acceptable to include these variables together in the same model.
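An illustrative re-implementation of this model can be sketched with the lifelines library's Cox proportional hazards fitter in place of the parametric Weibull model fit in Stata; for brevity it collapses the weekly panel to one row per student, and the file and column names are assumptions rather than the authors' data layout.

```python
import pandas as pd
from lifelines import CoxPHFitter

# columns: weeks (length of forum participation), dropout (1 unless the
# last active week is the final course week), Cohort1, PostCountByUser,
# AvgMotivation, AvgCogEngagement (continuous predictors standardized)
df = pd.read_csv("forum_survival.csv")  # hypothetical file

cph = CoxPHFitter()
cph.fit(df, duration_col="weeks", event_col="dropout")
cph.print_summary()        # the exp(coef) column gives the hazard ratios
print(cph.hazard_ratios_)  # HR < 1 means a lower dropout hazard
```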
5.2 Survival Model Results

Table 3 reports the estimates from the survival models for the control and independent variables entered into the survival regression. Effects are reported in terms of the hazard ratio (HR), which is the effect of an explanatory variable on the risk, or probability, of a participant dropping out of the course forum. Because all the explanatory variables except Cohort1 have been standardized, the hazard ratio here is the predicted change in the probability of dropout from the course forum for a unit increase in the predictor variable (i.e., Cohort1 changing from 0 to 1, or a continuous variable increasing by a standard deviation, with all other variables at their mean levels).
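As a worked example of this conversion: for such a unit increase, the percentage change in the dropout hazard is (1 − HR) × 100%, so a hazard ratio of 0.68 corresponds to a (1 − 0.68) × 100% = 32% lower hazard.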
                     Accountable Talk      Fantasy and Science Fiction   Learn to Program
Variable             HR       Std. Err.    HR       Std. Err.            HR       Std. Err.
Cohort1              .68***   .05          .82*     .08                  .81*     .04
PostCountByUser      .86***   .02          .90***   .02                  .76***   .04
AvgMotivation        .58*     .13          .82*     .08                  .84***   .04
AvgCogEngagement     .94*     .02          .92**    .03                  .53**    .13

Table 3: Results of the survival analysis (*: p < 0.05, **: p < 0.01, ***: p < 0.001).

Our variables show similar effects across the three courses (Table 3). Here we explain the results for the Accountable Talk course. The hazard ratio for Cohort1 means that the dropout hazard is 32%⁷ lower for members who posted in the first course week. Similarly, the hazard ratio for PostCountByUser indicates that the dropout hazard is 14%⁸ lower for those who posted a standard deviation more posts than average.
Controlling for when participants started to post in the forum and the number of posts published each week, both learner motivation and average level of abstraction significantly influenced the dropout rates, in the same direction. Those whose posts expressed an average of one standard deviation more learner motivation (AvgMotivation) have a 42%⁹ lower hazard of dropping out of the course forum. Those whose posts have an average of one standard deviation higher cognitive engagement level (AvgCogEngagement) have a 6%¹⁰ lower hazard. AvgMotivation is relatively more predictive of user dropout than AvgCogEngagement for the Accountable Talk course and the Fantasy and Science Fiction course, while AvgCogEngagement is more predictive of user dropout in the Learn to Program course. This may be because the Learn to Program course discussions focus more on technical problems, so fewer posts contain motivation markers.
⁷ 32% = 100% − (100% × 0.68)
⁸ 14% = 100% − (100% × 0.86)
⁹ 42% = 100% − (100% × 0.58)
¹⁰ 6% = 100% − (100% × 0.94)

[Figure 2: Survival curves for students with different levels of engagement in the Accountable Talk course.]
[Figure 3: Survival curves for students with different levels of engagement in the Fantasy and Science Fiction course.]
[Figure 4: Survival curves for students with different levels of engagement in the Learn to Program course.]

Figures 2-4 illustrate these results graphically, showing three survival curves for each course. The solid curve shows survival with the number of posts, motivation, and cognitive engagement at their mean levels. The top curve shows survival when the number of posts is at its mean level and the average learner motivation and level of cognitive engagement in the posts are both one standard deviation above the mean; the bottom curve shows survival when the number of posts is at its mean and both are one standard deviation below the mean.
5.3 Implications

In contrast to regular courses, where students engage with class materials in a structured and monitored way and instructors directly observe student behavior and provide feedback, in MOOCs it is important to target instructors' limited attention to the students who need it most (Ramesh et al. 2013). The automated linguistic models designed in this paper can help monitor MOOC user engagement from forum posts. By identifying students who are likely to end up not completing the class before it is too late, we can perform targeted interventions (e.g., sending encouraging emails, posting reminders, allocating limited tutoring resources) to try to improve these students' engagement. For example, our motivation prediction model could be used to better target instructors' limited attention to users who are motivated in general but are experiencing a temporary lack of motivation that might threaten their continued participation, in particular those who have shown serious intention of finishing the course by joining the discussion forums. One possible intervention based on this type of analysis might suggest that instructors reply to students whose recent motivation level is lower than it has been in the past; this may help students get past a difficult part of the course. We could also recommend highly motivated posts as reading for other users, which may serve as inspiration.

Based on the predictive engagement markers, we see that it is important for students to be able to apply new knowledge and to engage in deeper thinking. Discussion facilitation can influence levels of cognitive engagement (Corno and Mandinach, 1983). The instructor can encourage learners to reflect on what they learned and how that learning addressed their needs. Work on automated facilitation from the Computer-Supported Collaborative Learning (CSCL) literature might be adaptable to the MOOC context to make this feasible (Adamson et al. in press).
6 Conclusion

We present a study of how to measure MOOC student engagement based on linguistic analysis of forum posts. We identify two new measures that quantify engagement and validate them on three Coursera courses with diverse content. We automatically identify the extent to which posts in course forums express learner motivation and cognitive engagement. The survival analysis results validate that the more motivation a learner expresses, the lower the risk of dropout. Similarly, the more personal interpretation a participant shows in her posts, the lower the rate of student dropout from the course forums.
6.1 Limitations and future work

An important limitation of this study is that, even though activity in a course's online forum closely correlates with a student's dropout from the course, exactly when (and why) students drop out of a course entirely is not publicly accessible information. Gillani (2013) shows that those who engage explicitly in the discussion forums are often higher-performing than their counterparts in the course. For the "invisible" users who never interact with other learners or staff on the discussion forums, we can only rely on clickstream data to understand their behavior. Another limitation is that, even though we use longitudinal data, our findings are correlational. Student motivation is generally categorized as intrinsic or extrinsic in previous work (Ryan and Deci 2000). In our work, we did not distinguish between the two motivation types, because in MOOC forums we observe only a limited number of posts that demonstrate extrinsic motivation (e.g. taking the course for high grades or for the certificate). In future work, with annotated motivation types, it will be interesting to study how students with different observed types of motivation behave differently in MOOCs. We also hope to utilize social interactions in forums, such as who-talks-to-whom information, to better understand social learning in MOOCs (Sun et al. 2011). Finally, we plan to design and test a targeted intervention making use of the predicted engagement level, which will allow us to measure the practical impact of our findings as well as to evaluate whether the correlational evidence we present here holds up to an experimental test of causality.
Acknowledgments

We want to thank Ryan Carlson, David Adamson and Tanmay Sinha, who helped provide the data for this project. The research reported here was supported by National Science Foundation grant 0835426.
References
Adamson, D., Dyke, G., Jang, H. J., and Rosé, C. P. 2013. Towards an Agile Approach to Adapting Dynamic Collaboration Support to Student Needs. International Journal of AI in Education 24(1), pp. 91-121.
Belanger, Y. and Thornton, J 2013. Bioelectricity: A Quanti-
tative Approach Duke University’s First MOOC. Duke Uni-
versity.
Beukeboom, C. J., Tanis, M., and Vermeulen, I. E. 2013. The Language of Extraversion: Extraverted People Talk More Abstractly, Introverts Are More Concrete. Journal of Language and Social Psychology, 32(2), 191-201.
Beukeboom, C. J. 2014. Mechanisms of linguistic bias: How words reflect and maintain stereotypic expectancies. Social Cognition and Communication, 31, 313-330.
Brinton, C. G., Chiang, M., Jain, S., Lam, H., Liu, Z. and
Wong, F. M. F. 2013. Learning about social learning
in MOOCs: From statistical analysis to generative model.
arXiv preprint arXiv:1312.2159.
Carini, R. M., Kuh, G. D. and Klein, S. P. 2006. Student
engagement and student learning: Testing the linkages. Re-
search in Higher Education, 47(1), 1-32.
Cheng, J., Kulkarni, C. and Klemmer, S. 2013. Tools for
predicting drop-off in large online classes. In Proceedings
of the 2013 conference on Computer supported cooperative
work companion. ACM.
Chi, M.T.H. 2000. Self-explaining expository texts: The dual processes of generating inferences and repairing mental models. In R. Glaser (Ed.), Advances in instructional psychology: Educational design and cognitive science (pp. 161-238).
Chi, M.T.H., and Bassock, M. 1989. Learning from examples via self-explanations. Knowing, learning, and instruction: Essays in honor of Robert Glaser, 251-282.
Christensen, G., A. Steinmetz, B. Alcorn, A. Bennet, D. Woods, and E.J. Emmanuel 2013. The MOOC Phenomenon: Who Takes Massive Open Online Courses and Why? University of Pennsylvania. Web. 6 Dec. 2013.
Clow, Doug. 2013. MOOCs and the funnel of participation.
Third International Conference on Learning Analytics and
Knowledge. ACM .
Corno, Lyn, and Ellen B. Mandinach. 1983. The role of cognitive engagement in classroom learning and motivation. Educational Psychologist 18(2): 88-108.
DeBoer, Jennifer, G. S. Stump, D. Seaton, and Lori Bres-
low. 2013. Diversity in MOOC students’ backgrounds and
behaviors in relationship to performance in 6.002 x. In Pro-
ceedings of the Sixth Learning International Networks Con-
sortium Conference.
Dowson, M., and McInerney, D. M. 2003. What do students
say about their motivational goals?: Towards a more com-
plex and dynamic perspective on student motivation. Con-
temporary Educational Psychology, 28(1), 91-113.
Esque, T. and McCausland, J. 1997. Taking ownership for
transfer: A management development study case. Perfor-
mance Improvement Quarterly, 10 (2), 116-133.
Fahy, P.J., Crawford, G. and Ally, M. 2001. Patterns of in-
teraction in a computer conference transcript. International
Review of Open and Distance Learning, 2(1).
Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R. and Lin,
C.-J. 2008. LIBLINEAR: A library for large linear classifi-
cation. Journal of Machine Learning Research (9).
Fuchs, L.S., Fuchs, D., Hamlett, C. L., Phillips, N. B., Karns, K., and Dutka, S. 1997. Enhancing students' helping behavior during peer-mediated instruction with conceptual mathematical explanations. Elementary School Journal, 97, 223-249.
Garrison, D. R., Anderson, T. and Archer, W. 1999. Critical inquiry in a text-based environment: Computer conferencing in higher education. The Internet and Higher Education, 2(2), 87-105.
Gillani, Nabeel. 2013. Learner communications in massively open online courses. OxCHEPS Occasional Paper 53.
Girish, Balakrishnan 2013. Predicting Student Retention in
Massive Open Online Courses using Hidden Markov Mod-
els. EECS Department, University of California, Berkeley.
Graham, S., & Golan, S. 1991. Motivational influences on
cognition: Task involvement, ego involvement, and depth of
information processing. Journal of Educational Psychology
83:187-194.
Huang, Jonathan, Chris Piech, Andy Nguyen, Leonidas
Guibas. 2013. Syntactic and Functional Variability of a
Million Code Submissions in a Machine Learning MOOC.
In Proceedings of the 1st Workshop on Massive Open Online
Courses at the 16th Annual Conference on Artificial Intelli-
gence in Education.
Keller, J., and Suzuki, K. 2004. Learner motivation and e-
learning design: A multinationally validated process. Jour-
nal of Educational Media, 29(3), 229-239.
Kizilcec, René F., Chris Piech, and Emily Schneider 2013. Deconstructing disengagement: analyzing learner subpopulations in massive open online courses. In Proceedings of the Third International Conference on Learning Analytics and Knowledge. ACM.
Koch, Gary G. 1982. Intraclass correlation coefficient. In S. Kotz and N. L. Johnson (Eds.), Encyclopedia of Statistical Sciences (pp. 213-217). New York: John Wiley and Sons.
Koller, Daphne, Andrew Ng, Chuong Do and Zhenghao Chen. 2013. Retention and Intention in Massive Open Online Courses. In Depth. Educause.
Liu, Bing, Minqing Hu, and Junsheng Cheng. 2005. Opinion Observer: analyzing and comparing opinions on the Web. In Proceedings of WWW, pages 342-351.
Locke, E. A., and Latham, G. P. 2002. Building a practically
useful theory of goal setting and task motivation. American
Psychologist, 57(9), 705-717.
Milligan, C., Littlejohn, A., and Margaryan, A. 2013. Pat-
terns of Engagement in Connectivist MOOCs. Journal of
Online Learning and Teaching, 9(2).
Moshinskie, J. 2001. How to keep e-learners from e-
scaping. Performance Improvement, 40(6), 30-37.
Ng, Evelyn, and Carl Bereiter 1991. Three levels of goal
orientation in learning. In Journal of the Learning Sciences
1.3-4.
Pennebaker, J. W. and King, L. A. 1999. Linguistic Styles:
Language use as an individual difference. Journal of Per-
sonality and Social Psychology, 77, 1296-1312.
Poellhuber, Bruno, Normand Roy, Ibtihel Bouchoucha, Jacques Raynauld, Jean Talbot and Terry Anderson. 2013. The Relations between MOOC's Participants' Motivational Profiles, Engagement Profile and Persistence. In MRI Conference, Arlington.
Ramesh, Arti, Dan Goldwasser, Bert Huang, Hal Daumé III, and Lise Getoor 2013. Modeling Learner Engagement in MOOCs using Probabilistic Soft Logic. In a workshop at NIPS.
Rosé, C., Carlson, R., Yang, D., Wen, M., Resnick, L., Goldman, P., and Sherer, J. 2014. Social factors that contribute to attrition in MOOCs. In ACM Learning at Scale.
Roscoe, R. D., and Chi, M. T. H. 2008. Tutor learning: The
role of explaining and responding to questions. Instructional
Science, 36. pp. 321-350.
Ryan, Richard M., and Edward L. Deci. 2000. Intrinsic and extrinsic motivations: Classic definitions and new directions. Contemporary Educational Psychology 25(1): 54-67.
Snow, R., O’Connor, B., Jurafsky, D. and Ng, A. Y. 2008.
Cheap and fast — but is it good? Evaluating non-expert
annotations for natural language tasks. In Proceedings of
the Conference on Empirical Methods in Natural Language
Processing (pp. 254-263).
Stoney, C. and Oliver, R. 1999. Can higher order thinking and cognitive engagement be enhanced with multimedia? Interactive Multimedia Electronic Journal of Computer-Enhanced Learning.
Stata Corporation 2001. Stata Statistical Software Release 7.0: Programming. Stata Corporation.
Sun, T, W. Chen, Z. Liu, Y. Wang, X. Sun, M. Zhang, and
C.-Y. Lin. 2011. Participation maximization based on social
influence in online discussion forums. In Proceedings of
ICWSM, 2011.
Turney, P. D., Neuman, Y., Assaf, D. and Cohen, Y. 2011. Literal and metaphorical sense identification through concrete and abstract context. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (pp. 680-690).
Vedder, P. 1985. Cooperative learning: A study on processes and effects of cooperation between primary school children. Westerhaven, Groningen: Rijkuniversiteit Groningen.
Wang, Yi-Chia, Robert Kraut, and John M. Levine 2012.
To stay or leave?: the relationship of emotional and infor-
mational support to commitment in online health support
groups. In Proceedings of the ACM 2012 conference on
Computer Supported Cooperative Work. ACM.
Witten, I. H. and Frank, E. 2005. Data mining: Practical
machine learning tools and techniques (2nd ed.) Elsevier.
Wittrock, M. C. 1990. Generative processes of comprehension. Educational Psychologist, 24, pp. 345-376.
Yang, D., Sinha, T., Adamson, D. and Rosé, C. P. 2013. "Turn on, Tune in, Drop out": Anticipating student dropouts in Massive Open Online Courses. In a workshop at NIPS.
Zhu, E. 2006. Interaction and cognitive engagement: An
analysis of four asynchronous online discussions. Instruc-
tional Science, 34(6), 451-480.
... The large number of student posts in MOOC discussion forums makes it difficult for an instructor to know where to intervene to answer questions, resolve issues, or provide feedback. Moreover, while MOOC forum posts provide evidence of student learning and motivation (Wen and Yang 2014a;Elouazizi 2014), the large volume of content makes it difficult for an instructor to identify students who may need help Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. ...
... Wen et al. (2014b) found a correlation between the ratio of negative-to-positive sentiment words in MOOC forum posts and the rate of student drop-out in a given week. Similarly, Wen et al. (2014a) conducted a survival analysis and found that students with MOOC forum posts indicating higher levels of motivation and cognitive engagement were more likely to complete the course. ...
Article
Students in a Massive Open Online Course (MOOC) interact with each other and the course staff through online discussion forums. While discussion forums play a central role in MOOCs, they also pose a challenge for instructors. The large number of student posts makes it difficult for an instructor to know where to intervene to answer questions, resolve issues, and provide feedback. In this work, we focus on automatically predicting speech acts in MOOC forum posts. Our speech act categories describe the purpose or function of the post in the ongoing discussion. Specifically, we address three main research questions. First, we investigate whether crowdsourced workers can reliably label MOOC forum posts using our speech act definitions. Second, we investigate whether our speech acts can help predict instructor interventions and assignment completion and performance. Finally, we investigate which types of features (derived from the post content, author, and surrounding context) are most effective for predicting our different speech act categories.
... Text as an NLP feature for predicting student performance The literature indicates that while certain studies have systematically researched texts, the focus has not involved their linguistic or semantic properties, i.e. tokenization, lemmatization, text classification etc. More specifically, several studies have explored course forum texts as a feature source, though they have not extracted any linguistic features [9]- [11]. Typical features that have been examined include the total number of posts, the total number of words per post, and the time of posting.For example, in a CS undergraduate course [12] the authors extracted the following measures: messages, threads, words, sentences, views, and time of participation. ...
Conference Paper
Full-text available
While students generate a large volume of texts in higher education (e.g. blogs, fora, wikis, assignments), the potential of these texts as performance predictors has been unexplored. In the present work we report findings from a study in which undergraduates first viewed a series of six short video lectures and then wrote a short summary for each. Students' comprehension of each video lecture was measured with a quiz. Based on the median score as a threshold, two performance groups were created, high (above median) and low (below median). Using standard NLP approaches, we converted the student summaries into two large feature sets: (a) raw (i.e., BoW and TF-IDF) and (b) engineered (i.e., Part of Speech, embeddings). After encoding both the video lecture transcript and each respective student summary, the resulting sparse and dense vectors were used to compute the cosine similarity. The raw and engineered feature sets were subsequently used to train eight common ML classifiers. As the two classes were imbalanced, we used both the accuracy and the f1 score as metrics. The results indicated that in 50% of all cases, the raw text feature set led to a higher average classification accuracy compared to the engineered features. Furthermore, the average classification performance of the classifiers was .70 for the raw text features and .74 for the engineered features. Finally, depending on the metric and feature set, some classifiers performed better than others.
... Another similar work, conducted by Wen et al. (2014), focused on a sentiment analysis in a MOOC, aiming to understand the relation between students' comments and course success. The authors developed a model to identify motivated students, based on their comments in a course forum. ...
Article
Full-text available
Massive Online Open Course (MOOC) platforms are considered a distinctive way to deliver a modern educational experience, open to a worldwide public. However, student engagement in MOOCs is a less explored area, although it is known that MOOCs suffer from one of the highest dropout rates within learning environments in general, and in e-learning in particular. A special challenge in this area is finding early, measurable indicators of engagement. This paper tackles this issue with a unique blend of data analytics and NLP and machine learning techniques together with a solid foundation in psychological theories. Importantly, we show for the first time how Self-Determination Theory (SDT) can be mapped onto concrete features extracted from tracking student behaviour on MOOCs. We map the dimensions of Autonomy, Relatedness and Competence, leading to methods to characterise engaged and disengaged MOOC student behaviours, and exploring what triggers and promotes MOOC students’ interest and engagement. The paper further contributes by building the Engage Taxonomy, the first taxonomy of MOOC engagement tracking parameters, mapped over 4 engagement theories: SDT, Drive, ET, Process of Engagement. Moreover, we define and analyse students’ engagement tracking, with a larger than usual body of content (6 MOOC courses from two different universities with 26 runs spanning between 2013 and 2018) and students (initially around 218.235). Importantly, the paper also serves as the first large-scale evaluation of the SDT theory itself, providing a blueprint for large-scale theory evaluation. It also provides for the first-time metrics for measurable engagement in MOOCs, including specific measures for Autonomy, Relatedness and Competence; it evaluates these based on existing (and expanded) measures of success in MOOCs: Completion rate, Correct Answer ratio and Reply ratio. In addition, to further illustrate the use of the proposed SDT metrics, this study is the first to use SDT constructs extracted from the first week, to predict active and non-active students in the following week.
... Other studies have explored engagement in different contexts, such as political argument settings (Shugars and Beauchamp, 2019), conversations around terrorist attacks (Chiluwa and Odebunmi, 2016), socio-affective aspects of conversations such as emotion (Yu et al., 2004), student engagement in online discussion forums (Liu et al., 2018), cognitive engagement in MOOC forums (Wen et al., 2014), real-time engagement in reducing binge drinking through intervention text messages (Irvine et al., 2017), and user engagement in online health communities (Wang et al., 2020). However, none of these studies specifically considers the dynamics of socio-affective and cognitive engagement in online conversations between patients and healthcare providers. ...
Preprint
Full-text available
Patients who effectively manage their symptoms often demonstrate higher levels of engagement in conversations and interventions with healthcare practitioners. This engagement is multifaceted, encompassing cognitive and socio-affective dimensions. Consequently, it is crucial for AI systems to understand engagement in natural conversations between patients and practitioners to better contribute toward patient care. In this paper, we present a novel dataset (MedNgage), which consists of patient-nurse conversations about cancer symptom management. We manually annotate the dataset with a novel framework of patient-engagement categories from two different angles: (i) socio-affective engagement (3.1K spans) and (ii) cognitive engagement (1.8K spans). Through statistical analysis of the data annotated using our framework, we show a positive correlation between patients' symptom-management outcomes and their engagement in conversations. Additionally, we demonstrate that pre-trained transformer models fine-tuned on our dataset can reliably predict engagement categories in patient-nurse conversations. Lastly, we use LIME (Ribeiro et al., 2016) to analyze the underlying challenges that state-of-the-art transformer models encounter on these tasks. The de-identified data is available for research purposes upon request.
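As a concrete picture of the classification setup this abstract describes, the sketch below runs inference with a sequence classifier in the HuggingFace transformers API; the base model, the number of labels, and the example span are placeholders, and the fine-tuning loop on the annotated spans is omitted (the dataset itself is available only on request).

```python
# Hedged sketch: predicting an engagement category for a conversation span
# with a transformer classifier. "bert-base-uncased" and num_labels=3 are
# illustrative assumptions; a model fine-tuned on MedNgage would be loaded here.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3
)

span = "I tried the breathing exercise and it helped with the nausea."
inputs = tokenizer(span, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits
print("Predicted category index:", logits.argmax(dim=-1).item())
```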
... As predictive modeling techniques in education matured, the focus has expanded to advancing our theoretical understandings of student learning in a particular context (Shmueli, 2010). This includes understanding students' learning strategies and self-regulated learning (Di Mitri et al., 2017; Maldonado-Mahauad et al., 2018; Moreno-Marcos et al., 2020; Sierens et al., 2009), affect detection (Calvo & D'Mello, 2010; Hussain et al., 2011), reading comprehension (Allen et al., 2015), critical thinking (Barbosa et al., 2020; Kovanović et al., 2014, 2016; Neto et al., 2018, 2021; Waters et al., 2015), reflection, motivation (Sharma et al., 2020; Wen et al., 2014), feedback engagement (Iraj et al., 2020), social interactions (Joksimović et al., 2015; Yoo & Kim, 2012), and team performance (Yoo & Kim, 2013). ...
Article
Full-text available
Learning analytics, located at the intersection of learning science, data science, and computer science, aims to leverage educational data to enhance teaching and learning. However, as educational data grows, distilling meaningful insights presents challenges, particularly concerning individual learner differences. This work introduces a comprehensive approach for designing and automatically mapping learners into meaningful cohorts based on diverse learning behaviors. It defines four critical contexts (engagement, direction, repetitiveness, and orderliness) and generates practical learning cohorts from their combinations. The approach employs time-series clustering, using K-means with dynamic time warping (DTW), to identify similar learning patterns. Statistical techniques such as the Mann-Kendall trend test and the Theil-Sen estimator further refine the process. A case study on data science courses validates the approach, offering novel insights into learner behavior. The contributions include a novel time-series approach to characterizing learning behavior, new learning cohorts based on critical contexts, and a systematic method for automated cohort identification.
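A minimal sketch of the clustering and trend analysis named above, using tslearn's DTW-based K-means and SciPy's Theil-Sen estimator; the weekly activity series are synthetic placeholders rather than the case-study data.

```python
# Sketch: group learners by the shape of their weekly activity with
# DTW-based K-means, then characterise a trend with a Theil-Sen slope.
import numpy as np
from scipy.stats import theilslopes
from tslearn.clustering import TimeSeriesKMeans

# Hypothetical per-learner weekly activity counts (n_learners x n_weeks).
series = np.array([
    [5, 6, 7, 8, 9, 9],   # steadily increasing engagement
    [9, 7, 5, 3, 1, 0],   # declining engagement
    [4, 5, 6, 7, 8, 8],
    [8, 6, 4, 2, 1, 0],
], dtype=float)

km = TimeSeriesKMeans(n_clusters=2, metric="dtw", random_state=0)
labels = km.fit_predict(series)
print("Cohort assignments:", labels)

# Robust slope of one learner's series (negative here: disengaging).
slope = theilslopes(series[1])[0]
print("Theil-Sen slope for learner 1:", slope)
```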
Article
Full-text available
Background: Forums in massive open online courses (MOOCs) enable written exchanges on course content; hence, they can potentially facilitate learners' cognitive engagement. Given the myriad of MOOC forum messages, this engagement is commonly analysed automatically through the linguistic features of the messages. Assessing linguistic features of learners' forum messages involves consideration of the learning tasks. MOOC forum discussion tasks, however, have not been previously considered.

Objective and Method: This study explores the effects of MOOC forum discussion tasks on learners' cognitive engagement. Based on the structure of observed learning outcomes (SOLO) taxonomy, we manually annotate distinct levels of cognitive engagement encouraged in forum discussion tasks and displayed by learners in messages starting discussions (i.e., thread starters). We study the linguistic features of thread starters in relation to the pedagogical design of the discussion tasks. Additionally, we use random-forest modelling to identify the linguistic and task-related features that help to categorise learners' cognitive engagement according to SOLO levels.

Results: Manual analysis showed that learners' thread starters mainly reflect surface SOLO levels and include few academic words and little cohesive language. Random-forest modelling showed that these linguistic features, together with the SOLO levels encouraged in the discussion tasks, played an important role in identifying learners' cognitive engagement.

Major Takeaways: Our results highlight the importance of the pedagogical design of MOOC forum tasks in helping learners engage cognitively. Our study also contributes to the empirical evidence that learners' linguistic choices can afford insights into the quality of their cognitive engagement.
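To illustrate the random-forest step, here is a hedged sketch with invented linguistic and task features standing in for the study's variables; the feature names, values, and SOLO labels are illustrative only.

```python
# Hedged sketch: linguistic features of thread starters plus the SOLO level
# encouraged by the task, used to predict the learner's displayed SOLO level.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.DataFrame({
    "academic_word_ratio": [0.02, 0.10, 0.05, 0.12, 0.01, 0.09],
    "cohesion_score":      [0.30, 0.70, 0.40, 0.80, 0.20, 0.60],
    "task_solo_level":     [1, 3, 2, 3, 1, 2],  # level encouraged by the task
    "learner_solo_level":  [1, 3, 1, 3, 1, 2],  # annotated label to predict
})

X = df[["academic_word_ratio", "cohesion_score", "task_solo_level"]]
y = df["learner_solo_level"]

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
# Importances hint at which cues drive the SOLO-level categorisation.
print(dict(zip(X.columns, rf.feature_importances_.round(2))))
```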
Thesis
Full-text available
Massive open online courses (MOOCs) emerged with the promise to disrupt higher education. Fifteen years after their emergence, in terms of performance, that promise has not been fulfilled. In MOOC discussion forums, learners seldom capitalise on the opportunities for social learning. Through four empirical studies, we investigate how MOOC discussion forums are structured and how they can be potentially designed to facilitate learner-to-learner interactions and instructional dialogue. Results show that thoughtful design can help improve MOOC forum navigation, participation, and interactions. However, the environment in which forums are embedded needs to be considered as a techno-pedagogical fabric that provides (but also constrains) opportunities for social learning.
Article
Full-text available
Recent studies have found that comments from teaching assistants may encourage interactions in edX-like Massive Open Online Course (xMOOC) forums. However, how concepts from these interactions are conveyed to other xMOOC participants has not received much attention. Therefore, this study focuses on a unidirectional teaching assistant-student xMOOC interaction (TS interaction): a content-related pair comprising one question from a student and one immediate answer from a teaching assistant. The authors particularly investigate the linguistic features (i.e., concept connectivity, concept concreteness, readability and semantic overlap) of concept conveying in TS interactions with many responses (mTS) and with few responses (fTS). In addition, a language factor (English and Chinese) is also considered. Additionally, interaction transcripts from science lectures (SL) and political briefings (PB) were used as control groups, representing two opposite cases of concept conveying. At the concept level, the concept conveying in transcripts was modelled as a graph and measured by common indicators from graph theory. At the overall level, the concept conveying in transcripts was measured with standard linguistic measurement tools. The results show that interactions with mTS and fTS demonstrate different concept-conveying tendencies toward SL and PB in terms of linguistic features in both languages. The results suggest that in both languages, teaching assistants may use mixed concept-conveying strategies to stimulate more follow-up responses in xMOOC forums. These conclusions drawn from TS interactions can even be partially generalized to a larger student-student (SS) interaction dataset.
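The graph view of concept conveying can be sketched as follows: concepts become nodes, within-turn co-occurrence becomes edges, and standard graph-theoretic indicators then quantify connectivity. The concept lists below are invented for illustration; the paper's own indicator set is not reproduced here.

```python
# Sketch: build a concept co-occurrence graph from the turns of an
# interaction and compute simple connectivity indicators.
from itertools import combinations
import networkx as nx

# Hypothetical concepts mentioned in successive turns of a TS interaction.
turns = [
    ["recursion", "base case"],
    ["recursion", "stack", "call frame"],
    ["stack", "call frame", "overflow"],
]

G = nx.Graph()
for concepts in turns:
    for a, b in combinations(concepts, 2):
        G.add_edge(a, b)  # concepts co-occurring in a turn are linked

print("Density:", round(nx.density(G), 3))
print("Average clustering:", round(nx.average_clustering(G), 3))
```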
Conference Paper
Full-text available
In this paper, we explore student dropout behavior in Massive Open Online Courses (MOOCs). As a case study, we use a recent Coursera class, from which we develop a survival model that allows us to measure the influence of factors extracted from the data on the student dropout rate. Specifically, we explore factors related to student behavior and social positioning within discussion forums using standard social network analytic techniques. The analysis reveals several significant predictors of dropout.
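Because survival modelling is also central to the present paper's methodology, a minimal sketch of a Cox proportional-hazards model of dropout may be useful; the column names and values are synthetic, and the lifelines library and penalizer setting are choices of this illustration, not the study's implementation.

```python
# Hedged sketch: Cox proportional-hazards model relating forum behaviour
# to time until dropout. All data below are invented.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "weeks_active":   [2, 8, 5, 1, 7, 3],        # duration observed
    "dropped_out":    [1, 0, 1, 1, 0, 1],        # event flag (0 = censored)
    "posts_per_week": [0.5, 4.0, 1.0, 0.0, 3.5, 0.8],
    "network_degree": [1, 9, 3, 0, 7, 2],        # social position in the forum
})

# A small penalizer keeps the fit stable on tiny illustrative samples.
cph = CoxPHFitter(penalizer=0.1)
cph.fit(df, duration_col="weeks_active", event_col="dropped_out")
cph.print_summary()  # hazard ratios for each behavioural predictor
```

A hazard ratio below 1 for a predictor such as posts_per_week would indicate that more forum activity is associated with a lower risk of dropping out at any given time.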
Article
Intrinsic and extrinsic types of motivation have been widely studied, and the distinction between them has shed important light on both developmental and educational practices. In this review we revisit the classic definitions of intrinsic and extrinsic motivation in light of contemporary research and theory. Intrinsic motivation remains an important construct, reflecting the natural human propensity to learn and assimilate. However, extrinsic motivation is argued to vary considerably in its relative autonomy and thus can either reflect external control or true self-regulation. The relations of both classes of motives to basic human needs for autonomy, competence and relatedness are discussed.
Chapter
This article has no abstract.
Article
Massive open online courses (MOOCs) have commanded considerable public attention due to their sudden rise and disruptive potential. But there are no robust, published data that describe who is taking these courses and why they are doing so. As such, we do not yet know how transformative the MOOC phenomenon can or will be. We conducted an online survey of students enrolled in at least one of the University of Pennsylvania's 32 MOOCs offered on the Coursera platform. The student population tends to be young, well educated, and employed, with a majority from developed countries. There are significantly more males than females taking MOOCs, especially in BRIC and other developing countries. Students' main reasons for taking a MOOC are advancing in their current job and satisfying curiosity. The individuals the MOOC revolution is supposed to help the most, those without access to higher education in developing countries, are underrepresented among the early adopters.
Article
This paper investigates the use of conversational agents to scaffold online collaborative learning discussions through an approach called Academically Productive Talk (APT). In contrast to past work on dynamic support for collaborative learning, where agents were used to elevate conceptual depth by leading students through directed lines of reasoning (Kumar & Rosé, IEEE Transactions on Learning Technologies, 4(1), 2011), this APT-based approach uses generic prompts that encourage students to articulate and elaborate their own lines of reasoning, and to challenge and extend the reasoning of their teammates. This paper integrates findings from a series of studies across content domains (biology, chemistry, engineering design), grade levels (high school, undergraduate), and facilitation strategies. APT-based strategies are contrasted with simply offering positive feedback when the students themselves employ APT facilitation moves in their interactions with one another, an intervention we term Positive Feedback for APT engagement. The pattern of results demonstrates that APT-based support for collaborative learning can significantly increase learning, but that the effect of specific APT facilitation strategies is context specific. It appears the effectiveness of each strategy depends upon factors such as the difficulty of the material (in terms of being new conceptual material versus review) and the skill level of the learner (urban public high school vs. selective private university). In contrast, Positive Feedback for APT engagement does not positively impact learning. In addition to an analysis based on learning gains, an automated conversation analysis technique is presented that effectively predicts which strategies are successfully operating in specific contexts. Implications for the design of more agile forms of dynamic support for collaborative learning are discussed.
Article
Sixteen adult volunteers provided thinking-aloud protocols while undergoing a 10-hr individually administered course in BASIC (beginner's all-purpose symbolic instruction code) programming. Three levels of goals were identified as operative in the learning situation: task-completion goals, instructional goals, and personal knowledge-building goals. Although protocol statements indicating knowledge-building goals were infrequent, students exhibiting a relatively high proportion of them were distinctive in several ways. They did significantly better on a posttest. Their performance in goal cue selections differed from that of other participants in ways consistent with their orientation: They responded more often to learning goal cues than to task goal cues. They actively related new learning to prior knowledge and they posed and tried to solve problems and questions. Students oriented toward instructional goals tended to focus on what was explicitly taught. Students oriented toward task-completion goals tended to equate learning with successful completion of assigned tasks. Level of goal orientation and posttest performance were unrelated to level of education and prior computer experience but were positively related to previous experience of independent learning.
Article
Despite the hype and speculation about the role massively open online courses (MOOCs) may play in higher education, empirical research that explores the realities of interacting and learning in MOOCs is in its infancy. MOOCs have evolved from previous incarnations of online learning but are distinguished in their global reach and semi-synchronicity. Thus, it is important to understand the ways that learners from around the world interact in these settings. In this paper, we ask three questions: 1) What are the demographic characteristics of students that participate in MOOC discussion forums? 2) What are the discussion patterns that characterize their interactions? And 3) How does participation in discussion forums relate to students’ final scores? Analysis of nearly 87,000 individuals from one MOOC reveals three key trends. First, forum participants tend to be young adults from the western world. Secondly, these participants assemble and disperse as crowds, not communities, of learners. Finally, those that engage explicitly in the discussion forums are often higher-performing than those that do not, although the vast majority of forum participants receive “failing” marks. These findings have implications for the design and implementation of future MOOCs, and how they are conceptualised as part of higher education.
Conference Paper
This paper describes two diagnostic tools for predicting which students are at risk of dropping out of an online class. While thousands of students have been attracted to large online classes, keeping them motivated has been challenging. Experiments on a large online HCI class suggest that the tools this paper introduces can help identify students who will not complete assignments, with F1 scores of 0.46 and 0.73 three days before the assignment due date.
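As a reminder of what the quoted F1 scores summarise, the short sketch below computes precision, recall, and F1 for a hypothetical at-risk classifier; the labels are invented.

```python
# Sketch: F1 balances precision and recall of "at risk" predictions.
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # 1 = student missed the assignment
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # predictions three days before the deadline

print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1:       ", f1_score(y_true, y_pred))
```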