Linguistic Reflections of Student Engagement in Massive Open Online Courses
Miaomiao Wen, Diyi Yang and Carolyn Penstein Rosé
Language Technology Institute
Carnegie Mellon University
Pittsburgh, PA 15213
{mwen,diyiy,cprose}@cs.cmu.edu
Abstract

While data from Massive Open Online Courses (MOOCs) offers the potential to gain new insights into the ways in which online communities can contribute to student learning, much of the richness of the data trace is yet to be mined. In particular, very little work has attempted fine-grained content analyses of the student interactions in MOOCs. Survey research indicates the importance of student goals and intentions in keeping them involved in a MOOC over time. Automated fine-grained content analyses offer the potential to detect and monitor evidence of student engagement and how it relates to other aspects of their behavior; ultimately these indicators reflect their commitment to remaining in the course. As a methodological contribution, in this paper we investigate using computational linguistic models to measure learner motivation and cognitive engagement from the text of forum posts. We validate our techniques using survival models that evaluate the predictive validity of these variables in connection with attrition over time. We conduct this evaluation in three MOOCs focusing on very different types of learning materials. Prior work demonstrates that any participation in the discussion forums is a strong indicator of student commitment. Our methodology allows us to differentiate better among these students, and to identify danger signs that a struggling student is in need of support within a population whose interaction with the course offers the opportunity for effective support to be administered. Theoretical and practical implications are discussed.
1 Introduction

The recent development of Massive Open Online Course (MOOC) websites such as Coursera¹, edX² and Udacity³ demonstrates the potential of distance learning and lifelong learning to reach the masses. However, one disappointment has been that only one in every 20 students who enroll in such courses actually finishes (Koller et al. 2013). In order to understand the attrition problem and work towards solutions, especially given the varied backgrounds and motivations of students who choose to enroll in a MOOC (DeBoer et al. 2013), we need to highlight and understand the value sought and obtained by the participants of MOOCs, including that reflected in their discussion posts, especially from the "non-completing" population (Koller et al. 2013).

Copyright © 2014, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
¹ https://www.coursera.org/
² https://www.edx.org/
³ https://www.udacity.com/
In this paper, we propose to gauge a student's engagement using linguistic analysis applied to the student's forum posts within the MOOC course. Based on the learning sciences literature, we quantify a student's level of engagement in a MOOC from two different angles: (1) the displayed level of motivation to continue with the course and (2) the level of cognitive engagement with the learning material. Student motivation to continue is important: without it, a student cannot regulate his or her effort to move forward productively in the course. At the same time, for learning it is necessary for the student to process the course content in a meaningful way. In other words, cognitive engagement is critical. Ultimately it is this grappling with the course content over time that will be the vehicle through which the student achieves the desired learning outcomes.
Conversation in the course forum is replete with terms that imply learner motivation. These terms may include those suggested by the literature on learner motivation or simply those from everyday language, for example "I tried very hard to follow the course schedule" and "I couldn't even finish the second lecture." In this paper, we attempt to automatically measure learner motivation based on such markers found in posts on the course discussion forums. Our analysis offers new insights into the relation between language use and underlying learner motivation in a MOOC context.
Besides student motivational state, the level of cognitive engagement is another important aspect of student participation (Carini et al. 2006). Consider, for example, "This week's video lecture is interesting, the boy in the middle seemed tired, yawning and so on." and "The video shows a classroom culture where the kids clearly understand the rules of conversation and acknowledge each others contribution." These two posts comment on the same video lecture, but the first post is more descriptive at a surface level while the second one is more interpretive, and displays more reflection. We measure this difference in cognitive engagement with an estimated level of language abstraction. We find that users whose posts show a higher level of cognitive engagement are more likely to continue participating in the forum discussion.
The distant nature and the sheer size of MOOCs require new approaches for providing student feedback and guiding instructor intervention (Ramesh et al. 2013). One big challenge is that MOOCs are far from uniform. In this paper, we test the generality of our measures in three Coursera MOOCs focusing on distinct subjects. We demonstrate that our measures of engagement are consistently predictive of student dropout from the course forum across the three courses. With this validation, our hope is that in the long run, our automatic engagement measures can help instructors target their attention to those who show serious intention of finishing the course, but nevertheless struggle due to dips in learner motivation. Our linguistic analysis provides further indicators that some students are going through the motions in a course but may need support in order to fully engage with the material. Again, such monitoring might aid instructors in using their limited human resources to the best advantage.
In the remainder of the paper we begin by describing our dataset and discussing related work. Next, we explain how we automatically measure student engagement from a user's forum posts from the two perspectives highlighted above. We then continue with a survival analysis that estimates the influence of our two measures of engagement on MOOC dropout rate. Finally, we conclude with a summary and possible future work.
2 Coursera dataset

In preparation for a partnership with an instructor team for a Coursera MOOC that was launched in Fall 2013, we were given permission by Coursera to crawl and study a small number of other courses. Our dataset consists of three courses: one social science course, "Accountable Talk: Conversation that works"⁴, offered in October 2013, with 1,146 active users (those who post at least one message in a course forum) and 5,107 forum posts; one literature course, "Fantasy and Science Fiction: the human mind, our modern world"⁵, offered in June 2013, with 771 active users who posted 6,520 posts in the course forum; and one programming course, "Learn to Program: The Fundamentals"⁶, offered in August 2013, with 3,590 active users and 24,963 forum posts. All three courses are officially seven weeks long. Each course has seven week-specific subforums and a separate general subforum for more general discussion about the course. Our analysis is limited to behavior within the discussion forums.

⁴ https://www.coursera.org/course/accountabletalk
⁵ https://www.coursera.org/course/fantasysf
⁶ https://www.coursera.org/course/programming1
3 Related Work

3.1 Learner Motivation

Most of the recent research on learner motivation in MOOCs is based on surveys and relatively small samples of hand-coded user-stated goals or reasons for dropout (e.g. Cheng et al. 2013; Christensen et al. 2013; DeBoer et al. 2013; Poellhuber et al. 2013). Poellhuber et al. (2013) find that user goals specified in the pre-course survey were the strongest predictors of later learning behaviors. Motivation is identified as an important determinant of engagement in MOOCs in the Milligan et al. (2013) study. However, different courses design different enrollment motivation questionnaire items, which makes it difficult to generalize conclusions from course to course. Another drawback is that learner motivation is volatile; in particular, distance learners can lose interest very fast even if they had been progressing well in the past (Keller & Suzuki, 2004). It is therefore important to monitor learner motivation and how it varies across the course weeks. We propose to automatically measure learner motivation based on linguistic cues in the forum posts.
3.2 Cognitive engagement

Research has consistently found that the cognitive processes involved in higher-order thinking lead to better knowledge acquisition (e.g. Chi & Bassock, 1989; Graham & Golan, 1991; Chi, 2000). Previous work has investigated students' cognitive engagement in both face-to-face (Corno & Mandinach, 1983; Newman et al. 1996) and computer-mediated communication (CMC) environments (Garrison et al. 1999; Zhu 2006). In this paper, we measure the cognitive engagement of a MOOC user based on how much personal interpretation is contained in the posts.
3.3 MOOC Analysis

The MOOC literature so far has focused on a summative view of user participation and dropout, trying to assess the rate at which different groups of users complete the course (e.g. Kizilcec et al. 2013; Brinton et al. 2013; Ramesh et al. 2013). There are mainly three types of information available about the participation patterns of MOOC users: survey information, clickstream behavioral data, and forum posts. From the survey information, we can partly understand the initial motivation at the time of enrollment for the subset of users who filled in the survey. Unfortunately, even for those users, this information does not help us understand the dynamics of motivational change. From the clickstream data of a user's online activities, we can see whether people are working hard or not, but we cannot tell why their level of activity changes from time to time. The discussion forums provide students with social learning opportunities while at the same time providing a portal into their minds. The activity level of a course's online forum negatively correlates with the volume of students who drop out of the course (Brinton et al. 2013). In our work, we utilize linguistic features in the forum posts that correlate with dropout to gain insights into the user experience that would be invisible from the survey information or clickstream data.

Despite the rich recent work on MOOC user dropout analysis, very little of it has attempted finer-grained content analysis of the course discussion forums. One exception is Ramesh et al. (2013), which uses sentiment and subjectivity of user posts to predict engagement/disengagement. However, neither sentiment nor subjectivity ended up being predictive of engagement in that work. One explanation is that engaged learners also post content with negative sentiment about the course, such as complaints about peer grading. Thus, the problem is more complex than the operationalization used in that work.
In our work, we use survival models to understand how attrition happens over time as students participate in a course. This approach has been applied to online medical support communities to quantify the impact of the receipt of emotional and informational support on user commitment to the community (Wang et al. 2012). Yang et al. (2013) and Rose et al. (2014) have also used survival models to measure the influence of social positioning factors on dropout from a MOOC. Our research contributes to this body of work.
4 Methods

4.1 Predicting Learner Motivation

The level of a student's motivation strongly influences the intensity of the student's participation in the course. Previous research has shown that it is possible to categorize learner motivation based on a student's description of planned learning actions (Ng & Bereiter, 1991; Dowson & McInerney, 2003). The identified motivation categorization has a substantial relationship to both learning behavior and learning outcomes. But the lab-based experimental techniques used in this prior work are impractical for the ever-growing size of MOOCs, and given the high student-to-instructor ratio it is difficult for instructors to personally identify students who lack motivation. To overcome these challenges, we build machine learning models to automatically identify the level of learner motivation based on posts to the course forum. We validate our measure in a domain-general way by not only testing on data from the same course, but also by training on one course and then testing on the other, in order to uncover course-independent motivation cues. The linguistic features that are predictive of learner motivation provide insights into what motivates the learners.
4.1.1 Creating the Human-Coded Dataset: MTurk

We used Amazon's Mechanical Turk (MTurk) to make it practical to construct a reliable annotated corpus for developing our automated measure of student motivation. Amazon's Mechanical Turk is an online marketplace for crowdsourcing: it allows requesters to post jobs and workers to choose jobs they would like to complete. Jobs are defined and paid in units of so-called Human Intelligence Tasks (HITs). Snow et al. (2008) have shown that the combined judgments of a small number (about five) of naive annotators on MTurk lead to ratings of texts that are very similar to those of experts. This applies to content such as the emotions expressed, the relative timing of events referred to in the text, word similarity, word sense disambiguation, and linguistic entailment or implication. As we show below, MTurk workers' judgments of learner motivation are also similar to those of coders who are familiar with the course content.
We randomly sampled 514 posts from the Accountable Talk course forums and 534 posts from the Fantasy and Science Fiction course forums. The non-English posts were manually filtered out. In order to construct a hand-coded dataset for later training machine learning models, we employed MTurk workers to rate the level of learner motivation towards the course expressed by the author of each post. We provided them with explicit definitions to use in making their judgment. For each post, the annotator indicated how motivated she perceived the post author to be towards the course on a 1-7 Likert scale ranging from "Extremely unmotivated" to "Extremely motivated". Each post was labeled by six different annotators. We paid $0.06 for rating each post. To encourage workers to take the numeric rating task seriously, we also asked them to highlight words and phrases in the post that provided evidence for their ratings. To further control annotation quality, we required that all workers have a United States location and have 98% or more of their previous submissions accepted. We monitored the annotation job and manually filtered out annotators who submitted uniform or seemingly random annotations.

We define the motivation score of a post as the average of the six scores assigned by the annotators. The distributions of the resulting motivation scores are shown in Figure 1. The following two examples from our final hand-coded dataset for the Accountable Talk class illustrate the scale: one shows high motivation, and the other demonstrates low motivation. The example posts shown in this paper are lightly disguised and shortened to protect user privacy.
Learner Motivation = 7.0 (Extremely motivated)
Referring to the last video on IRE impacts in our learning environments, I have to confess that I have been a victim of IRE and I can recall the silence followed by an exact and final received from a bright student.... Many ESL classes are like the cemetery of optional responses let alone engineering discussions. The Singing Man class is like a dream for many ESL teachers or even students if they have a chance to see the video! ...Lets practice this in our classrooms to share the feedbacks later!

Learner Motivation = 1.0 (Extremely unmotivated)
I have taken several coursera courses, and while I am willing to give every course a chance, I was underwhelmed by the presentation. I would strongly suggest you looking at other courses and ramping up the lectures. I'm sure the content is worthy, I am just not motivated to endure a bland presentation to get to it. All the best, XX.
4.1.2 Inter-annotator Agreement

To evaluate the reliability of the annotations, we calculate the intra-class correlation coefficient for the motivation annotation. Intra-class correlation (Koch, 1982) is appropriate for assessing the consistency of quantitative measurements when not all objects are rated by the same judges. The intra-class correlation coefficient for learner motivation is 0.74 for the Accountable Talk class and 0.72 for the Fantasy and Science Fiction course.

To assess the validity of the ratings, we also had the workers code 30 Accountable Talk forum posts which had previously been coded by experts. The correlation of the MTurkers' average ratings with the experts' average ratings for level of learner motivation was moderate (r = .74).
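For readers who wish to reproduce this reliability check, the intra-class correlation can be computed with standard statistical packages. The sketch below uses the pingouin library on toy long-format data (one row per post/rater pair); the library choice and all names are illustrative, not the authors' tooling.

```python
# A minimal sketch of the reliability check, assuming ratings in long
# format; pingouin is an illustrative choice, not the authors' tooling.
import pandas as pd
import pingouin as pg

ratings = pd.DataFrame({
    "post_id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "rater":   ["a", "b", "c", "a", "b", "c", "a", "b", "c"],
    "score":   [6, 7, 6, 2, 3, 2, 5, 5, 4],
})

icc = pg.intraclass_corr(data=ratings, targets="post_id",
                         raters="rater", ratings="score")
# The ICC1 row treats raters as random per target, matching a design
# where not all posts are rated by the same judges
print(icc[["Type", "ICC"]])
```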
[Figure 1: Annotated motivation score distributions (x-axis: motivation score; y-axis: number of posts) for the Accountable Talk course and the Fantasy and Science Fiction course.]

We acknowledge that the perception of motivation is highly subjective and that annotators may have inconsistent scales. In an attempt to mitigate this risk, instead of using the raw motivation scores from MTurk, for each course we break the set of annotated posts into two balanced groups based on the motivation scores: "motivated" posts and "unmotivated" posts.
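The aggregation and median split just described amount to a few lines of data wrangling; a minimal sketch follows, with hypothetical file and column names.

```python
# Average the six MTurk ratings per post, then split each course's posts
# into two balanced groups at the median; names are assumptions.
import pandas as pd

ratings = pd.read_csv("mturk_motivation_ratings.csv")  # post_id, rating (1-7)

# motivation score of a post = mean of its annotator ratings
post_scores = ratings.groupby("post_id")["rating"].mean()

# binarize at the median into "motivated" vs. "unmotivated"
median = post_scores.median()
labels = post_scores.gt(median).map({True: "motivated", False: "unmotivated"})
print(labels.value_counts())
```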
4.1.3 Linguistic Markers of Learner Motivation

In this section, we work to find domain-independent motivation cues so that a machine learning model can capture motivation expressed in posts reliably across different MOOCs. Building on the literature on learner motivation, we design five linguistic features, described below. The features are binary indicators of whether certain words appear in the post or not. Table 1 describes the distribution of the motivational markers in our annotated Accountable Talk dataset. We do not include the Fantasy and Science Fiction dataset in this analysis because it will serve as the test-domain dataset for our prediction task in the next section.
Apply words (Table 1, line 1): previous research on e-learning has found that motivation to learn can be expressed as the attention and effort required to complete a learning task and then apply the new material to the work site or to life (Esque & McCausland, 1997). Actively relating learning to potential application is a sign of a motivated learner (Moshinskie, 2001). We therefore hypothesize that words indicating application of new knowledge can be cues of learner motivation. The Apply lexicon we use consists of words that are synonyms of "apply" or "use": "apply", "try", "utilize", "employ", "practice", "use", "help", "exploit" and "implement".

Need words (Table 1, line 2) express the participant's needs, plans and goals: "hope", "want", "need", "will", "would like", "plan", "aim" and "goal". Previous research has shown that learners can be encouraged to identify and articulate clear aims and goals for the course to increase motivation (Locke & Latham, 2002; Milligan et al. 2013).

LIWC-cognitive words (Table 1, line 3): the cognitive mechanism dictionary in LIWC (Pennebaker & King, 1999) includes terms such as "thinking", "realized", "understand", "insight" and "comprehend".

First person pronouns (Table 1, line 4): using more first person pronouns may indicate that the user can relate the discussion to the self effectively.

Positive words (Table 1, line 5) from the sentiment lexicon of Liu et al. (2005) are also indicators of learner motivation. Learners with positive attitudes have been demonstrated to be more motivated in e-learning settings (Moshinskie, 2001). Note that negative words are not necessarily indicative of unmotivated posts, because an engaged learner may also post negative comments; this has also been reported in earlier work by Ramesh et al. (2013).

The features we use here are mostly indicators of high user motivation. Features indicative of low user motivation do not appear as frequently as the literature led us to expect. This may be partly because students who post in the forum have higher learner motivation in general.
Feature            In Motivated post set   In Unmotivated post set
Apply**            57%                     42%
Need**             54%                     37%
LIWC-cognitive**   56%                     38%
1st Person***      98%                     86%
Positive***        91%                     77%

Table 1: Features for predicting learner motivation. A binomial test is used to measure the difference in feature distribution between the motivated and unmotivated post sets (**: p < 0.01, ***: p < 0.001).
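To make the feature definitions concrete, the sketch below computes the five binary indicators for a single post. The Apply and Need lists are taken from the text above; the LIWC-cognitive, first-person, and positive lexicons shown are small stand-in subsets of the licensed and published resources the authors used, and the tokenization is an assumption.

```python
import re

# Apply and Need lists from the paper; the other three lexicons are
# illustrative subsets standing in for LIWC and the Liu et al. lexicon
APPLY = {"apply", "try", "utilize", "employ", "practice", "use", "help",
         "exploit", "implement"}
NEED = {"hope", "want", "need", "will", "plan", "aim", "goal"}  # plus "would like"
COGNITIVE = {"thinking", "realized", "understand", "insight", "comprehend"}
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our"}
POSITIVE = {"great", "interesting", "enjoy", "excellent", "inspiring"}

def motivation_features(post: str) -> dict:
    text = post.lower()
    tokens = set(re.findall(r"[a-z']+", text))
    return {
        "apply": int(bool(tokens & APPLY)),
        # "would like" is a two-word phrase, so check the raw text for it
        "need": int(bool(tokens & NEED) or "would like" in text),
        "liwc_cognitive": int(bool(tokens & COGNITIVE)),
        "first_person": int(bool(tokens & FIRST_PERSON)),
        "positive": int(bool(tokens & POSITIVE)),
    }

print(motivation_features("I hope to apply these ideas in my classroom."))
```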
4.1.4 Experimental Setup

To evaluate the robustness and domain-independence of the analysis from the previous section, we set up our motivation prediction experiments on the two courses. We treat Accountable Talk as a "development domain" since we use it for developing and identifying linguistic features. Fantasy and Science Fiction is thus our "test domain" since it was not used for identifying the features. We classify each post as "motivated" or "unmotivated". The amount of data from the two courses is balanced within each category: each category contains 257 posts from the Accountable Talk course and 267 posts from the Fantasy and Science Fiction course.
We compare three feature sets: a unigram feature representation as a baseline, a linguistic classifier (Ling.) using only the linguistic features described in the previous section, and a combined feature set (Unigram+Ling.). We use logistic regression for our binary classification task, employing liblinear (Fan et al. 2008) in Weka (Witten & Frank, 2005) to build the linear models. In order to prevent overfitting we use Ridge (L2) regularization.

                 In-domain                    Cross-domain
Train            Accountable   Fantasy        Accountable   Fantasy
Test             Accountable   Fantasy        Fantasy       Accountable
Unigram          71.1%         64.0%          61.0%         61.3%
Ling.            65.2%         60.1%          61.4%         60.8%
Unigram+Ling.    72.3%         66.7%          63.3%         63.7%

Table 2: Accuracies of our three classifiers for the Accountable Talk course (Accountable) and the Fantasy and Science Fiction course (Fantasy), in in-domain and cross-domain settings. The random baseline performance is 50%.
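A rough present-day analogue of this setup can be written with scikit-learn, whose LogisticRegression wraps the same LIBLINEAR solver with L2 regularization; the paper itself used LIBLINEAR inside Weka, so this is a sketch, not the authors' code.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# toy stand-ins for annotated posts and motivated(1)/unmotivated(0) labels
posts = ["I tried very hard to follow the course schedule",
         "I couldn't even finish the second lecture"]
labels = [1, 0]

# unigram baseline; the Unigram+Ling. model would append the five binary
# lexicon features to this representation
model = make_pipeline(
    CountVectorizer(binary=True),
    LogisticRegression(penalty="l2", solver="liblinear"),
)
model.fit(posts, labels)

# in-domain: 10-fold CV on one course, e.g.
#   cross_val_score(model, acc_posts, acc_labels, cv=10)
# cross-domain: train on one course, test on the other, e.g.
#   model.fit(acc_posts, acc_labels); model.score(fan_posts, fan_labels)
```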
4.1.5 Motivation Prediction

We now show how our feature-based analysis can be used in a machine learning model for automatically classifying forum posts according to learner motivation. To ensure that we capture course-independent learner motivation markers, we evaluate the classifiers both in an in-domain setting, with 10-fold cross-validation, and in a cross-domain setting, where we train on one course's data and test on the other (Table 2). For both our development (Accountable Talk) and our test (Fantasy and Science Fiction) domains, and in both the in-domain and cross-domain settings, the linguistic features give a 1-3% absolute improvement over the unigram model.

The experiments in this section confirm that our theory-inspired features are indeed effective in practice, and generalize well to new domains. The bag-of-words model is hard to apply across courses because the content of the courses differs. For example, many motivational posts in the Accountable Talk course discuss teaching strategies, so words such as "student" and "classroom" have high feature weights in the model. This is not necessarily true for other courses whose content has nothing to do with teaching.

In this section, we examined learner motivation where it can be perceived by a human. However, it is naive to assume that every forum post of a user can be regarded as a motivational statement; many posts do not contain markers of learner motivation. In the next section, we measure the cognitive engagement level of a student based on her posts, which may be detectable more broadly.
4.2 Level of Cognitive Engagement

Level of cognitive engagement captures the attention and effort in interpreting, analyzing and reasoning about the course material that is visible in discussion posts (Stoney & Oliver, 1999). Previous work uses manual content analysis to examine students' cognitive engagement in computer-mediated communication (CMC) (Fahy et al. 2001; Zhu 2006). In the MOOC forums, some posts are more descriptive of a particular scenario, while others reflect more higher-order thinking, such as deeper interpretations of the course material. Whether a post is more descriptive or more interpretive may reflect the post author's level of cognitive engagement. Recent work shows that the level of language abstraction reflects the level of cognitive inference (Beukeboom, 2014). In this section, we measure the level of cognitive engagement of a MOOC user via the level of language abstraction of her forum posts.
4.2.1 Measuring Level of Language Abstraction

Concrete words refer to things, events, and properties that we can perceive directly with our senses, such as "trees", "walking", and "red". Abstract words refer to ideas and concepts that are distant from immediate perception, such as "sense", "analysis", and "disputable" (Turney et al. 2011).

Previous work measures the level of language abstraction with Linguistic Inquiry and Word Count (LIWC) word categories (Gill & Oberlander, 2002; Pennebaker & King, 1999; Yarkoni, 2010; Beukeboom, 2013). For broader word coverage, we use the automatically generated abstractness dictionary from Turney et al. (2011), which is publicly available. This dictionary contains 114,501 words, each automatically assigned a numerical rating of its degree of abstractness on a scale from 0 (highly concrete) to 1 (highly abstract), based on feature vectors generated from the contexts in which the word has been found.

The mean level of abstraction was computed for each post by adding the abstractness scores of the words in the post and dividing by the total number of words. The following are two example posts from the Accountable Talk course Week 2 subforum, one with a high level of abstraction and one with a low level of abstraction. Based on the abstraction dictionary, abstract words are in italic and concrete words are underlined.
Level of abstraction = 0.85 (top 10%)
I agree. Probably what you just have to keep in mind is that you are there to help them learn by giving them opportunities to REASON out. In that case, you will not just accept the student's answer but let them explain how they arrived towards it. Keep in mind to appreciate and challenge their answers.
Level of abstraction = 0.13 (bottom 10%)
I teach science to gifted middle school students. The students learned to have conversations with me as a class and with the expert who wrote Chapter 8 of a text published in 2000. They are trying to design erosion control features for the building of a basketball court at the bottom of a hill in rainy Oregon.
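A minimal sketch of this post-level score follows, assuming the Turney et al. (2011) ratings have been loaded into a dict; the toy values and the treatment of out-of-dictionary words (counted as zero here) are assumptions about details the paper leaves open.

```python
import re

# toy excerpt of an abstractness dictionary: 0 = concrete, 1 = abstract
abstractness = {"agree": 0.74, "mind": 0.81, "reason": 0.85,
                "teach": 0.42, "school": 0.31, "basketball": 0.08}

def mean_abstraction(post: str) -> float:
    words = re.findall(r"[a-z']+", post.lower())
    if not words:
        return 0.0
    # sum each word's abstractness and divide by the total word count;
    # words missing from the dictionary contribute 0, which is one
    # possible reading of the description above
    return sum(abstractness.get(w, 0.0) for w in words) / len(words)

print(mean_abstraction("I agree, keep their reason in mind."))
```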
We believe that the level of language abstraction reflects the understanding that goes into using abstract words when creating a post. In the Learn to Program course forums, many discussion threads focus on solving actual programming problems, which is very different from the other two courses, where more subjective reflections on the course contents are shared. A higher level of language abstraction reflects understanding of a broader problem, while more concrete words are used when describing a particular bug a student encounters. Below are two examples.
Level of abstraction = 0.65 (top 10%)
I have the same problems here. Make sure that your variable names match exactly. Remember that under-bars connect words together. I know something to do with the read_board(board_file) function, but still need someone to explain more clearly.
Level of abstraction = 0.30 (bottom 10%)
>>> print('python', 'is')
('python', 'is')
>>> print('like', 'the', 'instructors', 'python')
It leaves the 'quotes' and commas, when the instructor does the same type of print in the example she gets not parenthesis, quotes, or commas. Does anyone know why?
5 Validation Experiments

We use survival analysis to validate that participants with a higher measured level of engagement stay active in the forums longer, controlling for other forum behaviors such as how many posts the user contributes. We apply the linguistic measures described in Section 4 to quantify student engagement. We use the in-domain learner motivation classifiers with both linguistic and unigram features (Section 4.1.5) for the Accountable Talk class and the Fantasy and Science Fiction class, and the classifier trained on the Accountable Talk dataset to assign motivated/unmotivated labels to the posts in the Learn to Program course.
5.1 Survival Model Design

Survival models can be regarded as a type of regression model that captures influences on time-related outcomes, such as whether or when an event occurs. In our case, we investigate our engagement measures' influence on when a course participant drops out of the course forum. More specifically, our goal is to understand whether our automatic measures of student engagement can predict a student's length of participation in the course forum. Survival analysis is known to provide less biased estimates than simpler techniques (e.g., standard least squares linear regression) that do not take into account the potentially truncated nature of time-to-event data (e.g., users who had not yet left the community at the time of the analysis but might at some point subsequently). From a more technical perspective, a survival model is a form of proportional odds logistic regression, where a prediction about the likelihood of a failure occurring is made at each time point based on the presence of some set of predictors. The estimated weights on the predictors are referred to as hazard ratios. The hazard ratio of a predictor indicates how the relative likelihood of the failure occurring increases or decreases with an increase or decrease in the associated predictor. We use the statistical software package Stata (Stata, 2001) and assume a Weibull distribution of survival times, which is generally appropriate for modeling survival.

For each of our three courses, we include all the active students, i.e. those who contributed one or more posts to the course forums. We define the time intervals as student participation weeks. We consider the timestamp of the first post by each student as the starting date of that student's participation in the course discussion forums, and the date of the last post as the end of participation unless it falls in the last course week.
Dependent Variable:
Dropout: the dependent measure, which is 1 on a student's last week of active participation unless it is the last course week (i.e. the seventh course week), and 0 on other weeks.

Control Variables:
Cohort1: a binary indicator of whether a user ever posted in the first course week (1) or not (0). Members who join the course in earlier weeks are more likely than others to continue participating in the discussion forums (Yang et al. 2013).
PostCountByUser: the number of messages a member posts in the forums in a week; a basic effort-based measure of a student's engagement.
CommentCount: the number of comments a user's posts receive in the forums in a week. Because this variable is highly correlated with PostCountByUser (r > .70 for all three courses), we include only PostCountByUser in the final models in order to avoid multicollinearity problems.

Independent Variables:
AvgMotivation: the percentage of an individual's posts in that week that are predicted as "motivated" by our model with both unigram and linguistic features (Section 4.1.4).
AvgCogEngagement: the average abstractness score per post each week.

We note that AvgMotivation and AvgCogEngagement are not correlated with PostCountByUser (r < .20 for all three courses), so they are orthogonal to the simpler measure of student engagement. AvgMotivation is also not correlated with AvgCogEngagement (r < .10 for all three courses). Thus, it is acceptable to include these variables together in the same model.
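An illustrative re-implementation of this model can be sketched with the lifelines library's Cox proportional hazards fitter in place of the parametric Weibull model fit in Stata; for brevity it collapses the weekly panel to one row per student, and the file and column names are assumptions rather than the authors' data layout.

```python
import pandas as pd
from lifelines import CoxPHFitter

# columns: weeks (length of forum participation), dropout (1 unless the
# last active week is the final course week), Cohort1, PostCountByUser,
# AvgMotivation, AvgCogEngagement (continuous predictors standardized)
df = pd.read_csv("forum_survival.csv")  # hypothetical file

cph = CoxPHFitter()
cph.fit(df, duration_col="weeks", event_col="dropout")
cph.print_summary()        # the exp(coef) column gives the hazard ratios
print(cph.hazard_ratios_)  # HR < 1 means a lower dropout hazard
```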
5.2 Survival Model Results

Table 3 reports the estimates from the survival models for the control and independent variables entered into the survival regression. Effects are reported in terms of the hazard ratio (HR), which is the effect of an explanatory variable on the risk, or probability, of a participant dropping out of the course forum. Because all the explanatory variables except Cohort1 have been standardized, the hazard ratio here is the predicted change in the probability of dropout from the course forum for a unit increase in the predictor variable (i.e., Cohort1 changing from 0 to 1, or a continuous variable increasing by a standard deviation, with all other variables at their mean levels).
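As a worked example of this conversion: for such a unit increase, the percentage change in the dropout hazard is (1 − HR) × 100%, so a hazard ratio of 0.68 corresponds to a (1 − 0.68) × 100% = 32% lower hazard.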
                     Accountable Talk      Fantasy and Science Fiction   Learn to Program
Variable             HR       Std. Err.    HR       Std. Err.            HR       Std. Err.
Cohort1              .68***   .05          .82*     .08                  .81*     .04
PostCountByUser      .86***   .02          .90***   .02                  .76***   .04
AvgMotivation        .58*     .13          .82*     .08                  .84***   .04
AvgCogEngagement     .94*     .02          .92**    .03                  .53**    .13

Table 3: Results of the survival analysis (*: p < 0.05, **: p < 0.01, ***: p < 0.001).

Our variables show similar effects across the three courses (Table 3). Here we explain the results for the Accountable Talk course. The hazard ratio for Cohort1 means that the dropout hazard is 32%⁷ lower for members who posted in the first course week. Similarly, the hazard ratio for PostCountByUser indicates that the dropout hazard is 14%⁸ lower for those who posted a standard deviation more posts than average.
Controlling for when participants started to post in the forum and the number of posts published each week, both learner motivation and average level of abstraction significantly influenced the dropout rates, in the same direction. Those whose posts expressed an average of one standard deviation more learner motivation (AvgMotivation) have a 42%⁹ lower hazard of dropping out of the course forum. Those whose posts have an average of one standard deviation higher cognitive engagement level (AvgCogEngagement) have a 6%¹⁰ lower hazard. AvgMotivation is relatively more predictive of user dropout than AvgCogEngagement for the Accountable Talk course and the Fantasy and Science Fiction course, while AvgCogEngagement is more predictive of user dropout in the Learn to Program course. This may be because the Learn to Program course discussions focus more on technical problems, so fewer posts contain motivation markers.
⁷ 32% = 100% − (100% × 0.68)
⁸ 14% = 100% − (100% × 0.86)
⁹ 42% = 100% − (100% × 0.58)
¹⁰ 6% = 100% − (100% × 0.94)

[Figure 2: Survival curves for students with different levels of engagement in the Accountable Talk course.]
[Figure 3: Survival curves for students with different levels of engagement in the Fantasy and Science Fiction course.]
[Figure 4: Survival curves for students with different levels of engagement in the Learn to Program course.]

Figures 2-4 illustrate these results graphically, showing three survival curves for each course. The solid curve shows survival with the number of posts, motivation, and cognitive engagement at their mean levels. The top curve shows survival when the number of posts is at its mean level and the average learner motivation and level of cognitive engagement in the posts are both one standard deviation above the mean; the bottom curve shows survival when the number of posts is at its mean and both are one standard deviation below the mean.
5.3 Implications

In contrast to regular courses, where students engage with class materials in a structured and monitored way and instructors directly observe student behavior and provide feedback, in MOOCs it is important to target instructors' limited attention to the students who need it most (Ramesh et al. 2013). The automated linguistic models designed in this paper can help monitor MOOC user engagement from forum posts. By identifying students who are likely to end up not completing the class before it is too late, we can perform targeted interventions (e.g., sending encouraging emails, posting reminders, allocating limited tutoring resources) to try to improve these students' engagement. For example, our motivation prediction model could be used to better target instructors' limited attention to users who are motivated in general but are experiencing a temporary lack of motivation that might threaten their continued participation, in particular those who have shown serious intention of finishing the course by joining the discussion forums. One possible intervention based on this type of analysis might suggest that instructors reply to students whose recent motivation level is lower than it has been in the past; this may help students get past a difficult part of the course. We could also recommend highly motivated posts as reading for other users, which may serve as inspiration.

Based on the predictive engagement markers, we see that it is important for students to be able to apply new knowledge and to engage in deeper thinking. Discussion facilitation can influence levels of cognitive engagement (Corno and Mandinach, 1983). The instructor can encourage learners to reflect on what they learned and how that learning addressed their needs. Work on automated facilitation from the Computer-Supported Collaborative Learning (CSCL) literature might be adaptable to the MOOC context to make this feasible (Adamson et al. in press).
6 Conclusion

We present a study of how to measure MOOC student engagement based on linguistic analysis of forum posts. We identify two new measures that quantify engagement and validate them on three Coursera courses with diverse content. We automatically identify the extent to which posts in course forums express learner motivation and cognitive engagement. The survival analysis results validate that the more motivation a learner expresses, the lower the risk of dropout. Similarly, the more personal interpretation a participant shows in her posts, the lower the rate of student dropout from the course forums.
6.1 Limitations and future work

An important limitation of this study is that, even though activity in a course's online forum closely correlates with a student's dropout from the course, exactly when (and why) students drop out of a course entirely is not publicly accessible information. Gillani (2013) shows that those who engage explicitly in the discussion forums are often higher-performing than their counterparts in the course. For the "invisible" users who never interact with other learners or staff on the discussion forums, we can only rely on clickstream data to understand their behavior. Another limitation is that, even though we use longitudinal data, our findings are correlational. Student motivation is generally categorized as intrinsic or extrinsic in previous work (Ryan and Deci 2000). In our work, we did not distinguish between the two motivation types, because in MOOC forums we observe only a limited number of posts that demonstrate extrinsic motivation (e.g. taking the course for high grades or for the certificate). In future work, with annotated motivation types, it will be interesting to study how students with different observed types of motivation behave differently in MOOCs. We also hope to utilize social interactions in forums, such as who-talks-to-whom information, to better understand social learning in MOOCs (Sun et al. 2011). Finally, we plan to design and test a targeted intervention making use of the predicted engagement level, which will allow us to measure the practical impact of our findings as well as to evaluate whether the correlational evidence we present here holds up to an experimental test of causality.
Acknowledgments

We want to thank Ryan Carlson, David Adamson and Tanmay Sinha, who helped provide the data for this project. The research reported here was supported by National Science Foundation grant 0835426.
References
Adamson, D., Dyke, G., Jang, H. J., and Rosé, C. P. 2013. Towards an Agile Approach to Adapting Dynamic Collaboration Support to Student Needs. International Journal of AI in Education 24(1), pp. 91-121.
Belanger, Y. and Thornton, J 2013. Bioelectricity: A Quanti-
tative Approach Duke University’s First MOOC. Duke Uni-
versity.
Beukeboom, C. J., Tanis, M., and Vermeulen, I. E. 2013. The Language of Extraversion: Extraverted People Talk More Abstractly, Introverts Are More Concrete. Journal of Language and Social Psychology, 32(2), 191-201.
Beukeboom, C. J. 2014. Mechanisms of linguistic bias: How words reflect and maintain stereotypic expectancies. Social Cognition and Communication, 31, 313-330.
Brinton, C. G., Chiang, M., Jain, S., Lam, H., Liu, Z. and
Wong, F. M. F. 2013. Learning about social learning
in MOOCs: From statistical analysis to generative model.
arXiv preprint arXiv:1312.2159.
Carini, R. M., Kuh, G. D. and Klein, S. P. 2006. Student
engagement and student learning: Testing the linkages. Re-
search in Higher Education, 47(1), 1-32.
Cheng, J., Kulkarni, C. and Klemmer, S. 2013. Tools for
predicting drop-off in large online classes. In Proceedings
of the 2013 conference on Computer supported cooperative
work companion. ACM.
Chi, M.T.H. 2000. Self-explaining expository texts: The dual processes of generating inferences and repairing mental models. In R. Glaser (Ed.), Advances in instructional psychology: Educational design and cognitive science (pp. 161-238).
Chi, M.T.H., and Bassock, M. 1989. Learning from examples via self-explanations. Knowing, learning, and instruction: Essays in honor of Robert Glaser, 251-282.
Christensen, G., A. Steinmetz, B. Alcorn, A. Bennet, D. Woods, and E.J. Emmanuel 2013. The MOOC Phenomenon: Who Takes Massive Open Online Courses and Why? University of Pennsylvania. Web. 6 Dec. 2013.
Clow, Doug. 2013. MOOCs and the funnel of participation.
Third International Conference on Learning Analytics and
Knowledge. ACM .
Corno, Lyn, and Ellen B. Mandinach. 1983. The role of cognitive engagement in classroom learning and motivation. Educational Psychologist 18(2): 88-108.
DeBoer, Jennifer, G. S. Stump, D. Seaton, and Lori Bres-
low. 2013. Diversity in MOOC students’ backgrounds and
behaviors in relationship to performance in 6.002 x. In Pro-
ceedings of the Sixth Learning International Networks Con-
sortium Conference.
Dowson, M., and McInerney, D. M. 2003. What do students
say about their motivational goals?: Towards a more com-
plex and dynamic perspective on student motivation. Con-
temporary Educational Psychology, 28(1), 91-113.
Esque, T. and McCausland, J. 1997. Taking ownership for
transfer: A management development study case. Perfor-
mance Improvement Quarterly, 10 (2), 116-133.
Fahy, P.J., Crawford, G. and Ally, M. 2001. Patterns of in-
teraction in a computer conference transcript. International
Review of Open and Distance Learning, 2(1).
Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R. and Lin,
C.-J. 2008. LIBLINEAR: A library for large linear classifi-
cation. Journal of Machine Learning Research (9).
Fuchs, L.S., Fuchs, D., Hamlett, C. L., Phillips, N. B., Karns, K., and Dutka, S. 1997. Enhancing students' helping behavior during peer-mediated instruction with conceptual mathematical explanations. Elementary School Journal, 97, 223-249.
Garrison, D. R., Anderson, T. and Archer, W. 1999. Critical inquiry in a text-based environment: Computer conferencing in higher education. The Internet and Higher Education, 2(2), 87-105.
Gillani, Nabeel. 2013. Learner communications in massively open online courses. OxCHEPS Occasional Paper 53.
Girish, Balakrishnan 2013. Predicting Student Retention in
Massive Open Online Courses using Hidden Markov Mod-
els. EECS Department, University of California, Berkeley.
Graham, S., & Golan, S. 1991. Motivational influences on
cognition: Task involvement, ego involvement, and depth of
information processing. Journal of Educational Psychology
83:187-194.
Huang, Jonathan, Chris Piech, Andy Nguyen, Leonidas
Guibas. 2013. Syntactic and Functional Variability of a
Million Code Submissions in a Machine Learning MOOC.
In Proceedings of the 1st Workshop on Massive Open Online
Courses at the 16th Annual Conference on Artificial Intelli-
gence in Education.
Keller, J., and Suzuki, K. 2004. Learner motivation and e-
learning design: A multinationally validated process. Jour-
nal of Educational Media, 29(3), 229-239.
Kizilcec, René F., Chris Piech, and Emily Schneider 2013. Deconstructing disengagement: analyzing learner subpopulations in massive open online courses. In Proceedings of the Third International Conference on Learning Analytics and Knowledge. ACM.
Koch, Gary G. 1982. Intraclass correlation coefficient. In S. Kotz and N. L. Johnson (Eds.), Encyclopedia of Statistical Sciences (pp. 213-217). New York: John Wiley and Sons.
Koller, Daphne, Andrew Ng, Chuong Do and Zhenghao Chen. 2013. Retention and Intention in Massive Open Online Courses. In Depth. Educause.
Liu, Bing, Minqing Hu, and Junsheng Cheng. 2005. Opinion Observer: analyzing and comparing opinions on the Web. In Proceedings of WWW, pages 342-351.
Locke, E. A., and Latham, G. P. 2002. Building a practically
useful theory of goal setting and task motivation. American
Psychologist, 57(9), 705-717.
Milligan, C., Littlejohn, A., and Margaryan, A. 2013. Pat-
terns of Engagement in Connectivist MOOCs. Journal of
Online Learning and Teaching, 9(2).
Moshinskie, J. 2001. How to keep e-learners from e-
scaping. Performance Improvement, 40(6), 30-37.
Ng, Evelyn, and Carl Bereiter 1991. Three levels of goal
orientation in learning. In Journal of the Learning Sciences
1.3-4.
Pennebaker, J. W. and King, L. A. 1999. Linguistic Styles:
Language use as an individual difference. Journal of Per-
sonality and Social Psychology, 77, 1296-1312.
Poellhuber, Bruno, Normand Roy, Ibtihel Bouchoucha, Jacques Raynauld, Jean Talbot and Terry Anderson. 2013. The Relations between MOOC's Participants' Motivational Profiles, Engagement Profile and Persistence. In MRI Conference, Arlington.
Ramesh, Arti, Dan Goldwasser, Bert Huang, Hal Daumé III, and Lise Getoor 2013. Modeling Learner Engagement in MOOCs using Probabilistic Soft Logic. In a workshop at NIPS.
Rosé, C., Carlson, R., Yang, D., Wen, M., Resnick, L., Goldman, P., and Sherer, J. 2014. Social factors that contribute to attrition in MOOCs. In ACM Learning at Scale.
Roscoe, R. D., and Chi, M. T. H. 2008. Tutor learning: The
role of explaining and responding to questions. Instructional
Science, 36. pp. 321-350.
Ryan, Richard M., and Edward L. Deci. 2000. Intrinsic and extrinsic motivations: Classic definitions and new directions. Contemporary Educational Psychology 25(1): 54-67.
Snow, R., O’Connor, B., Jurafsky, D. and Ng, A. Y. 2008.
Cheap and fast — but is it good? Evaluating non-expert
annotations for natural language tasks. In Proceedings of
the Conference on Empirical Methods in Natural Language
Processing (pp. 254-263).
Stoney, C. and Oliver, R. 1999. Can higher order thinking and cognitive engagement be enhanced with multimedia? Interactive Multimedia Electronic Journal of Computer-Enhanced Learning.
Stata Corporation 2001. Stata Statistical Software Release 7.0: Programming. Stata Corporation.
Sun, T, W. Chen, Z. Liu, Y. Wang, X. Sun, M. Zhang, and
C.-Y. Lin. 2011. Participation maximization based on social
influence in online discussion forums. In Proceedings of
ICWSM, 2011.
Turney, P. D., Neuman, Y., Assaf, D. and Cohen, Y. 2011. Literal and metaphorical sense identification through concrete and abstract context. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (pp. 680-690).
Vedder, P. 1985. Cooperative learning: A study on processes and effects of cooperation between primary school children. Westerhaven, Groningen: Rijkuniversiteit Groningen.
Wang, Yi-Chia, Robert Kraut, and John M. Levine 2012.
To stay or leave?: the relationship of emotional and infor-
mational support to commitment in online health support
groups. In Proceedings of the ACM 2012 conference on
Computer Supported Cooperative Work. ACM.
Witten, I. H. and Frank, E. 2005. Data mining: Practical
machine learning tools and techniques (2nd ed.) Elsevier.
Wittrock, M. C. 1990. Generative processes of comprehension. Educational Psychologist, 24, pp. 345-376.
Yang, D., Sinha, T., Adamson, D. and Rosé, C. P. 2013. "Turn on, Tune in, Drop out": Anticipating student dropouts in Massive Open Online Courses. In a workshop at NIPS.
Zhu, E. 2006. Interaction and cognitive engagement: An
analysis of four asynchronous online discussions. Instruc-
tional Science, 34(6), 451-480.
... The large number of student posts in MOOC discussion forums makes it difficult for an instructor to know where to intervene to answer questions, resolve issues, or provide feedback. Moreover, while MOOC forum posts provide evidence of student learning and motivation (Wen and Yang 2014a;Elouazizi 2014), the large volume of content makes it difficult for an instructor to identify students who may need help Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. ...
... Wen et al. (2014b) found a correlation between the ratio of negative-to-positive sentiment words in MOOC forum posts and the rate of student drop-out in a given week. Similarly, Wen et al. (2014a) conducted a survival analysis and found that students with MOOC forum posts indicating higher levels of motivation and cognitive engagement were more likely to complete the course. ...
Article
Students in a Massive Open Online Course (MOOC) interact with each other and the course staff through online discussion forums. While discussion forums play a central role in MOOCs, they also pose a challenge for instructors. The large number of student posts makes it difficult for an instructor to know where to intervene to answer questions, resolve issues, and provide feedback. In this work, we focus on automatically predicting speech acts in MOOC forum posts. Our speech act categories describe the purpose or function of the post in the ongoing discussion. Specifically, we address three main research questions. First, we investigate whether crowdsourced workers can reliably label MOOC forum posts using our speech act definitions. Second, we investigate whether our speech acts can help predict instructor interventions and assignment completion and performance. Finally, we investigate which types of features (derived from the post content, author, and surrounding context) are most effective for predicting our different speech act categories.
... Text as an NLP feature for predicting student performance The literature indicates that while certain studies have systematically researched texts, the focus has not involved their linguistic or semantic properties, i.e. tokenization, lemmatization, text classification etc. More specifically, several studies have explored course forum texts as a feature source, though they have not extracted any linguistic features [9]- [11]. Typical features that have been examined include the total number of posts, the total number of words per post, and the time of posting.For example, in a CS undergraduate course [12] the authors extracted the following measures: messages, threads, words, sentences, views, and time of participation. ...
Conference Paper
Full-text available
While students generate a large volume of texts in higher education (e.g. blogs, fora, wikis, assignments), the potential of these texts as performance predictors has been unexplored. In the present work we report findings from a study in which undergraduates first viewed a series of six short video lectures and then wrote a short summary for each. Students' comprehension of each video lecture was measured with a quiz. Based on the median score as a threshold, two performance groups were created, high (above median) and low (below median). Using standard NLP approaches, we converted the student summaries into two large feature sets: (a) raw (i.e., BoW and TF-IDF) and (b) engineered (i.e., Part of Speech, embeddings). After encoding both the video lecture transcript and each respective student summary, the resulting sparse and dense vectors were used to compute the cosine similarity. The raw and engineered feature sets were subsequently used to train eight common ML classifiers. As the two classes were imbalanced, we used both the accuracy and the f1 score as metrics. The results indicated that in 50% of all cases, the raw text feature set led to a higher average classification accuracy compared to the engineered features. Furthermore, the average classification performance of the classifiers was .70 for the raw text features and .74 for the engineered features. Finally, depending on the metric and feature set, some classifiers performed better than others.
... Another similar work, conducted by Wen et al. (2014), focused on a sentiment analysis in a MOOC, aiming to understand the relation between students' comments and course success. The authors developed a model to identify motivated students, based on their comments in a course forum. ...
Article
Full-text available
Massive Online Open Course (MOOC) platforms are considered a distinctive way to deliver a modern educational experience, open to a worldwide public. However, student engagement in MOOCs is a less explored area, although it is known that MOOCs suffer from one of the highest dropout rates within learning environments in general, and in e-learning in particular. A special challenge in this area is finding early, measurable indicators of engagement. This paper tackles this issue with a unique blend of data analytics and NLP and machine learning techniques together with a solid foundation in psychological theories. Importantly, we show for the first time how Self-Determination Theory (SDT) can be mapped onto concrete features extracted from tracking student behaviour on MOOCs. We map the dimensions of Autonomy, Relatedness and Competence, leading to methods to characterise engaged and disengaged MOOC student behaviours, and exploring what triggers and promotes MOOC students’ interest and engagement. The paper further contributes by building the Engage Taxonomy, the first taxonomy of MOOC engagement tracking parameters, mapped over 4 engagement theories: SDT, Drive, ET, Process of Engagement. Moreover, we define and analyse students’ engagement tracking, with a larger than usual body of content (6 MOOC courses from two different universities with 26 runs spanning between 2013 and 2018) and students (initially around 218.235). Importantly, the paper also serves as the first large-scale evaluation of the SDT theory itself, providing a blueprint for large-scale theory evaluation. It also provides for the first-time metrics for measurable engagement in MOOCs, including specific measures for Autonomy, Relatedness and Competence; it evaluates these based on existing (and expanded) measures of success in MOOCs: Completion rate, Correct Answer ratio and Reply ratio. In addition, to further illustrate the use of the proposed SDT metrics, this study is the first to use SDT constructs extracted from the first week, to predict active and non-active students in the following week.
... Other studies have explored engagement in different contexts, such as political argument settings (Shugars and Beauchamp, 2019), conversations around terrorist attacks (Chiluwa and Odebunmi, 2016), socio-affective aspects of conversations such as emotion (Yu et al., 2004), student engagement in online discussion forums (Liu et al., 2018), cognitive engagement in MOOC forums (Wen et al., 2014), real-time engagement in reducing binge drinking through intervention text messages (Irvine et al., 2017), and user engagement in online health communities (Wang et al., 2020). However, none of these studies specifically considers the dynamics of socio-affective and cognitive engagement in online conversations between patients and healthcare providers. ...
Preprint
Full-text available
Patients who effectively manage their symptoms often demonstrate higher levels of engagement in conversations and interventions with healthcare practitioners. This engagement is multifaceted, encompassing cognitive and socio-affective dimensions. Consequently, it is crucial for AI systems to understand engagement in natural conversations between patients and practitioners to better contribute toward patient care. In this paper, we present a novel dataset (MedNgage), which consists of patient-nurse conversations about cancer symptom management. We manually annotate the dataset with a novel framework of patient-engagement categories from two different angles: (i) socio-affective engagement (3.1K spans) and (ii) cognitive engagement (1.8K spans). Through statistical analysis of the data annotated using our framework, we show a positive correlation between patients' symptom-management outcomes and their engagement in conversations. Additionally, we demonstrate that pre-trained transformer models fine-tuned on our dataset can reliably predict engagement categories in patient-nurse conversations. Lastly, we use LIME (Ribeiro et al., 2016) to analyze the underlying challenges that state-of-the-art transformer models encounter on these tasks. The de-identified data is available for research purposes upon request.
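As a concrete picture of the classification setup this abstract describes, the sketch below runs inference with a sequence classifier in the HuggingFace transformers API; the base model, the number of labels, and the example span are placeholders, and the fine-tuning loop on the annotated spans is omitted (the dataset itself is available only on request).

```python
# Hedged sketch: predicting an engagement category for a conversation span
# with a transformer classifier. "bert-base-uncased" and num_labels=3 are
# illustrative assumptions; a model fine-tuned on MedNgage would be loaded here.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3
)

span = "I tried the breathing exercise and it helped with the nausea."
inputs = tokenizer(span, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits
print("Predicted category index:", logits.argmax(dim=-1).item())
```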
... As predictive modeling techniques in education matured, the focus has expanded to advancing our theoretical understandings of student learning in a particular context (Shmueli, 2010). This includes understanding students' learning strategies and self-regulated learning (Di Mitri et al., 2017; Maldonado-Mahauad et al., 2018; Moreno-Marcos et al., 2020; Sierens et al., 2009), affect detection (Calvo & D'Mello, 2010; Hussain et al., 2011), reading comprehension (Allen et al., 2015), critical thinking (Barbosa et al., 2020; Kovanović et al., 2014, 2016; Neto et al., 2018, 2021; Waters et al., 2015), reflection, motivation (Sharma et al., 2020; Wen et al., 2014), feedback engagement (Iraj et al., 2020), social interactions (Joksimović et al., 2015; Yoo & Kim, 2012), and team performance (Yoo & Kim, 2013). ...
Article
Full-text available
Learning analytics, located at the intersection of learning science, data science, and computer science, aims to leverage educational data to enhance teaching and learning. However, as educational data grows, distilling meaningful insights presents challenges, particularly concerning individual learner differences. This work introduces a comprehensive approach for designing and automatically mapping learners into meaningful cohorts based on diverse learning behaviors. It defines four critical contexts (engagement, direction, repetitiveness, and orderliness) and generates practical learning cohorts from their combinations. The approach employs time-series clustering, using K-means with dynamic time warping (DTW), to identify similar learning patterns. Statistical techniques such as the Mann-Kendall trend test and the Theil-Sen estimator further refine the process. A case study on data science courses validates the approach, offering novel insights into learner behavior. The contributions include a novel time-series approach to characterizing learning behavior, new learning cohorts based on critical contexts, and a systematic method for automated cohort identification.
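A minimal sketch of the clustering and trend analysis named above, using tslearn's DTW-based K-means and SciPy's Theil-Sen estimator; the weekly activity series are synthetic placeholders rather than the case-study data.

```python
# Sketch: group learners by the shape of their weekly activity with
# DTW-based K-means, then characterise a trend with a Theil-Sen slope.
import numpy as np
from scipy.stats import theilslopes
from tslearn.clustering import TimeSeriesKMeans

# Hypothetical per-learner weekly activity counts (n_learners x n_weeks).
series = np.array([
    [5, 6, 7, 8, 9, 9],   # steadily increasing engagement
    [9, 7, 5, 3, 1, 0],   # declining engagement
    [4, 5, 6, 7, 8, 8],
    [8, 6, 4, 2, 1, 0],
], dtype=float)

km = TimeSeriesKMeans(n_clusters=2, metric="dtw", random_state=0)
labels = km.fit_predict(series)
print("Cohort assignments:", labels)

# Robust slope of one learner's series (negative here: disengaging).
slope = theilslopes(series[1])[0]
print("Theil-Sen slope for learner 1:", slope)
```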
Article
Full-text available
Background: Forums in massive open online courses (MOOCs) enable written exchanges on course content; hence, they can potentially facilitate learners' cognitive engagement. Given the myriad of MOOC forum messages, this engagement is commonly analysed automatically through the linguistic features of the messages. Assessing linguistic features of learners' forum messages involves consideration of the learning tasks. MOOC forum discussion tasks, however, have not been previously considered.

Objective and Method: This study explores the effects of MOOC forum discussion tasks on learners' cognitive engagement. Based on the structure of observed learning outcomes (SOLO) taxonomy, we manually annotate distinct levels of cognitive engagement encouraged in forum discussion tasks and displayed by learners in messages starting discussions (i.e., thread starters). We study the linguistic features of thread starters in relation to the pedagogical design of the discussion tasks. Additionally, we use random-forest modelling to identify the linguistic and task-related features that help to categorise learners' cognitive engagement according to SOLO levels.

Results: Manual analysis showed that learners' thread starters mainly reflect surface SOLO levels and include few academic words and little cohesive language. Random-forest modelling showed that these linguistic features, together with the SOLO levels encouraged in the discussion tasks, played an important role in identifying learners' cognitive engagement.

Major Takeaways: Our results highlight the importance of the pedagogical design of MOOC forum tasks in helping learners engage cognitively. Our study also contributes to the empirical evidence that learners' linguistic choices can afford insights into the quality of their cognitive engagement.
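To illustrate the random-forest step, here is a hedged sketch with invented linguistic and task features standing in for the study's variables; the feature names, values, and SOLO labels are illustrative only.

```python
# Hedged sketch: linguistic features of thread starters plus the SOLO level
# encouraged by the task, used to predict the learner's displayed SOLO level.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.DataFrame({
    "academic_word_ratio": [0.02, 0.10, 0.05, 0.12, 0.01, 0.09],
    "cohesion_score":      [0.30, 0.70, 0.40, 0.80, 0.20, 0.60],
    "task_solo_level":     [1, 3, 2, 3, 1, 2],  # level encouraged by the task
    "learner_solo_level":  [1, 3, 1, 3, 1, 2],  # annotated label to predict
})

X = df[["academic_word_ratio", "cohesion_score", "task_solo_level"]]
y = df["learner_solo_level"]

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
# Importances hint at which cues drive the SOLO-level categorisation.
print(dict(zip(X.columns, rf.feature_importances_.round(2))))
```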
Thesis
Full-text available
Massive open online courses (MOOCs) emerged with the promise to disrupt higher education. Fifteen years after their emergence, in terms of performance, that promise has not been fulfilled. In MOOC discussion forums, learners seldom capitalise on the opportunities for social learning. Through four empirical studies, we investigate how MOOC discussion forums are structured and how they can be potentially designed to facilitate learner-to-learner interactions and instructional dialogue. Results show that thoughtful design can help improve MOOC forum navigation, participation, and interactions. However, the environment in which forums are embedded needs to be considered as a techno-pedagogical fabric that provides (but also constrains) opportunities for social learning.
Article
Full-text available
Recent studies have found that comments from teaching assistants may encourage interactions in edX-like Massive Open Online Course (xMOOC) forums. However, how concepts from these interactions are conveyed to other xMOOC participants has not received much attention. Therefore, this study focuses on a unidirectional teaching assistant-student xMOOC interaction (TS interaction): a content-related pair comprising one question from a student and one immediate answer from a teaching assistant. The authors particularly investigate the linguistic features (i.e., concept connectivity, concept concreteness, readability and semantic overlap) of concept conveying in TS interactions with many responses (mTS) and with few responses (fTS). In addition, a language factor (English and Chinese) is also considered. Additionally, interaction transcripts from science lectures (SL) and political briefings (PB) were used as control groups, representing two opposite cases of concept conveying. At the concept level, the concept conveying in transcripts was modelled as a graph and measured by common indicators from graph theory. At the overall level, the concept conveying in transcripts was measured with standard linguistic measurement tools. The results show that interactions with mTS and fTS demonstrate different concept-conveying tendencies toward SL and PB in terms of linguistic features in both languages. The results suggest that in both languages, teaching assistants may use mixed concept-conveying strategies to stimulate more follow-up responses in xMOOC forums. These conclusions drawn from TS interactions can even be partially generalized to a larger student-student (SS) interaction dataset.
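The graph view of concept conveying can be sketched as follows: concepts become nodes, within-turn co-occurrence becomes edges, and standard graph-theoretic indicators then quantify connectivity. The concept lists below are invented for illustration; the paper's own indicator set is not reproduced here.

```python
# Sketch: build a concept co-occurrence graph from the turns of an
# interaction and compute simple connectivity indicators.
from itertools import combinations
import networkx as nx

# Hypothetical concepts mentioned in successive turns of a TS interaction.
turns = [
    ["recursion", "base case"],
    ["recursion", "stack", "call frame"],
    ["stack", "call frame", "overflow"],
]

G = nx.Graph()
for concepts in turns:
    for a, b in combinations(concepts, 2):
        G.add_edge(a, b)  # concepts co-occurring in a turn are linked

print("Density:", round(nx.density(G), 3))
print("Average clustering:", round(nx.average_clustering(G), 3))
```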
Conference Paper
Full-text available
In this paper, we explore student dropout behavior in Massive Open Online Courses (MOOCs). As a case study, we use a recent Coursera class, from which we develop a survival model that allows us to measure the influence of factors extracted from the data on the student dropout rate. Specifically, we explore factors related to student behavior and social positioning within discussion forums using standard social network analytic techniques. The analysis reveals several significant predictors of dropout.
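Because survival modelling is also central to the present paper's methodology, a minimal sketch of a Cox proportional-hazards model of dropout may be useful; the column names and values are synthetic, and the lifelines library and penalizer setting are choices of this illustration, not the study's implementation.

```python
# Hedged sketch: Cox proportional-hazards model relating forum behaviour
# to time until dropout. All data below are invented.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "weeks_active":   [2, 8, 5, 1, 7, 3],        # duration observed
    "dropped_out":    [1, 0, 1, 1, 0, 1],        # event flag (0 = censored)
    "posts_per_week": [0.5, 4.0, 1.0, 0.0, 3.5, 0.8],
    "network_degree": [1, 9, 3, 0, 7, 2],        # social position in the forum
})

# A small penalizer keeps the fit stable on tiny illustrative samples.
cph = CoxPHFitter(penalizer=0.1)
cph.fit(df, duration_col="weeks_active", event_col="dropped_out")
cph.print_summary()  # hazard ratios for each behavioural predictor
```

A hazard ratio below 1 for a predictor such as posts_per_week would indicate that more forum activity is associated with a lower risk of dropping out at any given time.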
Article
Intrinsic and extrinsic types of motivation have been widely studied, and the distinction between them has shed important light on both developmental and educational practices. In this review we revisit the classic definitions of intrinsic and extrinsic motivation in light of contemporary research and theory. Intrinsic motivation remains an important construct, reflecting the natural human propensity to learn and assimilate. However, extrinsic motivation is argued to vary considerably in its relative autonomy and thus can either reflect external control or true self-regulation. The relations of both classes of motives to basic human needs for autonomy, competence and relatedness are discussed.
Chapter
This article has no abstract.
Article
Massive open online courses (MOOCs) have commanded considerable public attention due to their sudden rise and disruptive potential. But there are no robust, published data that describe who is taking these courses and why they are doing so. As such, we do not yet know how transformative the MOOC phenomenon can or will be. We conducted an online survey of students enrolled in at least one of the University of Pennsylvania's 32 MOOCs offered on the Coursera platform. The student population tends to be young, well educated, and employed, with a majority from developed countries. There are significantly more males than females taking MOOCs, especially in BRIC and other developing countries. Students' main reasons for taking a MOOC are advancing in their current job and satisfying curiosity. The individuals the MOOC revolution is supposed to help the most, those without access to higher education in developing countries, are underrepresented among the early adopters.
Article
This paper investigates the use of conversational agents to scaffold online collaborative learning discussions through an approach called Academically Productive Talk (APT). In contrast to past work on dynamic support for collaborative learning, where agents were used to elevate conceptual depth by leading students through directed lines of reasoning (Kumar & Rosé, IEEE Transactions on Learning Technologies, 4(1), 2011), this APT-based approach uses generic prompts that encourage students to articulate and elaborate their own lines of reasoning, and to challenge and extend the reasoning of their teammates. This paper integrates findings from a series of studies across content domains (biology, chemistry, engineering design), grade levels (high school, undergraduate), and facilitation strategies. APT-based strategies are contrasted with simply offering positive feedback when the students themselves employ APT facilitation moves in their interactions with one another, an intervention we term Positive Feedback for APT engagement. The pattern of results demonstrates that APT-based support for collaborative learning can significantly increase learning, but that the effect of specific APT facilitation strategies is context specific. It appears the effectiveness of each strategy depends upon factors such as the difficulty of the material (in terms of being new conceptual material versus review) and the skill level of the learner (urban public high school vs. selective private university). In contrast, Positive Feedback for APT engagement does not positively impact learning. In addition to an analysis based on learning gains, an automated conversation analysis technique is presented that effectively predicts which strategies are successfully operating in specific contexts. Implications for the design of more agile forms of dynamic support for collaborative learning are discussed.
Article
Sixteen adult volunteers provided thinking-aloud protocols while undergoing a 10-hr individually administered course in BASIC (beginner's all-purpose symbolic instruction code) programming. Three levels of goals were identified as operative in the learning situation: task-completion goals, instructional goals, and personal knowledge-building goals. Although protocol statements indicating knowledge-building goals were infrequent, students exhibiting a relatively high proportion of them were distinctive in several ways. They did significantly better on a posttest. Their performance in goal cue selections differed from that of other participants in ways consistent with their orientation: They responded more often to learning goal cues than to task goal cues. They actively related new learning to prior knowledge and they posed and tried to solve problems and questions. Students oriented toward instructional goals tended to focus on what was explicitly taught. Students oriented toward task-completion goals tended to equate learning with successful completion of assigned tasks. Level of goal orientation and posttest performance were unrelated to level of education and prior computer experience but were positively related to previous experience of independent learning.
Article
Despite the hype and speculation about the role massively open online courses (MOOCs) may play in higher education, empirical research that explores the realities of interacting and learning in MOOCs is in its infancy. MOOCs have evolved from previous incarnations of online learning but are distinguished in their global reach and semi-synchronicity. Thus, it is important to understand the ways that learners from around the world interact in these settings. In this paper, we ask three questions: 1) What are the demographic characteristics of students that participate in MOOC discussion forums? 2) What are the discussion patterns that characterize their interactions? And 3) How does participation in discussion forums relate to students’ final scores? Analysis of nearly 87,000 individuals from one MOOC reveals three key trends. First, forum participants tend to be young adults from the western world. Secondly, these participants assemble and disperse as crowds, not communities, of learners. Finally, those that engage explicitly in the discussion forums are often higher-performing than those that do not, although the vast majority of forum participants receive “failing” marks. These findings have implications for the design and implementation of future MOOCs, and how they are conceptualised as part of higher education.
Conference Paper
This paper describes two diagnostic tools for predicting which students are at risk of dropping out of an online class. While thousands of students have been attracted to large online classes, keeping them motivated has been challenging. Experiments on a large online HCI class suggest that the tools this paper introduces can help identify students who will not complete assignments, with F1 scores of 0.46 and 0.73 three days before the assignment due date.
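As a reminder of what the quoted F1 scores summarise, the short sketch below computes precision, recall, and F1 for a hypothetical at-risk classifier; the labels are invented.

```python
# Sketch: F1 balances precision and recall of "at risk" predictions.
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # 1 = student missed the assignment
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # predictions three days before the deadline

print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1:       ", f1_score(y_true, y_pred))
```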