RESEARCH | Open Access
ESL students' oral performance in English language school-based assessment: results of an empirical study
Zhengdong Gan¹*, Emily Pey Tee Oon¹ and Chris Davison²

* Correspondence: zhengdonggan@umac.mo
¹ Faculty of Education, University of Macau, Macao, People's Republic of China
Full list of author information is available at the end of the article
Abstract

Background: The English language school-based assessment (SBA) component of the Hong Kong Diploma of Secondary Education (HKDSE) Examination is innovative in that the assessment tasks involve assessing English oral language skills in a high-stakes context, but they are designed and implemented in the ESL classroom by school teachers in light of a regular reading and viewing program or the elective modules integrated into the school curriculum. While this certainly is a positive move towards better congruence between teaching, learning, and assessment activities, there has been concern whether teachers are capable of applying the assessment criteria and standards consistently in spite of going through a variety of standardization meetings and sharing discussions initiated and mandated by the Hong Kong Examinations and Assessment Authority (HKEAA). In other words, there has been concern about the extent to which results provided by teachers in different schools are comparable. Also, how may task difficulty be reflected in students' assessment results across the two SBA task types? It was to provide some research evidence on matters relating to these issues associated with teacher assessment results that the study described here was carried out.

Methods: The study, with the help of Rasch analysis, aims to examine the psychometric qualities of this English language school-based assessment, how students' assessment results may vary across different schools, and how task difficulty may vary across the two different task types.

Results: The findings indicated the following: (1) among the three schools involved in this study, the two band 2 schools demonstrated similar abilities across all task domains, as there were no significant differences in students' SBA results in any assessment domain between these two band 2 schools; significant differences were found in some assessment domains between the two band 2 schools and the band 3 school; (2) a markedly more fine-grained pattern of difference in the difficulty levels of different assessment domains was observed in students' assessment results across the two task types in this study than in previous studies.

Conclusions: Implications of the results for teacher assessor training and test task development are discussed.
Keywords: School-based assessment, Oral performance, Rasch analysis
© The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International
License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium,
provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and
indicate if changes were made.
Gan et al. Language Testing in Asia (2017) 7:19
DOI 10.1186/s40468-017-0051-2
Background
In contrast to large-scale standardized testing, in which the assessor is usually someone who must remain objective and uninvolved throughout the whole assessment process, school-based assessment tends to be embedded in the regular curriculum and assessed by a teacher who is familiar with the students' work (Davison 2007). Davison maintains that school-based assessment derives its validity from building into its actual design the capacity for triangulation and the collection of multiple sources and types of evidence under naturalistic conditions over a lengthy period of time. Consequently, "the reliability of the assessment was also enhanced by having a series of assessments (rather than just one) by a teacher who was familiar with the student and by encouraging multiple opportunities for assessor reflection and standardization" (Davison 2007, p. 51). In other words, teachers are in the best position to determine the quality of student achievement over time and at specific points and to improve student learning (Wyatt-Smith et al. 2010).
However, drawing on her qualitative observation, Sadler (1998, pp. 80-82) made explicit the typical intellectual and experiential resources teachers rely on when making a judgment in classroom assessment:

• Superior knowledge about the content or substance of what is to be learned
• Deep knowledge of criteria and standards [or performance expectations] appropriate to the assessment task
• Evaluative skill or expertise in having made judgments about students' efforts on similar tasks in the past
• A set of attitudes or dispositions towards teaching, as an activity, and towards learners, including their own ability to empathize with students who are learning; their desire to help students develop, improve, and do better; their personal concern for the feedback and veracity of their own judgments; and their patterns in offering help
Implicit in Sadler's observation is thus that teacher judgments might be characterized
as remaining responsive to the influence of other knowledge and skills rather than the
stated standards and criteria. Clapham (2000) further commented:
A problem with methods of alternative assessment, however, lies with their validity
and reliability: tasks are often not tried out to see whether they produce the desired
linguistic information; marking criteria are not investigated to see whether they
work; and raters are often not trained to give consistent marks. (p. 152).
In a survey of a high-profile school-based assessment initiative in Hong Kong (Davison et al. 2010), teacher comments such as "I would like the HKEAA to take up my marks to see if I have interpreted the criteria correctly" revealed a lack of confidence among teachers about this teacher-mediated and context-dependent assessment initiative, with many doubting that they had the required knowledge and skills to carry out
the assessment properly. Although this English language school-based assessment com-
ponent has been implemented in schools for nearly 10 years, there has been almost no
empirical evidence to illustrate the extent to which teacher assessment results from one
school are comparable to results of another school. Also, to what extent does difficulty
level of different task domains vary across the two task types in this assessment? It was
to provide some research evidence on matters relating to teacher assessment results
that the study described here was carried out.
School-based English language assessment (SBA) scheme in Hong Kong
The literature on school-based assessment has been growing for more than two decades (Davison 2007; Meisels et al. 2001; Gan 2012; Gan 2013; Qian 2014). School-based assessment, as an alternative to testing, in the form of greater use of teachers' assessment of their own students, has become increasingly popular in many countries around the world. Such curriculum-embedded performance assessments, often defined as integrated parts of students' learning experience, rely heavily on teacher judgment. They dif-
fer from external assessments in that curriculum-embedded performance assessments are
integrated into the daily curriculum and instructional activities of a classroom (Meisels et
al. 2001). The thinking behind the curriculum-embedded performance assessments is
based on a social-constructivist view of learning (Vygotsky 1978). The use of this
curriculum-embedded performance assessment is often advocated on the grounds that it
can be conducted as part of teaching and so provide formative feedback to students, thus
improving their learning (Crooks 1988). What characterizes this type of curriculum-
embedded performance assessment is that both the teacher and students are actively
engaged with every stage of the assessment process in order that they truly understand
the requirements of the process, and the criteria and standards being applied (Price et al.
Essential to the operation of this type of assessment is teachers' ability to reconcile the dual role that they are required to take in both promoting and judging learning (Harlen 2005). Harlen points out that the task of helping teachers take up this dual role can be particularly difficult in countries where a great deal of emphasis is given to examination results. For example, Choi (1999) suggested that in a highly competitive, examination-driven school system such as Hong Kong's, the success of a school-based assessment initiative hinges on the assessment training and resource support provided for teachers. Choi, however, mentioned that another difficulty in introducing a school-based assessment initiative is ensuring credibility for school-based assessment. This means that an effective and efficient quality assurance and quality control system needs to be established so that the users of examination results can be assured of the reliability of this scheme of assessment and have confidence in the teachers' judgments.
The school-based assessment (SBA) scheme in Hong Kong started out as a component of the Hong Kong Certificate of Education Examination (HKCEE) English Language in 2006. This assessment scheme, which was collaboratively initiated by the Education Bureau (EDB) and the Hong Kong Examinations and Assessment Authority (HKEAA), is innovative in that assessments are administered in schools and marked by teachers in the context of public assessment. Grounded within an "assessment for learning" framework, it is now incorporated into the new Hong Kong Diploma of Secondary Education (HKDSE) English Language Examination, adopting a standards-referenced assessment system and aiming not just to report on the full range of educational achievement but also to motivate learning in Hong Kong secondary schools. In addition to the fact that this assessment scheme accounts for 15% of the total subject mark in the HKDSE, this SBA component seeks to provide a more comprehensive evaluation of learners' achievement by assessing those learning objectives which can hardly be assessed in public assessments while concurrently enhancing the capability for student
self-assessment and life-long learning (Davison 2007). Given the multiple functions of this SBA component, we believe this current school-based English language assessment can best be defined as:

The process by which teachers gather evidence in a planned and systematic way in order to draw inferences about their students' learning, based on their professional judgment, and to report at a particular time on their students' achievements (Harlen 2005, p. 247).
According to HKEAA, these two kinds of assessment tasks build on two different kinds
of learning programs embedded in the school curriculum in Hong Kong. One is a reading/
viewing program in which students read/view four texts over the course of 3 years and
undertake an individual presentation or a group interaction based on the books/videos/films
that they have read/viewed. The other is the elective module(s) in the school curriculum
where students carry out an individual presentation or a group interaction based on the
knowledge, skills, and experience gained in these elective modules.
Although SBA underwent a detailed research, development, and distribution process and has the advantages of providing teachers with a formative view of the progress of individual students and allowing them to address more effectively the specific needs of their students (Yip and Cheung 2005; Carless and Harfitt 2013), challenges and controversy arose
particularly when assessment for both formative and summative purposes is integrated into
the regular teaching and learning process, with school teachers involved at all stages of the
assessment cycle, from planning the assessment program, to identifying and/or developing
appropriate assessment tasks right through to making the final judgments (Davison 2007).
While responses of teachers and students to the underlying philosophy of SBA and its em-
phasis on improving the quality of teaching and learning were generally very positive, con-
cern about the comparability of SBA scores across schools has been pervasive and still
continues, with some more experienced teachers being even more vocal with regard to
negative comments towards the administration of SBA in the initial stage (Qian 2014).
Reliability is often defined as the consistency of measurement (Bachman and Palmer
1996). In other words, the reliability of a test or assessment has to do with the consistency
of scoring and the accuracy of the administration procedures of the test or assessment
(Chiedu and Omenogor 2014). Chiedu and Omenogor suggest that in the case of teacher-
directed classroom assessment, two teacher assessors may not necessarily interpret the as-
sessment criteria the same way. In addition, as teacher-directed classroom assessment may
vary in different contexts at different times, it may lead to inconsistent assessor judgment
(McNamara 1996). It has thus been widely believed that a major source of unreliability is the scoring of a test or assessment. Undoubtedly, reliability is as important an issue for school-based assessment as for traditional testing. Currently, in the case of the English language SBA in Hong Kong, the following methods, "within-school standardization," "inter-school sharing," and the HKEAA's "statistical moderation," are adopted by the HKEAA (2016, p. 22) to ensure the reliability and consistency of SBA scores across schools. Below is a description of each of these three methods.
Within-school standardization
"Within-school standardization" means that if there is more than one subject teacher
teaching the subject to the same cohort of students in the school, it is necessary for the
teachers involved to agree on the criteria for awarding marks so that the same standard
of assessment is applied to all students. Specifically, teachers teaching the same cohort
of students bring samples of video-recorded assessments of different levels (e.g., the
three highest and the three lowest assessments) to the school-level standardization
meeting where the video-recorded assessments are shown and discussed. The discus-
sions at this school-level standardization meeting may lead to adjustments to scores
across classes in the school. This school-level standardization ensures that all the
teachers involved in SBA in the school will achieve a clear understanding of the shared
expectations of what students at particular levels should be able to do in order to
achieve a certain score.
Inter-school sharing
Following the within-school standardization meeting, "inter-school sharing" meetings
are organized by SBA District Coordinators. At the end of the school year, the SBA
District Coordinator will organize an inter-school meeting for professional sharing
among the schools within the group. The School Coordinators bring samples of video-
recordings and assessment records to this inter-school meeting where these samples of
student performance from different schools will be viewed and discussed with reference
to the assessment criteria. Each School Coordinator needs to report back to colleagues
in their own schools. If it is apparent that a particular school's scores are markedly
higher or lower as a whole than those from the other schools as a whole, the school
team may wish to review their scores.
HKEAA's statistical moderation
Despite the school-level teachers' participatory and reflective professional sharing in the implementation of SBA, there is still the likelihood that teachers in one school may be harsher or more lenient in their judgments than teachers in other schools. Given this concern, a statistical moderation method is adopted by the HKEAA in moderating the SBA assessments submitted by schools, with the aim of ensuring the comparability of SBA scores across schools. This statistical moderation is done by adjusting the average and the spread of the SBA scores of students in a given school with reference to the public examination scores of the same group of students, supplemented with a review of samples of students' work. The statistical moderation results will be compared to the
results from the sample review. Potential adjustments will be made to the statistical
moderation results so that the final moderated scores of these schools can properly
reflect the performance of their students in the SBA.
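To illustrate the general idea in concrete terms, a minimal sketch of a mean-and-spread adjustment is given below. The function name, the choice of Python, and the simple linear rescaling towards target values derived from the cohort's public examination results are illustrative assumptions for exposition only; this is not the HKEAA's actual moderation formula, which, as noted above, is also supplemented by a review of samples of students' work.

```python
import numpy as np

def moderate_school_scores(sba_scores, target_mean, target_sd):
    """Illustrative mean-and-spread moderation of one school's SBA scores.

    The school's raw SBA scores are linearly rescaled so that the moderated
    scores have a target mean and spread (e.g., values derived from the same
    students' public examination performance). Rank order within the school
    is preserved; only the level and spread of the scores change.
    """
    sba = np.asarray(sba_scores, dtype=float)
    if sba.std() == 0:                      # identical raw scores: shift only
        return sba - sba.mean() + target_mean
    return (sba - sba.mean()) / sba.std() * target_sd + target_mean

# Hypothetical example: a leniently marked school whose cohort's public
# examination results suggest a lower mean and a wider spread.
raw = [20, 22, 18, 23, 21]                  # SBA totals out of 24 (hypothetical)
print(moderate_school_scores(raw, target_mean=17.0, target_sd=3.0))
```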
Kane (2010) makes a distinction between procedural fairness and substantive fairness.
Procedural fairness can be said to require that all test takers take the same test or
equivalent tests, under the same conditions or equivalent conditions, and that their
performances be evaluated using the same rules and procedures. Substantive fairness in
testing requires that the score interpretation and any test-based decision rule be reasonable and appropriate and that "examinees of equal standing with respect to the construct the test is intended to measure should on average earn the same test score, irrespective of group membership" (AERA et al. 1999, p. 74). In other words, substantive
fairness is concerned with how well the program functions for different groups, and it
requires that scores have comparable meaning in different groups. While the above
school-level processes of systematic, participatory and reflective professional sharing
may indeed be helpful in mitigating stakeholders' potential concern about the fairness of SBA scores across schools, there has been almost no empirical evidence to illustrate the extent to which SBA results across different types of schools are comparable. To fill this research gap, the present study, with the help of Rasch analysis, aims to examine
how students participating in SBA in different schools may vary with regard to SBA
scores in the assessment tasks.
Task-based L2 performance in the research literature
Currently, there are two competing theoretical perspectives on task-based L2 performance
aiming to account for the impact of task type and task conditions on L2 spoken perform-
ance, the Tradeoff Hypothesis (Skehan 2009; Skehan 2014) and the Cognition Hypothesis
(Robinson et al. 2009, Robinson 2007). Skehan's theoretical framework views limitations
in attention as fundamental to second language speech performance, which entails a need
to analyze what attentional and working-memory demands a task makes and the conse-
quences this may have for different language performance dimensions often referred to as
accuracy, fluency, and complexity. Consequently, it is often assumed that more demand-
ing tasks are likely to result in prioritization of fluency over accuracy and complexity and
that tasks based on familiar or concrete information favor a concern for accuracy. Also,
within this Tradeoff Hypothesis, it is suggested that interactive tasks or tasks requiring
transformation or manipulation of materials or tasks which have had pre-task planning
are likely to lead to greater linguistic complexity. Standing in clear opposition to Skehan's Tradeoff Hypothesis, Robinson's (2007, 2009) Cognition Hypothesis claims that there is no limit to human attentional resources and that, as such, the human mind can attend to different
aspects of performance if certain conditions are met and that language learners can access
multiple attentional pools that do not compete and depletion of attention in one pool has
no effect on the amount remaining in another. Robinson (2007) also argues that the more
demanding a task is in terms of its content, the more complex and accurate its linguistic
performance will be.
Empirical studies that were guided by either Skehan's or Robinson's framework and conducted in pedagogic contexts, however, yielded mixed results. For example, Bygate
(1999) examined the complexity of the language of Hungarian secondary EFL learners on
a monologic narrative task and an argumentation task and found that the narrative tasks
might stretch the speakers more in terms of complexity of syntactic and lexical process-
ing. Bygate's study finding appeared to be echoed in the Michel et al. (2007) study, which
revealed that the dialogic (i.e., interactive) task tended to elicit shorter and structurally
simpler sentences than the monologic narrative task, although Michel et al. also found
that students made significantly fewer errors and were significantly more fluent in the dia-
logic task condition. In other words, Michel et al.'s study suggests that interactivity may affect structural complexity negatively. It was thus apparent that Skehan and his colleagues' (Foster and Skehan 1996; Skehan and Foster 1997) observation that more inter-
active tasks lead to more complex language performance did not find support in the
Bygate and Michel et al. (2007) studies. In language testing contexts, a few studies (e.g.,
Fulcher 1996; Bachman et al. 1995) reported significant but small differences in test scores
across different types of test tasks. More recently, a number of studies conducted in
experimental language testing settings that replicated Skehan's or Robinson's framework
concerning the impact of task performance conditions on task performance revealed
results that did not lend much support to either of their theoretical frameworks. Given
the mixed results of these studies on the relationship between task type and task perform-
ance, it is clear that this issue warrants further empirical research.
The context for the present study is innovative in that the assessment tasks in this
study involve speaking in a high-stakes language assessment context but they are
designed and implemented in the ESL classroom by school teachers in light of a regular
reading and viewing program or the elective modules integrated into the school cur-
riculum (Gan 2013). The processes of selecting appropriate assessment tasks and mak-
ing the actual assessments are undertaken collaboratively among the teachers concerned, taking into account the students' backgrounds, needs, and skills. All the teachers
involved in the assessment, however, need to go through a series of within-school and
inter-school standardization meetings and discussions organized by the HKEAA to help
them to develop a clear understanding of the shared expectations of what students at
particular levels of performance should be able to do to achieve a certain grade.
Building on the research discussed above, the present study focuses on the following
research questions:
1. What is the variation of SBA results across schools in Hong Kong?
2. How may task difficulty be reflected in students' assessment results across the two
SBA task types?
Methods
Participants
The study is part of a large-scale longitudinal project investigating teachers' and students' perceptions of a high-profile school-based assessment initiative in Hong Kong and using various measures to validate assessment tasks and assessment results. In an earlier related study, a convenience sample of 373 secondary Form 6 students from three different schools completed a questionnaire about their perceived difficulty of the two task types on the school-based assessment. The students also reported their assessment results from the two assessment tasks. The study reported in this paper focused on analysis of the students' assessment results collected in the earlier study.
Among the three schools involved in the study, schools A and B are both Catholic schools where English is used as the medium to teach core subjects such as English, Mathematics, Chemistry, and Physics. School C became a Chinese-medium school after 1997, and at the time of this study, school C was making efforts to build up better discipline and a better learning atmosphere among the students. Note that schools A and B are ranked as band 2 schools whereas school C is ranked as a band 3 school in the traditional local school rankings.
Procedures
Prior to students' participation in the questionnaire survey, students' performance in the two SBA tasks was assessed by their teachers, who followed the assessment criteria for both the group discussion and the individual presentation, which cover six levels (level 1 represents the lowest level, and level 6 represents the highest level) of oral English proficiency in the four major domains of English language performance. The two task types
are defined by HKEAA (2016) as follows:
An individual presentation, which may be quite informal, is defined as a single piece of oral text in which an individual speaker presents some ideas or information over a sustained period (3-5 min), with the expectation that they will not be interrupted. An individual presentation requires comparatively long turns, hence generally needing more pre-planning and a more explicit structure to ensure coherence. A presentation may be followed by questions or comments from the audience, but this exchange is not mandatory for the assessment of the individual presentation.
A group interaction is defined as an exchange of short turns or dialog with more than one speaker on a common topic. An interaction is jointly constructed by two or more speakers, hence generally needing less explicit structuring but more attention to turn-taking skills and more planning of how to initiate, maintain, and/or control the interaction by making suggestions, asking for clarification, supporting and/or developing each other's views, and disagreeing and/or offering alternatives.
In each of the individual presentations or group discussions, each participant thus
received a separate score for each of the four domains of assessment criteria, as well as
a global score as a result of the aggregation of the domain scores (Gan 2012).
Data analysis
In some previous test validation studies, test psychometric properties and result interpretations were typically analyzed through conventional analysis methods. For instance, the internal consistency of test items in the form of Cronbach's alpha is usually examined as an indication of reliability; face and content validity are obtained solely from a panel of experts; and raw scores from each item are summed into a total mean score for comparison of students' performance or for parametric statistical testing. Such conventional analyses of raw scores treat ordinal-scale data as if they were interval-scale data (Wright 1999), on which parametric statistical tests cannot readily be performed (Wright and Masters 1982; Boone et al. 2014; Liu 2010). When parametric tests are performed on ordinal data, the results contain an element of error. In other words, the reliability and validity of the data are jeopardized. In the present study, the psychometric properties of the test were assessed by Rasch modeling analysis, and raw scores were transformed into interval data (Rasch estimates in logit units) for the conduct of parametric statistical tests. These features clearly advance on the precursory studies.
In the current paper, the school-based English language assessment scores of 373 secondary Form 6 students from three schools were analyzed using Rasch analysis (Rasch 1980) with the FACETS software (Linacre 2017). In the analysis, each separate domain of task performance on the two SBA assessment tasks is referred to as an assessment "item," scored on a 6-point scale (see Appendixes 1 and 2). A total of eight assessment items were included for analysis. This enables the psychometric quality of the instrument to be assessed. For this purpose, principal component analysis of residuals, fit statistics, and Rasch separation indices were examined. The Rasch model was used to transform the raw scores into interval-scale data for analysis. Specifically, raw scores were transformed into Rasch estimates that are linear and readily used for conventional statistical analyses, e.g., ANOVA for variable comparisons. In order to evaluate whether scores on the eight items of the two assessment tasks were significantly different across schools, an interaction analysis between item difficulty and schools was conducted. In order to examine the relative task difficulty across the two SBA task types, the difficulty estimates of the eight items were compared.
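For reference, the polytomous Rasch model that underlies this transformation can be sketched as follows. The rating scale (Andrich) parameterization shown here, with a single set of thresholds shared across the eight items, is one standard option in FACETS; it is given only as an illustration of the general form, not as the exact specification estimated in this study:

$$ \ln\!\left(\frac{P_{nik}}{P_{ni(k-1)}}\right) = \theta_n - \delta_i - \tau_k , $$

where \(P_{nik}\) is the probability that student \(n\) is awarded score \(k\) rather than \(k-1\) on item \(i\), \(\theta_n\) is the student's ability, \(\delta_i\) is the difficulty of the item (assessment domain), and \(\tau_k\) is the threshold between adjacent score categories. Because \(\theta_n\), \(\delta_i\), and \(\tau_k\) are all expressed in logits on a common interval scale, the resulting person and item estimates can legitimately be submitted to parametric procedures such as ANOVA.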
Results
Psychometric features of the assessment
The Rasch model expects unidimensionality, where scores are measures of only one latent trait. Principal component analysis (PCA) of residuals identifies any potential secondary dimension that distorts the measurement of the latent trait (Linacre 2014), and it assists the assessment of unidimensionality through the variance explained by the Rasch measures. The data are assumed to be unidimensional if the variance explained by the Rasch measures is greater than or equal to 50% (Linacre 2014). For the present study, the PCA of residuals showed that 77.5% of the variance was explained by the Rasch measures. This is an indication that the data are sufficiently unidimensional and appropriate for Rasch analysis, an attribute of construct validity (Bond and Fox 2015), and strong evidence that the scores are interpretable.
Fit statistics are also indicators of unidimensionality. Fit is assessed through the Infit and Outfit Mean Squares (MnSq). Infit MnSq is derived mainly from on-target performance scores, while Outfit MnSq is influenced more by off-target scores. Data that fit the Rasch model perfectly will yield a fit of 1, although this ideal is never achieved with actual data. A MnSq fit range between 0.60 and 1.40 indicates good adherence to the model (Bond and Fox 2015; Wright and Linacre 1994). Misfitting statistics indicate that test items may measure more than one latent trait, and results for items falling outside the acceptable range should be interpreted with caution. Table 1 shows that all items reported acceptable Infit and Outfit MnSq, with values ranging between 0.86 and 1.18. In addition to the PCA results reported earlier, the item fit statistics indicated that the data were unidimensional and that the items performed according to the Rasch model's expectations.
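The two fit statistics can be written out as follows; these are the standard definitions in the Rasch literature (see, e.g., Wright and Masters 1982) rather than anything specific to this study. With \(x_{ni}\) the observed score of student \(n\) on item \(i\), \(E_{ni}\) its expected value under the Rasch model, and \(W_{ni}\) its model variance,

$$ \text{Outfit MnSq}_i = \frac{1}{N}\sum_{n=1}^{N}\frac{(x_{ni}-E_{ni})^2}{W_{ni}}, \qquad \text{Infit MnSq}_i = \frac{\sum_{n=1}^{N}(x_{ni}-E_{ni})^2}{\sum_{n=1}^{N}W_{ni}} . $$

Because Infit weights each squared residual by its model variance, it is dominated by responses targeted near the item's difficulty, whereas Outfit weights all responses equally and is therefore more sensitive to unexpected off-target responses.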
Rasch modeling provides two separation indices that indicate whether the person and item estimates produced by the Rasch model are reliable. The person separation index indicates the replicability of person ordering, while the item separation index indicates the replicability of item placement on an interval scale (Bond and Fox 2015). The widely accepted threshold for the separation index is 3.0 (Bond and Fox 2015). The person and item separation indices for the present study were 5.07 and 4.31 (corresponding to 7.09 person strata and 6.08 item strata). These results indicate that the sample and the items are separable into 6-7 levels of ability and difficulty, respectively (Bonk and Ockey 2003), and that the person and item estimates from the Rasch analysis are reliable and replicable on interval scales (Bond and Fox 2015).
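As a point of reference, the strata values reported above follow from the separation indices through the standard conversion used in Rasch measurement,

$$ H = \frac{4G + 1}{3}, $$

so the person separation of \(G = 5.07\) gives \(H = (4 \times 5.07 + 1)/3 \approx 7.09\) person strata, and the item separation of \(G = 4.31\) gives \(H \approx 6.08\) item strata, consistent with the values reported above.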
Table 1 Item fit statistics

Entry  Name          Measure  SE    Infit MnSq  Outfit MnSq
1      disProdel      0.06    0.08  1.01        1.05
2      disComstr     -0.12    0.08  1.16        1.18
3      disVoclan      0.37    0.08  0.86        0.86
4      disIdeorg     -0.67    0.08  0.91        0.86
5      indpreProdel   0.19    0.08  0.87        0.88
6      indpreComstr   0.33    0.08  1.14        1.11
7      indpreVoclan   0.25    0.08  0.86        0.88
8      indpreIdeorg  -0.42    0.08  1.10        1.16

Notes: Entries 1-4 refer to the four performance domains (i.e., pronunciation and delivery, communication strategies, vocabulary and language patterns, ideas and organization) of the SBA group interaction task; entries 5-8 refer to the four performance domains (i.e., pronunciation and delivery, communication strategies, vocabulary and language patterns, ideas and organization) of the SBA individual presentation task. Each domain of either task is scored on a 6-point scale. See Appendixes 1 and 2.
What is the variation of SBA results across different schools in Hong Kong?
An interaction analysis between item difficulty and schools was conducted to examine the differences in students' oral English performance on the eight assessment items across the three different schools. The result of the chi-square test showed that the interaction effect was significant (χ² = 111.3, p < .05). In other words, students from different schools generally demonstrated significantly different performance on the items.
Rasch estimates are indicators of students' ability and item difficulty. A positive value indicates higher ability or higher difficulty; conversely, a negative value indicates lower ability or lower difficulty. Students from school C scored higher on the discussion task (Fig. 1), as they showed lower Rasch-calibrated item difficulties across the four group discussion task domains (items). Item 1 (disProdel) (-0.34 logits) and item 2 (disComstr) (-0.41 logits) were particularly easier for students from school C than they were for students from schools A and B: the item difficulty of item 1 for schools A and B was 0.34 and 0.42 logits, while the item difficulty of item 2 was 0.18 and 0.07 logits, respectively. The differences between schools A/B and school C were significant (p < .05). The differences in the item difficulty of item 3 (disVoclan) and item 4 (disIdeorg) between school B and school C were also significant (p < .05). It is obvious that students from schools A and B demonstrated similar abilities across the discussion items, as no significant difference in item difficulty was observed between them on any discussion task domain.

Fig. 1 Item difficulty estimates for the three schools
In general, students from school C showed poorer performance on the individual presentation task domains, especially on items 6, 7, and 8. The item difficulty of these three items for students from school C was 0.64, 0.77, and 0.03 logits, respectively.
By contrast, students from schools A and B performed significantly better on these items (p < .05); the item difficulty of these three items was -0.18, -0.36, and -0.67 logits for students from school A and -0.06, -0.03, and -0.82 logits for students from school B. Students from schools A and B demonstrated similar performance across the individual presentation task items, with no significant difference in item difficulty observed.
The pattern of relative item difficulty for students from different schools is clearer in Fig. 2. Students from schools A and B demonstrated a similar performance pattern. The items in the discussion task appeared more difficult for schools A/B than they were for school C. In contrast, the items in the individual presentation task were easier for schools A/B than they were for school C.

Fig. 2 Relative item difficulty for the three schools
How may difficulty of items (i.e., assessment domains) be reflected in students' assessment results across the two SBA task types?
Figure 3 lays out the locations of the students and the items on an interval scale. The first column is the logit scale, and the second and third columns graphically describe the locations of the students and the eight items, respectively. The fourth column is the rating scale of the items. This map places the student measures and the item measures on a common interval scale in logit units. For the present study, the logit scale runs from -10 to +9 logits. Students towards the top of the figure were higher in ability than students at the bottom. Items near the top are more difficult items, while those near the bottom are less difficult items.

Fig. 3 Person-item map on interval scale
Across the two task types, item 4 (disIdeorg) and item 8 (indpreIdeorg) appeared to be the easiest items for students (Fig. 3); the former is a group discussion item while the latter is an individual presentation item. Item 6 (indpreComstr), item 3 (disVoclan), and item 7 (indpreVoclan) emerged as the most difficult items (Fig. 3); items 6 and 7 are individual presentation items while item 3 is a group discussion item. The remaining items, 2 (disComstr), 5 (indpreProdel), and 1 (disProdel), appeared to be of relatively medium difficulty. These results suggest a more fine-grained pattern of difference in item difficulty across the two SBA task types, which was not reported in previous studies.
Discussion
Psychometric qualities of the English language school-based assessment
In the English language school-based assessment, the teacher assessor, who has received rater training organized by the HKEAA before undertaking assessment, sits nearby, assesses each participant, and assigns scores to students. Each student thus receives two independent ratings for their oral English performance in either the individual presentation or the group interaction task and is scored on pronunciation and delivery, communication strategies, vocabulary and language patterns, and ideas and organization. Raw scores for each of the assessment tasks were assigned on a scale of 0-6 on each of the four rating categories, for a total score of 24. In conducting data analysis of test datasets, the assumption of unidimensionality is perhaps one of the most discussed features of Rasch models (Bonk and Ockey 2003). In our study, the statistics reported above display adequate psychometric unidimensionality, suggesting that the English language school-based assessment tends to assess a unidimensional latent trait, i.e., oral English proficiency, as represented by ratings on the four scoring categories, thus providing evidence of construct validity. The statistics also show that the
items in the SBA had satisfactory fit statistics, indicating that all items performed in a
consistent way as expected. The person and item separation indices shown in the sum-
mary statistics are above the widely accepted threshold for the separation index, indi-
cating that the SBA can differentiate levels of proficiency. This means that the Rasch
model generated in our analysis could reliably separate examinees by ability.
Bonk and Ockey (2003) used Rasch measurement in their study of a group oral
assessment in English language at a Japanese university, in which groups of three or
four students were assessed in conversation. Examinees were scored on five scoring categories, i.e., pronunciation, fluency, grammar, vocabulary/content, and communication skills/strategies. Although items in the form of these five scoring categories were found to show acceptable fit, "communication skills" and "pronunciation" were the categories with a possible degree of misfit. Two scoring categories in our study (see Appendixes 1 and 2) also measure "communication skills" and "pronunciation," but unlike Bonk and Ockey's study, these two categories as well as the other two categories in our study all
demonstrate good fit. This means that all these assessment categories (pronunciation
and delivery, communication strategies, vocabulary and language patterns, ideas and
organization) obviously belong to the same measurement domain. This makes sense as
the focus of the school-based assessment is on the students' speaking ability to discuss
issues in depth and to convey their ideas clearly and concisely rather than
memorization skills or their ability to provide highly specific factual details about what
they have read or viewed.
Variation of SBA results across schools in Hong Kong
This study showed that students from school C demonstrated significantly poorer performance on three assessment domains (i.e., communication strategies, vocabulary and language patterns, and ideas and organization) in the individual presentation task compared with school A and school B. However, somewhat unexpectedly, students from school C scored significantly higher on two assessment domains (i.e., pronunciation and delivery, and communication strategies) in the group discussion task than students from school A or school B, given the fact that school C is a government-funded band 3 school. At the time of this study, school C was struggling hard to improve its teaching quality and discipline among the students. There are two possible interpretations of school C's higher performance on those two assessment domains. First, as a typical practice in many government-funded band 3 schools in Hong Kong, such schools tend to designate a couple of classes from each grade as "elite classes." Such "elite" classes usually have the privilege of access to the best teaching and learning resources in the school. For example, these classes are usually taught by the best English teachers in the school and may also participate in extra-curricular English learning tutorials offered by native English-speaking teachers in the school. In this study, there was the likelihood that a considerable proportion of elite class students from school C participated. Second, there was the possibility that some teacher assessors from school C might have been lenient in assessing their students' oral performance in some assessment domains in the group discussion task in the SBA. Overall, this study indicates that students from school C
were likely to demonstrate relatively unstable language performance in the English
language SBA in Hong Kong. This implies that, in spite of the school-level teachers' participatory sharing in the SBA standardization processes, there is still the likelihood of variance in how harsh or lenient teachers are in their judgments of students' performance in different assessment domains in school C. This study thus
points to the need for HKEAA to adopt a statistical moderation method in moder-
ating the SBA assessment results submitted by schools to ensure comparability of
SBA scores across schools.
Two of the three schools involved in this study, schools A and B, are Catholic direct-subsidy schools that use English as the medium to teach the core school subjects and have been ranked as band 2 schools in Hong Kong. This study shows that students from these two Catholic direct-subsidy schools demonstrated no statistically significant differences in their school-based English language assessment results, suggesting that teachers' assessment scores appeared to be comparable across these two schools. In other words, teacher judgments of student performance from these two band 2 schools on the two English language SBA tasks tend to be consistent. Such potentially reliable judgment of students' performance on the SBA might have to do with a range of standardization procedures within and across schools that enable teachers to meet together, look at, listen to, and discuss student oral samples and the tasks students have done, and talk about why they think a sample is at a particular level on each domain. These procedures thus likely constitute important processes that contribute to shared understanding and common ground among the English teachers involved in the SBA.
Difference in difficulty of different assessment domains across the two SBA task types
The notion of "task-induced variation" (Ellis 1994) means that the particular type of task that a learner is asked to perform will result in variation (Rahimpour 2007). This is echoed by Tarone (1990), who argues that as second language learners perform different tasks, their production of some grammatical, morphological, and phonological forms will vary in a particular manner. Gan (2013) examined how learner L2 oral performance may vary across two different task types in the current school-based assessment in Hong Kong by analyzing both the discourse produced from the tasks and the teacher rater assessments of students' task performance. Gan's study revealed a general trend towards higher assessment scores on most of the assessment domains in the individual presentation task than in the group discussion task. It needs to be pointed out that only 30 students' assessment performance from one particular secondary school in Hong Kong was analyzed in the Gan study. With the help of Rasch analysis, the present study examined the teacher rater assessments of 373 students across three different schools and revealed a markedly more fine-grained pattern of difference in the difficulty levels of different assessment domains in students' assessment performance across the two SBA task types. Item 4 (disIdeorg) of the group discussion task and item 8 (indpreIdeorg) of the individual presentation task appeared to be the easiest task domains for students. This result could be associated with the possibility that, while assessing student oral performance, the teacher rater was likely to attend more to the grammatical, lexical, or phonological features of the test candidates' language use than to
organization of their ideas. This appears to corroborate the result that item 3 (disVoclan) of the group discussion task and item 7 (indpreVoclan) of the individual presentation task emerged as the most difficult items, as these items represent performance domains on which the teacher assessor was more likely to base their decisions (Gan 2012). Note that item 6 (indpreComstr) was also one of the most difficult items. This might be due to the possibility that the condition under which the learner performed the individual presentation task resulted in the learner concentrating on the accuracy and fluency of their language production but overlooking the use of interactional skills. Consequently, these results show that different aspects of the two SBA tasks may have different strengths in measuring students' speaking proficiency in the school-based assessment context. In other words, the result provides evidence that the two SBA task types could be used to complement each other in measuring the same construct of oral language proficiency that they claim to measure. In the past decades, there has been anxiety in the testing literature among educators and researchers about the reliability of the group oral discussion format. The results of this study lead us to concur with Bonk and Ockey that the group oral may also be a reasonably solid basis upon which to make a valid overall decision about students' L2 oral ability.
Conclusions
This study was motivated by the concern in both research and practice that
teachers from different schools might not be able to provide comparable results,
given teachers' necessarily subjective judgments and interpretations of assessment data. We were thus interested in examining the extent to which teachers' assessment results from three different schools were comparable. The results suggest that assessment results from the two band 2 schools appeared generally comparable, as there was no significant difference in students' SBA results in most assessment domains across the two schools. Teachers' assessment scores of students from the band 3
school in this study could be less stable occasionally as students from this school
scored significantly lower on some assessment domains but significantly higher on
some other domains compared with the two band 2 schools.
Overall, the finding that students from two schools of a similar banding level demonstrated similar performance on the two assessment task types provides empirical support for the reliability and fairness of the SBA as a component in the public assessment of the English language subject at the secondary level in Hong Kong. Meanwhile, the possibility that teacher raters' leniency might have led to higher scores in some domains of the group discussion task in the band 3 school in this study provides justification for the need for the HKEAA to adopt a statistical moderation method in moderating the SBA assessment results submitted by schools to ensure the comparability of SBA scores across schools. Finally, the observation of a markedly more fine-grained pattern of difference in the difficulty levels of different assessment domains in students' assessment results across the two task types adds to our understanding of the role of different task types in oral assessment in the classroom assessment context. The generalizability of the specific results of this study, however, could be limited by the small sample of schools involved. Future studies should use a more representative sample of schools selected from a variety of geographic areas across the region.
Appendix 1

Table 2 SBA assessment criteria for group interaction (GI)
I. Pronunciation and delivery II. Communication strategies III. Vocabulary and language patterns IV. Ideas and organization
6 Can project the voice appropriately for the
context without artificial aids. Can pronounce
all sounds/sound clusters and words clearly
and accurately. Can speak fluently and
naturally, with very little hesitation, while
using suitable intonation to enhance
communication.
Can use appropriate body language to display and
encourage interest. Can use a full range of
turn-taking strategies to initiate and maintain ap
propriate interaction, and can draw others into the
interaction (e.g., by summarizing for weaker
students' benefit or by redirecting a conversation
to a quiet student). Can interact without the use of
narrowly formulaic expressions.
Can use a wide range of accurate and
appropriate vocabulary. Can use varied,
appropriate, and highly accurate
language patterns; minor slips do not
impede communication. Can self-correct
effectively. May occasionally glance at
notes but is clearly not dependent on
them.
Can express a wide range of relevant
information and ideas without any signs of
difficulty and without the use of notes. Can
consistently respond effectively to others,
sustaining and extending a conversational
exchange. Can use the full range of
questioning and response levels (see
Framework of Guiding Questions) to engage
with peers.
5 Can project the voice appropriately for the
context without artificial aids. Can pronounce
all sounds/sound clusters clearly and almost
all words accurately. Can speak fluently using
intonation to enhance communication, with
only occasional hesitation, giving an overall
sense of natural non-native language.
Can use appropriate body language to display and
encourage interest. Can use a good range of
turn-taking strategies to initiate and maintain
appropriate interaction and can help draw others
into the interaction (e.g., by encouraging
contributions, asking for opinions, or by responding
to group members' questions). Can mostly interact
without the use of narrowly formulaic expressions.
Can use varied and almost always
appropriate vocabulary. Can use almost
entirely accurate and appropriate
language patterns. Can usually self-
correct effectively. May occasionally refer
to a note card.
Can express relevant information and ideas
clearly and fluently, perhaps with occasional,
unobtrusive, reference to a notecard. Can
respond appropriately to others to sustain
and extend a conversational exchange. Can
use a good variety of questioning and
response levels (see Framework of Guiding
Questions).
4 Can project the voice mostly satisfactorily
without artificial aids. Can pronounce most
sounds/sound clusters and all common
words clearly and accurately; less common
words can be understood although there
may be articulation errors (e.g., dropping
final consonants). Can speak at a deliberate
pace, with some hesitation but using
sufficient intonation conventions to convey
meaning.
Can use some features of appropriate body
language to encourage and display interest. Can
use a range of appropriate turn-taking strategies to
participate in interaction (e.g., by making
suggestions in a group discussion), and can
sometimes help draw others in (e.g., by asking for
their views). Can interact using a mixture of mainly
natural language and formulaic expressions.
Can use mostly appropriate vocabulary.
Can use language patterns that are
usually accurate and without errors that
impede communication. Can self-correct
when concentrating carefully or when
asked to do so. May refer to a note card
but is not dependent on notes.
Can present relevant literal ideas clearly in a
well-organized structure, perhaps with
occasional reference to a notecard. Can often
respond appropriately to others; can sustain
and may extend some conversational
exchanges. However, can do these things less
well when attempting to respond to
interpretive or critical questions, or when
trying to interpret information and present
elaborated ideas.
Level 3
I. Pronunciation and delivery: Volume may be a problem without artificial aids. Can pronounce all simple sounds clearly but some errors with sound clusters; less common words may be misunderstood unless supported by contextual meaning. Can speak at a careful pace and use sufficient basic intonation conventions to be understood by a familiar and supportive listener; hesitation is present.
II. Communication strategies: Can use appropriate body language to display interest in the interaction. Can use appropriate but simple turn-taking strategies to participate in, and occasionally initiate, interaction (e.g., by requesting repetition and clarification or by offering agreement). Can use mainly formulaic expressions as communication strategies.
III. Vocabulary and language patterns: Can use simple vocabulary and language patterns appropriately and with errors that only occasionally impede communication. Can sometimes self-correct simple errors. May suggest a level of proficiency above 3 but has provided too limited a sample, or cannot be scored accurately because of dependence on notes.
IV. Ideas and organization: Can present some relevant ideas sequentially with some links among own ideas and with those presented by others. Can respond to some simple questions and may be able to expand these responses when addressed directly.
Level 2
I. Pronunciation and delivery: Volume may be a problem without artificial aids. Can pronounce simple sounds/sound clusters well enough to be understood most of the time; common words can usually be understood within overall context. Can produce familiar stretches of language with sufficiently appropriate pacing and intonation to help listeners' understanding.
II. Communication strategies: Can use appropriate body language when especially interested in the group discussion or when prompted to respond by a group member. Can use simple but heavily formulaic expressions to respond to others (e.g., by offering greetings or apologies).
III. Vocabulary and language patterns: Can appropriately use vocabulary drawn from a limited and very familiar range. Can use some very basic language patterns accurately in brief exchanges. Can identify some errors but may be unable to self-correct. Provides a limited language sample or a sample wholly spoken from notes.
IV. Ideas and organization: Can express some simple relevant information and ideas, sometimes successfully, and may expand some responses briefly. Can make some contribution to a conversation when prompted.
Level 1
I. Pronunciation and delivery: Volume is likely to be a problem. Can pronounce some simple sounds and common words accurately enough to be understood. Can use appropriate intonation in the most familiar of words and phrases; hesitant speech makes the listener's task difficult.
II. Communication strategies: Can use restricted features of body language when required to respond to peers. Can use only simple and narrowly restricted formulaic expressions and only to respond to others.
III. Vocabulary and language patterns: Can produce a narrow range of simple vocabulary. Can use a narrow range of language patterns in very short and rehearsed utterances. The language sample is too limited for a full assessment of proficiency.
IV. Ideas and organization: Can occasionally produce brief information and ideas relevant to the topic. Can make some brief responses or statements when prompted.
Level 0
I. Pronunciation and delivery: Does not produce any comprehensible English speech.
II. Communication strategies: Does not use any interactional strategies.
III. Vocabulary and language patterns: Does not produce any recognizable words or sequences.
IV. Ideas and organization: Does not produce any appropriate, relevant material.
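Each GI performance is rated separately on the four domains above, each on the 0-6 scale. Purely as an illustrative sketch, and not part of the HKEAA rubric or the study's procedures, the following Python snippet shows one way such domain ratings might be recorded and range-checked; the function name, the record layout, and the unweighted total are assumptions introduced here for illustration.

```python
# Illustrative only: recording a student's group interaction (GI) ratings
# on the four SBA domains, each scored 0-6 as in the rubric above.
# The record layout and the simple total are hypothetical.

GI_DOMAINS = (
    "Pronunciation and delivery",
    "Communication strategies",
    "Vocabulary and language patterns",
    "Ideas and organization",
)

def record_gi_scores(student_id: str, scores: dict) -> dict:
    """Validate four 0-6 domain scores and return a GI score record."""
    for domain in GI_DOMAINS:
        value = scores.get(domain)
        if value is None:
            raise ValueError(f"Missing score for domain: {domain}")
        if not 0 <= value <= 6:
            raise ValueError(f"Score for {domain} must be on the 0-6 scale")
    # An unweighted total out of 24; actual SBA score reporting may differ.
    return {"student": student_id, "task": "GI", **scores,
            "total": sum(scores[d] for d in GI_DOMAINS)}

example = record_gi_scores("S001", {
    "Pronunciation and delivery": 4,
    "Communication strategies": 5,
    "Vocabulary and language patterns": 4,
    "Ideas and organization": 4,
})
print(example["total"])  # 17
```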
Appendix 2
Table 3 SBA assessment criteria for individual presentation (IP)
Level 6
I. Pronunciation and delivery: Can project the voice appropriately for the context without artificial aids. Can pronounce all sounds/sound clusters and words clearly and accurately. Can speak fluently and naturally, with very little hesitation, while using suitable intonation to enhance communication.
II. Communication strategies: Can use appropriate body language to show focus on audience and to engage interest. Can judge timing in order to complete the presentation. Can confidently invite and respond to questions if this is required by the task.
III. Vocabulary and language patterns: Can use a wide range of accurate and appropriate vocabulary. Can use varied, appropriate and highly accurate language patterns; minor slips do not impede communication. Can choose appropriate content and level of language to enable audience to follow. Can self-correct effectively. Can present without use of notes, but may glance at a note card occasionally.
IV. Ideas and organization: Can convey relevant information and ideas clearly and fluently without referring to notes. Can elaborate in detail on some appropriate aspects of the topic, and can consistently link main points with support and development. Can be followed easily and with interest. Can reformulate a point if the audience is unclear.
Level 5
I. Pronunciation and delivery: Can project the voice appropriately for the context without artificial aids. Can pronounce all sounds/sound clusters clearly and almost all words accurately. Can speak fluently using intonation to enhance communication, with only occasional hesitation, giving an overall sense of natural non-native language.
II. Communication strategies: Can use appropriate body language to show focus on audience and to engage interest. Can judge timing sufficiently to cover all essential points of the topic. Can appropriately invite and respond to questions or comments when required for the task.
III. Vocabulary and language patterns: Can use varied and almost always appropriate vocabulary. Can use almost entirely accurate and appropriate language patterns. Can choose content and level of language that the audience can follow, with little or no dependence on notes. Can usually self-correct effectively. May occasionally refer to a note card.
IV. Ideas and organization: Can convey relevant information and ideas clearly and well, perhaps with occasional, unobtrusive, reference to a note card. Can elaborate on some appropriate aspects of the topic, and can link main points with support and development. Can be followed easily. Can explain a point if the audience is unclear.
Level 4
I. Pronunciation and delivery: Can project the voice mostly satisfactorily without artificial aids. Can pronounce most sounds/sound clusters and all common words clearly and accurately; less common words can be understood although there may be articulation errors (e.g., dropping final consonants). Can speak at a deliberate pace, with some hesitation but using sufficient intonation conventions to convey meaning.
II. Communication strategies: Can use appropriate body language to display audience awareness and to engage interest, but this is not consistently demonstrated. Can use the available time to adequately cover all the most essential points of the topic. Can respond to any well-formulated questions if these are required by and directly related to the task.
III. Vocabulary and language patterns: Can use mostly appropriate vocabulary. Can use language patterns that are usually accurate and without errors that impede communication. Can choose mostly appropriate content and level of language to enable audience to follow. Can self-correct when concentrating carefully or when asked to do so. May refer to a note card but is not dependent on notes.
IV. Ideas and organization: Can present relevant literal ideas clearly in a well-organized structure, perhaps with occasional reference to a note card. Can expand on some appropriate aspects of the topic with additional detail or explanation, and can sometimes link these main points and expansions together effectively. Can be followed without much effort.
Level 3
I. Pronunciation and delivery: Volume may be a problem without artificial aids. Can pronounce all simple sounds clearly but some errors with sound clusters; less common words may be misunderstood unless supported by contextual meaning. Can speak at a careful pace and use sufficient basic intonation conventions to be understood by a familiar and supportive listener; hesitation is present.
II. Communication strategies: Can use some appropriate body language, displaying occasional audience awareness and providing some degree of interest. Can present basic relevant points but has difficulty sustaining a presentation mode. Can respond to any relevant, cognitively simple, well-formulated questions required by the task.
III. Vocabulary and language patterns: Can use simple vocabulary and language patterns appropriately and with errors that only occasionally impede communication, but reliance on memorized materials or written notes makes language and vocabulary use seem more like written text spoken aloud. Can choose a level of content and language that enables audience to follow a main point, but needs to refer to notes. Can sometimes self-correct simple errors, may suggest a level of proficiency above 3, but cannot be scored accurately because of dependence on notes.
IV. Ideas and organization: Can present some relevant literal ideas clearly, and can sometimes provide some simple supporting ideas. Can sometimes link main and supporting points together. May appear dependent on notes.
Level 2
I. Pronunciation and delivery: Volume may be a problem without artificial aids. Can pronounce simple sounds/sound clusters well enough to be understood most of the time; common words can usually be understood within overall context. Can produce familiar stretches of language with sufficiently appropriate pacing and intonation to help listeners' understanding.
II. Communication strategies: Can use a restricted range of features of body language, but the overall impression is stilted. Can present very basic points but does not demonstrate use of a presentation mode and is dependent on notes. Audience awareness is very limited.
III. Vocabulary and language patterns: Can appropriately use vocabulary and language patterns drawn from a limited and very familiar range. Can read notes aloud but with difficulty. Can identify some errors but may be unable to self-correct. Provides a limited language sample or a sample wholly spoken from notes.
IV. Ideas and organization: Can make an attempt to express simple relevant information and ideas, sometimes successfully, and can attempt to expand on one or two points. Can link the key information sequentially. May be dependent on notes.
Level 1
I. Pronunciation and delivery: Volume is likely to be a problem. Can pronounce some simple sounds and common words accurately enough to be understood. Can use appropriate intonation in the most familiar of words and phrases; hesitant speech makes the listener's task difficult.
II. Communication strategies: Body language may be intermittently present, but communication strategies appropriate to delivering a presentation are absent. There is no evident audience awareness.
III. Vocabulary and language patterns: Can produce a narrow range of simple vocabulary. Can use a narrow range of language patterns in very short and rehearsed utterances. Insufficient sample to assess vocabulary and language patterns.
IV. Ideas and organization: Can express a main point or make a brief statement when prompted, in a way that is partially understandable. The presentation is wholly dependent on notes or a written text.
Level 0
I. Pronunciation and delivery: Does not produce any comprehensible English speech.
II. Communication strategies: Does not attempt a presentation.
III. Vocabulary and language patterns: Does not produce any recognizable words or sequences.
IV. Ideas and organization: Does not express any relevant or understandable information.
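Since every student receives ratings on the same four domains for both task types (GI in Appendix 1, IP in Appendix 2), analyses of the kind reported in this study generally require the ratings to be rearranged into one observation per student, task, and domain. The sketch below is an assumed illustration rather than the authors' actual data-preparation code: the variable names, the sample scores, and the CSV layout are all hypothetical.

```python
# Illustrative only: reshaping GI and IP domain ratings (0-6 each) into
# long-format records, one row per student x task x domain, the kind of
# layout commonly prepared before a many-facet analysis.

import csv

DOMAINS = [
    "Pronunciation and delivery",
    "Communication strategies",
    "Vocabulary and language patterns",
    "Ideas and organization",
]

# Hypothetical raw ratings keyed by (student, task type), listed in DOMAINS order.
ratings = {
    ("S001", "GI"): [4, 5, 4, 4],
    ("S001", "IP"): [5, 4, 4, 5],
    ("S002", "GI"): [3, 3, 2, 3],
    ("S002", "IP"): [3, 2, 3, 3],
}

def to_long_format(raw: dict) -> list:
    """Flatten the nested ratings into long-format row dictionaries."""
    rows = []
    for (student, task), scores in raw.items():
        for domain, score in zip(DOMAINS, scores):
            rows.append({"student": student, "task": task,
                         "domain": domain, "score": score})
    return rows

# Write the reshaped records to a CSV file (file name is hypothetical).
with open("sba_long.csv", "w", newline="") as handle:
    writer = csv.DictWriter(handle, fieldnames=["student", "task", "domain", "score"])
    writer.writeheader()
    writer.writerows(to_long_format(ratings))
```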
Acknowledgements
We would like to thank Professor Antony John Kunnan for his helpful comments on an earlier version of the paper.
Authors' contributions
All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Author details
1 Faculty of Education, University of Macau, Macao, People's Republic of China. 2 Faculty of Arts and Social Sciences, UNSW, Sydney, NSW 2052, Australia.
Received: 24 July 2017 Accepted: 2 November 2017
References
American Educational Research Association (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
Bachman, L, Lynch, B, Mason, M. (1995). Investigating variability in tasks and rater judgements in a performance test of foreign language speaking. Language Testing, 12(2), 238–257.
Bachman, LF, & Palmer, AS (1996). Language testing in practice: Designing and developing useful language tests. Oxford: Oxford University Press.
Bond, TG, & Fox, CM (2015). Applying the Rasch model (3rd ed.). New York and London: Routledge, Taylor & Francis Group.
Boone, WJ, Staver, JR, & Yale, MS (2014). Rasch analysis in the human sciences. Dordrecht: Springer.
Bonk, WJ, & Ockey, GJ. (2003). A many-facet Rasch analysis of the second language group oral discussion task. Language Testing, 20(1), 89–110.
Bygate, M. (1999). Quality of language and purpose of task: patterns of learners' language on two oral communication tasks. Language Teaching Research, 3(2), 185–214.
Carless, DR, & Harfitt, G (2013). Innovation in secondary education: A case of curriculum reform in Hong Kong. In K Hyland, LC Wong (Eds.), Innovation and change in English language education (pp. 172–185). London: Routledge.
Chiedu, RE, & Omenogor, HD. (2014). The concept of reliability in language testing: issues and solutions. Journal of Resourcefulness and Distinction, 8(1), 1–9.
Choi, CC. (1999). Public examinations in Hong Kong. Assessment in Education, 6(3), 405–418.
Clapham, C. (2000). Assessment and testing. Annual Review of Applied Linguistics, 20, 147–161.
Crooks, TJ. (1988). The impact of classroom evaluation practices on students. Review of Educational Research, 58, 438–481.
Davison, C. (2007). Views from the chalk face: school-based assessment in Hong Kong. Language Assessment Quarterly, 4(1), 37–68.
Davison, C, Hamp-Lyons, L, Leung, W, Gan, Z, Poon, C, Fung, V (2010). Longitudinal study on the schools-based assessment component of the 2007 Hong Kong Certificate of Education (HKCE) English Language Examination. Hong Kong: The Hong Kong Examinations and Assessment Authority.
Ellis, R (1994). The study of second language acquisition. Oxford: Oxford University Press.
Foster, P, & Skehan, P. (1996). The influence of planning time and task type on second language performance. Studies in Second Language Acquisition, 18, 299–323.
Fulcher, G. (1996). Testing tasks: issues in task design and the group oral. Language Testing, 13, 23–51.
Gan, Z. (2012). Complexity measures, task type, and analytic evaluations of speaking proficiency in a school-based assessment context. Language Assessment Quarterly, 9(2), 133–151.
Gan, Z. (2013). Task type and linguistic performance in school-based assessment situation. Linguistics and Education, 24, 535–544.
Harlen, W. (2005). Trusting teachers' judgement: research evidence of the reliability and validity of teachers' assessment used for summative purposes. Research Papers in Education, 20(3), 245–270.
Hong Kong Examination and Assessment Authority (HKEAA). (2016). English language school-based assessment teachers' handbook.
Kane, M. (2010). Validity and fairness. Language Testing, 27, 177–182.
Linacre, JM (2014). WINSTEPS (Version 3.81.0) [Computer software]. Chicago: winsteps.com. Retrieved 17 November 2016.
Linacre, JM (2017). Facets computer program for many-facet Rasch measurement (Version 3.80.0) [Computer software]. Beaverton, Oregon: Winsteps.com. Retrieved 20 October 2017.
Liu, X (2010). Using and developing measurement instruments in science education: a Rasch modeling approach. Charlotte: Information Age Publishing.
McNamara, T (1996). Measuring second language performance. London & New York: Longman.
Meisels, SJ, Bickel, DD, Nicholson, J, Xue, Y, Atkins-Burnett, S. (2001). Trusting teachers' judgments: a validity study of a curriculum-embedded performance assessment in kindergarten to grade 3. American Educational Research Journal, 38(1), 73–95.
Michel, MC, Kuiken, F, Vedder, I. (2007). The influence of complexity in monologic versus dialogic tasks in Dutch L2. International Review of Applied Linguistics in Language Teaching, 45(2), 241–259.
Price, M, O'Donovan, B, Rust, C. (2007). Putting a social-constructivist assessment process model into practice: building the feedback loop into the assessment process through peer review. Innovations in Education and Teaching International, 44(2), 143–152.
Qian, DD. (2014). School-based English language assessment as a high-stakes examination component in Hong Kong: insights of frontline assessors. Assessment in Education: Principles, Policy & Practice, 21(3), 251–270.
Rasch, G (1980). Probabilistic models for some intelligence and attainment tests. Chicago: The University of Chicago Press.
Rahimpour, M. (2007). Task complexity and variation in L2 learners' oral discourse. The University of Queensland Working Papers in Linguistics, 1, 19.
Robinson, P. (2007). Task complexity, theory of mind, and intentional reasoning: effects on L2 speech production, interaction, uptake, and perceptions of task difficulty. International Review of Applied Linguistics, 45(3), 193–213.
Robinson, P, Cadierno, T, Shirai, Y. (2009). Time and motion: measuring the effects of the conceptual demands of tasks on second language speech production. Applied Linguistics, 30(4), 533–544.
Sadler, DR. (2006). Formative assessment: revisiting the territory. Assessment in Education: Principles, Policy & Practice, 5(1), 77–84.
Skehan, P. (2009). Modelling second language performance: integrating complexity, accuracy, fluency, and lexis. Applied Linguistics, 30(4), 510–532.
Skehan, P (Ed.) (2014). Processing perspectives on task performance. London: John Benjamins.
Skehan, P, & Foster, P. (1997). Task type and task processing conditions as influences on foreign language performance. Language Teaching Research, 1(3), 185–211.
Tarone, E. (1990). On variation in interlanguage: a response to Gregg. Applied Linguistics, 11, 392–400.
Vygotsky, L (1978). Mind in society: the development of higher psychological processes. Cambridge: Harvard University Press.
Wright, BD (1999). Fundamental measurement for psychology. In SE Embretson, SL Hershberger (Eds.), The new rules of measurement: what every educator and psychologist should know (pp. 65–104). Hillsdale: Lawrence Erlbaum.
Wright, BD, & Linacre, JM. (1994). Reasonable mean-square fit values. Rasch Measurement Transactions, 8, 370.
Wright, BD, & Masters, GN (1982). Rating scale analysis. Chicago: Mesa Press.
Wyatt-Smith, C, Klenowski, V, Gunn, S. (2010). The centrality of teachers' judgement practice in assessment: a study of standards in moderation. Assessment in Education: Principles, Policy & Practice, 17(1), 59–75.
Yip, DY, & Cheung, D. (2005). Teachers' concerns on school-based assessment of practical work. Journal of Biological Education, 39(4), 156–162.