PROFESSIONAL AND PERSONAL DEVELOPMENT FOR POSTGRADUATES
Experimental Design for Health Sciences: Questionnaire
Development & Validation for Health Science Studies
ASSOCIATE PROFESSOR DR MUHAMAD SAIFUL BAHRI BIN YUSOFF
TABLE OF CONTENTS

OBJECTIVES OF THE COURSE
LEARNING OUTCOMES
SYNOPSIS
VALIDITY CONCEPT
QUESTIONNAIRE DEVELOPMENT
   Goal Setting
   Defining Factors
   Generating and Writing Items
   Selecting Response Format
QUESTIONNAIRE VALIDATION
   Parameters for Selecting Items
   Content Validity Index
   Face Validity Index
   Exploratory Factor Analysis
   Reliability Analysis
   Reporting EFA & Reliability Analysis
CONCLUSION
REFERENCES
OBJECTIVES OF THE COURSE
At the end of this course, learners are expected to:
Describe the concepts of validity
Describe the sources of evidence to support validity
Describe the key steps to develop a questionnaire systematically
Develop a questionnaire for research purposes
Perform content and face validation systematically
Calculate content and face validity index
Perform exploratory factor analysis
Perform reliability analysis
Report the results of exploratory factor analysis and reliability analysis
LEARNING OUTCOMES
The learners should be able to develop and validate questionnaires for research purposes through a systematic process and based on best practice.
SYNOPSIS
Questionnaires are widely used for survey-based research in medical, social, economic, psychological, and behavioural research. Because questionnaires are important tools that determine the scientific merit of questionnaire-based research, this module describes a step-by-step approach to systematically develop and validate questionnaires for research. The author outlines the key steps for developing and validating questionnaires based on best practice and the author's research experience in this area.
VALIDITY CONCEPT
Validity refers to the degree to which evidence and theory support the interpretations of test scores
entailed by the proposed uses of tests. In other words, validity describes how well one can legitimately
trust the results of a test as interpreted for a specific purpose. There are five sources of validity evidence
to support the construct validity: content (do instrument items completely represent the construct?),
response process (the relationship between the intended construct and the thought processes of subjects
or observers), internal structure (acceptable reliability and factor structure), relations to other variables
(correlation with scores from another instrument assessing the same construct), and consequences (do
scores really make a difference?)1. These are not different types of validity but rather they are categories
of evidence that can be collected to support the construct validity of inferences made from instrument
scores.
Content validity is defined as the degree to which the elements of an assessment instrument are relevant to and representative of the targeted construct for a particular assessment purpose1,2. The assessment purpose refers to the expected functions of the measurement tool; for example, the Medical Student Stressor Questionnaire (MSSQ) was developed to identify the sources of stress in medical students3, and the Anatomy Education Environment Measurement Inventory (AEEMI) was developed to measure the anatomy educational environment in medical schools4. The relevance of an assessment tool refers to the appropriateness of its elements for the targeted constructs and functions of assessment, while the representativeness of an assessment tool refers to the degree to which its elements are proportional to the facets of the targeted construct2. Of the two aspects of content validity (i.e., relevance and representativeness), relevance has been the more frequently used to measure content validity5-7. It is important to note that establishing content validity is vital to support the validity of an assessment tool such as a questionnaire, especially for research purposes. Haynes et al (1995) emphasized, "Inferences from assessment instruments with unsatisfactory content validity will be suspect, even when other indices of validity are satisfactory." Content validity evidence can be represented by the content validity index (CVI)5-8; for instance, several recent studies4,9-11 established content validity using the CVI to support the validity of an assessment tool.
In 1947, Mosier analysed various definitions of the face validity concept13. Commonly, response process validity evidence is collected after content validity has been established14,15. Response process validity is also known as face validity, which refers to the degree to which test respondents view the content of a test and its items as relevant to the context in which the test is being administered16. Similarly, other researchers define face validity as the degree to which raters judge the items of an assessment instrument to be appropriate to the targeted construct and assessment objectives17,18. The raters' understanding and interpretation of the items will determine the accuracy with which an assessment tool measures the targeted construct. People with similar backgrounds rate the face validity of a test similarly, and they rate the face validity of different tests differently18. Because of the many concerns about the face validity concept, Cook and Beckman (2006) avoided the term face validity; instead, they use response process evidence of validity to reflect the thought processes of users of the tested assessment as they respond to the tool1,4, and it can be quantified by the face validity index (FVI)4,9-11. These are commonly evaluated by asking raters about the clarity and comprehensibility of the instructions and language used in the assessment tool1,4. Clarity of the instructions and language refers to whether there are ambiguities or multiple ways to interpret the items, whereas comprehensibility refers to whether the words and sentences of the constructed items can be understood easily by raters. It is important to establish response process validity to support the overall validity of an assessment tool such as a questionnaire, especially for research purposes. Response process validity can be represented by the FVI, and several studies4,9-11 have calculated it to support the validity of an assessment tool.
Internal structure (acceptable reliability and factor structure). Reliability and factor analysis data are generally considered evidence of internal structure. Scores intended to measure a single construct should yield homogeneous results, whereas scores intended to measure multiple constructs should demonstrate heterogeneous responses in a pattern predicted by the constructs. Reliability refers to the reproducibility or consistency of scores from one assessment to another. Reliability is a necessary, but not sufficient, component of validity. Reproducibility over time (test-retest), between different versions of an instrument (parallel forms), and between raters (inter-rater) are other measures of reliability. Reliability is usually reported as a coefficient ranging from 0 to 1. A value of 0 represents no correlation (all error), whereas 1 represents perfect correlation (all variance attributable to subjects). Acceptable values vary according to the purpose of the instrument. For high-stakes settings (e.g., a licensure examination) reliability should be greater than 0.9, whereas for less important situations values of 0.8 or 0.7 may be acceptable. Note that the interpretation of reliability coefficients differs from the interpretation of correlation coefficients in other applications, where a value of 0.6 would often be considered quite high. Low reliability can be improved by increasing the number of items or observers and (in education settings) using items of medium difficulty; the sketch below quantifies the effect of adding items. Factor analysis is used to investigate relationships between items in an instrument and the constructs they are intended to measure. Factor analysis can determine whether the items intended to measure a given construct actually "cluster" together into "factors" as expected. Items that "load" on more than one factor, or on unexpected factors, may not be measuring their intended constructs.
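As a worked example of the point about adding items, the standard Spearman-Brown prophecy formula predicts the reliability of a test lengthened k-fold; a minimal Python sketch with illustrative numbers, not figures from this module:

```python
def spearman_brown(reliability: float, k: float) -> float:
    """Predicted reliability when a test is lengthened k-fold
    (Spearman-Brown prophecy formula)."""
    return k * reliability / (1 + (k - 1) * reliability)

# Illustrative: doubling the length of a scale with reliability 0.70
print(round(spearman_brown(0.70, k=2.0), 2))  # 0.82
```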
Relations to other variables (correlation with scores from another instrument assessing the same
construct). Correlation with scores from another instrument or outcome for which correlation would be
expected, or lack of correlation where it would not, supports interpretation consistent with the underlying
construct. For example, correlation between scores from a questionnaire designed to assess the severity
of benign prostatic hypertrophy and the incidence of acute urinary retention would support the validity of
the intended inferences. For a quality of life assessment, score differences among patients with varying
health states would support validity.
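Relations-to-other-variables evidence usually comes down to a simple correlation between scores from the new tool and an established measure of the same construct; a minimal Python sketch with hypothetical score arrays:

```python
from scipy.stats import pearsonr

# Hypothetical total scores from the new questionnaire and from an
# established instrument measuring the same construct.
new_scores = [12, 18, 25, 31, 22, 15, 28, 35, 19, 24]
established = [10, 20, 24, 33, 20, 14, 30, 36, 17, 26]

r, p = pearsonr(new_scores, established)
print(f"r = {r:.2f}, p = {p:.3f}")  # a high r supports convergent validity
```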
Consequences (do scores really make a difference?). Evaluating intended or unintended consequences
of an assessment can reveal previously unnoticed sources of invalidity. For example, if a teaching
assessment shows that male instructors are consistently rated lower than females it could represent a
source of unexpected bias. It could also mean that males are less effective teachers. Evidence of
consequences thus requires a link relating the observations back to the original construct before it can
truly be said to influence the validity of inferences. Another way to assess evidence of consequences is
to explore whether desired results have been achieved and unintended effects avoided. Finally, the
method used to determine score thresholds (e.g., pass/fail cut scores or classification of symptom severity
as low, moderate, or high) also falls under this category.
Figure 1: The five sources of evidence to support the construct validity
Finally, when developing questionnaires, careful attention should be given to each category of validity
evidence in turn as illustrated in Figure 1.
QUESTIONNAIRE DEVELOPMENT
Goal Setting
The first step is to set clear aims and goals for developing a questionnaire19. The following questions help researchers to do so; without clear answers to them, the measure may not be useful:
What precisely will this questionnaire measure? For example, the Medical Student Stressor Questionnaire (MSSQ) was developed to measure sources of stress among medical students3, and the Anatomy Education Environment Measurement Inventory (AEEMI) was developed to measure the learning experiences that influence medical students' motivation to learn anatomy, thus affecting their attitudes, values, and behaviours towards anatomy-related learning tasks4. It is essential to have a clear end in mind about the attributes (i.e., the concepts, characteristics, or features of someone or something) to be measured before developing a questionnaire.
Who is the intended target group? Knowing the exact target group (i.e., the respondents) who will respond to the questionnaire is important to ensure its validity; for example, the intended target group for the MSSQ and AEEMI was medical students.
Why does it need to be developed? Defining a clear reason for developing a questionnaire for research is critical to ensure its validity. For instance, many instruments measure stress levels, but none was developed to measure the sources of stress among medical students; therefore, it was important to develop an instrument (the MSSQ) that specifically measures the sources of stress among medical students. It is also important that researchers do not reinvent the wheel if a similar tool has already been developed and validated by other researchers; otherwise, it will be a waste of time and resources.
How will it contribute to practice in the field? Stating clearly the expected contributions of the questionnaire is important to ensure its relevance to current practice in the field and that it will not become a "reinvention of the wheel". For instance, the contribution of the MSSQ's development was to become a universal tool for identifying sources of stress among medical students and to encourage medical educators around the globe to evaluate the potential sources of stress among their students, so that early interventions could be planned to alleviate the stressors.
Defining Factors
Figure 2: The basic structure of factors and observed variables (items)
After clearly defining the purpose of the questionnaire to be developed, it is essential to define the factors it will measure19. Figure 2 shows the basic structure of factors and observed variables. Factors are conceptualized as the constructs, attributes, or domains to be measured, and the observed or manifest variables are the items that measure the factors. A good understanding of the relevant theories and literature can help researchers arrive at a suitable definition for each factor. A clear description of each factor is important because it will help to generate the factor's items. The following strategies are recommended:
Conduct a literature review - to gain a sound basic understanding of the attribute and other research involving it; to identify other existing measures; and to consider what kinds of items are needed, what the questionnaire might look like, and how it differs from existing questionnaires.
Conduct interviews and/or focus groups - to learn how the population of interest conceptualizes and describes the attributes of interest. Together, these strategies ensure that the conceptualization of the attributes makes theoretical sense to scholars in the field and uses language that the intended population understands. Failure to clarify exactly which attributes are to be measured could mean ending up with an assessment that is incoherent and invalid.
Table 1 shows an example of detailed descriptions of the factors measured by the GSQ20; in this example, a stressor is defined as a personal or environmental event that causes stress:
Table 1: Identified GSQ factors and the description of each factor.

Family - Events occurring in the family that can lead to a person's emotional disturbance, such as a poor relationship with one's spouse, poor support from family members, and lack of skill in managing the family.

Relationship with superior - Interpersonal relationship events that can cause distress to a person, such as lack of support from a superior and unfair assessment by supervisors.

Bureaucratic constraints - Organizational working-environment events that can cause distress to a person, such as lack of support from authority, having to do tasks beyond one's ability, and lack of opportunity to participate in decision making.

Work-family conflicts - Work events that compromise a person's personal and home life and lead to distress, such as life being too centered on work, advancing a career at the expense of personal or home life, and work demands affecting personal life.

Relationship with colleagues - Interpersonal relationship events that can cause distress to a person, such as lack of support from uncooperative or incompetent colleagues.

Performance pressure - Work demands that cause emotional disturbance to a person, such as work overload, short durations given to complete tasks, and doing high-risk tasks where any mistake can lead to disastrous consequences.

Job prospect - Events related to the reward and recognition given to an individual that cause distress, such as lack of promotion prospects, feeling underpaid, and lack of recognition of the job.
Generating and Writing Items
This is the critical step in the development of questionnaire, to write items and to consider the most
appropriate response format, resulting in what is commonly called an answer sheet19. Each item should
be written clearly and simply, avoiding double negatives and being as short as possible. To reduce
response bias, where someone tends to give the same answer to every item, reverse-phrase some of
them (i.e., negative items) and do not forget to reverse score the items afterwards before any analysis.
Layout of items should be simple and straightforward and enable respondents to easily connect
responses with the different options. This is important to ensure the response process validity. The
following are several areas need to be considered to generate items:
i. Test content - a) use a grid-style blueprint to determine content areas and how these are potentially manifested by people, b) get a small group to brainstorm a list of as many facets of an attribute as possible, or c) include people who might be at the extremes of the attribute so that you can identify item content that reflects the entire spectrum.
ii. Target population - it should be defined clearly.
iii. The kinds of items needed and their number - the items need to reflect all relevant aspects of the attribute; for a relatively simple measure, it would be wise to aim for at least 10 items per attribute at the development stage.
iv. Administration instructions - these should be clearly developed, especially for a self-reporting questionnaire. Clear instructions ensure respondents understand what to do and how to rate the questionnaire, hence strengthening response process validity.
v. The time limits, or the time required for completion - this depends on the kind of measure, and it is wise to develop a questionnaire that requires little time to complete. If possible, try to make it less than 15 minutes! The shorter the time required for completion, the better the response rate.
vi. How scores should be calculated and interpreted - the simplest process might be to sum responses, though if some items are negatively phrased compared to others, their scores must be reversed before totalling. Providing a clear interpretation of the scores is vital to ensure the consistency of their meaning across studies.
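As noted above, negatively phrased items must be reverse-scored before totalling: on a scale from min to max, a response x becomes (min + max) - x. A minimal pandas sketch with hypothetical item names and values:

```python
import pandas as pd

df = pd.DataFrame({
    "item1": [1, 4, 5],   # positively phrased item
    "item2": [5, 2, 1],   # negatively phrased item (to be reverse-scored)
})

SCALE_MIN, SCALE_MAX = 1, 5
negative_items = ["item2"]  # hypothetical list of reverse-phrased items

# Reverse-score: a response x becomes (min + max) - x
df[negative_items] = (SCALE_MIN + SCALE_MAX) - df[negative_items]
df["total"] = df.sum(axis=1)  # sum responses only after reverse-scoring
print(df)
```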
Table 2 shows a sample of items potentially representing each factor of the GSQ20, based on the literature review.
Table 2: The GSQ factors and their potential items.

Family:
- Inadequate preparation for dealing with the more difficult aspects of family matters
- Insufficient knowledge in educating and building children's character
- Poor communication and relationship with family members
- Poor relationship with spouse

Relationship with superiors:
- Lack of support from superior
- Difficulty in maintaining a relationship with superior
- My beliefs contradict those of my superior
- Unfair assessment from superior

Bureaucratic constraints:
- Lack of authority to carry out my job duties
- Unable to make full use of my skills and ability
- Cannot participate in decision making
- Having to do work outside of my competence

Work-family conflicts:
- Work demands affect my personal/home life
- Advancing a career at the expense of home/personal life
- My life is too centered on my work
- Absence of emotional support from family

Relationship with colleagues:
- Working with uncooperative colleagues
- Working with incompetent colleagues
- Relationship problems with colleagues/subordinates
- Competition among colleagues

Performance pressure:
- Time pressures and deadlines to meet
- Work overload
- Fear of making mistakes that can lead to serious consequences
- My work is mentally straining

Job prospect:
- Feeling insecure in this job
- Society does not think highly of my profession
- Lack of promotion prospects
- Feeling of being underpaid
Selecting Response Format
The response format should be selected based on the nature of the questionnaire1. For instance, the GSQ uses a rating scale with the following Likert-type anchors: 0 = causing no stress at all, 1 = causing mild stress, 2 = causing moderate stress, 3 = causing high stress, 4 = causing severe stress4. This response format was selected because the GSQ assesses the level of stress caused by each potential item (i.e., each stressor).
Many examples of response formats commonly used in questionnaires are available in the literature21; however, researchers can choose another response format to fit the nature of the items. For choosing Likert-type scale response anchors, we recommend that researchers refer to the article written by Vagias (2006)21.
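When the chosen response format is later coded for analysis, it helps to keep the numeric codes and their verbal anchors together. A minimal sketch of the GSQ's 0-4 rating scale as a Python mapping (the constant name is illustrative):

```python
# The GSQ 0-4 stress rating scale described above, kept as a single mapping
GSQ_ANCHORS = {
    0: "causing no stress at all",
    1: "causing mild stress",
    2: "causing moderate stress",
    3: "causing high stress",
    4: "causing severe stress",
}
```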
QUESTIONNAIRE VALIDATION
Parameters for Selecting Items
Table 3 summarizes the parameters to be considered when selecting items.
Table 3: The summary of parameters for selecting items.

- Content validity index
- Face validity index
- Item percentage of response
- Floor and ceiling effects
- Mean (standard deviation) or median (interquartile range)
- Factorial structure
- Internal consistency
The next sections elaborate and demonstrate the processes of content validation, response process validation, exploratory factor analysis (EFA), and reliability analysis.
Content Validity Index
Content validity is defined as the degree to which the elements of an assessment instrument are relevant to and representative of the targeted construct for a particular assessment purpose1,2. The relevance of an assessment tool refers to the appropriateness of its elements for the targeted constructs and functions of assessment, while the representativeness of an assessment tool refers to the degree to which its elements are proportional to the facets of the targeted construct2. Of the two aspects of content validity (i.e., relevance and representativeness), relevance has been the more frequently used to measure content validity5-7. It is important to note that establishing content validity is vital to support the validity of an assessment tool such as a questionnaire, especially for research purposes. Content validity evidence can be represented by the content validity index (CVI)5-8. This section describes the best practice for quantifying content validity using the CVI, based on the following six steps of content validation15:
1. Preparing content validation form
2. Selecting a review panel of experts
3. Conducting content validation
4. Reviewing domains and items
5. Providing score on each item
6. Calculating CVI
1.0 Preparing content validation form
The first step of content validation is to prepare the content validation form, to ensure that the review panel of experts will have clear expectations and a clear understanding of the task. An example of the instructions and rating scale is provided in Figure CVI 1. The recommended relevance rating scale is used for scoring individual items (Figure CVI 2). It is also recommended to provide the definition of the domain to facilitate the scoring process by the experts; please refer to Figure CVI 2 for an example.
Figure CVI 1: An example of instruction and rating scale in the content validation form to the experts
Figure CVI 2: An example of the layout of a content validation form with a domain, its definition, and the items representing (measuring) the domain.
2.0 Selecting a Review Panel of Experts
The selection of individuals to review and critique an assessment tool (e.g., a questionnaire) is usually based on each individual's expertise in the topic to be studied. Table 4 summarizes the recommended number of experts and its implication for the acceptable cut-off score of the CVI.
Table 4: The number of experts and its implication on the acceptable cut-off score of CVI

Number of experts | Acceptable CVI values | Source of recommendation
Two experts | At least 0.80 | Davis (1992)
Three to five experts | Should be 1 | Polit & Beck (2006), Polit et al. (2007)
At least six experts | At least 0.83 | Polit & Beck (2006), Polit et al. (2007)
Six to eight experts | At least 0.83 | Lynn (1986)
At least nine experts | At least 0.78 | Lynn (1986)
It can be concluded that the minimum acceptable number of experts for content validation is two; however, most recommendations propose a minimum of six experts. Considering these recommendations and the author's experience, the number of experts for content validation should be at least six and should not exceed ten.
3.0 Conducting Content Validation
Content validation can be conducted through a face-to-face or non-face-to-face approach. For the face-to-face approach, an expert panel meeting is organised, and the researcher facilitates the content validation process through Steps 4.0 and 5.0 (described later). For the non-face-to-face approach, an online content validation form is usually sent to the experts, and clear instructions are provided (Figure CVI 1) to facilitate the content validation process. The most important factors to consider are cost, time, and response rate. Cost and time can make the face-to-face approach challenging because of the difficulty of getting all the experts together, but the response rate will be at its highest. Response rate and time can make the non-face-to-face approach challenging because of the difficulty of getting responses on time, and at the risk of getting no response at all from an expert; however, the cost saving is its biggest advantage. Nevertheless, based on the author's experience, the non-face-to-face approach is very efficient if a systematic follow-up is in place to improve the response rate and time.
4.0 Reviewing Domain and Items
In the content validation form, the definition of the domain and the items representing the domain are clearly provided to the experts, as shown in Figure CVI 2. The experts are requested to critically review the domain and its items before scoring each item. The experts are encouraged to provide verbal or written comments to improve the relevance of the items to the targeted domain. All comments are taken into consideration to refine the domain and its items.
5.0 Providing score on each item
Upon completing the review of the domain and items, the experts are requested to score each item independently on the relevance scale (Figures CVI 1 and CVI 2). The experts are required to submit their responses to the researcher once they have scored all items.
6.0 Calculating CVI
Table 5: The definition and formula of I-CVI, S-CVI/Ave and S-CVI/UA

I-CVI (item-level content validity index)
Definition: the proportion of content experts giving the item a relevance rating of 3 or 4.
Formula: I-CVI = (number of experts in agreement) / (number of experts)

S-CVI/Ave (scale-level content validity index based on the average method)
Definition: the average of the I-CVI scores for all items on the scale, or the average of the proportion relevance judged by all experts (the proportion relevance is the average of the relevance ratings given by an individual expert).
Formula: S-CVI/Ave = (sum of I-CVI scores) / (number of items), or S-CVI/Ave = (sum of proportion relevance ratings) / (number of experts)

S-CVI/UA (scale-level content validity index based on the universal agreement method)
Definition: the proportion of items on the scale that achieve a relevance rating of 3 or 4 from all experts. The universal agreement (UA) score is 1 when the item achieves 100% expert agreement; otherwise the UA score is 0.
Formula: S-CVI/UA = (sum of UA scores) / (number of items)
** The definitions and formulas are based on the recommendations of Lynn (1986), Davis (1992), Polit & Beck (2006), and Polit et al. (2007).
There are two forms of CVI: the CVI for items (I-CVI) and the CVI for the scale (S-CVI). There are two methods for calculating the S-CVI: the average of the I-CVI scores for all items on the scale (S-CVI/Ave) and the proportion of items on the scale that achieve a relevance rating of 3 or 4 from all experts (S-CVI/UA)6. The definitions and formulas of the CVI indices are summarised in Table 5.
Table 6: The relevance ratings on the item scale by ten experts
Prior to calculating the CVI, each relevance rating must be recoded as 1 (relevance rating of 3 or 4) or 0 (relevance rating of 1 or 2), as shown in Table 6. To illustrate the calculation of the different CVI indices, the relevance ratings of twelve items by ten experts are provided in Table 6.
To illustrate the calculation of the CVI indices (please refer to Table 5), the following are examples of calculations based on the data provided in Table 6:
Experts in agreement: sum the recoded relevance ratings provided by all experts for each item; for example, the number of experts in agreement for Q2 is (1 + 0 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1) = 9.
Universal agreement: a score of 1 is assigned to an item that achieves 100% expert agreement; for example, Q1 obtains 1 because all the experts provided a recoded relevance rating of 1, while Q2 obtains 0 because not all the experts provided a rating of 1.
I-CVI: the number of experts in agreement divided by the number of experts; for example, the I-CVI of Q2 is 9 divided by 10 experts, which equals 0.9.
S-CVI/Ave (based on I-CVI): the average of the I-CVI scores across all items; for example, S-CVI/Ave = (1.0 + 0.9 + 0 + 1.0 + 1.0 + 1.0 + 1.0 + 1.0 + 1.0 + 1.0 + 1.0 + 1.0)/12 = 0.91.
S-CVI/Ave (based on proportion relevance): the average of the proportion relevance scores across all experts; for example, S-CVI/Ave = (0.92 + 0.83 + 0.92 + 0.92 + 0.92 + 0.92 + 0.92 + 0.92 + 0.92 + 0.92)/10 = 0.91.
S-CVI/UA: the average of the UA scores across all items; for example, S-CVI/UA = (1 + 0 + 0 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1)/12 = 0.83.
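The indices above are straightforward to compute programmatically; a minimal NumPy sketch, using an illustrative 4-item-by-5-expert ratings matrix rather than the Table 6 data:

```python
import numpy as np

# Relevance ratings on the 1-4 scale: rows = items, columns = experts
# (illustrative values, not the Table 6 data).
ratings = np.array([
    [4, 3, 4, 4, 3],
    [4, 2, 3, 4, 4],
    [1, 2, 1, 2, 1],
    [3, 4, 4, 3, 4],
])

agree = (ratings >= 3).astype(int)         # recode: rating of 3 or 4 -> 1, else 0
i_cvi = agree.mean(axis=1)                 # I-CVI = experts in agreement / number of experts
s_cvi_ave = i_cvi.mean()                   # S-CVI/Ave = average of I-CVI across items
ua = (agree.min(axis=1) == 1).astype(int)  # UA = 1 only when all experts agree
s_cvi_ua = ua.mean()                       # S-CVI/UA = average of UA across items

print(i_cvi, round(s_cvi_ave, 2), round(s_cvi_ua, 2))
```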
Based on the above calculations, the S-CVI/Ave and S-CVI/UA meet the satisfactory level, and thus the scale of the questionnaire has achieved a satisfactory level of content validity; note, however, that an item whose I-CVI falls below the cut-off (Q3 in this example) should be revised or removed. For more examples of how to report the content validity index, please refer to the papers by Hadie et al (2017)14, Ozair et al (2018)9, Lau et al (2018)10 and Marzuki et al (2018)11.
Content validity is vital to ensure the overall validity of an assessment; therefore, content validation should be conducted systematically, based on the evidence and best practice. This section has provided a systematic and evidence-based approach to conducting a proper content validation.
Face Validity Index
Commonly, response process validity evidence is collected after content validity has been established13-15. Response process validity is also known as face validity, which refers to the degree to which test respondents view the content of a test and its items as relevant to the context in which the test is being administered16. Similarly, other researchers define face validity as the degree to which raters judge the items of an assessment instrument to be appropriate to the targeted construct and assessment objectives17,18. The raters of face validity include i) the people who actually take the test, ii) the nonprofessional users who work with the results of the test, and iii) the general public18. In other words, the people involved with taking the test should be asked to do the rating; they cannot be replaced by professionals, experts, or psychometricians18. The raters' understanding and interpretation of the items will determine the accuracy with which an assessment tool measures the targeted construct. People with similar backgrounds rate the face validity of a test similarly, and they rate the face validity of different tests differently18. Because of the many concerns about the face validity concept, Cook and Beckman (2006) avoided the term face validity7; instead, they use response process evidence of validity to reflect the thought processes of users of the tested assessment as they respond to the tool1,4, and it can be quantified by the face validity index (FVI)4,9-11. These are commonly evaluated by asking raters about the clarity and comprehensibility of the instructions and language used in the assessment tool1,4. Clarity of the instructions and language refers to whether there are ambiguities or multiple ways to interpret the items, whereas comprehensibility refers to whether the words and sentences of the constructed items can be understood easily by raters. It is important to establish response process validity to support the overall validity of an assessment tool such as a questionnaire, especially for research purposes. Response process validity can be represented by the FVI, and several studies4,9-11 have calculated it to support the validity of an assessment tool. This section describes the best practice for performing response process validation and calculating the FVI, based on the following six steps of response process validation:
1. Preparing response process validation form
2. Selecting a panel of raters
3. Conducting response process validation
4. Reviewing items for clarity and comprehension
5. Providing score on each item based on the clarity and comprehensibility rating scale
6. Calculating FVI
1.0 Preparing response process validation form
The first step of response process validation (also known as face validation) is to prepare the response process validation form, to ensure that the panel of raters, who are the intended respondents, will have clear expectations and a clear understanding of the task. An example of the instructions and rating scale is provided in Figure FVI 1. The rating scales of clarity and comprehension are used for scoring individual items5-8 (Figure FVI 2).
Figure FVI 1: An example of instruction and rating scale in the response process validation form to the
raters, in this case students.
Figure FVI 2: An example of layout for response process validation form with domain and items.
2.0 Selecting a panel of raters
The selection of raters to review and critique an assessment tool (e.g., a questionnaire) is usually based on the target users of the tool, for example students, the public, or teachers. Table 7 summarizes the number of raters and its implication for the acceptable cut-off score of the FVI, based on previous studies4,9-11,22-25.
Table 7: The number of raters and its implication on the acceptable cut-off score of FVI

Source of study | Number of raters | Acceptable FVI value | Method
Hadie et al (2017) | 30 medical students | At least 0.80 | Face-to-face survey
Ozair et al (2017) | 30 paramedics | At least 0.83 | Face-to-face survey
Lau et al (2017) | 30 parents of pre-school children | At least 0.80 | Face-to-face survey
Lau et al (2018) | 30 parents of pre-school children | At least 0.80 | Face-to-face survey
Marzuki et al (2018) | 10 users of medical apps | At least 0.83 | Online survey
Chin et al (2018) | 32 medical students | At least 0.80 | Online survey
Mahadi et al (2018) | 32 medical students | At least 0.80 | Online survey
It can be concluded that for response process validation, the minimum acceptable number of raters is 10; however, most studies administered the form to at least 30 raters. Considering the previous studies (Table 7) and the author's experience, the number of raters for response process validation should be no fewer than 10.
3.0 Conducting Response Process Validation
Response process validation can be conducted through a face-to-face or an online survey (Table 7). For the face-to-face survey, the researcher facilitates the response process validation by holding a meeting with the raters, followed by Steps 4 and 5 (elaborated later). For the online survey, an online response process validation form is sent to the raters, and clear instructions are provided (Figure FVI 1) to facilitate the validation process. Based on the author's experience, the face-to-face approach is very efficient at increasing the response rate, whereas the online survey is efficient in terms of cost and time.
4.0 Reviewing items for clarity and comprehension
In the response process validation form, the domain and its items are provided to the raters, as shown in Figure FVI 2. The raters are requested to review all items before scoring each item. The raters are encouraged to provide verbal or written comments to improve the clarity and comprehension of the items. All comments are taken into consideration to refine the items.
5.0 Providing score on each item based on the clarity and comprehensibility rating scale
Upon completing the review of all items, the raters are requested to score each item independently on the clarity and comprehension scale (Figures FVI 1 and FVI 2). The raters are required to submit their responses to the researcher once they have scored all items.
6.0 Calculating FVI
Table 8: The definition and formula of I-FVI, S-FVI/Ave and S-FVI/UA

I-FVI (item-level face validity index)
Definition: the proportion of raters giving the item a clarity and comprehension rating of 3 or 4.
Formula: I-FVI = (number of raters in agreement) / (number of raters)

S-FVI/Ave (scale-level face validity index based on the average method)
Definition: the average of the I-FVI scores for all items on the scale, or the average of the proportion clarity and comprehension judged by all raters (the proportion clarity and comprehension is the average of the ratings given by an individual rater).
Formula: S-FVI/Ave = (sum of I-FVI scores) / (number of items), or S-FVI/Ave = (sum of proportion clarity and comprehension ratings) / (number of raters)

S-FVI/UA (scale-level face validity index based on the universal agreement method)
Definition: the proportion of items on the scale that achieve a clarity and comprehension rating of 3 or 4 from all raters. The universal agreement (UA) score is 1 when the item achieves 100% rater agreement; otherwise the UA score is 0.
Formula: S-FVI/UA = (sum of UA scores) / (number of items)
** The definitions and formulas are based on the content validity index formulas reported in Yusoff (2019).
There are two forms of FVI: the FVI for items (I-FVI) and the FVI for the scale (S-FVI). There are two methods for calculating the S-FVI: the average of the I-FVI scores for all items on the scale (S-FVI/Ave) and the proportion of items on the scale that achieve a clarity and comprehension rating of 3 or 4 from all raters (S-FVI/UA). The definitions and formulas of the FVI indices are summarised in Table 8.
Prior to calculating the FVI, each clarity and comprehension rating must be recoded as 1 (a rating of 3 or 4) or 0 (a rating of 1 or 2), as shown in Table 9. To illustrate the calculation of the different FVI indices, the clarity and comprehension ratings of twelve items by 10 raters are provided in Table 9.
Table 9: The clarity and comprehension ratings on the item scale by 10 raters
To illustrate the calculation of the FVI indices (please refer to Table 8), the following are examples of calculations based on the data provided in Table 9:
Raters in agreement: sum the recoded ratings provided by all raters for each item; for example, the number of raters in agreement for Q2 is (1 + 0 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1) = 9.
Universal agreement: a score of 1 is assigned to an item that achieves 100% rater agreement; for example, Q1 obtains 1 because all the raters provided a recoded rating of 1, while Q2 obtains 0 because not all raters provided a rating of 1.
I-FVI: the number of raters in agreement divided by the number of raters; for example, the I-FVI of Q2 is 9 divided by 10 raters, which equals 0.9.
S-FVI/Ave (based on I-FVI): the average of the I-FVI scores across all items; for example, S-FVI/Ave = (1.0 + 0.9 + 0 + 1.0 + 1.0 + 1.0 + 1.0 + 1.0 + 1.0 + 1.0 + 1.0 + 1.0)/12 = 0.91.
S-FVI/Ave (based on proportion clarity and comprehension): the average of the proportion clarity and comprehension scores across all raters; for example, S-FVI/Ave = (0.92 + 0.83 + 0.92 + 0.92 + 0.92 + 0.92 + 0.92 + 0.92 + 0.92 + 0.92)/10 = 0.91.
S-FVI/UA: the average of the UA scores across all items; for example, S-FVI/UA = (1 + 0 + 0 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1)/12 = 0.83.
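Because the FVI formulas mirror the CVI formulas, the same computation applies with clarity and comprehension ratings in place of relevance ratings; a minimal sketch with an illustrative 3-item-by-5-rater matrix:

```python
import numpy as np

# Clarity and comprehension ratings on the 1-4 scale:
# rows = items, columns = raters (illustrative values).
fv_ratings = np.array([
    [4, 4, 3, 4, 3],
    [3, 1, 4, 4, 4],
    [4, 3, 4, 3, 4],
])

agree = (fv_ratings >= 3).astype(int)       # rating of 3 or 4 -> 1, else 0
i_fvi = agree.mean(axis=1)                  # I-FVI per item
s_fvi_ave = i_fvi.mean()                    # S-FVI/Ave
s_fvi_ua = (agree.min(axis=1) == 1).mean()  # S-FVI/UA
print(i_fvi, round(s_fvi_ave, 2), round(s_fvi_ua, 2))
```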
Based on the above calculations, the S-FVI/Ave and S-FVI/UA meet the satisfactory level, and thus the scale of the questionnaire has achieved a satisfactory level of response process validity; as with the CVI, an item whose I-FVI falls below the cut-off (Q3 in this example) should be revised or removed. For more examples of how to report the response process validity index, please refer to the papers by Hadie et al (2017), Ozair et al (2017), Lau et al (2017), Lau et al (2018), Marzuki et al (2018), Chin et al (2018) and Mahadi et al (2018).
Response process validity is vital to ensure the overall validity of an assessment; therefore, response process validation should be conducted systematically, based on the best evidence. This section has provided a systematic and evidence-based approach to conducting a proper response process validation through the face validity index.
Exploratory Factor Analysis
The primary purpose of exploratory factor analysis (EFA) is to explore the number of constructs that can be extracted from a data set26. This section illustrates the key steps of running an EFA using SPSS version 26; the SPSS data set can be downloaded from this link: https://tinyurl.com/yb9wup7e. The data set must be checked and cleaned of incomplete data before running the EFA.
1.0 Running EFA
Figure EFA 1: Getting into EFA
Open the data set for the EFA, click on the 'Analyze' tab, browse to 'Dimension Reduction', and click the 'Factor' menu (Figure EFA 1). The following display will appear on the screen (Figure EFA 2).
Figure EFA 2: The main dialog box of Factor Analysis
Figure EFA 3: Selection of all items into the 'Variables' box
Select all items into the 'Variables' box and click on the 'Descriptives' menu (Figure EFA 3). The following display will appear (Figure EFA 4). Tick the 'Univariate descriptives', 'Initial solution', and 'KMO and Bartlett's test of sphericity' options as shown in Figure EFA 4. Click on the 'Continue' button to proceed to the next step.
Figure EFA 4: The factor analysis descriptives screen
Figure EFA 5: The main EFA display and the 'Extraction' menu
Click on the 'Extraction' menu (Figure EFA 5). The following display will appear (Figure EFA 6). Tick the 'Unrotated factor solution' and 'Scree plot' options as shown in Figure EFA 6.
Figure EFA 6: The factor analysis extraction display
In the 'Extract' column, researchers can either let the EFA suggest the number of factors to be extracted based on the eigenvalues (1 by default) or specify the intended number of factors in the 'Factors to extract' box (Figure EFA 6). Click on the 'Continue' button to proceed to the next step.
Figure EFA 7: The main EFA display and the 'Rotation' menu
Click on the 'Rotation' menu (Figure EFA 7); the display shown in Figure EFA 8 will appear. The purpose of rotation is to optimize the factor loading of each item across the extracted factors. There are two types of rotation: orthogonal and oblique. Orthogonal rotations assume the factors are independent of each other (not overlapping), while oblique rotations allow the factors to correlate with each other (overlapping). Researchers are recommended to choose the rotation based on their assumptions about the factors to be extracted. In this example, the 'Maximum Iterations for Convergence' value was set at 30 (SPSS sets it at 25 by default). The higher value was chosen because of the large sample size (more than 2,500), which requires more iterations to optimize the factor loadings across the extracted factors. Click on the 'Continue' button to proceed to the next step.
Figure EFA 8: The Factor Analysis Rotation menu
Figure EFA 9: The main EFA display and the 'Options' menu
Figure EFA 10: The Factor Analysis Options menu
Click on the 'Options' menu (Figure EFA 9); the display shown in Figure EFA 10 will appear. In the 'Coefficient Display Format' column, tick the 'Sorted by size' and 'Suppress small coefficients' options (Figure EFA 10). Insert a value of 0.40 in the 'Absolute value below' box and click on the 'Continue' button to proceed to the next step.
2.0 Interpretation of EFA Outputs
Several outputs appear after running the EFA. The descriptive statistics provide information on several parameters, including the mean, standard deviation, and number of responses (Figure EFA 11). This output gives an idea of the extent of data completeness. Click on the 'KMO and Bartlett's Test' bar for the next output (Figure EFA 12).
Figure EFA 11: The Descriptive Statistics output
Figure EFA 12: The KMO and Bartlett’s Test output
The KMO and Bartlett's Test output (Figure EFA 12) provides parameters that indicate sampling adequacy and the appropriateness of factor analysis. A 'Kaiser-Meyer-Olkin Measure of Sampling Adequacy' value of more than 0.7 is considered a good level of factor distinction. A significant 'Bartlett's Test of Sphericity' (p-value less than 0.05) indicates that factor analysis is appropriate. If the 'Kaiser-Meyer-Olkin Measure of Sampling Adequacy' value is less than 0.5, researchers should consider collecting more data.
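The same two checks can be computed outside SPSS; a minimal sketch, assuming the third-party factor_analyzer package is installed and that a hypothetical file questionnaire_items.csv holds only the item responses:

```python
import pandas as pd
from factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

# Hypothetical file containing only the questionnaire item columns
df = pd.read_csv("questionnaire_items.csv").dropna()

chi_square, p_value = calculate_bartlett_sphericity(df)
kmo_per_item, kmo_total = calculate_kmo(df)

print(f"Bartlett's test: chi2 = {chi_square:.1f}, p = {p_value:.4f}")
print(f"Overall KMO = {kmo_total:.2f}")  # > 0.7 good; < 0.5 collect more data
```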
Figure EFA 13: The Communalities output
The communality (extraction) is the sum of the squared factor loadings for an item, representing the proportion of variance in the observed variable that is explained by the extracted factors. It reflects the contribution of the item to the extracted factors. The communality (extraction) should be at least 0.70, and the average of the communality values should be at least 0.6026. In this example, only one item exceeds 0.70 and the average of the communality values is less than 0.60. However, considering the large sample, it is probably still safe to rely on Kaiser's criterion.
Figure EFA 14: The Total Variance Explained output
The 'Total Variance Explained' output (Figure EFA 14) proposes the number of factors that can be extracted based on the eigenvalues (in this example, the threshold was set at 1 and higher). The 'Initial Eigenvalues' column shows the total number of factors that could potentially be formed regardless of their eigenvalues (before extraction). The 'Extraction Sums of Squared Loadings' column shows the number of factors that can be extracted based on an eigenvalue of at least 1 (after extraction); here, four factors have eigenvalues of at least 1. The third column, 'Rotation Sums of Squared Loadings', shows the results after Varimax rotation. The total eigenvalues after the Varimax rotation have been optimally distributed across the four factors. This information is important for researchers in deciding whether to choose the 'Component Matrix' (Figure EFA 15) or the 'Rotated Component Matrix' (Figure EFA 16). In this case, because the eigenvalues have been optimally distributed across the four factors, the 'Rotated Component Matrix' is chosen as the final factorial structure. The detailed output of the 'Rotated Component Matrix' is displayed in Figure EFA 17.
Figure EFA 15: The Component Matrix output
Figure EFA 16: The Rotated Component Matrix output
Figure EFA 17: The detailed output of the Rotated Component Matrix
Factor loadings of more than 0.40 are displayed under their respective factor (component), as shown in Figure EFA 17. The analysis proposes that four factors represent the questionnaire.
Rotated Component Matrix (extraction method: principal component analysis; rotation method: Varimax with Kaiser normalization; rotation converged in 8 iterations; loadings below 0.40 suppressed):

Component 1:
- q06 I have little experience of computers (.800)
- q18 SPSS always crashes when I try to use it (.684)
- q13 I worry that I will cause irreparable damage because of my incompetence with computers (.647)
- q07 All computers hate me (.638)
- q14 Computers have minds of their own and deliberately go wrong whenever I use them (.579)
- q10 Computers are useful only for playing games (.550)
- q15 Computers are out to get me (.459)

Component 2:
- q20 I can't sleep for thoughts of eigen vectors (.677)
- q21 I wake up under my duvet thinking that I am trapped under a normal distribution (.661)
- q03 Standard deviations excite me (.567)
- q12 People try to tell you that SPSS makes statistics easier to understand but it doesn't (.523; also loads .473 on Component 1)
- q04 I dream that Pearson is attacking me with correlation coefficients (.516)
- q16 I weep openly at the mention of central tendency (.514)
- q01 Statistics makes me cry (.496)
- q05 I don't understand statistics (.429)

Component 3:
- q08 I have never been good at mathematics (.833)
- q17 I slip into a coma whenever I see an equation (.747)
- q11 I did badly at mathematics at school (.747)

Component 4:
- q09 My friends are better at statistics than me (.648)
- q22 My friends are better at SPSS than I am (.645)
- q23 If I'm good at statistics my friends will think I'm a nerd (.586)
- q02 My friends will think I'm stupid for not being able to cope with SPSS (.543)
- q19 Everybody looks at me when I use SPSS (.428)
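For researchers working outside SPSS, a comparable analysis can be sketched in Python, again assuming the third-party factor_analyzer package and the hypothetical item file from the previous sketch; note that factor_analyzer fits a common-factor model (here with principal-factor extraction and Varimax rotation), so the loadings will not match SPSS's principal component results exactly:

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

# Hypothetical file containing only the questionnaire item columns
df = pd.read_csv("questionnaire_items.csv").dropna()

fa = FactorAnalyzer(n_factors=4, rotation="varimax", method="principal")
fa.fit(df)

loadings = pd.DataFrame(fa.loadings_, index=df.columns)
print(loadings[loadings.abs() > 0.40].round(3))  # suppress small coefficients
print(fa.get_communalities().round(3))           # communalities per item

eigenvalues, _ = fa.get_eigenvalues()
print(eigenvalues.round(3))  # inspect against Kaiser's criterion (>= 1)
```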
Reliability Analysis
Once the items have been assigned to factors, reliability analysis is performed to determine the internal consistency of each factor. This section illustrates the key steps of reliability analysis using SPSS version 26.
1.0 Running Reliability Analysis
Figure RA 1: The reliability analysis menu
Open the data set, click on the 'Analyze' tab, browse to 'Scale', and click the 'Reliability Analysis' menu (Figure RA 1). The following display will appear on the screen (Figure RA 2).
Figure RA 2: The reliability analysis interface
Select the items belonging to the specific factor (in this case, all items of factor 1) and move them to the 'Items' box as shown in Figure RA 3. Cronbach's alpha is the most commonly used coefficient for determining the level of internal consistency. Click on the 'Statistics' menu for the next step.
Figure RA 3: The transfer of the selected items for a specific factor to the 'Items' box
Figure RA 4: The Reliability Analysis Statistic display
Tick the ‘Item’, ‘Scale’, and ‘Scale if item deleted’ options as shown in Figure RA 4, then click ‘Continue’ and ‘OK’.
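The same analysis can also be run with syntax. The sketch below is an equivalent of the menu steps above, assuming the seven factor 1 item names from the rotated solution in this example; the Paste button in the dialog will generate the authoritative syntax for your own data.

* Cronbach alpha for the factor 1 items.
* DESCRIPTIVE and SCALE match the Item and Scale tick boxes.
* SUMMARY=TOTAL requests the Scale if item deleted (Item-Total Statistics) table.
RELIABILITY
  /VARIABLES=q06 q07 q10 q13 q14 q15 q18
  /SCALE('Factor 1') ALL
  /MODEL=ALPHA
  /STATISTICS=DESCRIPTIVE SCALE
  /SUMMARY=TOTAL.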
2.0 Interpretation of Reliability Analysis Outputs
The following is the reliability analysis output interface (Figure RA 5). It shows the total number of valid cases and the Cronbach’s alpha value for the seven items of factor 1, which in this case was 0.814.
Figure RA 5: The reliability output interface
Figure RA 6: The Item-Total Statistics output
The ‘Item-Total Statistics’ table provides parameters that indicate the contribution of each item to the factor. The ‘Corrected Item-Total Correlation’ values should be at least 0.30 to indicate an acceptable level of contribution to the factor. The ‘Cronbach’s Alpha if Item Deleted’ value is another indicator of an item’s contribution. If the ‘Cronbach’s Alpha if Item Deleted’ value is lower than the overall Cronbach’s alpha (0.814 in this case), the item contributes substantially to the reliability of the factor and should be retained. If the value is higher, the item contributes little to the reliability and may be removed. Repeat the above steps for the remaining factors. Generally, an internal consistency of 0.70 or greater is preferred (27).
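As an alternative to repeating the menu steps three more times, the remaining factors can be analysed in a single syntax run. The item groupings below follow the rotated solution reported earlier (q12 is grouped with factor 2, where its rotated loading was highest), and the parenthetical labels anticipate the interpretive factor names used in the write-up later in this chapter.

* Factor 2 items (fear of statistics).
RELIABILITY
  /VARIABLES=q01 q03 q04 q05 q12 q16 q20 q21
  /SCALE('Factor 2') ALL
  /MODEL=ALPHA
  /SUMMARY=TOTAL.

* Factor 3 items (fear of mathematics).
RELIABILITY
  /VARIABLES=q08 q11 q17
  /SCALE('Factor 3') ALL
  /MODEL=ALPHA
  /SUMMARY=TOTAL.

* Factor 4 items (peer evaluation concerns).
RELIABILITY
  /VARIABLES=q02 q09 q19 q22 q23
  /SCALE('Factor 4') ALL
  /MODEL=ALPHA
  /SUMMARY=TOTAL.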
Reporting EFA and Reliability Analysis Results
Once the EFA and reliability analysis have been completed, the following is the recommended way to report the results:
Table 10: Summary of EFA results for the questionnaire (N = 2571)

Factor 1 (factor loading)
  Q06 I have little experience of computers (0.800)
  Q18 SPSS always crashes when I try to use it (0.684)
  Q13 I worry that I will cause irreparable damage because of my incompetence with computers (0.647)
  Q07 All computers hate me (0.638)
  Q14 Computers have minds of their own and deliberately go wrong whenever I use them (0.579)
  Q10 Computers are useful only for playing games (0.550)
  Q15 Computers are out to get me (0.459)

Factor 2 (factor loading)
  Q20 I can't sleep for thoughts of eigenvectors (0.677)
  Q21 I wake up under my duvet thinking that I am trapped under a normal distribution (0.661)
  Q03 Standard deviations excite me (0.567)
  Q12 People try to tell you that SPSS makes statistics easier to understand but it doesn't (0.523)
  Q04 I dream that Pearson is attacking me with correlation coefficients (0.516)
  Q16 I weep openly at the mention of central tendency (0.514)
  Q01 Statistics makes me cry (0.496)
  Q05 I don't understand statistics (0.429)

Factor 3 (factor loading)
  Q08 I have never been good at mathematics (0.833)
  Q17 I slip into a coma whenever I see an equation (0.747)
  Q11 I did badly at mathematics at school (0.747)

Factor 4 (factor loading)
  Q09 My friends are better at statistics than me (0.648)
  Q22 My friends are better at SPSS than I am (0.645)
  Q23 If I'm good at statistics my friends will think I'm a nerd (0.586)
  Q02 My friends will think I'm stupid for not being able to cope with SPSS (0.543)
  Q19 Everybody looks at me when I use SPSS (0.428)

                            Factor 1   Factor 2   Factor 3   Factor 4
  Eigenvalues                  3.730      3.340      2.553      1.850
  Percentage of variance       16.21      14.52      11.09       8.47
  Cronbach’s alpha              0.81       0.82       0.82       0.57
For this example, we might write something like this:
A principal component analysis (PCA) was conducted on the 23 items with orthogonal rotation (varimax). The Kaiser-Meyer-Olkin measure verified the sampling adequacy for the analysis, KMO = 0.93, which is well above the acceptable limit of 0.50. Bartlett’s test of sphericity, χ²(253) = 19334.49, p < 0.001, indicated that correlations between items were sufficiently large for PCA. An initial analysis was run to obtain eigenvalues for each component in the data. Four components had eigenvalues over Kaiser’s criterion of 1 and in combination explained 50.32% of the variance. Given the large sample size and the convergence of Kaiser’s criterion on four components, four components were retained in the final analysis. Table 10 shows the factor loadings after rotation. The items that cluster on the same components suggest that Factor 1 represents a fear of computers, Factor 2 represents a fear of statistics, Factor 3 represents a fear of maths, and Factor 4 represents peer evaluation concerns.
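A quick arithmetic check on the degrees of freedom reported for Bartlett’s test: for p items the test has

\mathrm{df} = \frac{p(p-1)}{2} = \frac{23 \times 22}{2} = 253,

which matches the value in the write-up above.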
CONCLUSION
It is hoped that this short module proves useful and helpful to anyone developing and validating a questionnaire. This module does not cover confirmatory factor analysis (CFA), which can be learned from other available modules.
REFERENCE
1. Cook DA, Beckman TJ. Current concepts in validity and reliability for psychometric instruments: theory and application. The American Journal of Medicine. 2006;119(2):166.e7-166.e16.
2. Haynes SN, Richard D, Kubany ES. Content validity in psychological assessment: a functional approach to concepts and methods. Psychological Assessment. 1995;7(3):238-47.
3. Yusoff MSB. A systematic review on validity evidence of medical student stressor questionnaire. Education in Medicine Journal. 2017;9(1):1-16.
4. Hadie SNH, Hassan A, Ismail ZIM, Asari MA, Khan AA, Kasim F, et al. Anatomy education environment measurement inventory: a valid tool to measure the anatomy learning environment. Anatomical Sciences Education. 2017;10(5):423-32.
5. Davis LL. Instrument review: getting the most from a panel of experts. Applied Nursing Research. 1992;5(4):194-7.
6. Polit DF, Beck CT. The content validity index: are you sure you know what's being reported? Critique and recommendations. Research in Nursing & Health. 2006;29(5):489-97.
7. Polit DF, Beck CT, Owen SV. Is the CVI an acceptable indicator of content validity? Appraisal and recommendations. Research in Nursing & Health. 2007;30(4):459-67.
8. Lynn MR. Determination and quantification of content validity. Nursing Research. 1986;35(6):381-5.
9. Ozair MM, Baharuddin KA, Mohamed SA, Esa W, Yusoff MSB. Development and validation of the Knowledge and Clinical Reasoning of Acute Asthma Management in Emergency Department (K-CRAMED). Education in Medicine Journal. 2017;9(2):1-17.
10. Lau AS, Yusoff MS, Lee Y-Y, Choi S-B, Xiao J-Z, Liong M-T. Development and validation of a Chinese translated questionnaire: a single simultaneous tool for assessing gastrointestinal and upper respiratory tract related illnesses in pre-school children. Journal of Taibah University Medical Sciences. 2018;13(2):135-41.
11. Marzuki MFM, Yaacob NA, Yaacob NM. Translation, cross-cultural adaptation, and validation of the Malay version of the System Usability Scale questionnaire for the assessment of mobile apps. JMIR Human Factors. 2018;5(2):e10308.
12. American Educational Research Association, American Psychological Association, National Council on Measurement in Education. Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association; 1999.
13. Mosier CI. A critical examination of the concepts of face validity. Educational and Psychological Measurement. 1947;7(2):191-205.
14. Artino AR Jr, La Rochelle JS, Dezee KJ, Gehlbach H. Developing questionnaires for educational research: AMEE Guide No. 87. Medical Teacher. 2014;36(6):463-74.
15. Yusoff MSB. ABC of content validation and content validity index calculation. Education in Medicine Journal. 2019;11(2):49-54.
16. Holden RR. Face validity. The Corsini Encyclopedia of Psychology. 2010:1-2.
17. Hardesty DM, Bearden WO. The use of expert judges in scale development: implications for improving face validity of measures of unobservable constructs. Journal of Business Research. 2004;57(2):98-107.
18. Nevo B. Face validity revisited. Journal of Educational Measurement. 1985;22(4):287-93.
19. Coaley K. An Introduction to Psychological Assessment and Psychometrics. London: Sage; 2010.
20. Yusoff MSB, Esa AR. The reliability and validity of the General Stressor Questionnaire (GSQ) among house officers. International Medical Journal. 2011;18(3):179-82.
21. Vagias WM. Likert-type scale response anchors. Clemson, SC: Clemson International Institute for Tourism & Research Development, Department of Parks, Recreation and Tourism Management, Clemson University; 2006.
22. Yusoff MSB. ABC of response process validation and face validity index calculation. Education in Medicine Journal. 2019;11(3):55-61.
23. Lau AS-Y, Yusoff MSB, Lee YY, Choi S-B, Rashid F, Wahid N, et al. Development, translation and validation of questionnaires for diarrhea and respiratory-related illnesses during probiotic administration in children. Education in Medicine Journal. 2017;9(4).
24. Chin RWA, Chua YY, Chu MN, Mahadi NF, Wong MS, Yusoff MS, et al. Investigating validity evidence of the Malay translation of the Copenhagen Burnout Inventory. Journal of Taibah University Medical Sciences. 2018;13(1):1-9.
25. Mahadi NF, Chin RWA, Chua YY, Chu MN, Wong MS, Yusoff MSB, et al. Malay language translation and validation of the Oldenburg Burnout Inventory measuring burnout. Education in Medicine Journal. 2018;10(2).
26. Field A. Discovering Statistics Using SPSS. 3rd ed. Thousand Oaks, CA: SAGE Publications; 2009.
27. Cortina JM. What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology. 1993;78(1):98-104.