All content in this area was uploaded by Muhamad Saiful Bahri Yusoff on Nov 15, 2020.
PROFESSIONAL AND PERSONAL DEVELOPMENT FOR POSTGRADUATES
Experimental Design for Health Sciences: Questionnaire
Development & Validation for Health Science Studies
ASSOCIATE PROFESSOR DR MUHAMAD SAIFUL BAHRI BIN YUSOFF
TABLE OF CONTENTS
OBJECTIVES OF THE COURSE
LEARNING OUTCOMES
SYNOPSIS
VALIDITY CONCEPT
QUESTIONNAIRE DEVELOPMENT
Goal Setting
Defining Factors
Generating and Writing Items
Selecting Response Format
QUESTIONNAIRE VALIDATION
Parameters for Selecting Items
Content Validity Index
Face Validity Index
Exploratory Factor Analysis
Reliability Analysis
Reporting EFA & Reliability Analysis
CONCLUSION
REFERENCES
OBJECTIVES OF THE COURSE
At the end of this course, learners are expected to:
• Describe the concepts of validity
• Describe the sources of evidence to support validity
• Describe the key steps to develop a questionnaire systematically
• Develop a questionnaire for research purposes
• Perform content and face validation systematically
• Calculate content and face validity index
• Perform exploratory factor analysis
• Perform reliability analysis
• Report the results of exploratory factor analysis and reliability analysis
LEARNING OUTCOMES
The learners should be able to develop and validate questionnaires for research purposes through a
systematic process and based on best practice.
SYNOPSIS
Questionnaires are commonly and widely used for survey-based research in medical, social, economic,
psychological, and behavioural research. Because questionnaires are important tools that determine the
scientific merit of questionnaire-based research, this module describes a step-by-step approach to
systematically develop and validate questionnaires for research. The author outlines the key steps for
developing and validating questionnaires based on best practice and the author's research experience
in this area.
VALIDITY CONCEPT
Validity refers to the degree to which evidence and theory support the interpretations of test scores
entailed by the proposed uses of tests. In other words, validity describes how well one can legitimately
trust the results of a test as interpreted for a specific purpose. There are five sources of validity evidence
to support the construct validity: content (do instrument items completely represent the construct?),
response process (the relationship between the intended construct and the thought processes of subjects
or observers), internal structure (acceptable reliability and factor structure), relations to other variables
(correlation with scores from another instrument assessing the same construct), and consequences (do
scores really make a difference?)1. These are not different types of validity but rather they are categories
of evidence that can be collected to support the construct validity of inferences made from instrument
scores.
Content validity is defined as the degree to which elements of an assessment instrument are relevant to
and representative of the targeted construct for a particular assessment purpose1,2. The assessment
purpose refers to the expected functions of the measurement tool, for examples, the Medical Student
Stressor Questionnaire (MSSQ) was developed to identify the sources of stress in medical students3 and
the Anatomy Education Environment Measurement Inventory (AEEMI) was developed to measure the
anatomy educational environment in medical schools4. The relevance of an assessment tool refers to the
appropriateness of its elements for the targeted constructs and functions of assessment, while the
representativeness of an assessment tool refers to the degree to which its elements are proportional to the
facets of the targeted construct2. Despite the two aspects of content validity (i.e., relevance and
representativeness), the relevance of an assessment tool has been more frequently used to measure
content validity5-7. It is important to note that establishing content validity is vital to support the validity
of an assessment tool such as a questionnaire, especially for research purposes. Haynes et al. (1995)
emphasized, "Inferences from assessment instruments with unsatisfactory content validity will be
suspect, even when other indices of validity are satisfactory." Content validity evidence can be
represented by the content validity index (CVI)5-8; for instance, several recent studies4,9-11 established
content validity using the CVI to support the validity of an assessment tool.
As early as 1947, Mosier analysed various definitions of the face validity concept13. Commonly, response
process validity evidence is collected after content validity has been established14,15. Response process
validity is also known as face validity, which refers to the degree to which test respondents view the
content of a test and its items as relevant to the context in which the test is being administered16. Similarly,
other researchers define face validity as the degree to which raters judge the items of an assessment
instrument to be appropriate to the targeted construct and assessment objectives17,18. The raters'
understanding and interpretation of the items will determine the accuracy of an assessment tool in
measuring the targeted construct. People with similar backgrounds rate the face validity of a test similarly,
and they rate the face validity of different tests differently18. Owing to concerns about the face validity
concept, Cook & Beck (2006) avoided the term face validity and instead used response process
evidence of validity to reflect the thought processes of users of the tested assessment as they respond
to the tool1,4; it can be quantified by the face validity index (FVI)4,9-11. Response process evidence is
commonly evaluated by raters in the form of the clarity and comprehensibility of the instructions and
language used in the assessment tool1,4. Clarity refers to whether there are ambiguities or multiple ways
to interpret the items, whereas comprehensibility refers to whether the words and sentences of the
constructed items can be understood easily by raters. It is important to establish response process
validity to support the overall validity of an assessment tool such as a questionnaire, especially for
research purposes. Response process validity can be represented by the FVI, and several studies4,9-11
have calculated it to support the validity of an assessment tool.
Internal Structure (acceptable reliability and factor structure). Reliability and factor analysis data are
generally considered evidence of internal structure. Scores intended to measure a single construct should
yield homogenous results, whereas scores intended to measure multiple constructs should demonstrate
heterogeneous responses in a pattern predicted by the constructs. Reliability refers to the reproducibility
or consistency of scores from one assessment to another. Reliability is a necessary, but not sufficient,
component of validity. Reproducibility over time (test-retest), between different versions of an instrument
(parallel forms), and between raters (inter-rater) are other measures of reliability. Reliability is usually
reported as a coefficient ranging from 0 to 1. A value of 0 represents no correlation (all error), whereas 1
represents
perfect correlation (all variance attributable to subjects). Acceptable values will vary according to the
purpose of the instrument. For high-stakes settings (eg, licensure examination) reliability should be
greater than 0.9, whereas for less important situations values of 0.8 or 0.7 may be acceptable. Note that
the interpretation of reliability coefficients is different than the interpretation of correlation coefficients in
other applications, where a value of 0.6 would often be considered quite high. Low reliability can be
improved by increasing the number of items or observers and (in education settings) using items of
medium difficulty. Factor analysis is used to investigate relationships between items in an instrument and
the constructs they are intended to measure. Factor analysis can determine whether the items intended
to measure a given construct actually “cluster” together into “factors” as expected. Items that “load” on
more than one factor, or on unexpected factors, may not be measuring their intended constructs.
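As an illustration of the reliability coefficient described above, the following is a minimal sketch (not part of the original module) of computing Cronbach's alpha with NumPy. The ratings are hypothetical: rows are respondents, columns are items on a 1-5 Likert scale.

```python
# Minimal sketch of Cronbach's alpha; data are hypothetical.
import numpy as np

def cronbach_alpha(scores):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 6 respondents answering 4 Likert items (1-5)
data = [[4, 5, 4, 4],
        [3, 3, 3, 4],
        [5, 5, 5, 5],
        [2, 2, 3, 2],
        [4, 4, 4, 5],
        [3, 2, 3, 3]]
alpha = cronbach_alpha(data)
print(round(alpha, 2))
```

Because these four hypothetical items are highly inter-correlated, alpha comes out above the 0.9 threshold mentioned for high-stakes settings; items with weak inter-correlations would pull the coefficient down.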
Relations to other variables (correlation with scores from another instrument assessing the same
construct). Correlation with scores from another instrument or outcome for which correlation would be
expected, or lack of correlation where it would not, supports interpretation consistent with the underlying
construct. For example, correlation between scores from a questionnaire designed to assess the severity
of benign prostatic hypertrophy and the incidence of acute urinary retention would support the validity of
the intended inferences. For a quality of life assessment, score differences among patients with varying
health states would support validity.
Consequences (do scores really make a difference?). Evaluating intended or unintended consequences
of an assessment can reveal previously unnoticed sources of invalidity. For example, if a teaching
assessment shows that male instructors are consistently rated lower than females it could represent a
source of unexpected bias. It could also mean that males are less effective teachers. Evidence of
consequences thus requires a link relating the observations back to the original construct before it can
truly be said to influence the validity of inferences. Another way to assess evidence of consequences is
to explore whether desired results have been achieved and unintended effects avoided. Finally, the
method used to determine score thresholds (eg, pass/fail cut scores or classification of symptom severity
as low, moderate, or high) also falls under this category.
Figure 1: The five sources of evidence to support the construct validity
Finally, when developing questionnaires, careful attention should be given to each category of validity
evidence in turn as illustrated in Figure 1.
QUESTIONNAIRE DEVELOPMENT
Goal Setting
The first step is to set clear aims and goals for developing a questionnaire19. The following questions
help researchers to do so; without clear answers to them, the measure may not be useful:
• What precisely will this questionnaire measure? For example, the Medical Student Stressor
Questionnaire (MSSQ) was developed to measure sources of stress among medical students3,
and the Anatomy Education Environment Measurement Inventory (AEEMI) was developed to
measure the learning experiences that influence medical students' motivation to learn anatomy,
thus affecting their attitudes, values and behaviours towards anatomy-related learning
tasks4. It is essential to have a clear end in mind about the attributes (i.e., the concepts,
characteristics or features of someone or something) to be measured before developing a
questionnaire.
• Who is the intended target group? Knowing the exact target group (i.e., the respondents
who will be responding to the questionnaire) is important to ensure its validity; for example, the
intended target group for the MSSQ and the AEEMI was medical students.
• Why does it need to be developed? Defining a clear reason for developing a questionnaire for
research is critical to ensure its validity. For instance, many instruments measure stress
levels, but none had been developed to measure the sources of stress among medical students;
therefore, it was important to develop an instrument (the MSSQ) that specifically measures the
sources of stress among medical students. It is also important that researchers do not
reinvent the wheel if a similar tool has already been developed and validated by other researchers;
otherwise, it will be a waste of time and resources.
• How will it contribute to practice in the field? Stating clearly the expected contributions
of a questionnaire is important to ensure its relevance to current practice in the
field and that it does not become a "reinvention of the wheel". For instance, the contribution of
the MSSQ was to provide a universal tool for identifying sources of stress among medical
students and to encourage medical educators around the globe to evaluate the potential sources
of stress among their students, so that early interventions could be planned to alleviate the
stressors.
DEFINING FACTORS
Figure 2: The basic structure of factors and observed variables (items)
After clearly defining the purpose of the questionnaire to be developed, it is essential to define the factors
to be measured by it19. Figure 2 shows the basic structure of factors and observed variables.
Factors are conceptualized as the constructs, attributes or domains to be measured, and the observed or
manifest variables are the items that measure the factors. A good understanding of the relevant theories
and literature can help us come up with a suitable definition for each factor. A clear description
of a factor is important because it will help to generate the items of that factor. The following are
recommended strategies:
• Conduct a literature review – to gain a sound basic understanding of the attribute and other
research involving it; to identify other existing measures; and to consider what kind of items are
needed, how the questionnaire might look, and how it differs from existing questionnaires.
• Conduct interviews and/or focus groups – to learn how the population of interest
conceptualizes and describes the attributes of interest. Together, the two strategies ensure that the
conceptualization of the attributes makes theoretical sense to scholars in the field and uses
language that the intended population understands. Failure to clarify exactly which attributes are
to be measured could mean ending up with an assessment that is incoherent and invalid.
Table 1 shows an example of detailed descriptions of the factors measured by the
GSQ20; in this example, a stressor is defined as a personal or environmental event that
causes stress:
Table 1: Identified GSQ factors and the description of each factor.

Family: Events occurring in the family that can lead to a person's emotional disturbance, such as a poor relationship with the spouse, poor support from family members, and a lack of skill in managing the family.

Relationship with superior: Interpersonal relationship events that can cause feelings of distress to a person, such as a lack of support from the superior and unfair assessment by supervisors.

Bureaucratic constraints: An organizational working environment that can cause feelings of distress to a person, such as a lack of support from the authority, having to do tasks beyond one's ability, and a lack of opportunity in decision making.

Work-family conflicts: Work events that compromise a person's personal and home life and lead to feelings of distress, such as life being too centered on work, advancing a career at the expense of personal or home life, and work demands affecting personal life.

Relationship with colleagues: Interpersonal relationship events that can cause feelings of distress to a person, such as a lack of support from uncooperative and incompetent colleagues.

Performance pressure: Work demands that cause emotional disturbance to a person, such as work overload, the short duration given to complete tasks, and doing high-risk tasks where any mistake can lead to disastrous consequences.

Job prospects: Events related to the reward and recognition given to an individual that cause feelings of distress, such as a lack of promotion prospects, a feeling of being underpaid, and a lack of recognition for the job.
Generating and Writing Items
This is a critical step in the development of a questionnaire: writing the items and considering the most
appropriate response format, resulting in what is commonly called an answer sheet19. Each item should
be written clearly and simply, avoiding double negatives and being as short as possible. To reduce
response bias, where someone tends to give the same answer to every item, reverse-phrase some of
the items (i.e., negative items) and do not forget to reverse-score them before any analysis.
The layout of the items should be simple and straightforward and should enable respondents to easily
connect their responses with the different options. This is important to ensure response process validity.
The following areas need to be considered when generating items:
i. Test content – a) use a grid-style blueprint for determining content areas and how these are
potentially manifested by people, b) get a small group to brainstorm a list of as many facets of an
attribute as possible, or c) include people who might be at the extremes of the attribute so that
you can identify item content that reflects the entire spectrum.
ii. Target population – it should be defined clearly.
iii. The kinds of items needed and their number – the items need to reflect all relevant aspects of the
attribute; for a relatively simple measure, it would be wise to aim for at least 10 items per
attribute at the development stage.
iv. Administration instructions – it is important that these are clearly developed, especially for a self-
reporting questionnaire. Clear instructions ensure that respondents understand what to do and how
to rate the questionnaire, hence strengthening response process validity.
v. Estimate the time limits, or the time required for completion – this depends on the kind of
measure, and it is wise to develop a questionnaire that requires less time to complete.
If possible, try to make it less than 15 minutes! The shorter the time required for completion, the
better the response rate.
vi. How scores should be calculated and interpreted – the simplest process might be to sum the
responses, though if you write some items that are negatively phrased compared to others, you
will need to reverse their scores before totalling. Providing a clear interpretation of the scores is
vital to ensure the consistency of their meaning across studies.
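The reverse-scoring rule in point vi can be sketched as follows. This is an illustrative snippet only: the item names, the set of negative items, and the choice of a 1-5 scale are all hypothetical assumptions.

```python
# Sketch of reverse-scoring negatively phrased items before totalling,
# assuming a 1-5 Likert scale; item names are hypothetical.
def reverse_score(value, scale_min=1, scale_max=5):
    """Map 1 -> 5, 2 -> 4, ... on the given scale."""
    return scale_min + scale_max - value

responses = {"Q1": 4, "Q2": 2, "Q3": 5}   # Q2 is negatively phrased (hypothetical)
negative_items = {"Q2"}

total = sum(reverse_score(v) if k in negative_items else v
            for k, v in responses.items())
print(total)  # 4 + (1 + 5 - 2) + 5 = 13
```

Summing without the reversal would understate the total for respondents who endorse the positively phrased items, which is exactly the bias the reversal corrects.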
Table 2 shows a sample of items potentially representing each factor of the GSQ20 based on
the literature review.
Table 2: The GSQ factors and their potential items.

Family:
- Inadequate preparation for dealing with more difficult aspects of family matters
- Insufficient knowledge in educating and building children's characters
- Poor communication and relationship with family members
- Poor relationship with spouse

Relationship with superiors:
- Lack of support from superior
- Difficulty in maintaining a relationship with superior
- My beliefs contradict those of my superior
- Unfair assessment from superior

Bureaucratic constraints:
- Lack of authority to carry out my job duties
- Unable to make full use of my skills and ability
- Cannot participate in decision making
- Having to do work outside of my competence

Work-family conflicts:
- Work demands affect my personal/home life
- Advancing a career at the expense of home/personal life
- My life is too centered on my work
- Absence of emotional support from family

Relationship with colleagues:
- Working with uncooperative colleagues
- Working with incompetent colleagues
- Relationship problems with colleagues/subordinates
- Competition among colleagues

Performance pressure:
- Time pressures and deadlines to meet
- Work overload
- Fear of making mistakes that can lead to serious consequences
- My work is mentally straining

Job prospects:
- Feeling insecure in this job
- Society does not think highly of my profession
- Lack of promotion prospects
- Feeling of being underpaid
Selecting Response Format
The response format should be selected based on the nature of the questionnaire1; for instance, the GSQ uses a
rating scale based on the following Likert scale: 0 = causing no stress at all, 1 = causing mild stress, 2 =
causing moderate stress, 3 = causing high stress, 4 = causing severe stress4. This response
format was selected because the GSQ assesses the stress level caused by each potential item, which is a stressor.
Response formats commonly used in questionnaires include dichotomous (e.g., yes/no) options and
Likert-type rating scales21; however, researchers can choose another response format to fit the nature
of their items. For choosing Likert-type scale response anchors, we recommend that researchers refer
to the article written by Vagias (2006)21.
QUESTIONNAIRE VALIDATION
Parameters for Selecting Items
Table 3 summarizes the parameters to be considered for selecting items.
Table 3: The summary of parameters for selecting items.

Content validity index: Content validation should be carried out to assess how representative and relevant the items are with respect to the construct of interest15.

Face validity index: Response process validation should be carried out to assess respondents' understanding and interpretation of items in the manner that the survey designer intends14,22.

Item percentage of response: Descriptive statistics should be computed for each item to assess the minimum and maximum rating; this ensures that the full range of the rating scale is utilized (e.g., 1 to 5).

Floor and ceiling effects: Descriptive statistics should be computed for each item to assess floor and ceiling effects, which occur when more than 15% of responses are at the lowest or the highest end of the scale.

Mean (standard deviation) or median (interquartile range): Descriptive statistics should be computed for each item to assess the distribution of responses per item. Items with a good distribution of responses are selected.

Factorial structure: Validation studies are carried out to assess the factor structure of the questionnaire using the following factor analyses: (i) Exploratory Factor Analysis (EFA) explores the potential factors that can be extracted from the validation data; (ii) Confirmatory Factor Analysis (CFA) tests the latent factors based on the validation data.

Internal consistency: Reliability analysis is performed to assess internal consistency, commonly presented as Cronbach's alpha coefficients.
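The descriptive screening described in Table 3 (the range of the rating scale actually used, and floor/ceiling effects above 15%) can be sketched as follows. The ratings, item, and scale bounds are hypothetical; the 15% threshold follows the table.

```python
# Sketch of per-item descriptive screening: range of responses used and
# floor/ceiling effects (>15% of responses at the lowest or highest point).
import numpy as np

def screen_item(ratings, scale_min=1, scale_max=5, threshold=0.15):
    ratings = np.asarray(ratings)
    floor = np.mean(ratings == scale_min)     # share of responses at the bottom
    ceiling = np.mean(ratings == scale_max)   # share of responses at the top
    return {
        "min": int(ratings.min()),
        "max": int(ratings.max()),
        "floor_effect": bool(floor > threshold),
        "ceiling_effect": bool(ceiling > threshold),
    }

# Hypothetical ratings for one item on a 1-5 scale; 7 of 10 are at the top
item = [5, 5, 4, 5, 3, 5, 5, 4, 5, 2]
print(screen_item(item))
```

An item flagged this way (here a ceiling effect, and a range that never reaches 1) would warrant review before being retained.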
The next sections will elaborate and demonstrate the processes of content validation, response process
validation, EFA, and reliability analysis.
Content Validity Index
As defined earlier, content validity is the degree to which the elements of an assessment instrument are
relevant to and representative of the targeted construct for a particular assessment purpose1,2, and the
content validity evidence can be represented by the content validity index (CVI)5-8. Establishing content
validity is vital to support the validity of an assessment tool such as a questionnaire, especially for
research purposes. This section describes the best practice for quantifying content validity using the
CVI, based on the following six steps of content validation15:
1. Preparing content validation form
2. Selecting a review panel of experts
3. Conducting content validation
4. Reviewing domains and items
5. Providing score on each item
6. Calculating CVI
1.0 Preparing content validation form
The first step of content validation is to prepare the content validation form so that the review panel of
experts will have clear expectations and a clear understanding of the task. An example of the instructions and
rating scale is provided in Figure CVI 1. The recommended rating scale of relevance is used for
scoring the individual items (Figure CVI 2). It is recommended to provide the definition of the domain to
facilitate the scoring process by the experts – please refer to Figure CVI 2 for an example.
Figure CVI 1: An example of instruction and rating scale in the content validation form to the experts
Figure CVI 2: An example of layout for content validation form with domain, its definition and items
represent (measure) the domain.
2.0 Selecting a Review Panel of Experts
The selection of individuals to review and critique an assessment tool (e.g., a questionnaire) is usually based
on their expertise in the topic to be studied. Table 4 summarizes the recommended
number of experts and its implication for the acceptable cut-off score of the CVI.
Table 4: The number of experts and its implication on the acceptable cut-off score of CVI

Two experts: acceptable CVI of at least 0.80 (Davis, 1992)
Three to five experts: CVI should be 1.00 (Polit & Beck, 2006; Polit et al., 2007)
At least six experts: at least 0.83 (Polit & Beck, 2006; Polit et al., 2007)
Six to eight experts: at least 0.83 (Lynn, 1986)
At least nine experts: at least 0.78 (Lynn, 1986)
In summary, the minimum acceptable number of experts for content validation is two; however,
most recommendations propose a minimum of six. Considering these recommendations and the
author's experience, the number of experts for content validation should be at least six and should not
exceed 10.
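The recommendations in Table 4 can be expressed as a simple lookup. The function below is an illustrative sketch only, not part of any published guideline; it follows the cut-offs of Davis (1992), Polit & Beck (2006) and Lynn (1986) as summarized in the table.

```python
# Hypothetical helper: acceptable CVI cut-off for a given panel size,
# following the sources summarized in Table 4.
def acceptable_cvi(n_experts):
    if n_experts < 2:
        raise ValueError("content validation needs at least two experts")
    if n_experts == 2:
        return 0.80   # Davis (1992)
    if n_experts <= 5:
        return 1.00   # Polit & Beck (2006): all experts must agree
    if n_experts <= 8:
        return 0.83   # Lynn (1986); Polit & Beck (2006)
    return 0.78       # Lynn (1986): nine or more experts

print(acceptable_cvi(10))  # 0.78
```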
3.0 Conducting Content Validation
Content validation can be conducted through a face-to-face or non-face-to-face approach. For the
face-to-face approach, an expert panel meeting is organised, and the researcher facilitates the content
validation process through Steps 4.0 and 5.0 (described later). For the non-face-to-face
approach, an online content validation form is usually sent to the experts with clear instructions
(Figure CVI 1) to facilitate the content validation process. The most important factors to be
considered are cost, time, and response rate. Cost and time can be challenging for the
face-to-face approach because of the difficulty of bringing all the experts together, but the response rate
will be at its highest. Response rate and time can be challenging for the non-face-to-face
approach because of the difficulty of getting responses on time, and there is a risk of getting no response
at all from an expert; however, cost saving is its biggest advantage. Nevertheless, based on the author's
experience, the non-face-to-face approach is very efficient if a systematic follow-up is in place to improve
the response rate and time.
4.0 Reviewing Domain and Items
In the content validation form, the definition of the domain and the items representing the domain are clearly
provided to the experts, as shown in Figure CVI 2. The experts are requested to critically review the
domain and its items before providing a score for each item. The experts are encouraged to provide verbal
or written comments to improve the relevance of the items to the targeted domain. All comments
are taken into consideration to refine the domain and its items.
5.0 Providing score on each item
Upon completing the review of the domain and items, the experts are requested to score each
item independently based on the relevance scale (Figure CVI 1 and Figure CVI 2). The experts are required to
submit their responses to the researcher once they have scored all the items.
6.0 Calculating CVI
Table 5: The definition and formula of I-CVI, S-CVI/Ave and S-CVI/UA

I-CVI (item-level content validity index): The proportion of content experts giving an item a relevance rating of 3 or 4.
Formula: I-CVI = (number of experts in agreement) / (number of experts)

S-CVI/Ave (scale-level content validity index based on the average method): The average of the I-CVI scores for all items on the scale, or the average of the proportion relevance judged by all experts, where the proportion relevance is the average of the relevance ratings by an individual expert.
Formula: S-CVI/Ave = (sum of I-CVI scores) / (number of items), or
S-CVI/Ave = (sum of proportion relevance ratings) / (number of experts)

S-CVI/UA (scale-level content validity index based on the universal agreement method): The proportion of items on the scale that achieve a relevance rating of 3 or 4 from all experts. The universal agreement (UA) score is 1 when an item achieves 100% expert agreement; otherwise, the UA score is 0.
Formula: S-CVI/UA = (sum of UA scores) / (number of items)
** The definitions and formulas are based on the recommendations of Lynn (1986), Davis (1992),
Polit & Beck (2006) and Polit et al. (2007)
There are two forms of CVI: the CVI for items (I-CVI) and the CVI for the scale (S-CVI). There are two
methods of calculating the S-CVI: the average of the I-CVI scores for all items on the scale (S-CVI/Ave) and
the proportion of items on the scale that achieve a relevance rating of 3 or 4 from all experts (S-CVI/UA)6.
The definitions and formulas of the CVI indices are summarised in Table 5.
Table 6: The relevance ratings on the item scale by ten experts
Prior to the calculation of CVI, each relevance rating must be recoded as 1 (relevance rating of 3 or 4) or 0
(relevance rating of 1 or 2), as shown in Table 6. To illustrate the calculation of the different CVI indices, the
relevance ratings given to the items by ten experts are provided in Table 6.
To illustrate the calculation for the CVI indices (please refer to Table 5), the following are examples of
calculation based on the data provided in Table 6:
• Experts in agreement: sum the recoded ratings provided by all experts for each item; for
example, the experts in agreement for Q2 = 1 + 0 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 = 9.
• Universal agreement: a score of 1 is assigned to an item that achieves 100% agreement among
experts; for example, Q1 obtains 1 because all the experts gave a recoded rating of 1, while Q2
obtains 0 because not all the experts gave a recoded rating of 1.
• I-CVI: the experts in agreement divided by the number of experts; for example, the I-CVI of Q2 is 9
divided by 10 experts, which equals 0.9.
• S-CVI/Ave (based on I-CVI): the average of the I-CVI scores across all items; for example, the S-
CVI/Ave = (1 + 0.9 + 0 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1)/12 = 0.91.
• S-CVI/Ave (based on proportion relevance): the average of the proportion-relevance scores across
all experts; for example, the S-CVI/Ave = (0.92 + 0.83 + 0.92 + 0.92 + 0.92 + 0.92 + 0.92 + 0.92 +
0.92 + 0.92)/10 = 0.91.
• S-CVI/UA: the average of the UA scores across all items; for example, the S-CVI/UA = (1 + 0 + 0 + 1
+ 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1)/12 = 0.83.
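The worked calculations above can be sketched in a few lines of Python. This is an illustrative helper, not part of the module, and it assumes the ratings have already been recoded to 1 and 0; the example matrix mirrors the pattern of the worked example (one disagreement on Q2, no agreement at all on Q3).

```python
import numpy as np

def cvi_indices(ratings):
    """CVI indices from a recoded (experts x items) matrix of 1s and 0s."""
    ratings = np.asarray(ratings)
    n_experts = ratings.shape[0]
    # I-CVI: experts in agreement divided by the number of experts
    i_cvi = ratings.sum(axis=0) / n_experts
    # S-CVI/Ave: the average of the I-CVI scores across all items
    s_cvi_ave = i_cvi.mean()
    # S-CVI/UA: proportion of items rated relevant by every expert
    s_cvi_ua = (i_cvi == 1.0).mean()
    return i_cvi, s_cvi_ave, s_cvi_ua

# Hypothetical recoded ratings: 10 experts, 12 items
ratings = np.ones((10, 12), dtype=int)
ratings[1, 1] = 0   # one expert rated Q2 as not relevant
ratings[:, 2] = 0   # all experts rated Q3 as not relevant
i_cvi, s_cvi_ave, s_cvi_ua = cvi_indices(ratings)
# i_cvi[1] = 0.9, s_cvi_ave ≈ 0.91, s_cvi_ua ≈ 0.83
```

The same function applies unchanged to any panel size, since the three indices depend only on the recoded rating matrix.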
Based on the above calculations, the I-CVI, S-CVI/Ave and S-CVI/UA all meet the satisfactory level, and
thus the questionnaire scale has achieved a satisfactory level of content validity. For more examples of how
to report the content validity index, please refer to the papers by Hadie et al. (2017)4, Ozair et al. (2017)9,
Lau et al. (2018)10 and Marzuki et al. (2018)11.
Content validity is vital to the overall validity of an assessment; therefore, content validation should be
conducted systematically, based on evidence and best practice. This module has provided a systematic
and evidence-based approach to conducting proper content validation.
Face Validity Index
Commonly, response process validity evidence is gathered after content validity has been established13-15.
Response process validity is also known as face validity, which refers to the degree to which test
respondents view the content of a test and its items as relevant to the context in which the test is
administered16. Similarly, other researchers define face validity as the degree to which raters judge the
items of an assessment instrument to be appropriate to the targeted construct and assessment
objectives17,18. The raters for face validity include (i) the people who actually take the test, (ii) the
non-professional users who work with the test results, and (iii) the general public18. In other words, the
people involved in taking the test should do the rating; they cannot be replaced by professionals,
experts or psychometricians18. The raters' understanding and interpretation of the items determine
the accuracy of an assessment tool in measuring the targeted construct. People with similar
backgrounds rate the face validity of a test similarly, and they rate the face validity of different tests differently18.
Because of the many concerns about the face validity concept, Cook & Beckman (2006) avoided using
the term face validity7; instead, researchers use response process evidence of validity as the term to
reflect the thought processes of users of an assessment as they respond to the tool1,4, and it can be
quantified by the face validity index (FVI)4,9-11. Response processes are commonly evaluated in terms of
the clarity and comprehensibility of the instructions and language used in the assessment tool, as judged
by the raters1,4. Clarity of the instructions and language refers to whether there are ambiguities or multiple
ways to interpret the items, whereas comprehensibility refers to whether the words and sentences of
the constructed items can be understood easily by raters. It is important to establish response process
validity to support the overall validity of an assessment tool such as a questionnaire, especially for
research purposes. Response process validity can be represented by the FVI, and several studies4,9-11
have calculated it to support the validity of an assessment tool. This section describes best practice for
performing response process validation and calculating the FVI, based on the following six steps:
1. Preparing the response process validation form
2. Selecting a panel of raters
3. Conducting response process validation
4. Reviewing items for clarity and comprehension
1. Preparing response process validation form
2. Selecting a panel of raters
3. Conducting response process validation
4. Reviewing items for clarity and comprehension
5. Providing score on each item based on the clarity and comprehensibility rating scale
6. Calculating FVI
1.0 Preparing response process validation form
The first step of response process validation (also known as face validation) is to prepare the
response process validation form, to ensure that the panel of raters, who are the intended respondents,
have clear expectations and understanding of the task. An example of the instruction and rating scale
is provided in Figure FVI 1. The rating scales of clarity and comprehension have been used for scoring
individual items5-8 (Figure FVI 2).
Figure FVI 1: An example of the instruction and rating scale in the response process validation form given
to the raters, in this case students.
Figure FVI 2: An example of the layout of the response process validation form with the domain and items.
2.0 Selecting a panel of raters
The selection of raters to review and critique an assessment tool (e.g., a questionnaire) is usually based
on the target users of the tool, for example students, the public or teachers. Table 7 summarises
the number of raters and its implication for the acceptable cut-off score of the FVI, based on
previous studies4,9-11,22-25.
Table 7: The number of raters and its implication on the acceptable cut-off score of FVI

Source of study       Number of raters                   Acceptable FVI value   Method
Hadie et al (2017)    30 medical students                At least 0.80          Face-to-face survey
Ozair et al (2017)    30 paramedics                      At least 0.83          Face-to-face survey
Lau et al (2017)      30 parents of pre-school children  At least 0.80          Face-to-face survey
Lau et al (2018)      30 parents of pre-school children  At least 0.80          Face-to-face survey
Marzuki et al (2018)  10 users of medical apps           At least 0.83          Online survey
Chin et al (2018)     32 medical students                At least 0.80          Online survey
Mahadi et al (2018)   32 medical students                At least 0.80          Online survey
It can be concluded that, for response process validation, the minimum acceptable number of raters is 10;
however, most studies administered the form to at least 30 raters. Considering the previous studies (Table
7) and the author's experience, the number of raters for response process validation should not be fewer
than 10.
3.0 Conducting Response Process Validation
Response process validation can be conducted through a face-to-face or an online survey (Table 7). For
the face-to-face survey, the researcher facilitates the validation process by holding a meeting with the
raters, followed by Step 4 and Step 5 (elaborated below). For the online survey, an online response
process validation form with clear instructions (Figure FVI 1) is sent to the raters to facilitate the validation
process. Based on the author's experience, the face-to-face approach is very efficient at increasing the
response rate, whereas the online survey is efficient in terms of cost and time.
4.0 Reviewing items for clarity and comprehension
In the response process validation form, the domain and its items are provided to the raters as shown in
Figure FVI 2. The raters are requested to review all items before scoring each item. The raters are
encouraged to provide verbal or written comments to improve the clarity and comprehension of the items.
All comments are taken into consideration when refining the items.
5.0 Providing score on each item based on the clarity and comprehensibility rating scale
Upon completing the review of all items, the raters are requested to score each item independently based
on the clarity and comprehension scale (Figure FVI 1 and Figure FVI 2). The raters are required to submit
their responses to the researcher once they have scored all items.
6.0 Calculating FVI
Table 8: The definition and formula of I-FVI, S-FVI/Ave and S-FVI/UA

I-FVI (item-level face validity index)
Definition: the proportion of raters giving the item a clarity and comprehension rating of 3 or 4.
Formula: I-FVI = (raters in agreement) / (number of raters)

S-FVI/Ave (scale-level face validity index based on the average method)
Definition: the average of the I-FVI scores for all items on the scale, or the average of the proportion of items judged clear and comprehensible by the raters; the proportion is the average of the ratings given by an individual rater.
Formula: S-FVI/Ave = (sum of I-FVI scores) / (number of items), or S-FVI/Ave = (sum of proportion clarity and comprehension ratings) / (number of raters)

S-FVI/UA (scale-level face validity index based on the universal agreement method)
Definition: the proportion of items on the scale that achieve a clarity and comprehension rating of 3 or 4 from all raters. The universal agreement (UA) score is 1 when the item achieves 100% agreement among raters; otherwise the UA score is 0.
Formula: S-FVI/UA = (sum of UA scores) / (number of items)
** The definitions and formulas are based on the content validity index formulas reported in Yusoff (2019)15
There are two forms of FVI: the FVI for items (I-FVI) and the FVI for the scale (S-FVI). There are two
methods of calculating the S-FVI: the average of the I-FVI scores for all items on the scale (S-FVI/Ave) and
the proportion of items on the scale that achieve a clarity and comprehension rating of 3 or 4 from all
raters (S-FVI/UA). The definitions and formulas of the FVI indices are summarised in Table 8.
Prior to the calculation of FVI, the clarity and comprehension rating must be recoded as 1 (the scale of 3
or 4) or 0 (the scale of 1 or 2) as shown in Table 9. To illustrate the calculation of different FVI indices,
the clarity and comprehension ratings on item scale by 10 raters are provided in Table 9.
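The recoding step itself is a one-liner. A minimal sketch with hypothetical ratings (raters x items, on the 1-4 scale):

```python
import numpy as np

# Hypothetical raw clarity and comprehension ratings (raters x items);
# a rating of 3 or 4 is recoded to 1, a rating of 1 or 2 to 0.
raw = np.array([[4, 3, 1],
                [3, 2, 4]])
recoded = (raw >= 3).astype(int)
print(recoded)   # [[1 1 0]
                 #  [1 0 1]]
```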
Table 9: The clarity and comprehension ratings on the item scale by 10 raters
To illustrate the calculation of the FVI indices (please refer to Table 8), the following are examples of
calculations based on the data provided in Table 9:
• Raters in agreement: sum the recoded ratings provided by all raters for each item; for
example, the raters in agreement for Q2 = 1 + 0 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 = 9.
• Universal agreement: a score of 1 is assigned to an item that achieves 100% agreement among
raters; for example, Q1 obtains 1 because all the raters gave a recoded rating of 1, while Q2
obtains 0 because not all raters gave a recoded rating of 1.
• I-FVI: the raters in agreement divided by the number of raters; for example, the I-FVI of Q2 is 9
divided by 10 raters, which equals 0.9.
• S-FVI/Ave (based on I-FVI): the average of the I-FVI scores across all items; for example, the S-
FVI/Ave = (1 + 0.9 + 0 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1)/12 = 0.91.
• S-FVI/Ave (based on proportion clarity and comprehension): the average of the proportion clarity
and comprehension scores across all raters; for example, the S-FVI/Ave = (0.92 + 0.83 + 0.92 +
0.92 + 0.92 + 0.92 + 0.92 + 0.92 + 0.92 + 0.92)/10 = 0.91.
• S-FVI/UA: the average of the UA scores across all items; for example, the S-FVI/UA = (1 + 0 + 0 + 1
+ 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1)/12 = 0.83.
Based on the above calculations, the I-FVI, S-FVI/Ave and S-FVI/UA all meet the satisfactory level, and
thus the questionnaire scale has achieved a satisfactory level of response process validity. For more
examples of how to report the response process validity index, please refer to the papers by Hadie et al.
(2017), Ozair et al. (2017), Lau et al. (2017), Lau et al. (2018), Marzuki et al. (2018), Chin et al. (2018)
and Mahadi et al. (2018).
Response process validity is vital to the overall validity of an assessment; therefore, response process
validation should be conducted systematically, based on the best evidence. This module has provided
a systematic and evidence-based approach to conducting proper response process validation through
the face validity index.
Exploratory Factor Analysis
The primary purpose of exploratory factor analysis (EFA) is to explore the number of constructs that can
be extracted from a data set26. This section illustrates the key steps of running EFA in SPSS version 26;
the SPSS data set can be downloaded from https://tinyurl.com/yb9wup7e. The data set must
be checked and cleaned of incomplete data before running EFA.
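The completeness check can also be done with a short script before the data reach SPSS. The data frame below is a hypothetical stand-in for the module's data set, used only to show the pattern:

```python
import pandas as pd

# Hypothetical item responses with one incomplete case
df = pd.DataFrame({"q01": [3, 4, None, 2],
                   "q02": [2, 2, 5, 1]})
print(df.isna().sum())      # number of missing responses per item
complete = df.dropna()      # listwise deletion of incomplete cases
print(len(complete))        # 3 complete cases remain
```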
1.0 Running EFA
Figure EFA 1: Getting into EFA
Open the data set for EFA, click on the ‘Analyze’ tab, scroll over ‘Dimension Reduction’, and click the
‘Factor’ menu (Figure EFA 1). The following display will appear on the screen (Figure EFA 2).
Figure EFA 2: The main dialog box of Factor Analysis
Figure EFA 3: Selection of all items into the ‘Variables’ box
Select all items into the ‘Variables’ box and click on the ‘Descriptives’ menu (Figure EFA 3). The
following display will appear (Figure EFA 4). Tick the ‘Univariate descriptives’, ‘Initial solution’, and
‘KMO and Bartlett’s test of sphericity’ options as shown in Figure EFA 4. Click on the ‘Continue’ menu
to proceed to the next step.
Figure EFA 4: The factor analysis descriptives screen
Figure EFA 5: The main EFA displays and the ‘Extraction’ menu
Click on the ‘Extraction’ menu (Figure EFA 5). The following display will appear (Figure EFA 6). Tick the
‘Unrotated factor solution’ and ‘Scree plot’ options as shown in Figure EFA 6.
Figure EFA 6: The factor analysis extraction display
In the ‘Extract’ column, researchers can either let EFA suggest the number of factors to extract based on
the eigenvalues (the default threshold is 1) or specify the exact number of factors to be extracted in the
‘Factors to extract’ box (Figure EFA 6). Click on the ‘Continue’ menu to proceed to the next step.
Figure EFA 7: The main EFA displays and the ‘Rotation’ menu
Click on the ‘Rotation’ menu (Figure EFA 7); the display shown in Figure EFA 8 will appear.
The purpose of rotation is to optimise the factor loading of each item across the extracted factors. There
are two types of rotation: orthogonal and oblique. Orthogonal rotations assume all factors are independent
of each other (not overlapping), while oblique rotations assume the factors are correlated with each
other (overlapping). Researchers are recommended to choose the rotation based on their assumptions
about the factors to be extracted. In this example, the ‘Maximum Iterations for Convergence’ value was set
at 30, whereas the SPSS default is 25; the higher value was chosen because of the large sample size
(more than 2,500), which requires more iterations to optimise the factor loadings across the extracted
factors. Click on the ‘Continue’ menu to proceed to the next step.
Figure EFA 8: The Factor Analysis Rotation menu
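For readers who want to see what an orthogonal rotation does numerically, the varimax criterion can be implemented in a few lines. This is a generic SVD-based sketch of the algorithm, not SPSS's internal code:

```python
import numpy as np

def varimax(loadings, max_iter=30, tol=1e-6):
    """Varimax (orthogonal) rotation of a p x k factor loading matrix."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)            # accumulated rotation matrix
    var_old = 0.0
    for _ in range(max_iter):
        LR = L @ R
        # SVD of the gradient of the varimax criterion
        u, s, vt = np.linalg.svd(
            L.T @ (LR ** 3 - LR @ np.diag((LR ** 2).sum(axis=0)) / p))
        R = u @ vt
        if s.sum() < var_old * (1 + tol):
            break            # criterion no longer improving
        var_old = s.sum()
    return L @ R
```

Because the rotation is orthogonal, each item's communality (the row sum of squared loadings) is unchanged; only the distribution of loadings across the factors is optimised.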
Figure EFA 9: The main EFA displays and the ‘Options’ menu
Figure EFA 10: The Factor Analysis Options menu
Click on the ‘Options’ menu (Figure EFA 9); the display shown in Figure EFA 10 will appear. In the
‘Coefficient Display Format’ column, tick the ‘Sorted by size’ and ‘Suppress small coefficients’ options
(Figure EFA 10). Insert a value of 0.40 in the ‘Absolute value below’ box and click on the ‘Continue’
menu to proceed to the next step.
2.0 Interpretation of EFA Outputs
Several outputs appear after running EFA. The descriptive statistics provide information on several
parameters, including the mean, standard deviation, and number of responses for each item (Figure
EFA 11). This output gives an idea of the extent of data completeness. The ‘KMO and Bartlett's Test’
output appears next (Figure EFA 12).
Figure EFA 11: The Descriptive Statistics output
Figure EFA 12: The KMO and Bartlett’s Test output
The KMO and Bartlett’s Test output (Figure EFA 12) provides parameters that indicate sampling
adequacy and the appropriateness of factor analysis. A ‘Kaiser-Meyer-Olkin Measure of Sampling
Adequacy’ value of more than 0.7 is considered a good level of factor distinction. A significant
‘Bartlett’s Test of Sphericity’ (p-value less than 0.05) indicates that factor analysis is appropriate. If the
value of the ‘Kaiser-Meyer-Olkin Measure of Sampling Adequacy’ is less than 0.5, researchers should
consider collecting more samples.
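Both statistics can also be computed directly from a raw data matrix. This is an illustrative implementation of the standard formulas, assuming complete cases and a non-singular correlation matrix:

```python
import numpy as np
from scipy import stats

def bartlett_sphericity(X):
    """Bartlett's test that the item correlation matrix is an identity."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return chi2, df, stats.chi2.sf(chi2, df)

def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(X, rowvar=False)
    inv_R = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(inv_R), np.diag(inv_R)))
    partial = -inv_R / d      # anti-image (partial) correlations
    r2, q2 = R ** 2, partial ** 2
    np.fill_diagonal(r2, 0)   # exclude the diagonal from both sums
    np.fill_diagonal(q2, 0)
    return r2.sum() / (r2.sum() + q2.sum())
```

The KMO value is high when the squared partial correlations are small relative to the squared zero-order correlations, which is exactly what a factorable correlation matrix looks like.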
Figure EFA 13: The Communalities output
The communality (extraction) is the sum of the squared factor loadings of an item across the extracted
factors, representing the proportion of variance in the observed variable that is explained by the factors.
It reflects the contribution of the item to the extracted factors. Each communality (extraction) should be at
least 0.70 and the average of the communalities should be at least 0.6026. In this example, only one item
exceeds 0.70 and the average of the communality values is less than 0.60; however, considering the large
sample, it is probably safe to rely on Kaiser’s criterion.
Figure EFA 14: The Total Variance Explained output
The ‘Total Variance Explained’ output (Figure EFA 14) proposes the number of factors that can be
extracted based on the eigenvalues (in this example, the threshold was set at 1 and higher). The ‘Initial
Eigenvalues’ column lists all the factors that could potentially be formed, regardless of their eigenvalues
(before extraction). The ‘Extraction Sums of Squared Loadings’ column provides the number of factors
that can be extracted based on an eigenvalue of at least 1 (after extraction); it shows that four factors
have eigenvalues of at least 1. The third column, ‘Rotation Sums of Squared Loadings’, provides the
results after varimax rotation: the total eigenvalues after the varimax rotation have been optimally
distributed across the four factors. This information is important for researchers in deciding whether to
choose the ‘Component Matrix’ (Figure EFA 15) or the ‘Rotated Component Matrix’ (Figure EFA 16).
In this case, because the eigenvalues have been optimally distributed across the four factors, the ‘Rotated
Component Matrix’ is chosen as the final factorial structure. The detailed output of the ‘Rotated
Component Matrix’ is displayed in Figure EFA 17.
Figure EFA 15: The Component Matrix output
Figure EFA 16: The Rotated Component Matrix output
Figure EFA 17: The detail output of the Rotated Component Matrix
Factor loadings of more than 0.40 are displayed under their respective factor (component) as shown in
Figure EFA 17. The analysis proposes that four factors represent the questionnaire.
Rotated Component Matrix

Component 1
  q06 I have little experience of computers (.800)
  q18 SPSS always crashes when I try to use it (.684)
  q13 I worry that I will cause irreparable damage because of my incompetence with computers (.647)
  q07 All computers hate me (.638)
  q14 Computers have minds of their own and deliberately go wrong whenever I use them (.579)
  q10 Computers are useful only for playing games (.550)
  q15 Computers are out to get me (.459)

Component 2
  q20 I can't sleep for thoughts of eigenvectors (.677)
  q21 I wake up under my duvet thinking that I am trapped under a normal distribution (.661)
  q03 Standard deviations excite me (.567)
  q12 People try to tell you that SPSS makes statistics easier to understand but it doesn't (.523; cross-loads .473 on Component 1)
  q04 I dream that Pearson is attacking me with correlation coefficients (.516)
  q16 I weep openly at the mention of central tendency (.514)
  q01 Statistics makes me cry (.496)
  q05 I don't understand statistics (.429)

Component 3
  q08 I have never been good at mathematics (.833)
  q17 I slip into a coma whenever I see an equation (.747)
  q11 I did badly at mathematics at school (.747)

Component 4
  q09 My friends are better at statistics than me (.648)
  q22 My friends are better at SPSS than I am (.645)
  q23 If I'm good at statistics my friends will think I'm a nerd (.586)
  q02 My friends will think I'm stupid for not being able to cope with SPSS (.543)
  q19 Everybody looks at me when I use SPSS (.428)

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 8 iterations.
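The arithmetic behind these outputs (eigenvalues, Kaiser's criterion, loadings, and communalities) can be reproduced from the item correlation matrix. A minimal sketch using a synthetic two-factor data set, not the module's data:

```python
import numpy as np

def pca_extract(X, kaiser=1.0):
    """PCA on the item correlation matrix; keep eigenvalues > kaiser."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > kaiser                  # Kaiser's criterion
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    communalities = (loadings ** 2).sum(axis=1)
    return eigvals, loadings, communalities

# Synthetic data: items 0-2 load on one factor, items 3-5 on another
rng = np.random.default_rng(1)
f1 = rng.normal(size=(1000, 1))
f2 = rng.normal(size=(1000, 1))
X = np.hstack([f1 + 0.6 * rng.normal(size=(1000, 3)),
               f2 + 0.6 * rng.normal(size=(1000, 3))])
eigvals, loadings, communalities = pca_extract(X)
print(loadings.shape[1])      # two components retained
```

An orthogonal rotation such as varimax would then be applied to these unrotated loadings before interpreting the factor structure, as in the SPSS walkthrough above.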
Reliability Analysis
Once the items have been assigned to their factors, reliability analysis is performed to
determine the internal consistency of each factor. This section illustrates the key steps of reliability
analysis using SPSS version 26.
1.0 Running Reliability Analysis
Figure RA 1: The reliability analysis menu
Open the data set used for EFA, click on the ‘Analyze’ tab, scroll over ‘Scale’, and click the ‘Reliability
Analysis’ menu (Figure RA 1). The following display will appear on the screen (Figure RA 2).
Figure RA 2: The reliability analysis interface
Select the items belonging to a specific factor (in this case, all items of factor 1) and move them to the
‘Items’ box as shown in Figure RA 3. Cronbach's alpha is the most common coefficient used to
determine the level of internal consistency. Click on the ‘Statistics’ menu for the next step.
Figure RA 3: The transfer of the selected items for a specific factor to the ‘Items’ box
Figure RA 4: The Reliability Analysis Statistic display
Tick the ‘Item’, ‘Scale’, and ‘Scale if item deleted’ options as shown in Figure RA 4. Then click
‘Continue’ and ‘OK’.
2.0 Interpretation of Reliability Analysis Outputs
Figure RA 5 shows the reliability analysis output. It shows the total number of valid
cases and the Cronbach's alpha value for the six items of factor 1; in this case, it was 0.814.
Figure RA 5: The reliability output interface
Figure RA 6: The Item-Total Statistics output
The ‘Item-Total Statistics’ output provides parameters that suggest the contribution of each item to the
factor. The ‘Corrected Item-Total Correlation’ values should be at least 0.30 to indicate an acceptable level
of contribution to the factor. The ‘Cronbach’s Alpha if Item Deleted’ value is another parameter indicating
the contribution of each item to the factor. If the ‘Cronbach’s Alpha if Item Deleted’ value is lower than the
overall Cronbach's alpha (in this case 0.814), the item contributes substantially to the reliability of the
factor and should be retained. If the value is higher, the item contributes less to the reliability and can
be removed. Repeat the above steps for the remaining factors. Generally, an internal
consistency of 0.70 or greater is preferred27.
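Cronbach's alpha and the 'alpha if item deleted' column follow directly from the item and total-score variances. A minimal illustrative sketch, not SPSS's code:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a (cases x items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total score
    return k / (k - 1) * (1 - item_vars / total_var)

def alpha_if_item_deleted(items):
    """Recompute alpha with each item removed in turn (needs k >= 3)."""
    items = np.asarray(items, dtype=float)
    return np.array([cronbach_alpha(np.delete(items, j, axis=1))
                     for j in range(items.shape[1])])
```

An item whose deletion raises alpha above the full-scale value is contributing little to the internal consistency of the factor, mirroring the interpretation of the SPSS output above.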
Reporting EFA and Reliability Analysis Results
Once the EFA and reliability analysis have been completed, the following is the recommended way to
report the results:
Table 10: Summary of EFA results for the questionnaire (N = 2571)

Factor 1 (eigenvalue 3.730; 16.21% of variance; Cronbach's alpha 0.81)
  Q06 I have little experience of computers (0.800)
  Q18 SPSS always crashes when I try to use it (0.684)
  Q13 I worry that I will cause irreparable damage because of my incompetence with computers (0.647)
  Q07 All computers hate me (0.638)
  Q14 Computers have minds of their own and deliberately go wrong whenever I use them (0.579)
  Q10 Computers are useful only for playing games (0.550)
  Q15 Computers are out to get me (0.459)

Factor 2 (eigenvalue 3.340; 14.52% of variance; Cronbach's alpha 0.82)
  Q20 I can't sleep for thoughts of eigenvectors (0.677)
  Q21 I wake up under my duvet thinking that I am trapped under a normal distribution (0.661)
  Q03 Standard deviations excite me (0.567)
  Q12 People try to tell you that SPSS makes statistics easier to understand but it doesn't (0.523)
  Q04 I dream that Pearson is attacking me with correlation coefficients (0.516)
  Q16 I weep openly at the mention of central tendency (0.514)
  Q01 Statistics makes me cry (0.496)
  Q05 I don't understand statistics (0.429)

Factor 3 (eigenvalue 2.553; 11.09% of variance; Cronbach's alpha 0.82)
  Q08 I have never been good at mathematics (0.833)
  Q17 I slip into a coma whenever I see an equation (0.747)
  Q11 I did badly at mathematics at school (0.747)

Factor 4 (eigenvalue 1.850; 8.47% of variance; Cronbach's alpha 0.57)
  Q09 My friends are better at statistics than me (0.648)
  Q22 My friends are better at SPSS than I am (0.645)
  Q23 If I'm good at statistics my friends will think I'm a nerd (0.586)
  Q02 My friends will think I'm stupid for not being able to cope with SPSS (0.543)
  Q19 Everybody looks at me when I use SPSS (0.428)
For this example, we might write something like this:
A principal component analysis (PCA) was conducted on the 23 items with orthogonal rotation
(varimax). The Kaiser–Meyer–Olkin measure verified the sampling adequacy for the analysis,
KMO = .93, which is well above the acceptable limit of 0.5. Bartlett's test of sphericity, χ²(253) =
19334.49, p < 0.001, indicated that the correlations between items were sufficiently large for PCA. An
initial analysis was run to obtain the eigenvalues for each component in the data. Four components
had eigenvalues over Kaiser's criterion of 1 and in combination explained 50.32% of the variance.
Given the large sample size and Kaiser's criterion of four components, four components were
retained in the final analysis. Table 10 shows the factor loadings after rotation. The items that
cluster on the same components suggest that Factor 1 represents a fear of computers, Factor 2
represents a fear of statistics, Factor 3 represents a fear of maths, and Factor 4 represents peer
evaluation concerns.
CONCLUSION
It is hoped that this short module is useful and helpful for anyone developing and validating a
questionnaire. This module does not cover confirmatory factor analysis (CFA), but CFA can always be
learned from other modules that are available in the market.
REFERENCE
1. Cook DA, Beckman TJ. Current concepts in validity and reliability for psychometric instruments:
theory and application. The American Journal of Medicine. 2006;119(2):166.e7-166.e16.
2. Haynes SN, Richard D, Kubany ES. Content validity in psychological assessment: A functional
approach to concepts and methods. Psychological assessment. 1995;7(3):238.
3. Yusoff MSB. A systematic review on validity evidence of medical student stressor questionnaire.
Education in Medicine Journal. 2017;9(1):1-16.
4. Hadie SNH, Hassan A, Ismail ZIM, Asari MA, Khan AA, Kasim F, et al. Anatomy education
environment measurement inventory: A valid tool to measure the anatomy learning environment.
Anatomical sciences education. 2017;10(5):423-32.
5. Davis LL. Instrument review: Getting the most from a panel of experts. Applied nursing research.
1992;5(4):194-7.
6. Polit DF, Beck CT. The content validity index: are you sure you know what's being reported? Critique
and recommendations. Research in Nursing & Health. 2006;29(5):489-97.
7. Polit DF, Beck CT, Owen SV. Is the CVI an acceptable indicator of content validity? Appraisal and
recommendations. Research in Nursing & Health. 2007;30(4):459-67.
8. Lynn MR. Determination and quantification of content validity. Nursing Research. 1986;35(6):381-5.
9. Ozair MM, Baharuddin KA, Mohamed SA, Esa W, Yusoff MSB. Development and Validation of the
Knowledge and Clinical Reasoning of Acute Asthma Management in Emergency Department (K-
CRAMED). Education in Medicine Journal. 2017;9(2):1-17.
10. Lau AS, Yusoff MS, Lee Y-Y, Choi S-B, Xiao J-Z, Liong M-T. Development and validation of a
Chinese translated questionnaire: A single simultaneous tool for assessing gastrointestinal and upper
respiratory tract related illnesses in pre-school children. Journal of Taibah University Medical
Sciences. 2018;13(2):135-41.
11. Marzuki MFM, Yaacob NA, Yaacob NM. Translation, cross-cultural adaptation, and validation of
the Malay version of the System Usability Scale questionnaire for the assessment of mobile apps.
JMIR Human Factors. 2018;5(2):e10308.
12. American Educational Research Association, American Psychological Association, National Council
on Measurement in Education. Standards for Educational and Psychological Testing. Washington,
DC: American Educational Research Association; 1999.
13. Mosier CI. A critical examination of the concepts of face validity. Educational and Psychological
Measurement. 1947;7(2):191-205.
14. Artino AR Jr, La Rochelle JS, Dezee KJ, Gehlbach H. Developing questionnaires for
educational research: AMEE Guide No. 87. Medical Teacher. 2014;36(6):463-74.
15. Yusoff MSB. ABC of content validation and content validity index calculation. Education in Medicine
Journal. 2019;11(2):49-54.
16. Holden RR. Face validity. The Corsini Encyclopedia of Psychology. 2010:1-2.
17. Hardesty DM, Bearden WO. The use of expert judges in scale development: Implications for
improving face validity of measures of unobservable constructs. Journal of Business Research.
2004;57(2):98-107.
18. Nevo B. Face validity revisited. Journal of Educational Measurement. 1985;22(4):287-93.
19. Coaley K. An Introduction to Psychological Assessment and Psychometrics. London: Sage; 2010.
20. Yusoff MSB, Esa AR. The reliability and validity of the General Stressor Questionnaire (GSQ)
among house officers. International Medical Journal. 18(3):179-82.
21. Vagias WM. Likert-type scale response anchors. Clemson International Institute for Tourism &
Research Development, Department of Parks, Recreation and Tourism Management, Clemson
University. 2006.
22. Yusoff MSB. ABC of Response Process Validation and Face Validity Index Calculation. Education in
Medicine Journal. 2019;11(3):55-61.
23. Lau AS-Y, Yusoff MSB, Lee YY, Choi S-B, Rashid F, Wahid N, et al. Development, Translation and
Validation of Questionnaires for Diarrhea and Respiratory-related Illnesses during Probiotic
Administration in Children. Education in Medicine Journal. 2017;9(4).
24. Chin RWA, Chua YY, Chu MN, Mahadi NF, Wong MS, Yusoff MS, et al. Investigating validity
evidence of the Malay translation of the Copenhagen Burnout Inventory. Journal of Taibah University
Medical Sciences. 2018;13(1):1-9.
25. Mahadi NF, Chin RWA, Chua YY, Chu MN, Wong MS, Yusoff MSB, et al. Malay Language
Translation and Validation of the Oldenburg Burnout Inventory Measuring Burnout. Education in
Medicine Journal. 2018;10(2).
26. Field A. Discovering Statistics Using SPSS. Los Angeles: SAGE Publications; 2009.
27. Cortina JM. What is coefficient alpha? An examination of theory and applications. Journal of
Applied Psychology. 1993;78(1):98-104.