All content in this area was uploaded by Muhamad Saiful Bahri Yusoff on Nov 15, 2020.
PROFESSIONAL AND PERSONAL DEVELOPMENT FOR POSTGRADUATES
Experimental Design for Health Sciences: Questionnaire
Development & Validation for Health Science Studies
ASSOCIATE PROFESSOR DR MUHAMAD SAIFUL BAHRI BIN YUSOFF
TABLE OF CONTENTS
OBJECTIVES OF THE COURSE
LEARNING OUTCOMES
SYNOPSIS
VALIDITY CONCEPT
QUESTIONNAIRE DEVELOPMENT
Goal Setting
Defining Factors
Generating and Writing Items
Selecting Response Format
QUESTIONNAIRE VALIDATION
Parameters for Selecting Items
Content Validity Index
Face Validity Index
Exploratory Factor Analysis
Reliability Analysis
Reporting EFA & Reliability Analysis
CONCLUSION
REFERENCES
OBJECTIVES OF THE COURSE
At the end of this course, learners are expected to:
• Describe the concepts of validity
• Describe the sources of evidence to support validity
• Describe the key steps to develop a questionnaire systematically
• Develop a questionnaire for research purposes
• Perform content and face validation systematically
• Calculate content and face validity index
• Perform exploratory factor analysis
• Perform reliability analysis
• Report the results of exploratory factor analysis and reliability analysis
LEARNING OUTCOMES
The learners should be able to develop and validate questionnaires for research purposes through a
systematic process and based on best practice.
SYNOPSIS
Questionnaires are commonly and widely used for survey-based research in medical, social, economic,
psychological, and behavioural research. Because questionnaires are important tools that determine the
scientific merit of questionnaire-based research, this module describes a step-by-step approach to
systematically develop and validate questionnaires for research. The author outlines the key steps for
developing and validating questionnaires based on best practice and the author's research experience
in this area.
VALIDITY CONCEPT
Validity refers to the degree to which evidence and theory support the interpretations of test scores
entailed by the proposed uses of tests. In other words, validity describes how well one can legitimately
trust the results of a test as interpreted for a specific purpose. There are five sources of validity evidence
to support the construct validity: content (do instrument items completely represent the construct?),
response process (the relationship between the intended construct and the thought processes of subjects
or observers), internal structure (acceptable reliability and factor structure), relations to other variables
(correlation with scores from another instrument assessing the same construct), and consequences (do
scores really make a difference?)1. These are not different types of validity but rather they are categories
of evidence that can be collected to support the construct validity of inferences made from instrument
scores.
Content validity is defined as the degree to which elements of an assessment instrument are relevant to
and representative of the targeted construct for a particular assessment purpose1,2. The assessment
purpose refers to the expected functions of the measurement tool, for examples, the Medical Student
Stressor Questionnaire (MSSQ) was developed to identify the sources of stress in medical students3 and
the Anatomy Education Environment Measurement Inventory (AEEMI) was developed to measure the
anatomy educational environment in medical schools4. The relevance of an assessment tool refers to the
appropriateness of its elements for the targeted constructs and functions of assessment, while the
representativeness of an assessment tool refers to the degree to which its elements are proportional to the
facets of the targeted construct2. Despite the two aspects of content validity (i.e., relevance and
representativeness), the relevance of an assessment tool has been more frequently used to measure
content validity5-7. It is important to note that establishing content validity is vital to support the validity
of an assessment tool such as a questionnaire, especially for research purposes. Haynes et al. (1995)
emphasized, "Inferences from assessment instruments with unsatisfactory content validity will be
suspect, even when other indices of validity are satisfactory." Content validity evidence can be
represented by the content validity index (CVI)5-8; for instance, several recent studies4,9-11 established
content validity using the CVI to support the validity of an assessment tool.
As early as 1947, Mosier analysed various definitions of the face validity concept13. Commonly, response
process validity evidence is collected after content validity has been established14,15. Response process
validity is also known as face validity, which refers to the degree to which test respondents view the
content of a test and its items as relevant to the context in which the test is being administered16. Similarly,
other researchers define face validity as the degree to which raters judge the items of an assessment
instrument to be appropriate to the targeted construct and assessment objectives17,18. The raters'
understanding and interpretation of the items will determine the accuracy of an assessment tool in
measuring the targeted construct. People with similar backgrounds rate the face validity of a test similarly,
and they rate the face validity of different tests differently18. Owing to concerns about the face validity
concept, Cook & Beck (2006) avoided the term face validity and instead used response process
evidence of validity to reflect the thought processes of users of the tested assessment as they respond
to the tool1,4; it can be quantified by the face validity index (FVI)4,9-11. Response process evidence is
commonly evaluated by raters in the form of the clarity and comprehensibility of the instructions and
language used in the assessment tool1,4. Clarity refers to whether there are ambiguities or multiple ways
to interpret the items, whereas comprehensibility refers to whether the words and sentences of the
constructed items can be understood easily by raters. It is important to establish response process
validity to support the overall validity of an assessment tool such as a questionnaire, especially for
research purposes. Response process validity can be represented by the FVI, and several studies4,9-11
have calculated it to support the validity of an assessment tool.
Internal Structure (acceptable reliability and factor structure). Reliability and factor analysis data are
generally considered evidence of internal structure. Scores intended to measure a single construct should
yield homogenous results, whereas scores intended to measure multiple constructs should demonstrate
heterogeneous responses in a pattern predicted by the constructs. Reliability refers to the reproducibility
or consistency of scores from one assessment to another. Reliability is a necessary, but not sufficient,
component of validity. Reproducibility over time (test-retest), between different versions of an instrument
(parallel forms), and between raters (inter-rater) are other measures of reliability. Reliability is usually
reported as a coefficient ranging from 0 to 1. A value of 0 represents no correlation (all error), whereas 1
represents
perfect correlation (all variance attributable to subjects). Acceptable values will vary according to the
purpose of the instrument. For high-stakes settings (eg, licensure examination) reliability should be
greater than 0.9, whereas for less important situations values of 0.8 or 0.7 may be acceptable. Note that
the interpretation of reliability coefficients is different than the interpretation of correlation coefficients in
other applications, where a value of 0.6 would often be considered quite high. Low reliability can be
improved by increasing the number of items or observers and (in education settings) using items of
medium difficulty. Factor analysis is used to investigate relationships between items in an instrument and
the constructs they are intended to measure. Factor analysis can determine whether the items intended
to measure a given construct actually “cluster” together into “factors” as expected. Items that “load” on
more than one factor, or on unexpected factors, may not be measuring their intended constructs.
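As an illustration of the reliability coefficient described above, the following is a minimal sketch (not part of the original module) of computing Cronbach's alpha with NumPy. The ratings are hypothetical: rows are respondents, columns are items on a 1-5 Likert scale.

```python
# Minimal sketch of Cronbach's alpha; data are hypothetical.
import numpy as np

def cronbach_alpha(scores):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 6 respondents answering 4 Likert items (1-5)
data = [[4, 5, 4, 4],
        [3, 3, 3, 4],
        [5, 5, 5, 5],
        [2, 2, 3, 2],
        [4, 4, 4, 5],
        [3, 2, 3, 3]]
alpha = cronbach_alpha(data)
print(round(alpha, 2))
```

Because these four hypothetical items are highly inter-correlated, alpha comes out above the 0.9 threshold mentioned for high-stakes settings; items with weak inter-correlations would pull the coefficient down.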
Relations to other variables (correlation with scores from another instrument assessing the same
construct). Correlation with scores from another instrument or outcome for which correlation would be
expected, or lack of correlation where it would not, supports interpretation consistent with the underlying
construct. For example, correlation between scores from a questionnaire designed to assess the severity
of benign prostatic hypertrophy and the incidence of acute urinary retention would support the validity of
the intended inferences. For a quality of life assessment, score differences among patients with varying
health states would support validity.
Consequences (do scores really make a difference?). Evaluating intended or unintended consequences
of an assessment can reveal previously unnoticed sources of invalidity. For example, if a teaching
assessment shows that male instructors are consistently rated lower than females it could represent a
source of unexpected bias. It could also mean that males are less effective teachers. Evidence of
consequences thus requires a link relating the observations back to the original construct before it can
truly be said to influence the validity of inferences. Another way to assess evidence of consequences is
to explore whether desired results have been achieved and unintended effects avoided. Finally, the
method used to determine score thresholds (eg, pass/fail cut scores or classification of symptom severity
as low, moderate, or high) also falls under this category.
Figure 1: The five sources of evidence to support the construct validity
Finally, when developing questionnaires, careful attention should be given to each category of validity
evidence in turn as illustrated in Figure 1.
QUESTIONNAIRE DEVELOPMENT
Goal Setting
The first step is to set clear aims and goals for developing a questionnaire19. The following questions
help researchers to do so; without clear answers to them, the measure may not be useful:
• What precisely will this questionnaire measure? For example, the Medical Student Stressor
Questionnaire (MSSQ) was developed to measure sources of stress among medical students3,
and the Anatomy Education Environment Measurement Inventory (AEEMI) was developed to
measure the learning experiences that influence medical students' motivation to learn anatomy,
thus affecting their attitudes, values and behaviours towards anatomy-related learning
tasks4. It is essential to have a clear end in mind about the attributes (i.e., the concepts,
characteristics or features of someone or something) to be measured before developing a
questionnaire.
• Who is the intended target group? Knowing the exact target group (i.e., the respondents
who will be responding to the questionnaire) is important to ensure its validity; for example, the
intended target group for the MSSQ and the AEEMI was medical students.
• Why does it need to be developed? Defining a clear reason for developing a questionnaire for
research is critical to ensure its validity. For instance, many instruments measure stress
levels, but none had been developed to measure the sources of stress among medical students;
therefore, it was important to develop an instrument (the MSSQ) that specifically measures the
sources of stress among medical students. It is also important that researchers do not
reinvent the wheel if a similar tool has already been developed and validated by other researchers;
otherwise, it will be a waste of time and resources.
• How will it contribute to practice in the field? Stating clearly the expected contributions
of a questionnaire is important to ensure its relevance to current practice in the
field and that it does not become a "reinvention of the wheel". For instance, the contribution of
the MSSQ was to provide a universal tool for identifying sources of stress among medical
students and to encourage medical educators around the globe to evaluate the potential sources
of stress among their students, so that early interventions could be planned to alleviate the
stressors.
DEFINING FACTORS
Figure 2: The basic structure of factors and observed variables (items)
After clearly defining the purpose of the questionnaire to be developed, it is essential to define the factors
to be measured by it19. Figure 2 shows the basic structure of factors and observed variables.
Factors are conceptualized as the constructs, attributes or domains to be measured, and the observed or
manifest variables are the items that measure the factors. A good understanding of the relevant theories
and literature can help us come up with a suitable definition for each factor. A clear description
of a factor is important because it will help to generate the items of that factor. The following are
recommended strategies:
• Conduct a literature review – to gain a sound basic understanding of the attribute and other
research involving it; to identify other existing measures; and to consider what kind of items are
needed, how the questionnaire might look, and how it differs from existing questionnaires.
• Conduct interviews and/or focus groups – to learn how the population of interest
conceptualizes and describes the attributes of interest. Together, the two strategies ensure that the
conceptualization of the attributes makes theoretical sense to scholars in the field and uses
language that the intended population understands. Failure to clarify exactly which attributes are
to be measured could mean ending up with an assessment that is incoherent and invalid.
Table 1 shows an example of detailed descriptions of the factors measured by the
GSQ20; in this example, a stressor is defined as a personal or environmental event that
causes stress:
Table 1: Identified GSQ factors and the description of each factor.

Family: Events occurring in the family that can lead to a person's emotional disturbance, such as a poor relationship with the spouse, poor support from family members, and a lack of skill in managing the family.

Relationship with superior: Interpersonal relationship events that can cause feelings of distress to a person, such as a lack of support from the superior and unfair assessment by supervisors.

Bureaucratic constraints: An organizational working environment that can cause feelings of distress to a person, such as a lack of support from the authority, having to do tasks beyond one's ability, and a lack of opportunity in decision making.

Work-family conflicts: Work events that compromise a person's personal and home life and lead to feelings of distress, such as life being too centered on work, advancing a career at the expense of personal or home life, and work demands affecting personal life.

Relationship with colleagues: Interpersonal relationship events that can cause feelings of distress to a person, such as a lack of support from uncooperative and incompetent colleagues.

Performance pressure: Work demands that cause emotional disturbance to a person, such as work overload, the short duration given to complete tasks, and doing high-risk tasks where any mistake can lead to disastrous consequences.

Job prospects: Events related to the reward and recognition given to an individual that cause feelings of distress, such as a lack of promotion prospects, a feeling of being underpaid, and a lack of recognition for the job.
Generating and Writing Items
This is a critical step in the development of a questionnaire: writing the items and considering the most
appropriate response format, resulting in what is commonly called an answer sheet19. Each item should
be written clearly and simply, avoiding double negatives and being as short as possible. To reduce
response bias, where someone tends to give the same answer to every item, reverse-phrase some of
the items (i.e., negative items) and do not forget to reverse-score them before any analysis.
The layout of the items should be simple and straightforward and should enable respondents to easily
connect their responses with the different options. This is important to ensure response process validity.
The following areas need to be considered when generating items:
i. Test content – a) use a grid-style blueprint for determining content areas and how these are
potentially manifested by people, b) get a small group to brainstorm a list of as many facets of an
attribute as possible, or c) include people who might be at the extremes of the attribute so that
you can identify item content that reflects the entire spectrum.
ii. Target population – it should be defined clearly.
iii. The kinds of items needed and their number – the items need to reflect all relevant aspects of the
attribute; for a relatively simple measure, it would be wise to aim for at least 10 items per
attribute at the development stage.
iv. Administration instructions – it is important that these are clearly developed, especially for a self-
reporting questionnaire. Clear instructions ensure that respondents understand what to do and how
to rate the questionnaire, hence strengthening response process validity.
v. Estimate the time limits, or the time required for completion – this depends on the kind of
measure, and it is wise to develop a questionnaire that requires less time to complete.
If possible, try to make it less than 15 minutes! The shorter the time required for completion, the
better the response rate.
vi. How scores should be calculated and interpreted – the simplest process might be to sum the
responses, though if you write some items that are negatively phrased compared to others, you
will need to reverse their scores before totalling. Providing a clear interpretation of the scores is
vital to ensure the consistency of their meaning across studies.
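The reverse-scoring rule in point vi can be sketched as follows. This is an illustrative snippet only: the item names, the set of negative items, and the choice of a 1-5 scale are all hypothetical assumptions.

```python
# Sketch of reverse-scoring negatively phrased items before totalling,
# assuming a 1-5 Likert scale; item names are hypothetical.
def reverse_score(value, scale_min=1, scale_max=5):
    """Map 1 -> 5, 2 -> 4, ... on the given scale."""
    return scale_min + scale_max - value

responses = {"Q1": 4, "Q2": 2, "Q3": 5}   # Q2 is negatively phrased (hypothetical)
negative_items = {"Q2"}

total = sum(reverse_score(v) if k in negative_items else v
            for k, v in responses.items())
print(total)  # 4 + (1 + 5 - 2) + 5 = 13
```

Summing without the reversal would understate the total for respondents who endorse the positively phrased items, which is exactly the bias the reversal corrects.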
Table 2 shows a sample of items potentially representing each factor of the GSQ20 based on
the literature review.
Table 2: The GSQ factors and their potential items.

Family:
- Inadequate preparation for dealing with more difficult aspects of family matters
- Insufficient knowledge in educating and building children's characters
- Poor communication and relationship with family members
- Poor relationship with spouse

Relationship with superiors:
- Lack of support from superior
- Difficulty in maintaining a relationship with superior
- My beliefs contradict those of my superior
- Unfair assessment from superior

Bureaucratic constraints:
- Lack of authority to carry out my job duties
- Unable to make full use of my skills and ability
- Cannot participate in decision making
- Having to do work outside of my competence

Work-family conflicts:
- Work demands affect my personal/home life
- Advancing a career at the expense of home/personal life
- My life is too centered on my work
- Absence of emotional support from family

Relationship with colleagues:
- Working with uncooperative colleagues
- Working with incompetent colleagues
- Relationship problems with colleagues/subordinates
- Competition among colleagues

Performance pressure:
- Time pressures and deadlines to meet
- Work overload
- Fear of making mistakes that can lead to serious consequences
- My work is mentally straining

Job prospects:
- Feeling insecure in this job
- Society does not think highly of my profession
- Lack of promotion prospects
- Feeling of being underpaid
Selecting Response Format
The response format should be selected based on the nature of the questionnaire1; for instance, the GSQ uses a
rating scale based on the following Likert scale: 0 = causing no stress at all, 1 = causing mild stress, 2 =
causing moderate stress, 3 = causing high stress, 4 = causing severe stress4. This response
format was selected because the GSQ assesses the stress level caused by each potential item, which is a stressor.
Response formats commonly used in questionnaires include dichotomous (e.g., yes/no) options and
Likert-type rating scales21; however, researchers can choose another response format to fit the nature
of their items. For choosing Likert-type scale response anchors, we recommend that researchers refer
to the article written by Vagias (2006)21.
QUESTIONNAIRE VALIDATION
Parameters for Selecting Items
Table 3 summarizes the parameters to be considered for selecting items.
Table 3: The summary of parameters for selecting items.

Content validity index: Content validation should be carried out to assess how representative and relevant the items are with respect to the construct of interest15.

Face validity index: Response process validation should be carried out to assess respondents' understanding and interpretation of items in the manner that the survey designer intends14,22.

Item percentage of response: Descriptive statistics should be computed for each item to assess the minimum and maximum rating; this ensures that the full range of the rating scale is utilized (e.g., 1 to 5).

Floor and ceiling effects: Descriptive statistics should be computed for each item to assess floor and ceiling effects, which occur when more than 15% of responses are at the lowest or the highest end of the scale.

Mean (standard deviation) or median (interquartile range): Descriptive statistics should be computed for each item to assess the distribution of responses per item. Items with a good distribution of responses are selected.

Factorial structure: Validation studies are carried out to assess the factor structure of the questionnaire using the following factor analyses: (i) Exploratory Factor Analysis (EFA) explores the potential factors that can be extracted from the validation data; (ii) Confirmatory Factor Analysis (CFA) tests the latent factors based on the validation data.

Internal consistency: Reliability analysis is performed to assess internal consistency, commonly presented as Cronbach's alpha coefficients.
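The descriptive screening described in Table 3 (the range of the rating scale actually used, and floor/ceiling effects above 15%) can be sketched as follows. The ratings, item, and scale bounds are hypothetical; the 15% threshold follows the table.

```python
# Sketch of per-item descriptive screening: range of responses used and
# floor/ceiling effects (>15% of responses at the lowest or highest point).
import numpy as np

def screen_item(ratings, scale_min=1, scale_max=5, threshold=0.15):
    ratings = np.asarray(ratings)
    floor = np.mean(ratings == scale_min)     # share of responses at the bottom
    ceiling = np.mean(ratings == scale_max)   # share of responses at the top
    return {
        "min": int(ratings.min()),
        "max": int(ratings.max()),
        "floor_effect": bool(floor > threshold),
        "ceiling_effect": bool(ceiling > threshold),
    }

# Hypothetical ratings for one item on a 1-5 scale; 7 of 10 are at the top
item = [5, 5, 4, 5, 3, 5, 5, 4, 5, 2]
print(screen_item(item))
```

An item flagged this way (here a ceiling effect, and a range that never reaches 1) would warrant review before being retained.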
The next sections will elaborate and demonstrate the processes of content validation, response process
validation, EFA, and reliability analysis.
Content Validity Index
As defined earlier, content validity is the degree to which the elements of an assessment instrument are
relevant to and representative of the targeted construct for a particular assessment purpose1,2, and the
content validity evidence can be represented by the content validity index (CVI)5-8. Establishing content
validity is vital to support the validity of an assessment tool such as a questionnaire, especially for
research purposes. This section describes the best practice for quantifying content validity using the
CVI, based on the following six steps of content validation15:
1. Preparing content validation form
2. Selecting a review panel of experts
3. Conducting content validation
4. Reviewing domains and items
5. Providing score on each item
6. Calculating CVI
1.0 Preparing content validation form
The first step of content validation is to prepare the content validation form so that the review panel of
experts will have clear expectations and a clear understanding of the task. An example of the instructions and
rating scale is provided in Figure CVI 1. The recommended rating scale of relevance is used for
scoring the individual items (Figure CVI 2). It is recommended to provide the definition of the domain to
facilitate the scoring process by the experts – please refer to Figure CVI 2 for an example.
Figure CVI 1: An example of instruction and rating scale in the content validation form to the experts
Figure CVI 2: An example of layout for content validation form with domain, its definition and items
represent (measure) the domain.
2.0 Selecting a Review Panel of Experts
The selection of individuals to review and critique an assessment tool (e.g., a questionnaire) is usually based
on their expertise in the topic to be studied. Table 4 summarizes the recommended
number of experts and its implication for the acceptable cut-off score of the CVI.
Table 4: The number of experts and its implication on the acceptable cut-off score of CVI

Two experts: acceptable CVI of at least 0.80 (Davis, 1992)
Three to five experts: CVI should be 1.00 (Polit & Beck, 2006; Polit et al., 2007)
At least six experts: at least 0.83 (Polit & Beck, 2006; Polit et al., 2007)
Six to eight experts: at least 0.83 (Lynn, 1986)
At least nine experts: at least 0.78 (Lynn, 1986)
In summary, the minimum acceptable number of experts for content validation is two; however,
most recommendations propose a minimum of six. Considering these recommendations and the
author's experience, the number of experts for content validation should be at least six and should not
exceed 10.
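The recommendations in Table 4 can be expressed as a simple lookup. The function below is an illustrative sketch only, not part of any published guideline; it follows the cut-offs of Davis (1992), Polit & Beck (2006) and Lynn (1986) as summarized in the table.

```python
# Hypothetical helper: acceptable CVI cut-off for a given panel size,
# following the sources summarized in Table 4.
def acceptable_cvi(n_experts):
    if n_experts < 2:
        raise ValueError("content validation needs at least two experts")
    if n_experts == 2:
        return 0.80   # Davis (1992)
    if n_experts <= 5:
        return 1.00   # Polit & Beck (2006): all experts must agree
    if n_experts <= 8:
        return 0.83   # Lynn (1986); Polit & Beck (2006)
    return 0.78       # Lynn (1986): nine or more experts

print(acceptable_cvi(10))  # 0.78
```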
3.0 Conducting Content Validation
Content validation can be conducted through a face-to-face or non-face-to-face approach. For the
face-to-face approach, an expert panel meeting is organised, and the researcher facilitates the content
validation process through Steps 4.0 and 5.0 (described later). For the non-face-to-face
approach, an online content validation form is usually sent to the experts with clear instructions
(Figure CVI 1) to facilitate the content validation process. The most important factors to be
considered are cost, time, and response rate. Cost and time can be challenging for the
face-to-face approach because of the difficulty of bringing all the experts together, but the response rate
will be at its highest. Response rate and time can be challenging for the non-face-to-face
approach because of the difficulty of getting responses on time, and there is a risk of getting no response
at all from an expert; however, cost saving is its biggest advantage. Nevertheless, based on the author's
experience, the non-face-to-face approach is very efficient if a systematic follow-up is in place to improve
the response rate and time.
4.0 Reviewing Domain and Items
In the content validation form, the definition of the domain and the items representing the domain are clearly
provided to the experts, as shown in Figure CVI 2. The experts are requested to critically review the
domain and its items before providing a score for each item. The experts are encouraged to provide verbal
or written comments to improve the relevance of the items to the targeted domain. All comments
are taken into consideration to refine the domain and its items.
5.0 Providing score on each item
Upon completing the review of the domain and items, the experts are requested to score each
item independently based on the relevance scale (Figure CVI 1 and Figure CVI 2). The experts are required to
submit their responses to the researcher once they have scored all the items.
6.0 Calculating CVI
Table 5: The definition and formula of I-CVI, S-CVI/Ave and S-CVI/UA

I-CVI (item-level content validity index): The proportion of content experts giving an item a relevance rating of 3 or 4.
Formula: I-CVI = (number of experts in agreement) / (number of experts)

S-CVI/Ave (scale-level content validity index based on the average method): The average of the I-CVI scores for all items on the scale, or the average of the proportion relevance judged by all experts, where the proportion relevance is the average of the relevance ratings by an individual expert.
Formula: S-CVI/Ave = (sum of I-CVI scores) / (number of items), or
S-CVI/Ave = (sum of proportion relevance ratings) / (number of experts)

S-CVI/UA (scale-level content validity index based on the universal agreement method): The proportion of items on the scale that achieve a relevance rating of 3 or 4 from all experts. The universal agreement (UA) score is 1 when an item achieves 100% expert agreement; otherwise, the UA score is 0.
Formula: S-CVI/UA = (sum of UA scores) / (number of items)
** The definitions and formulas are based on the recommendations of Lynn (1986), Davis (1992),
Polit & Beck (2006) and Polit et al. (2007)
There are two forms of CVI: the CVI for items (I-CVI) and the CVI for the scale (S-CVI). There are two
methods of calculating the S-CVI: the average of the I-CVI scores for all items on the scale (S-CVI/Ave) and
the proportion of items on the scale that achieve a relevance rating of 3 or 4 from all experts (S-CVI/UA)6.
The definitions and formulas of the CVI indices are summarised in Table 5.
Table 6: The relevance ratings on the item scale by ten experts
Prior to the calculation of CVI, each relevance rating must be recoded as 1 (relevance rating of 3 or 4) or 0
(relevance rating of 1 or 2), as shown in Table 6. To illustrate the calculation of the different CVI indices, the
relevance ratings given to the items by ten experts are provided in Table 6.
To illustrate the calculation for the CVI indices (please refer to Table 5), the following are examples of
calculation based on the data provided in Table 6:
• Experts in agreement: sum the recoded ratings provided by all experts for each item; for
example, the experts in agreement for Q2 = 1 + 0 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 = 9.
• Universal agreement: a score of 1 is assigned to an item that achieves 100% agreement among
experts; for example, Q1 obtains 1 because all the experts gave a recoded rating of 1, while Q2
obtains 0 because not all the experts gave a recoded rating of 1.
• I-CVI: the experts in agreement divided by the number of experts; for example, the I-CVI of Q2 is 9
divided by 10 experts, which equals 0.9.
• S-CVI/Ave (based on I-CVI): the average of the I-CVI scores across all items; for example, the S-
CVI/Ave = (1 + 0.9 + 0 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1)/12 = 0.91.
• S-CVI/Ave (based on proportion relevance): the average of the proportion-relevance scores across
all experts; for example, the S-CVI/Ave = (0.92 + 0.83 + 0.92 + 0.92 + 0.92 + 0.92 + 0.92 + 0.92 +
0.92 + 0.92)/10 = 0.91.
• S-CVI/UA: the average of the UA scores across all items; for example, the S-CVI/UA = (1 + 0 + 0 + 1
+ 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1)/12 = 0.83.
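The worked calculations above can be sketched in a few lines of Python. This is an illustrative helper, not part of the module, and it assumes the ratings have already been recoded to 1 and 0; the example matrix mirrors the pattern of the worked example (one disagreement on Q2, no agreement at all on Q3).

```python
import numpy as np

def cvi_indices(ratings):
    """CVI indices from a recoded (experts x items) matrix of 1s and 0s."""
    ratings = np.asarray(ratings)
    n_experts = ratings.shape[0]
    # I-CVI: experts in agreement divided by the number of experts
    i_cvi = ratings.sum(axis=0) / n_experts
    # S-CVI/Ave: the average of the I-CVI scores across all items
    s_cvi_ave = i_cvi.mean()
    # S-CVI/UA: proportion of items rated relevant by every expert
    s_cvi_ua = (i_cvi == 1.0).mean()
    return i_cvi, s_cvi_ave, s_cvi_ua

# Hypothetical recoded ratings: 10 experts, 12 items
ratings = np.ones((10, 12), dtype=int)
ratings[1, 1] = 0   # one expert rated Q2 as not relevant
ratings[:, 2] = 0   # all experts rated Q3 as not relevant
i_cvi, s_cvi_ave, s_cvi_ua = cvi_indices(ratings)
# i_cvi[1] = 0.9, s_cvi_ave ≈ 0.91, s_cvi_ua ≈ 0.83
```

The same function applies unchanged to any panel size, since the three indices depend only on the recoded rating matrix.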
Based on the above calculations, the I-CVI, S-CVI/Ave and S-CVI/UA all meet the satisfactory level, and
thus the questionnaire scale has achieved a satisfactory level of content validity. For more examples of how
to report the content validity index, please refer to the papers by Hadie et al. (2017)4, Ozair et al. (2017)9,
Lau et al. (2018)10 and Marzuki et al. (2018)11.
Content validity is vital to the overall validity of an assessment; therefore, content validation should be
conducted systematically, based on evidence and best practice. This module has provided a systematic
and evidence-based approach to conducting proper content validation.
Face Validity Index
Commonly, response process validity evidence is gathered after content validity has been established13-15.
Response process validity is also known as face validity, which refers to the degree to which test
respondents view the content of a test and its items as relevant to the context in which the test is
administered16. Similarly, other researchers define face validity as the degree to which raters judge the
items of an assessment instrument to be appropriate to the targeted construct and assessment
objectives17,18. The raters for face validity include (i) the people who actually take the test, (ii) the
non-professional users who work with the test results, and (iii) the general public18. In other words, the
people involved in taking the test should do the rating; they cannot be replaced by professionals,
experts or psychometricians18. The raters' understanding and interpretation of the items determine
the accuracy of an assessment tool in measuring the targeted construct. People with similar
backgrounds rate the face validity of a test similarly, and they rate the face validity of different tests differently18.
Because of the many concerns about the face validity concept, Cook & Beckman (2006) avoided using
the term face validity7; instead, researchers use response process evidence of validity as the term to
reflect the thought processes of users of an assessment as they respond to the tool1,4, and it can be
quantified by the face validity index (FVI)4,9-11. Response processes are commonly evaluated in terms of
the clarity and comprehensibility of the instructions and language used in the assessment tool, as judged
by the raters1,4. Clarity of the instructions and language refers to whether there are ambiguities or multiple
ways to interpret the items, whereas comprehensibility refers to whether the words and sentences of
the constructed items can be understood easily by raters. It is important to establish response process
validity to support the overall validity of an assessment tool such as a questionnaire, especially for
research purposes. Response process validity can be represented by the FVI, and several studies4,9-11
have calculated it to support the validity of an assessment tool. This section describes best practice for
performing response process validation and calculating the FVI, based on the following six steps:
1. Preparing the response process validation form
2. Selecting a panel of raters
3. Conducting response process validation
4. Reviewing items for clarity and comprehension
1. Preparing response process validation form
2. Selecting a panel of raters
3. Conducting response process validation
4. Reviewing items for clarity and comprehension
5. Providing score on each item based on the clarity and comprehensibility rating scale
6. Calculating FVI
1.0 Preparing response process validation form
The first step of response process validation (also known as face validation) is to prepare the
response process validation form, to ensure that the panel of raters, who are the intended respondents,
have clear expectations and understanding of the task. An example of the instruction and rating scale
is provided in Figure FVI 1. The rating scales of clarity and comprehension have been used for scoring
individual items5-8 (Figure FVI 2).
Figure FVI 1: An example of the instruction and rating scale in the response process validation form given
to the raters, in this case students.
Figure FVI 2: An example of the layout of the response process validation form with the domain and items.
2.0 Selecting a panel of raters
The selection of raters to review and critique an assessment tool (e.g., a questionnaire) is usually based
on the target users of the tool, for example students, the public or teachers. Table 7 summarises
the number of raters and its implication for the acceptable cut-off score of the FVI, based on
previous studies4,9-11,22-25.
Table 7: The number of raters and its implication on the acceptable cut-off score of FVI

Source of study       Number of raters                   Acceptable FVI value   Method
Hadie et al (2017)    30 medical students                At least 0.80          Face-to-face survey
Ozair et al (2017)    30 paramedics                      At least 0.83          Face-to-face survey
Lau et al (2017)      30 parents of pre-school children  At least 0.80          Face-to-face survey
Lau et al (2018)      30 parents of pre-school children  At least 0.80          Face-to-face survey
Marzuki et al (2018)  10 users of medical apps           At least 0.83          Online survey
Chin et al (2018)     32 medical students                At least 0.80          Online survey
Mahadi et al (2018)   32 medical students                At least 0.80          Online survey
It can be concluded that, for response process validation, the minimum acceptable number of raters is 10;
however, most studies administered the form to at least 30 raters. Considering the previous studies (Table
7) and the author's experience, the number of raters for response process validation should not be fewer
than 10.
3.0 Conducting Response Process Validation
Response process validation can be conducted through a face-to-face or an online survey (Table 7). For
the face-to-face survey, the researcher facilitates the validation process by holding a meeting with the
raters, followed by Step 4 and Step 5 (elaborated below). For the online survey, an online response
process validation form with clear instructions (Figure FVI 1) is sent to the raters to facilitate the validation
process. Based on the author's experience, the face-to-face approach is very efficient at increasing the
response rate, whereas the online survey is efficient in terms of cost and time.
4.0 Reviewing items for clarity and comprehension
In the response process validation form, the domain and its items are provided to the raters as shown in
Figure FVI 2. The raters are requested to review all items before scoring each item. The raters are
encouraged to provide verbal or written comments to improve the clarity and comprehension of the items.
All comments are taken into consideration when refining the items.
5.0 Providing score on each item based on the clarity and comprehensibility rating scale
Upon completing the review of all items, the raters are requested to score each item independently based
on the clarity and comprehension scale (Figure FVI 1 and Figure FVI 2). The raters are required to submit
their responses to the researcher once they have scored all items.
6.0 Calculating FVI
Table 8: The definition and formula of I-FVI, S-FVI/Ave and S-FVI/UA

I-FVI (item-level face validity index)
Definition: the proportion of raters giving the item a clarity and comprehension rating of 3 or 4.
Formula: I-FVI = (raters in agreement) / (number of raters)

S-FVI/Ave (scale-level face validity index based on the average method)
Definition: the average of the I-FVI scores for all items on the scale, or the average of the proportion of items judged clear and comprehensible by the raters; the proportion is the average of the ratings given by an individual rater.
Formula: S-FVI/Ave = (sum of I-FVI scores) / (number of items), or S-FVI/Ave = (sum of proportion clarity and comprehension ratings) / (number of raters)

S-FVI/UA (scale-level face validity index based on the universal agreement method)
Definition: the proportion of items on the scale that achieve a clarity and comprehension rating of 3 or 4 from all raters. The universal agreement (UA) score is 1 when the item achieves 100% agreement among raters; otherwise the UA score is 0.
Formula: S-FVI/UA = (sum of UA scores) / (number of items)
** The definitions and formulas are based on the content validity index formulas reported in Yusoff (2019)15
There are two forms of FVI: the FVI for items (I-FVI) and the FVI for the scale (S-FVI). There are two
methods of calculating the S-FVI: the average of the I-FVI scores for all items on the scale (S-FVI/Ave) and
the proportion of items on the scale that achieve a clarity and comprehension rating of 3 or 4 from all
raters (S-FVI/UA). The definitions and formulas of the FVI indices are summarised in Table 8.
Prior to the calculation of FVI, the clarity and comprehension rating must be recoded as 1 (the scale of 3
or 4) or 0 (the scale of 1 or 2) as shown in Table 9. To illustrate the calculation of different FVI indices,
the clarity and comprehension ratings on item scale by 10 raters are provided in Table 9.
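The recoding step itself is a one-liner. A minimal sketch with hypothetical ratings (raters x items, on the 1-4 scale):

```python
import numpy as np

# Hypothetical raw clarity and comprehension ratings (raters x items);
# a rating of 3 or 4 is recoded to 1, a rating of 1 or 2 to 0.
raw = np.array([[4, 3, 1],
                [3, 2, 4]])
recoded = (raw >= 3).astype(int)
print(recoded)   # [[1 1 0]
                 #  [1 0 1]]
```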
Table 9: The clarity and comprehension ratings on the item scale by 10 raters
To illustrate the calculation of the FVI indices (please refer to Table 8), the following are examples of
calculations based on the data provided in Table 9:
• Raters in agreement: sum the recoded ratings provided by all raters for each item; for
example, the raters in agreement for Q2 = 1 + 0 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 = 9.
• Universal agreement: a score of 1 is assigned to an item that achieves 100% agreement among
raters; for example, Q1 obtains 1 because all the raters gave a recoded rating of 1, while Q2
obtains 0 because not all raters gave a recoded rating of 1.
• I-FVI: the raters in agreement divided by the number of raters; for example, the I-FVI of Q2 is 9
divided by 10 raters, which equals 0.9.
• S-FVI/Ave (based on I-FVI): the average of the I-FVI scores across all items; for example, the S-
FVI/Ave = (1 + 0.9 + 0 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1)/12 = 0.91.
• S-FVI/Ave (based on proportion clarity and comprehension): the average of the proportion clarity
and comprehension scores across all raters; for example, the S-FVI/Ave = (0.92 + 0.83 + 0.92 +
0.92 + 0.92 + 0.92 + 0.92 + 0.92 + 0.92 + 0.92)/10 = 0.91.
• S-FVI/UA: the average of the UA scores across all items; for example, the S-FVI/UA = (1 + 0 + 0 + 1
+ 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1)/12 = 0.83.
Based on the above calculations, the I-FVI, S-FVI/Ave and S-FVI/UA all meet the satisfactory level, and
thus the questionnaire scale has achieved a satisfactory level of response process validity. For more
examples of how to report the response process validity index, please refer to the papers by Hadie et al.
(2017), Ozair et al. (2017), Lau et al. (2017), Lau et al. (2018), Marzuki et al. (2018), Chin et al. (2018)
and Mahadi et al. (2018).
Response process validity is vital to the overall validity of an assessment; therefore, response process
validation should be conducted systematically, based on the best evidence. This module has provided
a systematic and evidence-based approach to conducting proper response process validation through
the face validity index.
Exploratory Factor Analysis
The primary purpose of exploratory factor analysis (EFA) is to explore the number of constructs that can
be extracted from a data set26. This section illustrates the key steps of running EFA in SPSS version 26;
the SPSS data set can be downloaded from https://tinyurl.com/yb9wup7e. The data set must
be checked and cleaned of incomplete data before running EFA.
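The completeness check can also be done with a short script before the data reach SPSS. The data frame below is a hypothetical stand-in for the module's data set, used only to show the pattern:

```python
import pandas as pd

# Hypothetical item responses with one incomplete case
df = pd.DataFrame({"q01": [3, 4, None, 2],
                   "q02": [2, 2, 5, 1]})
print(df.isna().sum())      # number of missing responses per item
complete = df.dropna()      # listwise deletion of incomplete cases
print(len(complete))        # 3 complete cases remain
```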
1.0 Running EFA
Figure EFA 1: Getting into EFA
Open the data set for EFA, click on the ‘Analyze’ tab, scroll over ‘Dimension Reduction’, and click the
‘Factor’ menu (Figure EFA 1). The following display will appear on the screen (Figure EFA 2).
Figure EFA 2: The main dialog box of Factor Analysis
Figure EFA 3: Selection of all items into the ‘Variables’ box
Select all items into the ‘Variables’ box and click on the ‘Descriptives’ menu (Figure EFA 3). The
following display will appear (Figure EFA 4). Tick the ‘Univariate descriptives’, ‘Initial solution’, and
‘KMO and Bartlett’s test of sphericity’ options as shown in Figure EFA 4. Click on the ‘Continue’ menu
to proceed to the next step.
Figure EFA 4: The factor analysis descriptives screen
Figure EFA 5: The main EFA displays and the ‘Extraction’ menu
Click on the ‘Extraction’ menu (Figure EFA 5). The following display will appear (Figure EFA 6). Tick the
‘Unrotated factor solution’ and ‘Scree plot’ options as shown in Figure EFA 6.
Figure EFA 6: The factor analysis extraction display
In the ‘Extract’ column, researchers can either let EFA suggest the number of factors to extract based on
the eigenvalues (the default threshold is 1) or specify the exact number of factors to be extracted in the
‘Factors to extract’ box (Figure EFA 6). Click on the ‘Continue’ menu to proceed to the next step.
Figure EFA 7: The main EFA displays and the ‘Rotation’ menu
Click on the ‘Rotation’ menu (Figure EFA 7); the display shown in Figure EFA 8 will appear.
The purpose of rotation is to optimise the factor loading of each item across the extracted factors. There
are two types of rotation: orthogonal and oblique. Orthogonal rotations assume all factors are independent
of each other (not overlapping), while oblique rotations assume the factors are correlated with each
other (overlapping). Researchers are recommended to choose the rotation based on their assumptions
about the factors to be extracted. In this example, the ‘Maximum Iterations for Convergence’ value was set
at 30, whereas the SPSS default is 25; the higher value was chosen because of the large sample size
(more than 2,500), which requires more iterations to optimise the factor loadings across the extracted
factors. Click on the ‘Continue’ menu to proceed to the next step.
Figure EFA 8: The Factor Analysis Rotation menu
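For readers who want to see what an orthogonal rotation does numerically, the varimax criterion can be implemented in a few lines. This is a generic SVD-based sketch of the algorithm, not SPSS's internal code:

```python
import numpy as np

def varimax(loadings, max_iter=30, tol=1e-6):
    """Varimax (orthogonal) rotation of a p x k factor loading matrix."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)            # accumulated rotation matrix
    var_old = 0.0
    for _ in range(max_iter):
        LR = L @ R
        # SVD of the gradient of the varimax criterion
        u, s, vt = np.linalg.svd(
            L.T @ (LR ** 3 - LR @ np.diag((LR ** 2).sum(axis=0)) / p))
        R = u @ vt
        if s.sum() < var_old * (1 + tol):
            break            # criterion no longer improving
        var_old = s.sum()
    return L @ R
```

Because the rotation is orthogonal, each item's communality (the row sum of squared loadings) is unchanged; only the distribution of loadings across the factors is optimised.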
Figure EFA 9: The main EFA displays and the ‘Options’ menu
Figure EFA 10: The Factor Analysis Options menu
Click on the ‘Options’ menu (Figure EFA 9); the display shown in Figure EFA 10 will appear. In the
‘Coefficient Display Format’ column, tick the ‘Sorted by size’ and ‘Suppress small coefficients’ options
(Figure EFA 10). Insert a value of 0.40 in the ‘Absolute value below’ box and click on the ‘Continue’
menu to proceed to the next step.
2.0 Interpretation of EFA Outputs
Several outputs appear after running EFA. The descriptive statistics provide information on several
parameters, including the mean, standard deviation, and number of responses for each item (Figure
EFA 11). This output gives an idea of the extent of data completeness. The ‘KMO and Bartlett's Test’
output appears next (Figure EFA 12).
Figure EFA 11: The Descriptive Statistics output
Figure EFA 12: The KMO and Bartlett’s Test output
The KMO and Bartlett’s Test output (Figure EFA 12) provides parameters that indicate sampling
adequacy and the appropriateness of factor analysis. A ‘Kaiser-Meyer-Olkin Measure of Sampling
Adequacy’ value of more than 0.7 is considered a good level of factor distinction. A significant
‘Bartlett’s Test of Sphericity’ (p-value less than 0.05) indicates that factor analysis is appropriate. If the
value of the ‘Kaiser-Meyer-Olkin Measure of Sampling Adequacy’ is less than 0.5, researchers should
consider collecting more samples.
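Both statistics can also be computed directly from a raw data matrix. This is an illustrative implementation of the standard formulas, assuming complete cases and a non-singular correlation matrix:

```python
import numpy as np
from scipy import stats

def bartlett_sphericity(X):
    """Bartlett's test that the item correlation matrix is an identity."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return chi2, df, stats.chi2.sf(chi2, df)

def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(X, rowvar=False)
    inv_R = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(inv_R), np.diag(inv_R)))
    partial = -inv_R / d      # anti-image (partial) correlations
    r2, q2 = R ** 2, partial ** 2
    np.fill_diagonal(r2, 0)   # exclude the diagonal from both sums
    np.fill_diagonal(q2, 0)
    return r2.sum() / (r2.sum() + q2.sum())
```

The KMO value is high when the squared partial correlations are small relative to the squared zero-order correlations, which is exactly what a factorable correlation matrix looks like.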
Figure EFA 13: The Communalities output
The communality (extraction) is the sum of the squared factor loadings of an item across the extracted
factors, representing the proportion of variance in the observed variable that is explained by the factors.
It reflects the contribution of the item to the extracted factors. Each communality (extraction) should be at
least 0.70 and the average of the communalities should be at least 0.6026. In this example, only one item
exceeds 0.70 and the average of the communality values is less than 0.60; however, considering the large
sample, it is probably safe to rely on Kaiser’s criterion.
Figure EFA 14: The Total Variance Explained output
The ‘Total Variance Explained’ output (Figure EFA 14) proposes the number of factors that can be
extracted based on the eigenvalues (in this example, the threshold was set at 1 and higher). The ‘Initial
Eigenvalues’ column lists all the factors that could potentially be formed, regardless of their eigenvalues
(before extraction). The ‘Extraction Sums of Squared Loadings’ column provides the number of factors
that can be extracted based on an eigenvalue of at least 1 (after extraction); it shows that four factors
have eigenvalues of at least 1. The third column, ‘Rotation Sums of Squared Loadings’, provides the
results after varimax rotation: the total eigenvalues after the varimax rotation have been optimally
distributed across the four factors. This information is important for researchers in deciding whether to
choose the ‘Component Matrix’ (Figure EFA 15) or the ‘Rotated Component Matrix’ (Figure EFA 16).
In this case, because the eigenvalues have been optimally distributed across the four factors, the ‘Rotated
Component Matrix’ is chosen as the final factorial structure. The detailed output of the ‘Rotated
Component Matrix’ is displayed in Figure EFA 17.
Figure EFA 15: The Component Matrix output
Figure EFA 16: The Rotated Component Matrix output
Figure EFA 17: The detail output of the Rotated Component Matrix
Factor loadings of more than 0.40 are displayed under their respective factor (component) as shown in
Figure EFA 17. The analysis proposes that four factors represent the questionnaire.
Rotated Component Matrix

Component 1
  q06 I have little experience of computers (.800)
  q18 SPSS always crashes when I try to use it (.684)
  q13 I worry that I will cause irreparable damage because of my incompetence with computers (.647)
  q07 All computers hate me (.638)
  q14 Computers have minds of their own and deliberately go wrong whenever I use them (.579)
  q10 Computers are useful only for playing games (.550)
  q15 Computers are out to get me (.459)

Component 2
  q20 I can't sleep for thoughts of eigenvectors (.677)
  q21 I wake up under my duvet thinking that I am trapped under a normal distribution (.661)
  q03 Standard deviations excite me (.567)
  q12 People try to tell you that SPSS makes statistics easier to understand but it doesn't (.523; cross-loads .473 on Component 1)
  q04 I dream that Pearson is attacking me with correlation coefficients (.516)
  q16 I weep openly at the mention of central tendency (.514)
  q01 Statistics makes me cry (.496)
  q05 I don't understand statistics (.429)

Component 3
  q08 I have never been good at mathematics (.833)
  q17 I slip into a coma whenever I see an equation (.747)
  q11 I did badly at mathematics at school (.747)

Component 4
  q09 My friends are better at statistics than me (.648)
  q22 My friends are better at SPSS than I am (.645)
  q23 If I'm good at statistics my friends will think I'm a nerd (.586)
  q02 My friends will think I'm stupid for not being able to cope with SPSS (.543)
  q19 Everybody looks at me when I use SPSS (.428)

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 8 iterations.
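The arithmetic behind these outputs (eigenvalues, Kaiser's criterion, loadings, and communalities) can be reproduced from the item correlation matrix. A minimal sketch using a synthetic two-factor data set, not the module's data:

```python
import numpy as np

def pca_extract(X, kaiser=1.0):
    """PCA on the item correlation matrix; keep eigenvalues > kaiser."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > kaiser                  # Kaiser's criterion
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    communalities = (loadings ** 2).sum(axis=1)
    return eigvals, loadings, communalities

# Synthetic data: items 0-2 load on one factor, items 3-5 on another
rng = np.random.default_rng(1)
f1 = rng.normal(size=(1000, 1))
f2 = rng.normal(size=(1000, 1))
X = np.hstack([f1 + 0.6 * rng.normal(size=(1000, 3)),
               f2 + 0.6 * rng.normal(size=(1000, 3))])
eigvals, loadings, communalities = pca_extract(X)
print(loadings.shape[1])      # two components retained
```

An orthogonal rotation such as varimax would then be applied to these unrotated loadings before interpreting the factor structure, as in the SPSS walkthrough above.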
Reliability Analysis
Once the items have been assigned to their factors, reliability analysis is performed to
determine the internal consistency of each factor. This section illustrates the key steps of reliability
analysis using SPSS version 26.
1.0 Running Reliability Analysis
Figure RA 1: The reliability analysis menu
Open the data set used for EFA, click on the ‘Analyze’ tab, scroll over ‘Scale’, and click the ‘Reliability
Analysis’ menu (Figure RA 1). The following display will appear on the screen (Figure RA 2).
Figure RA 2: The reliability analysis interface
Select the items belonging to a specific factor (in this case, all items of factor 1) and move them to the
‘Items’ box as shown in Figure RA 3. Cronbach's alpha is the most common coefficient used to
determine the level of internal consistency. Click on the ‘Statistics’ menu for the next step.
Figure RA 3: The transfer of the selected items for a specific factor to the ‘Items’ box
Figure RA 4: The Reliability Analysis Statistic display
Tick the ‘Item’, ‘Scale’, and ‘Scale if item deleted’ options as shown in Figure RA 4. Then click
‘Continue’ and ‘OK’.
2.0 Interpretation of Reliability Analysis Outputs
Figure RA 5 shows the reliability analysis output. It shows the total number of valid
cases and the Cronbach's alpha value for the six items of factor 1; in this case, it was 0.814.
Figure RA 5: The reliability output interface
Figure RA 6: The Item-Total Statistics output
The ‘Item-Total Statistics’ output provides parameters that suggest the contribution of each item to the
factor. The ‘Corrected Item-Total Correlation’ values should be at least 0.30 to indicate an acceptable level
of contribution to the factor. The ‘Cronbach’s Alpha if Item Deleted’ value is another parameter indicating
the contribution of each item to the factor. If the ‘Cronbach’s Alpha if Item Deleted’ value is lower than the
overall Cronbach's alpha (in this case 0.814), the item contributes substantially to the reliability of the
factor and should be retained. If the value is higher, the item contributes less to the reliability and can
be removed. Repeat the above steps for the remaining factors. Generally, an internal
consistency of 0.70 or greater is preferred27.
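Cronbach's alpha and the 'alpha if item deleted' column follow directly from the item and total-score variances. A minimal illustrative sketch, not SPSS's code:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a (cases x items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total score
    return k / (k - 1) * (1 - item_vars / total_var)

def alpha_if_item_deleted(items):
    """Recompute alpha with each item removed in turn (needs k >= 3)."""
    items = np.asarray(items, dtype=float)
    return np.array([cronbach_alpha(np.delete(items, j, axis=1))
                     for j in range(items.shape[1])])
```

An item whose deletion raises alpha above the full-scale value is contributing little to the internal consistency of the factor, mirroring the interpretation of the SPSS output above.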
Reporting EFA and Reliability Analysis Results
Once the EFA and reliability analysis have been completed, the following is the recommended way to
report the results:
Table 10: Summary of EFA results for the questionnaire (N = 2571)

Factor 1 (eigenvalue 3.730; 16.21% of variance; Cronbach's alpha 0.81)
  Q06 I have little experience of computers (0.800)
  Q18 SPSS always crashes when I try to use it (0.684)
  Q13 I worry that I will cause irreparable damage because of my incompetence with computers (0.647)
  Q07 All computers hate me (0.638)
  Q14 Computers have minds of their own and deliberately go wrong whenever I use them (0.579)
  Q10 Computers are useful only for playing games (0.550)
  Q15 Computers are out to get me (0.459)

Factor 2 (eigenvalue 3.340; 14.52% of variance; Cronbach's alpha 0.82)
  Q20 I can't sleep for thoughts of eigenvectors (0.677)
  Q21 I wake up under my duvet thinking that I am trapped under a normal distribution (0.661)
  Q03 Standard deviations excite me (0.567)
  Q12 People try to tell you that SPSS makes statistics easier to understand but it doesn't (0.523)
  Q04 I dream that Pearson is attacking me with correlation coefficients (0.516)
  Q16 I weep openly at the mention of central tendency (0.514)
  Q01 Statistics makes me cry (0.496)
  Q05 I don't understand statistics (0.429)

Factor 3 (eigenvalue 2.553; 11.09% of variance; Cronbach's alpha 0.82)
  Q08 I have never been good at mathematics (0.833)
  Q17 I slip into a coma whenever I see an equation (0.747)
  Q11 I did badly at mathematics at school (0.747)

Factor 4 (eigenvalue 1.850; 8.47% of variance; Cronbach's alpha 0.57)
  Q09 My friends are better at statistics than me (0.648)
  Q22 My friends are better at SPSS than I am (0.645)
  Q23 If I'm good at statistics my friends will think I'm a nerd (0.586)
  Q02 My friends will think I'm stupid for not being able to cope with SPSS (0.543)
  Q19 Everybody looks at me when I use SPSS (0.428)
For this example, we might write something like this:
A principal component analysis (PCA) was conducted on the 23 items with orthogonal rotation
(varimax). The Kaiser–Meyer–Olkin measure verified the sampling adequacy for the analysis,
KMO = .93, which is well above the acceptable limit of 0.5. Bartlett's test of sphericity, χ²(253) =
19334.49, p < 0.001, indicated that the correlations between items were sufficiently large for PCA. An
initial analysis was run to obtain the eigenvalues for each component in the data. Four components
had eigenvalues over Kaiser's criterion of 1 and in combination explained 50.32% of the variance.
Given the large sample size and Kaiser's criterion of four components, four components were
retained in the final analysis. Table 10 shows the factor loadings after rotation. The items that
cluster on the same components suggest that Factor 1 represents a fear of computers, Factor 2
represents a fear of statistics, Factor 3 represents a fear of maths, and Factor 4 represents peer
evaluation concerns.
CONCLUSION
It is hoped that this short module is useful and helpful for anyone developing and validating a
questionnaire. This module does not cover confirmatory factor analysis (CFA), but CFA can always be
learned from other modules that are available in the market.
REFERENCE
1. Cook DA, Beckman TJ. Current concepts in validity and reliability for psychometric instruments:
theory and application. The American Journal of Medicine. 2006;119(2):166.e7-166.e16.
2. Haynes SN, Richard D, Kubany ES. Content validity in psychological assessment: A functional
approach to concepts and methods. Psychological assessment. 1995;7(3):238.
3. Yusoff MSB. A systematic review on validity evidence of medical student stressor questionnaire.
Education in Medicine Journal. 2017;9(1):1-16.
4. Hadie SNH, Hassan A, Ismail ZIM, Asari MA, Khan AA, Kasim F, et al. Anatomy education
environment measurement inventory: A valid tool to measure the anatomy learning environment.
Anatomical sciences education. 2017;10(5):423-32.
5. Davis LL. Instrument review: Getting the most from a panel of experts. Applied nursing research.
1992;5(4):194-7.
6. Polit DF, Beck CT. The content validity index: are you sure you know what's being reported? Critique
and recommendations. Research in Nursing & Health. 2006;29(5):489-97.
7. Polit DF, Beck CT, Owen SV. Is the CVI an acceptable indicator of content validity? Appraisal and
recommendations. Research in Nursing & Health. 2007;30(4):459-67.
8. Lynn MR. Determination and quantification of content validity. Nursing Research. 1986;35(6):381-5.
9. Ozair MM, Baharuddin KA, Mohamed SA, Esa W, Yusoff MSB. Development and Validation of the
Knowledge and Clinical Reasoning of Acute Asthma Management in Emergency Department (K-
CRAMED). Education in Medicine Journal. 2017;9(2):1-17.
10. Lau AS, Yusoff MS, Lee Y-Y, Choi S-B, Xiao J-Z, Liong M-T. Development and validation of a
Chinese translated questionnaire: A single simultaneous tool for assessing gastrointestinal and upper
respiratory tract related illnesses in pre-school children. Journal of Taibah University Medical
Sciences. 2018;13(2):135-41.
11. Marzuki MFM, Yaacob NA, Yaacob NM. Translation, cross-cultural adaptation, and validation of
the Malay version of the System Usability Scale questionnaire for the assessment of mobile apps.
JMIR Human Factors. 2018;5(2):e10308.
12. American Educational Research Association, American Psychological Association, National Council
on Measurement in Education. Standards for Educational and Psychological Testing. Washington,
DC: American Educational Research Association; 1999.
13. Mosier CI. A critical examination of the concepts of face validity. Educational and Psychological
Measurement. 1947;7(2):191-205.
14. Artino AR Jr, La Rochelle JS, Dezee KJ, Gehlbach H. Developing questionnaires for
educational research: AMEE Guide No. 87. Medical Teacher. 2014;36(6):463-74.
15. Yusoff MSB. ABC of content validation and content validity index calculation. Education in Medicine
Journal. 2019;11(2):49-54.
16. Holden RR. Face validity. The Corsini Encyclopedia of Psychology. 2010:1-2.
17. Hardesty DM, Bearden WO. The use of expert judges in scale development: Implications for
improving face validity of measures of unobservable constructs. Journal of Business Research.
2004;57(2):98-107.
18. Nevo B. Face validity revisited. Journal of Educational Measurement. 1985;22(4):287-93.
19. Coaley K. An Introduction to Psychological Assessment and Psychometrics. London: Sage; 2010.
20. Yusoff MSB, Esa AR. The reliability and validity of the General Stressor Questionnaire (GSQ)
among house officers. International Medical Journal. 18(3):179-82.
21. Vagias WM. Likert-type scale response anchors. Clemson International Institute for Tourism &
Research Development, Department of Parks, Recreation and Tourism Management, Clemson
University. 2006.
22. Yusoff MSB. ABC of Response Process Validation and Face Validity Index Calculation. Education in
Medicine Journal. 2019;11(3):55-61.
23. Lau AS-Y, Yusoff MSB, Lee YY, Choi S-B, Rashid F, Wahid N, et al. Development, Translation and
Validation of Questionnaires for Diarrhea and Respiratory-related Illnesses during Probiotic
Administration in Children. Education in Medicine Journal. 2017;9(4).
24. Chin RWA, Chua YY, Chu MN, Mahadi NF, Wong MS, Yusoff MS, et al. Investigating validity
evidence of the Malay translation of the Copenhagen Burnout Inventory. Journal of Taibah University
Medical Sciences. 2018;13(1):1-9.
25. Mahadi NF, Chin RWA, Chua YY, Chu MN, Wong MS, Yusoff MSB, et al. Malay Language
Translation and Validation of the Oldenburg Burnout Inventory Measuring Burnout. Education in
Medicine Journal. 2018;10(2).
26. Field A. Discovering Statistics Using SPSS. Los Angeles: SAGE Publications; 2009.
27. Cortina JM. What is coefficient alpha? An examination of theory and applications. Journal of
Applied Psychology. 1993;78(1):98-104.