The Yale Guideline Recommendation Corpus: A representative sample of the knowledge content of guidelines

international journal of medical informatics 78 (2009) 354–363
journal homepage: www.intl.elsevierhealth.com/journals/ijmi
Tamseela Hussain, George Michel, Richard N. Shiffman
Yale Center for Medical Informatics, Yale University School of Medicine, New Haven, CT, United States
article info
Article history:
Received 3 April 2008
Received in revised form
6 November 2008
Accepted 17 November 2008
Keywords:
Guidelines
Recommendations
National Guideline Clearinghouse
Strength
abstract
Objective: To develop and characterize a large, representative sample of guideline recommen-
dations that can be used to better understand how current recommendations are written
and to test the adequacy of guideline models. We refer to this sample as the Yale Guideline
Recommendation Corpus (YGRC).
Method: To develop the YGRC, we extracted recommendations from guidelines downloaded
from the National Guideline Clearinghouse (NGC). We evaluated the representativeness of
the YGRC by comparing the frequency of use of controlled vocabulary terms in the YGRC
sample and in the NGC. We examined semantic and formatting indicators that were used
to denote recommendation statements.
Results: In the course of reviewing 7527 recommendation statements, we extracted 1275
recommendations from the NGC and characterized the guidelines from which they were
derived. Both semantic and formatting indicators were used inconsistently to denote recom-
mendations. Recommendation statements were not reliably identifiable in 31.6% (310/982)
of the guidelines and many recommendations were not executable as written. We also found
variability and inconsistency in the way strength of recommendation is currently reported.
Over half of the recommendations (52.7%) did not indicate strength, while 6.5% inaccurately
indicated strength.
Conclusion: The YGRC provides a representative sample of current guideline recom-
mendations and demonstrates considerable variability and inconsistency in the way
recommendations are written and in the way the recommendation strength is currently
reported.
© 2008 Elsevier Ireland Ltd. All rights reserved.
1. Introduction
Clinical practice guidelines are intended to directly improve
the processes of health care and ultimately to improve
the outcomes experienced by patients. Guidelines that are
evidence-based aid in optimizing clinical decision making by
suggesting a course of action based on “conscientious, explicit
and judicious use of current best evidence about the care of
individual patients” [1]. Guidelines vary greatly in terms of
both their method of development and the utility of the
finished products.
Corresponding author at: Yale Center for Medical Informatics, 300 George Street Suite 501, New Haven, CT 06511, United States.
Tel.: +1 203 737 6091; fax: +1 203 737 5708.
E-mail address: tamseela.hussain@yale.edu (T. Hussain).
Clinical guidelines contain recommendation statements
that define appropriate care and, in so doing, differentiate
guidelines from other publications such as systematic reviews.
doi:10.1016/j.ijmedinf.2008.11.001
Table 1 Guidelines modeled in several current representation systems.
Guideline representation system | Guidelines used for development/testing
SAGE (Standards based, Sharable Active Guideline Environment) [27] | 4 guidelines: Immunization (CDC); diabetes (Standards of Medical Care in Diabetes 2006); diabetic hypertension (7th JNC Report); community-acquired pneumonia (Infectious Disease Society of America)
GLIF (Guideline Interchange Format) [28] | 4 guidelines: Breast mass workup (Borton M. Gynecological Decision Making Decker 1988); breast cancer treatment (Eastern Cooperative Oncology Group); cholesterol management (NCEP); influenza vaccine (CDC)
GEM (Guideline Element Module) [29] | 5 guidelines: Urinary tract infection; febrile seizures; developmental dysplasia of the hip; asthma exacerbations; attention deficit disorder (American Academy of Pediatrics)
Protocure [30] | 2 guidelines: Jaundice (American Academy of Pediatrics); diabetes (source not listed)
Most recommendations consist of relatively straightforward
declarative statements that advocate a particular clinical prac-
tice. Ideally, each recommendation should describe precisely
the nature of the proposed actions as well as the exact circum-
stances under which the actions should be undertaken [2].
Such recommendations are referred to as executable recom-
mendations. Specific, concrete recommendation statements
are more likely to be understood, remembered, and acted
upon, and can serve as a basis for the development of bench-
marks or performance indicators. Presenting evidence and
recommendations in a clear, concise, and accessible manner
facilitates the retrieval and assimilation of specific informa-
tion [3]. Yet many guidelines include vague and seriously
underspecified recommendations that make implementation
difficult [4,5].
Users of guidelines need to know how to apply the
knowledge contained in guidelines effectively and how much
confidence to place in the recommendations. This informa-
tion is most often conveyed by categorizing the quality of the
body of evidence on which each recommendation is based.
Quality of evidence is defined as the extent to which one can
be confident that an estimate of effect is correct [6]. In addi-
tion to the quality of evidence, many guideline developers
have also recognized the critical importance of weighing the
benefits that may be anticipated when a recommendation is
followed against any expected risks, harms, and costs [7]. This
judgment is referred to most often as the ‘Recommendation
Strength’. Recommendation strength translates into an
expectation of level of adherence. Guideline authors at several
sites, including the American Academy of Pediatrics, the
American Academy of Otolaryngology-Head and Neck Surgery, the
American Thoracic Society, and the American College of Chest
Physicians, explicitly consider and report recommendation
strength [7–10].
The application of the concept of recommendation
strength in guidelines has not been examined systematically.
Previous studies that addressed strength of recommendation
have done so on small, non-representative samples of
recommendations and have either discussed the need for a uniform
system or advocated their own system, such as GRADE, SORT, or
the modification of GRADE used by the American College of Chest
Physicians [10–12].
Previous studies in modeling guideline recommendations
for implementation in computer-based decision support sys-
tems have often relied on small numbers of recommendations
selected from limited, convenient samples of guidelines (see
Table 1). We believe such studies may result in knowledge
models that fit the selected recommendations well, but may
fail to effectively represent large numbers of guideline recom-
mendations.
The primary objective of this work is to develop and
characterize a large, representative sample of guideline rec-
ommendations that can be used to better understand how
current recommendations are written. We refer to this sample
as the Yale Guideline Recommendation Corpus (YGRC). A cor-
pus is defined as a large collection of writings of a specific kind
or on a specific subject used for linguistic analysis. Viewing a
corpus makes patterns in language more visible [13].
In the following sections, we describe the process of YGRC
development, characteristics of the guidelines from which the
recommendations are derived, the difficulties we encountered
in identifying and extracting recommendation statements
from guideline text, and use of the corpus to describe the
prevalence of recommendation strength statements.
2. Methods
2.1. Guideline selection
To initiate the development of a representative sample of
guideline recommendations, we downloaded all 1964 guide-
line summaries available at the Agency for Healthcare
Research and Quality’s (AHRQ) National Guideline Clearing-
house website (NGC) on 15 June 2007. The NGC provides
a comprehensive, web-accessible database of summaries of
evidence-based clinical practice guidelines and related docu-
ments. These summaries are prepared for AHRQ by ECRI, a
contractor organization that develops the summaries accord-
ing to a set of carefully prescribed protocols in consultation
with the organizations that authored the guidelines.
To be included in the NGC:
- Guidelines must be produced under the auspices of medical
  specialty associations, professional societies, public or
  private organizations, government agencies, health care
  organizations, or plans.
- Guideline development must include a systematic literature
  search and review of scientific evidence.
- The full text of the guideline must be available in English.
- Guidelines must have been developed, reviewed, or revised
  within the past 5 years.
Fig. 1 Method of development of the Yale Guideline Recommendation Corpus.
Each NGC summary is an XML document that accommo-
dates text content in up to 55 elements per guideline. Thirteen
of these elements utilize controlled vocabularies of terms
to classify guideline attributes. Coding of elements is not
required and the concepts are not mutually exclusive so each
guideline may contain one, several, or no controlled vocab-
ulary terms in each field. We used the controlled vocabulary
terms to characterize subsets of guidelines we selected from
the NGC.
Our goal was to include at least 1000 recommenda-
tion statements that were broadly representative of all the
currently available guideline recommendations at the NGC
website. The method by which the Yale Guideline Recommen-
dation Corpus was developed from the NGC is summarized
in Fig. 1. We numbered each guideline summary sequentially
and selected those with odd-numbered identifiers to achieve
a representative sample of guidelines (N = 982).
2.2. Selection of recommendations
In each NGC guideline summary, all recommendations are
aggregated within a single XML element entitled “Major Rec-
ommendations.” They are accompanied by highly variable
text that describes, for example, background information
about the disease or condition to which the recommendation
applies, the rationale for the recommendation, and informa-
tion that amplifies how the recommendation might be carried
out. We attempted to identify statements that were recogniz-
able as individual recommendations within the text of this
element. We operationally defined a “recommendation” as
a statement whose apparent intent is to provide guidance
about the advisability of a clinical action. Recommendations
were identified based on semantic considerations, formatting
(such as bullets, bolded text, and enumeration), headers (that
include descriptors such as ‘recommended’) and presence of
recommendation strength indicators. Guidelines were next
reviewed for consistency of presentation. If recommendations
were not consistently recognizable throughout a given guide-
line, then the guideline was set aside for further review (see
below).
We then counted the total number of recommendations
in each eligible guideline and, using a random number
generator, selected three recommendations from each. Random
selection was necessary to avoid an order bias, because
recommendations are often organized in a sequence in which
the first statements address screening and diagnostic
considerations and later recommendations address management
considerations. To avoid oversampling, guidelines that
contained fewer than three recommendations were excluded.
Guidelines with more than 100 recommendations were also
excluded from the sampling because consistent counting of
large numbers of recommendations was not feasible. We
excluded those guidelines that represented recommendations
in algorithmic (flowchart) format because we planned to focus
on declarative rather than procedural knowledge. We also
excluded those guidelines that provided recommendations in
tables because the tabular format encodes meta-information
that would be difficult to capture consistently in a corpus of
textual statements. Finally, the text of each recommendation
was extracted and entered into a MySQL database along with
the source guideline’s title and the developer’s name to create
the YGRC.
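The selection rules described in Sections 2.1 and 2.2 can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the `Guideline` record, its field names, and the fixed random seed are assumptions introduced here. Only the rules themselves (odd-numbered guidelines, exclusion of guidelines with fewer than 3 or more than 100 recommendations or with tabular or algorithmic presentation, and three randomly chosen recommendations per eligible guideline) come from the text.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Guideline:
    # Hypothetical record; field names are illustrative only.
    seq_id: int                     # sequential number assigned to the summary
    recommendations: list = field(default_factory=list)
    tabular: bool = False           # recommendations presented in tables
    algorithmic: bool = False       # recommendations presented as a flowchart
    identifiable: bool = True       # recommendations consistently recognizable


def is_eligible(g: Guideline) -> bool:
    """Apply the exclusion rules described in the text."""
    n = len(g.recommendations)
    return (g.identifiable and not g.tabular and not g.algorithmic
            and 3 <= n <= 100)


def build_corpus(guidelines, per_guideline=3, seed=0):
    """Keep odd-numbered, eligible guidelines and sample without replacement."""
    rng = random.Random(seed)       # the seed is an assumption, for repeatability
    corpus = []
    for g in guidelines:
        if g.seq_id % 2 == 1 and is_eligible(g):
            corpus.extend(rng.sample(g.recommendations, per_guideline))
    return corpus
```

Sampling a fixed three statements per guideline keeps long guidelines from dominating the corpus; with 425 eligible guidelines it yields 425 × 3 = 1275 statements, the corpus size reported in the Results.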
For each recommendation entered into the MySQL
database, we also collected information regarding the indi-
cation of strength of recommendations and categorized
recommendation strength indicators as:
- Present, i.e., the strength of this recommendation
  incorporated an appraisal of benefits, harms, risks and costs.
- Absent, i.e., the strength of this recommendation was not
  indicated.
- Strength Inaccurately Indicated, i.e., some information was
  indicated as recommendation strength that, in fact, merely
  described evidence quality.
For the 315 guideline summaries in which we were unable
to reliably identify recommendation statements, we reviewed
the original guideline statements from their sources to ensure
that the NGC summaries were congruent with the original
publications. In 5 of the 315 guidelines (1.6%) that we excluded
because recommendations were not consistently identifiable,
the original publications provided text or formatting that
made identification of the recommendation statements
possible. Three of these guidelines were excluded from the YGRC
pool because the recommendations were presented in tabular
format or they contained fewer than three recommendations.
The remaining two guidelines were added to the pool of YGRC
source guidelines for randomization. That left a set of 310
guidelines whose recommendations were not consistently
identifiable.
3. Results
3.1. Characteristics of NGC and YGRC guidelines
As shown in Table 2, most guidelines included in the NGC
are developed by medical specialty societies (39.3%) and pro-
fessional associations (15.8%). Non-US governmental agencies
account for 14.1%, while US governmental contributions
account for 9.9%.
Most guidelines were coded to indicate that they provide
advice about treatment and management (see Table 3). Advice
regarding evaluation was available in almost half of guidelines.
Diagnostic assistance and advice regarding prevention were
provided by about 1/3 of guidelines.
We found 425 guidelines that met eligibility criteria for
inclusion in the YGRC. From these guidelines we identified
and enumerated 7527 recommendation statements and ran-
domly selected 1275 recommendations from them. These
recommendations cover a broad range of diseases and mental
disorders.
Table 2 Sources of guideline developers contributing to the 1964 guidelines at the National Guideline Clearinghouse. More than one guideline developer may contribute to each guideline.
39.3% Medical Specialty Society
15.8% Professional Association
8.3% National Government Agency\Non-US
6.9% Federal Government Agency\US
6.7% Private Nonprofit Organization
5.8% State/Local Government Agency\Non-US
4.5% Independent Expert Panel
4.4% Academic Institution
3.0% State/Local Government Agency\US
2.7% Disease Specific Society
1.7% Hospital/Medical Center
1.0% Public For Profit Organization
0.7% Private Nonprofit Research Organization
0.3% Private For Profit Organization
0.3% Managed Care Organization
0.2% International Agency
To ensure that the YGRC sample reflected the NGC content,
we compared the proportion of YGRC guidelines coded with
Guideline Category controlled vocabulary terms with the
proportion of NGC guidelines that were coded with these terms
(see Table 3). Because the NGC and YGRC were similar in
rankings and percentages of Guideline Category code application
(the frequencies of “Screening” and “Assessment of Therapeutic
Effectiveness” differed slightly), we concluded that the YGRC
subset was representative of the NGC with respect to guideline
categories.
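The comparison of category coding can be recomputed directly from the counts in Table 3. The sketch below is only an illustration of that arithmetic: the variable names and the simple rank comparison are assumptions made here, and the text does not report a formal statistic for this step.

```python
NGC_TOTAL, YGRC_TOTAL = 1964, 425

# (NGC count, YGRC count) per Guideline Category code, copied from Table 3.
counts = {
    "Management": (1155, 265),
    "Treatment": (1133, 260),
    "Evaluation": (932, 185),
    "Diagnosis": (815, 146),
    "Prevention": (612, 136),
    "Risk Assessment": (344, 90),
    "Assessment of Therapeutic Effectiveness": (237, 73),
    "Screening": (292, 66),
    "Counseling": (189, 48),
    "Technology Assessment": (42, 8),
    "Rehabilitation": (41, 6),
    "Education": (2, 0),
}


def percentages(index, total):
    """Percentage of guidelines carrying each category code."""
    return {k: 100 * v[index] / total for k, v in counts.items()}


ngc_pct = percentages(0, NGC_TOTAL)
ygrc_pct = percentages(1, YGRC_TOTAL)

# Rank categories by frequency of use in each collection.
ngc_rank = sorted(counts, key=ngc_pct.get, reverse=True)
ygrc_rank = sorted(counts, key=ygrc_pct.get, reverse=True)

# Categories whose rank position differs between the two collections.
moved = [c for c in counts if ngc_rank.index(c) != ygrc_rank.index(c)]

for c in ygrc_rank:
    print(f"{c:42s} NGC {ngc_pct[c]:5.1f}%  YGRC {ygrc_pct[c]:5.1f}%")
print("Rank differs for:", moved)
```

Comparing rank order as well as raw percentages makes it easy to see which categories change position between the full collection and the sample.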
3.2. Identification of recommendation statements
To facilitate consistent identification, extraction, and count-
ing we defined several indicators that we used to recognize
recommendations. These indicators include the following.
3.2.1. Semantic indicators
1. Recommendations may include (a) modal operators (e.g.,
terms such as “should,” “must,” “may”) to express a level
of obligation or permission or (b) statements of suitabil-
ity under specific circumstances (e.g., “is appropriate”, “is
indicated”).
Example. An F 18-deoxyglucose positron emission tomogra-
phy (FDG PET) scan should be performed to investigate solitary
pulmonary nodules in cases where a biopsy is not possible or
has failed, depending on nodule size, position and CT charac-
terization [14].
2. Indicative headings and titles, such as ‘Recommendations’
and ‘Recommended’ may be used to demarcate recommen-
dation statements.
Example. Recommendation: Treat duodenal ulcers with
H2RAs or PPIs for 4–8 weeks [15].
Table 3 Application of category codes in the full NGC (1964 guidelines) and in the YGRC sample (425 guidelines). More than one category may be used to code each guideline.
Category code | NGC percentage | NGC number | YGRC percentage | YGRC number
Management | 58.8% | 1155 | 62.4% | 265
Treatment | 57.7% | 1133 | 61.2% | 260
Evaluation | 47.5% | 932 | 43.5% | 185
Diagnosis | 41.5% | 815 | 34.4% | 146
Prevention | 31.2% | 612 | 32.0% | 136
Risk Assessment | 17.5% | 344 | 21.2% | 90
Assessment of Therapeutic Effectiveness | 12.1% | 237 | 17.2% | 73
Screening | 14.9% | 292 | 15.5% | 66
Counseling | 9.6% | 189 | 11.3% | 48
Technology Assessment | 2.1% | 42 | 1.9% | 8
Rehabilitation | 2.1% | 41 | 1.4% | 6
Education | 0.1% | 2 | 0.0% | 0
3. Guidance may be presented in concise paragraphs.
In the most recognizable format, as shown in the following
example, the first (topic) sentence of the paragraph explicitly
states an advisable course of action, although other formatting
indicators are absent. The information that follows amplifies
and explains the recommended activity.
Example. Beginning in their 20s, women should be told about
the benefits and limitations of breast self examination (BSE).
The importance of prompt reporting of any new breast symp-
toms to a health professional should be emphasized. Women
who choose to do BSE should receive instruction and have
their technique reviewed on the occasion of a periodic health
examination [16].
4. Recommendation statements may be accompanied by an
indicator of evidence quality or strength of recommenda-
tion.
Following is an example from the YGRC in which strength
of recommendation and quality of evidence are indicated.
Example. Rituximab is active in the treatment of Wm but
associated with the risk of transient exacerbations of clinical
effects of the disease and should only be used with cau-
tion, especially in patients with symptoms of hyper-viscosity
and/or IgM levels >40g/L. Level of evidence IIb, Grade of Rec-
ommendation B [17].
3.2.2. Formatting indicators
Several formatting indicators may be used to facilitate
recognition of recommendations:
1. Enumeration of statements.
2. Boldface text.
3. Bulleted text.
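To make the indicators above concrete, the following Python sketch flags them with regular expressions. It is a rough heuristic written for illustration, not the procedure used to build the YGRC, which relied on manual review. The pattern lists are assumptions based only on the examples quoted in the text, and boldface cannot be detected in plain text, so it is omitted.

```python
import re

# Illustrative patterns only; the phrase lists are assumptions drawn from the
# examples in the text, not an exhaustive or validated vocabulary.
MODAL = re.compile(r"\b(should|must|may)\b", re.IGNORECASE)
SUITABILITY = re.compile(r"\b(is|are) (appropriate|indicated)\b", re.IGNORECASE)
HEADING = re.compile(r"^\s*recommend(ation|ed)s?\b", re.IGNORECASE)
ENUMERATED = re.compile(r"^\s*\d+[.)]\s+")
BULLETED = re.compile(r"^\s*[-*\u2022]\s+")


def indicators(statement: str) -> set:
    """Return the names of the indicators found in one statement."""
    found = set()
    if MODAL.search(statement):
        found.add("modal")
    if SUITABILITY.search(statement):
        found.add("suitability")
    if HEADING.search(statement):
        found.add("heading")
    if ENUMERATED.search(statement):
        found.add("enumerated")
    if BULLETED.search(statement):
        found.add("bulleted")
    return found


print(indicators("An FDG PET scan should be performed to investigate nodules."))
print(indicators("Recommendation: Treat duodenal ulcers with H2RAs or PPIs."))
```

A surface check of this kind cannot tell an executable recommendation from a fact that happens to be bulleted, which is the difficulty discussed in the next section.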
3.3. Characteristics of recommendations that were not
easily identifiable
We found several recurring issues that interfered with our
ability to reliably and consistently identify recommendation
statements within guideline text.
3.3.1. Clinical facts were formatted as recommendations
Many statements, which were formatted like recommenda-
tions, were simply facts and were not executable as written.
Example. Suppressive therapy is effective for preventing
recurrent infections. (Strength of Recommendation A-1) [18]
In this example, the statement indicates that the agent is
effective and it is even supported by an indicator of strength
of recommendation. However, the factual assertion does not
indicate whether or under what conditions “suppressive ther-
apy” should be used. A decision about whether or not to use
the therapy is dependent on other unspecified considerations,
such as its comparative effectiveness (vis-à-vis no therapy or
other agents), its safety profile, and its cost. As written, this
recommendation is not executable.
3.3.2. Guidance was deeply embedded in paragraphs
Some statements that were executable and provided guid-
ance about the advisability of a clinical action were embedded
in long paragraphs, without any formatting to indicate that
the statements were intended to serve as recommendations.
Identifying such statements as recommendations required a
thorough reading and understanding of the text.
Example. A pilot open-label study suggested that paroxetine
is effective in reducing pain and other IBS symptoms. A lit-
erature search revealed only one randomized controlled trial
(RCT) examining the use of an SSRI (paroxetine) for treatment
of IBS. This trial did suggest an improvement in overall well-
being in both depressed and non-depressed individuals with
IBS. Given the limited evidence, their use is not recommended
as routine or first-line therapy except in patients who also have
co-morbid depression [19].
In this example, the paragraph begins with the suggestion
that a drug is effective and only at the end does the reader
learn that its use is not recommended.
3.3.3. Formatting of recommendations was inconsistent
within the same guideline
We noted many instances where the same formatting (e.g.,
bullets, boldface) was used to denote recommendations in one
part of a guideline and used for other purposes or not used
with other recommendations within the same guideline. This
inconsistency complicated the identification of text that was
intended by the guideline developers to serve as
recommendations.
Table 4 Results: documentation of strength of recommendations in the YGRC.
Strength Present, N (%) | Strength Absent, N (%) | Strength Inaccurately Indicated, N (%)
519 (40.7) | 672 (52.7) | 84 (6.58)
The following example shows inconsistent use of bullets to
denote recommendations.
Example.
- Oral antiviral drugs are indicated within 5 days of the start
  of the episode and while new lesions are still forming. (A
  recommendation)
- Topical agents are less effective than oral agents. (An
  assertion of fact)
- Acyclovir, valaciclovir, and famciclovir all reduce the
  severity and duration of episodes. Antiviral therapy does not
  alter natural history of the disease [20]. (Two additional
  assertions of fact)
Such inconsistencies made it difficult in many cases to iden-
tify, count, and extract recommendations from within the
guideline text.
3.4. Indication of strength of recommendation was
variable and inconsistent in guidelines
Table 4 shows the variability and inconsistency that we found
in the way strength of recommendation is currently being
reported in guidelines.
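The proportions in Table 4 follow directly from the three counts and the corpus size of 1275. The short Python sketch below shows the tally; the `Strength` enum and the `summarize` helper are names invented for this illustration.

```python
from collections import Counter
from enum import Enum


class Strength(Enum):
    # Category labels follow the three classes defined in the Methods.
    PRESENT = "present"
    ABSENT = "absent"
    INACCURATE = "inaccurately indicated"


def summarize(labels):
    """Return {category: (count, percent)} for a list of Strength labels."""
    tally = Counter(labels)
    total = sum(tally.values())
    return {s: (tally[s], 100 * tally[s] / total) for s in Strength}


# Counts taken from Table 4.
labels = ([Strength.PRESENT] * 519
          + [Strength.ABSENT] * 672
          + [Strength.INACCURATE] * 84)

for s, (n, pct) in summarize(labels).items():
    print(f"{s.value:24s} {n:5d}  {pct:5.2f}%")
```

Keeping the three categories mutually exclusive means the counts sum to the corpus size, which gives a quick consistency check on the coding.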
Variability exists because, rather than a uniform system,
multiple different methods were used by guideline developers
for demarcating strength of recommendation, e.g., letter
grades (A, B, C, D), Arabic-numbered levels (1, 2, 3, 4), and
Roman numerals (I, II, III, IV) [11]. Inconsistency exists
because the strength of recommendation was not always
indicated throughout a guideline. Some recommendation
statements had a notation of strength of recommendation,
whereas other statements in the same guideline did not.
Following is an example from the YGRC in which only one of
the two contiguous recommendations has an indication of
strength.
Example.
Diet
Dietary modifications alone, such as a clear liquid diet, are
inadequate for colonoscopy. However they have proven to be
a beneficial adjunct to other mechanical cleansing methods
(Grade IIB).
Enemas
Use enemas in patients who present to endoscopy with a
poor distal colon preparation and in patients with a defunc-
tionalized distal colon [21].
3.5. Comparison of YGRC and sub-optimally
formatted guidelines
We excluded 310 guidelines from the YGRC because we
were unable to consistently identify recommendations within
them. We hypothesized that this subset of guidelines might
be characterized by additional weaknesses in guideline devel-
opment. We therefore compared these guidelines with the
guidelines from which the YGRC was derived using variables
that are encoded using the NGC’s controlled vocabulary.
We found that guidelines that were coded as having been
produced by US federal government agencies were
disproportionately heavily represented (10.3% vs. 2.6%, χ² = 19.48,
P < 0.00001) within the subset of guidelines with inconsistently
identifiable recommendations. Contrariwise, guidelines
produced by non-US national government agencies were
disproportionately poorly represented (1.3% vs. 10.3%, χ² = 22.63,
P = 0.000002) among the excluded guidelines.
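A comparison of this kind can be illustrated with a 2 × 2 chi-square calculation. The underlying counts are not reported, so the sketch below reconstructs them from the rounded percentages for the US federal comparison (roughly 32 of 310 excluded guidelines and 11 of 425 YGRC guidelines). Both the reconstructed counts and the choice of an uncorrected Pearson statistic are assumptions; the text does not state which variant of the test was used. Under these assumptions the statistic comes out close to the reported value.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square, without continuity correction, for [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Counts reconstructed from rounded percentages (an assumption, see above).
excluded_us, excluded_total = 32, 310   # about 10.3% of the excluded guidelines
ygrc_us, ygrc_total = 11, 425           # about 2.6% of the YGRC guidelines

chi2 = chi_square_2x2(excluded_us, excluded_total - excluded_us,
                      ygrc_us, ygrc_total - ygrc_us)
print(f"chi-square = {chi2:.2f} on 1 degree of freedom")
```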
Several of the NGC controlled vocabulary’s authorized
terms pertain to the methodology of guideline development.
The percentage of guidelines that utilized systematic review
and meta-analysis to analyze evidence (high-quality,
transparent approaches to evidence appraisal) was higher in the
YGRC than in the set of guidelines whose recommendations
were not consistently identifiable (see Fig. 2).
Similarly, the guidelines in the YGRC weighted the quality and
strength of evidence according to an explicitly stated rating
scheme significantly more often than did the guidelines whose
recommendations were not consistently identifiable (see Fig. 3).
Those guidelines whose recommendations were not consistently
identifiable (1) more frequently depended on subjective
review, (2) failed to state whether or not a rating scheme
was applied, or (3) applied a rating scheme but did not
supply it in the guideline.
4. Discussion
We developed a corpus of 1275 randomly selected
recommendation statements from the National Guideline
Clearinghouse and characterized the guidelines from which
they were derived. We found considerable variability and
inconsistency in the way guideline recommendations are
currently written and reported. These deficiencies were serious
enough to imperil the very identification of the statements
that were intended to be clinical recommendations and thereby
to influence clinical practice.
Guideline authors currently use both semantic and for-
matting indicators to signify recommendations. However, we
noted that inconsistent formatting was prevalent in this
guideline sample. Moreover, in many cases guideline authors
used declarative statements of clinical facts in place of exe-
cutable recommendations. Such statements fail to convey
critical details that are necessary to apply the knowledge in the
course of clinical care. Guidelines that included sub-optimally
formatted recommendations were also deficient in other qual-
ity indicators such as methodology of evidence review and
use of a transparent rating scheme for evidence quality and
recommendation strength.
Fig. 2 Methods used to analyze evidence.
To influence patient outcomes, systems must be devised
that promote adherence to guideline recommendations by
targeted clinicians. Grol et al. related adherence rates to 12
attributes of guidelines, including whether or not the recom-
mendations were concrete, evidence based, or controversial
[22]. Adherence rates almost doubled when recommendations
were clear and precise when compared with recommenda-
tions judged to be vague and non-specific. Likewise, Shekelle
et al. found that nonspecific guideline statements actually
decrease appropriate test-ordering behavior [23].
This study also demonstrates that the strength of
recommendation is currently applied infrequently, variably, and
inconsistently in guideline recommendations [24]. Slightly
more than half of the recommendations (52.7%) did not
indicate strength of recommendation and 6.5% inaccurately
stated strength of recommendation, in that the strength, as
stated, in fact described evidence quality. Quality of
evidence determines the extent to which one can be
confident that an estimate of effect is correct. The strength
of recommendation describes the extent to which one can
be confident that adherence to the recommendation will
do more good than harm. Because we used the presence
of recommendation strength as a determinant of identifiable
recommendations, we presume that the estimate of
underspecification would be even higher in the NGC as a
whole.
The process of transformation of guideline-based knowl-
edge into effective decision support systems remains an
informatics challenge. Application of the concept of rec-
ommendation strength is also critical in the design of
clinical decision support systems that deliver patient-
specific advice at the point of care. Strong recommendations
can be operationalized in systems that require adherence
before allowing the user to move on. Lower-level recommendations
can promote appropriate practice by offering
a default choice or by simply facilitating documenta-
tion.
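The relationship between recommendation strength and decision-support behaviour described above can be sketched as a simple lookup. Everything in this Python example is hypothetical: the three-level scale, the `Mode` names, and the fallback for missing strength are assumptions made for illustration, and real guidelines use many different grading schemes.

```python
from enum import Enum


class Mode(Enum):
    HARD_STOP = "require adherence before proceeding"
    DEFAULT_CHOICE = "pre-select the recommended option"
    DOCUMENT = "facilitate documentation only"


# Hypothetical three-level scale; not taken from any particular guideline.
POLICY = {
    "strong": Mode.HARD_STOP,
    "moderate": Mode.DEFAULT_CHOICE,
    "weak": Mode.DOCUMENT,
}


def delivery_mode(strength):
    """Map a strength label to a decision-support behaviour.

    Missing or unrecognized strength falls back to the least intrusive
    mode, which is a design assumption of this sketch.
    """
    if strength is None:
        return Mode.DOCUMENT
    return POLICY.get(strength.strip().lower(), Mode.DOCUMENT)


print(delivery_mode("Strong").value)
print(delivery_mode(None).value)
```

Because more than half of the recommendations in the corpus carry no strength indication, a system built this way would treat most of them with the fallback behaviour, which illustrates why consistent strength reporting matters for implementation.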
We believe that an ideal recommendation explicitly or
implicitly answers the questions: WHO should do WHAT to
WHOM, UNDER WHAT CIRCUMSTANCES, HOW, and WHY?
[25]. Vague and underspecified recommendations violate the
principle of clarity set forth by the Institute of Medicine in
1992 that guideline recommendations must use unambiguous
language, define terms precisely, and use logical and easy-to-
follow modes of presentation [26].
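One way to make the WHO, WHAT, WHOM, UNDER WHAT CIRCUMSTANCES, HOW, and WHY questions operational is to treat them as slots in a simple record and to report which slots a recommendation leaves empty. The sketch below assumes that the slots are filled in by a human abstractor; the class and function names are hypothetical, and the example text is invented for illustration rather than drawn from the YGRC. Because an ideal recommendation may answer a question implicitly, an empty slot is a prompt for review, not proof of a defect.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class Recommendation:
    """Slots for the six questions an ideal recommendation answers."""
    who: Optional[str] = None            # the actor
    what: Optional[str] = None           # the recommended action
    whom: Optional[str] = None           # the target population
    circumstances: Optional[str] = None  # under what circumstances
    how: Optional[str] = None            # the manner of performing the action
    why: Optional[str] = None            # the rationale


def missing_elements(rec: Recommendation) -> List[str]:
    """Return the names of slots that are empty or blank, in declaration order."""
    return [
        f.name
        for f in fields(rec)
        if not (getattr(rec, f.name) or "").strip()
    ]


# Invented, deliberately underspecified example.
vague = Recommendation(what="order test X", whom="adults with condition Y")
print(missing_elements(vague))
```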
international journal of medical informatics 78 (2009) 354–363 361
Fig. 3 Methods used to assess quality and strength of evidence.
Guideline recommendations would be clearer and more
acceptable to users if authors and publishers adhered to the
following recommendations:
1. Identify the critical recommendations in guideline text
using semantic indicators (such as “The Committee
recommends ...” or “Whenever X, Y, and Z occur clinicians
should ...”) and formatting (e.g., bullets, enumeration, and
boldface text).
2. Use consistent semantic and formatting indicators
throughout the publication.
3. Group recommendations together in a summary section to
facilitate their identification.
4. Do not use assertions of fact as recommendations. Recom-
mendations must be decidable and executable.
5. Avoid embedding recommendation text deep within long
paragraphs. Ideally, recommendations should be stated in
the first (topic) sentence of the paragraph and the remain-
der of the paragraph can be used to amplify the suggested
guidance.
6. Clearly and consistently assign evidence quality and
recommendation strength in proximity to each recommen-
dation and distinguish between the distinct concepts of
quality of evidence and strength of recommendation.
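Consistent semantic indicators also make recommendations easier to locate automatically. The sketch below flags sentences that contain indicator phrases of the kind quoted in item 1. The two patterns are assumptions chosen for illustration and are far from exhaustive; this is not the method used to build the YGRC, in which recommendations were identified by manual review.

```python
import re

# Illustrative indicator patterns only; a real list would need to be much broader.
INDICATOR_PATTERNS = [
    # e.g. "The Committee recommends ..."
    re.compile(r"\b(?:committee|panel|task force|we)\s+recommends?\b", re.IGNORECASE),
    # e.g. "... clinicians should ..."
    re.compile(r"\b(?:clinicians?|physicians?)\s+should\b", re.IGNORECASE),
]


def has_semantic_indicator(sentence: str) -> bool:
    """Return True if the sentence contains any of the indicator phrases."""
    return any(pattern.search(sentence) for pattern in INDICATOR_PATTERNS)
```

A statement of fact with no indicator phrase would not be flagged, which mirrors the difficulty of recognizing assertions of fact as recommendations (item 4).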
5. Limitations
The National Guideline Clearinghouse may not be a represen-
tative source of the universe of guideline documents because it
is limited to English-language guidelines and includes mostly
guidelines created in North America. However, it is a rich
source of guideline knowledge with reasonable standards for
inclusion. In addition, it is widely used and is highly accessible.
We excluded a large number of guidelines from our sam-
pling process because they included fewer than 3 or more
than 100 recommendations, or presented recommendations
in a tabular or algorithmic format. This might be expected to
diminish the representativeness of the sample. Nonetheless,
we retained a large number of guidelines that we demon-
strated were representative of the NGC content.
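The exclusion rules just described can be written as a small predicate. The record type, the field names, and the three layout labels are hypothetical; only the thresholds (fewer than 3 or more than 100 recommendations) and the exclusion of tabular and algorithmic formats come from the text.

```python
from dataclasses import dataclass


@dataclass
class GuidelineSummary:
    """Hypothetical record for one guideline summary."""
    title: str
    n_recommendations: int
    layout: str  # assumed labels: "text", "tabular", or "algorithmic"


def is_eligible(g: GuidelineSummary, min_recs: int = 3, max_recs: int = 100) -> bool:
    """Apply the sampling exclusions: recommendation count and presentation format."""
    if g.layout in {"tabular", "algorithmic"}:
        return False
    # Guidelines with fewer than 3 or more than 100 recommendations are excluded.
    return min_recs <= g.n_recommendations <= max_recs
```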
Some formatting inconsistency may be introduced at the
time guideline summaries are created. However, upon review
of the 315 original guideline publications that were initially
excluded from the YGRC because of formatting deficiencies,
only 5 (1.6%) were found that included identifiable recommen-
dations.
6. Future work
We plan to use the YGRC as a resource for further explor-
ing the quality and knowledge content of guidelines and
to investigate and clarify problems in guideline authoring
and dissemination. For example, we have completed a study
of the YGRC to ascertain current patterns in the use of
statements of Recommendation Strength, a parameter that
is of critical importance to guideline implementers and
end-users [24].
Additional planned studies involve the use of manual and
natural language processing techniques to characterize the
language of recommendations and to define exemplary
patterns of statement clarity. The YGRC can also be used to
test the adequacy of vocabularies to encode concepts
necessary for guideline implementation in computer-mediated
decision support systems. Although considerable work has
highlighted the capacity of current vocabularies to
represent patient data necessary for decision support, little work
has been done to examine the fitness of these systems
for expressing the knowledge-based concepts that must be
represented.
Summary points
What was already known before this study
Previous studies in modeling guideline recommendations
for implementation in computer-based decision
support systems have often relied on small numbers of
recommendations selected from limited, convenience
samples of guidelines.
The application of strength of recommendation
in clinical guidelines has not previously been
examined systematically.
Clinical guidelines include vague and underspecified
recommendations that make implementation difficult.
What this study has added to our knowledge
This paper demonstrates the characteristics of current
clinical guidelines in the form of a corpus that is the
largest and broadest sample of its kind. The corpus was
created after reviewing 7527 recommendation statements
and includes 1275 extracted statements, which
represent both national and international guidelines.
These recommendations cover a broad range of diseases
and mental disorders.
This paper examines how strength of recommendation
is applied in guidelines and reports the associated
problems.
This study provides further insight into the way guideline
recommendations are currently written and
reported. The issues reported are serious enough
to imperil the very identification of the statements
that were intended to influence clinical
practice.
Acknowledgements
This work was supported by grant LM07199, which is co-
funded by the National Library of Medicine and the Agency for
Healthcare Research and Quality, and by grant T15-LM07065
from the National Library of Medicine.
We would like to thank Denise Heresy for her support in
her role as library liaison at the Yale Medical Library.
references
[1] D.L. Sackett, W.M. Rosenberg, J.A. Gray, R.B. Haynes, W.S.
Richardson, Evidence based medicine: what it is and what it
isn’t, BMJ 312 (7023) (1996) 71–72.
[2] R.N. Shiffman, P. Shekelle, J.M. Overhage, J. Slutsky, J.
Grimshaw, A.M. Deshpande, Standardized reporting of
clinical practice guidelines: a proposal from the Conference
on Guideline Standardization, Ann. Intern. Med. 139 (6)
(2003) 493–498.
[3] S. Michie, K. Lester, Words matter: increasing the
implementation of clinical guidelines, Qual. Saf. Health Care
14 (October(5)) (2005) 367–370.
[4] S. Codish, R.N. Shiffman, A model of ambiguity and
vagueness in clinical practice guideline recommendations,
AMIA Annu. Symp. Proc. (2005) 146–150.
[5] W.M. Tierney, J.M. Overhage, B.Y. Takesue, L.E. Harris, M.D.
Murray, D.L. Vargo, et al., Computerizing guidelines to
improve care and patient outcomes: the example of heart
failure, J. Am. Med. Inform. Assoc. 2 (September–October(5))
(1995) 316–322.
[6] D. Atkins, D. Best, P.A. Briss, M. Eccles, Y. Falck-Ytter, S.
Flottorp, et al., Grading quality of evidence and strength of
recommendations, BMJ 328 (June(7454)) (2004)
1490.
[7] H.J. Schunemann, R. Jaeschke, D.J. Cook, W.F. Bria, A.A.
El-Solh, A. Ernst, et al., An official ATS statement: grading
the quality of evidence and strength of recommendations in
ATS guidelines and recommendations, Am. J. Respir. Crit.
Care Med. 174 (September(5)) (2006) 605–614.
[8] American Academy of Pediatrics Steering Committee on
Quality Improvement Management, Classifying
recommendations for clinical practice guidelines, Pediatrics
114 (September(3)) (2004) 874–877.
[9] R.M. Rosenfeld, R.N. Shiffman, Clinical practice guidelines: a
manual for developing evidence-based guidelines to
facilitate performance measurement and quality
improvement, Otolaryngol. Head Neck Surg. 135 (October(4
Suppl.)) (2006) S1–S28.
[10] G. Guyatt, D. Gutterman, M.H. Baumann, D. Addrizzo-Harris,
E.M. Hylek, B. Phillips, et al., Grading strength of
recommendations and quality of evidence in clinical
guidelines: report from an American College of Chest
Physicians Task Force, Chest 129 (January(1)) (2006) 174–181.
[11] Grading Recommendations Assessment, Development and
Evaluation (GRADE) Working Group, 2007. Available at:
http://www.gradeworkinggroup.org/index.htm (accessed
August 8, 2007).
[12] M.H. Ebell, J. Siwek, B.D. Weiss, S.H. Woolf, J.L. Susman, B.
Ewigman, et al., Simplifying the language of evidence to
improve patient care: strength of recommendation
taxonomy (SORT): a patient-centered approach to grading
evidence in medical literature, J. Fam. Pract. 53 (February(2))
(2004) 111–120.
[13] E. McKean, Corpus: Exploring What Words Really Mean, New
York Times Magazine, July 29, 2007.
[14] National Collaborating Center for Acute Care, The Diagnosis
and Treatment of Lung Cancer. Available at: www.ngc.gov.
[15] New Zealand Guidelines Group, Management of Dyspepsia
and Heartburn. Available at: www.ngc.gov.
[16] American Cancer Society, ACS Guidelines for Breast Cancer
Screening. Available at: www.ngc.gov.
[17] British Committee for Standards in Hematology, Guidelines
on the Management of Waldenstrom’s Macroglobulinemia.
Available at: www.ngc.gov.
[18] Infectious Diseases Society of America, Guidelines for
Treatment of Candidiasis. Available at: www.ngc.gov.
[19] University of Texas at Austin, School of Nursing, The
Efficacy of Antidepressants and Various Psychotherapies as
Adjunctive Treatments for Irritable Bowel Syndrome.
Available at: www.ngc.gov.
[20] British Association of Sexual Health and HIV Medical
Society, 2002 National Guideline for the Management of
Genital Herpes. Available at: www.ngc.gov.
[21] American Society for Gastrointestinal Endoscopy, Society of
American Gastrointestinal and Endoscopic Surgeons, A
Consensus Document on Bowel Preparation Before
Colonoscopy. Available at: www.ngc.gov.
[22] R. Grol, J. Dalhuijsen, S. Thomas, C. Veld, G. Rutten, H.
Mokkink, Attributes of clinical guidelines that influence use
of guidelines in general practice: observational study, BMJ
317 (September(7162)) (1998) 858–861.
[23] P.G. Shekelle, R.L. Kravitz, J. Beart, M. Marger, M. Wang, M.
Lee, Are nonspecific practice guidelines potentially harmful?
A randomized comparison of the effect of nonspecific
versus specific guidelines on physician decision making,
Health Serv. Res. 34 (March(7)) (2000) 1429–1448.
[24] How Often is Strength of Recommendation Indicated in
Guidelines: A Study of the Yale Guideline Recommendation
Corpus, American Medical Informatics Association Annual
Meeting, October, 2008.
[25] R.N. Shiffman, G. Michel, A. Essaihi, E. Thornquist, Bridging
the guideline implementation gap: a systematic,
document-centered approach to guideline implementation,
J. Am. Med. Inform. Assoc. 11 (September–October(5)) (2004)
418–426.
[26] Institute of Medicine (U.S.), Committee on Clinical Practice
Guidelines, Guidelines for Clinical Practice: From
Development to Use, National Academy Press, 1992.
[27] S.W. Tu, J.R. Campbell, J. Glasgow, M.A. Nyman, R. McClure, J.
McClay, et al., The SAGE guideline model: achievements and
overview, J. Am. Med. Inform. Assoc. 14
(September–October(5)) (2007) 589–598.
[28] L. Ohno-Machado, J.H. Gennari, S.N. Murphy, N.L. Jain, S.W.
Tu, D.E. Oliver, et al., The guideline interchange format: a
model for representing guidelines, J. Am. Med. Inform.
Assoc. 5 (July–August(4)) (1998) 357–372.
[29] R.N. Shiffman, B.T. Karras, A. Agrawal, R. Chen, L. Marenco,
S. Nath, GEM: a proposal for a more comprehensive
guideline document model using XML, J. Am. Med. Inform.
Assoc. 7 (September–October(5)) (2000) 488–498.
[30] Protocure, Protocure II, 2007. Available at: http://www.
protocure.org/ (accessed July 18, 2007).
... The recommendations were organised under headings of assessment; red flags; imaging; management; return to work; advice and education; exercise and electrophysical modalities. The recommendations within each heading were then clustered according to the wording and focus of each recommendation (Gupta et al. 2016;Hussain, Michel & Shiffman 2009;Shiffman et al. 2005). No attempt was made to synthesise the original recommendations into composite recommendations . ...
... The authors believe that synthesising the recommendations would jeopardise not only the intent of the individual recommendations but also the parent CPGs. There is a growing body of work about how the semantics of writing recommendation from the underpinning research evidence affect its uptake by the target audience (Gupta et al. 2016;Hussain et al. 2009;Shiffman et al. 2005). Whilst recommendations in each cluster appear to use common references, and have similar intent, they often use different words to convey the strength of the body of evidence, and the message. ...
Article
Full-text available
Background: Clinical practice guidelines (CPGs) provide conveniently packaged evidence-based recommendations to inform clinical decisions. However, intended end-users often do not know how to source, appraise, interpret or choose among CPGs. Moreover, it can be confusing when recommendations on the same topic differ among CPGs, in wording, intent and underpinning evidence. Objectives: This article reports on the processes of: (1) identifying current CPGs for acute and subacute low back pain (LBP) to fit the needs of South African physiotherapists, (2) collating and summarising CPG recommendations to produce a user-friendly end-user product and (3) testing the utility of the summary CPG document on South African physiotherapy clinicians to efficiently determine acceptability, appropriateness and feasibility to inform clinical decision-making. Method: An adapted approach was followed by systematically searching online CPG repositories and online databases for LBP CPGs; screening and critically appraising identified CPGs; summarising recommendations from relevant CPGs and organising them into clinical practice activities. Feedback on utility was obtained from 11 physiotherapists. Results: Three high-quality, international CPGs provided 25 recommendations on the assessment and management of acute and subacute LBP relevant to South African physiotherapy practice. They were organised into 10 headings. Physiotherapy user feedback suggested that this document would assist in clinical decision-making. Conclusion: Organised recommendations extracted from multiple, relevant CPGs provide an end-user-friendly resource for physiotherapists treating LBP. Clinical implications: Collated and organised CPG recommendations may effectively assist South African physiotherapists’ clinical decision-making in assessing and managing patients with acute and subacute LBP.
... --Operative reports (surgery) (Lohr and Herms, 2016) 450 22k 266k Discharge summaries from a dermatology department (Kreuzthaler et al., 2016) Some larger corpora of CPGs for the English language exist already. Hussain et al. (2009) present the Yale Guideline Recommendation Corpus (YGRC), a sample of 1,275 guideline recommendations extracted from the National Guideline Clearinghouse (NGC). Their work revealed inconsistencies in writing style and reporting of the strength of recommendations. ...
Conference Paper
The lack of publicly accessible text corpora is a major obstacle for progress in natural language processing. For medical applications, unfortunately, all language communities other than English are low-resourced. In this work, we present GGPONC (German Guideline Program in Oncology NLP Corpus), a freely distributable German language corpus based on clinical practice guidelines for oncology. This corpus is one of the largest ever built from German medical documents. Unlike clinical documents, clinical guidelines do not contain any patient-related information and can therefore be used without data protection restrictions. Moreover, GGPONC is the first corpus for the German language covering diverse conditions in a large medical subfield and provides a variety of metadata, such as literature references and evidence levels. By applying and evaluating existing medical information extraction pipelines for German text, we are able to draw comparisons for the use of medical language to other corpora, medical and non-medical ones.
... --Operative reports (surgery) (Lohr and Herms, 2016) 450 22k 266k Discharge summaries from a dermatology department (Kreuzthaler et al., 2016) Some larger corpora of CPGs for the English language exist already. Hussain et al. (2009) present the Yale Guideline Recommendation Corpus (YGRC), a sample of 1,275 guideline recommendations extracted from the National Guideline Clearinghouse (NGC). Their work revealed inconsistencies in writing style and reporting of the strength of recommendations. ...
Preprint
Full-text available
The lack of publicly available text corpora is a major obstacle for progress in clinical natural language processing, for non-English speaking countries in particular. In this work, we present GGPONC (German Guideline Program in Oncology NLP Corpus), a freely distributable German language corpus based on clinical practice guidelines in the field of oncology. The corpus is one of the largest corpora of German medical text to date. It does not contain any patient-related data and can therefore be used without data protection restrictions. Moreover, it is the first corpus for the German language covering diverse conditions in a large medical subfield. In addition to the textual sources, we provide a large variety of metadata, such as literature references and evidence levels. By applying and evaluating existing medical information extraction pipelines for German text, we are able to draw comparisons for the use of medical language to other medical text corpora.
... Trust developed by the IOM Committee on Standards for Developing Trustworthy Clinical Practice Guidelines. 28 The reader will note the use of the word "should," "may," and "must" as action words in each of the key action statements. Lomotan et al, 2010 suggest that "must" conveys the strongest level of obligation and that guideline developers rarely use the term, except in cases of a clear legal standard or potential for imminent patient harm. ...
Preprint
Shoemaker MJ, Dias KJ, Lefebvre KM, et al. Physical therapist clinical practice guideline for the management of individuals with heart failure. Phys Ther. 2019;99:1-28.] The American Physical Therapy Association (APTA), in conjunction with the Cardiovascular and Pulmonary Section of APTA, have commissioned the development of this clinical practice guideline to assist physical therapists in their clinical decision making when treating patients with heart failure. Physical therapists treat patients with varying degrees of impairments and limitations in activity and participation associated with heart failure pathology across the continuum of care. This document will guide physical therapist practice in the examination and treatment of patients with a known diagnosis of heart failure. The development of this clinical practice guideline followed a structured process and resulted in 9 key action statements to guide physical therapist practice. The level and quality of available evidence were graded based on specific criteria to determine the strength of each action statement. Clinical algorithms were developed to guide the physical therapist in appropriate clinical decision making. Physical therapists are encouraged to work collaboratively with other members of the health care team in implementing these action statements to improve the activity, participation, and quality of life in individuals with heart failure and reduce the incidence of heart failure-related re-admissions.
... Trust developed by the IOM Committee on Standards for Developing Trustworthy Clinical Practice Guidelines. 28 The reader will note the use of the word "should," "may," and "must" as action words in each of the key action statements. Lomotan et al (2010) suggest that "must" conveys the strongest level of obligation and that guideline developers rarely use the term, except in cases of a clear legal standard or potential for imminent patient harm. ...
Article
The American Physical Therapy Association (APTA), in conjunction with the Cardiovascular and Pulmonary Section of APTA, have commissioned the development of this clinical practice guideline to assist physical therapists in their clinical decision making when managing patients with heart failure. Physical therapists treat patients with varying degrees of impairments and limitations in activity and participation associated with heart failure pathology across the continuum of care. This document will guide physical therapist practice in the examination and treatment of patients with a known diagnosis of heart failure. The development of this clinical practice guideline followed a structured process and resulted in 9 key action statements to guide physical therapist practice. The level and quality of available evidence were graded based on specific criteria to determine the strength of each action statement. Clinical algorithms were developed to guide the physical therapist in appropriate clinical decision making. Physical therapists are encouraged to work collaboratively with other members of the health care team in implementing these action statements to improve the activity, participation, and quality of life in individuals with heart failure and reduce the incidence of heart failure-related re-admissions.
... Guidelines advocating that recommendations are actionable (e.g. "don't"), avoid ambiguity, and are clear enough as a stand-alone resource [9][10][11]. Using qualified phrases like "when possible", "don't routinely" or "unless necessary" may be used to capture clinical complexity but they also give clinicians some latitude to choose an interpretation that avoids practice change. ...
Article
Full-text available
Background It is unknown to what extent Choosing Wisely recommendations about income-generating treatments apply to members of the society generating the recommendations. The primary aim of this study is to determine the proportion of Choosing Wisely recommendations targeting income-generating treatments, and whether recommendations from professional societies on income-generating treatments are more likely to target members or non-members. The secondary aim is to determine the prevalence of qualified statements, and whether qualified statements are more likely to appear in recommendations targeting income-generating or non-income-generating treatments that apply to members. Methods We performed a content analysis of all Choosing Wisely recommendations, with data extracted from Choosing Wisely websites. Two researchers coded recommendations as test or treatment-based, for or against a procedure, containing qualified statements, income-generating and applying to members. Disagreements were resolved by discussion or consultation with a third researcher. A Chi-squared test evaluated whether society recommendations on income-generating treatments were more likely to target members or non-members; and whether qualified statements were more likely to appear in recommendations targeting income-generating or non-income-generating treatments that apply to members. Results We found 1293 Choosing Wisely recommendations (48.3% tests and 48.6% treatments). Ninety-eight treatment recommendations targeted income-generating treatments (17.8%), and recommendations on income-generating treatments were less likely to target members compared to non-members (15.6% vs. 40.4%, p < 0.001). Nearly half of all recommendations were qualified (41.9%), with a similar proportion of recommendations targeting income-generating and non-income-generating treatments that apply to members containing qualified statements (49.4% vs. 42.0%, p = 0.23). 
Conclusions Many societies provide Choosing Wisely recommendations that minimise impact on their own members. Only 20% of treatment recommendations target income-generating treatments, and of these recommendations mostly target non-members. Many recommendations are also qualified. Increasing the number of recommendations from societies that are unqualified and target member clinicians responsible for de-implementation of low-value and costly treatments should be a priority.
... 9 -14 These discrepancies can cause confusion about the best treatment for the patient, and naivety about the underlying reason for such differences could lead clinicians to inaccurately apply these recommendations in practice. 9 A common means of comparing guidelines is by using quality ratings like the Appraisal of Guidelines Research and Evaluation II (AGREE II). 15 However, little is known about potential guideline treatment recommendation agreement among common prevalent pediatric conditions. Asthma and bronchiolitis are among the most prevalent and costly pediatric medical conditions requiring hospitalization; accordingly, these conditions have been identified as high priorities for research because of their prevalence and cost. ...
Article
Background and objectives: Guideline recommendations for the same clinical condition may vary. The purpose of this study was to determine the degree of agreement among comparable asthma and bronchiolitis treatment recommendations from guidelines. Methods: National and international guidelines were searched by using guideline databases (eg, National Guidelines Clearinghouse: December 16-17, 2014, and January 9, 2015). Guideline recommendations were categorized as (1) recommend, (2) optionally recommend, (3) abstain from recommending, (4) recommend against a treatment, and (5) not addressed by the guideline. The degree of agreement between recommendations was evaluated by using an unweighted and weighted κ score. Pairwise comparisons of the guidelines were evaluated similarly. Results: There were 7 guidelines for asthma and 4 guidelines for bronchiolitis. For asthma, there were 166 recommendation topics, with 69 recommendation topics given in ≥2 guidelines. For bronchiolitis, there were 46 recommendation topics, with 21 recommendation topics provided in ≥2 guidelines. The overall κ for asthma was 0.03, both unweighted (95% confidence interval [CI]: -0.01 to 0.07) and weighted (95% CI: -0.01 to 0.10); for bronchiolitis, it was 0.32 unweighted (95% CI: 0.16 to 0.52) and 0.15 weighted (95% CI: -0.01 to 0.5). Conclusions: Less agreement was found in national and international guidelines for asthma than for bronchiolitis. Additional studies are needed to determine if differences are based on patient preferences and values and economic considerations or if other recommendation-level, guideline-level, and condition-level factors are driving these differences.
... Streamlining PPH-care, according to clear, descriptive protocols that are founded on concrete evidence-based guidelines and ATLS-based course instructions, is necessary for every professional to provide high quality care [12,35]. However, guideline recommendations are rarely specified in precise behavioural terms such as who does what, when, where, and how, and therefore local protocols are essential to close the gap between best evidence and practice [36][37][38]. Proper implementation of evidence-based PPH-guidelines and ATLS-based courses are essential for high quality PPH-care and can only be achieved once the causes for not following guidelines and instructions on different levels have been identified and overcome [7,39]. From literature it is known, that transformation of guideline recommendations into clear and descriptive local protocols requires time, skills in protocol development and convincing evidence or guideline recommendations [40,41]. ...
Article
( Acta Obstet Gynecol Scand. 2015;94:1118–1127) Postpartum hemorrhage (PPH) is a major cause of maternal mortality worldwide, and it has been reported that many mothers suffering from hemorrhage received substandard care. Two recent studies have shown that 30% to 90% of women with PPH received care that is less than optimal, while a third study has shown that hospitals with a fetal maternal specialist experienced a lower rate of PPH. Research has also shown that a lack of training and education among health care providers could play a significant role in the development of severe hemorrhage. The authors of this paper developed a series of quality indicators to evaluate whether providers are adhering to the optimal guidelines when caring for patients with PPH.
Article
Full-text available
Introduction Current recommendations for vitamin D and calcium in dietary guidelines and bone health guidelines vary significantly among countries and professional organisations. It is unknown whether the methods used to develop these recommendations followed a rigourous process and how the differences in methods used may affect the recommended intakes of vitamin D and calcium. The objectives of this study are (1) collate and compare recommendations for vitamin D and calcium across guidelines, (2) appraise methodological quality of the guideline recommendations and (3) identify methodological factors that may affect the recommended intakes for vitamin D and calcium. This study will make a significant contribution to enhancing the methodological rigour in public health guidelines for vitamin D and calcium recommendations. Methods and analyses We will conduct a systematic review to evaluate vitamin D and calcium recommendations for osteoporosis prevention in generally healthy middle-aged and older adults. Methodological assessment will be performed for each guideline against those outlined in the 2014 WHO handbook for guideline development. A systematic search strategy will be applied to locate food-based dietary guidelines and bone health guidelines indexed in various electronic databases, guideline repositories and grey literature from 1 January 2009 to 28 February 2019. Descriptive statistics will be used to summarise the data on intake recommendation and on proportion of guidelines consistent with the WHO criteria. Logistic regression, if feasible, will be used to assess the relationships between the methodological factors and the recommendation intakes. Ethics and dissemination Ethics approval is not required as we will only extract published data or information from the published guidelines. Results of this review will be disseminated through conference presentations and peer-reviewed publications. PROSPERO registration number CRD42019126452
Article
Full-text available
Objectives We systematically analysed recommendations from gout guidelines as an example, to provide a basis for developing a reporting standard of recommendations in clinical practice guidelines (CPGs). Design Systematic review without meta-analysis. Methods We systematically searched MEDLINE and all relevant guideline websites (National Institute for Health and Care Excellence, National Guideline Clearinghouse, Scottish Intercollegiate Guidelines Network, WHO, Guidelines International Network, DynaMed, UpToDate, Best Practice) from their inception to January 2017 to identify and select gout CPGs. We used search terms such as ‘gout’, ‘hyperuricemia’ and ‘guideline’. We included the eligible CPGs of gout according to the predefined inclusion and exclusion criteria after screening titles, abstracts and full texts. The characteristics of recommendations reported in the included guidelines were extracted and analysed. Results A total of 15 gout guidelines with a range of 5–80 recommendations were retrieved. Several indicators were used in the gout guidelines to facilitate identification of recommendations, including grouping all recommendations in a summary section, formatting recommendations in a particular or special way, using locating words for recommendations and indicating the strength of recommendation and quality of evidence. We found some components commonly used in the recommendations. The wording of recommendations varied across guidelines. Recommendations were detailed and explained in the section of rationale and explanation of recommendations. In some guidelines, recommendations were accompanied by other material to assist their reporting. Conclusions Variability and inconsistency were found in the reporting and presentation of recommendations in gout guidelines. Several points for reporting recommendations can be summarised. First, we suggest summarising and highlighting the core recommendations in a guideline. Second, guideline developers should try to structure and write recommendations reasonably. Third, it is necessary to detail and explain the recommendations and their rationale. Finally, describing and providing other potentially useful content is also a helpful way to report clearly.
Article
Objective: To allow exchange of clinical practice guidelines among institutions and computer-based applications. Design: The GuideLine Interchange Format (GLIF) specification consists of the GLIF model and the GLIF syntax. The GLIF model is an object-oriented representation that consists of a set of classes for guideline entities, attributes for those classes, and data types for the attribute values. The GLIF syntax specifies the format of the text file that contains the encoding. Methods: Researchers from the InterMed Collaboratory at Columbia University, Harvard University (Brigham and Women's Hospital and Massachusetts General Hospital), and Stanford University analyzed four existing guideline systems to derive a set of requirements for guideline representation. The GLIF specification is a consensus representation developed through a brainstorming process. Four clinical guidelines were encoded in GLIF to assess its expressivity and to study the variability that occurs when two people from different sites encode the same guideline. Results: The encoders reported that GLIF was adequately expressive. A comparison of the encodings revealed substantial variability. Conclusion: GLIF was sufficient to model the guidelines for the four conditions that were examined. GLIF needs improvement in standard representation of medical concepts, criterion logic, temporal information, and uncertainty.
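The split described above, between an object-oriented model (classes, attributes, data types) and a text-file encoding, can be illustrated with a minimal sketch. This is not GLIF itself: the class names `Guideline` and `GuidelineStep`, their attributes, the step labels, and the use of JSON as the text format are all assumptions made for illustration only.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class GuidelineStep:
    # Hypothetical entity class; attribute names are illustrative, not GLIF's.
    name: str
    kind: str  # e.g. "action" or "decision" (assumed labels)
    next_steps: List[str] = field(default_factory=list)


@dataclass
class Guideline:
    # Hypothetical top-level class holding an ordered list of steps.
    title: str
    steps: List[GuidelineStep] = field(default_factory=list)


def encode(guideline: Guideline) -> str:
    """Serialize the model to a text encoding (JSON stands in for a real syntax)."""
    return json.dumps(asdict(guideline), indent=2, sort_keys=True)


def decode(text: str) -> Guideline:
    """Rebuild the model objects from the text encoding."""
    data = json.loads(text)
    steps = [GuidelineStep(**step) for step in data["steps"]]
    return Guideline(title=data["title"], steps=steps)


example = Guideline(
    title="Example guideline",
    steps=[
        GuidelineStep("assess", "decision", ["treat", "observe"]),
        GuidelineStep("treat", "action"),
        GuidelineStep("observe", "action"),
    ],
)
round_tripped = decode(encode(example))
```

A round trip through `encode` and `decode` returns an equal object, which is the basic property an interchange format needs. Two encoders could still model the same guideline with different steps, which is the kind of variability the abstract reports.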
Article
Increasing amounts of medical knowledge, clinical data, and patient expectations have created a fertile environment for developing and using clinical practice guidelines. Electronic medical records have provided an opportunity to invoke guidelines during the everyday practice of clinical medicine to improve health care quality and control costs. In this paper, efforts to incorporate complex guidelines [those for heart failure from the Agency for Health Care Policy and Research (AHCPR)] into a network of physicians' interactive microcomputer workstations are reported. The task proved difficult because the guidelines often lack explicit definitions (e.g., for symptom severity and adverse events) that are necessary to navigate the AHCPR algorithm. They also focus more on errors of omission (not doing the right thing) than on errors of commission (doing the wrong thing) and do not account for comorbid conditions, concurrent drug therapy, or the timing of most interventions and follow-up. As they stand, the heart failure guidelines give good general guidance to individual practitioners, but cannot be used to assess quality or care without extensive "translation" into the local environment. Specific recommendations are made so that future guidelines will prove useful to a wide range of prospective users.
Article
Clinical practice guidelines are intended to improve the quality of clinical care by reducing inappropriate variations, producing optimal outcomes for patients, minimizing harm, and promoting cost-effective practices. This statement proposes an explicit classification of recommendations for clinical practice guidelines of the American Academy of Pediatrics (AAP) to promote communication among guideline developers, implementers, and other users of guideline knowledge, to improve consistency, and to facilitate user understanding. The statement describes 3 sequential activities in developing evidence-based clinical practice guidelines and related policies: 1) determination of the aggregate evidence quality in support of a proposed recommendation; 2) evaluation of the anticipated balance between benefits and harms when the recommendation is carried out; and 3) designation of recommendation strength. An individual policy can be reported as a "strong recommendation," "recommendation," "option," or "no recommendation." Use of this classification is intended to improve consistency and increase the transparency of the guideline-development process, facilitate understanding of AAP clinical practice guidelines, and enhance both the utility and credibility of AAP clinical practice guidelines.
Article
Objective: To determine which attributes of clinical practice guidelines influence the use of guidelines in decision making in clinical practice. Design: Observational study relating the use of 47 different recommendations from 10 national clinical guidelines to 12 different attributes of clinical guidelines (for example, evidence based, controversial, concrete). Setting: General practice in the Netherlands. Subjects: 61 general practitioners who made 12 880 decisions in their contacts with patients. Main outcome measures: Compliance of decisions with clinical guidelines according to the attribute of the guideline. Results: Recommendations were followed in, on average, 61% (7915/12 880) of the decisions. Controversial recommendations were followed in 35% (886/2497) of decisions and non-controversial recommendations in 68% (7029/10 383) of decisions. Vague and non-specific recommendations were followed in 36% (826/2280) of decisions and clear recommendations in 67% (7089/10 600) of decisions. Recommendations that demanded a change in existing practice routines were followed in 44% (1278/2912) of decisions and those that did not in 67% (6637/9968) of decisions. Evidence based recommendations were used more than recommendations for practice that were not based on research evidence (71% (2745/3841) v 57% (5170/9039)). Conclusions: People and organisations setting evidence based clinical practice guidelines should take into account some of the other important attributes of effective recommendations for clinical practice.
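The compliance figures in this abstract can be checked with a short calculation. The sketch below recomputes each percentage from the reported counts and confirms that every pair of subgroups adds up to the overall 7915 followed decisions out of 12 880. The dictionary keys and the helper name `percent` are labels chosen here for illustration; the counts are those given in the abstract.

```python
# (followed, total, reported_percent) for each group, as stated in the abstract.
reported = {
    "all": (7915, 12880, 61),
    "controversial": (886, 2497, 35),
    "non_controversial": (7029, 10383, 68),
    "vague": (826, 2280, 36),
    "clear": (7089, 10600, 67),
    "change_required": (1278, 2912, 44),
    "no_change_required": (6637, 9968, 67),
    "evidence_based": (2745, 3841, 71),
    "not_evidence_based": (5170, 9039, 57),
}

# Each attribute splits all decisions into two complementary subgroups.
pairs = [
    ("controversial", "non_controversial"),
    ("vague", "clear"),
    ("change_required", "no_change_required"),
    ("evidence_based", "not_evidence_based"),
]


def percent(followed: int, total: int) -> int:
    """Compliance rate rounded to the nearest whole percent."""
    return round(100 * followed / total)


# Compare every recomputed rate with the percentage printed in the abstract.
recomputed = {name: percent(f, t) for name, (f, t, _) in reported.items()}
mismatches = [name for name, (_, _, p) in reported.items() if recomputed[name] != p]

# Check that each pair of subgroups sums to the overall counts.
all_followed, all_total, _ = reported["all"]
pairs_consistent = all(
    reported[a][0] + reported[b][0] == all_followed
    and reported[a][1] + reported[b][1] == all_total
    for a, b in pairs
)

print("mismatches:", mismatches)
print("subgroups sum to overall counts:", pairs_consistent)
```

The recomputed rates agree with the reported percentages, and the subgroup counts are internally consistent.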