Abstract
Google Scholar is an internet-based search engine designed to locate scholarly information, including peer-reviewed articles, theses, books, preprints, abstracts, and court opinions from academic publishers, professional societies, online repositories, universities, and other Web sites. This review looks at the strengths and weaknesses of this search engine to assist librarians in making informed decisions about the use of this tool.
Pricing Options
Free access via any Web browser.
Product Description
Google Scholar is an internet-based search engine designed to locate scholarly information, including peer-reviewed articles, theses, books, preprints, abstracts, and court opinions from academic publishers, professional societies, online repositories, universities, and other Web sites. Results are returned in a relevance-ranked format. Google Scholar is free on the Web; institutions whose holdings are available via a link resolver and/or WorldCat can opt to link patrons to those resources as part of their Google Scholar search results.
Critical Evaluation
GOOGLE SCHOLAR: THE GOOD, THE BAD, AND THE UGLY
Since its launch in 2004, Google Scholar has firmly established itself as a critical resource for those conducting academic research. Bolstered by its hard-to-beat pricing (free) and its broad, interdisciplinary coverage, Google Scholar is now included as a resource on many library Web sites and is taught to students. Certainly, Google Scholar is a solid entrant into the world of scholarly research, offering students and serious researchers alike a highly accessible, easy-to-use research tool. However, this promising tool is not without significant flaws. As William Badke noted in a June 2009 article, “Google Scholar is, in essence, a large, academic metasearch tool. As such, it carries all the promise and frustrations of metasearch––with additional frustrations” (Badke 2009, 48).
Google Scholar’s initial launch was met with a mixture of skepticism and support, and since then it has been the subject of numerous articles, studies, and reviews. More than five years later, the product still wears a “beta” label, and evidence indicates that programmers continue to make changes to Google Scholar behind the scenes. It has been estimated that Google Scholar now works with approximately 2,900 scholarly publishers and includes more than 10 million items from Google Book Search (Jacsó 2010, 176–177). However, there is no authoritative information on the potential overlap between Google Scholar, Google Books, and regular Google. Google’s ongoing refusal to provide detailed information about the size and scope of its database makes exact quantitative analysis next to impossible.
As with other Google products, Google Scholar relies primarily on keyword searching to return relevant results. The exact algorithm behind these searches has not been disclosed. An Advanced Scholar Search option allows users to perform somewhat more sophisticated queries (searching by author name, for example). Even so, the product’s lack of a controlled vocabulary, its unpredictable handling of Boolean operators, and its lack of support for standard database search options such as word truncation continue to challenge more experienced researchers. And, as will be explored later, the decision by Google developers to rely on their own parsers and “smart crawlers” rather than on publisher-supplied metadata has led to significant errors in the database.
Since most database administrators and librarians are already familiar with Google Scholar, this review will highlight those elements of the product that are positive (“The Good”), negative (“The Bad”), and particularly problematic (“The Ugly”) more than five years after its launch.
THE GOOD
Perhaps the best elements of Google Scholar are those inherent to its mission and purpose: the product is free, and it provides researchers with a way to search for academic citations. Google Scholar can also help researchers find items that are freely available in full text, as is the case with many Open Access publications. It requires no login and can be accessed from any computer with an internet connection.
Coverage
Google Scholar’s coverage of journals and books has expanded significantly since it was launched. The coverage of books is supported by the Google Book Search project, which is ongoing and allows users to search within the full text of digitized monographs. In addition, many more scholarly publishers appear to be cooperating with Google Scholar now than when the service first launched, including major players such as Elsevier and the American Chemical Society. Google Scholar pulls information from publishers and their Web sites as well as from abstracting and indexing (A&I) databases.
In late 2010, new research by Xiaotian Chen reported that “Google Scholar is able to retrieve any scholarly journal article record from all the publicly accessible Web sites and from subscription-based databases it is allowed to crawl” (Chen 2010, 221). Chen’s research also indicates that the turnaround time between the date new articles are published and the date they are indexed by Google Scholar has dropped to approximately nine days.

Google Scholar has enhanced its coverage still further by including a significant number of patents, legal documents, and court cases. The service enables users to search and read opinions for the following:
U.S. state appellate and supreme court cases since 1950.
U.S. federal district, appellate, tax, and bankruptcy court cases since 1923.
U.S. Supreme Court cases since 1791.

Geographic and Linguistic Expansion
Google Scholar has greatly improved and expanded the amount of content it includes from other countries and from publications written in languages other than English. A 2010 study found that, among a random sample of non-English journal articles, Google Scholar’s coverage rate was 100 percent (Chen 2010, 225). Most scholarly databases emphasize anglophone sources, in particular those from the U.S., Canada, and the U.K. Google’s geographic expansion and linguistic additions are therefore noteworthy.

Links to Local Content
The addition of the Library Links and Library Search tools to Google Scholar is another feature worth highlighting. Libraries that make full-text access available to researchers via a link resolver can opt in to Google Scholar’s Library Links feature. This feature displays an additional link within records that directs users to the library’s servers and then to the item itself in full text when available. Library Search provides a similar service for participating libraries whose collections are indexed in OCLC’s Open WorldCat. Clicking on the Library Search link takes users to the WorldCat system, where they can find specific titles in area libraries.

Bibliographic Citation Support and Alerts
Like many other scholarly databases, Google Scholar supports bibliographic exporting to a number of citation tools. It also supports the creation of alerts to inform researchers about articles that have been newly added to the Google Scholar database. The bibliographic exporting feature supports EndNote, RefWorks, and several other tools. The Alert function does not guarantee that the articles to which you are directed are recently published; they may simply be articles that Google Scholar has newly indexed. Even so, it is a useful feature.

Searching Within Citing Articles
In July 2010 Google Scholar added the option to search within citing articles for additional terms. After running a search, users can click on the Cited By link beneath an article to see a list of other articles that have cited the original work. By entering additional search terms and clicking on Search Within Articles, users can sort and sift through large numbers of citations to find information on more specific topics. For example, a search for John F. Nash, Jr.’s classic 1950 paper, “The bargaining problem,” indicates that, according to Google Scholar, it has been cited more than 4,000 times. The Search Within Articles feature allows users to navigate through those thousands of citing papers and refine the results with other keywords (such as economics or political science).

Finally, Google Scholar remains a useful resource for identifying articles when only a partial or incomplete citation is available. It is a good “port in the storm” when other databases are not helpful. It also serves as a broad research supplement to interdisciplinary and cross-disciplinary searches.

THE BAD
Unfortunately, the good points of Google Scholar are not strong enough to outweigh the many problems, both “bad” and “ugly,” affecting the search tool. For example, Google’s simple search interface has many fans and imitators. However, Google Scholar’s relatively limited advanced search options and its complete lack of controlled vocabulary frustrate experienced searchers and produce noisy searches that are almost impossible to narrow down. Other problems also exist.

Relevancy Ranking
The default ranking for Google Scholar results is by relevancy, rather than by date as is generally the case in academic databases. For example, a simple search for “mountain pine beetle” returns a book from 1985 as the very first result. Unfortunately, Google Scholar offers limited options for reordering and limiting the results set. Users may incorporate Advanced Search features to focus on articles from a certain date range. They may also use pull-down menus on the results page to limit their searches to articles published since a certain year. Neither is a particularly elegant or effective way to sort. Google continues to provide no information on how articles are weighted or how relevancy is determined.

Numerical Errors
Innumeracy creates a significant number of errors and problems in Google Scholar. Some of these numerical challenges are painfully obvious. For example, searching Google Scholar for the term “the” returns approximately 8.55 million results; “the” is the most frequently used word in the English language. “A” is another common English word, and adding it with an OR operator should logically return at least as many results, because every item that matches “the” also matches “the OR a.” But searching for “the OR a” instead returns just 7.68 million hits.

This illogical result was explored by Jacsó, who contends, “The enhancement of the content [in Google Scholar] has not been matched by improvements in the software” (Jacsó 2008, 107). Beyond concerns about innumeracy, this simple test also raises questions about how well (or whether) Google Scholar handles simple Boolean searching.

Inflated Citation Counts
Because the developers of Google Scholar did not use publisher-supplied metadata, there are a number of errors in the database. One of the more egregious is the inclusion of both master records and citation records for individual articles. This quirk produces multiple hits for the same article and inflates citation counts, which makes it nearly impossible to evaluate scholarly productivity by using Google Scholar. For example, consider a search for “Song recognition without identification: When people cannot ‘name that tune’ but can recognize it as familiar,” by Bogdan Kostic and Anne M. Cleary. It returns seven versions of the article, including two that are simply citations without links to full-text options.

Coverage Confusion
While it is impossible to know exactly what sources Google Scholar includes, researchers have studied the issue numerous times in the years since its launch. Early research indicated three problems:
There were significant gaps in the full-text indexing of many important serial and Open Access publications (Mayr and Walter 2008, 97).
Google Scholar’s coverage of Open Access and scientific and medical literature was fairly strong, but much weaker in other academic areas, including the social sciences, humanities, and business (Neuhaus et al. 2006, 138).
There were lengthy delays between an article’s publication and its indexing in Google Scholar.

Chen’s recent research indicates that these areas have improved significantly in the intervening years (Chen 2010, 221). Even so, Google remains closed-mouthed about the extent of its coverage. This has prompted scholars to comment that “Google Scholar could render future [studies] unnecessary and obsolete, simply by sharing a detailed description of its content collection methodology” (Neuhaus et al. 2006, 139).

Full-text Access
The addition of the Library Links feature to Google Scholar was a positive development, but it is not without some issues. Google Scholar commonly includes links to British Library Direct (BL Direct) beneath the articles themselves. Google has partnered with BL Direct since 2006 to provide fee-based access to articles found online via Google Scholar. The BL Direct link gets prime real estate on the results page. It is often provided for articles that are also freely available online, such as those accessible via PubMed. Only a savvy searcher will realize that Google Scholar preferences can be customized to include Library Links. Such a searcher may also know that some articles can be accessed freely online or via a local library instead of being purchased via BL Direct. Google Scholar’s lack of reliance on publisher metadata also affects Library Links. Even when users click on those links, full bibliographic content may not transfer from Google Scholar to an individual institution’s link resolver.

THE UGLY
Ambiguous Content
Perhaps the most serious problem with Google Scholar is that, unlike users of scholarly databases, users of Google Scholar have no idea what they are searching. “What does Google Scholar point to, cover, and index? These questions, as numerous authors have noted, have neither been made clear by Google Scholar nor by its creator Anurag Acharya” (Neuhaus et al. 2006, 128). As mentioned earlier, there is no definitive information on what sources Google crawls or how often it updates its database. Google is “almost ridiculously [rigid] when it comes to publishing full details of the scientific journals it crawls to generate its database, or to revealing details of how often those journals are updated” (Winder 2008, 10). Until Google Scholar is more forthcoming about exactly what it indexes, it will be difficult to take it seriously as an important academic resource.

Ghost Authors
The developers’ decision not to use publisher metadata has introduced another critical error into Google Scholar: poor author name information. These “ghost authors” often take their names from other fields in the document. The results are clearly erroneous author names such as P Login (for Please Login) or A Registered (for Already Registered). This problem has received significant coverage in the literature (see Jacsó 2009, among others). It appears that Google’s developers have retroactively cleaned up the database as these errors have been spotted, reported, and published. However, other errors remain. For example, a search in early November 2010 returned an article ostensibly written by “F Policy.” The actual article, titled “Fiscal policy, legislature size, and political parties: Evidence from state and local governments in the first half of the 20th century,” was written by Thomas W. Gilligan and John G. Matsusaka. These errors significantly compromise users’ ability to consult Google Scholar as a source for determining scholarly productivity.

Publication Date Errors
Erroneous publication years are yet another problem with Google Scholar. For example, an Advanced Scholar Search limited to articles published between 2012 and 2025 returns more than 1,700 articles. Every one of these dates is wrong, because none of those years had arrived when this review was written. A cursory review of these articles indicates that Google Scholar is creating bad dates from page numbers, volume and issue numbers, and other sets of numerical data. This is another example of how programming errors have compromised the overall quality of the database and hampered users’ ability to search for relevant content.

Conclusion
At this time, Google Scholar still appears full of potential. This is particularly true for researchers who are conducting broad, interdisciplinary searches and who can benefit from a free online search tool. However, the tool still raises serious concerns for those familiar with more sophisticated and comprehensive search techniques. The main reasons are significant search interface limitations and uncertainty about exactly what it indexes. Google Scholar remains, as its “beta” label indicates, a work in progress.

ADVISOR REVIEWS: STANDARD REVIEW
Google Scholar
doi:10.5260/chara.12.3.36
Date of Review: November 9, 2010
The Charleston Advisor / January 2011 / www.charlestonco.com
Reviewed by: Amy Hoseth
Assistant Professor/Liaison Librarian
Morgan Library
Colorado State University
1019 Campus Delivery
Fort Collins, CO 80523
<amy.hoseth@colostate.edu>

Google Scholar Review Scores
Composite: ★★½
The maximum number of stars in each category is 5.
Content: ★★★
Expanded coverage of journals and books is a plus, but coverage gaps and ambiguous content are problematic. Problems with illiteracy and innumeracy compromise the integrity of many records.
User Interface/Searchability: ★★
Google Scholar’s Advanced Scholar Search options are not advanced enough for serious researchers; the tool offers limited options for sorting and limiting searches.
Pricing: N/A
Contract Options: N/A

Contact Information
Google
1600 Amphitheatre Parkway
Mountain View, California 94043
Phone: (650) 253-0000
Fax: (650) 618-1499
E-mail: <info@google.com>
URL: <http://www.google.com>
URL: <http://scholar.google.com/>
Contract Provisions
No contract required. Freely available at <http://scholar.google.com>.
Authentication
None required. Libraries that have implemented a link resolver can sign up for Google Scholar’s Library Links program. This program adds a link to full text (when available) at the user’s home institution next to each item in the results list. Users must customize preferences to see the links. IP authentication is handled at the local level. Similarly, libraries that include their holdings in OCLC’s Open WorldCat can participate in Google Scholar’s Library Search option, which provides links to local library holdings when possible.
References
Badke, William. “Google Scholar and the Researcher.” Online (Weston, Conn.) 33, no. 3 (2009): 47–49.
Chen, Xiaotian. “Google Scholar’s Dramatic Coverage Improvement Five Years after Debut.” Serials Review 36, no. 4 (2010): 221–226.
Howland, Jared L., Thomas C. Wright, Rebecca A. Boughan, and Brian C. Roberts. “How Scholarly Is Google Scholar? A Comparison to Library Databases.” College and Research Libraries 70, no. 3 (2009): 227–234.
Jacsó, Péter. “Google Scholar’s Ghost Authors.” Library Journal 134, no. 18 (2009): 26–27.
———. “Google Scholar Revisited.” Online Information Review 32, no. 1 (2008): 102–114.
———. “Metadata Mega Mess in Google Scholar.” Online Information Review 34, no. 1 (2010): 175–191.
Mayr, Philipp, and Anne-Kathrin Walter. “An Exploratory Study of Google Scholar.” Online Information Review 31, no. 6 (2007): 814–830.
———. “Studying Journal Coverage in Google Scholar.” Journal of Library Administration 47, no. 1 (2008): 81–99.
Neuhaus, Chris, Ellen Neuhaus, Alan Asher, and Clint Wrede. “The Depth and Breadth of Google Scholar: An Empirical Study.” Portal: Libraries and the Academy 6, no. 2 (2006): 127–141.
Walters, William H. “Google Scholar Search Performance: Comparative Recall and Precision.” Portal: Libraries and the Academy 9, no. 1 (2009): 5–24.
Wilson, Virginia. “A Content Analysis of Google Scholar: Coverage Varies by Discipline and by Database.” Evidence Based Library and Information Practice 2, no. 1 (2007): 134–136.
Winder, Davey. “The Struggle for Scholarly Search.” Information World Review 244 (2008): 10–11.
About the Author
Amy Hoseth is an Assistant Professor and Liaison Librarian at the Colorado State University Libraries in Fort Collins. She holds an M.L.S. from the University of Maryland at College Park and a B.A. in history from Drake University in Des Moines, Iowa. Before joining the faculty at CSU she worked at the Association of Research Libraries in Washington, D.C. as a communications coordinator for the LibQUAL+ assessment instrument.