International Conference on Computer Systems and Technologies - CompSysTech’16
Cloud-based Patent and Paper Analysis Tool for Comparative Analysis
of Research
Samira Ranaei, Antti Knutas, Juho Salminen, Arash Hajikhani
Abstract: In this paper, we present an analysis method that combines multiple data sources by
extending the NAILS bibliometric cloud service. The focus is on the development of a novel
cloud-based online infrastructure that enables the user to compare scientific literature and patent
data related to a particular technology domain. The tool leverages metadata analysis and text-mining
techniques to visualize the semi-structured patent and journal article data stored in the Web of
Science database. It automates patent landscape visualization and scientific literature mapping, and
provides an independent interface for comparing patent and paper trends on a specific subject.
The implementation demonstrates how a flexible plugin system can benefit tools by introducing new
data sources. We also present a roadmap toward a fully service-oriented analysis service that
utilizes open data, and discuss the steps required to realize this vision.
Key words: Patent-Paper comparison, patent analysis, R, cloud computing, literature analysis,
bibliometrics, open data, software as a service
INTRODUCTION
Searching online databases for scientific artifacts such as research papers and patents is
recognized as a time-consuming task among researchers. Scientific articles are the output of basic
research and represent the progress of knowledge discovery. Patent documents, on the other hand,
are considered the outcome of the research and development (R&D) phase and are known as an
indicator of technology development [1]. Understanding the parallel evolution of science and
technology (S&T) is necessary, since science provides seeds for technological inventions [4] and
new innovations may not have emerged in the absence of scientific research [9]. The literature
analysis tool NAILS [8] (http://nailsproject.net) was previously introduced as an open online
service that facilitates scientific literature mapping. This paper extends the NAILS service
platform by adding a patent analysis feature.
However, the sheer number of patents and publications added to these databases every year makes
quantitative analysis of the data a laborious task. The mapping process involves several steps:
data collection, filtering of irrelevant data, storing, archiving and visualization. Once the
retrieved data are organized, comparing and monitoring both patents and publications is another
challenging step. As this challenge is well acknowledged by research communities, several
well-established commercial software packages have been developed [2, 3, 7, 15]. The available
tools are monolithic software packages or closed commercial services that are only capable of
analyzing either patent documents or articles. Connecting the tools is difficult or outright
impossible. What is missing is a tool that can utilize both data sources in order to report the
overlap between S&T and provide a visualized comparison of both data sets.
This paper focuses on how to automate the process of science and technology comparison by
overcoming the limitations of existing solutions. We also briefly discuss how the availability of
open data and a service-oriented architecture in science [5] would benefit bibliometric analysis
practices and, to a large extent, the scientific community. As a proof of concept we present the
Cloud-based Patent and Paper Analysis Tool (CPPAT), which is a freely available online service for
creating and viewing patent analysis and literature mapping results in a single web interface. The
output report enables users to monitor patent and publication trends and the topical overlap
between S&T, and offers the big picture of S&T evolution in a particular domain. As opposed to
currently available commercial tools, CPPAT is an open cloud-based application provided as
Software as a Service (SaaS) that can be accessed from any connected location and device.
This is a pre-print version of an article. The actual version will be published in ACM DL at
http://dl.acm.org/event.cfm?id=RE248 by late 2016. Please use the official version of the
paper and publication reference when citing: Ranaei, S., Knutas, A., Salminen, J.,
Hajikhani, A., 2016. Cloud-based Patent and Paper Analysis Tool for Comparative Analysis
of Research. CompSysTech 2016.
The paper is structured as follows. First, the literature related to monitoring science and
technology evolution is reviewed, and then the functionality of the designed tool is introduced.
For demonstration purposes, a case study on the “online gaming” subject domain is presented to
illustrate the usability and the output of CPPAT. Finally, the paper closes with a discussion and
conclusion section.
BACKGROUND ON THE QUANTITATIVE STUDY OF SCIENCE AND
TECHNOLOGY
Science and technology (S&T) were initially depicted as dance partners [20] that dance to two
different rhythms of music [16]. In other words, scientific progress stimulates substantial
technological development [17] and, reciprocally, technological progress can promote scientific
studies [6, 10]. The complementary roles of science and technology necessitate the development of
quantitative measures to map their evolutionary structure [21]. The parallel exploration of
scientific publications and patents has therefore been suggested as the most promising
quantitative method for monitoring science and technology dynamics [18].
Monitoring the interplay of science and technology has attracted the attention of many scholars,
and as a result a multitude of methods have been suggested so far. Narin et al. [14] proposed
that the analysis of non-patent literature (NPL) would give some insight into the linkage between
science and technology. Schmoch [18] argued that, in addition to NPL, counting the number of
industry paper publications and university patents sheds light on S&T interaction. Later,
Murray [12] argued that the in-depth analysis of patent-paper pairs can reveal how the members of
each pair influence each other. More recent practical approaches [19] employ citation network
analysis and natural language processing to detect overlaps or gaps between scientific literature
and patents. Even though these methods provide very rich insights into published scientific and
technological materials, they are very complex and require many steps to implement. The
complexity of the research procedures and databases used in such methods has motivated many
software developers to reduce the effort by designing software packages.
Software packages have been developed for patent mapping (e.g. PatBase, CordMap) and for
literature analysis (e.g. CiteSpace, VOSviewer). The commercial patent mapping tools are
accessible to users via desktop software. The tools available for literature analysis are free,
but they still need to be installed on the user's computer and only analyse data from scientific
literature databases. Given these limitations, this paper presents a free online cloud-based tool
named CPPAT, which is able to analyze patent and research paper data and to produce a visualized
report of both in a single interface. The resulting infographics provide additional insights to
researchers and decision makers regarding their field of study.
IMPLEMENTATION OF THE CLOUD-BASED PAPER AND PATENT COMPARATIVE
ANALYSIS TOOL
In this paper, we present a Software as a Service design that can be used to extend the
NAILS [8] cloud analysis system by combining data flows from multiple sources. In the case
study presented in this paper we use the cloud analysis service to combine bibliometric and
patent data to get better insights into the state of the art of research. The aim is to reach a
fully automated, open and service-oriented system, but the proof of concept still uses manual
import of data sources because of data source license restrictions. The system’s plugin-oriented
architecture and the arrangement of the data flow are presented in Figure 1. The architecture
diagram describes the data flow between the different system components and visualizes how it
merges in the analysis components.
Figure 1. Data flow diagram from individual databases to the user
In the test case only two databases, two analysis services and a single visualization
service were used, but the architecture of the system places no limits on how many components
are used or how they are connected. Going forward, more plugins can be implemented for more
complex analysis scenarios, and alternative output services, such as machine-readable APIs, can
be added alongside the visualization.
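The plugin-oriented data flow described above can be illustrated with a minimal Python sketch. It is not the NAILS or CPPAT code, which this paper does not list: the `Pipeline` class, its method names and the sample records are all hypothetical, and for simplicity every analysis is applied to every source.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Loader = Callable[[], List[Record]]
Analysis = Callable[[List[Record]], dict]


class Pipeline:
    """Hypothetical plugin registry: data sources feed analysis plugins."""

    def __init__(self) -> None:
        self.sources: Dict[str, Loader] = {}
        self.analyses: Dict[str, Analysis] = {}

    def register_source(self, name: str, loader: Loader) -> None:
        self.sources[name] = loader

    def register_analysis(self, name: str, func: Analysis) -> None:
        self.analyses[name] = func

    def run(self) -> Dict[str, Dict[str, dict]]:
        # Apply every registered analysis to every registered source.
        report: Dict[str, Dict[str, dict]] = {}
        for source_name, loader in self.sources.items():
            records = loader()
            report[source_name] = {
                name: func(records) for name, func in self.analyses.items()
            }
        return report


def count_per_year(records: List[Record]) -> Dict[str, int]:
    """Example analysis plugin: number of records per publication year."""
    counts: Dict[str, int] = {}
    for record in records:
        counts[record["year"]] = counts.get(record["year"], 0) + 1
    return counts


# Made-up sample data standing in for manually imported database exports.
pipeline = Pipeline()
pipeline.register_source(
    "papers", lambda: [{"year": "2014"}, {"year": "2014"}, {"year": "2015"}]
)
pipeline.register_source("patents", lambda: [{"year": "2015"}])
pipeline.register_analysis("per_year", count_per_year)
report = pipeline.run()
```

In this arrangement a new database or a new analysis is added with a single registration call, which is the kind of flexibility the plugin design aims for; a visualization or API layer would consume `report`.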
In the future, the system presented here could reach its full potential as a fully automatic
service-oriented architecture in which the data sources chosen by the user are retrieved
automatically, flexibly incorporating external services at user request and conversely allowing
external services to utilize the cloud platform. This would enable the system to be a component
of the so-called service-oriented science architecture [5]. It would have the opportunity to
benefit the whole scientific community, because data, at its core, is essential to science. The
open availability of data promotes transparency and reproducibility, which are at the very core
of the scientific method, and making this evidence of research public would increase the
efficiency of the scientific process [11]. Open data in this context means allowing reuse and
redistribution with no technological restriction [13]. Right now, licensing and technological
barriers in data sources prevent realising this vision.
A CASE STUDY ON COMPARING PATENT AND SCIENTIFIC LITERATURE IN
ONLINE GAMING
The “online gaming” subject was selected to demonstrate our cloud-based tool. Online gaming is
one of the emerging and fast-growing fields in the entertainment industry, so it is interesting
to follow the related activities in academia and industry. The data were retrieved from two WoS
databases using similar search queries: the Derwent Innovation Index (DII) for patent documents
and the Web of Science Core Collection for research papers. The main keywords used in both
queries are “online gaming”, “browser gaming”, “cloud based gaming” and “multi player online
game”. As a result, 1149 scientific publications and 2540 patent documents were retrieved and
used as input for our cloud-based tool. The advantage of using DII for patent search is that the
often vague text of patent titles and abstracts is rewritten in English by Derwent subject
experts to make it more understandable. For research papers, we kept only articles, books and
conference papers written in English.
The following diagrams were selected from the visualized report produced by CPPAT. Figure 2
shows the distribution of patent and paper publications per year for the “online gaming”
subject. Comparing the patent and scientific literature trends, we can observe that patents
related to online gaming started to appear by 2000, while the scientific literature followed a
year later. The scientific literature diagram (on the left) shows the breakdown into books,
journal articles and conference papers.
The overall implication of both diagrams is that activity in academia and industry regarding
online gaming is on the rise, particularly since 2005. It should be noted that the sudden drop in
all trends after 2014 can be explained by the time lag of two to three years before patents or
articles are published in the databases.
Figure 2. Number of scientific publications per year (left) – Number of patent publications per year (right)
Figure 3 presents the most productive authors and inventors in the online gaming field. The
productivity indicator measures the number of published items by each inventor or author and
lists the names with the highest numbers of patents or scientific publications. The top
inventors diagram includes several well-known individual inventors, such as Jeong Kim (Kim, J),
Woon Yong Kim (Kim, W Y) and Chang Joon Park (Park, C J).
Figure 3. Top productive authors from scientific literature (left) – Top productive inventors from patent documents (right)
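The productivity indicator amounts to counting documents per name and keeping the highest counts. The following Python sketch shows one way to compute it; the record layout, the `top_producers` function and the placeholder names are assumptions made for illustration and are not taken from the CPPAT implementation.

```python
from collections import Counter
from typing import Dict, List, Tuple


def top_producers(
    records: List[Dict[str, List[str]]], field: str, n: int = 10
) -> List[Tuple[str, int]]:
    """Count how many documents each name appears on and return the top n."""
    counts: Counter = Counter()
    for record in records:
        # set() ensures a name listed twice on one document is counted once.
        for name in set(record[field]):
            counts[name] += 1
    return counts.most_common(n)


# Made-up sample: each record lists the inventors of one patent document.
patents = [
    {"inventors": ["Inventor A", "Inventor B"]},
    {"inventors": ["Inventor A"]},
    {"inventors": ["Inventor C", "Inventor A"]},
    {"inventors": ["Inventor B"]},
]
top_inventors = top_producers(patents, "inventors", n=2)
```

The same function can be reused for authors, assignees or funding agencies by passing a different `field`.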
The next diagrams (Figure 4) contrast the active industrial participants that have contributed to
the development of the online gaming field with the top funding organizations that support
academic research projects related to online gaming. The productive assignees diagram shows top
companies such as NCSOFT, the producer of the Guild Wars game, as well as Microsoft and Neowiz
Games, which are pioneers in the development of multiple online games.
Figure 4. Top funding agencies (left) – Top productive assignees (right)
The majority of emerging technologies result from several scientific fields or multiple
technological domains. From Figure 5, users can learn about the sources of related papers based
on the top ten journals and conference paper categories assigned by editors or academic
institutions. Moreover, the list of top IPCs¹ shows the main technological classes that contain
patents related to online games. The top class is “G06”, which covers computing and calculating.
Figure 5. Top journals or conference proceedings (left) – Top International Patent classification codes - IPC (right)
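A ranking of IPC classes such as “G06” can be derived by truncating each full IPC symbol to its first three characters (section letter plus two-digit class) and counting patents per class. The Python sketch below assumes symbols written in a form such as `G06F-017/30`; the sample symbols and the function names are illustrative and do not come from the case study data.

```python
from collections import Counter
from typing import List, Tuple


def ipc_class(symbol: str) -> str:
    """Reduce a full IPC symbol to its class, e.g. 'G06F-017/30' -> 'G06'."""
    return symbol.strip()[:3].upper()


def top_ipc_classes(patents: List[List[str]], n: int = 10) -> List[Tuple[str, int]]:
    """Count patents per IPC class; a patent counts once per distinct class."""
    counts: Counter = Counter()
    for symbols in patents:
        for cls in {ipc_class(s) for s in symbols}:
            counts[cls] += 1
    return counts.most_common(n)


# Made-up sample: each inner list holds the IPC symbols of one patent.
sample_patents = [
    ["G06F-017/30", "A63F-013/12"],
    ["G06Q-050/10", "G06F-003/048"],
    ["A63F-013/30"],
    ["H04L-029/06", "G06F-015/16"],
]
top_classes = top_ipc_classes(sample_patents, n=2)
```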
The most important output of our tool is the keyword recommendation section, where the user can
review the most common keywords used by authors or written in patent abstracts and refine their
own search queries accordingly. Text-mining techniques (tokenisation, stemming, stopword removal,
TF-IDF², etc.) have been applied to document abstracts. Refining the query means adding new
relevant keywords that appear in Figure 6 or eliminating irrelevant phrases. Figure 6 shows the
yearly frequency of common keywords that appear in patents and papers related to online gaming.
The last ten years were selected for better visualization. The common keywords for articles are
extracted from the author-assigned keyword list. As the patent document structure does not
contain indexed keywords, the most frequent words were extracted from patent abstracts using
text-mining techniques.
¹ International Patent Classification (IPC) codes, designed by the World Intellectual Property
Organization (WIPO): http://web2.wipo.int/classifications/ipc/ipcpub/#refresh=page
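The text-mining steps named above (tokenisation, stopword removal, stemming and TF-IDF weighting) can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the CPPAT implementation: the stopword list is a tiny hand-picked set, the stemmer is a crude suffix stripper rather than a standard algorithm such as Porter's, the TF-IDF variant used is relative term frequency multiplied by log(N / document frequency), and the sample abstracts are invented.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Tiny illustrative stopword list; real pipelines use much larger ones.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token: str) -> str:
    """Crude suffix stripping for illustration; not the Porter algorithm."""
    if token.endswith("ss"):
        return token
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> List[str]:
    return [stem(t) for t in tokenize(text) if t not in STOPWORDS]


def tfidf(documents: List[str]) -> List[Dict[str, float]]:
    """Return one {term: weight} dictionary per document."""
    tokenized = [preprocess(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        weights.append(
            {
                term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
                for term, count in term_freq.items()
            }
        )
    return weights


# Invented sample abstracts.
abstracts = [
    "A game server for online games",
    "Streaming the game to a mobile client",
    "Cloud servers for streaming games",
]
scores = tfidf(abstracts)
```

Note that a term occurring in every abstract (here the stem `game`) receives a weight of zero, which is why TF-IDF favours terms that distinguish one document from the rest of the collection.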
Users can refine their queries based on the common words that appear in Figure 6. For example,
keywords such as “virtual environment”, “game server”, “video game” and “mobile gaming” can be
added to the initial search query to expand the search. Other keywords (e.g. “gambling”,
“internet addiction”) can be eliminated to avoid retrieving irrelevant data.
Figure 6. Top keywords assigned by authors (top) – Top keywords appearing in patent abstracts (bottom)
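The refinement step can be illustrated by assembling a boolean search string from included and excluded keywords. The Python sketch below is an assumption about how such a query could look: the paper does not give the exact query strings, and the use of the `TS=` topic field tag with `OR` and `NOT` operators follows the general Web of Science advanced search syntax rather than anything stated by the authors.

```python
from typing import Iterable, Sequence


def quote(terms: Iterable[str]) -> str:
    """Join terms as quoted phrases separated by OR."""
    return " OR ".join(f'"{term}"' for term in terms)


def build_query(include: Sequence[str], exclude: Sequence[str] = ()) -> str:
    """Build a topic query that matches any included and no excluded phrase."""
    query = f"({quote(include)})"
    if exclude:
        query += f" NOT ({quote(exclude)})"
    return f"TS=({query})"


# Keywords quoted in the case study text.
initial = ["online gaming", "browser gaming", "cloud based gaming", "multi player online game"]
added = ["virtual environment", "game server", "video game", "mobile gaming"]
excluded = ["gambling", "internet addiction"]

refined_query = build_query(initial + added, excluded)
print(refined_query)
```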
Moreover, we can observe a shift in the use of keywords in research papers over time. The usage
of the “online game” keyword increased from 2006 and peaked in 2009, followed by a more frequent
appearance of the phrase “Massively Multiplayer Online Game (MMOG)” by 2012 and the emergence of
“cloud gaming” from 2014. Notably, the common patent keywords show a similar shift toward the
usage of “game servers” since 2013. The major implication is that the field of online gaming has
been moving toward cloud-based technologies in recent years.
² Term frequency–inverse document frequency (TF-IDF) is a term-weighting method that extracts
important keywords from the text.
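Shifts of this kind can be read from a table of keyword counts per year. The following Python sketch shows how such a table and the peak year of each keyword could be computed; the input layout, the function names and the sample records are invented for illustration and do not reproduce the case study data.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def keyword_trends(records: List[Tuple[int, List[str]]]) -> Dict[str, Counter]:
    """Map each keyword to a Counter of {year: number of documents}."""
    trends: Dict[str, Counter] = defaultdict(Counter)
    for year, keywords in records:
        # Normalize case and count a keyword once per document.
        for keyword in {k.lower() for k in keywords}:
            trends[keyword][year] += 1
    return trends


def peak_year(trend: Counter) -> int:
    """Year with the highest count; the earliest year wins a tie."""
    return min(trend, key=lambda year: (-trend[year], year))


# Invented sample: (publication year, author keywords) per document.
sample = [
    (2008, ["online game"]),
    (2009, ["online game", "MMOG"]),
    (2009, ["Online Game"]),
    (2012, ["MMOG"]),
    (2012, ["mmog", "online game"]),
    (2014, ["cloud gaming"]),
]
trends = keyword_trends(sample)
peaks = {keyword: peak_year(trend) for keyword, trend in trends.items()}
```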
DISCUSSION AND CONCLUSION
The parallel monitoring of scientific and technological progress offers researchers insights into
the direction of academic and industrial activities in a specific technological domain. Tracing
scientific and technological development through the evidence of scientific literature and
patents is a complex and time-consuming task because of the sheer amount of data. A multitude of
practical methods have been introduced to facilitate tracking science and technology directions
(e.g. NPL analysis, author-inventor analysis, citation networks, text mining). However,
replicating these methods may not be straightforward for researchers unfamiliar with the
techniques. Several commercial software packages have been introduced to overcome this
complexity, but they analyse either patents or papers, not both, and present insights from only
one kind of data to the end user.
The initial motivation for this paper was to design an open cloud-based service (CPPAT) that
uses both data types (patents and scientific literature), implements the analysis and generates
visualizations for both in a single interface. The advantages of using CPPAT are the visualized
comparison of patent and paper trajectories for a particular technology domain and the
recommendation of keywords for refining the initial search queries. CPPAT enables the
comparative analysis of patents and scientific publications in order to find the state of the
art in research.
In this paper, we presented a prototype of how a bibliometric cloud service can flexibly
utilize multiple data sources by using an analysis plugin architecture, and a vision of how it
could be taken to the next level with open data. However, this solution is still restricted by
its data sources: the process cannot yet be fully converted into a service-oriented architecture
because of licensing and technological restrictions. Molloy’s conclusion [11] that there are
barriers to the adoption of open data in science still stands.
For the purpose of demonstration, we presented one case study on the "online gaming" domain, in
which we collected and analysed patents and scientific literature related to the topic. The
CPPAT output report was partially illustrated and discussed in the case study section. The full
report and all figures are available at (link removed for review purposes). The overall results
signal a shift in "online gaming" toward cloud gaming and virtual reality technologies in recent
years. One of the main limitations is that CPPAT is compatible only with the Web of Science
database. In addition, a comparison of CPPAT with other analysis systems cannot be included in
this paper because, to the best of our knowledge, no tools with similar features are available.
Promising areas of future work include the addition of more data sources and a recommendation
system that identifies semantic similarity between patents and papers.
REFERENCES
[1] Campbell, R.S. 1983. Patent trends as a technological forecasting tool. World
Patent Information. 5, 3 (Jan. 1983), 137–143.
[2] Chen, C. 2006. CiteSpace II: Detecting and Visualizing Emerging Trends and
Transient Patterns in Scientific Literature. Journal of the American Society for
Information Science and Technology. 57, 3 (2006), 359–377.
[3] van Eck, N.J. and Waltman, L. 2010. Software survey: VOSviewer, a
computer program for bibliometric mapping. Scientometrics. 84, 2 (Aug. 2010), 523–538.
[4] Fleming, L. and Sorenson, O. 2004. Science as a map in technological
search. Strategic Management Journal. 25, 8-9 (Aug. 2004), 909–928.
[5] Foster, I. 2005. Service-oriented science. Science. 308, 5723 (2005), 814–817.
[6] Glänzel, W. and Schubert, A. 2003. A new classification scheme of science
fields and subfields designed for scientometric evaluation purposes. Scientometrics. 56, 3
(2003), 357–367.
[7] Intellectual property map, arranged according to the level of semantic
similarity between patents: http://www.candormap.com/. Accessed: 2016-03-28.
[8] Knutas, A. et al. 2015. Cloud-based bibliometric analysis service for
systematic mapping studies. Proceedings of the 16th International Conference on
Computer Systems and Technologies (2015), 184–191.
[9] Mansfield, E. 1991. Academic Research and Industrial Innovation. Research
Policy. 20, 1 (1991), 1–12.
[10] Meyer, M. 2002. Tracing Knowledge Flows in Innovation Systems— an
Informetric Perspective on Future Research Science-based Innovation. Economic Systems
Research. 14, 4 (2002), 323–344.
[11] Molloy, J.C. 2011. The open knowledge foundation: open data means better
science. PLoS Biol. 9, 12 (2011), e1001195.
[12] Murray, F. 2002. Innovation as co-evolution of scientific and technological
networks: exploring tissue engineering. Research Policy. 31, 8-9 (2002), 1389–1403.
[13] Murray-Rust, P. 2008. Open data in science. Serials Review. 34, 1 (2008),
52–64.
[14] Narin, F. et al. 1997. The increasing linkage between U.S. technology and
public science. Research Policy. 26, 3 (1997), 317–330.
[15] PatBase analytics: Analyse patent data fast:
http://minesoft.com/patbase-analytics-analyse-patent-data/. Accessed: 2016-03-28.
[16] Price, D.J. de S. 1965. Is Technology Historically Independent of Science? A
Study in Statistical Historiography. Technology and Culture. 6, 4 (1965), 553–568.
[17] Rosenberg, N. 1982. How exogenous is science? Exploring the Black Box:
Technology, Economics, and History. Cambridge University Press. 141–159.
[18] Schmoch, U. 1997. Indicators and the relations between science and
technology. Scientometrics. 38, 1 (1997), 103–116.
[19] Shibata, N. et al. 2010. Extracting the commercialization gap between science
and technology — Case study of a solar cell. Technological Forecasting and Social
Change. 77, 7 (Sep. 2010), 1147–1155.
[20] Toynbee, A.J. 1963. A Study of History: Introduction the Genesis of
Civilization, and the Growth of Civilization.
[21] Verbeek, A. et al. 2002. Measuring progress and evolution in science and
technology - I: The multiple uses of bibliometric indicators. International Journal of
Management Reviews. 4, 2 (2002), 179–211.
ABOUT THE AUTHORS
Samira Ranaei, M.Sc., Lappeenranta University of Technology, Phone: +358-0294-462-111,
E-mail: samira.raneei@lut.fi.
Antti Knutas, M.Sc., Lappeenranta University of Technology, Phone: +358-0294-462-111,
E-mail: antti.knutas@lut.fi.
Juho Salminen, D.Sc., Lappeenranta University of Technology, Phone: +358-0294-462-111,
E-mail: antti.knutas@lut.fi.
Arash Hajikhani, M.Sc., Lappeenranta University of Technology, Phone: +358-0294-462-111,
E-mail: arash.hajikhani@lut.fi.