A Strategy Based on Technological Maps for the Identification of the State-of-the-Art Techniques in Software Development Projects: Virtual Judge Projects as a Case Study

Carlos G. Hidalgo Suarez1, Víctor A. Bucheli1, Felipe Restrepo-Calle2, and Fabio A. Gonzalez2

1 Universidad del Valle, Cali, Colombia
{carlos.hidalgo,victor.bucheli}@correounivalle.edu.co
2 Universidad Nacional de Colombia, Bogotá, Colombia
{ferestrepoca,fagonzalezo}@unal.edu.co
Abstract. We propose a novel strategy based on technological watch (TW) to identify the state-of-the-art techniques of software development projects. Taking as a starting point the analysis of data from the GitHub platform using the VigHub tool, we obtain technological maps that describe the development and evolution of software projects and the perspectives of specific technologies. The proposed strategy was tested in a case study on virtual judge projects for programming. When analyzing the GitHub data, four main technological maps were obtained: programming languages, topic evolution timeline, successful projects, and successful users and organizations. This article shows how the proposed strategy supports the identification of state-of-the-art software techniques. This facilitates the identification of ideas, source code, and tools that can support and improve software development, expanding the creation of strategies to support decision-making on new development projects and technological inventions.
Keywords: Virtual judges · GitHub · Technological watch · Technological maps · Software repository analysis
1 Introduction
The constant changes in software development, information access technologies and technological advances create an environment that changes rapidly. Organizations have relied on strategies based on technological watch to adapt to this changing environment [1,2]. These strategies allow them to build value-added information to support the decision-making process [3]. New software
projects require prior research (state-of-the-art techniques). This research must provide information about the level of development, identify the progress status in the application area, and describe the level of development achieved in the results and the methodologies used [13]. All of the above reduces the loss of time and resources, since it allows us to identify useful resources that eliminate the need to develop everything from scratch, thanks to the integration of APIs (Application Programming Interfaces) or libraries into the project. Having this information is a differentiating factor that increases the competitiveness of organizations, while allowing them to know the most recent advances and development trends in a specific area. Moreover, it allows identifying the desirable competences that the members of a project's development team must have, i.e., technologies or programming skills. In addition, it allows acknowledging the information related to intellectual property and the type of license, as well as finding useful open source technologies [4,5].
The state-of-the-art techniques of a technology are usually surveyed using classic information sources such as scientific publications, conference documentation or patents [6,7]. Currently, there are new information sources, such as collaborative development platforms, where information on development projects is stored; there we can find the meta-data and the source code of the projects. Based on such information it is possible to obtain the state-of-the-art techniques of a specific technology, which allows answering interesting questions such as: What programming language should be used for a new software project in a particular area? Which useful APIs, platforms or tools exist? Which similar projects exist and which are the most successful? Which organizations are interested in the topic? among others that would be decisive for a software project. The strategy presented here is a complement to surveys or systematic reviews based on patents, books, and papers in journals or conferences.
The purpose of this article is to present a strategy that allows visualizing the state-of-the-art techniques of a software project for a specific topic, which can be observed in four technological maps: programming languages, topic evolution timeline, successful projects, and successful users and organizations.
The document is organized as follows. First, a review of related work is presented in Sect. 2. In Sect. 3, the proposed strategy is described in detail. In Sect. 4, a case study on virtual judges is presented in detail. Finally, Sect. 5 presents the discussion and Sect. 6 the conclusions.
2 Related Work
Several repositories for the development of projects have been proposed, such as Getforge, Launchpad, Google Code, GitHub or Bitbucket. They contain information related to the projects, as well as the source code. However, the repository platforms only allow searching for projects; they do not allow any analysis or the extraction of value-added information that supports the decision-making process.
As far as we know, there are no projects similar to the one proposed here. However, it is possible to find projects that implement models, methodologies, techniques and tools that contribute to obtaining the current state of a technology. Projects such as [8,9] propose methodologies to determine the current state of a technology and its life cycle. They allow identifying the main trends, and they affirm that the construction of tools implementing these methodologies can be a key factor when translating information from the environment into results that may be involved in decision-making processes.
Moreover, models such as [10] implement a software tool capable of analyzing and classifying bibliographic management platforms such as CiteULike, Connotea, Refbase, Wikindx and Zotero. This tool is composed of a search engine supported by information extraction and semantic web techniques that allows finding relationships and grouping tools with similar characteristics. The objective of the authors is to create their own bibliographic management service based on the modules of other tools.
Other projects propose tools such as Depsy [11], a pioneering platform that seeks to provide significant incentives for scientist-developers, giving software the necessary recognition so that it can be cited without being linked to a research article that describes it. Besides, it has allied tools that allow Depsy to perform searches on packages of the Python and R languages. It also allows detailing the impact measurements of scientific software and the articles that refer to it. Depsy provides several statistics about the packages, such as the number of downloads, the number of citations and the PageRank value in the dependency network. This can be very useful for non-computing readers, since it presents statistics based on percentiles, which allows knowing the impact that each software tool has on the scientific field.
Finally, projects such as [12,13] show approaches with great impact on the construction of technical articles, indicating that they are a powerful tool for the analysis of technological trends in software development. Nevertheless, the results are not enough for decision-making in industry, so it is necessary to build tools that organize articles together with their research resources, allowing readers to replicate the reports.
3 Method for Building up the State-of-the-Art
Techniques in Software Projects
The proposed process is based on 4 phases:
3.1 Software-Focus Research
Starting from a technological challenge or a technological need, the specific topic is identified; its definition is made through searches in bibliographic databases or with the help of experts in the subject. As a result of this process, the keywords or key phrases and their alternative spellings are obtained.
3.2 Making of Query Strings
With the data obtained in the previous phase, the search equations are built. These are expressions composed of the keywords and the Boolean operators supported by the search engines. The result of this phase is a set of search equations that allow sweeping the subject; an example of a search equation is: ("Judge" AND "programming").
3.3 Data Selection
There are several sources of relevant data for building the state-of-the-art techniques on a topic. Commonly, these data sources allow searching with equations in the aforementioned way; this is the case for scientific articles [14,15] or patent databases [16,17]. In this article we focus on the analysis of software repositories, mainly GitHub; however, the information from publications is used for the analysis phase and for the description of the evolution of the subject. GitHub is a collaborative platform that stores open source development projects. It is the most popular open source platform and contains information on more than 3.5 million users and more than 23 million repositories since 2009, in which the relevant information about authors, development language and metrics such as stars, forks, watchers and score is contained [18].
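For reference, the following minimal sketch queries the public GitHub repository search endpoint (REST API v3) for one search equation and keeps a handful of meta-data fields; the exact fields retained by VigHub are an assumption.

    # Sketch: query the public GitHub repository search endpoint for one
    # search equation and keep basic meta-data. The fields kept here are an
    # assumption about what the strategy needs downstream.
    import requests

    def search_repositories(query, per_page=100):
        url = "https://api.github.com/search/repositories"
        response = requests.get(url, params={"q": query, "per_page": per_page})
        response.raise_for_status()
        return [
            {
                "full_name": repo["full_name"],
                "language": repo["language"],
                "created_at": repo["created_at"],
                "description": repo["description"],
                "stargazers_count": repo["stargazers_count"],
                "owner_login": repo["owner"]["login"],
                "owner_type": repo["owner"]["type"],  # "User" or "Organization"
            }
            for repo in response.json().get("items", [])
        ]

    for repo in search_repositories('"virtual judge" programming')[:5]:
        print(repo["full_name"], repo["language"], repo["stargazers_count"])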
3.4 Construction of Technological Maps
This paper presents a strategy for building the state-of-the-art techniques for a project on a specific topic. Four technological maps are proposed: programming languages, topic evolution timeline, successful projects, and successful users and organizations. Figure 1 shows the diagram that describes how and which data were obtained to build the technological maps detailed below.
In order to implement the maps listed above, different technologies are used with the purpose of transforming the information into easy-to-understand structures. The technological maps are constructed from the data provided by the API of the GitHub platform [19]. The data is obtained as JSON (JavaScript Object Notation) files, which are subsequently stored in a relational database. The database accumulates all the requests made for each of the search equations; that is, all the meta-data of all the projects concerning a specific topic are stored there, meaning that each search equation for the keywords described in the previous phases produces a JSON file that is stored in the database (see the sketch below). In this way, the database is the information source for the subsequent processing and the application of computational techniques to construct the maps. For natural language processing, the SpaCy library is used [20], and the Google Charts library is used to draw the graphs or maps [21].
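A minimal sketch of the storage step is given below, assuming SQLite as a stand-in for the relational database (the paper does not name the engine); one row is accumulated per repository and search equation.

    # Sketch of the storage step, with SQLite standing in for the relational
    # database: the JSON meta-data returned for each search equation is
    # accumulated, one row per repository.
    import json
    import sqlite3

    def store_results(db_path, search_equation, repositories):
        conn = sqlite3.connect(db_path)
        conn.execute(
            """CREATE TABLE IF NOT EXISTS repository (
                   full_name TEXT,
                   search_equation TEXT,
                   metadata_json TEXT
               )"""
        )
        conn.executemany(
            "INSERT INTO repository VALUES (?, ?, ?)",
            [(r["full_name"], search_equation, json.dumps(r)) for r in repositories],
        )
        conn.commit()
        conn.close()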
Fig. 1. Workflow: shows how the components of the technological maps were obtained (Color figure online)

Technological map of the programming languages (Treemap): with the data obtained from the GitHub API, Named-entity recognition (NER) from the SpaCy library is applied, and the language and the name of the repository are taken from each repository found in the query. Based on this information, a binary vector of co-appearance of programming languages in the projects is constructed. This is the input for the agglomerative clustering algorithm, which creates a hierarchical tree in which the most similar repositories are grouped first.
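The following sketch illustrates this step under simplifying assumptions: each repository becomes a binary language vector and SciPy's hierarchical clustering builds the tree; the example repositories and the use of SciPy are illustrative, not the authors' exact implementation.

    # Sketch of the treemap preprocessing: binary language vectors per
    # repository, clustered hierarchically. The repositories below are
    # made-up examples.
    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage

    def language_matrix(repos, languages):
        """Rows = repositories, columns = programming languages (0/1)."""
        return np.array(
            [[1 if lang in r["languages"] else 0 for lang in languages] for r in repos]
        )

    repos = [
        {"name": "judge-a", "languages": {"C++", "Python"}},
        {"name": "judge-b", "languages": {"JavaScript"}},
        {"name": "judge-c", "languages": {"C++", "Python"}},
    ]
    languages = sorted({lang for r in repos for lang in r["languages"]})
    X = language_matrix(repos, languages)

    Z = linkage(X, method="average", metric="hamming")  # hierarchical tree
    labels = fcluster(Z, t=2, criterion="maxclust")     # cut into 2 groups
    print(dict(zip([r["name"] for r in repos], labels)))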
Technological map of the evolution of the topic timeline (Fishbone): for the construction of this map, four tasks are carried out using the NER and Information Retrieval (IR) capabilities of SpaCy, as follows: (1) each repository is classified according to its year of creation; (2) the programming languages of each project are obtained for each year and grouped with three colors, green, orange and red, where green marks the most used language; (3) the data of each repository is compared with a corpus that contains predefined types of software, and the category with the highest hierarchy is obtained for each year; (4) matching words are searched across all the data from the repositories and evaluated according to the number of times they appear, which yields the type of technology used in each year.
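A simplified sketch of tasks (1), (2) and (4) is shown below; the predefined software-type corpus is illustrative only, and the NER/IR steps of SpaCy are replaced here by plain word counting for brevity.

    # Simplified sketch of the timeline construction: group repositories by
    # creation year, rank languages, and pick the dominant software type by
    # keyword matching. TYPE_CORPUS is an illustrative stand-in for the
    # predefined corpus mentioned above.
    from collections import Counter, defaultdict

    TYPE_CORPUS = {
        "platform": {"platform", "website", "server"},
        "api": {"api", "library", "sdk"},
        "tool": {"tool", "evaluator", "compiler"},
    }

    def timeline(repos):
        by_year = defaultdict(list)
        for repo in repos:
            by_year[repo["created_at"][:4]].append(repo)

        summary = {}
        for year, year_repos in sorted(by_year.items()):
            languages = Counter(r["language"] for r in year_repos if r["language"])
            words = Counter(
                w for r in year_repos for w in (r.get("description") or "").lower().split()
            )
            category = max(TYPE_CORPUS, key=lambda c: sum(words[w] for w in TYPE_CORPUS[c]))
            summary[year] = {
                "top_languages": languages.most_common(3),  # green / orange / red
                "category": category,
            }
        return summary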
Technological map of successful projects (Bubble Charts): it is constructed using the SpaCy library. For this purpose, feature extraction is applied to the number of stars per repository (a value assigned by users to a repository) and to the score of each repository (a value with which GitHub compares developers with repositories in the same language or in the same location). The data is ordered from the highest to the lowest number of stars.
To make the pertinent distinction, each node is labeled as GitHub user/Repository name. This classification is obtained through the two measures that evaluate the ranking of the repositories: stars and score. The stars are granted by users who recommend the project in a positive way (each user can give one star to each repository). Moreover, the score ranks a repository against other developers with repositories in the same language or in the same location (by city, country or worldwide); according to GitHub, the score is computed for each language using this formula: sum(stars) + (1.0 − 1.0/count(repositories)). Subsequently, both values are normalized between 0 and 1 through the min-max scaling formula [22].
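The sketch below reproduces this ranking step: the per-language score formula quoted above and the min-max scaling of the values to [0, 1]; the star counts are made-up examples.

    # Sketch of the ranking used for this map: GitHub's per-language score,
    # sum(stars) + (1.0 - 1.0/count(repositories)), followed by min-max
    # scaling to [0, 1]. The star counts below are made-up examples.
    def language_score(star_counts):
        """star_counts: stars of every repository written in one language."""
        return sum(star_counts) + (1.0 - 1.0 / len(star_counts))

    def min_max_scale(values):
        lo, hi = min(values), max(values)
        return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

    stars = [5200, 830, 640, 120]
    print(language_score(stars))   # score for one language
    print(min_max_scale(stars))    # normalized star values for the chart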
Technological map of successful users and organizations (Tables): using IR-based feature extraction from the SpaCy library, the meta-data of the query is processed to find the information of each repository creator; that information is extracted for each one, and it is evaluated whether the creator belongs to an organization or not. This can be determined because a user cannot have collaborators in the repositories, while organizations can. After that, the frequency of participation of each user or organization on the topic is computed. The frequency is the number of times the user is named or quoted in the other repositories.
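A hedged sketch of how this table could be assembled from the repository meta-data is shown below; the field names follow the GitHub API, and the mention-counting heuristic is an assumption about how the frequency is computed.

    # Sketch of the users/organizations table: classify each owner by type
    # and count how often its name appears in the meta-data of the other
    # repositories. The mention heuristic is an assumption.
    from collections import Counter

    def owner_table(repos):
        mentions = Counter()
        for repo in repos:
            owner = repo["owner_login"].lower()
            for other in repos:
                if other["owner_login"].lower() == owner:
                    continue
                text = (other["full_name"] + " " + (other.get("description") or "")).lower()
                if owner in text:
                    mentions[owner] += 1

        rows = {}
        for repo in repos:
            owner = repo["owner_login"]
            row = rows.setdefault(owner, {
                "owner": owner,
                "type": repo["owner_type"],  # "User" or "Organization"
                "frequency_in_topic": mentions[owner.lower()],
                "repositories": 0,
            })
            row["repositories"] += 1
        return sorted(rows.values(), key=lambda r: r["frequency_in_topic"], reverse=True)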
4 Case Study: Virtual Programming Judges
In computer programming courses, instructors usually assign programming problems to students with the objective of helping them acquire this skill. However, it is unquestionable that, for instructors, correcting programming assignments can be a slow, boring and error-prone task. Fortunately, programming problems are ideal candidates for automated evaluation, which has led to the development of a wide variety of tools known as programming judges [23–26].
Currently, online programming judges are widely used to evaluate solutions to programming problems. Virtual judges are web-based systems that offer a repository of programming problems and a way to submit solutions to these problems, in order to obtain an automatic verdict on their behavior when running against different public and private test cases designed to be as exhaustive as possible. The solutions to such problems are complete programs or functions with a well-defined specification.
Historically, online judges were aimed at training participants for important programming contests. Currently, virtual judges are created both for academia, to improve the teaching and learning of programming, and for companies, in order for them to find and recruit highly qualified programmers who present their developments on virtual online judge platforms.
Due to the above, and in order to highlight the importance of virtual judges, this article proposes that finding their state-of-the-art techniques may be relevant for researchers, developers, academics and entrepreneurs. We have focused on programming judges, since this technology offers a wide field in which the current situation can be analyzed: what its evolution is and what the future trends are.
4.1 Software-Focus Research
Experts grouped the different keywords and alternative ways of referring to competitive programming and programming judges. The listed keywords are: Programming Judge, Automatic judge, Programming competitions, Online Judge, Evaluator Judge.
4.2 Making of Query Strings
With the keywords, the search strings are created using the 'OR' and 'AND' connectors supported by the GitHub API; thus, the query strings are formed as follows: "virtual online judge"; ("automatic judge" AND "programming"); ("Virtual Judge" OR "evaluator Judge" AND "programming"); ("Virtual Judge" AND competitions AND programming).
4.3 Data Selection
The study was made on the GitHub platform, obtaining 423 results; many of these do not contain complete information. As a solution, we use the VigHub1 tool that we previously developed [27,28]. VigHub is a prototype tool based on a TW model. It is supported by computational techniques to extract, store, process, analyze and automatically visualize data from the GitHub platform. VigHub allows us to obtain the latest technological developments (repositories) in any specific software field. The VigHub tool has a search engine in which the queries are processed, producing 247 repositories for this query. The 176 (423 − 247 = 176) repositories excluded from the query are filtered out by VigHub for three reasons: their weight in bytes is 0 (empty repositories), they are private repositories (restricted access), or the API does not have access to the URL of the repositories, so it is not possible to obtain the necessary characteristics for this study (they may be user pages but not repositories).
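The three filters can be summarized in the following sketch; the field names (size, private, url) follow the GitHub REST API v3, while the filtering logic itself is a simplified reading of what VigHub does.

    # Sketch of the three exclusion criteria applied by VigHub, as described
    # above. The URL check is a simplified assumption.
    import requests

    def is_usable(repo):
        if repo.get("size", 0) == 0:       # empty repository (weight is 0)
            return False
        if repo.get("private", False):     # restricted access
            return False
        try:                               # repository URL not reachable via the API
            return requests.get(repo["url"]).status_code == 200
        except requests.RequestException:
            return False

    def filter_repositories(repos):
        return [r for r in repos if is_usable(r)]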
4.4 Construction of Technological Maps
The maps and their respective analyses are presented below. Each information map allows answering specific questions about the state-of-the-art techniques of a development project, such as: Which programming languages are used to develop projects? What is the evolution of the topic on GitHub? Which are the most successful software projects on GitHub? and Which users and organizations develop projects on the specific topic? Thus, for each map there is an associated question that allows a more detailed analysis of the field being studied.
1 http://eiscapp.univalle.edu.co/vighubjson/.
4.4.1 Technological Map of Programming Languages
Knowing the programming languages being used to develop programming judges helps to determine market trends, as well as current and future technological needs. Taking into account the 247 projects, Fig. 2 shows the least and most relevant projects according to the number of stars, copies and visits that each repository has received from users. In this case, the most relevant projects are written in JavaScript, C++ and Python. Moreover, it has been found that the technological development of programming judges on GitHub over the last 9 years has been carried out using 16 programming languages. C++ has been the most common, with 47 projects, followed by Python and JavaScript with 45 and 44 repositories respectively. The other languages, such as PHP, Java, Go, Ruby and C, have between 18 and 10 repositories.
Fig. 2. Language and importance of the repositories in the subject. According to size: the larger the area covering a language, the greater the number of developments. According to color: orange identifies the most relevant languages and developments on GitHub. (Color figure online)
4.4.2 Technological Map of the Topic Evolution Timeline
Based on an exhaustive analysis of the changes in technologies and paradigms that took place between 2009 and 2017, it was possible to obtain a specific perspective which allowed us to identify the most important projects, trending languages and types of technology used each year (see Fig. 3). A detailed description of the evolution of software in programming judges follows:
Fig. 3. The most relevant languages are marked in green, orange and red; green indicates the most used language per year. The most important developments for this research are shown at the top of the line corresponding to each year. Trends and the classification by category for each year are shown at the bottom of each line. (Color figure online)

2009–2010. During these years, an optimizer for online programming in Ruby was developed; it can be installed locally and it allows controlling memory and execution time. For this period there was a total of 4 projects: 2 in C, 1 in Python and 1 in Ruby. None of the 4 projects developed a virtual judge
itself, the aforementioned are programs that allow to control the resources of
the machine, mainly to manipulate the limits in hardware, such as memory and
CPU time. These programs are used to prevent system crash and the Sphere
online Judge platform is highlighted Judge [29]. In addition to the mentioned,
for the ACM-ICPC (ACM International Collegiate Programming Contest) [30]or
annual programming and algorithm competition between universities around the
world sponsored by the IBM company, where teamwork is the priority. Different
projects are developed in C++, C, which share solutions to the proposed prob-
lems. Additionally, the development of the client/server project of ZOJ (Zhejiang
University online judge) in Java [31] begins.
2011–2012. These years show remarkable growth in the number of projects, in which new languages appear, with JavaScript having the largest number of projects, followed by C++ and Python. The growth of virtual judge platforms brings many packages and servers. Additionally, the perspective on how the work had been done in the past changes: instead of installing and setting up servers, it is possible to set up virtual machines as virtual judge platforms, which are easy to configure in order to execute the necessary and specific environments, for example, different programming languages, compilers, among others. On the other hand, there is progress in the creation of virtual judges that consume computational resources; they also allow automatically grading programming assignments and use sets of test cases taken from the ACM-ICPC [32]. PHP, Python and C++ are the trending languages, leaving behind languages such as JavaScript, Java, C and Ruby. Platforms are created where users can register, read the descriptions of programming problems in PHP and send their responses to the server, which will then compile and execute the program and decide whether the result is correct depending on some input and output cases. From this moment, the projects aim at the development of collaborative
platforms such as LeetCode [33]. Virtual judges become social networks for programmers, where users can practice with programming examples, supported by communities of practice (active users), with which they can form groups to participate in contests or challenge themselves to obtain prizes.
2013–2014. The emergence of collaborative platforms makes programmers participate constantly in programming contests and tournaments. Therefore, it becomes necessary to implement new evaluation algorithms that allow reading the collaborative code online, which, at the same time, has to be evaluated repeatedly to check whether the user's program is correct. Projects developed in PHP: Aurora [34], Lavida [35]; in C++: Project Lemon [36]; in JavaScript: gdoj [37]. Other projects are added with the purpose of creating challenges and verifying programs through online code evaluation, such as: PAT [38], POJ [39], LeetCode, Google Code Jam [40], hihoCoder [41], Codeforces [42]. In these years, back-end tools are implemented for LeetCode in C++, which are configured on the command line to compile, execute, verify their output and generate reports in XML (Extensible Markup Language) and JSON (JavaScript Object Notation). These standards allow the exchange of solutions and source code of problems among several virtual judges on any platform. New platforms appear, such as DMOJ [43], developed in Python, with high interface usability and the capacity to support I/O-based tasks, with runtime data generators and custom output validators [44,45].
2015–2016. Interest in the topic increases, as proven by the number of projects in different programming languages. Python and C++ have the largest number of projects. Most projects are applications that work on the Linux operating system; the highlighted projects are Kattis [46], Baekjoon Online Judge [47] and Judge Girl [48]. The variety of platforms and languages makes it necessary to be able to execute them in any hardware and software environment, and JavaScript becomes a pioneer in the use of microservice-oriented architectures, by using lightweight and portable containers for software applications that can be executed on any machine. In these years, among its developments, JavaScript has Judge0 [49]: an open source online judge API for code compilation and execution on given test data. Platforms such as judge [50] and uoj [51] are based on containers with Moby technology (formerly Docker) to optimize data I/O processes. Languages such as Python, C++ and C continue with the inheritance of JavaScript, expanding their processing capacity with microservice-based architectures. These include: Sabo [52], onlineJudge2.0 and Judger [53].
2017. Currently, in Python projects, computational techniques based on prediction are implemented: Camisole [54]; deep learning: RankFace [55]; and classification: SunnyJudge [56]. These projects, rather than being judges for a piece of code, obtain the capacity to evaluate the code of an application in order to know whether the application is optimal. C++ develops projects focused on the learning of programming; concerning platforms such as UVa Online Judge and ACM-ICPC, some projects are: Algorithms [57], competitive-programming-archive [58] and timus [59]. In JavaScript, some of the projects propose collaborative virtual judge systems: a project to implement a web-based collaborative code editor
which supports multiple users editing simultaneously, designed as a single-page web application for coding, running and compiling problems (based on a RESTful API with load balancing by Nginx). Examples of such projects are: Angular-online-judge [60], PutongOJ [61] and collaborative-online-judger [62]. JavaScript also addresses the user experience with onlineJudgeFE [63].
The projects mentioned in this section fall into 5 categories; the categories were defined according to the virtual judge processes described in [64] (see Table 1).
Platforms: programming platforms with code evaluation systems designed for competitions, projects, hobbies, etc., available online or offline.
Application Programming Interface (API): code judge libraries to be used by other software as an abstraction layer.
Educational platforms: projects that encourage learning (programming exercises to learn, taken from online and offline platforms).
Development of tools: source code to create a code evaluator.
Others: projects that evaluate code but are not judges.
Table 1. Classification of projects by category

Category | Total projects | Main programming language
Platforms | 63 | JavaScript
API | 7 | Python
Educational platforms | 45 | JavaScript, C++
Development of tools | 93 | Python, C#
Others | 39 | Python, C++
The table was made with the 247 projects, in order to know which category each of them belongs to. For this, two tasks were applied: (1) a technique to extract information from websites, following the practical guide in [65] for web scraping in PHP, in which the information of each URL is converted into data strings; (2) a word counter, which receives as parameters a range of inputs (data strings) and a range of classes (the proposed categories); the matches of each repository against the categories are counted, and the category with the highest number of matches is assigned.
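A minimal sketch of this two-step classification is shown below; the category keyword lists are illustrative, not the authors' exact corpus, and the scraping step is assumed to have already produced the page text.

    # Sketch of the word-counter classification: the scraped page text of a
    # repository is matched against per-category keyword lists and the best
    # match wins. The keyword lists are illustrative only.
    from collections import Counter

    CATEGORY_KEYWORDS = {
        "Platforms": ["platform", "contest", "online judge"],
        "API": ["api", "library", "abstraction"],
        "Educational platforms": ["learn", "exercise", "course"],
        "Development of tools": ["evaluator", "sandbox", "compiler"],
        "Others": ["checker", "analyzer"],
    }

    def classify(page_text):
        text = page_text.lower()
        scores = Counter(
            {cat: sum(text.count(kw) for kw in kws) for cat, kws in CATEGORY_KEYWORDS.items()}
        )
        category, matches = scores.most_common(1)[0]
        return category if matches > 0 else "Others"

    print(classify("An online judge platform for programming contests"))  # Platforms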
4.4.3 Technological Map of Successful Projects
It is important to know which projects are the most significant on GitHub, in order to find out what the development trends on the subject are. The most successful projects are those with the highest number of stars, copies, visits and score, according to GitHub [66]. The projects shown in Fig. 4 are the 10 most important projects for GitHub and are arranged on a scale of 0 to 1, where 1 is the most important.
Five of the most important projects belong to the category "Platforms"; among them, in order from highest to lowest position, are: uoj [51], XOJ [67], Judge [43], GoOnlineJudge [68], site [69], CCR-Plus [70], RLJ [71]. These projects are modern virtual judges and online platform systems which support I/O-based tasks, with runtime data generators and customized output validators. Most of them run on Linux and a few on Windows; they are aesthetically well-designed systems and they are governed by standards such as those of the ACM.
Fig. 4. Most successful repositories on GitHub according to the normalized values of stars and score of each repository.
Two projects belong to the "API" category: cyaron [72], which allows extracting information from the CyaRon platform, a platform for online virtual judges, and hawx1993 [73], a lightweight JavaScript library to judge code from the judgejs platform. Finally, in the "Others" category, virtual judge [74] is the design of a virtual judge system that obtains the title of each master virtual judge through tracking technology, and at the same time implements the virtual judge sending agent and obtains the result of the judgment.
4.4.4 Technological Map of Successful Users and Organizations
Tables 2 and 3 show, in the first column, the main users and organizations that are currently trending in the development of the topic of programming judges. The second column shows the frequency, i.e., the number of times that the name of the repository owner has participated in or is mentioned among the other repositories. The third column shows the number of currently existing repositories concerning programming judges or similar topics. The last column shows the country of each user and organization.
Table 2. Active organizations in the area of virtual judges

Organizations | Frequency in topic | Number of repositories | Country
QingdaoU | 35 | 10 | China
DMOJ | 25 | 34 | China
Vijos | 19 | 11 | NotFound
Luogu-dev | 15 | 6 | NotFound
ZJGSU-Open-Source | 8 | 10 | China
Hit-moodle | 8 | 16 | China
UESTC-ACM | 57 | 6 | USA
JudgeGirl | 3 | 14 | China
UniversalOJ | 3 | 3 | China
Table 3. Active users in the area of virtual judges

Users | Frequency in topic | Number of repositories | Country
MaskRay | 31 | 126 | California
NIkolayIT | 29 | 25 | Bulgaria
Robertgraham | 25 | 30 | Germany
Joecorcoran | 25 | 80 | USA
Sbanarango | 24 | 0 | NotFound
Mjnaderi | 24 | 28 | Iran
Vfleaking | 22 | 6 | NotFound
Anishathalye | 21 | 55 | Massachusetts
5 Discussion
The analysis of the projects previously developed on GitHub to build the state-of-the-art techniques of virtual judges for programming allows the developer to
save time and effort, identifying which programming languages are trending, which projects have been developed, which are the main users and organizations that contribute to the topic, and what the evolution of the topic has been. This information can be used to create strategies before making decisions in the development of new projects.
The presented strategy is implemented in the VigHub tool, which makes the process automatic, from the extraction of the GitHub information to the analysis and visualization of the technological maps. The tool saves the queries made, allowing the information to be replicated for future academic or research studies. Based on the projects found and analyzed in this article, it is estimated that 65% of the topic is being developed by Asian countries, where the dedication of business and academic organizations is remarkable compared to other countries in Europe and North America.
It is estimated that 45% of the online platforms found in the GitHub projects for this article are up-to-date, which made it difficult for us to analyze the evolution of the topic over the years for some projects, since we could not find their first stages, as they only show current information.
The strategy developed in this work was designed to support developers and researchers who intend to initiate a software project. They need technological tools that allow them to see the state-of-the-art techniques of a specific technology on which they can rely in order to make the best decisions.
It is important to mention that this proposal is intended to complement the development of the state-of-the-art techniques, and not to replace the traditional way of doing it by means of searches in scientific databases.
6 Conclusions
A strategy that supports the creation of the state-of-the-art techniques of a technology through the analysis of data from the GitHub platform was created. The strategy is based on the implementation of technological maps which allow answering questions of interest such as: In what programming language should a new project in the area be developed? Which APIs, platforms or tools exist? Which similar projects exist and which are currently the most successful? Which organizations are interested in the topic? among other questions that would be decisive when starting a software project.
This study allowed us to make a systematic review of 247 public software development projects of programming judges hosted on the GitHub platform. In each of the projects, the main software tools, technological maps (with relevant information for programming) and developers were identified.
The repositories provided by the GitHub platform are progressively becoming the places in which numerous organizations store and organize the results of their activities, which include software development in all areas of computer science, biology, education, finance, administration, legislation, and many more.
Finally, we consider these results as strategic for research, development and
innovation in the scientific, academic, commercial and industrial communities,
since they allow us to diagnose the status of software developments for the
creation of new projects in programming judges.
Acknowledgements. We thank the Universidad del Valle, Faculty of Engineering, Systems and Information Engineering School, and the CEIBA foundation (Centro de Estudios Interdisciplinarios Básicos y Aplicados), Nariño.
References
1. Entonado, F.B., et al.: Sociedad de la información y educación. Ciencia y Tecnología. Junta de Extremadura, Consejería de Educación (2001)
2. Saaty, T.L.: Decision making with the analytic hierarchy process. Int. J. Serv. Sci.
1(1), 83–98 (2008)
3. Dubey, A., Wagle, D.: Delivering software as a service. McKinsey Q. 6(2007), 1–7
(2007)
4. Erickson, J., Lyytinen, K., Siau, K.: Agile modeling, agile software development,
and extreme programming: the state of research. J. Database Manag. 16(4), 88
(2005)
5. Hood, W.W., Wilson, C.S.: The scatter of documents over databases in different
subject domains: how many databases are needed? J. Assoc. Inf. Sci. Technol.
52(14), 1242–1254 (2001)
6. Gonzalez-Diaz, C., Iglesias-Garcia, M., Martin Llaguno, M., Gonzalez-Pacanowski, A., et al.: Antecedentes y estado de la cuestión sobre los repositorios institucionales de contenido educativo (RICE) (2015)
7. Hood, W.W., Wilson, C.S.: Overlap in bibliographic databases. J. Assoc. Inf. Sci.
Technol. 54(12), 1091–1103 (2003)
8. Zarta, R.H., et al.: Vigilancia tecnológica y análisis del ciclo de vida de la tecnología: evaluación del potencial comercial de un prototipo de guantes biodegradables a partir de almidón termoplástico de yuca. Revista ESPACIOS 37(13), 27–45 (2016)
9. Rivero, Y., Sanches, M., Suárez, Y.: Evaluation model for the software using metric indicators to science and technology surveillance. ACIMED 20(6), 125–140 (2009)
10. Tramullas, J., Giménez López, M.: Evaluación de software libre para la gestión de bibliografía (2007)
11. Singh Chawla, D.: The unsung heroes of scientific software. Nat. News 529(7584),
115 (2016). https://doi.org/10.1038/529115a
12. Escorsa, P., Maspons, R., Rodríguez, M.: Technology mapping, business strategy and market opportunities. The case of the textiles for medical uses (Mapas tecnológicos, estrategia empresarial y oportunidades de mercado. El caso de los textiles para usos médicos) (2000)
13. Bandrowski, A., et al.: The resource identification initiative: a cultural shift
in publishing. J. Comp. Neurol. 524(1), 8–22 (2016). https://doi.org/10.12688/
f1000research.6555.1
14. Harzing, A.W., Alakangas, S.: Google scholar, scopus and the web of science: a lon-
gitudinal and cross-disciplinary comparison. Scientometrics 106(2), 787–804 (2016)
15. Mongeon, P., Paul-Hus, A.: The journal coverage of web of science and scopus: a
comparative analysis. Scientometrics 106(1), 213–228 (2016)
16. Abbas, A., Zhang, L., Khan, S.U.: A literature review on the state-of-the-art in
patent analysis. World Pat. Inf. 37, 3–13 (2014)
17. Mazieri, M.R., Quoniam, L., Santos, A.M.: Innovation from the patent information:
proposition model open source patent information extraction (crawler). J. Manag.
Technol. 16(1), 76–112 (2016)
18. Github: Github Octoverse 2017 (2017). https://octoverse.github.com/
19. Google Developers: REST API v3 (2018). https://developer.github.com/v3/
20. Spacy: Industrial-strength natural language processing (2018). https://spacy.io/
21. Google Developers: Google charts (2018). https://developers.google.com/chart/
22. Al Shalabi, L., Shaaban, Z., Kasasbeh, B.: Data mining: a preprocessing engine.
J. Comput. Sci. 2(9), 735–739 (2006)
23. Kurnia, A., Lim, A., Cheang, B.: Online judge. Comput. Educ. 36(4), 299–315
(2001)
24. Douce, C., Livingstone, D., Orwell, J.: Automatic test-based assessment of pro-
gramming: a review. J. Educ. Resour. Comput. (JERIC) 5(3), 4 (2005)
25. Ihantola, P., Ahoniemi, T., Karavirta, V., Seppälä, O.: Review of recent systems for automatic assessment of programming assignments. In: Proceedings of the 10th Koli Calling International Conference on Computing Education Research, pp. 86–93. ACM (2010)
26. Cheang, B., Kurnia, A., Lim, A., Oon, W.C.: On automated grading of program-
ming assignments in an academic institution. Comput. Educ. 41(2), 121–131 (2003)
27. Hidalgo, C., Bucheli, V.: Herramienta tecnológica de VT para GitHub. In: Computing Colombian Conference, Society in Computational Intelligence (CI), pp. 131–133 (2016)
28. Hidalgo, C.G., Bucheli, V.A.: VIGHUB: herramienta prototipo para el apoyo de la vigilancia tecnológica en el campo de desarrollo del software. In: Maily, Q., Lorena, M. (eds.) Séptimo Congreso Internacional de Computación CICOM, pp. 231–245. FAbbecor.ong (2017)
29. Sphere Research Labs: Sphere online judge (2009). http://www.spoj.com/
30. Baylor University: The ACM International Collegiate Programming Contest
(ICPC) (2001). https://icpc.baylor.edu/
31. ZUA Team: ZOJ (2001). http://acm.zju.edu.cn/
32. Hit moodle: onlinejudge (2011). http://cms.hit.edu.cn/
33. LeetCode: Leetcode (2001). https://leetcode.com/
34. Aurora: Aurora (2013). https://aurora.pushkaranand.com/
35. Naver02: lavida (2013). http://judge.lavida.us/
36. Google: Project-lemon (2011). https://code.google.com/archive/p/project-lemon/
37. weizengke: Gdoj: the programming contest web 1.2 platform & judge kernel
v100r001c00b100 version (2011). http://debugforces.com/
38. Instituto Nacional de Investigación de la Educación Superior AiCaiEnet, Facultad de Informática y Tecnología, U.d.Z.: PAT (Programming Ability Test) (2012). https://www.patest.cn/
39. Ying, F., Xu, P., Xie, D.: PKU judgeonline (2013). http://poj.org/
40. Google: Code Jam (2009). https://code.google.com/codejam/about
41. hihoCoder Hu ICP: hihocoder (2010). http://www.hihocoder.com/
42. Telegram: Codeforces (2010). http://codeforces.com/
43. onion: DMOJ: modern online judge! (2010). https://dmoj.ca/
44. mjnaderi: Sharif judge (2011). https://github.com/mjnaderi/Sharif-Judge/tree/
docs
45. CenaPlus: Cenaplus (2003). https://github.com/CenaPlus/CenaPlus
46. Kattis: Kattis online judge (2015). https://github.com/Kattis/kattis-cli
47. Start Link: Baekjoon online judge (2015). https://www.acmicpc.net/
48. Nodejs: Judge girl (2015). https://judgegirl.csie.org/
49. Aglio: Judge0 API (2016). https://api.judge0.com/
50. hnshhslsh: virtual-judge (2016). https://github.com/hnshhslsh/virtual-judge
51. vfleaking: UOJ (universal online judge) (2016). https://github.com/vfleaking/uoj
52. tokers: sabo (2016). https://github.com/tokers/sabo
53. Qingdao University: Onlinejudge (2015). https://docs.onlinejudge.me/
54. Proling: camisole is a secure online judge for CS teachers (2016). https://camisole.
prologin.org/
55. Entropy-xcy: Rankface (2017). https://github.com/Entropy-xcy/RankFace
56. yudazilian: Sunnyjudge (2017). https://github.com/yudazilian/SunnyJudge
57. nileshsah: Algorithms (2015). https://github.com/nileshsah/Algorithms
58. gshopov: Competitive-programming-archive (2017). https://github.com/gshopov/
competitive-programming-archive
59. Otrebus: Timus (2017). https://github.com/Otrebus/timus
60. sugarac: Angular online judge (2017). https://github.com/sugarac/angular-online-
judge
61. acm309: PutongOJ (2017). https://github.com/acm309/PutongOJ
62. cqlzx: Collaborative online judger (2017). https://github.com/cqlzx/collaborative-
online-judger
63. Qingdao University: Onlinejudgefe (2011). https://github.com/QingdaoU/
OnlineJudgeFE
64. Speedie, S.M., Treffinger, D.J., Houtz, J.C.: Classification and evaluation of
problem-solving tasks. Contemp. Educ. Psychol. 1(1), 52–75 (2008). https://doi.
org/10.1016/0361-476X(76)90007-2
65. anchetaWern: php-webscraping.md (2014). https://gist.github.com/anchetaWern/
6150297
66. GitHub: Awards (2018). http://github-awards.com
67. DaDaMrX: Xoj (2013). https://github.com/DaDaMrX/XOJ
68. Zhejiang Gongshang University: Goonlinejudge (2015). https://github.com/
ZJGSU-Open- Source/GoOnlineJudge
69. DMOJ: https://github.com/DMOJ/site
70. sxyzccr: CCR-Plus (2016). https://github.com/sxyzccr/CCR-Plus
71. rqy1458814497: RLJ (2012). https://github.com/rqy1458814497/RLJ
72. luogu dev: CYaRon (2015). https://github.com/luogu-dev/cyaron
73. hawx1993: Judge (2017). https://github.com/hawx1993/judge
74. Open fightcoder: virtual judge (2011). https://github.com/open-fightcoder/
virtual-judge