Diego Elias Costa

Diego Elias Costa
Concordia University Montreal

PhD

About

65
Publications
54,890
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
692
Citations
Introduction
I am an assistant professor at Concordia University working in Software Engineering. My research interests cover a wide range of software engineering and performance engineering related topics, including mining software repositories, software ecosystems, dependency management, bots in software engineering, and performance testing.
Additional affiliations
September 2019 - July 2021
Concordia University
Position
  • PostDoc Position
January 2015 - August 2019
Universität Heidelberg
Position
  • PhD Student
November 2012 - December 2014
Universidade Federal de Uberlândia (UFU)
Position
  • Master's Student

Publications

Publications (65)
Article
Pull Requests (PRs) that are neither progressed nor resolved clutter the list of PRs, making it difficult for the maintainers to manage and prioritize unresolved PRs. To automatically track, follow up, and close such inactive PRs, Stale bot was introduced by GitHub. Despite its increasing adoption, there are ongoing debates on whether using Stale b...
Preprint
Full-text available
Software ecosystems (e.g., npm, PyPI) are the backbone of modern software developments. Developers add new packages to ecosystems every day to solve new problems or provide alternative solutions, causing obsolete packages to decline in their importance to the community. Packages in decline are reused less overtime and may become less frequently mai...
Preprint
Full-text available
Software systems are increasingly relying on deep learning components, due to their remarkable capability of identifying complex data patterns and powering intelligent behaviour. A core enabler of this change in software development is the availability of easy-to-use deep learning libraries. Libraries like PyTorch and TensorFlow empower a large var...
Article
Full-text available
Managing project dependencies is a key maintenance issue in software development. Developers need to choose an update strategy that allows them to receive important updates and fixes while protecting them from breaking changes. Semantic Versioning was proposed to address this dilemma but many have opted for more restrictive or permissive alternativ...
Preprint
Full-text available
Pull Requests (PRs) that are neither progressed nor resolved clutter the list of PRs, making it difficult for the maintainers to manage and prioritize unresolved PRs. To automatically track, follow up, and close such inactive PRs, Stale bot was introduced by GitHub. Despite its increasing adoption, there are ongoing debates on whether using Stale b...
Preprint
Full-text available
Managing project dependencies is a key maintenance issue in software development. Developers need to choose an update strategy that allows them to receive important updates and fixes while protecting them from breaking changes. Semantic Versioning was proposed to address this dilemma but many have opted for more restrictive or permissive alternativ...
Article
Security issues are a major concern in software packages and their impact can be detrimental if exploited. Modern code review is a widely-used practice that project maintainers adopt to improve the quality of contributed code. Prior work has shown that code review has an important role in improving software quality, however, in-depth analyses on co...
Article
Full-text available
In this work, we evaluate three popular fairness preprocessing algorithms and investigate the potential for combining all algorithms into a more robust preprocessing ensemble. We report on lessons learned that can help practitioners better select fairness algorithms for their models.
Article
Full-text available
Software ecosystems play an important role in modern software development, providing an open platform of reusable packages that speed up and facilitate development tasks. However, this level of code reusability supported by software ecosystems also makes the discovery of security vulnerabilities much more difficult, as software systems depend on an...
Preprint
Full-text available
Diversity and inclusion are necessary prerequisites for shaping technological innovation that benefits society as a whole. A common indicator of diversity consideration is the representation of different social groups among software engineering (SE) researchers, developers, and students. However, this does not necessarily entail that diversity is c...
Article
Artificial Intelligence based systems offer a multitude of opportunities and significant advantages to the industry and the larger society, but at the same time, they may create or contribute to social and ethical problems in society. Developing AI-based systems is a challenge when considering the need for specialized knowledge of the social implic...
Preprint
Full-text available
As machine learning (ML) systems get adopted in more critical areas, it has become increasingly crucial to address the bias that could occur in these systems. Several fairness pre-processing algorithms are available to alleviate implicit biases during model training. These algorithms employ different concepts of fairness, often leading to conflicti...
Article
The reliance on vulnerable dependencies is a major threat to software systems. Dependency vulnerabilities are common and remain undisclosed for years. However, once the vulnerability is discovered and publicly known to the community, the risk of exploitation reaches its peak, and developers have to work fast to remediate the problem. While there ha...
Preprint
Full-text available
Gamification is the use of game elements such as points, leaderboards, and badges in a non-game context to encourage a desired behavior from individuals interacting with an environment. Recently, gamification has found its way into software engineering contexts as a means to promote certain activities to practitioners. Previous studies investigated...
Preprint
Full-text available
Modern software systems are often built by leveraging code written by others in the form of libraries and packages to accelerate their development. While there are many benefits to using third-party packages, software projects often become dependent on a large number of software packages. Consequently, developers are faced with the difficult challe...
Preprint
Full-text available
The Open Source Software movement has been growing exponentially for a number of years with no signs of slowing. Driving this growth is the widespread availability of libraries and frameworks that provide many functionalities. Developers are saving time and money incorporating this functionality into their applications resulting in faster more feat...
Article
Full-text available
Pull-based development has enabled numerous volunteers to contribute to open-source projects with fewer barriers. Nevertheless, a considerable amount of pull requests (PRs) with valid contributions are abandoned by their contributors , wasting the effort and time put in by both the contributors and maintainers. To better understand the underlying d...
Article
Full-text available
Context While in serverless computing, application resource management and operational concerns are generally delegated to the cloud provider, ensuring that serverless applications meet their performance requirements is still a responsibility of the developers. Performance testing is a commonly used performance assessment practice; however, it trad...
Article
Full-text available
Nowadays, wearables-based Human Activity Recognition (HAR) systems represent a modern, robust, and lightweight solution to monitor athlete performance. However, user data variability is a problem that may hinder the performance of HAR systems, especially the cross-subject HAR models. Such a problem may have a lesser effect on the subject-specific m...
Article
Full-text available
Due to their increasing complexity, today's software systems are frequently built by leveraging reusable code in the form of libraries and packages. Software ecosystems (e.g., npm) are the primary enablers of this code reuse, providing developers with a platform to share their own and use others' code. These ecosystems evolve rapidly: developers ad...
Preprint
Pull-based development has enabled numerous volunteers to contribute to open-source projects with fewer barriers. Nevertheless, a considerable amount of pull requests (PRs) with valid contributions are abandoned by their contributors, wasting the effort and time put in by both their contributors and maintainers. To gain a more comprehensive underst...
Article
Full-text available
Inertial sensors are widely used in the field of human activity recognition (HAR), since this source of information is the most informative time series among non-visual datasets. HAR researchers are actively exploring other approaches and different sources of signals to improve the performance of HAR systems. In this study, we investigate the impac...
Chapter
Full-text available
Java 8 marked a shift in the Java development landscape by introducing functional-like concepts in its stream library. Java developers can now rely on stream pipelines to simplify data processing, reduce verbosity, easily enable parallel processing and increase the expressiveness of their code. While streams have seemingly positive effects in Java...
Article
Full-text available
Dependency management in modern software development poses many challenges for developers who wish to stay up to date with the latest features and fixes whilst ensuring backwards compatibility. Project maintainers have opted for varied, and sometimes conflicting, approaches for maintaining their dependencies. Opting for unsuitable approaches can in...
Conference Paper
Full-text available
Java 8 marked a shift in the Java development landscape by introducing functional-like concepts in its stream library. Java developers can now rely on stream pipelines to simplify data processing, reduce verbosity, easily enable parallel processing and increase the expressiveness of their code. While streams have seemingly positive effects on Java...
Preprint
Full-text available
Context. While in serverless computing, application resource management and operational concerns are generally delegated to the cloud provider, ensuring that serverless applications meet their performance requirements is still a responsibility of the developers. Performance testing is a commonly used performance assessment practice; however, it tra...
Preprint
Due to its increasing complexity, today's software systems are frequently built by leveraging reusable code in the form of libraries and packages. Software ecosystems (e.g., npm) are the primary enablers of this code reuse, providing developers with a platform to share their own and use others' code. These ecosystems evolve rapidly: developers add...
Article
Full-text available
Chatbots are envisioned to dramatically change the future of Software Engineering, allowing practitioners to chat and inquire about their software projects and interact with different services using natural language. At the heart of every chatbot is a Natural Language Understanding (NLU) component that enables the chatbot to understand natural lang...
Conference Paper
Full-text available
Continuous Integration (CI) is the process of automatically compiling, building, and testing code changes in the hope of catching bugs as they are introduced into the code base. With bug fixing being a core and increasingly costly task in software development, the community has adopted CI to mitigate this issue and improve the quality of their soft...
Conference Paper
Full-text available
Vulnerable dependencies are a major problem in modern software development. As software projects depend on multiple external dependencies, developers struggle to constantly track and check for corresponding security vulnerabilities that affect their project dependencies. To help mitigate this issue, Dependabot has been created, a bot that issues pu...
Article
Full-text available
A decade after its first release, the Go programming language has become a major programming language in the development landscape. While praised for its clean syntax and C-like performance, Go also contains a strong static type-system that prevents arbitrary type casting and arbitrary memory access, making the language type-safe by design. However...
Article
Full-text available
Nowadays, Human Activity Recognition (HAR) systems, which use wearables and smart systems, are a part of our daily life. Despite the abundance of literature in the area, little is known about the impact of muscle fatigue on these systems’ performance. In this work, we use the biceps concentration curls exercise as an example of a HAR activity to ob...
Conference Paper
Full-text available
Software ecosystems play an important role in modern software development, providing an open platform of reusable packages that speed up and facilitate development tasks. However, this level of code reusability supported by software ecosystems also makes the discovery of security vulnerabilities much more difficult, as software systems depend on an...
Preprint
Full-text available
Chatbots are envisioned to dramatically change the future of Software Engineering, allowing practitioners to chat and inquire about their software projects and interact with different services using natural language. At the heart of every chatbot is a Natural Language Understanding (NLU) component that enables the chatbot to understand natural lang...
Preprint
Dependency management in modern software development poses many challenges for developers who wish to stay up to date with the latest features and fixes whilst ensuring backwards compatibility. Project maintainers have opted for varied, and sometimes conflicting, approaches for maintaining their dependencies. Opting for unsuitable approaches can in...
Preprint
Software vulnerabilities have a large negative impact on the software systems that we depend on daily. Reports on software vulnerabilities always paint a grim picture, with some reports showing that 83% of organizations depend on vulnerable software. However, our experience leads us to believe that, in the grand scheme of things, these software vul...
Preprint
Full-text available
A decade after its first release, the Go programming language has become a major programming language in the development landscape. While praised for its clean syntax and C-like performance, Go also contains a strong static type-system that prevents arbitrary type casting and arbitrary memory access, making the language type-safe by design. However...
Conference Paper
Full-text available
Chatbots are becoming increasingly popular due to their benefits in saving costs, time, and effort. This is due to the fact that they allow users to communicate and control different services easily through natural language. Chatbot development requires special expertise (e.g., machine learning and conversation design) that differ from the developm...
Article
Full-text available
Despite huge software engineering efforts and programming language support, resource and memory leaks are still a troublesome issue, even in memory-managed languages such as Java. Understanding the properties of leak-inducing defects, how the leaks manifest, and how they are repaired is an essential prerequisite for designing better approaches for...
Conference Paper
Full-text available
Domain Specific Languages (DSLs) have proven useful in the domain of data science, as witnessed by the popularity of SQL. However, implementing and maintaining a DSL incurs a significant effort which limits their utility in context of fast-changing data science frameworks and libraries. We propose an approach and a Python-based library/tool NLDSL w...
Conference Paper
Full-text available
Monitoring software performance evolution is a daunting and challenging task. This paper proposes a lightweight visualization technique that contrasts source code variation with the memory consumption and execution time of a particular benchmark. The visualization fully integrates with the commit graph as common in many software repository managers...
Thesis
Full-text available
Software systems are an integral part of modern society. As we continue to harness software automation in all aspects of our daily lives, the runtime performance of these systems become increasingly important. When everything seems just a click away, performance issues that compromise the responsiveness of a system can lead to severe financial and...
Article
Full-text available
Microbenchmarking frameworks, such as Java's Microbenchmark Harness (JMH), allow developers to write fine-grained performance test suites at the method or statement level. However, due to the complexities of the Java Virtual Machine, developers often struggle with writing expressive JMH benchmarks which accurately represent the performance of such...
Preprint
Full-text available
Data analysis is at the core of scientific studies, a prominent task that researchers and practitioners typically undertake by programming their own set of automated scripts. While there is no shortage of tools and languages available for designing data analysis pipelines, users spend substantial effort in learning the specifics of such languages/t...
Preprint
Full-text available
Despite huge software engineering efforts and programming language support, resource and memory leaks are still a troublesome issue, even in memory-managed languages such as Java. Understanding the properties of leak-inducing defects, how the leaks manifest, and how they are repaired is an essential prerequisite for designing better approaches for...
Conference Paper
Full-text available
Networks play an increasingly important role in modelling real-world systems due to their utility in representing complex connections. For predictive analyses, the engineering of node features in such networks is of fundamental importance to machine learning applications, where the lack of external information often introduces the need for features...
Conference Paper
Full-text available
Despite many software engineering efforts and programming language support, resource and memory leaks remain a troublesome issue in managed languages such as Java. Understanding the properties of leak-related issues, such as their type distribution, how they are found, and which defects induce them is an essential prerequisite for designing better...
Conference Paper
Full-text available
Networks play an increasingly important role in modelling realworld systems due to their utility in representing complex connections. For predictive analyses, the engineering of node features in such networks is of fundamental importance to machine learning applications, where the lack of external information often introduces the need for features...
Presentation
Full-text available
CGO Slides of CollectionSwitch: A Framework for Efficient and Dynamic Collection Selection
Conference Paper
Full-text available
Selecting collection data structures for a given application is a crucial aspect of the software development. Inefficient usage of collections has been credited as a major cause of performance bloat in applications written in Java, C++ and C#. Furthermore, a single implementation might not be optimal throughout the entire program execution. This de...
Conference Paper
Full-text available
Selecting collection data structures for a given application is a crucial aspect of the software development. Inefficient usage of collections has been credited as a major cause of performance bloat in applications written in Java, C++ and C#. Furthermore, a single implementation might not be optimal throughout the entire program execution. This de...
Conference Paper
Full-text available
Collection data structures have a major impact on the performance of applications, especially in languages such as Java, C#, or C++. This requires a developer to select an appropriate collection from a large set of possibilities, including different abstractions (e.g. list, map, set, queue), and multiple implementations. In Java, the default implem...
Conference Paper
Configuration options are widely used for customiz-ing the behavior and initial settings of software applications, server processes, and operating systems. Their distinctive property is that each option is processed, defined, and described in different parts of a software project-namely in code, in configuration file, and in documentation. This cre...
Conference Paper
Full-text available
Dynamic memory allocation is one of the most ubiquitous operations in computer programs. In order to design effective memory allocation algorithms, it is a major requirement to understand the most frequent memory allocation patterns present in modern applications. In this paper, we present an experimental characterization study of dynamic memory al...
Conference Paper
Full-text available
Dynamic memory allocation is one of the most ubiquitous operations in computer programs. In order to design effective memory allocation algorithms, it is a major requirement to understand the most frequent memory allocation patterns present in modern applications. In this paper, we present an experimental characterization study of dynamic memory al...
Data
Full-text available
Software systems running continuously for a long time often confront software aging, which is the phenomenon of progressive degradation of execution environment caused by latent software faults. Removal of such faults in software development process is a crucial issue for system reliability. A known major obstacle is typically the large latency to...
Conference Paper
Full-text available
Software systems running continuously for a long time often confront software aging, which is the phenomenon of progressive degradation of execution environment caused by latent software faults. Removal of such faults in software development process is a crucial issue for system reliability. A known major obstacle is typically the large latency to...
Conference Paper
Full-text available
In this paper, we present an experimental study to compare six user-level memory allocators. In addition, we compare the experimental results with the asymptotic analyses of the evaluated algorithms. The experimental results show that parallelism affects negatively the investigated allocators. The theoretical analysis of the execution time demonstr...

Network

Cited By