Figure - uploaded by Sherlock A. Licorish
Qualitative analysis for security violations

Source publication
Article
Community Question and Answer (CQA) platforms use the power of online groups to solve problems or gain information. While these websites host useful information, it is critical that the details provided on these platforms are of high quality, and that users can trust the information. This is particularly necessary for software development, given t...

Citations

... Another strategy is having your class embrace the practical solution of "Googling it." Teach learners how to search appropriately, reference documentation, and assess code from crowdsourced forums like StackOverflow with a critical eye [19]. You can also explain that professionals check documentation regularly, as packages are updated frequently. ...
... Users can rely on these features to gauge the quality of both questions and answers. Thus, to understand the popularity of the React-related questions shared on SO, we further empirically investigate the question scores and the number of answers, comments, favorites, and views [23] for each React-related post. To facilitate this analysis, we break it down into three sub-questions. ...
Article
React is a JavaScript library for developing user interfaces for single-page applications. Developers use React to build large web apps that let users update data without refreshing the page. Despite its benefits, many developers face React-related issues during implementation. To find solutions, developers commonly share and discuss their issues on Stack Overflow (SO). Although recent studies have demonstrated the benefits of using React in web development, trends in users’ attention remain unknown. In this study, we conducted a preliminary empirical study of React library-related questions shared on SO. We applied an exploratory data analysis technique to investigate the distribution of problems shared by developers. The findings reveal that although the number of React-related topics on SO has risen over time, community interest is beginning to decrease. This is shown by the increase in unsolved questions and the decrease in the number of views per year. Regarding React users’ activity, most are more active in providing answers than in commenting or providing scores. The findings of this study may point to future research on approaches that help the React community overcome issues when using React in the early phases.
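To make the popularity measures above concrete, here is a minimal Python sketch that groups React-tagged questions by year and summarises score, answers, comments, favorites, views and the share of unsolved questions. The `Post` record, its field names, and the rule that a question counts as "unsolved" when it has no accepted answer are all assumptions for illustration; they are not the study's schema or definitions.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Post:
    # Hypothetical record for one React-tagged question; field names are
    # assumptions, loosely modelled on Stack Exchange data dump columns.
    year: int
    score: int
    answer_count: int
    comment_count: int
    favorite_count: int
    view_count: int
    has_accepted_answer: bool


def yearly_popularity(posts):
    """Group posts by creation year and summarise their popularity signals."""
    by_year = defaultdict(list)
    for post in posts:
        by_year[post.year].append(post)

    summary = {}
    for year in sorted(by_year):
        group = by_year[year]
        summary[year] = {
            "questions": len(group),
            "mean_score": mean(p.score for p in group),
            "mean_answers": mean(p.answer_count for p in group),
            "mean_comments": mean(p.comment_count for p in group),
            "mean_favorites": mean(p.favorite_count for p in group),
            "total_views": sum(p.view_count for p in group),
            # Assumption: "unsolved" means the question has no accepted answer.
            "unsolved_ratio": sum(not p.has_accepted_answer for p in group) / len(group),
        }
    return summary
```

A falling `total_views` alongside a rising `unsolved_ratio` across years would be the kind of signal the abstract describes as declining community interest.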
... Existing research shows that developers leverage Stack Overflow code snippets to build their own programs 13,8,1. In this work, we intend to study how developers reuse code from Stack Overflow to create Jupyter Notebooks. ...
... By parsing the Notebook files, we are able to extract the code snippets from the Notebook.

    {
      "cell_type" : "code",
      "execution_count": 1, # integer or null
      "metadata" : {
          "collapsed" : True, # whether the output of the cell is collapsed
          "scrolled": False, # any of true, false or "auto"
      },
      "source" : "[some multi-line code]",
      "outputs": [{
          # list of output dicts (described below)
          "output_type": "stream",
          ...
      }],
    }

Finally, we obtain 3,758,196 code snippets. ...
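The parsing step can be sketched in a few lines of Python with the standard `json` module, assuming notebooks in the nbformat 4 layout (a top-level `"cells"` list whose entries look like the cell shown above). This is an illustrative sketch, not the authors' extraction code; skipping empty cells is an added assumption.

```python
import json


def extract_code_snippets(notebook_text):
    """Return the source of every non-empty code cell in an nbformat-4 notebook."""
    nb = json.loads(notebook_text)
    snippets = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # ignore markdown and raw cells
        source = cell.get("source", "")
        # nbformat allows "source" to be a single string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():  # assumption: empty cells are not snippets
            snippets.append(source)
    return snippets
```

Applied to a `.ipynb` file read as text, this returns one string per code cell.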
... Ragkhitwetsagul et al. 8 conducted a large-scale empirical study of online code clones between Stack Overflow and 111 Java open source projects to understand: (1) code clone practices; (2) code clone patterns; (3) outdated code clones; and (4) software licensing violations. Meldrum et al. 13 conducted a large-scale study of the quality of Stack Overflow code snippets. Rahman et al. 27 discussed insecure coding practices in Python code on Stack Overflow. ...
Preprint
Jupyter Notebook is a popular tool among data analysts and scientists for working with data. It provides a way to combine code, documentation, and visualizations in a single, interactive environment, facilitating code reuse. While code reuse can improve programming efficiency, it can also decrease readability, security, and overall performance. We conduct a large-scale exploratory study of code reuse practices in the Jupyter Notebook development community on the Stack Overflow platform to understand the potential negative impacts of code reuse. Our findings identified 1,097,470 Jupyter Notebook clone pairs that reuse Stack Overflow code snippets, and the average code snippet has 7.91 code quality violations. Through our research, we gain insight into the reasons behind Jupyter Notebook developers’ decision to reuse code and the potential drawbacks of this practice.
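The excerpts here do not say which clone detector produced the clone pairs, so the following is only a sketch of one common token-based approach, similar in spirit to SourcererCC's overlap measure: two fragments count as a clone pair when their shared tokens reach a threshold fraction of the larger fragment. The tokenizer regex and the 0.8 threshold are assumptions.

```python
import re
from collections import Counter

# Identifiers, integer literals, or any other single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokens(code):
    return TOKEN_RE.findall(code)


def overlap_similarity(a, b):
    """|A ∩ B| / max(|A|, |B|), computed over token multisets."""
    ta, tb = Counter(tokens(a)), Counter(tokens(b))
    if not ta or not tb:
        return 0.0
    shared = sum((ta & tb).values())
    return shared / max(sum(ta.values()), sum(tb.values()))


def find_clone_pairs(so_snippets, notebook_cells, threshold=0.8):
    """Return (snippet index, cell index) pairs whose similarity meets the threshold."""
    pairs = []
    for i, snippet in enumerate(so_snippets):
        for j, cell in enumerate(notebook_cells):
            if overlap_similarity(snippet, cell) >= threshold:
                pairs.append((i, j))
    return pairs
```

The all-pairs loop is quadratic. At the scale of millions of snippets, real detectors add indexing and filtering heuristics so that most pairs are never compared.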
... The second study focused on the quality characteristics researchers commonly use to study the quality of code snippets, in light of the increasing use of code from online collaborative platforms during software development. We only focused on Stack Overflow (https://stackoverflow.com/) in this phase because it is the most popular community question and answer portal developers use (Meldrum et al., 2020a). Third, we investigated practitioners' views on code snippets by conducting semi-structured interviews to gain rich and detailed knowledge of how practitioners judge the quality of code snippets. ...
... These refer to the insecurities or weaknesses found in Stack Overflow code snippets (mainly in answer code). They can result from the following: code injection (Chen et al., 2019a; Rahman et al., 2019a; Zhang et al., 2021), lack of education in addressing security issues (Acar et al., 2016), complex APIs and extensive code snippets (Meldrum et al., 2020a; Meng et al., 2018; Ye et al., 2018), and prioritisation of other requirements over security (Acar et al., 2016; Meng et al., 2018; Zhang et al., 2021). ... Stack Overflow code snippets could pose serious security threats to real-world software systems such as Android apps (Acar et al., 2016; Chen et al., 2019a; Ye et al., 2018). Acar et al. (2016) found that when the security implications of insecure code snippets are not addressed, developers may copy and paste functional but insecure solutions that circumvent existing security measures without realising it. ...
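Code injection is the first cause listed above. A minimal illustration, assuming Python's built-in `sqlite3` module and a made-up `users` table (not an example from the cited studies), contrasts a snippet that splices user input into SQL with one that uses a bound parameter.

```python
import sqlite3


def find_user_insecure(conn, name):
    # INSECURE: user input becomes part of the SQL text, so a value such as
    # "x' OR '1'='1" rewrites the WHERE clause (SQL injection).
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # SAFER: the "?" placeholder makes sqlite3 bind the value as data,
    # never as SQL syntax.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

payload = "x' OR '1'='1"
print(find_user_insecure(conn, payload))  # every row is returned
print(find_user_safe(conn, payload))      # no row matches the literal string
```

Both functions "work" for ordinary names. That is why a copied answer like the first can look functionally correct while silently weakening security.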
... They claim that reusing code from knowledge-sharing communities boosts productivity and increases project quality. Such a view may be deemed contentious, however, especially considering that Stack Overflow code may have quality concerns (Meldrum et al., 2020a). RQ2b. ...
Article
Context: Over the years, there has been debate about what constitutes software quality and how it should be measured. This controversy has caused uncertainty across the software engineering community, affecting levels of commitment to the many potential determinants of quality among developers. An up-to-date catalogue of software quality views could provide developers with contemporary guidelines and templates. In fact, it is necessary to learn about views on the quality of code on frequently used online collaboration platforms (e.g., Stack Overflow), given that the quality of code snippets can affect the quality of software products developed. If quality models are unsuitable for aiding developers because they lack relevance, developers will hold relaxed or inappropriate views of software quality, thereby lacking awareness of and commitment to such practices.
Objective: We aim to explore differences in interest in quality characteristics across research and practice. We also seek to identify quality characteristics practitioners consider important when judging code snippet quality. First, we examine the literature for quality characteristics used frequently for judging software quality, followed by the quality characteristics commonly used by researchers to study code snippet quality. Finally, we investigate quality characteristics used by practitioners to judge the quality of code snippets.
Methods: We conducted two systematic literature reviews followed by semi-structured interviews of 50 practitioners to address this gap.
Results: The outcomes of the semi-structured interviews revealed that most practitioners judged the quality of code snippets using five quality dimensions: Functionality, Readability, Efficiency, Security and Reliability. However, other dimensions were also considered (i.e., Reusability, Maintainability, Usability, Compatibility and Completeness). This outcome differed from how the researchers judged code snippet quality.
Conclusion: Practitioners today mainly rely on code snippets from online code resources, and specific models or quality characteristics are emphasised based on their need to address distinct concerns (e.g., mobile vs web vs standalone applications, regular vs machine learning applications, or open vs closed source applications). Consequently, software quality models should be adapted for the domain of consideration and not seen as one-size-fits-all. This study will lead to targeted support for various clusters of the software development community.
... These portals are particularly useful as they allow the community to openly critique solutions. While this mechanism is anticipated to help improve code quality, evidence has shown that many faults remain in code available in online portals [8]. Stack Overflow code, in particular, is extensively reused, at times introducing unsuspected vulnerabilities into the systems where such code is copied [7]. ...
... To help remedy faulty code, static analysis is used extensively to understand the quality of code on Stack Overflow and other online portals. For instance, PMD and SpotBugs have provided insights into how closely contributors adhere to code readability, reliability, performance and security rules [4,8]. These tools may help the software engineering community quickly understand the quality of code, and whether or not provided solutions conform to coding standards. ...
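PMD and SpotBugs are Java tools, but the underlying idea, walking a parsed program and reporting rule violations with line numbers, can be sketched in Python with the standard `ast` module. The two rules below (bare `except:` as a reliability issue, `eval`/`exec` calls as a security issue) are illustrative choices, not rules taken from PMD or SpotBugs.

```python
import ast


class RuleChecker(ast.NodeVisitor):
    """Collects (line, message) violations for two illustrative rules."""

    def __init__(self):
        self.violations = []

    def visit_ExceptHandler(self, node):
        # Reliability: a bare "except:" also swallows KeyboardInterrupt/SystemExit.
        if node.type is None:
            self.violations.append((node.lineno, "bare except clause"))
        self.generic_visit(node)

    def visit_Call(self, node):
        # Security: eval/exec on untrusted text enables code injection.
        if isinstance(node.func, ast.Name) and node.func.id in {"eval", "exec"}:
            self.violations.append((node.lineno, f"use of {node.func.id}()"))
        self.generic_visit(node)


def check(source):
    """Parse a snippet and return its violations sorted by line number."""
    checker = RuleChecker()
    checker.visit(ast.parse(source))
    return sorted(checker.violations)
```

Because the snippet must parse first, this style of analysis only works on syntactically complete code, which is one reason studies of this kind filter for compilable snippets.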
... We used the 8,010 snippets provided by [8]. The data were extracted from Stack Overflow for 2014, 2015, and 2016 and were said to represent suitably long, compilable code from answers where a high level of reuse was evident. ...
Preprint
Software developers are increasingly dependent on question and answer portals and blogs for coding solutions. While such interfaces provide useful information, there are concerns that the code hosted there is often incorrect, insecure or incomplete. Previous work indeed detected a range of faults in code provided on Stack Overflow through the use of static analysis. Static analysis may go a long way towards quickly establishing the health of software code available online. In addition, mechanisms that enable rapid automated program improvement may then enhance such code. Accordingly, we present this proof of concept. We use the PMD static analysis tool to detect performance faults in a sample of Stack Overflow Java code snippets, before performing mutations on these snippets using GIN. We then re-analyse the performance faults in these snippets after the GIN mutations. GIN's RandomSampler was used to perform 17,986 unique line and statement patches on 3,034 snippets, and PMD violations were removed in 770 of the patched versions. Our outcomes indicate that static analysis techniques may be combined with automated program improvement methods to enhance publicly available code with very low resource requirements. We discuss our planned research agenda in this regard.
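GIN is a Java toolkit, and the sketch below does not use its API. It is a hypothetical Python illustration of the idea behind random sampling of line-level patches: apply one random edit (delete, swap or copy a line) and keep only variants not seen before. In the workflow described above, each variant would then be re-analysed with PMD to see whether violations disappear.

```python
import random


def delete_line(lines, rng):
    i = rng.randrange(len(lines))
    return lines[:i] + lines[i + 1:]


def swap_lines(lines, rng):
    i, j = rng.sample(range(len(lines)), 2)
    out = list(lines)
    out[i], out[j] = out[j], out[i]
    return out


def copy_line(lines, rng):
    src = rng.randrange(len(lines))
    dst = rng.randrange(len(lines) + 1)  # insertion point, may be the end
    return lines[:dst] + [lines[src]] + lines[dst:]


EDITS = [delete_line, swap_lines, copy_line]


def random_patches(source, n, seed=0, max_attempts=1000):
    """Return up to n distinct single-edit variants of source (hypothetical helper)."""
    lines = source.splitlines()
    if len(lines) < 2:
        raise ValueError("need at least two lines to mutate")
    rng = random.Random(seed)
    seen = {tuple(lines)}  # the unmodified program does not count as a patch
    patches = []
    for _ in range(max_attempts):
        if len(patches) == n:
            break
        variant = rng.choice(EDITS)(lines, rng)
        key = tuple(variant)
        if key not in seen:
            seen.add(key)
            patches.append("\n".join(variant))
    return patches
```

Seeding the generator makes a sampling run reproducible, which matters when comparing static-analysis results before and after patching.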
... For instance, software practitioners also reported occasionally encountering buggy code snippets. Recent studies have shed light on code snippet quality [62]. For example, Fischer et al. [18] showed that 15% of Android applications contain vulnerable code snippets copied from Stack Overflow. ...
Article
Software developers make use of crowdsourcing during development. Beyond learning from others, developers use online portals such as Stack Overflow as a vehicle for collaboration. However, little is known about developers' experiences on such platforms, particularly around problems that are encountered online. Such insights could benefit software developers in terms of recommendations for pitfalls to avoid, ways to exploit crowdsourced knowledge, and the provision of insights to improve online code sharing communities. We interviewed 50 practitioners to fill this gap, and the outcomes show that software developers' use of online portals is targeted, and that such portals are a lifeline to modern software development. Practitioners are helped with code solutions and debugging, often in a very timely fashion. While these experiences are largely positive, practitioners also encounter negative experiences online, some of which could be significantly deleterious to the community. We discuss the implications of these findings, such as creating awareness of the quality and reliability of code snippets, improving code search, code validation, outdated code detection, and attribution of code snippets.
... Second, we carried out a systematic literature review of primary studies on code snippets to provide a deep understanding of what has been done on code snippet quality, in order to identify gaps in this area of research and synthesise relevant quality models. We used Stack Overflow as our case study because it is the most popular community question and answer portal developers use (Meldrum et al., 2020a). Third, we investigated practitioners' views on code snippets to gain rich and detailed knowledge of how they judge the quality of code snippets, and compared academics' and practitioners' views on code snippet quality. ...
... Java is one of the most popular and most researched general-purpose programming languages on Stack Overflow (Ahmad & Cinnéide, 2019; Meldrum et al., 2020a; Treude et al., 2011; Ye et al., 2018). This popularity is partly due to the use of Java for Android application development (Acar et al., 2016; Chen et al., 2019a; Fischer et al., 2017). ...
... Therefore, the SO ecosystem encourages many programmers not only to help each other solve programming questions voluntarily, but also to showcase their problem-solving ability and seek better jobs (Xu et al., 2020). Nevertheless, as its popularity has risen, issues such as duplicate questions (Wang et al., 2020) and the quality of answers on the platform (Meldrum et al., 2020) have greatly affected programmers' browsing experience when searching for answers. ...
Article
Stack Overflow (SO) is one of the largest discussion platforms for programmers to communicate ideas and thoughts on topics such as software development and data analysis. Many programmers actively contribute to this platform and discuss the Python programming language. To better study the topics of Python questions posted on the platform, a text analytics approach incorporating text preprocessing steps and the Latent Dirichlet Allocation (LDA) topic modelling algorithm is proposed. The two main objectives of this study are: to discover and compare the topics of questions about the Python programming language posted on SO from 2008 to 2016, and to analyze highly voted questions about the Python programming language posted on SO from 2008 to 2016 using topic modelling with a suitable number of topics. We find that the topics of Python questions posted on Stack Overflow gradually shifted towards data modelling and analysis from 2008 to 2016. Furthermore, the study shows that choosing a suitable number of topics yields a high coherence score for the topic model in use, which is important for extracting more meaningful topics from the collection of Python questions.
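The abstract does not state which LDA implementation was used; in practice a library such as gensim is common. For illustration only, here is a self-contained sketch of the two stages it names: a simple preprocessing step (lower-casing, word-like tokens, a small hypothetical stop-word list) and LDA fitted by collapsed Gibbs sampling. The hyperparameters `alpha`, `beta` and `iters` are assumed defaults, and coherence scoring is not included.

```python
import random
import re

# Hypothetical stop-word list; real pipelines use a much larger one.
STOPWORDS = {"an", "the", "to", "in", "of", "and", "how", "do", "is", "with", "for", "my", "on"}


def preprocess(text):
    """Lower-case, keep tokens of 2+ characters starting with a letter, drop stop words."""
    tokens = re.findall(r"[a-z][a-z0-9_]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (vocab, topic-word probabilities)."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    wid = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * k for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in range(k)]  # word counts per topic
    nk = [0] * k                       # total tokens per topic
    z = []                             # current topic of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            ndk[d][t] += 1
            nkw[t][wid[w]] += 1
            nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, t = wid[w], z[d][i]
                # Remove the token's current assignment before resampling it.
                ndk[d][t] -= 1
                nkw[t][v] -= 1
                nk[t] -= 1
                weights = [(ndk[d][j] + alpha) * (nkw[j][v] + beta) / (nk[j] + V * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][v] += 1
                nk[t] += 1
    phi = [[(nkw[t][v] + beta) / (nk[t] + V * beta) for v in range(V)] for t in range(k)]
    return vocab, phi
```

Each row of `phi` is a probability distribution over the vocabulary. Listing the highest-probability words per row gives the topic labels that studies like this one interpret.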