Figure 4 - uploaded by Hongyu Zhang
Mining sample code using Bing  

Source publication
Conference Paper
Full-text available
In programming practice, developers often need sample code in order to learn how to solve a programming-related problem, for example how to reuse an Application Programming Interface (API) of a large-scale software library or how to implement a certain functionality. We believe that previously written code can help developers understand how other...

Similar publications

Conference Paper
Full-text available
Lehman’s well-known laws of software evolution have existed since the early 1980s. Although they have been nuanced, augmented and discussed many times, software and software development practices have changed dramatically since then, not least due to the rise and popularity of open source software (OSS). OSS is written collaborativel...

Citations

... For instance, the Creative Help system assists in creative writing by automatically proposing new sentences within a story (Roemmele and Gordon, 2018). In coding, the Bing Developer Assistant (BDA) system provides code suggestions during programming by recommending sample API code from public software repositories (Zhang et al., 2016). Moreover, some systems focus specifically on the review phase to support the writing process. ...
Conference Paper
Full-text available
Advances in natural language processing (NLP) and machine learning (ML) have resulted in numerous designs for learning systems in the IS community, which offer learners intelligent feedback during the writing process. Despite the increasing interest in research on learning systems for writing skills, empirical knowledge about the impacts of intelligent feedback in different phases of cognitive writing remains limited. Therefore, my objective is to explore the effects of intelligent feedback on students' writing skills. I center my investigation on the cognitive writing model, examining how intelligent feedback influences student outcomes. Through a field experiment involving 101 students, I demonstrate the positive effects of intelligent feedback on students' writing skills across all stages of writing (planning, translating (writing), and revision). In addition, I was able to gain initial insights into the phases of writing in which intelligent feedback is most effective.
... Another example can be found in the literature that supports writing code. The Bing Developer Assistant (BDA) system gives the user suggestions for lines of code during programming by recommending sample API code from public software repositories (Zhang et al. 2016). Other systems focus mainly on the review phase in supporting the writing process. ...
... Users can access the system from any device with an internet connection and can collaborate with others in real-time. Inspired by research in the field of AI-based information systems, we have added the characteristics Mobile Application (Imran 2022) and Plugins and Extensions (Zhang et al. 2016). Mobile Applications are designed to run on smartphones and tablets. ...
Conference Paper
Full-text available
In the field of natural language processing (NLP), advances in transformer architectures and large-scale language models have led to a plethora of designs and research on a new class of information systems (IS) called writing support systems, which help users plan, write, and revise their texts. Despite the growing interest in writing support systems in research, there is still little shared knowledge about the different design elements of writing support systems. Our goal is, therefore, to develop a taxonomy to classify writing support systems into three main categories (technology, task/structure, and user). We evaluated and refined our taxonomy with seven interviewees with domain expertise, identified three clusters in the reviewed literature, and derived five archetypes of writing support system applications based on our categorization. Finally, we formulate a new research agenda to guide researchers in the development and evaluation of writing support systems.
... Apart from GUI search approaches, retrieving code through NL queries has been studied extensively in research before (McMillan et al. 2011; Cambronero et al. 2019; Gu et al. 2018; Lv et al. 2015; Zhang et al. 2016). These approaches range from the application of traditional IR algorithms to specifically developed semantic Deep Learning architectures to find relevant program code for a given NL query. ...
Article
Full-text available
Rapid GUI prototyping has evolved into a widely applied technique in early stages of software development to facilitate the clarification and refinement of requirements. High-fidelity GUI prototyping in particular has been shown to enable productive discussions with customers and mitigate potential misunderstandings; however, high-fidelity GUI prototypes are expensive and time-consuming to develop and require experience to create. In this work, we present RaWi, a data-driven GUI prototyping approach that effectively retrieves GUIs for reuse from a large-scale semi-automatically created GUI repository for mobile apps on the basis of Natural Language (NL) searches to facilitate GUI prototyping and improve its productivity by leveraging the vast GUI prototyping knowledge embodied in the repository. Retrieved GUIs can directly be reused and adapted in the graphical editor of RaWi. Moreover, we present a comprehensive evaluation methodology to enable (i) the systematic evaluation of NL-based GUI ranking methods through a novel high-quality gold standard and conduct an in-depth evaluation of traditional IR and state-of-the-art BERT-based models for GUI ranking, and (ii) the assessment of GUI prototyping productivity accompanied by an extensive user study in a practical GUI prototyping environment.
... This approach is ubiquitous (arising as early as 1985 [1]), appearing in IDEs, purpose-specific editors like Jupyter [58] or Excel [60], as well as specific domains such as exploratory data analysis [50], data visualization [78], and computational notebooks [49]. This strategy might take the visual form of an autocomplete (as in IntelliSense [59] or Calcite [61]) or through UIs with search-like signifiers like a query bar (as in Blueprint [11] or Bing Developer Assistant [97]). The interface for asking for a suggestion (and therein specifying intent and context) can involve caret position [11,59,61,66], code comments [95], partial implementations [74], integrated examples [67], or explicit querying. ...
Preprint
Full-text available
AI-powered code assistants, such as Copilot, are quickly becoming a ubiquitous component of contemporary coding contexts. Among these environments, computational notebooks, such as Jupyter, are of particular interest as they provide rich interface affordances that interleave code and output in a manner that allows for both exploratory and presentational work. Despite their popularity, little is known about the appropriate design of code assistants in notebooks. We investigate the potential of code assistants in computational notebooks by creating a design space (reified from a survey of extant tools) and through an interview-design study (with 15 practicing data scientists). Through this work, we identify challenges and opportunities for future systems in this space, such as the value of disambiguation for tasks like data visualization, the potential of tightly scoped domain-specific tools (like linters), and the importance of polite assistants.
... Wang uses topic-enhanced dependence graphs [42]. Code recommendation based on source code has also been studied at the fragment [19,38,45] and component level [10]. ...
Preprint
Full-text available
We describe an intelligent assistant based on mining existing software repositories to help the developer interactively create checkable specifications of code. To be most useful we apply this at the subsystem level, that is chunks of code of 1000-10000 lines that can be standalone or integrated into an existing application to provide additional functionality or capabilities. The resultant specifications include both a syntactic description of what should be written and a semantic specification of what it should do, initially in the form of test cases. The generated specification is designed to be used for automatic code generation using various technologies that have been proposed including machine learning, code search, and program synthesis. Our research goal is to enable these technologies to be used effectively for creating subsystems without requiring the developer to write detailed specifications from scratch.
... The Bing Developer Assistant (Y. Wei et al., 2015; Zhang et al., 2016) (also referred to as Bing Code Search) was an experimental extension for Visual Studio initially released in 2015. It enabled an in-IDE, identifier-aware search for code snippets from forums such as Stack Overflow. ...
Preprint
Full-text available
Large language models, such as OpenAI's Codex and DeepMind's AlphaCode, can generate code to solve a variety of problems expressed in natural language. This technology has already been commercialised in at least one widely-used programming editor extension: GitHub Copilot. In this paper, we explore how programming with large language models (LLM-assisted programming) is similar to, and differs from, prior conceptualisations of programmer assistance. We draw upon publicly available experience reports of LLM-assisted programming, as well as prior usability and design studies. We find that while LLM-assisted programming shares some properties of compilation, pair programming, and programming via search and reuse, there are fundamental differences both in the technical possibilities as well as the practical experience. Thus, LLM-assisted programming ought to be viewed as a new way of programming with its own distinct properties and challenges. Finally, we draw upon observations from a user study in which non-expert end user programmers use LLM-assisted tools for solving data tasks in spreadsheets. We discuss the issues that might arise, and open research challenges, in applying large language models to end-user programming, particularly with users who have little or no programming expertise.
... Reusing the outcomes of previous projects could increase the efficiency of software development. Although it is not clear how general these benefits could be, some researchers (e.g., [50][51][52]) think that source code reuse could improve programmer productivity. In a study conducted by Selby [53], the findings show that module reuse could reduce the average "total development time per source line" from 1.089 man hours to 0.047 man hours. ...
Article
Full-text available
Traceability allows engineers to trace and monitor the relationships between software artifacts. Monitoring these relationships is vital to many software engineering activities such as software understanding and reuse. Grasping these relationships is studied in the framework of Requirement Traceability Recovery (RTR). RTR is vital to software reuse as it allows the identification and comparison of requirements of new and existing systems, and hence the reuse of software system components. Due to the difficulties in recovering the traceability links manually, only a few software development processes take the monitoring of these relationships fully into account. Many of the attempts to automate the RTR task that have enjoyed some success are based on methods from the field of information retrieval. However, these methods only concentrate on calculating the textual similarity between various software artifacts and do not take into account other properties of the artifacts. In this paper, we propose a search-based RTR approach using genetic algorithms that relies not only on semantic similarity between software artifacts, but also takes into account the history of reuse of the artifacts, and incorporates knowledge into RTR in the form of user (designer/developer) feedback. Experimental results show that the approach is promising.
... Researchers have devoted considerable effort to offering fundamental recommendation features as a primary recommendation for modern IDEs [38], [39], [40]. In recent years, to further improve the efficiency of developers, advanced recommendation-based research works have been emerging [4], [41], [42], [43], [44], [45], [46], [47]. As another example, Gu et al. [48] present a graph kernel-based approach to the selection of API usage examples by representing source code as object usage graphs. ...
Article
Android developers are often faced with the need to learn how to use different APIs suitable for their projects. Automated API recommendation approaches have been invented to help fill this gap, and these have been demonstrated to be useful to some extent. Unfortunately, most state-of-the-art works are not proposed for Android developers, and the ones dedicated to Android app development often suffer from high redundancy and poor run-time performance, or do not target the problem of recommending API usage patterns. To address this gap we propose to the community a new tool, namely APIMatchmaker, to recommend API usages by learning directly from similar real-world Android apps. Unlike existing recommendation approaches, which leverage a single context to find similar projects, we innovatively introduce a multi-dimensional, context-aware, collaborative filtering approach to better achieve the purpose. Specifically, in addition to code similarity, we also take app descriptions (or topics) into consideration to ensure that similar apps also provide similar functions. We evaluate APIMatchmaker on a large number of real-world Android apps and observe that APIMatchmaker yields a high success rate in recommending APIs for Android apps under development, and it is also able to outperform the state-of-the-art.
... The main idea behind these projects is to automatically generate code from user intent expressed in natural language. Some of the developed tools and prototypes include the PSI Program Synthesizer (1976) [43], the Programmer's Apprentice (1988) [44], Strathcona (2005) [45,46], Prospector (2005) [47], ParseWeb (2007) [48], MAPO (2009) [49], Code Suggestion Plugin for Eclipse (2012) [50], SLANG Code Completer (2014) [51], Bing Developer Assistant (2015) [52,53], A Programmer's Apprentice Again (2015) [54], and Amanuensis (2018) [55]. One common approach to develop these tools is to re-use large corpora of already written code available either on the web (such as public code repositories, open-source projects, code search engines, and developers' community platforms), or in the local repositories, libraries, and APIs of a programming language. ...
... One common approach to develop these tools is to re-use large corpora of already written code available either on the web (such as public code repositories, open-source projects, code search engines, and developers' community platforms), or in the local repositories, libraries, and APIs of a programming language. Some of the built tools (such as Strathcona [45], Prospector [47], ParseWeb [48], MAPO [49], and Bing Developer Assistant [52,53]) search and mine the available corpora to find code fragments relevant to a programmer's query, rank the results and return the top answers to the programmer. Others (such as Code Suggestion Plugin for Eclipse [50] and SLANG code completer [51]) use the available code corpora for statistical purposes and suggest code words and lines which frequently appear together. ...
Article
Full-text available
With the rise in initiatives such as software ecosystems and Internet of Things (IoT), developing web Application Programming Interfaces (web APIs) has become an increasingly common practice. One main concern in developing web APIs is that they expose back-end systems and data toward clients. This exposure threatens critical non-functional requirements, such as the security of back-end systems, the performance of provided services, and the privacy of communications with clients. Although dealing with non-functional requirements during software design has been long studied, there is still no framework to specifically assist software developers in addressing these requirements in web APIs. In this paper, we introduce Rational API Designer (RAPID), an open-source assistant that advises on designing non-functional requirements in the architecture of web APIs. We have equipped RAPID with a broad range of expert knowledge about API design, systematically collected and extracted from the literature. The API design knowledge has been encoded as a set of 156 rules using the Non-Functional Requirements (NFR) multi-valued logic, a formal framework commonly used to describe non-functional and functional requirements of software systems. RAPID uses the encoded knowledge in a stepwise inference procedure to proceed from a given requirement, through a set of design alternatives, to a final recommendation for a given API design specification. Seven experienced software engineers have blindly evaluated the accuracy of RAPID’s consultations over seven different cases of web API design and on providing design guidelines for thirty design questions. The results of the evaluation show that RAPID’s recommendations meet acceptable standards of the majority of the evaluators 73.3% of the time. Moreover, analysis of the evaluators’ comments suggests that more than one-third of the unacceptable ratings (33.8%) given to RAPID’s answers are due to valid but incomplete design guidelines. We thus expect that the accuracy of the consultations will increase as RAPID’s knowledge of API design is extended and refined.
... Other approaches in the area of API recommendations focus not on recommending methods but on, for example, code snippets [15,34,35,45] or parameters [14]. ...