Figure 1 - uploaded by Tao Xie
The protocol for java.util.zip.ZipOutputStream in STD

Source publication
Article
Component-based software development has increasingly gained popularity in industry. While correct component usage is critical to successful reuse of components, the expected component usage is rarely specified explicitly. To address this issue, one recent area of research has been to infer specifications of protocols or sequencing constraints usin...

Context in source publication

Context 1
... sequencing constraints on M are conceptually separated into several groups, each of which operates on a particular type of scenario. For example, the ZipOutputStream protocol is divided into two groups, DEFLATED and STORED, corresponding respectively to the left and right portions of Figure 1. Sometimes the separated groups are combined by additional sequencing constraints. ...
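The two groups can be made concrete with a minimal Java sketch against the real java.util.zip API. This is not a reproduction of Figure 1, and the class and method names (ZipProtocolDemo, writeDeflated, writeStored, roundTrip, storedWithoutCrcRejected) are hypothetical. The sketch does reflect how the JDK behaves: a DEFLATED entry can be opened with putNextEntry right away, because the stream computes size and CRC-32 while data is written. A STORED entry must have its size and CRC-32 set on the ZipEntry before putNextEntry, and putNextEntry throws a ZipException otherwise.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class ZipProtocolDemo {
    // DEFLATED group: putNextEntry -> write* -> closeEntry; size and CRC-32 are computed by the stream.
    static void writeDeflated(ZipOutputStream zos, String name, byte[] data) throws IOException {
        ZipEntry entry = new ZipEntry(name);
        entry.setMethod(ZipEntry.DEFLATED);
        zos.putNextEntry(entry);
        zos.write(data);
        zos.closeEntry();
    }

    // STORED group: setSize/setCompressedSize/setCrc must all happen BEFORE putNextEntry.
    static void writeStored(ZipOutputStream zos, String name, byte[] data) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(data);
        ZipEntry entry = new ZipEntry(name);
        entry.setMethod(ZipEntry.STORED);
        entry.setSize(data.length);
        entry.setCompressedSize(data.length);
        entry.setCrc(crc.getValue());
        zos.putNextEntry(entry);
        zos.write(data);
        zos.closeEntry();
    }

    /** Writes one entry of each group, reads the archive back, and returns "name=METHOD" per entry. */
    static List<String> roundTrip() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
                writeDeflated(zos, "a.txt", "hello".getBytes(StandardCharsets.UTF_8));
                writeStored(zos, "b.txt", "world".getBytes(StandardCharsets.UTF_8));
                zos.finish(); // writes the central directory; close() would also call it
            }
            List<String> described = new ArrayList<>();
            try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                for (ZipEntry entry = zis.getNextEntry(); entry != null; entry = zis.getNextEntry()) {
                    String method = entry.getMethod() == ZipEntry.STORED ? "STORED" : "DEFLATED";
                    described.add(entry.getName() + "=" + method);
                }
            }
            return described;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Violates the STORED protocol (CRC-32 never set) and reports whether putNextEntry rejected it. */
    static boolean storedWithoutCrcRejected() {
        try (ZipOutputStream zos = new ZipOutputStream(new ByteArrayOutputStream())) {
            ZipEntry entry = new ZipEntry("bad.txt");
            entry.setMethod(ZipEntry.STORED);
            entry.setSize(3);
            zos.putNextEntry(entry); // expected to throw: CRC-32 missing
            return false;
        } catch (ZipException expected) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The explicit `setMethod(ZipEntry.DEFLATED)` call is optional, since DEFLATED is the stream's default. It is kept so that the two groups read symmetrically.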

Citations

... When behavior models are either absent or inconsistent, inference techniques can be used to extract them [6,7]. FSM inference has received a lot of attention over the past years [7-13]. Surveys on existing work can be found in [7,14]. ...
Article
Finite State Machine (FSM) inference from execution traces has received a lot of attention over the past few years. Various approaches have been explored, each holding different properties for the resulting models, but the lack of standard benchmarks limits the ability to compare the proposed techniques. Evaluation is usually performed on a few case studies, which is useful for assessing the feasibility of the algorithm on particular cases, but fails to demonstrate effectiveness in a broad context. Consequently, understanding the strengths and weaknesses of inference techniques remains a challenging task.
... Indeed, their minimization process relies on an approximation algorithm to support incompletely specified FSMs [105]. For instance, T. Xie [144] uses a k-tail algorithm that merges states from which the possible transitions generate the same future messages (up to a fixed horizon), Prospex uses an extension of the Exbar algorithm [85], and Hsu et al. [67] propose their own offline state-merging algorithm to derive consistent DFAs from a prefix tree acceptor (PTA). All these approaches suffer from scalability issues when applied to large automata, owing to the NP-completeness of the underlying minimization problem. ...
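The k-tail idea can be illustrated with a small Java sketch. It is not the implementation used in [144]. The KTail class is hypothetical, and the sketch assumes one common variant of the definition: the k-tails of a state are all event sequences of length k leaving it, plus shorter sequences that end a trace (marked `$`). The sketch first builds the PTA from the traces, then merges PTA states whose k-tail sets are equal. As in the basic formulation, the merged model may be nondeterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch of basic k-tail state merging over a prefix tree acceptor (PTA). */
final class KTail {
    private final List<Map<String, Integer>> pta = new ArrayList<>(); // state -> (event -> successor)
    private final Set<Integer> finals = new HashSet<>();               // states where some trace ends
    private final int[] classOf;                                      // PTA state -> merged state id
    private final int classCount;
    private final Set<String> edges = new TreeSet<>();                // merged transitions "p -ev-> q"

    KTail(List<List<String>> traces, int k) {
        pta.add(new TreeMap<>()); // state 0 is the PTA root
        for (List<String> trace : traces) {
            int s = 0;
            for (String ev : trace) {
                Integer next = pta.get(s).get(ev);
                if (next == null) { // extend the tree with a fresh state
                    next = pta.size();
                    pta.add(new TreeMap<>());
                    pta.get(s).put(ev, next);
                }
                s = next;
            }
            finals.add(s);
        }
        // States with identical k-tail sets fall into the same equivalence class.
        classOf = new int[pta.size()];
        Map<Set<String>, Integer> ids = new HashMap<>();
        for (int s = 0; s < pta.size(); s++) {
            Set<String> tails = new HashSet<>();
            collect(s, k, "", tails);
            Integer id = ids.get(tails);
            if (id == null) {
                id = ids.size();
                ids.put(tails, id);
            }
            classOf[s] = id;
        }
        classCount = ids.size();
        // Keep every PTA transition, redirected to the merged states (may become nondeterministic).
        for (int s = 0; s < pta.size(); s++) {
            for (Map.Entry<String, Integer> t : pta.get(s).entrySet()) {
                edges.add(classOf[s] + " -" + t.getKey() + "-> " + classOf[t.getValue()]);
            }
        }
    }

    /** Adds every path of length k from s, plus shorter paths that reach a trace end ("$"). */
    private void collect(int s, int k, String prefix, Set<String> out) {
        if (finals.contains(s)) {
            out.add(prefix + "$");
        }
        if (k == 0) {
            out.add(prefix);
            return;
        }
        for (Map.Entry<String, Integer> t : pta.get(s).entrySet()) {
            collect(t.getValue(), k - 1, prefix + t.getKey() + " ", out);
        }
    }

    int stateCount() { return classCount; }

    Set<String> edges() { return edges; }
}
```

Consider k = 1 and the three traces `open write close`, `open write write close`, and `open write write write close`. The states reached after one and after two `write` calls have the same k-tails, so they are merged. This yields a five-state model with a `write` self-loop. A larger k merges fewer states and therefore generalizes less. The sketch also makes the cost visible: every state's k-tails are enumerated, and the number of tails can grow exponentially in k.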
Article
This thesis presents a practical approach for the automatic reverse engineering of undocumented communication protocols. Current work in the field of automated protocol reverse engineering either infers incomplete protocol specifications or requires too many stimulations of the targeted implementation, with the risk of being defeated by counter-inference techniques. We propose to tackle these issues by leveraging the semantics of the protocol to improve the quality, the speed and the stealthiness of the inference process. This work covers the two main aspects of protocol reverse engineering: the inference of its syntactic definition and of its grammatical definition. We propose an open-source tool, called Netzob, that implements our work to help security experts counter the latest cyber threats. We claim Netzob is the most advanced published tool that tackles issues related to the reverse engineering and the simulation of undocumented protocols.
... However, the main difference between their use of this algorithm and ours lies in the nature of the input traces and, consequently, of the obtained LTSes. Indeed, while the LTSes we derive with k-tail show the interactions among the system's objects, the models used in [17]–[19] describe a single component through its interaction with a user via its graphical interface. We consider our present results a further validation of the k-tail family of algorithms in the context of reverse engineering of sequence diagrams. ...
Conference Paper
The reverse engineering of behavioral models consists in extracting high-level models that help understand the behavior of existing software systems. In the context of reverse engineering of sequence diagrams, most approaches strongly depend on the static analysis and instrumentation of the source code to produce correct diagrams that take into account control flow structures such as alternative blocks ("if"s) and repeated blocks ("loop"s). This approach is not possible with systems for which no source code is available anymore (e.g. some legacy systems). In this paper, we propose an approach for the reverse engineering of sequence diagrams from the analysis of execution traces produced dynamically by an object-oriented application. Our approach is fully based on dynamic analysis and reuses the k-tail merging algorithm to produce a Labeled Transition System (LTS) that merges the collected traces. This LTS is then translated into a sequence diagram which contains alternatives and loops. A prototype of this approach has been tested with a real world application that has been developed independently from the present work. Our results show that this approach can produce sequence diagrams in reasonable time and suggest that these diagrams are helpful in understanding the behavior of the underlying application.
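The final step, translating the merged LTS into a sequence diagram with alternatives and loops, can be approximated as follows. This is a hypothetical sketch, not the paper's translation. The LtsToSequence class, the PlantUML-like output, and the `Caller->Callee: method` label format are all assumptions. The sketch turns self-loops into `loop` fragments and branching states into `alt`/`else` fragments. It assumes the LTS has no cycles other than self-loops, since the recursion would not terminate otherwise, and it simply duplicates branches that rejoin.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: render an LTS (self-loops as its only cycles) as sequence-diagram text. */
final class LtsToSequence {
    private static final class Edge {
        final String label;
        final int to;

        Edge(String label, int to) {
            this.label = label;
            this.to = to;
        }
    }

    private final Map<Integer, List<Edge>> out = new TreeMap<>(); // state -> outgoing edges, in insertion order

    LtsToSequence edge(int from, String label, int to) {
        out.computeIfAbsent(from, s -> new ArrayList<>()).add(new Edge(label, to));
        return this;
    }

    String render(int initial) {
        StringBuilder sb = new StringBuilder("@startuml\n");
        walk(initial, "", sb);
        return sb.append("@enduml\n").toString();
    }

    // Depth-first walk; a cycle other than a self-loop would recurse forever.
    private void walk(int state, String indent, StringBuilder sb) {
        List<Edge> forward = new ArrayList<>();
        for (Edge e : out.getOrDefault(state, Collections.emptyList())) {
            if (e.to == state) { // self-loop -> loop fragment
                sb.append(indent).append("loop\n")
                  .append(indent).append("  ").append(e.label).append('\n')
                  .append(indent).append("end\n");
            } else {
                forward.add(e);
            }
        }
        if (forward.size() == 1) { // single successor -> plain message
            Edge e = forward.get(0);
            sb.append(indent).append(e.label).append('\n');
            walk(e.to, indent, sb);
        } else if (forward.size() > 1) { // branching state -> alt/else fragment
            for (int i = 0; i < forward.size(); i++) {
                Edge e = forward.get(i);
                sb.append(indent).append(i == 0 ? "alt" : "else").append('\n');
                sb.append(indent).append("  ").append(e.label).append('\n');
                walk(e.to, indent + "  ", sb);
            }
            sb.append(indent).append("end\n");
        }
    }
}
```

Consider an LTS with a `write` self-loop between `putNextEntry` and `closeEntry`, followed by a choice between `finish` and `close`. It renders as a `loop` fragment around `write`, followed by an `alt`/`else` fragment for the final call.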
... In contrast to natural language, formal specifications are less ambiguous and can be processed automatically. To evaluate whether our inference technique produces FSMs that document correct API usage, we tried to recover existing documentation given in textual form (inspired by Xie [21]). A widely used standard reference provides the following documentation on using Java's ZipFile class [18]: "A ZipFile can be created by specifying the ZIP file to be read either as String filename or as a File object. . . ." ...
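The quoted ZipFile documentation maps directly onto Java code. In the sketch below, the ZipFileUsage class, the temporary archive, and the entry name `readme.txt` are hypothetical. The sketch opens the same archive through both constructors the documentation mentions, `ZipFile(String)` and `ZipFile(File)`, reads one entry, and closes both handles.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

final class ZipFileUsage {
    // Creates a small archive to read back (temporary file, deleted when the JVM exits).
    static File sampleArchive() throws IOException {
        File f = File.createTempFile("sample", ".zip");
        f.deleteOnExit();
        try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(f))) {
            zos.putNextEntry(new ZipEntry("readme.txt"));
            zos.write("hi".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return f;
    }

    /** Opens the archive both ways the documentation allows; returns the entry's text, or null if absent. */
    static String readEntry(String entryName) {
        try {
            File f = sampleArchive();
            try (ZipFile byName = new ZipFile(f.getPath()); // ZIP file given as a String filename
                 ZipFile byFile = new ZipFile(f)) {           // ... or as a File object
                if (byName.size() != byFile.size()) {
                    throw new IllegalStateException("both handles should see the same entries");
                }
                ZipEntry entry = byFile.getEntry(entryName);
                if (entry == null) {
                    return null;
                }
                try (InputStream in = byFile.getInputStream(entry)) {
                    return new String(in.readAllBytes(), StandardCharsets.UTF_8); // Java 9+
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The implied usage protocol is a constructor call, then any number of `getEntry`/`entries`/`getInputStream` calls, then `close`. In current JDK implementations, calling these methods after `close()` fails with an IllegalStateException. This is the kind of sequencing constraint an inferred FSM is expected to recover.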
Conference Paper
Formal specifications are used to identify programming errors, verify the correctness of programs, and as documentation. Unfortunately, producing them is error-prone and time-consuming, so they are rarely used in practice. Inferring specifications from a running application is a promising solution. However, to be practical, such an approach requires special techniques to treat large amounts of runtime data. We present a scalable dynamic analysis that infers specifications of correct method call sequences on multiple related objects. It preprocesses method traces to identify small sets of related objects and method calls which can be analyzed separately. We implemented our approach and applied the analysis to eleven real-world applications and more than 240 million runtime events. The experiments show the scalability of our approach. Moreover, the generated specifications describe correct and typical behavior, and match existing API usage documentation.
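The preprocessing step, which splits a large trace into small sets of related objects that can be analyzed separately, can be sketched in Java as follows. The grouping criterion is an assumption made for illustration and is not necessarily the paper's: two objects are related if one is ever passed as an argument to a method of the other. The sketch implements this with union-find. The TraceSplitter and Event names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: split a method trace into independent sub-traces of related objects. */
final class TraceSplitter {
    /** One runtime event: a method invoked on a receiver object, with object-valued arguments. */
    static final class Event {
        final String receiver;
        final String method;
        final List<String> args;

        Event(String receiver, String method, String... args) {
            this.receiver = receiver;
            this.method = method;
            this.args = Arrays.asList(args);
        }
    }

    private final Map<String, String> parent = new HashMap<>(); // union-find forest over object ids

    private String find(String x) {
        parent.putIfAbsent(x, x);
        String root = parent.get(x);
        if (!root.equals(x)) {
            root = find(root);
            parent.put(x, root); // path compression
        }
        return root;
    }

    private void union(String a, String b) {
        parent.put(find(a), find(b));
    }

    /** Returns one sub-trace per group of related objects, ordered by each group's first event. */
    List<List<String>> split(List<Event> trace) {
        for (Event e : trace) {
            find(e.receiver); // register objects that never meet any other object
            for (String arg : e.args) {
                union(e.receiver, arg);
            }
        }
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Event e : trace) { // second pass keeps the original event order inside each group
            groups.computeIfAbsent(find(e.receiver), k -> new ArrayList<>())
                  .add(e.receiver + "." + e.method);
        }
        return new ArrayList<>(groups.values());
    }
}
```

Suppose a trace interleaves calls on `zos1` and `entry1`, which are linked because `entry1` is passed to `zos1.putNextEntry`, with calls on an unrelated `fos2`. It splits into two sub-traces that a specification miner can process independently.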
Article
A trace is a record of the execution of a computer program, showing the sequence of operations executed. A trace may be obtained through static or dynamic analysis. An object trace contains only those operations that relate to a particular object. Traces can be very large for longer system executions. Moreover, they lack structure because they do not show the control dependencies and completely unfold loops. Object process graphs are a finite, concise description of dynamic object traces. They offer the advantage of representing control dependencies and loops explicitly. This article describes a new technique to extract object process graphs through dynamic analysis and discusses several applications, in particular program understanding and protocol recovery. A case study is described that illustrates and demonstrates use and feasibility of the technique. Finally, statically and dynamically derived object process graphs are compared.
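The object-trace definition, and the reason a finite graph can summarize an unbounded trace, can be illustrated with a deliberately simple Java sketch. It projects a program trace onto one object and folds the result into a graph of successive operations. This successor graph is a much weaker stand-in for an object process graph, not the article's technique. The ObjectTraceDemo class, the `object.method` event format, and the synthetic `start`/`end` nodes are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: object-trace projection plus a naive successor graph. */
final class ObjectTraceDemo {
    /** Keeps only events "object.method" whose receiver is the given object. */
    static List<String> objectTrace(List<String> trace, String object) {
        List<String> result = new ArrayList<>();
        for (String event : trace) {
            int dot = event.indexOf('.');
            if (dot > 0 && event.substring(0, dot).equals(object)) {
                result.add(event.substring(dot + 1));
            }
        }
        return result;
    }

    /** Folds an object trace into edges "a -> b" between consecutive operations;
     *  repeated operations collapse into cycles, so the graph stays finite. */
    static Set<String> successorGraph(List<String> objectTrace) {
        Set<String> edges = new LinkedHashSet<>();
        String previous = "start";
        for (String op : objectTrace) {
            edges.add(previous + " -> " + op);
            previous = op;
        }
        edges.add(previous + " -> end");
        return edges;
    }
}
```

This graph loses what an object process graph keeps, such as control dependencies. However, it shows the folding effect: three consecutive `read` calls on one object collapse into a single `read -> read` edge, however long the loop ran.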