Deriving Behavioral Specifications of Industrial Software Components

Kousar Aslam

promotor: prof.dr. Mark van den Brand (TU/e)
co-promotors: dr.ir. Loek Cleophas (TU/e) and dr.ir. Ramon Schiffelers (TU/e and ASML)
Eindhoven University of Technology
Date: 14 June 2021

Summary

High-tech companies are struggling with the maintenance of inherently complex software. To facilitate the maintenance of legacy software, a comprehensive understanding of the software’s behavior is essential. In component-based software engineering, this means completely understanding the behavior of components in relation to their interfaces, i.e., their interface protocols, and preserving this behavior during maintenance of the components. Given the large scale of these software systems, automated techniques are required to infer the interface protocols. In our work, we use the active learning technique for this purpose and make several contributions towards its applicability and scalability in an industrial environment, as discussed below.

During maintenance activities, the external behavior of software components needs to be preserved so that the system as a whole continues to function as before. In Chapter 2, we present a methodology to infer the interface protocols of software components using active learning, together with a framework for applying active learning to industrial software components developed with model-driven engineering (MDE). We performed a large-scale case study, applying active learning to 202 ASML software components, to analyze the scalability of the technique in industry. In line with earlier reports in the literature, we found that of the two phases of the active learning process, i.e., learning and testing, testing is the bottleneck for the scalability of the technique (Chapter 3). Efficient means to improve the testing phase are therefore needed.
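To illustrate why the testing phase can dominate, the following minimal Python sketch (not the framework from Chapter 2) models a component as a Mealy machine and approximates the equivalence check between a hypothesis and the system under learning by bounded exhaustive testing. The names `Mealy` and `bounded_conformance_test` and the toy init/start component are hypothetical. A wrong hypothesis is refuted after a few queries, but a correct one can only be accepted after the whole test budget has been spent.

```python
from itertools import product


class Mealy:
    """Hypothetical deterministic Mealy machine: (state, input) -> (next state, output)."""

    def __init__(self, initial, transitions):
        self.initial = initial
        self.transitions = transitions  # dict: (state, input) -> (next_state, output)

    def run(self, word):
        """Reset to the initial state, feed `word`, and return the output sequence."""
        state, outputs = self.initial, []
        for symbol in word:
            state, out = self.transitions[(state, symbol)]
            outputs.append(out)
        return tuple(outputs)


def bounded_conformance_test(hypothesis, sul, inputs, max_len):
    """Compare hypothesis and SUL on every input word of length 1..max_len.

    Returns (counterexample or None, number of test queries sent to the SUL).
    """
    queries = 0
    for length in range(1, max_len + 1):
        for word in product(inputs, repeat=length):
            queries += 1
            if hypothesis.run(word) != sul.run(word):
                return word, queries
    return None, queries


inputs = ["init", "start"]
# Toy component: "start" only succeeds after "init".
sul = Mealy(0, {
    (0, "init"): (1, "ok"), (0, "start"): (0, "error"),
    (1, "init"): (1, "ok"), (1, "start"): (1, "ok"),
})
# Wrong hypothesis: a single state that always answers "ok".
wrong = Mealy(0, {(0, "init"): (0, "ok"), (0, "start"): (0, "ok")})

print(bounded_conformance_test(wrong, sul, inputs, max_len=10))  # (('start',), 2)
print(bounded_conformance_test(sul, sul, inputs, max_len=10))    # (None, 2046)
```

With m inputs and a depth bound k, this tester sends m + m^2 + ... + m^k queries (2046 for m = 2, k = 10) before it accepts a correct hypothesis, and the final hypothesis of every learning run is such a case. Practical conformance testing methods are far smarter than brute-force enumeration, but their test suites typically also grow quickly with the number of inputs and states.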

When performing active learning, conformance testing techniques are usually used to test the conformance between the hypothesis learned by the learning algorithm and the system under learning. Previous research has aimed to improve the efficiency of these testing techniques. Extending this research, we investigated the impact of using different search strategies to navigate the search space of a single conformance testing technique (Chapter 4), on the same set of industrial software components as used in Chapter 3. The differences in performance among the search strategies suggest that the order in which the search space is navigated matters for the performance of the active learning process. The performance of the learning process can also be improved significantly by combining active and passive learning techniques. We explored this research direction by introducing the idea of incorporating logs and passive learning results into the active learning framework. Again, a thorough industrial evaluation on MDE-based components, discussed in Chapter 5, validates this work.
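Why the navigation order matters can be illustrated with a small sketch: the same bounded test space is explored breadth-first (shortest words first) and depth-first, and the number of queries needed to find a counterexample is counted. The two toy components and all function names are hypothetical, and these two orders are generic illustrations, not the search strategies evaluated in Chapter 4.

```python
from itertools import product


def run(step, initial, word):
    """Reset to `initial`, feed `word` through `step`, return the output sequence."""
    state, outputs = initial, []
    for symbol in word:
        state, out = step(state, symbol)
        outputs.append(out)
    return tuple(outputs)


def bfs_words(inputs, max_len):
    """Breadth-first: all words of length 1, then length 2, ... up to max_len."""
    for length in range(1, max_len + 1):
        yield from product(inputs, repeat=length)


def dfs_words(inputs, max_len, prefix=()):
    """Depth-first: extend the current word as far as possible before trying siblings."""
    for symbol in inputs:
        word = prefix + (symbol,)
        yield word
        if len(word) < max_len:
            yield from dfs_words(inputs, max_len, word)


def find_counterexample(words, hyp_step, sul_step, initial=0):
    """Test words in the given order; return (counterexample or None, queries used)."""
    queries = 0
    for word in words:
        queries += 1
        if run(hyp_step, initial, word) != run(sul_step, initial, word):
            return word, queries
    return None, queries


def hypothesis_step(state, symbol):
    # Current (wrong) hypothesis: one state that always answers "ok".
    return 0, "ok"


def alarm_step(state, symbol):
    # Hypothetical SUL: raises an alarm from the third consecutive "a" onwards.
    if symbol == "a":
        nxt = min(state + 1, 3)
        return nxt, "alarm" if nxt == 3 else "ok"
    return 0, "ok"


def strict_step(state, symbol):
    # Hypothetical SUL: rejects "b" as the very first input.
    if state == 0 and symbol == "b":
        return 1, "error"
    return 1, "ok"


inputs = ["a", "b"]
for name, sul in [("alarm", alarm_step), ("strict", strict_step)]:
    bfs = find_counterexample(bfs_words(inputs, 3), hypothesis_step, sul)
    dfs = find_counterexample(dfs_words(inputs, 3), hypothesis_step, sul)
    print(name, "BFS:", bfs, "DFS:", dfs)
# alarm BFS: (('a', 'a', 'a'), 7) DFS: (('a', 'a', 'a'), 3)
# strict BFS: (('b',), 2) DFS: (('b',), 8)
```

Neither order wins in both cases: depth-first reaches the deep discrepancy in `alarm_step` sooner, while breadth-first reaches the shallow one in `strict_step` sooner, even though both explore exactly the same 14 words.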

Up to this point, our methods for applying active learning have been validated on MDE-based components. The main challenge, however, is to learn unknown (legacy) software components, which may also raise practical problems that are not yet known. To address this, we present a general learning architecture for performing active learning on component-based software operating in a client/server paradigm. We also present an interfacing protocol that systematically describes how the communication between the active learning tool and the system under learning is handled (Chapter 6). We discuss this work in the context of ASML and use it in a case study on learning real ASML software components (Chapter 7).
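As a rough illustration of the role such an interface plays, the sketch below places a hypothetical adapter between a learner and a toy server component. The adapter resets the server before each query, translates abstract input symbols into concrete requests, and abstracts concrete replies into output symbols. All names and message formats (`ToyServer`, `LearnerAdapter`, `OPEN`, `ERR_...`) are assumptions for illustration; this is not the interfacing protocol of Chapter 6, and real components may reply asynchronously, which this synchronous sketch ignores.

```python
class ToyServer:
    """Hypothetical server component: must be opened before it can serve reads."""

    def __init__(self):
        self.is_open = False

    def handle(self, request):
        # Returns a concrete reply string for each concrete request.
        if request == "OPEN":
            if self.is_open:
                return "ERR_ALREADY_OPEN"
            self.is_open = True
            return "ACK"
        if request == "READ":
            return "DATA" if self.is_open else "ERR_NOT_OPEN"
        if request == "CLOSE":
            self.is_open = False
            return "ACK"
        return "ERR_UNKNOWN"


class LearnerAdapter:
    """Hypothetical adapter translating between abstract symbols and concrete calls."""

    INPUT_MAP = {"open": "OPEN", "read": "READ", "close": "CLOSE"}

    def __init__(self, server_factory):
        self.server_factory = server_factory
        self.server = None

    def reset(self):
        # Start every query from a fresh server instance, i.e., the initial state.
        self.server = self.server_factory()

    def step(self, symbol):
        reply = self.server.handle(self.INPUT_MAP[symbol])
        # Abstract the concrete reply: all error codes collapse into one output.
        return "error" if reply.startswith("ERR") else reply.lower()

    def membership_query(self, word):
        """Answer one query from the learner: reset, then feed the word symbol by symbol."""
        self.reset()
        return tuple(self.step(symbol) for symbol in word)


adapter = LearnerAdapter(ToyServer)
print(adapter.membership_query(["read", "open", "read", "open"]))
# ('error', 'ack', 'data', 'error')
print(adapter.membership_query(["open", "close", "read"]))
# ('ack', 'ack', 'error')
```

Because `reset` creates a fresh server for every query, the second query is unaffected by the state left behind by the first; without a reliable reset, a learner would mix up behavior observed in different queries.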

The contributions discussed above give useful insights into the practical application of active learning. The work presented in this thesis also opens up several interesting directions for research in the areas of active learning and reverse engineering (Chapter 8).