Efficient Learning and Analysis of System Behavior

Jeroen Meijer

first promotor: prof.dr. J.C. van de Pol (UT)
second promotor: prof.dr. M.I.A. Stoelinga (UT en RU)
University of Twente
Date: 20 September 2019
Thesis: PDF

Summary

In this thesis, we present techniques for more efficient learning and analysis of system behavior. The first part covers novel algorithms and tooling for testing systems based on active automata learning and Linear-time Temporal Logic (LTL) model checking, also called Learning-Based Testing (LBT). Next, we provide an improved learning algorithm that is able to deal with huge alphabets. These are commonly seen in large-scale industrial systems where input symbols contain data parameters. In the second part we discuss improvements for analyzing formal system specifications. We start out by looking at separated read, write and copy dependencies for symbolic model checking to speed up the verification of these specifications. Then, we show that bandwidth reduction techniques, originally designed for sparse matrix solvers, are very capable at reducing the memory footprint of the specifications’ symbolic state space. Implementations of the presented algorithms are subjected to case studies and rigorous experimentation with scientific software competitions. Possible future improvements to these algorithms are discussed as well.

Part 1: Learning-Based Testing

The first contribution involves a design and implementation for LBT in the LearnLib, which is thoroughly documented and made freely available in the public domain. The LearnLib is a library that can be used to automatically infer the behavior of systems by means of active automata learning algorithms. By applying our implementations to the Rigorous Examination of Reactive Systems (RERS) verification competition we show how well learning algorithms such as L* in the LearnLib perform in the context of LBT, which was previously unknown. An investigation into considering both the safety and liveness aspect of LTL properties separately is also done. We show that on the one hand, confirming counterexamples to safety properties on a running system is more straightforward than confirming lasso-shaped counterexamples to liveness properties. On the other hand, falsifying liveness properties seems to be more informative to learning algorithms than falsifying safety properties.

The second major contribution to LBT is a new learning algorithm that extends regular learning algorithms for finite state machines with automated refinement for partitioned alphabets. The key feature of our algorithm is a procedure that checks whether the partitioned alphabet, or the learned hypothesis automaton needs to be refined. The decision of the procedure is based on resolving non-determinism that may arise if the alphabet partitioning is too coarse. We subject the new algorithm again to the RERS challenge, in order to give software testers insight into the performance of learning algorithms available in the LearnLib: the ADT algorithm seems to be a good choice in general.

Part 2: Symbolic Model Checking

The first contribution to symbolic model checking exploits the separated read, write and copy dependencies between variables and expressions over them. By carefully looking at which variables are read, written or copied in assignments, we can reduce the number of next-state function calls required to compute the state space of a formal system specification. The new next-state function, an interface between a back-end storing the state space, and a language front-end, shows a reduced runtime for analyzing specifications written in the mCRL2 language in particular. We also show that analyzing Petri nets taken from the software competition called the Model Checking Contest (MCC) becomes more tractable with our improvements.

The second contribution to symbolic model checking, based on bandwidth reduction, relies on symmetrizing the matrix that encodes the dependency relation between variables and assignments in system specifications. We discuss several means to obtain a symmetric dependency matrix, as well as a method for de-symmetrizing the matrix. The latter method is necessary for obtaining separate optimized orders for both variables and assignments. Decades old bandwidth reduction algorithms can be run on symmetric dependency matrices, including Cuthill-McKee’s and Sloan’s algorithms. By experimenting with the MCC, we find that running Sloan’s algorithm usually produces the best results; at least on par with the FORCE algorithm, which is the current state-of-the-art.

The improvements are implemented in the LTSmin model checker, and made available freely in the public domain. By interfacing LTSmin with the ProB model checker we show that specifications written in the B-method can now be verified more efficiently.