Nominal Techniques and Black Box Testing for Automata Learning

Joshua Moerman

first promotor: prof. dr. F.W. Vaandrager (RU)
second promotor: prof. dr. A. Silva (University College London, London, United Kingdom)
copromotor: dr. S.A. Terwijn (RU)
Radboud University
Date: 1 July 2019
Thesis: PDF


Automata learning plays a more and more prominent role in the field of software verification. Learning algorithms are able to automatically explore the behaviour of software. By revealing interesting properties of the software, these algorithms can create models of the, otherwise unknown, software. These learned models can, in turn, be inspected and analysed, which often leads to finding bugs and inconsistencies in the software.
An important tool which we need when learning software is test generation. This is the topic of the first part of this thesis. After the learning algorithm has learned a model and constructed a hypothesis, test generation methods are used to validate this hypothesis. Efficiency is key: we want to test as little as possible, as testing may take valuable time. However, our tests have to be complete: if the hypothesis fails to model the software well, we better have a test which shows this discrepancy.
The first few chapters explain black box testing of automata. We present a theoretical framework in which we can compare existing n-complete test generation methods. From this comparison, we are able to define a new, efficient algorithm. In an industrial case study on embedded printer software, we show that this new algorithm works well for finding counterexamples for the hypothesis. Besides the test generation, we show that one of the subproblems – finding the shortest sequences to separate states – can be solved very efficiently.
The second part of this thesis is on the theory of formal languages and automata with infinite alphabets. This, too, is discussed in the context of automata learning. Many pieces of software make use of identifiers or sequence numbers. These are used, for example, in order to distinguish different users or messages. Ideally, we would like to model such systems with infinitely many identifiers, as we do not know beforehand how many of them will be used.
Using the theory of nominal sets, we show that learning algorithms can easily be generalised to automata with infinite alphabets. In particular, this shows that we can learn register automata. Furthermore, we deepen the theory of nominal sets. First, we show that, in a special case, these sets can be implemented in an efficient way. Second, we give a subclass of nominal automata which allow for a much smaller representation. This could be useful for learning such automata more quickly.