Automaton-based Techniques for Optimized Term Rewriting
Rick Erkens
Promotors: prof.dr. Jan Friso Groote (TU/e)
Co-promotors: dr. Bas Luttik (TU/e)
Eindhoven University of Technology
Date: 30 September 2024
Thesis: PDF
Summary
Term rewriting is a basic model of computation used in various areas of computer science. In this thesis we are interested in rewriting terms to a normal form as fast as possible. To this end we use automata, which are mathematically defined structures based on states and transitions between them. The automata presented in this thesis are designed to make the actual rewriting of terms more efficient, and can be built from only the term rewrite system. Such techniques are particularly beneficial in settings where many terms need to be rewritten by using the same rewrite system. Even though constructing an automaton can be expensive, the time it takes to build one usually pales in comparison to the time it takes to rewrite a vast number of terms. In this thesis we solve several subproblems of term rewriting efficiently by using automaton-based solutions.
The first subproblem we study is that of root pattern matching, i.e. the problem of finding all pattern matches at a specific position in a term. An automaton-based solution to this problem already exists in the form of adaptive pattern matching automata, but these have no explicit support for patterns without repeating variables. We extend this solution with consistency automata, which efficiently filter the inconsistent patterns yielded by the adaptive pattern matching automata by comparing each pair of subterms at most once. We also define adaptive non-linear pattern matching automata, which interleave function symbol inspections and subterm comparisons, but they may have redundant states.
In term rewriting we are interested in all pattern matches in all subterms, rather than only the matches at the root. The second subproblem we study is that of subterm pattern matching, i.e. the problem of finding all pattern matches at all positions in a term. We define a set automaton to solve this problem. By using a set automaton we can find all matches in a term by inspecting each symbol only once. The automaton uses a set to remember which positions still have to be inspected much like a pushdown automaton uses a stack. The elements in the set can be consumed in any order, or even in parallel, while still inspecting each symbol of the input term only once.
A naive method of using subterm pattern matching for term rewriting is to repeatedly scan the entire term for all matches, choose a match and use it to rewrite the term, until no more matches can be found. As only a small part of the term changes, subsequent runs of the subterm matching algorithm will be the same, hence this process results in a lot of duplicate work. We present a better way of term rewriting by using set automata. By virtue of some set automaton properties it is possible to remember part of the matching work upon performing a rewrite step. Additionally in many cases it is possible to skip matching the symbols of the substituted right-hand sides by precomputing the path they take through the set automaton.
The last subproblem we consider, is writing down the final result of a sequence of rewrite steps without explicitly calculating the intermediate terms. When two backward overlapping rules are executed in succession, the rule that is applied last, directly consumes a part of the rule that was applied first. We formally define a transducer that operates on creeper traces, which are sequences of backward overlapping rules. This creeper trace transducer writes the result of this trace without writing the intermediate terms, and skips overlap between each pair of rules.