Domain Specific Languages and their Type Systems

Arjan van der Meer

Promotor: prof.dr. M.G.J. van den Brand (TU/e)
Co-promotor: dr. A. Serebrenik (TU/e)
Technische Universiteit Eindhoven
Date: 10 November, 2014, 16:00
Thesis: PDF

Summary

In recent years, Model Driven Engineering (MDE) has been suggested as an approach to make software development easier, faster and cheaper. In order to achieve this, domain models are used to represent systems and their behavior within a specific knowledge domain such as finance or complex manufacturing systems. Domain experts design and interact with these models. As domain experts frequently have limited knowledge of software engineering, facilitating model design and interaction becomes imperative. A popular solution addressing this challenge consists in creating Domain-Specific Languages (DSLs). As opposed to General Purpose Languages (GPLs), DSLs include special constructs for domain concepts, and exclude constructs that are not useful in the domain. Using these constructs, models can be made smaller, clearer and more expressive.

One disadvantage of DSLs is that each DSL needs its own set of tools to implement it in order to make the DSL effective. Tools are used to create, display and modify models, to convert models between different representations and languages, and to express the behavior or computation expressed in the model. From a theoretical perspective, these tools are all based on two fundamental aspects of any formal language, its syntax and semantics. The syntax describes how models in the language are constructed. Usually, the syntax is split into two parts, one that defines the concepts at a conceptual level, called abstract syntax, and a part that describes how models are presented to the user, called concrete syntax. Most formal languages allow a user to create models in concrete syntax, which is then converted to a representation in abstract syntax by a so-called parser.

A common step after parsing is type checking. The purpose of type checking is twofold. First of all, type checking uses the semantics of the language to recover more explicit information from the result of parsing. This more explicit information might be required for the subsequent model processing such as conversion between different representations and languages. For instance, the type information can also be used to select between different possible interpretations of similar constructs, allowing more concise representations of detailed models. Second, type checking provides early error diagnostics. Indeed, as the name suggests, type checking revolves around the concept of types. Types describe set of values that can be produced as results of computations, or are expected by computations as input. By comparing the type of the values produced to the expected type of the input, inconsistencies can be discovered before they cause failures or errors.

These features make type systems useful for many languages. However, constructing a type checker takes significant effort and requires specific skills. Additionally, it is difficult to verify whether a manually constructed type checker indeed implements the desired type system. We propose to address this by generating type checkers based on specifications. We believe this process will reduce the skill required to create a type checker and clarify the relation between type checker and type system, making it easier both to create a type checker and to evaluate the usefulness of type system constructs for different languages. However, in order to specify type systems, a suitable formalism is needed. This formalism should be precise, but accessible to language designers, and support the language design process. This leads to the primary research question of this thesis: How can DSL type systems be specified in an understandable, formal and evolvable way

To decide what properties a type system formalism needs, we conducted a Systematic Literature Review (SLR) to discover the kind of type systems DSLs have. We found DSL type systems are generally strong, static and not object-oriented. Based on the SLR, exploration of the requirements of DSL stakeholders and previous experience, we decided to select Modular Structural Operational Semantics (MSOS) as our first candidate as type system specification formalism. MSOS is a semantics formalism developed to specify operational semantics in a syntax-oriented and inductive way. Until now, MSOS has mainly been used to specify dynamic semantics, but theoretically it can be used for static semantics as well. We tested this by specifying part of the type system of the modeling language Compositional Interchange Format (CIF) in MSOS. We found that was an effective specification method, that in particular allowed the domain experts to give rapid feedback on how the type system should behave.

Once we had a type system, we wanted to create an implementation of it that could type CIF models. While doing this, we discovered that MSOS was not as convenient for specifying type systems as we initially presumed. In particular, we found MSOS had little support to define some of the properties of the CIF type system in an efficient manner, and the abstract nature of the rules make it hard to connect them to the actual data structures used by the parser and other tools, limiting interoperability.

To solve these problems, we decided to create our own language inspired by MSOS, EMF-TL. The primary theoretical difference between EMF-TL and MSOS lies in the addition of constraint conditions, which allow us to define CIF constructs that require multiple MSOS rules to define in one EMF-TL rule instead. Furthermore, EMF-TL is based on the Eclipse Modeling Framework, EMF, a framework designed to improve interaction between modeling tools by providing a common infrastructure. EMF aims to make DSL design easier by providing a “One Stop Shop” with tools that cover a variety of use cases. By basing our language on EMF, we can use the facilities the framework provides to connect with the infrastructure of the base language of our type system, CIF in our first example, directly.

We then implemented EMF-TL using the ATL transformation language and the \eclipse constraint engine. By using these pre-existing technologies we were able to create a flexible and reasonably efficient prototype with only limited effort. Because the CIF language is also EMF-based, we could use this implementation to create a prototype type checker for CIF models. Using this prototype, we could demonstrate that EMF-TL indeed solves the problems we had with MSOS and can effectively implement the CIF type system.

To validate EMF-TL further, we then carried out a number of other case studies, to evaluate the general applicability of EMF-TL. The three languages we choose for these case studies are WebDSL, an object-oriented web design language, mCRL2, a process language with algebraic data types, and POOSL, a process modeling language with object features. With these three languages, we covered both a representative set of different type system features, but also different levels of complexity and of existing type checkers. We implemented partial or complete type systems for all three languages, demonstrating the flexibility of EMF-TL.

We conclude that EMF-TL reinforces EMF as a “One Stop Shop” solution for defining DSLs, enabling the definition of non-trivial DSL type systems that can be seamlessly integrated with existing Ecore infrastructure. Overall, we have demonstrated that type system specifications are an effective and practical technique in domain specific language design.