Type Error Customization for Embedded Domain-Specific Languages

Alejandro Serrano Mena

promotor: prof.dr. J.T. Jeuring (UU)
copromotor: dr. J. Hage (UU)
Universiteit Utrecht
Date: 23 April 2018
Thesis: PDF

Summary

Domain-specific languages (DSLs) are widely used in programming because they make communication between domain experts and developers more fluid. Well-known examples are SQL for databases and HTML for describing web pages. There are two approaches to developing a DSL: external, where a new compiler is created from scratch, and internal or embedded, where the DSL is a library inside a general-purpose language. We focus on the latter, in particular on DSLs embedded in strongly-typed functional languages such as Haskell.

Unfortunately, using an embedded DSL is not as transparent as it ought to be. Since the compiler sees the DSL as merely a library, error messages are not phrased in domain terms. Not only is the abstraction broken, but the internals of the library are also exposed. As a consequence, programmers have to learn to decipher these error messages before they can use the DSL productively.

This problem is not new to the community. Jeremy Wazny, in his thesis “Type inference and type error diagnosis for Hindley/Milner with extensions”, and Bastiaan Heeren, in his thesis “Top Quality Error Messages”, focus on this same problem. This thesis extends their work in three directions: abstraction, context-dependence, and support for advanced type systems.

The first problem arises because creators of DSLs often need very similar error messages for different situations. None of the previous work tackled the resulting duplication or offered abstraction over common error patterns. Our solution is two-fold: first, the introduction of more powerful matching mechanisms for errors, such as tree regular expressions and functional patterns, whenever the language designer is able to change the general-purpose language; second, the use of type-level features in Haskell to describe errors.
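To make the idea of describing errors with type-level features concrete, the following is a minimal sketch using GHC's existing `TypeError` mechanism from `GHC.TypeLits`. The `SqlColumn` class, its `render` method, and the message text are hypothetical names invented for this illustration; whether the encoding used in the thesis looks like this is an assumption, not something the summary states.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE UndecidableInstances #-}

import GHC.TypeLits (ErrorMessage (..), TypeError)

-- Hypothetical embedded DSL: values that may be stored in a database column.
class SqlColumn a where
  render :: a -> String

instance SqlColumn Int where
  render = show

instance SqlColumn Bool where
  render b = if b then "TRUE" else "FALSE"

-- Instead of GHC's generic "No instance for (SqlColumn (a -> b))",
-- this instance makes the compiler report a message in domain terms.
-- The method body is never run: any use of the instance is a compile error.
instance TypeError
           ( 'Text "Functions cannot be stored in a column."
       ':$$: 'Text "Offending type: " ':<>: 'ShowType (a -> b) )
      => SqlColumn (a -> b) where
  render = error "unreachable"
```

With these definitions, `render (3 :: Int)` and `render True` type check as usual, while an expression such as `render not` is rejected at compile time with the custom message rather than a missing-instance error.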
Another common problem in this area is giving different error messages for the same piece of code depending on the context in which it appears. For example, Haskell has an “fmap” function which works on lists of values, but also on optional values and other structures. This is usually an advantage, since the programmer only has to learn one function for many different scenarios, but it complicates error reporting. We propose a two-phase type checking process in which the error messages can be refined by the types of the elements involved.

The final research question is whether our approach can be carried over to other languages. To this end we need a shared framework to describe type systems. We take the existing Constraint Handling Rules (CHRs) and add features for type variables and universal quantification. To show that this framework is useful, we describe impredicative types – an advanced feature in Haskell – using our extension to CHRs.

In addition to the theoretical work, two prototypes have been built. A milder form of our approach has been integrated into the well-known GHC compiler, enabling us to evaluate our approach with real-world libraries.
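The “fmap” example from the discussion of context-dependence can be illustrated with a short sketch. The names `incrementAll`, `overList`, and `overMaybe` are hypothetical and chosen only for this illustration. The same call is used once on a list and once on an optional value, so an error message tailored to the context would ideally speak about lists in the first case and about optional values in the second.

```haskell
-- One definition, usable with any Functor: lists, Maybe, and others.
incrementAll :: Functor f => f Int -> f Int
incrementAll = fmap (+ 1)

-- Context 1: the functor is the list type.
overList :: [Int]
overList = incrementAll [1, 2, 3]

-- Context 2: the functor is Maybe (an optional value).
overMaybe :: Maybe Int
overMaybe = incrementAll (Just 41)

-- An absent optional value stays absent.
overNothing :: Maybe Int
overNothing = incrementAll Nothing
```

Because `fmap` is typed against the general `Functor` class, a type checker that only sees the call site in isolation has no obvious reason to prefer list vocabulary over optional-value vocabulary, which is the difficulty the two-phase process is meant to address.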