Fall Days on Legacy and Evolution

Hotel De Wageningsche Berg, Wageningen
October 28 – November 1, 2019

The IPA Fall Days are an annual multi-day event, dedicated to a specific theme of current interest to the research community of IPA. This year’s Fall Days are dedicated to Legacy and Evolution.

The programme of the Fall Days is composed with the help of Thomas Degueule (CWI), Yaping (Luna) Luo (TU/e and ING), Tom van Dijk (UT), and the IPA PhD council.


The Fall Days start with registration and lunch on Monday 28 October, and conclude mid-afternoon on Friday 1 November. Most of the sessions are devoted to various themes and topics around software legacy and evolution. On Thursday afternoon there will be a special session around a keynote talk by Moritz Beller, one of two winners of the IPA Dissertation Award 2018.

Monday October 28th

12:00–12:30 Registration
12:30–13:30 Lunch
14:00–14:10 Opening
14:10–15:10 Johan Fabry (Raincode Labs, Belgium)
Software migration: some devils and some details
At Raincode and Raincode Labs we modernize legacy software along two big axes: firstly we migrate software off of the mainframe (Raincode) and secondly we provide bespoke compilers for old and niche programming languages (Raincode Labs). In this talk I will speak about both sides of the business. First I will present some motivations for migrating software off of the mainframe. I will then highlight various unique challenges of such an effort, showing some devils in the process. Second I will talk about a research project in which I am involved. It aims to use graph mining techniques to ease legacy software modernization, by revealing patterns in source code written in legacy languages. I will give an overview of our current infrastructure, show some details and some results.
15:10–15:45 Coffee break
15:45–16:30 Amir Saeidi (Mendix and UU)
Learning from identifiers to assist developers in evolution of legacy software systems
For many enterprises, legacy systems remain the core foundation on top of which all business processes function. They simply cannot be removed, as they implement and store critical business logic. However, proper documentation, skilled manpower, and resources to evolve or maintain these legacy systems are typically scarce. Hence, acquiring knowledge about them is fundamental to performing downstream tasks such as code evolution and refactoring. Classical approaches to source code understanding treat the software system as a mathematical object with formal semantics in order to infer sound properties about its behavior. However, many software engineering tasks, including those for program comprehension (e.g., code summarization, feature location, and ranking code completion suggestions), need not be sound or complete. Machine learning provides the means to perform these tasks by exploiting the naturalness of software systems to mine statistical patterns that characterize the software. A major source of information is source code identifiers, which may contain embedded domain and application concepts. Learning from identifiers is essential in reasoning about source code. In this talk, we employ techniques from relational and language learning to project identifiers and their relationships into a vector space, where learning tasks such as prediction can be performed effectively and efficiently. Within this context, we look at two prediction tasks: VarNaming (predicting the name of a variable given its usage) and VarMisuse (suggesting the correct variable to be used at a given program location). We shall demonstrate the effectiveness of learning models in giving meaningful recommendations. Furthermore, we show that by exploring the learned vector space embeddings of identifiers, we can find interesting relationships, including prevalent topics in the source code as well as analogies between identifiers.
16:30–17:15 Xixi Lu (UU)
Trace Clustering on Very Large Event Data in Healthcare Using Frequent Sequence Patterns
Trace clustering has increasingly been applied to find homogeneous process executions. However, current techniques have difficulty finding a meaningful and insightful clustering of patients on the basis of healthcare data. The resulting clusters are often not in line with those of medical experts, nor are the clusters guaranteed to yield meaningful process maps of patients’ clinical pathways. After all, a single hospital may conduct thousands of distinct activities and generate millions of events per year. We present a novel trace clustering approach that uses sample sets of patients provided by medical experts. The approach is implemented in ProM, an open source process mining framework, and evaluated using a large data set obtained from a university medical center. The evaluation shows F1-scores of 0.7 for grouping kidney injury, 0.9 for diabetes, and 0.64 for head/neck tumor, while, according to the domain experts, the process maps show meaningful behavioral patterns in the clinical pathways of these groups.
17:15 Drinks
18:30 Dinner
Afterwards: opportunity to interact, play board and card games, etc. in the hotel bar

Tuesday October 29th

9:00–10:00 Serge Demeyer (University of Antwerp, Belgium)
Agile Software Engineering — Opportunities for Industry 4.0
Industry 4.0 is the current trend of automation and data exchange in manufacturing technologies. This trend forces the manufacturing industry to switch to a more agile way of working, hence software engineering teams can and should take a leading role therein. This talk will explore the state of the art in agile software development and the opportunities this may present for Industry 4.0. Consequently, it will address questions like: Will our test suite detect critical defects early? Where should we fix a defect? How long will it take to fix defects? Which team members get frustrated? Can we use bots to process easy issues?
Serge Demeyer is a professor at the University of Antwerp and the spokesperson for the ANSYMO (Antwerp System Modelling) research group. He directs a research lab investigating the theme of “Software Reengineering” (LORE – Lab On REengineering). Serge Demeyer is a spokesperson for the NEXOR interdisciplinary research consortium and an affiliated member of the Flanders Make Research Centre. In 2007 he received a “Best Teachers Award” from the Faculty of Sciences at the University of Antwerp and as a consequence remains very active in all matters related to teaching quality. His main research interest concerns software evolution, more specifically how to strike the right balance between reliability (striving for perfection) and agility (optimising for improvements). He is an active member of the corresponding international research communities, serving in various conference organization and program committees. He has written a book entitled “Object-Oriented Reengineering” and edited a book on “Software Evolution”. He has also authored numerous peer-reviewed articles, many of them in top conferences and journals.
10:00–10:30 Coffee break
10:30–11:15 Lodewijk Bergmans (Software Improvement Group)
Determining Programming Language Verbosity
How do 1000 lines of COBOL compare to 1000 lines of Java or Haskell? This question is highly relevant if you want to compare source code in different languages, or aggregate metrics from source code components that are written in different languages. Programming languages have different levels of verbosity: how many characters and lines of code are needed to express the same amount of information? Verbosity is a key factor in the average amount of effort needed to write those 1000 lines. In this talk we will discuss how we employed a bit of complexity theory to come up with an objective, measurable way of estimating language verbosity, applicable to a wide range of programming languages.
11:15–12:05 Introduction talks by new IPA PhD students: to be determined
12:15–13:15 Lunch
13:45–14:45 Frank de Boer (CWI and UL)
Program correctness of legacy software: uncovering a 20 year old bug in Java
14:45–15:15 Hans-Dieter Hiep (CWI)
Verifying OpenJDK’s LinkedList using KeY (demo)
There is a bug in Java’s LinkedList. In this technical demo session, the KeY theorem prover is shown in action, as we walk through verification of some methods of a repaired LinkedList implementation, and explain the most interesting steps of its correctness proof.
15:15–15:35 Introduction talks by new IPA PhD students: to be determined
15:35–16:05 Coffee break
16:05–17:25 Introduction talks by new IPA PhD students: to be determined
18:30 Dinner

Wednesday October 30th

9:00–10:00 Sandro Schulze (Otto von Guericke Universität Magdeburg, Germany)
Analysis Techniques for Feature and Variability Extraction (from Legacy Systems)
Reuse is a pivotal concept in software development, as it allows, among other things, reusing established code, reducing effort, and shortening time to market. A common way of reusing software is ad hoc reuse via clone-and-own, that is, entire software systems (or parts thereof) are copied and used as a starting point for subsequent modifications. While this is efficient in the short term, this way of reuse comes with costs in the long term, such as redundant changes or missed opportunities to reuse new features. These drawbacks are usually addressed by structured reuse using advanced concepts such as software product lines.
However, making the transition from ad hoc to structured reuse, or at least synchronizing related software systems with each other, requires knowledge about the relations between these systems, which is usually neither documented nor available elsewhere. Hence, this information must be recovered to identify commonalities and differences of related software systems.
In my talk, I will present and discuss techniques for reverse engineering this information (called variability mining) from different artifacts (models, code, requirements).
The resulting information enables developers to evolve such systems more efficiently (while still keeping them physically separate) or even to integrate these systems as a software product line.
10:00–10:30 Coffee break
10:30–11:15 Dennis Dams (ESI, TNO)
Understanding and rejuvenating legacy code: A look inside ESI’s tool box
ESI (part of TNO) helps high-tech industrial partners to get a grip on their legacy code. This starts by gaining a better understanding of their existing code base. We have obtained good results with an interactive approach based on visual models that are tailored to the specific domain. The code transformations that are needed for rejuvenation are often easily expressed in terms of those visual models. Combining automated with manual rewriting of code allows for progressive automation, in which recurring transformation patterns are automated while corner cases are left to human scrutiny. In this talk I will discuss some of the tools in our legacy code tool box that we take with us to our customers.
11:15–12:00 Sicco Verwer (TUD)
Algorithms for Learning of State Machines from Data: the basics, state-of-the-art, tools, and stories
12:15–13:15 Lunch
13:45–14:45 Eleni Constantinou (TU/e)
Socio-technical health of evolving software ecosystems
Today’s software relies mainly on open source software components that are typically distributed through package managers for a wide variety of programming languages, and developed and maintained through online distributed software development services like GitHub. Software component repositories are perceived as software ecosystems that constitute complex and evolving socio-technical software dependency networks. Because of their complexity and evolution, these ecosystems tend to suffer from a wide variety of software health issues that can be either technical or social in nature. Examples of such issues include using outdated, unmaintained, or obsolete software components; the prolonged presence of unfixed bugs and security vulnerabilities; the abandonment or high turnover of key contributors; suboptimal collaboration between contributors; and many more. This presentation will report on past and ongoing empirical research that studies such health factors within software packaging ecosystems (such as npm and RubyGems), and will provide empirical evidence and lessons learned from such health problems.
14:45–15:15 Introduction talks by new IPA PhD students: to be determined
15:15–15:45 Coffee break
15:45–16:45 Joost Gabriels (TU/e)
Funding your research beyond the PhD
16:45–18:00 Individual consultation slots with Joost Gabriels regarding funding opportunities based on your research ideas
18:30 Dinner
20:00 Social event organised by the IPA PhD council

Thursday October 31st

9:00–10:00 Sven-Bodo Scholz (RU)
Code Generation for Improved Software Maintainability and Adaptability
Software development has evolved from being a vehicle to make use of early days’ hardware towards complex systems with several layers of abstraction. Over the years, it has become evident that appropriate layering with suitably chosen interfaces for the individual layers is crucial for dealing more effectively with matters such as software maintenance and software adaptability.
As the top layers become increasingly abstract and information hiding between layers increasingly important, the overheads of this layered approach are on the rise as well. One possible way out of this dilemma is code generation. It is based on the idea of shifting the execution time of parts of the program into compile time: some top layers of abstraction are compiled away into application-specific code prior to the actual execution.
This talk presents some of the experiences of code generation made in the context of SaC, a highly abstract, functional array programming language that is capable of generating high-performance codes for a range of parallel architectures.
10:00–10:30 Coffee break
10:30–11:15 Djamel Eddine Khelladi (CNRS, France)
Handling the co-evolution in modeling languages
Modeling languages are widely adopted nowadays. Metamodels (i.e., language specifications) play a significant role when building a modeling language and its tooling. In fact, metamodels are the foundation not only for instantiating models, but also for specifying constraints (language properties) and transformation scripts (model-to-model or model-to-text), and for generating a core API. The latter is further enriched by developers with additional code implementing advanced functionality (language services, tooling, etc.). When a modeling language evolves to the next released version, its metamodels evolve as well. As a result, all dependent artifacts (models, constraints, transformations, code) may be impacted and may thus need to be co-evolved accordingly. This talk will highlight the existing challenges around co-evolution and present some of my contributions in this field.
11:15–12:00 Arie van Deursen (TUD)
Software and Data Analytics Research in AFL, the TU Delft — ING AI for Fintech Lab
In 2019, TU Delft and Dutch bank ING announced the launch of the AI for FinTech Lab (AFL). Research lines in this lab relate to regulatory compliance, software experimentation, fairness, software analytics, and autonomous software engineering. In this talk I will present initial results of ongoing joint research in this lab, as well as directions the lab will pursue in the next five years.
Arie van Deursen is a professor in software engineering at Delft University of Technology, where he heads the Department of Software Technology. He holds an MSc from Vrije Universiteit (1990) and a PhD degree from the University of Amsterdam (1994). He is the co-founder of Infotron and the Software Improvement Group, two companies that started based on his research collaborations. His research interests include software testing, human aspects of software engineering, and automated software engineering. He currently serves as program co-chair for the IEEE/ACM International Conference on Software Engineering to be held in Madrid, 2021.
12:15–13:15 Lunch
13:45–13:55 Award of IPA Dissertation Award 2018 certificate to Moritz Beller, by award committee chair Marieke Huisman (UT)
13:55–14:40 Moritz Beller (Facebook, USA; graduate of TUD, and co-winner of the IPA Dissertation Award 2018)
Feedback-Driven Software Development
Software developers today crave feedback, be it from their peers in the form of code review, from static analysis tools like their compiler, or from the local or remote execution of their tests in the Continuous Integration (CI) environment. With the advent of social coding sites such as GitHub and the tight integration of CI services such as Travis CI, software development practices have fundamentally changed. Despite a highly altered software engineering landscape, however, we still lack a suitable holistic description of contemporary software development practices. Existing descriptions such as the V-model are either too coarse-grained to describe an individual contributor’s workflow, or only regard a sub-part of the development process, like Test-Driven Development (TDD). In addition, most existing models are prescriptive rather than descriptive.
By contrast, in this talk, I will give an overview of Feedback-Driven Development (FDD), a concept I coined in my PhD thesis through a series of empirical studies. I will explain in detail two such studies that are central to my thesis: the “Last Line Effect,” a phenomenon at the boundary between static analysis and code review, and WatchDog, a telemetry platform with which I observed local testing patterns in developers’ Integrated Development Environments (IDEs).
14:40–15:25 Darius Sas (RUG)
What software history can tell us about architectural smells evolution
Understanding the software quality history of a project is of paramount importance when trying to drive its development and future evolution. Software maintenance plays a big role in this decision process. Recently, a lot of research effort has been spent on understanding how to manage architectural smells, a type of issue that affects software at the architecture level and increases maintenance costs in the long term.
In this presentation, I am going to present my current research on tracking and understanding architectural smells using software repository mining to extract knowledge about their evolution.
15:25–15:55 Coffee break
15:55–16:45 Yanja Dajsuren (TU/e)
Evolution of Cooperative Intelligent Transport System Architectures and Services
In the Cooperative Intelligent Transport Systems (C-ITS) domain, cooperative systems such as traffic management systems, traffic light controllers, and connected vehicle on-board units communicate with each other to increase traffic efficiency and safety for specific transport modes. However, C-ITS systems have been deployed independently from each other, with different goals, stakeholders, and settings. With continuous progress in the C-ITS domain, the evolution and convergence of these systems demands a strategy for them to coordinate and communicate with each other efficiently. We present our results on the analysis of existing C-ITS architectures and services, and propose a C-ITS reference architecture that can be used for large-scale deployment of C-ITS services across Europe.
16:45–17:25 Introduction talks by new IPA PhD students: to be determined
18:30 Dinner

Friday November 1st

9:00–9:45 Annibale Panichella (TUD)
Automated Test Generation for Unit Testing and Beyond
9:45–10:30 Machiel van der Bijl (Axini)
Model-based testing, theory and practice
In this talk we focus on the application of an academic model-based testing theory in practice. We will cover the following topics:

  • A short recap of the ioco-theory for model-based testing
  • Modeling with data and time in practice
  • A model-based testing approach with examples from practice, including our tools behind model-based testing

Machiel van der Bijl is co-founder of Axini BV, Amsterdam, The Netherlands. Machiel has broad experience in both theoretical and practical computer science. Before founding Axini he worked for several companies in the financial and embedded/high-tech sectors. Machiel has an MSc and a PhD degree in computer science from the University of Twente.

10:30–15:30 Special session for PhD students, around the theme of Legacy & Evolution
Around 15:30, transport to an intercity train station in the region will be provided for all attendees.

Social Events

On Monday, drinks will be on IPA between the last session and dinner.
On Wednesday evening, a by now famous IPA social event will once again be organised by the IPA PhD council.
On Friday after the morning talks, a special IPA event for PhD students will take place. Around 15:30, transport to an intercity train station in the region will be provided for all attendees.


Registration is now closed.

To make maximal use of the available capacity, we process applications on the following basis: Registrations are treated “first come, first served”. In principle, all PhD students (IPA and non-IPA) will share a room. Others may also be asked to share if we run out of rooms. Until the registration deadline, we will accept participants until we run out of space at the hotel!

New PhD students

PhD students new to the IPA Fall Days are expected to give a brief talk in a slot of 10 minutes. The purpose of such talks is to introduce your research and yourself to the IPA community. You can briefly introduce your research (research area, initial and future research directions), but also yourself (who you are, what your background is, but also, e.g., what you like or don’t like to do in your spare time). The Fall Days provide a friendly, open, and informal atmosphere to do this, and you will surely be welcomed into the IPA community and receive constructive feedback.


The venue can be reached by public transport in a number of ways, all leading to bus stop Wageningsche Berg, less than 300 m from the hotel:

  1. Take a bus (line 352, 4+ times per hour; or the slower line 51) from Arnhem Centraal to bus stop Wageningsche Berg. Travel time: thirty to forty minutes.
  2. Take a bus (line 84/86/88, very frequently) from intercity station Ede-Wageningen (on the Utrecht-Arnhem train line) to Wageningen Busstation, and transfer to line 352 or 51 in the direction of Arnhem. Travel time: forty minutes.

Please use e.g. Google Maps or 9292ov.nl to find out which options work well for you.