Automated Fault Localization for Service-Oriented Software Systems

Cuiting Chen

promotor: prof.dr. A. van Deursen (TUD)
co-promotor: dr. A.E. Zaidman (TUD)
Technische Universiteit Delft
Date: 27 May, 2015, 10:00
Thesis: PDF

Summary

In this thesis, we have focused on applying Spectrum-based Fault Localization (SFL) to diagnose service-oriented systems at runtime. We reused a framework-based online monitoring technique to obtain service transaction information. We devised a three-phased oracle and combined it with monitoring to detect system failures at runtime. Together, the monitor and the oracle generate the component involvement and pass/fail information required by SFL. We conducted an experiment with a case system to evaluate the performance of SFL in diagnosing service-oriented systems. The results show that SFL is able to correctly identify faulty service operations in 72% of the cases. From this preliminary attempt at applying SFL to service-oriented systems, we discovered that the monitoring topology can influence the accuracy of the diagnosis. Therefore, we applied Genetic Algorithms (GA) to find monitoring topologies that are more diagnosable. With the assistance of GA techniques, we have identified the following characteristics of more diagnosable monitoring topologies:

  • invoking components in isolation
  • more monitoring points, including the monitoring of inactivity
  • including the monitoring of the system context
  • including tracing information
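To make the SFL step summarized above concrete, the following is a minimal sketch of how component involvement and pass/fail information can be turned into a ranking of suspect components. It assumes the Ochiai similarity coefficient, a common choice in the SFL literature; this summary does not state which coefficient the thesis uses. The function name and the service names are hypothetical.

```python
from math import sqrt


def ochiai_ranking(spectra, outcomes):
    """Rank components by Ochiai similarity to the failure vector.

    spectra:  list of sets, one per transaction, holding the names of
              the components involved in that transaction.
    outcomes: list of bools, one per transaction, True if it failed.
    """
    components = set().union(*spectra)
    total_failed = sum(outcomes)
    scores = {}
    for c in components:
        # n11: failed transactions involving c; n10: passed ones involving c
        n11 = sum(1 for s, failed in zip(spectra, outcomes) if c in s and failed)
        n10 = sum(1 for s, failed in zip(spectra, outcomes) if c in s and not failed)
        denom = sqrt(total_failed * (n11 + n10))
        scores[c] = n11 / denom if denom else 0.0
    # Highest score first; ties broken alphabetically for determinism
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical observations: four transactions over three services
spectra = [{"login", "cart"}, {"cart", "payment"}, {"login", "payment"}, {"cart"}]
outcomes = [False, True, True, False]
ranking = ochiai_ranking(spectra, outcomes)
```

In this toy data set, "payment" is involved in every failed transaction and in no passed one, so it ends up at the top of the ranking.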

Through a careful investigation of the failed diagnoses from the initial step of applying SFL to service-oriented systems, we found that failed diagnoses can mainly be attributed to (1) tight interactions between services and (2) fault intermittency of services. To improve the diagnosis, we have proposed two possible solutions for dealing with tight interaction and fault intermittency. One solution is to increase the monitoring granularity by adding monitors at the code block level in the service implementation. The other solution is to include the monitoring of invocation links between services in the SFL diagnosis. The former solution is able to achieve 100% correct diagnoses; however, it requires ownership of the services in order to place monitors inside them. The latter solution can be realized with a more realistic set-up and can also significantly improve the diagnoses.
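As an illustration of the second solution, the sketch below shows one way invocation links could be added to the spectrum as extra "components" next to the services themselves. The encoding of a link as a "caller->callee" string, the function names, and the service names are assumptions made for this example, not the thesis implementation.

```python
def services_only(call_edges):
    """Involvement set containing only the services seen in a transaction."""
    involved = set()
    for caller, callee in call_edges:
        involved.add(caller)
        involved.add(callee)
    return involved


def spectrum_with_links(call_edges):
    """Involvement set containing the services and the invocation links.

    call_edges: list of (caller, callee) pairs observed in one transaction.
    Each link is encoded as the hypothetical string "caller->callee".
    """
    involved = services_only(call_edges)
    for caller, callee in call_edges:
        involved.add(f"{caller}->{callee}")
    return involved


# Two hypothetical transactions that touch the same three services
# but along different invocation paths.
tx_a = [("shop", "stock"), ("shop", "billing")]
tx_b = [("shop", "billing"), ("billing", "stock")]

same_services = services_only(tx_a) == services_only(tx_b)
same_with_links = spectrum_with_links(tx_a) == spectrum_with_links(tx_b)
```

At the service level the two transactions look identical, so tightly interacting services cannot be told apart. Once links are included, the involvement sets differ, which gives the SFL ranking more information to separate them.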

We have also assessed the runtime overhead caused by the diagnosis of service-oriented systems. Since the diagnosis engine in our implementation is detached from the service-oriented system, the overhead that diagnosis imposes on the running system comes from monitoring. We measured the monitoring overhead at different levels of granularity and found that monitoring at the service communication level generates high overhead, whereas the overhead of monitoring at the service implementation level is much lower but depends strongly on the number of monitors deployed.