What Went Wrong? Explaining Cascading Failures in Microservice-Based Applications
Soldani J.; Montesano G.; Brogi A.
2021-01-01
Abstract
Cascading failures can severely affect the correct functioning of large enterprise applications consisting of hundreds of interacting microservices. As a consequence, the ability to effectively analyse the causes of cascading failures that have occurred is crucial for managing complex applications. In this paper, we present a model-based methodology to automate the analysis of application logs in order to identify the possible failures that occurred and their causality relations. Our methodology employs topology graphs to represent the structure of microservice-based applications and finite state machines to model their expected replica- and failure-aware behaviour. We also present a proof-of-concept implementation of our methodology, which we used to assess its effectiveness with controlled experiments and monkey testing.
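
To make the idea of combining a topology graph with per-service state machines more concrete, the following is a minimal illustrative sketch in Python. It is not the paper's algorithm or its proof-of-concept implementation: the service names, the two-state machine, the `LogEvent` record, and the `explain_failures` function are all assumptions introduced here, and replicas are not modelled at all. It only shows how replaying a log against such a model could suggest which already-failed dependency may explain a later failure.

```python
from dataclasses import dataclass

# Hypothetical topology graph: each service maps to the services it depends on.
TOPOLOGY = {
    "frontend": ["orders"],
    "orders": ["db"],
    "db": [],
}

# Hypothetical state machine shared by all services: (state, event) -> next state.
TRANSITIONS = {
    ("running", "fail"): "failed",
    ("failed", "restart"): "running",
}


@dataclass(frozen=True)
class LogEvent:
    time: int      # logical timestamp of the log entry
    service: str   # service that emitted the entry
    event: str     # "fail" or "restart" in this sketch


def explain_failures(topology, log):
    """Replay the log in time order and, for each failure, list the
    dependencies that were already in the failed state."""
    state = {service: "running" for service in topology}
    explanations = []
    for entry in sorted(log, key=lambda e: e.time):
        next_state = TRANSITIONS.get((state[entry.service], entry.event))
        if next_state is None:
            continue  # event not allowed in the current state; ignored here
        if next_state == "failed":
            causes = [dep for dep in topology[entry.service]
                      if state[dep] == "failed"]
            explanations.append((entry.service, causes))
        state[entry.service] = next_state
    return explanations


log = [
    LogEvent(1, "db", "fail"),
    LogEvent(2, "orders", "fail"),
    LogEvent(3, "frontend", "fail"),
]

for service, causes in explain_failures(TOPOLOGY, log):
    print(service, "<-", causes or "no failed dependency (possible root cause)")
```

In this toy run, a failure with an empty list of failed dependencies is a candidate root cause, while the others are candidates for cascading effects. A realistic analysis would need richer state machines, replica awareness, and handling of several possible explanations per failure.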