Diagnosing and correcting failures in complex distributed systems is difficult. In a network of perhaps dozens of nodes, each executing dozens of interacting applications, sometimes from different vendors, finding the source of a system failure is confusing, tedious detective work. The person assigned this task must trace the failing command, event, or operation through the network components and find where it deviates from the correct, expected interaction sequence. Once a deviation is identified, the failing applications must be located and the fault or faults traced to the responsible source code.
Often the primary source for tracing failures is the set of event log files generated by the applications on each node. The event logs from several platforms, and from multiple virtual machines on those platforms, must be filtered, merged, correlated, and examined by a human expert. The expert must locate the point of failure within the logs, deduce which interaction or component failed, and then reassign the problem to the people responsible for the failing components. Those individuals must, in turn, filter and merge the original logs using different criteria to find the failing code modules, analyze the cause of the failure, and correct the code or even the architecture of the failing components.
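The filter-and-merge step described above can be sketched as a k-way merge of per-node logs. This is a minimal illustration, not the tooling used in this work: it assumes each node's log has already been parsed into `LogEvent` records (a hypothetical type), that each node's log is sorted by timestamp, and that node clocks are synchronized closely enough for timestamps to be compared across nodes.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class LogEvent:
    # Hypothetical parsed log record; ordering compares timestamp first.
    timestamp: float
    node: str
    message: str


def merge_logs(per_node_logs, keep=lambda event: True):
    """Merge per-node logs (each already sorted by timestamp) into a single
    time-ordered stream, keeping only the events for which `keep` is true.

    Assumes synchronized clocks; real systems may need clock-skew correction
    or causal ordering instead of raw timestamps."""
    merged = heapq.merge(*per_node_logs)  # lazy k-way merge of sorted inputs
    return [event for event in merged if keep(event)]


node_a = [LogEvent(1.0, "A", "start"), LogEvent(3.0, "A", "commit")]
node_b = [LogEvent(2.0, "B", "prepare")]
print([e.message for e in merge_logs([node_a, node_b])])
```

Passing a different `keep` predicate (for example, one node or one component) reproduces the "different criteria" re-filtering that each responsible team performs on the same original logs.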
The goal of this project is to reduce the human effort involved in diagnosing such test failures through automated analysis of the log data. In this paper we propose generating grammars from the log sequences of successful tests, then using those grammars to detect the points at which the logs of failed tests deviate.
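To make the idea concrete, here is a deliberately simplified stand-in for grammar inference: a first-order transition model (equivalent to a simple regular grammar) learned from passing runs, then used to report where a failing run first deviates. This is an illustrative sketch, not the grammar-generation method proposed in this work; it assumes log lines have already been reduced to event labels, and the function names are hypothetical.

```python
from collections import defaultdict

START, END = "<start>", "<end>"  # sentinel symbols marking run boundaries


def learn_transitions(successful_runs):
    """Record which event label was observed to follow which, across all
    passing-test runs. Each run is a list of event labels."""
    allowed = defaultdict(set)
    for run in successful_runs:
        seq = [START, *run, END]
        for prev, nxt in zip(seq, seq[1:]):
            allowed[prev].add(nxt)
    return allowed


def first_deviation(allowed, failed_run):
    """Return the index of the first event in failed_run that the model does
    not permit, len(failed_run) if the run ends where it should not, or None
    if the whole run is accepted."""
    seq = [START, *failed_run, END]
    for i, (prev, nxt) in enumerate(zip(seq, seq[1:])):
        # Pair i ends at failed_run[i], or at END when i == len(failed_run).
        if nxt not in allowed.get(prev, set()):
            return i
    return None


passing = [["connect", "auth", "query", "close"], ["connect", "auth", "close"]]
model = learn_transitions(passing)
print(first_deviation(model, ["connect", "query", "close"]))
```

A first-order model over-generalizes: it accepts any sequence whose adjacent pairs were each seen in some passing run, even if the whole sequence never occurred. Richer inferred grammars can capture longer-range structure such as nested request/response pairs, at the cost of a harder inference problem.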
Computer Science and Engineering
This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License
Hanka, Stephen, "A Grammar Based Approach to Distributed Systems Fault Diagnosis Using Log Files" (2019). Computer Science and Engineering Theses and Dissertations. 10.