SMU Data Science Review
Abstract
Large-scale software systems produce vast volumes of logs and telemetry, making manual incident triage slow and error prone. This study presents an unsupervised anomaly detection pipeline that fuses logs, metrics, and traces through late fusion. Using Hybrid Ensemble modeling with Isolation Forest, and Long Short-Term Memory (LSTM) Deep Learning model, the system detects cross-service anomalies producing and assigning a composite triage score reflecting severity and impact. Ranked alerts are categorized into Critical, High, or Medium priorities for review. A retrieval-augmented generation (RAG) layer enriches results with contextual summaries for explainable triage. Evaluated on synthetic multi-service datasets, the pipeline achieved a Recall of 0.8866 at the 90th-percentile threshold, validating its ability to detect and prioritize high-impact incidents efficiently.
Recommended Citation
Zavala Gamero, Gibran Miguel; Cheon, Hayoung; and Iqbal, Mustafa
(2025)
"Anomaly Detection for Multi-System Bug Triage,"
SMU Data Science Review: Vol. 9:
No.
3, Article 5.
Available at:
https://scholar.smu.edu/datasciencereview/vol9/iss3/5
Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License
Included in
Artificial Intelligence and Robotics Commons, Cybersecurity Commons, Data Science Commons, Information Security Commons, Software Engineering Commons
