•  
  •  
 

SMU Data Science Review

Abstract

Large-scale software systems produce vast volumes of logs and telemetry, making manual incident triage slow and error prone. This study presents an unsupervised anomaly detection pipeline that fuses logs, metrics, and traces through late fusion. Using Hybrid Ensemble modeling with Isolation Forest, and Long Short-Term Memory (LSTM) Deep Learning model, the system detects cross-service anomalies producing and assigning a composite triage score reflecting severity and impact. Ranked alerts are categorized into Critical, High, or Medium priorities for review. A retrieval-augmented generation (RAG) layer enriches results with contextual summaries for explainable triage. Evaluated on synthetic multi-service datasets, the pipeline achieved a Recall of 0.8866 at the 90th-percentile threshold, validating its ability to detect and prioritize high-impact incidents efficiently.

Creative Commons License

Creative Commons Attribution-Noncommercial 4.0 License
This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License

Share

COinS