SMU Data Science Review

Abstract

Large Language Models (LLMs) are transforming conversational AI, yet their dependence on prompt-supplied context exposes them to context-switch attacks that covertly steer dialogue toward sensitive or malicious ends. To study these attacks, we constructed an evaluation set of 70 one-sided conversation transcripts spanning various fraudulent scenarios. Each transcript embeds adversarial patterns while preserving natural conversational flow. To counter these threats, we introduce a hybrid defense that pairs a BERT-based semantic-drift detector (cosine-similarity threshold = 0.70) with a curated keyword and hack-phrase scanner. In aggregate, the system delivered 100% recall, intercepting every simulated phishing or data-harvesting attempt. The keyword layer achieved perfect precision, generating a mean of 1.93 alerts per transcript with zero false positives. The semantic layer contributed a mean of 1.84 additional warnings and captured all four attacks that lacked sensitive keywords. Overall, conversations triggered 7.5 risk signals on average (≈ 1.1 per message), and 98.6% of transcripts activated at least two independent alarms, evidencing robust redundancy. The principal trade-off surfaced in the semantic component, where roughly one-third of its warnings reflected benign pivots, such as address or insurance confirmations, highlighting the tension between maximal coverage and conversational fluidity. Building on these findings, we recommend adaptive similarity thresholds, multistage escalation, and user-configurable sensitivity profiles to balance security and usability. By documenting the mechanics and impact of context-switch attacks and demonstrating an adequate dual-layer safeguard, this work provides both an empirical foundation and practical guidance for hardening LLM-based systems deployed in high-stakes, real-world environments.
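
The dual-layer idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword list, the function names (`toy_embed`, `cosine`, `scan`), and the choice to compare each message with the one immediately before it are all assumptions made for illustration, and a bag-of-words vector stands in for BERT embeddings so that the example runs with the standard library alone. Only the 0.70 cosine-similarity threshold comes from the abstract; flagging a message when similarity falls below that value is likewise an assumed reading.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Mapping, Sequence, Tuple

SIM_THRESHOLD = 0.70  # cosine-similarity threshold quoted in the abstract

# Hypothetical keyword list; the paper's curated list is not reproduced here.
SENSITIVE_KEYWORDS = ("password", "social security", "pin", "routing number")


def toy_embed(text: str) -> Mapping[str, float]:
    """Stand-in for a BERT sentence embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def scan(
    messages: Sequence[str],
    embed: Callable[[str], Mapping[str, float]] = toy_embed,
    threshold: float = SIM_THRESHOLD,
    keywords: Sequence[str] = SENSITIVE_KEYWORDS,
) -> List[Tuple[int, str, object]]:
    """Return (message index, layer, detail) alerts from both layers."""
    alerts: List[Tuple[int, str, object]] = []
    for i, message in enumerate(messages):
        lowered = message.lower()
        # Layer 1: whole-word keyword / phrase match.
        for keyword in keywords:
            if re.search(r"\b" + re.escape(keyword) + r"\b", lowered):
                alerts.append((i, "keyword", keyword))
        # Layer 2 (assumed design): flag a drop in similarity to the
        # previous message as possible semantic drift.
        if i > 0:
            similarity = cosine(embed(messages[i - 1]), embed(message))
            if similarity < threshold:
                alerts.append((i, "drift", round(similarity, 2)))
    return alerts


if __name__ == "__main__":
    transcript = [
        "your package delivery is delayed",
        "your package delivery is delayed today",
        "please confirm your password now",
    ]
    for alert in scan(transcript):
        print(alert)
```

In this toy transcript the third message triggers both layers, which mirrors the redundancy the abstract reports. Swapping `toy_embed` for a real sentence encoder would only require passing a different `embed` callable and a matching similarity function.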

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License
