Subject Area
Computer Science
Abstract
Ensuring the security and reliability of modern software systems requires a comprehensive understanding of how vulnerabilities are described, manifested, and resolved across both natural language (NL) and code artifacts. This dissertation presents a unified framework to facilitate vulnerability understanding by addressing two fundamental challenges: (1) the incompleteness of vulnerability-related queries in conversational environments and (2) the lack of structured and traceable representations linking NL vulnerability descriptions to code-level implementations.
To address the first challenge, this work formalizes query completeness in terms of two essential elements: the Problem Statement (PS) and the Expect to Do (ETD). Through an open coding study of 2,000 developer chatroom queries across eight domains, we identify recurring lexico-syntactic patterns used to express PS and ETD. Building on these insights, we develop and evaluate multiple automated approaches—including heuristic methods, traditional machine learning models, pre-trained language models, and large language models (LLMs)—to detect missing information in queries. Experimental results demonstrate that pre-trained models and LLM-based approaches achieve the best performance, highlighting their effectiveness in improving query clarity and reducing unnecessary back-and-forth interactions in both human-driven and AI-assisted troubleshooting.
To address the second challenge, this dissertation introduces a fine-grained semantic model that structures the vulnerability information found in natural language artifacts into three key components: the vulnerability trigger (VT), the crash phenomenon it produces (CP), and the repair that resolves it (after fix, AF). We construct a manually curated dataset of vulnerabilities with phrase-level annotations across both NL artifacts and code. Leveraging this structured representation, we develop automated methods that extract vulnerability entities and establish traceability links between pairs of related NL entities and their corresponding code statements. Furthermore, we propose a relation-aware traceability framework that captures dependencies among entities (e.g., cause–effect and failure–resolution), enabling more precise alignment between vulnerability descriptions and implementation-level fixes.
Experimental results show that fine-grained, phrase-level representations significantly outperform sentence-level approaches in traceability tasks, and that modeling relationships between entities further improves retrieval effectiveness. Overall, this dissertation demonstrates that improving input completeness, structuring vulnerability semantics, and modeling inter-entity relationships are all critical for advancing automated vulnerability understanding.
Degree Date
Spring 2026
Document Type
Dissertation
Degree Name
Ph.D.
Department
Computer Science
Advisor
Dr. LiGuo Huang
Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License.
Recommended Citation
Wang, Simin, "Facilitating the Understanding of Software Vulnerability Queries and their Fixes" (2026). Computer Science and Engineering Theses and Dissertations. 56.
https://scholar.smu.edu/engineering_compsci_etds/56
