Facilitating the Understanding of Software Vulnerability Queries and their Fixes

Subject Area

Computer Science

Abstract

Ensuring the security and reliability of modern software systems requires a comprehensive understanding of how vulnerabilities are described, manifested, and resolved across both natural language (NL) and code artifacts. This dissertation presents a unified framework to facilitate vulnerability understanding by addressing two fundamental challenges: (1) the incompleteness of vulnerability-related queries in conversational environments and (2) the lack of structured and traceable representations linking NL vulnerability descriptions to code-level implementations.

To address the first challenge, this work formalizes query completeness in terms of two essential elements: the Problem Statement (PS) and the Expect to Do (ETD). Through an open coding study of 2,000 developer chatroom queries across eight domains, we identify recurring lexico-syntactic patterns used to express PS and ETD. Building on these insights, we develop and evaluate multiple automated approaches for detecting missing information in queries, including heuristic methods, traditional machine learning models, pre-trained language models, and large language models (LLMs). Experimental results show that the pre-trained language model and LLM-based approaches perform best among those evaluated, highlighting their potential to improve query clarity and reduce unnecessary back-and-forth in both human-driven and AI-assisted troubleshooting.
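To make the heuristic end of this spectrum concrete, the following is a minimal Python sketch of a rule-based completeness check. It flags a query as missing PS or ETD when none of a few keyword cues match. The cue lists (`PS_PATTERNS`, `ETD_PATTERNS`) and the names `check_query` and `Completeness` are hypothetical placeholders chosen for illustration. They are not the lexico-syntactic patterns derived from the open coding study, and a practical detector would use those patterns or a trained classifier instead.

```python
import re
from dataclasses import dataclass

# Hypothetical cue patterns for illustration only; these are NOT the
# lexico-syntactic patterns identified in the open coding study.
PS_PATTERNS = [
    r"\b(error|exception|fails?|failed|crash(es|ed)?|broken|doesn'?t work|not working)\b",
    r"\b(i get|i got|i am getting|i'm getting|throws?)\b",
]
ETD_PATTERNS = [
    r"\b(how (do|can|to)|i want to|i need to|trying to|is there a way)\b",
    r"\b(should|expected|expect)\b",
]


@dataclass
class Completeness:
    has_ps: bool   # a Problem Statement cue was found
    has_etd: bool  # an Expect to Do cue was found

    @property
    def missing(self) -> list[str]:
        """Names of the elements that were not detected in the query."""
        out = []
        if not self.has_ps:
            out.append("PS")
        if not self.has_etd:
            out.append("ETD")
        return out


def _matches_any(text: str, patterns: list[str]) -> bool:
    # Case-insensitive search for any of the cue patterns.
    return any(re.search(p, text, flags=re.IGNORECASE) for p in patterns)


def check_query(query: str) -> Completeness:
    """Heuristically decide whether a chatroom query states a PS and an ETD."""
    return Completeness(
        has_ps=_matches_any(query, PS_PATTERNS),
        has_etd=_matches_any(query, ETD_PATTERNS),
    )


if __name__ == "__main__":
    for q in [
        "I'm getting a NullPointerException when I call save(); how do I fix it?",
        "How can I parse JSON in Go?",
        "My build fails.",
    ]:
        print(q, "->", check_query(q).missing or "complete")
```

Keyword cues like these are cheap and transparent but brittle: a paraphrase such as "it blows up on startup" states a problem without matching any listed PS cue, which illustrates why learned models may generalize better than fixed rules.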

To address the second challenge, this dissertation introduces a fine-grained semantic model that structures the information in natural language vulnerability artifacts into three key components: the vulnerability trigger (VT), its crash phenomenon (CP), and how it is repaired (AF). A manually curated dataset of vulnerabilities is constructed with phrase-level annotations across both NL artifacts and code. Leveraging this structured representation, we develop automated methods for extracting vulnerability entities and for establishing traceability links between related NL entities and their corresponding code statements. Furthermore, we propose a relation-aware traceability framework that captures dependencies among entities (e.g., cause–effect and failure–resolution), enabling more precise alignment between vulnerability descriptions and implementation-level fixes.
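As an illustration of phrase-level traceability, the sketch below ranks candidate code statements against a single NL entity phrase by TF-IDF cosine similarity, splitting camelCase and snake_case identifiers so that code tokens can match words in the description. This is a plain lexical retrieval baseline swapped in for illustration, not the dissertation's extraction or relation-aware method. The function names, the entity phrase, and the code statements in the example are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphabetic tokens, splitting camelCase and snake_case identifiers."""
    # Insert a space at lower/digit -> Upper boundaries, e.g. "bufLen" -> "buf Len".
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    # Keeping only letter runs also splits on underscores, punctuation, and digits.
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Sparse TF-IDF vectors with smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vecs


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_statements(entity: str, statements: list[str]) -> list[tuple[float, str]]:
    """Rank code statements by lexical similarity to one NL entity phrase."""
    docs = [tokenize(entity)] + [tokenize(s) for s in statements]
    vecs = tfidf_vectors(docs)
    query, stmt_vecs = vecs[0], vecs[1:]
    scored = [(cosine(query, v), s) for v, s in zip(stmt_vecs, statements)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Hypothetical repair-style (AF) phrase and candidate statements from a patch.
    statements = [
        "if (len > buf_size) return -EINVAL;",
        "memcpy(buf, src, len);",
        'log_debug("done");',
    ]
    for score, stmt in rank_statements("reject len larger than buf_size", statements):
        print(f"{score:.3f}  {stmt}")
```

Lexical matching only links a phrase to code that shares its vocabulary: a description that says "buffer length" would not match `buf_size` here, a gap that semantic models or relation-aware scoring may help narrow.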

Experimental results show that fine-grained, phrase-level representations significantly outperform sentence-level approaches in traceability tasks, and that modeling relationships between entities further improves retrieval effectiveness. Overall, this dissertation demonstrates that improving input completeness, structuring vulnerability semantics, and modeling inter-entity relationships are all critical for advancing automated vulnerability understanding.

Degree Date

Spring 2026

Document Type

Dissertation

Degree Name

Ph.D.

Department

Computer Science

Advisor

Dr. LiGuo Huang

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License

Recommended Citation

Wang, Simin, "Facilitating the Understanding of Software Vulnerability Queries and their Fixes" (2026). Computer Science and Engineering Theses and Dissertations. 56.
https://scholar.smu.edu/engineering_compsci_etds/56

Download

Available for download on Monday, May 03, 2027
