It Only Gets Worse: Revisiting DL-Based Vulnerability Detectors from a Practical Perspective

Yunqian Wang; Xiaohong Li; Ruitao Feng; Yao Zhang; Yuekang Li; Zhiping Zhou

doi:10.1109/APSEC66846.2025.00109

Conference proceeding

Proceedings 32nd Asia-Pacific Software Engineering Conference (APSEC), pp.947-956

32nd Asia-Pacific Software Engineering Conference (APSEC), 32nd (Macau, China, 02/12/2025–05/12/2025)

02/2026

DOI: https://doi.org/10.1109/APSEC66846.2025.00109

Abstract

As software vulnerabilities pose an escalating threat to the security of modern software systems, a growing number of deep learning (DL)-based vulnerability detectors have been developed. However, their practical reliability, consistency in usage, and adaptability across diverse software contexts remain unclear. This uncertainty may lead to unreliable detection results in practice, more false positives and false negatives, and limited adaptability to newly emerging vulnerabilities. A large-scale, in-depth analysis of DL-based vulnerability detectors can help uncover the critical factors influencing detection performance, improve the design and training of these models, and enhance their deployment in real-world scenarios. In this paper, we present VulTegra, a novel evaluation framework that, for the first time, conducts a multidimensional assessment comparing scratch-trained models and pre-trained-based models for vulnerability detection, while verifying key factors influencing detection performance. Our framework reveals that state-of-the-art (SOTA) detectors still suffer from low consistency, limited practical detection capabilities, and limited adaptability. Moreover, comparative results indicate that the increasingly favored pre-trained-based models are not universally superior to scratch-trained models; instead, they exhibit distinct strengths and application scenarios. Most importantly, our study highlights the limitations of relying solely on CWE-based classification and reveals a set of critical factors that significantly influence detection performance. Experimental validation shows that these factors have a substantial impact: modifying any single factor alone led to recall improvements across all seven evaluated SOTA detectors, with six detectors also achieving higher F1 scores. Our findings provide deep insights into model behavior, highlighting the need to consider both vulnerability types and inherent code features to ensure practical applicability in real-world software environments.

Details

Title: It Only Gets Worse: Revisiting DL-Based Vulnerability Detectors from a Practical Perspective
Creators: Yunqian Wang - Tianjin University
Xiaohong Li - Tianjin University
Ruitao Feng - Southern Cross University
Yao Zhang - Tianjin University
Yuekang Li - University of New South Wales
Zhiping Zhou - Tianjin University
Publication Details: Proceedings 32nd Asia-Pacific Software Engineering Conference (APSEC), pp.947-956
Conference: 32nd Asia-Pacific Software Engineering Conference (APSEC), 32nd (Macau, China, 02/12/2025–05/12/2025)
Publisher: IEEE
Grant note: National Natural Science Foundation of China (10.13039/501100001809).
Identifiers: 991013357061702368
Academic Unit: Faculty of Science and Engineering
Language: English
Resource Type: Conference proceeding
