Leveraging Self-Paced Learning for Software Vulnerability Detection
Preprint   Open access

Zeru Cheng, Yanjing Yang, He Zhang, Lanxin Yang, Jinghao Hu, Jinwei Xu, Bohan Liu and Haifeng Shen
arXiv (Cornell University)
Cornell University
12/11/2025
Preprint (Author's original), CC BY 4.0, Open Access

Abstract

Keywords: Software vulnerability detection; Self-paced learning; Large learning model; CWE
Software vulnerabilities pose major risks to software systems. Recently, researchers have proposed many deep learning approaches to detect them. However, the accuracy of these approaches is limited in practice, and one of the main causes is low-quality training data (i.e., source code). To address this, we propose SPLVD (Self-Paced Learning for Software Vulnerability Detection). SPLVD dynamically selects source code for model training based on the stage of training, simulating the human learning process of progressing from easy to hard. Its data selector is designed specifically for the vulnerability detection task and prioritizes learning from easy source code. Before each training epoch, SPLVD uses the data selector to recalculate the difficulty of the source code, select new training source code, and update the data selector. We first evaluate SPLVD on three benchmark datasets containing over 239K source code samples, of which 25K are vulnerable. On these datasets, SPLVD achieves the highest F1 scores of 89.2%, 68.7%, and 43.5%, respectively, outperforming the state-of-the-art approaches. We then collect projects from OpenHarmony, a new ecosystem that general LLMs have not been trained on, to evaluate SPLVD further. There, SPLVD achieves the highest precision of 90.9%, demonstrating its practical effectiveness.
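
The easy-to-hard selection loop described in the abstract can be illustrated with a minimal sketch of generic self-paced learning. This is not SPLVD's data selector, whose design the abstract does not specify. It assumes the classic formulation in which a sample counts as "easy" when its current loss falls below a threshold, and the threshold is relaxed after every epoch. The names `select_easy_samples`, `self_paced_schedule`, `loss_fn`, `threshold`, and `growth` are all hypothetical, and the toy losses stay fixed only to keep the example deterministic.

```python
def select_easy_samples(losses, threshold):
    """Return indices of samples whose current loss is below the threshold."""
    return [i for i, loss in enumerate(losses) if loss < threshold]


def self_paced_schedule(loss_fn, num_samples, epochs, threshold, growth):
    """Return a list of (epoch, selected_indices) for a generic self-paced curriculum.

    loss_fn(i) is a hypothetical callback giving the current loss of sample i.
    In real training the model would be updated on the selected samples
    between epochs, so the losses would change; this sketch omits that step.
    """
    schedule = []
    for epoch in range(epochs):
        # Re-score every sample before the epoch (difficulty is recalculated).
        losses = [loss_fn(i) for i in range(num_samples)]
        schedule.append((epoch, select_easy_samples(losses, threshold)))
        # Relax the threshold so harder samples are admitted in later epochs.
        threshold *= growth
    return schedule


# Toy per-sample losses (assumed values, held fixed for illustration).
toy_losses = [0.1, 0.5, 0.9, 1.5]
schedule = self_paced_schedule(
    lambda i: toy_losses[i], num_samples=4, epochs=3, threshold=0.4, growth=2.0
)
for epoch, selected in schedule:
    print(epoch, selected)
```

With these toy values the selected set grows from the single easiest sample to the full dataset over three epochs, which is the progression from easy to hard that self-paced learning aims for.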
