Logo image
SMTPRT: Performance Regression Testing and Localization for SMT Solvers Across Multiple Logics
Conference proceeding

SMTPRT: Performance Regression Testing and Localization for SMT Solvers Across Multiple Logics

Yunqian Wang, Xiaohong Li, Yao Zhang, Yuekang Li, Zhiping Zhou and Ruitao Feng
Proceedings 32nd Asia-Pacific Software Engineering Conference (APSEC), pp.658-667
32nd Asia-Pacific Software Engineering Conference (APSEC), 32nd (Macau, China, 02/12/2025–05/12/2025)
02/2026

Metrics

1 Record Views

Abstract

SMT solver performance regression Master of Laws (LLM)
Satisfiability Modulo Theories (SMT) solvers are foundational in applications such as software verification and automated bug detection, where both correctness and performance are critical to the reliability and scalability of these systems. While existing methods predominantly focus on functional testing, performance testing has received insufficient attention, particularly regarding performance regression caused by both intentional and unintentional factors during software evolution. Current performance regression testing approaches are primarily designed for string solvers, neglecting the full spectrum of SMT theories. Furthermore, these methods often rely on time comparisons or log analysis, which makes the identification of the responsible commit slow and inefficient. To address the above issues, we propose a novel general purpose testing framework, SMTPRT, that efficiently detects and localizes performance regression issues across diverse SMT solver theories. We utilize large language models (LLMs) based on genetic algorithms (GAs) to guide the search for performance regression-inducing cases. We introduce an optimized localization technique that filters irrelevant commits using code coverage, followed by a bisecting algorithm to rapidly pinpoint the responsible commit. To thoroughly evaluate SMTPRT, we conducted extensive experiments involving six types of logic, demonstrating its superior performance. Specifically, SMTPRT successfully detected 59 regression cases, performing 3.44 times better than the baseline, and located the issues 1.16 times faster than the baseline.

Details

Logo image