An Experimental Evaluation of Imbalanced Learning and Time-Series Validation in the Context of CI/CD Prediction

Bohan Liu; He Zhang; Lanxin Yang; Liming Dong; Haifeng Shen; Kaiwen Song

doi:10.1145/3383219.3383222

Conference proceeding

EASE '20: Proceedings of the 24th International Conference on Evaluation and Assessment in Software Engineering, pp.21-30

ACM Other Conferences

EASE '20: Evaluation and Assessment in Software Engineering (Trondheim, Norway, 15/04/2020–17/04/2020)

17/04/2020

DOI: https://doi.org/10.1145/3383219.3383222

Keywords: continuous deployment; continuous integration; cross-validation; imbalanced learning; time-series-validation

Abstract

Background: Machine Learning (ML) has been widely used as a powerful tool to support Software Engineering (SE). The fundamental assumptions about data characteristics required by specific ML methods have to be carefully considered prior to their application in SE. Within the context of Continuous Integration (CI) and Continuous Deployment (CD) practices, there are two vital data characteristics that are prone to be violated in SE research. First, the logs generated during CI/CD for training are imbalanced data, which is contrary to the principles of common balanced classifiers; second, these logs are also time-series data, which violates the assumption of cross-validation.

Objective: We aim to systematically study the two data characteristics and further provide a comprehensive evaluation for predictive CI/CD with data from real projects.

Method: We conduct an experimental study that evaluates 67 CI/CD predictive models using both cross-validation and time-series-validation.

Results: Our evaluation shows that cross-validation makes the evaluation of the models optimistic in most cases, although there are a few counter-examples as well. The performance of the top 10 imbalanced models is better than that of the balanced models in predicting failed builds, even for balanced data. The degree of data imbalance has a negative impact on prediction performance.

Conclusion: In research and practice, the assumptions of the various ML methods should be seriously considered for the validity of research. Even if it is used only to compare the relative performance of models, cross-validation may not be applicable to problems with time-series features. The research community needs to revisit the evaluation results reported in some existing research.
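The contrast between cross-validation and time-series-validation described in the abstract can be illustrated with a minimal sketch. It is not the authors' experimental setup: the helper names (`kfold_splits`, `forward_chaining_splits`, `uses_future`), the 20-build log, and the forward-chaining scheme are all assumptions chosen for illustration. The sketch only shows that shuffled k-fold splits train on builds that occurred after the builds being predicted, while a time-ordered split does not.

```python
import random


def kfold_splits(n, k, seed=0):
    """Shuffled k-fold: each fold serves as the test set once (illustrative helper)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
        yield train, test


def forward_chaining_splits(n, k):
    """Time-ordered splits: train on builds [0, cut), test on the next block.

    This is one common form of time-series validation, assumed here
    for illustration only.
    """
    block = n // (k + 1)
    for i in range(1, k + 1):
        cut = i * block
        end = n if i == k else cut + block
        yield list(range(cut)), list(range(cut, end))


def uses_future(train, test):
    """True if any training build happened after the earliest test build."""
    return max(train) > min(test)


# Hypothetical build log: index 0 is the oldest build, index 19 the newest.
n_builds = 20
kfold_leaks = [uses_future(tr, te) for tr, te in kfold_splits(n_builds, 4)]
chain_leaks = [uses_future(tr, te) for tr, te in forward_chaining_splits(n_builds, 4)]

print("k-fold folds that train on future builds:", sum(kfold_leaks), "of", len(kfold_leaks))
print("time-ordered folds that train on future builds:", sum(chain_leaks), "of", len(chain_leaks))
```

Under these assumptions, at most one shuffled fold can avoid training on later builds, whereas none of the time-ordered folds uses them. This is one plausible reason why cross-validation could give optimistic estimates on CI/CD logs.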

Details

Title: An Experimental Evaluation of Imbalanced Learning and Time-Series Validation in the Context of CI/CD Prediction
Creators: Bohan Liu - Nanjing University
He Zhang - Nanjing University
Lanxin Yang - Nanjing University
Liming Dong - Nanjing University
Haifeng Shen - Australian Catholic University
Kaiwen Song - Nanjing University
Publication Details: EASE '20: Proceedings of the 24th International Conference on Evaluation and Assessment in Software Engineering, pp.21-30
Conference: EASE '20: Evaluation and Assessment in Software Engineering (Trondheim, Norway, 15/04/2020–17/04/2020)
Series: ACM Other Conferences
Publisher: ACM
Identifiers: 991013176803802368
Academic Unit: Faculty of Science and Engineering
Language: English
Resource Type: Conference proceeding
