Haifeng Shen

Professor, Faculty of Science and Engineering, Southern Cross University

Software engineering

Artificial intelligence

Human-centred computing

Software and application security

Conference proceeding Open access

AUCAD: Automated Construction of Alignment Dataset from Log-Related Issues for Enhancing LLM-based Log Generation

by Hao Zhang, Dongjun Yu, Lei Zhang, Guoping Rong, Yongda Yu, Haifeng Shen, He Zhang, Dong Shao and Hongyu Kuang

Published 27/10/2025

Proceedings of the 16th International Conference on Internetware (2025), 413 - 425

Internetware 2025: the 16th International Conference on Internetware, 20/06/2025–22/06/2025, Trondheim, Norway

Log statements have become an integral part of modern software systems. Prior research efforts have focused on supporting the decisions of placing log statements, such as where/what to log. With the increasing adoption of Large Language Models (LLMs) for code-related tasks such as code completion or generation, automated approaches for generating log statements have gained much momentum. However, the performance of these approaches still has a long way to go. This paper explores enhancing the performance of LLM-based solutions for automated log statement generation by post-training LLMs with a purpose-built dataset. Thus the primary contribution is a novel approach called AUCAD, which automatically constructs such a dataset with information extracted from log-related issues. Researchers have long noticed that a significant portion of the issues in the open-source community are related to log statements. However, distilling this portion of data requires manual effort, which is labor-intensive and costly, rendering it impractical. Utilizing our approach, we automatically extract log-related issues from 1,537 entries of log data across 88 projects and identify 808 code snippets (i.e., methods) with retrievable source code both before and after modification of each issue (including log statements) to construct a dataset. Each entry in the dataset consists of a data pair representing high-quality and problematic log statements, respectively. With this dataset, we proceed to post-train multiple LLMs (primarily from the Llama series) for automated log statement generation. Both human and experimental evaluations indicate that these models significantly outperform existing LLM-based solutions, thereby validating the efficacy of our method for constructing a post-training dataset to enhance LLM-based log statement generation.

Conference proceeding Peer reviewed

Code Comment Inconsistency Detection and Rectification Using a Large Language Model

by Guoping Rong, Yongda Yu, Song Liu, Xin Tan, Tianyi Zhang, Haifeng Shen and Jidong Hu

Published 2025

Proceedings from 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE), 443

International Conference on Software Engineering, 27/04/2025–03/05/2025, Ottawa, Ontario, Canada

Comments are widely used in source code. If a comment is consistent with the code snippet it intends to annotate, it would aid code comprehension. Otherwise, Code Comment Inconsistency (CCI) is not only detrimental to the understanding of code, but more importantly, it would negatively impact the development, testing, and maintenance of software. To tackle this issue, existing research has been primarily focused on detecting inconsistencies with varied performance. It is evident that detection alone does not solve the problem; it merely paves the way for solving it. A complete solution requires detecting inconsistencies and, more importantly, rectifying them by amending comments. However, this type of work is scarce. In this paper, we contribute C4RLLaMA, a fine-tuned large language model based on the open-source CodeLLaMA. It not only has the ability to rectify inconsistencies by correcting relevant comment content but also outperforms state-of-the-art approaches in detecting inconsistencies. Experiments with various datasets confirm that C4RLLaMA consistently surpasses both post hoc and just-in-time CCI detection approaches. More importantly, C4RLLaMA outperforms substantially the only known CCI rectification approach in terms of multiple performance metrics. To further examine C4RLLaMA's efficacy in rectifying inconsistencies, we conducted a manual evaluation, and the results showed that the percentage of correct comment updates by C4RLLaMA was 65.0% and 55.9% in just-in-time and post hoc, respectively, implying C4RLLaMA's real potential in practical use.

Conference proceeding Peer reviewed

DeepHeteroIoT: Deep Local and Global Learning over Heterogeneous IoT Sensor Data

by Muhammad Sakib Khan Inan, Kewen Liao, Haifeng Shen, Prem Prakash Jayaraman, Dimitrios Georgakopoulos and Ming Jian Tang

Published 19/07/2024

Mobile and Ubiquitous Systems: Computing, Networking and Services: 20th EAI International Conference, MobiQuitous 2023, Melbourne, VIC, Australia, November 14–17, 2023, Proceedings, Part I, 593, 119 - 135

EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking, and Services (MobiQuitous), 14/11/2023–17/11/2023, Melbourne, Australia

Internet of Things (IoT) sensor data or readings evince variations in timestamp range, sampling frequency, geographical location, unit of measurement, etc. Such presented sequence data heterogeneity makes it difficult for traditional time series classification algorithms to perform well. Therefore, addressing the heterogeneity challenge demands learning not only the sub-patterns (local features) but also the overall pattern (global feature). To address the challenge of classifying heterogeneous IoT sensor data (e.g., categorizing sensor data types like temperature and humidity), we propose a novel deep learning model that incorporates both Convolutional Neural Network and Bi-directional Gated Recurrent Unit to learn local and global features respectively, in an end-to-end manner. Through rigorous experimentation on heterogeneous IoT sensor datasets, we validate the effectiveness of our proposed model, which outperforms recent state-of-the-art classification methods as well as several machine learning and deep learning baselines. In particular, the model achieves an average absolute improvement of 3.37% in Accuracy and 2.85% in F1-Score across datasets.

Conference proceeding

How Do Developers' Profiles and Experiences Influence their Logging Practices? An Empirical Study of Industrial Practitioners

by Guoping Rong, Shenghui Gu, Haifeng Shen, He Zhang and Hongyu Kuang

Published 05/2023

2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), 855 - 867

IEEE/ACM International Conference on Software Engineering (ICSE), 14/05/2023–20/05/2023, Melbourne, Australia

Logs record the behavioral data of running programs and are typically generated by executing log statements. Software developers generally carry out logging practices with clear intentions and associated concerns (I&Cs). However, I&Cs may not be properly fulfilled in source code as log placement - specifically determination of a log statement's context and content - is often susceptible to an individual's profile and experience. Some industrial studies have been conducted to discern developers' main logging I&Cs and the way I&Cs are fulfilled. However, the findings are only based on the developers from a single company in each individual study and hence have limited generalizability. More importantly, there lacks a comprehensive and deep understanding of the relationships between developers' profiles and experiences and their logging practices from a wider perspective. To fill this significant gap, we conducted an empirical study using mixed methods comprising questionnaire surveys, semi-structured interviews, and code analyses with practitioners from a wide range of companies across a variety of industrial domains. Results reveal that while developers share common logging I&Cs and conduct logging practices mainly in the coding stage, their profiles and experiences profoundly influence their logging I&Cs and the way the I&Cs are fulfilled. These findings pave the way to facilitate the acceptance of important logging I&Cs and the adoption of good logging practices by developers.

Conference proceeding

Fed-SC: One-Shot Federated Subspace Clustering over High-Dimensional Data

by Songjie Xie, Youlong Wu, Kewen Liao, Lu Chen, Chengfei Liu, Haifeng Shen, MingJian Tang and Lu Sun

Published 2023

2023 IEEE 39th International Conference on Data Engineering (ICDE)

IEEE International Conference on Data Engineering (ICDE), 03/04/2023–07/04/2023, Anaheim, CA, USA

Recent work has explored federated clustering and developed an efficient k-means based method. However, it is well known that k-means clustering underperforms in high-dimensional space due to the so-called "curse of dimensionality". In addition, high-dimensional data (e.g., generated from healthcare, medical, and biological sectors) are pervasive in the big data era, which poses critical challenges to federated clustering in terms of, but not limited to, clustering effectiveness and communication efficiency. To fill this significant gap in federated clustering, we propose a one-shot federated subspace clustering scheme Fed-SC that can achieve remarkable clustering effectiveness on high-dimensional data while keeping communication cost low using only one round of communication for each local device. We further establish theoretical guarantees on the clustering effectiveness of one-shot Fed-SC and exploit the benefits of statistical heterogeneity across distributed data. Extensive experiments on synthetic and real-world datasets demonstrate significant effectiveness gains of Fed-SC compared with both subspace clustering and one-shot federated clustering methods.

Conference proceeding

Human-AI Interactive and Continuous Sensemaking: A Case Study of Image Classification using Scribble Attention Maps

by Haifeng Shen, Kewen Liao, Zhibin Liao, Job Doornberg, Maoying Qiao, Anton van den Hengel and Johan W. Verjans

Published 08/05/2021

CHI EA '21: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, 1 - 8

CHI '21: CHI Conference on Human Factors in Computing Systems, 08/05/2021–13/05/2021, Yokohama, Japan

Advances in Artificial Intelligence (AI), especially the stunning achievements of Deep Learning (DL) in recent years, have shown AI/DL models possess remarkable understanding towards the logic reasoning behind the solved tasks. However, human understanding towards what knowledge is captured by deep neural networks is still elementary and this has a detrimental effect on human’s trust in the decisions made by AI systems. Explainable AI (XAI) is a hot topic in both AI and HCI communities in order to open up the blackbox to elucidate the reasoning processes of AI algorithms in such a way that makes sense to humans. However, XAI is only half of human-AI interaction and research on the other half - human’s feedback on AI explanations together with AI making sense of the feedback - is generally lacking. Human cognition is also a blackbox to AI and effective human-AI interaction requires unveiling both blackboxes to each other for mutual sensemaking. The main contribution of this paper is a conceptual framework for supporting effective human-AI interaction, referred to as interactive and continuous sensemaking (HAICS). We further implement this framework in an image classification application using deep Convolutional Neural Network (CNN) classifiers as a browser-based tool that displays network attention maps to the human for explainability and collects human’s feedback in the form of scribble annotations overlaid onto the maps. Experimental results using a real-world dataset have shown significant improvement of classification accuracy (the AI performance) with the HAICS framework.

Conference proceeding

A Service Computing Framework for Proteomics Analysis and Collaboration of Pathogenic Mechanism Studies

by Huaming Chen, Fucun Li, Geng Sun, Xuyun Zhang, Xianjun Dong, Lei Wang, Kewen Liao, Haifeng Shen and Jun Shen

Published 12/2020

2020 IEEE International Conference on Services Computing (SCC), 463 - 465

2020 IEEE International Conference on Services Computing (SCC), 07/11/2020–11/11/2020, Beijing, China

The booming of proteomics data has positioned multiple disciplines and research areas in a more complicated and challenging place. Moreover, the proteomics data of any defined research interests, such as for pathogenic mechanism studies of infectious diseases, have presented unstructured and heterogeneous characteristics. Thus, a service computing framework for proteomics analysis is desired to bring biologists and computer scientists into this area seamlessly and efficiently. With this regard, this work is dedicated to detail the proteomics analysis and collaboration process of pathogenic mechanism studies. We articulate this framework to serve the requirements and ease the task design by broadly reviewing the state-of-the-art research and development efforts and collectively designing different informative stages. Thus, the framework has a focus of distilling different aspects, including data curation, resources distribution, standard construction and computational tasks identification, into the proteomics analysis. The framework is designed as Proteomics Analysis as a Service to deepen the understanding of the interdisciplinary research.

Conference proceeding

Preliminary Findings about DevSecOps from Grey Literature

by Runfeng Mao, He Zhang, Qiming Dai, Huang Huang, Guoping Rong, Haifeng Shen, Lianping Chen and Kaixiang Lu

Published 12/2020

2020 IEEE 20th International Conference on Software Quality, Reliability and Security (QRS), 450 - 457

IEEE International Conference on Software Quality, Reliability and Security (QRS), 11/12/2020–14/12/2020, Macau, China

Context: Emerging from the agile culture, DevOps particularly emphasizes development and deployment speed to achieve rapid value delivery, which however brings some security risks to the software development process. DevSecOps is an extension of DevOps, which is considered as a means to intertwine development, operation and security. Some companies with security concerns begin to take DevSecOps into consideration when it comes to the application of DevOps. Objective: The goal of this study is to report the state-of-the-practice of DevSecOps as well as calling for academia to pay more attention to DevSecOps. Method: Using Google search engine to collect articles on DevSecOps, we conducted a Grey Literature Review (GLR) on the selected articles. Results: Whilst there exist three major software security risks in DevOps, the establishment of DevOps pipeline provides opportunities for software security activities. Based on the preliminary consensus that DevSecOps is an extension of DevOps, it is observed that the interpretations of DevSecOps can be classified into three core aspects, which are: DevSecOps capabilities, cultural enablers, and technological enablers. Furthermore, to materialize the interpretations into daily software production activities, the recommended DevSecOps practices we obtain from Grey Literature (GL) can be categorized in terms of process, infrastructure and collaboration. Conclusion: Although DevSecOps is getting increasing attention by industry, it is still in its infancy and needs to be promoted by both academia and industry.

Conference proceeding

An Experimental Evaluation of Imbalanced Learning and Time-Series Validation in the Context of CI/CD Prediction

by Bohan Liu, He Zhang, Lanxin Yang, Liming Dong, Haifeng Shen and Kaiwen Song

Published 17/04/2020

EASE '20: Proceedings of the 24th International Conference on Evaluation and Assessment in Software Engineering, 21 - 30

EASE '20: Evaluation and Assessment in Software Engineering, 15/04/2020–17/04/2020, Trondheim, Norway

Background: Machine Learning (ML) has been widely used as a powerful tool to support Software Engineering (SE). The fundamental assumptions of data characteristics required for specific ML methods have to be carefully considered prior to their applications in SE. Within the context of Continuous Integration (CI) and Continuous Deployment (CD) practices, there are two vital characteristics of data prone to be violated in SE research. First, the logs generated during CI/CD for training are imbalanced data, which is contrary to the principles of common balanced classifiers; second, these logs are also time-series data, which violates the assumption of cross-validation. Objective: We aim to systematically study the two data characteristics and further provide a comprehensive evaluation for predictive CI/CD with the data from real projects. Method: We conduct an experimental study that evaluates 67 CI/CD predictive models using both cross-validation and time-series-validation. Results: Our evaluation shows that cross-validation makes the evaluation of the models optimistic in most cases, although there are a few counter-examples as well. The performance of the top 10 imbalanced models is better than the balanced models in the predictions of failed builds, even for balanced data. The degree of data imbalance has a negative impact on prediction performance. Conclusion: In research and practice, the assumptions of the various ML methods should be seriously considered for the validity of research. Even if it is used to compare the relative performance of models, cross-validation may not be applicable to the problems with time-series features. The research community needs to revisit the evaluation results reported in some existing research.

Conference proceeding

Containerisation as a method for supporting multiple VR visualisation platforms from a single data source

by Theodor Wyeld, Haifeng Shen and Tomasz Bednarz

Published 11/2019

Proceedings of the 17th International Conference on Virtual-Reality Continuum and its Applications in Industry, 1 - 3

VRCAI '19: The 17th International Conference on Virtual-Reality Continuum and its Applications in Industry, 14/11/2019–16/11/2019, Brisbane, Australia

This paper discusses a proof-of-concept context-aware container server for exposing multiple VR devices to a single data source. The data source was a real-time streamed reconstruction of a combat simulation generated in NetLogo. The devices included a mobile, tablet, PC, data wall, HMD and dataglove interaction. Each device had its specific requirements and user restrictions. Initial testing of this system suggests it is an efficient method for supporting diverse user needs whilst maintaining data integrity and synchronicity. The overall server architecture is discussed as well as future directions for this research.
