Ensemble-Based Progressive YOLOv11 Framework for Quantitative Analysis of Riverine Environments
Conference proceeding

Saomyaraj Jha, Anubhav Jain, Vivek Kumar, Dhruv Singhal, Partha Pratim Roy and Alireza Alaei
Pattern Recognition and Computer Vision, Vol.16174(Part 1), pp.387-401
Lecture Notes in Computer Science
8th Asian Conference on Pattern Recognition, ACPR 2025, 8th (Gold Coast, Australia, 10/11/2025–13/11/2025)
11/11/2025

Abstract

Keywords: River Litter Detection; YOLOv11; Progressive Training; Ensemble Learning; Test-Time Augmentation; StrongSORT
Millions of tonnes of plastic waste from rivers and coastal communities enter the seas and oceans every year, so monitoring and reducing riverine pollution is vital for environmental sustainability and public health. Because manual surveys and monitoring strategies are costly and ineffective, automated computer vision solutions have emerged in recent years. However, detecting and counting floating litter in real time is challenging due to water surface reflections, occlusions, and lighting variations. This paper introduces a three-phase progressive training strategy using YOLOv11-based object detection algorithms. Phase 1, Progressive Model Training, trains YOLOv11 variants sequentially in order of increasing capacity, allowing earlier convergence on simpler architectures before the models scale up. Phase 2, Ensembling, combines the features captured by multiple variants. Phase 3, Test-Time Augmentation, evaluates each model at four resolutions and selects the best scale for each image to maximize inference accuracy. StrongSORT, with its refined appearance embeddings and motion-consistency associations, is integrated into our pipeline on top of YOLO’s detections to enhance per-object tracking and counting in live video, maintaining consistent object IDs even under partial occlusion and rapid water flow. Experimental results on our manually collected and annotated dataset show that the best model selected from our ensemble achieves a mAP@0.5 of 0.939, outperforming the individual baseline variants.
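
The per-image scale selection described for Phase 3 could be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say which four resolutions are used or how the "best" scale is chosen, so the resolutions in the example and the mean-confidence criterion in the hypothetical helper `select_best_scale` are placeholders, not the authors' method.

```python
from statistics import mean


def select_best_scale(detections_by_scale):
    """Return the (scale, detections) pair with the highest mean confidence.

    detections_by_scale maps an input resolution to a list of
    (label, confidence) pairs produced by one model at that resolution.
    ASSUMPTION: mean confidence is the selection criterion; the source
    does not specify it. A scale with no detections scores 0.0.
    """
    def score(item):
        _, dets = item
        return mean(conf for _, conf in dets) if dets else 0.0

    return max(detections_by_scale.items(), key=score)


# Hypothetical detections for one image at four made-up resolutions.
example = {
    480: [("bottle", 0.61)],
    640: [("bottle", 0.82), ("bag", 0.74)],
    800: [("bottle", 0.79), ("bag", 0.70)],
    960: [],
}

best_scale, best_dets = select_best_scale(example)
```

In practice, a criterion based on validation metrics or detection counts might be preferred; the sketch only shows the shape of a "run at several scales, keep one per image" step.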
