Application of CNN and Vision Transformer Models for Classifying Crowns in Pine Plantations Affected by Diplodia Shoot Blight
Journal article   Open access   Peer reviewed

Mingzhu Wang, Christine Stone and Angus J. Carnegie
Forests, Vol.17(1), pp.1-25
13/01/2026
Published (Version of record) Open Access CC BY V4.0

Abstract

Keywords: tree health; convolutional neural networks; vision transformer; aerial imagery; Pinus radiata; Diplodia sapinea
Diplodia shoot blight, caused by the opportunistic fungal pathogen Diplodia sapinea, affects many conifer species and has a global distribution. Depending on the duration and severity of the disease, affected needles appear yellow (chlorotic) for a brief period before becoming red or brown in colour. These symptoms can occur on individual branches or over the entire crown. Aerial sketch-mapping and the manual interpretation of aerial photography for tree health surveys are labour-intensive and subjective. Recently, however, the application of deep learning (DL) techniques to detect and classify tree crowns in high-spatial-resolution imagery has gained significant attention. This study evaluated two complementary DL approaches for the detection and classification of Pinus radiata trees infected with diplodia shoot blight across five geographically dispersed sites with varying topographies over two acquisition years: (1) object detection using YOLOv12 combined with the Segment Anything Model (SAM) and (2) pixel-level semantic segmentation using U-Net, SegFormer, and EVitNet. The three damage classes for the object detection approach were ‘yellow’, ‘red-brown’ (both whole-crown discolouration) and ‘dead tops’ (partially discoloured crowns), while the three semantic segmentation classes were yellow, red-brown, and background. The YOLOv12m model achieved an overall mAP50 score of 0.766 and mAP50–95 of 0.447 across all three classes, with red-brown crowns demonstrating the highest detection accuracy (mAP50: 0.918, F1 score: 0.851). Among the semantic segmentation models, SegFormer showed the strongest performance (IoU of 0.662 for red-brown and 0.542 for yellow) but at the cost of the longest training time, while EVitNet offered the most cost-effective solution, achieving accuracy comparable to SegFormer with superior training efficiency owing to its lighter architecture.
The accurate identification and classification of crown damage symptoms support the calibration and validation of satellite-based monitoring systems and assist in the prioritisation of ground-based diagnosis or management interventions.
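
The per-class IoU values quoted in the abstract measure the overlap between predicted and reference pixels for each class. The sketch below is only an illustration of that metric, not the authors' evaluation code: the class-to-index mapping, the function name `per_class_iou`, and the toy 3×3 masks are all assumptions introduced here, and the abstract does not state how the masks were encoded.

```python
import numpy as np

# Hypothetical label encoding; the abstract names the classes but not their indices.
CLASSES = {0: "background", 1: "yellow", 2: "red-brown"}


def per_class_iou(pred, target, num_classes=3):
    """Return a list of IoU values, one per class.

    IoU = |pred AND target| / |pred OR target| over pixels of that class.
    A class absent from both masks yields NaN instead of a division by zero.
    """
    pred = np.asarray(pred)
    target = np.asarray(target)
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        intersection = np.logical_and(pred_c, target_c).sum()
        union = np.logical_or(pred_c, target_c).sum()
        ious.append(float(intersection / union) if union else float("nan"))
    return ious


# Toy masks (made-up values, not data from the study).
pred = np.array([[0, 1, 1],
                 [0, 2, 2],
                 [0, 0, 2]])
target = np.array([[0, 1, 0],
                   [0, 2, 2],
                   [0, 2, 2]])

ious = per_class_iou(pred, target)
for index, name in CLASSES.items():
    print(f"{name}: IoU = {ious[index]:.3f}")
```

On these toy masks the background, yellow, and red-brown classes score 0.600, 0.500, and 0.750 respectively. In practice the intersections and unions would be accumulated over all evaluation tiles before dividing.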
