Comparison of Classical Methods and Mask R-CNN for Automatic Tree Detection and Mapping Using UAV Imagery
Detecting and mapping individual trees accurately and automatically from remote sensing images is of great significance for precision forest management. Many algorithms, including classical methods and deep learning techniques, have been developed and applied for tree crown detection from remote sensing images. However, few studies have evaluated the accuracy of different individual tree detection (ITD) algorithms and their data and processing requirements. This study explored the accuracy of ITD using local maxima (LM) algorithm, marker-controlled watershed segmentation (MCWS), and Mask Region-based Convolutional Neural Networks (Mask R-CNN) in a young plantation forest with different test images. Manually delineated tree crowns from UAV imagery were used for accuracy assessment of the three methods, followed by an evaluation of the data processing and application requirements for three methods to detect individual trees. Overall, Mask R-CNN can best use the information in multi-band input images for detecting individual trees. The results showed that the Mask R-CNN model with the multi-band combination produced higher accuracy than the model with a single-band image, and the RGB band combination achieved the highest accuracy for ITD (F1 score = 94.68%). Moreover, the Mask R-CNN models with multi-band images are capable of providing higher accuracies for ITD than the LM and MCWS algorithms. The LM algorithm and MCWS algorithm also achieved promising accuracies for ITD when the canopy height model (CHM) was used as the test image (F1 score = 87.86% for LM algorithm, F1 score = 85.92% for MCWS algorithm). The LM and MCWS algorithms are easy to use and lower computer computational requirements, but they are unable to identify tree species and are limited by algorithm parameters, which need to be adjusted for each classification. It is highlighted that the application of deep learning with its end-to-end-learning approach is very efficient and capable of deriving the information from multi-layer images, but an additional training set is needed for model training, robust computer resources are required, and a large number of accurate training samples are necessary. This study provides valuable information for forestry practitioners to select an optimal approach for detecting individual trees.