Semantic Image Segmentation Based Cable Vibration Frequency Visual Monitoring Using Modified Convolutional Neural Network with Pixel-wise Weighting Strategy
Owing to the rapid adoption of large-span spatial structures and infrastructure, in which cables are critical damage-sensitive elements, there is a pressing need to monitor cable vibration frequency for structural health inspection. To date, neither contact methods using acceleration sensors nor conventional computer vision-based photogrammetry methods have overcome their lack of cost-effectiveness and their poor compatibility with real-world conditions. In this study, a method based on semantic image segmentation with a modified convolutional neural network, which is compatible with widely varying real-world backgrounds, is presented for remote visual monitoring of cable vibration frequency. The modifications to the underlying network framework consist of adopting simpler feature extractors and introducing class weights into the loss function through pixel-wise weighting strategies. Nine convolutional neural networks were established and modified. Discrete images with varying real-world backgrounds were captured to train and validate the network models. Continuous videos with different cable pixel-to-total pixel (C-T) ratios were captured to test the networks and derive vibration frequencies. Multiple metrics, including Precision, F1 score, and intersection over union (IoU), were used to evaluate the effectiveness of the network models. The optimal C-T ratio was also studied to provide guidelines for setting the parameters of monitoring systems in further research and practical applications. The training and validation accuracies of all nine networks exceeded 90%. The network model with ResNet-50 as the feature extractor and uniform prior weighting showed the best learning and generalization ability: with images at the optimal C-T ratio of 0.04 as the testing set, its Precision reached 0.9973, its F1 score 0.9685, and its IoU 0.8226.
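The pixel-wise class weighting and the evaluation metrics are named above without formulas. The following plain-Python sketch shows one common way to compute them; it is not the authors' implementation. Assumptions: label 0 is background and 1 is cable; the `uniform_prior` rule follows the convention weight = (1/number of classes) / class frequency; the median rule uses global rather than per-image frequencies; and the metrics are computed per mask pair, since how the study aggregates them over the test set is not stated here. All function names are hypothetical.

```python
import math
from statistics import median

# Assumed label convention: 0 = background, 1 = cable.
CABLE = 1


def class_frequencies(masks, num_classes=2):
    """Fraction of all ground-truth pixels that belong to each class."""
    counts = [0] * num_classes
    for mask in masks:
        for row in mask:
            for label in row:
                counts[label] += 1
    total = sum(counts)
    return [c / total for c in counts]


def class_weights(freqs, strategy="uniform_prior"):
    """Per-class loss weights; rarer classes (the thin cable) get larger weights."""
    if strategy == "inverse":
        return [1.0 / f for f in freqs]
    if strategy == "median":
        # Simplified: median of global class frequencies, not per-image ones.
        m = median(freqs)
        return [m / f for f in freqs]
    if strategy == "uniform_prior":
        # Assumed convention: uniform prior 1/K divided by observed frequency.
        prior = 1.0 / len(freqs)
        return [prior / f for f in freqs]
    raise ValueError(f"unknown strategy: {strategy}")


def weighted_cross_entropy(probs, labels, weights):
    """Pixel-wise cross-entropy with class weights.

    probs[i][c] is the softmax probability of class c at pixel i and labels[i]
    is its true class; the sum is normalized by the total applied weight.
    """
    num = den = 0.0
    for p, y in zip(probs, labels):
        w = weights[y]
        num -= w * math.log(max(p[y], 1e-12))  # clamp avoids log(0)
        den += w
    return num / den


def cable_metrics(pred, truth):
    """Precision, recall, F1 and IoU of the cable class for one mask pair."""
    tp = fp = fn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p == CABLE and t == CABLE:
                tp += 1
            elif p == CABLE:
                fp += 1
            elif t == CABLE:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1, iou


def ct_ratio(mask):
    """Cable pixel-to-total pixel (C-T) ratio of a ground-truth mask."""
    total = sum(len(row) for row in mask)
    cable = sum(row.count(CABLE) for row in mask)
    return cable / total
```

For a 4×4 ground-truth mask with a single cable column, a prediction with one missed and one spurious cable pixel gives precision, recall and F1 of 0.75 and IoU of 0.6, and the ground-truth C-T ratio is 0.25. With only two classes, the median and uniform-prior rules above yield identical weights, because the two frequencies sum to 1 and their median equals 1/2.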
Compared with the frequencies measured by an acceleration sensor, the first- and second-order vibration frequencies derived by the best network from the video with the optimal C-T ratio had negligible absolute percentage errors of only 0.41% and 0.36%, substantiating the effectiveness of the proposed method.
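How the frequencies are extracted from the segmented video is not specified above. One plausible pipeline, assumed here purely for illustration, takes the mean row index of cable pixels in each frame's mask as a vertical displacement signal, removes its mean, computes a one-sided amplitude spectrum with a discrete Fourier transform, and reports the largest spectral peaks as the lowest-order frequencies. The absolute percentage error then compares each estimate with the accelerometer reference. The function names, frame rate, and synthetic signal are hypothetical, and the naive O(N²) DFT is used only for readability; a practical pipeline would call an FFT library.

```python
import math


def cable_centroid(mask, cable=1):
    """Mean row index of cable pixels in a binary mask (vertical position, px)."""
    rows = [r for r, row in enumerate(mask) for v in row if v == cable]
    return sum(rows) / len(rows)


def amplitude_spectrum(signal, fps):
    """One-sided DFT amplitudes of the mean-removed signal.

    Excludes the DC and Nyquist bins; naive O(N^2) loop for illustration only.
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]
    freqs, amps = [], []
    for k in range(1, n // 2):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fps / n)
        amps.append(math.hypot(re, im))
    return freqs, amps


def dominant_frequencies(signal, fps, count=2):
    """Frequencies of the `count` largest local spectral maxima, in ascending order."""
    freqs, amps = amplitude_spectrum(signal, fps)
    peaks = [
        i for i in range(1, len(amps) - 1)
        if amps[i] > amps[i - 1] and amps[i] >= amps[i + 1]
    ]
    peaks.sort(key=lambda i: amps[i], reverse=True)
    return sorted(freqs[i] for i in peaks[:count])


def absolute_percentage_error(estimate, reference):
    """Absolute percentage error of an estimate relative to a reference value."""
    return abs(estimate - reference) / abs(reference) * 100.0
```

The frequency resolution of this approach is the frame rate divided by the number of frames: for a synthetic 300-frame record at 30 fps containing 2.0 Hz and 4.0 Hz components, the bins are 0.1 Hz apart and `dominant_frequencies` returns `[2.0, 4.0]`. Longer records or interpolation between bins would be needed to resolve errors as small as the percentages reported above.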