Consistency-Check Edge Refinement for Deep Stereo Matching

Fuzzy Systems and Data Mining VI - Frontiers in Artificial Intelligence and Applications ◽

10.3233/faia200719 ◽

2020 ◽

Author(s):

Fangrui Wu ◽

Menglong Yang

Keyword(s):

Computational Efficiency ◽

Stereo Matching ◽

Information Aggregation ◽

Experimental Results ◽

Global Information ◽

Consistency Check ◽

Filtering Method ◽

Tightly Coupled ◽

End To End ◽

Public Datasets

Recent end-to-end CNN-based stereo matching algorithms obtain disparities through regression from a cost volume, which is formed by concatenating the features of stereo pairs. Downsampling steps are often embedded in the construction of the cost volume for global information aggregation and computational efficiency. However, many edge details are hard to recover because of the aggressive upsampling process and ambiguous boundary predictions. To tackle this problem without training another edge prediction sub-network, we developed a novel tightly coupled edge refinement pipeline composed of two modules. The first module implements a gentle upsampling process through a cascaded cost volume filtering method, aggregating global information without losing many details. On this basis, the second module concentrates on generating a disparity residual map for boundary pixels through a sub-pixel disparity consistency check, to further recover the edge details. The experimental results on public datasets demonstrate the effectiveness of the proposed method.
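As an illustration of what "regression from a cost volume" can look like, the sketch below applies soft-argmin to the matching costs of a single pixel. Soft-argmin is a commonly used regression in end-to-end stereo networks, but the abstract does not state which regression the authors use, so treat the choice as an assumption; the function name soft_argmin is hypothetical.

```python
import math


def soft_argmin(costs):
    """Return a sub-pixel disparity for one pixel.

    costs[d] is the matching cost of disparity hypothesis d (lower is better).
    The result is the expected disparity under softmax(-cost).
    Assumption: this is a generic regression, not necessarily the paper's.
    """
    lowest = min(costs)
    # Subtract the minimum before exponentiating for numerical stability.
    weights = [math.exp(-(c - lowest)) for c in costs]
    total = sum(weights)
    return sum(d * w for d, w in enumerate(weights)) / total


# A cost curve that is symmetric around d = 1 regresses to 1.0,
# and two equally good hypotheses regress to the midpoint 0.5.
print(soft_argmin([5.0, 1.0, 5.0]))
print(soft_argmin([0.0, 0.0]))
```

Because the output is a weighted average rather than a hard minimum, it can fall between integer hypotheses, which is where the sub-pixel precision comes from.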


Occlusion-Aided Support Weights for Local Stereo Matching

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001412550075 ◽

2012 ◽

Vol 26 (03) ◽

pp. 1255007 ◽

Cited By ~ 2

Author(s):

Wei Wang ◽

Caiming Zhang ◽

Shuozhen Wang ◽

Xuemei Li

Keyword(s):

Stereo Matching ◽

Experimental Results ◽

Disparity Estimation ◽

Consistency Check ◽

Negative Effects ◽

Support Weight ◽

Cost Aggregation ◽

Depth Discontinuities ◽

Local Methods ◽

Adaptive Support

The introduction of adaptive support weights has significantly improved stereo matching. Existing local methods focus mainly on computing the support weights, which are critical in cost aggregation, and usually obtain excellent results. However, the negative effects of occluded regions are often ignored, which leads to foreground fattening and blurred depth borders. This paper proposes a novel support aggregation strategy that uses the occlusion information obtained from a left-right consistency check. The weights of invalid points are noticeably reduced at each disparity estimation stage. Experimental results on the Middlebury images show that our method is highly effective in improving the disparities of points around occluded areas and depth discontinuities. According to the Middlebury benchmark, the proposed method achieves the best performance among all the local methods. Moreover, our approach can be easily integrated into nearly all existing support weight strategies.
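The left-right consistency check and the down-weighting of invalid points can be sketched on a single scanline as follows. This is a minimal illustration under stated assumptions: integer disparities, a left pixel x matching the right pixel x - d, and a fixed penalty factor. The names left_right_check and down_weight, the tolerance, and the factor 0.1 are hypothetical and are not taken from the paper.

```python
def left_right_check(disp_left, disp_right, tol=0):
    """Flag left-image pixels whose disparity agrees with the right image.

    A left pixel x with disparity d corresponds to right pixel x - d.
    The pixel is valid when that right pixel exists and its disparity
    differs from d by at most tol.
    """
    valid = []
    for x, d in enumerate(disp_left):
        xr = x - d
        ok = 0 <= xr < len(disp_right) and abs(d - disp_right[xr]) <= tol
        valid.append(ok)
    return valid


def down_weight(weights, valid, factor=0.1):
    """Shrink the support weights of points that failed the check.

    Assumption: a constant multiplicative penalty; the paper only says
    the weights of invalid points are noticeably reduced.
    """
    return [w if ok else w * factor for w, ok in zip(weights, valid)]


disp_left = [0, 1, 1, 3]
disp_right = [0, 1, 2, 0]
valid = left_right_check(disp_left, disp_right)
print(valid)
print(down_weight([1.0, 0.5, 0.8, 0.4], valid))
```

Pixels that fail the check are typically occluded or mismatched, so reducing their weight keeps them from contaminating the aggregated cost of their neighbors.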


Matching Large Baseline Oblique Stereo Images Using an End-to-End Convolutional Neural Network

Remote Sensing ◽

10.3390/rs13020274 ◽

2021 ◽

Vol 13 (2) ◽

pp. 274

Author(s):

Guobiao Yao ◽

Alper Yilmaz ◽

Li Zhang ◽

Fei Meng ◽

Haibin Ai ◽

...

Keyword(s):

Neural Network ◽

Deep Learning ◽

Convolutional Neural Network ◽

Stereo Matching ◽

Least Square ◽

Affine Invariant ◽

Stereo Images ◽

Distance Ratio ◽

Matching Algorithm ◽

End To End

Existing stereo matching algorithms produce a large number of false-positive matches, or only a few true-positive matches, across oblique stereo images with a large baseline. This undesirable result is caused by the complex perspective deformation and radiometric distortion across the images. To address this problem, we propose a novel affine invariant feature matching algorithm with subpixel accuracy based on an end-to-end convolutional neural network (CNN). In our method, we adopt and modify a Hessian affine network, which we refer to as IHesAffNet, to obtain affine invariant Hessian regions using a deep learning framework. To improve the correlation between corresponding features, we introduce an empirical weighted loss function (EWLF) based on the negative samples using K nearest neighbors, and then generate highly discriminative deep learning-based descriptors with our multiple hard network structure (MTHardNets). Following this step, the conjugate features are produced by using the Euclidean distance ratio as the matching metric, and the accuracy of the matches is optimized through deep learning transform based least square matching (DLT-LSM). Finally, experiments on large-baseline oblique stereo images acquired by ground close-range and unmanned aerial vehicle (UAV) platforms verify the effectiveness of the proposed approach, and comprehensive comparisons demonstrate that our matching algorithm outperforms the state-of-the-art methods in terms of accuracy, distribution and correct ratio. The main contributions of this article are: (i) our proposed MTHardNets can generate high-quality descriptors; and (ii) the IHesAffNet can produce substantial affine invariant corresponding features with reliable transform parameters.
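The matching step that uses the Euclidean distance ratio can be illustrated with the familiar nearest-to-second-nearest ratio test. The sketch below is a generic version, assuming descriptors are plain tuples of floats and an acceptance threshold of 0.8; the function name ratio_match and the threshold are hypothetical, and the abstract does not give the exact criterion used in the paper.

```python
import math


def ratio_match(desc_a, desc_b, max_ratio=0.8):
    """Match descriptors in desc_a to desc_b with a distance ratio test.

    For each descriptor in desc_a, find its nearest and second-nearest
    neighbors in desc_b by Euclidean distance. Keep the match only when
    nearest / second_nearest is below max_ratio, i.e. the best candidate
    is clearly better than the runner-up.
    """
    matches = []
    for i, a in enumerate(desc_a):
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(desc_b))
        if len(dists) < 2:
            continue  # the ratio is undefined with fewer than two candidates
        (d1, j1), (d2, _) = dists[0], dists[1]
        if d2 > 0 and d1 / d2 < max_ratio:
            matches.append((i, j1))
    return matches


desc_a = [(0.0, 0.0), (5.0, 5.0)]
desc_b = [(0.0, 1.0), (10.0, 10.0), (5.0, 6.0), (5.0, 4.0)]
# The first descriptor has one clear nearest neighbor and is matched.
# The second has two equally close candidates and is rejected as ambiguous.
print(ratio_match(desc_a, desc_b))
```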


Mining discriminative patches for script identification in natural scene images

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-200260 ◽

2021 ◽

Vol 40 (1) ◽

pp. 551-563

Author(s):

Liqiong Lu ◽

Dong Wu ◽

Ziwei Tang ◽

Yaohua Yi ◽

Faliang Huang

Keyword(s):

Neural Networks ◽

Experimental Results ◽

The Other ◽

Natural Scene ◽

Fixed Size ◽

Script Identification ◽

Aspect Ratios ◽

Novel Approach ◽

Public Datasets ◽

Natural Scene Images

This paper focuses on script identification in natural scene images. Traditional CNNs (Convolutional Neural Networks) cannot solve this problem perfectly for two reasons. First, the arbitrary aspect ratios of scene images cause difficulty for traditional CNNs, which take a fixed-size image as input. Second, some scripts with minor differences are easily confused because they share a subset of characters with the same shapes. We propose a novel approach combining Score CNN, Attention CNN and patches. Attention CNN is used to determine whether a patch is a discriminative patch and to calculate the contribution weight of the discriminative patch to the script identification of the whole image. Score CNN takes a discriminative patch as input and predicts the score of each script type. First, patches of the same size are extracted from the scene images. Second, these patches are used as inputs to Score CNN and Attention CNN to train two patch-level classifiers. Finally, the results of multiple discriminative patches extracted from the same image via these two classifiers are fused to obtain the script type of the image. Using patches of the same size as inputs to the CNNs avoids the problems caused by the arbitrary aspect ratios of scene images. The trained classifiers can mine discriminative patches to accurately identify some confusing scripts. The experimental results show the good performance of our approach on four public datasets.
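The final fusion step can be pictured as a weighted vote over patches. The sketch below assumes that each patch already has a per-script score vector (standing in for Score CNN output) and a contribution weight (standing in for Attention CNN output), that patches below a weight threshold are treated as non-discriminative, and that fusion is a weighted sum. The abstract does not specify the fusion rule, so the rule, the threshold 0.5, and the name fuse_patch_scores are all assumptions.

```python
def fuse_patch_scores(patch_scores, patch_weights, min_weight=0.5):
    """Fuse patch-level predictions into one image-level script label.

    patch_scores[i][k] is the score of script k for patch i.
    patch_weights[i] is the contribution weight of patch i.
    Patches with weight below min_weight are skipped as non-discriminative.
    Returns (index of the winning script, fused score per script).
    """
    n_classes = len(patch_scores[0])
    fused = [0.0] * n_classes
    for scores, weight in zip(patch_scores, patch_weights):
        if weight < min_weight:
            continue
        for k, score in enumerate(scores):
            fused[k] += weight * score
    label = max(range(n_classes), key=fused.__getitem__)
    return label, fused


scores = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]
weights = [0.9, 0.6, 0.1]
label, fused = fuse_patch_scores(scores, weights)
# The third patch is ignored; the confident first patch decides the label.
print(label, fused)
```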


Multi-Sensor Fusion for Aerial Robots in Industrial GNSS-Denied Environments

Applied Sciences ◽

10.3390/app11093921 ◽

2021 ◽

Vol 11 (9) ◽

pp. 3921

Author(s):

Paloma Carrasco ◽

Francisco Cuesta ◽

Rafael Caballero ◽

Francisco J. Perez-Grau ◽

Antidio Viguria

Keyword(s):

Sensor Fusion ◽

Computational Efficiency ◽

Probabilistic Approach ◽

Laser Scanner ◽

Industrial Applications ◽

Experimental Results ◽

Added Value ◽

Aerial Robots ◽

3D Localization ◽

High Level

The use of unmanned aerial robots has increased exponentially in recent years, and the relevance of industrial applications in environments with degraded satellite signals is rising. This article presents a solution for the 3D localization of aerial robots in such environments. In order to truly use these versatile platforms for added-value cases in these scenarios, a high level of reliability is required. Hence, the proposed solution is based on a probabilistic approach that makes use of a 3D laser scanner, radio sensors, a previously built map of the environment and input odometry, to obtain pose estimations that are computed onboard the aerial platform. Experimental results show the feasibility of the approach in terms of accuracy, robustness and computational efficiency.


A Low-complexity End-to-end Stereo Matching Pipeline from Raw Bayer Pattern Images to Disparity Maps

IEEE Access ◽

10.1109/access.2021.3068497 ◽

2021 ◽

pp. 1-1

Author(s):

Shengyu Gao ◽

Hongyu Wang ◽

Xin Lou

Keyword(s):

Stereo Matching ◽

Low Complexity ◽

Bayer Pattern ◽

Disparity Maps ◽

End To End


A Joint 2D-3D Complementary Network for Stereo Matching

Sensors ◽

10.3390/s21041430 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1430

Author(s):

Xiaogang Jia ◽

Wei Chen ◽

Zhengfa Liang ◽

Xin Luo ◽

Mingfei Wu ◽

...

Keyword(s):

Stereo Matching ◽

Computational Cost ◽

Research Field ◽

Disparity Map ◽

Improve Performance ◽

Cost Aggregation ◽

Disparity Range ◽

Public Datasets ◽

Coarse To Fine ◽

Speed And Accuracy

Stereo matching is an important research field of computer vision. Because of the dimensionality of cost aggregation, current neural network-based stereo methods struggle to balance speed and accuracy. To this end, we integrate fast 2D stereo methods with accurate 3D networks to improve performance and reduce running time. We leverage a 2D encoder-decoder network to generate a rough disparity map and construct a disparity range to guide the 3D aggregation network, which can significantly improve the accuracy and reduce the computational cost. We use a stacked hourglass structure to refine the disparity from coarse to fine. We evaluated our method on three public datasets. According to the results on the official KITTI website, our network can generate an accurate result in 80 ms on a modern GPU. Compared with other 2D stereo networks (AANet, DeepPruner, FADNet, etc.), our network is considerably more accurate. It is also significantly faster than other 3D stereo networks (5× faster than PSMNet, 7.5× faster than CSN and 22.5× faster than GANet, etc.), demonstrating the effectiveness of our method.
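The idea of using a rough disparity map to construct a disparity range can be sketched per pixel: rather than evaluating every hypothesis from 0 to the maximum disparity, the 3D stage only looks at a small window around the coarse estimate. The symmetric window, the radius of 2, the maximum disparity of 191, and the name disparity_candidates are illustrative assumptions, not values from the paper.

```python
def disparity_candidates(coarse, radius=2, max_disp=191):
    """Return the integer disparity hypotheses to evaluate for one pixel.

    coarse is the rough disparity from the fast 2D stage. The candidates
    form a window of +/- radius around it, clipped to [0, max_disp].
    """
    center = round(coarse)
    lo = max(0, center - radius)
    hi = min(max_disp, center + radius)
    return list(range(lo, hi + 1))


# Five hypotheses instead of the full range of 192.
print(disparity_candidates(10.4))
# The window is clipped at both ends of the valid range.
print(disparity_candidates(0.6))
print(disparity_candidates(191.0))
```

Under these assumptions the cost volume shrinks from max_disp + 1 hypotheses per pixel to at most 2 * radius + 1, which is the kind of reduction that makes a 3D aggregation stage cheaper.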


Experimental and Analytical Decentralized Adaptive Control of a 7-DOF Robot Manipulator

Volume 1: Adaptive/Intelligent Sys. Control; Driver Assistance/Autonomous Tech.; Control Design Methods; Nonlinear Control; Robotics; Assistive/Rehabilitation Devices; Biomedical/Neural Systems; Building Energy Systems; Connected Vehicle Systems; Control/Estimation of Energy Systems; Control Apps.; Smart Buildings/Microgrids; Education; Human-Robot Systems; Soft Mechatronics/Robotic Components/Systems; Energy/Power Systems; Energy Storage; Estimation/Identification; Vehicle Efficiency/Emissions ◽

10.1115/dscc2020-3181 ◽

2020 ◽

Author(s):

Alexander Bertino ◽

Peiman Naseradinmousavi ◽

Atul Kelkar

Keyword(s):

Adaptive Control ◽

Control Strategy ◽

Computational Efficiency ◽

Tracking Control ◽

Robot Manipulator ◽

Problem Formulation ◽

Experimental Results ◽

Model Free ◽

Trajectory Simulation ◽

Adaptive Control Strategy

In this paper, we study the analytical and experimental control of a 7-DOF robot manipulator. A model-free decentralized adaptive control strategy is presented for the tracking control of the manipulator. The problem formulation and experimental results demonstrate the computational efficiency and simplicity of the proposed method. The results presented here are among the first known experiments on a redundant 7-DOF robot. The efficacy of the adaptive decentralized controller is demonstrated experimentally by using the Baxter robot to track a desired trajectory. Simulation and experimental results clearly demonstrate the versatility, tracking performance, and computational efficiency of this method.


Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models

Journal of Cheminformatics ◽

10.1186/s13321-020-00479-8 ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Dejun Jiang ◽

Zhenxing Wu ◽

Chang-Yu Hsieh ◽

Guangyong Chen ◽

Ben Liao ◽

...

Keyword(s):

Neural Networks ◽

Computational Efficiency ◽

Domain Knowledge ◽

Prediction Models ◽

Computational Cost ◽

Large Dataset ◽

Predictive Capacity ◽

Classification Tasks ◽

Graph Neural Networks ◽

Public Datasets

Graph neural networks (GNNs) have been considered an attractive modelling method for molecular property prediction, and numerous studies have shown that GNNs can yield more promising results than traditional descriptor-based methods. In this study, based on 11 public datasets covering various property endpoints, the predictive capacity and computational efficiency of the prediction models developed by eight machine learning (ML) algorithms, including four descriptor-based models (SVM, XGBoost, RF and DNN) and four graph-based models (GCN, GAT, MPNN and Attentive FP), were extensively tested and compared. The results demonstrate that on average the descriptor-based models outperform the graph-based models in terms of prediction accuracy and computational efficiency. SVM generally achieves the best predictions for the regression tasks. Both RF and XGBoost can achieve reliable predictions for the classification tasks, and some of the graph-based models, such as Attentive FP and GCN, can yield outstanding performance for a fraction of larger or multi-task datasets. In terms of computational cost, XGBoost and RF are the two most efficient algorithms and need only a few seconds to train a model even for a large dataset. The model interpretations by the SHAP method can effectively explore the established domain knowledge for the descriptor-based models. Finally, we explored the use of these models for virtual screening (VS) towards HIV and demonstrated that different ML algorithms offer diverse VS profiles. All in all, we believe that the off-the-shelf descriptor-based models can still be directly employed to accurately predict various chemical endpoints with excellent computability and interpretability.


Adversarial Deep Structural Networks for Mammographic Mass Segmentation

10.1101/095786 ◽

2016 ◽

Cited By ~ 13

Author(s):

Wentao Zhu ◽

Xiaohui Xie

Keyword(s):

Conditional Random Fields ◽

Model Potential ◽

Natural Image ◽

Structural Learning ◽

Mass Detection ◽

Convolutional Network ◽

Adversarial Training ◽

Mass Segmentation ◽

End To End ◽

Public Datasets

Mass segmentation is an important task in mammogram analysis, providing effective morphological features and regions of interest (ROI) for mass detection and classification. Inspired by the success of using deep convolutional features for natural image analysis and conditional random fields (CRF) for structural learning, we propose an end-to-end network for mammographic mass segmentation. The network employs a fully convolutional network (FCN) to model the potential function, followed by a CRF to perform structural learning. Because the mass distribution varies greatly with pixel position, the FCN is combined with a position prior for the task. Because mammogram datasets are small, we use adversarial training to control over-fitting. Four models with different convolutional kernels are further fused to improve the segmentation results. Experimental results on two public datasets, INbreast and DDSM-BCRP, show that our end-to-end network combined with adversarial training achieves state-of-the-art results.


Towards High-Level Intrinsic Exploration in Reinforcement Learning

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/733 ◽

2020 ◽

Author(s):

Nicolas Bougie ◽

Ryutaro Ichise

Keyword(s):

Reinforcement Learning ◽

Time Horizon ◽

State Of The Art ◽

Experimental Results ◽

Prior Work ◽

Extrinsic Rewards ◽

Intrinsic Reward ◽

Long Time ◽

End To End ◽

High Level

Deep reinforcement learning (DRL) methods traditionally struggle with tasks where environment rewards are sparse or delayed, so exploration remains one of the key challenges of DRL. Instead of relying solely on extrinsic rewards, many state-of-the-art methods use intrinsic curiosity as an exploration signal. While they hold the promise of better local exploration, discovering global exploration strategies is beyond the reach of current methods. We propose a novel end-to-end intrinsic reward formulation that introduces high-level exploration in reinforcement learning. Our curiosity signal is driven by a fast reward that deals with local exploration and a slow reward that incentivizes long-time horizon exploration strategies. We formulate curiosity as the error in an agent’s ability to reconstruct the observations given their contexts. Experimental results show that this high-level exploration enables our agents to outperform prior work in several Atari games.
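To make the idea of curiosity as a reconstruction error concrete, the sketch below scores an observation by the mean squared error between it and a reconstruction, then mixes a fast and a slow term. The abstract does not say how the two rewards are computed or combined, so the use of mean squared error, the two separate reconstructions, the linear mix, and the names reconstruction_error and intrinsic_reward are all assumptions made for illustration.

```python
def reconstruction_error(observation, reconstruction):
    """Mean squared error between an observation and its reconstruction."""
    squared = [(o - r) ** 2 for o, r in zip(observation, reconstruction)]
    return sum(squared) / len(observation)


def intrinsic_reward(observation, fast_recon, slow_recon,
                     fast_weight=0.5, slow_weight=0.5):
    """Combine a fast and a slow curiosity term into one intrinsic reward.

    Assumption: each term is the reconstruction error of a separate model
    (one capturing local context, one capturing long-horizon context) and
    the two are mixed linearly. Poorly reconstructed observations are
    treated as novel and rewarded more.
    """
    fast = reconstruction_error(observation, fast_recon)
    slow = reconstruction_error(observation, slow_recon)
    return fast_weight * fast + slow_weight * slow


obs = [1.0, 0.0]
# The fast model reconstructs the observation perfectly, the slow one does not,
# so only the slow term contributes to the reward.
print(intrinsic_reward(obs, fast_recon=[1.0, 0.0], slow_recon=[0.0, 0.0]))
```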
