Scene Graph Generation via Convolutional Message Passing and Class-Aware Memory Embeddings

Author(s):  
Yidong Zhang ◽  
Yunhong Wang ◽  
Yuanfang Guo

2020 ◽
Vol 34 (07) ◽  
pp. 11237-11245
Author(s):  
Mahmoud Khademi ◽  
Oliver Schulte

We propose a new algorithm, called Deep Generative Probabilistic Graph Neural Networks (DG-PGNN), to generate a scene graph for an image. The input to DG-PGNN is an image together with a set of region-grounded captions and object bounding-box proposals for the image. To generate the scene graph, DG-PGNN constructs and updates a new model, called a Probabilistic Graph Network (PGN). A PGN can be thought of as a scene graph with uncertainty: it represents each node and each edge by a CNN feature vector, and it defines a probability mass function (PMF) over the node type (object category) of each node and the edge type (predicate class) of each edge. DG-PGNN sequentially adds a new node to the current PGN, learning the optimal ordering in a deep Q-learning framework in which states are partial PGNs, actions choose a new node, and rewards are defined with respect to the ground truth. After adding a node, DG-PGNN uses message passing to update the feature vectors of the current PGN by leveraging contextual relationship information, object co-occurrences, and language priors from captions. The updated features are then used to fine-tune the PMFs. Our experiments show that the proposed algorithm significantly outperforms the state of the art on the Visual Genome dataset for scene graph generation. We also show that the scene graphs constructed by DG-PGNN improve performance on visual question answering, for questions that require reasoning about objects and their interactions in the scene context.
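
To make the PGN idea concrete, here is a minimal Python sketch of a graph that holds a feature vector per node and edge and reads off class PMFs via softmax heads, with a toy mean-context message-passing step. All class names, shapes, and update rules are illustrative assumptions; the paper's method additionally uses a learned Q-function for node selection, caption priors, and trained update functions.

import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class PGN:
    """Toy Probabilistic Graph Network: feature vectors plus class PMFs (a sketch)."""
    def __init__(self, feat_dim, n_obj_classes, n_pred_classes, seed=0):
        rng = np.random.default_rng(seed)
        # Illustrative linear heads mapping features to class logits.
        self.W_node = rng.normal(scale=0.1, size=(feat_dim, n_obj_classes))
        self.W_edge = rng.normal(scale=0.1, size=(feat_dim, n_pred_classes))
        self.node_feats = []   # one CNN feature vector per node
        self.edge_feats = {}   # (i, j) -> edge feature vector

    def add_node(self, cnn_feat):
        # In the paper, the next node is chosen by a learned Q-function;
        # here nodes are simply appended and fully connected.
        j = len(self.node_feats)
        self.node_feats.append(cnn_feat)
        for i in range(j):
            self.edge_feats[(i, j)] = 0.5 * (self.node_feats[i] + cnn_feat)
        return j

    def message_pass(self, steps=2):
        # Crude context propagation: blend each node with the mean of all
        # nodes, then refresh edge features from their endpoints.
        for _ in range(steps):
            mean_ctx = np.stack(self.node_feats).mean(axis=0)
            self.node_feats = [0.5 * f + 0.5 * mean_ctx for f in self.node_feats]
            for (i, j) in self.edge_feats:
                self.edge_feats[(i, j)] = 0.5 * (self.node_feats[i] + self.node_feats[j])

    def pmfs(self):
        # PMFs over object categories (nodes) and predicate classes (edges).
        node_pmf = softmax(np.stack(self.node_feats) @ self.W_node)
        edge_pmf = {e: softmax(f @ self.W_edge) for e, f in self.edge_feats.items()}
        return node_pmf, edge_pmf

# Toy usage: add two region proposals, propagate context, read off PMFs.
pgn = PGN(feat_dim=8, n_obj_classes=5, n_pred_classes=3)
pgn.add_node(np.ones(8))
pgn.add_node(np.zeros(8))
pgn.message_pass()
node_pmf, edge_pmf = pgn.pmfs()
print(node_pmf.shape, edge_pmf[(0, 1)].shape)  # (2, 5) (3,)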


2020 ◽  
Vol 14 (7) ◽  
pp. 546-553
Author(s):  
Zhenxing Zheng ◽  
Zhendong Li ◽  
Gaoyun An ◽  
Songhe Feng

2021 ◽  
Vol 10 (7) ◽  
pp. 488
Author(s):  
Peng Li ◽  
Dezheng Zhang ◽  
Aziguli Wulamu ◽  
Xin Liu ◽  
Peng Chen

A deep understanding of our visual world involves more than the isolated perception of individual objects; the relationships between objects also carry rich semantic information. This is especially true for satellite remote sensing images, whose large spatial extent yields objects of widely varying sizes in complex spatial compositions. Recognizing semantic relations therefore strengthens the understanding of remote sensing scenes. In this paper, we propose a novel multi-scale semantic fusion network (MSFN). In this framework, dilated convolution is introduced into a graph convolutional network (GCN), based on an attention mechanism, to fuse and refine multi-scale semantic context, which is crucial for strengthening the cognitive ability of our model. In addition, based on the mapping between visual features and semantic embeddings, we design a sparse relationship extraction module that removes meaningless connections among entities and improves the efficiency of scene graph generation. To further promote research on scene understanding in the remote sensing field, we also propose a remote sensing scene graph dataset (RSSGD). We carry out extensive experiments, and the results show that our model significantly outperforms previous methods on scene graph generation. In addition, RSSGD effectively bridges the large semantic gap between low-level perception and high-level cognition of remote sensing images.
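
To illustrate the multi-scale fusion step, the sketch below runs parallel dilated 3x3 convolutions and blends their outputs with per-location attention weights. It is a minimal PyTorch illustration under assumed module names and dimensions, not the authors' MSFN implementation, and it omits the GCN and the sparse relationship extraction module.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleContext(nn.Module):
    """Hypothetical attention-weighted fusion of dilated-convolution branches."""
    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        # One dilated 3x3 branch per scale; padding=dilation keeps H and W fixed.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )
        # Attention scores: one scalar weight per branch per spatial location.
        self.attn = nn.Conv2d(channels, len(dilations), 1)

    def forward(self, x):
        feats = torch.stack([b(x) for b in self.branches], dim=1)  # (B, S, C, H, W)
        w = F.softmax(self.attn(x), dim=1).unsqueeze(2)            # (B, S, 1, H, W)
        return (w * feats).sum(dim=1)                              # (B, C, H, W)

# Toy usage on a batch of region feature maps.
fused = MultiScaleContext(channels=64)(torch.randn(2, 64, 16, 16))
print(fused.shape)  # torch.Size([2, 64, 16, 16])

The softmax over the branch axis lets each spatial location favor a different effective receptive field, matching the abstract's motivation that objects in remote sensing images vary widely in size.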


Author(s):  
Yuyu Guo ◽  
Jingkuan Song ◽  
Lianli Gao ◽  
Heng Tao Shen

2021 ◽  
Author(s):  
Zhichao Zhang ◽  
Junyu Dong ◽  
Qilu Zhao ◽  
Lin Qi ◽  
Shu Zhang

Author(s):  
Yikang Li ◽  
Wanli Ouyang ◽  
Bolei Zhou ◽  
Jianping Shi ◽  
Chao Zhang ◽  
...  
