Georeferencing Flickr Resources Based on Multimodal Features

Author(s):  
Pascal Kelm ◽  
Sebastian Schmiedeke ◽  
Steven Schockaert ◽  
Thomas Sikora ◽  
Michele Trevisiol ◽  
...  
Keyword(s):  
Author(s):  
Felipe Ferreira ◽  
Daniele R. Souza ◽  
Igor Moura ◽  
Matheus Barbieri ◽  
Hélio C. V. Lopes
Keyword(s):  

2021 ◽  
Vol 11 (3) ◽  
pp. 1064
Author(s):  
Jenq-Haur Wang ◽  
Yen-Tsang Wu ◽  
Long Wang

In social networks, users can easily share information and express their opinions. Given the huge amount of data posted by many users, it is difficult to search for relevant information. In addition to individual posts, it would be useful if we could recommend groups of people with similar interests. Past studies on user preference learning focused on single-modal features such as review contents or demographic information of users. However, such information is usually not easy to obtain in most social media without explicit user feedback. In this paper, we propose a multimodal feature fusion approach to implicit user preference prediction, which combines text and image features from user posts to recommend similar users in social media. First, we use convolutional neural network (CNN) and TextCNN models to extract image and text features, respectively. Then, these features are combined using early and late fusion methods as a representation of user preferences. Lastly, a list of users with the most similar preferences is recommended. The experimental results on real-world Instagram data show that the best performance is achieved with late fusion of the individual classification results for images and texts, with a best average top-k accuracy of 0.491. This validates the effectiveness of deep learning methods for fusing multimodal features to represent social user preferences. Further investigation is needed to verify the performance on other types of social media.
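As a rough illustration of the late-fusion and recommendation steps described above, the sketch below (not the paper's code) averages per-post class probabilities from an image model and a text model, pools them into a per-user preference vector, and ranks candidate users by cosine similarity. The array shapes, class count, fusion weights, and function names are illustrative assumptions standing in for the actual CNN/TextCNN outputs.

```python
# Minimal late-fusion sketch: combine per-post class probabilities from two
# modalities, pool them into a user preference vector, and recommend the
# top-k most similar users. All data here is random toy data.
import numpy as np

def late_fusion(image_probs, text_probs, w_image=0.5, w_text=0.5):
    """Weighted average of per-post class probabilities from the two modalities."""
    return w_image * np.asarray(image_probs) + w_text * np.asarray(text_probs)

def user_preference(post_probs):
    """Pool fused per-post probabilities into a single preference vector per user."""
    return np.mean(post_probs, axis=0)

def recommend_top_k(query_vec, candidate_vecs, k=5):
    """Rank candidate users by cosine similarity to the query user."""
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    sims = c @ q
    return np.argsort(-sims)[:k], sims

# Toy usage: random "model outputs" for 3 posts over 4 interest classes.
rng = np.random.default_rng(0)
img_p = rng.dirichlet(np.ones(4), size=3)
txt_p = rng.dirichlet(np.ones(4), size=3)
query = user_preference(late_fusion(img_p, txt_p))
candidates = rng.dirichlet(np.ones(4), size=10)   # 10 candidate users
top_idx, sims = recommend_top_k(query, candidates, k=3)
print(top_idx, sims[top_idx])
```

In the late-fusion setting the paper reports as best, the per-modality decisions are combined after classification, as above, rather than concatenating raw features before it.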


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ying Li ◽  
Hang Sun ◽  
Shiyao Feng ◽  
Qi Zhang ◽  
Siyu Han ◽  
...  

Abstract
Background: Long noncoding RNAs (lncRNAs) play important roles in multiple biological processes. Identifying lncRNA–protein interactions (LPIs) is key to understanding lncRNA functions. Although some computational methods for LPI prediction have been developed, the problem remains challenging. How to integrate multimodal features from more perspectives and build deep learning architectures with better recognition performance has always been a focus of LPI research.
Results: We present Capsule-LPI, a novel multichannel capsule network framework that integrates multimodal features for LPI prediction. Capsule-LPI integrates four groups of multimodal features: sequence features, motif information, physicochemical properties, and secondary structure features. It is composed of four feature-learning subnetworks and one capsule subnetwork. Through comprehensive experimental comparisons and evaluations, we demonstrate that both the multimodal features and the multichannel capsule network architecture significantly improve the performance of LPI prediction. The experimental results show that Capsule-LPI outperforms existing state-of-the-art tools: its precision is 87.3% (a 1.7% improvement) and its F-value is 92.2% (a 1.4% improvement).
Conclusions: This study provides a novel and feasible LPI prediction tool based on the integration of multimodal features and a capsule network. A web server (http://csbg-jlu.site/lpc/predict) is provided for convenient use.
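The multichannel design (one feature-learning subnetwork per feature group, followed by a shared decision stage) can be illustrated with a short sketch. The PyTorch code below is an assumption for illustration only, not the authors' implementation: the capsule subnetwork is replaced here by a plain fully connected head for brevity, and all layer sizes and feature-group dimensions are made up.

```python
# Multichannel fusion sketch: four feature groups each pass through their own
# small subnetwork; the outputs are concatenated and fed to a final head that
# predicts interaction vs. no interaction. Illustrative only.
import torch
import torch.nn as nn

class MultiChannelLPI(nn.Module):
    def __init__(self, dims, hidden=64):
        super().__init__()
        # One feature-learning subnetwork per feature group
        # (sequence, motif, physicochemical, secondary structure).
        self.channels = nn.ModuleList(
            [nn.Sequential(nn.Linear(d, hidden), nn.ReLU()) for d in dims]
        )
        # Stand-in for the capsule subnetwork: a plain fully connected head.
        self.head = nn.Sequential(
            nn.Linear(hidden * len(dims), hidden), nn.ReLU(),
            nn.Linear(hidden, 1)  # interaction logit
        )

    def forward(self, features):
        # `features` is a list of tensors, one per feature group.
        fused = torch.cat(
            [net(x) for net, x in zip(self.channels, features)], dim=-1
        )
        return self.head(fused)

# Toy usage with random feature vectors for a batch of 8 lncRNA-protein pairs.
dims = [128, 32, 16, 64]            # made-up feature-group sizes
model = MultiChannelLPI(dims)
batch = [torch.randn(8, d) for d in dims]
logits = model(batch)
print(torch.sigmoid(logits).shape)  # (8, 1) interaction probabilities
```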


Author(s):  
Yuka Akiyama

This chapter examines the effects of lexical categories on Focus on Form (FonF) and on the use of the multimodal features of Skype for preemptive and reactive Language-Related Episodes (LREs) in a task-based language exchange via Skype (i.e. telecollaboration). Twelve pairs of Japanese-as-a-foreign-language learners and native speakers of Japanese engaged in two decision-making tasks. Each task prompt included target vocabulary of different lexical categories (nouns or onomatopoeia) that participants had to negotiate for task completion. The quantitative analysis of oral interaction revealed a significant effect of lexical categories on the total number and linguistic focus (i.e. morphological, lexical, and phonological items) of preemptive LREs, as well as on the correction method, linguistic focus, and uptake rate of reactive LREs. The analysis of multimodal interaction revealed that participants often used text chat, images, and webcams to carry out telecollaborative interaction and that the lexical categories affected which of these multimodal features of Skype were used for FonF.


2020 ◽  
Vol 2020 ◽  
pp. 1-18
Author(s):  
Chao Tang ◽  
Huosheng Hu ◽  
Wenjian Wang ◽  
Wei Li ◽  
Hua Peng ◽  
...  

The representation and selection of action features directly affect the performance of human action recognition methods. A single feature is often affected by human appearance, environment, camera settings, and other factors. To address the problem that existing multimodal feature fusion methods cannot effectively measure the contribution of different features, this paper proposes a human action recognition method based on RGB-D image features, which makes full use of the multimodal information provided by RGB-D sensors to extract effective human action features. Three kinds of human action features carrying different modal information are proposed: the RGB-HOG feature, based on RGB image information, which has good geometric scale invariance; the D-STIP feature, based on depth images, which preserves the dynamic characteristics of human motion and has local invariance; and the S-JRPF feature, based on skeleton information, which describes the spatial structure of motion well. In addition, multiple K-nearest neighbor classifiers with good generalization ability are combined for decision-level classification. The experimental results show that the algorithm achieves strong recognition results on the public G3D and CAD60 datasets.
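The decision-level fusion of per-modality classifiers can be sketched as follows. This is an illustrative assumption, not the authors' code: one K-nearest neighbor classifier is trained per feature modality (standing in for RGB-HOG, D-STIP, and S-JRPF), and their predicted class probabilities are averaged to produce the final decision. Random vectors replace the actual feature extraction, and all dimensions and sizes are made up.

```python
# Decision-fusion sketch: train one KNN classifier per feature modality and
# average the predicted class probabilities across modalities. Toy data only.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_train, n_test, n_classes = 120, 20, 4
dims = {"rgb_hog": 96, "d_stip": 64, "s_jrpf": 45}   # made-up feature sizes

# Balanced toy labels so every class is present in training.
y_train = np.repeat(np.arange(n_classes), n_train // n_classes)
y_test = rng.integers(0, n_classes, n_test)

probas = []
for name, d in dims.items():
    X_train = rng.normal(size=(n_train, d))   # stand-in for extracted features
    X_test = rng.normal(size=(n_test, d))
    clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    probas.append(clf.predict_proba(X_test))

# Decision-level fusion: average class probabilities across the modalities.
fused = np.mean(probas, axis=0)
y_pred = fused.argmax(axis=1)
print("fused accuracy on toy data:", (y_pred == y_test).mean())
```

A weighted average (with weights reflecting each modality's estimated contribution) would be a natural extension of this simple voting scheme.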


Author(s):  
Yuhong Yang

Abstract Although television interpreting serves the largest population of users, it remains underexplored by the academic community compared with conference or community interpreting, and there has been little systematic, in-depth analysis foregrounding, or tailored to, its salient multimodal features. Drawing on Kress and van Leeuwen's multimodal social-semiotic theory of communication as well as frameworks established in nonverbal communication and audiovisual translation, this paper moves away from traditional language-based discussions of interpreter-mediated television events and attempts to gain new insights into this essentially multimodal communicative practice through multimodal analysis of data. The paper aims to test a tentative framework of the modal relations of "complementarity", "dependency", and "incongruity" at work in interpreted television events, using authentic data, amounting to a total length of five hours, recorded from live news programmes on Chinese TV. The findings on modal complementarity and dependency clearly point to the essentially multimodal meaning-making mechanism of the semiotic ensemble, which the audience perceives in a gestalt fashion, and reveal the inadequacy of purely linguistic approaches to television interpreting.


2019 ◽  
Vol 71 (1) ◽  
pp. 29-42
Author(s):  
Hong Wang ◽  
Yong‐Qiang Song ◽  
Lu‐Tong Wang ◽  
Xiao‐Hong Hu
