Georeferencing Flickr Resources Based on Multimodal Features

Author(s):  
Pascal Kelm ◽  
Sebastian Schmiedeke ◽  
Steven Schockaert ◽  
Thomas Sikora ◽  
Michele Trevisiol ◽  
...  
Keyword(s):  
Author(s):  
Felipe Ferreira ◽  
Daniele R. Souza ◽  
Igor Moura ◽  
Matheus Barbieri ◽  
Hélio C. V. Lopes
Keyword(s):  

2021 ◽  
Vol 11 (3) ◽  
pp. 1064
Author(s):  
Jenq-Haur Wang ◽  
Yen-Tsang Wu ◽  
Long Wang

In social networks, users can easily share information and express their opinions. Given the huge amount of data posted by many users, it is difficult to search for relevant information. In addition to individual posts, it would be useful if we could recommend groups of people with similar interests. Past studies on user preference learning focused on single-modal features such as review contents or demographic information of users. However, such information is usually not easy to obtain in most social media without explicit user feedback. In this paper, we propose a multimodal feature fusion approach to implicit user preference prediction, which combines text and image features from user posts to recommend similar users in social media. First, we use convolutional neural network (CNN) and TextCNN models to extract image and text features, respectively. Then, these features are combined using early and late fusion methods as a representation of user preferences. Lastly, a list of users with the most similar preferences is recommended. The experimental results on real-world Instagram data show that the best performance is achieved with late fusion of the individual classification results for images and texts, with a best average top-k accuracy of 0.491. This validates the effectiveness of deep learning methods for fusing multimodal features to represent social user preferences. Further investigation is needed to verify the performance on other types of social media.
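As a rough illustration of the late-fusion and recommendation steps described above, the sketch below (not the paper's code) averages per-post class probabilities from an image model and a text model, pools them into a per-user preference vector, and ranks candidate users by cosine similarity. The array shapes, class count, fusion weights, and function names are illustrative assumptions standing in for the actual CNN/TextCNN outputs.

```python
# Minimal late-fusion sketch: combine per-post class probabilities from two
# modalities, pool them into a user preference vector, and recommend the
# top-k most similar users. All data here is random toy data.
import numpy as np

def late_fusion(image_probs, text_probs, w_image=0.5, w_text=0.5):
    """Weighted average of per-post class probabilities from the two modalities."""
    return w_image * np.asarray(image_probs) + w_text * np.asarray(text_probs)

def user_preference(post_probs):
    """Pool fused per-post probabilities into a single preference vector per user."""
    return np.mean(post_probs, axis=0)

def recommend_top_k(query_vec, candidate_vecs, k=5):
    """Rank candidate users by cosine similarity to the query user."""
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    sims = c @ q
    return np.argsort(-sims)[:k], sims

# Toy usage: random "model outputs" for 3 posts over 4 interest classes.
rng = np.random.default_rng(0)
img_p = rng.dirichlet(np.ones(4), size=3)
txt_p = rng.dirichlet(np.ones(4), size=3)
query = user_preference(late_fusion(img_p, txt_p))
candidates = rng.dirichlet(np.ones(4), size=10)   # 10 candidate users
top_idx, sims = recommend_top_k(query, candidates, k=3)
print(top_idx, sims[top_idx])
```

In the late-fusion setting the paper reports as best, the per-modality decisions are combined after classification, as above, rather than concatenating raw features before it.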


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ying Li ◽  
Hang Sun ◽  
Shiyao Feng ◽  
Qi Zhang ◽  
Siyu Han ◽  
...  

Abstract
Background: Long noncoding RNAs (lncRNAs) play important roles in multiple biological processes. Identifying lncRNA–protein interactions (LPIs) is key to understanding lncRNA functions. Although some computational methods for LPI prediction have been developed, the problem remains challenging. How to integrate multimodal features from more perspectives and build deep learning architectures with better recognition performance has always been a focus of LPI research.
Results: We present Capsule-LPI, a novel multichannel capsule network framework that integrates multimodal features for LPI prediction. Capsule-LPI integrates four groups of multimodal features: sequence features, motif information, physicochemical properties, and secondary structure features. It is composed of four feature-learning subnetworks and one capsule subnetwork. Through comprehensive experimental comparisons and evaluations, we demonstrate that both the multimodal features and the multichannel capsule network architecture significantly improve the performance of LPI prediction. The experimental results show that Capsule-LPI outperforms existing state-of-the-art tools: its precision is 87.3% (a 1.7% improvement) and its F-value is 92.2% (a 1.4% improvement).
Conclusions: This study provides a novel and feasible LPI prediction tool based on the integration of multimodal features and a capsule network. A web server (http://csbg-jlu.site/lpc/predict) is provided for convenient use.
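The multichannel design (one feature-learning subnetwork per feature group, followed by a shared decision stage) can be illustrated with a short sketch. The PyTorch code below is an assumption for illustration only, not the authors' implementation: the capsule subnetwork is replaced here by a plain fully connected head for brevity, and all layer sizes and feature-group dimensions are made up.

```python
# Multichannel fusion sketch: four feature groups each pass through their own
# small subnetwork; the outputs are concatenated and fed to a final head that
# predicts interaction vs. no interaction. Illustrative only.
import torch
import torch.nn as nn

class MultiChannelLPI(nn.Module):
    def __init__(self, dims, hidden=64):
        super().__init__()
        # One feature-learning subnetwork per feature group
        # (sequence, motif, physicochemical, secondary structure).
        self.channels = nn.ModuleList(
            [nn.Sequential(nn.Linear(d, hidden), nn.ReLU()) for d in dims]
        )
        # Stand-in for the capsule subnetwork: a plain fully connected head.
        self.head = nn.Sequential(
            nn.Linear(hidden * len(dims), hidden), nn.ReLU(),
            nn.Linear(hidden, 1)  # interaction logit
        )

    def forward(self, features):
        # `features` is a list of tensors, one per feature group.
        fused = torch.cat(
            [net(x) for net, x in zip(self.channels, features)], dim=-1
        )
        return self.head(fused)

# Toy usage with random feature vectors for a batch of 8 lncRNA-protein pairs.
dims = [128, 32, 16, 64]            # made-up feature-group sizes
model = MultiChannelLPI(dims)
batch = [torch.randn(8, d) for d in dims]
logits = model(batch)
print(torch.sigmoid(logits).shape)  # (8, 1) interaction probabilities
```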


Author(s):  
Yuka Akiyama

This chapter examines the effects of lexical categories on Focus on Form (FonF) and on the use of the multimodal features of Skype for preemptive and reactive Language-Related Episodes (LREs) in a task-based language exchange via Skype (i.e. telecollaboration). Twelve pairs of Japanese-as-a-foreign-language learners and native speakers of Japanese engaged in two decision-making tasks. Each task prompt included target vocabulary of different lexical categories (nouns or onomatopoeia) that participants had to negotiate for task completion. The quantitative analysis of oral interaction revealed a significant effect of lexical categories on the total number and linguistic focus (i.e. morphological, lexical, and phonological items) of preemptive LREs, as well as on the correction method, linguistic focus, and uptake rate of reactive LREs. The analysis of multimodal interaction revealed that participants often used text chat, images, and webcams to carry out telecollaborative interaction and that the lexical categories affected which of these multimodal features of Skype were used for FonF.


2020 ◽  
Vol 2020 ◽  
pp. 1-18
Author(s):  
Chao Tang ◽  
Huosheng Hu ◽  
Wenjian Wang ◽  
Wei Li ◽  
Hua Peng ◽  
...  

The representation and selection of action features directly affect the performance of human action recognition methods. A single feature is often affected by human appearance, environment, camera settings, and other factors. To address the problem that existing multimodal feature fusion methods cannot effectively measure the contribution of different features, this paper proposes a human action recognition method based on RGB-D image features, which makes full use of the multimodal information provided by RGB-D sensors to extract effective human action features. Three kinds of human action features carrying different modal information are proposed: the RGB-HOG feature, based on RGB image information, which has good geometric scale invariance; the D-STIP feature, based on depth images, which preserves the dynamic characteristics of human motion and has local invariance; and the S-JRPF feature, based on skeleton information, which describes the spatial structure of motion well. In addition, multiple K-nearest neighbor classifiers with good generalization ability are combined for decision-level classification. The experimental results show that the algorithm achieves strong recognition results on the public G3D and CAD60 datasets.
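The decision-level fusion of per-modality classifiers can be sketched as follows. This is an illustrative assumption, not the authors' code: one K-nearest neighbor classifier is trained per feature modality (standing in for RGB-HOG, D-STIP, and S-JRPF), and their predicted class probabilities are averaged to produce the final decision. Random vectors replace the actual feature extraction, and all dimensions and sizes are made up.

```python
# Decision-fusion sketch: train one KNN classifier per feature modality and
# average the predicted class probabilities across modalities. Toy data only.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_train, n_test, n_classes = 120, 20, 4
dims = {"rgb_hog": 96, "d_stip": 64, "s_jrpf": 45}   # made-up feature sizes

# Balanced toy labels so every class is present in training.
y_train = np.repeat(np.arange(n_classes), n_train // n_classes)
y_test = rng.integers(0, n_classes, n_test)

probas = []
for name, d in dims.items():
    X_train = rng.normal(size=(n_train, d))   # stand-in for extracted features
    X_test = rng.normal(size=(n_test, d))
    clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    probas.append(clf.predict_proba(X_test))

# Decision-level fusion: average class probabilities across the modalities.
fused = np.mean(probas, axis=0)
y_pred = fused.argmax(axis=1)
print("fused accuracy on toy data:", (y_pred == y_test).mean())
```

A weighted average (with weights reflecting each modality's estimated contribution) would be a natural extension of this simple voting scheme.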


Author(s):  
Yuhong Yang

Abstract Although television interpreting serves the largest population of users, it remains underexplored by the academic community compared with conference or community interpreting, and there has been little systematic, in-depth analysis foregrounding, or tailored to, its salient multimodal features. Drawing on Kress and van Leeuwen's multimodal social-semiotic theory of communication as well as frameworks established in nonverbal communication and audiovisual translation, this paper moves away from traditional language-based discussions of interpreter-mediated television events and attempts to gain new insights into this essentially multimodal communicative practice through multimodal analysis of data. The paper aims to test a tentative framework of the modal relations of "complementarity", "dependency", and "incongruity" at work in interpreted television events, using authentic data, amounting to a total length of five hours, recorded from live news programmes on Chinese TV. The findings on modal complementarity and dependency clearly point to the essentially multimodal meaning-making mechanism of the semiotic ensemble, which the audience perceives in a gestalt fashion, and reveal the inadequacy of purely linguistic approaches to television interpreting.


2019 ◽  
Vol 71 (1) ◽  
pp. 29-42
Author(s):  
Hong Wang ◽  
Yong‐Qiang Song ◽  
Lu‐Tong Wang ◽  
Xiao‐Hong Hu
