temporal segment
Recently Published Documents


TOTAL DOCUMENTS: 22 (FIVE YEARS: 11)

H-INDEX: 4 (FIVE YEARS: 1)

PLoS ONE ◽  
2021 ◽  
Vol 16 (12) ◽  
pp. e0261001
Author(s):  
Alexander Fischenich ◽  
Jan Hots ◽  
Jesko Verhey ◽  
Julia Guldan ◽  
Daniel Oberfeld

Loudness judgments of sounds varying in level across time show non-uniform temporal weighting, with increased weights assigned to the beginning of the sound (primacy effect). In addition, higher weights are observed for temporal components that are higher in level than the remaining components (loudness dominance). In three experiments, sounds consisting of 100- or 475-ms Gaussian wideband noise segments with random level variations were presented, and either none of the segments, the first segment, or a central segment was amplified or attenuated. In Experiment 1, the sounds consisted of four 100-ms segments separated by 500-ms gaps; previous experiments did not show a primacy effect in such a condition. In Experiment 2, sounds consisting of four or ten contiguous 100-ms segments were presented to examine the interaction between the primacy effect and level dominance. As expected, for the sounds with segments separated by gaps, no primacy effect was observed, but weights on amplified segments were increased and weights on attenuated segments were decreased. For the sounds with contiguous segments, a primacy effect as well as effects of relative level (similar to those in Experiment 1) were found. For attenuation, the data indicated no substantial interaction between the primacy effect and loudness dominance, whereas for amplification an interaction was present. In Experiment 3, sounds consisting of either four contiguous 100-ms segments, four contiguous 475-ms segments, or four 100-ms segments separated by 500-ms gaps were presented. Effects of relative level were more pronounced for the contiguous sounds, and across all three experiments they were more pronounced for attenuation than for amplification. In addition, the effects of relative level depended on the position of the level change, in the opposite direction for attenuation compared to amplification. Some of the results are in accordance with explanations based on masking effects on auditory intensity resolution.
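Temporal weights of the kind reported here are typically estimated by relating the trial-by-trial level fluctuations of the individual segments to the listener's binary loudness judgments, for example with a logistic regression. The following Python sketch illustrates that style of analysis on simulated data; the two-interval task it assumes and the variable names (level_diffs, responses) are illustrative placeholders, not the study's actual procedure or data.

```python
# Minimal sketch: estimating temporal perceptual weights from trial-by-trial
# data, assuming a two-interval loudness comparison task. 'level_diffs'
# (trials x segments) holds the per-segment level differences between the two
# sounds on each trial; 'responses' holds the binary judgment
# (True = second sound judged louder). Both are simulated here.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_trials, n_segments = 1000, 4
level_diffs = rng.normal(0.0, 2.5, size=(n_trials, n_segments))  # in dB

# Simulated listener with a primacy effect: earlier segments weighted more.
true_weights = np.array([1.0, 0.7, 0.5, 0.4])
p_louder = 1.0 / (1.0 + np.exp(-0.5 * (level_diffs @ true_weights)))
responses = rng.random(n_trials) < p_louder

# The regression coefficients serve as estimates of the temporal weights.
model = LogisticRegression().fit(level_diffs, responses)
weights = model.coef_.ravel()
weights /= weights.sum()  # normalize so the weights sum to 1
print("estimated temporal weights:", np.round(weights, 2))
```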


Author(s):  
Siyu Jiang ◽  
Guobin Wu

In this paper, we tackle the task of natural language video localization (NLVL): given an untrimmed video and a natural language query, the goal is to localize the temporal segment within the video that best matches the query. NLVL is challenging because it sits at the intersection of language and video understanding: a video may contain multiple segments of interest, and the query may describe complicated temporal dependencies. Although existing approaches achieve good performance, most of them do not fully consider the inherent differences between the language and video modalities. Here, we propose the Moment Relation Network (MRN) to reduce the divergence between the probability distributions of these two modalities. Specifically, MRN trains video and language subnets and then uses transfer learning techniques to map the extracted features into a shared embedding space, where the similarity of the two modalities is computed with a Mahalanobis distance metric and used to localize moments. Extensive experiments on benchmark datasets show that the proposed MRN outperforms the state of the art by a large margin on the widely used metrics.
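The abstract does not give implementation details, but the core similarity step can be sketched as follows: segment and query features embedded in a shared space are compared with a Mahalanobis distance, and the segment with the smallest distance is taken as the localized moment. The feature dimension, the covariance estimate, and the function mahalanobis_similarity below are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of the cross-modal similarity step: video segment features
# and a query feature live in a shared embedding space and are compared with
# a Mahalanobis distance (lower distance = better match).
import numpy as np

def mahalanobis_similarity(video_feats, query_feat, cov):
    """Return one similarity score per video segment (higher = better match)."""
    cov_inv = np.linalg.inv(cov)
    diffs = video_feats - query_feat                          # (n_segments, d)
    d2 = np.einsum("nd,de,ne->n", diffs, cov_inv, diffs)      # squared distances
    return -np.sqrt(d2)                                       # negate -> similarity

rng = np.random.default_rng(0)
d = 256
video_feats = rng.normal(size=(20, d))   # embedded temporal segments (placeholder)
query_feat = rng.normal(size=d)          # embedded language query (placeholder)
cov = np.cov(video_feats, rowvar=False) + 1e-3 * np.eye(d)  # regularized covariance

scores = mahalanobis_similarity(video_feats, query_feat, cov)
print("best-matching segment index:", int(np.argmax(scores)))
```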


2020 ◽  
Vol 2020 (3) ◽  
pp. 103-109
Author(s):  
Gulnara F. Lutfullina

Observability of a situation can be assumed only if the temporal and spatial segments of the perceived situation and of the perception situation coincide. The Present Continuous expresses the relevance of a temporal segment on the time axis: it denotes an action that is actual and relevant at the moment of speech. The Present Continuous can imply a perception situation provided that unity of time and place holds. If the speaker/observer and the participants of the other situation are spatially separated, their mutual observability is excluded; when the condition of spatial unity is not met, only temporal synchronization is conveyed. It follows that the autonomous use of the Present Continuous does not always imply a situation of perception; the crucial role belongs to the context.


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 179118-179127
Author(s):  
Qian Li ◽  
Wenzhu Yang ◽  
Xiangyang Chen ◽  
Tongtong Yuan ◽  
Yuxia Wang

2019 ◽  
Vol 41 (11) ◽  
pp. 2740-2755 ◽  
Author(s):  
Limin Wang ◽  
Yuanjun Xiong ◽  
Zhe Wang ◽  
Yu Qiao ◽  
Dahua Lin ◽  
...  

Author(s):  
David Ivorra-Piqueres ◽  
John Alejandro Castro Vargas ◽  
Pablo Martinez-Gonzalez

In this work, the authors propose several techniques for accelerating a modern action recognition pipeline. The article reviews several recent and popular action recognition works and selects two of them as tools for the proposed acceleration: temporal segment networks (TSN), a convolutional neural network (CNN) framework that uses a small number of video frames to obtain robust predictions and that won first place in the 2016 ActivityNet challenge, and MotionNet, a convolutional-transposed CNN capable of inferring optical flow from RGB frames. In addition, the article integrates video-decoding software that takes advantage of NVIDIA GPUs. As a proof of concept, the RGB stream of the TSN network is trained on videos loaded with the NVIDIA Video Loader (NVVL), using a subset of daily actions from the University of Central Florida 101 (UCF101) dataset.
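As background for the TSN component mentioned above, the following Python sketch shows the basic sparse-sampling idea: draw one snippet from each of K temporal segments, run a shared 2D CNN on every snippet, and average the segment predictions (segmental consensus). The ResNet-18 backbone, the random tensor standing in for decoded frames, and the three-segment setting are assumptions for illustration; the authors' pipeline instead decodes video on the GPU with NVVL.

```python
# Minimal sketch of TSN-style sparse temporal sampling and segmental consensus.
import torch
import torchvision

def sample_segment_indices(n_frames: int, k_segments: int) -> list[int]:
    """Pick one random frame index from each of k equal temporal segments."""
    bounds = torch.linspace(0, n_frames, k_segments + 1).long()
    return [int(torch.randint(int(bounds[i]), int(bounds[i + 1]), (1,)))
            for i in range(k_segments)]

# Dummy video: 300 RGB frames of size 224x224 (in practice decoded on the GPU).
video = torch.rand(300, 3, 224, 224)
indices = sample_segment_indices(video.shape[0], k_segments=3)
snippets = video[indices]                                 # (K, 3, 224, 224)

backbone = torchvision.models.resnet18(num_classes=101)   # 101 = UCF101 classes
with torch.no_grad():
    segment_logits = backbone(snippets)                   # per-snippet predictions (K, 101)
    video_logits = segment_logits.mean(dim=0)              # segmental consensus (average)
print("predicted class:", int(video_logits.argmax()))
```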

