Co-constructing utterances in face-to-face interaction: A multimodal analysis of collaborative completions in spoken Spanish

Author(s):  
Alexander Teixeira Kalkhoff ◽  
Dennis Dressel

This article examines collaborative utterances in interaction from a multimodal perspective. Whereas prior research has analyzed co-constructions ex post as the result of local speaker collaboration on the basis of audio data, this study shifts the focus to co-constructing as a highly coordinated, embodied practice. By examining video data of Spanish interactions, this research aims to show how speakers systematically deploy a variety of linguistic and bodily resources that serve as points of joint orientation throughout the process of co-constructing utterances.

2018 ◽  
Vol 11 (6) ◽  
Author(s):  
Ülkü Arslan Aydin ◽  
Sinan Kalkan ◽  
Cengiz Acarturk

The analysis of dynamic scenes has been a challenging domain in eye tracking research. This study presents a framework, named MAGiC, for analyzing gaze contact and gaze aversion in face-to-face communication. MAGiC provides an environment that can detect and track the conversation partner’s face automatically, overlay gaze data on top of the face video, and incorporate speech by means of speech-act annotation. Specifically, MAGiC integrates eye tracking data for gaze, audio data for speech segmentation, and video data for face tracking. MAGiC is an open source framework, and its usage is demonstrated via publicly available video content and wiki pages. We explored the capabilities of MAGiC through a pilot study and showed that it facilitates the analysis of dynamic gaze data by reducing the annotation effort and the time spent on manual analysis of video data.
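The core analysis MAGiC supports, overlaying gaze samples on a tracked face, reduces to testing whether each gaze point falls inside the face's bounding box for that frame. A minimal sketch of that idea follows; the data formats (frame-indexed gaze tuples, per-frame boxes) are illustrative assumptions, not MAGiC's own representation.

```python
# Sketch: label each gaze sample as "contact" or "aversion" by testing
# whether the gaze point lies inside the tracked face bounding box for
# that video frame. Data formats here are hypothetical, not MAGiC's API.

def classify_gaze(gaze_samples, face_boxes):
    """gaze_samples: list of (frame, x, y); face_boxes: {frame: (x0, y0, x1, y1)}.
    Returns one 'contact'/'aversion' label per sample."""
    labels = []
    for frame, x, y in gaze_samples:
        box = face_boxes.get(frame)
        if box and box[0] <= x <= box[2] and box[1] <= y <= box[3]:
            labels.append("contact")
        else:
            labels.append("aversion")
    return labels

samples = [(0, 100, 120), (1, 400, 60), (2, 110, 130)]
boxes = {f: (80, 100, 160, 180) for f in range(3)}  # same face box each frame
print(classify_gaze(samples, boxes))  # ['contact', 'aversion', 'contact']
```

Aggregating these labels over time is what yields the gaze-contact versus gaze-aversion statistics the framework reports.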


Author(s):  
Andreas M. Kist ◽  
Pablo Gómez ◽  
Denis Dubrovskiy ◽  
Patrick Schlegel ◽  
Melda Kunduk ◽  
...  

Purpose High-speed videoendoscopy (HSV) is an emerging technique for assessing and diagnosing voice disorders, but it is barely used in the clinic because of the lack of dedicated software to analyze the data. HSV allows the vocal fold oscillations to be quantified by segmenting the glottal area. This challenging task has been tackled by various studies; however, the proposed approaches are mostly limited and not suitable for daily clinical routine. Method We developed user-friendly software in C# that allows the editing, motion correction, segmentation, and quantitative analysis of HSV data. We further provide pretrained deep neural networks for fully automatic glottis segmentation. Results We freely provide our software, Glottis Analysis Tools (GAT). GAT offers a general threshold-based region-growing platform that enables the user to analyze data from various sources, such as in vivo recordings, ex vivo recordings, and high-speed footage of artificial vocal folds. Additionally, especially for in vivo recordings, we provide three robust neural networks at various speed and quality settings to enable the fully automatic glottis segmentation needed for use by untrained personnel. GAT further evaluates video and audio data in parallel and can extract various features from the video data, among them the glottal area waveform, that is, the glottal area changing over time. In total, GAT provides 79 unique quantitative analysis parameters for video- and audio-based signals. Many of these parameters have already been shown to reflect voice disorders, highlighting the clinical importance and usefulness of GAT. Conclusion GAT is a unique tool for processing HSV and audio data to determine quantitative, clinically relevant parameters for the research, diagnosis, and treatment of laryngeal disorders. Supplemental Material https://doi.org/10.23641/asha.14575533
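The glottal area waveform mentioned above is conceptually simple: once each frame's glottis has been segmented, the waveform is the segmented area (pixel count) plotted over time. A minimal sketch, assuming binary masks as nested lists; GAT's internal representation will differ:

```python
# Sketch: derive a glottal area waveform (GAW) from per-frame binary
# segmentation masks -- the pixel count of the segmented glottis per frame.
# Mask format is illustrative, not GAT's actual data structure.

def glottal_area_waveform(masks):
    """masks: iterable of 2-D binary masks (lists of lists of 0/1).
    Returns the glottal area in pixels for each frame."""
    return [sum(sum(row) for row in mask) for mask in masks]

open_frame = [[0, 1, 1],
              [0, 1, 1],
              [0, 0, 1]]          # glottis partly open: 5 pixels
closed_frame = [[0, 0, 0],
                [0, 0, 0],
                [0, 0, 0]]        # glottis fully closed

print(glottal_area_waveform([closed_frame, open_frame, closed_frame]))  # [0, 5, 0]
```

One oscillation cycle of the vocal folds appears as one rise and fall in this waveform, which is why many of GAT's quantitative parameters can be computed from it.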


Author(s):  
Mirian Palmeira

The aim of this chapter is to identify whether frontline employees perceive themselves as having feelings of sexism, ageism, and appearance discrimination against customers in retail services. This investigation is quantitative research, a conclusive description (Gil, 2002) and ex post facto study, which uses a survey to collect the data with convenience sampling. Three protocols are used: (1) to format the questionnaire, (2) to produce 12 different standards combining age, gender, and appearance, and (3) to create a social classification (Rattam, 1998). In a previous study (Palmeira, Palmeira, & Santos, 2012), customers of different ages and genders perceived some degree of prejudice and discrimination in face-to-face retail services. Now, on the other side of the coin, frontline employees who work in Fashion and Food retailing recognise that there is prejudiced behaviour against customers, depending on their age, gender, and appearance, when providing them with face-to-face retail services. More than 95% of female and more than 64% of male attendants believe that well-dressed, young female customers are given priority when being served. Almost 80% of female and only 58% of male frontline workers believe that badly-dressed middle-aged men (not younger men) are the last to be served when there is no clear queuing process in the retail area. This context strongly suggests the growing importance of interpersonal skills training for an organisation's staff as a way of avoiding behaviour that makes customers think there is prejudice and discrimination in the service process, as well as making ASL development (T&D against Ageism, Sexism, and Lookism) part of the strategic statements.


Author(s):  
Michael Odzer ◽  
Kristina Francke

Abstract The sound of waves breaking on shore, or against an obstruction or jetty, is an immediately recognizable sound pattern which could potentially be employed by a sensor system to identify obstructions. If frequency patterns produced by breaking waves can be reproduced and mapped in a laboratory setting, a foundational understanding of the physics behind this process could be established, which could then be employed in sensor development for navigation. This study explores whether wave-breaking frequencies correlate with the physics behind the collapsing of the wave, and whether frequencies of breaking waves recorded in a laboratory tank will follow the same pattern as frequencies produced by ocean waves breaking on a beach. An artificial “beach” was engineered to replicate breaking waves inside a laboratory wave tank. Video and audio recordings of waves breaking in the tank were obtained, and audio of ocean waves breaking on the shoreline was recorded. The audio data was analysed in frequency charts. The video data was evaluated to correlate bubble sizes to frequencies produced by the waves. The results supported the hypothesis that frequencies produced by breaking waves in the wave tank followed the same pattern as those produced by ocean waves. Analysis utilizing a solution to the Rayleigh-Plesset equation showed that the bubble sizes produced by breaking waves were inversely related to the pattern of frequencies. This pattern can be reproduced in a controlled laboratory environment and extrapolated for use in developing navigational sensors for potential applications in marine navigation such as for use with autonomous ocean vehicles.
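The inverse relation between bubble size and frequency reported above is consistent with the Minnaert resonance, the small-oscillation (linearized) limit of the Rayleigh-Plesset equation for a gas bubble in liquid. A short sketch of that relation follows; the constants are standard textbook values for air bubbles in water near the surface, not figures from this study.

```python
import math

# Sketch: Minnaert resonance frequency of a gas bubble, the linearized
# small-oscillation limit of the Rayleigh-Plesset equation. It captures
# the inverse radius/frequency relation: f0 = sqrt(3*gamma*p0/rho) / (2*pi*R).
# Constants are illustrative (air bubbles in water at atmospheric pressure).

def minnaert_frequency(radius_m, gamma=1.4, p0=101325.0, rho=1000.0):
    """Resonance frequency in Hz of a bubble of the given radius in metres."""
    return math.sqrt(3.0 * gamma * p0 / rho) / (2.0 * math.pi * radius_m)

for r_mm in (0.5, 1.0, 2.0):
    f = minnaert_frequency(r_mm / 1000.0)
    print(f"R = {r_mm} mm -> f0 ≈ {f:.0f} Hz")
```

Halving the radius doubles the frequency, which is the pattern the frequency charts and bubble-size measurements in the tank would be expected to trace.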


Author(s):  
Udo Kuckartz ◽  
Stefan Rädiker

2021 ◽  
Vol 3 ◽  
Author(s):  
Sushovan Chanda ◽  
Kedar Fitwe ◽  
Gauri Deshpande ◽  
Björn W. Schuller ◽  
Sachin Patel

Research on self-efficacy and confidence has spread across several subfields of psychology and neuroscience. One's confidence plays a crucial role in the formation of attitude and communication skills, and differentiating levels of confidence is clearly important in this domain. With recent advances in extracting behavioral insights from signals in multiple applications, confidence detection has gained great importance. One prominent application is detecting confidence in interview conversations. We collected an audiovisual data set of interview conversations with 34 candidates. Every response from each candidate in this data set is labeled with one of three levels of confidence: high, medium, or low. Furthermore, we developed algorithms to efficiently compute such behavioral confidence from speech and video. A deep learning architecture is proposed for detecting confidence levels (high, medium, and low) from an audiovisual clip recorded during an interview. The achieved unweighted average recall (UAR) reaches 85.9% on audio data and 73.6% on video data captured from an interview session.
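Unweighted average recall, the metric reported here, is the mean of the per-class recalls, so each confidence level counts equally regardless of how many samples it has. A minimal sketch of the computation (the example labels are invented, not from the data set):

```python
# Sketch: unweighted average recall (UAR) -- the mean of per-class recalls,
# so the minority class (e.g. "low" confidence) weighs as much as the
# majority class. This is the standard definition, not code from the study.

def unweighted_average_recall(y_true, y_pred):
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        preds_for_c = [p for t, p in zip(y_true, y_pred) if t == c]
        recalls.append(sum(1 for p in preds_for_c if p == c) / len(preds_for_c))
    return sum(recalls) / len(classes)

y_true = ["high", "high", "medium", "low", "low", "low"]
y_pred = ["high", "medium", "medium", "low", "low", "high"]
print(round(unweighted_average_recall(y_true, y_pred), 3))  # 0.722
```

With imbalanced confidence labels, UAR is more informative than plain accuracy, which is presumably why it is the figure of merit quoted in the abstract.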


2018 ◽  
Vol 1 (1) ◽  
Author(s):  
Zhitao Li

An audio and video decoding and synchronized playback system for MPEG-2 TS streams is designed and implemented on an ARM embedded system. A hardware multi-format codec (MFC) is embedded in the ARM processor; to make full use of this resource, the hardware MFC decodes the video data, while the audio data is decoded using the open source MAD (libmad) library. The V4L2 (Video for Linux 2) driver interface and the ALSA (Advanced Linux Sound Architecture) library are used to implement video display and audio output, respectively. Because the video frame playback period and the hardware processing delay are inconsistent, a time difference arises between the audio and video data operations, which causes audio and video playback to drift out of sync. We therefore synchronize video playback to the audio playback stream, keeping audio and video playing in sync. Test results show that the designed system can decode audio and video data and play them back synchronously.
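Synchronizing video to the audio stream, as described above, means treating the audio playback clock as the master: each decoded video frame's presentation timestamp is compared against the audio clock, and the frame is delayed, shown, or dropped accordingly. A minimal sketch of that decision logic; the threshold value is an illustrative assumption, not taken from the paper.

```python
# Sketch: audio-master A/V synchronization -- video frame presentation is
# slaved to the audio playback clock. The tolerance is illustrative
# (roughly one frame period at 25 fps), not a value from the paper.

SYNC_THRESHOLD_MS = 40

def sync_action(video_pts_ms, audio_clock_ms):
    """Decide what to do with the next decoded video frame."""
    drift = video_pts_ms - audio_clock_ms
    if drift > SYNC_THRESHOLD_MS:
        return "wait"   # video ahead of audio: delay presentation
    if drift < -SYNC_THRESHOLD_MS:
        return "drop"   # video behind audio: discard frame to catch up
    return "show"       # within tolerance: present now

print(sync_action(1100, 1000))  # video 100 ms ahead -> 'wait'
print(sync_action(900, 1000))   # video 100 ms behind -> 'drop'
print(sync_action(1010, 1000))  # within tolerance -> 'show'
```

Because audio glitches are far more audible than a dropped or delayed video frame, slaving video to audio is the usual choice in players of this kind.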


Author(s):  
Paul McIlvenny

Consumer versions of the passive 360° and stereoscopic omni-directional camera have recently come to market, generating new possibilities for qualitative video data collection. This paper discusses some of the methodological issues raised by collecting, manipulating and analysing complex video data recorded with 360° cameras and ambisonic microphones. It also reports on the development of a simple, yet powerful prototype to support focused engagement with such 360° recordings of a scene. The paper proposes that we ‘inhabit’ video through a tangible interface in virtual reality (VR) in order to explore complex spatial video and audio recordings of a single scene in which social interaction took place. The prototype is a software package called AVA360VR (‘Annotate, Visualise, Analyse 360° video in VR’). The paper is illustrated through a number of video clips, including a composite video of raw and semi-processed multi-cam recordings, a 360° video with spatial audio, a video comprising a sequence of static 360° screenshots of the AVA360VR interface, and a video comprising several screen capture clips of actual use of the tool. The paper discusses the prototype’s development and its analytical possibilities when inhabiting spatial video and audio footage as a complementary mode of re-presenting, engaging with, sharing and collaborating on interactional video data.
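"Inhabiting" a 360° recording in VR implies applying head-tracking rotations not just to the video sphere but also to the ambisonic sound field. For first-order ambisonics this is a small linear transform per audio frame; the sketch below shows a yaw-only rotation of FuMa-ordered B-format channels (W, X, Y, Z). The channel ordering and sign conventions are assumptions for illustration, not AVA360VR's implementation.

```python
import math

# Sketch: yaw rotation of a first-order ambisonic (B-format, FuMa order
# W, X, Y, Z) audio frame, as needed when head-tracking a spatial audio
# scene in VR. Conventions assumed: X forward, Y left, Z up.

def rotate_yaw(w, x, y, z, yaw_rad):
    """Rotate a first-order ambisonic frame about the vertical axis."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return (w,              # omnidirectional component: unaffected
            x * c - y * s,  # forward axis
            x * s + y * c,  # left axis
            z)              # vertical axis: unaffected by yaw

# A source dead ahead (all directional energy in X), rotated 90 degrees,
# ends up on the Y (left) axis:
print(rotate_yaw(1.0, 1.0, 0.0, 0.0, math.pi / 2))
```

Pitch and roll add analogous rotations of the remaining axis pairs; applying the inverse of the headset orientation per audio block keeps the sound field world-locked while the viewer looks around.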

