audio data
Recently Published Documents

TOTAL DOCUMENTS: 552 (FIVE YEARS: 214)
H-INDEX: 18 (FIVE YEARS: 5)

2022 ◽  
Vol 40 (1) ◽  
pp. 1-23
Author(s):  
Jiaxing Shen ◽  
Jiannong Cao ◽  
Oren Lederman ◽  
Shaojie Tang ◽  
Alex “Sandy” Pentland

User profiling refers to inferring people’s attributes of interest (AoIs), such as gender and occupation, which enables applications ranging from personalized services to collective analyses. Massive nonlinguistic audio data bring a novel opportunity for user profiling, owing to the growing study of spontaneous face-to-face communication. Nonlinguistic audio is coarse-grained audio data without linguistic content; it is collected in privacy-sensitive situations, such as doctor–patient dialogues, where recording speech content would raise privacy concerns. This opportunity facilitates optimized organizational management and personalized healthcare, especially for chronic diseases. In this article, we are the first to build a user profiling system that infers gender and personality from nonlinguistic audio. Since linguistic and acoustic features cannot be extracted from such data, we focus on conversational features that can reflect AoIs. We first develop an adaptive voice activity detection algorithm that addresses individual differences in voice and false-positive voice activities caused by nearby people. Second, we propose a gender-assisted multi-task learning method that combats the dynamics of human behavior by integrating gender differences and the correlation of personality traits. In an experimental evaluation of 100 people in 273 meetings, we achieved F1-scores of 0.759 and 0.652 for gender identification and personality recognition, respectively.
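The adaptive voice activity detection step can be illustrated with a minimal energy-based sketch. This is a hypothetical illustration, not the paper's algorithm: it re-estimates a per-recording noise floor and places the speech threshold a fixed margin above it, which is one simple way to adapt to individual differences in voice level. The function name and parameters are invented for this example.

```python
import numpy as np

def adaptive_vad(frames, floor_percentile=30, margin_db=6.0):
    """Energy-based voice activity detection with an adaptive threshold.

    frames: 2-D array (n_frames, frame_len) of audio samples.
    Returns a boolean array marking frames judged to contain speech.
    """
    # Per-frame log-energy in dB.
    energy = 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    # Estimate the noise floor from the quietest frames of this recording,
    # then set the speech threshold a fixed margin above it.  Re-estimating
    # the floor per recording adapts the detector to each speaker's level.
    noise_floor = np.percentile(energy, floor_percentile)
    threshold = noise_floor + margin_db
    return energy > threshold
```

Distinguishing the wearer's speech from nearby people (the false-positive problem the paper mentions) would require an additional cue, such as relative energy across badges, which this sketch omits.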


Sensors ◽  
2022 ◽  
Vol 22 (2) ◽  
pp. 592
Author(s):  
Deokgyu Yun ◽  
Seung Ho Choi

This paper proposes a deep-learning-based audio data augmentation method to improve dereverberation performance. Conventionally, audio data are augmented using a room impulse response generated artificially, for example by the image method. The proposed method instead estimates a reverberation environment model with a deep neural network trained on clean and recorded audio data as inputs and outputs, respectively. A large augmented database of realistic reverberant audio is then constructed with the trained reverberation model, and the dereverberation model is trained on this augmented database. The augmentation model was verified by the log spectral distance and mean square error between the augmented data and the recorded data. In dereverberation experiments, the proposed method showed improved performance over the conventional method.
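The log spectral distance used to verify the augmentation model is a standard metric and could be computed as follows (a generic sketch over magnitude spectrograms; the paper's exact spectral settings and frame parameters are not reproduced here):

```python
import numpy as np

def log_spectral_distance(ref_spec, est_spec, eps=1e-10):
    """Mean log spectral distance (in dB) between two magnitude
    spectrograms of shape (n_frames, n_bins).

    Per frame: the RMS difference of the log spectra across frequency
    bins; the result is averaged over frames.
    """
    log_ref = 20.0 * np.log10(ref_spec + eps)
    log_est = 20.0 * np.log10(est_spec + eps)
    per_frame = np.sqrt(np.mean((log_ref - log_est) ** 2, axis=-1))
    return np.mean(per_frame)
```

A uniform gain difference of 2x between the two spectrograms yields a distance of 20·log10(2) ≈ 6.02 dB, which is a convenient sanity check for the implementation.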


2021 ◽  
Vol 11 (24) ◽  
pp. 11926
Author(s):  
Gerard Roma ◽  
Anna Xambó ◽  
Owen Green ◽  
Pierre Alexandre Tremblay

While audio data play an increasingly central role in computer-based music production, interaction with large sound collections in most available music creation and production environments is still often limited to scrolling through long lists of file names. This paper describes a general framework for devising interactive applications based on content-based visualization of sound collections. The framework allows a modular combination of techniques for sound segmentation, analysis, and dimensionality reduction, using the reduced feature space for interactive applications. We analyze several prototypes presented in the literature, describe their limitations, and propose a more general framework that can be used flexibly to devise music creation interfaces. The approach includes several novel contributions with respect to previously used pipelines, such as unsupervised feature learning, content-based sound icons, and control of the output space layout. We present an implementation of the framework in the SuperCollider computer music language, together with three example prototypes demonstrating its use for data-driven music interfaces. Our results demonstrate the potential of unsupervised machine learning and visualization for creative applications in computer music.
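The dimensionality-reduction stage of such a pipeline can be sketched minimally in Python with plain PCA, standing in for the unsupervised feature learning and the SuperCollider implementation the paper describes; the function name and the normalization to screen coordinates are assumptions of this example:

```python
import numpy as np

def pca_2d(features):
    """Project per-sound feature vectors (n_sounds, n_features) to 2-D
    screen coordinates in [0, 1] via PCA.
    """
    # Center the feature matrix, then take its SVD; the top two right
    # singular vectors span the plane of maximum variance.
    X = features - features.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    coords = X @ Vt[:2].T
    # Normalize each axis to [0, 1] so the points can be laid out
    # directly on a 2-D canvas for browsing the collection.
    coords -= coords.min(axis=0)
    span = coords.max(axis=0)
    return coords / np.where(span > 0, span, 1.0)
```

In an interactive interface, each 2-D point would be rendered as a sound icon and clicking it would audition the corresponding segment; that interaction layer is outside the scope of this sketch.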


MANUSYA ◽  
2021 ◽  
Vol 24 (2) ◽  
pp. 288-309
Author(s):  
Tabtip Kanchanapoomi ◽  
Wannapa Trakulkasemsuk

Abstract Laughter is not just an element of human communication that signifies happiness and enjoyment; it can also serve as a communication strategy that lubricates interaction, including business communication. Nonetheless, few studies have paid attention to laughter in business communication. This paper therefore sheds light on how Thai and Burmese participants used laughter in a restaurant and in a business meeting in Yangon, Myanmar. Audio data were collected together with various pieces of ethnographic data, such as participant observations reported in extensive field notes, semi-structured interviews, and audio recordings. The analysis was based on classifications of laughter adopted from Hayakawa (2003) and Murata and Hori (2007). The findings reveal that laughter is deployed as a communication strategy for different purposes, such as making fun of work, easing tension, and threatening other interlocutors; they also unveil the factors that stimulate laughter in informal and formal settings.


2021 ◽  
Vol 28 (4) ◽  
pp. 22-38
Author(s):  
Sergey V. Dvoryankin ◽  
Artem E. Zenov ◽  
Roman A. Ustinov ◽  
Nikita S. Dvoryankin

2021 ◽  
Vol 11 (22) ◽  
pp. 11073
Author(s):  
Jisu Kwon ◽  
Daejin Park

On-device artificial intelligence has attracted attention globally, and attempts to combine the Internet of Things and TinyML (tiny machine learning) applications are increasing. Although most edge devices have limited resources, time and energy costs are important when running TinyML applications. In this paper, we propose a structure in which the preprocessing of externally input data in a TinyML application is offloaded to hardware; conventionally, this preprocessing is performed in software on the microcontroller unit of an edge device. Specifically, register-transfer-level logic that not only performs windowing with the Hann function but also acquires raw audio data is added to the inter-integrated circuit sound (I2S) module that collects audio data in a voice-recognition application. As a result, the windowing function was excluded from the TinyML application on the embedded board. When the length of the hardware-implemented Hann window is 80 and the quantization step is 2⁻⁵, this exclusion reduces the execution time of the front-end function and the energy consumption by 8.06% and 3.27%, respectively.
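The hardware-implemented windowing can be mimicked in software to see the effect of the stated parameters (length 80, quantization step 2⁻⁵). This is a sketch assuming simple round-to-nearest fixed-point quantization of the coefficients; the paper's actual RTL design is not reproduced, and the function names are invented for this example:

```python
import numpy as np

def quantized_hann(length=80, q_step=2 ** -5):
    """Symmetric Hann window with coefficients rounded to multiples of
    q_step, approximating a fixed-point hardware lookup table.
    """
    n = np.arange(length)
    w = 0.5 * (1.0 - np.cos(2.0 * np.pi * n / (length - 1)))
    # Round each coefficient to the nearest multiple of the quantization
    # step, as a fixed-point implementation with 5 fractional bits would.
    return np.round(w / q_step) * q_step

def apply_window(frame, window):
    # Element-wise multiply, one product per incoming audio sample,
    # which is the operation moved from software into hardware.
    return frame * window
```

With a 2⁻⁵ step the window has only 33 distinct coefficient levels, which keeps the hardware table small at the cost of a bounded spectral-leakage error.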

