A Combined One-Class SVM and Template-Matching Approach for User-Aided Human Fall Detection by Means of Floor Acoustic Features

2017 ◽  
Vol 2017 ◽  
pp. 1-13 ◽  
Author(s):  
Diego Droghini ◽  
Daniele Ferretti ◽  
Emanuele Principi ◽  
Stefano Squartini ◽  
Francesco Piazza

Falls are the primary cause of injury-related death among the elderly. The scientific community has devoted particular attention to them, since injuries can be limited by early detection of the event. The solution proposed in this paper is based on a combined One-Class SVM (OCSVM) and template-matching classifier that discriminates human falls from nonfalls in a semisupervised framework. Acoustic signals are captured by means of a Floor Acoustic Sensor; then Mel-Frequency Cepstral Coefficients and Gaussian Mean Supervectors (GMSs) are extracted for the fall/nonfall discrimination. Here we propose a single-sensor, two-stage, user-aided approach: in the first stage, the OCSVM detects abnormal acoustic events. In the second, the template-matching classifier produces the final decision by exploiting a set of template GMSs related to the events marked as false positives by the user. The performance of the algorithm has been evaluated on a corpus containing human falls and nonfall sounds. Compared to the OCSVM-only approach, the proposed algorithm improves the performance by 10.14% in clean conditions and 4.84% in noisy conditions. Compared to Popescu and Mahnot (2009), the performance improvement is 19.96% in clean conditions and 8.08% in noisy conditions.
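The two-stage pipeline can be sketched as follows. The feature vectors, class geometry, and distance threshold here are illustrative stand-ins (not the paper's actual GMS features or tuned parameters): stage 1 is a one-class SVM trained only on ordinary sounds, and stage 2 vetoes alarms that lie close to templates of user-marked false positives.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Stand-in "supervector" features: everyday sounds cluster near the origin,
# abnormal events (falls and user-marked false alarms) lie farther out.
normal = rng.normal(0.0, 1.0, size=(200, 16))
fall = rng.normal(4.0, 1.0, size=(5, 16))
false_alarm = rng.normal(-4.0, 1.0, size=(5, 16))

# Stage 1: one-class SVM trained only on normal sounds flags anything unusual.
ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(normal)

# Stage 2: templates of events the user previously marked as false positives.
templates = false_alarm.copy()

def classify(x, dist_thresh=8.0):
    if ocsvm.predict(x[None])[0] == 1:
        return "nonfall"                 # stage 1: nothing unusual
    d = np.linalg.norm(templates - x, axis=1).min()
    # stage 2: abnormal, but close to a known false-positive template
    return "nonfall" if d < dist_thresh else "fall"

print(classify(fall[0]))         # abnormal, far from templates -> fall
print(classify(false_alarm[0]))  # abnormal, but matches a template -> nonfall
```

The user-aided aspect is simply that `templates` grows over time as the user rejects alarms, so recurring nuisance sounds stop triggering the detector.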

2021 ◽  
Vol 13 (4) ◽  
pp. 628
Author(s):  
Liang Ye ◽  
Tong Liu ◽  
Tian Han ◽  
Hany Ferdinando ◽  
Tapio Seppänen ◽  
...  

Campus violence is a common social phenomenon all over the world and among the most harmful types of school bullying. As artificial intelligence and remote sensing techniques develop, several methods have become available to detect campus violence, e.g., movement sensor-based and video sequence-based methods using wearable sensors and surveillance cameras. In this paper, the authors use image features and acoustic features for campus violence detection. Campus violence data are gathered by role-playing, and 4096-dimension feature vectors are extracted from every 16 frames of video images. The C3D (Convolutional 3D) neural network is used for feature extraction and classification, and an average recognition accuracy of 92.00% is achieved. Mel-frequency cepstral coefficients (MFCCs) are extracted as acoustic features, and three speech emotion databases are involved. The C3D neural network is used for classification, and the average recognition accuracies are 88.33%, 95.00%, and 91.67%, respectively. To solve the problem of evidence conflict, the authors propose an improved Dempster–Shafer (D–S) algorithm. Compared with the existing D–S theory, the improved algorithm increases the recognition accuracy by 10.79%, and the recognition accuracy ultimately reaches 97.00%.
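The fusion step builds on Dempster's combination rule. A minimal sketch over the two-class frame {V (violence), N (non-violence)} is below; the mass values are illustrative, and this is the classical rule, not the paper's improved variant:

```python
# Classical Dempster's rule of combination over the frame {V, N}.
# Each modality reports basic probability masses m(V), m(N), and m(V or N)
# ("VN" is the mass left on the whole frame, i.e. ignorance).
def dempster(m1, m2):
    # Conflict k: mass jointly assigned to incompatible singletons.
    k = m1["V"] * m2["N"] + m1["N"] * m2["V"]
    combine = lambda a: (m1[a] * m2[a] + m1[a] * m2["VN"] + m1["VN"] * m2[a]) / (1 - k)
    return {"V": combine("V"), "N": combine("N"),
            "VN": m1["VN"] * m2["VN"] / (1 - k)}

video = {"V": 0.7, "N": 0.2, "VN": 0.1}   # image-based classifier evidence
audio = {"V": 0.6, "N": 0.3, "VN": 0.1}   # acoustic classifier evidence

fused = dempster(video, audio)
print(round(fused["V"], 3))  # -> 0.821: agreement sharpens the violence belief
```

When the two modalities agree, the fused mass on "violence" exceeds either input; the improved algorithm in the paper addresses the known failure mode of this rule when the conflict `k` is large.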


Author(s):  
Diego Droghini ◽  
Emanuele Principi ◽  
Stefano Squartini ◽  
Paolo Olivetti ◽  
Francesco Piazza

2018 ◽  
Vol 7 (2.16) ◽  
pp. 98 ◽  
Author(s):  
Mahesh K. Singh ◽  
A K. Singh ◽  
Narendra Singh

This paper presents an algorithm based on acoustic analysis of electronically disguised voices. The proposed work gives a comparative analysis of the acoustic features and their statistical coefficients. Acoustic features are computed with the Mel-frequency cepstral coefficients (MFCC) method, and normal voices are compared with voices disguised by different semitone shifts. All acoustic features are passed through feature-based classifiers to determine the identification rate for each type of electronically disguised voice. Two classifiers, a support vector machine (SVM) and a decision tree (DT), are used for speaker identification, and their classification efficiency on electronically disguised voices at different semitones is compared.
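A simplified single-frame MFCC computation is sketched below. The filterbank sizes and the pure-tone "voices" (a 220 Hz tone and a copy shifted up four semitones, mimicking an electronic pitch disguise) are illustrative stand-ins for real speech, not the paper's setup:

```python
import numpy as np
from scipy.fft import dct

def mfcc(signal, sr=16000, n_fft=512, n_mels=26, n_ceps=13):
    """Simplified single-frame MFCC: power spectrum -> mel filterbank -> log -> DCT."""
    spec = np.abs(np.fft.rfft(signal * np.hamming(len(signal)), n_fft)) ** 2
    # Triangular mel filterbank between 0 Hz and sr/2.
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    imel = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = imel(np.linspace(mel(0), mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    return dct(np.log(fbank @ spec + 1e-10), norm="ortho")[:n_ceps]

t = np.arange(400) / 16000
normal = np.sin(2 * np.pi * 220 * t)                     # original "voice"
disguised = np.sin(2 * np.pi * 220 * 2 ** (4 / 12) * t)  # up 4 semitones
print(mfcc(normal).shape)  # -> (13,)
```

Per-frame vectors like these (plus statistical coefficients such as means and variances over frames) would then feed the SVM and DT classifiers; the semitone shift visibly changes the coefficients, which is what makes disguise detectable.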


2016 ◽  
Vol 2016 ◽  
pp. 1-12 ◽  
Author(s):  
Gregory Koshmak ◽  
Amy Loutfi ◽  
Maria Linden

Emergency situations associated with falls are a serious concern for an aging society. Following recent developments within ICT, a significant number of solutions have been proposed to track body movement and detect falls using various sensor technologies, thereby facilitating fall detection and, in some cases, prevention. A number of recent reviews of fall detection methods using ICT technologies have emerged in the literature, and an increasingly popular approach considers combining information from several sensor sources to assess falls. The aim of this paper is to review in detail the subfield of fall detection techniques that explicitly considers the use of multisensor-fusion-based methods to assess and determine falls. The paper highlights key differences between the single-sensor approach and a multifusion one. The paper also describes and categorizes the various systems used, provides information on the challenges of a multifusion approach, and finally discusses trends for future work.


2014 ◽  
Author(s):  
Liang Liu

Falls are a leading cause of accidental death among people over the age of 65 in the United States. Fall detection methods have accordingly been implemented on a variety of monitoring devices. Because of its advantages in privacy protection, non-invasiveness, and independence from lighting, I designed a fall detection system based on a Doppler radar sensor. This dissertation explores different Doppler radar sensor configurations and positionings in both the lab and a real senior home environment, along with signal processing and machine learning algorithms. First, I designed the system based on data collected in the lab with three configurations: two floor radars; one ceiling and one wall radar; and one ceiling and one floor radar. The performance of the sensor positioning and features is evaluated with several classifiers: support vector machine, nearest neighbor, naïve Bayes, and hidden Markov model. In the real senior home, I investigated the system by evaluating the detection variance caused by the training dataset due to variable subjects and environment settings. Moreover, I adapted the automatic fall detection system to an actual retirement community apartment. I examined different features: Mel-frequency cepstral coefficients (MFCCs), local binary patterns (LBP), and their combination selected with the RELIEF algorithm. I also improved the detection performance with both a pre-screener and feature selection. I fused the radar fall detection system with motion sensors. Finally, I developed a standalone fall detection system that displays its results on a designed webpage.


2021 ◽  
Author(s):  
Wasifur Rahman ◽  
Sangwu Lee ◽  
Md. Saiful Islam ◽  
Victor Nikhil Antony ◽  
Harshil Ratnu ◽  
...  

BACKGROUND Access to neurological care, especially for Parkinson's disease (PD), is a rare privilege for millions of people worldwide, especially in developing countries. In 2013, there were just 1200 neurologists in India for a population of 1.3 billion; in Africa, the average population per neurologist exceeds 3.3 million. On the other hand, 60,000 people are diagnosed with PD every year in the US alone, and similar patterns of rising PD cases, fueled mostly by environmental pollution and an aging population, can be seen worldwide. The current projection of more than 12 million PD patients worldwide by 2040 is only part of the picture, since more than 20% of PD patients remain undiagnosed. Timely diagnosis and frequent assessment are key to ensuring timely and appropriate medical intervention and to improving the quality of life of PD patients. OBJECTIVE In this paper, we envision a web-based framework that can help anyone, anywhere in the world, record a short speech task and have the recorded data analyzed to screen for PD. METHODS We collected data from 726 unique participants (262 PD, 38% female; 464 non-PD, 65% female; average age: 61) from all over the US and beyond. A small portion of the data (roughly 7%) was collected in a lab setting to compare quality. The participants were instructed to utter a popular pangram containing all the letters of the English alphabet, "the quick brown fox jumps over the lazy dog". We extracted both standard acoustic features (Mel-frequency cepstral coefficients (MFCC), jitter, and shimmer variants) and deep learning-based features from the speech data. Using these features, we trained several machine learning algorithms. We also applied model interpretation techniques like SHAP (SHapley Additive exPlanations) to find the importance of each feature in determining the model's output.
RESULTS We achieved 0.75 AUC (Area Under the Curve) in detecting self-reported Parkinson's disease by modeling the standard acoustic features with XGBoost, a gradient-boosted decision tree model. Further analysis reveals that the widely used MFCC features and a subset of previously validated dysphonia features designed for detecting Parkinson's from a verbal phonation task (pronouncing 'ahh') influence the model's decision the most. CONCLUSIONS Our model performed equally well on data collected in a controlled lab environment and 'in the wild', across different gender and age groups. Using this tool, we can collect data from almost anyone, anywhere, with a video/audio-enabled device, contributing to equity and access in neurological care.
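The modeling step can be sketched as follows. Everything here is an illustrative assumption: synthetic "MFCC-like" features, scikit-learn's GradientBoostingClassifier standing in for XGBoost, and permutation importance standing in for SHAP as a lightweight feature-attribution method:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Stand-in acoustic features: 3 informative "MFCC-like" columns + 5 noise columns.
n = 600
y = rng.integers(0, 2, n)              # 1 = self-reported PD (synthetic labels)
X = rng.normal(size=(n, 8))
X[:, :3] += y[:, None] * 1.5           # PD shifts the informative features

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(Xtr, ytr)

# AUC on held-out data, then attribution: which columns drive the decision?
auc = roc_auc_score(yte, clf.predict_proba(Xte)[:, 1])
imp = permutation_importance(clf, Xte, yte, n_repeats=5, random_state=0)
print(round(auc, 2), int(imp.importances_mean.argmax()))
```

On this synthetic data the most important feature lands among the informative columns, mirroring how the paper's SHAP analysis singles out MFCC and dysphonia features.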


2019 ◽  
Vol 9 (12) ◽  
pp. 2470 ◽  
Author(s):  
Anvarjon Tursunov ◽  
Soonil Kwon ◽  
Hee-Suk Pang

The most widely used and well-known acoustic features of a speech signal, the Mel frequency cepstral coefficients (MFCC), cannot sufficiently characterize emotions in speech when a classifier must recognize both discrete emotions (i.e., anger, happiness, sadness, and neutral) and emotions in the valence dimension (positive and negative). The main reason for this is that some discrete emotions, such as anger and happiness, share similar acoustic features in the arousal dimension (high and low) but differ in the valence dimension. Timbre is a sound quality that can discriminate between two sounds even when they have the same pitch and loudness. In this paper, we analyzed timbre acoustic features to improve the classification performance for discrete emotions as well as emotions in the valence dimension. Sequential forward selection (SFS) was used to find the most relevant timbre acoustic features. The experiments were carried out on the Berlin Emotional Speech Database and the Interactive Emotional Dyadic Motion Capture Database. A support vector machine (SVM) and a long short-term memory recurrent neural network (LSTM-RNN) were used to classify emotions. Significant classification performance improvements were achieved using a combination of the baseline and the most relevant timbre acoustic features, which were found by applying SFS to emotion classification on the Berlin Emotional Speech Database. Extensive experiments showed that timbre acoustic features can sufficiently characterize emotions in speech in the valence dimension.
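Sequential forward selection greedily adds, one at a time, the feature that most improves cross-validated accuracy. A minimal sketch with scikit-learn is below; the synthetic data, where two "timbre-like" columns (imagine spectral centroid and spectral flux) carry the class signal and the rest are noise, is an illustrative assumption:

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in data: two "timbre" columns separate the classes, the rest are noise.
n = 300
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, 10))
X[:, 0] += 2.0 * y    # e.g. spectral-centroid-like feature
X[:, 1] -= 2.0 * y    # e.g. spectral-flux-like feature

# Forward SFS with an SVM scorer, as in the paper's feature-selection stage.
sfs = SequentialFeatureSelector(SVC(kernel="linear"),
                                n_features_to_select=2,
                                direction="forward", cv=3).fit(X, y)
selected = sorted(np.flatnonzero(sfs.get_support()).tolist())
print(selected)  # -> [0, 1]: the informative timbre columns are picked
```

The same wrapper works for any estimator, so the selected subset can then be handed to the SVM or LSTM-RNN classifier for the final emotion recognition step.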


2015 ◽  
Vol 39 (1) ◽  
pp. 81-88 ◽  
Author(s):  
Daniel Fernández Comesana ◽  
Keith R. Holland ◽  
Dolores García Escribano ◽  
Hans-Elias de Bree

Abstract Sound localization problems are usually tackled by acquiring data from phased microphone arrays and applying acoustic holography or beamforming algorithms. However, the number of sensors required to achieve reliable results is often prohibitive, particularly if the frequency range of interest is wide. It is shown that the number of sensors required can be reduced dramatically, provided the sound field is time-stationary. The use of scanning techniques such as "Scan & Paint" allows data to be gathered across a sound field in a fast and efficient way, using only a single sensor and a webcam. It is also possible to characterize the relative phase field by including an additional static microphone during the acquisition process. This paper presents the theoretical and experimental basis of the proposed method to localize sound sources using only one fixed microphone and one moving acoustic sensor. The accuracy and resolution of the method have been shown to be comparable to those of large microphone arrays, thus constituting so-called "virtual phased arrays".
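The key idea, recovering the relative phase at sequentially visited scan points from the cross-spectrum with one fixed reference microphone, can be sketched as below. The monochromatic field, sample rate, and three scan positions are illustrative assumptions, not the paper's measurement setup:

```python
import numpy as np

# For a time-stationary field, the phase of the field at each scan point can be
# recovered from the cross-spectrum between the moving sensor and a single fixed
# reference microphone, even though the points are visited one at a time.
f, sr, n = 1000.0, 16000, 4096          # tone frequency, sample rate, block size
t = np.arange(n) / sr
true_phase = np.array([0.0, 0.6, 1.3])  # field phase at three scan positions

ref = np.cos(2 * np.pi * f * t)         # fixed reference microphone
recovered = []
for ph in true_phase:                   # sequential "Scan & Paint" sweeps
    mov = np.cos(2 * np.pi * f * t + ph)       # moving sensor at this position
    k = int(f * n / sr)                        # FFT bin of the tone (exact here)
    cross = np.fft.rfft(mov)[k] * np.conj(np.fft.rfft(ref)[k])
    recovered.append(np.angle(cross))          # phase relative to the reference

print(np.round(recovered, 2))  # -> [0.   0.6  1.3]
```

Once the relative phase (and magnitude) at every scan position is known, the scan points can be treated as elements of a single coherent "virtual" array and fed to standard beamforming.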


2021 ◽  
Author(s):  
José Enrique Almanza-Medina ◽  
Benjamin Henson ◽  
Yuriy Zakharov

Many underwater applications that involve the use of autonomous underwater vehicles require accurate navigation systems. Image registration of acoustic images is a technique that can achieve this by comparing two consecutive sonar images and estimating the motion of the vehicle. The use of deep learning (DL) techniques for motion estimation can significantly reduce the processing complexity and achieve high-accuracy position estimates. In this paper we investigate the performance improvement obtained by using two sonar sensors instead of a single sensor. The DL network is trained using images generated by a sonar simulator. The results show an improvement in estimation accuracy when two sensors are used.
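For context, the classical registration step that such a DL network replaces can be sketched with phase correlation, which recovers the integer-pixel translation between two consecutive frames; the random image standing in for a sonar frame is an illustrative assumption:

```python
import numpy as np

def phase_correlation(a, b):
    """Estimate the integer-pixel translation that maps image b onto image a."""
    A, B = np.fft.fft2(a), np.fft.fft2(b)
    R = A * np.conj(B)
    R /= np.abs(R) + 1e-12                 # keep only the phase difference
    corr = np.fft.ifft2(R).real            # impulse at the displacement
    dy, dx = np.unravel_index(corr.argmax(), corr.shape)
    h, w = a.shape                         # unwrap shifts past the midpoint
    return (dy if dy <= h // 2 else dy - h,
            dx if dx <= w // 2 else dx - w)

rng = np.random.default_rng(0)
img = rng.random((64, 64))                       # stand-in sonar frame
shifted = np.roll(img, (3, -5), axis=(0, 1))     # vehicle moved 3 px, -5 px

print(phase_correlation(shifted, img))  # -> (3, -5)
```

A learned network generalizes this idea to rotations and the speckle, occlusion, and viewpoint effects of real sonar, and a second sonar sensor adds an independent view of the same motion.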

