speech data
Recently Published Documents

TOTAL DOCUMENTS: 500 (five years: 173)
H-INDEX: 18 (five years: 3)

2021 ◽  
Vol 12 ◽  
Author(s):  
Yasunori Yamada ◽  
Kaoru Shinkawa ◽  
Miyuki Nemoto ◽  
Tetsuaki Arai

Loneliness is a perceived state of social and emotional isolation that has been associated with a wide range of adverse health effects in older adults. Automatically assessing loneliness by passively monitoring daily behaviors could potentially contribute to early detection and intervention for mitigating loneliness. Speech data has been successfully used for inferring changes in emotional states and mental health conditions, but its association with loneliness in older adults remains unexplored. In this study, we developed a tablet-based application and collected speech responses of 57 older adults to daily life questions regarding, for example, one's feelings and future travel plans. From audio data of these speech responses, we automatically extracted speech features characterizing acoustic, prosodic, and linguistic aspects, and investigated their associations with self-rated scores of the UCLA Loneliness Scale. We found that with increasing loneliness scores, speech responses tended to have fewer inflections, longer pauses, reduced second formant frequencies, reduced variances of the speech spectrum, more filler words, and fewer positive words. The cross-validation results showed that regression and binary-classification models using speech features could estimate loneliness scores with an R2 of 0.57 and detect individuals with high loneliness scores with 95.6% accuracy, respectively. Our study provides the first empirical results suggesting the possibility of using speech data that can be collected in everyday life for the automatic assessment of loneliness in older adults, which could help develop monitoring technologies for early detection and intervention for mitigating loneliness.
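The abstract above does not publish its feature-extraction pipeline; the sketch below illustrates two of the feature families it names (pause duration and pitch inflection) computed from hypothetical frame-level pitch and energy tracks, as a minimal stand-in, not the authors' method.

```python
import numpy as np

def prosodic_features(pitch_hz, energy, energy_floor=0.01):
    """Toy prosodic features from frame-level pitch and energy tracks.

    Returns (pause_ratio, pitch_std): the fraction of low-energy frames
    (a proxy for pause duration) and the standard deviation of voiced
    pitch (a proxy for inflection).  Thresholds and framing are assumed.
    """
    pitch_hz = np.asarray(pitch_hz, dtype=float)
    energy = np.asarray(energy, dtype=float)
    silent = energy < energy_floor          # frames treated as pauses
    pause_ratio = float(silent.mean())
    voiced = pitch_hz[(~silent) & (pitch_hz > 0)]
    pitch_std = float(voiced.std()) if voiced.size else 0.0
    return pause_ratio, pitch_std
```

Features like these, pooled per response, would then feed the regression or classification models the abstract evaluates.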


Diagnostics ◽  
2021 ◽  
Vol 11 (12) ◽  
pp. 2312
Author(s):  
Mingyao Yang ◽  
Jie Ma ◽  
Pin Wang ◽  
Zhiyong Huang ◽  
Yongming Li ◽  
...  

As a neurodegenerative disease, Parkinson’s disease (PD) is difficult to identify at an early stage, while using speech data to build a machine learning diagnosis model has proved effective in its early diagnosis. However, speech data show high degrees of redundancy, repetition, and unnecessary noise, which influence the accuracy of diagnosis results. Although feature reduction (FR) could alleviate this issue, traditional FR is one-sided: traditional feature extraction can construct high-quality features but offers no feature preference, while traditional feature selection offers feature preference but cannot construct high-quality features. To address this issue, the Hierarchical Boosting Dual-Stage Feature Reduction Ensemble Model (HBD-SFREM) is proposed in this paper. The major contributions of HBD-SFREM are as follows: (1) The instance space of the deep hierarchy is built by an iterative deep extraction mechanism. (2) The manifold feature extraction method embeds the nearest neighbor feature preference method to form the dual-stage feature reduction pair. (3) The dual-stage feature reduction pair is iteratively performed by the AdaBoost mechanism to obtain instance features of higher quality, thus achieving a substantial improvement in model recognition accuracy. (4) The deep hierarchy instance space is integrated into the original instance space to improve the generalization of the algorithm. Three PD speech datasets and a self-collected dataset are used to test HBD-SFREM in this paper. Compared with other FR algorithms and deep learning algorithms, the accuracy of HBD-SFREM in PD speech recognition is improved significantly and is not degraded by small sample sizes. Thus, HBD-SFREM could serve as a reference for other related studies.
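To make the "dual-stage feature reduction pair" idea concrete, here is an illustrative two-stage pair, not HBD-SFREM itself: PCA stands in for the manifold extraction stage, and correlation with the label stands in for the nearest-neighbor feature preference stage. All parameter names are assumptions for the sketch.

```python
import numpy as np

def dual_stage_reduction(X, y, n_extract=5, n_select=3):
    """Illustrative feature-reduction pair: stage 1 constructs features
    (here by PCA), stage 2 prefers among them (here by absolute
    correlation with the label y)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    # Stage 1 (extraction): project onto the top principal components.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:n_extract].T
    # Stage 2 (preference): keep the components most correlated with y.
    yc = y - y.mean()
    corr = np.abs(Z.T @ yc) / (np.linalg.norm(Z, axis=0) * np.linalg.norm(yc) + 1e-12)
    keep = np.sort(np.argsort(corr)[::-1][:n_select])
    return Z[:, keep]
```

In HBD-SFREM this pair is additionally iterated under AdaBoost and applied across a hierarchy of instance spaces, which the sketch omits.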


Mathematics ◽  
2021 ◽  
Vol 9 (24) ◽  
pp. 3172
Author(s):  
Zeeshan Hameed ◽  
Waheed Ur Rehman ◽  
Wakeel Khan ◽  
Nasim Ullah ◽  
Fahad R. Albogamy

Parkinson’s disease (PD) is a progressive and long-term neurodegenerative disorder of the central nervous system. Studies have shown that about 90% of PD subjects have voice impairments, which are among the vital characteristics of PD patients and have been widely used for diagnostic purposes. However, the curse of dimensionality, high aliasing, redundancy, and small sample size in PD speech data make it challenging to classify PD subjects. Feature reduction can efficiently solve these issues. However, existing feature reduction algorithms ignore high aliasing, noise, and the stability of algorithms, and thus fail to give substantial classification accuracy. To mitigate these problems, this study proposes a weighted hybrid feature reduction technique embedded with ensemble learning, which comprises (1) a hybrid feature reduction technique that increases inter-class variance, reduces intra-class variance, preserves the neighborhood structure of the data, and removes correlated features that cause high aliasing and noise in classification; (2) a weighted boosting method to train the model precisely; and (3) a bagging strategy that enhances the stability of the algorithm. The experiments were performed on three different datasets, including two widely used datasets and a dataset provided by Southwest Hospital (Army Military Medical University), Chongqing, China. The experimental results indicated that, compared with existing feature reduction methods, the proposed algorithm consistently shows the highest accuracy, precision, recall, and G-mean for speech data of PD. Moreover, the proposed algorithm not only shows excellent performance for classification but also deals with imbalanced data precisely, achieving the highest AUC in most cases. In addition, compared with state-of-the-art algorithms, the proposed method shows an improvement of up to 4.53%. In the future, this algorithm could be used for early and differential diagnoses, which are rated as challenging tasks.
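The "increase inter-class variance, reduce intra-class variance" criterion can be sketched with a per-feature Fisher ratio, a simple stand-in for the hybrid reduction criterion described above (the paper's actual method also preserves neighborhood structure and removes correlated features, which this omits).

```python
import numpy as np

def fisher_scores(X, y):
    """Per-feature Fisher ratio: between-class variance over
    within-class variance.  Higher scores mean the feature separates
    the classes better."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mu = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - mu) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return between / (within + 1e-12)
```

Ranking features by such a score and keeping the top few is one common way to realize the inter/intra-class variance objective before boosting and bagging are layered on top.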


Electronics ◽  
2021 ◽  
Vol 10 (23) ◽  
pp. 2986
Author(s):  
Federica Vitale ◽  
Bruno Carbonaro ◽  
Gennaro Cordasco ◽  
Anna Esposito ◽  
Stefano Marrone ◽  
...  

Currently, AI-based assistive technologies, particularly those involving sensitive data, such as systems for detecting mental illness and emotional disorders, are prone to confidentiality, integrity, and security compromises. In this context, this work proposes an algorithm for detecting depressive states based on only three previously unused speech markers. This reduced number of markers offers valuable protection of personal (sensitive) data by not allowing for the retrieval of the speaker’s identity. The proposed speech markers are derived from the analysis of pitch variations measured in speech data obtained through a tale reading task performed by typical and depressed subjects. A sample of 22 subjects (11 depressed and 11 healthy, according to both psychiatric diagnosis and BDI classification) was involved. The recorded reading wave files were split into a sequence of intervals, each lasting two seconds. For each subject’s reading and each reading interval, the average pitch, the pitch variation (T), the average pitch variation (A), and the inversion percentage (also called the oscillation percentage, O) were automatically computed. The values of the triplet (Ti, Ai, Oi) for the i-th subject together provide 100% correct discrimination between the speech produced by typical and depressed individuals, while requiring a very low computational cost and offering valuable protection of personal data.
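The abstract names but does not define T, A, and O; the sketch below is one plausible reading, computing them from a sequence of per-interval average pitch values: T as total absolute pitch variation, A as average per-step variation, and O as the percentage of direction inversions. These definitions are assumptions for illustration, not the paper's formulas.

```python
import numpy as np

def pitch_markers(interval_pitch):
    """Hypothetical (T, A, O) markers from per-interval average pitch.

    T: sum of absolute pitch changes between consecutive intervals.
    A: mean absolute change per step.
    O: percentage of consecutive changes that reverse direction.
    """
    d = np.diff(np.asarray(interval_pitch, dtype=float))
    T = float(np.abs(d).sum())
    A = float(np.abs(d).mean()) if d.size else 0.0
    signs = np.sign(d[d != 0])                      # rising/falling steps
    inversions = int((signs[1:] != signs[:-1]).sum())
    O = 100.0 * inversions / max(len(signs) - 1, 1)
    return T, A, O
```

A triplet like this per subject is all the classifier in the abstract would need, which is what keeps the computational cost low and the speaker unidentifiable.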


Author(s):  
Darkhan Kuanyshbay ◽  
Olimzhon Baimuratov ◽  
Yedilkhan Amirgaliyev ◽  
Arailym Kuanyshbayeva

Author(s):  
T.V. Madhusudhana Rao ◽  
Suribabu Korada ◽  
Y. Srinivas

In a teleconferencing scenario, speaker identification must address whether a particular speaker is part of a conference and whether that speaker spoke during the meeting. Feature vectors are extracted using MFCC-SDC-LPC, and the Generalized Gamma Distribution is used to model them. The K-means algorithm is utilized to cluster the speech data, and the test speaker is verified as a participant in the conference. A conference database was generated with 50 speakers; to test the model, 20 additional speakers not belonging to the conference were also considered. The efficiency of the developed model is compared using various measures such as AR, FAR, and MDR, and the system is tested by varying the number of speakers in the conference. The results show that the model performs robustly.
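The clustering step can be sketched with a minimal K-means over speaker feature vectors. This is a generic sketch of that one step, with a deterministic initialization chosen for reproducibility; the MFCC-SDC-LPC extraction and Generalized Gamma modeling from the abstract are not reproduced here.

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Minimal K-means for grouping speaker feature vectors.

    Deterministic init: k points spread evenly through the data
    (an assumption for the sketch; any standard init would do).
    """
    X = np.asarray(X, dtype=float)
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # Assign each vector to its nearest center.
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers
```

In the conference setting, a test speaker's vectors would be scored against the learned clusters (in the paper, via the Generalized Gamma model) to decide membership.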


Author(s):  
Kate Broome ◽  
Patricia McCabe ◽  
Kimberley Docking ◽  
Maree Doble ◽  
Bronwyn Carrigg

Purpose This study aimed to provide detailed descriptive information about the speech of a heterogeneous cohort of children with autism spectrum disorder (ASD) and to explore whether subgroups exist based on this detailed speech data. High rates of delayed and disordered speech in both low-verbal and high-functioning children with ASD have been reported. There is limited information regarding the speech abilities of young children across a range of functional levels. Method Participants were 23 children aged 2;0–6;11 (years;months) with a diagnosis of ASD. Comprehensive speech and language assessments were administered. Independent and relational speech analyses were conducted from single-word naming tasks and spontaneous speech samples. Hierarchical clustering based on language, nonverbal communication, and spontaneous speech descriptive data was completed. Results Independent and relational speech analyses are reported. These variables are used in the cluster analyses, which identified three distinct subgroups: (a) children with high language and high speech ability (n = 10), (b) children with low expressive language and low speech ability but higher receptive language and use of gestures (n = 3), and (c) children with low language and low speech development (n = 10). Conclusions This is the first study to provide detailed descriptive speech data of a heterogeneous cohort of children with ASD and use this information to statistically explore potential subgroups. Clustering suggests a small number of children present with low levels of speech and expressive language in the presence of better receptive language and gestures. This communication profile warrants further exploration. Replicating these findings with a larger cohort of children is needed. Supplemental Material https://doi.org/10.23641/asha.16906978
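The hierarchical clustering step can be sketched with a tiny average-linkage agglomerative procedure over per-child assessment scores. The linkage choice and input variables here are assumptions for illustration; the abstract specifies only that hierarchical clustering was used.

```python
import numpy as np

def agglomerative(X, n_clusters=3):
    """Tiny average-linkage agglomerative clustering.

    Starts with one cluster per child and repeatedly merges the pair of
    clusters with the smallest average pairwise distance, stopping at
    n_clusters.
    """
    X = np.asarray(X, dtype=float)
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average distance between all cross-cluster pairs.
                d = np.mean([np.linalg.norm(X[i] - X[j])
                             for i in clusters[a] for j in clusters[b]])
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)
    return clusters
```

With three clusters retained, the resulting groups would correspond to subgroups like the (a)/(b)/(c) profiles reported above.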


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Qiuyu Zhang ◽  
Zhenyu Zhao ◽  
Minrui Fu

In order to ensure the confidentiality and secure sharing of speech data, and to solve the problems of slow deployment of attribute encryption systems and fine-grained access control in cloud storage, a speech encryption scheme based on ciphertext-policy hierarchical attributes is proposed. First, the attributes of the speech data are processed hierarchically to reflect the hierarchical structure, and the hierarchical access structure is integrated into a single access structure. Second, an attribute encryption scheme for the speech data is constructed using a fast attribute encryption framework; the speech data are encrypted under the integrated access structure and uploaded to the cloud for storage and sharing. Finally, the hardness of the decisional bilinear Diffie–Hellman (DBDH) assumption is used to prove that the proposed scheme is secure in the random oracle model. The theoretical security analysis and experimental results show that the proposed scheme can achieve efficient and fine-grained access control and is secure and extensible.
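The access-control side of ciphertext-policy ABE can be illustrated without any cryptography: a policy is a tree of threshold gates over attributes, and decryption succeeds only for users whose attribute set satisfies it. The sketch below is a toy policy evaluator, not the paper's scheme; the policy encoding and attribute names are invented for the example.

```python
def satisfies(policy, attrs):
    """Evaluate a threshold access structure against a user's attributes.

    policy is either ('ATTR', name) or ('THRESH', t, [subpolicies]),
    where a THRESH node is satisfied when at least t of its children are.
    AND is t = len(children); OR is t = 1.
    """
    kind = policy[0]
    if kind == 'ATTR':
        return policy[1] in attrs
    t, children = policy[1], policy[2]
    return sum(satisfies(c, attrs) for c in children) >= t
```

Integrating a hierarchy of such policies into a single structure, as the abstract describes, lets one ciphertext serve users at several access levels instead of encrypting the speech data once per level.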

