mel frequency cepstral coefficients
Recently Published Documents


TOTAL DOCUMENTS: 408 (five years: 211)
H-INDEX: 16 (five years: 6)

Author(s):  
Murugaiya Ramashini ◽  
P. Emeroylariffion Abas ◽  
Kusuma Mohanchandra ◽  
Liyanage C. De Silva

Birds are excellent environmental indicators and may indicate the sustainability of an ecosystem; they provide provisioning, regulating, and supporting services. Research on birdlife conservation therefore consistently takes centre stage. Because birds are airborne and tropical forests are dense, identifying birds by audio may be a better solution than visual identification. The goal of this study is to find the most appropriate cepstral features for classifying bird sounds more accurately. Fifteen (15) endemic Bornean bird sounds were selected and segmented using an automated energy-based algorithm. Three (3) types of cepstral features are extracted: linear prediction cepstral coefficients (LPCC), mel frequency cepstral coefficients (MFCC), and gammatone frequency cepstral coefficients (GTCC); each is used separately for classification with a support vector machine (SVM). Comparison of their prediction results demonstrates that the model utilising GTCC features, with 93.3% accuracy, outperforms the models utilising MFCC and LPCC features, confirming the robustness of GTCC for bird sound classification. The result is significant for the advancement of bird sound classification research, which has many applications, such as eco-tourism and wildlife management.
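
As a rough sketch of this pipeline (librosa and scikit-learn stand in for whatever tooling the authors used; the directory layout, file-naming scheme, and RBF kernel are assumptions, and GTCC extraction would additionally require a package such as spafe):

```python
import glob
import os

import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def mfcc_vector(path, n_mfcc=13):
    """Load one clip and average its MFCCs over frames into a fixed-length vector."""
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

# Hypothetical layout: pre-segmented clips named "<species>_<nn>.wav".
clip_paths = sorted(glob.glob("segments/*.wav"))
labels = [os.path.basename(p).split("_")[0] for p in clip_paths]

X = np.array([mfcc_vector(p) for p in clip_paths])
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, stratify=labels)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```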


Author(s):  
Fadwa Abakarim ◽  
Abdenbi Abenaou

In this research, we present an automatic speaker recognition system based on adaptive orthogonal transformations. To obtain informative features of minimum dimension from the input signals, we created an adaptive operator, which helped to identify the speaker's voice in a fast and efficient manner. We test the efficiency and performance of our method by comparing it with mel-frequency cepstral coefficients (MFCCs), a feature extraction approach widely used by researchers. The experimental results show the value added by the adaptive operator: the system achieved 96.8% accuracy using the Fourier transform as a compression method and 98.1% using correlation.
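
The abstract does not specify the adaptive operator itself, so the sketch below substitutes PCA, which is also an orthogonal transform, to illustrate the role it plays: compressing per-utterance features into a low-dimensional informative subspace before a simple classifier. All data here are synthetic placeholders, not the paper's corpus.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 39))     # placeholder per-utterance features (13 MFCCs + deltas)
y = rng.integers(0, 10, size=200)  # placeholder speaker IDs

# Orthogonal compression (PCA) followed by classification.
model = make_pipeline(PCA(n_components=12), SVC(kernel="linear")).fit(X, y)
print("training accuracy:", model.score(X, y))
```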


Author(s):  
Muneera Altayeb ◽  
Amani Al-Ghraibah

Determining and classifying pathological human voices remains an interesting area of research in the field of speech processing. This paper explores different methods of voice feature extraction, namely mel frequency cepstral coefficients (MFCCs), zero-crossing rate (ZCR), and the discrete wavelet transform (DWT). These methods are compared to assess their ability to classify an input sound as a normal or pathological voice using a support vector machine (SVM). First, the voice signal is processed and filtered; then vocal features are extracted using the proposed methods; finally, six groups of features are used to classify the voice data as healthy, hyperkinetic dysphonia, hypokinetic dysphonia, or reflux laryngitis in separate classification processes. The classification results reach 100% accuracy using the MFCC and kurtosis feature group, while the other classification accuracies range between ~60% and ~97%. The wavelet features provide very good classification results in comparison with common voice features such as MFCC and ZCR. This paper aims to improve the diagnosis of voice disorders without the need for surgical interventions and endoscopic procedures, which consume time and burden patients. The comparison between the proposed feature extraction methods also offers a good reference for further research in the voice classification area.
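
A minimal sketch of the three feature families plus kurtosis for a single recording, assuming librosa, PyWavelets, and SciPy; the file name, wavelet (db4), and decomposition level are illustrative choices, not the paper's:

```python
import numpy as np
import librosa
import pywt
from scipy.stats import kurtosis

y, sr = librosa.load("voice.wav", sr=None)  # hypothetical recording

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)  # MFCCs, frame-averaged
zcr = librosa.feature.zero_crossing_rate(y).mean()               # ZCR
coeffs = pywt.wavedec(y, "db4", level=5)                         # DWT sub-bands
dwt_energy = np.array([np.sum(c ** 2) for c in coeffs])          # energy per sub-band

features = np.concatenate([mfcc, [zcr, kurtosis(y)], dwt_energy])
```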


Author(s):  
Fawziya M. Rammo ◽  
Mohammed N. Al-Hamdani

Many language identification (LID) systems rely on language models that use machine learning (ML) approaches, and such systems typically require rather long recordings to achieve satisfactory accuracy. This study aims to extract enough information from short recording intervals to successfully classify the spoken languages under test. The classification is based on frames of 2-18 seconds, whereas most previous LID systems were based on much longer time frames (from 3 seconds to 2 minutes). This research defines and implements many low-level features using MFCC (mel-frequency cepstral coefficients). The speech files cover five languages (English, French, German, Italian, Spanish) and come from voxforge.org, an open-source corpus of user-submitted audio clips in various languages. A CNN (convolutional neural network) algorithm is applied for classification, with near-perfect results: binary language classification had an accuracy of 100%, and classification over the five languages had an accuracy of 99.8%.
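
An illustrative small CNN over MFCC "images" in PyTorch (the layer sizes and pooling are guesses, not the paper's architecture); input is a batch of (1, n_mfcc, frames) MFCC matrices:

```python
import torch
import torch.nn as nn

class LangCNN(nn.Module):
    """Toy CNN mapping an MFCC matrix to language logits."""
    def __init__(self, n_langs=5):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, n_langs)

    def forward(self, x):  # x: (batch, 1, n_mfcc, frames)
        return self.fc(self.conv(x).flatten(1))

logits = LangCNN()(torch.randn(8, 1, 13, 200))  # 8 clips, 13 MFCCs, 200 frames
```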


2022 ◽  
Author(s):  
Mahbubeh Bahreini ◽  
Ramin Barati ◽  
Abbas Kamaly

Early diagnosis is crucial in the treatment of heart disease. Researchers have applied a variety of techniques to cardiovascular disease diagnosis, including the detection of heart sounds, which is an efficient and affordable diagnostic technique. Body organs, including the heart, generate several sounds, and these sounds differ between individuals. A number of methodologies have recently been proposed to detect and diagnose normal/abnormal sounds generated by the heart. The present study proposes a technique based on Mel-frequency cepstral coefficients, fractal dimension, and a hidden Markov model. It uses the fractal dimension to identify the sounds S1 and S2. Then, the Mel-frequency cepstral coefficients and the first- and second-order difference Mel-frequency cepstral coefficients are employed to extract the features of the signals. The adaptive Hamming window length is a major advantage of the methodology; the S1-S2 interval determines the adaptive length. Heart sounds are classified as normal or abnormal through the improved hidden Markov model with the Baum-Welch and Viterbi algorithms. The proposed framework is evaluated using a number of datasets under various scenarios.
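
The MFCC-plus-differences feature stack can be sketched with librosa's delta features (the file name is hypothetical; the fractal-dimension segmentation, adaptive Hamming windowing, and HMM stages are omitted):

```python
import numpy as np
import librosa

y, sr = librosa.load("heart.wav", sr=None)  # hypothetical heart-sound recording
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
d1 = librosa.feature.delta(mfcc, order=1)   # first-order difference MFCCs
d2 = librosa.feature.delta(mfcc, order=2)   # second-order difference MFCCs
features = np.vstack([mfcc, d1, d2])        # (39, frames) feature stack
```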


PLoS ONE ◽  
2022 ◽  
Vol 17 (1) ◽  
pp. e0262448
Author(s):  
Mohanad Alkhodari ◽  
Ahsan H. Khandoker

This study sought to investigate the feasibility of using smartphone-based breathing sounds within a deep learning framework to discriminate between COVID-19 subjects, including asymptomatic ones, and healthy subjects. A total of 480 breathing sounds (240 shallow and 240 deep) were obtained from a publicly available database named Coswara. These sounds were recorded by 120 COVID-19 and 120 healthy subjects via a smartphone microphone through a website application. A deep learning framework was proposed herein that relies on hand-crafted features extracted from the original recordings and from the mel-frequency cepstral coefficients (MFCC), as well as deep-activated features learned by a combination of convolutional neural network and bi-directional long short-term memory units (CNN-BiLSTM). The statistical analysis of patient profiles showed a significant difference (p-value: 0.041) for ischemic heart disease between COVID-19 and healthy subjects. Analysis of the normal distribution of the combined MFCC values showed that COVID-19 subjects tended to have a distribution skewed more towards the right of the zero mean (shallow: 0.59±1.74, deep: 0.65±4.35, p-value: <0.001). In addition, the proposed deep learning approach had an overall discrimination accuracy of 94.58% and 92.08% using shallow and deep recordings, respectively. Furthermore, it detected COVID-19 subjects successfully with a maximum sensitivity of 94.21%, specificity of 94.96%, and area under the receiver operating characteristic curve (AUROC) of 0.90. Among the 120 COVID-19 participants, the asymptomatic subjects (18 subjects) were detected with 100.00% accuracy using shallow recordings and 88.89% using deep recordings. This study paves the way towards utilizing smartphone-based breathing sounds for COVID-19 detection. The observations suggest deep learning and smartphone-based breathing sounds as an effective pre-screening tool for COVID-19 alongside the current reverse-transcription polymerase chain reaction (RT-PCR) assay. It can be considered an early, rapid, easily distributed, time-efficient, and almost no-cost diagnostic technique that complies with social distancing restrictions during the COVID-19 pandemic.
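
A hedged sketch of a CNN-BiLSTM over MFCC sequences in PyTorch; the layer sizes and the use of the last time step for classification are assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    """Toy CNN-BiLSTM: 1-D convolution over MFCC frames, then a bidirectional LSTM."""
    def __init__(self, n_mfcc=13, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv1d(n_mfcc, 32, 5, padding=2), nn.ReLU())
        self.lstm = nn.LSTM(32, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, 2)  # COVID-19 vs healthy logits

    def forward(self, x):                   # x: (batch, n_mfcc, frames)
        h = self.conv(x).transpose(1, 2)    # -> (batch, frames, 32)
        out, _ = self.lstm(h)               # -> (batch, frames, 2 * hidden)
        return self.fc(out[:, -1])          # last time step -> logits

logits = CNNBiLSTM()(torch.randn(4, 13, 100))  # 4 recordings, 100 frames each
```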


2022 ◽  
pp. 828-847
Author(s):  
Gaurav Aggarwal ◽  
Latika Singh

Classification of intellectually disabled children through manual assessment of speech at an early age is inconsistent, subjective, time-consuming, and prone to error. This study attempts to classify children with intellectual disabilities using two speech feature extraction techniques: linear predictive coding (LPC) based cepstral parameters and mel-frequency cepstral coefficients (MFCC). Four classification models are employed: k-nearest neighbour (k-NN), support vector machine (SVM), linear discriminant analysis (LDA), and radial basis function neural network (RBFNN). Forty-eight speech samples from each group are analysed, taken from subjects of similar age and socio-economic background. The effect of frame length together with the number of filter banks in the MFCC, and of frame length together with the order in the LPC, is also examined for better accuracy. The experimental outcomes show that the proposed technique can help speech pathologists estimate intellectual disability at early ages.
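
A sketch of the LPC-based cepstral parameters for one frame, using librosa's Burg-method LPC and one standard LPC-to-cepstrum recursion; the order, frame length, file name, and sign convention (A(z) = 1 + Σ aₖ z⁻ᵏ) are assumptions:

```python
import numpy as np
import librosa

def lpcc(frame, order=12):
    """LPC cepstral coefficients via the standard recursion (one common convention)."""
    a = librosa.lpc(frame, order=order)  # denominator polynomial [1, a_1, ..., a_p]
    c = np.zeros(order)
    for n in range(1, order + 1):
        c[n - 1] = -a[n] - sum((k / n) * c[k - 1] * a[n - k] for k in range(1, n))
    return c

y, sr = librosa.load("speech.wav", sr=None)  # hypothetical sample
features = lpcc(y[:2048])                    # one 2048-sample frame
```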


2022 ◽  
pp. 125-142
Author(s):  
Vijay Srinivas Tida ◽  
Raghabendra Shah ◽  
Xiali Hei

Laser-based audio signal injection can be used to attack voice-controllable systems: an attacker aims amplitude-modulated light at the microphone's aperture, and the injected signal acts as a remote voice command. Attackers exploit such vulnerabilities to steal physical or virtual assets, for example by placing orders or withdrawing money. Detecting these signals is therefore important, because almost every device can be attacked using amplitude-modulated laser signals. In this project, the authors use deep learning to classify incoming signals as normal voice commands or laser-based audio signals. Mel frequency cepstral coefficients (MFCC) are derived from the audio signals to classify the input. If an audio signal is identified as a laser signal, the voice command can be disabled and an alert displayed to the victim. The machine learning model reached a maximum accuracy of 100% in testing and around 95% in the real world.
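
A sketch of the detect-then-gate idea: each incoming command's frame-averaged MFCC vector is classified as voice or laser, and flagged commands are blocked. The logistic-regression classifier, synthetic training data, and `execute` handler are all placeholders, not the authors' model:

```python
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

# Placeholder training data: MFCC vectors labelled 0 = voice, 1 = laser.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 13))
y_train = rng.integers(0, 2, size=100)
clf = LogisticRegression().fit(X_train, y_train)

def execute(path):  # hypothetical downstream command handler
    print("running command from", path)

def handle_command(path):
    y, sr = librosa.load(path, sr=None)
    v = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
    if clf.predict(v.reshape(1, -1))[0] == 1:
        print("ALERT: possible laser-injected signal; command disabled.")
    else:
        execute(path)
```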


2021 ◽  
Vol 38 (6) ◽  
pp. 1793-1799
Author(s):  
Shivaprasad Satla ◽  
Sadanandam Manchala

Dialect identification is the process of identifying the dialects of a particular standard language. Telugu is a historically important language; like many languages, it contains three main dialects: Telangana, Coastal Andhra, and Rayalaseema. Research on dialect identification is far scarcer than on language identification because of the dearth of databases. In any dialect identification system, the database and feature engineering play vital roles because most words are similar in pronunciation, and most researchers have applied statistical approaches such as the Hidden Markov Model (HMM) and Gaussian Mixture Model (GMM) to speech processing applications. Today, however, neural networks produce good results across application domains; deep neural networks (DNN) in particular achieve state-of-the-art performance in fields such as speech recognition and speaker identification. In this work, a DNN-based multilayer perceptron is used to identify the regional dialects of Telugu using enhanced Mel Frequency Cepstral Coefficients (MFCC) features. To this end, a database of the Telugu dialects with a duration of 5 h 45 m was created, collected from different speakers in different environments. The results produced by the DNN model are compared with HMM and GMM models, and the DNN model is observed to provide better performance.
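
A minimal multilayer perceptron sketch for the three-way dialect decision from MFCC feature vectors (scikit-learn; the layer sizes and synthetic data are assumptions, not the paper's configuration):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 39))     # placeholder MFCC feature vectors
y = rng.integers(0, 3, size=300)   # 0/1/2: Telangana / Coastal Andhra / Rayalaseema

mlp = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500).fit(X, y)
print("training accuracy:", mlp.score(X, y))
```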


Author(s):  
Alaa Ehab Sakran ◽  
Mohsen Rashwan ◽  
Sherif Mahdy Abdou

In this paper, an automatic phoneme-level segmentation system was built using the Kaldi toolkit for a Quran verses dataset comprising an 80-hour speech corpus and its corresponding text corpus: 1100 recorded Quran verses from 100 non-Arab reciters. Starting with the extraction of Mel Frequency Cepstral Coefficients (MFCCs), the language model (LM) was built and the acoustic model (AM) training phase proceeded up to the deep neural network (DNN) level, using 770 recordings (70 reciters). The system was tested using 220 recordings (20 reciters), with a development set of 280 recordings (10 reciters). Automatic segmentation was compared with manual segmentation; with time delay neural network (TDNN) based acoustic modelling, the result was 99% on both the test and development sets.
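
The automatic-vs-manual comparison can be sketched as a boundary-tolerance check: count the automatic phoneme boundaries falling within a tolerance of the manual reference. The 20 ms tolerance and the boundary times below are assumptions, not the paper's protocol:

```python
def boundary_accuracy(auto, manual, tol=0.02):
    """Fraction of automatic boundaries within `tol` seconds of the manual ones."""
    hits = sum(1 for a, m in zip(auto, manual) if abs(a - m) <= tol)
    return hits / len(manual)

auto_bounds = [0.12, 0.31, 0.55]    # hypothetical Kaldi alignment output (seconds)
manual_bounds = [0.11, 0.30, 0.57]  # hypothetical manual reference
print(f"{boundary_accuracy(auto_bounds, manual_bounds):.0%}")
```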

