Speech bandwidth classification for broadcast news domain using artificial neural network and Gaussian mixture models

Author(s):  
Marko Kos ◽  
Matej Grasic ◽  
Bojan Kotnik ◽  
Zdravko Kacic
Author(s):  
Abrham Debasu Mengistu ◽  
Dagnachew Melesew Alemayehu

A dialect is a variety of a language spoken by people from a particular community or geographic area; this paper focuses on dialect recognition for the Amharic language. The authors used a backpropagation artificial neural network, vector quantization (VQ), Gaussian mixture models (GMM), and a combination of GMM and a backpropagation artificial neural network to classify the dialects of Amharic speakers. Speech from a total of 100 speakers per dialect group was collected, each recording lasting about 10 seconds. Mel-frequency cepstral coefficient (MFCC) feature vectors were used to recognize the speakers' dialects. The recognition model that uses a tanh activation function in the backpropagation artificial neural network gave better results than the one using the logistic sigmoid activation function. In these experiments, an accuracy of 95.7% was achieved when the GMM and the backpropagation artificial neural network with the tanh activation function were combined.
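The GMM-based part of the approach described above can be illustrated with a minimal sketch: one mixture model per dialect, with an utterance assigned to the dialect whose model gives the highest average log-likelihood over its feature frames. This is not the authors' implementation. The function names, the diagonal-covariance assumption, the two-dimensional "MFCC-like" vectors, and all model parameters are hypothetical and chosen only to keep the example small.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of feature frames under one GMM.

    gmm is a list of (weight, mean, variance) tuples, one per component.
    """
    total = 0.0
    for x in frames:
        # Log-sum-exp over mixture components for numerical stability.
        terms = [
            math.log(w) + log_gaussian_diag(x, mean, var)
            for w, mean, var in gmm
        ]
        peak = max(terms)
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total / len(frames)

def classify(frames, models):
    """Return the label whose GMM scores the frames highest."""
    return max(models, key=lambda label: gmm_log_likelihood(frames, models[label]))

# Hypothetical models: real systems use higher-dimensional MFCC vectors and
# parameters estimated from training speech (for example with EM).
models = {
    "dialect_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "dialect_b": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 4.0], [1.0, 1.0])],
}
utterance = [[0.2, -0.1], [0.9, 1.2], [0.4, 0.3]]
print(classify(utterance, models))
```

In a combined system such as the one the abstract reports, the per-dialect scores could also serve as inputs to a neural network rather than being compared directly; how the authors combined the two models is not stated in the abstract.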


In this paper, we present an image processing algorithm and demonstrate its capability to detect corrosion. The algorithm is fully programmed and requires no parameter modification and no prior knowledge of the image acquisition process, because its functions estimate their own parameters. A digital image processing technique is proposed to help avoid corrosion-related incidents. An algorithm that combines a Poisson-Gaussian mixture distribution with a fuzzy segmentation framework is developed to capture image information. An artificial neural network and the gray-level co-occurrence matrix (GLCM) are used to recognize corrosion. The developed algorithm can be used on an ROV to detect corrosion spots. The results show that the algorithm is adequate for detecting corroded spots. With image processing, the corrosion detection process can be automated through monitoring software that generates alerts based on corrosion severity. Image processing also reduces the effort of evaluating infrastructure corrosion and makes the resulting statistics easier to present. From an application point of view, the algorithm could be extended to fatigue crack detection.
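As a concrete illustration of the GLCM mentioned above, the following minimal sketch counts how often gray level i occurs next to gray level j at a fixed pixel offset, then derives two common texture descriptors. The abstract does not say which offsets, gray-level quantization, or features the authors used; the horizontal offset, the four gray levels, the contrast and energy features, and the toy image are assumptions for illustration.

```python
def glcm(image, dx=1, dy=0, levels=4):
    """Count co-occurrences of gray levels (i, j) at pixel offset (dx, dy)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix

def normalize(matrix):
    """Turn raw counts into joint probabilities that sum to 1."""
    total = sum(sum(row) for row in matrix)
    return [[v / total for v in row] for row in matrix]

def contrast(p):
    """Weighted sum of squared gray-level differences; high for rough texture."""
    return sum((i - j) ** 2 * v for i, row in enumerate(p) for j, v in enumerate(row))

def energy(p):
    """Sum of squared probabilities; high for uniform texture."""
    return sum(v * v for row in p for v in row)

# Toy 4x4 image already quantized to gray levels 0..3 (hypothetical data).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = normalize(glcm(image))
print(contrast(p), energy(p))
```

Features like these, computed per image region, are the kind of input a neural network classifier could use to separate corroded from uncorroded surfaces.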


Author(s):  
Gizachew Belayneh Gebre et al.

In this era of artificial intelligence, speaker recognition is among the most useful biometric recognition techniques. Security needs careful attention because activities are increasingly automated and internet-based. For security purposes, unique features of authorized users are needed, and voice is one such unique biometric feature. Developing speaker recognition grounded in scientific research is therefore an important concern. Criminal activity is increasing day by day and takes increasingly sophisticated forms, so every country should strengthen its forensic investigation with such technologies. This study was motivated by the aim of applying this concept in the context of our country. In this study, a text-independent Amharic speaker recognition model was developed using Mel-frequency cepstral coefficients (MFCC) to extract features from preprocessed speech signals and an artificial neural network to model the resulting feature vectors and to classify speakers during testing. The researcher used 20 speech samples from each of 10 speakers (200 speech samples in total) for training and testing separately. Three models were developed by setting the number of hidden neurons to 15, 20, and 25, and each was evaluated for accuracy. MATLAB, a fourth-generation high-level programming language and interactive environment, was used to implement the study. Very promising findings were obtained: the study achieved better performance than related studies that used vector quantization and Gaussian mixture model techniques. A deployable result could be obtained in the future by increasing the number of speakers and speech samples and by including the four Amharic accents.
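The three network configurations described above can be sketched as a single-hidden-layer classifier whose hidden size is a parameter. The study itself used MATLAB; this Python sketch is only an illustration with untrained random weights. The 13-dimensional feature vector, the tanh hidden activation, the softmax output, and all function names are assumptions not stated in the abstract; only the 10 speakers and the hidden sizes 15, 20, and 25 come from it.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Small random weights plus zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, layer):
    """Affine transform: one weighted sum plus bias per output unit."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

def softmax(z):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def build_model(n_features, n_hidden, n_speakers, seed=0):
    """Create an untrained network with one hidden layer."""
    rng = random.Random(seed)
    return init_layer(n_features, n_hidden, rng), init_layer(n_hidden, n_speakers, rng)

def predict(model, features):
    """Forward pass: tanh hidden layer, softmax over speakers."""
    hidden_layer, output_layer = model
    hidden = [math.tanh(v) for v in dense(features, hidden_layer)]
    return softmax(dense(hidden, output_layer))

N_FEATURES = 13  # assumed MFCC vector length; not stated in the abstract
N_SPEAKERS = 10  # from the abstract
models = {h: build_model(N_FEATURES, h, N_SPEAKERS) for h in (15, 20, 25)}
probs = predict(models[15], [0.1] * N_FEATURES)
print(len(probs), round(sum(probs), 6))
```

In practice each configuration would be trained with backpropagation on the training samples and then compared on held-out test samples, which is how the hidden-layer size would be selected.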

