WAVELET DETAIL COEFFICIENT AS A NOVEL WAVELET-MFCC FEATURES IN TEXT-DEPENDENT SPEAKER RECOGNITION SYSTEM

2022 ◽  
Vol 23 (1) ◽  
pp. 68-81
Author(s):  
Syahroni Hidayat ◽  
Muhammad Tajuddin ◽  
Siti Agrippina Alodia Yusuf ◽  
Jihadil Qudsi ◽  
Nenet Natasudian Jaya

Speaker recognition is the process of recognizing a speaker from his or her speech. It can be used in many aspects of life, such as gaining remote access to a personal device, securing voice-controlled access, and conducting forensic investigations. In speaker recognition, extracting features from the speech signal is the most critical step: the features represent each speech sample uniquely so that samples can be distinguished from one another. In this research, we propose a combination of Wavelet and Mel Frequency Cepstral Coefficient (MFCC) features, Wavelet-MFCC, for feature extraction, with a Hidden Markov Model (HMM) for classification. The speech signal is first decomposed with the Wavelet transform to one level of decomposition, and only the detail sub-band coefficients are passed on for further extraction with MFCC. The system was evaluated on 300 speech recordings of 30 speakers uttering the word "HADIR" in the Indonesian language. K-fold cross-validation with five folds was applied: in each fold, 80% of the data were used for training and the rest for testing. Based on this testing, the accuracy of the system using the Wavelet-MFCC combination is 96.67%.
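To make the pipeline described above concrete, the following is a minimal sketch of the Wavelet-MFCC/HMM idea, assuming the pywt, librosa, and hmmlearn libraries; the 'db4' mother wavelet, 13 MFCCs, and a 3-state HMM per speaker are illustrative assumptions, not parameters reported in the abstract.

```python
# Minimal sketch of the Wavelet-MFCC feature pipeline described above.
# Assumptions (not from the abstract): 'db4' mother wavelet, 13 MFCCs,
# and a 3-state GaussianHMM per speaker.
import numpy as np
import pywt
import librosa
from hmmlearn import hmm

def wavelet_mfcc(signal, sr, n_mfcc=13):
    """One level of dyadic wavelet decomposition; keep only the detail
    sub-band (cD), then extract MFCCs from it."""
    _, cD = pywt.dwt(signal, "db4")              # level-1 decomposition
    feats = librosa.feature.mfcc(y=cD, sr=sr // 2, n_mfcc=n_mfcc)
    return feats.T                               # frames x coefficients

def train_speaker_model(utterances, sr):
    """Fit one HMM per speaker on the stacked Wavelet-MFCC frames."""
    X = np.vstack([wavelet_mfcc(u, sr) for u in utterances])
    model = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=100)
    model.fit(X)
    return model

def recognize(test_signal, sr, speaker_models):
    """Score a test utterance against every speaker model and return the
    speaker whose HMM gives the highest log-likelihood."""
    feats = wavelet_mfcc(test_signal, sr)
    scores = {spk: m.score(feats) for spk, m in speaker_models.items()}
    return max(scores, key=scores.get)
```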

Author(s):  
Gopal Chaudhary ◽  
Smriti Srivastava ◽  
Saurabh Bhardwaj

This paper presents the main paradigms of research in feature extraction methods, to further advance the state of the art in speaker recognition (SR), which is widely used in person identification for security and protection applications. The speaker recognition system (SRS) has been a widely researched topic for many decades. The basic concept of feature extraction methods is derived from the biological model of the human auditory/vocal tract system. This work provides a classification-oriented review of feature extraction methods for SR over the last 55 years that have proven successful and have become stepping stones for further research. Broadly, the review is dichotomized into feature extraction methods with and without noise compensation techniques. Feature extraction methods without noise compensation are divided into the following categories: high- versus low-level feature extraction; type of transform; speech production versus auditory system; type of feature extraction technique; time variability; and speech processing technique. Feature extraction methods with noise compensation are classified into noise-screened features, feature normalization methods, and feature compensation methods. This classification-oriented review gives readers a clear view when choosing among different techniques and will be helpful for future research in this field.


2020 ◽  
Vol 16 ◽  

Different currencies are processed daily in money exchange shops and banks around the globe, where money exchange and transfer take place. Identifying different currencies is a difficult task, and errors can lead to financial loss. There are approximately 180 currencies in use around the world, and they differ in color, size, and texture. Thus, to identify different currencies correctly, a currency recognition system needs to be designed. In this paper, we propose an AlexNet-based currency recognition system to recognize different international currency notes. We use 10-fold cross-validation to obtain the cross-validation results of the AlexNet model. The features for the AlexNet model are extracted from images of the back and front of each currency note. We also implement other deep learning models to compare their performance with that of the AlexNet model.
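A minimal sketch of the 10-fold cross-validation setup around an AlexNet classifier, assuming PyTorch/torchvision and scikit-learn; the dataset folder name, hyper-parameters, and single training epoch per fold are illustrative assumptions.

```python
# Sketch: 10-fold cross-validation of an AlexNet-based currency-note
# classifier. Dataset path "currency_notes/" and hyper-parameters are
# hypothetical; one training epoch per fold is shown for brevity.
import torch
import torch.nn as nn
from torchvision import models, datasets, transforms
from sklearn.model_selection import KFold

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
data = datasets.ImageFolder("currency_notes/", transform=tfm)  # hypothetical folder

kfold = KFold(n_splits=10, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kfold.split(range(len(data)))):
    model = models.alexnet(weights="IMAGENET1K_V1")
    model.classifier[6] = nn.Linear(4096, len(data.classes))  # replace final layer
    opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()

    train_loader = torch.utils.data.DataLoader(
        torch.utils.data.Subset(data, train_idx), batch_size=32, shuffle=True)
    test_loader = torch.utils.data.DataLoader(
        torch.utils.data.Subset(data, test_idx), batch_size=32)

    model.train()
    for x, y in train_loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

    model.eval()
    correct = 0
    with torch.no_grad():
        for x, y in test_loader:
            correct += (model(x).argmax(1) == y).sum().item()
    print(f"fold {fold}: accuracy {correct / len(test_idx):.3f}")
```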


Bioimpacts ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 101-109
Author(s):  
Faegheh Golabi ◽  
Elnaz Mehdizadeh Aghdam ◽  
Mousa Shamsi ◽  
Mohammad Hossein Sedaaghi ◽  
Abolfazl Barzegar ◽  
...  

Introduction: Riboswitches are short regulatory elements generally found in the untranslated regions of prokaryotic mRNAs and classified into several families. Because of the binding possibility between riboswitches and antibiotics, their use as engineered regulatory elements, and their evolutionary contribution, the need for bioinformatics tools for riboswitch detection is increasing. We previously introduced an alignment-independent algorithm for the identification of frequent sequential blocks in families of riboswitches. Here, we report the application of the block location-based feature extraction strategy (BLBFE), which uses the locations of detected blocks on riboswitch sequences as features for the classification of seed sequences. In addition, mono- and dinucleotide frequencies, k-mer, DAC, DCC, DACC, PC-PseDNC-General, and SC-PseDNC-General methods were investigated as alternative feature extraction strategies. Methods: Decision tree, KNN, LDA, and Naïve Bayes classifiers, together with k-fold cross-validation, were employed for all feature extraction methods to compare their performance in terms of accuracy, sensitivity, specificity, and f-score. Results: The BLBFE strategy classified the riboswitches with an 87.65% average correct classification rate (CCR). Moreover, the performance of the proposed feature extraction method was confirmed with average values of 94.31%, 85.01%, 95.45%, and 85.38% for accuracy, sensitivity, specificity, and f-score, respectively. Conclusion: Our results confirm the performance of the BLBFE strategy in the classification and discrimination of riboswitch groups, showing remarkably higher values of CCR, accuracy, sensitivity, specificity, and f-score relative to previously studied feature extraction methods.
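As a concrete illustration of one of the simpler baseline strategies listed above (dinucleotide frequencies) evaluated with the four classifiers under k-fold cross-validation, a scikit-learn sketch follows; the BLBFE block-location features themselves are not reproduced here, and k = 5 is an assumption.

```python
# Sketch: dinucleotide-frequency features for RNA sequences compared
# across the four classifiers named above with k-fold cross-validation.
from itertools import product
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

DINUCS = ["".join(p) for p in product("ACGU", repeat=2)]

def dinucleotide_freqs(seq):
    """16-dimensional overlapping dinucleotide frequency vector."""
    counts = dict.fromkeys(DINUCS, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = max(len(seq) - 1, 1)
    return np.array([counts[d] / total for d in DINUCS])

def evaluate(sequences, labels, k=5):
    X = np.array([dinucleotide_freqs(s) for s in sequences])
    y = np.array(labels)
    classifiers = {
        "DecisionTree": DecisionTreeClassifier(),
        "KNN": KNeighborsClassifier(),
        "LDA": LinearDiscriminantAnalysis(),
        "NaiveBayes": GaussianNB(),
    }
    for name, clf in classifiers.items():
        scores = cross_val_score(clf, X, y, cv=k)   # accuracy per fold
        print(f"{name}: mean accuracy {scores.mean():.3f}")
```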


2021 ◽  
Vol 11 (11) ◽  
pp. 661
Author(s):  
Marah Alhalabi ◽  
Mohammed Ghazal ◽  
Fasila Haneefa ◽  
Jawad Yousaf ◽  
Ayman El-Baz

Solving circuit diagrams is a regular part of learning for school and university students from engineering backgrounds. Simulating circuits is usually done manually by drawing circuit diagrams in circuit simulation tools, which is a time-consuming and tedious process. We propose an innovative method for simulating circuits from hand-drawn diagrams using smartphones through an image recognition system. This method allows students to capture images with their smartphones instead of drawing circuit diagrams before simulation. Our contribution lies in building a circuit recognition system using a deep learning capsule network algorithm. The developed system receives an image captured by a smartphone, which undergoes preprocessing, region proposal, classification, and node detection to produce a netlist, and exports it to a circuit simulator program for simulation. We aim to improve engineering education using smartphones by (1) achieving higher accuracy with less training data using capsule networks and (2) developing a comprehensive system that captures hand-drawn circuit diagrams and produces circuit simulation results. We use 400 samples per class and report an accuracy of 96% with stratified 5-fold cross-validation. Through testing, we identify the optimum distance for taking circuit images to be 10 to 20 cm. Our proposed model can identify components of different scales and rotations.
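The final stage of the pipeline, exporting recognized components as a netlist for a simulator, could look roughly like the sketch below; the component dictionary format and SPICE-style output are illustrative assumptions, not the paper's actual data structures.

```python
# Sketch: turning recognized components and their detected nodes into a
# SPICE-style netlist string. The input format is a hypothetical example.
def to_netlist(components):
    """components: list of dicts like
    {"type": "R", "value": "1k", "nodes": (1, 2)}."""
    counters = {}
    lines = ["* netlist generated from a hand-drawn circuit image"]
    for comp in components:
        ctype = comp["type"]                      # e.g. R, C, L, V
        counters[ctype] = counters.get(ctype, 0) + 1
        name = f"{ctype}{counters[ctype]}"
        n1, n2 = comp["nodes"]
        lines.append(f"{name} {n1} {n2} {comp['value']}")
    lines.append(".end")
    return "\n".join(lines)

# Example: a voltage source driving a resistor between nodes 1 and 0.
print(to_netlist([
    {"type": "V", "value": "5", "nodes": (1, 0)},
    {"type": "R", "value": "1k", "nodes": (1, 0)},
]))
```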


2013 ◽  
Vol 61 (2) ◽  
Author(s):  
Nurul Ashikin Abdul Kadir ◽  
Rubita Sudirman ◽  
Nasrul Humaimi Mahmood ◽  
Abdul Hamid Ahmad

Studies of Malaysian Arabic phonemes are rarely found, which makes referencing earlier work difficult. No specific guideline for Malaysian subjects exists, even though a great deal of acoustic and phonetic research has been done on other languages such as English, French, and Chinese. In this paper, we monitored and analyzed the performance of cascade-forward (CF) networks in our phoneme recognition system for Standard Arabic (SA). This study focused on Malaysian children as test subjects. It considered four chosen phonemes from SA, covering nasal, lateral, and trill behaviors articulated at four different places of articulation. Cascade neural networks were chosen because they require less time for sample processing. K-fold cross-validation was used to evaluate each network architecture k times to improve the reliability of the choice of the optimal architecture. Based on this method, namely 10-fold cross-validation, the most suitable cascade-layer network architecture has 40 nodes in the first hidden layer and 10 nodes in the second hidden layer, with an MSE of 0.0402. The training and testing recognition rates achieved were 94% and 93%, respectively.
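The architecture-selection step can be illustrated with the following scikit-learn sketch; note that MLPClassifier is an ordinary feed-forward network used here as a stand-in for a cascade-forward network, and the candidate layer sizes are assumptions.

```python
# Sketch: selecting a two-hidden-layer architecture by 10-fold
# cross-validation. MLPClassifier is a feed-forward stand-in, not a
# cascade-forward network; the candidate sizes are illustrative.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

def select_architecture(X, y, candidates=((20, 10), (40, 10), (40, 20))):
    best_arch, best_score = None, -np.inf
    for arch in candidates:
        clf = MLPClassifier(hidden_layer_sizes=arch, max_iter=1000)
        score = cross_val_score(clf, X, y, cv=10).mean()   # 10-fold CV accuracy
        print(f"hidden layers {arch}: mean accuracy {score:.3f}")
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch
```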


2020 ◽  
Vol 4 (1) ◽  
pp. 45-51
Author(s):  
Ari Peryanto ◽  
Anton Yudhana ◽  
Rusydi Umar

Image classification is a fairly easy task for humans, but for machines it is very complex and has long been a major problem in the field of Computer Vision. There are many algorithms used for image classification; one of them is the Convolutional Neural Network, a development of the Multi-Layer Perceptron (MLP) and one of the algorithms of Deep Learning. This method has produced the most significant results in image recognition because it tries to imitate the image recognition system of the human visual cortex, giving it the ability to process image information. In this research, the method is implemented using the Keras library with the Python programming language. The results show that with K = 5 cross-validation, the highest accuracy obtained was 80.36%, the highest average accuracy was 76.49%, and the system accuracy was 72.02%. The lowest accuracy, 66.07%, was obtained in the 4th and 5th tests. The system is also able to predict, with the highest average prediction of 60.31% and the highest prediction value of 65.47%.
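A minimal sketch of a Keras CNN evaluated with K = 5 cross-validation in the spirit of the setup described above; the input shape, layer sizes, and number of classes are illustrative assumptions.

```python
# Sketch: a small Keras CNN scored with K=5 cross-validation.
# Architecture and training settings are assumed for illustration.
import numpy as np
from sklearn.model_selection import KFold
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(input_shape=(64, 64, 3), n_classes=2):
    return keras.Sequential([
        layers.Conv2D(32, 3, activation="relu", input_shape=input_shape),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])

def cross_validate_cnn(X, y, k=5, epochs=10):
    accuracies = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True).split(X):
        model = build_cnn(X.shape[1:], n_classes=len(np.unique(y)))
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        model.fit(X[train_idx], y[train_idx], epochs=epochs, verbose=0)
        _, acc = model.evaluate(X[test_idx], y[test_idx], verbose=0)
        accuracies.append(acc)
    return np.mean(accuracies)   # average accuracy over the K folds
```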


Author(s):  
Chao Hu ◽  
Byeng D. Youn ◽  
Pingfeng Wang

The traditional data-driven prognostic approach is to construct multiple candidate algorithms using a training data set, evaluate their respective performance using a testing data set, and select the one with the best performance while discarding all the others. This approach has three shortcomings: (i) the selected standalone algorithm may not be robust, i.e., it may be less accurate when the real data acquired after deployment differ from the testing data; (ii) it wastes the resources spent constructing the algorithms that are discarded at deployment; (iii) it requires testing data in addition to the training data, which increases the overall expense of algorithm selection. To overcome these drawbacks, this paper proposes an ensemble data-driven prognostic approach that combines multiple member algorithms with a weighted-sum formulation. Three weighting schemes, namely accuracy-based weighting, diversity-based weighting, and optimization-based weighting, are proposed to determine the weights of member algorithms for data-driven prognostics. K-fold cross-validation (CV) is employed to estimate the prediction error required by the weighting schemes. Two case studies were employed to demonstrate the effectiveness of the proposed prognostic approach. The results suggest that the ensemble approach with any weighting scheme gives more accurate remaining useful life (RUL) predictions than any single algorithm, and that the optimization-based weighting scheme gives the best overall performance among the three.
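The accuracy-based weighting scheme can be sketched as follows, with each member algorithm's k-fold CV error setting its weight (lower error, higher weight); the member regressors and k = 5 are illustrative assumptions, and the diversity- and optimization-based schemes are not shown.

```python
# Sketch: weighted-sum ensemble with accuracy-based weighting. Member
# models and k=5 are assumptions; weights are inverse CV errors.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

def accuracy_weighted_ensemble(X_train, y_train, X_test, k=5):
    members = [LinearRegression(), RandomForestRegressor(), SVR()]
    errors = []
    for m in members:
        # k-fold CV estimate of each member's prediction error (MSE)
        mse = -cross_val_score(m, X_train, y_train, cv=k,
                               scoring="neg_mean_squared_error").mean()
        errors.append(mse)
        m.fit(X_train, y_train)                 # refit on all training data
    inv = 1.0 / np.array(errors)
    weights = inv / inv.sum()                   # accuracy-based weights
    preds = np.column_stack([m.predict(X_test) for m in members])
    return preds @ weights                      # weighted-sum prediction
```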


Author(s):  
B.V. Dhandra ◽  
Gururaj Mukarambi ◽  
Mallikarjun Hangarge

In this paper, zone-based features are extracted from handwritten Kannada vowel and English uppercase character images for their recognition. A total of 4,000 handwritten Kannada and English sample images were collected for classification. The collected images are normalized to 32 x 32 pixels. The normalized images are then divided into 64 zones and their pixel densities are calculated, generating a total of 64 features. These 64 features are submitted to KNN and SVM classifiers with 2-fold cross-validation for recognition of the said characters. The proposed algorithm works for individual Kannada vowels, English uppercase alphabets, and a mixture of both. Recognition accuracies of 92.71% for KNN and 96.00% for SVM are achieved for handwritten Kannada vowels, and 97.51% for KNN and 98.26% for SVM for handwritten English uppercase alphabets. Further, recognition accuracies of 95.77% and 97.03% are obtained for mixed characters (i.e., Kannada vowels and English uppercase alphabets). Hence, the proposed algorithm is efficient for recognizing the said characters. The proposed algorithm is independent of the thinning and slant of the characters, which is the novelty of this work.
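The zone-based feature extraction can be sketched as follows: a 32 x 32 character image is split into 64 zones of 4 x 4 pixels, the pixel density of each zone forms a 64-dimensional feature vector, and KNN and SVM are compared with 2-fold cross-validation; classifier settings beyond the abstract are assumptions.

```python
# Sketch: 64 zone pixel-density features from a 32x32 binary character
# image, fed to KNN and SVM with 2-fold cross-validation.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def zone_features(img):
    """img: 32x32 array (0/1 pixels). Returns 64 pixel-density features."""
    img = np.asarray(img).reshape(32, 32)
    # split into an 8x8 grid of 4x4 zones, then take each zone's mean density
    zones = img.reshape(8, 4, 8, 4).swapaxes(1, 2).reshape(64, 16)
    return zones.mean(axis=1)

def evaluate(images, labels):
    X = np.array([zone_features(im) for im in images])
    y = np.array(labels)
    for name, clf in (("KNN", KNeighborsClassifier()), ("SVM", SVC())):
        acc = cross_val_score(clf, X, y, cv=2).mean()     # 2-fold CV
        print(f"{name}: accuracy {acc:.3f}")
```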


2021 ◽  
Vol 8 (1) ◽  
pp. 119
Author(s):  
Syahroni Hidayat ◽  
Andi Sofyan Anas ◽  
Siti Agrippina Alodia Yusuf ◽  
Muhammad Tajuddin

Research in digital signal processing focused on speaker recognition began several decades ago and has produced many speaker recognition methods. Among the feature-coefficient algorithms developed, two can give high accuracy when applied in a system: the Mel Frequency Cepstral Coefficient (MFCC) and the Wavelet. This study aims to test and select the best channel from the Wavelet-MFCC process to serve as a new feature coefficient for a speaker recognition system. The new features are called Wavelet-MFCC feature coefficients. They are formed by converting the channels of the wavelet decomposition, namely the approximation channel (cA), the detail channel (cD), and their combination (cAcD), into MFCC coefficients. The wavelet decomposition method used is the dyadic method with decomposition levels 1 and 2. Each feature coefficient then becomes an input to a Hidden Markov Model (HMM) classifier. The accuracy of the HMM output is calculated and analyzed. The tests show that the detail channel (cD) as a feature gives the same accuracy as the combined channel (cAcD) and higher accuracy than the approximation channel (cA), at 95%. This indicates that the detail channel at decomposition level 1 retains the voice characteristics of each speaker and is therefore sufficient to serve as a feature coefficient. Hence, using level-1 decomposition and the detail channel cD as the Wavelet-MFCC feature in a speaker recognition system can lighten and speed up the computation.
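The three channel variants compared above (cA, cD, and cAcD) can be sketched as follows, assuming the pywt and librosa libraries; the 'db4' wavelet and 13 MFCCs are illustrative assumptions.

```python
# Sketch: forming the three Wavelet-MFCC variants (cA, cD, cAcD) at a
# chosen dyadic decomposition level. 'db4' and 13 MFCCs are assumptions.
import numpy as np
import pywt
import librosa

def wavelet_channel_mfcc(signal, sr, channel="cD", level=1, n_mfcc=13):
    coeffs = pywt.wavedec(signal, "db4", level=level)
    cA, cD = coeffs[0], coeffs[1]            # approximation and detail at the deepest level
    sub_band = {"cA": cA, "cD": cD, "cAcD": np.concatenate([cA, cD])}[channel]
    sub_sr = sr // (2 ** level)              # effective rate after decimation
    return librosa.feature.mfcc(y=sub_band, sr=sub_sr, n_mfcc=n_mfcc).T
```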

