g2p conversion
Recently Published Documents

TOTAL DOCUMENTS: 14 (FIVE YEARS: 3)
H-INDEX: 3 (FIVE YEARS: 1)
2021 · Vol 11 (21) · pp. 10475
Author(s): Xiao Zhou, Zhenhua Ling, Yajun Hu, Lirong Dai

An encoder–decoder with attention has become a popular method for sequence-to-sequence (Seq2Seq) acoustic modeling in speech synthesis. To improve the robustness of the attention mechanism, methods that exploit the monotonic alignment between phone sequences and acoustic feature sequences have been proposed, such as stepwise monotonic attention (SMA). However, the phone sequences derived by grapheme-to-phoneme (G2P) conversion may not contain the pauses at phrase boundaries in utterances, which challenges SMA's assumption of strictly stepwise alignment. This paper therefore proposes inserting hidden states into phone sequences to handle the situation in which pauses are not provided explicitly, and designs a semi-stepwise monotonic attention (SSMA) mechanism to model these inserted hidden states. In this method, the hidden states absorb the pause segments in utterances in an unsupervised way. The attention at each decoding frame thus has three options: moving forward to the next phone, staying at the same phone, or jumping to a hidden state. Experimental results show that SSMA achieves better naturalness of synthetic speech than SMA when phrase boundaries are not available. Moreover, the pause positions derived from the alignment paths of SSMA match the manually labeled phrase boundaries quite well.
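As the abstract describes, the SSMA attention head has three moves at each decoding frame. A minimal sketch of those transition options, with hypothetical names and a simplified state representation (a phone index plus a flag for the inserted pause state), might look like:

```python
def ssma_transitions(position, on_pause):
    """Enumerate attention states reachable from the current one.

    `position` indexes the phone sequence; `on_pause` marks whether the
    attention head currently sits on the hidden pause state inserted
    after that phone. From a phone, the head may stay on the same phone,
    move forward to the next phone, or jump onto the hidden pause state;
    from a pause state it may stay or move on to the next phone.
    This is an illustrative simplification, not the paper's implementation.
    """
    if on_pause:
        return [(position, True), (position + 1, False)]
    return [(position, False), (position + 1, False), (position, True)]
```

In the actual model these moves would be scored by the attention mechanism and combined softly; the sketch only makes the allowed alignment paths explicit.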


2019 · Vol 9 (6) · pp. 1143
Author(s): Sevinj Yolchuyeva, Géza Németh, Bálint Gyires-Tóth

Grapheme-to-phoneme (G2P) conversion is the process of generating the pronunciation of a word from its written form. It plays an essential role in natural language processing, text-to-speech synthesis, and automatic speech recognition systems. In this paper, we investigate convolutional neural networks (CNNs) for G2P conversion and propose a novel CNN-based sequence-to-sequence (seq2seq) architecture. Our approach includes an end-to-end CNN G2P model with residual connections, as well as a model that uses a convolutional neural network (with and without residual connections) as the encoder and a Bi-LSTM as the decoder. We compare our approach with state-of-the-art methods, including the encoder-decoder LSTM and the encoder-decoder Bi-LSTM. Training and inference times, along with phoneme and word error rates, were evaluated on the public CMUDict dataset for US English, and the best-performing convolutional architecture was also evaluated on the NetTalk dataset. Our method approaches the accuracy of previous state-of-the-art results in terms of phoneme error rate.
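The evaluation above reports phoneme and word error rates. As background (not code from the paper), phoneme error rate is conventionally the Levenshtein edit distance between the predicted and reference phoneme sequences, normalized by the reference length:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1]

def phoneme_error_rate(ref_phones, hyp_phones):
    """PER = edit distance between phoneme sequences / reference length."""
    return edit_distance(ref_phones, hyp_phones) / len(ref_phones)

# e.g. "hello" with one substituted phoneme (ARPAbet-style symbols)
per = phoneme_error_rate(["HH", "AH", "L", "OW"], ["HH", "EH", "L", "OW"])
```

Word error rate is the same computation applied at the word level over a test set.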


2018
Author(s): Thilini Nadungodage, Chamila Liyanage, Amathri Prerera, Randil Pushpananda, Ruvan Weerasinghe

2017 · Vol 24 (4) · pp. 475-479
Author(s): Marzieh Razavi, Mathew Magimai-Doss

2015 · Vol 22 (6) · pp. 907-938
Author(s): Josef Robert Novak, Nobuaki Minematsu, Keikichi Hirose

Abstract: This paper provides an analysis of several practical issues related to the theory and implementation of Grapheme-to-Phoneme (G2P) conversion systems utilizing the Weighted Finite-State Transducer paradigm. The paper addresses issues related to system accuracy, training time, and practical implementation. The focus is on joint n-gram models, which have proven to provide an excellent trade-off between system accuracy and training complexity. The paper argues in favor of simple, productive approaches to G2P that balance training time, accuracy, and model complexity. The paper also introduces the first instance of using joint sequence RnnLMs directly for G2P conversion, and achieves new state-of-the-art performance via ensemble methods combining RnnLMs and n-gram based models. In addition to detailed descriptions of the approach, minor yet novel implementation solutions, and experimental results, the paper introduces Phonetisaurus, a fully functional, flexible, open-source, BSD-licensed G2P conversion toolkit, which leverages the OpenFst library. The work is intended to be accessible to a broad range of readers.
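As a toy illustration of the joint n-gram idea (graphemes and phonemes fused into joint "graphone" units and modeled as one sequence), the sketch below trains a bigram model over pre-aligned one-to-one graphone sequences and decodes greedily. Real systems such as Phonetisaurus learn many-to-many alignments with EM, use higher-order smoothed models, and decode with WFSTs; every name here is illustrative:

```python
from collections import defaultdict

def train_joint_bigram(aligned):
    """Count bigrams over (grapheme, phoneme) graphone units.

    `aligned` is assumed to be pre-aligned 1-to-1 training data, e.g.
    [[("c", "K"), ("a", "AE"), ("t", "T")], ...].
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in aligned:
        prev = ("<s>", "<s>")  # sentence-start graphone
        for unit in seq:
            counts[prev][unit] += 1
            prev = unit
    return counts

def decode(word, counts):
    """Greedy left-to-right decode: for each grapheme, pick the graphone
    with the highest count given the previous graphone."""
    prev = ("<s>", "<s>")
    phones = []
    for g in word:
        candidates = [(n, u) for u, n in counts[prev].items() if u[0] == g]
        if not candidates:
            return None  # unseen context in this toy model
        _, best = max(candidates)
        phones.append(best[1])
        prev = best
    return phones
```

The joint formulation lets a single language model capture grapheme-phoneme dependencies in both directions, which is the property the trade-off discussion above relies on.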


2013
Author(s): Josef R. Novak, Nobuaki Minematsu, Keikichi Hirose
Keyword(s): N-Gram
