actual speech
Recently Published Documents





2022 ◽  
Vol 12 (2) ◽  
pp. 827
Ki-Seung Lee

Moderate performance in terms of intelligibility and naturalness can be obtained using previously established silent speech interface (SSI) methods. Nevertheless, a common problem associated with SSI has involved deficiencies in estimating the spectrum details, which results in synthesized speech signals that are rough, harsh, and unclear. In this study, harmonic enhancement (HE) was used during postprocessing to alleviate this problem by emphasizing the spectral fine structure of speech signals. To improve the subjective quality of synthesized speech, the difference between synthesized and actual speech was established by calculating the distance in the perceptual domains instead of using the conventional mean square error (MSE). Two deep neural networks (DNNs) were employed to separately estimate the speech spectra and the filter coefficients of HE, connected in a cascading manner. The DNNs were trained to incrementally and iteratively minimize both the MSE and the perceptual distance (PD). A feasibility test showed that the perceptual evaluation of speech quality (PESQ) and the short-time objective intelligibility measure (STOI) were improved by 17.8% and 2.9%, respectively, compared with previous methods. Subjective listening tests revealed that the proposed method yielded perceptually preferred results compared with those of the conventional MSE-based method.

2022 ◽  
Vol 31 (1) ◽  
pp. 159-167
Yijun Wu ◽  
Yonghong Qin

To improve the efficiency of English translation, machine translation is gradually becoming widely used. This study briefly introduces the neural network algorithm for speech recognition. Long short-term memory (LSTM), instead of a traditional recurrent neural network (RNN), was used as the encoding algorithm for the encoder, and an RNN as the decoding algorithm for the decoder. Then, simulation experiments were carried out on the machine translation algorithm, and it was compared with two other machine translation algorithms. The results showed that the back-propagation (BP) neural network had a lower word error rate and spent less recognition time than artificial recognition in recognizing the speech; the LSTM–RNN algorithm had a lower word error rate than the BP–RNN and RNN–RNN algorithms in recognizing the test samples. In the actual speech translation test, as the length of speech increased, the LSTM–RNN algorithm showed the smallest changes in translation score and word error rate, and it had the highest translation score and the lowest word error rate under the same speech length.

2021 ◽  
Vol 32 (4) ◽  
pp. 295-304
Olga V. Knorz

The semantic and pragmatic potential of statements of refusal to communicate sheds light on the peculiarities of the phenomenon of silence, making it possible to identify its varieties and the specifics of its manifestation in the Russian linguistic picture of the world, and to analyze its function as part of a literary text. The concept of “refusal to communicate” is a broader phenomenon than the actual speech genre of refusal, since refusal as such implies a negative reaction only to initial motivational responses, while refusal of communication can either be a reaction to any statement or not be reactive at all, that is, it can be the initial remark in the dialogue. The peculiarity of statements with the semantics of silence is manifested in the fact that their illocutionary goal is the impossibility or unwillingness to continue communication, which is achieved using various linguistic means: first of all, lexemes denoting the speaking process, as well as modal modifiers expressing the reason for such speech behavior. The differences between speech genres that are based on rejection lie in the very object of rejection, in what the speaker rejects. In the Russian linguistic picture of the world, silence is characterized by the fact that it is a communicative action; it consists not in the absence of speech, but in the transmission of information in a non-verbal way. In this case, it is called communicatively meaningful silence. The analysis of the lexical structure of E. Vodolazkin’s novel “Laure” made it possible to identify important fragments of the text associated with silence, and to obtain information about the author’s worldview and his attitude to the phenomenon of silence.

2021 ◽  
Vol 1 (1) ◽  
pp. 34-39
Abdu Rahim III Kenoh

Despite being competent in the English language, pre-service teachers struggle a lot when it comes to public speaking. Their ability to deliver and speak competently is hindered by speaking anxiety. The purpose of this study is to determine the causes of speaking anxiety among pre-service teachers and identify how pre-service teachers cope with speaking anxiety. This study was administered to 7 pre-service teachers, selected using a convenience sampling technique, from a reputable public state university in Southern Philippines. The findings showed that speaking anxiety among pre-service teachers is caused by the fear of committing mistakes, having high expectations from the audience, nervousness, and lack of preparation. Additionally, the techniques used by pre-service teachers to cope with speaking anxiety include preparing an outline, practicing before the actual speech, and boosting one’s self-confidence. Research revealed that speaking anxiety can be eased by employing techniques such as exposure to speaking engagements, preparation, and believing in oneself.

Electronics ◽  
2021 ◽  
Vol 10 (19) ◽  
pp. 2420
Lukáš Beňo ◽  
Rudolf Pribiš ◽  
Peter Drahoš

Containerization has been mainly used in pure software solutions, but it is gradually finding its way into industrial systems. This paper introduces an edge container with artificial intelligence for speech recognition, which performs the voice control function of the actuator as a part of the Human Machine Interface (HMI). This work proposes a procedure for creating voice-controlled applications with modern hardware and software resources. The created architecture integrates well-known digital technologies such as containerization, cloud, edge computing and a commercial voice processing tool. This methodology and architecture enable the actual speech recognition and the voice control on the edge device in the local network, rather than in the cloud, like the majority of recent solutions. The Linux containers are designed to run without any additional configuration and setup by the end user. A simple adaptation of voice commands via a configuration file may be considered as an additional contribution of the work. The architecture was verified by experiments with running containers on different devices, such as PC, Tinker Board 2, Raspberry Pi 3 and 4. The proposed solution and the practical experiment show how a voice-controlled system can be created, easily managed and distributed to many devices around the world in a few seconds. All this can be achieved by simply downloading and running two types of ready-made containers without any complex installations. The result of this work is a proven stable (network-independent) solution with data protection and low latency.

2021 ◽  
Vol 16 (7) ◽  
pp. 159-167
E. I. Galyashina ◽  
V. D. Nikishin

The paper discusses some of the features of administrative cases on the recognition of information materials posted on the Internet as extremist. An analysis of judicial practice in cases of recognition of information materials as extremist (Article 265.8 of the Administrative Procedure Code) highlighted their specifics and problematic aspects associated with expert opinions used to substantiate administrative claims. Presumably, extremist materials are detected by law enforcement agencies during the monitoring of social networks and other Internet resources and are sent for linguistic expertise. If a linguistic expert reveals any signs of extremism, the prosecutor issues a legal opinion and, in the interests of the Russian Federation and an indefinite circle of persons, applies to a federal court with an administrative claim to recognize information posted on Internet sites as extremist material, i.e. information, the distribution of which is prohibited in the Russian Federation. The paper concludes that the conclusions of linguistic experts are used to substantiate the arguments of administrative claims, and that their quality determines the validity of the court decisions taken. As the main reason for expert errors, the authors cite the ambiguity of the interpretation of the concept of “extremist materials”, which entails a mixture of information calling for committing an extremist action or justifying or substantiating it, and the actual speech action of calling, justifying, or substantiating. It seems necessary to change the existing expert approach towards the development of a unified criterion for determining diagnostic complexes of signs necessary and sufficient to substantiate the extremist essence of information materials, taking into account the duality of their legal and linguistic assessment.

Litera ◽  
2021 ◽  
pp. 33-41
Ol'ga Viktorovna Murzina ◽  
Anastasiya Gennad'evna Gotovtseva

The subject of this research is the transformation of classical ancient rhetoric in modern media, namely of such a mandatory part of the classical presentation of speech according to Marcus Fabius Quintilianus as rebuttal to an opponent's argument. The article draws on posts by the authors of various blogs on entertainment portals and their interaction with users’ comments. Response to an objection was an important element of the canon of presentation of speech in antiquity: by doing so, the speaker demonstrated a confident command of the topic and, at the same time, that the topic is open to objection and requires argumentation. The reduction of competitive eloquence turned this part of the canon into a ritual and weakened its ties with the actual speech practice. In modernity, we can observe the return of rebuttal to an opponent's arguments as an independent genre: the author in his publications counts on rebuttal and prepares in advance. The novelty of this research consists in the fact that the Neo-Quintilian paradigm of modern youth media is analyzed for the first time. The main conclusion lies in revealing the transformation and deformation of the classical canon: being the so-called cultural constant, the classical canon of ancient rhetoric is conveyed to modern users through interaction with accepted patterns. The modern young audience perceives the canon indirectly, through approved or criticized examples of eloquence, eliciting fragments thereof; thus, the semantic connection of argument and rebuttal is one of the versions of the deformed, but recognizable canon.

2020 ◽  
Vol 51 (4-5) ◽  
pp. 457-486
Jacob P.B. Mortensen

This article examines Judith’s prayer in chapter 9 of the book of Judith from the perspective of the guidelines on speech-in-character found in Aelius Theon’s Progymnasmata (mid/end of the first century CE). According to the guidelines, it is important for an author of prose to achieve correspondence between the literary persona and the actual speech-in-character. This article examines the extent to which Judith’s prayer in chapter 9 observes Theon’s guidelines, as well as the theological implications of this.

2020 ◽  
Vol 10 (3) ◽  
pp. 148
Franziska Stephan ◽  
Henrik Saalbach ◽  
Sonja Rossi

Speech production relies not only on spoken output (overt speech) but also on silent output (inner speech). Little is known about whether inner and overt speech are processed differently and which neural mechanisms are involved. By simultaneously applying electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS), we tried to disentangle executive control from motor and linguistic processes. A preparation phase was introduced in addition to the examination of overt and inner speech directly during naming (i.e., speech execution). Participants completed a picture-naming paradigm in which the pure preparation phase of a subsequent speech production and the actual speech execution phase could be differentiated. fNIRS results revealed a larger activation for overt than for inner speech at bilateral prefrontal to parietal regions during the preparation phase and at bilateral temporal regions during the execution phase. EEG results showed a larger negativity for inner compared to overt speech between 200 and 500 ms during the preparation phase and between 300 and 500 ms during the execution phase. Findings of the preparation phase indicated that differences between inner and overt speech are not exclusively driven by specific linguistic and motor processes but are also impacted by inhibitory mechanisms. Results of the execution phase suggest that inhibitory processes operate during phonological code retrieval and encoding.

2020 ◽  
Vol 32 (2) ◽  
pp. 226-240
Benedikt Zoefel ◽  
Isobella Allard ◽  
Megha Anil ◽  
Matthew H. Davis

Several recent studies have used transcranial alternating current stimulation (tACS) to demonstrate a causal role of neural oscillatory activity in speech processing. In particular, it has been shown that the ability to understand speech in a multi-speaker scenario or background noise depends on the timing of speech presentation relative to simultaneously applied tACS. However, it is possible that tACS did not change actual speech perception but rather auditory stream segregation. In this study, we tested whether the phase relation between tACS and the rhythm of degraded words, presented in silence, modulates word report accuracy. We found strong evidence for a tACS-induced modulation of speech perception, but only if the stimulation was applied bilaterally using ring electrodes (not for unilateral left hemisphere stimulation with square electrodes). These results were only obtained when data were analyzed using a statistical approach that was identified as optimal in a previous simulation study. The effect was driven by a phasic disruption of word report scores. Our results suggest a causal role of neural entrainment for speech perception and emphasize the importance of optimizing stimulation protocols and statistical approaches for brain stimulation research.
