Write, Attend and Spell

Author(s):  
Qian Zhang ◽  
Dong Wang ◽  
Run Zhao ◽  
Yinggang Yu ◽  
JiaZhen Jing

Text entry on a smartwatch is challenging due to its small form factor. Handwriting recognition using the watch's built-in sensors (motion sensors, microphones, etc.) provides an efficient and natural solution to this issue. However, prior works mainly focus on individual letter recognition rather than word recognition. They therefore require users to pause between adjacent letters for segmentation, which is counter-intuitive and significantly decreases the input speed. In this paper, we present 'Write, Attend and Spell' (WriteAS), a word-level text-entry system that enables free-style handwriting recognition using the motion signals of the smartwatch. First, we design a multimodal convolutional neural network (CNN) to abstract motion features across modalities. A stacked dilated convolutional network with an encoder-decoder network is then applied to get around letter segmentation and output words in an end-to-end way. More importantly, we leverage a multi-task sequence learning method to enable handwriting recognition in a streaming fashion. We construct the first sequence-to-sequence handwriting dataset collected with a smartwatch. WriteAS yields a 9.3% character error rate (CER) on 250 words for new users and a 3.8% CER for words unseen in the training set. In addition, WriteAS handles various writing conditions very well. Given this promising performance, we envision WriteAS becoming a fast and accurate input tool for smartwatches.
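WriteAS reports its results as character error rate (CER): the edit distance between the reference word and the predicted word, normalized by the reference length. A minimal sketch of that metric using standard Levenshtein dynamic programming (function names are ours, not from the paper):

```python
def levenshtein(ref, hyp):
    # Classic single-row dynamic-programming edit distance.
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                              # deletion
                        dp[j - 1] + 1,                          # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))      # substitution
            prev = cur
    return dp[n]

def cer(ref, hyp):
    # Character error rate: edit operations normalized by reference length.
    return levenshtein(ref, hyp) / max(len(ref), 1)
```

Averaging this quantity over a test set of reference/hypothesis word pairs gives the percentages quoted in the abstract.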

2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Hui Jiang ◽  
Ping Wang ◽  
Lei Peng ◽  
Xiaofeng Wang

In recent years, athlete action recognition has become an important research field for displaying and recognizing athlete actions. Generally speaking, athlete movement can be recognized through a variety of modalities, such as motion sensors, machine vision, and big data analysis. Among them, machine vision and big data analysis usually contain significant information that can be used for various purposes. Machine vision treats recognition as the analysis of a time sequence of athlete actions captured by camera, so that it can intervene in athlete training through visual methods and approaches. Big data contains a large volume of athletes' historical training and competition records that need exploration; in-depth analysis and feature mining of these data help coaching teams develop training plans and devise new suggestions. On the basis of the above observations, this paper proposes a novel spatiotemporal attention map convolutional network to identify athletes' actions and, through the auxiliary analysis of big data, gives reasonable action-intervention suggestions, helping coaches and decision-making teams formulate scientific training programs. Results of the study show the effectiveness of the proposed approach.
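The "spatiotemporal attention" idea can be illustrated, in a heavily reduced form, by attention pooling over time: each frame's features receive a softmax weight before aggregation, so informative frames dominate the clip-level representation. A toy sketch (the scoring choice here, plain feature norms, is a hypothetical stand-in and not the paper's mechanism):

```python
import numpy as np

def temporal_attention_pool(frame_features):
    # frame_features: (T, D) array, one feature vector per video frame.
    # Score each frame (here: its L2 norm, a placeholder scoring function),
    # softmax the scores over time, and return the weighted sum of frames.
    scores = np.linalg.norm(frame_features, axis=1)
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ frame_features           # (D,) pooled clip descriptor
```

In a full model the scores would come from a learned projection, and a spatial attention branch would weight skeleton joints analogously.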


2003 ◽  
Vol 12 (06) ◽  
pp. 783-804 ◽  
Author(s):  
GERGELY TÍMÁR ◽  
KRISTÓF KARACS ◽  
CSABA REKECZKY

This report describes analogic algorithms used in the preprocessing and segmentation phases of offline handwriting recognition tasks. A segmentation-based handwriting recognition approach is discussed, i.e., the system attempts to segment the words into their constituent letters. To improve speed, the CNN algorithms used here rely, wherever possible, on dynamic wave-front propagation-based methods embedded into iterative algorithms instead of morphologic operators. The system first locates the handwritten lines in the page image, then corrects their skew as necessary. It then searches for the words within the lines and corrects skew at the word level as well. A novel trigger-wave-based word segmentation algorithm is presented, which operates on the skeletons of the words. Sample results of experiments conducted on a database of 25 handwritten pages are presented, along with suggestions for future development.
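The trigger-wave idea, propagating a wave front across the word skeleton, can be imitated in ordinary software by breadth-first flood fill over skeleton pixels. A simplified stand-in (this is not the analogic CNN hardware implementation, only the propagation intuition; the real algorithm uses wave collisions to find cut points between letters):

```python
from collections import deque

def wavefront_labels(skeleton):
    # skeleton: 2-D grid of 0/1 values (1 = skeleton pixel).
    # Propagate a "trigger wave" (BFS flood fill, 8-connectivity) from each
    # unvisited skeleton pixel, labeling every connected stroke component.
    h, w = len(skeleton), len(skeleton[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if skeleton[sy][sx] and not labels[sy][sx]:
                next_label += 1
                labels[sy][sx] = next_label
                queue = deque([(sy, sx)])
                while queue:                      # the advancing wave front
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and skeleton[ny][nx] and not labels[ny][nx]):
                                labels[ny][nx] = next_label
                                queue.append((ny, nx))
    return labels, next_label
```

Each label then corresponds to one connected stroke group, a candidate unit for word- or letter-level segmentation.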


2022 ◽  
Vol 11 (1) ◽  
pp. 45
Author(s):  
Xuanming Fu ◽  
Zhengfeng Yang ◽  
Zhenbing Zeng ◽  
Yidan Zhang ◽  
Qianting Zhou

Deep learning techniques have been successfully applied in handwriting recognition. Oracle bone inscriptions (OBI) are the earliest hieroglyphs in China and valuable resources for studying the etymology of Chinese characters. OBI are of great historical and cultural value in China; however, deciphering and researching OBI characters remains a huge challenge for archaeologists. In this work, we built a dataset named OBI-100, which contains 100 classes of oracle bone inscriptions collected from two OBI dictionaries. The dataset includes more than 128,000 character samples related to the natural environment, humans, animals, plants, etc. In addition, we propose improved models based on three typical deep convolutional network structures to recognize the OBI-100 dataset. By modifying the parameters, adjusting the network structures, and adopting optimization strategies, we demonstrate experimentally that these models perform fairly well in OBI recognition. For the 100-category OBI classification task, the optimal model achieves an accuracy of 99.5%, which is competitive with other state-of-the-art approaches. We hope that this work can provide a valuable tool for character recognition of OBI.
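For a 100-class task like OBI-100, the reported figure is standard top-1 classification accuracy: the fraction of samples whose highest-scoring class matches the label. A minimal sketch of that metric (names are ours):

```python
def top1_accuracy(logits, labels):
    # logits: list of per-class score lists; labels: list of true class indices.
    # Count samples where the argmax class equals the label.
    correct = sum(1 for row, y in zip(logits, labels)
                  if max(range(len(row)), key=row.__getitem__) == y)
    return correct / len(labels)
```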


1995 ◽  
Vol 7 (6) ◽  
pp. 1289-1303 ◽  
Author(s):  
Yoshua Bengio ◽  
Yann LeCun ◽  
Craig Nohl ◽  
Chris Burges

We introduce a new approach for on-line recognition of handwritten words written in unconstrained mixed style. The preprocessor performs a word-level normalization by fitting a model of the word structure using the EM algorithm. Words are then coded into low resolution "annotated images" where each pixel contains information about trajectory direction and curvature. The recognizer is a convolution network that can be spatially replicated. From the network output, a hidden Markov model produces word scores. The entire system is globally trained to minimize word-level errors.
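The HMM word-scoring step can be sketched as a left-to-right Viterbi pass: one state per letter of the candidate word, each with a self-loop, scored against per-frame letter log-probabilities produced by the network. A toy version (a deliberate simplification; the paper's model also has learned transition scores and is trained globally end to end):

```python
import math

def word_log_score(word, frame_logprobs):
    # word: candidate word string; one HMM state per letter.
    # frame_logprobs: list of dicts, one per frame, letter -> log-probability.
    # Viterbi over a left-to-right topology with self-loops: at each frame a
    # path may stay in its current letter state or advance to the next one.
    n_states = len(word)
    best = [-math.inf] * n_states
    best[0] = frame_logprobs[0][word[0]]      # paths must start at letter 0
    for frame in frame_logprobs[1:]:
        new = [-math.inf] * n_states
        for s in range(n_states):
            stay = best[s]
            move = best[s - 1] if s > 0 else -math.inf
            new[s] = max(stay, move) + frame[word[s]]
        best = new
    return best[-1]                           # path must end in the last letter
```

Running this for every word in the vocabulary and taking the argmax gives a word-level decision from frame-level network outputs.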


2021 ◽  
Vol 13 (8) ◽  
pp. 194
Author(s):  
Ibsa K. Jalata ◽  
Thanh-Dat Truong ◽  
Jessica L. Allen ◽  
Han-Seok Seo ◽  
Khoa Luu

Using optical motion capture and wearable sensors is a common way to analyze impaired movement in individuals with neurological and musculoskeletal disorders. However, these systems are expensive and often require highly trained professionals to identify specific impairments. In this work, we propose a graph convolutional neural network that mimics the intuition of physical therapists to identify patient-specific impairments from video of a patient. In addition, two modeling approaches are compared: a graph convolutional network applied solely to skeleton input data, and a graph convolutional network accompanied by a 1-dimensional convolutional neural network (1D-CNN). Experiments on the dataset showed that the proposed method not only improves the correlation of the predicted gait measures with the ground-truth values (speed = 0.791, gait deviation index (GDI) = 0.792) but also enables faster training with fewer parameters. In conclusion, the proposed method shows the possibility of using video-based data to assess neurological and musculoskeletal disorders with acceptable accuracy, instead of depending on expensive and labor-intensive optical motion capture systems.
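A single graph-convolution layer over a skeleton graph can be sketched in the widely used normalized form H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where A is the joint adjacency matrix and H holds per-joint features. This is the common Kipf-Welling formulation, offered as an illustration; the paper's exact layer may differ:

```python
import numpy as np

def gcn_layer(adj, features, weights):
    # adj: (N, N) skeleton adjacency matrix (joints as nodes, bones as edges).
    # features: (N, D_in) per-joint features; weights: (D_in, D_out).
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))   # D^-1/2 from node degrees
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ features @ weights, 0.0)  # ReLU activation
```

Stacking such layers (and, in the compared variant, feeding their outputs through a 1D-CNN along the time axis) yields the gait-measure regressors described above.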

