Write, Attend and Spell
Text entry on a smartwatch is challenging because of its small form factor. Handwriting recognition using the watch's built-in sensors (motion sensors, microphones, etc.) offers an efficient and natural solution. However, prior work focuses mainly on recognizing individual letters rather than whole words, so users must pause between adjacent letters for segmentation, which is counter-intuitive and significantly reduces input speed. In this paper, we present 'Write, Attend and Spell' (WriteAS), a word-level text-entry system that enables free-style handwriting recognition using the motion signals of a smartwatch. First, we design a multimodal convolutional neural network (CNN) to extract motion features across modalities. Then, a stacked dilated convolutional network combined with an encoder-decoder network avoids explicit letter segmentation and outputs words end to end. More importantly, we leverage a multi-task sequence learning method to enable streaming handwriting recognition. We construct the first sequence-to-sequence handwriting dataset collected using a smartwatch. WriteAS achieves a 9.3% character error rate (CER) on 250 words for new users and a 3.8% CER for words unseen in the training set. In addition, WriteAS handles various writing conditions well. Given this promising performance, we envision WriteAS as a fast and accurate input tool for smartwatches.
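To make the reported metric concrete, the following is a minimal sketch of how a character error rate could be computed. It assumes the common definition, namely the total Levenshtein (edit) distance between recognized and reference words divided by the total number of reference characters. The function names `edit_distance` and `cer` are hypothetical, and this is not the evaluation code used for WriteAS.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty string takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cer(refs, hyps):
    """Character error rate over paired lists of reference and recognized words."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars


# Assumed toy data: two reference words and two recognized words.
print(cer(["hello", "world"], ["hallo", "word"]))  # 0.2
```

Under this definition, a 9.3% CER corresponds to roughly one character edit for every eleven reference characters.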