Applying the Bell’s Test to Chinese Texts

Igor A. Bessmertny; Xiaoxi Huang; Aleksei V. Platonov; Chuqiao Yu; Julia A. Koroleva

doi:10.3390/e22030275

Entropy ◽

10.3390/e22030275 ◽

2020 ◽

Vol 22 (3) ◽

pp. 275

Author(s):

Igor A. Bessmertny ◽

Xiaoxi Huang ◽

Aleksei V. Platonov ◽

Chuqiao Yu ◽

Julia A. Koroleva

Keyword(s):

Quantum Entanglement ◽

Chinese Text ◽

Search Engines ◽

Text Processing ◽

Word Segmentation ◽

Text Segmentation ◽

Text Documents ◽

Segmentation Algorithms ◽

Chinese Texts

Search engines find documents that contain the patterns in a query. This approach works for alphabetic languages such as English; Chinese, however, is highly context-dependent. A significant problem in Chinese text processing is the absence of spaces between words, so the text must be segmented into words before any other processing. Chinese segmentation algorithms must take context into account; that is, how a word is segmented depends on the surrounding ideograms. Because existing segmentation algorithms are imperfect, we consider an approach that builds the context from all possible n-grams surrounding the query words. This paper proposes a quantum-inspired approach to ranking Chinese text documents by their relevance to the query. In particular, the approach uses Bell’s test, which measures the quantum entanglement of two words within a context. Word contexts are built with the hyperspace analogue to language (HAL) algorithm. Experiments conducted in three domains demonstrate that the proposed approach provides acceptable results.
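
The abstract names the HAL algorithm as the source of word contexts but gives no parameters. As a hedged illustration, the Python sketch below builds a standard HAL matrix, in which a token that precedes a target at distance d inside a window of size K contributes weight K - d + 1, and compares two words by the cosine of their combined left/right context vectors. Treating single characters as tokens, the window size, and the cosine comparison are assumptions made for the example; the Bell’s-test scoring is not reproduced, because the abstract does not give its formula.

```python
import math
from collections import defaultdict


def hal_matrix(tokens, window=5):
    """Hyperspace Analogue to Language (HAL) co-occurrence matrix.

    Each token within `window` positions before a target token adds a
    weight of (window - distance + 1): an adjacent token adds `window`,
    the farthest one adds 1. Rows are targets, columns are left contexts.
    """
    matrix = defaultdict(lambda: defaultdict(float))
    for i, target in enumerate(tokens):
        for distance in range(1, window + 1):
            j = i - distance
            if j < 0:
                break
            matrix[target][tokens[j]] += window - distance + 1
    return matrix


def context_vector(matrix, word):
    """Combine a word's row (left context) and column (right context)."""
    vec = {("L", ctx): w for ctx, w in matrix.get(word, {}).items()}
    for ctx, row in matrix.items():
        if word in row:  # membership test does not create entries
            vec[("R", ctx)] = row[word]
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    # Characters stand in for words here, sidestepping segmentation (an assumption).
    text = "我在北京大学学习，他在北京工作。"
    m = hal_matrix(list(text), window=3)
    print(cosine(context_vector(m, "我"), context_vector(m, "他")))
```

On this toy sentence, 我 and 他 share the same right-hand contexts (在, 北, 京) but 我 has no left-hand context, so the printed similarity is about 0.707 rather than 1.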

Linguistic Annotation of Translated Chinese Texts: Coordinating Theory, Algorithms and Data

Journal of Linguistics/Jazykovedný casopis ◽

10.2478/jazcas-2021-0054 ◽

2021 ◽

Vol 72 (2) ◽

pp. 590-602

Author(s):

Kirill I. Semenov ◽

Armine K. Titizian ◽

Aleksandra O. Piskunova ◽

Yulia O. Korotkova ◽

Alena D. Tsvetkova ◽

...

Keyword(s):

Chinese Text ◽

Text Processing ◽

Word Segmentation ◽

Pos Tagging ◽

Theoretical Comparison ◽

Linguistic Annotation ◽

Corpus Data ◽

Chinese Texts

The article tackles the problems of linguistic annotation of the Chinese texts in Ruzhcorp, the Russian-Chinese Parallel Corpus of the RNC, and ways to solve them. Particular attention is paid to the processing of Russian loanwords. On the one hand, we present a theoretical comparison of the widespread standards for Chinese text processing. On the other hand, we describe our experiments in three areas, namely word segmentation, grapheme-to-phoneme conversion, and PoS tagging, on corpus data that contain many transliterations and loanwords. As a result, we propose a preprocessing pipeline for Chinese texts that will be implemented in Ruzhcorp.
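
The abstract does not describe Ruzhcorp’s actual tools. The Python sketch below only illustrates why transliterations matter for the three stages it names: a dictionary-based forward maximum matching segmenter splits 普希金 (Pushkin) into three characters unless the loanword is in its lexicon, and that error then propagates to the grapheme-to-phoneme and PoS stages. The toy dictionaries, the `UNK` tag, and the function names are hypothetical; the other tags follow Penn Chinese Treebank conventions.

```python
# Hypothetical toy resources; a real pipeline needs full dictionaries
# and would resolve polyphonic characters from context.
CHAR_PINYIN = {"普": "pǔ", "希": "xī", "金": "jīn", "写": "xiě", "了": "le", "诗": "shī"}
POS_TAGS = {"普希金": "NR", "写": "VV", "了": "AS", "诗": "NN"}  # CTB-style tags


def fmm_segment(text, lexicon, max_len=4):
    """Forward maximum matching: at each position take the longest
    lexicon entry (up to max_len characters), else a single character."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def to_pinyin(word):
    """Grapheme-to-phoneme by per-character lookup ('?' if unknown)."""
    return " ".join(CHAR_PINYIN.get(ch, "?") for ch in word)


def annotate(text, lexicon):
    """Segmentation -> G2P -> PoS tagging, one (word, pinyin, tag) per token."""
    return [(w, to_pinyin(w), POS_TAGS.get(w, "UNK")) for w in fmm_segment(text, lexicon)]


if __name__ == "__main__":
    base = {"写", "了", "诗"}
    print(fmm_segment("普希金写了诗", base))            # loanword split apart
    print(annotate("普希金写了诗", base | {"普希金"}))  # loanword kept whole
```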

A new statistical formula for Chinese text segmentation incorporating contextual information

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '99 ◽

10.1145/312624.312659 ◽

1999 ◽

Cited By ~ 9

Author(s):

Yubin Dai ◽

Teck Ee Loh ◽

Christopher S. G. Khoo

Keyword(s):

Chinese Text ◽

Contextual Information ◽

Text Segmentation ◽

Statistical Formula

Line and Word Segmentation of handwritten text documents written in Gurmukhi Script using mid point detection technique

2015 2nd International Conference on Recent Advances in Engineering & Computational Sciences (RAECS) ◽

10.1109/raecs.2015.7453388 ◽

2015 ◽

Cited By ~ 4

Author(s):

Payal Jindal ◽

Balkrishan Jindal

Keyword(s):

Word Segmentation ◽

Detection Technique ◽

Text Documents ◽

Handwritten Text ◽

Point Detection ◽

Gurmukhi Script

Exploiting unlabeled internal data in conditional random fields to reduce word segmentation errors for Chinese texts

10.21437/interspeech.2007-542 ◽

2007 ◽

Author(s):

Richard Tzong-Han Tsai ◽

Hsi-Chuan Hung ◽

Hong-Jie Dai ◽

Wen-Lian Hsu

Keyword(s):

Random Fields ◽

Conditional Random Fields ◽

Word Segmentation ◽

Chinese Texts

Varying Naïve Bayes Models With Applications to Classification of Chinese Text Documents

Journal of Business and Economic Statistics ◽

10.1080/07350015.2014.903086 ◽

2014 ◽

Vol 32 (3) ◽

pp. 445-456 ◽

Cited By ~ 3

Author(s):

Guoyu Guan ◽

Jianhua Guo ◽

Hansheng Wang

Keyword(s):

Chinese Text ◽

Naïve Bayes ◽

Text Documents

A Chinese ancient book digital humanities research platform to support digital humanities research

The Electronic Library ◽

10.1108/el-10-2018-0213 ◽

2019 ◽

Vol 37 (2) ◽

pp. 314-336 ◽

Cited By ~ 1

Author(s):

Chih-Ming Chen ◽

Chung Chang

Keyword(s):

Social Network ◽

Network Analysis ◽

Chinese Text ◽

Digital Humanities ◽

Word Segmentation ◽

Digital Archives ◽

Content Type ◽

Text Annotation ◽

Research Platform ◽

Ancient Texts

Purpose

With the rapid development of digital humanities, several platforms have been developed successfully to support humanists’ research. However, most of them still do not provide a friendly digital reading environment or a practical social network analysis tool to help humanists interpret texts and explore characters’ social network relationships. Moreover, the advance of digitization technologies for retrieving and using Chinese ancient books presents both an unprecedented challenge and an opportunity. For these reasons, this paper presents a Chinese ancient books digital humanities research platform (CABDHRP) to support historical China studies. In addition to digital archives, digital reading, and basic and advanced search functions for Chinese ancient books, the platform provides two novel functions that support digital humanities research more effectively: an automatic text annotation system (ATAS) for interpreting texts and a character social network relationship map tool (CSNRMT) for exploring characters’ social network relationships.

Design/methodology/approach

This study adopted DSpace, an open-source institutional repository system, as the digital archives system for archiving scanned images, metadata, and full texts in the CABDHRP. The ATAS uses the Node.js framework to implement the system’s front- and back-end services, together with application programming interfaces (APIs) provided by databases such as the China Biographical Database (CBDB) and TGAZ, which are used to retrieve linked data (LD) sources for interpreting ancient texts. Neo4j, an open-source graph database management system, was used to implement the CSNRMT. Finally, JavaScript and jQuery were used to develop a monitoring program, embedded in the CABDHRP, that records humanists’ usage processes based on xAPI (Experience API). To understand how research participants perceived interpreting historical texts and characters’ social network relationships with the support of ATAS and CSNRMT, semi-structured interviews were conducted with 21 research participants.

Findings

The ATAS embedded in the CABDHRP reading interface can collect resources from different databases through LD to annotate ancient texts automatically. It allows humanists to consult resources from diverse databases when interpreting ancient texts and provides a friendly text annotation reader for interpreting them through reading. Additionally, the CSNRMT can semi-automatically identify characters’ names using Chinese word segmentation, with humanists confirming the results, and analyze characters’ social network relationships in Chinese ancient books by visualizing them as a knowledge graph. The CABDHRP can not only stimulate humanists to explore new viewpoints in humanistic research but also raise public interest in and awareness of Chinese ancient books.

Originality/value

This study proposes a novel CABDHRP whose advanced features, namely automatic Chinese word segmentation, automatic Chinese text annotation, semi-automatic character social network analysis, and user behavior analysis, distinguish it from existing digital humanities platforms. Currently, no other digital humanities platform of this kind has been developed to support humanists in digital humanities research.
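
The abstract says only that the CSNRMT finds character names with Chinese word segmentation, has humanists confirm them, and stores the resulting network in Neo4j. A common way to derive such a network, assumed here rather than taken from the paper, is to link two confirmed names whenever they occur in the same sentence and to weight the link by the number of such sentences. The Python sketch below does that on pre-segmented toy sentences; the function name and data are hypothetical, and loading the edges into a graph database is left out.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sentences, names):
    """Weight each pair of confirmed names by the number of sentences
    in which both appear (an assumed same-sentence heuristic)."""
    edges = Counter()
    for tokens in sentences:
        present = sorted({t for t in tokens if t in names})
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


if __name__ == "__main__":
    # Pre-segmented sentences; `names` plays the role of the names a
    # humanist has confirmed after automatic candidate detection.
    sentences = [
        ["刘备", "与", "关羽", "、", "张飞", "结义"],
        ["刘备", "三", "顾", "诸葛亮"],
        ["关羽", "张飞", "随", "刘备"],
    ]
    names = {"刘备", "关羽", "张飞", "诸葛亮"}
    for (a, b), weight in cooccurrence_edges(sentences, names).most_common():
        print(a, "--", b, weight)
```

Sorting the names in each sentence makes every pair a canonical key, so an edge is counted once regardless of the order in which the two names appear.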

Error Detection and Correction Based on Chinese Phonemic Alphabet in Chinese Text

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems ◽

10.1142/s0218488508005261 ◽

2008 ◽

Vol 16 (supp01) ◽

pp. 89-105 ◽

Cited By ~ 2

Author(s):

Chuen-Min Huang ◽

Mei-Chen Wu ◽

Ching-Che Chang

Keyword(s):

Chinese Text ◽

Error Detection ◽

Experimental Results ◽

Text Editor ◽

Correction Rate ◽

Error Detection and Correction ◽

Chinese Texts ◽

Chinese Writing ◽

Automatic Error

Misspellings and misconceptions resulting from similar pronunciations appear frequently in Chinese texts. Without double-checking, the situation worsens, even with the help of a Chinese input editor. The quality of Chinese writing could be improved if an effective automatic error detection and correction mechanism were embedded in the text editor, relieving the manual burden of proofreading. Until recently, research on automatic error detection and correction for Chinese text has faced many challenges and performed poorly compared with that for Western text. In view of this prominent problem in Chinese writing, this study proposes a learning model based on Chinese phonemic alphabets. The experimental results demonstrate that the model is effective in finding misspellings and improves detection and correction rates.
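
The abstract does not describe the learning model itself. To show the underlying idea of using pronunciation to catch misspellings, the Python sketch below implements a simple rule-based baseline instead: for each pair of adjacent characters that is not a lexicon word, it tries replacing one character with a homophone and suggests any result that is in the lexicon. The pronunciation table (pinyin with tones here; the paper’s phonemic alphabet may use a different notation), the lexicon, and all function names are hypothetical.

```python
# Hypothetical toy data: pronunciation per character and a tiny lexicon.
PRONUNCIATION = {"在": "zài", "再": "zài", "见": "jiàn", "建": "jiàn",
                 "件": "jiàn", "设": "shè", "社": "shè"}
LEXICON = {"再见", "建设", "社会"}


def homophones(ch):
    """Other characters that share ch's pronunciation in the table."""
    key = PRONUNCIATION.get(ch)
    if key is None:
        return []
    return [c for c, p in PRONUNCIATION.items() if p == key and c != ch]


def suggest(bigram):
    """For a non-word bigram, list lexicon words reachable by swapping one
    character for a homophone; a word already in the lexicon is left alone."""
    if bigram in LEXICON:
        return []
    fixes = []
    for i, ch in enumerate(bigram):
        for alt in homophones(ch):
            candidate = bigram[:i] + alt + bigram[i + 1:]
            if candidate in LEXICON:
                fixes.append(candidate)
    return fixes


def check(text):
    """Scan adjacent character pairs; report (position, pair, suggestions)."""
    report = []
    for i in range(len(text) - 1):
        fixes = suggest(text[i:i + 2])
        if fixes:
            report.append((i, text[i:i + 2], fixes))
    return report
```

Scanning raw character pairs is deliberately naive: for 我们在见面 ("we are meeting"), `check` would flag 在见 and suggest 再见, although the original is correct. Resolving such cases from context is the job of a trained model.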

Chinese text segmentation for text retrieval: Achievements and problems

Journal of the American Society for Information Science ◽

10.1002/(sici)1097-4571(199310)44:9<532::aid-asi3>3.0.co;2-m ◽

1993 ◽

Vol 44 (9) ◽

pp. 532-542 ◽

Cited By ~ 41

Author(s):

Zimin Wu ◽

Gwyneth Tseng

Keyword(s):

Chinese Text ◽

Text Retrieval ◽

Text Segmentation

On the unsupervised analysis of domain-specific Chinese texts

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1516510113 ◽

2016 ◽

Vol 113 (22) ◽

pp. 6154-6159 ◽

Cited By ~ 5

Author(s):

Ke Deng ◽

Peter K. Bol ◽

Kate J. Li ◽

Jun S. Liu

Keyword(s):

Chinese Text ◽

Context Analysis ◽

Text Data ◽

Training Corpus ◽

Domain Specific ◽

Association Pattern ◽

Supervised Segmentation ◽

Chinese Texts ◽

Chinese Text Mining

With the growing availability of digitized text data, both public and private, there is a great need for effective computational tools that automatically extract information from texts. Because Chinese differs most significantly from alphabet-based languages in not marking word boundaries, most existing Chinese text-mining methods require a prespecified vocabulary and/or a large relevant training corpus, which may not be available in some applications. We introduce an unsupervised method, top-down word discovery and segmentation (TopWORDS), for simultaneously discovering and segmenting words and phrases in large volumes of unstructured Chinese texts, and propose ways to order discovered words and conduct higher-level context analyses. TopWORDS is particularly useful for mining online and domain-specific texts where the underlying vocabulary is unknown or the texts of interest differ significantly from available training corpora. When outputs from TopWORDS are fed into context analysis tools such as topic modeling, word embedding, and association pattern finding, the results are as good as or better than those obtained from the outputs of a supervised segmentation method.
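
The abstract does not spell out how TopWORDS works internally. As an assumption-labelled sketch of the general idea of discovering words without a prespecified vocabulary, the Python code below starts from an overcomplete dictionary of frequent substrings, fits a unigram word model with a simplified hard (Viterbi) EM loop, drops multi-character candidates that are never used, and then segments text with the fitted probabilities. This simplification, its parameters (`max_len`, `min_freq`, `alpha`), and the toy corpus are not TopWORDS itself.

```python
import math
from collections import Counter


def candidate_dictionary(texts, max_len=3, min_freq=2):
    """Overcomplete candidate set: substrings of up to max_len characters
    seen at least min_freq times, plus every single character."""
    counts = Counter()
    for text in texts:
        for i in range(len(text)):
            for n in range(1, max_len + 1):
                if i + n <= len(text):
                    counts[text[i:i + n]] += 1
    return {w for w, c in counts.items() if c >= min_freq or len(w) == 1}


def viterbi(text, logp, max_len):
    """Most probable segmentation under a unigram model.
    Assumes every character of text occurs in the training texts."""
    best = [0.0] + [-math.inf] * len(text)
    back = [0] * (len(text) + 1)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in logp and best[start] + logp[word] > best[end]:
                best[end] = best[start] + logp[word]
                back[end] = start
    words, end = [], len(text)
    while end > 0:
        words.append(text[back[end]:end])
        end = back[end]
    return words[::-1]


def train(texts, max_len=3, min_freq=2, iters=5, alpha=0.1):
    """Hard-EM fit of word log-probabilities, pruning unused candidates."""
    vocab = candidate_dictionary(texts, max_len, min_freq)
    logp = {w: -math.log(len(vocab)) for w in vocab}  # uniform start
    for _ in range(iters):
        counts = Counter()
        for text in texts:
            counts.update(viterbi(text, logp, max_len))  # hard E-step
        # Prune multi-character candidates never chosen; single characters
        # are always kept so that every training string stays segmentable.
        vocab = {w for w in vocab if len(w) == 1 or counts[w] > 0}
        total = sum(counts.values()) + alpha * len(vocab)
        logp = {w: math.log((counts[w] + alpha) / total) for w in vocab}  # M-step
    return logp


if __name__ == "__main__":
    corpus = ["我爱北京", "北京欢迎你", "我爱你", "北京很大"]
    model = train(corpus)
    print(viterbi("北京欢迎你", model, max_len=3))
```

On this four-sentence toy corpus, 北京 survives as a word while 欢迎 does not, simply because 欢迎 appears only once and never enters the candidate dictionary; meaningful results need far more text.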

ACTS: An automatic Chinese text segmentation system for full text retrieval

Journal of the American Society for Information Science ◽

10.1002/(sici)1097-4571(199503)46:2<83::aid-asi2>3.0.co;2-0 ◽

1995 ◽

Vol 46 (2) ◽

pp. 83-96 ◽

Cited By ~ 22

Author(s):

Zimin Wu ◽

Gwyneth Tseng

Keyword(s):

Chinese Text ◽

Full Text ◽

Text Retrieval ◽

Text Segmentation ◽

Full Text Retrieval
