Compression of Probabilistic XML Documents

Algorithms for Generating XML Documents from Probabilistic XML

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.263-266.1578 ◽

2012 ◽

Vol 263-266 ◽

pp. 1578-1583

Author(s):

Yan Zhu ◽

Hai Tao Ma

Keyword(s):

Time Complexity ◽

Probabilistic Models ◽

Linear Time ◽

Uncertain Information ◽

Possible World ◽

XML Documents ◽

Probabilistic XML ◽

XML Document ◽

High Flexibility ◽

Independent Distribution

Uncertain relational data management has been investigated for a few years, but little work has addressed uncertain XML. Its naturally flexible structure makes XML well suited to representing uncertain information. Based on possible-world semantics and a probabilistic model with independent-distribution and mutually exclusive distribution nodes, the paper studies how to generate an instance from a probabilistic XML document and calculate its probability, which is one of the key problems of uncertain XML management. An algorithm that generates an XML document from a probabilistic XML document and calculates its probability is proposed; it has linear time complexity. Finally, experimental results demonstrate the correctness and efficiency of the algorithm.
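The generation problem described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the `PNode` class, the `generate` function, and the sample document are hypothetical names and data chosen for illustration. It assumes a tree whose distribution nodes are either `ind` (each child kept independently) or `mux` (at most one child kept), and it returns the probability of the choices made, which equals the probability of the generated document only when different choices always yield different documents.

```python
import random
from dataclasses import dataclass, field


@dataclass
class PNode:
    """Hypothetical node of a probabilistic XML tree (illustration only)."""
    label: str
    kind: str = "elem"  # "elem" (ordinary), "ind" or "mux" (distribution nodes)
    children: list = field(default_factory=list)  # list of (PNode, probability)


def generate(node, rng):
    """Draw one possible world below `node`.

    Returns (forest, prob): `forest` is a list of (label, children) tuples
    with all distribution nodes removed, `prob` is the probability of the
    choices taken. Each node is visited at most once, so the running time
    is linear in the size of the tree.
    """
    if node.kind == "elem":
        kids, prob = [], 1.0
        for child, _ in node.children:  # edge probabilities are unused here
            forest, q = generate(child, rng)
            kids.extend(forest)
            prob *= q
        return [(node.label, kids)], prob
    if node.kind == "ind":
        kept, prob = [], 1.0
        for child, pr in node.children:
            if rng.random() < pr:       # keep this child with probability pr
                forest, q = generate(child, rng)
                kept.extend(forest)
                prob *= pr * q
            else:                       # drop it with probability 1 - pr
                prob *= 1.0 - pr
        return kept, prob
    if node.kind == "mux":
        r, acc = rng.random(), 0.0
        for child, pr in node.children:
            acc += pr
            if r < acc:                 # exactly one child is selected
                forest, q = generate(child, rng)
                return forest, pr * q
        return [], max(0.0, 1.0 - acc)  # residual mass: no child selected
    raise ValueError(f"unknown node kind: {node.kind}")


# Assumed sample document: a person whose name is one of two spellings
# and whose phone and email are each optionally present.
doc = PNode("person", "elem", [
    (PNode("mux", "mux", [(PNode("John"), 0.6), (PNode("Jon"), 0.4)]), 1.0),
    (PNode("ind", "ind", [(PNode("phone"), 0.7), (PNode("email"), 0.5)]), 1.0),
])

forest, prob = generate(doc, random.Random(0))
print(forest, prob)
```

Passing a seeded `random.Random` keeps a run reproducible; calling `generate` repeatedly with the same generator draws further possible worlds.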


Modeling, Querying, and Mining Uncertain XML Data

Data Mining ◽

10.4018/978-1-4666-2455-9.ch034 ◽

2013 ◽

pp. 669-691 ◽

Cited By ~ 1

Author(s):

Evgeny Kharlamov ◽

Pierre Senellart

Keyword(s):

Data Mining ◽

Data Model ◽

Uncertain Data ◽

Data Models ◽

Use Cases ◽

Automatic Processes ◽

Relational Models ◽

XML Data ◽

XML Documents ◽

Probabilistic XML

This chapter deals with data mining in uncertain XML data models, whose uncertainty typically comes from imprecise automatic processes. We first review the literature on modeling uncertain data, starting with well-studied relational models and moving then to their semistructured counterparts. We focus on a specific probabilistic XML model, which allows representing arbitrary finite distributions of XML documents, and has been extended to also allow continuous distributions of data values. We summarize previous work on querying this uncertain data model and show how to apply the corresponding techniques to several data mining tasks, exemplified through use cases on two running examples.


Modeling, Querying, and Mining Uncertain XML Data

Advances in Data Mining and Database Management - XML Data Mining ◽

10.4018/978-1-61350-356-0.ch002 ◽

2011 ◽

pp. 29-52

Author(s):

Evgeny Kharlamov ◽

Pierre Senellart

Keyword(s):

Data Mining ◽

Data Model ◽

Uncertain Data ◽

Data Models ◽

Use Cases ◽

Automatic Processes ◽

Relational Models ◽

XML Data ◽

XML Documents ◽

Probabilistic XML

This chapter deals with data mining in uncertain XML data models, whose uncertainty typically comes from imprecise automatic processes. We first review the literature on modeling uncertain data, starting with well-studied relational models and moving then to their semistructured counterparts. We focus on a specific probabilistic XML model, which allows representing arbitrary finite distributions of XML documents, and has been extended to also allow continuous distributions of data values. We summarize previous work on querying this uncertain data model and show how to apply the corresponding techniques to several data mining tasks, exemplified through use cases on two running examples.


Efficiently Subtree Matching between XML and Probabilistic XML Documents

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.571-572.575 ◽

2014 ◽

Vol 571-572 ◽

pp. 575-579

Author(s):

Hai Tao Ma ◽

Chang Yong Yu ◽

Chang Ming Xu ◽

Miao Fang

Keyword(s):

Edit Distance ◽

Distribution Model ◽

Tree Edit Distance ◽

Matching Problem ◽

Distance Computation ◽

XML Documents ◽

Probabilistic XML ◽

XML Document ◽

Query Tree ◽

Minimum Solution

We explored the subtree matching problem for probabilistic XML documents: finding the matches of an XML query tree in a probabilistic XML document, using the canonical tree edit distance as the similarity measure between subtrees. Probabilistic XML is a probability distribution model that captures uncertainty in both value and structure. Querying probabilistic XML documents is difficult: a naive algorithm that directly computes the tree edit distance between the query tree and each certain XML tree represented by the probabilistic XML document has exponential complexity. Building on the method for computing tree edit distance over certain XML subtrees, we defined a minimum solution to the edit distance computation, namely the minimum cost of transforming the query tree into the probabilistic XML tree. Furthermore, we developed an algorithm, ASM (Algorithm of Subtree Matching), to compute the minimum solution. Finally, we proved that the complexity of ASM is linear in the size of the probabilistic XML document.
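To see why the naive approach is exponential, the following Python sketch counts how many choice combinations (an upper bound on the number of distinct certain XML trees) a probabilistic XML document encodes. It is not ASM and computes no edit distance; `PNode` and `count_worlds` are hypothetical names, and the sketch assumes `ind` and `mux` distribution nodes with every probability strictly between 0 and 1.

```python
from dataclasses import dataclass, field


@dataclass
class PNode:
    """Hypothetical node of a probabilistic XML tree (illustration only)."""
    label: str
    kind: str = "elem"  # "elem" (ordinary), "ind" or "mux" (distribution nodes)
    children: list = field(default_factory=list)  # list of (PNode, probability)


def count_worlds(node):
    """Count the choice combinations below `node`.

    Assumes every probability p satisfies 0 < p < 1, so each option of a
    distribution node is genuinely possible.
    """
    if node.kind == "elem":
        total = 1
        for child, _ in node.children:
            total *= count_worlds(child)       # children vary independently
        return total
    if node.kind == "ind":
        total = 1
        for child, _ in node.children:
            total *= 1 + count_worlds(child)   # absent, or any of its worlds
        return total
    if node.kind == "mux":
        total = sum(count_worlds(child) for child, _ in node.children)
        if sum(p for _, p in node.children) < 1.0 - 1e-12:
            total += 1                         # "no child selected" option
        return total
    raise ValueError(f"unknown node kind: {node.kind}")


def optional_leaves(n):
    """Build a root with one ind node holding n optional leaf children."""
    leaves = [(PNode(f"c{i}"), 0.5) for i in range(n)]
    return PNode("root", "elem", [(PNode("ind", "ind", leaves), 1.0)])


for n in (5, 10, 20):
    print(n, count_worlds(optional_leaves(n)))
```

A single `ind` node with n optional leaf children already yields 2^n combinations, so running a tree edit distance computation once per certain tree cannot scale, which is the motivation for an algorithm that works on the probabilistic document directly.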
