Compression of Probabilistic XML Documents

Author(s):  
Irma Veldman ◽  
Ander de Keijzer ◽  
Maurice van Keulen
2012 ◽  
Vol 263-266 ◽  
pp. 1578-1583
Author(s):  
Yan Zhu ◽  
Hai Tao Ma

Uncertain relational data management has been investigated for a few years, but few works address uncertain XML. Its naturally flexible structure makes XML more appropriate for representing uncertain information. Based on possible-world semantics and probabilistic models with independent-distribution and mutually exclusive distribution nodes, this work studies how to generate an instance from a probabilistic XML document and calculate its probability, which is one of the key problems of uncertain XML management. An algorithm for generating an XML document from a probabilistic XML document and calculating its probability is also proposed; it has linear time complexity. Finally, experimental results demonstrate the correctness and efficiency of the algorithm.
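The instance-generation step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the `Node` and `Elem` classes, the `sample` function, and the node kinds `"ind"` (children kept independently) and `"mux"` (at most one child kept) are assumptions made for illustration. The returned value is the probability of the choices made during one traversal, and each node is visited at most once, so the running time is linear in the size of the probabilistic document.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical probabilistic XML node: kind is "elem", "ind", or "mux"."""
    kind: str
    label: str = ""
    children: List["Node"] = field(default_factory=list)
    probs: List[float] = field(default_factory=list)  # one per child (ind/mux only)


@dataclass
class Elem:
    """Ordinary XML element of a generated (certain) instance."""
    label: str
    children: List["Elem"] = field(default_factory=list)


def sample(node: Node, rng: random.Random) -> Tuple[List[Elem], float]:
    """Return the elements produced under `node` and the probability of the choices made."""
    if node.kind == "elem":
        out, p = Elem(node.label), 1.0
        for child in node.children:
            kids, q = sample(child, rng)
            out.children.extend(kids)
            p *= q
        return [out], p
    if node.kind == "ind":
        # Each child is kept or dropped independently of its siblings.
        result: List[Elem] = []
        p = 1.0
        for child, pc in zip(node.children, node.probs):
            if rng.random() < pc:
                kids, q = sample(child, rng)
                result.extend(kids)
                p *= pc * q
            else:
                p *= 1.0 - pc
        return result, p
    if node.kind == "mux":
        # At most one child is kept; the leftover mass means "no child".
        r, acc = rng.random(), 0.0
        for child, pc in zip(node.children, node.probs):
            acc += pc
            if r < acc:
                kids, q = sample(child, rng)
                return kids, pc * q
        return [], 1.0 - sum(node.probs)
    raise ValueError(f"unknown node kind: {node.kind}")


# Illustrative document: <a> with an optional <b> and exactly one of <c>, <d>.
doc = Node("elem", "a", [
    Node("ind", children=[Node("elem", "b")], probs=[0.5]),
    Node("mux", children=[Node("elem", "c"), Node("elem", "d")], probs=[0.25, 0.75]),
])
roots, prob = sample(doc, random.Random(0))
```

Here `roots[0]` is the generated `<a>` element and `prob` is the probability of the sampled choices. Two different choice sequences could in principle yield the same document, in which case the document's probability would be the sum over those sequences; the sketch does not merge such cases.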


Data Mining ◽  
2013 ◽  
pp. 669-691 ◽  
Author(s):  
Evgeny Kharlamov ◽  
Pierre Senellart

This chapter deals with data mining in uncertain XML data models, whose uncertainty typically comes from imprecise automatic processes. We first review the literature on modeling uncertain data, starting with well-studied relational models and moving then to their semistructured counterparts. We focus on a specific probabilistic XML model, which allows representing arbitrary finite distributions of XML documents, and has been extended to also allow continuous distributions of data values. We summarize previous work on querying this uncertain data model and show how to apply the corresponding techniques to several data mining tasks, exemplified through use cases on two running examples.


2014 ◽  
Vol 571-572 ◽  
pp. 575-579
Author(s):  
Hai Tao Ma ◽  
Chang Yong Yu ◽  
Chang Ming Xu ◽  
Miao Fang

We explored the subtree matching problem of probabilistic XML documents: finding the matches of an XML query tree over a probabilistic XML document, using the canonical tree edit distance as a similarity measure between subtrees. Probabilistic XML is a probability distribution model capturing uncertainty of both value and structure. Querying probabilistic XML documents is difficult: a naive algorithm that directly computes the tree edit distance between the query tree and each certain XML tree represented by the probabilistic XML document has exponential complexity. Based on the method of tree edit distance computation over certain XML subtrees, we defined a minimum solution to the edit distance computation, meaning the minimum cost to transform the query tree into the probabilistic XML tree. Furthermore, we developed an algorithm, ASM (Algorithm of Subtree Matching), to compute the minimum solution. Finally, we proved that the complexity of ASM is linear in the size of the probabilistic XML document.
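To see why the naive approach is exponential, consider a small Python sketch that enumerates every certain document represented by a probabilistic one. It assumes a deliberately simplified model that is not taken from the paper: a flat document whose children are each present independently with a given probability. The function name `possible_worlds` is hypothetical.

```python
from itertools import product
from typing import Iterator, List, Tuple


def possible_worlds(
    optional: List[Tuple[str, float]],
) -> Iterator[Tuple[List[str], float]]:
    """Yield (child labels, probability) for every possible world.

    `optional` lists (label, presence probability) pairs; children are
    assumed independent, so n children give 2**n possible worlds.
    """
    for keep in product([False, True], repeat=len(optional)):
        labels: List[str] = []
        prob = 1.0
        for (label, p), kept in zip(optional, keep):
            if kept:
                labels.append(label)
                prob *= p
            else:
                prob *= 1.0 - p
        yield labels, prob


worlds = list(possible_worlds([("a", 0.5), ("b", 0.25), ("c", 0.5)]))
print(len(worlds))  # 8 worlds for 3 optional children
```

A naive matcher would run a tree edit distance computation once per yielded world, so its cost doubles with every additional optional child. That blow-up is what an algorithm working directly on the probabilistic document avoids.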

