Advanced Applications and Structures in XML Processing

Published By IGI Global

9781615207275, 9781615207282

Yan Qi, Huiping Cao, K. Selçuk Candan, Maria Luisa Sapino

In XML data integration, data/metadata merging and query processing are indispensable. Specifically, merging integrates multiple disparate (heterogeneous and autonomous) input data sources for further use, while query processing is one of the main reasons why the data need to be integrated in the first place. Moreover, when supported with appropriate user feedback techniques, queries can also provide contexts in which conflicts among the input sources can be interpreted and resolved. The flexibility of XML structure provides opportunities for alleviating some of the difficulties that other, less flexible data types face in the presence of uncertainty; yet this flexibility also introduces new challenges in merging multiple sources and in query processing over integrated data. In this chapter, the authors discuss two alternative ways in which XML data/schema can be integrated: conflict-eliminating (where the result is cleaned of any conflicts that the different sources might have with each other) and conflict-preserving (where the resulting XML data or XML schema captures the alternative interpretations of the data). They also present techniques for query processing over integrated, possibly imprecise, XML data, and cover strategies that can be used for resolving underlying conflicts.

Norman May, Guido Moerkotte

Early approaches to XQuery processing proposed proprietary techniques to optimize and evaluate XQuery statements. In this chapter, the authors argue for an algebraic optimization and evaluation technique for XQuery, as it allows us to benefit from experience gained with relational databases. An algebraic XQuery processing method requires a translation into an algebra representation. While many publications already exist on algebraic optimizations and evaluation techniques for XQuery, an assessment of translation techniques is required. Consequently, they give a comprehensive survey of techniques for translating XQuery into various query representations. The authors relate these approaches to the way normalization and translation are implemented in Natix and discuss these two steps in detail. In their experience, their translation method is a good basis for further optimizations and query evaluation.

Andreas M. Weiner, Theo Härder

Since the very beginning of query processing in database systems, cost-based query optimization has been the essential strategy for effectively answering complex queries on large documents. XML documents can be efficiently stored and processed using native XML database management systems. Even though such systems can choose from a huge repertoire of join operators (e.g., Structural Joins and Holistic Twig Joins) and various index access operators to efficiently evaluate queries on XML documents, the development of full-fledged XML query optimizers is still in its infancy. In particular, the evaluation of complex XQuery expressions using these operators is not well understood and needs further research. The extensible, rule-based, and cost-based XML query optimization framework proposed in this chapter serves as a testbed for exploring how and whether well-known concepts from relational query optimization (e.g., join reordering) can be reused and which new techniques can make a significant contribution to speeding up query execution. Using the best practices and an appropriate cost model that will be developed using this framework, it can be turned into a robust cost-based XML query optimizer in the future.

Zhen Hua Liu, Anguel Novoselsky, Vikas Arora

Since the advent of XML, there has been significant research into integrating XML data management with Relational DBMS and Object Relational DBMS (ORDBMS). This chapter describes the XML data management capabilities in ORDBMS, various design approaches and implementation techniques to support these capabilities, as well as the pros and cons of each design and implementation approach. Key topics such as XML storage, XML indexing, and XQuery and SQL/XML processing are discussed in depth, presenting both academic and industrial research work in these areas.

Samir Mohammad, Patrick Martin

Extensible Markup Language (XML), which provides a flexible way to define semistructured data, is a de facto standard for information exchange on the World Wide Web. The trend towards storing data in its XML format has meant a rapid growth in XML databases and the need to query them. Indexing plays a key role in improving the execution of a query. In this chapter, the authors give a brief history of the creation and the development of the XML data model. They discuss the three main categories of indexes proposed in the literature to handle the XML semistructured data model and provide an evaluation of indexing schemes within these categories. Finally, they discuss limitations and open problems related to the major existing indexing schemes.

Huayu Wu, Tok Wang Ling

Existing XML twig pattern query processing algorithms fall into two classes: the relational approach and the native approach. Both kinds of approaches have their advantages and limitations. In particular, the relational approach can search for data values (content search) efficiently using tables, but it is not efficient at matching query structure to documents (structural search). The native approach processes structural search efficiently, but it has problems dealing with values. In this chapter, a hybrid approach for XML query processing is introduced. In this approach, the content search and the structural search in a twig pattern query are performed separately using the data structures of the relational approach and the native approach, i.e., relational tables and inverted lists. The authors show that this hybrid technique can process both structural search and content search efficiently, and thus improves query processing performance compared to the existing approaches. Furthermore, when more semantic information on object classes and relationships between objects in the XML document is known, the relational tables used can be optimized according to such semantic information to achieve better performance. Finally, after performing twig pattern matching, value results can be extracted easily using relational tables, rather than by navigating the document again as in many other approaches.
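
The separation of content search from structural search described above can be illustrated with a minimal sketch. It assumes a simple region labelling in which each element gets a (start, end) pair and an ancestor's interval strictly contains its descendants' intervals. The names `build_indexes`, `contains`, `match_twig`, and `SAMPLE` are hypothetical and are not taken from the chapter; a dictionary keyed by (tag, value) stands in for the relational table, and a per-tag list of labels stands in for the inverted lists.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document used only for illustration.
SAMPLE = (
    "<lib>"
    "<book><title>XML</title><author>Ling</author></book>"
    "<book><title>SQL</title><author>Wu</author></book>"
    "</lib>"
)

def build_indexes(xml_text):
    """Label every element with a (start, end) region and build two indexes.

    inverted: tag -> sorted list of labels (stand-in for inverted lists)
    values:   (tag, text) -> list of labels (stand-in for a relational table)
    """
    root = ET.fromstring(xml_text)
    inverted = defaultdict(list)
    values = defaultdict(list)
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        start = counter
        for child in node:
            visit(child)
        counter += 1
        label = (start, counter)
        inverted[node.tag].append(label)
        text = (node.text or "").strip()
        if text:
            values[(node.tag, text)].append(label)

    visit(root)
    for labels in inverted.values():
        labels.sort()
    return inverted, values

def contains(ancestor, descendant):
    """True if the ancestor's region strictly encloses the descendant's."""
    return ancestor[0] < descendant[0] and descendant[1] < ancestor[1]

def match_twig(inverted, values, ancestor_tag, leaf_tag, leaf_value):
    """Match the pattern ancestor_tag[.//leaf_tag = leaf_value].

    Content search is a single table lookup; structural search is a
    containment test between region labels.
    """
    leaves = values.get((leaf_tag, leaf_value), [])
    return [a for a in inverted.get(ancestor_tag, [])
            if any(contains(a, leaf) for leaf in leaves)]

inverted, values = build_indexes(SAMPLE)
matches = match_twig(inverted, values, "book", "author", "Ling")
```

The nested loop in `match_twig` is kept for clarity; a real structural join would merge the sorted lists instead of comparing every pair.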

Dario Colazzo, Giovanna Guerrini, Marco Mesiti, Barbara Oliboni, Emmanuel Waller

The purpose of this chapter is to describe the different research proposals and the facilities of the main XML-enabled and native XML DBMSs for handling XML updates at document and schema level, and their versions. Specifically, the chapter provides a review of various proposals for XML document updates, their different semantics, and their handling of update sequences, with a focus on the XQuery Update proposal. Approaches and specific issues concerned with schema updates are then reviewed, and document and schema versioning are considered. Finally, the degree and limitations of update support in existing DBMSs are reviewed.

Chin-Wan Chung, Myung-Jae Park, Jihyun Lee

To effectively reduce the redundancy and verbosity of XML data, various studies on XML compression have been conducted. In particular, XML data management systems and applications require support for direct query processing and update on compressed XML data, stream-based compression/decompression, and a reduction in the size of the compressed data. In order to fully support the various aspects of XML compression, existing XML compression techniques should be carefully examined and the additional requirements for XML compression techniques should be considered. In this chapter, the authors first classify existing representative XML compression techniques according to their characteristics. Second, they explain the details of XML-specific compression techniques. Third, they summarize the performance of the compression techniques in terms of the compression ratio and the compression and decompression time. Lastly, they present some future research directions.

Guoli Li, Shuang Hou, Hans Arno Jacobsen

XML-based data dissemination networks are rapidly gaining momentum. In these networks XML content is routed from data producers to data consumers throughout an overlay network of content-based routers. Routing decisions are based on XPath expressions (XPEs) stored at each router. To enable efficient routing, while keeping the routing state small, we introduce advertisement-based routing algorithms for XML content, present a novel data structure for managing XPEs, especially apt for the hierarchical nature of XPEs and XML, and develop several optimizations for reducing the number of XPEs required to manage the routing state. The experimental evaluation shows that our algorithms and optimizations reduce the routing table size by up to 90%, improve the routing time by roughly 85%, and reduce overall network traffic by about 35%. Experiments running on PlanetLab show the scalability of our approach.
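
One common way to keep a routing table of XPEs small is to drop expressions that are covered by a more general one. The sketch below illustrates that idea only; it is not the data structure or the algorithms of the chapter. It assumes a heavily restricted XPE form (absolute paths, child axis only, `*` wildcards) and prefix semantics, where an XPE matches any document path it is a prefix of. The names `parse`, `covers`, and `add_subscription` are hypothetical.

```python
def parse(xpe):
    """Split an absolute, child-axis-only XPE such as '/a/*/c' into steps."""
    return xpe.strip("/").split("/")

def covers(general, specific):
    """True if every document path matched by `specific` is matched by `general`.

    Under the prefix semantics assumed here, `general` must be no longer
    than `specific`, and each of its steps must be '*' or equal to the
    corresponding step of `specific`.
    """
    g_steps, s_steps = parse(general), parse(specific)
    if len(g_steps) > len(s_steps):
        return False
    return all(g == "*" or g == s for g, s in zip(g_steps, s_steps))

def add_subscription(table, xpe):
    """Return a new routing table containing only non-covered XPEs."""
    if any(covers(existing, xpe) for existing in table):
        return list(table)  # already covered, nothing to store
    # The new XPE may make some stored XPEs redundant.
    return [e for e in table if not covers(xpe, e)] + [xpe]

table = []
for xpe in ["/a/b/c", "/a/*", "/a/b", "/x/y"]:
    table = add_subscription(table, xpe)
```

After the loop, only the two most general expressions remain in `table`; the other two are covered by `/a/*`.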

Sylvain Hallé, Roger Villemaire

Web service interface contracts define constraints on the patterns of XML messages exchanged between cooperating peers. The authors provide a translation between Linear Temporal Logic (LTL) and a subset of the XML Query Language XQuery, and show that an efficient validation of LTL formulæ can be achieved through the evaluation of XQuery expressions on message traces. Moreover, the runtime monitoring of interface constraints is possible by feeding the trace of messages to a streaming XQuery processor. This shows how advanced XML query processing technologies can be leveraged to perform trace validation and runtime monitoring in web service production environments.
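
To make the idea of validating an LTL formula on a message trace concrete, here is a minimal sketch in Python. It does not reproduce the authors' translation to XQuery; it is a direct recursive evaluator, assuming finite-trace semantics (for example, `X` is false at the last message). Formulas are nested tuples, and the names `holds`, `msg`, and `parse_trace` are hypothetical.

```python
import xml.etree.ElementTree as ET

def parse_trace(messages):
    """Parse a list of XML message strings into elements."""
    return [ET.fromstring(m) for m in messages]

def msg(name):
    """Atomic proposition: the current message's root element is `name`."""
    return ("atom", lambda m: m.tag == name)

def holds(formula, trace, i=0):
    """Evaluate an LTL formula at position i of a finite, non-empty trace."""
    op = formula[0]
    if op == "atom":
        return formula[1](trace[i])
    if op == "not":
        return not holds(formula[1], trace, i)
    if op == "and":
        return holds(formula[1], trace, i) and holds(formula[2], trace, i)
    if op == "implies":
        return (not holds(formula[1], trace, i)) or holds(formula[2], trace, i)
    if op == "X":  # next: there is a following message and it satisfies f
        return i + 1 < len(trace) and holds(formula[1], trace, i + 1)
    if op == "G":  # globally: f holds from i to the end of the trace
        return all(holds(formula[1], trace, j) for j in range(i, len(trace)))
    if op == "F":  # eventually: f holds at some position from i onwards
        return any(holds(formula[1], trace, j) for j in range(i, len(trace)))
    if op == "U":  # until: f holds at every position before g first holds
        for j in range(i, len(trace)):
            if holds(formula[2], trace, j):
                return True
            if not holds(formula[1], trace, j):
                return False
        return False
    raise ValueError(f"unknown operator: {op}")

# Hypothetical contract: no <order/> message may appear before <login/>.
no_order_before_login = ("U", ("not", msg("order")), msg("login"))
trace = parse_trace(["<login/>", "<order/>", "<logout/>"])
ok = holds(no_order_before_login, trace)
```

This evaluator rescans the trace for each temporal operator, so it suits offline validation of short traces; the runtime monitoring mentioned above would process messages incrementally instead.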
