View Selection and Materialization

Data Warehousing Design and Advanced Engineering Applications, 2010, pp. 114-130
DOI: 10.4018/978-1-60566-756-0.ch007

Author(s): Zohra Bellahsene

Keyword(s): Data Warehousing, Database Systems, Selection Method, Materialized Views, Storage Space, View Selection, Dynamic View, Processing Cost, Speed Up, Warehousing Systems

There are many motivations for investigating the view selection problem. First, materialized views are increasingly supported by commercial database systems and are used to speed up query response time. Choosing an appropriate set of views to materialize in the database is therefore crucial for reducing query processing cost. Another application of view selection is choosing views to materialize in data warehousing systems to answer decision support queries. The problem addressed in this paper is similar to that of deciding which views to materialize in data warehousing. However, most existing view selection methods are static. Moreover, none of these methods has considered the problem of de-materializing views that are already materialized, although this is an important issue because storage space is usually restricted. This chapter deals with dynamic view selection and with the pending issue of removing materialized views in order to replace less beneficial views with more beneficial ones. We propose a view selection method that decides which views to materialize according to statistical metadata. More precisely, we have designed and implemented this method, including a polynomial algorithm, to decide which views to materialize.


Dynamic View Selection for OLAP

Strategic Advancements in Utilizing Data Mining and Warehousing Technologies, 2011, pp. 91-106
DOI: 10.4018/978-1-60566-717-1.ch005

Author(s): Lawrence Michael, Rau-Chaplin Andrew

Keyword(s): Data Warehousing, Materialized Views, Challenging Problem, View Selection, Dynamic View, Aggregate Queries, Speed Up, Analytical Processing, Changes Over Time, Over Time

In a data warehousing environment, aggregate views are often materialized in order to speed up aggregate queries of online analytical processing (OLAP). Due to the increasing size of data warehouses, it is often infeasible to materialize all views. View selection, the task of selecting a subset of views to materialize based on updates and expectations of the query load, is an important and challenging problem. In this article, we explore dynamic view selection in which the distribution of queries changes over time and the set of materialized views must be tuned by replacing some of the previously materialized views with new ones.
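To make the idea of retuning concrete, here is a minimal Python sketch. It is not the authors' algorithm. It assumes that each view has a known size and a known per-execution saving for every query it can answer, recomputes each view's benefit from the latest query frequencies, and greedily refills a fixed space budget. The function name `retune`, the dictionary layout, and the benefit-per-unit-size ranking are all illustrative assumptions.

```python
def retune(materialized, sizes, savings, freq, budget):
    """Return (views_to_drop, views_to_add) for the current query load.

    materialized: set of view names currently stored
    sizes:        view name -> storage size (assumed > 0)
    savings:      view name -> {query name: cost saved per execution}
    freq:         query name -> observed frequency in the current period
    budget:       total storage available for materialized views
    """
    def benefit(view):
        # Frequency-weighted saving over the queries this view can answer.
        return sum(freq.get(q, 0) * s for q, s in savings[view].items())

    # Rank every known view by benefit per unit of storage, best first.
    ranked = sorted(sizes, key=lambda v: benefit(v) / sizes[v], reverse=True)

    target, used = set(), 0
    for view in ranked:
        if benefit(view) > 0 and used + sizes[view] <= budget:
            target.add(view)
            used += sizes[view]

    # Views no longer worth their space are dropped; new winners are added.
    return materialized - target, target - materialized
```

Calling `retune` again whenever the frequency statistics are refreshed yields the set of views to de-materialize and the set to build. A real system would also account for maintenance cost and for views that can be computed from other views, both of which this sketch ignores.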


Materialized View Selection in the Data Warehouse

Applied Mechanics and Materials, 2010, Vol 29-32, pp. 1133-1138
DOI: 10.4028/www.scientific.net/amm.29-32.1133
Cited by ~1

Author(s): Li Juan Zhou, Hai Jun Geng, Ming Sheng Xu

Keyword(s): Decision Support, Data Warehouse, Materialized Views, Storage Space, View Selection, Materialized View, Query Response Time, Materialized View Selection, Optimal Efficiency, The Cost

A data warehouse stores materialized views of data from one or more sources, with the purpose of efficiently implementing decision-support or OLAP queries. Materialized view selection is one of the crucial decisions in designing a data warehouse for optimal efficiency. The goal is to select an appropriate set of views that minimizes the sum of the query response time and the cost of maintaining the selected views, given a limited amount of resources, e.g., materialization time or storage space. In this article, we present an improved PGA algorithm for the view selection problem; the experiments show that the proposed algorithm is superior.


A Genetic Algorithm for Selecting Horizontal Fragments

Encyclopedia of Data Warehousing and Mining, Second Edition, 2011, pp. 920-925
DOI: 10.4018/978-1-60566-010-3.ch142

Author(s): Ladjel Bellatreche

Keyword(s): Database Systems, Data Partitioning, Materialized Views, Access Path, Multiple Dimensions, Physical Database Design, Join Queries, Speed Up, Disjoint Sets, Horizontal Partitioning

Decision support applications require complex queries, e.g., multi-way joins defined over huge warehouses that are usually modelled using star schemas, i.e., a fact table and a set of dimension tables (Papadomanolakis & Ailamaki, 2004). Star schemas have an important property with respect to join operations between the dimension tables and the fact table: the fact table contains a foreign key for each dimension, and there are no join operations between dimension tables. Joins in data warehouses (called star join queries) are particularly expensive because the fact table (by far the largest table in the warehouse) participates in every join and multiple dimensions are likely to participate in each join. To speed up star join queries, many optimization structures have been proposed: redundant structures (materialized views and advanced index schemes) and non-redundant structures (data partitioning and parallel processing). Data partitioning has recently been recognized as an important aspect of physical database design (Sanjay, Narasayya & Yang, 2004; Papadomanolakis & Ailamaki, 2004). Two types of data partitioning are available (Özsu & Valduriez, 1999): vertical and horizontal. Vertical partitioning decomposes tables into disjoint sets of columns. Horizontal partitioning allows tables, materialized views and indexes to be partitioned into disjoint sets of rows that are physically stored and usually accessed separately. Unlike redundant structures, data partitioning does not replicate data, which reduces storage requirements and minimizes maintenance overhead. In this paper, we concentrate only on horizontal data partitioning (HP).
HP may positively affect (1) query performance, through partition elimination: if a query includes the partition key as a predicate in its WHERE clause, the query optimizer automatically routes the query to the relevant partitions only; and (2) database manageability, for instance by allocating partitions to different machines or by splitting any access path (tables, materialized views, indexes, etc.). Most database systems offer three methods for performing HP with the PARTITION statement: RANGE, HASH and LIST (Sanjay, Narasayya & Yang, 2004). In range partitioning, an access path (table, view, or index) is split according to ranges of values of a given set of columns. Hash partitioning decomposes the data according to a hash function (provided by the system) applied to the values of the partitioning columns. List partitioning splits a table according to listed values of a column. These methods can be combined to produce composite partitioning. Oracle currently supports range-hash and range-list composite partitioning using the PARTITION - SUBPARTITION statement. The following SQL statement shows an example of fragmenting a table Student using range partitioning.
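The SQL statement mentioned above is not part of this excerpt. As a rough, language-neutral illustration of the three modes, the Python sketch below routes a single value to a partition. It is only a sketch under stated assumptions: the function names, the example boundaries, and the use of `zlib.crc32` as a stand-in for the system-provided hash function are all hypothetical and do not describe how any particular database implements partitioning.

```python
import bisect
import zlib


def range_partition(value, upper_bounds):
    """Range mode: upper_bounds is a sorted list of exclusive upper limits.

    Returns the index of the first partition whose upper limit is greater
    than value, mirroring a "values less than" style of range definition.
    """
    idx = bisect.bisect_right(upper_bounds, value)
    if idx == len(upper_bounds):
        raise ValueError(f"{value!r} is above the last partition bound")
    return idx


def hash_partition(key, n_partitions):
    """Hash mode: a deterministic hash of the key, modulo the partition count.

    zlib.crc32 is used here only as an illustrative stand-in hash.
    """
    return zlib.crc32(str(key).encode("utf-8")) % n_partitions


def list_partition(value, value_lists):
    """List mode: value_lists maps a partition name to its set of values."""
    for name, values in value_lists.items():
        if value in values:
            return name
    raise ValueError(f"{value!r} is not listed in any partition")
```

For example, with age bounds `[18, 25, 40]`, `range_partition(17, [18, 25, 40])` selects the first partition and `range_partition(18, [18, 25, 40])` the second. Partition elimination follows the same logic: a predicate on the partitioning column identifies which partitions can contain matching rows, and the rest are skipped.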


How reduce the View Selection Problem through the CoDe Modeling

Journal on Advances in Theoretical and Applied Informatics, 2016, Vol 2 (2), pp. 19
DOI: 10.26729/jadi.v2i2.2090

Author(s): Valentina Indelli Pisano, Michele Risi, Genoveffa Tortora

Keyword(s): Data Warehouse, Ad Hoc, Lattice Structure, Large Data, Real Data, Query Complexity, Materialized Views, View Selection, Big Data Visualization, Speed Up

Big Data visualization is not an easy task because of the sheer amount of information contained in data warehouses. The accuracy with which a representation conveys data relationships therefore becomes one of the most crucial aspects of business knowledge discovery. CoDe is a tool for modeling and visualizing information relationships between data: it processes several queries on a data mart and generates a visualization of the results. On a large data warehouse, however, the response time of these queries grows with query complexity. A common approach to speeding up data warehousing is to precompute a set of materialized views, store them in the warehouse, and use them to compute the workload queries. This paper presents a new process that exploits CoDe modeling to determine the minimal number of required OLAP queries and to mitigate the view selection problem, i.e., selecting the optimal set of materialized views. In particular, the proposed process determines the minimal number of required OLAP queries, creates an ad hoc lattice structure to represent them, and selects on that structure the views to be materialized, using a heuristic based on processing time cost and view storage space. The results of an experiment on a real data warehouse show an improvement in the range of 36-98% with respect to an approach that does not use materialized views, and of 7% with respect to an approach that exploits them. Moreover, we show how the results are affected by the lattice structure.


Research on Materialized Views under Disk-Space Constraint in Data Warehouse

Advanced Materials Research, 2014, Vol 989-994, pp. 1955-1958
DOI: 10.4028/www.scientific.net/amr.989-994.1955

Author(s): Lei Ma

Keyword(s): Data Warehouse, Space Constraint, Maintenance Cost, Materialized Views, Ranking Algorithm, Disk Space, View Selection, Materialized View, Processing Cost, Materialized View Selection

Materialized views are an important topic in data warehouse research and affect both query efficiency and maintenance cost. The disk-space view-selection problem is to select a set of materialized views that minimizes the total query processing cost and the total maintenance cost. In this paper we introduce an evolutionary algorithm based on stochastic ranking that enables materialized view selection under a disk-space constraint. The algorithm improves on the stochastic ranking algorithm and can find a near-optimal feasible solution. The paper applies the algorithm to a police data warehouse.
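The shape of the disk-space view-selection problem can be illustrated with a much simpler technique than the one in the paper. The Python sketch below is a plain greedy heuristic, not the evolutionary or stochastic ranking algorithm described above. It assumes that each candidate view has a fixed size, a fixed reduction in total query cost, and a fixed maintenance cost, and that these are independent of which other views are chosen, which is a strong simplification. The `View` class and `select_views` function are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    name: str
    size: float          # disk space required (assumed > 0)
    query_saving: float  # reduction in total query cost if materialized
    maintenance: float   # cost of keeping the view up to date


def select_views(candidates, disk_budget):
    """Greedily pick views by net benefit per unit of disk space.

    Net benefit = query_saving - maintenance. Views with a non-positive
    net benefit are never chosen; the rest are taken in ranked order as
    long as they still fit within disk_budget.
    """
    ranked = sorted(
        candidates,
        key=lambda v: (v.query_saving - v.maintenance) / v.size,
        reverse=True,
    )
    chosen, used = [], 0.0
    for view in ranked:
        net = view.query_saving - view.maintenance
        if net > 0 and used + view.size <= disk_budget:
            chosen.append(view)
            used += view.size
    return chosen
```

A greedy pass like this gives a feasible selection quickly but offers no optimality guarantee, which is one reason search-based methods such as evolutionary algorithms are studied for this problem.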


On efficient storage space distribution among materialized views and indices in data warehousing environments

Proceedings of the ninth international conference on Information and knowledge management - CIKM '00, 2000
DOI: 10.1145/354756.354846
Cited by ~5

Author(s): Ladjel Bellatreche, Kamalakar Karlapalem, Michel Schneider

Keyword(s): Data Warehousing, Space Distribution, Materialized Views, Storage Space, Efficient Storage


Research on Materialized View Selection in the Data Warehouse

Applied Mechanics and Materials, 2011, Vol 55-57, pp. 361-366
DOI: 10.4028/www.scientific.net/amm.55-57.361

Author(s): Li Juan Zhou, Hai Jun Geng, Ming Sheng Xu

Keyword(s): Data Warehouse, High Efficiency, Resource Constraints, Maintenance Cost, Materialized Views, View Selection, Materialized View, Cost Constraints, Processing Cost, Materialized View Selection

Materialized views are an effective way to improve query efficiency in a data warehouse system, and choosing which views to materialize is one of the most important design decisions. In this paper, an algorithm is proposed that selects a set of materialized views under maintenance cost constraints so as to minimize the total query processing cost; the algorithm adopts a dynamic penalty function to handle view selection under resource constraints. The experimental study shows that the algorithm finds better solutions with high efficiency.


Universal Data Warehousing Based on a Meta-Data Modeling Approach

International Journal of Cooperative Information Systems, 2003, Vol 12 (03), pp. 325-363
DOI: 10.1142/s0218843003000772
Cited by ~2

Author(s): Joseph Fong, Qing Li, Shi-Ming Huang

Keyword(s): Data Warehouse, Data Warehousing, Object Oriented, Data Models, Heterogeneous Databases, Materialized Views, Prototype System, Relational View, Metadata Model, Star Schema

A data warehouse contains a vast amount of data to support the complex queries of various Decision Support Systems (DSSs). It needs to store materialized views of data, which must be available consistently and instantaneously. Using a frame metadata model, this paper presents an architecture for universal data warehousing with different data models. The frame metadata model represents the metadata of a data warehouse; it structures an application domain into classes and integrates the schemas of heterogeneous databases by capturing their semantics. A star schema is derived from user requirements based on the integrated schema and catalogued in the metadata, which stores the schemas of the relational database (RDB) and the object-oriented database (OODB). Data materialization between RDB and OODB is achieved by unloading the source database into a sequential file and reloading it into the target database, through which an object-relational view can be defined so that users can obtain the same warehouse view in different data models simultaneously. We describe our procedures for building the relational view of the star schema by multidimensional SQL query, and the object-oriented view of the data warehouse by Online Analytical Processing (OLAP) through method calls, both derived from the integrated schema. To validate our work, an application prototype system has been developed in a product sales data warehousing domain based on this approach.
