Analyses and Evaluation of Responses to Slowly Changing Dimensions in Data Warehouses

Author(s):  
Lars Frank ◽  
Christian Frank

A star schema data warehouse looks like a star: a central fact table in the middle, surrounded by dimension tables that have one-to-many relationships to the fact table. Dimensions are defined as dynamic or slowly changing if the attributes or relationships of a dimension can be updated. Aggregations of fact data to the level of the related dynamic dimensions might be misleading if the fact data are aggregated without considering the changes of the dimensions. In this chapter, we first prove that the problems of Slowly Changing Dimensions (SCD) in a data warehouse may be viewed as a special case of the read skew anomaly that may occur when different transactions access and update records without concurrency control. That is, we prove that aggregating fact data to the levels of a dynamic dimension should not make sense. On the other hand, we also illustrate, by examples, that in some situations it does make sense to aggregate fact data to the levels of a dynamic dimension. That is, it is the semantics of the data that determine whether historical dimension data should be preserved or destroyed. Even worse, we also illustrate that some applications need a history-preserving response while other applications, at the same time, need a history-destroying response. Kimball et al. (2002) have described three classic responses to the aggregation problems caused by slowly changing dimensions. In this chapter, we describe and evaluate four more responses, of which one is new. This is important because the responses have very different properties, and it is not possible to select a best solution without knowing the semantics of the data.
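The difference between a history-destroying and a history-preserving response can be illustrated with a minimal Python sketch. It is not the authors' method: the customer, regions, days, and amounts are invented for illustration, and the function names are hypothetical. The overwrite variant keeps only the current dimension value, while the versioned variant keeps one dimension row per validity interval.

```python
from collections import defaultdict

# Hypothetical fact rows: customer 1 buys on day 1 and again on day 10.
facts = [
    {"customer_id": 1, "day": 1, "amount": 100},
    {"customer_id": 1, "day": 10, "amount": 50},
]

# History-destroying response: the dimension holds only the current region.
# Assumption: customer 1 moved from "North" to "South" on day 5.
dim_overwrite = {1: "South"}

# History-preserving response: one dimension row per validity interval
# [valid_from, valid_to).
dim_versions = [
    {"customer_id": 1, "region": "North", "valid_from": 0, "valid_to": 5},
    {"customer_id": 1, "region": "South", "valid_from": 5, "valid_to": float("inf")},
]


def totals_history_destroying(facts, dim):
    """Aggregate amounts by the region that is current today."""
    totals = defaultdict(int)
    for fact in facts:
        totals[dim[fact["customer_id"]]] += fact["amount"]
    return dict(totals)


def totals_history_preserving(facts, versions):
    """Aggregate amounts by the region that was valid on the day of the fact."""
    totals = defaultdict(int)
    for fact in facts:
        for version in versions:
            same_customer = version["customer_id"] == fact["customer_id"]
            in_interval = version["valid_from"] <= fact["day"] < version["valid_to"]
            if same_customer and in_interval:
                totals[version["region"]] += fact["amount"]
                break
    return dict(totals)


print(totals_history_destroying(facts, dim_overwrite))
print(totals_history_preserving(facts, dim_versions))
```

With the overwritten dimension, the whole amount is attributed to the customer's current region; with the versioned dimension, each sale is attributed to the region that was valid when it happened. Which of the two totals is correct depends on the semantics of the data, as the abstract argues.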


2010 ◽  
pp. 865-886
Author(s):  
Pedro Furtado

Data Warehouses are a crucial technology for current competitive organizations in the globalized world. Size, speed and distributed operation are major challenges concerning those systems. Many data warehouses have huge sizes and the requirement that queries be processed quickly and efficiently, so parallel solutions are deployed to render the necessary efficiency. Distributed operation, on the other hand, concerns global commercial and scientific organizations that need to share their data in a coherent distributed data warehouse. In this article we review the major concepts, systems and research results behind parallel and distributed data warehouses.


Author(s):  
Michel Schneider

Basically, the schema of a data warehouse rests on two kinds of elements: facts and dimensions. Facts are used to record measures about situations or events. Dimensions are used to analyse these measures, particularly through aggregation operations (counting, summation, average, etc.). To fix ideas, let us consider the analysis of the sales in a shop according to the product type and to the month of the year. Each sale of a product is a fact. One can characterize it by a quantity. One can calculate an aggregation function on the quantities of several facts. For example, one can sum the quantities sold for the product type “mineral water” during January in 2001, 2002 and 2003. Product type is a criterion of the dimension Product. Month and Year are criteria of the dimension Time. A quantity is thus connected both with a type of product and with a month of one year. This type of connection concerns the organization of facts with regard to dimensions. On the other hand, a month is connected to one year. This type of connection concerns the organization of criteria within a dimension. The possibilities of fact analysis depend on these two forms of connection and on the schema of the warehouse. This schema is chosen by the designer in accordance with the users' needs. Determining the schema of a data warehouse cannot be achieved without adequate modelling of dimensions and facts. In this article we present a general model for dimensions and facts and their relationships. This model will greatly facilitate the choice of the schema and its manipulation by the users.
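The “mineral water” example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sales rows and quantities are invented, and the function name `sum_quantity` is hypothetical. Each fact carries the criteria of its dimensions (product type for Product; month and year for Time) together with its measure (quantity).

```python
# Hypothetical sales facts: one row per sale, with invented quantities.
sales = [
    {"product_type": "mineral water", "month": 1, "year": 2001, "quantity": 10},
    {"product_type": "mineral water", "month": 1, "year": 2002, "quantity": 12},
    {"product_type": "mineral water", "month": 1, "year": 2003, "quantity": 8},
    {"product_type": "mineral water", "month": 2, "year": 2003, "quantity": 5},
    {"product_type": "soda", "month": 1, "year": 2002, "quantity": 7},
]


def sum_quantity(facts, product_type, month, years):
    """Sum the quantity measure over facts matching the given criteria."""
    return sum(
        fact["quantity"]
        for fact in facts
        if fact["product_type"] == product_type
        and fact["month"] == month
        and fact["year"] in years
    )


# Quantity of mineral water sold in January of 2001, 2002 and 2003.
january_total = sum_quantity(sales, "mineral water", 1, {2001, 2002, 2003})
print(january_total)
```

The February sale and the soda sale are excluded because they do not match the Time and Product criteria, so only the three January mineral water rows contribute to the sum.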


2017 ◽  
Vol 10 (04) ◽  
pp. 745-754
Author(s):  
Mudasir M Kirmani

Data Warehouse design requires a radical rebuilding of tremendous amounts of information, frequently of questionable or conflicting quality, drawn from various heterogeneous sources. Data Warehouse design combines business knowledge and technology know-how. The design of the Data Warehouse requires a detailed understanding of the business processes. The principal aim of this research paper is to study and explore the transformation model for converting E-R diagrams to a Star Schema for developing Data Warehouses. Dimensional modelling is a logical design technique used for data warehouses. This research paper addresses various potential differences between the two techniques and highlights both the advantages and the disadvantages of using dimensional modelling. Dimensional modelling is one of the popular techniques for databases that are designed with end-user queries in a data warehouse in mind. In this paper the focus is on the Star Schema, which comprises a fact table and dimension tables. Each fact table in turn comprises foreign keys of the various dimensions, measures, and degenerate dimensions, if any. We also discuss the possibilities of deployment and acceptance of the Conversion Model (CM) to provide the details of the fact table and dimension tables according to local needs. The paper also highlights why dimensional modelling is preferred over E-R modelling when creating a data warehouse.


2021 ◽  
Vol 66 (4) ◽  
pp. 69-80
Author(s):  
Mihai Enăchescu ◽  

Continuity and Discontinuity in the Transmission of Spanish Inherited Words Competed by Arabisms: oliva and aceituna, olio and aceite, olivo and aceituno. The loss and replacement of Arabisms by Latin loanwords was a frequent phenomenon between the sixteenth and the seventeenth centuries; the opposite movement, the replacement of an inherited word by an Arabism is far less frequent. Oliva, an inherited word, is competed by the Arabism aceituna; currently the common name for the fruit in the Hispanic world is aceituna, and oliva has a restricted use to the phrase aceite de oliva or to refer to a colour. Similarly, the inherited word olio will be replaced by aceite, and with a specialized meaning will be eliminated by the euphuism óleo, its etymological doublet. On the other hand, olivo prevails over aceituno and represents a special case of continuity in this lexical family. The research will be carried out in two directions: first, I will analyse the old academic dictionaries and other specialized dictionaries and glossaries from the fifteenth-twentieth centuries. Second, I will conduct a corpus analysis, based on the diachronic corpora available for the Spanish language. This study will try to answer the questions how? and why? of these neological movements of vocabulary. Keywords: inherited words, Arabisms, oliva, aceituna, lexical substitution


2011 ◽  
Vol 268-270 ◽  
pp. 1006-1011
Author(s):  
Jun Qing Jiang ◽  
Wei Rui Feng ◽  
Yun Ting Wu

The multitude of operations over IFSs has barely been studied empirically. On the other hand, these operations are quite complex, and their properties have been studied only theoretically. The IFDW that was developed was used to demonstrate the application of some operations over IFSs. It could serve as a basis for the implementation of other IFS operations, and the data gathered could be used to explore empirically the validity of the theoretical concepts.


2021 ◽  
Vol 153 (3) ◽  
pp. 269-290
Author(s):  
Nadja Germann

Medieval architectures of knowledge designed in the Islamic world constitute a special case: They neatly reflect the competition between different intellectual traditions and approaches. On the one hand, there are those classifications that are centered on what was perceived as the indigenous sciences during the formative period, i.e. those sciences that arose in connection with the new religion, Islam, and the language of its revelation, Arabic. On the other hand, scholars eagerly took over and adapted disciplines deriving from non-Arab and non-Muslim cultures, primarily Greek science and philosophy. These traditions, however, transmitted their own conceptions of knowledge that partly stood in conflict with Arabic-Islamic ideas. In this article, I first give an overview of the various approaches and then concentrate on Fārābī and Avicenna, in order to trace a remarkable development: the gradual dissolution of boundaries both within and between the different scientific spheres and paradigms on epistemological grounds.


2008 ◽  
pp. 429-436
Author(s):  
Juha Kontio

Reporting is one of the basic processes in all organizations. It provides information for planning and decision making and, on the other hand, information for analyzing the correctness of the decisions made at the beginning of the process. Reporting is based on the data that the operational information systems contain. Reports can be produced directly from these operational databases, but an operational database is not organized in a way that naturally supports analysis. An alternative way is to organize the data in such a way that supports analysis easily. Typically, this method leads to the introduction of a data warehouse.


Author(s):  
Juha Kontio

Reporting is one of the basic processes in all organizations. Reports should offer relevant information for guiding decision-making. Reporting provides information for planning and, on the other hand, information for analyzing the correctness of the decisions made at the beginning of the processes. Reporting is based on the data that the operational information systems contain. Reports can be produced directly from these operational databases, but an operational database is not organized in a way that naturally supports analysis. An alternative is to organize the data in a way that supports analysis easily. Typically this leads to the introduction of a data warehouse. In summer 2002 a multiple case study was launched in six Finnish organizations. The research studied the databases of these organizations and identified the trends in database exploitation. One of the main ideas was to study the diffusion of database innovations. In practice this meant that the present database architecture was described and the future plans and present problems were identified. The data was mainly collected with semi-structured interviews, and altogether 54 interviews were arranged. The research processed data on 44 different information systems. The largest share (40 %) of the analyzed information systems were online transaction processing systems such as order-entry systems. The second biggest category (30 %) was information systems relating to decision support and reporting. Only one pilot data warehouse was among these, but on the other hand customized reporting systems were used, for example, in SOK, SSP and OPTI. Reporting was nevertheless commonly recognized as an area where interviewees were not satisfied and improvements were hoped for. Turku University of Applied Sciences is one of the largest of its kind in Finland, with almost 9000 students and 33 Degree Programs. Our University is organized in six units of education that promote multidisciplinary learning.
In autumn 2005 an enterprise resource planning system was introduced at Turku University of Applied Sciences. At the heart of this information system is a data warehouse collecting the necessary information from the operational databases. This paper concentrates on briefly describing the reporting problems identified in the earlier research and how a data warehouse might help in overcoming these problems (a more thorough description is provided in (Kontio 2005)). These ideas are benchmarked against usage experiences of the data warehouse based ERP at Turku University of Applied Sciences, resulting in some generalizations and confirmation.

