What are Fact and Dimension tables?

Wednesday, January 27, 2010

Overview
• A multidimensional storage model involves two kind of tables:
1. Dimension tables
2. Fact tables
Fact Tables: A fact table contains the fact data of business like sales amount, quantity, customers and reference keys to dimension tables.



· Dimension Tables:
· A dimension table holds the data that we use to group the value to derive summary.
· A dimension table has primary key.
E.g. Customer table has information about city, state and zip code to summaries geographically sales. A dimension table contains the more detailed information of fact data like product details, customer details, store details etc.


· The fact table mainly holds transaction data.
· E.g. fact table data shows units amount and sold quantities.

Data Modelling for Data Warehousing

Overview
Data Cubes =
A data which are available in multidimensional matrices is called Data Cubes.
· A excel sheet is a two dimensional matrix, it can have sales revenue of corresponding geographical regions.
· Change process of converting one dimensional to another in the data cube is called pivoting.
• In pivoting, the data cube can rotate to see the data in different orientation.
• So, this technique is equivalent to product sales table by region.
• There are two kinds of Hierarchical views of multidimensional models:
1. Roll-up display
In the Roll-up display the hierarchy will move up and it will group into the larger units.
E.g.: Weekly data summation by quarter and by then by year.
2. Drill-down display
In the Drill-down display it will have opposite view.
E.g. dividing the region sales into sub-region and then again dividing sub-region sales country specific sales.

Storage types in Data Warehousing

Tuesday, January 26, 2010

Overview

There are mainly three kinds of storage types based on where the data or facts are stored.
1. MOLAP (Multi-dimensional OLAP)
2. ROLAP (Relational OLAP)
3. HOLAP (Hybrid OLAP)

MOLAP
• In this storage mode the data and aggregated data are stored in multi dimensional cube.
• It processes data which is already stored in multidimensional array with all possible combinations of data reflected, each in a cell that can be accessed directly.
• The main advantage is it can retrieve data very fast.

ROLAP
• The data which is stored in the relational database. Aggregated or summary data is also stored in the relational database.
• Performance can be slow. Because each ROLAP report is essentially a SQL query (or multiple SQL queries) in the relational database, the query time can be long if the underlying data size is large.

HOLAP
• A combination of relational OLAP (ROLAP) and multidimensional OLAP.
• Data will be stored in relational format (ROLAP). Aggregations, on the other hand, are stored in multi dimensional format(MOLAP) in the cube to give better query performance.

Key elements of Data Wahrehousing

Overview
• A multi-dimensional database is created from fact and dimension tables to form objects called dimensions and cubes

Dimensions
•Dimensions are most often made up of several hierarchies
•Examples: time, geography, employee

Hierarchy
• Logical entity by which a business user might want to analyze fact data
• Each hierarchy can have one or more levels
Example: A hierarchy in the geography dimension –Country, State, County, City

Completely Balanced Hierarchy
• All leaf(end) nodes would be an equal level from the top level.
• E.g. : Geography dimension

Unbalanced Hierarchy
• Hierarchy in dimensions having an unbalanced distribution of leaf nodes relative to the top level.
E.g : Organization chart

Ragged Hierarchy
• Some hierarchies are typically balanced but are missing a unique characteristic of some members in a level.
• Example: geography hierarchy that contains the levels Country, State, and City. (USA-Washington-Seattle, Greece-Athens)

Cubes
• The cube is a multi dimensional data structure which we can query for business information.
• Each block of the cube is called a cell and is uniquely identified by a member in each dimension.
• Cubes reduce the query response time for the information worker to extract knowledge from the data – contain precalculated summary data called aggregations
• That is, cubes not only store multi dimensional data from fact tables, but also aggregations of that data.
(summing of sales figures up from stores level, to district level, to regional level)

Overview of Data Warehousing

Monday, January 25, 2010

A data warehousing is a process to transform one data source to structure conductive to business analysis. Mathematical calculations are also performed on the newly created organized data to make usefulness for making business decisions. Finally, the data is available for user for analysis.
A well architecture data warehouse can return query results very quickly.
There are two types of Data Analysis.
1. OLAP
2. Data Mining
Data Warehouse is a collection of decision support technologies aimed at enabling the knowledge worker (executive, manager, and analyst) to make better and faster decisions.
OLTP Database
• An OLTP database is a transaction-based and normalized to reduce the amount of redundant data storage generated.
• Results in fast update.
• For speed of information retrieval, especially for the purpose of business analytics, an OLAP database is called for.

OLAP Database
• An OLAP database is highly de-normalized and therefore has rows of data that may be redundant.
•This makes for very fast query responses because relatively few joins are involved.
• Fast responses are what we want while doing business intelligence work.
• Data Marts – Mini data warehouses and quite often act as part of a larger warehouse.
Data Marts are subject-oriented data stores for cleaned data
•E.g. Sales data mart, an inventory data mart, or any subject rooted at the departmental level.
• A Data Warehouse functions at the enterprise level and typically handles data across the entire organization.

 
 
 
Your Ad Here