Data Warehouse

Importance of Data Warehouse

Data received from different sources

We may receive the data from various sources it could be in flat files, mdb, SQL Server or Oracle etc...

End of the day our business requires to integrate the data from different sources  to take business decisions.
It is not simple task to create a single report by adding all different source of data.

It can be handled by data ware house.

Data ware house collects data from different sources and store the data in Data Marts.
Data marts are well designed for Reporting tools.

Load on Production System

Users who are working the production database always expects the faster results with present required details.

If we use Production System data base to generate  'ad-hoc'  eports with huge data, it could impact the performance for users who are working on the applications.

Example:
When we are going to buy a product from online store, we always expect the better user experience. 
At the same time Seller want know his sales for the last one year on different products. If we create report by taking data source as production database for this kind of reports which requires high volume data then it impacts the database performance.

So we need a separate data base for  'ad-hoc' reports. 
Data warehouse is the good solution for this. 

All previous historical data can be moved to the different Data Marts which are well deigned for  'ad-hoc' reports by denormalization.

Advantages of Data Ware House

  • Combine data from multiple systems and resolve inconsistencies between those systems.
  • Make reporting easier
  • Reduce the load on production systems
  • Provide consistency among system transitions
  • provide for long term storage data

Why can't we use OLTP system for Reporting?

  • OLTP - On- Line Transaction Processing.
  • Designed for working with single record at a time.
  • Data is highly "normalized", i,e duplicate values have been removed.
  • Getting all data for a record can involve many table joins.
  • Can be quite confusing for 'ad-hoc' reporting.
  • Can also be slow, having an impact on the OLTP system.

How Data Warehouse solve this?

  • Data ware house typically use a design called OLAP
  • On-Line Analytical processing
  • Number of tables are reduced, reducing number of joins and increasing simplicity
  • Data is de-normalized into structures easier to work with.

Normalized vs Denormalized


Normalization of data is very useful for OLTP. User can filter the data very easily on UI.
If the user selects the Product in the Combo, Then Color Combo filter the available colors for the selected products with best performance.
It consumes less space in the database.


It takes more space in the database than normalized tables. But the permanence with be good because of less joins on the system with all required data in one table.

Types of tables in warehouse

  • Facts
  • Dimensions
  • Both Require concept of Surrogate Keys
  • A new key, typically some type of INT, that is used in place of any other key as the Primary Key.

What is Surrogate Key?

Surrogate key is not for the user. This is to identify each individual record in the database.
Surrogate key cannot be used as storage model. Which means we cannot use the Surrogate key to insert any data based on surrogate key. This only for logical relation.

Fact Tables

  • Facts join dimensions, "who", "what", "when", and "where"
  • Facts also hold numeric measures to quantify the fact, "how much".

Fact table contains the Primary Key of Dimension Tables( SoldByID, SoldToID, ProductID).
Where as ID is the Surrogate key which identifies the each record as an Unique record.

Dimensions


  • Dimensions hold the values that describe facts.
  • "Look up Values"
  • Some  example: Time, Geography, Employees, Prodcuts, Cusotmers
  • When a Dimension can chage over time, it's known as Slowly Changing Dimension(SCD).
  • Many tpes of Dimensions
Refer the blog for Start And Snow Flake Schema

Types of Dimensions

  • Conformed Dimension
  • Junk Dimension
  • Degenerate Dimension

Conformed Dimension

  • When pulling in data from multiple systems, you often have to reconcile different primary keys.
  • This process is known as conforming your dimensions.

In this Example Inventor, Purchase, WorkManagement tables are having different primary key to identify "Widget" Product. 

Dimension table with keys from multiple source systems is called Conformed Dimension.

Here ID is the surrogate key which can be used across all fact tables of Inventory, Purchasing, WorkManagemen to fetch the data for Product "Widget".






Comments

Popular posts from this blog

SQL Server Integration Services (SSIS)

CTE/Temp/Table Variable