ETL

What is ETL?
ETL is acronym for Etract Transform Load

Companies required data to make Strategic decisions.
Data from various sources in various data formats need to store in Data Ware House
with required format which is used to make Business decisions.

Example: I worked with Mobile Company. We have more Stores to sell the mobile connections. Some of them are having online connectivity to the database and
some of them are not having online connectivity.
End of the day we need to populate the data to the DataWare house from Online database and offline Data(MsAccess Tables) to create Reports for top management.

For this we have two type of data sources
1) Online Transaction Database (SQL Server)
2) MS Access table

In SQL Server 2000 DTS was using to Extract, Transform, Load the data.

Extract - the process of reading data from a database
Transform - the process of converting data from one form into another
i.e changing data types and data formats
Load - the process of writing data into the target database

Extraction
Gathering the Data
. Raw data that want to written directly into the disk.
. Data writen to flat files or relational tables from structured source systems.
. Data can be read multiple times, if needed.
Cleansing the Data
. Eliminate duplicates or fragmented data.
. Exclude unwanted/ unneeded information.

Tansformation
Preparing the data to be housed in the data ware house
Converting the extracted data
. Using rules and lookup tables
. Combining data
. Verification / Validity checks
. Standardization

Loading
Storing the transformed data in the data warehouse.
Batch / Real-Time processing.
Can follow Star Schema and Snowflake Schema.


DTS packages extract the data from these sources and transfer the data by transforming into various data formats to the Dataware house.

In SQL Server 2005 we call it as SQL Server Integration Services (SSIS).

Any ETL Architecture must meet the following requirements
• Business Requirements
• Compliance Requirements
• Data Profiling
• Data Security
• Data Integration
• Right Data at the Right Time
• Archiving and Lineage
• Final End-USer Delivery Interfaces
• Available Skills
• Legacy Licenses
• Alignment with overall Enterprise Architecture

Comments

Popular posts from this blog

Data Warehouse

SQL Server Integration Services (SSIS)

CTE/Temp/Table Variable