Posts

Data Warehouse

Image
Importance of Data Warehouse Data received from different sources We may receive the data from various sources it could be in flat files, mdb, SQL Server or Oracle etc... End of the day our business requires to integrate the data from different sources  to take business decisions. It is not simple task to create a single report by adding all different source of data. It can be handled by data ware house. Data ware house collects data from different sources and store the data in Data Marts. Data marts are well designed for Reporting tools. Load on Production System Users who are working the production database always expects the faster results with present required details. If we use Production System data base to generate   'ad-hoc'   eports with huge data, it could impact the performance for users who are working on the applications. Example: When we are going to buy a product from online store, we always expect ...

SSIS - Control Flow Basics

Image
Overview Components which use in Control flow to perform sequence of operations. Connection Manager Tasks Precedence Constrains Containers Connection Manager Microsoft SQL Server Integration Services packages use connections to perform different tasks and to implement Integration Services features. Type Description Topic ADO Connects to ActiveX Data Objects (ADO) objects. ADO Connection Manager ADO.NET Connects to a data source by using a .NET provider. ADO.NET Connection Manager CACHE Reads data from the data flow or from a cache file (.caw), and can save data to the cache file. Cache Connection Manager DQS Connects to a Data Quality Services server and a Data Quality Services database on the server. DQS Cleansing Connection Manager EXCEL Connects to an Excel workbook file. Excel Connection Manager FILE Connects to a file or a folder. File Connection Manager FLATFILE C...

SQL Server Integration Services (SSIS)

Image
Integration Services Project Items Package - DTSX file Data Source - DS file Data Source View - DSV file Package A package is an organized collection of connections, control flow elements, data flow elements, event handlers, variables, and configurations, that you assemble using either the graphical design tools that SQL Server Integration Services provides, or build programmatically. You then save the completed package to SQL Server, the SSIS Package Store, or the file system. The package is the unit of work that is retrieved, executed, and saved.                                                 Data Source A data source is a connection reference that you create outside a package. A data source represents a simple connection to a data store, which includes all tables and views in the data store. For more advanced features, such as the a...

CTE/Temp/Table Variable

Image
CTE:(Common Table Expression) A common table expression (CTE) can be thought of as a temporary result set that is defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query. Unlike a derived table, a CTE can be self-referencing and can be referenced multiple times in the same query. How to use CTE Refer the Microsoft link http://msdn.microsoft.com/en-us/library/ms175972.aspx Good Example from Microsoft on "Recursive Queries Using Common Table Expressions" http://msdn.microsoft.com/en-us/library/ms186243(v=sql.105).aspx Temp tables  1. The table "exists" - that is, it's materialized as a table, at least in memory, which contains the result set and can be reused. 2. In some cases, performance can be improved or blocking reduced when you have to perfor...

Star and Snow Flake Schema

Image
Star Schema The star schema consists of one or more fact tables referencing any number of dimension tables. The star schema is an important special case of the snowflake schema, and is more effective for handling simpler queries. Fact Tables A fact table typically has two types of columns: foreign keys to dimension tables and measures those that contain numeric facts. A fact table can contain fact's data on detail or aggregated level. In the above image Date_id, Store_id, Product_id are foreign keys. Units_Sold contains numeric values. Dimension Tables The dimension tables store the textual descriptions of the dimensions (e.g., customer name, product description).  Dimensions are usually based on the master files in the business, for example customer master, employee master. Snow Flake Schema In computing, a snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity relation...

ETL

Image
What is ETL? ETL is acronym for Etract Transform Load Companies required data to make Strategic decisions. Data from various sources in various data formats need to store in Data Ware House with required format which is used to make Business decisions. Example: I worked with Mobile Company. We have more Stores to sell the mobile connections. Some of them are having online connectivity to the database and some of them are not having online connectivity. End of the day we need to populate the data to the DataWare house from Online database and offline Data(MsAccess Tables) to create Reports for top management. For this we have two type of data sources 1) Online Transaction Database (SQL Server) 2) MS Access table In SQL Server 2000 DTS was using to Extract, Transform, Load the data. Extract - the process of reading data from a database Transform - the process of converting data from one form into another i.e changing data types and data formats Load - the process of writing data into ...

Slowly Changing Dimension

What is SCD? . In Data Ware Housing world SCD plays important role. SCD stands for Slowly Changing Dimension. It means changes in Dimension table. Dimension table would contains Customer table. Initially consider this is a master table with only one Entry for each customer. If there are any changes in the customer details. We need to make new entry for the same customer with the changed details. For this we have two options. 1) Create new entry in the same table by creating a current record status flag 2) Update the modifications in the master table and create the entry for old records in history table with Foreign key relation. Type-1 Dimension         In this it overwrites the old data with new data. No track for history data. Type-2 Dimension         The Type 2 method tracks historical data by creating multiple records for a given natural    key in the dimensional tables with separate surrogate keys and/or different ...