A Data Warehouse can be defined as a centralized, consistent data store or Decision Support System (OLAP), for the end business users for analysis, prediction and decision making in their business operations. Data from various enterprise-wide application/transactional source systems (OLTP), are extracted, cleansed, integrated, transformed and loaded in the DWH.
Properties of a Data Warehouse:-
Subject Oriented, Integrated, Time-Variant, Non-Volatile
Subject-oriented: DWH is designed based on the major subjects areas of the business, rather than the major application areas of the enterprise. The ultimate necessity is to store decision-support data rather than application-oriented data.
Integrated:The data in DWH comes from different enterprise-wide application source systems. The source data is often inconsistent in nature. During the Integration process, source data must be made consistent in DWH so that the data becomes homogeneous and uniformity.
Time-variant:The source data in the DWH is only accurate and valid at some point in time or over some time interval.
Non-volatile:Data in DWH is not updated or replaced in real time but is refreshed from Opeartional Systems on a regular basis. New data is always added incrementally integrating it with previous data set.
OLTP and OLAP:-
OLTP: On Line Transaction Processing, describes processing of short and simple transaction data at operational sites i.e.Day to day operations in the Source systems. The Database is designed as Application-oriented (E-R based) i.e Highly Normalised so as to efficeintly support INSERT and UPDATE operations. Data stored in these systems are raw Current (Up-to-date) and Isolated Data, in a much Detailed level in flat relational tables.
OLAP: On Line Analytical Processing, describes processing at the Centralised, Integrated and Consistent Data Warehouse. It acts as the Decision Support System for the Business End Users. The Database is designed as Subject-oriented (Star/Snowflake Schema) i.e. Highly Denormalised to support the SELECT operations. Data in these systems are generally Consolidated,Summarized and Historical Data in nature.
Why we go for DataWarehouse instead of OLTP Reporting and Analysis:
1) OLAP reporting requires HISTORICAL DATA whereas the operational databases do not maintain history and updates the transactions.
2) Data in DWH is loaded after consolidation, integration, aggregation as well as summarization of data from various heterogeneous OLTP systems. Hence in that case we will fail to have OLTP reporting.
3) Also the different sources data use inconsistent data representations, codes, and formats which have to be reconciled before loading.
4) Complex OLAP queries will obviously degrade the performance of operational transactional source systems. The OLTP databases are also designed and tuned to support insert and update operations mainly.