DataWarehouse

A Data Warehouse can be defined as a centralized, consistent data store or Decision Support System (OLAP), for the end business users for analysis, prediction and decision making in their business operations. Data from various enterprise-wide application/transactional source systems (OLTP), are extracted, cleansed, integrated, transformed and loaded in the DWH.

Properties of a Data Warehouse:-
Subject Oriented, Integrated, Time-Variant, Non-Volatile


Subject-oriented: DWH is designed based on the major subjects areas of the business, rather than the major application areas of the enterprise. The ultimate necessity is to store decision-support data rather than application-oriented data.

Integrated:The data in DWH comes from different enterprise-wide application source systems. The source data is often inconsistent in nature. During the Integration process, source data must be made consistent in DWH so that the data becomes homogeneous and uniformity.

Time-variant:The source data in the DWH is only accurate and valid at some point in time or over some time interval.

Non-volatile:Data in DWH is not updated or replaced in real time but is refreshed from Opeartional Systems on a regular basis. New data is always added incrementally integrating it with previous data set.


OLTP and OLAP:-

OLTP: On Line Transaction Processing, describes processing of short and simple transaction data at operational sites i.e.Day to day operations in the Source systems. The Database is designed as Application-oriented (E-R based) i.e Highly Normalised so as to efficeintly support INSERT and UPDATE operations. Data stored in these systems are raw Current (Up-to-date) and Isolated Data, in a much Detailed level in flat relational tables.


OLAP: On Line Analytical Processing, describes processing at the Centralised, Integrated and Consistent Data Warehouse. It acts as the Decision Support System for the Business End Users. The Database is designed as Subject-oriented (Star/Snowflake Schema) i.e. Highly Denormalised to support the SELECT operations. Data in these systems are generally Consolidated,Summarized and Historical Data in nature.


Why we go for DataWarehouse instead of OLTP Reporting and Analysis:

1) OLAP reporting requires HISTORICAL DATA whereas the operational databases do not maintain history and updates the transactions.

2) Data in DWH is loaded after consolidation, integration, aggregation as well as summarization of data from various heterogeneous OLTP systems. Hence in that case we will fail to have OLTP reporting.

3) Also the different sources data use inconsistent data representations, codes, and formats which have to be reconciled before loading.

4) Complex OLAP queries will obviously degrade the performance of operational transactional source systems. The OLTP databases are also designed and tuned to support insert and update operations mainly.

Oracle Analytic Functions




Oracle Analytic Functions compute an aggregate value based on a group of rows. It opens up a whole new way of looking at the data. They differ from aggregate functions in that they return multiple rows for each group. The group of rows is called a window and is defined by the analytic clause. For each row, a sliding window of rows is defined. The window determines the range of rows used to perform the calculations for the current row.

Oracle provides many Analytic Functions such as
AVG, CORR, COVAR_POP, COVAR_SAMP, COUNT, CUME_DIST, DENSE_RANK, FIRST, FIRST_VALUE, LAG, LAST, LAST_VALUE, LEAD, MAX, MIN, NTILE, PERCENT_RANK, PERCENTILE_CONT, PERCENTILE_DISC, RANK, RATIO_TO_REPORT, STDDEV, STDDEV_POP, STDDEV_SAMP, SUM, VAR_POP, VAR_SAMP, VARIANCE.

The Syntax of analytic functions:
Analytic-Function(Column1,Column2,...)
OVER (
[Query-Partition-Clause]
[Order-By-Clause]
[Windowing-Clause]
)

Analytic functions take 0 to 3 arguments.

An Example:
SELECT ename, deptno, sal,
SUM(sal)
OVER (ORDER BY deptno, ename) AS Running_Total,
SUM(sal)
OVER ( PARTITION BY deptno
ORDER BY ename) AS Dept_Total,
ROW_NUMBER()
OVER (PARTITION BY deptno
ORDER BY ename) As Sequence_No
FROM emp
ORDER BY deptno, ename;



For more details visit:

www.dwbiconcepts.com

Why RDBMS against traditional File System




We can store as well as retrieve data from traditional file system. We go for Relational Database Management System to avoid the problems in file system handling which are as follows:

1. Data Redundancy and Inconsistency
2. Difficulty in Accessing data
3. High Complexity in Updating of data
4. Concurrent Access Anomalies
5. Security problem
6. Integrity problem

These clearly shows why we should choose DBMS.

Oracle External Tables - Loading XML file




The Oracle external tables feature allows us to access data in external sources as if it is a table in the database. An XML file is an Extensible Markup Language file with markup and content. It consists of markup constructs like Tag, Attribute and Contents or Element.
Oracle can parse a XML file in many ways like the XML as a datatype, DOM parser .

Now to load a xml file in oracle database with the concept of External Table is shown below:

Suppose we take a xml file namely addressbook.xml which contains address information in xml file format as follows:

<?xml version="1.0" encoding="UTF-8"?>
<addressbook>
<contact>
<contact_name>Sam Disuza </contact_name>
<address>Jurong Point, Singapore</address>
<phone>87654321</phone>
<dob>01121980</dob>
</contact>
<contact>
<contact_name>Suzy Disuza</contact_name>
<address>Jurong West, Singapore</address>
<phone>12345678</phone>
<dob>15081983</dob>
</contact>
</addressbook>

Now the external table definition goes here:

CREATE TABLE addressbook_ext
(contact_name VARCHAR2(2000),
address VARCHAR2(2000),
phone VARCHAR2(2000),
dob VARCHAR2(2000)
)
ORGANIZATION EXTERNAL
(
TYPE ORACLE_LOADER
DEFAULT DIRECTORY ext_tab_dir
ACCESS PARAMETERS
(
records delimited by "</contact>"
fields
(
dummy1 char(2000) terminated by "<contact>",
contact_name char(2000) enclosed by "<contact_name>" and "</contact_name>",
address char(2000) enclosed by "<address>" and "</address>",
phone char(2000) enclosed by "<phone>" and "</phone>",
dob char(2000) enclosed by "<dob>" and "</dob>"
)
)
LOCATION ('addressbook.xml')
)
PARALLEL
REJECT LIMIT UNLIMITED;

Now we make a query on the external table:
select * from addressbook_ext;

CONTACT_NAME ADDRESS PHONE DOB
Sam Disuza Jurong Point, Singapore 87654321 01121980
Suzy Disuza Jurong West, Singapore 12345678 15081983