Data warehousing is the process of constructing and using a data warehouse. A data warehouse is constructed by integrating data from multiple heterogeneous sources, and it supports analytical reporting, structured and/or ad hoc queries, and decision making. Data warehousing involves data cleaning, data integration, and data consolidation.
According to Inmon, a data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data. This data helps analysts make informed decisions in an organization.
A data warehouse provides generalized and consolidated data in a multidimensional view. Along with this view of the data, a data warehouse also provides Online Analytical Processing (OLAP) tools. These tools support interactive and effective analysis of data in a multidimensional space, and such analysis facilitates data generalization and data mining.
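To make the "multidimensional view" concrete, here is a minimal Python sketch, assuming a tiny in-memory fact table with hypothetical product, region, and quarter dimensions. It precomputes a data cube: one aggregate (a cuboid) for every subset of the dimensions, which is the kind of generalized, consolidated data that OLAP tools query. Real OLAP servers compute and store cubes far more efficiently; this only illustrates the idea.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact records: (product, region, quarter, sales amount).
FACTS = [
    ("laptop", "north", "Q1", 120),
    ("laptop", "south", "Q1", 80),
    ("phone", "north", "Q1", 200),
    ("phone", "north", "Q2", 150),
    ("laptop", "north", "Q2", 90),
]
DIMENSIONS = ("product", "region", "quarter")


def build_cube(facts, dimensions):
    """Return {dimension_names: {key_tuple: total}} for every subset of dimensions.

    Each subset is one cuboid; the empty subset () is the apex cuboid
    that holds the grand total.
    """
    cube = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(range(len(dimensions)), k):
            totals = defaultdict(int)
            for row in facts:
                key = tuple(row[i] for i in dims)
                totals[key] += row[-1]  # the last field is the measure
            cube[tuple(dimensions[i] for i in dims)] = dict(totals)
    return cube


cube = build_cube(FACTS, DIMENSIONS)
print(cube[()])                     # {(): 640}
print(cube[("product",)])           # {('laptop',): 290, ('phone',): 350}
print(cube[("region", "quarter")])  # sales per region and quarter
```

With three dimensions the cube has 2^3 = 8 cuboids, from the most detailed (all three dimensions) up to the grand total.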
Data mining functions such as association, clustering, classification, and prediction can be integrated with OLAP operations to enhance the interactive mining of knowledge at multiple levels of abstraction.
A data warehouse helps business executives to organize, analyse, and use their data for decision making. Data warehouses are widely used in the following fields:
- Financial services
- Banking services
- Consumer goods
- Retail sectors
- Controlled manufacturing
An operational database undergoes frequent changes on a daily basis on account of the transactions that take place.
Understanding a Data Warehouse
- A data warehouse helps executives to organize, understand, and use their data to make strategic decisions.
- Data warehouse systems help integrate a diverse range of application systems.
- It holds consolidated historical data, which helps the organization analyze its business.
- A data warehouse is not updated frequently.
- A data warehouse is a database that is kept separate from the organization’s operational database.
COMPONENTS OF A DATA WAREHOUSE
FEATURES OF A DATA WAREHOUSE
- Subject Oriented – A data warehouse is subject oriented because it provides information around a subject rather than around the organization’s ongoing operations. These subjects can be products, customers, suppliers, sales, revenue, etc. A data warehouse does not focus on ongoing operations; rather, it focuses on modelling and analysis of data for decision making.
- Time Variant – The data collected in a data warehouse is identified with a particular time period. The data in a data warehouse provides information from a historical point of view.
- Integrated – A data warehouse is constructed by integrating data from heterogeneous sources such as relational databases, flat files, etc. This integration enhances the effective analysis of data.
- Non-volatile – Non-volatile means that previous data is not erased when new data is added. A data warehouse is kept separate from the operational database, and therefore frequent changes in the operational database are not reflected in the data warehouse.
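The time-variant and non-volatile features can be illustrated with a minimal Python sketch. It assumes a hypothetical customer record whose city changes in the operational store: the operational store overwrites the value, while the warehouse only appends date-stamped snapshots, so older values stay available for historical queries. The names `load_snapshot` and `city_as_of` are illustrative, not part of any real product.

```python
from datetime import date

# Hypothetical operational (OLTP) store: one current row per customer, overwritten in place.
operational = {"C1": {"city": "Pune"}}

# Hypothetical warehouse store: append-only rows, each stamped with the date it was loaded.
warehouse = []


def load_snapshot(load_date, source):
    """Append a copy of every source row; rows already loaded are never changed."""
    for customer_id, attrs in source.items():
        warehouse.append({"customer_id": customer_id, "load_date": load_date, **attrs})


def city_as_of(customer_id, as_of):
    """Time-variant query: the city recorded in the latest load on or before as_of."""
    rows = [r for r in warehouse
            if r["customer_id"] == customer_id and r["load_date"] <= as_of]
    return max(rows, key=lambda r: r["load_date"])["city"] if rows else None


load_snapshot(date(2024, 1, 31), operational)
operational["C1"]["city"] = "Mumbai"  # the operational record is updated in place
load_snapshot(date(2024, 2, 29), operational)

print(city_as_of("C1", date(2024, 2, 15)))  # Pune
print(city_as_of("C1", date(2024, 3, 1)))   # Mumbai
print(len(warehouse))                       # 2: the January row was not erased
```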
DIFFERENCES BETWEEN A DATA WAREHOUSE AND OPERATIONAL DATABASES
| Data Warehouse (OLAP) | Operational Database (OLTP) |
|---|---|
| An OLAP query needs only read-only access to stored data. | An operational database query allows both read and modify operations. |
| Data warehouse queries are often complex, and they present a general form of data. | An operational database is constructed for well-known tasks and workloads, such as searching for particular records and indexing. |
| Transaction-level concurrency control and recovery mechanisms are largely not required, because the workload is read-only. | Operational databases support concurrent processing of multiple transactions, so they need concurrency control and recovery mechanisms. |
| A data warehouse maintains historical data. | An operational database maintains current data. |
| OLAP systems are used by knowledge workers such as executives, managers, and analysts. | OLTP systems are used by DBAs or database professionals. |
| It is used to analyse the business. | It is used to run the business. |
| It is based on the star schema, snowflake schema, and fact constellation schema. | It is based on the entity-relationship model. |
| It focuses on getting information out. | It is application oriented. |
| The number of users is in the hundreds. | The number of users is in the thousands. |
| It is highly flexible. | It provides high performance. |
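The contrast between the two kinds of workload can be seen in a small Python sketch using the standard-library `sqlite3` module. The single `sales` table and its columns are hypothetical, and one in-memory database plays both roles only for illustration; in practice the two workloads are kept on separate systems. The OLTP part is a short transaction that modifies one known record, while the OLAP part is a read-only query that summarizes many rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table used for both workloads in this sketch.
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, product TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (product, region, amount) VALUES (?, ?, ?)",
    [("laptop", "north", 120.0), ("laptop", "south", 80.0), ("phone", "north", 200.0)],
)
conn.commit()

# OLTP-style work: a short transaction that reads and modifies one known record.
with conn:  # commits on success, rolls back if an exception is raised
    conn.execute("UPDATE sales SET amount = amount + 10 WHERE id = ?", (2,))

# OLAP-style work: a read-only query that scans and summarizes many rows.
summary = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(summary)  # [('north', 320.0), ('south', 90.0)]
```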
TYPES OF DATA WAREHOUSE APPLICATIONS
Information processing, analytical processing, and data mining are the three types of data warehouse applications that are discussed below:
- Information Processing – A data warehouse allows the data stored in it to be processed. The data can be processed by means of querying, basic statistical analysis, and reporting using crosstabs, tables, charts, or graphs.
- Analytical Processing – A data warehouse supports analytical processing of the information stored in it. The data can be analyzed by means of basic OLAP operations, including slice-and-dice, drill down, drill up, and pivoting.
- Data Mining – Data mining supports knowledge discovery by finding hidden patterns and associations, constructing analytical models, and performing classification and prediction. The mining results can be presented using visualization tools.
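The basic OLAP operations named under analytical processing can be sketched over a handful of rows in plain Python. This is a minimal illustration under stated assumptions: the fact rows and the year/quarter/city/product dimensions are hypothetical, and the helper names (`aggregate`, `slice_`, `dice`, `pivot`) are made up for this example. An OLAP server would run the same operations over a stored cube.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, city, product, units sold).
FACTS = [
    (2023, "Q1", "Delhi", "phone", 30),
    (2023, "Q1", "Delhi", "laptop", 10),
    (2023, "Q2", "Delhi", "phone", 25),
    (2023, "Q1", "Chennai", "phone", 20),
    (2023, "Q2", "Chennai", "laptop", 15),
]
FIELDS = ("year", "quarter", "city", "product")


def aggregate(rows, dims):
    """Sum units grouped by the named dimensions."""
    idx = [FIELDS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    i = FIELDS.index(dim)
    return [r for r in rows if r[i] == value]


def dice(rows, **allowed):
    """Dice: keep rows whose values fall in the given set for each named dimension."""
    return [r for r in rows
            if all(r[FIELDS.index(d)] in vals for d, vals in allowed.items())]


def pivot(table):
    """Pivot: swap the two axes of a two-dimensional aggregate."""
    return {(b, a): v for (a, b), v in table.items()}


by_year = aggregate(FACTS, ["year"])                # drill up (roll-up) to years
by_quarter = aggregate(FACTS, ["year", "quarter"])  # drill down to quarters
print(by_year)     # {(2023,): 100}
print(by_quarter)  # {(2023, 'Q1'): 60, (2023, 'Q2'): 40}
print(aggregate(slice_(FACTS, "city", "Delhi"), ["product"]))
print(aggregate(dice(FACTS, city={"Delhi"}, quarter={"Q1"}), ["product"]))
print(pivot(aggregate(FACTS, ["city", "product"])))
```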
Decision support technologies help make use of the data available in a data warehouse, so that the warehouse can be used quickly and effectively. They can gather data, analyse it, and support decisions based on the information present in the warehouse. The information gathered in a warehouse can be used in any of the following domains:
- Tuning Production Strategies – Product strategies can be fine-tuned by repositioning products and managing product portfolios based on quarterly or yearly sales comparisons.
- Customer Analysis – Customer analysis is done by analyzing the customer’s buying preferences, buying time, budget cycles, etc.
- Operations Analysis – Data warehousing also helps in customer relationship management and in making environmental corrections. The information also allows us to analyze business operations.
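Comparing sales quarter by quarter, as mentioned for tuning production strategies, comes down to a simple calculation once the warehouse has produced per-product quarterly totals. The sketch below assumes hypothetical totals for two products and computes the percentage change from each quarter to the next; it is an illustration, not a prescribed method.

```python
# Hypothetical quarterly sales totals per product, as a warehouse report might provide them.
quarterly_sales = {
    "phone": {"Q1": 200, "Q2": 250, "Q3": 225},
    "laptop": {"Q1": 100, "Q2": 90, "Q3": 120},
}
QUARTERS = ["Q1", "Q2", "Q3"]


def quarter_over_quarter(series, quarters):
    """Return the percentage change from each quarter to the next, rounded to 0.1."""
    changes = {}
    for prev, curr in zip(quarters, quarters[1:]):
        changes[curr] = round(100 * (series[curr] - series[prev]) / series[prev], 1)
    return changes


for product, series in quarterly_sales.items():
    print(product, quarter_over_quarter(series, QUARTERS))
# phone {'Q2': 25.0, 'Q3': -10.0}
# laptop {'Q2': -10.0, 'Q3': 33.3}
```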
THREE-TIER DATA WAREHOUSE ARCHITECTURE
Generally, a data warehouse adopts a three-tier architecture.
- Top Tier – This tier is the front-end client layer. It holds query and reporting tools, analysis tools, and data mining tools.
- Middle Tier – This tier consists of the OLAP server, which can be implemented in either of the following ways:
  - Relational OLAP (ROLAP) – An extended relational database management system. ROLAP maps operations on multidimensional data to standard relational operations.
  - Multidimensional OLAP (MOLAP) – Directly implements multidimensional data and operations.
- Bottom Tier – The bottom tier of the architecture is the data warehouse database server, which is usually a relational database system. Back-end tools and utilities are used to feed data into the bottom tier; they perform the extract, clean, load, and refresh functions.
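The back-end extract, clean, load, and refresh functions of the bottom tier can be sketched in Python with the standard `csv` and `sqlite3` modules. Everything here is an assumption for illustration: the two hypothetical sources (a flat CSV export and a list of records), the `sales` table, and the simplification of treating refresh as a full re-run of the pipeline, whereas production refreshes usually propagate only changes. `INSERT OR REPLACE` on the primary key keeps a re-run from duplicating rows.

```python
import csv
import io
import sqlite3

# Two hypothetical heterogeneous sources: a flat CSV export and a list of records.
CSV_SOURCE = "order_id,region,amount\n1, North ,120\n2,south,\n3,SOUTH,80\n"
API_SOURCE = [{"id": 4, "area": "north", "total": "200"}]


def extract():
    """Extract: read both sources into one common record shape."""
    rows = [
        {"order_id": int(r["order_id"]), "region": r["region"], "amount": r["amount"]}
        for r in csv.DictReader(io.StringIO(CSV_SOURCE))
    ]
    rows += [{"order_id": r["id"], "region": r["area"], "amount": r["total"]} for r in API_SOURCE]
    return rows


def clean(rows):
    """Clean: normalize region names and drop rows with a missing amount."""
    return [
        {**r, "region": r["region"].strip().lower(), "amount": float(r["amount"])}
        for r in rows
        if r["amount"].strip()
    ]


def load(conn, rows):
    """Load: insert cleaned rows, replacing any row with the same order_id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    with conn:  # one transaction per load
        conn.executemany(
            "INSERT OR REPLACE INTO sales VALUES (:order_id, :region, :amount)", rows
        )


def refresh(conn):
    """Refresh (simplified): re-run the whole pipeline to pick up source changes."""
    load(conn, clean(extract()))


warehouse = sqlite3.connect(":memory:")
refresh(warehouse)
print(warehouse.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall())
# [('north', 320.0), ('south', 80.0)]
```

The final `GROUP BY` query is also a small instance of the ROLAP idea from the middle tier: a roll-up over the region dimension expressed as a standard relational operation.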