Sunday, October 4, 2009

Data warehouse

A data warehouse is a repository of an organization's electronically stored data, designed to facilitate reporting and analysis. It houses a standardized, consistent, clean, and integrated form of data sourced from the various operational systems in use in the organization, structured specifically to address reporting and analytic requirements.

This definition of the data warehouse focuses on data storage. However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition for data warehousing includes business intelligence tools, tools to extract, transform, and load data into the repository, and tools to manage and retrieve metadata.
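As a rough illustration of the extract, transform, and load steps mentioned above, here is a minimal sketch in Python using the standard sqlite3 module as a stand-in for the warehouse. The two source systems, their field names, and the `sales` table are hypothetical and chosen only for this example; a real warehouse would normally use dedicated ETL tooling.

```python
import sqlite3

# Hypothetical rows as they might arrive from two operational systems.
# The field names and formats are assumptions made for this sketch.
CRM_ROWS = [{"cust": " Alice ", "amount": "19.90", "date": "2009-10-01"}]
BILLING_ROWS = [{"customer_name": "BOB", "total_cents": 2500, "day": "01/10/2009"}]


def extract():
    """Extract: gather raw records from each source, tagged with their origin."""
    return [("crm", r) for r in CRM_ROWS] + [("billing", r) for r in BILLING_ROWS]


def transform(source, row):
    """Transform: map each source's layout onto one common, cleaned format."""
    if source == "crm":
        return (row["cust"].strip().title(), float(row["amount"]), row["date"])
    # The billing source stores cents and DD/MM/YYYY dates; convert both.
    day, month, year = row["day"].split("/")
    name = row["customer_name"].strip().title()
    return (name, row["total_cents"] / 100, f"{year}-{month}-{day}")


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, sale_date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, [transform(source, row) for source, row in extract()])
warehouse_rows = conn.execute(
    "SELECT customer, amount, sale_date FROM sales ORDER BY customer"
).fetchall()
print(warehouse_rows)
```

After the load, both sources share one layout (customer name, amount in currency units, ISO date), which is the kind of consistency the warehouse is meant to provide.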

Subject-oriented
    The data in the data warehouse is organized so that all the data elements relating to the same real-world event or object are linked together.
Non-volatile
    Data in the data warehouse is never overwritten or deleted. Once committed, the data is static, read-only, and retained for future reporting.
Integrated
    The data warehouse contains data from most or all of an organization's operational systems, and this data is made consistent.

The top-down design methodology generates highly consistent dimensional views of data across data marts, since all data marts are loaded from the centralized repository. Top-down design has also proven to be robust against business changes, and generating new dimensional data marts against the data stored in the data warehouse is a relatively simple task. The main disadvantage of the top-down methodology is that it represents a very large project with a very broad scope. The up-front cost of implementing a data warehouse this way is significant, and the time from the start of the project to the point at which end users see initial benefits can be substantial. In addition, the top-down methodology can be inflexible and unresponsive to changing departmental needs during the implementation phases.
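To make the idea of loading data marts from a centralized repository more concrete, the following sketch derives a departmental data mart as a SQL view over a single central table. It uses SQLite through Python's sqlite3 module; the table, view, and column names are hypothetical, and modeling the mart as a view (rather than a separately loaded store) is a simplifying assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Central repository table (hypothetical schema for this sketch).
CREATE TABLE warehouse_sales (
    region TEXT, department TEXT, product TEXT, amount REAL
);
INSERT INTO warehouse_sales VALUES
    ('North', 'Retail', 'Widget', 100.0),
    ('North', 'Retail', 'Gadget', 50.0),
    ('South', 'Retail', 'Widget', 70.0),
    ('North', 'Wholesale', 'Widget', 900.0);

-- A departmental data mart derived purely from the central table.
CREATE VIEW retail_mart AS
    SELECT region, product, SUM(amount) AS total
    FROM warehouse_sales
    WHERE department = 'Retail'
    GROUP BY region, product;
""")

retail_rows = conn.execute(
    "SELECT region, product, total FROM retail_mart ORDER BY region, product"
).fetchall()
print(retail_rows)
```

Because every mart in this sketch is defined against the same central table, adding another one is a matter of writing one more view, and all marts agree on the underlying figures.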

Benefits of data warehousing
Some of the benefits that a data warehouse provides are as follows:

    * A data warehouse provides a common data model for all data of interest regardless of the data's source. This makes it easier to report and analyze information than it would be if multiple data models were used to retrieve information such as sales invoices, order receipts, general ledger charges, etc.
    * Prior to loading data into the data warehouse, inconsistencies are identified and resolved. This greatly simplifies reporting and analysis.
    * Information in the data warehouse is under the control of data warehouse users so that, even if the source system data is purged over time, the information in the warehouse can be stored safely for extended periods of time.
    * Because they are separate from operational systems, data warehouses provide retrieval of data without slowing down operational systems.
    * Data warehouses can work in conjunction with and, hence, enhance the value of operational business applications, notably customer relationship management (CRM) systems.
    * Data warehouses facilitate decision support system applications such as trend reports (e.g., the items with the most sales in a particular area within the last two years), exception reports, and reports that show actual performance versus goals.
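The trend report mentioned in the last item (the items with the most sales in a particular area within the last two years) could be sketched as a single query against a warehouse table. The example below is only an illustration: the `sales` table, its columns, the sample rows, and the `top_items` helper are all hypothetical, and SQLite stands in for the warehouse.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (item TEXT, area TEXT, sale_date TEXT, quantity INTEGER)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("Widget", "North", "2009-03-15", 40),
    ("Gadget", "North", "2008-11-02", 25),
    ("Widget", "North", "2008-01-20", 10),
    ("Gadget", "North", "2006-05-09", 500),  # older than two years: excluded
    ("Widget", "South", "2009-06-30", 300),  # different area: excluded
])


def top_items(conn, area, as_of, years=2, limit=5):
    """Return (item, total quantity) for one area over the last `years` years."""
    # Note: date.replace() raises for 29 February in a non-leap target year;
    # that edge case is ignored in this sketch.
    cutoff = as_of.replace(year=as_of.year - years).isoformat()
    return conn.execute(
        "SELECT item, SUM(quantity) AS total FROM sales "
        "WHERE area = ? AND sale_date >= ? AND sale_date <= ? "
        "GROUP BY item ORDER BY total DESC, item LIMIT ?",
        (area, cutoff, as_of.isoformat(), limit),
    ).fetchall()


report = top_items(conn, "North", date(2009, 10, 4))
print(report)
```

Since the query runs against the warehouse copy of the data, a report like this places no load on the operational systems that recorded the sales.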

Disadvantages of data warehouses

There are also disadvantages to using a data warehouse. Some of them are:
    * Data warehouses are not the optimal environment for unstructured data.
    * Because data must be extracted, transformed and loaded into the warehouse, there is an element of latency in data warehouse data.
    * Data warehouses can incur high costs over their lifetime. Because a data warehouse is usually not static, maintenance costs are high.
    * Data warehouses can become outdated relatively quickly, and delivering suboptimal information to the organization carries a cost.
    * There is often a fine line between data warehouses and operational systems. Duplicate, expensive functionality may be developed, or functionality may be developed in the data warehouse that, in retrospect, should have been developed in the operational systems, and vice versa.
