Seven basic reasons why organizations build data warehouses

May 10, 2009

In this short post, I want to bring to your attention the principal reasons that lead companies to implement data warehouses. I wanted to put these reasons in one place because too much of the material about data warehouses describes the advantages of using them only indirectly. In fact, a warehouse exists as the direct consequence of resolving specific business problems. Look through the marketing materials available on the Web. You will see that people say they use data warehouses for important yet simple reasons such as “to be closer to the customer”, “to transform raw data into business intelligence”, “to base decision-making on facts instead of intuition”, and, possibly the most often mentioned rationale, “to get an edge over the competitors”. Actually, in 99% of projects the data warehouse itself is only one of the steps on the way to achieving the declared goals.

In reality, the main reasons that persuade organizations to introduce a data warehouse are:

1. The need to run analytical queries and generate reports using computing resources that are not already taken by the core information systems

Most companies want their processing to be set up so that transactions have a reasonably high probability of completing within a practical time. Reports and queries can demand far more processing resources, including disk and memory. Therefore, running reports and queries on systems that are busy with transactional processing severely reduces the likelihood that the transactions complete on time. This, in turn, threatens the completion of business operations.
In other words, running analytical queries and generating reports on the servers that host the transactional systems makes it hard to process transactions in a reasonable time. Companies conclude that the least expensive and/or organizationally simplest and fastest way to keep the core systems running at high speed is to introduce a data warehouse on a separate server with its own disk and memory.

2. The need for data models and technologies that speed up queries and reporting but are not intended for transaction processing

There are ways of structuring data, such as the “star” schema and its derivatives, that usually speed up queries. On the other hand, these structures are not suitable for transactional systems because they slow down transaction processing. There are also a number of technologies that are good at speeding up queries but are not tailored for OLTP (for example, bitmap indexes) and, conversely, technologies that apply only to OLTP (for example, transaction recovery).
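To make the “star” shape concrete, here is a minimal sketch in Python using the standard sqlite3 module. The table names (dim_date, dim_product, fact_sales), the columns and the sample rows are all hypothetical and chosen only for illustration; a real warehouse would have many more dimensions and would run on a dedicated analytical engine.

```python
import sqlite3

# Hypothetical star schema: one fact table joined to two small dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    amount REAL
);
INSERT INTO dim_date VALUES (1, 2009, 4), (2, 2009, 5);
INSERT INTO dim_product VALUES (10, 'books'), (20, 'music');
INSERT INTO fact_sales VALUES (1, 10, 100.0), (1, 20, 50.0), (2, 10, 70.0), (2, 10, 30.0);
""")


def sales_by_month_and_category(connection):
    """Typical analytical query: join the fact table to its dimensions and aggregate."""
    query = """
        SELECT d.year, d.month, p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY d.year, d.month, p.category
        ORDER BY d.year, d.month, p.category
    """
    return connection.execute(query).fetchall()


report = sales_by_month_and_category(conn)
```

Every report follows the same pattern: start from the fact table, join the dimensions you need, then group and aggregate. That regularity is what makes the structure convenient for queries, while the denormalized dimensions are what make it awkward for transaction processing.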

3. Creating an environment in which even rudimentary knowledge of an RDBMS is enough to write queries and build reports. This reduces the time, cost and risk involved in having IT personnel support the system

As a rule, OLAP (to be exact, the “star” schema and its derivatives) simplifies reports and data warehouse queries, and hence requires less knowledge from the employees working with the system. Although end users still run into problems when preparing reports and need help from the experts in the IT department, it is much easier and faster to prepare the necessary reports from the warehouse data than from a transactional database. Note that a large part of the efficiency gain for IT personnel comes from reducing the procedural delays that arise when end users interact with the IT department.

4. Creating a source of previously cleansed information

The data warehouse makes it possible to improve information quality without changing the data in the transactional system. The data is cleansed at the stage before it is loaded into the warehouse. Moreover, some warehouse installations allow the data in the primary sources to be updated based on the corrections made while loading the data into the warehouse.
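As a rough illustration of cleansing before the load, the Python sketch below tidies rows on their way into the warehouse while leaving the source rows untouched. The field names (id, name, country) and the cleansing rules are assumptions made up for this example; real rules depend entirely on the data at hand.

```python
def cleanse_customer(raw):
    """Return a cleaned copy of one source row, or None if it cannot be loaded."""
    # Collapse repeated whitespace and normalize capitalization of the name.
    name = " ".join(raw.get("name", "").split()).title()
    country = raw.get("country", "").strip().upper()
    if not name:
        return None  # reject rows that have no usable name
    return {"id": raw["id"], "name": name, "country": country or "UNKNOWN"}


def stage_for_load(source_rows):
    """Split source rows into cleaned rows and rejects; the source is not modified."""
    cleaned, rejected = [], []
    for row in source_rows:
        result = cleanse_customer(row)
        if result is None:
            rejected.append(row)
        else:
            cleaned.append(result)
    return cleaned, rejected


# Hypothetical rows as they might arrive from a transactional system.
source_rows = [
    {"id": 1, "name": "  john   SMITH ", "country": " us"},
    {"id": 2, "name": "", "country": "DE"},
]
cleaned, rejected = stage_for_load(source_rows)
```

The rejected rows are kept rather than dropped, so they can be reviewed and, where the installation allows it, fed back as corrections to the primary source.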

5. Simplifying report building on top of information from several OLTP systems and/or external data sources used exclusively for BI purposes

Organizations that need to prepare reports from several data sources (this is the most common case) and do not use a warehouse have to unload the data from each source, re-sort, “massage” and “cleanse” it, and only then build the report on the resulting dataset. In some cases, this is an adequate strategy. However, if the company has large volumes of information to combine, if the data often comes from several transactional systems, if it is needed for generating reports, and if it needs to be “clean”, the data warehouse will be the most “correct” solution.
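The manual “unload, massage, combine” routine described above might look like the following Python sketch. The two source systems, their field names and their customer-key formats are hypothetical; the point is only that every report has to repeat this conforming work when there is no warehouse to do it once.

```python
from collections import defaultdict

# Hypothetical extracts from two transactional systems with different conventions.
billing_rows = [
    {"cust": "C-001", "invoice_total": 120.0},
    {"cust": "C-002", "invoice_total": 80.0},
]
webshop_rows = [
    {"customer_id": "c001", "order_value": "30.50"},
    {"customer_id": "c003", "order_value": "10.00"},
]


def conform_key(raw_key):
    """Map each system's customer id to one shared form, for example 'C001'."""
    return raw_key.replace("-", "").upper()


def revenue_per_customer(billing, webshop):
    """Combine both extracts into one revenue total per conformed customer key."""
    totals = defaultdict(float)
    for row in billing:
        totals[conform_key(row["cust"])] += row["invoice_total"]
    for row in webshop:
        # The web shop stores amounts as text, so convert before adding.
        totals[conform_key(row["customer_id"])] += float(row["order_value"])
    return dict(sorted(totals.items()))


combined = revenue_per_customer(billing_rows, webshop_rows)
```

With two small extracts this is manageable. With many sources and large volumes, the key mapping and type conversion are better done once, during the warehouse load, than repeated for every report.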

6. Building a dedicated source (dedicated server) when the OLTP systems do not retain data for as long as the business requires and/or when reports may have to be prepared for certain moments in the past (“as was” reporting)

To speed up responses to queries, data is removed from the transactional systems after a certain time. To maintain the performance of queries and reporting, the historical and current data can be stored in a dedicated warehouse, which provides the necessary performance for both the analytical (OLAP) and the transactional (OLTP) systems.
Building reports for a certain moment in time (“as was” reporting) is extremely difficult in some cases, or even impossible. For example, suppose you need a report on the salaries of employees with a certain educational level, “1234”, on some corporate scale for every month of 2005, but you cannot produce it because the only educational levels stored are those for 2009. To solve problems like this, the company needs to create a data warehouse, which helps by using slowly changing dimensions (SCD).
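One common way to support “as was” reporting is a Type 2 slowly changing dimension, in which each change adds a new row with its own validity period instead of overwriting the old value. The Python sketch below assumes that approach; the column names (valid_from, valid_to), the employee id and the earlier level “1233” are invented for the example.

```python
from datetime import date

# Hypothetical Type 2 dimension: one row per version of an employee's attributes.
# valid_to is exclusive; a far-future date marks the current version.
employee_dim = [
    {"employee_id": 7, "education_level": "1233",
     "valid_from": date(2003, 1, 1), "valid_to": date(2005, 7, 1)},
    {"employee_id": 7, "education_level": "1234",
     "valid_from": date(2005, 7, 1), "valid_to": date(9999, 12, 31)},
]


def education_level_as_of(dim_rows, employee_id, as_of):
    """Return the education level that was valid on the given date, or None."""
    for row in dim_rows:
        if (row["employee_id"] == employee_id
                and row["valid_from"] <= as_of < row["valid_to"]):
            return row["education_level"]
    return None


level_march_2005 = education_level_as_of(employee_dim, 7, date(2005, 3, 1))
level_august_2005 = education_level_as_of(employee_dim, 7, date(2005, 8, 1))
```

Because the old row is kept, a report for March 2005 sees the level that applied then, while a report for August 2005 sees the new one. A transactional system that stores only the current value cannot answer the first question.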

7. Shielding end users from any involvement with the underlying structure and logic of how the database and the operating system function

Business analysis systems, together with the mechanisms for processing and storing data, make it possible to hide the inner workings of the data warehouse from end users. A push by management and analysts toward more intensive analytical work with the information means an increase in the efficiency of the company’s information activities.

So what about the business purposes?

Some companies create warehouses to solve only one of the problems named above; others face the full list. In no case, however, can it be said that building a data warehouse solves purely technical problems and does not pursue a business purpose.
As you look at the list, note that the requirement for a data warehouse stems from the limitations imposed by transactional systems. In certain conditions these limitations are not apparent, but they are there. One thing is clear: to some extent, every company needs a data warehouse, and its introduction is a matter of time.
Revenons à nos moutons, “Let us get back to our sheep”. A company that seeks better support for decision-making, wants to get ahead of its competitors and become closer to the customer, and for this purpose decides to quickly duct-tape a data warehouse together, can be very surprised by a negative result. To achieve these goals, the company has to understand, usually by trial and error, how to change the way it runs the business in order to make the most effective use of data stores, data warehouses and data marts. Moreover, this can prove to be more of a challenge than one would anticipate.