Data Vaults have been gaining huge attention in recent years all over the world. Invented by Dan Linstedt, Data Vaults is truly an optimal way to build enterprise data warehouses. WhereScape – an industry leading information systems software company – fully supports Data Vault methodology. Read the blog for more about it…
A common sentiment of a Big Data has been that it is too big and too costly for small and medium businesses because it requires enormous processing power to crunch all those oodles of data. But it is becoming more clear that this is an outdated perception! Processing power is cheaper and more accessible than any time in the past, granting small and medium-sized enterprises ability to tame Big Data and use it for small operations with limited IT resources and shortage of time and money.
A modern information system serves as an integrated set of components for collecting, storing, and processing data and delivering knowledge. Building an information system – whether it is operational store, data warehouse, or accessing layer et al – has always been a difficult task. WhereScape – as productivity tool – provides a holistic answer for information systems build consisting of Automation Solution, Iterative (Agile) Solution, and Integrated Solution.
This presentation has been first presented on the TDWI Florida Chapter meeting, October 16, 2009
To view it please click on the link below
I have received a thank you note from TDWI Florida Chapter officers for this presentation
“We wanted to thank you for taking the time to speak at our local chapter meeting. Your presentation on “Best Practices for Building a Warehouse Quickly” was very informative and contained a lot of valuable information for our attendees. Your recommendations on delivering results quickly to the business and what to avoid while deploying a warehouse solution were very insightful, and sparked good follow-up conversations with you during our networking breaks.
We will be posting your presentation on our chapter web page for our members to enjoy. Thank you,
Michael Jay Freer Chapter President / Illya Singh Chapter Vice-President / Maylee Sanchez Chapter Secretary”
In this short blog, I am bringing to your attention the list of principal causes, which lead companies to implement data warehouses. The reason why I wanted to put these reasons in one blurb is that too much material about data warehouses is dedicated to advantages of using warehouses indirectly. However, the warehouses’ existence, in fact, is the direct consequence of the resolving business problems. Look through for marketing materials available in the Web. You will see that folks are using data warehouses for important yet simple reasons like “to be closer to the customer”, “to transform of the raw data into business intelligence information, “to make decision-making, based on facts, instead of on intuition”, and, possibly, most often mentioned rationale, “to get an edge over the competitors”. Actually, in 99% of projects the data warehouse itself is only one of the steps on a way to achieve the declared goals.
In reality, the main causes to persuading organizations to introduce a data warehouse are:
1. Need to perform analytical inquiries and reports generation utilizing computing resources that are not yet taken by the core information systems
The majority of companies desire that process scheduling to be setup in such way that the probability of the transactions to be completed within a practical time is reasonably high. Reports and inquiries can demand much more processing resources including, among other things, disk and memory. Therefore, running reports and inquiries on the systems that are occupied with transactional processing, severely reduces likelihood of timely completion of such reports and queries. This, in turn, threatens the completion of the business operations.
In other words, performance of analytical inquiries and generation of reports on the servers occupied with transactional systems creates a big problem of processing of transactions in a comprehensible time. The companies conclude that the least expensive and/or organizationally the most simple and fast way of maintaining a high speed of work on the basic systems consists in the introduction of the data warehouse on a separate server with its own disk and memory.
2. The necessity of implementing data models and technologies that accelerate process and increase performance of inquiries and reporting, but not those intended for processing of transactions
There are some ways of designing the structure of the data, which usually accelerates performance of inquiries such as “star” schemas and derivatives of that. However, on the other hand, these structures are not suitable for transactional systems due to reduced speed of processing transactions. There is also a number of technologies which are good at accelerating performance of inquiries and but are not tailored for OLTP (for example, bit indexes) and, on the contrary, applicable only in OLTP (restoration of transactions).
3. Creating environments in which even rudimentary knowledge of RDBMS is enough for creation of inquiries and building reports. It means a reduction of time, cost and risks that the IT personnel demands for support of system
As a rule, OLAP (to be exact the “star” schema and its derivatives) simplifies reports, data warehouse inquiries, and hence, requires less knowledge from the employees working with system. Despite the fact that end users still face problems in preparation of reports and require help from the experts in the IT Department, it is much easier and faster to prepare the necessary reports based on the warehouse data, rather than on transactional database. Notice that the big role in the increase of efficiency of work for the IT Personnel occurs from reduction of the procedural delays arising at the interaction of end users with the IT Department.
4. Creating a source with previously cleared information
The data warehouse gives the possibility of improvement of information quality without changing the data in the transactional system. Clearing of the data is accomplished at the stage prior of loading the warehouse. Moreover, notice that some installations of warehouses allow possibility of updating of the data in the primary sources based on the corrections, which have been carried out at the stage of loading of the data in warehouse.
5. Simplification of the process of report building based on the information from several OLTP systems and/or external sources of the data used exclusively for BI purposes
For organizations which need to prepare reports on several sources of the data (this is the most common case), it is necessary to do an unloading of data from the source, re-sort, “massage” and “cleanse” the data and only after that build the report on the received dataset without using the warehouse. In some cases, it is an adequate strategy. However if the company has great volumes of information required to be mixed, often if the data received from several transactional systems, and it is necessary for generation of reports, and, if the data need to be “clean”, the data warehouse will be most “correct” solution.
6. Constructing the allocated source (dedicated server) when the OLTP systems do not match up to the frequency of data storage required by the business and/or the possibility of needing to prepare reports for certain moments of time in the past (“as was” reporting).
For accelerating the response to inquiries about the data gets removed after a certain time from the transactional systems. For maintaining performance of inquiries and reporting, the historical and current data can be stored in the allocated warehouse that will provide the necessary productivity, both analytical (OLAP) and transaction systems (OLTP).
Building of reports for a certain moment in time (“as was” reporting) is extremely difficult in some cases or even impossible. For example, if you need to a report on salaries of employees with a certain educational level “1234” on some corporate scale for every month of 2005, but you cannot do it, because the only educational levels stored are for the year 2009. For similar problems to be resolved within a company, it is necessary to create data warehouse, which will help by using slowly changing dimensions (SCD).
7. Protection of the end users from being involved in any degree with the underlying structure and logic of how the DB and OS function
Usage and business analysis systems and all mechanisms of processing and storing data allow the hiding of the data warehouse from the end users. A push to more dense analytical work with the information, from outside management and analysts, means an increase in the degree of efficiency by corporate information activities.
So what about the business purposes?
Some the companies create warehouses for the decision of only one of the above-named problems, others face the full list. However, in no event it is impossible to say that the building of a data warehouse solves only particularly technical problems and does not pursue the business purpose.
While you looking at the list, note that the requirement for data warehouse stems from the limitations imposed by transitional system. In certain conditions, these limitations are not apparent; but they are there; one thing is clear: the warehouse of the data, to some extent, is necessary for each company and its introduction – a matter of time.
Revenons à nos moutons, “Let us get back to our sheep”. The company is seeking the best support of decision-making, getting ahead the competitors, wishing to become closer to the customer and for this purpose decided to quickly duct-tape data warehouse can be very surprised by a negative result. For the achievement of these purposes it is required, that the company has understood, usually by trial and error, how to change running the business for the most effective utilization of date stores, data warehouse and data marts. Moreover, it can appear to be more of a challenge than one would anticipate.
Please find below some of the deepest pitfalls that you might want to be aware of. In addition, see how WhereScape RED can help you avoid such drawbacks.
1. Insufficient User’s Involvement
· IT departments sometimes try to run Data Warehouse/Business Intelligence projects in a vacuum. DW/BI projects are not IT projects! They are rather initiated by and addressed to Business Users.
· Lack of user involvement typically leads to either solutions that are too hard to use or constructs that do not solve any business problems.
The RED Tips
· Get Business users involved as early as possible
· Work with users to help them understand how to formulate the “right” questions. They need to be able to express the precise needs of the business to the IT development team. Create a synergy between IT developers and Business users.
· Show users frequent updates to make sure you are on the path to solving an identified problem and keep the momentum going.
· Develop guidelines and best practice principles based on successful prior iterations.
· Establish a training plan to secure consistency in DW development, tool usage and the organization’s standards.
2. Enterprise Warehouse at Start
· Some companies attempt to begin by building an enterprise warehouse. It is hard to bite, chew and swallow a very risky endeavor. Adding more team-members does not help: Data Warehouse projects do not scale well.
· Enterprise warehouses have slow ROI, cost a lot of money and, therefore, usually end in disaster.
· Painful lessons learned are more difficult to implement.
The RED Tips
· Start by building a Data Mart – the methodology is identical but scope is far more manageable. Data Warehouses need to grow gradually, analysis area by analysis area, building towards the full enterprise Data Warehouse.
· Make use of rapid prototyping and speedy development cycles; they give you swift success by
– solving a real business problem quickly;
– wining a powerful sponsor of the DW/BI project;
– helping secure financial support for future projects;
– gaining experience and outcome at the prior iteration (helps to justify the subsequent ones);
– establishing a single point of the organization’s business rules and DW and OLAP standards;
· Be aware that the initial iteration should take no longer than three months (hopefully much shorter).
3. Poor Data Quality
· The Data Warehouse, perhaps, is the best way to discover (and report) “noisy” data.
· Most “bad” data is easy to spot but some can slither into the Data Warehouse.
· Fixing data in the Data Warehouse does nothing to stop new bad data coming from the source.
The RED Tips
· Identify the resources that contribute the most into the Data Warehouse and verify its accuracy.
· Use the Data Warehouse methodology to drive repairs to source systems to prevent bad input.
· Do not fix data; reject the whole “chunk” or record if invalid data is found.
· Create “unknown” buckets in the Data Warehouse so data issues get visibility.
· Assign invalid data to a zero dimensions key; auto-add dimension entries to ensure consistency.
· Re-process the failed rows once the source issues are resolved.
4. Deficient Funding
· Even when starting with a small Data Mart, money is always an issue.
· Warehousing projects often fail to subsidize all aspects, including hardware and training.
· Data cleanup always takes longer than expected and will dissolve a significant part of the budget.
The RED Tips
· Obtain high-level sponsorship from an area of the business.
· Prepare in advance for adequate hardware and training in addition to the licensing, data and report work.
· Make sure to schedule delivery of reliable information to the decision-makers.
· Validate the funding from non-technical points of view:
– Real business advantage for the organization is pragmatic, iterative Data Warehouse building;
– Lower risk by effective use of time, money and staff;
– Implementation of red-hot latest DW/BI methodology (Kimball, Inmon, Agile development, etc).
5. Corporate Jungle
· Power struggles in the organization may occur over the resources involved in the project or the vision and focus of the warehouse, to name a few.
· Certain people may feel threatened by the outcome of warehousing projects.
· Some folks claim they own certain data and no one else is going to use it.
The RED Tips
· You only have two possibilities here: play more politics or play less politics.
· Have the organization support of a clear vision on how to solve a specific business problem.
· Very vigilantly consider opinions and “pain points” from various business groups; hang on their words.
· Carefully choose your sponsors; this should be the most influential decision-maker that you can recruit.
· Have your business top dog cut through the maze and drive the project forward.
· Provide fast offline query performance capability for ad-hoc analysis by business community.
6. Incorrect or Partial KPIs
· Even simple cubes may not contain valuable information. E.g., a cube that contains sales information is useless when trying to analyze sales reps’ performance.
· Warehouses that are built by IT often include just the existing data and not the KPIs needed.
· A lack of proper KPIs means that the warehouse falls short of solving a business need.
The RED Tips
· Work with users and business patrons to solve specific business problems; one problem at a time.
· Remember that KPIs must be focused on solving an explicit problem such as trends, distributions, and empowerments.
· Adjust the display of KPIs thoroughly to show the organization’s competitive advantage.
· Understand what metrics are used to calculate bonuses for upper management.
· Consider delivering the KPIs through a dashboard or scorecard.
7. Exceedingly Complex Cubes
· Cube complexity is a major obstacle to the success of the Data Warehouse.
· The IT department tends to build extremely complex cubes; OLAP’s with many dimensions are very difficult for end users to understand, much less use.
The RED Tips
· Keep in mind, that there are two very different types of cubes: Analytical and Reporting.
· Use as few dimensions in an Analytical cube as possible; rule of thumb – maximum of six to eight dimensions.
· Take advantage of rich OLAP functionality, provided by the DW tool such as hierarchies, calculated members and so on.
· Build multiple, simple cubes and link them when necessary using the Virtual cube concept. Alternatively, create simplified views of complex cubes.
8. Limited or Unreachable Access to Information
· Many companies can build a Data Warehouse but fail to provide users with a simple way to access the data. For example, many companies use Excel as the standard client tool but, in fact, it is one of the weakest tools available.
· Different users need different ways to interact with the data.
The RED Tips
· Always remember do not just get the data in but also get the information out; have the right tools for the right users to look at the data:
– scorecards, dashboards for decision-makers and management;
– powerful analytical tools for business analysts and SME’s;
– reporting with limited analytical functionality for operation workers;
· Design the Data Warehouse independently from end user tools.
· Expect a range of functionality from the Data Warehouse, which will mean a variety of front-end tools.
Summing up …
· Data Warehouse / Business Intelligence efforts do not succeed because of a variety of reasons, many of which are non-technical.
· What does WhereScape RED put forward for your organization:
– Offers an integrated Data Warehouse development environment.
– Promotes a built in methodology that supports the entire Data Warehouse lifecycle.
– Enables rapid prototype-iterations and deployment of the project
– Supports a low risk, proven, pragmatic approach to Data Warehousing.
– Treats Data Warehousing as a process – not a project.
– Allows for the effective use of time, money and resources.
– Shaves off two to four months per year from your development and support costs.
– Leverages existing, readily available SQL skills in market and collaborates.
– Complies with the Data Warehouse Lifecycle Management principles.