Best Practices for Building a Warehouse Quickly

October 29, 2009

View Raphael Klebanov's profile on LinkedIn

This presentation has been first presented on the TDWI Florida Chapter meeting, October 16, 2009

To view it please click on the link below

Best Practices for Building a Warehouse Quickly

I have received a thank you note from TDWI Florida Chapter officers for this  presentation

“We wanted to thank you for taking the time to speak at our local chapter meeting.  Your presentation on “Best Practices for Building a Warehouse Quickly”  was very informative and contained a lot of valuable information for our attendees. Your recommendations on delivering results quickly to the business and what to avoid while deploying a warehouse solution were very insightful, and sparked good follow-up conversations with you during our networking breaks.

We will be posting your presentation on our chapter web page for our members to enjoy. Thank you,

Michael Jay Freer Chapter President / Illya Singh Chapter Vice-President /  Maylee Sanchez Chapter Secretary”


Seven base reasons why organizations build data warehouses

May 10, 2009

In this short blog, I am bringing to your attention the list of principal causes, which lead companies to implement data warehouses. The reason why I wanted to put these reasons in one blurb is that too much material about data warehouses is dedicated to advantages of using warehouses indirectly. However, the warehouses’ existence, in fact, is the direct consequence of the resolving business problems. Look through for marketing materials available in the Web. You will see that folks are using data warehouses for important yet simple reasons like “to be closer to the customer”, “to transform of the raw data into business intelligence information, “to make decision-making, based on facts, instead of on intuition”, and, possibly, most often mentioned rationale, “to get an edge over the competitors”. Actually, in 99% of projects the data warehouse itself is only one of the steps on a way to achieve the declared goals.

In reality, the main causes to persuading organizations to introduce a data warehouse are:

1. Need to perform analytical inquiries and reports generation utilizing computing resources that are not yet taken by the core information systems

The majority of companies desire that process scheduling to be setup in such way that the probability of the transactions to be completed within a practical time is reasonably high. Reports and inquiries can demand much more processing resources including, among other things, disk and memory. Therefore, running reports and inquiries on the systems that are occupied with transactional processing, severely reduces likelihood of timely completion of such reports and queries. This, in turn, threatens the completion of the business operations.
In other words, performance of analytical inquiries and generation of reports on the servers occupied with transactional systems creates a big problem of processing of transactions in a comprehensible time. The companies conclude that the least expensive and/or organizationally the most simple and fast way of maintaining a high speed of work on the basic systems consists in the introduction of the data warehouse on a separate server with its own disk and memory.

2. The necessity of implementing data models and technologies that accelerate process and increase performance of inquiries and reporting, but not those intended for processing of transactions

There are some ways of designing the structure of the data, which usually accelerates performance of inquiries such as “star” schemas and derivatives of that. However, on the other hand, these structures are not suitable for transactional systems due to reduced speed of processing transactions. There is also a number of technologies which are good at accelerating performance of inquiries and but are not tailored for OLTP (for example, bit indexes) and, on the contrary, applicable only in OLTP (restoration of transactions).

3. Creating environments in which even rudimentary knowledge of RDBMS is enough for creation of inquiries and building reports. It means a reduction of time, cost and risks that the IT personnel demands for support of system

As a rule, OLAP (to be exact the “star” schema and its derivatives) simplifies reports, data warehouse inquiries, and hence, requires less knowledge from the employees working with system. Despite the fact that end users still face problems in preparation of reports and require help from the experts in the IT Department, it is much easier and faster to prepare the necessary reports based on the warehouse data, rather than on transactional database. Notice that the big role in the increase of efficiency of work for the IT Personnel occurs from reduction of the procedural delays arising at the interaction of end users with the IT Department.

4. Creating a source with previously cleared information

The data warehouse gives the possibility of improvement of information quality without changing the data in the transactional system. Clearing of the data is accomplished at the stage prior of loading the warehouse. Moreover, notice that some installations of warehouses allow possibility of updating of the data in the primary sources based on the corrections, which have been carried out at the stage of loading of the data in warehouse.

5. Simplification of the process of report building based on the information from several OLTP systems and/or external sources of the data used exclusively for BI purposes

For organizations which need to prepare reports on several sources of the data (this is the most common case), it is necessary to do an unloading of data from the source, re-sort, “massage” and “cleanse” the data and only after that build the report on the received dataset without using the warehouse. In some cases, it is an adequate strategy. However if the company has great volumes of information required to be mixed, often if the data received from several transactional systems, and it is necessary for generation of reports, and, if the data need to be “clean”, the data warehouse will be most “correct” solution.

6. Constructing the allocated source (dedicated server) when the OLTP systems do not match up to the frequency of data storage required by the business and/or the possibility of needing to prepare reports for certain moments of time in the past (“as was” reporting).

For accelerating the response to inquiries about the data gets removed after a certain time from the transactional systems. For maintaining performance of inquiries and reporting, the historical and current data can be stored in the allocated warehouse that will provide the necessary productivity, both analytical (OLAP) and transaction systems (OLTP).
Building of reports for a certain moment in time (“as was” reporting) is extremely difficult in some cases or even impossible. For example, if you need to a report on salaries of employees with a certain educational level “1234” on some corporate scale for every month of 2005, but you cannot do it, because the only educational levels stored are for the year 2009. For similar problems to be resolved within a company, it is necessary to create data warehouse, which will help by using slowly changing dimensions (SCD).

7. Protection of the end users from being involved in any degree with the underlying structure and logic of how the DB and OS function

Usage and business analysis systems and all mechanisms of processing and storing data allow the hiding of the data warehouse from the end users. A push to more dense analytical work with the information, from outside management and analysts, means an increase in the degree of efficiency by corporate information activities.
So what about the business purposes?

Some the companies create warehouses for the decision of only one of the above-named problems, others face the full list. However, in no event it is impossible to say that the building of a data warehouse solves only particularly technical problems and does not pursue the business purpose.
While you looking at the list, note that the requirement for data warehouse stems from the limitations imposed by transitional system. In certain conditions, these limitations are not apparent; but they are there; one thing is clear: the warehouse of the data, to some extent, is necessary for each company and its introduction – a matter of time.
Revenons à nos moutons, “Let us get back to our sheep”. The company is seeking the best support of decision-making, getting ahead the competitors, wishing to become closer to the customer and for this purpose decided to quickly duct-tape data warehouse can be very surprised by a negative result. For the achievement of these purposes it is required, that the company has understood, usually by trial and error, how to change running the business for the most effective utilization of date stores, data warehouse and data marts. Moreover, it can appear to be more of a challenge than one would anticipate.


A Different Approach to a Data Warehousing Effort in a Shaky Economy Times

February 23, 2009

With the economy decelerating all over, the needs for business intelligence swell; decision-making becomes crucial.  It is not about making mega-bucks any longer but rather about survival of the entire business.

 

Nobody is questioning the vital necessity for a successful business to constantly monitor a company’s performance, its profitability, its best and worst customers, its best and worst products and its overall health  especially now, when resources are so stretched. The driving force behind such business intelligence functions is a healthy data warehouse.

Nowadays the question is this: “How to achieve the same – and more for less?” How to build such a brawny data warehouse with a constant shortage of money?

 

Most of the organizations still start with a data warehousing project, using the ol’ traditional waterfall approach. It worked before but does not cut it any longer …

Groundwork for such a project includes taught questions with no good answers:

•          Business users are asked what data they need. But the business generally doesn’t know what it wants. “I’ll tell you what I want when I see it” is their familiar mantra.

•          Then business users are asked what reports they use and what is missing in those reports. But reports are “moving targets” they change swiftly, especially when decisions are about the life and death of the company.

•          Where to find right people? Well, with a continual shortage of the skilled ETL developers, data- modelers and business analysts – good luck. A decent IT chap costs a fortune.

•          How long will the data warehousing project take? Organizations can’t afford the common 6 to 9 months, possibly, longer with painful approvals and rework.

•          How do we keep documentation up to date? What is the risk of such lengthy warehouse project? How to find an enthusiastic sponsor of the business intelligence effort?

•          And many more questions….   Oh stress, the spice of life!

 

That is the alternative? Not to life it is – to stress!

A different approach that presents a true unique business value for a stressed out business that includes:

•       Built-in methodology that supports the entire data warehouse lifecycle an Iterative approach.  The Ability to stay focused on the goals.

•       Spiral-like rapid “prototyp–>iterate–>deploy” venture. Quick response to changing business needs. The only questions asked: how to make the right business decisions right and what are the measures of the truth?

•       Integrated, metadata-driven data warehouse development environment. All that the developer needs is on his/her fingertips, Auto-generation of procedural code and documentation.

•       A low risk, proven, pragmatic approach to data warehousing. An instrument to get business users and stakeholders involved earlier.

•       Effective use of time & money; quicker ROI.  A process for trimming off 2-4 months a year from the development and support costs; reducing time to deliver.

•       The data warehousing as a process – not a project.  Well-timed accommodation to changing business needs.

•       Leveraging existing, readily available SQL skills in market. Theory and best practices are in the tool, not in developer’s head.


Eight Reasons why Data Warehouse and, Subsequently, Business Intelligence Efforts Fail

January 31, 2009

 Folks,

Please find below some of the deepest pitfalls that you might want to be aware of. In addition, see how WhereScape RED can help you avoid such drawbacks.

1.     Insufficient User’s Involvement

·         IT departments sometimes try to run Data Warehouse/Business Intelligence projects in a vacuum. DW/BI projects are not IT projects! They are rather initiated by and addressed to Business Users.

·         Lack of user involvement typically leads to either solutions that are too hard to use or constructs that do not solve any business problems.

The RED Tips

·         Get Business users involved as early as possible

·         Work with users to help them understand how to formulate the “right” questions. They need to be able to express the precise needs of the business to the IT development team. Create a synergy between IT developers and Business users.

·         Show users frequent updates to make sure you are on the path to solving an identified problem and keep the momentum going.

·         Develop guidelines and best practice principles based on successful prior iterations.

·         Establish a training plan to secure consistency in DW development, tool usage and the organization’s standards.

2.     Enterprise Warehouse at Start

·         Some companies attempt to begin by building an enterprise warehouse.  It is hard to bite, chew and swallow a very risky endeavor. Adding more team-members does not help: Data Warehouse projects do not scale well.

·         Enterprise warehouses have slow ROI, cost a lot of money and, therefore, usually end in disaster.

·         Painful lessons learned are more difficult to implement.

The RED Tips

·         Start by building a Data Mart – the methodology is identical but scope is far more manageable. Data Warehouses need to grow gradually, analysis area by analysis area, building towards the full enterprise Data Warehouse. 

·         Make use of rapid prototyping and speedy development cycles; they give you swift success by

-   solving a real business problem quickly;

-   wining a powerful sponsor of the DW/BI project;

-   helping secure financial support for future projects;

-   gaining experience and outcome at the prior iteration (helps to justify the subsequent ones);

-   establishing a single point of the organization’s business rules and DW and OLAP standards;

·         Be aware that the initial iteration should take no longer than three months (hopefully much shorter).

3.     Poor Data Quality

·         The Data Warehouse, perhaps, is the best way to discover (and report) “noisy” data.

·         Most “bad” data is easy to spot but some can slither into the Data Warehouse.

·         Fixing data in the Data Warehouse does nothing to stop new bad data coming from the source.

The RED Tips

·         Identify the resources that contribute the most into the Data Warehouse and verify its accuracy.

·         Use the Data Warehouse methodology to drive repairs to source systems to prevent bad input.

·         Do not fix data; reject the whole “chunk” or record if invalid data is found.

·         Create “unknown” buckets in the Data Warehouse so data issues get visibility.

·         Assign invalid data to a zero dimensions key; auto-add dimension entries to ensure consistency.

·         Re-process the failed rows once the source issues are resolved.

4.     Deficient Funding

·         Even when starting with a small Data Mart, money is always an issue.

·         Warehousing projects often fail to subsidize all aspects, including hardware and training.

·         Data cleanup always takes longer than expected and will dissolve a significant part of the budget.

The RED Tips

·         Obtain high-level sponsorship from an area of the business.

·         Prepare in advance for adequate hardware and training in addition to the licensing, data and report work.

·         Make sure to schedule delivery of reliable information to the decision-makers.

·         Validate the funding from non-technical points of view:

-   Real  business advantage for the organization is pragmatic, iterative Data Warehouse building;

-   Lower risk  by effective use of time, money and staff;

-   Implementation of red-hot latest DW/BI methodology (Kimball, Inmon, Agile development, etc).

5.     Corporate Jungle

·         Power struggles in the organization may occur over the resources involved in the project or the vision and focus of the warehouse, to name a few.

·         Certain people may feel threatened by the outcome of warehousing projects.

·         Some folks claim they own certain data and no one else is going to use it.

The RED Tips

·         You only have two possibilities here: play more politics or play less politics.

·         Have the organization support of a clear vision on how to solve a specific business problem.

·         Very vigilantly consider opinions and “pain points” from various business groups; hang on their words.

·         Carefully choose your sponsors; this should be the most influential decision-maker that you can recruit.

·         Have your business top dog cut through the maze and drive the project forward.

·         Provide fast offline query performance capability for ad-hoc analysis by business community.

6.     Incorrect or Partial KPIs

·         Even simple cubes may not contain valuable information. E.g., a cube that contains sales information is useless when trying to analyze sales reps’ performance.

·         Warehouses that are built by IT often include just the existing data and not the KPIs needed.

·         A lack of proper KPIs means that the warehouse falls short of solving a business need.

The RED Tips

·         Work with users and business patrons to solve specific business problems; one problem at a time.

·         Remember that KPIs must be focused on solving an explicit problem such as trends, distributions, and empowerments.

·         Adjust the display of KPIs thoroughly to show the organization’s competitive advantage.

·         Understand what metrics are used to calculate bonuses for upper management.

·         Consider delivering the KPIs through a dashboard or scorecard.

7.     Exceedingly Complex Cubes

·         Cube complexity is a major obstacle to the success of the Data Warehouse.

·         The IT department tends to build extremely complex cubes; OLAP’s with many dimensions are very difficult for end users to understand, much less use.

The RED Tips

·         Keep in mind, that there are two very different types of cubes: Analytical and Reporting.

·         Use as few dimensions in an Analytical cube as possible; rule of thumb – maximum of six to eight dimensions.

·         Take advantage of rich OLAP functionality, provided by the DW tool such as hierarchies, calculated members and so on.

·         Build multiple, simple cubes and link them when necessary using the Virtual cube concept. Alternatively, create simplified views of complex cubes.

8.     Limited or Unreachable Access to Information

·         Many companies can build a Data Warehouse but fail to provide users with a simple way to access the data. For example, many companies use Excel as the standard client tool but, in fact, it is one of the weakest tools available.

·         Different users need different ways to interact with the data.

The RED Tips

·         Always remember do not just get the data in but also get the information out; have the right tools for the right users to look at the data:

-   scorecards, dashboards for decision-makers and management;

-   powerful analytical tools for business analysts and SME’s;

-   reporting  with limited analytical functionality for operation workers;

·         Design the Data Warehouse independently from end user tools.

·         Expect a range of functionality from the Data Warehouse, which will mean a variety of front-end tools.

Summing up …

·         Data Warehouse / Business Intelligence efforts do not succeed because of a variety of reasons, many of which are non-technical.

·         What does WhereScape RED put forward for your organization:

-   Offers an integrated Data Warehouse development environment.

-   Promotes a built in methodology that supports the entire Data Warehouse lifecycle.

-   Enables rapid prototype-iterations and deployment of the project

-   Supports a low risk, proven, pragmatic approach to Data Warehousing.

-   Treats Data Warehousing as a process – not a project.

-   Allows for the effective use of time, money and resources.

-   Shaves off two to four months per year from your development and support costs.

-   Leverages existing, readily available SQL skills in market and collaborates.

-   Complies with the Data Warehouse Lifecycle Management principles.


Here are the Endorsements from my Previous Jobs.

December 1, 2008

Please review:

Technical Partner Manager / Tech. Analyst

WhereScape USA, Inc (CURRENT JOB)

 

“Raphael worked for me as a pre-sales and post sales data warehouse solutions architect. He has a unique blend of both skills, providing pre-sales demo support to prospects but also being able to provide highly technical data warehouse implementations with customers. I highly recommend him for pre-sales or post-sales data warehousing / business intelligence positions.” February 16, 2009

Richard Smith, VP Sales, WhereScape USA Inc.
managed Raphael at WhereScape USA Inc.

 

“I worked with Raphael at CIBER and have continued to work with him as a client since he moved to Wherescape. Raphael is an intelligent, sharp individual who is able to bring ideas to a level that helps the business. Raphael is easy to work with and is knowledgeable about data warehouse structure and how it can be used to grow or manage business needs. I would work with Raphael again in a minute.” January 2, 2009

Mark Johnson, dba/tech analyst, ciber
worked directly with Raphael at WhereScape USA, Inc

 

“Raphael is one of the best consultants I’ve worked with. He is articulate, empathetic, highly motivated and extremely technical. He has an effective manner working with people and is the kind of high-caliber person I would put on any project for any client we support. He is a decent human being and I’m glad to know him.” January 3, 2008

Top qualities: Personable, Expert, High Integrity

Thomas Fisher
hired Raphael as a IT Consultant in 2005, and hired Raphael more than once

 

“Raphael is detail oriented and was involved for the full cycle of Oracle database development at Lipper. His attention to detail and ability to write clear documents was very helpful to our project. He was able to contribute immediately and asked many detailed questions, which really contributed to the client experience of our team. I recommend Raphael to anyone seeking a detail oriented Oracle/ETL database developer.” January 3, 2008

John Adams, Computer Consultant, Lipper
managed Raphael indirectly at WhereScape USA, Inc

 

 

Clients / Customers

WhereScape USA

 

“…We had another very productive week working with Mr. Klebanov of WhereScape building Byram’s Data warehouse. During the previous sessions of our consulting engagement with Raphael we were able to build, the I phase of the Byram data warehouse. During our current week’s session, we have gained momentum with our DW development activity and have resolved the following items…

 Raphael managed to accomplish all goals that were outlined as part of the kick-off meeting on Monday.  The amount of work done and quality of consulting were completely satisfactory.” November 21, 2008

Sumeet Nagrani, DW/BI Manager, Byram Healthcare, White Plains, NY

Worked directly with Raphael as a WhereScape consultant

 

“…Thank you so much for recommend Mr. Klebanov … His knowledge and expertise has helped our BI program tremendously…

Along with his technical expertise, Mr. Klebanov has good personal skills that allow him to work well with my staff, thus making the engagement much more productive.  …

I look forward in working with your firm and Mr. Klebanov in the future as our paths crossed.” January 16, 2009

Tho V. Dao, IT Manager, City of Shoreline, WA

Worked directly with Raphael as a WhereScape consultant

 

 

 

 

IT Consultant

WhereScape USA

 

“To Whom it may Concern: Raphael and I worked together for approximately 1 year… in the time I’ve know him he is one of the most dedicated and loyal employees that you would ever want to meet. He has the heart of a champion and the will to succeed at everything he does. He is probably one of the most knowledgeable that I’ve ever met in the data warehousing world. He really understands the methodologies and follows procedure to a T. I recommend Raphael Klebanov for any position he feels qualified for. Sincerely, Waid Essick Software Sales” November 5, 2008

Waid Essick, ADR, CDC Software
worked directly with Raphael at WhereScape USA

 

“Raphael is an experienced Data Warehouse technologist, and a very hard worker with a great attitude. He’s always willing to expend the “extra effort” to insure that clients get value from his services. He’s a great communicator, particularly notable since his primary language has been Russian. This was never an issue in dealing with clients, in fact, most found his knowledge to be top-notch, and his communication skills to be great. Would recommend him anytime…..” August 25, 2008

Tim Clark, Account Manager, WhereScape Software USA
worked directly with Raphael at WhereScape USA

 

 

Sr. Programmer Analyst / Bus. Associate

INVESCO / AIM Investments

 

“Raphael Klebanov has been appointed to work in the Houston Data Services group for the last 6+ months. His function includes working on various aspects of enterprise Siebel Analytics Project. Raphael’s previous experience with range of database applications, his well-rounded set of technical skills, especially Informatica, made him a valuable reinforcement to the existing group.

 During the work on the project, Raphael has reveled strong professional skills both technical and interpersonal. He volunteered to take on additional responsibility as project progressed. The scope of the development required extensive learning of new tools and techniques that Raphael achieved with ease.

He has demonstrated strong ability and discipline to support the group’s effort remotely from Denver. However when it was necessary he was reporting to Houston office without the delay. As an individual, Raphael is honest, energetic and enjoys helping people. He has excellent communication skills, is liked by his management, and peers.

I take this opportunity to recommend Raphael Klebanov’ candidature and express my interest in supporting it further, if you desire so.” September 15, 2005

Mindy W. Bredthauer, Sr. Data Services Manager, AIM Management Group Houston

managed Raphael at AIM Management Group


Raphael possesses excellent technical and interpersonal skills necessary to administer assigned projects as well as to analyze, design, and prepare programs and systems.

Raphael demonstrates strong ability in diagnosing and solving information system problems while maintaining professionalism and courtesy in addition to his ability to manage projects, establish priorities, meet deadlines, and concentrate on detailed information in a fast-paced, demanding work environment.

During his five plus years of services with INVESCO/AIM Investments Raphael has proven himself a hard-working, self-motivated individual with strong work ethics and a positive attitude.

I would highly recommend him to any company seeking these qualities in an individual. June 22, 2005

Chris Marlowe, Data Services Manager. INVESCO / AIM Investments

managed Raphael at INVESCO and AIM Management Group

 

 

 

 


Dimension Types: What are the different dimension types and how they are implemented in WhereScape RED?

August 22, 2008

RED operates with dimension types that are recognized by modern data warehouse theory. When a developer starts working on dimension tables he gets to chose from four options for the default generation of the dimension table and its update procedure via the following dialog box: 

 

A Normal Dimension is where a dimension record is updated and changed whenever any of the non-business key information changes.   This is commonly called a type 1 dimension and is used when historical values of the change are not kept. 

Overwriting the old values.  A Normal dimension type is the most common dimension type. RED auto-generates a stored procedure to handle update and insert paths. Consider this simple example. The “Product” table contains a product named, “Thingamabob” with Product ID being the primary key. In the year 2007, the price of Thingamabob was $150 and over the time, the product’s price changes from $150 to $350. 

In the year 2007, if the price of the product changes to $250, then the old values of the columns “Year” and “Product Price” have to be updated and replaced with the new values. In this Normal Type, there is no way to find out the old value of the product “Thingamabob” in year 2006 since the table now contains only the new price and year information. 

The logic that WhereScape RED applies to the stored procedure handling the Normal dimension type works as follows:

- The stored procedure gets all unknown records from the dim table. If an unknown dim record does not exist, then the procedure allows an explicit value to be inserted into IDENTITY field

- The procedure loops through all the input records, attempting to update the dimension records based on the business key value. If the update attempt fails, then the procedure performs an insert. 

A Slowly Changing dimension is where new dimension records are created when certain identified columns in the dimension change.   This is commonly called a type 2 dimension and tracks point in time historical changes. 

Creating another added record. In this type, the old values will not be updated but a new row containing the new values will be inserted in the product table. Consequently, at any point in time, the difference between the old values and new values can be retrieved and easily compared. This can be very useful for reporting purposes. 

The problem with the above mentioned data structure is the “Product ID” column cannot store duplicate values for “Thingamabob” since “Product ID” is the business /primary key. Also, the current data structure doesn’t clearly specify the effective date and expiration date of the product when the change to its price happened. So, it would be better to change the current data structure to overcome the above primary key violation. 

 

In the changed Product table’s Data structure, “Product ID” and “Effective Datetime” are now a composite business / primary key. Therefore, there will be no violation of the primary key constraint. Addition of the new columns, “Effective Datetime” and “Expiration Datetime” provides the information about the Thingamabob’s Effective date and Expiration date adding more clarity and noticeably improving the scope of this table. The slowly changing dimension (SCD), type 2 approach, may need added space in the database, since for every changed record; an extra row has to be stored. Since dimensions are usually not that big in the real world, extra space is negligible. 

RED performs the following additional tasks in the stored procedures to handle Slowly Changing Dimensions (SCD), type 2 dimensions.

- Three new special variables are added: insert indicator, version indicator and version number.

- Four extra columns are added to support slowly changing dimensions: current_flag; version; start_date; end_date.

- RED automatically generates an alter table SQL script, based on type. It also adds a unique, business key index to support the slowly changing columns.

- In the update stored procedure loop, if the record is not found, the procedure inserts a new record and sets the insert flag on. Otherwise, the procedure checks to see if we have an existing dimension row and have just had a field change for data that is not part of the slowly changing data that we are interested in and performs an update. If there is an existing dimension row and it verifies that any of slowly-changing fields have changed. The procedure then updates the version indicator and version number, and inserts a new row. 

A Previous Values dimension commonly called a type 3 dimension, allows the storing of the last value of selected fields in secondary columns.

Creating new fields. In this dimension type, the latest update to the changed values can be seen. The example shown below illustrates how to add new columns and keep track of the changes. From the example, we can see the current price and the previous price of the product, “Thingamabob”. 

The problem with the Previous Values approach is over time if the product price has multiple changes; the complete history may not be stored, only the latest change will be stored. For instance, in year 2008, if the Thingamabob’s price changes to $350, then we would not be able to see the complete history of 2007 prices, since the old values would have been updated with 2008 product information. 

- RED guides you through the process of adding secondary columns. 

A Date Ranged dimension supports source systems that provide date ranged information. It rolls over time based on specified columns and date range and the history is kept. Otherwise, it looks just like the Slowly Changing dimension type. The difference in that the date range dimension is valid for a given range or set of dates on specified columns. 

The following example might be appropriate here.  Thingamabobs are on sale for the date range 1/1/08 – 1/31/08 at $150 and on sale for the date range 2/1/08 – 2/29/08 for $200.  So the dimensional records would be for 1/1/08…1/31/08, 2/1/08…2/29/08 and 3/1/08 to present each range would be represented by a date range dimension key and the product on sale for that dimension key could be reported on.   

- RED will guide you through the process of identifying which column(s) will be date ranged columns, i.e. “from” datetime column followed by the “until” datetime column. E.g. such “from” column could be “Effective Datetime” and “until” column – Expiration Datetime  

Note: RED is using slightly different naming convention.

The above table will look like this:

(*)          = (Business Key)

(**)       = (Primary Artificial Key)

(+)          = (Slowly Changing Business Key) 

A Get Dimension Key Stored Procedure 

The ‘Get Key’ stored procedure is auto-generated by RED to return an artificial key when a business key is supplied. The normal syntax is to call the function (usually from update_stage_% stored procedure) passing the business key and be returned a status indicating the result of the lookup.  

For example:

EXEC   @v_return_status = get_dim_product_key 

      @v_stage_product     

     ,’N’ — Auto add flag     

     ,’N’ — Write error trail entry if not found flag     

     ,@p_sequence

     ,@p_job_name

     ,@p_task_name

     ,@p_job_id

     ,@p_task_id     

     ,@v_stage_product_key OUTPUT     

     ,@v_getkey_status OUTPUT

On a successful lookup, the artificial key for the dimension record is also returned. If the lookup fails because the business key is not found in the dimension then an appropriate status is returned and the artificial key is set to 0. 

The main logic as follows:

- Lookup the key: If dss_end_date is NULL, then we are just looking up the current record. If not null then we are trying to match to a specific slowly changing record, so look through all of the records until finding the appropriate one.

- The procedure then loops through all the old keys looking for a match on the date. If key is found then the procedure sets the key and the flag.

- If no match found then the procedure gets the current version.

- If a dimension key is not found, then the procedure will add it if we have the auto_add flag set.

- If requested (auto add log message), the procedure will log a message about the auto add and log a message about the lookup failure.


Impact Analysis: What can fizzle while applying changes on the database and how to sustain a healthy, evolving data warehouse?

June 10, 2008

Every action in the information database and warehouse in particular has a consequence. Database developers, business analysts and DBAs  feel agonizing pain when an organization makes changes to their data warehouse applications without having complete insight into the true impact and cost that a modification will have.

So, what is an impact analysis?  It is an identification of the consequences of change on an object to its related objects. Impact analysis, in case of a data warehouse, consists of:

• understanding of the processes performed within a data warehouse as a whole and its components;
• comprehension of interdependencies between processes and objects;
• assessing the effect the change brings to the data warehouse structure;
• developing recovery procedures in case of  a failed modification.

WhereScape RED provides a variety of needed mechanisms to identify a data warehouse’s weak spot when it comes to a change.

In the development process we have put our best effort into creating a perfect addition to our data warehouse. Now it’s time to deploy our changes to a higher environment. But how can we do so without crashing things that we had developed previously? How do we insure that the new modification will not destroy the integrity of the data warehouse instead of improving it? With WhereScape RED it’s easy.

Here are a few recommendations on how to avoid oopses and ouches:

1. Structural Impact Analysis.  i.e.  what objects have been changed.
Click “Tools” >  Click “Search for string…” > Fill out dialog box for string (e.g. either enter the procedure name being modified or the column being changed) > examine “Object search results” screen for all objects that contain this string.

For the structural-impact analysis you can use the Browser screen for “DataWarehouse” connection.
Click “Invoke SQL Admin” > execute something like this (SQL Server):

select distinct(a.name) from sys.procedures a inner join sys.syscomments b on a.object_id=b.id
where charindex(‘stored procedure or column’,b.text)<>0;
– Check load where clause too
select distinct(a.lt_table_name)from ws_load_tab a
where charindex(‘stored procedure or column’,a.lt_where_clause)<>0;

2. Evaluation of the Impact. i.e. what possible impact the data warehouse’s alteration renders on various data warehouse objects. WhereScape uses an array of pre-defined reports. Click the “Reports” tab from the main Browser menu. The list of available reports expands from release to release. There are about two dozen reports that are directly related to Impact Analysis. Some of them are:
•  “Object modification dates (without indexes)” with date created and date modified, same for “Index modification dates”;
• “Modified procedures” i.e. procedures that have been modified since creation;
• “Table objects that have no associated procedure or script” and “Procedures that are not related to a table”;
• “All objects that have been Refreshed or Imported”;
• “Differences between current and selected repository…”;
• “Query data warehouse objects… where custom validations scripts can be run”;
• “Track back on specified column usage…” and many others.

3. Impact during the Promotion from one environment to the other. e.g.  from DEV to TEST,  when managing such deployments using the “Build Application tables…” built-in utility. While the developer plunges through the wizard-like dialog boxes, WhereScape RED continuously warns him/her of the possible implications of this activity. E.g. if the particular object is to be overwritten, warning message pops up to alert the developer on the possible impact on the existing object.

4. Organizational Documentation Update. The considerable problem for a conventional data warehousing project is that database changes within data warehouse are not reflected in documentation: nobody has time to update them. WhereScape RED automatically generates a new documentation set after each iteration of data warehousing development.

5. Versioning of Metadata Objects. Several kinds of version management are utilized in WhereScape RED. For example, choosing the “Automated Version Creation” option from the Tools menu will thwart a developer from losing valuable DDL scripts, procedural code and other metadata and database information.

6. Standardization and Best Practice Usage.    WhereScape RED uses a state-of-the-art methodology in building warehousing prototypes. Following the developer’s guidelines delineated by the WhereScape RED methodology, wizards and the automatic generation of database objects, in combination with an enterprise standard “look and feel” dramatically decreases the chances of glitches in the data warehouse processing.


Follow

Get every new post delivered to your Inbox.