Welcome Guest. | Log In| Register | Membership Benefits

Intelligent Enterprise

Better Insight for Business Decisions

Intelligent Enterprise - Better Insight for Business Decisions
search Intelligent Enterprise
Home
Digital Library
Events
RSS | Newsletters
Webcasts




August 12, 2002

What Changed?

Use a multivalued outrigger table to add expressive power to your dimensions

by Joe Caserta, edited by Ralph Kimball

The dimensional data warehouse is a specially architected database that primarily consists of two types of tables: dimensions and facts. But crafting these tables is just the beginning of designing a complete data warehouse solution. Naturally, only the simplest warehouse solutions could be accomplished using just these primitive table types. Business requirements consistently force data warehouse designers to embellish on these basic table types to develop creative and effective business solutions.

For example, as I'll describe in this column, there's a powerful technique you can add to any "slowly changing" dimension (SCD) table — extending it well beyond its typical capabilities. Remember that SCDs greatly boost the basic dimensional model. SCDs help achieve the primary purpose of the data warehouse: to enable users to understand how changes over time affect the business. We as data warehouse designers need to accurately capture change and allow the business to perform analysis on these changes to make informed decisions.

Matter of Fact

The SCD comes in a few different flavors. The most interesting is commonly referred to as the Type 2 SCD. The Type 2 SCD records changes of the values of dimensional entity attributes over time. The technique requires adding a new row to the dimension each time there's a change in the value of an attribute (or group of attributes) and assigning a unique surrogate key to the new row.

The Type 2 SCD works in conjunction with associated fact tables. The fact table must hold the surrogate key that matches the appropriate surrogate key in the dimension table, thereby correctly reflecting the precise characteristics of the dimension at the time the fact occurred. But this powerful solution, which associates exactly the right dimensional context to each fact table record, will not divulge all the answers required of the data warehouse. What happens when questions are asked of the dimension independently, without an associated fact record?

Difficult questions include: "When did this customer's marital status change?" "When was the first sale to this customer with a changed marital status?" "Which other attributes typically change when a customer's marital status changes?" and, "What group of attributes usually change together?"

An ordinary Type 2 SCD doesn't easily support these types of questions, largely because of the following two issues:

  • Even when the new row in the dimension correctly tracks a change, it's impossible to examine the row in isolation to determine exactly what changed, especially if more than one attribute has been altered.
  • Comparing one dimension row with another to determine what changed involves performing analysis across rows; something at which SQL is notoriously bad and that requires highly developed SQL techniques.

Nonetheless, the data warehouse must be able to provide answers to these types of questions regardless of associated facts. Furthermore, it must reveal these answers without requiring the users to have a Ph.D. in SQL.

Follow the Policy

A solution to this problem is to adapt a technique Ralph Kimball developed for handling multivalued dimensions. (See the DBMS magazine column "Help for Dimensional Modeling"). Our solution, shown in Figure 1, includes a typical Type 2 customer SCD, an SCD group bridge (or helper) table, and an SCD column outrigger table.

Figure 1 A typcial Type 2 SCD, an SCD group bridge table, and an SCD column outrigger table.

For this example, let's assume we need to capture customer changes and we've established an SCD policy to capture the history of attribute changes in the Last_Name, Marital_Status, City, and State fields. (Remember, most well-designed customer dimensions contain at least 50 attributes.) This policy means that when the values in these four columns change between the last load and the current, a new row will be added to the customer dimension. Changes in other attributes will trigger only a simple Type 1 (overwrite) change.

Step By Step

You start by creating the customer SCD column table and loading it with the list of the four columns defined in our SCD policy, giving each a unique key. This table functions as metadata for ETL developers, detailing exactly which columns need to track changes. This table can also be published to inform users about exactly which columns are in the SCD policy to partition history. Keeping track of the columns in the SCD policy is too often difficult for users and data warehouse managers alike, because more often than not this information can be found only in the ETL code.






IE Weekly Newsletter
Subscribe to the newsletter
    Email Address