Welcome Guest. | Log In| Register | Membership Benefits

Intelligent Enterprise

Better Insight for Business Decisions

Intelligent Enterprise - Better Insight for Business Decisions
search Intelligent Enterprise
Home
Digital Library
Events
RSS | Newsletters
Webcasts




June 29, 2001



A Trio of Interesting Snowflakes

Beat three common modeling challenges with extensions of the dimensional model

By Ralph Kimball

"When can I use a snowflake?" is a question data warehouse designers have asked me hundreds of times. I usually answer that it's a bad idea to expose the end users to a physical snowflake design, because it almost always compromises understandability and performance. But in certain situations a snowflake design is not only acceptable, but recommended.

CLASSIC SNOWFLAKE

The way to create a classic snowflake, let us remind ourselves, is to remove low cardinality attributes from a dimension table and place these attributes in a secondary dimension table connected by a snowflake key. In cases where a set of attributes form a multilevel hierarchy, the resulting string of tables looks a little like a snowflake - hence the name.

A classic physical snowflake design may be useful in the backroom staging area as a way to enforce the many-to-one relationships in a dimension table. But in the front room presentation part of your data warehouse, you have to demonstrate to me that the end users find the snowflake easier to understand and, moreover, that queries and reports run faster with the snowflake, before I am comfortable with the snowflake design.

But having issued this warning, I have found three cases where variations on a snowflake are not only acceptable, but are the keys to a successful design.

LARGE CUSTOMER DIMENSIONS

The customer dimension is probably the most challenging dimension in a data warehouse. In a large organization, the customer dimension can be huge, with millions of records, and wide, with dozens of attributes.

To make matters worse, the biggest customer dimensions commonly contain two categories of customers, which I will call "visitor" and "customer."

Visitors are anonymous. You may see them more than once, but you don't know their names or anything else about them. On a Web site, the only knowledge you have about visitors is a cookie indicating they have returned. In a retail operation, a visitor engages in an anonymous transaction.

Customers, conversely, are reliably registered with your company. You know customers' names, addresses, and as much demographic and historical data as you care to elicit directly from them or purchase from third parties.

Let us assume that at the most granular level of your data collection, 80 percent of the fact table measurements involve visitors and 20 percent involve customers. You accumulate just two simple behavior scores for visitors consisting only of recency (when they last visited you) and frequency (how many times they have visited).

On the other hand, let us assume you have 50 attributes and measures for a customer, covering all the components of location, payment behavior, credit behavior, directly elicited demographic attributes, and purchased demographic attributes.

Now you combine visitors and customers into a single logical dimension called shopper. You give the visitor or customer a single, permanent shopper ID, but make the key to the table a surrogate key so that you can track changes to the shopper over time. Logically, the shopper dimension has the following attributes.

The attributes for both visitors and customers are:

  • - Shopper surrogate key
  • - Shopper ID (fixed ID for each physical shopper)
  • - Recency
  • - Frequency.

Attributes for customers only are:

  • - Five name attributes
  • - 10 location attributes
  • - 10 behavior attributes
  • - 25 demographic attributes.





IE Weekly Newsletter
Subscribe to the newsletter
    Email Address