Welcome Guest. | Log In| Register | Membership Benefits

Intelligent Enterprise

Better Insight for Business Decisions

Intelligent Enterprise - Better Insight for Business Decisions
search Intelligent Enterprise
Home
Digital Library
Events
RSS | Newsletters
Webcasts




March 20, 2003

The Soul of the Data Warehouse, Part One: Drilling Down

Drilling down just means "show me more detail"

by Ralph Kimball

Continued from Page 1

2. The data warehouse must therefore support drilling down at the user interface level, at all times, with the most atomic data possible because the most atomic data is the most dimensional. The most atomic data is the most expressive; more dimensions are attached to atomic data than to any form of aggregated or rolled-up data.

3. Combining the first two points means that for all practical purposes, the atomic data must be in the same schema format as any aggregated form of the data: The atomic data must be a smoothly accessible target for drill-down paths using standard ad hoc query tools. Failure in this area is the showstopper for a weird architecture you'll occasionally hear about, one in which atomic data is hidden in the back room in an entity-relation physical format, and somehow is accessed after "drilling through" aggregated data marts in dimensional formats. The proponents of this architecture have never explained how this magic occurs and you could conclude they've never implemented a system that used commercial query tools to drill down from aggregated data to atomic data. Fortunately, this crisis evaporates if you use the same data structures at all levels of aggregation including the atomic level. See the next point.

4. To build a practical system for drilling down, you want standard ad hoc query tools to present the drill-down choices without special schema-dependent programming, and you want these tools to emit the correct resulting SQL without schema-dependent programming. Schema-dependent programming is the kiss of death for a data warehouse shop, because it means that each schema requires custom-built applications. This problem was a crisis on the 1980s, and there's no excuse for it to remain a problem now. Avoiding schema-dependent programming means choosing a standard schema methodology for all end-user-facing data sets. I call these end-user-facing data sets the presentation layer of the data warehouse.

5. Only one standard schema methodology exists that's capable of expressing data in a single, uniform format that looks the same at the atomic layers as in all aggregated layers, and at the same time requires no schema-dependent programming: the star schema, otherwise known as the dimensional model. Star schemas support all forms of drilling down described in this column. All possible many-to-one and many-to-many data relationships are capable of representation in a star schema, thus the star schema is the ideal platform for ad hoc querying.

6. The star schema design in the presentation layer smoothly supports prebuilt aggregation tables and snowflake designs. Aggregations and snowflakes are related. An aggregated fact table is a mechanically derived table of summary records. The most common reason to build aggregated fact tables is that they offer immense performance advantages compared to using the large, atomic fact tables. But you get this performance boost only when the user asks for an aggregated result! The first example query asking for the manufacturer sales of products was a good example of an aggregated result.

A modern data warehouse environment uses a query-rewrite facility called an aggregate navigator to choose a prebuilt aggregate table whenever possible. Each time the end user asks for a new drill-down path, the aggregate navigator decides in real time which aggregate fact table will support the query most efficiently. Whenever the user asks for a sufficiently precise and unexpected drill down, the aggregate navigator gracefully defaults to the atomic data layer.

Drilling down is probably the most basic capability that a data warehouse needs to support. Drilling down most directly addresses the natural end-user need to see more detail in an interesting result. In the next Fundamentals column, I'll describe drilling across to new data sources, a companion technique to drilling down within the same data source. I'll reveal how it's possible to combine these techniques, drilling down and across at the same time.



Rate This Article

Comments:

Optional e-mail address:

Ralph Kimball (founder of the Ralph Kimball Group) co-invented the Star Workstation at Xerox and founded Red Brick Systems. He has three best-selling data warehousing books in print, including The Data Warehouse Toolkit, Second Edition (Wiley, 2002). He teaches dimensional data warehouse design through Kimball University and critically reviews large data warehouse projects. You can reach him through his Web site, www.ralphkimball.com.








IE Weekly Newsletter
Subscribe to the newsletter
    Email Address