Managing Your ParentsBe mindful of reporting needs when designing parent and child fact tablesBy Ralph Kimball The parent-child data relationship is one of the fundamental structures in the business world. An invoice (the parent), for instance, comprises many line items (the children). Other examples include orders, bills of lading, insurance policies, and retail sales tickets. Basically, any business document with an embedded repeating group qualifies as a parent-child application, especially when the embedded line items contain interesting numerical measurements such as dollars or physical units. Parent-child applications are extremely important to data warehousing because most of the basic control documents that transfer money and goods (or services) from place to place take the parent-child form. But a parent-child source of data presents a classic design dilemma. Some of the data is available at only the parent level and some at only the child level. Do you need two fact tables in your dimensional model, or can you do with just one? And what do you do with the data that is available at only the parent level when you want to drill down to the child level? Imagine a typical product sales invoice. Each line item on the invoice represents a different product sold to the customer. The parent-level data includes:
The child-level data includes:
With the dimensions and facts spelled out, you might think you are done. You have two nice fact tables. The parent invoice fact table has five dimensions and five facts, and the child, line-item fact table has seven dimensions and three facts. But this design is a failure. You can't roll up your business by product! If you constrain by a specific product, you don't know what to do with invoice-level discounts, freight charges, and tax. All of the business's higher-level rollups are forced to omit the product dimension. In most businesses, this omission is unacceptable. There is only one way to fix this problem. You have to take the invoice-level data and allocate down to the line-item level. Yes, this allocation is somewhat controversial and yes, you must make some arbitrary decisions in the process. But the alternative is not being able to analyze your business in terms of your products. Replace the two fact tables with a single fact table whose grain is the invoice line item. In other words, you will consistently drop to the most atomic child level when you create a parent-child dimensional design. When you design a fact table around a specific type of measurement, you "decorate" the measurement with everything you know to be true at that time. The measurements in this case are found in the context of individual line items. Everything at the line-item and invoice levels is true at the time of the measurement. So your single line-item grain fact table has the following dimensions:
What do you do with the invoice number? It is certainly single-valued, even at the line-item level, but you have already "exposed" everything you know about the invoice in your first six dimensions. You should keep the invoice number in the design. But you don't need to make a dimension out of it because that dimension would turn out to be empty. We call this characteristic result a "degenerate dimension."
|
Most Popular This Week
IE Weekly Newsletter
Subscribe to the newsletter
|
|
|




