Welcome Guest. | Log In| Register | Membership Benefits

Intelligent Enterprise

Better Insight for Business Decisions

Intelligent Enterprise - Better Insight for Business Decisions
search Intelligent Enterprise
Home
Digital Library
Events
RSS | Newsletters
Webcasts




March 1, 2003

Declaring the Grain

It's the most important dimensional design step after identifying data sources

by Ralph Kimball

In debugging literally thousands of dimensional designs from my students over the years, I have found that the most frequent design error by far is not declaring the grain of the fact table at the beginning of the design process. If the grain isn't clearly defined, the whole design rests on quicksand. Discussions about candidate dimensions go around in circles, and rogue facts that introduce application errors sneak into the design.

Declaring the grain means saying exactly what a fact table record represents. Remember that a fact table record captures a measurement. Example declarations of the grain include:

  • An individual line item on a customer's retail sales ticket as measured by a scanner device
  • An individual transaction against an insurance policy
  • A line item on a bill received from a doctor
  • A boarding pass used by someone on an airplane flight
  • An inventory measurement taken every week for every product in every store.

In Business Terms

Notice that most of these grain declarations are expressed in business terms. Perhaps you were expecting the grain to be a traditional declaration of the fact table's primary key. Although the grain ultimately is equivalent to the primary key, it's a mistake to list a set of dimensions and then assume that this list is the grain declaration. This is the most common mistake in my students' designs. In a properly executed dimensional design, the grain is first anchored to a clear business object (no pun intended) and a set of business rules. Then, the dimensions that implement that grain become obvious.

So, when you make a grain declaration, you can then have a very precise discussion of which dimensions are possible and which are not. For example, a line item of a doctor's bill likely would have the following dimensions:

  • Date (of treatment)
  • Doctor (may be called "provider")
  • Patient
  • Procedure
  • Primary Diagnosis
  • Location (presumably the doctor's office)
  • Billing Organization (an organization the doctor belongs to)
  • Responsible Party (either the patient or the patient's legal guardian)
  • Primary Payer (often an insurance plan)
  • Secondary Payer (maybe the responsible party's spouse's insurance plan).

And quite possibly others.

Powerful Effects

If you've been following this example, I hope you've noticed some powerful effects from declaring the grain. First, you can visualize the dimensionality of the doctor bill line item very precisely, and you can therefore confidently examine your data sources, deciding whether or not a dimension can be attached to this data. For example, you probably would exclude "treatment outcome" from this example because most medical billing data doesn't tie to any notion of outcome.

But a general entity/relation-oriented "data model" of doctor visits might well include treatment outcome. After all, in an abstract sense, doesn't every treatment have an outcome?






IE Weekly Newsletter
Subscribe to the newsletter
    Email Address