The Soul of the Data Warehouse, Part 3: Handling TimeThe data warehouse takes a pledge to preserve historyThe three most fundamental maneuvers in every data warehouse are drilling down, drilling across, and handling time. I discussed the first two maneuvers in the previous two installments of this Fundamentals series. (See Resources.) The third, handling time, makes good on a pledge that every data warehouse provider implicitly takes: The data warehouse shall preserve history. In practice, this pledge generates three main requirements for the data warehouse: First, every piece of data in the data warehouse must have a clearly understood time validity. In other words, when did the data become valid, and when did it cease to be valid? Second, if the detailed description of a data warehouse entity has changed over time, you must correctly associate each version of that entity with the contemporary versions of other measurements and entities in the data warehouse. In other words, if a customer made a purchase a year ago, the description of the customer attached to that purchase must be correct for that time frame. Last, the data warehouse must support the natural ways people have of viewing data over time. These natural ways include seeing instantaneous events, regular periodic reports, and latest status. Dimensional modeling provides a convenient framework for dealing with each of these requirements. Remember that dimensional models are organized around measurements. Measurements, which are usually numeric, occupy fact tables in a dimensional model. The contexts of the measurements are in dimension tables, which surround the fact tables and connect to them through a series of simple relationships between foreign keys and primary keys. Time ValidityA measurement is usually a physical act that takes place at a specific time, so it's natural, even irresistible, to attach a time stamp to each fact table record. Every fact table has a time dimension. Time stamps are commonly recorded at a daily grain because many legacy systems don't record time of day when posting a measurement. In a dimensional schema, the daily time stamp consists of a surrogate (vanilla integer) foreign key in the fact table joined to a corresponding primary key in the daily time dimension table. You want the time stamp in the fact table to be a surrogate key rather than a real date for three reasons: First, the rare time stamp that is inapplicable, corrupted, or hasn't happened yet needs a value that cannot be a real date. Second, most end-user calendar navigation constraints, such as fiscal periods, end-of-periods, holidays, day numbers, and week numbers aren't supported by database time stamps. Therefore, they need to come from a table with a verbose time dimension, rather than computed in the requesting query tool. Third, integer time keys take up much less disk space than full dates. When the source system provides a detailed time stamp for the measurement down to the minute or the second, the time of day needs to be a separate dimension. Otherwise, a combined day and time-of-day dimension would be impractically large. In multinational applications, there are often two time-stamp perspectives: the remote party's and the home office's. Certainly, when recording the time of day, it usually makes sense to include two pairs of time stamps (day and time-of-day as well as remote and local) rather than leaving it up to the query tool to work out time zones. Mainly because of daylight savings time rules, the calculation of time zone differences is horrendously complicated. (For more on this topic, see my book, Data Webhouse Toolkit.) Correct AssociationDimensional modeling assumes that dimensions are largely independent. This assumption, combined with the fact that time is its own dimension, implies that other entities, such as customer and product, are independent of time. In the real world, this inference isn't quite true. Entities such as customer and product slowly change over time, usually in episodic, unpredictable ways. When the data warehouse encounters a legitimate revised description of, for example, a customer, there are three fundamental choices for handling this slowly changing dimension:
|
Most Popular This Week
IE Weekly Newsletter
Subscribe to the newsletter
|
|
|




