What Not to DoDimensional modeling mistakes to guard againstIn nearly all the 85 columns I have written for Intelligent Enterprise and its predecessor, DBMS, I described design techniques and criteria needed to build some part of a data warehouse. But despite all those columns, something is missing. The tone of the columns has almost always been imperative: "In situation A, use design techniques X, Y, and Z." I realize that data warehouse designers also need boundaries. So this column is devoted to design techniques not to use. I have restricted myself to dimensional modeling design techniques; a future column will take a broader view of the entire data warehouse life cycle. The following list of 12 dimensional modeling techniques to avoid runs in reverse order of importance. But even the first few mistakes I list can be enough to seriously compromise your data warehouse. This column is necessarily a broad-brush treatment. If you want more detail, please search the free Intelligent Enterprise/DBMS column archive on my Web site www.ralphkimball.com or at IntelligentEnterprise.com, look in my books, or email me and I'll send you my monthly design tips newsletter. Mistake 12: Place text attributes in a fact table if you mean to use them as the basis of constraining and grouping. Creating a dimensional model is a kind of triage. Start by identifying the numeric measurements delivered from an operational source. Those go in the fact table. Then identify the descriptive textual attributes from the context of the measurements. These go in the dimensions. Finally, make a case-by-case decision about the leftover codes and pseudonumeric items, placing them in the fact table if they are more like measurements, and in the dimension table if they are more like physical descriptions of something. But don't lose your nerve and leave true text in the fact table - especially comment fields! Get these text attributes off the main runway of your data warehouse and into dimension tables. Mistake 11: Limit the use of verbose descriptive attributes in dimensions to save space. You might think that you're being a good, conservative designer by keeping the size of your dimensions "under control." But in virtually every data warehouse, the dimension tables are geometrically smaller than the fact tables. So what if you have a 100MB product dimension table, if the fact table is 100 times as large! To design an easy-to-use data warehouse, you must supply as much verbose descriptive context in each dimension as you can. Make sure code is augmented with readable descriptive text. Better yet, remove the codes entirely. Remember that the textual attributes in the dimensions "implement" the user interface to browsing your data, provide the entry points for constraining, and supply the content for your row and column headers in the final reports. Mistake 10: Split hierarchies and hierarchy levels into multiple dimensions. A hierarchy is a cascaded series of many-to-one relationships. Many products roll up to a single brand. Many brands roll up to a single category, and so on. If your dimension is expressed at the lowest level of granularity (such as Product) then all the higher levels of the hierarchy can be expressed as unique values in the product record. Users understand hierarchies, and your job is to present them in the most natural and efficient way. A hierarchy belongs in a single, physical flat dimension table. Resist the urge to "snowflake" a hierarchy by generating a set of progressively smaller subdimension tables. Don't confuse back-room data cleaning with front-room data presenting! And finally, if you have more than one hierarchy existing simultaneously, in most cases it makes sense to include all the hierarchies in the same dimension, if the dimension has been defined at the lowest possible grain. Mistake 9: Delay dealing with a slowly changing dimension (SCD). Too many data warehouses are designed to regularly overwrite the most important dimensions, such as Customer and Product, from the underlying data sources. This goes against a basic data warehouse oath: The data warehouse will represent history accurately, even if the underlying data source does not. SCDs are an essential design element of every data warehouse and are discussed extensively in the references I mentioned previously. A good SCD design also needs to avoid the next mistake. Mistake 8: Use smart keys to join a dimension table to a fact table. Beginning data warehouse designers tend to be somewhat too literal-minded when designing the primary keys in dimension tables that must necessarily connect to the foreign keys of the fact table. It is counterproductive to declare a whole suite of dimension attributes as the dimension table key and then use them all as the basis of the physical join to the fact table. All sorts of ugly problems eventually arise. Replace the smart physical key with a simple integer surrogate key that is numbered sequentially from 1 to N (the number of records in the dimension table). Mistake 7: Add dimensions to a fact table before declaring its grain. All dimensional designs should start with the numeric measurements and work outward. First, identify the source of the measurements. Second, specify the exact granularity and meaning of the measurements. Third, surround these measurements with dimensions that are true to that grain. Staying true to the grain is a crucial step in the design of a dimensional data model. If you follow this rule, you will understand the next nontechnique.
|
Most Popular This Week
IE Weekly Newsletter
Subscribe to the newsletter
|
|
|




