What Changed?Use a multivalued outrigger table to add expressive power to your dimensionsby Joe Caserta, edited by Ralph Kimball Continued from Page 1 The next step is to create the customer SCD group table a two-column bridge table. Should your SCD policy be as simple as this one considering only four columns you could make the SCD group table prepopulate, loading the table with every possible combination of changed columns. Doing so would cause an initial load of 16 rows (24), eliminating the need for inserts during the incremental process. However, customer SCD policies can involve 20 columns or more. In these cases, the prepopulation method would generate 1,048,576 rows (220), with most combinations never being used. In this case, an incremental load approach is preferred. While incrementally loading, each time a new record is added to the customer dimension, the ETL process checks, by customer, to see which columns in the SCD policy have changed values. It then creates a list consisting of the keys for the respective columns, with an associated unique identifier for the group of columns the SCD group key. If a specific group of columns has already changed, the SCD group key is reused. The SCD group key that the ETL process generates is then included in the record being loaded into the customer dimension. The SCD group table now serves as a many-to-many table between the customer dimension and the SCD column outrigger as shown in Figure 2. Using this process, users can now find all occurrences of customer marital status changes, for example, by simply joining the three tables and constraining on the desired SCD column name. Similarly, users can find which attributes changed with the marital status by selecting the entire group of column names associated with the transaction (in this case the customer's last name). This correlation of attribute changes is beneficial because it provides new insight into your customers' behavior patterns. Understanding your customers' behavior is an integral part of your CRM solution. Think of the value the marketing team realizes when it can correlate customer demographic changes as they occur. It Gets BetterIn Figure 2, you'll notice that the SCD group key in the first record is 0. During the initial
load of the SCD, the first occurrence of each customer is given this group key so users can easily
identify when the customer's profile first entered the system. This alleviates users of the need to
write complex queries consisting of Figure 2: The SCD group table serves as a many-to-many table
between the customer dimension and the SCD column outrigger. You'll also notice the customer dimension includes three transaction fields that support the SCD
process. The Trans_Effective_Date is the date the attribute group change took effect in the
transaction system. The Trans_End_Date of the last previously loaded record is updated to be exactly
the same as the Trans_Effective_Date of the next record being inserted for the customer. For the
last transaction of each customer, the Trans_End_Date is set to 12/31/9999 (the latest date allowed
by most DBMSs). Users can now use the The Last_Trans_Flag for the new record being inserted is set to Y and the previous records for the customer are set to N. This flag allows users to find the current version of the customer very quickly. You can create views (or roles) of the customer dimension to further enhance the functionality of the data warehouse. By creating a view of the customer dimension with a constraint on the SCD_Group_Key equal to 0, users can drill instantly to the starting profile of the customer. A view of the customer dimension with a constraint on the Last_Trans_Flag equal to Y would allow users to drill to the customer's current profile. Both of these profiles are meant to be revealed while the customer dimension is associated with any fact, at any point in time, by simply constraining the customer dimension to the role via the customer's natural key. As you can see, answering any of the questions asked at the beginning of this column can now be accomplished much more easily. Obviously, this technique is not limited to customers. Think of its application to products, employees, vendors, or service policies. Guest columnist Joe Caserta [joe@casertaconcepts.com] is the founder of Caserta Concepts LLC, a data warehouse architecture and implementation consulting firm.
|
Most Popular This Week
IE Weekly Newsletter
Subscribe to the newsletter
|
|
|




