Kimball University: Five Alternatives for Better Employee Dimension ModelingThe employee dimension presents one of the trickier challenges in data warehouse modeling. These five approaches ease the complication of designing and maintaining a 'Reports To' hierarchy for ever-changing reporting relationships and organizational structures. By Joy Mundy August 17, 2009
The employee dimension presents one of the trickier challenges in data warehouse modeling. These five approaches ease the complication of designing and maintaining a 'Reports To' hierarchy for ever-changing reporting relationships and organizational structures. Most enterprise data warehouses will eventually include an Employee dimension. This dimension can be richly decorated, including not only name and contact information, but also job-related attributes such as job title, departmental cost codes, hire dates, even salary-related information. One very important attribute of an employee is the identity of the employee's manager. For any manager, we'd like to work down the Reports To hierarchy, finding activity for her direct reports or her entire organization. For any employee, we'd like to work up the hierarchy, identifying his entire management chain. This Reports To hierarchy presents significant design and management challenges to the unwary. This article describes approaches for including this relationship in a dimensional model. The Employee Dimension The basic structure of the Employee dimension is shown in Figure 1. The unique feature of a Reports To hierarchy is that a manager is also an employee, so Employee has a foreign key reference to itself, from Manager Key to Employee Key.
![]() However, if you want to query this table in the relational environment, you'd have to use a CONNECT BY syntax. This is very unattractive and probably unworkable:
Alternative 1: Bridge Table using Surrogate Keys The classic solution to the Reports To or variable-depth hierarchy problem is a bridge table technique described in The Date Warehouse Toolkit (Wiley 2002), p.162-168 and illustrated by Figure 2. The same Employee dimension table as above relates to the fact table through a bridge table.
![]() The Reports To Bridge table contains one row for each pathway from a person to any person below him in the hierarchy, both direct and indirect reports, plus an additional row for his relationship to himself. This structure can be used to report on each person's activity; the activity of their entire organization; or activity down a specified number of levels from the manager. There are several minor disadvantages to this design:
The major challenge comes when we want to manage Employee and the Reports To hierarchy as a Type 2 dimension -- a dimension for which we are tracking history rather than updating in place. This bridge table would still work in theory; the problem is the explosion of Employee and Reports To Bridge records to track the changes. To understand the problem, look back at Figure 1 and think about it as a Type 2 dimension for a medium-sized company with 20,000 employees. Imagine that the CEO -- the top of the hierarchy -- has 10 senior VPs reporting to her. Let's give her a Type 2 change that generates a new row and hence a new Employee Key. Now, how many employees are pointing to her as their manager? It's a brand new row, so of course no existing rows point to it; we need to propagate 10 new Type 2 rows for each of the senior VPs. The change ripples through the entire table. We end up replicating the complete Employee table because of one attribute change in one row. Even aside from the obvious implication of data volume explosion, simply teasing apart the logic of which rows need to be propagated is an ETL nightmare.
|
New on the BLOG
Text Data Quality: Mistakes and More
11.25.2009
Read more from Seth Grimes >>
Data warehouse appliance and software appliance vendors like to claim that they've worked out just the right hardware configuration(s), and that a single configuration is correct for a fairly broad range of workloads. But there are a lot of reasons to be dubious about that. Specific vendor evidence includes... 11.24.2009 Read more from Curt Monash >> Google Chrome OS: Don't Link it to Cloud Computing 11.23.2009
Read more from David Linthicum >> Most Popular This Week
Intelligent Enterprise Newsletters
Subscribe Here:
| |||||||||||||||||
|
|






