Kimball University: Five Alternatives for Better Employee Dimension Modeling > > Intelligent Enterprise: Better Insight for Business Decisions

Welcome Guest. | Log In| Register | Membership Benefits

Intelligent Enterprise

Better Insight for Business Decisions

Intelligent Enterprise - Better Insight for Business Decisions
search Intelligent Enterprise
RSS
Webcasts
Digital Library
Subscribe
Home


  • EMAIL
  • PRINT
  • REPRINTS
  • Follow Us on Twitter
  • FOLLOW US
  • Share

Kimball University: Five Alternatives for Better Employee Dimension Modeling


The employee dimension presents one of the trickier challenges in data warehouse modeling. These five approaches ease the complication of designing and maintaining a 'Reports To' hierarchy for ever-changing reporting relationships and organizational structures.


By Joy Mundy
August 17, 2009

The employee dimension presents one of the trickier challenges in data warehouse modeling. These five approaches ease the complication of designing and maintaining a 'Reports To' hierarchy for ever-changing reporting relationships and organizational structures.

Most enterprise data warehouses will eventually include an Employee dimension. This dimension can be richly decorated, including not only name and contact information, but also job-related attributes such as job title, departmental cost codes, hire dates, even salary-related information. One very important attribute of an employee is the identity of the employee's manager. For any manager, we'd like to work down the Reports To hierarchy, finding activity for her direct reports or her entire organization. For any employee, we'd like to work up the hierarchy, identifying his entire management chain. This Reports To hierarchy presents significant design and management challenges to the unwary. This article describes approaches for including this relationship in a dimensional model. The Employee Dimension

The basic structure of the Employee dimension is shown in Figure 1. The unique feature of a Reports To hierarchy is that a manager is also an employee, so Employee has a foreign key reference to itself, from Manager Key to Employee Key.

Someone new to dimensional modeling might leave the table as it is currently designed as the Manager/Employee relationship is fully described. Assuming you can populate the table, this design will work if an OLAP environment is used to query the data. Popular OLAP tools contain a Parent-Child hierarchy structure that works smoothly and elegantly against a variable-depth hierarchy modeled as shown here. This is one of the strengths of an OLAP tool.

However, if you want to query this table in the relational environment, you'd have to use a CONNECT BY syntax. This is very unattractive and probably unworkable:

  • Not every SQL engine supports CONNECT BY.
  • Even SQL engines that support CONNECT BY may not support a GROUP BY in the same query.
  • Not every ad hoc query tool supports CONNECT BY.

Alternative 1: Bridge Table using Surrogate Keys

The classic solution to the Reports To or variable-depth hierarchy problem is a bridge table technique described in The Date Warehouse Toolkit (Wiley 2002), p.162-168 and illustrated by Figure 2. The same Employee dimension table as above relates to the fact table through a bridge table.


The Reports To Bridge table contains one row for each pathway from a person to any person below him in the hierarchy, both direct and indirect reports, plus an additional row for his relationship to himself. This structure can be used to report on each person's activity; the activity of their entire organization; or activity down a specified number of levels from the manager.

There are several minor disadvantages to this design:

  • The bridge table is somewhat challenging to build.
  • The bridge table has many rows in it, so query performance can suffer.
  • The user experience is somewhat complicated for ad hoc use, though we've seen many analysts use it effectively.
  • In order to drill up -- to aggregate information up rather than down a management chain -- the join paths have to be reversed.

The major challenge comes when we want to manage Employee and the Reports To hierarchy as a Type 2 dimension -- a dimension for which we are tracking history rather than updating in place. This bridge table would still work in theory; the problem is the explosion of Employee and Reports To Bridge records to track the changes.

To understand the problem, look back at Figure 1 and think about it as a Type 2 dimension for a medium-sized company with 20,000 employees. Imagine that the CEO -- the top of the hierarchy -- has 10 senior VPs reporting to her. Let's give her a Type 2 change that generates a new row and hence a new Employee Key. Now, how many employees are pointing to her as their manager? It's a brand new row, so of course no existing rows point to it; we need to propagate 10 new Type 2 rows for each of the senior VPs. The change ripples through the entire table. We end up replicating the complete Employee table because of one attribute change in one row. Even aside from the obvious implication of data volume explosion, simply teasing apart the logic of which rows need to be propagated is an ETL nightmare.


  • EMAIL
  • PRINT
  • REPRINTS
  • Follow Us on Twitter
  • FOLLOW US
  • Share


 





New on the BLOG
Text Data Quality: Mistakes and More
11.25.2009
blog author
Seth Grimes
I wrote recently on Text Data Quality, looking at issues that affect analytical accuracy, that "the basic text data quality issue is that humans make mistakes, and the challenge is that people's natural-language mistakes defy easy, automated detection." This topic and related non-erroneous vagaries of human language bear further exploration...

Read more from Seth Grimes >>

Curt Monash
Reports of Perfectly-Balanced Hardware Configurations are Greatly Exaggerated
Data warehouse appliance and software appliance vendors like to claim that they've worked out just the right hardware configuration(s), and that a single configuration is correct for a fairly broad range of workloads. But there are a lot of reasons to be dubious about that. Specific vendor evidence includes...

11.24.2009
Read more from Curt Monash >>

Google Chrome OS: Don't Link it to Cloud Computing
11.23.2009
blog author
David Linthicum
With much fanfare, the Google Chrome OS launched last week. Chrome OS is a Web operating system that boots quickly, right into a browser... We've been here before... And I would rather not bind Chrome to cloud computing because I don't think the OS will be around long.

Read more from David Linthicum >>



Intelligent Enterprise Newsletters
Subscribe Here:
*Email:
 First Name:
 Last Name:
  Intelligent Enterprise Blogosphere Newsletter:
  Intelligent Enterprise Newsletter:

Email Type: