Two Powerful IdeasThe foundations for modern data warehousingContinued from Page 1 Another example of a high-end analytic tool the data warehouse should avoid is the allocation system for assigning costs to the various lines of business in your organization so as to compute overall profitability. Not only can this be a complex processing step outside the capabilities of most query and reporting tools, it's also a political hot potato. Let the finance department do the allocations, and you (the data warehouse) will be glad to store the results. Symmetrical Stars and CubesMost presentation areas today are dominated by relational star schemas and multi-dimensional OLAP data cubes. These data structures have proven over the last 30 years to be the ones end users can understand. Remember that understandability is one of our two nonnegotiable design constraints. The simplicity of the star schemas and OLAP cubes has allowed smart software designers to focus on very powerful algorithms for rapid query performance. Remember that speed is the other nonnegotiable design constraint. The symmetry of both the star schema and OLAP cube also makes for:
Of course, the star schema and OLAP cube are intimately related. Star schemas are most appropriate for very large data sets, with many millions or billions of numerical measurements, or many millions of members in a customer entity or a product entity. OLAP cubes are most appropriate for smaller data sets where analytic tools can perform complex data comparisons and calculations. In almost all OLAP cube environments, it's recommended that you originally source data into a star schema structure, and then use wizards to transform the data into the OLAP cube. In that way, all the complex staging area ETL tools that deal with flat files and entity/relationship schemas can be part of the OLAP data pipeline. And, of course, hybrid star schema-OLAP systems allow very large data sets in star schema format to be smooth drill-down targets from smaller OLAP data cubes, all under a single user interface. The Big PayoffThe final big payoff for building the presentation system in the data warehouse around symmetrical star schemas and OLAP cubes is the predictable set of points of commonality for linking together data from across the enterprise. In my next column, I'll lay bare the techniques for conforming the definitions of the dimensions and the facts across all the disparate data sources of your large enterprise. These conformed dimensions and facts will be the basis for a data warehouse bus architecture a set of standard connection points that provide power to your data warehouse, just like the bus bar in a power station provides power to all the transmission lines, and just like the bus in your computer provides data to all the peripherals. What Have We Accomplished?So far we've implemented two powerful ideas. First we've logically, physically, and administratively separated the systems in our environment into four distinct types. You really do need four different computer systems, but you're responsible for only two of them! Our two data warehouse systems also allow us to separate the incompatible responsibilities of data staging and end-user querying. Second, we've populated our data warehouse presentation area with star schemas and OLAP cubes, the only structures that are understandable, and fast, and can stand up to the ad hoc attack. Although we clearly haven't addressed all the complex design constraints and unavoidable realities, we've chipped away at an impressive part of the overall challenge, just by leveraging these two powerful ideas. Go back to my previous column and look through the list. We've effectively addressed large parts of understandability, query speed, all three types of costs mentioned in that column, the risks of inappropriate centralization, the need for incremental development, handling continuous change consisting of little surprises and big surprises, and how to think about the role of data marts. Maybe there's hope. To be continued.... Ralph Kimball [www.ralphkimball.com] co-invented the Star Workstation at Xerox and founded Red Brick Systems. He has three best-selling data warehousing books in print, including The Data Warehouse Toolkit, Second Edition (Wiley, 2002). He teaches dimensional data warehouse design through Kimball University and critically reviews large data warehouse projects. RESOURCESRelated Article at IntelligentEnterprise.com: "Design Constraints and Unavoidable Realities," Sept. 3, 2002
|
Most Popular This Week
IE Weekly Newsletter
Subscribe to the newsletter
|
|
|



