The New DealA model-driven approach to BI and data warehousing puts analytical power back in step with business requirements. By Neil Raden March 20, 2004
Data warehousing arrived 20 years ago to solve a set of problems that, to a large extent, no longer exist. Computing resources were once scarce and prohibitively expensive; dozens of homegrown systems scattered data everywhere and carried their own sets of semantics, business rules, and data management. Businesses simply could not get the information in a timely manner to do reporting, analysis, and planning. Today, computing resources are practically boundless; though not inexpensive, the cost of servers, networks, and storage as a proportion of total project expenses has dropped dramatically. Data is more organized, in fewer systems, and getting better all the time as a result of general perceptions that it has value beyond the application programs that produce it and that good data quality and architecture are enterprise requirements. Plus, organizations are benefiting from what the 1990s brought us: not only universal connectivity (the Internet) but also universal information access (the Web). It stands to reason that, given these fundamental changes in the problem space, the solution set would change as well. But it hasn't. Data warehousing remains stubbornly focused on data and databases instead of information processes, business models, and closed-loop solutions. Methodologies and best practices for data warehousing have barely budged. Our approach to building data warehouses and business intelligence (BI) environments around them is out of step with the reality of today's information technology. The Truth HurtsA good example is the notion of a "single version of the truth." Many use this to justify the long, expensive construction of enterprise data warehouses that require person-years of modeling just to get started and extensive development to put into production and many never get that far. The data warehouse never was a good repository of truth; it is merely a source for downstream applications, which add value to the data through application of formulas, aggregations, and other business rules. In addition, the functional reach of the source systems (such as ERP, customer relationship, and supply chain management) is now so broad that, unlike the operational systems of 20 years ago, much of the "truth" is already contained in and reported from them. The rigidity of data warehouses endows them with calamitous potential. Current best practices in enterprise systems development call for formal approaches, such as the Zachman Framework. Because the architectures are designed first and then based on the data modeler's grasp of requirements gathered at the start of the process rendered into relational data models, organizations burn a version of reality into their data warehouse circuitry right from the beginning. Unfortunately, no one knows ahead of time what the data warehouse requirements really will be. The gathered "requirements" should be suspect from the beginning. And because relational database data models lack fuller expression, boiling down requirements into one takes us one step further away from a real working business model. A data model cannot capture the subtleties and dynamics of the business. Such designs have the fluidity of poured concrete: that is, only fluid until set. Later, as real requirements begin to emerge, the only choices are to jackhammer designs and start over (rarely happens) or to add more layers of design (data marts, operational data stores, cubes, and so forth), gradually building a structure that is not only expensive and brittle, but also difficult to maintain. Current methodologies stress the need for iterations: an indication that participants agree that it's not possible to specify a data warehouse all at once. Never, however, is it made clear what's supposed to happen with the previous version of the data model. Or, for that matter, the data models: a layered design like the Corporate Information Factory has no less than seven schema types, and potentially dozens of schemas in the single warehouse. Organizations direct extract, transform, and load (ETL) development against the physical schemas. Most BI tools interact directly with the physical schemas. Does it make sense to have each of the iterations invalidate previous efforts?
|
New on the BLOG
Is Gartner's Quadrant the Problem, Or Is It How It's Used?
02. 8.2010
Read more from Cindi Howson >>
Add "customer" to Jimi Hendrix' song title and you have a question central to last week's Clarabridge Customer Connections (C3) conference, Are You Customer Experienced? 02. 5.2010 Read more from Seth Grimes >> Quick Thoughts on Sybase/Aleri 02. 4.2010
Read more from Curt Monash >> Most Popular This Week
Intelligent Enterprise Newsletters
Subscribe Here:
| |||||||||||||||||
|
|




