Intelligent Enterprise

Better Insight for Business Decisions

November 16, 1999, Volume 2 Number 16

The hot response cache is a supercharged aggregate navigator

Working in Web Time


Ralph Kimball

Web-created demands are drawing the data warehouse ever closer to the front line of operational reporting and operational response generation, forcing us to rethink our data warehouse architecture. Ten years ago we considered the data warehouse a kind of background resource for management, who queried it in a non-urgent, contemplative mode. But today's dramatically increased pace of business decision-making requires not only a comprehensive snapshot of the business in real time but also, simultaneously, answers to broad questions about customer behavior.

With this data warehouse evolution, we have managed to make three big technical design factors more difficult, all at once:

Timeliness. Business results must now be available in real time. “As of the previous day” reporting, on the wish list two years ago, is no longer fast enough. Increasingly efficient delivery pipelines with smaller, just-in-time inventories, along with mass customization, force us to understand and respond to demand quickly.

Data Volumes. The big move to mass customization means we now capture, analyze, and respond to every transaction in the business, including every gesture a customer makes both before and after operational or sales transactions. There seems to be no volume limit. For instance, the combined Microsoft-related Web sites, analyzed daily as a single entity, have captured more than a billion page events on some busy days!

Response Times. The Web makes fast response times critical. If something useful doesn't happen within 10 seconds, I'm on to another page. Those of us who run big data warehouses know that many queries will take more than 10 seconds. But our pleas to the users to be understanding of performance issues are falling on deaf ears.

As these design factors have become more difficult, we find ourselves supporting a broader continuum of users and requests. With the increased operational focus of the data warehouse, and the increased ability of many people worldwide to present themselves at our Web site doorstep, we must provide data warehouse services to a widely varying mix of external customers, business partners, and suppliers as well as internal salespeople, employees, analysts, and executives. We must deliver a mixture of query results, top-line reports, data mining results, status updates, support answers, custom greetings, images, and downloadable OLAP cubes. Most of these things aren't nice rows from an answer set. They are messy, complex objects.

To address these issues, we need to adjust our data warehouse architecture. We can't just make our single database server increasingly powerful. We can't make it deliver all these complicated objects and hope to keep up with these escalating requirements.

FIGURE 1 The data webhouse architecture, showing the role of the hot response cache. The data webhouse application server runs batch jobs to create the contents of the cache.



One way to take pressure off the main database engines is to build a powerful hot response cache (see Figure 1) that anticipates as many of the predictable and repeated information requests as possible. The hot response cache adjoins the application servers that feed the public Web server and the private firewall entry point for employees. A series of batch jobs running in the main webhouse application server creates the cache's data. Once stored in the hot response cache, the data objects can be fetched on demand through either a public Web server application or a private firewall application.
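As a rough illustration of the batch side, here is a minimal Python sketch of a job that precomputes objects and stores each one as a file. The function name `run_batch_job`, the directory layout, and the JSON format are all assumptions made for this example; the article does not prescribe any of them.

```python
import json
import tempfile
from pathlib import Path


def run_batch_job(cache_root: Path, kind: str, objects: dict) -> int:
    """Write each precomputed object as one file under cache_root/kind/."""
    target_dir = cache_root / kind
    target_dir.mkdir(parents=True, exist_ok=True)
    for key, payload in objects.items():
        final_path = target_dir / f"{key}.json"
        tmp_path = target_dir / f"{key}.json.tmp"
        # Write under a temporary name, then rename, so that an application
        # server probing the cache never reads a half-written file.
        tmp_path.write_text(json.dumps(payload), encoding="utf-8")
        tmp_path.replace(final_path)
    return len(objects)


# Hypothetical run: one custom greeting for one visitor.
cache_root = Path(tempfile.mkdtemp())
written = run_batch_job(
    cache_root,
    "greeting",
    {"visitor-1001": {"text": "Welcome back!"}},
)
```

The write-then-rename step is a design choice of this sketch: it lets the batch job refresh objects while the cache is serving requests.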

The fetched items are complex file objects, not low-level data elements. The hot response cache is therefore a file server, not a database. Its file storage hierarchy will inevitably be a simple kind of lookup structure, but it does not need to support a complex query access method.
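To make the "simple lookup structure" concrete, the following sketch maps an object kind and a key straight to a file path, with no query layer at all. The names `object_path` and `probe`, the `.obj` suffix, and the directory-per-kind layout are hypothetical choices for illustration only.

```python
import tempfile
from pathlib import Path
from typing import Optional


def object_path(cache_root: Path, kind: str, key: str) -> Path:
    """Map (kind, key) to a file path; this layout is an assumption."""
    return cache_root / kind / f"{key}.obj"


def probe(cache_root: Path, kind: str, key: str) -> Optional[bytes]:
    """Return the stored object, or None if it was never precomputed."""
    try:
        return object_path(cache_root, kind, key).read_bytes()
    except FileNotFoundError:
        return None


# Demonstration with a throwaway directory and one stored object.
root = Path(tempfile.mkdtemp())
(root / "faq").mkdir()
object_path(root, "faq", "reset-password").write_bytes(b"Use the reset link.")

hit = probe(root, "faq", "reset-password")
miss = probe(root, "faq", "unknown-question")
```

A probe is a single file read; a miss is just a missing file, which is all the lookup the cache needs to support.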

Security is the requesting application server's responsibility, not the cache's. The application servers should be the only entities able to access the hot response cache directly, and they make their security decisions based on centrally administered, named roles. I described this security architecture in detail in “Remove Security from Your Database Tables,” in the October 5 issue.
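A minimal sketch of that division of labor follows: the application server checks a named role before it touches the cache, and the cache itself knows nothing about users. The role names, the `ROLE_GRANTS` table, and the in-memory dictionary standing in for the cache are all invented for this example.

```python
# Hypothetical, centrally administered mapping of named roles to the
# kinds of cached object each role may receive.
ROLE_GRANTS = {
    "web_visitor": {"greeting", "promotion", "faq"},
    "business_partner": {"order_status", "midline_report"},
    "executive": {"topline_report", "olap_cube"},
}


def may_fetch(role: str, kind: str) -> bool:
    """Decide access from the role alone; unknown roles get nothing."""
    return kind in ROLE_GRANTS.get(role, set())


def fetch_for(role: str, kind: str, key: str, cache: dict):
    """Application-server entry point: check the role, then read the cache."""
    if not may_fetch(role, kind):
        raise PermissionError(f"role {role!r} may not fetch {kind!r}")
    return cache.get((kind, key))


# A plain dictionary stands in for the hot response cache here.
cache = {("topline_report", "1999-Q3"): "Q3 revenue summary"}
report = fetch_for("executive", "topline_report", "1999-Q3", cache)
```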

The hot response cache is more than the “operational data store” (ODS) that we built in the early part of the 1990s. The ODS was built most often when legacy operational systems were incapable of responsively reporting status on individual accounts. The hot response cache not only provides this original ODS function, but also provides:
•  Custom greetings to Web visitors, consisting of both text and graphics
•  Cross-selling and up-selling propositions to Web visitors, perhaps based on data-mining applications looking for other cohort members of the Web visitor's demographic cluster or behavior cluster
•  Dynamically chosen promotion content to Web visitors
•  XML-based, structured-form content to business partners (what we used to call EDI) requesting delivery status, order status, hours' supply in inventory (we used to measure days' supply, which is becoming obsolete), and critical-path warnings in the delivery pipeline
•  Low-level FAQ-like answers to problems and support requests
•  Midline reports, to customers and business partners in the delivery pipeline, needing a moderate amount of integration across time (such as last 10 orders or returns) or across business function (manufacturing, inventory, and so on)
•  Top-line reports to management, needing significant integration across time (multi-year trends), customers, product lines, or geographies, all delivered in three interchangeable formats (page-oriented report, pivot table, and graph) and frequently accompanied by images
•  Downloadable precomputed OLAP cubes for exploratory analysis
•  Data-mining studies, both near-term and long-term, showing the evolution of customer demographic and behavior clusters, and the effects of decisions about promotion content and Web site content on business done through the Web
•  Conventional aggregations that enhance query performance when drilling up through standard hierarchies in the major dimensions such as customer, product, and time.

The hot response cache's management must help it support the application servers' needs. Ideally, a batch job will have computed and stored in advance the information object that the application server needs. All applications need to be aware that the hot response cache exists and should be able to probe it to see if the answer they want is already there. The hot response cache has two distinct modes of use; the nature of the visitor session requesting the data determines which one to use.

The guaranteed response time request must produce some kind of answer in response to a page request that the Web server is handling, usually in less than a second. If the requested object (such as a custom greeting, a custom cross-selling proposition, an immediate report, or an answer to a question) has not been precomputed and hence is not stored, a default response object must be delivered in its place, all within the guaranteed response time.
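The guaranteed mode can be sketched as a lookup that never computes anything: it returns either the precomputed object or a default. The function name `guaranteed_fetch`, the dictionary standing in for the cache, and the sample defaults are assumptions for illustration.

```python
def guaranteed_fetch(cache: dict, kind: str, key: str, defaults: dict):
    """Return (object, was_precomputed); never falls back to slow computation."""
    obj = cache.get((kind, key))
    if obj is not None:
        return obj, True
    # Nothing was precomputed for this key: serve the default for this kind.
    return defaults[kind], False


# Hypothetical cache contents and default response objects.
cache = {("greeting", "visitor-1001"): "Welcome back, Pat!"}
defaults = {"greeting": "Welcome to our site!"}

known = guaranteed_fetch(cache, "greeting", "visitor-1001", defaults)
unknown = guaranteed_fetch(cache, "greeting", "visitor-2002", defaults)
```

Because both branches are simple lookups, the response time does not depend on whether the batch jobs happened to cover this visitor.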

The accelerated response time request hopes to produce a response to the Web visitor's request but will default to computing the response directly from the underlying data warehouse if the precomputed object is not found immediately. The application server should optionally be able to warn the user that there may be a delay in providing the response in this case. The Web server needs to be able to alert the application server if it detects that the user has gone on to another page, so the application server can halt the data warehouse process.
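The accelerated mode might look like the sketch below: probe the cache, and on a miss warn the user and compute from the warehouse, checking a cancellation flag that the Web server can set when the visitor moves on. The names `accelerated_fetch` and `slow_report` are hypothetical, and the loop merely stands in for a long-running warehouse query.

```python
import threading


def accelerated_fetch(cache, key, compute_from_warehouse, cancelled, warn=None):
    """Use the precomputed object if present; otherwise compute it slowly."""
    obj = cache.get(key)
    if obj is not None:
        return obj
    if warn is not None:
        warn("This request may take longer than usual.")
    return compute_from_warehouse(key, cancelled)


def slow_report(key, cancelled):
    """Stand-in for a warehouse query that runs in steps and can be halted."""
    total = 0
    for chunk in range(5):
        if cancelled.is_set():
            return None  # The visitor left; stop the work.
        total += chunk
    return f"{key}: {total}"


cache = {"q1": "precomputed q1"}
warnings = []

still_here = threading.Event()
fast = accelerated_fetch(cache, "q1", slow_report, still_here, warnings.append)
slow = accelerated_fetch(cache, "q2", slow_report, still_here, warnings.append)

gone = threading.Event()
gone.set()  # The Web server signals that the visitor went to another page.
halted = accelerated_fetch(cache, "q3", slow_report, gone)
```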

Note that this strategy of seeking a precomputed answer and defaulting if necessary to the base data is exactly the way conventional aggregates have always worked in the data warehouse. The data warehouse aggregate navigator has always searched for aggregates to answer portions of an overall report query. If the navigator finds the aggregate, it uses it. But if it doesn't find the aggregate, it gracefully defaults to computing the answer slowly from the base data. Viewed this way, the hot response cache is a kind of supercharged aggregate navigator.
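For comparison, here is a minimal sketch of the choice an aggregate navigator makes. It assumes that each table advertises the dimension attributes it can answer for and that the smallest qualifying aggregate is preferred; the table names, attributes, and row counts are invented, and real navigators are considerably more involved.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Table(NamedTuple):
    name: str
    dimensions: FrozenSet[str]  # attributes this table can group or filter by
    row_count: int


def choose_table(requested: Iterable[str], aggregates: List[Table], base: Table) -> Table:
    """Pick the smallest aggregate covering the request, else the base table."""
    needed = frozenset(requested)
    candidates = [t for t in aggregates if needed <= t.dimensions]
    if not candidates:
        return base  # Graceful default: answer slowly from the base data.
    return min(candidates, key=lambda t: t.row_count)


# Hypothetical base fact table and two aggregates.
base = Table(
    "sales_fact",
    frozenset({"day", "month", "year", "product", "category", "customer"}),
    1_000_000,
)
aggregates = [
    Table("sales_by_month_category", frozenset({"month", "year", "category"}), 500),
    Table("sales_by_month_product", frozenset({"month", "year", "product", "category"}), 20_000),
]

by_category = choose_table({"month", "category"}, aggregates, base)
by_product = choose_table({"month", "product"}, aggregates, base)
by_day = choose_table({"day", "product"}, aggregates, base)
```

The hot response cache follows the same find-it-or-fall-back pattern, only with whole response objects instead of aggregate tables.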

Any time you design something for the Web, especially if it's used in conjunction with the public Web server, you must pay special attention to scaling and explosive surges in demand. The hot response cache is, by its nature, an I/O engine, not a computing engine. It is, after all, a file server. The scalability bottleneck for the hot response cache, therefore, is not computing power, but I/O bandwidth. In periods of peak demand, the hot response cache must provide a flood of large file objects to the requesting application servers.
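A back-of-the-envelope calculation shows why I/O dominates. The request rate and object size below are purely hypothetical figures chosen for the arithmetic, not measurements from any site.

```python
def required_bandwidth_mb_per_s(requests_per_second: float, avg_object_kb: float) -> float:
    """Sustained read bandwidth the cache must deliver, in MB per second."""
    return requests_per_second * avg_object_kb / 1024


# Hypothetical peak: 2,000 object fetches per second at 256 KB each.
peak_mb_per_s = required_bandwidth_mb_per_s(2000, 256)
```

With these assumed numbers the cache must sustain 500 MB per second of reads, while the computation per request amounts to little more than a file lookup.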

Building a hot response cache is not a panacea. It introduces another server into what is already a complex architecture. The hot response cache requires administrative support and a disciplined application development style. But it is still worth it. The hot response cache takes enormous pressure off the database management systems and the application systems when they are faced with the timeliness, data volume, and response time requirements so typical of the Web.

The design of the hot response cache and other components of the overall data webhouse are described in more detail in Ralph's new book, The Data Webhouse Toolkit, to be released by Wiley Computer Books in January, 2000.



Ralph Kimball, Ph.D., co-inventor of the Star Workstation at Xerox and founder of Red Brick Systems, works as an independent consultant designing large data warehouses. He is the author of The Data Warehouse Toolkit (Wiley, 1996) and the newly published The Data Warehouse Lifecycle Toolkit (Wiley, 1998). You can reach him through his Web page at www.ralphkimball.com.




