
Intelligent Enterprise

Better Insight for Business Decisions





June 17, 2003

Real Time: Get Real

Take the idea of a real-time data warehouse with a grain of salt, then realize the possibilities

by Neil Raden
edited by Ralph Kimball

Is the next new data warehouse type "real time"? That question may already be answered, as many real-time data warehouses (RTDWs) are being built today, and some are already in production.

RTDWs present some challenges in architecture and design, but early adopters are finding benefits for business processes such as yield management, fraud detection, dynamic pricing, replenishment, profiling, business activity monitoring, and business process management. Surely, many more applications will be thought up.

The demand for real-time applications will likely grow for several reasons. New ideas will be pushed out from vendors to customers. The proliferation of rules engines, for example, will create pressure to implement more automated business processes (meaning no humans involved), and this automation is fertile ground for RTDWs. When those processes require answers to analytical questions, there won't be time for a human to manipulate a query in an online analytic processing (OLAP) tool. The application itself will communicate with a query generator. The RTDW concept will also inspire new business processes as organizations become familiar with it. And, as they gain currency, RTDWs will put pressure on practitioners like us to vastly reduce latency and query response time. What's the point of having data that's only minutes old if it takes five minutes to resolve a query or an hour to build a data mart or cube?

Defining Real Time

Definitions of RTDW vary, but most refer to the latency in data refresh cycles. I consider an RTDW to be one that's updated more frequently than your current extract, transform, load (ETL) processes can support. Most likely, it will require redesigning your ETL process and perhaps your schema as well. If your business intelligence (BI) environment depends on data marts and extracts, those will surely require redesign too. In most cases, an RTDW requires more than one refresh cycle per day.

Some consider a data warehouse that provides very fast query response times (say, less than five seconds) to be real time. Although that use of the term "real time" isn't technically correct, fast query responses might be a necessary component of a real-time BI environment even if the refresh cycle is daily or even less frequent. Consider an online application for a credit card from a lending institution. Suppose an interactive application fires off a query to the data warehouse to develop a creditworthiness score based on 20, 30, or even 50 attributes. If the data warehouse can return the results of this query reliably within a few seconds, the definition of RTDW might stretch to include it. In The Data Webhouse Toolkit: Building the Web-Enabled Data Warehouse (Wiley, 2000), Ralph Kimball defines a "hot response cache" that the data warehouse fills in offline batch mode. The cache anticipates a number of likely queries (typically from Web visitors) and has the responses queued up for rapid delivery.
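The hot response cache idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the design from the book: the class name HotResponseCache, the score_applicant function, and its scoring formula are all hypothetical stand-ins for a real warehouse query.

```python
from typing import Callable, Dict, Hashable, Iterable, Optional


class HotResponseCache:
    """Answers precomputed in batch and served without touching the warehouse."""

    def __init__(self, run_query: Callable[[Hashable], object]) -> None:
        # Hypothetical hook: executes one (slow) warehouse query for a key.
        self._run_query = run_query
        self._responses: Dict[Hashable, object] = {}

    def refresh(self, anticipated_keys: Iterable[Hashable]) -> None:
        """Offline batch step: compute every answer, then swap them in at once."""
        fresh = {key: self._run_query(key) for key in anticipated_keys}
        self._responses = fresh

    def lookup(self, key: Hashable) -> Optional[object]:
        """Online step: return the queued answer, or None on a cache miss."""
        return self._responses.get(key)


def score_applicant(applicant_id: int) -> int:
    # Stand-in for a multi-attribute scoring query; the formula is arbitrary.
    return 600 + (applicant_id * 37) % 250


cache = HotResponseCache(score_applicant)
cache.refresh([101, 102, 103])   # run during the nightly batch window
print(cache.lookup(101))         # answered from memory
print(cache.lookup(999))         # not anticipated, so the caller must fall back
```

On a miss the caller still has to decide between querying the warehouse directly and declining to answer, so the cache only helps to the extent that the anticipated queries match the real ones.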

Querying up-to-the-instant data from an operational system is normally best left to the operational system. The data is already there. But in the same way that people need data warehouses for integrated analytic data, real-time requests for operational data can't always be satisfied with data from a single system. Enterprise application integration (EAI) tools coupled with generic messaging architectures opened the doors a few years ago. However, most of these implementations were application-oriented, so implementation required a fair amount of development. And EAI tools weren't intended to integrate information (they were designed to enable OLTP systems to interact); therefore, they include no notion of analytic queries, metadata, or any of the other useful constructs of a data warehouse.

Some vendors are moving beyond EAI into enterprise information integration (EII), which provides real-time query capability across multiple operational systems. Neither EAI nor EII alone is adequate for a BI environment because both approaches lack the historical context and rich reference data of a good data warehouse. But data warehouse practitioners should "listen" to this flood of EAI message-grams and construct a proper historical context.
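As a rough sketch of what "listening" might look like, the Python below appends each incoming message to a history table instead of overwriting the current state. Everything in it is assumed for illustration: the message fields (customer_id, status, ts), the customer_history table, and the in-memory SQLite database standing in for the warehouse. It also assumes messages arrive in timestamp order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
conn.execute(
    """CREATE TABLE customer_history (
           customer_id INTEGER,
           status      TEXT,
           valid_from  TEXT,
           valid_to    TEXT)"""
)


def on_message(msg: dict) -> None:
    """Close the open row for this customer, then open a new one."""
    conn.execute(
        "UPDATE customer_history SET valid_to = ? "
        "WHERE customer_id = ? AND valid_to IS NULL",
        (msg["ts"], msg["customer_id"]),
    )
    conn.execute(
        "INSERT INTO customer_history VALUES (?, ?, ?, NULL)",
        (msg["customer_id"], msg["status"], msg["ts"]),
    )


def status_as_of(customer_id: int, ts: str):
    """Return the status that was in effect at time ts, or None."""
    row = conn.execute(
        "SELECT status FROM customer_history "
        "WHERE customer_id = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR valid_to > ?)",
        (customer_id, ts, ts),
    ).fetchone()
    return row[0] if row else None


# Two hypothetical messages overheard on the integration bus.
for message in [
    {"customer_id": 7, "status": "active", "ts": "2003-06-01T09:00:00"},
    {"customer_id": 7, "status": "delinquent", "ts": "2003-06-10T14:30:00"},
]:
    on_message(message)

print(status_as_of(7, "2003-06-05T00:00:00"))
print(status_as_of(7, "2003-06-17T00:00:00"))
```

An operational system would typically hold only the latest status. Keeping the closed rows is what lets the warehouse answer "what was true at the time?", which is the historical context that EAI and EII lack on their own.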

Who Needs It?

It's easy to speculate about business reporting or analytic processes that might be enhanced with more timely data, but how do you know for sure whether more current data would make a difference? Would it really matter if brand managers had sales data in 15-second increments during in-store promotions? It may not be possible for an organization to react as quickly as the data comes in. Does it matter if the continuous replenishment system can recalculate the shipments every 10 minutes if the truck only leaves once a day?
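The replenishment question comes down to simple arithmetic, sketched below in Python. The function name and the rule that only the last recalculation before each departure gets acted on are simplifying assumptions made for illustration.

```python
from typing import Tuple


def refreshes_per_action(refresh_minutes: int, action_minutes: int) -> Tuple[int, int]:
    """Return (refreshes in one action cycle, refreshes never acted on).

    Assumes only the most recent refresh before each action is used.
    """
    per_cycle = max(1, action_minutes // refresh_minutes)
    return per_cycle, per_cycle - 1


# Shipments recalculated every 10 minutes, truck leaves once a day.
total, unused = refreshes_per_action(refresh_minutes=10, action_minutes=24 * 60)
print(total, unused)
```

Under these assumptions, 144 recalculations happen between departures and 143 of them never influence a shipment. The frequent refresh still buys one thing: the figures used at departure are at most 10 minutes old. Whether that is worth the cost is the business question.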







