Intelligent Enterprise

Better Insight for Business Decisions


January 1, 2000, Volume 3 - Number 1


Predictions for a Webhousing Revolution

Millennium Ahead


Ralph Kimball                

As we cross the threshold into the third millennium, it’s tempting to guess what might happen in the next thousand years. That’s obviously so ridiculous that I’ll start by cutting the task down to something slightly more reasonable. Ten years sounds about right. And I’ll try to stick to subjects that surround data webhousing, because this column doesn’t grant me a license to speculate on other subjects. I realize, in writing a column like this, that on the one hand my predictions may seem outlandish, and on the other they are almost certain to seem bland compared with what really happens. So, plunging forward, here’s my outlook for the immediate future, colored heavily by the Web revolution we have unleashed just as we cross the millennium boundary. I am also assuming that we will enjoy the benefits of vastly increased communications bandwidth.

Change Will Accelerate

If you think things are changing fast now, you ain’t seen nothing yet. We have developed an appetite for, and an expectation of, ever more rapid change. Businesses will make profound changes in their relationships with customers, in many cases doing away with the middlemen so that they can engage customers directly, usually over the Web. We will be intolerant of delay, whether in waiting for a Web page to paint on our personal computer or in seeing the correct status of an order. Everything must be up to date and in real time.

Enough Choices and Too Many Choices

The Web has made us choice-junkies. We want dozens, if not hundreds, of alternatives, whether we are choosing a hotel or software products to help us extract, transform, and load. But there is clearly a limit to our tolerance for manually scanning undifferentiated alternatives. We want the alternatives, but a major focus in the upcoming decade will be better search engines that cut down all of the wonderful choices so that we look at only a few really good ones.

But too many choices is a cousin of too many opinions. A fascinating recent discovery relating to medical information searching on the Web is that some people are learning much more than they need about diseases and symptoms. If you worry too much that your toe hurts, pretty soon it will hurt.

Critical Mass Can Win Web Loyalty

Since we yearn for ever more rapid change, we will be increasingly fickle in our online relationships. We may buy from Amazon today, but how deep is our loyalty, and what must Amazon do to really lock it in? The answer to online loyalty can be found in today’s Yahoo and eBay. Both have critical mass, and that critical mass makes it very hard for their competitors to succeed. Yahoo gets so many Web sessions that it is almost the only e-business that can stay profitable on 1999’s version of Web advertising. eBay has created the world’s biggest flea market, and the depth of that market is irresistible to both buyers and sellers. Who would want to sell in a place with very few buyers? Who wants to visit a market where there is almost nothing for sale? But please, eBay, give us a better search engine for finding things.

Data Is Winning the Battle But Will Lose the War

In the early 1990s, it seemed as if disk drives and CPUs were gaining ground on the masses of data we were collecting. We graduated from megabytes to gigabytes, and we are now becoming more comfortable with terabytes. It seemed as if we had reached the final summit of the mountain when we succeeded in capturing and storing in our data warehouses every atomic transaction in really big businesses such as retail, telco, and insurance.

But then along came the Web. The clickstream gives us a very revealing look into the future. There is no upper limit to the amount of data we can collect. The clickstream contains every gesture a Web site visitor makes, and a visitor may make hundreds of low-level gestures before completing a single purchase transaction! By the middle of the next decade, clickstream databases will dwarf all the other conventional data warehouse examples. And the worst part is that the clickstream is really interesting.

During the next decade, we will be hooking ever more devices to the Web. Not only will we record every page event, but also every subpage event, such as parameter changes in queries. We will also have hundreds of nanocomputers in our cars, homes, and offices that will record every mile driven, turn taken, light bulb turned on, and keystroke. Together, these nanocomputers will produce a biblical flood of data.

And so far we are still talking only about text and numeric records. Although corporations have romanced us for years with multimedia “data blades,” maybe this time it is real, because the Web is relentlessly pushing audio, video, and cartographic media in our direction. The only factor slowing this surge toward higher-bandwidth media is the lack of high-performance wiring across our country.

Even though the latest clickstream battle has gone in favor of data, our disk drives, CPUs, and communication channels will eventually prevail in the overall war. Within two years, 500GB disk drives will appear in the consumer marketplace. By the end of the next decade, it will be possible to buy a laptop with a terabyte of writable storage, and many of our desktop PCs will have 10 terabytes of storage. A terabyte of storage is enough for three solid days of compressed video. Ten terabytes will be enough to hold a picture of every citizen of the United States.
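The two capacity claims at the end of the paragraph above can be checked with back-of-envelope arithmetic. The short Python sketch below is illustrative only: the decimal unit convention (1 TB = 10^12 bytes) and the population figure of roughly 281 million (the 2000 U.S. census count) are assumptions, not numbers from the column, and the helper names are hypothetical.

```python
# Back-of-envelope check of the storage figures in the column.
# Assumptions (not from the article): decimal units (1 TB = 10**12 bytes)
# and a U.S. population of roughly 281 million (2000 census).

TB = 10**12  # bytes, decimal convention used by disk vendors
SECONDS_PER_DAY = 24 * 60 * 60
US_POPULATION_2000 = 281_000_000  # approximate; an assumption


def video_bitrate_mbps(capacity_bytes: float, days: float) -> float:
    """Average bitrate (megabits/s) that fills capacity_bytes in `days` of continuous video."""
    return capacity_bytes * 8 / (days * SECONDS_PER_DAY) / 1e6


def bytes_per_item(capacity_bytes: float, items: int) -> float:
    """Storage budget per item when the capacity is split evenly."""
    return capacity_bytes / items


if __name__ == "__main__":
    rate = video_bitrate_mbps(1 * TB, 3)
    per_photo_kb = bytes_per_item(10 * TB, US_POPULATION_2000) / 1000
    print(f"1 TB over 3 days implies about {rate:.1f} Mbit/s of video")
    print(f"10 TB over {US_POPULATION_2000:,} people leaves about {per_photo_kb:.1f} KB per picture")
```

Run as a script, it reports an implied video rate of about 30.9 Mbit/s and a budget of about 35.6 KB per picture. The video rate is well above the roughly 10 Mbit/s ceiling of DVD-Video, so the “three days” figure presumes generous, high-quality compression; at lower bitrates a terabyte holds proportionally more. A few tens of kilobytes per picture is in line with a modestly sized compressed photo.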

Everything Is a Module

From the perspective of a computer scientist growing up in the 1960s, the trend toward software bloat is pretty dismaying. Does anyone still read Donald Knuth’s wonderful books on how to write really tight algorithms? Does anyone program in Assembler any more? Certainly one of the amazingly successful bets that Microsoft has made is that CPU power is more effective than tight low-level coding over the long run. Microsoft has been very successful in concentrating on adding functionality to its systems, not making them smaller or faster.

Actually, the message is subtler than “hardware has won over software.” The real message is that modern systems are necessarily horribly complex, and the only way to build them is from modules. The high-level designers of both hardware and software neither need to nor can descend to the lowest bit-twiddling levels. The technique of building systems from modules pervades every level of our computing environments: CPUs, memory chips, operating systems, and applications all consist of modules. In the next decade, engineers will build data warehouses and webhouses almost exclusively from modules. Rather than programming in C or Visual Basic, IT shops will connect icons with “visual pipes” and fill in screens of parameters. Rather than typing in hundreds of CREATEs and GRANTs, we will move major applications from prototype to production by dragging a master application icon from a picture of the test machine to a picture of the production machine. If you open the master application icon, inside you will find a maze of sub-icons (modules, actually).

A corollary of building systems out of modules is that most IT shops will buy, rather than make, suites of modules that they can rapidly configure into the final system. It’s crucially important to customize each system, but an IT shop only wants to customize the topmost layer.

We Will Finally Need Data Mining

The next decade will be the coming of age for data mining, although it’s unclear whether the current set of data mining vendors will be smart enough to catch the wave. The name of the game is behavior. Our ever more detailed data sources increasingly describe customer, citizen, and Web site visitor behavior, and the Web has introduced enormous urgency to capture and understand that behavior. Frankly, most businesses don’t care which algorithm their data mining program uses. Any vendor caught describing the difference between neural networks and genetic algorithms is going to get left behind. What businesses do care about is effective results. What does the behavior tell us? Can we identify which customers are happy? Loyal? High-value? Likely to default? The data mining shakeout has not yet occurred, but it will likely happen in the next decade.

The CD-ROM Content Industry Is At Risk

Earlier in the 1990s, it seemed as if the CD-ROM content providers had an interesting future. Encyclopedias and maps seemed like promising applications that would require a CD-ROM to deliver the data. The Web has erased that hope. There is no reason to send a CD-ROM to someone if you can download the same data over the Web.

Note: this prediction is risky, because it depends on the widespread deployment of very high-bandwidth communications. It will be an interesting race to see whether the highest-bandwidth communication channel in 2010 will be a) a 17GB DVD-ROM sent through the United States mail, or b) an online Internet connection available in most major U.S. cities. Even if the DVD-ROM wins in 2010, it is clear that once we get properly “wired,” the online option will be the big winner.

Of course, if you change the rules and tell me that a DVD-ROM in 2010 will hold a terabyte of data, then maybe I will change my mind…
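The race described above is the classic comparison between shipping physical media and moving bits over a wire, and the effective bandwidth of the mail is easy to estimate: divide the payload by the transit time. The Python sketch below assumes a two-day delivery time, which is a hypothetical figure rather than one from the column; the function name is likewise illustrative.

```python
# Effective bandwidth of shipping a disc through the mail.
# Assumption (not from the article): door-to-door delivery takes 2 days.

SECONDS_PER_DAY = 24 * 60 * 60


def mail_bandwidth_mbps(payload_bytes: float, transit_days: float) -> float:
    """Average throughput in megabits/s of a physical shipment (payload / transit time)."""
    return payload_bytes * 8 / (transit_days * SECONDS_PER_DAY) / 1e6


if __name__ == "__main__":
    dvd = mail_bandwidth_mbps(17e9, 2)      # 17GB DVD-ROM from the column
    tb_disc = mail_bandwidth_mbps(1e12, 2)  # hypothetical terabyte disc
    print(f"17 GB DVD, 2-day mail: {dvd:.2f} Mbit/s")
    print(f"1 TB disc, 2-day mail: {tb_disc:.1f} Mbit/s")
```

Under that assumption, a 17GB DVD-ROM works out to about 0.79 Mbit/s of sustained throughput, while a hypothetical terabyte disc reaches about 46.3 Mbit/s, which is why a terabyte DVD would change the outcome of the race. The mail always carries days of latency regardless of throughput, so this comparison only concerns bulk transfer.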

Goodbye Privacy

Technology is so far ahead of the legislators that even if they passed a new law every month, they could not cope with the robust energy of the Web economy and Web community. This is a bit of a cop-out, but let’s face it: our trusted Web site partners are going to know a lot about each of us. Certainly, if you never go online, you may avoid much of the upcoming privacy degradation, but one-to-one marketing is simply intimate. If they can see your every gesture, then they know a lot about you.

Further Out

I promised not to talk about subjects that couldn’t be related to data warehousing. But data warehousing is a really broad subject. I mean, it is about data, information, and knowledge. So let me propose a subject well beyond even the next decade. Here’s one for the upcoming century.

I believe that in the next few decades, there will be a grand synthesis between biology and computing. I think we will make computers small enough to float around in human beings’ bloodstreams. These computers will be able to recognize viruses, pathogens, plaque, and cancers. These computers will be able to take actions that affect the health of humans. I think I should buy some stock right now in Norton’s or McAfee’s anti-virus software. It’s a long-term play, but trust me.



Ralph Kimball, Ph.D., co-inventor of the Star Workstation at Xerox and founder of Red Brick Systems, works as an independent consultant designing large data warehouses. He is the author of The Data Warehouse Toolkit (Wiley, 1996) and the newly published The Data Warehouse Lifecycle Toolkit (Wiley, 1998). You can reach him through his Web page at www.ralphkimball.com.