Much tougher database requirements are just around the next corner
The E-Scalability Challenge
Growing Requirements of E-commerce
Here are a few of the fundamental, breathtaking changes putting pressure on the capabilities of even the most advanced database architectures:
Huge numbers of concurrent users. As e-commerce gathers momentum in the distribution of retail products and services, we are marching steadily toward online, constantly updated databases used directly by truly enormous populations, much larger than even the largest existing today.
Established online service providers already serve millions of people. As such notions as pervasive computing and the Internet appliance catch on, we will have millions of Internet users who do not even need a PC. Services for such populations as most U.S. consumers, most European homeowners, or most soccer fans worldwide appear to be just around the next corner. Governments will be in this game, too, with online services for all taxpayers and other immense populations.
As these services take hold, online users of a single e-commerce service may soon reach levels ranging from 50 to 100 million. I think we will pass the hundred million mark within a few years.
With huge online user populations, we will see enormous swings in the number of concurrent online users. IBM already cites data showing that the ratio of peak demand to average demand is much higher on e-commerce sites than on ordinary server sites.
Continuous availability. It's now beyond discussion: Large e-business operations simply must be up all the time. The difficulty of meeting this requirement increases with two factors: database size and transaction volume. In e-commerce, both these factors grow rapidly as user volume grows.
Extremely large stored-data volume. Today, the trend is toward storing the clickstream in a data warehouse, where it is analyzed and mined to better understand customer behavior, customer and product profitability, and other e-business issues. Clickstreams for large user populations are the biggest things around. It doesn't take long to accumulate a terabyte of clickstream when you have a large-scale, actively used Web site. So, I think we'll see data warehouses with a hundred terabytes of clickstream data within a few years and probably petabyte databases of e-commerce activity in something like five years.
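As a rough illustration of how quickly clickstream accumulates, the arithmetic can be sketched as follows. The traffic figure and record size are assumptions chosen for illustration, not measurements from any particular site, and the function names are hypothetical.

```python
# Back-of-envelope clickstream sizing. All inputs are illustrative assumptions.

TERABYTE = 10**12  # bytes (decimal terabyte)


def daily_clickstream_bytes(page_views_per_day: int, bytes_per_record: int) -> int:
    """Raw bytes of clickstream logged per day, one record per page view."""
    return page_views_per_day * bytes_per_record


def days_to_reach(target_bytes: int, bytes_per_day: int) -> float:
    """Days needed to accumulate target_bytes at a constant daily rate."""
    return target_bytes / bytes_per_day


# Hypothetical busy site: 50 million page views a day, 500 bytes per record.
per_day = daily_clickstream_bytes(50_000_000, 500)
print(per_day / 10**9, "GB per day")                     # 25.0 GB per day
print(days_to_reach(TERABYTE, per_day), "days to 1 TB")  # 40.0 days to 1 TB
```

Under these assumed numbers the first terabyte arrives in about six weeks; a larger user population or richer records shortens that proportionally.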
Near realtime, very large-scale decision support. The data warehouse became a mainstream phenomenon in business on a foundation of periodic batch update. That is, data warehouses are most widely used to support analytical applications that produce results based on events from yesterday or last week.
But e-commerce is increasingly focused on a faster moving world, in which the events of a minute ago are important to decision making. An e-commerce operator may want to extend a special offer to a customer based on something the customer communicated a moment ago. In fact, it may be critical to reconfigure the store the customer is visiting based on information that can be pieced together from something the customer did a moment ago, combined with something he or she did at the site last month or last year.
So, information must flow into the data warehouse continuously and in large volumes, be exploited more or less immediately, and then be relayed to operational systems, all in a time frame that might plausibly range from 10 seconds to a minute.
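The loop described above (record the new event, combine it with history, act at once) might look something like the toy sketch below. The in-memory ClickWarehouse class, the offer_for function, and the "returning viewer gets a discount" rule are all hypothetical stand-ins invented for illustration; a real deployment would involve a warehouse engine and operational systems far beyond this.

```python
from collections import defaultdict


class ClickWarehouse:
    """Toy in-memory stand-in for a continuously updated warehouse (hypothetical)."""

    def __init__(self):
        # customer_id -> list of (timestamp, product) events
        self._events = defaultdict(list)

    def record(self, customer_id, product, timestamp):
        """Append a new event as soon as it happens (continuous update)."""
        self._events[customer_id].append((timestamp, product))

    def viewed_before(self, customer_id, product, before):
        """True if the customer viewed this product at any earlier time."""
        return any(p == product and t < before
                   for t, p in self._events[customer_id])


def offer_for(warehouse, customer_id, product, now):
    """Combine the event of a moment ago with history and decide immediately."""
    seen_earlier = warehouse.viewed_before(customer_id, product, before=now)
    warehouse.record(customer_id, product, now)
    # Illustrative rule only: a returning viewer gets a special offer.
    return f"discount:{product}" if seen_earlier else None


wh = ClickWarehouse()
first = offer_for(wh, "c1", "tent", now=1_000.0)                 # first visit
second = offer_for(wh, "c1", "tent", now=1_000.0 + 30 * 86_400)  # a month later
print(first, second)  # None discount:tent
```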
The e-commerce requirements for the database of the next few years then include:
A large step upward in user populations and transaction rates
Continuous availability
Extremely large data volumes in the data warehouse
Continuous update of the data warehouse
Immediate exploitation of new data in the data warehouse.
Considering the size and growth rate of e-commerce operations, this combination of requirements would severely strain today's database engines and today's decision-support architectures.
So once again, only a few years after the last set of revolutionary advances in database architecture (such as parallelism, cost-based optimizers, and advanced indexing techniques), changes in the business environment have brought us a new set of scale-related database challenges.
Pervasive Computing
In my view, one of the fundamental forces powering this change is pervasive computing: the notion that specialized, often connected, information appliances will be just about everywhere in our lives, performing tasks on our behalf as we move around the house, office, or neighborhood.
The Internet has brought astonishing change already, but we forget that virtually all Internet use today occurs when someone is seated at a relatively expensive, complex device we know as a personal computer. As inexpensive Internet appliances (mobile and otherwise) really catch on, the Internet will become accessible to hundreds of millions more people around the world: people who might never buy or extensively use a personal computer.
Many people who are attracted to the Internet still make limited use of it because they simply cannot be seated at a computer more than a few hours a day. As pervasive computing catches on, these people will generate many more Internet transactions a day because the use of the Internet will be integrated into many more of their daily activities.
Of course, there is already endless talk in the industry about the scalability requirements of e-commerce. But most people forget that the scalability requirement is most intense at the database.
Why? At the heart of most e-commerce sites is a database that has to be updated. At a minimum, the database tracks the identity and registration of the service's users. More typically, it is recording data on customers, accounts, shopping baskets, transactions, inventory, and other subjects needed to operate the service.
The updatable database is different from all other elements of the e-commerce infrastructure in that it cannot be replicated in the ordinary sense.
For example, if a front-end Web server becomes overloaded, there is a more-or-less straightforward solution: replicate it. The Web servers operate essentially autonomously of one another. Infrastructure is needed to route users, distribute and manage workload, and coordinate failover.
Operating multiple, replicated Web servers has its complications, but it is a well-worked-out problem. Not much information must flow between them. The same is true for the application server.
However, the database server is different. You can't simply replicate a large-scale, continuously updated, high transaction-volume database. At least, you can't do it and double your capacity. And the extent to which you can do it at all with the scale, reliability, and performance required in high-end e-commerce is in doubt.
So at present, the volatile elements of the database must remain a single, integral, unreplicated element of the e-commerce infrastructure. And a single database server, or perhaps a cluster of such servers, must handle the immense, rapidly growing workload.
E-commerce sites may have a shadow database for backup or disaster recovery, and they may replicate static databases that are used to serve up information that doesn't change moment-to-moment. But the heart of the operation is a volatile database implemented on a centralized database architecture. And that volatile database is at the fulcrum of rapidly escalating requirements for scalability and performance, with no room to compromise on availability.
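One way to see why a second copy does not double capacity is a simplified throughput model, sketched below. It assumes each read is served by one copy while every update must be applied on every copy, and it ignores locking and commit-coordination overhead, which would make the picture worse. The function name and the numbers are hypothetical.

```python
def replicated_capacity(copies: int, per_server_ops: float,
                        write_fraction: float) -> float:
    """Total client operations/sec sustainable by `copies` full replicas.

    Simplified model (an assumption, not a measurement): at total
    throughput T, each server applies all writes (T * write_fraction)
    plus its share of reads (T * (1 - write_fraction) / copies), and
    that load must not exceed per_server_ops.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    load_per_client_op = write_fraction + (1.0 - write_fraction) / copies
    return per_server_ops / load_per_client_op


# Read-only workload (like a replicated Web server): capacity doubles.
print(replicated_capacity(2, 1000.0, 0.0))   # 2000.0
# Half the operations are updates: two copies give only about 1.33x.
print(round(replicated_capacity(2, 1000.0, 0.5), 1))  # 1333.3
# Pure update workload: the second copy adds nothing.
print(replicated_capacity(2, 1000.0, 1.0))   # 1000.0
```

In this model, the heavier the update share, the less each additional copy contributes, which is the asymmetry between Web servers and the volatile database.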
The bottom line is this: Our most successful, booming e-commerce operations are going to need another very large factor in scalability over the next three to five years, a factor that far exceeds any projected growth in the capacity of hardware components. Furthermore, the data warehouses that go with them, expected to provide the business intelligence so integral to e-commerce strategies, need to handle much larger data volumes and continuous updates. Some large-scale data warehouses in operation do continuously update. There certainly has also been noteworthy progress in database scalability and availability over the last several years. But we have a long way to go before the next level of e-commerce requirements can be satisfied. Those requirements are just around the next corner.
Richard Winter is a specialist in large database technology and implementation, and is president of Boston-based Winter Corp. You can reach him via email at Richard.Winter@wintercorp.com or by fax at 617-338-4499.



