|
|
|
E-Business EverlastingPlanning ahead for scalability under growing customer demand whether through partitioning or replication may be the best e-business move you can makeBy Ike Van Cruyningen
Its not hard to imagine: Your development team has done an outstanding job on your latest Web application, a high-visibility corporate purchasing system. Performance is wonderful during development and testing, so you deploy the application with great confidence. Your first customers are delighted and your customer list begins to snowball. Youre really riding high. Suddenly, your application slows to a crawl, and the complaints pour in. You realize that unless you repair the problem immediately, customers will soon leave in droves. A system can meet all its functional and performance goals, but still fail to scale in the real world. America Online (AOL) found this out the hard way just after it offered unlimited monthly access to its online service. System overloads led to millions of unhappy customers and legal threats from dozens of states. Only after a round-the-clock effort to upgrade the system did the service get back on track. AOL had to issue refunds totaling millions of dollars to restore customer confidence. Whether youre managing, sponsoring, funding, or developing a Web application, you absolutely must plan for scalability before you deploy. In the frenetic world of the Web, it is exceedingly difficult to patch in a solution later on. In this article, Ill discuss some field-tested replication and partitioning techniques that will ensure your applications can handle enormous audiences without missing a beat. Congestion Takes Its Toll Why would a well-written, well-behaved application suddenly start acting up under load? The explanation is remarkably similar to your commuting experience. Performance as measured during development is like driving home on a Sunday afternoon, while actual performance under load is like driving home at rush hour. The freeway and your application both behave according to this formula, which is charted in Figure 1: actual response time = unloaded service time/ ( 1 utilization ).
Of course, you dont want your customers to go somewhere else, so you must plan for scalability. Alloy.com recently decided to supplement its online marketing with a mail-order catalog. When the catalog hit, response was so enthusiastic it took out the e-commerce servers, negating much of the mailings value. You must plan for scalability early in the development process because your application responsiveness will deteriorate exponentially under load. You cannot put this issue on the back burner and hope it will resolve itself. It will escalate and cause you more and more stress and put your business at more and more risk as time goes on. If youre lucky, you can start the project off with a demand forecast from marketing studies. Capacity planning will give you an estimate for your server size based on the forecast. Dont question the methodology behind the forecast too closely, because all too often, it simply seems to be someones best guess. But whether you have a reliable forecast or not, you have to ask yourself, What if we get twice that many customers? The goal of a scalable design is a system that will accommodate more and more customers simply by adding hardware. If you start with a scalable hardware platform, youll be able to double capacity without significant disruption by simply replacing the server with a more powerful one. However, the Web is growing exponentially, so you have to ask about doubling, redoubling, and so on. Regardless of your hardware vendors skill and reputation, at some point youll reach the capacity limit for a single server. Then the only solution is to replicate or partition your application across several servers. In application replication, you add duplicate hardware in parallel and distribute requests to the duplicate servers. In application partitioning, you split the application so you can move parts of it to other servers as the system outgrows the original hardwares capacity. The two strategies are complementary. As your customer base increases, you should first partition your initial deployment. After further growth, you can replicate the partitions that outgrow their new servers. Replication Replicating your application across duplicate hardware is like adding lanes to a congested freeway. The major design issues are: First, determine where bottlenecks will occur, because thats where the new hardware will help most; and second, determine how to distribute requests among the duplicate hardware systems to balance the load. Figure 2 (page 32) shows five potential replication strategies at different tiers of a Web application.
The database engine itself may be a bottleneck during complex joins or in the long-running queries in data mining or exploration tasks. If your system bogs down when the marketing folks connect for sophisticated reports, you probably have a database engine bottleneck. As Figure 2b shows, a parallel or replicated database will increase capacity. A parallel database runs one database instance on parallel hardware. In such products, database developers have figured out how to balance the load and maintain data consistency. In replicated databases, two separate systems manage two instances of the same data. Replicated databases incur major overheads in maintaining consistency between the two sets of data. They work for slowly changing content such as situations where several hours of inconsistency between the databases is tolerable but they perform poorly for databases that have to stay synchronized at all times. In such cases, trying to segment or partition the database (as described in the next section) is a better idea. (The architectures in Figures 2c, 2d, and 2e assume the use of a parallel database engine.) The application server may become a bottleneck if the business rules are very complex. In my experience, relatively minor changes in user interface layout and application navigation can automatically satisfy a surprising number of complex business rules. For example, if a workflow has to occur in a specific order, guide your clients through the application in that order. If dependencies between data elements exist, make it impossible to enter dependent data without having already entered the antecedent data. If certain clients do not have the authority to take specific actions, simply dont offer such actions to those clients. In fact, one of the primary benefits of a consultant is to offer a fresh, independent look at application design, with an eye to removing as much code as possible. The fastest line of code to develop and run is the line of code you dont write at all. An app server may also become a bottleneck is if it has to manipulate or rearrange data. Databases excel at sorting, shuffling, joining, aggregating, and otherwise massaging data. If youre writing app server code containing nested loops or arrays, you should consider a more advanced SQL statement or a stored procedure instead. Finally, an app server may become a bottleneck if the technology is underpowered. Interpreted languages are slower than compiled languages, so long scripts with lots of interpreted code will cause problems. Active Server Page (ASP) scripts are notoriously slow; even Microsoft advises the use of compiled components for significant app server tasks. And server-side Java may have many benefits, but high performance is not among them. Compiled components certainly improve the performance of an app server, but how do you replicate that capability for larger loads? Although you could deploy additional networked app servers (see Figure 2c) and implement load balancing, the communication overhead tends to overwhelm performance benefits. A LAN remote procedure call (RPC) takes roughly two orders of magnitude more time than a local RPC because it requires tens of thousands of machine instructions to get a message onto the LAN. Load balancing while the request is still on the LAN by replicating the Web server is a better idea. The Web server itself is almost never a bottleneck, but the architecture of Figure 2d is attractive because it is an easy way to load balance the rest of the system. The simplest implementations use a router to distribute requests to Web servers based on the IP packet source address; each Web server manages a specific range of client IP addresses. This approach neatly solves the problem of session state. HTTP does not maintain a continuous connection between client and server during a session; rather, each request is entirely independent. However, many applications have to maintain state information, such as shopping cart contents, between requests. If the state information is maintained in the app server, then all the requests from a client during a session have to route to the same server. Routing based on source IP address ensures this process occurs. One drawback here is that requests are not distributed evenly among source IP addresses. Requests from major networks such as AOL travel through just a few gateways. You may have to dedicate one Web server just for clients from AOL. When that server becomes overloaded, this approach doesnt scale further. A second problem is that the load balancing here is static. Dynamic load balancing based on current server workloads is more effective; if the server dedicated to AOL is overloaded or fails, you can shunt some or all of the requests to an idle server. To achieve dynamic load balancing, you must replace the simple router with one of the following: A smarter redirector that polls the Web servers for current load and understands not only source IP addresses, but also session information in cookies, logins, URLs, SSL session IDs, or specific hidden fields in forms. Examining the contents of HTTP requests within IP packets definitely slows down a redirector. A cluster manager to distribute requests among equivalent machines in a cluster. The clustered machines transparently exchange session state information using a separate LAN protocol. This solution is the best one for load balancing as well as failover at the intranet level, but it is also the most expensive. A dedicated server for your home or login pages that redirects client requests to peer servers for the rest of a session. The redirection is based on the peer servers current workload. This approach, which Netscape Netcenter and other large sites use, is relatively easy to implement. For an Internet application (as opposed to an intranet one), the bottleneck is most likely the communication across the network. The architecture of Figure 2e, which includes several replicated or mirror servers close to different backbones and in widely dispersed geographic locations, is most attractive for an Internet application. The Domain Name System (DNS) allows association of multiple IP addresses with a single name and will return those IP addresses in a round-robin fashion.
The browser caches the DNS-returned IP address during the time-to-live (TTL) interval. If a server fails, the client has to wait at least one TTL interval before a new DNS request returns a different server, so failover is not transparent. If the clients session is all within a TTL interval, the requests will all go to one server and the server session state will be maintained correctly. If the TTL interval expires, resulting in a new DNS lookup, a client may be directed to a different server and the session state will be lost. If youre maintaining session state in the server and plan to use the round-robin DNS to distribute requests, make sure your server session timeout is shorter than the DNS TTL interval. Several companies have developed smarter DNS servers to improve on the round-robin DNS scheme. A smarter DNS server acts as a front end for several Web servers supporting the same URL. The enhanced DNS server tracks the load and availability of the Web servers. In response to a DNS name lookup, it returns the IP address of the least busy, but still available, Web server. The client then works with that IP address for the duration of the session. Another approach is to redirect clients when they make an HTTP request. A primary server corresponding to the IP address returned by DNS tracks the load and availability of a number of secondary servers. When the primary server receives an HTTP request, it redirects the request to the secondary server with the lowest utilization. The secondary servers might be geographically dispersed, so a smart primary server also takes into account the proximity of the secondary server to the client, to minimize total client response time. Replication can be used at each tier in a Web application to remove bottlenecks. For an intranet application, the architecture of Figure 2d is most attractive from a scalability and availability point of view. For an Internet application with widely dispersed clients, the architecture of Figure 2e provides the best performance, scalability, and availability. Partitioning Application partitioning is another technique for distributing workload. Here different server tasks are allocated to different hardware systems, splitting the application either horizontally by tier or vertically by function.
Database operations are commonly I/O bound because disk access times are thousands of times slower than CPU operations. Combining a Web server waiting on network I/O, an app server waiting for the CPU, and a database server waiting on disk I/O could result in a balanced system, provided it has enough memory to keep all the processes and cached files memory resident. More commonly, the database server resides on a separate system. Often the database server has been operational for years and serves many other applications, so it is already on a separate system that should not be modified. You might also partition your Web application vertically by function. You separate the different tasks onto hardware systems optimized to perform those tasks. For example, my company worked with Mindscape, a leading developer of innovative software titles for the education and entertainment markets, to develop an application scalable to hundreds of thousands of customers. Mindscape wanted to let customers download additional content for its software titles using cost-effective Web distribution. It also wanted to market additional content from within the application and support distribution of application updates across the Web. These requirements map to two distinctly different types of hardware systems. Downloading static content such as clip-art, styles, design tips, or application updates requires a simple Web server optimized for static content a system with lots of memory and multiple network cards but only modest CPU and disk I/O subsystems. Verifying client licenses and supporting purchases of additional content requires an e-commerce Web application that accesses a database. Database access requires a system with fast disk I/O. Based on the relative frequencies of the two types of operations, we designed a system with one database-backed system supported by multiple content-download systems. (See Figure 5.) This architecture makes the same hardware and database server investment scale much better than multiple identical systems that support transactions as well as download content.
Separating data that changes quickly from data that changes slowly improves performance and scalability in many contexts. A financial application should replicate historical data and price quotes, but centralize trades and customer accounts. A human resources application should replicate benefit plans and company policies, but centralize employee requests and individual benefits tracking. An online bookseller should replicate the Books in Print catalog, but centralize orders and customer preferences. Beyond simply separating static data from volatile data, you might consider a distributed database. In this architecture, you segment your data by geography, customer, product line, or some other criterion and then put each segment in its own database. If you choose your segmentation correctly, most of your database operations will access only one of the split databases. For example, real-estate databases segment very nicely by location; you can easily use one database for each state. Unfortunately, most corporate data does not separate quite as cleanly. In distributed databases, the scalability and performance risks arise from transactions that span more than one database: The transaction monitor that coordinates distributed transactions uses two-phase commit, which can be a major performance drain. Even worse is the potential for cross-database deadlocks. The bottom line is, be careful with distributed databases. If you have to divide a database, you may be better off splitting the relatively static data from the volatile data. Keep all the transactions and write locks in a single database containing all the volatile data. Beyond separating the static and volatile data, it makes sense to further partition data onto separate systems. As long as you can cache all the content or most of the data in memory, youre better off replicating the entire system, rather than partitioning. Replication lets you balance the load and support failover, providing a more flexible system than specialized servers would. With HTTP 1.1 persistent connections and pipelined requests, supplying as much of a session as possible from one server will improve client response time. Be Prepared Deploying a new e-business application on the exponentially expanding Web can be daunting, but the partitioning and replication techniques Ive described here can give you an edge. Overall, appropriate application partitioning gives you the best bang for your hardware buck, and replication eliminates bottlenecks and supports failover. A combination of these techniques ensures a cost-effective solution capable of handling phenomenal growth.
Copyright © 2004 CMP Media Inc. ALL RIGHTS RESERVED
| |