August 03, 1999, Volume 2 - Number 11
XML Again
A practical view of XML's importance
OK, we know we missed the extensible markup language (XML) issue. But let's face it, we're CIOs, and CIOs are always a little late. Besides which, that issue covered mostly what XML is and why it's likely to be successful, but not really why CIOs should care. We'll talk here about why XML is important, what problems it solves, what it doesn't address, and what organizations need to do to take advantage of it.
Actually, we weren't late, we were just busy. Both of us have been working on implementing XML standards for the property and casualty (P&C) insurance industry. P&C is probably a good example of an industry that understands XML's importance. E-commerce has never really taken off before in the insurance space, and XML holds the promise of revolutionizing an industry, influencing not only the way business will be conducted, but who the likely winners and losers will be.
For a technology with as far-reaching an impact as XML is beginning to have, it's actually pretty simple. But then again, so was HTML. The power of XML, as with HTML before it, is how little people actually have to agree on to use it. Let's face it, XML is pretty much the only thing other than history that IBM, Microsoft, and Sun Microsystems all agree on.
Humble Beginnings
XML started as a way of describing content on a publishable page in much the same way that HTML defines appearance. This approach lets publishers better separate the underlying content of a page from its presentation, which is a huge boon when publishing complex Web sites in which consistency in appearance is a plus. In this regard, XML and extensible stylesheet language (XSL), the mapping language primarily used to format XML and an XML grammar itself, are logical extensions of current-generation HTML and cascading style sheets (CSS). These appeared with the recognized need to give site publishers, rather than simply browsers, more control over information presentation. Indeed, this combination of XML and XSL has been described as "CSS on steroids." But there is significantly more going on here than a better way to describe a Web page.
Or less.
XML provides a simplified (from SGML), flexible, and defined way to describe information hierarchically. It turns out that this is pretty close to an ideal mechanism for facilitating rich information exchange among machines as well as among machines and humans. And this capability is the core requirement for successful e-commerce.
E-Commerce at Last
E-commerce has been around for a long time, ever since it first occurred to someone that printing something from a computer, sending it to someone else, and reentering the information into their computer was less than optimally efficient. But e-commerce has hardly been pervasive. In fact, in most regards, pre-Internet e-commerce has been pretty much a flop. E-commerce has been so uncompelling in some industries, such as insurance, that it's embarrassingly commonplace to see standard electronic data interchange (EDI) transmissions received, printed, scanned, displayed, and rekeyed.
The problem is that EDI transmission standards are overly rigorous and onerous. Consider the simple problem of representing a name and address. To achieve an EDI standard, we first need agreement on how we will represent a name. Will it be one field (Name) or five (Prefix, FirstName, MiddleName, LastName, and Suffix)? That structure works pretty well in the United States, at least in the English-speaking part, but it certainly doesn't fare as well internationally. And even if we agree on the more complex U.S. structure, or an even more complex international structure, we have to agree on the length of each of the fields, the order in which they will be presented, and the permitted and required error checking. Not to mention that the parties who must agree on these issues are typically competitors, or at least protagonists, within an industry. And address is a lot harder to standardize on than name is. What usually happens is that the software vendors and industry participants lose interest in the wait and develop multiple competing, proprietary standards, which take even longer to resolve.
XML radically simplifies the EDI and e-commerce situation in two major ways. First, XML reduces the scope of the argument. In one round of the Vietnam peace negotiations in Paris in the 1960s, the group spent several weeks arguing about the proposed shape of the table at which the negotiations would be conducted. One group held for a square table, the other for a round one. No one remembers the actual table shape used, but several hundred additional people died during the extra time it took to resolve this issue. With XML we can define name pretty simply:
<NAME>
John Q Public
</NAME>
To support the more complex semantics, we simply allow the additional fields on a nested basis and make them optional:
<NAME>
<FIRSTNAME>John</FIRSTNAME>
<MIDDLENAME>Q</MIDDLENAME>
<LASTNAME>Public</LASTNAME>
</NAME>
The magic is that both forms work at the same time. By supporting multiple representations, we don't have to argue about them. But what's even more important is the second way that XML simplifies the situation, and it concerns what we don't have to argue about at all: field presentation details. We don't have to care about order or field length or blank padding. Because the data structure is self-describing, it's pretty easy for the receiving program to figure out what it has, and because bandwidth is relatively free, the overhead of the extra text tags is no longer meaningful. This isn't multimedia; it's just text. No more table shape arguments. Moreover, with XML as the interchange format, many of the constructs that had to be developed de novo for each EDI implementation can be borrowed from other XML work or simplified using native XML features, so each industry doesn't have to develop them over and over again. Consider, for example, relationship and containment mechanisms for associating XML entities. Where these would have to be explicitly defined for an EDI transaction, they can be implicitly assembled using XML. One fewer thing to argue about.
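To make the "both forms work at the same time" point concrete, here is a minimal sketch of a receiving program, written in Python with its standard XML parser. The function name read_name and the choice of display order are our assumptions for illustration, not part of any standard. Note that the nested sample deliberately sends the fields out of order.

```python
import xml.etree.ElementTree as ET

SIMPLE = "<NAME>John Q Public</NAME>"
NESTED = (
    "<NAME>"
    "<LASTNAME>Public</LASTNAME>"
    "<FIRSTNAME>John</FIRSTNAME>"
    "<MIDDLENAME>Q</MIDDLENAME>"
    "</NAME>"
)

# Display order is chosen by the receiver, not by the sender's element order.
PARTS = ("FIRSTNAME", "MIDDLENAME", "LASTNAME")


def read_name(xml_text):
    """Return a display name from either representation of <NAME>."""
    name = ET.fromstring(xml_text)
    if len(name) == 0:
        # Simple form: the whole name is the element's text.
        return " ".join((name.text or "").split())
    # Nested form: look each part up by tag; missing parts are skipped.
    found = (name.findtext(tag) for tag in PARTS)
    return " ".join(part.strip() for part in found if part and part.strip())


print(read_name(SIMPLE))
print(read_name(NESTED))
```

Nothing in the sketch depends on field length, padding, or the order in which the sender wrote the elements; the tags carry that information.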
XSL Magic
Earl Weaver, the fabled Baltimore Orioles manager, once described the three-run home run as the most underrated weapon in baseball. XSL is the three-run homer of XML. XSL was developed to take an XML tree and transform it into an HTML tree for presentation purposes, preserving the programmatic access to the underlying XML. That means it looks like HTML to a human observer, but it still looks like XML to a computer reading the file. To do this, XSL provides an incredibly smart, rich set of tree-mapping and transformation tools, as well as an integral extension mechanism through Java and scripting. And it's built in (albeit with a few bugs) to the current generation of Internet Web browsers and distributed for free. But there is no requirement that you use XSL to generate HTML. XSL can generate any well-defined tree structure, so it is ideal for mapping one XML expression into another. Moreover, XSL can run multiple passes, so you can map one XML structure to another in one pass and then to HTML in a second. What this means is that, given a basic XML DTD vocabulary for an industry, organizations can start developing with XML even before standards are fully defined, knowing they can map their internal structures to the standards as they emerge and preserve any work they do in advance of standards formalization. This ability is critically important, because it provides a de facto mechanism for remapping that should minimize the temptation to proliferate competing standards without discouraging innovation.
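The two-pass idea can be sketched without any XSL at all. The example below is plain Python standing in for an XSL stylesheet, not XSL itself, and the internal tag names (cust, fname, mname, lname) are hypothetical. Pass one maps an internal structure onto the shared NAME vocabulary; pass two maps that onto HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical internal record; these tag names are illustrative only.
INTERNAL = "<cust><fname>John</fname><mname>Q</mname><lname>Public</lname></cust>"

# Pass 1 mapping table: internal tag -> shared-vocabulary tag.
TAG_MAP = {"fname": "FIRSTNAME", "mname": "MIDDLENAME", "lname": "LASTNAME"}


def to_standard(internal):
    """Pass 1: map the internal tree onto the shared NAME vocabulary."""
    name = ET.Element("NAME")
    for child in internal:
        target = TAG_MAP.get(child.tag)
        if target is not None:
            ET.SubElement(name, target).text = child.text
    return name


def to_html(name):
    """Pass 2: map the shared NAME tree onto an HTML fragment."""
    para = ET.Element("p")
    para.text = " ".join(child.text for child in name if child.text)
    return para


standard = to_standard(ET.fromstring(INTERNAL))
html = to_html(standard)
print(ET.tostring(standard, encoding="unicode"))
print(ET.tostring(html, encoding="unicode"))
```

If the industry vocabulary changes as the standard firms up, only the mapping table in pass one needs to change; the internal structure and the presentation pass are untouched. That is the sense in which early work is preserved.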
Whose XML?
XML, of course, is not a panacea. Mapping standardized external information to legacy systems environments is not trivial, although a few vendors are beginning to offer products and services that make it a lot easier to do. The key issues here are mapping to the ill-defined data structures prevalent in many legacy environments, as well as the difficulty of interfacing realtime systems to fundamentally batch environments. To truly take advantage of e-commerce, many companies will be forced to substantially replace many of their core systems. The key factors for taking advantage of this technology today are twofold: getting started and working on vertical XML vocabularies. Getting started presents the usual obstacles; moreover, the rigor imposed by using XML rather than HTML development may cause some problems, especially where marketing groups are still organizing Web sites.
As for vertical standards, the World Wide Web Consortium (W3C) has so far opted to stay out of the fray, so there is no meta-XML standards group. Cross-industry organizations such as the Open Applications Group are helping to fill the void, but the real role will fall to the industry standards organizations. These groups will need to adapt to the increased demands of life on Internet time, with radically shortened development and approval processes and much more sensitivity to widespread application deployment. Companies should be stepping up their participation in their industry standards groups as well as keeping watch on a few noteworthy industry groups such as ACORD for the insurance industry. We know we will.
John Trustman is a principal of Delta Technologies Inc., a consulting firm specializing in corporate software development with a focus on very large systems and Internet delivery. You can reach him at jwt@trustman.com.
Susan Meshako is the CIO of National Grange Mutual, an East Coast P&C Insurance Co. You can reach her at meshakos@ngmmail.msagroup.com.