Intelligent Enterprise

Better Insight for Business Decisions

May 31, 2003

Center of the Universe: The IBM View

The vision of 21st century data management as told through Web-exclusive interviews with key strategists at IBM, Microsoft, and Oracle

by Ken North

IBM Distinguished Engineers Rob High, chief architect for the WebSphere Application Server family, and Nelson Mattos, director of information integration at the IBM Silicon Valley Laboratory, provided insights into IBM's view of SQL and XML integration, the importance of metadata, database and messaging integration, and the future of grid data access.

North: There's an ongoing debate about the impedance mismatch between objects, object-oriented languages and SQL databases because of different type systems, different computational models, and so on. Do XML, XQuery and XPath, which DB2 UDB supports, introduce yet another impedance mismatch for developers using object-oriented languages? Do we need an XML-centric database programming language?

Mattos: Any time the persistence layer (the database) and the application layer (the programming language) have different data models, computation models, type systems, type encodings, nationalization of values, value or object comparison semantics, or a host of other differences, there is an impedance mismatch. [The mismatch] isn't merely limited to differences such as data model, but includes many other environmental differences, such as data type encodings that are natural between different platforms. As a result there's virtually always a mismatch. The question is the degree of mismatch. Because XML is a more flexible data model than relational (semi-structured vs. structured), the mismatch between object languages and XML is typically smaller than between object languages and relational — but there's still a mismatch. An XML-centric programming language would further reduce that mismatch, but would not eliminate it entirely.
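
As an illustration of the degree of mismatch Mattos describes, the following is a minimal sketch, not anything from IBM or DB2. The `Order` class, the table names, and the element names are all hypothetical. It stores one nested object both ways: the relational form has to split the object across two tables, while the XML form keeps the nesting but still turns the integer quantity into text.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Order:
    # Hypothetical application object with a nested list of (sku, qty) pairs.
    order_id: int
    customer: str
    items: list = field(default_factory=list)


def to_relational(conn, order):
    """Flatten the nested object into two tables linked by order_id."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders "
                 "(order_id INTEGER PRIMARY KEY, customer TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS order_items "
                 "(order_id INTEGER, sku TEXT, qty INTEGER)")
    conn.execute("INSERT INTO orders VALUES (?, ?)",
                 (order.order_id, order.customer))
    conn.executemany("INSERT INTO order_items VALUES (?, ?, ?)",
                     [(order.order_id, sku, qty) for sku, qty in order.items])


def to_xml(order):
    """Nest the items inside the order element, mirroring the object shape."""
    root = ET.Element("order", id=str(order.order_id), customer=order.customer)
    for sku, qty in order.items:
        # Attribute values are strings, so the integer type is lost here.
        ET.SubElement(root, "item", sku=sku, qty=str(qty))
    return root


order = Order(1, "Acme", [("A-100", 2), ("B-200", 5)])
conn = sqlite3.connect(":memory:")
to_relational(conn, order)
doc = to_xml(order)
```

The XML version is closer to the object's shape, yet reading `qty` back still requires a conversion from text, which is one small instance of the residual mismatch.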

North: Do XML, XQuery, and XPath present an impedance mismatch problem for SQL?

Mattos: Absolutely. In general, trying to move an XML construct into a relational schema is a big problem. The differences in the data models (for example, set oriented vs. sequence oriented, value based vs. reference based, and strong schema vs. schema chaos) mean that hard choices must be made to store XML in relational systems. Most successful cases today restrict the XML to more structured subsets. Some vendors require that XML Schema must be present to use the full power of their XML support.

Enhancements in the SQL language and standard under the name SQL/XML are enabling XML to be better processed in relational systems, but fundamental differences in the models make this an ongoing challenge. That's one reason why our Xperanto project has a longer term focus on native XML support.
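
One of the "hard choices" mentioned above, set oriented vs. sequence oriented, can be sketched as follows. This is an illustrative assumption about how one might shred XML into rows, not a description of SQL/XML or of any vendor's implementation; the `steps` table and its `pos` column are hypothetical. Because a relational table has no inherent row order, document order has to be stored explicitly.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In XML, the order of sibling elements is part of the data.
doc = ET.fromstring(
    "<steps><step>mix</step><step>bake</step><step>cool</step></steps>"
)

conn = sqlite3.connect(":memory:")
# A table is an unordered set of rows, so a position column (hypothetical
# name: pos) is needed to preserve the sequence.
conn.execute("CREATE TABLE steps (pos INTEGER, text TEXT)")
for pos, element in enumerate(doc.findall("step"), start=1):
    conn.execute("INSERT INTO steps VALUES (?, ?)", (pos, element.text))

# Document order is only recoverable by sorting on the stored position.
rows = conn.execute("SELECT text FROM steps ORDER BY pos").fetchall()
ordered = [row[0] for row in rows]
```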

North: Lack of metadata is often a problem when querying disparate data sources or exchanging business intelligence data. In addition, large organizations may have an inventory of application objects that are not interoperable. What do XML and Web services (specifically the OMG's Meta-Object Facility and XML Metadata Interchange) contribute to solving those integration problems?

Mattos: While Web services give us a nice framework for connecting distributed components, we need metadata to discover existing components or services, decide how best to integrate them, and understand, once they're integrated, how they're being used. [That understanding] allows us to tune the composed system. In other words, the coupling of Web services and XML with the emerging metadata frameworks gives us the hope of easier application creation and deployment and more dynamic creation and tuning of services. Metadata is often considered passive. The real value, however, isn't only in having descriptions but in what you can conceivably do with the descriptions.

North: IBM is a strong advocate of grid computing and open grid services, which divide large problems and distribute them for processing across many computers. Is this model a problem or a solution for querying very large databases?

High: The issues of data and computational locality are essentially no different in a grid than they are in any other distributed systems architecture. As a rule, you'll gain maximum performance when data is located close to the computational unit that needs it. This needs to be balanced against the architectural requirement to share the same data amongst multiple computational units, if there is such a requirement. So grids tend to introduce new problems for computing the optimal partitioning and distribution of work and data. And they can intensify the need for accessing diverse data over a distributed environment.

Mattos: The key concept in grid computing is virtualization — the ability to hide the heterogeneity of the environment and allow individual resources to plug into and out of the grid as needed. As we relate that to data, the key issues are enabling access to diverse and distributed data, integrating it in a meaningful way for an application, and placing or distributing it for optimal performance.

The virtualization of individual data resources is being addressed through the GGF workgroup defining the Data Access and Integration Specification [DAIS]. DAIS will be an open standard, based on Web services, meant to accommodate multiple database paradigms. The ability to integrate diverse data is handled through federation technology, such as that introduced with DB2 Information Integrator, which enables diverse and distributed data to be accessed as though it were a single resource, regardless of where the data resides. Replication and caching technologies enable data placement across the grid to locate data in proximity to the computational unit that needs it. Clustered systems such as DB2 UDB manage the transparent movement of data to computing power. [For grids] this movement must be handled on a much larger scale and with far greater automation than is done today.
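
The idea of federation, querying separate stores "as though [they] were a single resource," can be shown in miniature. The sketch below is only a toy analogy using SQLite's `ATTACH DATABASE`; it does not use or resemble the DB2 Information Integrator API, and the `east` and `west` databases and the `stock` table are hypothetical.

```python
import sqlite3

# One connection stands in for the federated access point.
conn = sqlite3.connect(":memory:")

# Two independent in-memory databases play the role of distributed sources.
conn.execute("ATTACH DATABASE ':memory:' AS east")
conn.execute("ATTACH DATABASE ':memory:' AS west")

sources = (
    ("east", [("widget", 10)]),
    ("west", [("widget", 7), ("gadget", 3)]),
)
for region, rows in sources:
    conn.execute(f"CREATE TABLE {region}.stock (product TEXT, qty INTEGER)")
    conn.executemany(f"INSERT INTO {region}.stock VALUES (?, ?)", rows)


def total_stock(conn):
    """Answer one query across both sources as if they were one table."""
    query = """
        SELECT product, SUM(qty) FROM (
            SELECT product, qty FROM east.stock
            UNION ALL
            SELECT product, qty FROM west.stock
        ) GROUP BY product ORDER BY product
    """
    return conn.execute(query).fetchall()


totals = total_stock(conn)
```

The caller of `total_stock` never needs to know which source holds which rows. A real federated system must also handle differing schemas, remote access, and query optimization, none of which this sketch attempts.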

North: What's the relationship between the grid computing model and model-driven architectures (MDAs)? Can you easily express an information model as containing objects that will be distributed across a grid and dynamically load balanced?

High: I don't think the relationship between MDA and grid computing is particularly unique. The most common mistake made in distributed application design is not recognizing or appreciating the importance of choosing an appropriate level of granularity in the partitioning of application components. Every distribution boundary in an application is subject to potential latency, failure, and attack that are orders of magnitude greater than what you would normally encounter in a simple function call within the same address space. The potential for these vulnerabilities in a local function call is small enough to ignore. These same potential vulnerabilities in a distributed system are big enough that you can't ignore them.

On the other hand, distributed computing offers tremendous opportunities to maximize the utilization of your available computing resources and to localize parts of the application to where they need to be: close to the end user or close to the data, for example. So you need to understand these issues at some point in the modeling of the application. The desire to reduce the potential for latency and vulnerability motivates a coarser level of distributed component granularity. The desire to increase parallel exploitation of available resources motivates a finer level of distributed component granularity. The tension between these two desires can usually be balanced to achieve an optimal component design model. These same issues apply equally to grid computing systems.
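
The tension between coarse and fine granularity can be made concrete with a deliberately simple cost model. The model is an assumption introduced here for illustration, not something High proposes: a job of `work` seconds is split into `parts` equal components that run fully in parallel, and each distribution boundary adds a fixed `latency` cost that accumulates serially.

```python
def elapsed(work, latency, parts):
    """Estimated elapsed time under the assumed model:
    parallel compute time plus one boundary cost per component."""
    return work / parts + latency * parts


def best_granularity(work, latency, max_parts):
    """Return the number of components that minimizes elapsed time."""
    return min(range(1, max_parts + 1),
               key=lambda parts: elapsed(work, latency, parts))


# Cheap boundaries favor finer granularity; costly ones favor coarser.
fine = best_granularity(work=100.0, latency=1.0, max_parts=50)
coarse = best_granularity(work=100.0, latency=4.0, max_parts=50)
```

Under this model the optimum sits near the square root of `work / latency`, so quadrupling the boundary cost halves the best number of components. Real systems also have to weigh failure and security exposure at each boundary, which this sketch ignores.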

North: Over the next few years, developers will be involved with XML, Web services, grid services, and objects in a distributed environment. Messaging will be an integral part of the application infrastructure. How important is it for messaging software and queues to be tightly bound with databases?

High: There are some interesting corollaries between reliable messaging based on persistent message queues and distributed databases. However, that's probably not a critical rationale for tightly integrating messaging and databases. What may be more compelling is the potential relationship of distributed notification systems for signaling state transitions in loosely coupled, stateful services. The messaging system can be used to register interest in state transitions. Those transitions can drive cascading functional and business processes. We can, for example, trigger risk calculations from registered thresholds on product balances.
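
High's example of triggering risk calculations from registered thresholds on product balances might look roughly like the sketch below. It is an in-process stand-in for a distributed notification system; the `BalanceNotifier` class, its method names, and the "crossed upward" rule are all assumptions made for illustration.

```python
class BalanceNotifier:
    """Hypothetical registry of interest in balance state transitions."""

    def __init__(self):
        self._balances = {}
        self._subscriptions = []  # (product, threshold, callback) tuples

    def register(self, product, threshold, callback):
        """Register interest in a product's balance reaching a threshold."""
        self._subscriptions.append((product, threshold, callback))

    def update(self, product, balance):
        """Record a new balance and notify on upward threshold crossings."""
        previous = self._balances.get(product, 0)
        self._balances[product] = balance
        for sub_product, threshold, callback in self._subscriptions:
            # Fire only on the transition itself, not on every later update.
            if sub_product == product and previous < threshold <= balance:
                callback(product, balance)


risk_runs = []  # stands in for the downstream risk calculation
notifier = BalanceNotifier()
notifier.register("bond-fund", 1000,
                  lambda product, balance: risk_runs.append((product, balance)))

notifier.update("bond-fund", 500)    # below threshold: no notification
notifier.update("bond-fund", 1200)   # crosses 1000: notification fires
notifier.update("bond-fund", 1500)   # already above: no new notification
```

Notifying on the transition rather than on the state keeps downstream processes from being triggered repeatedly while the balance stays above the threshold.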

North: How important is it to have messaging software and queues tightly bound with shared object repositories?

High: The tight integration of message queuing systems and distributed components will become increasingly important for a broad class of applications in traditional distributed computing and grid computing systems. Message queuing is an important enabler of asynchronous parallelism between collaborating components and is critical for enabling the scalability of grid computing.
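
The role of message queuing as an enabler of asynchronous parallelism can be sketched with Python's standard `queue` and `threading` modules. This is a single-process analogy rather than a distributed message queuing product, and the `worker` and `run` functions are hypothetical names. The producer puts work on a queue and does not wait for any particular consumer.

```python
import queue
import threading


def worker(tasks, results):
    """Consume tasks until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:
            break
        results.put(item * item)  # placeholder for real work


def run(values, n_workers=3):
    """Distribute a list of values to workers through a shared queue."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for value in values:
        tasks.put(value)      # the producer never blocks on a consumer
    for _ in threads:
        tasks.put(None)       # one sentinel per worker
    for thread in threads:
        thread.join()
    # Completion order is not deterministic, so sort before returning.
    return sorted(results.get() for _ in values)


squares = run([1, 2, 3, 4])
```

Because producer and consumers are coupled only through the queue, workers can be added or removed without changing the producer, which is the property that matters for scaling.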







