Intelligent Enterprise: Better Insight for Business Decisions

Breakthrough Analysis: A Data Space for Information Coexistence


Rather than trying to force conformity, a "data space" might allow disparate information to co-exist.


By Seth Grimes
May 1, 2006

Sixty years of computing have created the tantalizing prospect of "360-degree views" and "total information awareness." These notions of knowing everything about a subject, regardless of information source or form, are compelling. They seek nothing short of complete predictability through comprehensive knowledge--a consolidated store of information available to any organization that can pay the freight. But you'll never get to this Memory Alpha (for you Trekkies) if the data you need is dispersed across operational databases and data warehouses, dozens of spreadsheets and thousands (or more) of documents on PDAs, desktop machines, servers and the Web. That's where the diversity-embracing "data space" abstraction comes into play.

Three computer scientists, writing in the December 2005 SIGMOD Record, proposed the term data space to describe the collection of disparate information that represents and is used by individuals and organizations. The concept isn't new--I've been tracking use of the term by another computer scientist, Professor Robert Grossman, for several years. The authors nonetheless offer a far-reaching "new agenda for data management." Their framework treats queryable, structured databases and searchable, "unstructured" documents as a single view, unified by the highest possible degree of semantic and administrative integration.

But where Professor Grossman defines data space as the contents of the nodes of a "data web" assembled for distributed data mining, the SIGMOD Record authors, Michael Franklin, Alon Halevy and David Maier, go further in proposing the development of generalized Data Space Support Platforms (DSSPs). DSSPs would insulate the user from the challenges otherwise inherent in accessing diversely formatted, described and managed data available via disparate but interrelated services. Grossman's work is practical: it focuses on crafting high-bandwidth protocols that operate over grids and conventional networks, with the specific goal of enabling distributed data mining. Franklin, Halevy and Maier aim at something more abstract and general.

Their analysis acknowledges that much of the information we use is outside our administrative control. It's in someone else's database or files. It's described by someone else's metadata schema (or none at all) and therefore possesses a low level of semantic integration (or common definitions) with other information that interests us. These are the conditions that launched Google and the other search giants. It's hard to find documents and even harder to find meaning, whether on your desktop or on the Internet. Per Franklin, Halevy and Maier, we should move toward data coexistence rather than enforced conformity.
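To make "coexistence rather than conformity" concrete, here is a minimal Python sketch. It is not drawn from the SIGMOD Record paper or from any real platform: the `Source` and `DataSpace` classes, their methods and the sample data are all hypothetical names invented for illustration. The point is only that sources are registered as they are, with or without a schema, and a best-effort keyword search still spans all of them.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class Source:
    """One participant in the data space (hypothetical type).

    `schema` is None when the owner supplies no metadata at all.
    """
    name: str
    items: List[Any]
    schema: Optional[List[str]] = None


@dataclass
class DataSpace:
    """Hypothetical catalog: sources are registered as-is, never reshaped."""
    sources: List[Source] = field(default_factory=list)

    def register(self, source: Source) -> None:
        self.sources.append(source)

    def search(self, keyword: str) -> List[Tuple[str, Any]]:
        """Best-effort keyword search over every source, structured or not."""
        needle = keyword.lower()
        hits = []
        for source in self.sources:
            for item in source.items:
                # Structured rows are dicts; documents are plain strings.
                if isinstance(item, dict):
                    text = " ".join(str(v) for v in item.values())
                else:
                    text = str(item)
                if needle in text.lower():
                    hits.append((source.name, item))
        return hits


space = DataSpace()
# A structured source that comes with a schema (sample data is made up).
space.register(Source("crm_db", [{"customer": "Acme", "city": "Boston"}],
                      schema=["customer", "city"]))
# An unstructured source with no metadata at all.
space.register(Source("memos", ["Call Acme about the renewal.",
                                "Lunch on Friday."]))
results = space.search("acme")
```

Here `results` holds one hit from the database row and one from the memo, even though the two sources share no schema. A real platform would presumably layer richer queries on top wherever more semantic integration exists; this sketch stops at the lowest common denominator.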

If we can't pull all the information we need into our own, semantically uniform databases, how about pushing capabilities out to the dizzying array of devices that make up evolving data webs? Why not put everything into databases or under DBMS control? Add in support for complex processing (such as data mining) and workflow management so you can use your DBMS to orchestrate distributed computing processes. This is the current version of the dream that object-relational databases, the supposed next great wave in database technology in the mid-'90s, were designed to enable.

Writing in Queue magazine last year, database and transaction-processing pioneer Jim Gray of Microsoft Research asserted that DBMSs can do it all. They can host diverse data types as well as abstractions such as data cubes. They can optimize complex queries, make sense of workflows, process semistructured and streaming data, and embed specialized code written in portable programming languages. With further advances in handling inexact, approximate reasoning and in structuring databases to offer Web services, we'll be able to distribute robust, DBMS-centered operating platforms to servers, desktops and other devices to integrate data stores and mediate access.
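Two of those claims, hosting semistructured data and embedding code written in a general-purpose language, can be tried on a small scale. The sketch below is my own illustration, not an example from Gray's article: it uses SQLite through Python's standard `sqlite3` module as a stand-in DBMS, and the `orders` table, the `json_field` function and the sample rows are assumptions made up for the purpose.

```python
import json
import sqlite3


def json_field(doc, key):
    """Return one key from a JSON document, or None if the key is absent."""
    return json.loads(doc).get(key)


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL queries can call it by name.
conn.create_function("json_field", 2, json_field)

conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, details TEXT)"
)
# `details` holds semistructured JSON whose keys vary from row to row.
conn.executemany(
    "INSERT INTO orders (customer, details) VALUES (?, ?)",
    [
        ("Acme", json.dumps({"item": "widget", "qty": 3})),
        ("Globex", json.dumps({"item": "gear"})),
    ],
)

# One SQL query mixes an ordinary column with embedded Python logic.
rows = conn.execute(
    "SELECT customer, json_field(details, 'qty') FROM orders ORDER BY id"
).fetchall()
conn.close()
```

The query returns a quantity for the first order and a NULL for the second, which has no `qty` key. This is a long way from a distributed, DBMS-centered operating platform, but it shows the basic mechanism of a database engine calling out to user-supplied code.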

My take is that a network of interconnected database environments would make an ideal data web, but one that will never come close to being fully realized. The programmer in me says that practical, task-oriented approaches like Grossman's are the way to get stuff done. Regardless of how it's realized, the data-space concept provides an excellent framework for working toward robust knowledge networks.

Seth Grimes is a principal of Alta Plana Corp., a Washington, D.C.-based consultancy specializing in large-scale analytic computing systems. Write to him at grimes@altaplana.com.

