Thirty Years of Relational: Extending the Relational Model

When's an extension not an extension?

By C. J. Date

Since its inception, the relational model has been the target of an unusually high degree of criticism. To be more specific, many claims have been made over the years (a) to the effect that the model is seriously deficient in some respect or another, and accordingly (b) that it therefore needs to be extended in some way. In this installment, I want to examine this business of "extending the relational model" in some detail; in particular, I want to take a brief look at Codd's own extended version known as RM/T.

Bogus vs. Genuine Extensions

Some claims of deficiency in the model are valid, others aren't. As a consequence, some proposed extensions to the model are genuine, meaning that they do truly serve to add useful functionality; others, however, are bogus, meaning either (a) that they don't add any new functionality or (b) that the functionality they do add isn't useful. Examples of genuine extensions include such things as the

Several SQL vendors have attempted (with varying degrees of success, I might add) to extend their SQL products to incorporate some kind of object functionality. They then go on to claim that their extended products thus support an "extended" version of the relational model, which they refer to as the "object/relational model" (O/R model for short). But this claim is absurd! As Hugh Darwen and I have shown in The Third Manifesto [2], object functionality and the relational model are completely orthogonal to one another. To quote: "The relational model needs no extension, no correction, no subsumption, and above all no perversion, in order [to support object functionality]." All that's needed is to support relational domains properly (which SQL never did), recognizing those domains for what they are, which is basically just abstract data types (ADTs), With All That That Entails. In other words, the so-called O/R model is just the relational model, pure and simple; there aren't any (genuine) "relational model extensions" involved at all.

The RM/T Paper: Basics

Let's turn to some interesting genuine extensions to the model. In 1979, Codd published yet another important paper, this one entitled "Extending the Relational Database Model to Capture More Meaning." [3] I'll refer to this paper as the RM/T paper, for reasons that will quickly become clear. As the title suggests, the primary purpose of the RM/T paper was to suggest a set of "semantic" extensions to the original model; however, it began by summarizing the basic model (as of 1979), and I'd like to make a few remarks in that regard first before getting into details on the proposed extensions.

First of all, I believe I'm right in saying that the RM/T paper was actually the first of Codd's papers to include an explicit definition of the term relational model! Here is that definition:

1. A collection of time-varying tabular relations (with the properties cited above -- note especially the keys and domains)
2. The insert-update-delete rules (Rules 1 and 2 cited above)
3. The relational algebra described... below.

As an aside, I have a few comments on this definition:

End of aside.

The RM/T paper was also the first by Codd to make explicit mention of the idea of relational assignment. However, the mention occurs only in connection with the proposed semantic extensions; it isn't part of the "basic relational model" definition given above, though it's certainly part of the model as now commonly understood. Moreover, there's no discussion of the fact that

Third, the paper also has this to say: "Closely associated with the relational model are various [semantic] concepts... Examples are... nonloss (natural) joins and functional dependencies, multivalued dependencies, and normal forms." Here, then, we have a clear statement of Codd's position that these matters are to be seen as separate from the model per se (though I think he might subsequently have changed his mind on this point [4]).

Fourth, the RM/T paper was also the first in which Codd embraced the idea of surrogates -- that is, system-assigned identifiers. (Again, the concept is brought in only in connection with the proposed semantic extensions, but there's no reason why it can't be used with the basic model, and indeed there are often good arguments in favor of doing so.) Unfortunately, however, the paper states that surrogates must be hidden from users -- a clear violation of the paper's own earlier definition of a relational database, which says, to paraphrase, that all data in the database must be accessible to (authorized) users. In fact, an argument could be made that hiding surrogates constitutes a violation of Codd's own Information Principle, which states that all information in the database must be cast explicitly in terms of values in relations and in no other way. (Just as an aside, let me remind you that -- as we saw last month -- relations per se are the only essential data construct allowed in a relational database. If I now add that relations are also the only allowable inessential construct, then what we wind up with is effectively a statement of the Information Principle.)

Finally, the RM/T paper devotes a brief (too brief) section to the relationship between the relational model and predicate logic: "A database [is] a set of [propositions] in first-order predicate logic... [We can] factor out the predicate common to a set of simple [propositions] and then treat the [propositions] as an... n-ary relation and the predicate as the name of the relation." Codd goes on to refer to the "propositions" portion of the database as the extension and the "predicates" portion as the intension (extension and intension here being technical terms from logic). "One may ... view the intension as a set of integrity constraints." And he briefly discusses the closed vs. open world interpretations. (Under the closed interpretation, the omission of a given row from a given relation means the corresponding proposition is false; under the open interpretation, it means we don't know whether it's true or false.)

The RM/T Paper: Extensions

As I've already indicated, the bulk of reference [3] is concerned with an extended version of the relational model called RM/T ("T for Tasmania, where these ideas were first presented"). It opens with some nice preliminary remarks on the matter of semantic extensions and "semantic data modeling" in general:

(What a pleasing contrast to the exaggerated claims so often encountered in the semantic modeling field!)
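
As a brief illustration of the closed vs. open world interpretations described in the discussion of the basic model, here is a minimal Python sketch. It is not taken from Codd's paper: the predicate ("supplier S is located in city C"), the sample rows, and the function names holds_closed and holds_open are all hypothetical, and using None to stand for "unknown" is simply one convenient assumption.

```python
from typing import Optional, Set, Tuple

Row = Tuple[str, str]

# Hypothetical predicate: "Supplier S is located in city C".
# The relation holds the rows (propositions) currently asserted to be true.
supplier_city: Set[Row] = {("S1", "London"), ("S2", "Paris")}


def holds_closed(relation: Set[Row], row: Row) -> bool:
    """Closed world: a row absent from the relation is taken to be false."""
    return row in relation


def holds_open(relation: Set[Row], row: Row) -> Optional[bool]:
    """Open world: a row absent from the relation is unknown (None here)."""
    return True if row in relation else None


print(holds_closed(supplier_city, ("S1", "London")))  # True
print(holds_closed(supplier_city, ("S3", "Athens")))  # False
print(holds_open(supplier_city, ("S3", "Athens")))    # None
```

The only difference between the two functions is how an omitted row is treated, which is exactly the distinction drawn in the text: false under the closed interpretation, unknown under the open one.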

Later, Codd makes another good point:

Nice analogy!

To turn now to RM/T specifically: RM/T generally falls into the same broad category as the rather better known "entity/relationship model" (E/R model for short) [5]. Even if never implemented, therefore (and to my knowledge it never has been), it can still serve -- just as the E/R model can -- as the basis for a systematic database design methodology; in fact, I personally prefer it to the E/R model for this purpose, since I find it to be more precisely specified. Some immediate differences between the two are as follows:

2. The structural and integrity aspects of RM/T are more extensive, and more precisely defined, than those of the E/R model.
3. RM/T includes its own special operators, over and above the operators of the basic relational model (though much additional work remains to be done in this last area).

In outline, RM/T works as follows:

2. A variety of relationships can exist among entities; for example, entity types A and B might be linked together in an association (RM/T's term for a many-to-many relationship), or entity type Y might be a subtype of entity type X. RM/T includes a formal catalog structure by which such relationships can be made known to the system. The system is thus capable of enforcing the various integrity constraints that are implied by the existence of such relationships.
3. As already mentioned, a number of high-level operators are provided to facilitate the manipulation of the various RM/T objects (E-relations, P-relations, catalog relations, and so forth).

RM/T also provides an entity classification scheme, which in many respects constitutes the most significant aspect -- or, at least, the most immediately visible aspect -- of the entire model. To be more specific, entities are classified (though only informally, please note) into three categories, called kernels, characteristics, and associations:

In addition:

The foregoing concepts can be related (somewhat loosely) to their E/R analogs as follows: A kernel corresponds to an E/R "regular entity"; a characteristic to an E/R "weak entity"; and an association to an E/R "relationship" (many-to-many variety only).

Note: In addition to the aspects discussed briefly above, RM/T also includes support for (a) the time dimension and (b) various kinds of data aggregation. For more detailed discussions, see Codd's original paper [3] or my own tutorial description of RM/T [6].

References

1. Date, C. J. "Don't Mix Pointers and Relations!" and "Don't Mix Pointers and Relations -- Please!". In C. J. Date, Hugh Darwen, and David McGoveran: Relational Database Writings 1994-1997. Reading, Mass.: Addison-Wesley, 1998.
2. Date, C. J. and Hugh Darwen. Foundation for Object/Relational Databases: The Third Manifesto. Reading, Mass.: Addison-Wesley, 1998.
3. Codd, E. F. "Extending the Relational Database Model to Capture More Meaning." IBM Research Report RJ2599 (August 6th, 1979). Republished in ACM Transactions on Database Systems 4(4), December 1979.
4. Codd, E. F. The Relational Model For Database Management Version 2. Reading, Mass.: Addison-Wesley, 1990.
5. Pin-Shan Chen, P. "The Entity-Relationship Model -- Toward a Unified View of Data." ACM Transactions on Database Systems 1(1), March 1976. Republished in Michael Stonebraker (ed.): Readings in Database Systems (2nd edition). San Mateo, Calif.: Morgan Kaufmann, 1994.
6. Date, C. J. "The Extended Relational Model RM/T." In C. J. Date, Relational Database Writings 1991-1994. Reading, Mass.: Addison-Wesley, 1995.
7. Date, C. J. and Hugh Darwen. A Guide to the SQL Standard (4th edition). Reading, Mass.: Addison-Wesley, 1997.

C. J. Date is an independent author, lecturer, researcher, and consultant, specializing in relational database systems. His most recent books are Foundation for Object/Relational Databases: The Third Manifesto, coauthored with Hugh Darwen, and Relational Database Writings 1994-1997, both published by Addison-Wesley in 1998. Correspondence may be sent to him in care of Intelligent Enterprise, iemagazine@mfi.com.