
Intelligent Enterprise

Better Insight for Business Decisions




Mutability


Are mutators necessary?

by C.J. Date

One of the most important of the many logical differences identified in The Third Manifesto [1] is that between values and variables. Just to remind you, here are the definitions [2]:

· A value is an "individual constant" (for example, the individual constant 3). Values have no location in time or space; however, they can be represented in memory by some encoding, and of course those representations or encodings do have locations in time and space (see the next bullet item).

· A variable is a holder for the representation or encoding of a value. Variables do have location in time and space.

And the crucial point is this: a value, by definition, can't be updated (for if it could, then after such an update it wouldn't be that value any longer); a variable, by contrast, certainly can be updated (after all, that's what "variable" means).

As a direct consequence of the foregoing, we can immediately identify yet another important logical difference: namely, that between read-only and update operators. I did discuss this distinction briefly a while back when I was laying the groundwork for this inheritance series [3,4], but I didn't really elaborate on it at that time; now I need to do so.

Read-Only vs. Update Operators

Note: Some of the material of this section is repeated from references [3] and [4].

The basic distinctions between read-only and update operators should be pretty much self-explanatory, but let me spell them out anyway. First, an update operator is an operator for which at least one argument must be specified as a variable specifically, instead of as some arbitrary expression, and invocation causes values to be assigned to such arguments (at least potentially); parameters corresponding to such arguments are said to be subject to update. A read-only operator is an operator that's not an update operator.

Here are a couple of examples. First a read-only operator, ABS ("absolute value of"):

OPERATOR ABS ( Z RATIONAL ) RETURNS RATIONAL ;
   RETURN CASE
             WHEN Z ≥ 0.0 THEN +Z
             WHEN Z < 0.0 THEN -Z
          END CASE ;
END OPERATOR ;

And here's an update operator, REFLECT:

OPERATOR REFLECT ( P POINT ) UPDATES P ;
   BEGIN ;
      THE_X ( P ) := - THE_X ( P ) ;
      THE_Y ( P ) := - THE_Y ( P ) ;
      RETURN ;
   END ;
END OPERATOR ;

REFLECT effectively moves the point with Cartesian coordinates (x,y) to the inverse position (-x,-y).

Observe now that invoking a read-only operator returns a result, while invoking an update operator doesn't -- at least, not in the model defined in The Third Manifesto [1]. Thus, the ABS definition includes a RETURNS clause, while the REFLECT definition includes an UPDATES clause instead; furthermore, the RETURN statement for ABS specifies a return value, while that for REFLECT doesn't.

As a direct consequence of the foregoing, an invocation of ABS, say ABS(X+Y), has a value. Such an invocation thus constitutes a legal expression in its own right, and so can be nested inside other expressions, as in this example:

Z * ABS ( X + Y )

By contrast, an invocation of REFLECT doesn't have a value; thus, it doesn't constitute a legal expression in its own right, and it can't be nested inside expressions. To invoke REFLECT, therefore, we have to use a separate CALL statement (or something logically equivalent to such a statement) -- for example:

CALL REFLECT ( Q8 ) ;
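For readers more at home in a general-purpose language, the distinction can be sketched in Python. This is only an analogy, not Tutorial D: Python does not enforce the read-only/update split, so the convention that an update operator returns nothing is imposed by hand here. The names PointVar, abs_value, and reflect are hypothetical stand-ins for a POINT variable, ABS, and REFLECT.

```python
from dataclasses import dataclass


@dataclass
class PointVar:
    """Hypothetical stand-in for a variable of type POINT (a mutable holder)."""
    x: float
    y: float


def abs_value(z: float) -> float:
    """Read-only operator: returns a value and updates nothing."""
    return z if z >= 0.0 else -z


def reflect(p: PointVar) -> None:
    """Update operator: assigns to its argument and returns no value."""
    p.x = -p.x
    p.y = -p.y


x, y, z = 2.0, -5.0, 4.0
nested = z * abs_value(x + y)  # a read-only invocation nests inside an expression

q8 = PointVar(1.0, -2.0)
result = reflect(q8)           # invoked for its effect; there is no value to nest
```

After these statements q8 holds the reflected coordinates and result is None, which is the closest Python gets to "an invocation of REFLECT doesn't have a value."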

Note: I explained why we require update operator invocations not to have values in reference [3]. The basic reason is that we don't want the possibility of apparently read-only expressions producing side effects; in particular, we don't want the possibility of apparently "simple retrievals" having the side effect of updating the database. Accordingly, The Third Manifesto requires:

a. Arguments corresponding to parameters that are subject to update to be passed by name or reference, and

b. All other arguments to be passed by value [1].

I should add that The Third Manifesto uses the read-only/update terminology because it's both traditional and self-explanatory. You should be aware, however, that read-only and update operators are often referred to, especially in the object world, as observers and mutators, respectively (and mutability is then used as a synonym for updatability); hence my title for this month's column.

THE_ Pseudovariables Revisited

A particularly important kind of mutability is provided by THE_ pseudovariables. Loosely speaking, THE_ pseudovariables provide a way of updating one component of a variable while leaving the other components unchanged ("components" here referring to components of some possible representation, of course, not necessarily an actual representation). For example, let variable C be of declared type CIRCLE. Then the assignment

THE_CTR ( C ) := POINT ( ... ) ;

"mutates" the center of C without changing its radius.

Recall now that THE_ pseudovariables as such are logically unnecessary [3]. For example, the assignment just shown, which uses a THE_ pseudovariable, can logically be replaced by the following one which doesn't:

C := CIRCLE ( THE_R ( C ), POINT ( ... ) ) ;

(I'm assuming for the moment that variable C here has current most specific type -- as well as declared type -- CIRCLE.)
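The equivalence between the pseudovariable form and plain assignment can be sketched in Python, under some loud assumptions: values are modeled as frozen (immutable) dataclasses, a variable is modeled as a hypothetical Var holder, the radius is a plain float instead of a LENGTH, and set_the_ctr is an invented name for the expanded form of THE_CTR ( C ) := ... . None of this comes from The Third Manifesto itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class Circle:
    r: float     # radius (a plain float here, not a LENGTH)
    ctr: Point   # center


class Var:
    """A variable: a holder whose current value can be replaced wholesale."""

    def __init__(self, value):
        self.value = value


def set_the_ctr(c: Var, new_ctr: Point) -> None:
    """Analog of THE_CTR ( C ) := new_ctr, expanded into plain assignment.

    No value is changed; a new circle value is selected from the old radius
    and the new center, and that value is assigned to the variable.
    """
    c.value = Circle(c.value.r, new_ctr)


c = Var(Circle(3.0, Point(0.0, 0.0)))
old_value = c.value
set_the_ctr(c, Point(1.0, 2.0))
```

The old circle value is untouched (it is still a circle of radius 3.0 centered on the origin); only the variable now holds a different value.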

So if THE_ pseudovariables are really nothing but shorthand, why does The Third Manifesto support them? Well, suppose once again, as we did in an earlier installment [6], that type CIRCLE has an immediate subtype O_CIRCLE, where an "O-circle" is a circle that's centered on the origin:

TYPE O_CIRCLE ...
   IS CIRCLE
   CONSTRAINT THE_CTR ( CIRCLE ) = POINT ( 0.0, 0.0 )
   POSSREP { R = THE_R ( CIRCLE ) } ;

Then the current value of variable C at some given time might be of most specific type O_CIRCLE instead of just CIRCLE -- for example:

C := O_CIRCLE ( LENGTH ( 3.0 ) ) ;

Now suppose the end user tells the system that the circle denoted by variable C (regardless of whether or not it's really an O-circle) is to have its radius increased to five. We might try the following:

C := O_CIRCLE ( LENGTH ( 5.0 ) ) ;

If the previous value of C is "just a circle" and not an O-circle, however, this assignment will have the side effect of changing the current most specific type of C. (In general, of course, we can't say for sure ahead of time which of the two possibilities, O-circle or "just a circle," is in fact the case.)

Alternatively, we might try the following:

C := CIRCLE ( LENGTH ( 5.0 ), THE_CTR ( C ) ) ;

But if the previous value of C was in fact an O-circle, this assignment will (again) have the side effect of changing the most specific type of C (the expression on the right-hand side of the assignment returns "just a circle," not an O-circle). Indeed, an attempt to TREAT DOWN the variable C as an O-circle will fail after this assignment, even if it succeeded before.

Caveat: Once again I have to warn you that I'm still assuming that (for example) a value that's "only" of type CIRCLE -- that is, a value whose most specific type is CIRCLE -- might actually correspond to an O-circle in the real world. As I've said before in this series, this is an issue I'll be revisiting in a future installment.

We might of course try something like the following:

C := CASE
        WHEN IS_O_CIRCLE ( C ) THEN
             O_CIRCLE ( LENGTH ( 5.0 ) )
        WHEN IS_CIRCLE ( C ) THEN
             CIRCLE ( LENGTH ( 5.0 ), THE_CTR ( C ) )
     END CASE ;

Note: I'll be discussing operators like IS_CIRCLE in detail next time; for now, I'll just assume they're self-explanatory (in fact, of course, I've made use of such operators in this series before, as you might recall).

The drawbacks to this approach are surely obvious! -- it's complex, it's error-prone, and it's vulnerable to changes of various kinds (consider what could happen if the type hierarchy changes in some way, for example). Thus, the pseudovariable shorthand

THE_R ( C ) := LENGTH ( 5.0 ) ;

does seem preferable, at least to me. I must stress, however, that we're only talking about syntax here (the pseudovariable version really is just shorthand for the longer version, and there are no logical differences involved). The only "mutator" that's actually required is good old plain assignment! In practice, however, it's probably fair to say that THE_ pseudovariables, or something logically equivalent to them, are effectively required in order to support inheritance and subtyping properly. As I pointed out when I first introduced the notion [3], it's not just that THE_ pseudovariables are more user-friendly -- they also provide a higher degree of imperviousness to changes in the syntax of the corresponding selector, and they might even perform better (though performance per se has nothing to do with the model, of course).
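The practical difference between the two spellings can be sketched in Python, again as an analogy only. The assumptions: values are frozen dataclasses, OCircle is modeled as a subclass of Circle whose center is pinned to the origin, Var is a hypothetical variable holder, and the standard library's dataclasses.replace stands in for the expansion of THE_R ( C ) := ... because it rebuilds a value using whatever class that value already has. The function names are invented for illustration.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Point:
    x: float
    y: float


ORIGIN = Point(0.0, 0.0)


@dataclass(frozen=True)
class Circle:
    r: float
    ctr: Point


@dataclass(frozen=True)
class OCircle(Circle):
    # The possible representation has R only; the center is fixed at the origin.
    ctr: Point = field(default=ORIGIN, init=False)


class Var:
    """A variable: a holder whose current value can be replaced."""

    def __init__(self, value: Circle):
        self.value = value


def set_the_r(c: Var, new_r: float) -> None:
    """Analog of THE_R ( C ) := new_r; the most specific type is preserved."""
    c.value = replace(c.value, r=new_r)


def set_r_via_circle_selector(c: Var, new_r: float) -> None:
    """Analog of C := CIRCLE ( new_r, THE_CTR ( C ) ); always yields 'just a circle'."""
    c.value = Circle(new_r, c.value.ctr)


c1 = Var(OCircle(3.0))
set_the_r(c1, 5.0)                  # still an OCircle, radius now 5.0

c2 = Var(Circle(3.0, Point(1.0, 1.0)))
set_the_r(c2, 5.0)                  # still "just a circle", center unchanged

c3 = Var(OCircle(3.0))
set_r_via_circle_selector(c3, 5.0)  # radius is right, but the type is now Circle
```

The caller of set_the_r never has to ask which kind of circle the variable currently holds, which is the job the CASE expression was doing by hand.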

The "MOVE" Example Revisited

Since we must have assignment, and THE_ pseudovariables at least are very convenient, we might as well go the whole hog and allow update operators of arbitrary complexity. Furthermore, there are significant practical advantages in doing so, as I now want to show.

Recall the example we discussed a few installments ago of an operator called MOVE that "moves" its first argument so that it's centered on the center of its second, loosely speaking [5]. Originally, I defined two MOVE versions, one for ellipses and one for circles. Here are those two definitions, now shown complete:

OPERATOR MOVE ( E ELLIPSE, R RECTANGLE ) RETURNS ELLIPSE
   VERSION ER_MOVE ;
   RETURN ELLIPSE ( THE_A ( E ), THE_B ( E ), CENTER ( R ) ) ;
END OPERATOR ;

OPERATOR MOVE ( C CIRCLE, R RECTANGLE ) RETURNS CIRCLE
   VERSION CR_MOVE ;
   RETURN CIRCLE ( THE_R ( C ), CENTER ( R ) ) ;
END OPERATOR ;

Note that ER_MOVE and CR_MOVE are both read-only. Also, in accordance with our rejection of the notion of "argument contravariance," I've shown the declared type of the second parameter as RECTANGLE (not SQUARE) in both cases. I've also assumed that an operator CENTER ("get the center of") is defined for type RECTANGLE.

Now, in my original discussion of MOVE in reference [5], I also said that the explicit specialization of MOVE to circles might not be needed in practice. Now I can explain that remark. The basic point is that we can define MOVE to be an update operator instead of a read-only one, as follows (note the UPDATES clause, which replaces the RETURNS clause from the read-only versions):

OPERATOR MOVE ( E ELLIPSE, R RECTANGLE ) UPDATES E
   VERSION ER_MOVE ;
   THE_CTR ( E ) := CENTER ( R ) ;
END OPERATOR ;

Observe that -- quite apart from the fact that the code is now arguably a little more compact -- an invocation of this version of MOVE updates its first argument (loosely, it "changes the center" of that argument). Observe further that the update works regardless of whether that first argument is of most specific type ELLIPSE or most specific type CIRCLE. In other words, the explicit specialization for circles is no longer needed!
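A rough Python analogy of the single update version of MOVE follows. Everything beyond the operator's shape is an assumption made for illustration: Circle is modeled as a subclass of Ellipse constrained to equal semiaxes, Rectangle is given a hypothetical two-corner representation, center is an invented implementation of CENTER, Var is a hypothetical variable holder, and dataclasses.replace stands in for the THE_CTR pseudovariable.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class Ellipse:
    a: float     # major semiaxis
    b: float     # minor semiaxis
    ctr: Point   # center


@dataclass(frozen=True)
class Circle(Ellipse):
    def __post_init__(self):
        # Type constraint for this sketch: a circle has equal semiaxes.
        if self.a != self.b:
            raise ValueError("a circle needs equal semiaxes")


@dataclass(frozen=True)
class Rectangle:
    # Hypothetical representation: two opposite corners.
    lower_left: Point
    upper_right: Point


def center(r: Rectangle) -> Point:
    """Hypothetical CENTER operator: midpoint of the two corners."""
    return Point((r.lower_left.x + r.upper_right.x) / 2,
                 (r.lower_left.y + r.upper_right.y) / 2)


class Var:
    """A variable: a holder whose current value can be replaced."""

    def __init__(self, value: Ellipse):
        self.value = value


def move(e: Var, r: Rectangle) -> None:
    """Update version of MOVE: one definition serves ellipses and circles."""
    e.value = replace(e.value, ctr=center(r))


rect = Rectangle(Point(0.0, 0.0), Point(4.0, 2.0))
e = Var(Ellipse(4.0, 2.0, Point(0.0, 0.0)))
c = Var(Circle(3.0, 3.0, Point(0.0, 0.0)))
move(e, rect)
move(c, rect)
```

Both variables end up centered on the rectangle's center, and the one that held a circle still holds a circle, with no separate circle version of move in sight.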

Thus, a possibly significant practical advantage of update operators in general is that they save us from having to write out certain explicit operator specializations (note the implications for program maintenance in particular). I should remind you, however, that explicit specialization might be desirable, or even necessary, if the operator code needs access to a possible -- or actual -- representation component that exists only at the subtype level, or if more efficiency is desired and can be achieved by means of such explicit specialization.

Let me conclude by responding to the question in this installment's subtitle: "Are mutators necessary?" The answer is that the assignment operator is necessary (logically necessary, that is), but it's the only update operator that is. But if the question is "Are mutators necessary in practice?" (mutators other than assignment, that is) ... well, then the answer is yes, as I've tried to show.



REFERENCES

1. C.J. Date and Hugh Darwen: Foundation for Future Database Systems: The Third Manifesto (Second Edition). Reading, Mass.: Addison-Wesley (2000).
2. C.J. Date: "Why Is It Important to Think Precisely? (Part 2 of 4)," Database Programming & Design 12, No. 11 (November 1999).
3. C.J. Date: "Decent Exposure," Database Programming & Design 11, No. 12 (December 1998).
4. C.J. Date: "Defining Scalar Types," Database Programming & Design 12, No. 1 (January 1999).
5. C.J. Date: "Covariance and Contravariance," DBP&D Online (June 2000).
6. C.J. Date: "Talk about a Treat," DBP&D Online (August 2000).

C.J. Date is an independent author, lecturer, researcher, and consultant, specializing in relational database systems (a field he helped pioneer). His most recent books are Foundation for Future Database Systems: The Third Manifesto (2nd edition, co-authored with Hugh Darwen); The Database Relational Model: A Retrospective Review and Analysis; WHAT Not HOW: The Business Rules Approach to Application Development; and An Introduction to Database Systems (7th edition), all of them published by Addison-Wesley in 2000.




