-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathmethodDefinition.html
325 lines (316 loc) · 15 KB
/
methodDefinition.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html> <head>
<link REL=stylesheet HREF=Rtech.css>
<title>The Definition of Generic Functions and Methods</title>
</head>
<body>
<h1>The Definition of Generic Functions and Methods</h1>
The following notes review the model for generic functions and the
methods defined for them, in the S language generally and in the R
implementation in particular.
The description in the book <i>Programming with Data</i> (Springer,
1998) is the general background, with some of the ideas below being
revisions or extensions of the programming model described there.
<p>
<h2>Methods in the S Language</h2>
<p>
Methods play a different role in the S language than they do in other
major programming languages.
Analogies with other languages can be useful, but only if they are
re-interpreted keeping the different underlying philosophy in mind.
Two distinctions are particularly important: S is a functional
language, and its use covers a spectrum from casual interaction to
large-scale programming.
<p>
<h3>OOP and Functional Languages</h3>
S is a functional language, and methods in the language are
function-based, not class-based.
<p>
The most commonly encountered languages emphasizing methods
follow what is generally called the Object-Oriented Programming
(OOP) model; more precisely, the languages follow a class-based
organization of software. For the most part, programming is
organized around the definition of classes of objects. Methods
are <i>invoked on an object</i>. In S-style notation,
<pre><code>
x$plot(y)
</code></pre>
invokes the <code>plot</code> method on the object
<code>x</code>, passing it an argument <code>y</code>.
<p>
The languages are class-based in that a method is determined
based only on the class of the object; generally, no other
properties of <code>x</code> are relevant in determining the
method to be invoked. (There have been some
<i>instance-based</i> languages in the past, but as far as I
know none of these are in serious use now.)
The essential organizing category of these languages is the
class. In Java, for example, essentially all programming is
done by defining classes.
<p>
In particular, aside from possible inheritance there is no necessary
connection between the method called <code>plot</code> defined
for two different classes.
<p>
The OOP model is useful in many contexts. It can be added
to the S language model as a specialized package, with many
useful applications (see the Omegahat <a
href="http://www.omegahat.org/OOP/">OOP package</a> for an
experiment in this direction).
However, OOP computations must be a specialized addition to the
basic model.
<p>
The basic organizing principle of programming in the S language
is the function definition. Programmers spend most of their time
writing function definitions, and the S evaluator spends most of its
time evaluating calls to functions.
Other program-organizing concepts have evolved as important adjuncts
or extensions to function definitions. Packages or libraries allow
sets of functions to be grouped together meaningfully. Formal classes
and methods organize information about objects and functions in a more
explicit and distributed way.
<p>
But the model (or at least my model) is that most programming in the S
language evolves from a user/programmer wanting do <i>do</i> something
with data. The something to be done is expressed as one or more
functions, which typically start out simple and then evolve. Methods
arise as part of the evolution: The definition of functions very
often depends on the properties of the objects passed as arguments,
and method and class definitions are often the best way to encapsulate
such properties.
<p>
The distinction between function-based and class-based methods has
implications for many aspects of the language. For example,
function-based methods need to be integrated with the user's
understanding of the function they specialize, which has implications
for argument definition (see <a href="#arguments">below</a>).
<p>
<h3>Interaction and Programming</h3>
The S language is used interactively to a much greater extent than
other languages supporting formal classes and methods.
The other languages supporting function-based methods (chiefly Common
Lisp and more recent languages such as Dylan influenced by Common
Lisp) deal with <i>programming</i> in the sense of constructing
complete program units, typically as files, which are then used in some way.
<p>
On the other hand, nearly all discussions of the S language, even those
emphasizing programming, are written against the assumed background of
S expressions being evaluated interactively, typically in response to
something a user types.
Programming in the S language aims to extend the
computations available for such interaction.
In contrast, for example, to Dylan, S does not have a concept of a
``complete program''.
Instances of the S evaluator are processes that go on indefinitely,
waiting for expressions to evaluate.
<p>
A corollary to the importance of interaction is that it comes as a
continuum from the user/programmer's perspective.
At one end are expressions so simple that they are typed straight-off
without pause (or, perhaps, hidden behind a graphical interface).
But even a fairly simple user interface supplements this with the
ability to recall expressions, cut-and-paste editing changes, and
navigate around the text of the expression before evaluating it.
The recent history of the interactive session becomes an informal
programming environment.
<p>
Moving from this stage to defining functions is the major leap from
interaction to programming.
But simple function definitions can be entered directly from the
command line, and editing a small source file that will
be parsed immediately is only a little less interactive.
<p>
Many of the innovations throughout the history of the language have
tried to help the user's evolution from simple interaction to
increasingly extensive programming.
Both the early, informal method definitions and the formal
class/method mechanisms are best seen from this perspective.
An implication for the design of the language, and in particular for
that of the method mechanism, is that we
should avoid making the user do a large amount of
programming in order to add a conceptually simple extension to what
exists.
<h2>Formal Arguments and Method Dispatch</h2><a name="#arguments"></a>
<p>
This section discusses questions related to the formal arguments for generic functions and
how these can be coordinated with the design of methods.
<p>
As the term <i>method</i> suggests, a method is a definition of how a
function call should be evaluated, as determined by the classes of the
actual arguments.
It is part of the current API for formal methods that argument
matching takes place at the call to the (generic) function: Arguments
are not re-matched after the method is selected.
Therefore, the arguments of the method are treated as identical to
those of the generic function.
Not re-matching arguments is moderately important for efficiency of
method dispatch, but more fundamentally, any general departure from
this requirement would confuse the semantics of method dispatch. (If
an argument that was used to select the method is then re-matched to a
different formal argument, is the method selection still valid and
meaningful?)
<p>
This model for method dispatch has implications for choosing formal
arguments for generic functions, and for the design of methods. With
a little care, the designer of methods can have full flexibility in
dealing with arguments.
<p>
From the viewpoint of someone designing a method for a particular
generic function, possible formal arguments may fall into three
categories:
<ol>
<li> Formal arguments in the generic function that are meaningful and, in
particular, may be involved in selecting methods (that is, the
class of the object corresponding to the argument might be part
of the signature of the method);
<p>
<li> Formal arguments that appear in the generic (and quite possibly
might be part of the signature for some <i>other</i> method),
but which are meaningless for this method;<p>
<li> Arguments that are meaningful for this method but not
generally, and <i>not</i> currently formal arguments to the
generic function.
</ol><p>
The first category raises no problems. The second and third can be
handled in a reasonably convenient way as well, but need some
consideration.
And just which arguments are important enough to be included in the
generic (i.e., should they fall in the second or third category) will
always be open to discussion.
<p>
Two examples will illustrate the issues. The function <code>plot</code>is defined in
the R <code>methods</code> package to have arguments:
<pre><code>
plot(x, y, ...)
</code></pre>
The arguments <code>x</code> and <code>y</code> represent the datasets
providing values to plot on the x- and y-axis respectively.
Both arguments are included in the generic, since it may well be
useful to define methods based on either dataset.
Some methods, however, will be defined for only a single object.
At the same time, the definition of the generic function implies that
additional arguments are not relevant in dispatching methods for
<code>plot</code>. Individual methods can have specific needs for
additional arguments, however.
<p>
As a second example, consider the <code>"["</code> operator.
The formal arguments for this operator in the R <code>methods</code> package are:
<pre><code>
x[i, j, ..., drop]
</code></pre>
Here, there are three arguments included in the generic that
might reasonably be part of a method signature: the object
<code>x</code> for which as subset is extracted or replaced, and the
first and second subscripts, <code>i</code> and <code>j</code>.
(Including these enables methods to be defined separately, for
example, for text-based and numerical subsetting, or for other more
specialized subsetting situations.)
Once again, for many methods the second subscript will not be
meaningful, but matrix and matrix-like objects are so central to
applications of the S language that we need to enable methods for
subsetting such objects.
The <code>drop</code> argument, on the other hand, is also specialized
to matrix- or array-like objects and is <i>not</i> likely to be useful
in method selection. If we were starting over, this argument would
more naturally fall into our third category, but it's there (for now) as part of
the traditional S language definition.
<p>
The suggested approaches to handling conceptual differences in
function and method arguments are as follows.
<p>
<h3>Arguments Not Meaningful in Methods</h3>
The formal method-dispatch model for the S language provides directly
for arguments that are not meaningful in a particular method.
These arguments should appear in the signature of the method with
class <code>"missing"</code>.
The corresponding method will then never be selected if the call
includes the meaningless argument.
The point for the method designer to keep in mind is the distinction
between including the argument with class <code>"missing"</code> and
omitting the argument from the signature.
The latter implies that <i>any</i> object may appear as this argument
(it corresponds formally to class <code>"ANY"</code> in the
signature), which is not correct if the argument is not meaningful.<p>
<p>
The R implementation of methods provides for an equivalent way of
specifying that an argument should not be included.
If the method definition is a function whose formal arguments are a
subset of those to the generic (in the same order) then the
<code>setMethod</code> function infers that missing formal arguments
are to have class <code>"missing"</code> in the signature.
While this corresponds to a notion of conforming arguments in the
Dylan language, it is pretty much just syntactic sugar in S.
<p>
<h3>Arguments that are Meaningful for Some Methods</h3>
The suggestion for handling such arguments is that they be made formal
arguments to a function that is <i>called</i> as the corresponding
method, and that the generic function include <code>...</code> as a
formal argument to allow the arguments to be passed down.
<p>
Again, this is not an extension to the language, just making use of
existing features.
For example, suppose a method is defined for subsetting objects
maintained in some particular remote database, and suppose we want an
argument <code>copy</code> to the method that says whether to copy the
subset or create a remote reference for it.
There is no <code>copy</code> argument to the generic, and it's
reasonable to say that the concept isn't sufficiently general to
justify redefining the generic.
<p>
The suggestion is to define a function, say <code>subsetRemote</code>
with the appropriate argument list, and to call that function as the
method:
<pre><code>
subsetRemote <- function(x, i, copy = TRUE)
{ .... }
setMethod("[", "remoteObject",
function(x, i, ...) subsetRemote(x, i, ...)
)
</code></pre>
With this definition, an expression like
<pre><code>
newObj <- myObj[sample(length(myObj), 1000), copy = FALSE]
</code></pre>
would invoke the method, assuming <code>myObj</code> had a suitable
class extending <code>"remoteObject"</code>.
<p>
A few points of detail are relevant. First, notice that while
the formal method uses the <code>...</code> argument from the generic,
the actual function defining the method does not, with the result that
invalid argument names will be detected.
On the other hand, the requirement that there be no argument
re-matching means that the special arguments <i>must</i> pass through
<code>...</code>, which in turn means that <code>copy</code> must be
supplied by name (otherwise it would match <code>j</code>).
As a third point, notice that we've used the R feature of omitting the
<code>j</code> argument from the method definition, forcing that
argument to be missing.
<p>
There is an objection to this mechanism, in that as written it
clutters up the name space with the additional function
(<code>subsetRemote</code> in this case). The proposals for
namespaces in R would be helpful here.
It is possible to embed the new function in the method itself, at the
cost of less readable code and some (trivial) extra computation:
<pre><code>
setMethod("[", "remoteObject",
function(x, i, ...) {
subsetRemote <- function(x, i, copy = TRUE)
{ .... }
subsetRemote(x, i, ...)
}
)
</code></pre>
<p>
We could provide a convenience mechanism for this feature as we did
for the missing arguments.
For example, <code>setMethod</code> could interpret arguments not
found in the generic to create a function in the form shown. (The
current Splus implementation does something similar.)
On the whole, such a mechanism seems a little dangerous.
<hr>
<address><a href="http://cm.bell-labs.com/cm/ms/departments/sia/jmc/">John Chambers</a><a href=mailto:[email protected]><[email protected]></a></address>
<!-- hhmts start -->
Last modified: Wed Jan 2 11:40:39 EST 2002
<!-- hhmts end -->
</body> </html>