Suggestion: Dimension-sensitive attributes
At 11:14 09.07.2009, SIES 73 wrote:
If "objattr", "dimattr" and "cellattr" are
lists, they would offer safe places for all
attributes that should be kept on subsetting.
My proposed design would be that:
* "objattr" would be a list of
attributes (just preserved on subsetting)
* "dimattr" would be a list with as
many elements as array dimensions. Each element
can be any object whose length matches the
corresponding array dimension's length and that
can be itself subsetted with "[": so it could
be a vector, a list, a data frame...
* "cellattr" would be any object whose
dimensions match the array dimensions: another array, a data frame...
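The design above can be sketched in R roughly as follows. This is only a minimal illustration for two-dimensional matrices, not an existing implementation: the attribute names "objattr", "dimattr" and "cellattr" come from the proposal, while the class name "annotated", the constructor and the helper pick() are assumptions made for the example.

```r
# Hypothetical constructor; only the three attribute names are from the proposal.
annotated <- function(x, objattr = list(), dimattr = NULL, cellattr = NULL) {
  stopifnot(is.matrix(x))
  attr(x, "objattr")  <- objattr
  attr(x, "dimattr")  <- dimattr
  attr(x, "cellattr") <- cellattr
  class(x) <- "annotated"
  x
}

# Subset one "dimattr" element; data frames are subsetted by row (an assumption).
pick <- function(a, k) if (is.data.frame(a)) a[k, , drop = FALSE] else a[k]

# Matrix-style x[i, j] only; the single-index form x[i] is not handled here.
`[.annotated` <- function(x, i, j, ..., drop = FALSE) {
  # Normalise logical, negative or character indices to integer positions.
  rows <- seq_len(nrow(x)); names(rows) <- rownames(x)
  cols <- seq_len(ncol(x)); names(cols) <- colnames(x)
  if (!missing(i)) rows <- rows[i]
  if (!missing(j)) cols <- cols[j]
  rows <- unname(rows); cols <- unname(cols)

  y <- unclass(x)[rows, cols, drop = FALSE]   # drops the custom attributes
  attr(y, "objattr") <- attr(x, "objattr")    # object level: copied as is
  da <- attr(x, "dimattr")
  if (!is.null(da))                           # dimension level: one index each
    attr(y, "dimattr") <- list(pick(da[[1]], rows), pick(da[[2]], cols))
  ca <- attr(x, "cellattr")
  if (!is.null(ca))                           # cell level: same indices as data
    attr(y, "cellattr") <- ca[rows, cols, drop = FALSE]
  class(y) <- "annotated"
  y
}

m <- annotated(
  matrix(1:6, nrow = 2, dimnames = list(c("a", "b"), c("x", "y", "z"))),
  objattr  = list(source = "survey"),
  dimattr  = list(c("case 1", "case 2"), data.frame(unit = c("kg", "cm", "s"))),
  cellattr = matrix(c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE), nrow = 2)
)
s <- m["b", c("x", "z")]
```

After the last line, s keeps the whole "objattr" list, its "dimattr" holds only the entries for row "b" and columns "x" and "z", and its "cellattr" is the matching 1 x 2 block.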
In my view this would be very useful, because
that way a general solution for data description, like variable names, variable labels, units, ... could be reached.
Indeed, that's the objective: attaching user-defined metadata to the actual data so that it is automatically synchronized with subsetting operations. I've had dozens of use cases in my own R programs that needed this type of pattern, and have seen it implemented in different ways in several classes (xts, timeSeries, AnnotatedDataFrame, etc.). As you point out, this could offer a unified design for a common need.
Enrique
For my personal use it was sufficient to create a class called "documented" with a corresponding subsetting method and one attribute, also called "documented". This attribute may contain 'varlabel', 'varname', 'value.labels', 'missing.values', 'code.ordered', 'comment', ... It is copied on subsetting. I think attributes concerning parts of an object, e.g. its dimensions, should stay in this object-related attribute and be extracted on subsetting. Since subsetting an object leads to a new object, this could then have its own, new persisting attribute. The more difficult part may be the binding of objects.
Heinz
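A class of this kind might look roughly like the sketch below. It is not Heinz's actual code: only the class name and attribute name "documented" and the field names come from the message, while the helper document() and the example values are assumptions.

```r
# Hypothetical helper: attach a "documented" attribute and the class.
document <- function(x, ...) {
  attr(x, "documented") <- list(...)
  class(x) <- c("documented", oldClass(x))
  x
}

# Subsetting method: subset as usual, then copy the attribute unchanged.
`[.documented` <- function(x, ...) {
  y <- NextMethod("[")
  attr(y, "documented") <- attr(x, "documented")
  class(y) <- oldClass(x)
  y
}

weight <- document(c(70.5, 82.1, NA, 64.0),
                   varlabel = "Body weight", comment = "units: kg")
sub <- weight[2:3]
attr(sub, "documented")$varlabel
```

Binding is not covered: a call such as c(weight, weight) would need its own method to decide how the two "documented" attributes are merged, which is the harder part mentioned above.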
-----Original Message-----
From: Heinz Tuechler [mailto:tuechler at gmx.at]
Sent: Thursday, 09 July 2009 10:56
To: Bengoechea Bartolomé Enrique (SIES 73); Tony Plate; r-devel at r-project.org
Cc: Henrik Bengtsson
Subject: Re: [Rd] Suggestion: Dimension-sensitive attributes
At 10:01 09.07.2009, SIES 73 wrote:
I've also had several use cases where I needed "cell-like" attributes,
that is, attributes that have the same dimensions as the original array
and are subsetted in the same way --along all its dimensions.
So we're talking about a way to add metadata to matrices/arrays at 3
possible levels:
1) at the "whole object" level:
attributes that are not dropped on subsetting
2) at the "dimension" level: attributes that behave like
"dimnames", i.e. subsetted along each dimension
3) at the "cell" level: attributes that are subsetted in the
same way as the original array
My proposal would be simpler than Tony's
suggestion: like "dimnames", just have reserved attribute names for
each case, say "objdata", "dimdata", and "celldata" (or "objattr",
"dimattr" and "cellattr").
If "objattr", "dimattr" and "cellattr" are lists, they would offer safe places for all attributes that should be kept on subsetting. In my view this would be very useful, because that way a general solution for data description, like variable names, variable labels, units, ... could be reached.
On the other hand, Tony's pattern would allow as many attributes of each type as necessary (some multiplicity is already possible with the simpler design, as dimdata or celldata could be lists of lists), at the cost of a more complex scheme of attributes that needs to be "parsed" each time. Building on Tony's suggestion, "attr.keep.on.subset" and "attr.dimname.like" (and possibly "attr.cell.like") could be kept in a single list with 3 elements, something like:
attr(x, "attr.subset.with") <- list(object=..., dims=..., cells=...)
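As a rough sketch of how that single attribute could be used on a plain matrix: the attribute name and its three elements follow the line above, while the function subset_with() and all example values are hypothetical.

```r
x <- matrix(1:6, nrow = 2)
attr(x, "attr.subset.with") <- list(
  object = list(title = "demo"),                   # kept as is
  dims   = list(c("r1", "r2"), c("u", "v", "w")),  # one element per dimension
  cells  = matrix(letters[1:6], nrow = 2)          # same shape as x
)

# Hypothetical helper: subset the data and each level of metadata together.
subset_with <- function(x, i, j) {
  meta <- attr(x, "attr.subset.with")
  y <- x[i, j, drop = FALSE]                       # plain "[" drops the attribute
  attr(y, "attr.subset.with") <- list(
    object = meta$object,
    dims   = list(meta$dims[[1]][i], meta$dims[[2]][j]),
    cells  = meta$cells[i, j, drop = FALSE]
  )
  y
}

y <- subset_with(x, 2, c(1, 3))
```

With a single list there is only one attribute to look up on each subset, which avoids the "parsing" cost mentioned above.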
Would something like this make sense for R-core --either for standard arrays or as a new class-- or would it be better implemented in a package?
Enrique