model.frame mangles time series (PR#121)

6 messages · Peter Dalgaard, Thomas Lumley

#
This one showed up while looking at one of Ripley's other reports:
              y
1962.25 8.79236
1962.5  8.79137
1962.75 8.81486
1963    8.81301
1963.25 8.90751
1963.5  8.93673
1963.75 8.96161
1964    8.96044
1964.25 9.00868
1964.5  9.03049
Warning: Replacement length not a multiple of the elements to replace in matrix(...) 
Error: length of dimnames[1] not equal to array extent
         Qtr1    Qtr2    Qtr3    Qtr4
1962:      NA 8.79236 8.79137 8.81486
1963: 8.81301 8.90751 8.93673 8.96161
1964: 8.96044 9.00868 9.03049      NA
structure(c(8.79236, 8.79137, 8.81486, 8.81301, 8.90751, 8.93673, 
8.96161, 8.96044, 9.00868, 9.03049), .Tsp = c(1962.25, 1971.75, 
4), class = "ts")

The upshot of this is that glm(...,subset=...) fails on the freeny data.

The cause is seen by
         Qtr1    Qtr2    Qtr3    Qtr4
1962:      NA 8.79236 8.79137 8.81486
1963: 8.81301 8.90751 8.93673 8.96161
1964: 8.96044 9.00868 9.03049      NA
structure(c(8.79236, 8.79137, 8.81486, 8.81301, 8.90751, 8.93673, 
8.96161, 8.96044, 9.00868, 9.03049), .Tsp = c(1962.25, 1971.75, 
4), class = "ts")

Notice that the .Tsp attribute doesn't reflect the shorter time series
(1971.75 should be 1964.5).
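
The expected end time can be checked by hand: ten quarterly observations starting at 1962.25 end at 1962.25 + 9/4 = 1964.5. A minimal sketch of that check, assuming a current version of R (the variable names `vals`, `y10` and `expected_end` are illustrative only):

```r
# The ten values from the dput() output above.
vals <- c(8.79236, 8.79137, 8.81486, 8.81301, 8.90751,
          8.93673, 8.96161, 8.96044, 9.00868, 9.03049)

# Build a quarterly series of the right length; ts() derives the
# end time from the start, the frequency and the number of values.
y10 <- ts(vals, start = 1962.25, frequency = 4)
print(tsp(y10))

# The same end time computed directly: start + (n - 1) / frequency.
expected_end <- 1962.25 + (length(vals) - 1) / 4
print(expected_end)
```

A consistent tsp attribute for the subsetted series would therefore be c(1962.25, 1964.5, 4).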

--please do not edit the information below--

Version:
 platform = i586-unknown-linux
 arch = i586
 os = linux
 system = i586, linux
 status = in progress
 status.rev = 0
 major = 0
 minor = 63.3
 year = 1999
 month = February
 day = 19
 language = R

Search Path:
 .GlobalEnv, Autoloads, package:base

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
On Thu, 18 Feb 1999 pd@biostat.ku.dk wrote:

            
This is the attributes/subsetting problem rearing its ugly head again.
model.frame(,subset) has to copy *some* attributes over (e.g. contrasts) and
can't copy *all* of them (e.g. dim).

Currently we use copyMostAttributes to drop the dangerous ones. It
clearly didn't know about tsp. 

Questions:
(1) How do we know which attributes to copy?
(2) For an unknown attribute what should the default be?
(3) Is there just one set of attributes that needs special treatment (in
which case copyMostAttributes is broken) or are there different sets in
different circumstances (in which case we need a new function)?
(4) Could we just handle this by having the subset operator retain
attributes?
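
To make the trade-off in these questions concrete, here is a small sketch of what plain subsetting does to attributes. It assumes a current version of R rather than the 0.63.3 build in the report, and the attribute name `note` is made up for illustration:

```r
# A vector with names plus one custom attribute (hypothetical "note").
x <- structure(1:10, note = "custom attribute", names = letters[1:10])
sub_x <- x[1:5]
print(attributes(sub_x))   # names are kept, "note" is dropped

# A quarterly time series: subsetting by index gives a plain vector,
# so an out-of-date tsp cannot be carried along, but the time
# information is lost entirely.
y <- ts(1:10, start = 1962.25, frequency = 4)
sub_y <- y[1:5]
print(is.ts(sub_y))
print(attr(sub_y, "tsp"))
```

Dropping by default is safe for attributes such as tsp or dim that depend on the length, which is why copying has to be opt-in for attributes such as contrasts.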

	-thomas

#
Thomas Lumley <thomas@biostat.washington.edu> writes:
(5) Is there really any reason to do subsetting in the .Internal code?
The hot spot is

    variables <- eval(attr(formula, "variables"), data, sys.frame(sys.parent()))
    extranames <- as.character(substitute(list(...))[-1])
    extras <- substitute(list(...))
    extras <- eval(extras, data, sys.frame(sys.parent()))
    subset <- eval(substitute(subset), data, sys.frame(sys.parent()))
    data <- .Internal(model.frame(formula, rownames, variables, 
        varnames, extras, extranames, subset, na.action))

but what's keeping us from saying 
    data <- .Internal(model.frame(formula, rownames, variables, 
        varnames, extras, extranames, na.action))[subset,]

or ("Uncle Scrooge" version)

   data <- .Internal(model.frame(formula, rownames, variables[subset,], 
        varnames, extras[subset,], extranames, na.action))

(Well, variables and extras are lists, not data frames, but you get the
picture. Do we allow unequal-length variables anyway?)

(6) since [subset,] seems to work, perhaps do_subset is what should be
called from inside do_modelframe?
#
On 18 Feb 1999, Peter Dalgaard BSA wrote:

            
No, subset doesn't work -- it loses contrasts:

R>  contrasts(df$x)
  [,1] [,2]
1    1    0
2    0    1
3   -1   -1
R> contrasts(df[1:5,]$x)
  2 3
1 0 0
2 1 0
3 0 1
 
	-thomas


#
Thomas Lumley <thomas@biostat.washington.edu> writes:
That can be fixed by a one-line change to [.factor :
          .L   .Q         .C
1 -0.6708204  0.5 -0.2236068
2 -0.2236068 -0.5  0.6708204
3  0.2236068 -0.5 -0.6708204
4  0.6708204  0.5  0.2236068

I suspect that is the correct solution.
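
The change itself is not shown in this thread. As a rough sketch of the idea only, a subsetting helper that copies the contrasts attribute across might look like the following; `subset_keep_contrasts` is a hypothetical name and this is not the committed patch:

```r
# Hypothetical helper: subset a factor and carry its contrasts along.
subset_keep_contrasts <- function(x, i) {
  y <- unclass(x)[i]                            # plain integer codes
  attr(y, "levels") <- attr(x, "levels")        # restore the levels
  attr(y, "contrasts") <- attr(x, "contrasts")  # the one extra line
  class(y) <- class(x)
  y
}

f <- factor(c("a", "b", "c", "a", "b", "c"))
contrasts(f) <- contr.sum(3)     # user-specified contrasts

g <- subset_keep_contrasts(f, 1:5)
print(contrasts(g))              # same matrix as contrasts(f)
```

Contrasts depend on the levels rather than on the length of the factor, so copying them is safe as long as the levels are kept unchanged.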
#
Peter Dalgaard BSA <p.dalgaard@biostat.ku.dk> writes:
A quick check shows that S does the same, ergo: change committed.