
Strange data frame

Hello,
I'm playing around with the pls package and found a data set (NIR) whose
structure I don't understand. Forgive me if this is a stupid question; I
suspect it is, since I am less experienced with some aspects of
modeling.

My problem is that the pls NIR data frame does not seem to be a typical
data frame: while it is a list, its variables are not of equal length.
Furthermore, I have no idea how to reproduce such a structure.

But, let's look at the NIR data...
> class(NIR)
[1] "data.frame"
> str(NIR)
`data.frame':	28 obs. of  3 variables:
 $ X    : num [1:28, 1:268] 3.07 3.07 3.08 3.08 3.10 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : NULL
 $ y    : num  100.0  80.2  79.5  60.8  60.0 ...
 $ train: logi  TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE ...
> class(NIR$X)
[1] "matrix"
> class(NIR$y)
[1] "numeric"
> length(NIR$X)
[1] 7504
> length(NIR$y)
[1] 28

Ok, what this looks like to me is that NIR is a data frame (i.e. "a list
of variables of the same length with unique row names"), with a matrix
of length 7504 as one variable, and a numeric vector of length 28 as
another variable, which seems to contradict the definition of a data
frame.
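
To make the apparent mismatch concrete, here is a minimal sketch using a
made-up matrix and vector with the same shapes as NIR$X and NIR$y. The
names X_demo and y_demo are my own stand-ins, not objects from the pls
package; the sketch only shows where the numbers 7504 and 28 come from.

```r
# Hypothetical stand-ins with the same shapes as NIR$X and NIR$y
X_demo <- matrix(0, nrow = 28, ncol = 268)
y_demo <- numeric(28)

len_X  <- length(X_demo)  # counts every cell of the matrix: 28 * 268
rows_X <- nrow(X_demo)    # counts rows only
len_y  <- length(y_demo)  # number of elements in the vector

print(c(len_X = len_X, rows_X = rows_X, len_y = len_y))
```

So the two variables agree in their number of rows even though length()
reports different values for them.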

Moreover, despite my best efforts, I'm unable to put any of my own data
in this structure, as the data.frame() and as.data.frame() functions
remove the matrix structure, i.e. they return a different animal
altogether.
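
For example, here is a minimal sketch of what I see with toy data
(X_small and y_small are made-up names, not my real data). As far as I
can tell, data.frame() spreads the matrix over separate columns rather
than keeping it as a single variable.

```r
# Toy data: a 3 x 2 matrix and a vector with one value per matrix row
X_small <- matrix(1:6, nrow = 3, ncol = 2)
y_small <- c(10, 20, 30)

df <- data.frame(X = X_small, y = y_small)

# The matrix is gone: each of its columns is now a separate variable
print(names(df))
print(ncol(df))
print(is.matrix(df[[1]]))
```

Here the result has three ordinary columns instead of a matrix X plus a
vector y, which is not the layout that str() shows for NIR.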

Lastly, this particular structure is useful, because the PLS authors are
able to concisely write models such as,

mvr(y ~ X, data = NIR[NIR$train, ])

instead of what I imagine would be a more complicated alternative if
they didn't have a data frame containing a matrix and a vector as they
do. Any pointers to something I overlooked are appreciated.

Best,
Robert