Corrupt data frame construction - bug?
On 29/04/2009 6:41 PM, Steven McKinney wrote:
Hi useRs,

A recent coding infelicity along these lines yielded a corrupt data frame.

foo <- matrix(1:12, nrow = 3)
bar <- data.frame(foo)
bar$NewCol <- foo[foo[, 1] == 4, 4]
bar
lapply(bar, length)
foo <- matrix(1:12, nrow = 3)
bar <- data.frame(foo)
bar$NewCol <- foo[foo[, 1] == 4, 4]
bar

  X1 X2 X3 X4 NewCol
1  1  4  7 10   <NA>
2  2  5  8 11   <NA>
3  3  6  9 12   <NA>
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs
lapply(bar, length)

$X1
[1] 3

$X2
[1] 3

$X3
[1] 3

$X4
[1] 3

$NewCol
[1] 0

Is this a bug in the data.frame machinery? If an attempt is made to add a new column to a data frame, and the new object does not have length = number of rows of data frame, or cannot be made to have such length via recycling, shouldn't an error be thrown? Instead in this example I end up with a "corrupt data frame" having one zero-length column.

Should this be reported as a bug, or did I misinterpret the documentation?
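The inconsistency described above can be detected directly by comparing each column's length with the row count. The following is only a minimal sketch: `mismatched_columns()` is a hypothetical helper (not part of base R), and `bad` is a hand-built object that mimics the corrupt frame without relying on how `$<-` behaves in any particular R version.

```r
# Hand-built object labelled as a data.frame; structure() does no validation,
# so the zero-length column is accepted silently.
bad <- structure(
  list(X1 = 1:3, NewCol = integer(0)),
  class = "data.frame",
  row.names = c(NA_integer_, -3L)  # compact form meaning "3 rows"
)

# Hypothetical helper: names of columns whose length differs from nrow(df).
mismatched_columns <- function(df) {
  # unclass() so we inspect the underlying list, not data.frame methods
  lens <- vapply(unclass(df), NROW, integer(1))
  names(lens)[lens != nrow(df)]
}

mismatched_columns(bad)
```

An empty result from the helper would indicate that every column agrees with the stated number of rows.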
I don't think "$" uses any data.frame machinery. You are working at a lower level. If you had added the new column using

bar <- data.frame(bar, NewCol=foo[foo[, 1] == 4, 4])

you would have seen the error:

Error in data.frame(bar, NewCol = foo[foo[, 1] == 4, 4]) :
  arguments imply differing number of rows: 3, 0

But since you treated it as a list, it let you go ahead and create something that was labelled as a data.frame but wasn't. This is one of the reasons some people prefer S4 methods: it's easier to protect against people who mislabel things.

Duncan Murdoch
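A minimal sketch of the distinction drawn above: the data.frame() constructor validates row counts, so a zero-length column is rejected. `add_column_checked()` is a hypothetical helper, not a base R function; it applies the rule from the question (the new column's length must equal the row count or recycle evenly into it). The behaviour of `$<-` on data frames may differ between R versions, so the sketch does not depend on it.

```r
foo <- matrix(1:12, nrow = 3)
bar <- data.frame(foo)

# No row of foo has 4 in its first column, so the subset is empty.
new_col <- foo[foo[, 1] == 4, 4]
length(new_col)  # 0

# The constructor checks row counts; capture the error message instead of stopping.
res <- tryCatch(
  data.frame(bar, NewCol = new_col),
  error = function(e) conditionMessage(e)
)

# Hypothetical helper: refuse columns that cannot be recycled to nrow(df).
add_column_checked <- function(df, name, value) {
  n <- nrow(df)
  len <- length(value)
  if (len == 0L || n %% len != 0L) {
    stop(sprintf("cannot add column '%s': length %d does not fit %d rows",
                 name, len, n), call. = FALSE)
  }
  df[[name]] <- rep(value, length.out = n)
  df
}

ok <- add_column_checked(bar, "NewCol", 1:3)
```

Calling `add_column_checked(bar, "NewCol", new_col)` would stop with an error rather than produce a mislabelled object.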
sessionInfo()
R version 2.9.0 (2009-04-17)
powerpc-apple-darwin8.11.1

locale:
en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] nlme_3.1-90

loaded via a namespace (and not attached):
[1] grid_2.9.0      lattice_0.17-22 tools_2.9.0

Also occurs on Windows box with R 2.8.1

Steven McKinney
Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre
email: smckinney +at+ bccrc +dot+ ca
tel: 604-675-8000 x7561

BCCRC Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. V5Z 1L3
Canada