Bug in print for data frames?
On 25/10/2023 2:18 a.m., Christian Asseburg wrote:
Hi! I came across this unexpected behaviour in R. First I thought it was a bug in the assignment operator <- but now I think it's maybe a bug in the way data frames are being printed. What do you think? Using R 4.3.1:

x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1)
x
  A B C
1 1 2 3

x$B <- y$A # works as expected
x
  A B C
1 1 1 3

x$C <- y[1] # makes C disappear
x
  A B A
1 1 1 1

str(x)
'data.frame':	1 obs. of  3 variables:
 $ A: num 1
 $ B: num 1
 $ C:'data.frame':	1 obs. of  1 variable:
  ..$ A: num 1

Why does the print(x) not show "C" as the name of the third element? I did mess up the data frame (and this was a mistake on my part), but finding the bug was harder because print(x) didn't show the C any longer.

y[1] is a dataframe with one column, i.e. it is identical to y. To get the result you expected, you should have used y[[1]], to extract column 1.

Since dataframes are lists, you can assign them as columns of other dataframes, and you'll create a single column in the result that itself contains all the columns of the dataframe you're assigning. This means that x$C <- y[1] replaces the C column of x with a dataframe. It retains the name C (you can see this if you print names(x)), but since the column contains a dataframe, print() uses the column name of y when printing.

If you try

x$D <- x

you'll see it generate new names when printing, but the names within x remain as A, B, C, D.

This is a situation where tibbles do a better job than dataframes: if you created x and y as tibbles instead of dataframes and executed your code, you'd see this:

library(tibble)
x <- tibble(A = 1, B = 2, C = 3)
y <- tibble(A = 1)
x$C <- y[1]
x
#> # A tibble: 1 × 3
#>       A     B   C$A
#>   <dbl> <dbl> <dbl>
#> 1     1     2     1

Duncan Murdoch
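
The difference between y[1] and y[[1]] described in the reply can be checked directly in base R. The following is a minimal sketch using the same x and y as the original post; the copies x_nested and x_fixed are illustrative names that do not appear in the thread.

```r
x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1)

# y[1] keeps the data frame wrapper; y[[1]] extracts the underlying vector.
class(y[1])    # "data.frame"
class(y[[1]])  # "numeric"

# Assigning the one-column data frame nests it inside x (illustrative copy).
x_nested <- x
x_nested$C <- y[1]
names(x_nested)          # the third name is still "C"
sapply(x_nested, class)  # C is reported as "data.frame"

# Assigning the extracted vector gives an ordinary numeric column.
x_fixed <- x
x_fixed$C <- y[[1]]
names(x_fixed)
sapply(x_fixed, class)   # all three columns are "numeric"
```

Checking names(x) and sapply(x, class) alongside print(x) is one way to spot a nested data frame column when the printed header looks wrong.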