Strange data frame behavior

3 messages · Raoni Rodrigues, Jeff Newmiller, PIKAL Petr

#
Hello all,

I don't understand a strange behavior in data frame manipulation.

data_frame1 = data.frame(Site = c("S1", "S2", "S3", "S4", "L1", "L2", "L3", "L4"),
                         Number = c(1, 3, 5, 2, 1, 1, 2, 1))

data_frame2 = data_frame1 [data_frame1$Site != "S1", ]

dput (data_frame2)

structure(list(Site = structure(c(6L, 7L, 8L, 1L, 2L, 3L, 4L), .Label = c("L1",
"L2", "L3", "L4", "S1", "S2", "S3", "S4"), class = "factor"),
    Number = structure(c(3L, 4L, 2L, 1L, 1L, 2L, 1L), .Label = c("1",
    "2", "3", "5"), class = "factor")), .Names = c("Site", "Number"
), row.names = 2:8, class = "data.frame")

Why did site "S1" not disappear from data_frame2's structure?

And what do I have to do to remove it permanently from my new data
frame (data_frame2)?

Sorry for this basic question, but I really did not understand...

Thanks in advance,

Raoni
#
This has nothing to do with data frames and everything to do with how factors behave.

The levels of a factor are not necessarily linked with the content of the factor. For example, a factor representing "Male" and "Female" has both of those levels even if all the data in a subset represents "Male".
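A minimal sketch of this point, using a made-up factor named sex (hypothetical data, not from the original post): after subsetting, both levels are still attached to the factor even though only one of them occurs.

```r
# Hypothetical example data, not from the original post
sex <- factor(c("Male", "Female", "Male", "Male"))

# Keep only the "Male" observations
males <- sex[sex == "Male"]

levels(males)   # both "Female" and "Male" are still listed
table(males)    # "Female" appears with a count of 0
```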

If you want traces of those eliminated values removed, consider using character data rather than factors. In particular, passing as.is=TRUE or stringsAsFactors=FALSE to read.table and similar functions prevents the automatic conversion of strings to factors. You can then convert to factor yourself, once you have finished manipulating your data.
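As a rough sketch of the character-data approach, the data from the original post can be rebuilt with stringsAsFactors = FALSE passed to data.frame(). The names df_chr and df_sub are introduced here for illustration only. Note that from R 4.0.0 onward FALSE is already the default, so the argument matters mainly on older versions.

```r
# Same data as in the original post, but Site is kept as character
df_chr <- data.frame(Site = c("S1", "S2", "S3", "S4", "L1", "L2", "L3", "L4"),
                     Number = c(1, 3, 5, 2, 1, 1, 2, 1),
                     stringsAsFactors = FALSE)

# Subset exactly as in the original post
df_sub <- df_chr[df_chr$Site != "S1", ]

unique(df_sub$Site)   # "S1" is gone; a character vector carries no levels

# Convert to factor only after the manipulation is done
df_sub$Site <- factor(df_sub$Site)
levels(df_sub$Site)   # levels are built from the remaining values only
```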
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
#
Hi
This happens because Site is a factor, and the levels of a factor are preserved in subset operations.

See ?"[", especially the part about factors and the drop parameter.

You can either convert the factor to character, or explicitly call factor() on the Site variable

factor(data_frame2$Site)

to get rid of the empty levels.
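Calling factor() on its own returns a new factor and leaves data_frame2 unchanged, so the result has to be assigned back to the column. The sketch below shows that, along with two base R alternatives, droplevels() and the drop argument of "[" for factors. The names fixed1, fixed2 and site3 are introduced here for illustration, and stringsAsFactors = TRUE is set explicitly on the assumption that Site should be a factor as in the original post (newer R versions would otherwise keep it as character).

```r
# Data from the original post; stringsAsFactors = TRUE makes Site a factor
# regardless of the default in the running R version
data_frame1 <- data.frame(Site = c("S1", "S2", "S3", "S4", "L1", "L2", "L3", "L4"),
                          Number = c(1, 3, 5, 2, 1, 1, 2, 1),
                          stringsAsFactors = TRUE)
data_frame2 <- data_frame1[data_frame1$Site != "S1", ]

"S1" %in% levels(data_frame2$Site)   # the unused level is still present

# Option 1: rebuild the factor and assign it back to the column
fixed1 <- data_frame2
fixed1$Site <- factor(fixed1$Site)

# Option 2: droplevels() removes unused levels from every factor column
fixed2 <- droplevels(data_frame2)

# Option 3: use drop = TRUE when subsetting the factor itself
site3 <- data_frame1$Site[data_frame1$Site != "S1", drop = TRUE]

levels(fixed1$Site)   # "S1" no longer appears among the levels
```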

Regards
Petr