Why is my data always imported as a list?

7 messages · Samantha Sifleet, R. Michael Weylandt, Sarah Goslee +4 more

#
They aren't quite lists --- they are actually data.frame()s which are
a special sort of list with rownames and other nice things.

To your immediate question, I think you're looking for the formula interface:

boxplot(Value ~ State.Fips, data = CB_un)

The data= argument is important so boxplot knows where to look for
"Value" and "State.Fips".

Best,
Michael

On Mon, Jun 11, 2012 at 11:29 AM, Samantha Sifleet
<Sifleet.Samantha at epamail.epa.gov> wrote:
#
Hi,

Have you tried
str(CB_un)
to make sure the structure of your data is what you expect?

Does
boxplot(CB_un[, "Value"]~CB_un[, "State.Fips"])
work?

Look at this:
[1] "data.frame"
[1] "integer"
[1] "integer"


A data frame is a special form of list, so the usual list subsetting
rules apply. Extracting a named component of a data frame with single
square brackets gives you a data frame. Using row, column notation or
double brackets gives a vector.

?"["
will give you more detail.
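
A small sketch of those subsetting rules, using a made-up data frame `df` (a hypothetical stand-in, not the poster's data). It should print the same three classes as the output shown above:

```r
# Hypothetical data frame for illustration only
df <- data.frame(a = 1:3, b = c("x", "y", "z"))

class(df["a"])     # single brackets with a name: still a data frame
class(df[, "a"])   # row, column notation: the column itself
class(df[["a"]])   # double brackets: the column itself
```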

You have to use a data frame, and thus a list, for your data, since
you can't mix factor and numeric data types in a matrix.

Sarah

On Mon, Jun 11, 2012 at 12:29 PM, Samantha Sifleet
<Sifleet.Samantha at epamail.epa.gov> wrote:

  
    
#
On Jun 11, 2012, at 12:29 PM, Samantha Sifleet wrote:

            
If you were steadfastly intent on using direct extraction in the  
formula, then this would be the way to do so:

boxplot(CB_un[["Value"]]~CB_un[["State.Fips"]])

Better would be to use the formula interface the way it was designed to
operate:

boxplot( Value ~ State.Fips, data=CB_un)

-- 
David.
It's in a dataframe ..... dataframes are lists
#
A data.frame is a list with some extra attributes.  When you
subset a data.frame as
   z["Column"]
you get a one-column data.frame (which boxplot rejects because
it wants numeric or character data).  Subsetting it as either
   z[, "Column"]
or
   z[["Column"]]
gives you the column itself, not a data.frame containing one column.

  > z <- data.frame(One=log(1:10), Two=rep(c("i","ii","iii"),c(3,4,3)))
  > str(z["One"])
  'data.frame':   10 obs. of  1 variable:
   $ One: num  0 0.693 1.099 1.386 1.609 ...
  > str(z[, "One"])
   num [1:10] 0 0.693 1.099 1.386 1.609 ...
  > str(z[["One"]])
   num [1:10] 0 0.693 1.099 1.386 1.609 ...

In the particular case of the formula interface to boxplot (and to other
functions), you can avoid having to choose the column-extraction operator
by using the data= argument.  The following three examples give the same
result:
  boxplot(data=z, One ~ Two)
  boxplot(z[["One"]] ~ z[["Two"]])
  boxplot(z[, "One"] ~ z[, "Two"])
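
One way to check that claim without comparing plots by eye is to ask boxplot for its computed summaries instead of a drawing. This is only a sketch: it rebuilds the toy data frame `z` from above, with `Two` made a factor explicitly (an assumption added here so the grouping does not depend on the stringsAsFactors default), and the names `a`, `b` and `d` are arbitrary.

```r
# Same toy data as above, with Two stored as a factor
z <- data.frame(One = log(1:10),
                Two = factor(rep(c("i", "ii", "iii"), c(3, 4, 3))))

# plot = FALSE returns the box statistics instead of drawing anything
a <- boxplot(One ~ Two, data = z, plot = FALSE)
b <- boxplot(z[["One"]] ~ z[["Two"]], plot = FALSE)
d <- boxplot(z[, "One"] ~ z[, "Two"], plot = FALSE)

# Compare the five-number summaries computed for each group
identical(a$stats, b$stats)
identical(b$stats, d$stats)
```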

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
#
No one has yet mentioned the 'standard' way to reference a data frame item, which uses $:

boxplot(CB_un[["Value"]]~CB_un[["State.Fips"]])  #Note the double [[ ]]
# is equivalent to
boxplot(CB_un$Value~CB_un$State.Fips)
# and can be achieved by 
boxplot(Value~State.Fips, data=CB_un)

The rather awkward [["name"]] notation is usually only needed if the name does not comply with variable naming requirements (including list and data frame names); for example, names containing spaces or operator symbols (like "+") are not allowed for ordinary variables. It is possible and sometimes useful to create non-standard data frame names for display; for example, anova.lm actually returns a data frame with names "Df", "Sum Sq", "Mean Sq", "F value" and "Pr(>F)". But it makes manipulation a tad trickier, and you can't use such names in a formula with a data argument unless you quote them with backticks.
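
A short sketch of reaching such non-standard names, using the built-in `cars` dataset purely as an example (the model itself is arbitrary, and `fit` and `tab` are names chosen here):

```r
# Fit any linear model so anova() has something to summarise
fit <- lm(dist ~ speed, data = cars)
tab <- anova(fit)

names(tab)         # includes names with spaces and parentheses

tab[["Sum Sq"]]    # double brackets accept any name as a string
tab$`Sum Sq`       # $ also works if the name is quoted with backticks
```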

S Ellison
#
On Mon, 11-Jun-2012 at 12:29PM -0400, Samantha Sifleet wrote:
|> Hi,
|> 
|> I am relatively new to R. So, this is probably a really basic issue that
|> I keep hitting.
|> 
|> I read my data into R using the read.csv command:
|> 
|> x = rep("numeric", 3)
|> CB_un=read.csv("Corn_Belt_unirr.csv", header=TRUE, colClasses=c("factor", 
|> x))
|> 
|> # I have clearly told R that I have one factor variable and 3 numeric 
|> variables in this table.
|> #but if I try to do anything with them, I get an error
|> 
|> boxplot(CB_un["Value"]~CB_un["State.Fips"])

Others have given good suggestions, but a slight modification of your
code would work if your dataframe is what we'd like to think it is:

boxplot(CB_un[,"Value"]~CB_un[,"State.Fips"])

or

boxplot(CB_un[["Value"]]~CB_un[["State.Fips"]])

I can't check if those will work, but even if they do, the formula
with a data argument is more elegant.  
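
For what it's worth, here is a self-contained sketch of the whole workflow with no unpacking step. The file contents are invented: only the column names State.Fips and Value come from the original post, while the Year and Acres columns and all the numbers are made up so the example can run.

```r
# Write a made-up stand-in for Corn_Belt_unirr.csv to a temporary file
csv <- tempfile(fileext = ".csv")
writeLines(c("State.Fips,Value,Year,Acres",
             "17,180,2010,100",
             "17,195,2011,120",
             "18,150,2010,90",
             "18,160,2011,95",
             "19,210,2010,110",
             "19,205,2011,130"), csv)

# Same read.csv call as in the original post, pointed at the stand-in file
x <- rep("numeric", 3)
CB_un <- read.csv(csv, header = TRUE, colClasses = c("factor", x))

str(CB_un)   # a data frame: one factor column and three numeric columns

# The formula looks up Value and State.Fips inside CB_un directly
boxplot(Value ~ State.Fips, data = CB_un)
```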

HTH




|> 
|> Error in model.frame.default(formula = CB_un["Value"] ~ 
|> CB_un["State.Fips"]) : 
|>   invalid type (list) for variable 'CB_un["Value"]'
|> 
|> # Because  these variables are all stored as lists.
|> #So, I have to unpack them. 
|> 
|> CB_unirr_rent<-as.numeric(unlist(CB_un["Value"]))
|> CB_unirr_State<-as.factor(unlist(CB_un["State.Fips"]))
|> 
|> #before I can do anything with them
|> 
|> boxplot(CB_unirr_rent~CB_unirr_State)
|> 
|> Is there a reason my data is always imported as lists?  Is there a way to 
|> skip this unpacking step?
|> 
|> Thanks,
|> 
|> Sam
|> 
|> ______________________________________________
|> R-help at r-project.org mailing list
|> https://stat.ethz.ch/mailman/listinfo/r-help
|> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
|> and provide commented, minimal, self-contained, reproducible code.