How to deal with a dataframe within a dataframe?

5 messages · Robert Latest, R. Michael Weylandt, David Winsemius +1 more

Tue, May 8, 2012 6:19 AM #

Hello all,

I am doing an aggregation where the aggregating function returns not a
single numeric value but a vector of two elements using return(c(val1,
val2)). I don't know how to access the individual columns of that
vector in the resulting dataframe though. How is this done correctly?
Thanks, robert

+     FUN=cp.cpk, lsl=1300, usl=1500)

df$quarter df$tool           df$value
1       09Q3    VS1A 1.800534, 1.628483
2       10Q1    VS1A 1.299652, 1.261302
3       10Q2    VS1A 1.699018, 1.381570
4       10Q3    VS1A 1.311681, 1.067232

df$value
1 1.800534, 1.628483
2 1.299652, 1.261302
3 1.699018, 1.381570
4 1.311681, 1.067232

[1] "data.frame"

df$value
1 1.800534, 1.628483
2 1.299652, 1.261302
3 1.699018, 1.381570
4 1.311681, 1.067232

Error in `[.data.frame`(agg["df$value"], 2) : undefined columns selected

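# FWIW, here's the aggregating function (passed as FUN=cp.cpk above)
cp.cpk <- function(data, lsl, usl) {
    # Fewer than 15 observations: return a single NA, not two values
    if (length(data) < 15) {
        return(NA)
    } else {
        return(c(
            (usl - lsl) / (6 * sd(data)),                             # Cp
            min(mean(data) - lsl, usl - mean(data)) / (3 * sd(data))  # Cpk
        ))
    }
}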

R. Michael Weylandt

Tue, May 8, 2012 6:38 AM #

So this actually looks like something of a tricky one: if you wouldn't
mind sending the result of dput(head(agg)) I can confirm, but here's
my hunch:

Try this:

agg2 <- aggregate(len ~ ., data = ToothGrowth, function(x) c(min(x), max(x)))
print(agg2)
str(agg2)

You'll see that the third "column" is actually a matrix that has two
columns: so what you really need is

agg2[,3][,1]

if you want the mins.
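
A minimal sketch of the matrix-column behaviour described above, using the built-in ToothGrowth data. Naming the elements of the returned vector (the names min and max here are an illustrative choice, not part of the original example) gives the inner matrix column names, which makes the nested indexing easier to read.

```r
# Aggregate the built-in ToothGrowth data, returning two values per group.
agg2 <- aggregate(len ~ ., data = ToothGrowth,
                  FUN = function(x) c(min = min(x), max = max(x)))

# The data frame has three columns; the third one is itself a matrix.
ncol(agg2)
is.matrix(agg2$len)
colnames(agg2$len)

# Pull out one inner column, by name or by position.
mins <- agg2$len[, "min"]
maxs <- agg2[, 3][, 2]
```

Indexing by name (agg2$len[, "min"]) and by position (agg2[, 3][, 1]) should return the same values.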

What's odd is that this doesn't work for you: checking the class
gives "data.frame", yet the column selection that I would have
guessed would work fails with an error.

Instead, it looks like your data somehow got stuck together (possibly
as factors?) -- either way, I think you need to use double brackets to
get the inner multi-column structure to take a look at it:

agg[["df$value"]][,1]

or, more simply, specify column subsetting (which will use df-ness and
not list-ness)

agg[, "df$value"][,1]

Anyways, hope this gets you on the right track and with
dput(head(agg)) we can definitely figure this out.

Best,
Michael

On Tue, May 8, 2012 at 9:19 AM, Robert Latest <boblatest at gmail.com> wrote:

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Tue, May 8, 2012 11:40 PM #

On Tue, May 8, 2012 at 3:38 PM, R. Michael Weylandt <michael.weylandt at gmail.com> wrote:

Hi Michael,

while I'm trying to get my head around the rest of your post, here's
the output of dput():

structure(list(`df$quarter` = c("09Q3", "10Q1", "10Q2", "10Q3",
"11Q1", "11Q2"), `df$tool` = structure(c(1L, 1L, 1L, 1L, 1L,
1L), .Label = c("VS1A", "VS1B", "VS2A", "VS2B", "VS3A", "VS3B",
"VS4A", "VS4B", "VS5B"), class = "factor"), `df$value` = structure(list(
    `0` = c(1.80053430839867, 1.62848325226279), `1` = c(1.29965212329278,
    1.26130173276939), `2` = c(1.69901753654472, 1.38156952313768
    ), `3` = c(1.31168126092175, 1.06723157138633), `4` = c(1.54165763354293,
    1.21619657757276), `5` = c(1.29925171313276, 1.18276707678292
    )), .Names = c("0", "1", "2", "3", "4", "5"))), .Names = c("df$quarter",
"df$tool", "df$value"), row.names = c(NA, 6L), class = "data.frame")

I would like this in either the form of a "flat" data frame (i.e., the
contents of "df$value" as two separate columns), or -- even preferable
-- learn a better way to retrieve multiple numeric results from a call
to aggregate().

Thanks,
robert

David Winsemius

Wed, May 9, 2012 6:14 AM #

On May 9, 2012, at 2:40 AM, Robert Latest wrote:

The reason you are having difficulty is that a) you have somehow
(noting that you have omitted all context) managed to construct
column names with dollar signs in them, which the interpreter attempts
to parse as a function, and b) the 'df$value' column is also a
list rather than an atomic vector. It's a rather pathological
construct in my opinion, but maybe one of the masteRs will think
differently. This will pull the first element of that column's third
entry:

 > agg[3,3][[1]][1]
[1] 1.699018

This will return all of the first entries:

sapply(1:6, function(x) agg[x, 3][[1]][1])
[1] 1.800534 1.299652 1.699018 1.311681 1.541658 1.299252

You might start by renaming that object's columns with valid R names.
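
As a sketch of the flattening the original poster asked for: the block below builds a small stand-in for the posted object (three rows, values rounded from the dput() output), renames the columns to valid R names, and spreads the list column into two ordinary columns. The new names quarter, tool, value, cp and cpk are assumptions made for illustration. Padding with rep_len() is one way to cope with groups where the aggregating function returned a single NA.

```r
# Stand-in for the posted object: invalid column names and a list column.
agg <- data.frame(`df$quarter` = c("09Q3", "10Q1", "10Q2"),
                  `df$tool`    = factor(c("VS1A", "VS1A", "VS1A")),
                  check.names  = FALSE)
agg$`df$value` <- list(c(1.800534, 1.628483),
                       c(1.299652, 1.261302),
                       c(1.699018, 1.381570))

# Rename to syntactically valid names (illustrative choices).
names(agg) <- c("quarter", "tool", "value")

# Pad each entry to length 2, so a lone NA would become c(NA, NA),
# then stack the entries row by row into a two-column matrix.
padded <- lapply(agg$value, function(v) rep_len(v, 2))
m <- do.call(rbind, padded)

# Combine the grouping columns with the two extracted value columns.
flat <- data.frame(agg[c("quarter", "tool")], cp = m[, 1], cpk = m[, 2])
```

After this, flat$cp and flat$cpk are plain numeric vectors and can be indexed like any other column.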

David.

David Winsemius, MD
West Hartford, CT

Gabor Grothendieck

Wed, May 9, 2012 7:10 AM #

On Tue, May 8, 2012 at 9:19 AM, Robert Latest <boblatest at gmail.com> wrote:

Try this:

agg <- aggregate(value ~ quarter + tool, df,
     FUN=cp.cpk, lsl=1300, usl=1500)

do.call("data.frame", agg)
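
A hedged sketch of why do.call("data.frame", agg) flattens the result, with ToothGrowth standing in for the poster's data. It assumes the nested column is a matrix, which aggregate() produces only when every group returns a vector of the same length; the posted function returns a single NA for small groups, which is a plausible reason the poster ended up with a list column instead. cp.cpk2, its min.n argument, the element names cp and cpk, and the limits 0 and 40 are all illustrative assumptions, not from the thread.

```r
# Variant of the aggregating function that always returns two values,
# so that aggregate() can simplify the results into a matrix column.
cp.cpk2 <- function(data, lsl, usl, min.n = 15) {
    if (length(data) < min.n) {
        return(c(cp = NA_real_, cpk = NA_real_))
    }
    s <- sd(data)
    c(cp  = (usl - lsl) / (6 * s),
      cpk = min(mean(data) - lsl, usl - mean(data)) / (3 * s))
}

# ToothGrowth has 10 observations per group, so min.n is lowered here.
agg <- aggregate(len ~ supp + dose, data = ToothGrowth,
                 FUN = cp.cpk2, lsl = 0, usl = 40, min.n = 5)

# data.frame() splits a matrix argument into one column per matrix column.
flat <- do.call("data.frame", agg)
names(flat)
```

With named elements, the flattened columns take names of the form len.cp and len.cpk.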

Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com