Message-ID: <OF822A2228.D28BD9E4-ONC1257552.00321850-C1257552.00325159@precheza.cz>
Date: 2009-02-03T09:10:06Z
From: PIKAL Petr
Subject: Re: Collapsing panel data
In-Reply-To: <OF3AB12121.2FD2FC55-ONC1257552.002EBC03-C1257552.002FE373@e-control.at>
Hi
r-help-bounces at r-project.org wrote on 03.02.2009 09:43:04:
>
>
> Dear R-helpers,
>
> I've been thinking about this for some time; maybe someone can help. I have
> a fairly large dataset with thousands of firms, call them A, B, C, etc.,
> such as
>
> [,1] [,2]
> [1,] "A" 0.5
> [2,] "" 0.2
> [3,] "" 0.3
> [4,] "B" 0.1
> [5,] "" 0.9
> [6,] "C" 0.4
>
> Or to put it differently two vectors such as
>
> y <- c("A", "", "", "B", "", "C")
> x <- c(0.5, 0.2, 0.3, 0.1, 0.9, 0.4)
>
> The empty lines "" always belong to the firm above. Now I want to collapse
> the dataset so that each firm (A, B, C, etc.) has one line only, using
> summation.
>
> So what I would like is
>
> yNew <- c("A", "B", "C")
> xNew <- c(1, 1, 0.4)
That is what NA values are for. There are quite useful functions for
handling them.
y <- c("A", "", "", "B", "", "C")
x <- c(0.5, 0.2, 0.3, 0.1, 0.9, 0.4)
y[y == ""] <- NA
library(zoo)  # na.locf() is from package zoo
y.na <- na.locf(y)
tapply(x, y.na, sum)
A B C
1.0 1.0 0.4
or aggregate(...)
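The aggregate() route could look roughly like the sketch below. It is only an illustration under a few assumptions: the data frame and its column names (firm, x) are made up for the example, the fill step uses a base-R cumsum() index as a stand-in for na.locf() so that no package is needed, and the first element of y is assumed to be a non-empty firm name.

```r
y <- c("A", "", "", "B", "", "C")
x <- c(0.5, 0.2, 0.3, 0.1, 0.9, 0.4)

# Carry the last non-empty firm label forward.
# cumsum(y != "") counts how many labels have been seen so far,
# which indexes into the vector of non-empty labels.
# Assumes y[1] is not "" (otherwise the leading rows would be dropped).
filled <- y[y != ""][cumsum(y != "")]

# Sum x within each firm; 'firm' and 'x' are illustrative column names.
dat <- data.frame(firm = filled, x = x)
res <- aggregate(x ~ firm, data = dat, FUN = sum)
print(res)
```

Compared with tapply(), this returns a data frame with one row per firm, which may be handier if further columns have to be merged in later.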
Regards
Petr
>
> The problem I'm having is that each firm has a different number of entries
> for x, so some like C have just one and others have ten or more, so I have
> difficulty imagining how to use a loop in this case.
> I'd be grateful for any suggestions.
> Karina