I'm new to R. I am trying to create panel data with a slight alteration from a typical dataset. At present, I have data on a few hundred people with the dates of occurrence for several events (like marriage and employment). The dates are in year/quarter format, so 68.0 equals the 1st quarter of 1968 and 68.25 equals the 2nd quarter of 1968. If the event never occurred, 0 is recorded for the Year Of Occurrence. Somewhat redundantly, I also have separate dichotomous variables indicating whether the event ever occurred (0/1 format). For example:

x <- data.frame(id = c(1, 2),
                Event1Occur = c(1, 0), YearOfOccurEvent1 = c(68.25, 0),
                Event2Occur = c(0, 1), YearOfOccurEvent2 = c(0, 68.5))

I need to transform that dataframe so that I have a separate row for each time period (year/quarter) for each person, with variables for whether the event had already occurred by that time period. If the event occurred at an earlier time, it is presumed to still be occurring at later times. E.g., if the person got married in the first quarter of 1968, they are presumed to still be married at all later time periods. I need those time periods marked (0/1). For example:

y <- data.frame(id = c(rep(1, 5), rep(2, 5)),
                Year = c(68.0, 68.25, 68.50, 68.75, 69.0))
y$Event1 <- c(0, 1, 1, 1, 1, 0, 0, 0, 0, 0)
y$Event2 <- c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1)

Can someone get me started?

Thanks,
Jeff
Creating panel data
6 messages · Bert Gunter, John Kane, David Winsemius +1 more
Sounds like you would find it worthwhile to read a good Intro R tutorial -- like the one that comes shipped with R. Have you done so? If not, why not? If so, how about the data import/export manual? I certainly wouldn't guarantee that these will answer all your questions. They're just places to start BEFORE posting here. Setting up proper data structures can be tricky (have you considered what form the functions/packages with which you are going to analyze the data want?). You might also find it useful to use Hadley Wickham's plyr and/or reshape2 packages, whose aim is to standardize and simplify data manipulation tasks. Vignettes/tutorials are available for both. Cheers, Bert
On Mon, Jul 23, 2012 at 8:21 AM, Jeff <r at jp.pair.com> wrote:
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
At 10:38 AM 7/23/2012, you wrote:
You might also find it useful to use Hadley Wickham's plyr and/or reshape2 packages, whose aim is to standardize and simplify data manipulation tasks. Cheers, Bert
I have already used R enough to have correctly imported the actual data. After import, it is approximately in the format of the x dataframe I previously posted. I already found the plyr and reshape2 packages and had assumed that the cast (or dcast) options might be the correct ones. Melt seemed to get me only what I already have. The examples I have seen thus far start with data in various formats and end up in the format that I am starting with. In other words, they seem to do the exact opposite of what I'm trying to do. So I'm still stuck on how to get started and on whether the functions in reshape2 are actually the correct ones to consider. ...still looking for some help on this. Jeff
This looks really ugly but it 'may' do what you want. I was too lazy to generate enough raw data to check. Note I changed the names in x as they were a bit clumsy.
library(reshape2)  # provides melt()

# Input data, with shorter column names
x <- data.frame(id = c(1, 2), Event1 = c(1, 0), YEvent1 = c(68.25, 0),
                Event2 = c(0, 1), YEvent2 = c(0, 68.5))

# Target layout from the original post
y <- data.frame(id = c(rep(1, 5), rep(2, 5)),
                Year = c(68.0, 68.25, 68.50, 68.75, 69.0))
y$Event1 <- c(0, 1, 1, 1, 1, 0, 0, 0, 0, 0)
y$Event2 <- c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1)

# Stack the two year columns, then keep only the events that occurred
dd <- melt(x, id = c("id", "Event1", "Event2"), value.name = "year.quarter")
dd1 <- subset(dd, dd[, 5] != 0)   # column 5 is year.quarter
dd1 <- dd1[, c(1, 2, 3, 5)]       # drop the "variable" column
John Kane
Kingston ON Canada
-----Original Message----- From: r at jp.pair.com Sent: Mon, 23 Jul 2012 11:33:37 -0500 To: gunter.berton at gene.com Subject: Re: [R] Creating panel data
On Jul 23, 2012, at 10:33 AM, Jeff wrote:
I didn't see a clear way to use either reshape() or the plyr/reshape2
packages to do this (but would enjoy seeing an example that improved
my understanding on this path), so I just looked at your "x" and then
created a scaffold with the number of rows needed to match your "y"
and filled in the other columns by first merging to that scaffold
and then creating new columns:
> y2 <- data.frame(id = rep(1:2, each = 5), Year = seq(68, 69, by = 0.25))
> merge(y2, x)
   id  Year Event1Occur YearOfOccurEvent1 Event2Occur YearOfOccurEvent2
1   1 68.00           1             68.25           0               0.0
2   1 68.25           1             68.25           0               0.0
3   1 68.50           1             68.25           0               0.0
4   1 68.75           1             68.25           0               0.0
5   1 69.00           1             68.25           0               0.0
6   2 68.00           0              0.00           1              68.5
7   2 68.25           0              0.00           1              68.5
8   2 68.50           0              0.00           1              68.5
9   2 68.75           0              0.00           1              68.5
10  2 69.00           0              0.00           1              68.5
> y2a <- merge(y2, x)
> y2a$Event1 <- with(y2a, as.numeric(Event1Occur & Year >= YearOfOccurEvent1))
> y2a$Event2 <- with(y2a, as.numeric(Event2Occur & Year >= YearOfOccurEvent2))
# Using negative numeric column indexing to suppress the now superfluous columns
> y2a[, -(3:6)]
   id  Year Event1 Event2
1   1 68.00      0      0
2   1 68.25      1      0
3   1 68.50      1      0
4   1 68.75      1      0
5   1 69.00      1      0
6   2 68.00      0      0
7   2 68.25      0      0
8   2 68.50      0      1
9   2 68.75      0      1
10  2 69.00      0      1
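
With more than two events, the same scaffold-and-merge idea could be wrapped in a small function. This is only a sketch, not something tested on the real data: make_panel() is a made-up name, it assumes every event has a pair of columns named like Event1Occur / YearOfOccurEvent1 as in the example x, and the range of quarters has to be supplied by hand.

```r
# Hypothetical helper (not from the thread): expand one-row-per-person data
# into one row per person per quarter, with a 0/1 flag for each event.
# Assumes columns named <Event>Occur and YearOfOccur<Event>, as in the example.
make_panel <- function(x, periods, events) {
  # Scaffold: every id paired with every period, then attach the person data
  scaffold <- expand.grid(Year = periods, id = x$id)
  panel <- merge(scaffold, x, by = "id")
  panel <- panel[order(panel$id, panel$Year), ]

  for (ev in events) {
    occurred <- panel[[paste0(ev, "Occur")]] == 1
    start <- panel[[paste0("YearOfOccur", ev)]]
    # Flag is 1 from the quarter of occurrence onward, 0 otherwise
    panel[[ev]] <- as.numeric(occurred & panel$Year >= start)
  }

  rownames(panel) <- NULL
  panel[, c("id", "Year", events)]
}

# Example data from the original post
x <- data.frame(id = c(1, 2),
                Event1Occur = c(1, 0), YearOfOccurEvent1 = c(68.25, 0),
                Event2Occur = c(0, 1), YearOfOccurEvent2 = c(0, 68.5))

panel <- make_panel(x, periods = seq(68, 69, by = 0.25),
                    events = c("Event1", "Event2"))
print(panel)
```

For the two-person example this gives the same Event1 and Event2 columns as the y posted at the start of the thread. Checking the Occur column first matters because a never-occurred event is stored as year 0, which every period would otherwise pass.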
David Winsemius, MD Alameda, CA
At 06:33 PM 7/23/2012, David Winsemius wrote:
I didn't see a clear way to use either reshape() or the plyr/reshape2 packages to do this (but would enjoy seeing an example that improved my understanding on this path), so I just looked at your "x" and then created a scaffold with the number of rows needed to match your "y" and filled in the other columns by first merging to that scaffold and then creating new columns:
That did it! Thanks Jeff