
Creating panel data

6 messages · Bert Gunter, John Kane, David Winsemius +1 more

#
I'm new to R.
   I am trying to create panel data with a slight alteration from a typical
   dataset.
   At present, I have data on a few hundred people with the dates of
   occurrences for several events (like marriage and employment). The dates are
   in year/quarter format, so 68.0 equals the 1st quarter of 1968 and 68.25
   equals the 2nd quarter of 1968. If the event never occurred, 0 is recorded
   for the Year Of Occurrence. Somewhat redundantly, I also have separate
   dichotomous variables indicating whether the event ever occurred (0/1
   format).
   For example:
   x <- data.frame(id = c(1, 2),
                   Event1Occur = c(1, 0), YearOfOccurEvent1 = c(68.25, 0),
                   Event2Occur = c(0, 1), YearOfOccurEvent2 = c(0, 68.5))
   I need to transform that dataframe so that I have a separate row for each
   time period (year/quarter) for each person, with variables for whether the
   event had already occurred during that time period. If the event occurred
   during an earlier time, it is presumed to still be occurring at later times.
   E.g., if the person got married in the first quarter of 1968, they are
   presumed to still be married at all later time periods. I need those time
   periods marked (0/1).
   For example:
   y <- data.frame(id = c(rep(1, 5), rep(2, 5)),
                   Year = c(68.0, 68.25, 68.50, 68.75, 69.0))
   y$Event1 <- c(0, 1, 1, 1, 1, 0, 0, 0, 0, 0)
   y$Event2 <- c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1)
   Can someone get me started?
   Thanks
   Jeff
#
Sounds like you would find it worthwhile to read a good Intro R
tutorial  -- like the one that comes shipped with R. Have you done so?
If not, why not? If so, how about the data import/export manual?

I certainly wouldn't guarantee that these will answer all your
questions. They're just places to start BEFORE posting here. Setting
up proper data structures can be tricky (have you considered what form
the functions/packages with which you are going to analyze the data
want?). You might also find it useful to use Hadley Wickham's plyr
and/or reshape2 packages, whose aim is to standardize and simplify
data manipulation tasks. Vignettes/tutorials are available for both.

Cheers,
Bert
On Mon, Jul 23, 2012 at 8:21 AM, Jeff <r at jp.pair.com> wrote:

  
    
#
At 10:38 AM 7/23/2012, you wrote:
I have already used R enough to have correctly imported the actual
data. After import, it is in approximately the format of the x
dataframe I previously posted. I already found the plyr and reshape2
packages and had assumed that the cast (or dcast) functions might be
the correct ones. Melt seemed to get me only what I already have. The
examples I have seen thus far start with data in various formats
and end up in the format that I am starting with. In other words,
they seem to do the exact opposite of what I'm trying to do. So I'm
still stuck on how to get started and on whether the functions in
reshape2 are actually the correct ones to consider.

...still looking for some help on this.

Jeff
#
This looks really ugly but it 'may' do what you want. I was too lazy to generate enough raw data to check. Note that I changed the names in x, as they were a bit clumsy.



library(reshape2)  # provides melt()

# Input data, with shortened column names
x <- data.frame(id = c(1, 2),
                Event1 = c(1, 0), YEvent1 = c(68.25, 0),
                Event2 = c(0, 1), YEvent2 = c(0, 68.5))

# Target layout from the original post
y <- data.frame(id = c(rep(1, 5), rep(2, 5)),
                Year = c(68.0, 68.25, 68.50, 68.75, 69.0))
y$Event1 <- c(0, 1, 1, 1, 1, 0, 0, 0, 0, 0)
y$Event2 <- c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1)

# Stack the two date columns into one "year.quarter" column
dd <- melt(x, id = c("id", "Event1", "Event2"),
           value.name = "year.quarter")
# Keep only events that occurred (a date of 0 means "never")
dd1 <- subset(dd, dd[, 5] != 0)
# Drop the "variable" column
dd1 <- dd1[, c(1, 2, 3, 5)]
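
For anyone without reshape2 installed, roughly the same intermediate table can be sketched in base R. This is only an illustration under the renamed columns used in this message; `date_cols` and `long` are names made up for the sketch. The result has one row per event that actually occurred, not one row per person and quarter, so it would still need expanding to reach the layout of y.

```r
# Same input as above, with the shortened column names.
x <- data.frame(id = c(1, 2),
                Event1 = c(1, 0), YEvent1 = c(68.25, 0),
                Event2 = c(0, 1), YEvent2 = c(0, 68.5))

# Stack the date columns by hand (a base-R stand-in for melt()).
date_cols <- c("YEvent1", "YEvent2")  # hypothetical helper name
long <- do.call(rbind, lapply(date_cols, function(col) {
  data.frame(x[, c("id", "Event1", "Event2")],
             variable = col,
             year.quarter = x[[col]])
}))

# Drop the "never occurred" rows (date recorded as 0) and the variable column.
long <- long[long$year.quarter != 0,
             c("id", "Event1", "Event2", "year.quarter")]
rownames(long) <- NULL
print(long)
```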


John Kane
Kingston ON Canada
#
On Jul 23, 2012, at 10:33 AM, Jeff wrote:

            
I didn't see a clear way to use either reshape() or the plyr/reshape2
packages to do this (but would enjoy seeing an example that improved
my understanding on this path), so I just looked at your "x" and then
created a scaffold with the number of rows needed to match your "y"
and filled in the other columns by first merging to that scaffold
and then creating new columns:

 > y2 <- data.frame(id=rep(1:2, each=5), Year=seq(68,69,by=0.25) )
 > merge(y2, x)
    id  Year Event1Occur YearOfOccurEvent1 Event2Occur YearOfOccurEvent2
1   1 68.00           1             68.25           0               0.0
2   1 68.25           1             68.25           0               0.0
3   1 68.50           1             68.25           0               0.0
4   1 68.75           1             68.25           0               0.0
5   1 69.00           1             68.25           0               0.0
6   2 68.00           0              0.00           1              68.5
7   2 68.25           0              0.00           1              68.5
8   2 68.50           0              0.00           1              68.5
9   2 68.75           0              0.00           1              68.5
10  2 69.00           0              0.00           1              68.5
 > y2a <- merge(y2, x)

 > y2a$Event1 <- with( y2a, as.numeric( Event1Occur & Year >= YearOfOccurEvent1) )
 > y2a$Event2 <- with( y2a, as.numeric( Event2Occur & Year >= YearOfOccurEvent2) )

# Using negative numeric column indexing to suppress the now-superfluous columns

 > y2a[, -(3:6) ]
    id  Year Event1 Event2
1   1 68.00      0      0
2   1 68.25      1      0
3   1 68.50      1      0
4   1 68.75      1      0
5   1 69.00      1      0
6   2 68.00      0      0
7   2 68.25      0      0
8   2 68.50      0      1
9   2 68.75      0      1
10  2 69.00      0      1
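
The scaffold-and-merge steps above can be wrapped in a small function for data with more than two events. This is a rough sketch, not something tested on the real data: `make_panel` is a made-up name, and it assumes every event has a pair of columns named like `Event1Occur` and `YearOfOccurEvent1`, as in the example "x".

```r
# Example input, copied from the original post.
x <- data.frame(id = c(1, 2),
                Event1Occur = c(1, 0), YearOfOccurEvent1 = c(68.25, 0),
                Event2Occur = c(0, 1), YearOfOccurEvent2 = c(0, 68.5))

# make_panel() is a hypothetical helper, not part of any package.
make_panel <- function(x, periods) {
  # Scaffold: one row per id/period combination.
  scaffold <- expand.grid(Year = periods, id = unique(x$id))[, c("id", "Year")]
  panel <- merge(scaffold, x, by = "id")
  panel <- panel[order(panel$id, panel$Year), ]

  # Assumed naming convention: Event<k>Occur / YearOfOccurEvent<k>.
  occur_cols <- grep("^Event[0-9]+Occur$", names(x), value = TRUE)
  events <- sub("Occur$", "", occur_cols)        # e.g. "Event1"
  for (i in seq_along(events)) {
    when <- paste0("YearOfOccur", events[i])     # e.g. "YearOfOccurEvent1"
    # 1 from the quarter of occurrence onward, 0 before it or if it never occurred.
    panel[[events[i]]] <- as.numeric(panel[[occur_cols[i]]] == 1 &
                                       panel$Year >= panel[[when]])
  }

  out <- panel[, c("id", "Year", events)]
  rownames(out) <- NULL
  out
}

panel <- make_panel(x, periods = seq(68, 69, by = 0.25))
print(panel)
```

Checking the `Occur` flag as well as the date matters because a never-occurred event is stored with a date of 0, which every quarter would otherwise exceed.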
#
At 06:33 PM 7/23/2012, David Winsemius wrote:

            
That did it!

Thanks

Jeff