merging to data.frames whose columns are different but follow a pattern. - R-help

ManuelS

Tue, Oct 25, 2011 9:43 AM #

Hi,

I'm working with panel data from the Swiss Houshold Panel (SHP). The data i
got came in the following way:
1.) 12 *different* /individual/ files - one for each year .
2.) 12 *different* /houshold/ files - again: one for each year

Each file came in the SPSS format (.sav). I implemented all the files in R
an managed (via rename, cbind, rbind, merge etc.) to get *two* files. 
The first file is a combination of all 12 /individual/ files orderd by
IDPERS (unique individual identification) and T (=Time 1...12) 
The second file contains all houshold waves (1-12) and is orderd by IDHOUS 
(unique houshold identification) and T (=1...12).

Now, to the problem:
As the following except shows: the individual id used in the individual file
(IDPERS) is derived from the houshold id (IDHOUS).

IDPERS  T SEX PC02 GHS Single 
1    4101  1   1    9   1     NA
2    4101  2   1   10   1     NA
3    4101  3   1   10   1     NA
4    4101  4   1   10   1     NA
.
.
.
13   4102  1   2    0   0     NA
14   4102  2   2    5   1     NA
15   4102  3   2    3   0     NA
16   4102  4   2    9   1     NA
.
.
.
26   5101  1   1    8   1     NA
27   5101  2   1    8   1     NA
28   5101  3   1    8   1     NA
29   5101  4   1    8   1     NA

IDHOUS  T COM2 NBPERS NBKID
1      41  1    2      5     3
2      41  2    2      5     3
3      41  3    2      5     3
4      41  4    2      5     3
.
.
.
13     51  1    2      1     0
14     51  2    2      1     0
15     51  3    2      1     0
16     51  4    2      1     0

I want to merge the two files (data.frames) according to their ID's and the
time dimension T. Since IDHOUS and IDPERS are not the same I cannot use
/merge() /. Doing the task "by hand" in a excel spread sheet or any
equivalent is possible but would probably take ages.
So my question is:

Does anyone know a way, a command, a trick to merge two files whose columns
are different but follow a pattern as described above?

I am gratful for any kind of input.

Manuel Steiner


--
View this message in context: http://r.789695.n4.nabble.com/merging-to-data-frames-whose-columns-are-different-but-follow-a-pattern-tp3937376p3937376.html
Sent from the R help mailing list archive at Nabble.com.

Dennis Murphy

Tue, Oct 25, 2011 12:15 PM #

Hi:

Try this:

INDIVIDUAL <- transform(INDIVIDUAL, IDHOUS = IDPERS %/% 100)
merge(INDIVIDUAL, HOUSHOLD, by = c('IDHOUS', 'T'))

   IDHOUS T IDPERS SEX PC02 GHS Single COM2 NBPERS NBKID
1      41 1   4101   1    9   1     NA    2      5     3
2      41 1   4102   2    0   0     NA    2      5     3
3      41 2   4101   1   10   1     NA    2      5     3
4      41 2   4102   2    5   1     NA    2      5     3
5      41 3   4101   1   10   1     NA    2      5     3
6      41 3   4102   2    3   0     NA    2      5     3
7      41 4   4101   1   10   1     NA    2      5     3
8      41 4   4102   2    9   1     NA    2      5     3
9      51 1   5101   1    8   1     NA    2      1     0
10     51 2   5101   1    8   1     NA    2      1     0
11     51 3   5101   1    8   1     NA    2      1     0
12     51 4   5101   1    8   1     NA    2      1     0

HTH,
Dennis

On Tue, Oct 25, 2011 at 9:43 AM, ManuelS <steiner-manuel at web.de> wrote:

Hi,

I'm working with panel data from the Swiss Houshold Panel (SHP). The data i
got came in the following way:
1.) 12 *different* /individual/ files - one for each year .
2.) 12 *different* /houshold/ files - again: one for each year

Each file came in the SPSS format (.sav). I implemented all the files in R
an managed (via rename, cbind, rbind, merge etc.) to get *two* files.
The first file is a combination of all 12 /individual/ files orderd by
IDPERS (unique individual identification) and T (=Time 1...12)
The second file contains all houshold waves (1-12) and is orderd by IDHOUS
(unique houshold identification) and T (=1...12).

Now, to the problem:
As the following except shows: the individual id used in the individual file
(IDPERS) is derived from the houshold id (IDHOUS).

head(INDIVIDUAL,20)

? IDPERS ?T SEX PC02 GHS Single
1 ? ?4101 ?1 ? 1 ? ?9 ? 1 ? ? NA
2 ? ?4101 ?2 ? 1 ? 10 ? 1 ? ? NA
3 ? ?4101 ?3 ? 1 ? 10 ? 1 ? ? NA
4 ? ?4101 ?4 ? 1 ? 10 ? 1 ? ? NA
.
.
.
13 ? 4102 ?1 ? 2 ? ?0 ? 0 ? ? NA
14 ? 4102 ?2 ? 2 ? ?5 ? 1 ? ? NA
15 ? 4102 ?3 ? 2 ? ?3 ? 0 ? ? NA
16 ? 4102 ?4 ? 2 ? ?9 ? 1 ? ? NA
.
.
.
26 ? 5101 ?1 ? 1 ? ?8 ? 1 ? ? NA
27 ? 5101 ?2 ? 1 ? ?8 ? 1 ? ? NA
28 ? 5101 ?3 ? 1 ? ?8 ? 1 ? ? NA
29 ? 5101 ?4 ? 1 ? ?8 ? 1 ? ? NA

head(HOUSHOLD,20)

? IDHOUS ?T COM2 NBPERS NBKID
1 ? ? ?41 ?1 ? ?2 ? ? ?5 ? ? 3
2 ? ? ?41 ?2 ? ?2 ? ? ?5 ? ? 3
3 ? ? ?41 ?3 ? ?2 ? ? ?5 ? ? 3
4 ? ? ?41 ?4 ? ?2 ? ? ?5 ? ? 3
.
.
.
13 ? ? 51 ?1 ? ?2 ? ? ?1 ? ? 0
14 ? ? 51 ?2 ? ?2 ? ? ?1 ? ? 0
15 ? ? 51 ?3 ? ?2 ? ? ?1 ? ? 0
16 ? ? 51 ?4 ? ?2 ? ? ?1 ? ? 0

I want to merge the two files (data.frames) according to their ID's and the
time dimension T. Since IDHOUS and IDPERS are not the same I cannot use
/merge() /. Doing the task "by hand" in a excel spread sheet or any
equivalent is possible but would probably take ages.
So my question is:

Does anyone know a way, a command, a trick to merge two files whose columns
are different but follow a pattern as described above?

I am gratful for any kind of input.

Manuel Steiner


--
View this message in context: http://r.789695.n4.nabble.com/merging-to-data-frames-whose-columns-are-different-but-follow-a-pattern-tp3937376p3937376.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.