Skip to content

Data transformation

8 messages · legen, Henrique Dallazuanna, jim holtman +1 more

#
Dear all,

I have a dataset as below:

id    code1    code2         p 
 1        4        8           0.1
 1        5        7           0.9
 2        1        8           0.4
 2        6        2           0.2
 2        4        3           0.6
 3        5        6           0.7
 3        7        5           0.9

I just want to rewrite it as this (vertical to horizontal):

id   var1  var2  var3  var4  var5  var6  var7  var8 
1        0      0      0    0.1   0.9       0   0.9    0.1
2     0.4    0.2   0.6    0.6      0    0.2      0    0.4
3        0      0      0      0    0.8    0.7    0.9      0

For the third subject, there are two values being equal to 5 in code1 and
code2, but different values in p:  0.7 and 0.9, so I assigned their average
0.8 in var5.

Does anybody can help me to handle this? Many thanks for your consideration
and time.

Legen
#
Is this what you want:
+  1        4        8           0.1
+  1        5        7           0.9
+  2        1        8           0.4
+  2        6        2           0.2
+  2        4        3           0.6
+  3        5        6           0.7
+  3        7        5           0.9"), header=TRUE)
+     variable=rep('p', 2*nrow(x)), value=c(x$p, x$p))
id   1   2   3   4   5   6   7   8
1  1 NaN NaN NaN 0.1 0.9 NaN 0.9 0.1
2  2 0.4 0.2 0.6 0.6 NaN 0.2 NaN 0.4
3  3 NaN NaN NaN NaN 0.8 0.7 0.9 NaN

        
On Tue, Nov 10, 2009 at 4:30 PM, legen <legendy at gmail.com> wrote:

  
    
#
Thank you for your kind help. Your script works very well. Would you please
show me how to change NaN to zero and column variables 1, 2, ..., 8 to var1,
var2, ..., var8? Thanks again.

Legen
jholtman wrote:

  
    
#
Try this also:

xtabs(rep(p, 2) ~ rep(id, 2) + sprintf("var%d", c(code1, code2)), data = x)
On Wed, Nov 11, 2009 at 2:10 AM, legen <legendy at gmail.com> wrote:

  
    
#
Try this:
+  1        4        8           0.1
+  1        5        7           0.9
+  2        1        8           0.4
+  2        6        2           0.2
+  2        4        3           0.6
+  3        5        6           0.7
+  3        7        5           0.9"), header=TRUE)
+                    var=paste('var', c(x$code1, x$code2), sep=''),
+                    variable=rep('p', 2*nrow(x)),
+                    value=c(x$p, x$p))
+     if (length(.dat) == 0) return(0)  # test for no data; return
zero if that is the case
+     mean(.dat)
+ }))
  id var1 var2 var3 var4 var5 var6 var7 var8
1  1  0.0  0.0  0.0  0.1  0.9  0.0  0.9  0.1
2  2  0.4  0.2  0.6  0.6  0.0  0.2  0.0  0.4
3  3  0.0  0.0  0.0  0.0  0.8  0.7  0.9  0.0

        
On Tue, Nov 10, 2009 at 11:10 PM, legen <legendy at gmail.com> wrote:

  
    
#
That's what I want. Many thanks for your help.
Legen
jholtman wrote:

  
    
#
Your script works very well. Thank you very much.

Legen
Henrique Dallazuanna wrote:

  
    
#
Or fill = 0.

Hadley