how to "singlify" entries

On Mon, May 30, 2005 at 09:09:27AM -0400, Gabor Grothendieck wrote:
Thank you very much, and to Petr Pikal too. Reshape is exactly what I had forgotten.

The bad news is that I had simplified my example; my real situation is
slightly more complex:

I have three factors, and one value
   rna   lib           tc x
1  CAB 114BA T01F00380F47 1
2  CAE 114BB T01F00381273 1
3  CAJ 114BA T01F0048F6D1 1
4  CAB 114BC T01F0048F6D1 1
5  CAB 114BA T01F00498689 2
6  CAC 114BA T01F00498689 1
7  CAE 114BA T01F00498689 2
8  CAG 114BA T01F00498689 2
9  CAH 114BA T01F00498689 1
10 CAI 114BA T01F00498689 2

I would like a data frame with one row for each combination of "rna" and
"lib", and one column holding the value of x for each "tc":
   rna   lib x.T01F00380F47 x.T01F00381273 x.T01F0048F6D1 x.T01F00498689
1  CAB 114BA              1             NA             NA              2
2  CAE 114BB             NA              1             NA             NA
3  CAJ 114BA             NA             NA              1             NA
4  CAB 114BC             NA             NA              1             NA
6  CAC 114BA             NA             NA             NA              1
7  CAE 114BA             NA             NA             NA              2
8  CAG 114BA             NA             NA             NA              2
9  CAH 114BA             NA             NA             NA              1
10 CAI 114BA             NA             NA             NA              2
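
A minimal sketch of how the table above could be produced with reshape(), assuming the ten sample rows are held in a data frame called df (a name chosen here only for illustration). The two id variables are passed together as idvar, and "tc" becomes the timevar that is spread into columns:

```r
# Sample data copied from the table above; 'df' is an illustrative name.
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
rna lib   tc           x
CAB 114BA T01F00380F47 1
CAE 114BB T01F00381273 1
CAJ 114BA T01F0048F6D1 1
CAB 114BC T01F0048F6D1 1
CAB 114BA T01F00498689 2
CAC 114BA T01F00498689 1
CAE 114BA T01F00498689 2
CAG 114BA T01F00498689 2
CAH 114BA T01F00498689 1
CAI 114BA T01F00498689 2
")

# One row per (rna, lib) pair, one x.<tc> column per distinct value of tc.
# Combinations that never occur are filled with NA.
wide <- reshape(df, idvar = c("rna", "lib"), timevar = "tc",
                direction = "wide")
print(wide)
```

Row 5 of the input (CAB, 114BA) shares its id pair with row 1, so it is folded into that row rather than producing a row of its own.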

Oops, the other way round:
               1       2       3       4       6       7       8       9       10     
rna            "CAB"   "CAE"   "CAJ"   "CAB"   "CAC"   "CAE"   "CAG"   "CAH"   "CAI"  
lib            "114BA" "114BB" "114BA" "114BC" "114BA" "114BA" "114BA" "114BA" "114BA"
x.T01F00380F47 " 1"    NA      NA      NA      NA      NA      NA      NA      NA     
x.T01F00381273 NA      " 1"    NA      NA      NA      NA      NA      NA      NA     
x.T01F0048F6D1 NA      NA      " 1"    " 1"    NA      NA      NA      NA      NA     
x.T01F00498689 " 2"    NA      NA      NA      " 1"    " 2"    " 2"    " 1"    " 2"   
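
The quoted strings in the transposed output arise because t() on a data frame goes through as.matrix(), which turns mixed character and numeric columns into a single character matrix. A sketch of one possible alternative, assuming the data frame is called df and using a hypothetical helper column named sample: reshape with "tc" as the id variable, so the orientation is the same but x stays numeric.

```r
# Sample data copied from the first table; 'df' is an illustrative name.
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
rna lib   tc           x
CAB 114BA T01F00380F47 1
CAE 114BB T01F00381273 1
CAJ 114BA T01F0048F6D1 1
CAB 114BC T01F0048F6D1 1
CAB 114BA T01F00498689 2
CAC 114BA T01F00498689 1
CAE 114BA T01F00498689 2
CAG 114BA T01F00498689 2
CAH 114BA T01F00498689 1
CAI 114BA T01F00498689 2
")

# Hypothetical helper column: one label per (rna, lib) combination.
df$sample <- paste(df$rna, df$lib, sep = "_")

# One row per tc, one numeric column per (rna, lib) combination.
by_tc <- reshape(df[, c("tc", "sample", "x")],
                 idvar = "tc", timevar = "sample", direction = "wide")

# Drop the "x." prefix that reshape() puts in front of each new column name.
names(by_tc) <- sub("^x\\.", "", names(by_tc))
str(by_tc)
```

The underscore separator is a choice made for this sketch; it keeps the column names syntactically valid, so they can be used with $ without quoting.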

The ultimate goal is (after proper renaming of the columns) to do things like

plot(CAA-114BA[CAA-114BA >0 & CAA-114BB > 0], CAA-114BB[CAA-114BA >0 & CAA-114BB > 0])

(This combination will appear once I reshape the whole data frame, which has 200,000 rows.)
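
A hedged sketch of what that plot call might look like in practice. Two things are assumed here. First, a name such as CAA-114BA is not a syntactic R name, so it has to be wrapped in backticks (or the columns renamed). Second, combinations that are absent after reshaping are NA rather than 0, so the "> 0" test yields NA for them; which() drops those. Because CAA does not occur in the ten sample rows, the sketch uses the CAB-114BA and CAE-114BA values from the sample instead, and the data frame name d is purely illustrative.

```r
# Values per tc taken from the sample rows above, in the order
# T01F00380F47, T01F00381273, T01F0048F6D1, T01F00498689.
# check.names = FALSE keeps the hyphenated column names as they are.
d <- data.frame(`CAB-114BA` = c(1, NA, NA, 2),
                `CAE-114BA` = c(NA, NA, NA, 2),
                check.names = FALSE)

# Rows where both libraries have a positive value; which() discards
# the NA results that come from missing combinations.
keep <- which(d$`CAB-114BA` > 0 & d$`CAE-114BA` > 0)

# Draw to a null PDF device so the sketch also runs non-interactively;
# at the console, the pdf(NULL) and dev.off() lines can be dropped.
pdf(NULL)
plot(d$`CAB-114BA`[keep], d$`CAE-114BA`[keep],
     xlab = "CAB-114BA", ylab = "CAE-114BA")
dev.off()
```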

After that come proper statistical tests (which I still have to learn, or
remember from 12 years ago).

Once again, thank you, and please warn me if I am doing something stupid with
this transposition of the reshaped table.

Best regards,