
complex transformation of data

10 messages · Den, ONKELINX, Thierry, Moritz Grenke +1 more

Den
#
Dear [R] people
Could you please help with the following data transformation?
Any suggestions, hints, references, or even guesses on how to perform any
of the following steps would be highly appreciated. These transformations are
crucial for my work.

(n_, _n, j_, k_ signify numbers)

SOURCE DATA:   
id	cycle1	cycle2	cycle3	...	cycle_n
1	c	c	c		c
1	m	m	m		m
1	f	f	f		f
2	m	m	m		NA
2	f	f	f		NA
2	c	c	c		NA
3	a	a	NA		NA
3	c	c	c		NA
3	f	f	f		NA
3	NA	NA	m		NA
...........................................



RESULT DATA1:
id	cyc1	cyc2	cyc3	...	cyc_n
1	cfm	cfm	cfm		cfm
2	cfm	cfm	cfm		NA
3	acf	acf	cfm		NA
...........................................


RESULT DATA2:
id	treatment
1	n_cfm
2	j_cfm
3	2acf->k_cfm
...................


RESULT DATA3:
id	regimen	numOfCycles
1	cfm	n_
2	cfm	j_
3	acf->cfm	{2+k_}
.............................
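RESULT DATA2 and RESULT DATA3 are not worked out later in this thread. One possible approach (a swapped-in technique, not from the thread) is base R's rle(), which collapses consecutive identical values into runs of (length, value). This is a minimal sketch under stated assumptions: RESULT DATA1 is typed in by hand with only three cycles, an NA cycle is treated as "no cycle given" and dropped, the placeholder counts (n_, j_, k_) come out as actual digits, and the braces around numOfCycles are left out. summarise_regimens() is a hypothetical helper name.

```r
# Hypothetical RESULT DATA1, truncated to three cycles (assumption: cycle_n unknown)
res1 <- data.frame(id   = c(1, 2, 3),
                   cyc1 = c("cfm", "cfm", "acf"),
                   cyc2 = c("cfm", "cfm", "acf"),
                   cyc3 = c("cfm", "cfm", "cfm"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: turn one patient's cycle-by-cycle regimens into runs.
# NA cycles are assumed to mean "no cycle given" and are dropped first.
summarise_regimens <- function(regimens) {
  r <- rle(regimens[!is.na(regimens)])
  list(treatment   = paste0(r$lengths, r$values, collapse = "->"),  # e.g. "2acf->1cfm"
       regimen     = paste(r$values, collapse = "->"),              # e.g. "acf->cfm"
       numOfCycles = paste(r$lengths, collapse = "+"))              # e.g. "2+1"
}

cyc_cols <- grep("^cyc", names(res1), value = TRUE)
runs <- lapply(seq_len(nrow(res1)), function(i)
  summarise_regimens(unlist(res1[i, cyc_cols])))

result2 <- data.frame(id        = res1$id,
                      treatment = sapply(runs, `[[`, "treatment"),
                      stringsAsFactors = FALSE)
result3 <- data.frame(id          = res1$id,
                      regimen     = sapply(runs, `[[`, "regimen"),
                      numOfCycles = sapply(runs, `[[`, "numOfCycles"),
                      stringsAsFactors = FALSE)
print(result2)
print(result3)
```

For id 3 this gives treatment "2acf->1cfm", regimen "acf->cfm" and numOfCycles "2+1". Note that rle() only merges neighbouring cycles, so a patient who goes acf, cfm, acf would get three runs rather than two.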



Thank you
Denis
#
Denis,

Have a look at paste(), aggregate(), ddply() (from the plyr package) and melt() and cast() (both from the reshape package).
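These hints can also be tried without installing plyr or reshape. Below is a minimal base-R sketch of the same melt / aggregate / cast idea, assuming only three cycle columns (cycle_n is not given) and treating NA as "drug not given in that cycle": the data are stacked into one row per (id, cycle, drug), aggregate() pastes the sorted letters for each id and cycle, and stats::reshape() spreads the result back to one column per cycle.

```r
# SOURCE DATA, truncated to three cycles (assumption: cycle_n is unknown)
src <- data.frame(
  id     = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  cycle1 = c("c", "m", "f", "m", "f", "c", "a", "c", "f", NA),
  cycle2 = c("c", "m", "f", "m", "f", "c", "a", "c", "f", NA),
  cycle3 = c("c", "m", "f", "m", "f", "c", NA,  "c", "f", "m"),
  stringsAsFactors = FALSE
)
cycle_cols <- grep("^cycle", names(src), value = TRUE)

# "melt": one row per id, cycle and drug letter (columns are stacked in order)
long <- data.frame(
  id    = rep(src$id, times = length(cycle_cols)),
  cycle = rep(cycle_cols, each = nrow(src)),
  drug  = unlist(src[cycle_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
long <- long[!is.na(long$drug), ]  # assumption: NA = drug not given in that cycle

# one regimen string per id and cycle; sorting makes c/m/f always read "cfm"
agg <- aggregate(drug ~ id + cycle, data = long,
                 FUN = function(x) paste(sort(x), collapse = ""))

# "cast": back to one row per id and one column per cycle
wide <- reshape(agg, idvar = "id", timevar = "cycle", direction = "wide")
print(wide)
```

The wide columns come out as drug.cycle1, drug.cycle2, ... because reshape() prefixes the value column name; they can be renamed to cyc1, cyc2, ... afterwards.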

Best regards,

Thierry

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
#
Hi Denis, 

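# minimal example:
test <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                   cycle1 = c("c", "m", "f", "m", "f", "c"),
                   stringsAsFactors = FALSE)

# getting the first cell of RESULT DATA1 (id 1, cycle1): sort the letters, then glue them together
paste(sort(test$cycle1[test$id == 1]), collapse = "")
# [1] "cfm"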


Hope this helps with the first task...
Moritz
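The one-liner above handles a single cell. A sketch that applies the same expression to every id and every cycle column: tapply() groups one column by id, and sapply() repeats that over the cycle columns. The cycle2 column is added only for illustration (it copies the SOURCE DATA values for ids 1 and 2), and collapse_sorted() is a hypothetical helper name.

```r
test <- data.frame(id     = c(1, 1, 1, 2, 2, 2),
                   cycle1 = c("c", "m", "f", "m", "f", "c"),
                   cycle2 = c("c", "m", "f", "m", "f", "c"),  # illustrative copy of cycle1
                   stringsAsFactors = FALSE)

# hypothetical helper: sorted, concatenated drug letters for one group
collapse_sorted <- function(x) paste(sort(x), collapse = "")

# tapply() per column, sapply() over columns: a character matrix,
# rows named by id ("1", "2"), columns named by cycle
res <- sapply(test[c("cycle1", "cycle2")],
              function(col) tapply(col, test$id, collapse_sorted))
res
```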

______________________
Moritz Grenke
http://www.360mix.de

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
behalf of Den
Sent: Friday, 21 January 2011 13:26
To: R-help
Subject: [R] complex transformation of data


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Den
#
Dear Henrique
Thank you again for helping me
Unfortunately, your code does not seem to work:
id cycle1 cycle2 cycle3
1  1    cmf    cmf    cmf
2  2    mfc    mfc    mfc
3  3     cf     cf     cf

(the letter 'a' is missing in df[3, c("cycle1", "cycle2")])

You suggested a very interesting approach, however. The '. ~ id' and
'as.character' parts gave me hope of success.
With very best regards
Denis
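Henrique's code is not quoted here, so this is an inference from the output above: the formula method of aggregate() defaults to na.action = na.omit, which drops every row with an NA in any cycle column before grouping. For id 3 that throws away the rows (a, a, NA) and (NA, NA, m), so only c and f are left. The sketch rebuilds the SOURCE DATA with three cycle columns (an assumption, since cycle_n is unknown) and pastes without sorting, which reproduces the cmf / mfc / cf output shown.

```r
# SOURCE DATA, truncated to three cycles (assumption: cycle_n is unknown)
src <- data.frame(
  id     = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  cycle1 = c("c", "m", "f", "m", "f", "c", "a", "c", "f", NA),
  cycle2 = c("c", "m", "f", "m", "f", "c", "a", "c", "f", NA),
  cycle3 = c("c", "m", "f", "m", "f", "c", NA,  "c", "f", "m"),
  stringsAsFactors = FALSE
)

# rows 7 (a, a, NA) and 10 (NA, NA, m) contain an NA, so na.omit() removes them
nrow(na.omit(src))  # 8

# default na.action = na.omit: id 3 is left with only its "c" and "f" rows
res_omit <- aggregate(. ~ id, data = src,
                      FUN = function(x) paste(x, collapse = ""))
res_omit
```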


On Fri, 21/01/2011 at 14:16 -0200, Henrique Dallazuanna wrote:
Den
#
Thank you for your efforts.
Although it is still not working, it feels like we are getting closer and
closer.
  
id cycle1 cycle2 cycle3
1  1    cmf    cmf    cmf
2  2    mfc    mfc    mfc
3  3  acfNA  acfNA  NAcfm

I really appreciate the transformation of sets like ("c","m","f") into "cmf".
That was critical for me.
Hopefully, I'll figure out the rest later with ddply from the plyr package.
At least that is my plan for now.
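The "acfNA" and "NAcfm" strings come from two documented behaviours of base R: paste() turns NA into the literal string "NA", while sort() silently drops NAs because its default is na.last = NA. So once na.pass keeps the rows with missing values, putting sort() inside FUN is what removes the individual NAs. A small sketch on the id 3, cycle1 values:

```r
x <- c("a", "c", "f", NA)   # id 3, cycle1 of the SOURCE DATA

paste(x, collapse = "")                        # "acfNA": NA becomes the string "NA"
sort(x)                                        # "a" "c" "f": default na.last = NA drops the NA
paste(sort(x), collapse = "")                  # "acf"
paste(sort(x, na.last = TRUE), collapse = "")  # "acfNA": keeping the NA brings it back
```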



On Fri, 21/01/2011 at 18:00 -0200, Henrique Dallazuanna wrote:
Den
#
That's great! It's working! Thank you so much!
It is pure magic, and it makes my head spin.
aggregate(. ~ id, lapply(df, as.character),
          FUN = function(x) paste(sort(x), collapse = ""),
          na.action = na.pass)
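For reference, a self-contained run of this call on the SOURCE DATA, truncated to three cycle columns (an assumption, since cycle_n is not given). Here df is built by hand; lapply(df, as.character) turns it into a plain list of character vectors, which guards against the columns having been read in as factors. na.pass keeps the rows that contain NAs, and sort() then drops the individual NAs inside each group.

```r
# SOURCE DATA, truncated to three cycles (assumption: cycle_n is unknown)
df <- data.frame(
  id     = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  cycle1 = c("c", "m", "f", "m", "f", "c", "a", "c", "f", NA),
  cycle2 = c("c", "m", "f", "m", "f", "c", "a", "c", "f", NA),
  cycle3 = c("c", "m", "f", "m", "f", "c", NA,  "c", "f", "m"),
  stringsAsFactors = FALSE
)

# the working call from the thread: RESULT DATA1 for the first three cycles
res1 <- aggregate(. ~ id, lapply(df, as.character),
                  FUN = function(x) paste(sort(x), collapse = ""),
                  na.action = na.pass)
res1
```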

1. The help says:
 Note that 'paste()' coerces 'NA_character_', the character missing
value, to "NA".
And at the same time:
 'na.pass' returns the object unchanged.
I am happy that there are no NAs in my result; I just don't understand
how that happened.
2. I can't see the real difference between 'FUN = function(x) paste(x)'
and 'FUN = paste'. However, the former works perfectly while the latter
simply does not.
3. Finally, all the help says about the LHS in formulas like '.~id' is that
its name is "dot notation", and not a single word more. Thus, I have no
clue what the dot in that formula really means.
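On question 3: in the formula method of aggregate(), a "." on the left-hand side stands for every column of the data that does not appear on the right-hand side, so '. ~ id' is shorthand for cbind(cycle1, cycle2, ...) ~ id. The ?aggregate examples use it as aggregate(. ~ Species, data = iris, mean). A small numeric sketch (d is a made-up data frame) showing that the two spellings give the same result:

```r
# made-up example data: one grouping column and two value columns
d <- data.frame(id = c(1, 1, 2),
                x  = c(1, 2, 5),
                y  = c(10, 20, 40))

# "." on the LHS expands to all non-grouping columns, here cbind(x, y)
by_dot   <- aggregate(. ~ id, data = d, FUN = sum)
by_cbind <- aggregate(cbind(x, y) ~ id, data = d, FUN = sum)

by_dot
all.equal(by_dot, by_cbind)   # TRUE
```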


Conclusion:
1. It's magic.
2. You definitely saved my investigation. (When I started, I had no
idea it would be so difficult to arrange those chemotherapy cycles in a
data frame, even though I dare to call myself a pharmacoepidemiologist, which
sounds rather funny after this story.)
3. THANK YOU!!!!!!

Sincerely yours 
Denis Kazakiewicz
Belarus 


On Fri, 21/01/2011 at 18:37 -0200, Henrique Dallazuanna wrote: