Dear [R] people,
Could you please help with the following data transformation?
Any suggestions, hints, references, or even guesses on how to perform any
of the following steps would be highly appreciated. These transformations are
crucial for my work.
(n_, _n, j_, k_ signify numbers)
SOURCE DATA:
id  cycle1  cycle2  cycle3  ...  cycle_n
1   c       c       c            c
1   m       m       m            m
1   f       f       f            f
2   m       m       m            NA
2   f       f       f            NA
2   c       c       c            NA
3   a       a       NA           NA
3   c       c       c            NA
3   f       f       f            NA
3   NA      NA      m            NA
...........................................
RESULT DATA1:
id  cyc1  cyc2  cyc3  ...  cyc_n
1   cfm   cfm   cfm        cfm
2   cfm   cfm   cfm        NA
3   acf   acf   cfm        NA
...........................................
RESULT DATA2:
id treatment
1 n_cfm
2 j_cfm
3 2acf->k_cfm
...................
RESULT DATA3:
id  regimen   numOfCycles
1   cfm       n_
2   cfm       j_
3   acf->cfm  {2+k_}
.............................
Thank you
Denis
complex transformation of data
Denis,
Have a look at paste(), aggregate(), ddply() (from the plyr package) and melt() and cast() (both from the reshape package).
Best regards,
Thierry
----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium
Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey
-----Original message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On behalf of Den
Sent: Friday, 21 January 2011 13:26
To: R-help
Subject: [R] complex transformation of data
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Hi Denis,
# Minimal example:
test <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                   cycle1 = c("c", "m", "f", "m", "f", "c"))
# Getting the first cell of Result 1:
paste(sort(test$cycle1[test$id == 1]), collapse = "")
I hope this helps with the first task ...
Moritz
______________________
Moritz Grenke
http://www.360mix.de
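Moritz's one-cell expression can be extended to every id and every cycle column. The following is only a sketch under assumptions: the helper name collapse_sorted and the extra cycle2 column are made up for illustration and do not appear in the thread.

```r
# Moritz's example data, with a second cycle column added for illustration only.
test <- data.frame(
  id     = c(1, 1, 1, 2, 2, 2),
  cycle1 = c("c", "m", "f", "m", "f", "c"),
  cycle2 = c("c", "m", "f", "m", "f", "c"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sort the drug letters and glue them into one string.
collapse_sorted <- function(x) paste(sort(x), collapse = "")

# One column: apply the helper separately to each id.
one_column <- tapply(test$cycle1, test$id, collapse_sorted)

# All cycle columns: repeat the same per-id step for every column except id.
all_columns <- sapply(test[-1], function(column) {
  tapply(column, test$id, collapse_sorted)
})

print(one_column)
print(all_columns)
```

The result of sapply() here is a character matrix with one row per id and one column per cycle, which has the shape of RESULT DATA1.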
Dear Henrique,
Thank you again for helping me.
Unfortunately, your code does not seem to work:
aggregate(.~ id, lapply(df, as.character), FUN = paste, collapse = "")
id cycle1 cycle2 cycle3
1 1 cmf cmf cmf
2 2 mfc mfc mfc
3 3 cf cf cf
(The letter 'a' is missing in df[3, c("cycle1", "cycle2")].)
You suggested a very interesting approach, however. Those '.~ id' and
'as.character' gave me hope for success.
With very best regards
Denis
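A likely explanation for the missing 'a' is that the formula method of aggregate() defaults to na.action = na.omit, which drops every row containing an NA in any column before FUN is called. The sketch below rebuilds the first three cycles of the source data (the name df and the use of character columns are assumptions) and compares the default with na.action = na.pass.

```r
# First three cycles of the source data from the original question.
df <- data.frame(
  id     = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  cycle1 = c("c", "m", "f", "m", "f", "c", "a", "c", "f", NA),
  cycle2 = c("c", "m", "f", "m", "f", "c", "a", "c", "f", NA),
  cycle3 = c("c", "m", "f", "m", "f", "c", NA, "c", "f", "m"),
  stringsAsFactors = FALSE
)

# Default na.action = na.omit: the rows "3 a a NA" and "3 NA NA m" are removed
# as whole rows, so only "c" and "f" remain for id 3 in every column.
dropped <- aggregate(. ~ id, data = df, FUN = paste, collapse = "")

# na.pass keeps all rows; paste() then writes each NA as the string "NA".
kept <- aggregate(. ~ id, data = df, FUN = paste, collapse = "",
                  na.action = na.pass)

print(dropped)
print(kept)
```

With na.omit the 'a' is lost because its whole row is discarded, not because paste() ignores it.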
On Fri, 21/01/2011 at 14:16 -0200, Henrique Dallazuanna wrote:
Try this:
aggregate(. ~ id, lapply(test, as.character), FUN = paste, collapse = "")
--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Thank you for your efforts.
Although it is still not working, it feels like we are getting closer and
closer:
id cycle1 cycle2 cycle3
1 1 cmf cmf cmf
2 2 mfc mfc mfc
3 3 acfNA acfNA NAcfm
I really appreciate the transformation from subsets ("c","m","f") to "cmf".
That was critical for me.
Hopefully, I'll figure out the rest later with ddply from the plyr package.
At least this is my idea for now.
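For the remaining steps (RESULT DATA2 and RESULT DATA3), one possible base R route is run-length encoding with rle() instead of ddply. This is a sketch under assumptions, not something proposed in the thread: the table result1 is typed in by hand with a made-up fourth cycle, and summarise_cycles is a hypothetical helper. NA cycles are simply dropped, so an NA in the middle of a treatment would merge the runs on either side of it.

```r
# Hand-typed stand-in for RESULT DATA1 (cyc4 is an illustrative assumption).
result1 <- data.frame(
  id   = c(1, 2, 3),
  cyc1 = c("cfm", "cfm", "acf"),
  cyc2 = c("cfm", "cfm", "acf"),
  cyc3 = c("cfm", "cfm", "cfm"),
  cyc4 = c("cfm", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: summarise one patient's sequence of regimens.
summarise_cycles <- function(regimens) {
  regimens <- regimens[!is.na(regimens)]  # drop cycles that were not given
  runs <- rle(regimens)                   # runs of consecutive identical regimens
  data.frame(
    treatment   = paste(paste0(runs$lengths, runs$values), collapse = "->"),
    regimen     = paste(runs$values, collapse = "->"),
    numOfCycles = paste(runs$lengths, collapse = "+"),
    stringsAsFactors = FALSE
  )
}

cycle_cols <- setdiff(names(result1), "id")
per_id <- lapply(seq_len(nrow(result1)), function(i) {
  summarise_cycles(unlist(result1[i, cycle_cols], use.names = FALSE))
})
summary_table <- cbind(id = result1$id, do.call(rbind, per_id))

print(summary_table)
```

The treatment column corresponds to RESULT DATA2, and the regimen and numOfCycles columns correspond to RESULT DATA3.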
On Fri, 21/01/2011 at 18:00 -0200, Henrique Dallazuanna wrote:
correction:
aggregate(. ~ id, lapply(df, as.character), FUN = paste, collapse = "", na.action = na.pass)
On Fri, Jan 21, 2011 at 5:56 PM, Henrique Dallazuanna <wwwhsd at gmail.com> wrote:
Try this:
aggregate(. ~ id, lapply(replace(df, is.na(df), ''), as.character), FUN = paste, collapse = "", na.action = na.pass)
That's great! It's working! Thank you so much! It is pure magic, and it makes my head spin.
aggregate(.~ id, lapply(df, as.character), FUN = function(x)paste(sort(x), collapse = ''), na.action = na.pass)
1. The help says: "Note that 'paste()' coerces 'NA_character_', the character missing value, to '"NA"'." And at the same time: "'na.pass' returns the object unchanged." I am happy that I don't have NAs in my data. I just don't understand how it happened.
2. I can't see the real difference between 'FUN = function(x) paste(x)' and 'FUN = paste'. However, the former works perfectly while the latter simply does not.
3. Finally, all the help says about the LHS in formulas like '.~id' is that its name is "dot notation", and not a single word more. Thus, I have no clue what the dot in that formula really means.
Conclusion:
1. It's magic.
2. You definitely saved my investigation. (When I started, I had no idea it would be so difficult to arrange those chemotherapy cycles in a data frame, although I dare to call myself a pharmacoepidemiologist, which sounds rather funny after this story.)
3. THANK YOU!!!!!!
Sincerely yours,
Denis Kazakiewicz
Belarus
On Fri, 21/01/2011 at 18:37 -0200, Henrique Dallazuanna wrote:
Just change the FUN function:
aggregate(. ~ id, lapply(df, as.character), FUN = function(x) paste(sort(x), collapse = ''), na.action = na.pass)
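Denis's three questions about this call can be checked with a small reproducible example. What follows is a sketch under assumptions: df is rebuilt from the source table with a made-up fourth cycle, and collapse_sorted is a hypothetical name for Henrique's anonymous function. In short: (1) na.pass keeps the rows that contain NA, and sort() then drops the NA values by default, so paste() never sees them; (2) the working version differs from FUN = paste by the sort() call, not by the function wrapper; (3) the dot stands for all columns of the data that are not otherwise named in the formula.

```r
# Source data from the question; cycle4 is an illustrative assumption.
df <- data.frame(
  id     = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  cycle1 = c("c", "m", "f", "m", "f", "c", "a", "c", "f", NA),
  cycle2 = c("c", "m", "f", "m", "f", "c", "a", "c", "f", NA),
  cycle3 = c("c", "m", "f", "m", "f", "c", NA, "c", "f", "m"),
  cycle4 = c("c", "m", "f", NA, NA, NA, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# (1) and (2): sort() removes NA by default (na.last = NA) and orders the letters.
collapse_sorted <- function(x) paste(sort(x), collapse = "")

# (3): the dot expands to every column except id.
with_dot <- aggregate(. ~ id, data = df, FUN = collapse_sorted,
                      na.action = na.pass)

# The same request with the left-hand side written out explicitly.
explicit <- aggregate(cbind(cycle1, cycle2, cycle3, cycle4) ~ id, data = df,
                      FUN = collapse_sorted, na.action = na.pass)

# A group that holds only NA collapses to "", so turn those cells back into NA.
with_dot[with_dot == ""] <- NA

print(with_dot)
```

The last step is optional; it is only needed if cycles that were never given should appear as NA, as in RESULT DATA1.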