Dear all,
I want to use strsplit to parse a column of IDs, but I am getting
a warning that I don't understand.
The problem appears when I try to rbind the output from strsplit.
Please let me know if I'm missing something obvious,
thanks,
alison
here are my commands:
> strsplit <- strsplit(as.character(Rumino_Reps_agreeWalign$geneid), "\\.")
> Rumino_Reps_agreeWalignTR <- transform(Rumino_Reps_agreeWalign,
+     taxid = do.call(rbind, strsplit))
Warning message:
In function (..., deparse.level = 1) :
number of columns of result is not a multiple of vector length (arg 1)
here is my data:
> head(Rumino_Reps_agreeWalign)
                      geneid count_Conser count_NonCons count_ConsSubst count_NCSubst
1 657313.locus_tag:RTO_08940            7             5               5             1
2           457412.251848018            1             4               3             0
3 657314.locus_tag:CK5_20630            2             4               1             0
4 657323.locus_tag:CK1_33060            1             0               1             0
5 657313.locus_tag:RTO_09690            3             0               3             1
6           471875.197297106            0             2               1             1
here are the results from strsplit:
> head(strsplit)
[[1]]
[1] "657313" "locus_tag:RTO_08940"
[[2]]
[1] "457412" "251848018"
[[3]]
[1] "657314" "locus_tag:CK5_20630"
[[4]]
[1] "657323" "locus_tag:CK1_33060"
[[5]]
[1] "657313" "locus_tag:RTO_09690"
[[6]]
[1] "471875" "197297106"
strsplit help
6 messages · alison waller, David Winsemius, Jean V Adams
On Apr 11, 2012, at 2:01 PM, Jean V Adams wrote:
Alison,
Your code works fine on the first six lines of the data that you
provided.
Rumino_Reps_agreeWalign <- data.frame(
    geneid = c("657313.locus_tag:RTO_08940",
               "457412.251848018",
               "657314.locus_tag:CK5_20630",
               "657323.locus_tag:CK1_33060",
               "657313.locus_tag:RTO_09690",
               "471875.197297106"),
    count_Conser = c(7, 1, 2, 1, 3, 0),
    count_NonCons = c(5, 4, 4, 0, 0, 2),
    count_ConsSubst = c(5, 3, 1, 1, 3, 1),
    count_NCSubst = c(1, 0, 0, 0, 1, 1))
gene.list <- strsplit(as.character(Rumino_Reps_agreeWalign$geneid), "\\.")
Rumino_Reps_agreeWalignTR <- transform(Rumino_Reps_agreeWalign,
    taxid = do.call(rbind, gene.list))
Perhaps in later rows of the data there are cases where there is no "."
in geneid? If not, can you provide a subset of your data that results in
the warning? Use the dput() function.
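One way to look for the rows Jean is asking about is to count the pieces that strsplit() returns for each ID. The sketch below uses made-up IDs (the third and fourth are hypothetical, not from Alison's data). Note that rbind() recycles shorter vectors and warns only when the recycling is fractional, so an ID with no "." (one piece) would be recycled silently, while an ID with an extra "." (three pieces) next to two-piece IDs would produce the warning shown above.

```r
# Hypothetical IDs for illustration: the third has two dots, the fourth none.
ids <- c("657313.locus_tag:RTO_08940",
         "457412.251848018",
         "657313.locus_tag:RTO_1.2",
         "nodot")
pieces <- strsplit(ids, "\\.")

# Number of pieces per ID; anything other than 2 is suspect.
n <- sapply(pieces, length)
print(table(n))

# The offending IDs, printed in a form that can be pasted into an email.
bad <- ids[n != 2]
dput(bad)

# Capture the rbind() warning text instead of letting it scroll by.
# msg stays NA if no warning is raised.
msg <- tryCatch({
    do.call(rbind, pieces)
    NA_character_
}, warning = function(w) conditionMessage(w))
print(msg)
```

On the real data, the same check would be run on as.character(Rumino_Reps_agreeWalign$geneid) in place of the ids vector.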
It's not a good idea to create an object named "strsplit". That will
only mask the function strsplit() in later runs.
Masking the function is not a problem unless the name is reassigned to a language object such as a function (which wasn't the case here). The potential confusion is in the minds of users. When a name is used in a call, R looks for a function with that name and skips non-function objects, so you can have a data object named 'strsplit' and it will not mask the function 'strsplit'.
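David's point can be checked directly. The following is a minimal sketch with two of the IDs from the thread; the variable names parts and found are introduced here only for illustration.

```r
# A character vector that shares its name with the base function.
strsplit <- c("657313.locus_tag:RTO_08940", "457412.251848018")

# In call position R looks for a function named strsplit and skips the
# character vector, so this still calls base::strsplit on the data.
parts <- strsplit(strsplit, "\\.")
print(parts[[2]])

# get() with mode = "function" performs the same kind of lookup explicitly.
found <- get("strsplit", mode = "function")
print(is.function(found))

# Remove the data object so later code is not confusing to read.
rm(strsplit)
```

The name would only be masked if a function were assigned to it, for example strsplit <- function(...) NULL in the workspace.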
David.

If time is an issue, a slightly faster way to do this, after the
strsplit() function, is:
Rumino_Reps_agreeWalign$geneid.prefix <- sapply(gene.list, "[", 1)
Rumino_Reps_agreeWalign$geneid.suffix <- sapply(gene.list, "[", 2)
Jean

David Winsemius, MD
West Hartford, CT
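The sapply() extraction suggested by Jean also copes with IDs that do not split into exactly two pieces, which do.call(rbind, ...) does not. The sketch below uses hypothetical IDs (the last two are made up) to show what each approach returns; the sub()-based variant, which splits at the first "." only, is an alternative added here and was not proposed in the thread.

```r
# Hypothetical IDs: the third has an extra dot, the fourth has none.
ids <- c("657313.locus_tag:RTO_08940",
         "457412.251848018",
         "657313.locus_tag:RTO_1.2",
         "nodot")
gene.list <- strsplit(ids, "\\.")

# Jean's approach: take the first and second piece of every element.
# A missing second piece becomes NA; third and later pieces are dropped.
prefix <- sapply(gene.list, "[", 1)
suffix <- sapply(gene.list, "[", 2)

# Alternative: split at the first dot only, so later dots stay in the suffix.
prefix2 <- sub("\\..*$", "", ids)
suffix2 <- ifelse(grepl(".", ids, fixed = TRUE),
                  sub("^[^.]*\\.", "", ids),
                  NA_character_)

print(data.frame(ids, prefix, suffix, suffix2))
```

Which variant is appropriate depends on whether a "." after the first one is part of the gene identifier, which only the data owner can say.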