I have genetic data as follows (simple example, actual data is much larger):
comb =
ID1 A A T G C T G C G T C G T A
ID2 G C T G C C T G C T G T T T
And I wish to get an output like this:
ID1 AA TG CT GC GT CG TA
ID2 GC TG CC TG CT GT TT
That is, paste every two columns together.
I have this code, but I get the error:
Error in seq.default(2, nchar(x), 2) : 'to' must be of length 1
conc <- function(x) {
s <- seq(2, nchar(x), 2)
paste0(x[s], x[s+1])
}
combn <- as.data.frame(lapply(comb, conc), stringsAsFactors=FALSE)
Thanks in advance!
Paste every two columns together
8 messages · Kate Ignatius, Jim Lemon, JSHuang +4 more
Hi Kate,
Maybe you want:
seq(2,length(x),by=2)
Jim
On Thu, Jan 29, 2015 at 10:55 AM, Kate Ignatius <kate.ignatius at gmail.com> wrote:
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
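For what it's worth, the error comes from nchar(x): it returns one count per element, so seq() receives a vector where it expects a single 'to' value, while length(x) is always a single number. Below is a minimal sketch of the corrected function, assuming comb is a data frame of single-character columns with the ID in the first column. The read.table() call is only a reconstruction of the example data, and result is a made-up name.

```r
# Hypothetical reconstruction of the example data; colClasses keeps a
# column of "T" values from being read as the logical value TRUE.
comb <- read.table(text = "ID1 A A T G C T G C G T C G T A
ID2 G C T G C C T G C T G T T T", colClasses = "character")

x <- unlist(comb[1, ])   # one row as a character vector
nchar(x)                 # one count per element, not usable as 'to'
length(x)                # a single number, here 15

conc <- function(x) {
  s <- seq(2, length(x), by = 2)      # positions 2, 4, ..., 14
  c(x[[1]], paste0(x[s], x[s + 1]))   # keep the ID, then the pasted pairs
}

# apply() with MARGIN = 1 visits rows; each row's result becomes a column,
# so t() turns the matrix back into one row per ID.
result <- as.data.frame(t(apply(comb, 1, conc)), stringsAsFactors = FALSE)
result
```

Note that lapply(comb, conc) walks the columns of a data frame, not its rows, which is why apply(comb, 1, conc) is used here instead.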
Hi,
Here is my implementation:
> combine <- function(x){
+ odd <- x[1:length(x) %% 2 == 1]
+ even <- x[1:length(x) %% 2 == 0]
+ paste0(odd,even)}
> temp <- letters[1:24]
> temp
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x"
> combine(temp)
[1] "ab" "cd" "ef" "gh" "ij" "kl" "mn" "op" "qr" "st" "uv" "wx"
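The odd/even split above can also be written with recycled logical indices. This is only a sketch of the same idea, not part of the original post: combine2 is a made-up name, and the length check is an added assumption that the input always holds complete pairs.

```r
combine2 <- function(x) {
  # With an odd length, paste0() would recycle the shorter half and
  # silently mis-pair the elements, so refuse such input up front.
  stopifnot(length(x) %% 2 == 0)
  # c(TRUE, FALSE) is recycled along x and picks positions 1, 3, 5, ...;
  # c(FALSE, TRUE) picks positions 2, 4, 6, ...
  paste0(x[c(TRUE, FALSE)], x[c(FALSE, TRUE)])
}

combine2(letters[1:6])
```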
I am using just the first row of your data (i.e. ID1).
> ID1 <- c("A", "A", "T", "G", "C", "T", "G", "C", "G", "T", "C", "G",
+ "T", "A")
> do.call(c,lapply(tapply(ID1, gl(7,2), c), paste, collapse=""))
   1    2    3    4    5    6    7
"AA" "TG" "CT" "GC" "GT" "CG" "TA"
Is this what you are looking for? I hope this helps.
Chel Hee Lee
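The gl(7,2) call above hard-codes seven pairs. A small sketch of the same tapply() idea with the group count derived from the data, so it should carry over to longer rows; grp and paired are made-up names, and an even number of letters is assumed.

```r
ID1 <- c("A", "A", "T", "G", "C", "T", "G", "C", "G", "T", "C", "G",
         "T", "A")

# gl(n, 2) builds the factor 1,1,2,2,...,n,n: one level per pair
grp <- gl(length(ID1) %/% 2, 2)

# paste() can be applied to each group directly, so the lapply() step
# and do.call() are not needed
paired <- tapply(ID1, grp, paste, collapse = "")

as.vector(paired)   # drop the group labels
```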
eek! Chel Hee, anything that complicated should engender fear and trembling.
Much simpler and more efficient (if I understand correctly):
i <- seq.int(1L,length(ID1),by = 2L)
paste0(ID1[i],ID1[i+1])
That gives a vector of paired letters. If you want a single character string, just collapse with a " " (space):
paste0(ID1[i],ID1[i+1],collapse= " ")
Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374
"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
Clifford Stoll
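A sketch of the same indexing wrapped in a function, in case it has to be reused across many rows. pair_up is a made-up name, and the length check is an added assumption: with an odd number of letters, ID1[i+1] would run past the end and paste in an "NA".

```r
pair_up <- function(x) {
  # Guard against an unpaired trailing element
  stopifnot(length(x) >= 2, length(x) %% 2 == 0)
  i <- seq.int(1L, length(x), by = 2L)   # 1, 3, 5, ...
  paste0(x[i], x[i + 1L])
}

ID1 <- c("A", "A", "T", "G", "C", "T", "G", "C", "G", "T", "C", "G",
         "T", "A")

pair_up(ID1)                          # vector of paired letters
paste(pair_up(ID1), collapse = " ")   # single space-separated string
```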
Hi Bert!
Yes, you are VERY correct!!! Why am I making this simple thing so complicated??? ;)
Thank you so much for your nice lesson!
Chel Hee Lee
Kate, here's a solution that uses regular expressions, rather than vector manipulation:
> mystr = "ID1 A A T G C T G C G T C G T A"
> gsub(" ([ACGT]) ([ACGT])", " \\1\\2", mystr)
[1] "ID1 AA TG CT GC GT CG TA"
-John
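Since gsub() is vectorised over its input, the same pattern should work on every row at once. A sketch, assuming each row is available as one space-separated string and every allele is a single A, C, G or T; a longer token such as a missing-value code would not match and would shift the pairing. The names rows, pasted and fields are made up for illustration.

```r
rows <- c("ID1 A A T G C T G C G T C G T A",
          "ID2 G C T G C C T G C T G T T T")

# Each match consumes " X Y" and writes back " XY", so the pairs
# do not overlap
pasted <- gsub(" ([ACGT]) ([ACGT])", " \\1\\2", rows)

# Split back into fields: one row per ID, ID in the first column
fields <- do.call(rbind, strsplit(pasted, " ", fixed = TRUE))
fields
```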
Hi:
Don't know about performance, but this is fairly simple for operating
on atomic vectors:
x <- c("A", "A", "G", "T", "C", "G")
apply(embed(x, 2), 1, paste0, collapse = "")
[1] "AA" "GA" "TG" "CT" "GC"
Check the help page of embed() for details.
Dennis
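One caveat with the embed() approach: it builds overlapping windows, and each row holds the later element first, which is why the output above reads "GA" rather than "AG" and has five entries instead of three. A sketch of one way to recover the non-overlapping pairs asked for in the original question; m and keep are made-up names.

```r
x <- c("A", "A", "G", "T", "C", "G")

m <- embed(x, 2)                  # row k is c(x[k + 1], x[k])
keep <- seq(1, nrow(m), by = 2)   # windows starting at x[1], x[3], x[5]

# Swap the columns back into reading order before pasting
paste0(m[keep, 2], m[keep, 1])
```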