
Regular expressions, genbank

You could also try:
library(gsubfn)


strapply(gsub("\\d+<|>\\d+","",vec1),"([0-9]+)",as.numeric,simplify=c)

A.K.
On Thursday, February 6, 2014 1:55 PM, arun <smartpink111 at yahoo.com> wrote:
Hi,
One way would be: 


vec1 <- c("CDS             3300..4037",
          "CDS             complement(3300..4037)",
          "CDS             3300<..4037",
          "CDS             join(21467..26641,27577..28890)",
          "CDS             complement(join(30708..31700,31931..31984))",
          "CDS             3300<..>4037")
library(stringr)
as.numeric(unlist(strsplit(str_trim(gsub("\\D+"," ",gsub("\\d+<|>\\d+","",vec1)))," ")))
# [1]  3300  4037  3300  4037  4037 21467 26641 27577 28890 30708 31700 31931
#[13] 31984
A.K.
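
The one-liner above flattens everything into a single vector, so it is no longer clear which numbers came from which line. A minimal base-R sketch of the same filtering idea that keeps one numeric vector per input line is shown here; the names `cleaned` and `coords` are made up for illustration, and the input is a shortened copy of vec1.

```r
# Shortened copy of the test vector from the reply above.
vec1 <- c("CDS             3300..4037",
          "CDS             3300<..4037",
          "CDS             join(21467..26641,27577..28890)",
          "CDS             3300<..>4037")

# Step 1: drop numbers directly followed by "<" or directly preceded by ">",
# using the same pattern as the reply.
cleaned <- gsub("\\d+<|>\\d+", "", vec1)

# Step 2: extract the remaining digit runs line by line.
# regmatches() returns a list with one character vector per input element.
coords <- lapply(regmatches(cleaned, gregexpr("[0-9]+", cleaned)), as.numeric)

coords
```

A line whose numbers are all filtered out (the last one here) comes back as an empty numeric vector rather than disappearing, which makes it easy to spot afterwards with `lengths(coords) == 0`.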


Hi, 

I have been using R for the past 1.5 years and have usually
found topics relatively easy to learn on my own, but I am
finding the learning curve with regular expressions a little
steep, especially since I haven't found any good tutorials. While I
intend to spend more time systematically learning the proper way to write
regular expressions, I have a project that is coming due and can't wait
for that, so I was hoping to get some direct help.
I need to extract all the numbers in lines with the following formats:

"CDS             3300..4037"
or
"CDS             complement(3300..4037)"
or
"CDS             join(21467..26641,27577..28890)"
or
"CDS             complement(join(30708..31700,31931..31984))"

but not if any of the numbers are preceded by "<" or followed by ">" 
Many thanks in advance!
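
Read literally, the condition in the question ("preceded by "<" or followed by ">"") puts the markers on the opposite side of the number from the test strings used in the reply ("3300<", ">4037"). A minimal sketch for the literal reading follows. The input strings and the names `x`, `kept` and `nums` are hypothetical; check where the markers actually sit in your own file before choosing a pattern.

```r
# Hypothetical inputs with the markers placed as the question words them.
x <- c("CDS             <3300..4037",
       "CDS             3300..4037>",
       "CDS             join(21467..26641,27577..28890)")

# Drop numbers directly preceded by "<" or directly followed by ">".
kept <- gsub("<\\d+|\\d+>", "", x)

# Extract whatever digit runs remain and flatten them into one vector.
nums <- as.numeric(unlist(regmatches(kept, gregexpr("[0-9]+", kept))))
nums
```

The only change from the reply's pattern is the position of "<" and ">" relative to `\\d+`; everything else works the same way.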