
how to GREP out a string like this......THANKS.

4 messages · Hon Kit (Stephen) Wong, David Winsemius, William Dunlap +1 more

#
Dear All,

I hope you can help me with this simple problem. I have many thousands of lines like this:
NM_019397 // Egfl6 // EGF-like-domain, multiple 6 // X F5|X 71.5 cM // 54156

I want to extract the string between the first and second //, which in this case is Egfl6.

How do I apply the grep function?

Thanks.

Stephen HK Wong

Research Associate,Cleary Lab
Lab Phone: 650-723-5340
MC 5457 
Lokey Stem Cell Research Building 
265 Campus Drive, Rm. G2035 
Stanford, California 94305-5324
#
On May 20, 2013, at 4:45 PM, Hon Kit (Stephen) Wong wrote:

            
[1] "Egfl6"

You can use:

lapply( lines, function(l) strsplit(l, " // ")[[1]][2] )

Well, grep only tells you which elements match; you want a replacement or extraction function. sub or gsub would be possibilities, but regex quantifiers are greedy by default, so it's a bit more difficult to constrain the match to only the first and second "//".
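
For comparison, here is a minimal sketch of the sub() route mentioned above. It assumes perl = TRUE so that lazy quantifiers (.*?) are available, and it assumes every line contains at least two " // " delimiters; the variable name second_field is only illustrative. This is one possible pattern, not necessarily the one the poster had in mind.

```r
# Example lines copied from this thread
lines <- c(
  "NM_019397 // Egfl6 // EGF-like-domain, multiple 6 // X F5|X 71.5 cM // 54156",
  "NM_019397 // Egfl7 domain // EGF-like-domain, multiple 6 // X F5|X 71.5 cM // 54158"
)

# The lazy quantifiers (.*?) stop at the earliest possible " // ",
# so the capture group holds only the text between the first and
# second delimiter. The whole line is replaced by that group.
second_field <- sub("^.*? // (.*?) // .*$", "\\1", lines, perl = TRUE)
print(second_field)
```

Note that sub() returns a line unchanged when the pattern does not match, so a line with fewer than two delimiters would come back whole rather than as NA.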
#
You suggested
  > lapply( lines, function(l) strsplit(l, " // ")[[1]][2] )

strsplit is vectorized so the following is equivalent but simpler and quicker:
   lapply(strsplit(lines, " // "), function(x)x[2])

The OP probably wants a character vector, not a list, so use sapply or vapply (vapply is
safer than sapply and a bit quicker).  Any of the following would do:
  vapply(strsplit(lines, " // "), `[`, 2, FUN.VALUE="")
  vapply(strsplit(lines, " // "), function(x)x[2], FUN.VALUE="")
  sapply(strsplit(lines, " // "), `[`, 2) # returns list(), not character(0), if length(lines)==0
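
The difference on empty input can be checked with a short sketch. The helper name extract_second is hypothetical, and fixed = TRUE is an added assumption (the delimiter contains no regex metacharacters, so the result is the same as without it).

```r
# Hypothetical helper wrapping the vapply approach above
extract_second <- function(lines) {
  parts <- strsplit(lines, " // ", fixed = TRUE)
  # x[2] is NA_character_ when a line has fewer than two fields
  vapply(parts, function(x) x[2], FUN.VALUE = "")
}

# Example lines copied from this thread
lines <- c(
  "NM_019397 // Egfl6 // EGF-like-domain, multiple 6 // X F5|X 71.5 cM // 54156",
  "NM_019397 // Egfl7 // EGF-like-domain, multiple 6 // X F5|X 71.5 cM // 54158"
)

res_normal <- extract_second(lines)          # character vector of second fields
res_empty  <- extract_second(character(0))   # still a character vector, length 0
res_sapply <- sapply(strsplit(character(0), " // "), `[`, 2)  # a list, not character

print(res_normal)
print(class(res_empty))
print(class(res_sapply))
```

With vapply the return type is fixed by FUN.VALUE, so downstream code that expects a character vector keeps working even when no lines are read.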


Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
#
Hi,
Maybe this helps.

lines<- readLines(textConnection("NM_019397 // Egfl6 // EGF-like-domain, multiple 6 // X F5|X 71.5 cM // 54156
NM_019397 // Egfl7 // EGF-like-domain, multiple 6 // X F5|X 71.5 cM // 54158"))
library(stringr)
word(lines,2,sep=" // ")
#[1] "Egfl6" "Egfl7"

lines1<- readLines(textConnection("NM_019397 // Egfl6 // EGF-like-domain, multiple 6 // X F5|X 71.5 cM // 54156
NM_019397 // Egfl7 domain // EGF-like-domain, multiple 6 // X F5|X 71.5 cM // 54158"))
word(lines1,2,sep=" // ")
#[1] "Egfl6"        "Egfl7 domain"
A.K.



----- Original Message -----
From: Hon Kit (Stephen) Wong <honkit at stanford.edu>
To: r-help at r-project.org
Cc: 
Sent: Monday, May 20, 2013 7:45 PM
Subject: [R] how to GREP out a string like this......THANKS.


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.