
break string at specified positions

11 messages · Bert Gunter, Daniel Nordlund, Jim Lemon +4 more

#
Dear R-help

I would like to split a long string at specified, precomputed positions.
'substring' needs both beginnings and ends. Is there a native function that
accepts only the positions, so I don't have to compute the other argument
myself?

For example, I have a vector of positions pos<-c(5,10,19). substring
needs the input first=c(1,6,11) and last=c(5,10,19). It is no problem
to write my own function; I am just asking.

Derek
#
Dunno -- but you might have a look at Hadley Wickham's 'stringr' package:
https://cran.r-project.org/web/packages/stringr/stringr.pdf

Cheers,

Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Wed, May 11, 2016 at 1:12 PM, Jan Kacaba <jan.kacaba at gmail.com> wrote:
#
Here is my attempt at a function which computes margins from positions.

require("stringr")
require("dplyr")

ends<-seq(10,100,8)  # end margins
test_string<-paste("Lorem ipsum dolor sit amet, consectetuer adipiscing",
                   "elit. Aliquam in lorem sit amet leo accumsan lacinia.")

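# Build a 2-row matrix of start ("begs") and end ("ends") margins.
sekoj=function(ends){
  l_ends<-length(ends)
  begs=vector(mode="integer",l_ends)
  begs[1]=1
  for (i in seq_len(l_ends)[-1]){  # empty loop when only one end is given
    begs[i]<-ends[i-1]+1
  }
  margs<-rbind(begs,ends)
  # extra column: from just past the last end to the end of the string (-1)
  margs<-cbind(margs,c(ends[l_ends]+1,-1))
  #rownames(margs)<-c("beg","end")
  return(margs)
}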
margins<-sekoj(ends)
str_sub(test_string,margins[1,],margins[2,]) %>% print

Code to run in browser:
http://www.r-fiddle.org/#/fiddle?id=rVmNVxDV

2016-05-11 23:12 GMT+02:00 Bert Gunter <bgunter.4567 at gmail.com>:
#
On 5/11/2016 2:23 PM, Jan Kacaba wrote:
I think you can simplify this. Just create a function (I'll call it begs)
to compute the beginning positions.

     begs <- function(x) c(0,x[-length(x)])+1

Then use that function in your call to str_sub

     str_sub(test_string,begs(ends),ends) %>% print
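
As a quick check of the helper above, here is a minimal sketch using the positions from the original question. The test string `s` is an assumption made for illustration, and base `substring()` stands in for `str_sub()` so that no package is needed.

```r
# Helper from the message above: each start is the previous end plus one.
begs <- function(x) c(0, x[-length(x)]) + 1

ends <- c(5, 10, 19)                # positions from the original question
s <- paste(letters, collapse = "")  # assumed test string "abc...z"

begs(ends)
## [1]  1  6 11

substring(s, begs(ends), ends)
## [1] "abcde"     "fghij"     "klmnopqrs"
```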


Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA
#
Hi Jan,
This might be helpful:

chop_string<-function(x,ends) {
 starts<-c(1,ends[-length(ends)]-1)
 return(substring(x,starts,ends))
}

Jim
On Thu, May 12, 2016 at 7:23 AM, Jan Kacaba <jan.kacaba at gmail.com> wrote:
#
Hi again,
Sorry, that should be:

chop_string<-function(x,ends) {
 starts<-c(1,ends[-length(ends)]+1)
 return(substring(x,starts,ends))
}
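
Note that chop_string() stops at the last position, whereas the original attempt with str_sub() also returned the remainder of the string. A possible variant that keeps the tail might look like the sketch below. The name `chop_string_with_tail` is hypothetical, and the sketch assumes `x` is a single string and `ends` is sorted in increasing order.

```r
# Hypothetical variant of chop_string() that also returns whatever
# follows the last position. Assumes x is a single string and that
# `ends` is sorted in increasing order.
chop_string_with_tail <- function(x, ends) {
  n <- nchar(x)
  ends <- ends[ends < n]   # drop positions at or past the end of x
  ends <- c(ends, n)       # the final piece runs to the end of x
  starts <- c(1, ends[-length(ends)] + 1)
  substring(x, starts, ends)
}

s <- paste(letters, collapse = "")  # assumed test string "abc...z"
chop_string_with_tail(s, c(5, 10, 19))
## [1] "abcde"     "fghij"     "klmnopqrs" "tuvwxyz"
```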

Jim
On Thu, May 12, 2016 at 10:05 AM, Jim Lemon <drjimlemon at gmail.com> wrote:
#
Nice solution Jim, thank you.



2016-05-12 2:45 GMT+02:00 Jim Lemon <drjimlemon at gmail.com>:
#
and why can't you simply use base R's substr() function?
Packages (such as 'stringr' in this case) have their uses and
great merits, but using base R seems more sensical to me (also
slightly more future-proof).
Yes, I'd think so, because that was also my quick thought when
I read the OP's question.

Martin


--
Martin Maechler, ETH Zurich & R Core
1 day later
#
Hi,

Here is the Biostrings solution in case you need to chop a long
string into hundreds or thousands of fragments (a situation where
base::substring() is very inefficient):

   library(Biostrings)

   ## Call as.character() on the result if you want it back as
   ## a character vector.
   fast_chop_string <- function(x, ends)
   {
     if (!is(x, "XString"))
         x <- as(x, "XString")
     extractAt(x, at=PartitioningByEnd(ends))
   }

This will be much faster than substring() (e.g. 100x or 1000x) when
chopping a string like a human chromosome into hundreds or
thousands of fragments.

Biostrings is a Bioconductor package:

   https://bioconductor.org/packages/Biostrings

Cheers,
H.
On 05/12/2016 01:18 AM, Jan Kacaba wrote:

  
    
4 days later
#
Excellent Hervé, thank you.

2016-05-13 11:48 GMT+02:00 Hervé Pagès <hpages at fredhutch.org>:
#
Here are two ways that do not use any packages:

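s <- paste(letters, collapse = "") # test input
first <- c(1, 6, 11)  # start positions from the original question
last <- c(5, 10, 19)  # end positions from the original question

substring(s, first, last)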
## [1] "abcde"     "fghij"     "klmnopqrs"


read.fwf(textConnection(s), last - first + 1)
##         V1    V2        V3
## 1 abcde fghij klmnopqrs
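
Since the original question starts from end positions only, the widths for read.fwf() can also be derived directly from them, without building `first` at all. The following is a minimal sketch under that assumption. The names `pos`, `widths`, `con` and `pieces` are illustrative, and `colClasses = "character"` is added so the pieces are not converted to other types.

```r
s <- paste(letters, collapse = "")  # assumed test string "abc...z"
pos <- c(5, 10, 19)                 # end positions from the original question

# Each width is the gap between consecutive end positions.
widths <- diff(c(0, pos))
widths
## [1] 5 5 9

con <- textConnection(s)
pieces <- read.fwf(con, widths, colClasses = "character")
close(con)

# Flatten the one-row data frame into a plain character vector.
unlist(pieces[1, ], use.names = FALSE)
## [1] "abcde"     "fghij"     "klmnopqrs"
```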
On Wed, May 11, 2016 at 4:12 PM, Jan Kacaba <jan.kacaba at gmail.com> wrote: