Split a vector by NA's - is there a better solution than a loop?
7 messages · Tal Galili, Romain Francois, Henrique Dallazuanna +3 more
Maybe this:
> foo <- function( x ){
+ idx <- 1 + cumsum( is.na( x ) )
+ not.na <- ! is.na( x )
+ split( x[not.na], idx[not.na] )
+ }
> foo( x )
$`1`
[1] 2 1 2
$`2`
[1] 1 1 2
$`3`
[1] 4 5 2 3
Romain
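Printing the intermediate index shows why this works: `cumsum(is.na(x))` goes up by one at each NA, so all values between two NAs share one index, and subsetting with `not.na` then drops the NA positions themselves. A small sketch using the example vector from Tal's question:

```r
x <- c(2, 1, 2, NA, 1, 1, 2, NA, 4, 5, 2, 3)

# cumsum() over the logical NA mask counts the NAs seen so far,
# so the index steps up by one at every NA
idx <- 1 + cumsum(is.na(x))
idx
# [1] 1 1 1 2 2 2 2 3 3 3 3 3

# dropping the NA positions leaves one index value per segment
not.na <- !is.na(x)
split(x[not.na], idx[not.na])
```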
On 29/04/10 09:42, Tal Galili wrote:
Hi all,
I would like to have a function like this:
split.vec.by.NA<- function(x)
That takes a vector like this:
x<- c(2,1,2,NA,1,1,2,NA,4,5,2,3)
And returns a list of length 3, where each element of the list is the relevant
segment of the vector, like this:
$`1`
[1] 2 1 2
$`2`
[1] 1 1 2
$`3`
[1] 4 5 2 3
I found how to do it with a loop, but wondered if there is some smarter
(vectorized) way of doing it.
Here is the code I used:
x <- c(2,1,2,NA,1,1,2,NA,4,5,2,3)
split.vec.by.NA <- function(x)
{
  # assumes NAs are separating groups of numbers
  #TODO: add code to check for it
  number.of.groups <- sum(is.na(x)) + 1
  # all the places with NAs + a number after the end of the vector
  groups.end.point.locations <- c(which(is.na(x)), length(x)+1)
  group.start <- 1
  group.end <- NA
  # we will replace all the places of the group with the group ID,
  # except for the NAs, which will later be replaced by 0
  new.groups.split.id <- x
  for(i in seq_len(number.of.groups))
  {
    group.end <- groups.end.point.locations[i]-1
    new.groups.split.id[group.start:group.end] <- i
    # make the new group start higher for the next loop
    # (at the final loop it won't matter)
    group.start <- groups.end.point.locations[i]+1
  }
  new.groups.split.id[is.na(x)] <- 0
  return(split(x, new.groups.split.id)[-1])
}
split.vec.by.NA(x)
Thanks,
Tal
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://bit.ly/9aKDM9 : embed images in Rd documents
|- http://tr.im/OIXN : raster images and RImageJ
|- http://tr.im/OcQe : Rcpp 0.7.7
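Tal's loop fills in a vector of group IDs one segment at a time (1, 2, 3, ... for the segments and 0 at the NAs). For the example vector, the same IDs can be computed without a loop, in the same spirit as the other replies in this thread. A sketch; the variable name `ids` is just for illustration:

```r
x <- c(2, 1, 2, NA, 1, 1, 2, NA, 4, 5, 2, 3)

# segment number for every position, then 0 at the NA positions,
# matching the ID vector the loop builds for this input
ids <- cumsum(is.na(x)) + 1
ids[is.na(x)] <- 0
ids
# [1] 1 1 1 0 2 2 2 0 3 3 3 3

# as in Tal's function, drop the "0" group (it sorts first)
split(x, ids)[-1]
```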
On Thu, Apr 29, 2010 at 1:27 PM, Henrique Dallazuanna <wwwhsd at gmail.com> wrote:
Another option could be:
split(x, replace(cumsum(is.na(x)), is.na(x), -1))[-1]
One thing none of the solutions so far do (except I haven't tried Tal's original code) is insert an empty group between adjacent NA values, for example in:
x = c(1,2,3,NA,NA,4,5,6)
> split(x, replace(cumsum(is.na(x)), is.na(x), -1))[-1]
$`0`
[1] 1 2 3
$`2`
[1] 4 5 6
Maybe this never happens in Tal's case, or it's not what he wanted anyway, but I thought I'd point it out!
Barry
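If the empty group Barry describes is wanted, one option is to give split() the full set of segment levels, since split() keeps empty factor levels by default (drop = FALSE). A minimal sketch under that assumption, adapting Romain's foo(); the name `split_keep_empty` is hypothetical, and because it treats every NA as a separator, a leading or trailing NA also yields an empty segment:

```r
# Hypothetical variant of Romain's foo(): every NA separates two segments,
# so adjacent NAs produce an empty segment between them
split_keep_empty <- function(x) {
  idx <- 1 + cumsum(is.na(x))
  not.na <- !is.na(x)
  # there are (number of NAs + 1) segments; declaring all of them as
  # factor levels makes split() return empty ones as numeric(0)
  n.segments <- sum(is.na(x)) + 1
  split(x[not.na], factor(idx[not.na], levels = seq_len(n.segments)))
}

x <- c(1, 2, 3, NA, NA, 4, 5, 6)
split_keep_empty(x)
```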
On Thu, 29 Apr 2010, Barry Rowlingson wrote:
The ever-useful rle() helps:
y <- rle(!is.na(x))
split(x, rep(cumsum(y$values) * y$values, y$lengths))[-1]
$`1`
[1] 1 2 3
$`2`
[1] 4 5 6
Chuck
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
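One caveat applies to the `[-1]` idiom used by Tal's loop, Henrique's one-liner and the rle() version: it drops the first group by position, assuming that group is the NA one. When `x` contains no NA at all, the first group is the only real segment, so all three return an empty list (Romain's foo() avoids this because it removes the NAs before splitting). A minimal sketch of the rle() approach that drops the NA group by name instead; the function name `split_by_na` is hypothetical:

```r
# Hypothetical helper: split x into its runs of non-NA values,
# using the rle() grouping from Chuck's reply
split_by_na <- function(x) {
  y <- rle(!is.na(x))
  # group 0 for every NA run, 1, 2, ... for successive non-NA runs
  g <- rep(cumsum(y$values) * y$values, y$lengths)
  res <- split(x, g)
  # drop the NA group by name, so a vector with no NAs keeps its only group
  res[names(res) != "0"]
}

split_by_na(c(2, 1, 2, NA, 1, 1, 2, NA, 4, 5, 2, 3))
split_by_na(c(1, 2, 3))
```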