First define a function that returns TRUE if a column
should be dropped. E.g.,
has3Zeros.1 <- function(x)
{
    x <- x[!is.na(x)] == 0 # drop NA's, convert 0's to TRUE, others to FALSE
    if (length(x) < 3) {
        FALSE # you may want to further test short vectors
    } else {
        i <- seq_len(length(x) - 2)
        any(x[i] & x[i + 1] & x[i + 2]) # TRUE if any 3 adjacent entries are all zero
    }
}
or
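has3Zeros.2 <- function (x)
{
    x <- x[!is.na(x)] == 0 # drop NA's, convert 0's to TRUE, others to FALSE
    r <- rle(x)            # run lengths of consecutive TRUEs and FALSEs
    any(r$lengths[r$values] >= 3) # TRUE if any run of zeros has length >= 3
}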
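To see what the rle() version works with, here is a small sketch that
applies the same steps by hand to column D of the example data in the
question. The names d, z and r are only for illustration.

```r
# Column D from the example in the question
d <- c(41, 0, 0, NA, 0, 21)

# Drop the NA, then mark zeros: FALSE TRUE TRUE TRUE FALSE
z <- d[!is.na(d)] == 0

# rle() summarizes consecutive runs of equal values
r <- rle(z)
r$lengths   # 1 3 1
r$values    # FALSE TRUE FALSE

# Keep only the runs of TRUE (zeros) and test their lengths
any(r$lengths[r$values] >= 3)   # TRUE
```

Because the NA is removed before rle() is called, the zeros on either
side of it end up in the same run.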
Then use sapply on your data.frame with this function to see which
columns to omit, and use [ to omit them:
> e <- data.frame(Date=1980:1985,
+ A = c(2, 9, 18, 0, 12, 48),
+ B = c(75, NA, 15, 16, 43, 3),
+ C = c(12, 7, 0, 0, 0, 26),
+ D = c(41, 0, 0, NA, 0, 21))
> e[, !sapply(e, has3Zeros.1), drop=FALSE]
Date A B
1 1980 2 75
2 1981 9 NA
3 1982 18 15
4 1983 0 16
5 1984 12 43
6 1985 48 3
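As a quick sanity check, the following sketch confirms that the two
functions flag the same columns on this example and lists the columns
that get dropped. The names drop1 and drop2 are only for illustration;
everything else is copied from above so the block runs on its own.

```r
has3Zeros.1 <- function(x)
{
    x <- x[!is.na(x)] == 0
    if (length(x) < 3) {
        FALSE
    } else {
        i <- seq_len(length(x) - 2)
        any(x[i] & x[i + 1] & x[i + 2])
    }
}

has3Zeros.2 <- function(x)
{
    x <- x[!is.na(x)] == 0
    r <- rle(x)
    any(r$lengths[r$values] >= 3)
}

e <- data.frame(Date = 1980:1985,
                A = c(2, 9, 18, 0, 12, 48),
                B = c(75, NA, 15, 16, 43, 3),
                C = c(12, 7, 0, 0, 0, 26),
                D = c(41, 0, 0, NA, 0, 21))

# One logical per column: TRUE means "drop this column"
drop1 <- sapply(e, has3Zeros.1)
drop2 <- sapply(e, has3Zeros.2)

stopifnot(identical(drop1, drop2)) # both versions agree on this data
names(e)[drop1]                    # "C" "D"
```

With about 3'000 columns it may be worth running such a comparison on a
subset of the real data before relying on either version.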
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Samir Benzerfa
Sent: Wednesday, October 12, 2011 8:35 AM
To: r-help at r-project.org
Subject: [R] exclude columns with at least three consecutive zeros
Hi everyone,
I have a large data set with about 3'000 columns and I would like to exclude
all columns which include three or more consecutive zeros (see the example
below). A further issue is that NA values, if any, should simply be skipped.
How can I do this?
In the example below R should exclude columns C and D (since in D skipping
the NA leaves three consecutive zeros).
I would appreciate any solutions to this issue.
Many thanks!
S.B.
Date A B C D
1980 2 75 12 41
1981 9 NA 7 0
1982 18 15 0 0
1983 0 16 0 NA
1984 12 43 0 0
1985 48 3 26 21