Using NA as a break point for indicator variable?
Does anyone have any thoughts on how to code this, perhaps using the NA values as a "break point"?
You can count the cumulative number of NA breakpoints in a vector
with cumsum(is.na(vector)), as in
> cbind(d, LeakNo=with(d, cumsum(is.na(lon)|is.na(lat)|is.na(CH4))))
         lon      lat      CH4 LeakNo
1  -71.11954 42.35068 2.595834      0
2  -71.11954 42.35068 2.595688      0
3         NA       NA       NA      1
4         NA       NA       NA      2
5         NA       NA       NA      3
6  -71.11948 42.35068 2.435762      3
7  -71.11948 42.35068 2.491003      3
8         NA       NA       NA      4
9  -71.11930 42.35068 2.464475      4
10 -71.11932 42.35068 2.470865      4
Add 1 if you want to start with 1. If you only want to increase the count
after each sequence of NA's then you could use rle() or
> na <- with(d, is.na(lon)|is.na(lat)|is.na(CH4))
> cbind(d, LeakNo=cumsum(c(TRUE, na[-1] < na[-length(na)])))
         lon      lat      CH4 LeakNo
1  -71.11954 42.35068 2.595834      1
2  -71.11954 42.35068 2.595688      1
3         NA       NA       NA      1
4         NA       NA       NA      1
5         NA       NA       NA      1
6  -71.11948 42.35068 2.435762      2
7  -71.11948 42.35068 2.491003      2
8         NA       NA       NA      2
9  -71.11930 42.35068 2.464475      3
10 -71.11932 42.35068 2.470865      3
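The rle() route mentioned above can be sketched as follows. This is only an illustration, not necessarily what was intended: it rebuilds the ten example rows as a data frame called d, and the names r and run_id are made up for the example. Unlike the cumsum() version, it leaves the NA rows tagged NA (as in the requested leak_num column) and still starts at 1 if the data happen to begin with NA rows.

```r
# Rebuild the ten example rows from the post (assumed to be a data frame)
d <- data.frame(
  lon = c(-71.11954, -71.11954, NA, NA, NA, -71.11948, -71.11948, NA,
          -71.11930, -71.11932),
  lat = c(42.35068, 42.35068, NA, NA, NA, 42.35068, 42.35068, NA,
          42.35068, 42.35068),
  CH4 = c(2.595834, 2.595688, NA, NA, NA, 2.435762, 2.491003, NA,
          2.464475, 2.470865)
)

# TRUE for rows where any of the three columns is NA
na <- with(d, is.na(lon) | is.na(lat) | is.na(CH4))

# Run-length encode the "valid row" flag: one entry per run of
# consecutive valid rows or consecutive NA rows
r <- rle(!na)

# Number the runs of valid rows 1, 2, 3, ... and blank out the NA runs
run_id <- cumsum(r$values)
run_id[!r$values] <- NA

# Expand the per-run ids back to one value per row
d$leak_num <- rep(run_id, r$lengths)
d
```

If the NA rows are not needed afterwards, d[!is.na(d$leak_num), ] keeps only the tagged leak rows.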
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
Of Max Brondfield
Sent: Wednesday, May 23, 2012 1:42 PM
To: r-help at r-project.org
Subject: [R] Using NA as a break point for indicator variable?
Hi all,
I am working with a spatial data set for which I am only interested in high
concentration values ("leaks"). The low values (< 90th percentile) have
already been turned into NA's, leaving me with a matrix like this:
> CH4_leak
         lon      lat      CH4
1  -71.11954 42.35068 2.595834
2  -71.11954 42.35068 2.595688
3         NA       NA       NA
4         NA       NA       NA
5         NA       NA       NA
6  -71.11948 42.35068 2.435762
7  -71.11948 42.35068 2.491003
8         NA       NA       NA
9  -71.11930 42.35068 2.464475
10 -71.11932 42.35068 2.470865
Every time an NA comes up, it means the "leak" is gone, and the next valid
value would represent a different leak (at a different location). My goal
is to tag all of the remaining values with an indicator variable to
spatially distinguish the leaks. I am envisioning a simple numeric
indicator such as:
         lon      lat      CH4 leak_num
1  -71.11954 42.35068 2.595834        1
2  -71.11954 42.35068 2.595688        1
3         NA       NA       NA       NA
4         NA       NA       NA       NA
5         NA       NA       NA       NA
6  -71.11948 42.35068 2.435762        2
7  -71.11948 42.35068 2.491003        2
8         NA       NA       NA       NA
9  -71.11930 42.35068 2.464475        3
10 -71.11932 42.35068 2.470865        3
Does anyone have any thoughts on how to code this, perhaps using the NA
values as a "break point"? The data set is far too large to do this
manually, and I must admit I'm completely at a loss. Any help would be much
appreciated! Best,
Max