How do I identify non-sequential data?
Hi Don,

Yes, I am error checking a dataset produced by a query. Most likely a problem with the query but wanted to assess the problem first.

BTW Arun provided another solution which is similar to yours but uses the function ave instead:

testSeq[!!(with(testSeq,ave(YoS,ID,FUN=function(x) any(c(0,diff(x))>1)))),]

I appreciate your response on this.

Dan

-----Original Message-----
From: MacQueen, Don
Sent: Thursday, November 21, 2013 3:58 PM
To: Lopez, Dan; R help (r-help at r-project.org)
Subject: Re: [R] How do I identify non-sequential data?

Dan,

Does this do it?

## where dt is the data
tmp <- split(dt, dt$ID)
foo <- lapply(tmp, function(x) any(diff(x$YoS) > 1))
foo <- data.frame( ID=names(foo), gap=unlist(foo))

Note that I ignored dept. Little hard to see how YoS can increase by more than one when the year increases by only one ... unless this is a search for erroneous data.

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
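For anyone who wants to run the two suggestions above side by side, here is a minimal sketch against the data posted in the original message. It is not code from the thread: the names eps, gap_by_id, flagged_ids, has_gap and bad_rows are illustrative only, and the small tolerance eps is an assumption added here because YoS is stored as floating point, so a step that is meant to be exactly 1 could in principle compare as slightly more than 1.

```r
# Data from the original post (CheckVal left out; it is not needed here)
testSeq <- data.frame(
  ID   = c(12, 12, 44, 44, 44, 33, 33, 16, 16, 16, 16),
  Year = c(2010, 2011, 2009, 2010, 2011, 2009, 2010, 2009, 2010, 2011, 2012),
  YoS  = c(1.1, 2.1, 1.4, 2.4, 3.4, 2.3, 4.4, 1.6, 2.6, 5.6, 6.6),
  dept = factor(c("A", "A", "C", "C", "B", "A", "A", "B", "B", "C", "A"))
)

eps <- 1e-8  # hypothetical tolerance, not part of either posted solution

# split-based approach: one TRUE/FALSE per ID
gap_by_id <- sapply(split(testSeq$YoS, testSeq$ID),
                    function(x) any(diff(x) > 1 + eps))
flagged_ids <- names(gap_by_id)[gap_by_id]

# ave-based approach: the per-ID flag is recycled onto every row of that ID
has_gap <- ave(testSeq$YoS, testSeq$ID,
               FUN = function(x) any(diff(x) > 1 + eps))
bad_rows <- testSeq[as.logical(has_gap), ]

print(flagged_ids)
print(bad_rows)
```

The split version gives a per-ID summary, while the ave version returns the offending rows directly, which is closer to what the original question asked for. Both assume the rows are already sorted by Year within each ID.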
On 11/21/13 3:32 PM, "Lopez, Dan" <lopez235 at llnl.gov> wrote:
Hi R Experts,

About the data: My data consists of people (ID) with years of service (YoS) for each year. An ID can appear multiple times. The data is sorted by ID then by Year.

Problem: I need to extract ID data with non-sequential YoS rows. In the example below, that would be all rows for IDs 33 and 16, since each has a non-sequential YoS.

To accomplish this I figured I could create a column called 'CheckVal' that takes the current row's YoS minus the previous row's YoS. The first instance for each ID will be 0. 'CheckVal' in the data set below was created in Excel. I want to know how to do this in R. Is there a package, or a specific function or set of functions, I can use to accomplish this?

#My data looks like:
testSeq
   ID Year YoS CheckVal dept
1  12 2010 1.1      0.0    A
2  12 2011 2.1      1.0    A
3  44 2009 1.4      0.0    C
4  44 2010 2.4      1.0    C
5  44 2011 3.4      1.0    B
6  33 2009 2.3      0.0    A
7  33 2010 4.4      2.1    A
8  16 2009 1.6      0.0    B
9  16 2010 2.6      1.0    B
10 16 2011 5.6      3.0    C
11 16 2012 6.6      1.0    A
#here is dput of data for R
structure(list(ID = c(12, 12, 44, 44, 44, 33, 33, 16, 16, 16,
16), Year = c(2010, 2011, 2009, 2010, 2011, 2009, 2010, 2009,
2010, 2011, 2012), YoS = c(1.1, 2.1, 1.4, 2.4, 3.4, 2.3, 4.4,
1.6, 2.6, 5.6, 6.6), CheckVal = c(0, 1, 0, 1, 1, 0, 2.1, 0, 1,
3, 1), dept = structure(c(1L, 1L, 3L, 3L, 2L, 1L, 1L, 2L, 2L,
3L, 1L), .Label = c("A", "B", "C"), class = "factor")), .Names = c("ID",
"Year", "YoS", "CheckVal", "dept"), row.names = c(NA, 11L), class = "data.frame")
Dan
Workforce Analyst
LLNL
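The 'CheckVal' column described in the message above can be reproduced in base R without any extra package. What follows is only a sketch under stated assumptions: the data is rebuilt by hand from the posted table, the rows are assumed to be sorted by ID and then Year as the post says, and the names bad_ids and nonSeq plus the 1e-8 tolerance are illustrative choices, not part of the original thread.

```r
# Data from the post, without the Excel-made CheckVal column
testSeq <- data.frame(
  ID   = c(12, 12, 44, 44, 44, 33, 33, 16, 16, 16, 16),
  Year = c(2010, 2011, 2009, 2010, 2011, 2009, 2010, 2009, 2010, 2011, 2012),
  YoS  = c(1.1, 2.1, 1.4, 2.4, 3.4, 2.3, 4.4, 1.6, 2.6, 5.6, 6.6)
)

# Current YoS minus previous YoS within each ID; first row of each ID gets 0
testSeq$CheckVal <- ave(testSeq$YoS, testSeq$ID,
                        FUN = function(x) c(0, diff(x)))

# IDs with any jump larger than 1 (small tolerance for floating point)
bad_ids <- unique(testSeq$ID[testSeq$CheckVal > 1 + 1e-8])

# All rows belonging to those IDs
nonSeq <- testSeq[testSeq$ID %in% bad_ids, ]

print(testSeq)
print(nonSeq)
```

Keeping CheckVal as a column makes it easy to inspect the size of each jump afterwards, which may help when tracing the problem back to the query that produced the data.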
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.