Replacing NAs in long format
Hi,

Maybe this helps:

dat2 <- read.table(text = "
idr schyear year
1 4 -1
1 5 0
1 6 1
1 7 2
2 9 0
2 10 1
2 11 2
", sep = "", header = TRUE)

dat2$flag <- unlist(lapply(split(dat2, dat2$idr), function(x)
    rep(ifelse(any(apply(x, 1, function(x) x[2] <= 5 & x[3] == 0)), 1, 0), nrow(x))),
    use.names = FALSE)
dat2
#   idr schyear year flag
# 1   1       4   -1    1
# 2   1       5    0    1
# 3   1       6    1    1
# 4   1       7    2    1
# 5   2       9    0    0
# 6   2      10    1    0
# 7   2      11    2    0

A.K.

----- Original Message -----
From: Christopher Desjardins <cddesjardins at gmail.com>
To: jim holtman <jholtman at gmail.com>
Cc: r-help at r-project.org
Sent: Saturday, November 3, 2012 7:09 PM
Subject: Re: [R] Replacing NAs in long format

I have a similar sort of follow-up, and I bet I could reuse some of this code, but I'm not sure how.

Let's say I want to create a flag that will be equal to 1 if schyear <= 5 and year = 0 for a given idr. For example:

dat

idr schyear year
  1       4   -1
  1       5    0
  1       6    1
  1       7    2
  2       9    0
  2      10    1
  2      11    2

How could I make the data look like this?

idr schyear year flag
  1       4   -1    1
  1       5    0    1
  1       6    1    1
  1       7    2    1
  2       9    0    0
  2      10    1    0
  2      11    2    0

I am not sure how to avoid getting both 0s and 1s for the 'flag' variable within an idr. For example,

dat$flag = ifelse(dat$schyear <= 5 & dat$year == 0, 1, 0)

does not work because it will create:

idr schyear year flag
  1       4   -1    0
  1       5    0    1
  1       6    1    0
  1       7    2    0
  2       9    0    0
  2      10    1    0
  2      11    2    0

And thus flag changes within an idr, which it shouldn't.

Thanks,
Chris

On Sat, Nov 3, 2012 at 5:50 PM, Christopher Desjardins <cddesjardins at gmail.com> wrote:
Hi Jim,

Thank you so much. That does exactly what I want.

Chris

On Sat, Nov 3, 2012 at 1:30 PM, jim holtman <jholtman at gmail.com> wrote:
x <- read.table(text = "idr schyear year
1 8 0
1 9 1
1 10 NA
2 4 NA
2 5 -1
2 6 0
2 7 1
2 8 2
2 9 3
2 10 4
2 11 NA
2 12 6
3 4 NA
3 5 -2
3 6 -1
3 7 0
3 8 1
3 9 2
3 10 3
3 11 NA", header = TRUE)

# you did not specify if there might be multiple contiguous NAs,
# so there are a lot of checks to be made
x.l <- lapply(split(x, x$idr), function(.idr){
    # check for all NAs -- just return indeterminate state
    if (sum(is.na(.idr$year)) == nrow(.idr)) return(.idr)
    # repeat until all NAs have been fixed; takes care of contiguous ones
    while (any(is.na(.idr$year))){
        # find all the NAs
        for (i in which(is.na(.idr$year))){
            if ((i == 1L) && (!is.na(.idr$year[i + 1L]))){
                .idr$year[i] <- .idr$year[i + 1L] - 1
            } else if ((i > 1L) && (!is.na(.idr$year[i - 1L]))){
                .idr$year[i] <- .idr$year[i - 1L] + 1
            } else if ((i < nrow(.idr)) && (!is.na(.idr$year[i + 1L]))){
                .idr$year[i] <- .idr$year[i + 1L] - 1
            }
        }
    }
    return(.idr)
})
do.call(rbind, x.l)

     idr schyear year
1.1    1       8    0
1.2    1       9    1
1.3    1      10    2
2.4    2       4   -2
2.5    2       5   -1
2.6    2       6    0
2.7    2       7    1
2.8    2       8    2
2.9    2       9    3
2.10   2      10    4
2.11   2      11    5
2.12   2      12    6
3.13   3       4   -3
3.14   3       5   -2
3.15   3       6   -1
3.16   3       7    0
3.17   3       8    1
3.18   3       9    2
3.19   3      10    3
3.20   3      11    4
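
A shorter, loop-free variant may also be possible for data shaped like the sample. The sketch below is not part of the reply above; it rests on an assumption that is not stated in the thread but holds in the sample data, namely that year minus schyear is the same for every known row of an idr. If that holds, the missing years can be rebuilt from schyear plus that offset. The names offset and filled are made up for illustration.

```r
# Same sample data as in the reply above.
x <- read.table(text = "idr schyear year
1 8 0
1 9 1
1 10 NA
2 4 NA
2 5 -1
2 6 0
2 7 1
2 8 2
2 9 3
2 10 4
2 11 NA
2 12 6
3 4 NA
3 5 -2
3 6 -1
3 7 0
3 8 1
3 9 2
3 10 3
3 11 NA", header = TRUE)

# ASSUMPTION: within each idr, year - schyear is constant.
# Take the first known difference per idr and repeat it for every row.
offset <- ave(x$year - x$schyear, x$idr, FUN = function(d) {
    known <- d[!is.na(d)]
    # an idr with no known year at all keeps NA
    rep(if (length(known) > 0) known[1] else NA_integer_, length(d))
})

# Only touch rows where year is missing; known values are left alone.
filled <- x
filled$year <- ifelse(is.na(x$year), x$schyear + offset, x$year)
filled
```

Unlike the neighbour-based loop, this relies on schyear, so it would give different results if schyear had gaps within an idr; which behaviour is wanted depends on the data.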
On Sat, Nov 3, 2012 at 1:14 PM, Christopher Desjardins <cddesjardins at gmail.com> wrote:
Hi,

I have the following data:

data[1:20,c(1,2,20)]

idr schyear year
  1       8    0
  1       9    1
  1      10   NA
  2       4   NA
  2       5   -1
  2       6    0
  2       7    1
  2       8    2
  2       9    3
  2      10    4
  2      11   NA
  2      12    6
  3       4   NA
  3       5   -2
  3       6   -1
  3       7    0
  3       8    1
  3       9    2
  3      10    3
  3      11   NA

What I want to do is replace the NAs in the year variable so that the data look like this:

idr schyear year
  1       8    0
  1       9    1
  1      10    2
  2       4   -2
  2       5   -1
  2       6    0
  2       7    1
  2       8    2
  2       9    3
  2      10    4
  2      11    5
  2      12    6
  3       4   -3
  3       5   -2
  3       6   -1
  3       7    0
  3       8    1
  3       9    2
  3      10    3
  3      11    4

I have no idea how to do this. For each subject (idr), an NA that is preceded by a year value should become that value plus 1, and an NA that comes before a year value should become that value minus 1. Does that make sense? I could do this in Excel, but I am at a loss for how to do this in R.

Please reply to me as well as the list if you respond.

Thanks!
Chris

[[alternative HTML version deleted]]
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
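
For the follow-up question in this thread (a flag that is 1 on every row of an idr whenever any of its rows has schyear <= 5 and year == 0), a more compact base R form is possible with ave(). This is only a sketch alongside the answer given in the thread, not a replacement for it; the intermediate name hit is made up for illustration.

```r
# Sample data from the follow-up question.
dat2 <- read.table(text = "idr schyear year
1 4 -1
1 5 0
1 6 1
1 7 2
2 9 0
2 10 1
2 11 2", header = TRUE)

# Row-level condition as 0/1: does this row satisfy schyear <= 5 and year == 0?
hit <- as.integer(dat2$schyear <= 5 & dat2$year == 0)

# Spread the per-idr maximum to every row of that idr,
# so the flag is constant within an idr.
dat2$flag <- ave(hit, dat2$idr, FUN = max)
dat2
```

If year still contains NAs, the comparison yields NA and the group maximum can become NA as well, so it is probably safer to fill the missing years first.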