Subsetting problem data, 2 - R-help

Thu, Jul 19, 2012 4:17 PM

Hello,

I guess so, and I can save you some typing.

vars <- sort(apply(expand.grid("L", 1:8, 1:2), 1, paste, collapse=""))


Then use it and see the result.
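For reference, a minimal sketch of what that line builds and one way it might be used. The data frame `data` below is a made-up stand-in for the real data set, which was not posted (the columns `ID` and `L9x` are assumptions), and `paste0()` needs R 2.15.0 or later. Because `vars` is a vector of names rather than a single regular expression, the sketch selects columns with `%in%` instead of passing `vars` to grep().

```r
# The posted construction: all combinations of "L", 1:8 and 1:2, pasted row by row.
vars <- sort(apply(expand.grid("L", 1:8, 1:2), 1, paste, collapse = ""))
vars   # "L11" "L12" "L21" "L22" ... "L81" "L82"

# A shorter way to build the same 16 names (not from the thread).
vars2 <- paste0("L", rep(1:8, each = 2), 1:2)
all(vars == vars2)

# Hypothetical stand-in for the real data set.
data <- data.frame(ID = 1:2, L11 = c(0.4, NA), L12 = c(0.1, 0.2), L9x = c(1, 2))

# Keep only the column names that appear in vars (exact matches).
nms <- names(data)[names(data) %in% vars]
nms
```

With the stand-in data above, `nms` holds only `L11` and `L12`; `ID` and `L9x` are left out because they are not in `vars`.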

Rui Barradas

On 20-07-2012 00:00, Lib Gray wrote:

The variables are actually L11, L12, L21, L22, ... , L81, L82. Would just
creating a vector c(L11,... ,L82) be fine? (I'm about to try it, but I
wanted to check to see if that was going to be a big issue).

On Thu, Jul 19, 2012 at 3:27 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote:

Hello,

Try the following. The data is your example of Patient A through E, but
from the output of dput().

dat <- structure(list(Patient = structure(c(1L, 1L, 1L, 1L, 1L, 2L,
2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L), .Label = c("A",
"B", "C", "D", "E"), class = "factor"), Cycle = c(1L, 2L, 3L,
4L, 5L, 1L, 2L, 1L, 3L, 4L, 5L, 1L, 2L, 4L, 5L, 1L, 2L, 3L),
     V1 = c(0.4, 0.3, 0.3, 0.4, 0.5, 0.4, 0.4, 0.9, 0.3, NA, 0.4,
     0.2, 0.5, 0.6, 0.5, 0.1, 0.5, 0.4), V2 = c(0.1, 0.2, NA,
     NA, 0.2, NA, NA, 0.9, 0.5, NA, NA, 0.5, 0.7, 0.4, 0.5, NA,
     0.3, 0.3), V3 = c(0.5, 0.5, 0.6, 0.4, 0.5, NA, NA, 0.9, 0.6,
     NA, NA, NA, NA, NA, NA, NA, NA, NA), V4 = c(1.5, 1.6, 1.7,
     1.8, 1.5, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
     NA), V5 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
     NA, NA, NA, NA, NA, NA)), .Names = c("Patient", "Cycle",
"V1", "V2", "V3", "V4", "V5"), class = "data.frame", row.names = c(NA,
-18L))

dat

nms <- names(dat)[grep("^V[1-9]$", names(dat))]
dd <- split(dat, dat$Patient)
fun <- function(x) any(is.na(x)) && any(!is.na(x))
ix <- sapply(dd, function(x) Reduce(`|`, lapply(x[, nms], fun)))

dd[ix]
do.call(rbind, dd[ix])


I'm assuming that the variable names are as posted: V followed by a
single digit 1-9. To keep the patients with consistent measurements
instead, just negate the index 'ix'; it's a logical index.
Note also that dput() is the best way of posting a data example.
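A minimal sketch of negating the index, using a cut-down two-patient version of the example so that it runs on its own. The name `consistent` is made up for illustration, and the sketch uses `x[nms]` rather than `x[, nms]`, which is a change from the posted code: `x[, nms]` drops to a plain vector when only one column matches, and lapply() would then test single values instead of whole columns.

```r
# Cut-down example: patient A has a gap in V2, patient B is consistent.
dat <- data.frame(
  Patient = c("A", "A", "B", "B"),
  Cycle   = c(1L, 2L, 1L, 2L),
  V1      = c(0.4, 0.3, 0.4, 0.4),
  V2      = c(0.1, NA, NA, NA)
)

nms <- names(dat)[grep("^V[1-9]$", names(dat))]
dd  <- split(dat, dat$Patient)

# TRUE when a column mixes missing and non-missing values.
fun <- function(x) any(is.na(x)) && any(!is.na(x))

# One logical per patient: does any measured column have a gap?
ix <- sapply(dd, function(x) Reduce(`|`, lapply(x[nms], fun)))
ix

# Negating the logical index keeps the patients without gaps.
consistent <- do.call(rbind, dd[!ix])
consistent
```

Here `ix` is TRUE for A and FALSE for B, so `dd[ix]` returns patient A and `dd[!ix]` returns patient B.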

Hope this helps,

Rui Barradas

On 19-07-2012 15:15, Lib Gray wrote:

Hello,

I didn't give enough information when I sent a query before, so I'm trying
again with a more detailed explanation:

In this data set, each patient has a different number of measured variables
(they represent tumors, so some people had 2 tumors, some had 5, etc). The
problem I have is that often in later cycles for a patient, tumors that
were originally measured are now missing (or a "new" tumor showed up). We
assume there are many different reasons for why a tumor would be measured
in one cycle and not another, and so I want to subset OUT the "problem"
patients to better study these patterns.

An example:

Patient  Cycle  V1  V2  V3  V4  V5
A  1  0.4  0.1  0.5  1.5  NA
A  2  0.3  0.2  0.5  1.6  NA
A  3  0.3  NA  0.6  1.7  NA
A  4  0.4  NA  0.4  1.8  NA
A  5  0.5  0.2  0.5  1.5  NA

I want to keep patient A; they have 4 measured tumors, but tumor 2 is
missing data for cycles 3 and 4

B  1  0.4  NA  NA  NA  NA
B  2  0.4  NA  NA  NA  NA

I do not want to keep patient B; they have 1 tumor that is measured
consistently in both cycles

C  1  0.9  0.9  0.9  NA  NA
C  3  0.3  0.5  0.6  NA  NA
C  4  NA  NA  NA  NA  NA
C  5  0.4  NA  NA  NA  NA

I do want to keep patient C; all their data is missing for cycle 4, and
cycle 5 only measured one tumor

D  1  0.2  0.5  NA  NA  NA
D  2  0.5  0.7  NA  NA  NA
D  4  0.6  0.4  NA  NA  NA
D  5  0.5  0.5  NA  NA  NA

I do not want patient D; their two tumors were measured each cycle

E  1  0.1  NA  NA  NA  NA
E  2  0.5  0.3  NA  NA  NA
E  3  0.4  0.3  NA  NA  NA

I DO want patient E; they only had one tumor register in Cycle 1, but
cycles 2 and 3 had two tumors.


Thanks for any help!

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Lib Gray

Thu, Jul 19, 2012 4:33 PM

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120719/dafde045/attachment.pl>

Rui Barradas

Thu, Jul 19, 2012 4:55 PM

Hello,

Sorry, forgot about that. It's trickier to write code without a dataset 
to test it.

Try

pattern <- "L[1-8][12]"

and after the grep print nms to see if it's right.
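A small sketch of that check, run against made-up column names (the vector `cols` is an assumption, since the real names were not posted). The `^` and `$` anchors are an addition to the posted pattern: without them, the pattern also matches longer names that merely contain something like `L11`.

```r
# Hypothetical column names; only L11, L12 and L82 are wanted.
cols <- c("ID", "L11", "L12", "L82", "L83", "XL11", "L110")

# Anchored variant of the posted pattern: the whole name must match.
pattern <- "^L[1-8][12]$"
nms <- cols[grep(pattern, cols)]
nms

# The unanchored pattern, for comparison: it also picks up XL11 and L110.
unanchored <- grep("L[1-8][12]", cols, value = TRUE)
unanchored
```

Printing `nms` after the grep, as suggested, is what reveals extra matches such as `XL11` or `L110` in a data set with many other columns.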

Rui Barradas

On 20-07-2012 00:33, Lib Gray wrote:

I'm getting this error message:

nms<-names(data)[grep(vars,names(data))]
Warning message:
In grep(vars, names(data)) :
   argument 'pattern' has length > 1 and only the first element will be used

Is there a way around this?



Lib Gray

Thu, Jul 19, 2012 5:17 PM

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120719/fbefe5e9/attachment.pl>

Chris Campbell

Fri, Jul 20, 2012 1:09 AM

Hi!

# toy data

toyData <- data.frame(x = 1:4, y = 5:8, xy = 9:12, z = 13:16)
vars <- c("x", "z")

# "pattern" is an argument of grep

args(grep)

# "pattern" must only consist of a single element
# otherwise only the first element is used

grep(pattern = vars, x = names(toyData))

# one way to do this - a loop
# create a list to collect the output of each call

toyColIndexList <- vector(length = length(vars), mode = "list")

# grep each element in turn

for (i in seq_along(vars)) {
    toyColIndexList[[i]] <- grep(pattern = vars[i], x = names(toyData))
}

# combine all of the answers

toyColIndex <- unlist(toyColIndexList)

# remove duplicated columns if present

toyColIndex <- toyColIndex[!duplicated(toyColIndex)]

# select the elements we want

toyData[, toyColIndex]

# alternatively we could use regular expressions

grep(pattern = "x|z", x = names(toyData))

# hope this helps
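One thing to watch with the regular-expression route is that "x" also matches the column "xy". A hedged sketch of building a single anchored pattern from the vector is below; the names `loose`, `pattern` and `exact` are made up for illustration, and the approach assumes the names in `vars` contain no regular-expression metacharacters.

```r
toyData <- data.frame(x = 1:4, y = 5:8, xy = 9:12, z = 13:16)
vars <- c("x", "z")

# Collapsing the vector gives "x|z", which matches any name containing x or z.
loose <- grep(paste(vars, collapse = "|"), names(toyData))
loose   # columns x, xy and z

# Anchoring the alternation, "^(x|z)$", requires the whole name to match.
pattern <- paste0("^(", paste(vars, collapse = "|"), ")$")
exact <- grep(pattern, names(toyData))
exact   # columns x and z only

toyData[, exact]
```

The anchored version keeps grep() to a single pattern, which avoids the "length > 1" warning, while still selecting only the columns named in `vars`.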

Best wishes

Chris

Chris Campbell
Mango Solutions
Data Analysis that Delivers
http://www.mango-solutions.com
+44 (0) 1249 705 450  


-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Lib Gray
Sent: 20 July 2012 01:17
To: Rui Barradas
Cc: r-help
Subject: Re: [R] Subsetting problem data, 2

I'm still getting the message (if this is what you were suggesting I try).
The data set I'm using has many more columns other than these variables; could that be a problem? I didn't think it would affect it.

Warning message:
In grep(vars, names(data)) :
  argument 'pattern' has length > 1 and only the first element will be used


arun

Fri, Jul 20, 2012 4:50 AM

Hi,

Just a doubt regarding the dataset.

Suppose, I include two more patients F and G with different missing values as in this new dataset and run the code.
dat1 <- read.table(text="
Patient  Cycle  V1  V2  V3  V4  V5
A  1  0.4  0.1  0.5  1.5  NA
A  2  0.3  0.2  0.5  1.6  NA
A  3  0.3  NA  0.6  1.7  NA
A  4  0.4  NA  0.4  1.8  NA
A  5  0.5  0.2  0.5  1.5  NA
B  1  0.4  NA  NA  NA  NA
B  2  0.4  NA  NA  NA  NA
C  1  0.9  0.9  0.9  NA  NA
C  3  0.3  0.5  0.6  NA  NA
C  4  NA  NA  NA  NA  NA
C  5  0.4  NA  NA  NA  NA
D  1  0.2  0.5  NA  NA  NA
D  2  0.5  0.7  NA  NA  NA
D  4  0.6  0.4  NA  NA  NA
D  5  0.5  0.5  NA  NA  NA
E  1  0.1  NA  NA  NA  NA
E  2  0.5  0.3  NA  NA  NA
E  3  0.4  0.3  NA  NA  NA
F  1  0.2  NA  0.2  0.5  0.1
F  2  0.5  NA  0.4  NA  0.3
F  3  0.6  NA  NA  0.3  0.2
G  1  0.2  0.5  NA  0.5  0.2
G  3  0.4  0.3  0.4  NA  0.3
G  4  0.6  0.2  0.2  0.4  NA
",sep="",header=TRUE)


nms <- names(dat1)[grep("^V[1-9]$", names(dat1))]
dd <- split(dat1, dat1$Patient)
fun <- function(x) any(is.na(x)) && any(!is.na(x))
ix <- sapply(dd, function(x) Reduce(`|`, lapply(x[, nms], fun)))

dd[ix]
do.call(rbind, dd[ix])
     Patient Cycle  V1  V2  V3  V4  V5
A.1        A     1 0.4 0.1 0.5 1.5  NA
A.2        A     2 0.3 0.2 0.5 1.6  NA
A.3        A     3 0.3  NA 0.6 1.7  NA
A.4        A     4 0.4  NA 0.4 1.8  NA
A.5        A     5 0.5 0.2 0.5 1.5  NA
C.8        C     1 0.9 0.9 0.9  NA  NA
C.9        C     3 0.3 0.5 0.6  NA  NA
C.10       C     4  NA  NA  NA  NA  NA
C.11       C     5 0.4  NA  NA  NA  NA
E.16       E     1 0.1  NA  NA  NA  NA
E.17       E     2 0.5 0.3  NA  NA  NA
E.18       E     3 0.4 0.3  NA  NA  NA
F.19       F     1 0.2  NA 0.2 0.5 0.1
F.20       F     2 0.5  NA 0.4  NA 0.3
F.21       F     3 0.6  NA  NA 0.3 0.2
G.22       G     1 0.2 0.5  NA 0.5 0.2
G.23       G     3 0.4 0.3 0.4  NA 0.3
G.24       G     4 0.6 0.2 0.2 0.4  NA



Then patients F and G are included in the list. But, according to your initial statement, V1 and V2 are the most important variables. If B is not included in the list because B has missing values in both of its cycles, do you think F or G should be included? The only difference is that F and G have missing values in other variables, which do not behave consistently. Do you have situations like that?

A.K.