Subsetting problem data, 2

6 messages · Lib Gray, Rui Barradas, Chris Campbell +1 more

#
Hello,

I guess so, and I can save you some typing.

vars <- sort(apply(expand.grid("L", 1:8, 1:2), 1, paste, collapse=""))


Then use it and see the result.
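As a minimal sketch of one way to use `vars` without passing it to `grep` (which expects a single pattern), the names can be matched exactly with `%in%`. The data frame `data` below is a hypothetical stand-in, since the real data set is not shown in the thread.

```r
# Build the 16 names L11, L12, ..., L81, L82
# (the same set of names as the expand.grid one-liner above)
vars <- sort(paste0("L", rep(1:8, each = 2), 1:2))

# Hypothetical stand-in for the real data: the L columns plus unrelated ones
data <- as.data.frame(matrix(0, nrow = 2, ncol = 18,
                             dimnames = list(NULL, c("id", vars, "other"))))

# Exact matching by name, so no regular expression is involved
keep <- names(data) %in% vars
sub <- data[, keep, drop = FALSE]
names(sub)
```

Because `%in%` compares whole names, extra columns in the data set do not affect the result.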

Rui Barradas

On 20-07-2012 00:00, Lib Gray wrote:
#
Hello,

Sorry, forgot about that. It's trickier to write code without a dataset 
to test it.

Try

pattern <- "L[1-8][12]"

and after the grep print nms to see if it's right.
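A small sketch of that check, using made-up column names (`cols` is hypothetical, not the poster's data). It also shows that the unanchored pattern accepts any name that merely contains a match, which may matter when the data set has many other columns.

```r
# Hypothetical column names: some wanted, some that merely contain the pattern
cols <- c("Patient", "L11", "L12", "L82", "L91", "L13", "XL11", "L112")

pattern <- "L[1-8][12]"
nms <- grep(pattern, cols, value = TRUE)
print(nms)        # unanchored: also picks up "XL11" and "L112"

# Anchoring with ^ and $ keeps only names that match in full
nms_exact <- grep("^L[1-8][12]$", cols, value = TRUE)
print(nms_exact)
```

Whether the anchors are needed depends on the other column names actually present.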

Rui Barradas

On 20-07-2012 00:33, Lib Gray wrote:
#
Hi!

# toy data   

toyData <- data.frame(x = 1:4, y = 5:8, xy = 9:12, z = 13:16)    
vars <- c("x", "z")      
    
# "pattern" is an argument of grep      
    
args(grep)      
    
# "pattern" must only consist of a single element     
# otherwise only the first element is used      
    
grep(pattern = vars, x = names(toyData))       
    
# one way to do this - a loop     
# create a vector to collect the output of each call    
     
toyColIndexList <- vector(length = length(vars), mode = "list")    
    
# grep each element in turn     
    
for (i in seq_along(vars)) {      
    toyColIndexList[[i]] <- grep(pattern = vars[i], x = names(toyData))     
}      
     
# combine all of the answers     
     
toyColIndex <- unlist(toyColIndexList)     
    
# remove duplicated columns if present    
    
toyColIndex <- toyColIndex[!duplicated(toyColIndex)]     
     
# select the elements we want    
    
toyData[, toyColIndex]     

      
# alternatively we could use a single regular expression with alternation
# (note that "x" also matches the column "xy")

grep(pattern = "x|z", x = names(toyData))
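The alternation pattern does not have to be typed by hand. A sketch, reusing the toy data from this message, that builds it from `vars` and contrasts partial with whole-name matching:

```r
toyData <- data.frame(x = 1:4, y = 5:8, xy = 9:12, z = 13:16)
vars <- c("x", "z")

# Collapse the vector into a single alternation pattern: "x|z"
pattern <- paste(vars, collapse = "|")
partial <- grep(pattern, names(toyData))   # matches x, xy and z

# Anchor the alternation to match whole names only: "^(x|z)$"
exact <- grep(paste0("^(", pattern, ")$"), names(toyData))

# For exact names, plain indexing avoids regular expressions entirely
toyData[, vars]
```

This assumes the entries of `vars` contain no regular-expression metacharacters; if they might, matching with `%in%` is the safer choice.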
     
# hope this helps

Best wishes

Chris

Chris Campbell
Mango Solutions
Data Analysis that Delivers
http://www.mango-solutions.com
+44 (0) 1249 705 450  


-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Lib Gray
Sent: 20 July 2012 01:17
To: Rui Barradas
Cc: r-help
Subject: Re: [R] Subsetting problem data, 2

I'm still getting the message (if this is what you were suggesting I try).
The data set I'm using has many more columns other than these variables; could that be a problem? I didn't think it would affect it.
Warning message:
In grep(vars, names(data)) :
  argument 'pattern' has length > 1 and only the first element will be used

        
On Thu, Jul 19, 2012 at 6:55 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote:

            
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

#
Hi,

Just a question regarding the dataset.

Suppose I include two more patients, F and G, with different missing values, as in this new dataset, and run the code.
dat1 <- read.table(text="
Patient Cycle V1 V2 V3 V4 V5
A 1 0.4 0.1 0.5 1.5 NA
A 2 0.3 0.2 0.5 1.6 NA
A 3 0.3 NA 0.6 1.7 NA
A 4 0.4 NA 0.4 1.8 NA
A 5 0.5 0.2 0.5 1.5 NA
B 1 0.4 NA NA NA NA
B 2 0.4 NA NA NA NA
C 1 0.9 0.9 0.9 NA NA
C 3 0.3 0.5 0.6 NA NA
C 4 NA NA NA NA NA
C 5 0.4 NA NA NA NA
D 1 0.2 0.5 NA NA NA
D 2 0.5 0.7 NA NA NA
D 4 0.6 0.4 NA NA NA
D 5 0.5 0.5 NA NA NA
E 1 0.1 NA NA NA NA
E 2 0.5 0.3 NA NA NA
E 3 0.4 0.3 NA NA NA
F 1 0.2 NA 0.2 0.5 0.1
F 2 0.5 NA 0.4 NA 0.3
F 3 0.6 NA NA 0.3 0.2
G 1 0.2 0.5 NA 0.5 0.2
G 3 0.4 0.3 0.4 NA 0.3
G 4 0.6 0.2 0.2 0.4 NA
", sep="", header=TRUE)


nms <- names(dat1)[grep("^V[1-9]$", names(dat1))]
dd <- split(dat1, dat1$Patient)
fun <- function(x) any(is.na(x)) && any(!is.na(x))
ix <- sapply(dd, function(x) Reduce(`|`, lapply(x[, nms], fun)))

dd[ix]
do.call(rbind, dd[ix])
     Patient Cycle  V1  V2  V3  V4  V5
A.1        A     1 0.4 0.1 0.5 1.5  NA
A.2        A     2 0.3 0.2 0.5 1.6  NA
A.3        A     3 0.3  NA 0.6 1.7  NA
A.4        A     4 0.4  NA 0.4 1.8  NA
A.5        A     5 0.5 0.2 0.5 1.5  NA
C.8        C     1 0.9 0.9 0.9  NA  NA
C.9        C     3 0.3 0.5 0.6  NA  NA
C.10       C     4  NA  NA  NA  NA  NA
C.11       C     5 0.4  NA  NA  NA  NA
E.16       E     1 0.1  NA  NA  NA  NA
E.17       E     2 0.5 0.3  NA  NA  NA
E.18       E     3 0.4 0.3  NA  NA  NA
F.19       F     1 0.2  NA 0.2 0.5 0.1
F.20       F     2 0.5  NA 0.4  NA 0.3
F.21       F     3 0.6  NA  NA 0.3 0.2
G.22       G     1 0.2 0.5  NA 0.5 0.2
G.23       G     3 0.4 0.3 0.4  NA 0.3
G.24       G     4 0.6 0.2 0.2 0.4  NA



Then patients F and G are included in the list. But according to your initial statement, V1 and V2 are the most important variables. If B is not included in the list because it has missing values in both of its cycles, do you think F or G should be included? The only difference is that F and G have missing values in other variables, which do not behave consistently. Do you have situations like that?
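To make the question concrete, here is a sketch of the same check restricted to V1 and V2. Limiting the test to these two columns is only an assumption for illustration (the name `key` is introduced here), not something the original poster has confirmed.

```r
dat1 <- read.table(text = "
Patient Cycle V1 V2 V3 V4 V5
A 1 0.4 0.1 0.5 1.5 NA
A 2 0.3 0.2 0.5 1.6 NA
A 3 0.3 NA 0.6 1.7 NA
A 4 0.4 NA 0.4 1.8 NA
A 5 0.5 0.2 0.5 1.5 NA
B 1 0.4 NA NA NA NA
B 2 0.4 NA NA NA NA
C 1 0.9 0.9 0.9 NA NA
C 3 0.3 0.5 0.6 NA NA
C 4 NA NA NA NA NA
C 5 0.4 NA NA NA NA
D 1 0.2 0.5 NA NA NA
D 2 0.5 0.7 NA NA NA
D 4 0.6 0.4 NA NA NA
D 5 0.5 0.5 NA NA NA
E 1 0.1 NA NA NA NA
E 2 0.5 0.3 NA NA NA
E 3 0.4 0.3 NA NA NA
F 1 0.2 NA 0.2 0.5 0.1
F 2 0.5 NA 0.4 NA 0.3
F 3 0.6 NA NA 0.3 0.2
G 1 0.2 0.5 NA 0.5 0.2
G 3 0.4 0.3 0.4 NA 0.3
G 4 0.6 0.2 0.2 0.4 NA
", header = TRUE)

# Assumption: only these two variables decide whether a patient is flagged
key <- c("V1", "V2")

# TRUE when a column has both missing and non-missing values
mixed <- function(x) any(is.na(x)) && any(!is.na(x))

dd <- split(dat1, dat1$Patient)
ix <- sapply(dd, function(d) any(sapply(d[key], mixed)))

flagged <- names(dd)[ix]
flagged
do.call(rbind, dd[ix])
```

Under this assumption F and G drop out: F has V2 missing in every cycle and G has V1 and V2 complete, so neither variable is inconsistent for them.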

A.K.







