File checking problem
2009/3/5 ling ling <metal_licaling at live.com>:
Dear all,
I am new to R programming and have run into the following problem:
I have a lot of .txt files in my directory.
First, I check whether each file meets any of these conditions:
1. it is empty;
2. its "Rep" column lacks "useractivity_idle" or lacks "useractivity_act";
3. even if "Rep" has both of them, the number of "useractivity_idle" entries == the number of "useractivity_act" entries == 1.
If a file meets any of these conditions, I want to skip it and go on to read the next .txt file.
My code is:
name<-list.files(path = ".", pattern = NULL, all.files = FALSE,
           full.names = FALSE, recursive = FALSE,
           ignore.case = FALSE)
for(k in 1:length(name)){
  log1<-read.table(name[k],header=TRUE,stringsAsFactors=FALSE)
  x<-which(log1$Rep=="useractivity_act")
  y<-which(log1$Rep=="useractivity_idle")
  while(all(log1$Rep!="useractivity_act")||all(log1$Rep!="useractivity_idle")||(length(x)==1
        && length(y)==1)||(file.info(name[k])$size== 0)){
    k=k+1
    log1<-read.table(name[k],header=TRUE,stringsAsFactors=FALSE)
  }
  ........
}
But I always get the following error:
Error in file(file, "r") : cannot open the connection
In addition: Warning message:
In file(file, "r") : cannot open file 'NA': No such file or directory
I have been working on this for a long time; any help would be appreciated. Thanks a lot!
You are trying to read one more file than you have! Simplified, your
code looks like this:
name = list.files(...)
for(k in 1:length(name)){
  log1 = read.table(name[k],....)
  while(something){
    k = k + 1
    log1 = read.table(name[k],...) # 1
  }
}
What happens is that once the last file has been read at #1 and still
meets a skip condition, the while loop goes round again, k becomes
greater than length(name), name[k] is NA, and read.table fails at #1
with the "cannot open file 'NA'" error.
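Both effects can be seen in isolation with a few made-up file names: indexing a character vector past its end gives NA (which read.table() then tries to open as a file called 'NA'), and assigning to k inside a for loop does not change which value k takes on the next iteration.

```r
## Hypothetical file names, just for illustration
name <- c("a.txt", "b.txt", "c.txt")

## One past the end: R returns NA rather than an error,
## so read.table(name[k]) ends up trying to open a file called 'NA'
outOfRange <- name[length(name) + 1]
print(outOfRange)

## Bumping k inside a for loop does not skip anything:
## each iteration sets k to the next value of 1:length(name)
visited <- integer(0)
for(k in 1:length(name)){
  visited <- c(visited, k)
  k <- k + 1
}
print(visited)
```

So the k = k + 1 inside the while loop only affects the file read inside that while loop; the outer for loop still visits every index, which is one more reason a single loop with an 'if' is simpler.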
I think you've overcomplicated it. You just need one loop with an
'if' in it. I'd write it as:
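processFiles = function(){
  ## only pick up the .txt files in the current directory
  name <- list.files(path = ".", pattern = "\\.txt$", all.files = FALSE,
                     full.names = FALSE, recursive = FALSE,
                     ignore.case = FALSE)
  ## seq_along() does nothing if no files match, unlike 1:length(name)
  for(k in seq_along(name)){
    ## read.table() fails on a zero-byte file, so check the size first
    if(file.info(name[k])$size == 0){
      cat("Skipping empty file ", name[k], "\n")
      next
    }
    log1 <- read.table(name[k], header=TRUE, stringsAsFactors=FALSE)
    if(testCondition(log1)){
      cat("Processing ", name[k], "\n")
      processLog(log1)
    }else{
      cat("Skipping ", name[k], "\n")
    }
  }
}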
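Condition 1 in the question (empty files) has to be handled before read.table() is called, because read.table() stops with an error on a zero-byte file instead of returning an empty data frame. A minimal sketch, using throw-away files with made-up names in tempdir(), showing that error and a vectorised size check with file.info():

```r
## Work in a throw-away directory with made-up file names
dir <- file.path(tempdir(), "logs_demo")
dir.create(dir, showWarnings = FALSE)
writeLines(c("Time Rep", "1 useractivity_act"), file.path(dir, "full.txt"))
invisible(file.create(file.path(dir, "empty.txt")))   # zero-byte file

files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)

## read.table() signals an error on the empty file
failed <- tryCatch({
  read.table(file.path(dir, "empty.txt"), header = TRUE)
  FALSE
}, error = function(e) TRUE)

## file.info() is vectorised, so all sizes can be checked up front
sizes <- file.info(files)$size
nonEmpty <- files[sizes > 0]
print(basename(nonEmpty))
```

Filtering the file list once like this is an alternative to testing each file's size inside the loop; either way the empty files never reach read.table().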
Then you need two more functions, testCondition and processLog.
testCondition takes a data frame and decides whether you want to
process it or not. I'm not sure I've got the test logic right here,
but you should get the idea:
`testCondition` <-
function(log1){
  ## test for Rep column:
  if(!any(names(log1)=="Rep"))return(FALSE)
  ## count active/idle entries (ignoring any NA values)
  nAct = sum(log1$Rep == "useractivity_act", na.rm = TRUE)
  nIdle = sum(log1$Rep == "useractivity_idle", na.rm = TRUE)
  ## if we have no active or no idle entries, return FALSE
  if(nAct == 0 || nIdle == 0)return(FALSE)
  ## if we have exactly one of each, return FALSE
  if(nAct == 1 && nIdle == 1) return(FALSE)
  ## maybe some other tests here?
  return(TRUE)
}
Here is a simple processLog function that just prints the summary of
the data frame. Put whatever you want in here:
`processLog` <-
function(log1){
  ## for example:
  print(summary(log1))
}
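As one hypothetical example of what could go in processLog (not from the original post), it could tabulate how often each Rep value occurs with table(). The data frame below is made up for illustration:

```r
## Hypothetical processLog: count each Rep value in one log
processLog <- function(log1){
  counts <- table(log1$Rep)
  print(counts)
  invisible(counts)   # also return the counts for later use
}

## Small made-up log to try it on
log1 <- data.frame(
  Time = 1:5,
  Rep = c("useractivity_act", "useractivity_idle", "useractivity_act",
          "other", "useractivity_idle"),
  stringsAsFactors = FALSE
)
counts <- processLog(log1)
```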
How's that? Note the use of comments, and the way the code is broken
up into small, independent, testable functions.
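Small functions like these can be checked in isolation before running them over a whole directory. A sketch of a few checks, assuming the skip rules from the original question (skip when act or idle is missing, or when there is exactly one of each); testCondition is repeated here so the block runs on its own, and makeLog is a hypothetical helper for building test data:

```r
## testCondition, written to follow the skip rules in the question:
## keep a log only if it has both act and idle entries,
## and not exactly one of each
testCondition <- function(log1){
  if(!("Rep" %in% names(log1))) return(FALSE)
  nAct  <- sum(log1$Rep == "useractivity_act", na.rm = TRUE)
  nIdle <- sum(log1$Rep == "useractivity_idle", na.rm = TRUE)
  if(nAct == 0 || nIdle == 0) return(FALSE)
  if(nAct == 1 && nIdle == 1) return(FALSE)
  TRUE
}

## Hypothetical helper: build a made-up log from a vector of Rep values
makeLog <- function(rep) data.frame(Rep = rep, stringsAsFactors = FALSE)

noRep     <- testCondition(data.frame(Other = 1:3))
onlyAct   <- testCondition(makeLog(c("useractivity_act", "useractivity_act")))
oneOfEach <- testCondition(makeLog(c("useractivity_act", "useractivity_idle")))
enough    <- testCondition(makeLog(c("useractivity_act", "useractivity_idle",
                                     "useractivity_act")))
```

Each case exercises one branch, so if the rules change, only testCondition and these checks need updating.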
Barry