
Saving misclassified records into dataframe within a loop

10 messages · John Dennison, Phil Spector, David Winsemius +1 more

#
John -
    In your example, the misclassified observations (as defined by
your predict.function) will be

   kyphosis[kyphosis$Kyphosis == 'absent' & prediction[,1] != 1,]

so you could start from there.
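
A minimal sketch of that kind of subsetting, using a small made-up data frame and probability matrix in place of kyphosis and prediction (toy and toy_pred are hypothetical names, and the values are invented for illustration). Note that the expression above picks out only the observed 'absent' cases that were not predicted absent; the later subsets in the sketch cover the other direction and both directions together.

```r
# Toy stand-ins for 'kyphosis' and 'prediction' (values invented for illustration)
toy <- data.frame(
  Kyphosis = factor(c("absent", "present", "absent", "present", "absent")),
  Age      = c(71, 158, 128, 2, 1)
)
# Column 1 plays the role of the predicted probability of "absent"
toy_pred <- cbind(absent  = c(1, 1, 0, 0, 1),
                  present = c(0, 0, 1, 1, 0))

# Observed "absent" but not predicted "absent"
missed_absent <- toy[toy$Kyphosis == "absent" & toy_pred[, 1] != 1, ]

# Observed "present" but predicted "absent"
missed_present <- toy[toy$Kyphosis == "present" & toy_pred[, 1] == 1, ]

# Both kinds of misclassification at once
misclassified <- toy[(toy$Kyphosis == "absent") != (toy_pred[, 1] == 1), ]
```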
 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu
On Thu, 12 May 2011, John Dennison wrote:

            
#
On May 12, 2011, at 5:41 PM, John Dennison wrote:

            
Are we supposed to know where to find 'testing' (and if we cannot
find it, how is the R interpreter going to find it)?
David Winsemius, MD
West Hartford, CT
#
On May 12, 2011, at 6:26 PM, John Dennison wrote:

            
I think your next task is figuring out whether this expression, which
you have not explained at all, is really doing what you intend:

((kyphosis$Kyphosis[i]=="absent")==(prediction[i,1]==1)) == 0

I would have guessed that you might be intending:

kyphosis$Kyphosis[i]=="absent" & prediction[i,1]==1

Since it will hold about half the time:

 > sum(kyphosis$Kyphosis[1:81]=="absent" & prediction[1:81,1]==1)
[1] 41
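
One way to check what a condition like this is actually counting is to cross-tabulate the observed class against the prediction. A sketch with invented vectors standing in for kyphosis$Kyphosis and prediction[,1] (observed and p_absent are hypothetical names, not objects from the original post):

```r
# Invented stand-ins for kyphosis$Kyphosis and prediction[, 1]
observed <- factor(c("absent", "absent", "present", "present", "absent", "present"))
p_absent <- c(1, 0, 1, 0, 1, 0)

# Rows: observed class; columns: whether the prediction equals 1
tab <- table(observed = observed, predicted_absent = p_absent == 1)
print(tab)

# The '&' condition counts only the observed-absent, predicted-absent cell
n_both <- sum(observed == "absent" & p_absent == 1)

# Disagreements are the two off-diagonal cells
n_wrong <- sum((observed == "absent") != (p_absent == 1))
```

The '&' form counts agreements on 'absent', while the '!=' form counts the cases where observation and prediction disagree, so the two answer different questions.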
David Winsemius, MD
West Hartford, CT
#
Your question concerned how to return data from a function.
It looks like you are using the following idiom
to save the data a function generates:
  f <- function() {
     result <- ... some calculations ...
     save(result, file="result.Rdata")
  }
  load("result.Rdata")
  ... now you will find a dataset called "result" ...
The save call stores f's local dataset called 'result' in
a file and the load call loads the data from the file into
a dataset also called result but in a different frame
(the frame of the caller of f, not f's frame).

Don't use save() and load() for this sort of thing.
It will mystify people reading your code and make the
code difficult to reuse.

Instead return the value of f's result from f and
use the assignment operator when calling f to store
that return value in the caller's frame:
  f <- function() {
     fResult <- ... some calculations ...
     fResult # the return value of f
  }
  result <- f()
When f is finished all variables in it disappear and its
return value is passed back to its caller, who can name it or
use it directly in another function call.
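
As a concrete sketch of this pattern, the function below takes the data and the predictions as arguments and returns the misclassified rows. The name get_misclassified and the toy inputs are made up for illustration; they are not from the original post.

```r
# Hypothetical helper: returns rows where observed and predicted classes disagree
get_misclassified <- function(data, pred) {
  wrong <- (data$Kyphosis == "absent") != (pred[, 1] == 1)
  data[wrong, ]   # last expression evaluated is the return value
}

# Invented inputs standing in for 'kyphosis' and 'prediction'
toy <- data.frame(Kyphosis = factor(c("absent", "present", "absent")),
                  Age = c(71, 158, 128))
toy_pred <- cbind(absent = c(1, 1, 0), present = c(0, 0, 1))

# The caller names the result; nothing is written to disk
result <- get_misclassified(toy, toy_pred)
```

Passing the data in as arguments also means the function no longer depends on objects that happen to exist in the caller's workspace.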

You didn't ask about the following, but the code
  results <- as.data.frame(1)
  j <- 0
  for (i in 1:length(kyphosis$Kyphosis)) {
    if (((kyphosis$Kyphosis[i]=="absent")==(prediction[i,1]==1)) == 0 ){
      j <- j+1
      results[j,] <- row.names(kyphosis[c(i),])
    }
  }
may be written without the for loop as
  isMisclassified <- ((kyphosis$Kyphosis=="absent") == (prediction[,1]==1)) == 0
  results <- data.frame("1" = rownames(kyphosis)[isMisclassified],
                        check.names=FALSE, stringsAsFactors=FALSE)
Note that the isMisclassified<- line is your line with
the subscripts 'i' taken out, as we want to evaluate the condition for
all i.
I find the intent of that easier to understand than that
of the code in the for loop.

I don't know why you want 'results' to be a data.frame instead
of a simple character vector; the expression
  rownames(kyphosis)[isMisclassified]
would give you that.

Also, since 'i' is an integer,
  c(i)
is just a long-winded way of saying
  i

The test
  logicalValue == 0
really ought to have the same type of data on both sides
of the ==, as in
  logicalValue == FALSE
or, even better in this case,
  !logicalValue # bang means not
or, since logicalValue is x==y you could replace !(x==y) with
  x != y
so the following is equivalent to what you wrote
  isMisclassified <- (kyphosis$Kyphosis=="absent") != (prediction[,1]==1)
(and, in my opinion, the latter is easier to understand).
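
A small sketch to confirm that these forms agree, using invented logical vectors that cover all four combinations (x and y stand in for the two comparisons; v1 to v4 are names made up here):

```r
# Invented logical inputs covering all four combinations
x <- c(TRUE, TRUE, FALSE, FALSE)   # e.g. observed == "absent"
y <- c(TRUE, FALSE, TRUE, FALSE)   # e.g. predicted == "absent"

v1 <- (x == y) == 0       # original form: logical compared with a number
v2 <- (x == y) == FALSE   # same type on both sides
v3 <- !(x == y)           # negation
v4 <- x != y              # direct inequality

# All four give the same logical vector
same <- identical(v1, v2) && identical(v2, v3) && identical(v3, v4)
```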

Finally, you defined a function of one argument, x, and didn't
use the argument.  Functions don't need arguments,
   f <- function() {
      ....
   }
would do just as well.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
#
On May 12, 2011, at 6:49 PM, John Dennison wrote:

            
23 cases out of 81
It now will run. It just won't populate a dataframe because you
initialized it with one column. Try instead:

results<-data.frame(Kyphosis=NA, Age=NA, Number=NA, Start=NA)

You never reference 'x' so just leave it out.

The place where you use kyphosis[ c(i), ] is a bit ugly. You can just  
use kyphosis[ i, ]

And don't put the row.names in results... put the whole row if that is  
what you want.

#create output data.frame
results<-data.frame(Kyphosis=NA, Age=NA, Number=NA, Start=NA)

#misclassification index function

predict.function <- function() {
  j <- 0
  for (i in 1:length(kyphosis$Kyphosis)) {
    if (((kyphosis$Kyphosis[i]=="absent")==(prediction[i,1]==1)) == 0) {
      j <- j+1
      results[j,] <- kyphosis[i,]
      print(kyphosis[i,])
    }
  }
  print(results)
  save(results, file="results")
}

predict.function()
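
A sketch of an alternative way to set up 'results', assuming the goal is simply a data frame holding the misclassified rows: a zero-row frame with the right columns can be taken from the data itself with [0, ], and whole rows appended with rbind(). The toy data and the names toy and toy_pred are invented for illustration.

```r
# Invented stand-ins for 'kyphosis' and 'prediction'
toy <- data.frame(Kyphosis = factor(c("absent", "present", "absent", "present")),
                  Age = c(71, 158, 128, 2),
                  Number = c(3, 3, 4, 5),
                  Start = c(5, 14, 5, 1))
toy_pred <- cbind(absent = c(1, 1, 0, 0), present = c(0, 0, 1, 1))

# Zero-row frame with the same columns and column types as the data
results <- toy[0, ]

for (i in seq_len(nrow(toy))) {
  if ((toy$Kyphosis[i] == "absent") != (toy_pred[i, 1] == 1)) {
    results <- rbind(results, toy[i, ])   # append the whole row
  }
}

# Same rows selected without a loop
results_vec <- toy[(toy$Kyphosis == "absent") != (toy_pred[, 1] == 1), ]
```

Growing a data frame with rbind() inside a loop is fine for 81 rows but gets slow for large data; the logical-index form on the last line avoids that.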
David Winsemius, MD
West Hartford, CT