Why does matrix selection behave differently when using which? - R-help

Mon, Dec 17, 2012 11:22 AM #

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20121217/a0689358/attachment.pl>

Berend Hasselman

Mon, Dec 17, 2012 11:39 AM #

On 17-12-2012, at 20:22, Asis Hallab wrote:

And how are we supposed to call sumRows()?

sumRows(???, ???)

Berend

David Winsemius

Mon, Dec 17, 2012 12:00 PM #

On Dec 17, 2012, at 11:22 AM, Asis Hallab wrote:

What part of "minimal" example are you having difficulty understanding? That zip file expands to a 1.8 MB file!

Since it has a header line, you will be creating all factors and it's doubtful you are getting what you want.

Instead:

 t <- read.table("test.tbl", header=TRUE)

'ps'? What is ps????

I suspect that it is not `which` that is the problem, but rather your understanding of how `if` processes vectors. (This should also be simplified greatly to avoid stepping through vectors one element at a time.)
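As a rough illustration of that point (the values in `x` are made up, not taken from the posted data): `if` expects a single TRUE or FALSE. An NA condition is an error, and a condition longer than one element is, depending on the R version, either reduced to its first element with a warning or rejected outright. Working on the whole vector at once avoids both issues.

```r
# Hypothetical values, for illustration only.
x <- c(0.2, NA, 0.5)

# if() needs exactly one TRUE or FALSE. Here the condition is NA,
# so if() stops with an error, which tryCatch() turns into a string.
res <- tryCatch(
  if (x[2] == 0.5) "match" else "no match",
  error = function(e) "error"
)

# Vectorised alternative: compare the whole vector in one step and
# count, instead of stepping through the elements with if().
n_match <- sum(x == 0.5, na.rm = TRUE)  # elements equal to 0.5
n_na    <- sum(is.na(x))                # missing elements
```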

You didn't do anything with that result!

That value will not depend in any manner on what preceded it. ???? It will simply be the number of rows in the local copy of "t".

Your goal is _only_ to get a count?

Why not just this:

 sum( tbl[!is.na(tbl$Domain.Architecture.Distance), "Domain.Architecture.Distance" ] == x )

E.g.:

[1] 3440
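A small sketch of why the `!is.na()` filter in that one-liner matters, using a made-up data frame (only the column name comes from the thread; the values and the choice of `x` are assumptions):

```r
# Toy data, not the poster's file.
tbl <- data.frame(Domain.Architecture.Distance = c(1, 0.5, NA, 1, 0.25))
x <- 1

d <- tbl$Domain.Architecture.Distance

# Without the filter, a single NA in the comparison makes the sum NA.
unguarded <- sum(d == x)

# Dropping the NA rows first gives a plain count.
guarded <- sum(tbl[!is.na(d), "Domain.Architecture.Distance"] == x)

# Letting sum() skip the NAs gives the same count.
guarded_narm <- sum(d == x, na.rm = TRUE)
```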

You should probably be creating a factor variable with `cut` to create reasonable intervals for grouping, and if you do not know this it suggests you need to do more study of the text or introductory materials. To get a quick look at the distribution this is useful:

plot( density(tbl[!is.na(tbl$Domain.Architecture.Distance), "Domain.Architecture.Distance" ] ))

(125 KB file so not attached)

(0,0.1] (0.1,0.2] (0.2,0.3] (0.3,0.4] (0.4,0.5] (0.5,0.6] (0.6,0.7] (0.7,0.8] (0.8,0.9]   (0.9,1] 
      616      1864       328       103       923      1763      1151      2490      3709     38563
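The call that produced this table was not preserved. One way to get a table of this shape, shown here on made-up values and assuming breaks every 0.1 between 0 and 1, is `cut` followed by `table`:

```r
# Hypothetical distances in [0, 1]; NA marks a missing value.
d <- c(0.05, 0.15, 0.95, 0.97, NA)

# cut() turns the numbers into a factor of half-open intervals (a, b].
bins <- cut(d, breaks = seq(0, 1, by = 0.1))

# table() counts the values per interval; NAs are left out by default.
counts <- table(bins)
print(counts)
```

With the default settings a value of exactly 0 falls outside the first interval and becomes NA; `include.lowest = TRUE` would keep it.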

The question ... as yet unanswered ....  is _how_ exactly are you calling that function. You posted a link to data "t" but there is no code that calls that function with the data. I do not see anything that would resemble a "ps"-object.

(See above.)

Please read the Posting Guide and learn to post in plain text.

David Winsemius
Alameda, CA, USA

Asis Hallab

Mon, Dec 17, 2012 12:03 PM #

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20121217/2656c2b2/attachment.pl>

Berend Hasselman

Mon, Dec 17, 2012 12:31 PM #

On 17-12-2012, at 21:03, Asis Hallab wrote:

Use this alternative sumRows

sumRows.1 <- function( tbl, ps ) {
 sum(
   sapply(ps,
     function(x) {
       t <- if ( is.na(x) ) {
         tbl[ is.na(tbl[ , "Domain.Architecture.Distance" ] ), , drop=FALSE]
       } else {
         # explicit check for NA, so that NA comparisons cannot select rows
         tbl[ !is.na(tbl[ , "Domain.Architecture.Distance" ]) & tbl[ , "Domain.Architecture.Distance" ] == x , , drop=FALSE]
       }
       nrow(t)
     }
   )
 )
}

z <- sort( unique( t[,"Domain.Architecture.Distance"] ) )

sumRows(t,z)
sumRows.1(t,z)

You must check with is.na() when not using which.
More insight can be gained by reading the help for Logical operators.
Try

?'!'

and read the bit about NA.
I'm too lazy to check if the modification with !is.na completely accounts for the difference between the which and the not which versions.
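A minimal sketch of the effect on a made-up table (the column name is from the thread; the `id` column and all values are assumptions). A logical index containing NA returns an extra all-NA row for each NA, whereas `which` keeps only the positions that are TRUE:

```r
# Toy data, not the poster's file.
tbl <- data.frame(
  Domain.Architecture.Distance = c(0.5, NA, 0.5, 0.9),
  id = 1:4
)
col <- tbl[, "Domain.Architecture.Distance"]

# The comparison yields TRUE, NA, TRUE, FALSE. Indexing rows with it
# returns the two matches plus one row filled with NA.
n_logical <- nrow(tbl[col == 0.5, , drop = FALSE])

# which() drops the NA and keeps only the TRUE positions.
n_which <- nrow(tbl[which(col == 0.5), , drop = FALSE])

# FALSE & NA is FALSE, so the explicit is.na() guard removes the NA too.
n_guard <- nrow(tbl[!is.na(col) & col == 0.5, , drop = FALSE])
```

On this toy input the guarded version and the `which` version agree; whether that holds for every case in the original function is not checked here.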

And please don't use t as an object name.

Berend