
counting columns that fulfill specific criteria

5 messages · pguilha, PIKAL Petr, Nick Sabbe

#
Hi,

I have a matrix (pwdiff in the example below) with ~480000 rows and 780
columns.
For each row, I want to get the percentage of columns that have an absolute
value above a certain threshold "t". I then want to allocate that percentage
to matrix 'perc' in the corresponding row. Below is my attempt at doing
this, but it does not work: I get 'replacement has length zero'. Any help
would be much appreciated!!

perc<-matrix(c(1:nrow(pwdiff)))
for (x in 1:nrow(pwdiff))
perc[x]<-(((ncol(pwdiff[,abs(pwdiff[x,]>=t)]))/ncol(pwdiff))*100)

I should add that my data has NAs in some rows and not others (but I do not
want to just ignore rows that have NAs)

Thanks!

Paul

--
View this message in context: http://r.789695.n4.nabble.com/counting-columns-that-fulfill-specific-criteria-tp3622265p3622265.html
Sent from the R help mailing list archive at Nabble.com.
#
Hi

Since running your code gives
Error in nrow(pwdiff) : object 'pwdiff' not found
we cannot check it directly. You could try

rowSums(pwdiff>=t, na.rm=T)/ncol(pwdiff)*100

or maybe

rowSums(abs(pwdiff)>=t, na.rm=T)/ncol(pwdiff)*100

but I can be completely wrong.

Regards
Petr
http://www.R-project.org/posting-guide.html
#
Hello Paul.

You could try something like
perc<-apply(pwdiff, 1, function(currow){
	mean(abs(currow) > t, na.rm=TRUE)*100
})

I haven't tested this, as you did not provide a sample pwdiff. You should
probably check ?apply for more info.

Two suggestions: probably best not to name any variable t, as this is also
the function for transposing a matrix, and could end up being confusing at
the least. Second: for most practical purposes, it's better to leave out the
*100.
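
A small sketch of the idea above, for illustration only. The matrix m and the name thr are made up here (they stand in for pwdiff and the threshold). It also shows one thing worth checking when rows contain NAs: mean(..., na.rm=TRUE) divides by the number of non-missing values in the row, while dividing a row count by ncol() divides by all columns.

```r
# Toy stand-in for pwdiff: 2 rows, 4 columns, one NA per row (made-up data)
m <- matrix(c(1, -5, NA, 3,
              0,  2,  7, NA), nrow = 2, byrow = TRUE)
thr <- 3  # hypothetical threshold, deliberately not named t

# Proportion of non-missing values in each row with absolute value above thr
prop_apply <- apply(m, 1, function(currow) {
  mean(abs(currow) > thr, na.rm = TRUE)
})

# Same quantity without an explicit loop over rows
prop_rowmeans <- rowMeans(abs(m) > thr, na.rm = TRUE)

# Alternative: count the hits, then divide by ALL columns (NAs included)
prop_allcols <- rowSums(abs(m) > thr, na.rm = TRUE) / ncol(m)

print(prop_apply)
print(prop_rowmeans)
print(prop_allcols)
```

Which denominator is appropriate depends on how the NAs should be treated. On a 480000 x 780 matrix the rowMeans form is likely to be noticeably faster than apply, though I have not timed it.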

Good luck,


Nick Sabbe
--
ping: nick.sabbe at ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36

-- Do Not Disapprove
#
Thanks for your reply, but that is not quite what I am looking for...I do not
want to add up all the values in the row, I want to get the number of
columns in each row that meet the criteria and then get that as a
percentage....

My understanding is that the rowSums function adds up the values, does it
not? I tried your code anyway and it did not work:

Error in abs(pwdiff) >= t : 
  comparison (5) is possible only for atomic and list types

and when specifying the columns
(perc[x]<-rowSums(pwdiff[,abs(pwdiff[x,])>=thr], na.rm=T)/ncol(pwdiff)), I
get the following error:

Error in rowSums(pwdiff[, abs(pwdiff[x, ]) >= thr], na.rm = T) : 
  'x' must be an array of at least two dimensions
In addition: There were 30 warnings (use warnings() to see them)
Warning messages:
1: In perc[x] <- rowSums(pwdiff[, abs(pwdiff[x, ]) >= thr],  ... :
  number of items to replace is not a multiple of replacement length
...


I've been trying to sort this out for the past three days and cannot get it
to work for some reason...I can do it SO easily in excel with a simple
macro, but doing that on a 480000x780 table inevitably crashes the
computer...

Any more help you can provide would be great, thanks!




--
View this message in context: http://r.789695.n4.nabble.com/counting-columns-that-fulfill-specific-criteria-tp3622265p3622711.html
2 days later
#
Hi

r-help-bounces at r-project.org wrote on 24.06.2011 16:51:27:

That is why you should provide some ***reproducible code***.

rowSums(USArrests>50)/ncol(USArrests)*100

gives no error, and the result tells you, for each row, the percentage of 
columns whose value is greater than 50. This may be what you want.

rowSums(USArrests>50)

simply tells you how many values in each row are greater than the specified 
threshold; dividing it by the total number of columns and multiplying by 100 
gives you the percentage.
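
To make the counting explicit, here is a minimal sketch on made-up data (the matrix m and the name thr are only for illustration). The comparison produces a logical matrix, and rowSums counts the TRUE values in each row rather than adding up the original numbers.

```r
# Made-up 2 x 4 matrix with some NAs, standing in for pwdiff
m <- matrix(c(1, -5, NA, 3,
              0,  2,  7, NA), nrow = 2, byrow = TRUE)
thr <- 3  # hypothetical threshold

# Logical matrix: TRUE where the absolute value reaches the threshold
hits <- abs(m) >= thr

# TRUE counts as 1 and FALSE as 0, so this is the number of columns
# per row that meet the criterion (NAs are skipped by na.rm = TRUE)
n_hits <- rowSums(hits, na.rm = TRUE)

# Percentage relative to the total number of columns
perc <- n_hits / ncol(m) * 100

print(hits)
print(n_hits)
print(perc)
```

Note that with na.rm = TRUE the NAs are left out of the count but are still included in ncol(m), so they lower the percentage.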

The error you got means that pwdiff is probably not a data frame or t is 
probably not a single number. Only you can know that.

For evaluating your objects you can try

str(pwdiff)
str(t)

Besides, t is a function for transposing data, so you should choose a 
different name for your constant.
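
One possible explanation of the comparison error, sketched on made-up data: if no variable called t has been assigned in the session, the name t refers to the transpose function, and comparing numbers with a function fails. The matrix m and the name thr below are assumptions for illustration, and the exact wording of the error message may differ between R versions.

```r
# Made-up 2 x 2 matrix standing in for pwdiff
m <- matrix(c(1, -5, 2, 4), nrow = 2)

# No variable t is defined here, so t is the transpose function;
# tryCatch keeps the script running and stores the error object
res <- tryCatch(abs(m) >= t, error = function(e) e)
is_error <- inherits(res, "error")
print(is_error)

# With a numeric threshold under a different name the comparison works
thr <- 3
ok <- abs(m) >= thr
print(rowSums(ok))

# str() shows the difference between the two objects
str(thr)
str(t)
```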

Regards
Petr