extra digits added to data

7 messages · Mark Harrison, Jim Holtman, Wolfgang Wu +2 more

#
FAQ 7.31

Sent from my iPad
On Oct 11, 2011, at 1:07, Mark Harrison <harrisonmark1 at gmail.com> wrote:

#
Thanks for the quick response.

Read the FAQ.  If I want to keep the values in R the same as when they were input, should I be converting the data to a different type, i.e. not numeric?



Sent from my iPhone
On Oct 11, 2011, at 4:46 AM, Jim Holtman <jholtman at gmail.com> wrote:

#
What are you going to do with the data?  If it is just for presentation, then keep it as character.  If you are going to compute on the data, then keep it as numeric.  Since you are using floating point, FAQ 7.31 reminds you that the data "is kept" as input to the best that can be done with 53 bits of precision.  You can always use 'round' or 'sprintf' for output if you want it to 'look' the same.  Read the paper pointed to by FAQ 7.31 for an in-depth understanding of what is happening.  The other solution is to find a package that works with decimal instead of binary; 'bc'?
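For example, a minimal sketch of the 'round'/'sprintf' options (the value `x` is illustrative, not from the original data):

```r
x <- 0.1 + 0.2            # binary floating point cannot store 0.3 exactly
print(x, digits = 17)     # shows the stored value: 0.30000000000000004
round(x, 2)               # numeric, rounded, for further computation
sprintf("%.2f", x)        # character "0.30", suitable for display only
```

Note that 'sprintf' returns character, so it is for presentation only, while 'round' stays numeric.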

Sent from my iPad
On Oct 11, 2011, at 11:57, Mark Harrison <harrisonmark1 at gmail.com> wrote:

#
I am having the following problem. I want to calculate the maximum of each row in a matrix. If I pass in the matrix split up by column then this is no problem and works great. However, I don't know how many columns I have in advance. In the example below I have 3 columns, but the number of columns is not fixed. So how do I do this?


matRandom <- matrix(runif(n=30), ncol=3)
# Does not work
pmax(matRandom)
# Does work
pmax(matRandom[,1], matRandom[,2], matRandom[,3])

I am aware that I can do it with the apply function, but the calculation is time-sensitive, so fast execution is important.

# Apply might be too slow
matRandom <- matrix(runif(n=300000), ncol=3)
system.time(test <- pmax(matRandom[,1], matRandom[,2], matRandom[,3]))
   user  system elapsed
   0.02    0.00    0.02
system.time(test <- apply(matRandom, 1, max))
   user  system elapsed
   2.37    0.00    2.38




Thanks for your help.

Regards.

Wolfgang Wu
#
Hi Wolfgang,

How about a loop?

matRandom <- matrix(runif(n=600000), ncol=6)

## variant 1
system.time(test1 <- pmax(matRandom[,1], matRandom[,2], matRandom[,3],
                           matRandom[,4], matRandom[,5], matRandom[,6]))

   user  system elapsed
   0.01    0.00    0.01


## variant 2
system.time(test2 <- apply(matRandom, 1, max))

   user  system elapsed
   0.56    0.00    0.56


## variant 3
system.time({
   test3 <- matRandom[ ,1L]
   ## add a check that ncol(matrix) > 1L
   for (i in 2:ncol(matRandom))
     test3 <- pmax(test3, matRandom[ ,i])

})
   user  system elapsed
   0.01    0.00    0.01



 > all.equal(test1,test2)
[1] TRUE

 > all.equal(test1,test3)
[1] TRUE
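The explicit loop in variant 3 can also be written as a Reduce() over the list of columns; a sketch (same result as the loop, timing not measured here):

```r
## fold pmax over the columns: pmax(pmax(col1, col2), col3), ...
## equivalent to the accumulating loop in variant 3
cols  <- lapply(seq_len(ncol(matRandom)), function(i) matRandom[, i])
test4 <- Reduce(pmax, cols)
all.equal(test3, test4)
```

Whether this is faster than the plain for loop would need to be benchmarked; it mainly saves the bookkeeping of the accumulator variable.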


Regards,
Enrico

On 12.10.2011 13:06, Wolfgang Wu wrote:

#
I think Enrico's solution is probably better overall and doesn't
require as much ugly behind-the-scenes trickery, but here's another
fun way that seems to run ever-so-marginally faster on my machine.

The vapply call is messy, but it seems to get the job done -- if it's
not clear, the point is to break matRandom into a list where each
element was previously one column in preparation for the do.call();
I'd welcome any insight into a slicker way to do so.

t0 <- system.time(matRandom <- matrix(runif(6000*3000),ncol=3000))
# I have to bump up columns to see any meaningful difference

## Enrico's
t1 <- system.time({ test1 <- matRandom[ ,1L];
 for (i in seq.int(2L, ncol(matRandom)))
   test1 <- pmax(test1, matRandom[ ,i])
})


## Mine
t2 <- system.time({
temp <- vapply(seq.int(ncol(matRandom)), function(i,x) list(x[,i]),
vector("list",1) , matRandom)
test2 <- do.call(pmax, temp)
})

identical(test1, test2)
[1] TRUE

t0
   user  system elapsed
   2.58    0.10    2.69

t1
   user  system elapsed
   1.63    0.00    1.63

t2
   user  system elapsed
   1.25    0.00    1.25

Michael

PS -- It makes me very happy that building matRandom is the slowest
step. All hail the mighty vectorization of R!
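[A slicker way to build that column list did arrive later: asplit() splits a matrix along a margin, but note it was only added in R 3.6.0, long after this 2011 thread, so its availability is an assumption here.]

```r
## asplit(m, 2) returns a list with one element per column (R >= 3.6.0),
## replacing the vapply construction above
matSmall <- matrix(runif(30), ncol = 3)
res <- do.call(pmax, asplit(matSmall, 2))
```

asplit returns 1-d arrays rather than bare vectors, so the result of pmax may carry a dim attribute; the values are the row maxima either way.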

On Wed, Oct 12, 2011 at 9:10 AM, Enrico Schumann
<enricoschumann at yahoo.de> wrote: