extra digits added to data
7 messages · Mark Harrison, Jim Holtman, Wolfgang Wu +2 more
FAQ 7.31 Sent from my iPad
On Oct 11, 2011, at 1:07, Mark Harrison <harrisonmark1 at gmail.com> wrote:
I am having a problem with extra digits being added to my data which I think
is a result of how I am converting my data.frame data to xts.
I see the same issue in R v2.13.1 and RStudio version 0.94.106.
I am loading historical foreign exchange data in via CSV files or from a SQL
Server database. In both cases there are no extra digits and the original
data looks like the following:
Date Open High Low Close
1 2001-01-03 1.5021 1.5094 1.4883 1.4898
2 2001-01-04 1.4897 1.5037 1.4882 1.5020
3 2001-01-05 1.5020 1.5074 1.4952 1.5016
4 2001-01-08 1.5035 1.5104 1.4931 1.4964
5 2001-01-09 1.4964 1.4978 1.4873 1.4887
6 2001-01-10 1.4887 1.4943 1.4856 1.4866
So for 2001-01-03 the Open value is 1.5021 with only 4 digits after the
decimal place - i.e. .5021.
I then proceed to do the following in R to convert the 'British pound' data
above from data.frame to xts:
require(quantmod)
rownames(gbp) <- gbp$Date
head(gbp)
Open High Low Close
2001-01-03 1.5021 1.5094 1.4883 1.4898
2001-01-04 1.4897 1.5037 1.4882 1.5020
2001-01-05 1.5020 1.5074 1.4952 1.5016
2001-01-08 1.5035 1.5104 1.4931 1.4964
2001-01-09 1.4964 1.4978 1.4873 1.4887
2001-01-10 1.4887 1.4943 1.4856 1.4866
gbp <- as.xts(gbp[, 2:5])
class(gbp)
[1] "xts" "zoo"
The data at this point looks OK until you look closer or output the data to
Excel, at which point you see the following for the 'Open' on 2001-01-03:
1.50209999084473
It is not just the above 'Open' or the first value: all the data points
contain the extra digits, which I think is the original date data and/or row
numbers being tacked on.
My problem is the extra digits being added, or whatever I am doing wrong in R
that causes them to be added. I need 1.5021 to be 1.5021 and not
1.50209999084473.
Thanks for the help.
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Thanks for the quick response. I read the FAQ. If I want to keep the values in R the same as when they were input, should I be converting the data to a different type - i.e. not numeric? Sent from my iPhone
On Oct 11, 2011, at 4:46 AM, Jim Holtman <jholtman at gmail.com> wrote:
FAQ 7.31 Sent from my iPad On Oct 11, 2011, at 1:07, Mark Harrison <harrisonmark1 at gmail.com> wrote: [original message quoted above]
What are you going to do with the data? If it is just for presentation, then keep it as character. If you are going to compute on the data, then keep it as numeric. Since you are using floating point, FAQ 7.31 reminds you that the data is kept as input to the best that can be done with 53 bits of precision. You can always use 'round' or 'sprintf' for output if you want it to 'look' the same. Read the paper pointed to by FAQ 7.31 for an in-depth understanding of what is happening. The other solution is to find a package that works with decimal instead of binary; 'bc'? Sent from my iPad
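A quick illustration of the point above (a sketch, not from the thread): 1.5021 has no exact binary floating-point representation, and round() / sprintf() only control how the stored value is displayed, not what is stored.

```r
## 1.5021 cannot be stored exactly in binary floating point;
## asking for more digits reveals the approximation R actually keeps
print(1.5021, digits = 17)

## round() and sprintf() control the *displayed* value
sprintf("%.4f", 1.5021)        # "1.5021"
round(1.50209999084473, 4)     # 1.5021
```

Note that this formatting only changes output; any subsequent arithmetic still uses the binary approximation.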
On Oct 11, 2011, at 11:57, Mark Harrison <harrisonmark1 at gmail.com> wrote: [messages quoted above]
I am having the following problem. I want to calculate the maximum of each row in a matrix. If I pass in the matrix split up by each column then this is no problem and works great. However, I don't know how many columns I have in advance. In the example below I have 3 columns, but the number of columns is not fixed. So how do I do this?
matRandom <- matrix(runif(n=30), ncol=3)
# Does not work
pmax(matRandom)
# Does work
pmax(matRandom[,1], matRandom[,2], matRandom[,3])
I am aware that I can do it with the apply function, but the calculation is time sensitive so fast execution is important.
# apply might be too slow
matRandom <- matrix(runif(n=300000), ncol=3)
system.time(test <- pmax(matRandom[,1], matRandom[,2], matRandom[,3]))
   user  system elapsed
   0.02    0.00    0.02
system.time(test <- apply(matRandom, 1, max))
   user  system elapsed
   2.37    0.00    2.38
Thanks for your help. Regards, Wolfgang Wu
Hi Wolfgang,
how about a loop?
matRandom <- matrix(runif(n=600000), ncol=6)
## variant 1
system.time(test1 <- pmax(matRandom[,1], matRandom[,2], matRandom[,3],
matRandom[,4], matRandom[,5], matRandom[,6]))
user system elapsed
0.01 0.00 0.01
## variant 2
system.time(test2 <- apply(matRandom, 1, max))
user system elapsed
0.56 0.00 0.56
## variant 3
system.time({
test3 <- matRandom[ ,1L]
## add a check that ncol(matrix) > 1L
for (i in 2:ncol(matRandom))
test3 <- pmax(test3, matRandom[ ,i])
})
user system elapsed
0.01 0.00 0.01
> all.equal(test1,test2)
[1] TRUE
> all.equal(test1,test3)
[1] TRUE
Regards,
Enrico
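For reference, variant 3 above can be wrapped into a small function that includes the ncol check the comment mentions (the name rowMax is made up here, not from the thread):

```r
## row-wise maximum by folding pmax() over the columns;
## 'rowMax' is a hypothetical helper name, not a base R function
rowMax <- function(m) {
  stopifnot(is.matrix(m), ncol(m) >= 1L)
  out <- m[, 1L]
  if (ncol(m) > 1L)
    for (i in 2:ncol(m))
      out <- pmax(out, m[, i])
  out
}

m <- matrix(runif(30), ncol = 3)
stopifnot(all.equal(rowMax(m), apply(m, 1, max)))
```

The guard also makes the 2:ncol(m) loop safe for a one-column matrix, where 2:1 would otherwise run backwards.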
On 12.10.2011 13:06, Wolfgang Wu wrote: [message quoted above]
Enrico Schumann Lucerne, Switzerland http://nmof.net/
I think Enrico's solution is probably better overall and doesn't
require as much ugly behind-the-scenes trickery, but here's another
fun way that seems to run ever-so-marginally faster on my machine.
The vapply call is messy, but it seems to get the job done -- if it's
not clear, the point is to break matRandom into a list where each
element was previously one column in preparation for the do.call();
I'd welcome any insight into a slicker way to do so.
t0 <- system.time(matRandom <- matrix(runif(6000*3000),ncol=3000))
# I have to bump up columns to see any meaningful difference
## Enrico's
t1 <- system.time({ test1 <- matRandom[ ,1L];
for (i in seq.int(2L, ncol(matRandom)))
test1 <- pmax(test1, matRandom[ ,i])
})
## Mine
t2 <- system.time({
temp <- vapply(seq.int(ncol(matRandom)), function(i,x) list(x[,i]),
vector("list",1) , matRandom)
test2 <- do.call(pmax, temp)
})
identical(test1, test2)
TRUE
t0
user system elapsed
2.58 0.10 2.69
t1
user system elapsed
1.63 0.00 1.63
t2
user system elapsed
1.25 0.00 1.25
Michael
PS -- It makes me very happy that building matRandom is the slowest
step. All hail the mighty vectorization of R!
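On the "slicker way" question: since a data frame is itself a list of columns, as.data.frame() can replace the vapply() step entirely (a sketch, not from the thread):

```r
set.seed(1)
matRandom <- matrix(runif(30), ncol = 3)

## a data.frame is a list of columns, so do.call() can hand them
## to pmax() as separate arguments
test <- do.call(pmax, as.data.frame(matRandom))

## agrees with the apply() version
stopifnot(all.equal(test, apply(matRandom, 1, max)))
```

The conversion does copy the data, so for very large matrices the in-place pmax() loop may still be preferable.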
On Wed, Oct 12, 2011 at 9:10 AM, Enrico Schumann <enricoschumann at yahoo.de> wrote: [message quoted above]