extra digits added to data
7 messages · Mark Harrison, Jim Holtman, Wolfgang Wu +2 more
FAQ 7.31 Sent from my iPad
On Oct 11, 2011, at 1:07, Mark Harrison <harrisonmark1 at gmail.com> wrote:
I am having a problem with extra digits being added to my data which I think
is a result of how I am converting my data.frame data to xts.
I see the same issue in R v2.13.1 and RStudio version 0.94.106.
I am loading historical foreign exchange data in via CSV files or from a SQL
Server database. In both cases there are no extra digits and the original
data looks like the following:
Date Open High Low Close
1 2001-01-03 1.5021 1.5094 1.4883 1.4898
2 2001-01-04 1.4897 1.5037 1.4882 1.5020
3 2001-01-05 1.5020 1.5074 1.4952 1.5016
4 2001-01-08 1.5035 1.5104 1.4931 1.4964
5 2001-01-09 1.4964 1.4978 1.4873 1.4887
6 2001-01-10 1.4887 1.4943 1.4856 1.4866
So for 2001-01-03 the Open value is 1.5021 with only 4 digits after the
decimal place - i.e. .5021.
I then proceed to do the following in R to convert the 'British pound' data
above from data.frame to xts:
require(quantmod)
rownames(gbp) <- gbp$Date
head(gbp)
Open High Low Close
2001-01-03 1.5021 1.5094 1.4883 1.4898
2001-01-04 1.4897 1.5037 1.4882 1.5020
2001-01-05 1.5020 1.5074 1.4952 1.5016
2001-01-08 1.5035 1.5104 1.4931 1.4964
2001-01-09 1.4964 1.4978 1.4873 1.4887
2001-01-10 1.4887 1.4943 1.4856 1.4866
gbp <- as.xts(gbp[, 2:5])
class(gbp)
[1] "xts" "zoo"
The data at this point looks OK until you look closer or output the data to
Excel, at which point you see the following for the 'Open' on 2001-01-03:
1.50209999084473
It is not just the above 'Open' or the first value: all the data points
contain the extra digits, which I think is the original date data and/or row
numbers being tacked on.
My problem is the extra digits being added, or whatever I am doing wrong in R
that causes them to be added. I need 1.5021 to be 1.5021 and not
1.50209999084473.
Thanks for the help.
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Thanks for the quick response. I read the FAQ. If I want to keep the values in R the same as when they were input, should I be converting the data to a different type - i.e. not numeric? Sent from my iPhone
On Oct 11, 2011, at 4:46 AM, Jim Holtman <jholtman at gmail.com> wrote:
FAQ 7.31 Sent from my iPad On Oct 11, 2011, at 1:07, Mark Harrison <harrisonmark1 at gmail.com> wrote: [original message quoted above]
What are you going to do with the data? If it is just for presentation, then keep it as character. If you are going to compute on the data, then keep it as numeric. Since you are using floating point, FAQ 7.31 reminds you that the data is kept as input to the best that can be done with 53 bits of precision. You can always use 'round' or 'sprintf' for output if you want it to 'look' the same. Read the paper pointed to by FAQ 7.31 for an in-depth understanding of what is happening. The other solution is to find a package that works with decimal instead of binary; 'bc'? Sent from my iPad
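A quick illustration of the point above (a sketch, not from the thread): 1.5021 has no exact binary floating-point representation, and round() / sprintf() only control how the stored value is displayed, not what is stored.

```r
## 1.5021 cannot be stored exactly in binary floating point;
## asking for more digits reveals the approximation R actually keeps
print(1.5021, digits = 17)

## round() and sprintf() control the *displayed* value
sprintf("%.4f", 1.5021)        # "1.5021"
round(1.50209999084473, 4)     # 1.5021
```

Note that this formatting only changes output; any subsequent arithmetic still uses the binary approximation.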
On Oct 11, 2011, at 11:57, Mark Harrison <harrisonmark1 at gmail.com> wrote: [messages quoted above]
I am having the following problem. I want to calculate the maximum of each row in a matrix. If I pass in the matrix split up by each column then this is no problem and works great. However, I don't know how many columns I have in advance. In the example below I have 3 columns, but the number of columns is not fixed. So how do I do this?
matRandom <- matrix(runif(n=30), ncol=3)
# Does not work
pmax(matRandom)
# Does work
pmax(matRandom[,1], matRandom[,2], matRandom[,3])
I am aware that I can do it with the apply function, but the calculation is time sensitive so fast execution is important.
# apply might be too slow
matRandom <- matrix(runif(n=300000), ncol=3)
system.time(test <- pmax(matRandom[,1], matRandom[,2], matRandom[,3]))
   user  system elapsed
   0.02    0.00    0.02
system.time(test <- apply(matRandom, 1, max))
   user  system elapsed
   2.37    0.00    2.38
Thanks for your help. Regards, Wolfgang Wu
Hi Wolfgang,
how about a loop?
matRandom <- matrix(runif(n=600000), ncol=6)
## variant 1
system.time(test1 <- pmax(matRandom[,1], matRandom[,2], matRandom[,3],
matRandom[,4], matRandom[,5], matRandom[,6]))
user system elapsed
0.01 0.00 0.01
## variant 2
system.time(test2 <- apply(matRandom, 1, max))
user system elapsed
0.56 0.00 0.56
## variant 3
system.time({
test3 <- matRandom[ ,1L]
## add a check that ncol(matrix) > 1L
for (i in 2:ncol(matRandom))
test3 <- pmax(test3, matRandom[ ,i])
})
user system elapsed
0.01 0.00 0.01
> all.equal(test1,test2)
[1] TRUE
> all.equal(test1,test3)
[1] TRUE
Regards,
Enrico
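For reference, variant 3 above can be wrapped into a small function that includes the ncol check the comment mentions (the name rowMax is made up here, not from the thread):

```r
## row-wise maximum by folding pmax() over the columns;
## 'rowMax' is a hypothetical helper name, not a base R function
rowMax <- function(m) {
  stopifnot(is.matrix(m), ncol(m) >= 1L)
  out <- m[, 1L]
  if (ncol(m) > 1L)
    for (i in 2:ncol(m))
      out <- pmax(out, m[, i])
  out
}

m <- matrix(runif(30), ncol = 3)
stopifnot(all.equal(rowMax(m), apply(m, 1, max)))
```

The guard also makes the 2:ncol(m) loop safe for a one-column matrix, where 2:1 would otherwise run backwards.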
On 12.10.2011 13:06, Wolfgang Wu wrote: [message quoted above]
Enrico Schumann Lucerne, Switzerland http://nmof.net/
I think Enrico's solution is probably better overall and doesn't
require as much ugly behind-the-scenes trickery, but here's another
fun way that seems to run ever-so-marginally faster on my machine.
The vapply call is messy, but it seems to get the job done -- if it's
not clear, the point is to break matRandom into a list where each
element was previously one column in preparation for the do.call();
I'd welcome any insight into a slicker way to do so.
t0 <- system.time(matRandom <- matrix(runif(6000*3000),ncol=3000))
# I have to bump up columns to see any meaningful difference
## Enrico's
t1 <- system.time({ test1 <- matRandom[ ,1L];
for (i in seq.int(2L, ncol(matRandom)))
test1 <- pmax(test1, matRandom[ ,i])
})
## Mine
t2 <- system.time({
temp <- vapply(seq.int(ncol(matRandom)), function(i,x) list(x[,i]),
vector("list",1) , matRandom)
test2 <- do.call(pmax, temp)
})
identical(test1, test2)
TRUE
t0
user system elapsed
2.58 0.10 2.69
t1
user system elapsed
1.63 0.00 1.63
t2
user system elapsed
1.25 0.00 1.25
Michael
PS -- It makes me very happy that building matRandom is the slowest
step. All hail the mighty vectorization of R!
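On the "slicker way" question: since a data frame is itself a list of columns, as.data.frame() can replace the vapply() step entirely (a sketch, not from the thread):

```r
set.seed(1)
matRandom <- matrix(runif(30), ncol = 3)

## a data.frame is a list of columns, so do.call() can hand them
## to pmax() as separate arguments
test <- do.call(pmax, as.data.frame(matRandom))

## agrees with the apply() version
stopifnot(all.equal(test, apply(matRandom, 1, max)))
```

The conversion does copy the data, so for very large matrices the in-place pmax() loop may still be preferable.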
On Wed, Oct 12, 2011 at 9:10 AM, Enrico Schumann <enricoschumann at yahoo.de> wrote: [message quoted above]