Both get.hist.quote and its derivative priceIts rely on download.file() to
fetch financial data series from Yahoo! in .csv format. They allow for nice
interactive demonstrations of what one can do with R.
Unfortunately, both are currently broken as Yahoo! decided to add a somewhat
useless HTML comment at the end of the CSV 'stream', breaking the regular
format of n rows with k columns. Here is an example for the S&P500 index
since the beginning of the month (to keep it compact):
Date,Open,High,Low,Close,Volume,Adj. Close*
23-Apr-04,1140.81,1141.75,1134.89,1140.60,1820460032,1140.60
22-Apr-04,1122.01,1142.53,1121.98,1139.93,2147280000,1139.93
21-Apr-04,1119.24,1125.66,1116.07,1124.09,1995879936,1124.09
20-Apr-04,1137.60,1139.27,1118.09,1118.15,1806850048,1118.15
19-Apr-04,1132.81,1136.17,1129.87,1135.82,1374380032,1135.82
16-Apr-04,1133.86,1136.75,1126.92,1134.61,1723180032,1134.61
15-Apr-04,1130.45,1133.72,1120.85,1128.84,1895289984,1128.84
14-Apr-04,1122.44,1132.47,1122.33,1128.17,1682800000,1128.17
13-Apr-04,1145.20,1147.73,1127.72,1129.44,1616720000,1129.44
12-Apr-04,1141.98,1147.24,1139.32,1145.20,1194080000,1145.20
9-Apr-04,1149.73,1139.32,1139.32,1139.32,0,1139.32
8-Apr-04,1140.53,1148.91,1134.54,1139.32,1435520000,1139.32
7-Apr-04,1146.25,1148.16,1138.48,1140.53,1658200064,1140.53
6-Apr-04,1144.26,1150.57,1143.35,1148.16,1551449984,1148.16
5-Apr-04,1141.81,1150.57,1141.63,1150.57,1614749952,1150.57
2-Apr-04,1144.15,1144.73,1132.17,1141.81,2134489984,1141.81
1-Apr-04,1128.14,1135.53,1126.21,1132.17,1765560064,1132.17
<!-- chart2.finance.scd.yahoo.com uncompressed Sat Apr 24 15:27:40 PDT 2004 -->
Is there an _elegant and portable_ way of reading this with the last line?
I needed this, and used the somewhat clunky
data <- read.csv(destfile)
unlink(destfile)
data <- data[-(nlines-1),] # skip very last line with comment
which uses nlines, which had already been computed (and has an offset of one
because of the header line).
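For readers who want to try the positional approach offline, here is a minimal sketch. It assumes the comment is always the very last line; the three sample lines are copied from the listing above, and the temporary file stands in for the real download. Using nrow() in place of a precomputed nlines is a substitution for illustration, not what the packages do.

```r
# Sample of the Yahoo! output, including the trailing HTML comment.
csv_lines <- c(
  "Date,Open,High,Low,Close,Volume,Adj. Close*",
  "23-Apr-04,1140.81,1141.75,1134.89,1140.60,1820460032,1140.60",
  "22-Apr-04,1122.01,1142.53,1121.98,1139.93,2147280000,1139.93",
  "<!-- chart2.finance.scd.yahoo.com uncompressed Sat Apr 24 15:27:40 PDT 2004 -->"
)

# Stand-in for the file that download.file() would have written.
destfile <- tempfile(fileext = ".csv")
writeLines(csv_lines, destfile)

data <- read.csv(destfile)
unlink(destfile)

# Drop the trailing row produced by the comment line.
# Assumption: the comment is always the last line of the file.
data <- data[-nrow(data), ]
```

This avoids carrying nlines around, but it still silently drops a good row if the comment ever disappears.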
I'd be happy to send this as a patch to tseries and its, but I have the
feeling we could do better. How?
Thanks, Dirk
The relationship between the computed price and reality is as yet unknown.
-- From the pac(8) manual page
Is there an _elegant and portable_ way of reading this with the last line?
If you do not expect to encounter the "<" character in your data you
could try adding comment.char = "<" to your call to read.csv.
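A minimal sketch of that suggestion, run against a local copy of a few lines from the listing in the original post (the temporary file is only a stand-in for the downloaded one):

```r
# Sample of the Yahoo! output, including the trailing HTML comment.
csv_lines <- c(
  "Date,Open,High,Low,Close,Volume,Adj. Close*",
  "23-Apr-04,1140.81,1141.75,1134.89,1140.60,1820460032,1140.60",
  "22-Apr-04,1122.01,1142.53,1121.98,1139.93,2147280000,1139.93",
  "<!-- chart2.finance.scd.yahoo.com uncompressed Sat Apr 24 15:27:40 PDT 2004 -->"
)

destfile <- tempfile(fileext = ".csv")
writeLines(csv_lines, destfile)

# Everything from "<" to the end of a line is treated as a comment;
# a line that is all comment counts as blank and is skipped.
data <- read.csv(destfile, comment.char = "<")
unlink(destfile)
```

Note that comment.char takes a single character, so any "<" inside a data field would truncate that line as well.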
Both get.hist.quote and its derivative priceIts rely on download.file() to
fetch financial data series from Yahoo! in .csv format. They allow for nice
interactive demonstrations of what one can do with R.
Er, how does this affect get.hist.quote? I see some flakiness, but the
basic conversion appears to work:
trying URL
`http://chart.yahoo.com/table.csv?s=spc&a=0&b=01&c=1998&d=3&e=24&f=2004&g=d&q=q&y=0&z=spc&x=.csv'
Error in download.file(url, destfile, method = method) :
cannot open URL
`http://chart.yahoo.com/table.csv?s=spc&a=0&b=01&c=1998&d=3&e=24&f=2004&g=d&q=q&y=0&z=spc&x=.csv'
In addition: Warning message:
cannot open: HTTP status was `404 Not Found'
trying URL
`http://chart.yahoo.com/table.csv?s=spc&a=0&b=01&c=1998&d=3&e=24&f=2004&g=d&q=q&y=0&z=spc&x=.csv'
Content type `application/octet-stream' length unknown
opened URL
.......... .......... .......... .......... ..........
.......... .......... ..
downloaded 72Kb
time series starts 1998-01-02
time series ends 2004-04-01
(Yes, that's the same URL, a few seconds later!)
Is there an _elegant and portable_ way of reading this with the last line?
I needed this, and used the somewhat clunky
data <- read.csv(destfile)
unlink(destfile)
data <- data[-(nlines-1),] # skip very last line with comment
which uses nlines, which had already been computed (and has an offset of one
because of the header line).
How about this?
v <- readLines(url("http://chart.yahoo.com/table.csv?s=ibm&a=0&b=01&c=1998&d=3&e=24&f=2004&g=d&q=q&y=0&z=ibm&x=.csv"))
x <- read.csv(textConnection(v[-grep("^<!",v)]))
str(x)
`data.frame': 1586 obs. of 7 variables:
$ Date : Factor w/ 1586 levels "1-Apr-02","1-Ap..",..: 786 732 681 629 524 368 315 263 210 157 ...
$ Open : num 91.0 90.5 91.2 92.0 91.9 ...
$ High : num 91.6 91.5 91.4 92.5 92.3 ...
$ Low : num 90.4 89.7 90.7 90.7 91.7 ...
$ Close : num 91.3 90.7 91.3 90.7 91.9 ...
$ Volume : int 5063200 7988000 4623400 4260200 4159400 1111800 6844200 5316300 5013600 3112600 ...
$ Adj..Close.: num 91.3 90.7 91.3 90.7 91.9 ...
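One caveat with the negative grep() index: if no line matches, grep() returns integer(0) and v[-integer(0)] selects nothing at all. A hedged variant using grepl() sidesteps that. The sketch below runs offline on a few lines copied from the S&P500 listing earlier in the thread; the names v, no_comment and keep are illustrative only.

```r
# Offline stand-in for readLines(url(...)).
v <- c(
  "Date,Open,High,Low,Close,Volume,Adj. Close*",
  "23-Apr-04,1140.81,1141.75,1134.89,1140.60,1820460032,1140.60",
  "22-Apr-04,1122.01,1142.53,1121.98,1139.93,2147280000,1139.93",
  "<!-- chart2.finance.scd.yahoo.com uncompressed Sat Apr 24 15:27:40 PDT 2004 -->"
)

# Logical mask: TRUE for every line that does not start with "<!".
keep <- !grepl("^<!", v)

con <- textConnection(v[keep])
x <- read.csv(con)
close(con)

# The same input without the comment line, to show the difference:
no_comment <- v[keep]
length(no_comment[-grep("^<!", no_comment)])   # negative index on no match
length(no_comment[!grepl("^<!", no_comment)])  # logical mask on no match
```

The logical mask keeps all lines when Yahoo! stops sending the comment, whereas the negative index would then return an empty vector.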
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk) FAX: (+45) 35327907
On Sun, Apr 25, 2004 at 01:16:15AM +0200, Peter Dalgaard wrote:
Dirk Eddelbuettel <edd@debian.org> writes:
Both get.hist.quote and its derivative priceIts rely on download.file() to
fetch financial data series from Yahoo! in .csv format. They allow for nice
interactive demonstrations of what one can do with R.
Er, how does this affect get.hist.quote? I see some flakiness, but the
basic conversion appears to work:
Ah, yes, my bad. I was working with priceIts, and it flakes out because its
fails on the NA that the comment turns into:
Error in validObject(.Object) : Invalid "its" object: Missing values in dates
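To see where the missing values come from, here is a small offline sketch. It only shows what plain read.csv() does with the trailing comment line; it does not call priceIts or its, and the sample lines are copied from the listing in the original post.

```r
# Sample of the Yahoo! output, including the trailing HTML comment.
csv_lines <- c(
  "Date,Open,High,Low,Close,Volume,Adj. Close*",
  "23-Apr-04,1140.81,1141.75,1134.89,1140.60,1820460032,1140.60",
  "22-Apr-04,1122.01,1142.53,1121.98,1139.93,2147280000,1139.93",
  "<!-- chart2.finance.scd.yahoo.com uncompressed Sat Apr 24 15:27:40 PDT 2004 -->"
)

raw <- read.csv(text = csv_lines)

# The comment has no commas, so it lands in the first column as one field
# and read.csv() pads the remaining columns of that row with NA.
nrow(raw)
tail(raw, 1)
is.na(raw$Close)
```

The last row carries the comment text where a date should be, which is presumably what ends up as a missing date further down the line.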
Doug's suggestion of simply using '<' as the comment char is good. I was so
fixated on specifying '<!--' as one that I didn't think of '<'.
But ...
How about this?
v <- readLines(url("http://chart.yahoo.com/table.csv?s=ibm&a=0&b=01&c=1998&d=3&e=24&f=2004&g=d&q=q&y=0&z=ibm&x=.csv"))
x <- read.csv(textConnection(v[-grep("^<!",v)]))
that wins the prize. That is pretty much what I was thinking of. Neato.
It doesn't rely on the position of '<!--' within the file, and is less likely
to trigger a false positive than hunting for '<' is.
Thanks!
Dirk