Importing Excel/Openoffice Dates into R - R-help

Fri, Jan 2, 2004 12:29 AM #

Hi,

I would like to import some daily financial data from excel via csv. 
More specifically, I would like to be able to use the ts.union function 
from the tseries library as the dates are irregular and I need to line 
up the dates so that I can place all the variables into one data frame.

The trouble is, how do I import the dates from excel into R? At the 
moment I'm just importing the data via read.csv, but I don't know how to 
bring the dates in as well.

Example: Here are two csv files.  The first file is missing Jan 13th and 
  the second is missing the 8th.

file 1: cboevix.csv
	          VIX
1/1/1999	24.42
1/4/1999	26.17
1/5/1999	24.46
1/6/1999	23.34
1/7/1999	24.37
1/8/1999	23.28
1/11/1999	25.46
1/12/1999	28.1
1/14/1999	32.98

file 2: yenv.csv

		YENV
1/1/1999	19.5
1/4/1999	22.2
1/5/1999	23.2
1/6/1999	21
1/7/1999	20.2
1/11/1999	21.6
1/12/1999	20.9
1/13/1999	19.1
1/14/1999	19.3

# Read the files in via read.csv
A<-read.csv("cboevix.csv",skip=1,header=FALSE)
B<-read.csv("yenv.csv",skip=1,header=FALSE)

#define variables
VIX<-A$V2
YENV<-B$V2

# MISSING STEP!
#apply dates from original csv files to the variables.
#the dates are stilling sitting in A$V1 and B$V1
#how do I apply them to VIX and YENV?
#????

#use ts.union function to line up the dates and create data frame 
#"vixyen" with lined up data

vixyen<-ts.union(VIX,YENV)

Can anyone help me fill in those missing steps?  Thanks very much!

Cheers,

Ashley Davies
Economist- Australia and New Zealand
IDEAglobal - Singapore
Tel: 65- 6332-0759
Fax: 65- 6332-0701
adavies at ideaglobal.com

Brian Ripley

Fri, Jan 2, 2004 1:10 AM #

I am puzzled: `csv' means `comma-separated values' and those files have no 
commas in them.  You could use

A <- read.table("cboevix.csv")
B <- read.table("yenv.csv")

which looks better.  You can then merge the two dfs by

Row.names   VIX YENV
1   1/1/1999 24.42 19.5
2  1/11/1999 25.46 21.6
3  1/12/1999 28.10 20.9
4  1/14/1999 32.98 19.3
5   1/4/1999 26.17 22.2
6   1/5/1999 24.46 23.2
7   1/6/1999 23.34 21.0
8   1/7/1999 24.37 20.2
9   1/8/1999 23.28   NA
10 1/13/1999    NA 19.1

and then convert to R's date format by 

Date <- as.POSIXct(strptime(as.character(AB$Row.names), "%m/%d/%Y"))
row.names(AB) <- Date
AB <- AB[sort.list(Date),-1]
AB
             VIX YENV
1999-01-01 24.42 19.5
1999-01-04 26.17 22.2
1999-01-05 24.46 23.2
1999-01-06 23.34 21.0
1999-01-07 24.37 20.2
1999-01-08 23.28   NA
1999-01-11 25.46 21.6
1999-01-12 28.10 20.9
1999-01-13    NA 19.1
1999-01-14 32.98 19.3

Second, as ts.union is not part of tseries, and is for regular time series
I don't see how you hoped to use it.  You could for example use

irts(row.names(AB), as.matrix(AB))

to create an object of class "irts", or you could use the `its' package
from CRAN (which is what I would use).

On Fri, 2 Jan 2004, Ashley Davies wrote:

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Peter Dalgaard

Fri, Jan 2, 2004 1:22 AM #

Ashley Davies <adavies at ideaglobal.com> writes:

Umm, those are not the exact contents, are they? CSV==comma-separated-variable

There should be commas between columns or a sep="\t" in the
read.csv(), (or use read.delim() for what it was designed for).

(why not just change the names() of A and B?)

help(strptime) would be the obvious place to start. The read.csv (or
whatever) also have the colClasses argument that you can use to
specify arbitrary conversion of inputs, but it seems like overkill in
this case.

O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907

Ashley Davies

Fri, Jan 2, 2004 1:55 AM #

/"I am puzzled: `csv' means `comma-separated values' and those files 
have no
commas in them. You could use"
/
Yes sorry about that. The data posted was from the original csv files. I 
just copy and pasted a subsection from the spreadsheet as an example.

/"(AB <- merge(A, B, by="row.names", all=T))"/

I did a "?merge" in R, and it said that this was for two dataframes. I 
would like to find a more general solution for a large number of 
variables with different dates. Financial time series dates are often 
dictated by the holidays of the respective stock/futures exchanges. This 
is a common problem I face.

/"Date <- as.POSIXct(strptime(as.character(AB$Row.names), "%m/%d/%Y"))
row.names(AB) <- Date
AB <- AB[sort.list(Date),-1]
AB"

/That looks like something I could use.

/"Second, as ts.union is not part of tseries, and is for regular time 
series. I don't see how you hoped to use it."
/
I was inspired by the "get.hist.quote" function. From the help file for 
"get.hist.quote"


x <- get.hist.quote(instrument = "^spc", start = "1998-01-01",
quote = "Close")
plot(x)

x <- get.hist.quote(instrument = "ibm", quote = c("Cl", "Vol"))
plot(x, main = "International Business Machines Corp")

spc <- get.hist.quote(instrument = "^spc", start = "1998-01-01")
ibm <- get.hist.quote(instrument = "ibm", start = "1998-01-01")
x <- na.remove(ts.union(spc, ibm))
plot(x, main = "IBM vs S&P 500")


This example, downloads the data from yahoo, lines up the dates, and 
removes any NAs. I would like to be able to do the same for data that I 
have downloaded in my excel from a Reuters or Bloomberg terminal. I will 
play around further with what you have suggested later on tonight, and 
see if I can post back a solution. I'm in Singapore and at the end of 
the working day.

Thanks for your help.

Ashley

Prof Brian Ripley wrote:

I am puzzled: `csv' means `comma-separated values' and those files have no 
commas in them.  You could use

A <- read.table("cboevix.csv")
B <- read.table("yenv.csv")

which looks better.  You can then merge the two dfs by

(AB <- merge(A, B, by="row.names", all=T))

  Row.names   VIX YENV
1   1/1/1999 24.42 19.5
2  1/11/1999 25.46 21.6
3  1/12/1999 28.10 20.9
4  1/14/1999 32.98 19.3
5   1/4/1999 26.17 22.2
6   1/5/1999 24.46 23.2
7   1/6/1999 23.34 21.0
8   1/7/1999 24.37 20.2
9   1/8/1999 23.28   NA
10 1/13/1999    NA 19.1

and then convert to R's date format by 

Date <- as.POSIXct(strptime(as.character(AB$Row.names), "%m/%d/%Y"))
row.names(AB) <- Date
AB <- AB[sort.list(Date),-1]
AB
            VIX YENV
1999-01-01 24.42 19.5
1999-01-04 26.17 22.2
1999-01-05 24.46 23.2
1999-01-06 23.34 21.0
1999-01-07 24.37 20.2
1999-01-08 23.28   NA
1999-01-11 25.46 21.6
1999-01-12 28.10 20.9
1999-01-13    NA 19.1
1999-01-14 32.98 19.3

Second, as ts.union is not part of tseries, and is for regular time series
I don't see how you hoped to use it.  You could for example use

irts(row.names(AB), as.matrix(AB))

to create an object of class "irts", or you could use the `its' package
from CRAN (which is what I would use).



On Fri, 2 Jan 2004, Ashley Davies wrote:

Hi,

I would like to import some daily financial data from excel via csv. 
More specifically, I would like to be able to use the ts.union function 
from the tseries library as the dates are irregular and I need to line 
up the dates so that I can place all the variables into one data frame.

The trouble is, how do I import the dates from excel into R? At the 
moment I'm just importing the data via read.csv, but I don't know how to 
bring the dates in as well.

Example: Here are two csv files.  The first file is missing Jan 13th and 
 the second is missing the 8th.

file 1: cboevix.csv
          VIX
1/1/1999	24.42
1/4/1999	26.17
1/5/1999	24.46
1/6/1999	23.34
1/7/1999	24.37
1/8/1999	23.28
1/11/1999	25.46
1/12/1999	28.1
1/14/1999	32.98

file 2: yenv.csv

	YENV
1/1/1999	19.5
1/4/1999	22.2
1/5/1999	23.2
1/6/1999	21
1/7/1999	20.2
1/11/1999	21.6
1/12/1999	20.9
1/13/1999	19.1
1/14/1999	19.3

# Read the files in via read.csv
A<-read.csv("cboevix.csv",skip=1,header=FALSE)
B<-read.csv("yenv.csv",skip=1,header=FALSE)

#define variables
VIX<-A$V2
YENV<-B$V2

# MISSING STEP!
#apply dates from original csv files to the variables.
#the dates are stilling sitting in A$V1 and B$V1
#how do I apply them to VIX and YENV?
#????

#use ts.union function to line up the dates and create data frame 
#"vixyen" with lined up data

vixyen<-ts.union(VIX,YENV)

Can anyone help me fill in those missing steps?  Thanks very much!

Cheers,

Ashley Davies
Economist- Australia and New Zealand
IDEAglobal - Singapore
Tel: 65- 6332-0759
Fax: 65- 6332-0701
adavies at ideaglobal.com

Brian Ripley

Fri, Jan 2, 2004 2:23 AM #

On Fri, 2 Jan 2004, Ashley Davies wrote:

merge() *is* the general solution: just apply it recursively.

This no longer works for me, BTW.

It doesn't.  It relies on those being regular series once get.hist.quote 
has finished with them.  Take a closer look at get.hist.quote.

If you want a regular daily series with all non-trading days as NAs then 
the way get.hist.quote does it is the way forward.  But that is not what 
you said you wanted ....

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Dirk Eddelbuettel

Fri, Jan 2, 2004 6:24 AM #

On Fri, Jan 02, 2004 at 04:29:15PM +0800, Ashley Davies wrote:

In short, you cannot. The ts objects require _regular_ time series with
fixed increments or frequencies. This works great for annual, quarterly or
monthly macroeconomic series, but is much more difficult for business-daily
data with weekends, holidays and all that.

What you really want is the its package now on CRAN. It provides an object
for _irregular_ time series', as for example business or market daily
series. You can then subset, join, intersect, ... at will. This uses the
very powerful data arithmetic features which R contains, but shields a lot
of the at-first somewhat intimidating complexity of the datetime objects.
Its is a real gem. And do search the archives for this list. There have been
many usage examples for its, often provided directly by the most helpful
author of its, and some contain examples starting from csv files.

Good luck,  Dirk

The relationship between the computed price and reality is as yet unknown.  
                                             -- From the pac(8) manual page