Confusion with Converting Factors to Dates using as.date

4 messages · Peter Dalgaard, Marc Schwartz, Josip Dasovic

#
Dear R-Helpers:

I'm having a problem getting dates into the correct format. I have a data frame, which is based on a .csv file that I imported into R via read.table.

R has converted my date variables to factors; when I use the as.Date command, most of the values are converted "correctly" (and by this I guess I mean converted "as I wish them to be") but some have not been.

Here's what I have:
str(pk.df)

'data.frame':	206 obs. of  134 variables:
 $ uniqid         : int  010 015 120 130 210 245 320 330 415 ...
 $ st_date     : Factor w/ 154 levels "01/01/48","01/01/51",..: 46 27 NA 12 118 NA 63 127 NA NA ...
...
 
I then convert them to a date class using

st_date.new<-as.Date(st_date, "%m/%d/%y")

This _seems_ to work...

str(st_date.new)
Class 'Date'  num [1:206]  8150  8466    NA 33982 10149 ...

But notice the 4th observation; I would like it to be 1963, not 2063.

st_date.new[1:10]
 [1] "1992-04-25" "1993-03-07" NA           "2063-01-15" "1997-10-15"
 [6] NA           "1991-05-31" "1994-11-20" NA           NA 
 
st_date[1:10]
 [1] 04/25/92 03/07/93 <NA>     01/15/63 10/15/97 <NA>     05/31/91
 [8] 11/20/94 <NA>     <NA>    
154 Levels: 01/01/48 01/01/51 01/01/52 01/01/59 01/01/63 ... 12/31/96


I thought that the problem might be that I was converting a factor, so I first converted the variable to a character type (although I understand that this is done automatically) and then to date class, but I still had the same problem. Does anybody know how I can solve this and why I am getting this behavior? One more tidbit: the earliest date for which the date conversion is "correct" is 1969-04-15, while the most recent date for which the century is "incorrect" is 1967-11-05.

Thanks,
Josip

Research Associate
Human Security Report Project
School for International Studies
Simon Fraser University
Suite 7200--515 W. Hastings St.
Vancouver, BC V6B 5K3 Canada
#
Josip Dasovic wrote:
Well, to quote ?strptime:

      '%y' Year without century (00-99). If you use this on input, which
           century you get is system-specific.  So don't!  Often values
           up to 68 (or 69) are prefixed by 20 and 69 (or 70) to 99 by
           19.
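
To see where the split falls on a particular system, a small check along these lines may help. It is only a sketch: the sample strings are made up for illustration, and no output is shown because, as the help page says, the century is system-specific.

```r
# Illustrative two-digit-year strings straddling the 68/69 boundary
# (made-up values, not taken from the data in this thread).
x <- c("01/15/63", "11/05/67", "12/31/68", "01/01/69", "04/25/92")

# Parse with %y; the century assigned to each value may differ by system.
d <- as.Date(x, "%m/%d/%y")

# Show each input next to the four-digit year it was assigned.
check <- data.frame(input = x, year = format(d, "%Y"), stringsAsFactors = FALSE)
print(check)
```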
#
on 12/10/2008 02:41 PM Josip Dasovic wrote:
This is the consequence of using a two digit year rather than a four
digit year, which BTW, was one of the Y2K issues raised a decade ago...

As per ?strptime:

%y
    Year without century (00-99). If you use this on input, which
century you get is system-specific. So don't! Often values up to 68 (or
69) are prefixed by 20 and 69 (or 70) to 99 by 19.



If you know that all of your dates are going to be before 2000, you can
do the following, by using a regex to convert the two digit year to a
four digit year and then use as.Date() with '%Y':

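st_date <- "01/15/63"
st_date4 <- sub("([0-9]{2})$", "19\\1", st_date)  # prefix the two-digit year with "19"
st_date4
[1] "01/15/1963"
as.Date(st_date4, "%m/%d/%Y")
[1] "1963-01-15"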
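
The same idea can be applied to a whole factor column, missing values included. This is a minimal sketch, assuming every date falls in the 1900s and every string has the form mm/dd/yy. The `st_date` below is a small made-up factor standing in for the real column, and `add_century` is a hypothetical helper name, not something from base R.

```r
# Made-up stand-in for the factor column in the original post.
st_date <- factor(c("04/25/92", "03/07/93", NA, "01/15/63", "10/15/97"))

# Hypothetical helper: insert a century in front of a trailing two-digit year.
# Assumes strings of the form mm/dd/yy; NA values pass through unchanged.
add_century <- function(x, century = "19") {
  sub("([0-9]{2})$", paste0(century, "\\1"), as.character(x))
}

# Parse the expanded strings with the four-digit-year format %Y.
st_date.new <- as.Date(add_century(st_date), "%m/%d/%Y")
st_date.new
```

Because the year is now unambiguous, the result should no longer depend on how the system treats %y.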


The better option is to ensure that the source of your data outputs or
exports dates with a four digit year, before importing into R.

See ?sub and ?regex
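
If some of the dates fall after 1999, a fixed "19" prefix no longer works. One possible alternative, sketched here under the assumption that no date in the data lies in the future, is to parse with %y and then move any date later than today back by 100 years. `fix_future` is a hypothetical helper name and the input strings are made up.

```r
# Hypothetical helper: parse two-digit years, then shift "future" dates
# back one century. Assumes no genuine date is later than `today`.
fix_future <- function(x, fmt = "%m/%d/%y", today = Sys.Date()) {
  d <- as.Date(as.character(x), fmt)
  future <- !is.na(d) & d > today      # NA dates are never flagged
  lt <- as.POSIXlt(d)
  lt$year[future] <- lt$year[future] - 100
  as.Date(lt)
}

res <- fix_future(c("01/15/63", "04/25/92", NA, "06/30/05"))
res
```

This still relies on %y for the first pass, so it is worth checking the result against a few known dates.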

HTH,

Marc Schwartz
#
Thank you very much, Peter. As is often the case, R gave me exactly what I
asked it to give me, but not what I wanted it to give me. :)

Cheers,
Josip

----- Original Message -----
From: "Peter Dalgaard" <p.dalgaard at biostat.ku.dk>
To: "Josip Dasovic" <j_dasovic at sfu.ca>
Cc: r-help at r-project.org
Sent: Wednesday, December 10, 2008 1:16:48 PM GMT -08:00 US/Canada Pacific
Subject: Re: [R] Confusion with Converting Factors to Dates using as.date