Problem with diff(strptime(...

3 messages · Brian Ripley, Jim Lemon

#
Hi all,

I have been chipping away at a problem I encountered in calculating 
rates per year from a moderately large data file (46412 rows). When I 
ran the following command, I got obviously wrong output:

interval<-
  c(NA,as.numeric(diff(
  strptime(mkdf$MEAS_DATE,"%d/%m/%Y")))/365.25)

The values in MEAS_DATE looked like this:

mkdf$MEAS_DATE[1:10]
  [1] 1/5/1962  1/5/1963  1/5/1964  1/3/1965  1/4/1966  1/4/1967  1/6/1968
  [8] 25/3/1969 1/4/1971  1/2/1974
146 Levels: 10/10/1967 1/10/1947 1/10/1965 1/10/1967 1/10/1983 ... 9/1/1992

To abbreviate three evenings of work, I finally found that values 17170 
and 17171 were the same. If I ran the entire set, or anything over 
1:17170, I would get output like this:

interval[1:10]
  [1]        NA  86340.86  86577.41  71911.29  93673.92  86340.86 101006.98
  [8]  70255.44 174337.58 245292.81

If I ran any set of values up to 17170, I would get the correct output:

interval[1:10]
  [1]        NA 0.9993155 1.0020534 0.8323066 1.0841889 0.9993155 1.1690623
  [8] 0.8131417 2.0177960 2.8390372

If I changed value 17171 by one day (and added that level), the command 
worked correctly:

interval[1:10]
  [1]        NA 0.9993155 1.0020534 0.8323066 1.0841889 0.9993155 1.1690623
  [8] 0.8131417 2.0177960 2.8390372

There have been a few messages about this problem, but apparently no 
solution. The problem can be seen with these examples (I haven't 
included the real data as it is not mine):

foodate<-c("1/7/1991","1/8/1991","1/8/1991","3/8/1991")
as.numeric(diff(strptime(foodate,"%d/%m/%Y"))/365.25)
[1] 7333.0595    0.0000  473.1006

foodate<-factor(c("1/7/1991","1/8/1991","1/8/1991","3/8/1991"))
as.numeric(diff(strptime(foodate,"%d/%m/%Y"))/365.25)
[1] 7333.0595    0.0000  473.1006

foodate<-factor(c("1/7/1991","1/8/1991","2/8/1991","3/8/1991"))
as.numeric(diff(strptime(foodate,"%d/%m/%Y"))/365.25)
[1] 0.084873374 0.002737851 0.002737851

Beats me.

Jim
#
You are throwing away the clue in your use of as.numeric.

First, strptime returns a POSIXlt value, which you will convert to POSIXct 
when you do arithmetic (using diff()).  Why are you doing that?  So

diff(strptime(foodate,"%d/%m/%Y"))
Time differences in secs
[1] 2678400       0  172800
attr(,"tzone")
[1] ""

is correct.  I think you intended

diff(as.Date(foodate,"%d/%m/%Y"))/365.25

or even add as.numeric() inside diff().
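
The clue that as.numeric() discards is the units attribute of the time 
difference. A minimal sketch, reusing the foodate example from this thread: 
diff() on date-times picks its units automatically from the smallest 
difference, so a single zero difference switches the whole result to 
seconds, whereas differences of Date values are always in days. The 
tz = "UTC" argument and the variable names are assumptions added for 
illustration, not part of the original code.

```r
foodate <- c("1/7/1991", "1/8/1991", "1/8/1991", "3/8/1991")

# tz = "UTC" is an assumption added here so that daylight-saving changes
# cannot alter the day lengths; the thread's code used the local time zone.
dt <- diff(as.POSIXct(strptime(foodate, "%d/%m/%Y", tz = "UTC")))
units(dt)   # "secs" here, because one of the differences is zero

# Asking for days explicitly makes the result independent of that choice.
years_posix <- as.numeric(dt, units = "days") / 365.25

# Differences of Date values are always measured in days.
years_date <- as.numeric(diff(as.Date(foodate, format = "%d/%m/%Y"))) / 365.25
```

Both years_posix and years_date should then agree whether or not the 
input contains repeated dates.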
On Thu, 20 Mar 2008, Jim Lemon wrote:
#
Prof Brian Ripley wrote:
This is true, but I am puzzled as to why I get the correct output except 
when there are two consecutive input values that are the same. The idea 
was to get the number of years between consecutive dates in order to 
calculate a rate per year. If I put the as.numeric inside diff:

diff(as.numeric(strptime(foodate,"%d/%m/%Y"))/365.25)
Error in Ops.POSIXt(as.numeric(strptime(foodate, "%d/%m/%Y")), 365.25) :
   / not defined for "POSIXt" objects

Jim
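
One way to put as.numeric() inside diff(), sketched here under the 
assumption that whole days are all that matter: convert the strings (or 
factor) to Date, turn those into plain day counts, take differences, and 
divide last. The helper name years_between is hypothetical and does not 
come from the thread.

```r
# Hypothetical helper (the name is an assumption, not from the thread):
# years elapsed between consecutive dates given as "d/m/Y" strings or factors.
years_between <- function(x, format = "%d/%m/%Y") {
  # as.character() guards against factor input; the result is days since 1970-01-01
  days <- as.numeric(as.Date(as.character(x), format = format))
  # leading NA: the first row has no predecessor to difference against
  c(NA, diff(days) / 365.25)
}

foodate <- factor(c("1/7/1991", "1/8/1991", "1/8/1991", "3/8/1991"))
interval <- years_between(foodate)
```

Because the day counts are ordinary numbers by the time diff() sees them, 
no automatic choice of units is involved, and repeated dates simply give 
an interval of 0.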