Hi all,
I have been chipping away at a problem I encountered in calculating
rates per year from a moderately large data file (46412 rows). When I
ran the following command, I got obviously wrong output:
interval<-
c(NA,as.numeric(diff(
strptime(mkdf$MEAS_DATE,"%d/%m/%Y")))/365.25)
The values in MEAS_DATE looked like this:
mkdf$MEAS_DATE[1:10]
[1] 1/5/1962 1/5/1963 1/5/1964 1/3/1965 1/4/1966 1/4/1967 1/6/1968
[8] 25/3/1969 1/4/1971 1/2/1974
146 Levels: 10/10/1967 1/10/1947 1/10/1965 1/10/1967 1/10/1983 ... 9/1/1992
To abbreviate three evenings of work, I finally found that values 17170
and 17171 were the same. If I ran the entire set, or anything over
1:17170, I would get output like this:
interval[1:10]
[1] NA 86340.86 86577.41 71911.29 93673.92 86340.86 101006.98
[8] 70255.44 174337.58 245292.81
If I ran any set of values up to 17170, I would get the correct output:
interval[1:10]
[1] NA 0.9993155 1.0020534 0.8323066 1.0841889 0.9993155 1.1690623
[8] 0.8131417 2.0177960 2.8390372
If I changed value 17171 by one day (and added that level), the command
worked correctly:
interval[1:10]
[1] NA 0.9993155 1.0020534 0.8323066 1.0841889 0.9993155 1.1690623
[8] 0.8131417 2.0177960 2.8390372
There have been a few messages about this problem, but apparently no
solution. The problem can be seen with these examples (I haven't
included the real data as it is not mine):
foodate<-c("1/7/1991","1/8/1991","1/8/1991","3/8/1991")
as.numeric(diff(strptime(foodate,"%d/%m/%Y"))/365.25)
[1] 7333.0595 0.0000 473.1006
foodate<-factor(c("1/7/1991","1/8/1991","1/8/1991","3/8/1991"))
as.numeric(diff(strptime(foodate,"%d/%m/%Y"))/365.25)
[1] 7333.0595 0.0000 473.1006
foodate<-factor(c("1/7/1991","1/8/1991","2/8/1991","3/8/1991"))
as.numeric(diff(strptime(foodate,"%d/%m/%Y"))/365.25)
[1] 0.084873374 0.002737851 0.002737851
Beats me.
Jim
Problem with diff(strptime(...
3 messages · Brian Ripley, Jim Lemon
You are throwing away the clue in your use of as.numeric. First, strptime returns a POSIXlt value, which you will convert to POSIXct when you do arithmetic (using diff()). Why are you doing that? So
foodate<-factor(c("1/7/1991","1/8/1991","1/8/1991","3/8/1991"))
diff(strptime(foodate,"%d/%m/%Y"))
Time differences in secs
[1] 2678400       0  172800
attr(,"tzone")
[1] ""
is correct. I think you intended
diff(as.Date(foodate,"%d/%m/%Y"))/365.25
or even add as.numeric() inside diff().
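A minimal sketch of the clue in question, assuming the same four dates as above. The object names d_ct and d_date are only illustrative, and tz = "UTC" is an assumption made here to keep daylight saving out of the picture. The point is that diff() returns a difftime whose units can be inspected before as.numeric() discards them, whereas differences of Date objects are always in days.

```r
foodate <- factor(c("1/7/1991", "1/8/1991", "1/8/1991", "3/8/1991"))

# diff() on date-times returns a "difftime"; its units are chosen automatically
# (tz = "UTC" is an assumption for this sketch, not part of the original call)
d_ct <- diff(as.POSIXct(strptime(as.character(foodate), "%d/%m/%Y", tz = "UTC")))
units(d_ct)     # look at the units before calling as.numeric()

# Differences of Date objects are always expressed in days
d_date <- diff(as.Date(as.character(foodate), "%d/%m/%Y"))
units(d_date)

# Intervals in years, with no dependence on the automatic unit choice
as.numeric(d_date) / 365.25
```

Checking units() on the intermediate result is a cheap way to see which scale the numbers are on before dividing by 365.25.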
On Thu, 20 Mar 2008, Jim Lemon wrote:
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
This is true, but I am puzzled as to why I get the correct output except when there are two consecutive input values that are the same. The idea was to get the number of years between each date in order to calculate a rate per year. If I put the as.numeric inside diff:
diff(as.numeric(strptime(foodate,"%d/%m/%Y"))/365.25)
Error in Ops.POSIXt(as.numeric(strptime(foodate, "%d/%m/%Y")), 365.25) :
  / not defined for "POSIXt" objects
Jim
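A possible explanation for the puzzle, offered as a sketch rather than a definitive answer: as I understand the documentation for difftime, the default units = "auto" picks the largest unit in which every absolute difference is at least one. Two identical consecutive dates give a difference of zero, so the whole result falls back to seconds. That would also fit the original output, where the wrong values are 86400 times the right ones (0.9993155 * 86400 = 86340.86). The helper name years_between and the use of tz = "UTC" below are assumptions introduced only for illustration.

```r
# Intervals in years between consecutive d/m/Y dates, independent of the
# units that difftime would pick automatically.
# `years_between` is a hypothetical helper, not an existing R function.
years_between <- function(dates, format = "%d/%m/%Y") {
  # A fixed time zone (assumed here) keeps daylight-saving shifts out of the differences
  ct <- as.POSIXct(strptime(as.character(dates), format, tz = "UTC"))
  # Request days explicitly when dropping the difftime class
  c(NA, as.numeric(diff(ct), units = "days") / 365.25)
}

with_repeat    <- c("1/7/1991", "1/8/1991", "1/8/1991", "3/8/1991")
without_repeat <- c("1/7/1991", "1/8/1991", "2/8/1991", "3/8/1991")

# A zero difference makes the automatic choice fall back to seconds ...
units(diff(as.POSIXct(strptime(with_repeat, "%d/%m/%Y", tz = "UTC"))))
# ... while differences of at least one day each are reported in days
units(diff(as.POSIXct(strptime(without_repeat, "%d/%m/%Y", tz = "UTC"))))

# Both inputs now come out on the same scale (years)
years_between(with_repeat)
years_between(without_repeat)
```

An equivalent route is to convert to POSIXct first and work on plain seconds, for example diff(as.numeric(ct)) / (86400 * 365.25), which keeps the division outside any date-time class.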