Skip to content

How to preserve date format while aggregating

11 messages · Erich Studerus, jim holtman, Dr Eberhard W Lisse +3 more

#
Hi

I have a dataframe in which some subjects appear in more than one row. I
want to extract the subject-rows which have the minimum date per subject. I
tried the following aggregate function.

attach(dataframe.xy)

aggregate(Date,list(SubjectID),min)

Unfortunately, the format of the Date-column changes to numeric, when I'm
applying this function. How can I preserve the date format?

Thanks

Erich
#
Try changing the 'class' of the numeric result back to Date:
[1] "2008-09-08"
[1] 14130
num 14130
[1] "2008-09-08"
Class 'Date'  num 14130
On Mon, Sep 8, 2008 at 6:38 AM, Erich Studerus
<erich.studerus at bli.uzh.ch> wrote:

  
    
#
Thanks, I've already tried that. The problem is, that the original date is
not restored when I change the numeric back to date. I get a totally
different date.
Maybe it has something to do with the original date format. My data are
directly imported from a SQL-database. The date column to which I want to
apply the aggregate function has the two classes POSIXt and POSIXct.
Changing the column to class "Date" before applying the aggregate function
did not help. I still get a different date, when I transform it back to
class "Date".
I would be glad, if someone knew a more elegant way to extract the rows with
minimum/earliest date per subject. 


Erich


-----Urspr?ngliche Nachricht-----
Von: jim holtman [mailto:jholtman at gmail.com] 
Gesendet: Montag, 8. September 2008 14:24
An: Erich Studerus
Cc: r-help at r-project.org
Betreff: Re: [R] How to preserve date format while aggregating

Try changing the 'class' of the numeric result back to Date:
[1] "2008-09-08"
[1] 14130
num 14130
[1] "2008-09-08"
Class 'Date'  num 14130
On Mon, Sep 8, 2008 at 6:38 AM, Erich Studerus
<erich.studerus at bli.uzh.ch> wrote:
I
http://www.R-project.org/posting-guide.html

  
    
#
Can you provide some actual data.  It sounds like the columns you are
are POSIXct, in which case you would want to do something like this:

time <- structure(time, class = c("POSIXt", "POSIXct"))

So it is important to know what your numeric values came from and what
their actual values were.  Which of these values look closer to what
you are seeing:
[1] 1220832000
[1] 14130
On Mon, Sep 8, 2008 at 9:15 AM, Erich Studerus
<erich.studerus at bli.uzh.ch> wrote:

  
    
#
Erich,

how does the data look, when it comes from SQL?

And why not extract the data with SQL directly, so you don't
have this issue in the first place?

el

on 9/8/08 3:15 PM Erich Studerus said the following:

  
    
#
Here's how the dates look like after the sql-query:

Oav$Date[1:3]
[1] "1991-11-22 00:45:00 CET" "1991-12-13 00:01:00 CET" "1992-02-06 00:45:00
CET"
[1] "POSIXt"  "POSIXct"
[1] 690767100 692578860 697333500
[1] "3226-01-31" "8186-07-07" "1204-04-11"

I wanteded to apply the aggregate function in R instead of in the SQL-query,
because R is much more flexible and faster in manipulating large dataframes.

Erich

 

-----Urspr?ngliche Nachricht-----
Von: Dr Eberhard Lisse [mailto:el at lisse.NA] 
Gesendet: Montag, 8. September 2008 15:43
An: Erich Studerus
Cc: r-help at r-project.org
Betreff: Re: [R] How to preserve date format while aggregating

Erich,

how does the data look, when it comes from SQL?

And why not extract the data with SQL directly, so you don't
have this issue in the first place?

el

on 9/8/08 3:15 PM Erich Studerus said the following:
with

  
    
#
Those are not dates!  They are date-times.

aggregate is overkill for a single column.  Something simple like

DT <- seq(Sys.time(), by="4 hours", len=24)
grp <- rbinom(24, 1, p=0.5)
res <- tapply(DT, grp, min)
class(res) <- class(DT)
res

would suffice.
On Mon, 8 Sep 2008, Erich Studerus wrote:

            

  
    
#
As requested in the last line to every message to r-help a cutdown
version of the data should be posted with the question.

Suppose such a cut down version is this:

DF <- data.frame(Date = ISOdatetime(2008, 1, c(1, 2, 5, 3, 4), 0, 0, 0),
	Subject = c(1, 1, 2, 2, 2))

# Then sort the data and take first one in each group:

DFsort <- DF[order(DF$Subject, DF$Date), ]
DFsort[!duplicated(DFsort$Subject), ]

If its already sorted then omit first line and use DF in place
of DFsort in second line.

On Mon, Sep 8, 2008 at 9:15 AM, Erich Studerus
<erich.studerus at bli.uzh.ch> wrote:
#
Hi Erich,

Since min() is defined for numbers and not dates, the problem is in the
min() function. min() is converting from date format to number format.

Your best bet is to make this conversion explicit...such that it is
reversable. So, convert the date into UTC, then UTC to seconds since epoch,
then take the minimum, then convert back to UTC time.

This sounds like a pain...but that's basically what a version of min()
designed to work with dates would do. The reason this is a pain is basically
due to timezones:

Consider a comparison between x = 3:54 PM September 8 in California (right
now where I am) and y = 12:54 AM September 9 in Zurich (right now where you
are). Is it earlier here than there? Yes, because it's Sept 8 to your Sept
9. Is it earlier there than here? Yes, because your day started 56 minutes
ago, mine over 15 hours ago. Is it the same time here than there? Yes,
because our UTC times are equal.

So it's not clear what min should return, so min is not defined for dates.
However, min is defined for numbers, and dates can be converted to
numbers...but what those numbers actually mean is not necessarily clear.

--Adam
On Mon, 8 Sep 2008, Erich Studerus wrote:

            
#
min does work for POSIXct and Date too:
[1] "2008-01-01 EST"
[1] "2008-01-01"



On Mon, Sep 8, 2008 at 6:57 PM, Adam D. I. Kramer
<adik-rhelp at ilovebacon.org> wrote:
#
This is completely wrong: min _is_ defined for date-times:
[1] "1972-07-01 01:00:00 BST"

Please do study the posting guide and do your homework before posting: you 
seem unaware of what the POSIXct class is, so ?DateTimeClasses is one 
place you need to start.  And
[1] Summary.data.frame      Summary.Date            Summary.difftime
[4] Summary.factor          Summary.numeric_version Summary.POSIXct
[7] Summary.POSIXlt

so ?Summary is another.
On Mon, 8 Sep 2008, Adam D. I. Kramer wrote: