Comparing dates in dataframes

8 messages · James Rome, Stephan Kolassa, jim holtman +1 more

#
I have two data frames. One (arr) has all arrivals to an airport for a
year, and the other (gw) has the dates and quarter hour of the day when
the weather is good. arr has a Date and quarter hour column.
 [1] "Date"         "weekday"      "hour"         "month"        "minute"
 [6] "quarter"      "ICAO"         "Flight"       "AircraftType" "Tail"
[11] "Arrived"      "STA"          "Runway"       "FromTo"       "Delay"
[16] "Operator"     "gw"

I added the gw column to arr and initialized it to all FALSE. The columns of gw are:
[1] "Date"           "minute"         "hour"           "quarter"      
 [5] "Efficiency.Val" "Weekly.Avg"     "Arrival.Val"    "Weekly.Avg.1" 
 [9] "Departure.Val"  "Weekly.Avg.2"   "Num.of.Hold"    "Runway"       
[13] "Weather" 

First point of confusion:
[1] 1/1/09
353 Levels: 1/1/09 1/1/10 1/10/09 1/10/10 1/11/09 1/11/10 1/12/09 ... 9/9/09
Why do I get 353 levels?

I am trying to identify the quarter hours with good weather in the arr
data frame. What I want to do is to go through the rows in gw, and to
set arr$gw to TRUE if arr$Date and arr$quarter match those in the gw row.

So I tried
gooddates = function(all, good) {
  la = length(all)   # All the flights
  lw = length(good)  # The good 15-minute periods
  for(j in 1:lw) {
    d=good$Date[j]
    q=good$quarter[j]
    all[all$DateTime==d && all$quarter==q,17]=TRUE
  }
}

but when I run this, I get
"Error in Ops.factor(all$DateTime, d) :
  level sets of factors are different"

I know the level sets are different, that is what I am trying to find.
But I think I am comparing single elements from the data frames.

So what am I doing wrong? And there ought to be a better way to do this.

Thanks in advance,
Jim Rome
#
Hi,

it looks like when you read in your data.frames, you didn't tell R to
expect dates, so it treats the Date columns as factors. Judicious use of
something along these lines before doing your comparisons may help:

arr$Date <- as.Date(as.character(arr$Date),format=something)
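
A minimal sketch of that conversion on toy data. The format string "%m/%d/%y" (month/day/two-digit year) is only an assumption based on values such as 1/1/09 shown in the question; the real files may use a different order, and the toy arr below is a stand-in, not the poster's data.

```r
# Toy stand-in for the Date column; read in as a factor, as in the question.
arr <- data.frame(Date = factor(c("1/1/09", "1/10/09", "9/9/09", "1/1/09")))

# A factor has one level per distinct string, regardless of the number of rows.
nlevels(arr$Date)

# ASSUMPTION: month/day/two-digit-year. Adjust the format to the real data.
arr$Date <- as.Date(as.character(arr$Date), format = "%m/%d/%y")

class(arr$Date)

# Date objects compare element-wise without any level-set restriction.
arr$Date == as.Date("2009-01-10")
```

Once both data frames hold Date objects instead of factors, the "level sets of factors are different" error should no longer arise from the comparison.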

Then again, it may be possible to do the actual merging using merge().

HTH
Stephan


James Rome wrote:
#
I don't want to merge the data frames because there are many entries
in the arrival frame for each one in the weather frame. And it is the
missing dates and quarters in the weather frame that constitute the data
I want, namely those arrivals that occurred in bad (or good) weather.
   But I will try converting the dates as suggested tomorrow.
   Is there a way to do what I want without that for loop? There are
almost 100,000 rows in the arrivals frame, and R is grinding to a halt.
   And is there a way to get R to abort its current calculation? Ctrl-C
and Esc do not seem to work.

Thanks,
Jim
On 1/16/10 4:26 PM, Stephan Kolassa wrote:
#
On Jan 17, 2010, at 12:37 PM, James Rome wrote:

            
You are attempting a vectorized test and assignment with "&&" which  
seems unlikely to succeed, but even then I am not sure your problems  
would be over. (I'm also guessing that you might not have reported a  
warning.)
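
To illustrate the difference on made-up vectors (d and q here are hypothetical, not the poster's columns): "&" works element by element and gives a logical vector suitable for indexing rows, while "&&" is intended for single values, as in an if() condition. What "&&" does with longer vectors depends on the R version, so the call is wrapped in try() in this sketch.

```r
d <- c("a", "b", "a")
q <- c(1, 1, 2)

# Element-wise AND: one TRUE/FALSE per position, usable as a row index.
hit <- d == "a" & q == 1
hit

# `&&` expects length-one operands. With longer vectors, older R versions
# silently used only the first elements; R 4.3.0 and later signal an error.
res <- try(d == "a" && q == 1, silent = TRUE)
```

For indexing rows of a data frame, the element-wise form is the one that matches the intent in the question.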

Why not merge arr to gw by date and quarter?
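
One possible shape for such a merge, as a sketch on invented data. It assumes both Date columns have already been converted to the same class and that arr does not yet carry a gw column (if it does, drop it first to avoid gw.x/gw.y name clashes). The helper names good and flagged are made up for the example.

```r
arr <- data.frame(
  Date    = as.Date(c("2009-01-01", "2009-01-01", "2009-01-02", "2009-01-02")),
  quarter = c(1, 1, 2, 3),
  Flight  = c("AA1", "BA2", "DL3", "UA4"),
  stringsAsFactors = FALSE
)
gw <- data.frame(
  Date    = as.Date(c("2009-01-01", "2009-01-02")),
  quarter = c(1, 3)
)

# Keep only the key columns of the weather frame and add a marker column.
good <- unique(gw[, c("Date", "quarter")])
good$gw <- TRUE

# all.x = TRUE keeps every arrival; arrivals with no matching
# good-weather quarter hour get NA in the marker column.
flagged <- merge(arr, good, by = c("Date", "quarter"), all.x = TRUE)
flagged$gw[is.na(flagged$gw)] <- FALSE
flagged
```

Because of all.x = TRUE, many arrivals per weather row are not a problem: each arrival appears once in the result, with gw marking whether its date and quarter were found in the weather frame. Note that merge() may reorder the rows.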

Answering these questions would be greatly speeded up with a small  
sample dataset. Are you aware of the virtues of the dput function?
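
For instance, a few rows of each data frame could be posted like this. The toy arr below is invented for illustration; with the real data, dput(head(arr)) and dput(head(gw)) would be the analogous calls.

```r
arr <- data.frame(Date = c("1/1/09", "1/1/09", "1/2/09"),
                  quarter = c(1, 2, 1),
                  stringsAsFactors = FALSE)

# dput() prints R code that rebuilds the object; paste that output
# into the message so others can recreate the data exactly.
dput(head(arr, 2))

# Round trip: the deparsed text evaluates back to an equivalent data frame.
txt  <- paste(deparse(head(arr, 2)), collapse = "\n")
copy <- eval(parse(text = txt))
all.equal(copy, head(arr, 2))
```

Unlike a printed table, the dput() output preserves column classes, which is exactly the information needed to see whether Date is a factor, a character vector or a Date.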