how to merge within range?

10 messages · René Mayer, David Winsemius, Ian Gow +2 more

Original

1

10

René Mayer

Sat, May 14, 2011 5:12 AM #

Hello,
how can one merge two data frames when in the second data frame one  
column defines the start values
and another defines the end value of the to be merged range.
data.frame.1
time ...
13
24
35
46
55
...
data.frame.2
start end
24 37 ?h? ?
...

should result in this
13 NA
24 ?h?
35 ?h?
46 NA
55
?
thanks,
Ren?

David Winsemius

Sat, May 14, 2011 6:03 AM #

On May 14, 2011, at 8:12 AM, Ren? Mayer wrote:

And what happened when you typed:

?merge

And _why_ would that be?

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT

Ian Gow

Sat, May 14, 2011 6:16 AM #

If I assume that the third column in data.frame.2 is named "val" then in
SQL terms it _seems_ you want

SELECT a.time, b.val FROM data.frame.1 AS a LEFT JOIN data.frame.2 AS b ON
a.time BETWEEN b.start AND b.end;

Not sure how to do that elegantly using R subsetting/merge, but you might
try a package that allows you to use SQL, such as sqldf.

On 5/14/11 8:03 AM, "David Winsemius" <dwinsemius at comcast.net> wrote:

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius

Sat, May 14, 2011 10:10 AM #

On May 14, 2011, at 9:16 AM, Ian Gow wrote:

Huh? It's just two merge()'s (... once you fix the error in the  
example.)

David

> but you might
> try a package that allows you to use SQL, such as sqldf.
>
>
> On 5/14/11 8:03 AM, "David Winsemius" <dwinsemius at comcast.net> wrote:
>
>>
>> On May 14, 2011, at 8:12 AM, Ren? Mayer wrote:
>>
>>> Hello,
>>> how can one merge
>>
>> And what happened when you typed:
>>
>> ?merge
>>
>>> two data frames when in the second data frame one column defines the
>>> start values
>>> and another defines the end value of the to be merged range.
>>> data.frame.1
>>> time ...
>>> 13
>>> 24
>>> 35
>>> 46
>>> 55
>>> ...
>>> data.frame.2
>>> start end
>>> 24 37 ?h? ?
>>> ...
>>>
>>> should result in this
>>> 13 NA
>>> 24 ?h?
>>> 35 ?h?
>>> 46 NA
>>> 55
>>> ?
>>
>> And _why_ would that be?
>>
>>
>>> thanks,
>>> Ren?
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>

David Winsemius, MD
West Hartford, CT

René Mayer

Sat, May 14, 2011 11:06 AM #

thanks David and Ian,
let me make a better example as the first one was flawed

df.1=data.frame(round((1:10)*100+rnorm(10)), value=NA)
names(df.1) = c("time", "value")
df.1
    time value
1   101    NA
2   199    NA
3   301    NA
4   401    NA
5   501    NA
6   601    NA
7   700    NA
8   800    NA
9   900    NA
10 1000    NA

# from and to define ranges within time,
# note that from and to may not match the numbers given in time
df.2=data.frame(from=c(99,500,799),to=c(303,702,950), value=c(1,3,5))
df.2
   from  to value
1   99 303     1
2  500 702     3
3  799 950     5

what I want is:
    time value
1   101    1
2   199    1
3   301    1
4   401    NA
5   501    3
6   601    3
7   700    3
8   800    5
9   900    5
10 1000    NA

@David I don't know what you mean by 2 merges,
Ren?





Zitat von "David Winsemius" <dwinsemius at comcast.net>:

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Ian Gow

Sat, May 14, 2011 11:25 AM #

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110514/fcc57086/attachment.pl>

Sat, May 14, 2011 11:27 AM #

You could use findInterval() along with a trick with c(rbind(...)):

[1] 1 1 1 2 3 3 3 5 5 6

The even-valued outputs would map to NA's, the odds
to value[(i+1)/2], but you can use the c(rbind(...)) trick again:

[1]  1  1  1 NA  3  3  3  5  5 NA

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-bounces at r-project.org 
[mailto:r-help-bounces at r-project.org] On Behalf Of Ren? Mayer
Sent: Saturday, May 14, 2011 11:06 AM
To: David Winsemius
Cc: r-help at r-project.org
Subject: Re: [R] how to merge within range?

thanks David and Ian,
let me make a better example as the first one was flawed

df.1=data.frame(round((1:10)*100+rnorm(10)), value=NA)
names(df.1) = c("time", "value")
df.1
    time value
1   101    NA
2   199    NA
3   301    NA
4   401    NA
5   501    NA
6   601    NA
7   700    NA
8   800    NA
9   900    NA
10 1000    NA

# from and to define ranges within time,
# note that from and to may not match the numbers given in time
df.2=data.frame(from=c(99,500,799),to=c(303,702,950), value=c(1,3,5))
df.2
   from  to value
1   99 303     1
2  500 702     3
3  799 950     5

what I want is:
    time value
1   101    1
2   199    1
3   301    1
4   401    NA
5   501    3
6   601    3
7   700    3
8   800    5
9   900    5
10 1000    NA

@David I don't know what you mean by 2 merges,
Ren?





Zitat von "David Winsemius" <dwinsemius at comcast.net>:

On May 14, 2011, at 9:16 AM, Ian Gow wrote:

If I assume that the third column in data.frame.2 is named

"val" then in

SQL terms it _seems_ you want

SELECT a.time, b.val FROM data.frame.1 AS a LEFT JOIN

data.frame.2 AS b ON

a.time BETWEEN b.start AND b.end;

Not sure how to do that elegantly using R subsetting/merge,

Huh? It's just two merge()'s (... once you fix the error in

the example.)

-- 
David

but you might
try a package that allows you to use SQL, such as sqldf.


On 5/14/11 8:03 AM, "David Winsemius"

<dwinsemius at comcast.net> wrote:

On May 14, 2011, at 8:12 AM, Ren? Mayer wrote:

Hello,
how can one merge

And what happened when you typed:

?merge

two data frames when in the second data frame one column

defines the

start values
and another defines the end value of the to be merged range.
data.frame.1
time ...
13
24
35
46
55
...
data.frame.2
start end
24 37 ?h? ?
...

should result in this
13 NA
24 ?h?
35 ?h?
46 NA
55
?

And _why_ would that be?

thanks,
Ren?

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained,

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius

Sat, May 14, 2011 11:47 AM #

On May 14, 2011, at 2:27 PM, William Dunlap wrote:

That's nice. I was working on a slightly different "trick"

findInterval( df.1[,1],t(df.2[,1:2]))
  [1] 1 1 1 2 3 3 3 5 5 6

I was then trying to get the right indices with (.)'%%' 2 and (.) '%/ 
%' 2

I'd like to understand that. Maybe, maybe... ah, got it. At first I  
didn't realize those were the final answers since they looked like  
indices. My t(.) trick doesn't generalize as well.


My earlier suggestion tht two merges woul do it was based on my  
erroneous interpretation of the example, since  I thought the task was  
to match on the end points of the intervals.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of Ren? Mayer
Sent: Saturday, May 14, 2011 11:06 AM
To: David Winsemius
Cc: r-help at r-project.org
Subject: Re: [R] how to merge within range?

thanks David and Ian,
let me make a better example as the first one was flawed

df.1=data.frame(round((1:10)*100+rnorm(10)), value=NA)
names(df.1) = c("time", "value")
df.1
   time value
1   101    NA
2   199    NA
3   301    NA
4   401    NA
5   501    NA
6   601    NA
7   700    NA
8   800    NA
9   900    NA
10 1000    NA

# from and to define ranges within time,
# note that from and to may not match the numbers given in time
df.2=data.frame(from=c(99,500,799),to=c(303,702,950), value=c(1,3,5))
df.2
  from  to value
1   99 303     1
2  500 702     3
3  799 950     5

what I want is:
   time value
1   101    1
2   199    1
3   301    1
4   401    NA
5   501    3
6   601    3
7   700    3
8   800    5
9   900    5
10 1000    NA

@David I don't know what you mean by 2 merges,
Ren?





Zitat von "David Winsemius" <dwinsemius at comcast.net>:

On May 14, 2011, at 9:16 AM, Ian Gow wrote:

If I assume that the third column in data.frame.2 is named

"val" then in

SQL terms it _seems_ you want

SELECT a.time, b.val FROM data.frame.1 AS a LEFT JOIN

data.frame.2 AS b ON

a.time BETWEEN b.start AND b.end;

Not sure how to do that elegantly using R subsetting/merge,

Huh? It's just two merge()'s (... once you fix the error in

the example.)

-- 
David

but you might
try a package that allows you to use SQL, such as sqldf.


On 5/14/11 8:03 AM, "David Winsemius"

<dwinsemius at comcast.net> wrote:

On May 14, 2011, at 8:12 AM, Ren? Mayer wrote:

Hello,
how can one merge

And what happened when you typed:

?merge

two data frames when in the second data frame one column

defines the

start values
and another defines the end value of the to be merged range.
data.frame.1
time ...
13
24
35
46
55
...
data.frame.2
start end
24 37 ?h? ?
...

should result in this
13 NA
24 ?h?
35 ?h?
46 NA
55
?

And _why_ would that be?

thanks,
Ren?

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained,

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT

René Mayer

Sat, May 14, 2011 12:08 PM #

sqldf is impressive - compiled it now;
the trick with findInterval is nice, too.
thanks guys!!




Zitat von "David Winsemius" <dwinsemius at comcast.net>:

On May 14, 2011, at 2:27 PM, William Dunlap wrote:

You could use findInterval() along with a trick with c(rbind(...)):

i <- findInterval(x=df.1$time, vec=c(rbind(df.2$from, df.2$to)))
i

[1] 1 1 1 2 3 3 3 5 5 6

That's nice. I was working on a slightly different "trick"

findInterval( df.1[,1],t(df.2[,1:2]))
 [1] 1 1 1 2 3 3 3 5 5 6

I was then trying to get the right indices with (.)'%%' 2 and (.) '%/%' 2

The even-valued outputs would map to NA's, the odds
to value[(i+1)/2], but you can use the c(rbind(...)) trick again:

c(rbind(df.2$value, NA))[i]

[1]  1  1  1 NA  3  3  3  5  5 NA

I'd like to understand that. Maybe, maybe... ah, got it. At first I  
didn't realize those were the final answers since they looked like  
indices. My t(.) trick doesn't generalize as well.


My earlier suggestion tht two merges woul do it was based on my  
erroneous interpretation of the example, since  I thought the task  
was to match on the end points of the intervals.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of Ren? Mayer
Sent: Saturday, May 14, 2011 11:06 AM
To: David Winsemius
Cc: r-help at r-project.org
Subject: Re: [R] how to merge within range?

thanks David and Ian,
let me make a better example as the first one was flawed

df.1=data.frame(round((1:10)*100+rnorm(10)), value=NA)
names(df.1) = c("time", "value")
df.1
  time value
1   101    NA
2   199    NA
3   301    NA
4   401    NA
5   501    NA
6   601    NA
7   700    NA
8   800    NA
9   900    NA
10 1000    NA

# from and to define ranges within time,
# note that from and to may not match the numbers given in time
df.2=data.frame(from=c(99,500,799),to=c(303,702,950), value=c(1,3,5))
df.2
 from  to value
1   99 303     1
2  500 702     3
3  799 950     5

what I want is:
  time value
1   101    1
2   199    1
3   301    1
4   401    NA
5   501    3
6   601    3
7   700    3
8   800    5
9   900    5
10 1000    NA

@David I don't know what you mean by 2 merges,
Ren?





Zitat von "David Winsemius" <dwinsemius at comcast.net>:

On May 14, 2011, at 9:16 AM, Ian Gow wrote:

If I assume that the third column in data.frame.2 is named

"val" then in

SQL terms it _seems_ you want

SELECT a.time, b.val FROM data.frame.1 AS a LEFT JOIN

data.frame.2 AS b ON

a.time BETWEEN b.start AND b.end;

Not sure how to do that elegantly using R subsetting/merge,

Huh? It's just two merge()'s (... once you fix the error in

the example.)

-- 
David

but you might
try a package that allows you to use SQL, such as sqldf.


On 5/14/11 8:03 AM, "David Winsemius"

<dwinsemius at comcast.net> wrote:

On May 14, 2011, at 8:12 AM, Ren? Mayer wrote:

Hello,
how can one merge

And what happened when you typed:

?merge

two data frames when in the second data frame one column

defines the

start values
and another defines the end value of the to be merged range.
data.frame.1
time ...
13
24
35
46
55
...
data.frame.2
start end
24 37 ?h? ?
...

should result in this
13 NA
24 ?h?
35 ?h?
46 NA
55
?

And _why_ would that be?

thanks,
Ren?

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained,

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

1 day later

Sun, May 15, 2011 2:41 PM #

I'd've first said it's "simply"

sapply(df1$time, function(x) if(any(foo <- (x>=df2$from &
x<=df2$to))>0) df2$value[which(foo)] else NA )

but the following are much nicer (except that instead of NA you'll
have 0 but that's easy to change if necessary):

 colSums(sapply(df1$time, function(x) (x>=df2$from & x <=df2$to) * df2$value) )
 rowSums(outer(df1$time, df2$from, ">=") * outer(df1$time, df2$to,
"<=") * df2$value)



On Sat, May 14, 2011 at 10:08 PM, Ren? Mayer

<mayer at psychologie.tu-dresden.de> wrote:

sqldf is impressive - compiled it now;
the trick with findInterval is nice, too.
thanks guys!!




Zitat von "David Winsemius" <dwinsemius at comcast.net>:

On May 14, 2011, at 2:27 PM, William Dunlap wrote:

You could use findInterval() along with a trick with c(rbind(...)):

i <- findInterval(x=df.1$time, vec=c(rbind(df.2$from, df.2$to)))
i

[1] 1 1 1 2 3 3 3 5 5 6

That's nice. I was working on a slightly different "trick"

findInterval( df.1[,1],t(df.2[,1:2]))
?[1] 1 1 1 2 3 3 3 5 5 6

I was then trying to get the right indices with (.)'%%' 2 and (.) '%/%' 2

The even-valued outputs would map to NA's, the odds
to value[(i+1)/2], but you can use the c(rbind(...)) trick again:

c(rbind(df.2$value, NA))[i]

[1] ?1 ?1 ?1 NA ?3 ?3 ?3 ?5 ?5 NA

I'd like to understand that. Maybe, maybe... ah, got it. At first I didn't
realize those were the final answers since they looked like indices. My t(.)
trick doesn't generalize as well.


My earlier suggestion tht two merges woul do it was based on my erroneous
interpretation of the example, since ?I thought the task was to match on the
end points of the intervals.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of Ren? Mayer
Sent: Saturday, May 14, 2011 11:06 AM
To: David Winsemius
Cc: r-help at r-project.org
Subject: Re: [R] how to merge within range?

thanks David and Ian,
let me make a better example as the first one was flawed

df.1=data.frame(round((1:10)*100+rnorm(10)), value=NA)
names(df.1) = c("time", "value")
df.1
?time value
1 ? 101 ? ?NA
2 ? 199 ? ?NA
3 ? 301 ? ?NA
4 ? 401 ? ?NA
5 ? 501 ? ?NA
6 ? 601 ? ?NA
7 ? 700 ? ?NA
8 ? 800 ? ?NA
9 ? 900 ? ?NA
10 1000 ? ?NA

# from and to define ranges within time,
# note that from and to may not match the numbers given in time
df.2=data.frame(from=c(99,500,799),to=c(303,702,950), value=c(1,3,5))
df.2
?from ?to value
1 ? 99 303 ? ? 1
2 ?500 702 ? ? 3
3 ?799 950 ? ? 5

what I want is:
?time value
1 ? 101 ? ?1
2 ? 199 ? ?1
3 ? 301 ? ?1
4 ? 401 ? ?NA
5 ? 501 ? ?3
6 ? 601 ? ?3
7 ? 700 ? ?3
8 ? 800 ? ?5
9 ? 900 ? ?5
10 1000 ? ?NA

@David I don't know what you mean by 2 merges,
Ren?





Zitat von "David Winsemius" <dwinsemius at comcast.net>:

On May 14, 2011, at 9:16 AM, Ian Gow wrote:

If I assume that the third column in data.frame.2 is named

"val" then in

SQL terms it _seems_ you want

SELECT a.time, b.val FROM data.frame.1 AS a LEFT JOIN

data.frame.2 AS b ON

a.time BETWEEN b.start AND b.end;

Not sure how to do that elegantly using R subsetting/merge,

Huh? It's just two merge()'s (... once you fix the error in

the example.)

--
David

but you might
try a package that allows you to use SQL, such as sqldf.


On 5/14/11 8:03 AM, "David Winsemius"

<dwinsemius at comcast.net> wrote:

On May 14, 2011, at 8:12 AM, Ren? Mayer wrote:

Hello,
how can one merge

And what happened when you typed:

?merge

two data frames when in the second data frame one column

defines the

start values
and another defines the end value of the to be merged range.
data.frame.1
time ...
13
24
35
46
55
...
data.frame.2
start end
24 37 ?h? ?
...

should result in this
13 NA
24 ?h?
35 ?h?
46 NA
55
?

And _why_ would that be?

thanks,
Ren?

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained,

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.