indexing in data frames

12 messages · jimi adams, David L Carlson, David Winsemius +4 more

#
I'm still not fully understanding exactly how R is handling data frames, but am getting closer. Any help with this one will likely go a long way in getting me there. Let's say I have a data frame, let's call it "a". Within that data frame I have two variables, let's call them "b" and "c", where "b" is a single numeric value per observation, while "c" is a LIST of numeric values. What I want to be able to do is perform an operation on each element in "c" by the single element in "b".

So, for example, say I wanted to subtract each element in "c" from the scalar in "b". If I had
[1] 1988
[2] 1989
...
&
[[1]]
[1] 1985 1982 1984
[[2]]
[1] 1988 1980
...

I'm looking for a result of:
a$new
[[1]]
[1] 3 6 4
[[2]]
[1] 1 9
...

I've tried a few different things, none of which have the desired result. Any help appreciated.
thanks!

jimi adams
Assistant Professor
Department of Sociology
American University
e: jadams at american.edu
w: jimiadams.com
#
Hi,

You can get the results you wanted by:
c=list(c(1985,1982,1984),c(1988,1980),c(1983,1984),c(1988,1998,1993),c(1993,1994,1998))
b1<-list(1988,1989,1990,1991,1992)
for(i in 1:length(b1)){
 anew[[i]]<-list()
 anew[[i]]<-b1[[i]]-c[[i]]
 }
 anew
[[1]]
[1] 3 6 4

[[2]]
[1] 1 9
-------
-------
But if you want both a list and a vector in a data frame "a", it is possible:
b<-c(1988,1989,1990,1991,1992)
c=list(c(1985,1982,1984),c(1988,1980),c(1983,1984),c(1988,1998,1993),c(1993,1994,1998))

 a<-data.frame(b,c1=cbind(c))

 a
#     b                c
#1 1988 1985, 1982, 1984
#2 1989       1988, 1980
#3 1990       1983, 1984
#4 1991 1988, 1998, 1993
#5 1992 1993, 1994, 1998
anew1<-list()
for(i in 1:nrow(a)){
 anew1[[i]]<-list()
 anew1[[i]]<-a$b[i]-a$c[[i]]
 }
anew1
[[1]]
[1] 3 6 4

[[2]]
[1] 1 9

[[3]]
[1] 7 6

[[4]]
[1]  3 -7 -2

[[5]]
[1] -1 -2 -6
A.K.




----- Original Message -----
From: jimi adams <jadams at american.edu>
To: r-help at r-project.org
Cc: 
Sent: Thursday, August 9, 2012 4:42 PM
Subject: [R] indexing in data frames

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
Hi,

In the reply I sent, I forgot to add the following before the loop:

anew<-list()
for(i in 1:length(b1)){
 anew[[i]]<-list()
 anew[[i]]<-b1[[i]]-c[[i]]
 }

A.K.

#
Thanks. Yes, I got it to work with loops for small data. I was just hoping, given the size of the data.frame (hundreds of thousands) and the length of the lists (varying up to a few hundred) to avoid that if at all possible. Perhaps I'm expecting some behavior that's not feasible?

cheers,
jimi
On 09Aug, 2012, at 17:39 , arun wrote:

            
#
You have not defined a data frame since data frames cannot contain lists,
but lists can contain data frames so you are asking about how to process a
list. I'm changing your object names to a, b, and d because c() is the
concatenation function and it can cause all kinds of problems to use it as
an object name.
$b
[1] 1988 1989

$d
$d[[1]]
[1] 1985 1982 1984

$d[[2]]
[1] 1988 1980
[1] 1988 1989
[1] 1988 1989
[[1]]
[1] 1985 1982 1984

[[2]]
[1] 1988 1980

[[1]]
[1] 1985 1982 1984

[[2]]
[1] 1988 1980
element
[1] 1985 1982 1984
[1] 1985 1982 1984
element
[1] 1988 1980
[1] 1988 1980
[[1]]
[1] 3 6 4

[[2]]
[1] 1 9

You can do all this with a data.frame if you think about it differently:
d = c(1985, 1982, 1984, 1988, 1980))
$G1988
[1] 3 6 4

$G1989
[1] 1 9
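
The commands behind the output above did not survive in this archive. What follows is only a minimal sketch of the "think about it differently" idea, assuming a long layout with one row per (b, d) pair. The names `long`, `dif`, and `res` are illustrative assumptions, and the "G" prefix is simply copied from the labels in the output; this is not the original code.

```r
# List layout from earlier in the thread (d instead of c, as suggested above)
b <- c(1988, 1989)
d <- list(c(1985, 1982, 1984), c(1988, 1980))

# Long layout: repeat each b once per element of the matching d
long <- data.frame(b = rep(b, lengths(d)), d = unlist(d))

# The subtraction is now one vectorized operation, with no explicit loop
long$dif <- long$b - long$d

# Regroup the differences by their b value (hypothetical "G" label prefix)
res <- split(long$dif, paste0("G", long$b))
res
```

Because the arithmetic runs once over plain vectors, this layout may scale better than a row-by-row loop on large data. Note that split() groups by label, so observations sharing the same b value would be merged into one group.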

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
#
On Aug 9, 2012, at 2:43 PM, David L Carlson wrote:

            
Not true:

 > dput(a)
structure(list(b = c(1988, 1989),
                c = list(c(1985, 1982, 1984),
                         c(1988, 1980))), .Names = c("b", "c"))

 > ab <- data.frame(a$b)
 > ab
    a.b
1 1988
2 1989
 > ab$cb <- a$c
 > ab
    a.b               cb
1 1988 1985, 1982, 1984
2 1989       1988, 1980
 > str(ab)
'data.frame':	2 obs. of  2 variables:
  $ a.b: num  1988 1989
  $ cb :List of 2
   ..$ : num  1985 1982 1984
   ..$ : num  1988 1980

But it seems unlikely that the OP's "a" object is a dataframe since  
the console eval-print loop would not display a dataframe in that  
manner.

At any rate with the ab dataframe:

 > for( i in 1:NROW(ab) ) print(  ab$a.b[i] - ab$cb[[i]] )
[1] 3 6 4
[1] 1 9

The OP should note the need to use '[[' on a list-object to get  
commensurate classes to pass to the '-' operator.
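
The difference between '[' and '[[' can be seen in a small sketch. The object name `cb` mirrors the column above; `ok` and `err` are illustrative names only.

```r
cb <- list(c(1985, 1982, 1984), c(1988, 1980))

class(cb[1])    # "list": single brackets return a one-element list
class(cb[[1]])  # "numeric": double brackets return the vector itself

# Arithmetic works on the extracted numeric vector ...
ok <- 1988 - cb[[1]]

# ... but not on a list: here `-` signals an error, which is captured as text
err <- tryCatch(1988 - cb[1], error = function(e) conditionMessage(e))
```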
#
HI,

If "b" is also an element of the list, then it would be much easier.
b<-c(1988,1989,1990,1991,1992)
c<-list(c(1985,1982,1984),c(1988,1980),c(1983,1984),c(1988,1998,1993),c(1993,1994,1998))
 a<-list(b,c)
 names(a)<-c("b","c")

 lapply(1:length(a$c),function(x) a$b[x]-a$c[[x]])
[[1]]
[1] 3 6 4

[[2]]
[1] 1 9

[[3]]
[1] 7 6

[[4]]
[1]  3 -7 -2

[[5]]
[1] -1 -2 -6
#2nd case

d<-data.frame(a1=c(1998,1924,1938),b1=c(1942,1938,1944))
a2<-list(d=d,c=c)  # name the elements so that a2$d and a2$c resolve

lapply(1:length(c),function(x) a2$d[,1:2]-a2$c[[x]])



A.K.






----- Original Message -----
From: jimi adams <jadams at american.edu>
To: arun <smartpink111 at yahoo.com>
Cc: R help <r-help at r-project.org>
Sent: Thursday, August 9, 2012 5:42 PM
Subject: Re: [R] indexing in data frames

#
Amazing. You have to create the data frame and then add a variable
containing the list to keep R from checking the number of rows and
objecting:

This does not work
data.frame(b = c(1988, 1989),c = list(c(1985, 1982, 1984),c(1988, 1980)))

Nor this
data.frame(a$b, a$c)
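
One likely reason these calls fail is that data.frame() treats a list argument as a set of separate columns, so the two vectors of lengths 3 and 2 are read as columns with different numbers of rows. A possible alternative, offered only as a sketch and not something proposed in this thread, is to wrap the list in I() so that it is kept as a single column. The name `df` is illustrative.

```r
# I() protects the list so data.frame() stores it as one list column
df <- data.frame(b = c(1988, 1989),
                 c = I(list(c(1985, 1982, 1984), c(1988, 1980))))

str(df)              # inspect the structure: 2 observations, 2 variables

# '[[' still extracts the numeric vector for a given row
df$b[1] - df$c[[1]]
```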

-------
David
#
Other way:

b<-c(1988,1989,1990,1991,1992)
c=list(c(1985,1982,1984),c(1988,1980),c(1983,1984),c(1988,1998,1993),c(1993,1994,1998))
 a<-data.frame(b,c1=cbind(c))
a
     b                c
1 1988 1985, 1982, 1984
2 1989       1988, 1980
4 1991 1988, 1998, 1993
5 1992 1993, 1994, 1998
A.K.
#
On Thu, Aug 9, 2012 at 5:30 PM, arun <smartpink111 at yahoo.com> wrote:
Arun,

I've seen you use the 1:length(x) idiom a few times lately and I'd just like to note that

seq_along()

is an (underutilized) primitive and a safer and faster alternative
(avoiding the pathological length(x) = 0 case).
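
A small sketch of the zero-length case, using an empty list `x` and two illustrative counters (`n_colon`, `n_seq`):

```r
x <- list()          # an empty list, so length(x) is 0

1:length(x)          # 1:0 counts down from 1 to 0, giving two values
seq_along(x)         # integer(0), so a loop over it never runs

# Count how many times each loop body executes
n_colon <- 0
for (i in 1:length(x)) n_colon <- n_colon + 1

n_seq <- 0
for (i in seq_along(x)) n_seq <- n_seq + 1
```

With 1:length(x) the loop body runs twice on empty input (and x[[1]] would fail inside it); with seq_along(x) it is skipped entirely.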

Cheers,
Michael
#
And if you are extremely concerned with speed, do
not compute a$b and a$c in every iteration of the loop.
E.g., change
   lapply(seq_along(a$c),function(x) a$b[x]-a$c[[x]])
to something like
   with(a, lapply(seq_along(c), function(x)b[x] - c[[x]]))

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
#
On Aug 10, 2012, at 01:36 , William Dunlap wrote:

            
This seems to be working fine as well:
  b    c  dif
1 1    2   -1
2 2 1, 2 1, 0
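
The commands that produced this table were not preserved in the archive. One way to obtain a data frame of this shape, shown purely as an assumption about what might have been run, is to add the list as a column and then store the element-wise differences as a second list column with Map(). The values of `b` and `c` are read off the table above; the name `d` is illustrative.

```r
# Build the data frame first, then add the list column (as discussed earlier)
d <- data.frame(b = c(1, 2))
d$c <- list(2, c(1, 2))

# Map() walks b and c in parallel and returns a list, one element per row
d$dif <- Map(`-`, d$b, d$c)
d
```

Storing the result as a column keeps each list of differences next to the row it came from, which is close to the a$new result asked for at the start of the thread.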