How to use tapply with more than one variables grouped

19 messages · noobmin, Bert Gunter, PIKAL Petr +2 more

#
I'm teaching myself R for data preparation. I found an MIT course on data
preparation that uses Python, but I'm working through it in R to learn. The
first exercise is to prepare data from a database of contributions made to
candidates for U.S. president. The database format is described at
ftp://ftp.fec.gov/FEC/Presidential_Map/2012/DATA_DICTIONARIES/CONTRIBUTOR_FORMAT.txt
How can I print, in R, a table showing in how many states President Obama is
the top candidate (by total amount of donations received)?

I tried using tapply, but I don't understand how to work with more than one
grouping variable. Could anyone help? Thanks in advance.



--
View this message in context: http://r.789695.n4.nabble.com/How-to-use-tapply-with-more-than-one-variables-grouped-tp4646948.html
Sent from the R help mailing list archive at Nabble.com.
#
Hi
How did you use tapply? Did you read help page? It points to ?aggregate which is maybe what you are looking for.

Regards
Petr
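
For readers following along, here is a minimal sketch of tapply with two grouping variables, next to the aggregate call that the help page points to. The data frame `contrib` and its values are made up for illustration; only the column names come from the FEC file mentioned in the question.

```r
# Hypothetical toy data (values invented; column names as in the FEC file).
contrib <- data.frame(
  cand_nm           = c("Obama", "Obama", "Doug", "Doug", "Obama", "Michele"),
  contbr_st         = c("CA", "CA", "CA", "AL", "NY", "AL"),
  contb_receipt_amt = c(300, 220, 250, 250, 600, 250),
  stringsAsFactors  = FALSE
)

# tapply: pass the grouping variables as a list. The result is a
# candidate-by-state matrix, with NA where a combination has no rows.
totals <- tapply(contrib$contb_receipt_amt,
                 list(contrib$cand_nm, contrib$contbr_st),
                 sum)

# aggregate: the same sums in long format, one row per observed combination.
totals_long <- aggregate(contb_receipt_amt ~ cand_nm + contbr_st,
                         data = contrib, FUN = sum)
```

The matrix form is convenient for column-wise comparisons between candidates; the long form is easier to merge or filter.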
#
interTable <- data.frame(tapply(data$contb_receipt_amt, list(data$cand_nm,
data$contbr_st), sum))

I managed to create a table with the total contribution (contb_receipt_amt) of
each presidential candidate (cand_nm) in each state (contbr_st). From
interTable, how could I create a table of the states where the candidate
'Obama' received the greatest contribution?

thanks



#
Hi
Greater than what? What does the table look like? Sorry, I forgot my crystal ball at home.

Maybe you want to look at

?"]"

Regards
Petr
#
Inline.
On Mon, Oct 22, 2012 at 6:55 AM, PIKAL Petr <petr.pikal at precheza.cz> wrote:
-- or ?"["  rather.  :-)

-- Bert

  
    
#
Hi
THX
Correct, I forgot to check.
Regards
Petr
#
I believe my previous message was not understood. To make it easier, I'll give
an example. Suppose my table is as shown below, with the amount each
presidential candidate received in a given state.


         AL  AR  CA  NY
Doug     250 250 250  NA
Jennifer  20 340 300 100
Michele  250 500 250  60
Obama     15  45 520 600

I would like to list the states where Obama received the highest amount (i.e.
CA and NY) and also the number of such states, in this case 2. How can I do this?

Thanks



#
I used these commands previously:

data <- read.csv("test.csv")
         AL  AR  CA  NY
Doug     250 250 250  NA
Jennifer  20 340 300 100
Michele  250 500 250  60
Obama     15  45 520 600



#
Hi

and what is wrong?

Petr
#
Hi,

If the criteria is to pick which among the following states are the top 2 contributors for each candidate,
dat1<-read.table(text="
         AL  AR  CA  NY
Doug     250 250 250  NA
Jennifer  20 340 300 100
Michele  250 500 250  60
Obama     15  45 520 600
",header=TRUE,stringsAsFactors=FALSE,sep="")

#for Obama
apply(dat1,1,function(x,n) x[which(rank(x)>length(x)-n)],n=2)[4]
#$Obama
# CA  NY 
#520 600 

Your question was to list the states where Obama has higher amount received compared to ??

A.K.


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
For this example I reduced the number of records drastically. In the
original database there are 48 000 candidates and dozens of states, so there is
no way to analyze the data by eye, and I would not post 400 MB of tables here.
But based on the example, how could I list the states where Obama received the
largest contribution?




#
Hi,

Suppose if you have a threshold (say >500), then:
dat1<-read.table(text="
         AL  AR  CA  NY
Doug     250 250 250  NA
Jennifer  20 340 300 100
Michele  250 500 250  60
Obama     15  45 520 600
",header=TRUE,stringsAsFactors=FALSE,sep="")
res<-unlist(lapply(split(dat1,rownames(dat1)),function(x) x[x[!is.na(x)]>500]))
res
#Obama.CA Obama.NY 
#     520      600 


# And suppose the threshold is >400
res1<-unlist(lapply(split(dat1,rownames(dat1)),function(x) x[x[!is.na(x)]>400]))
res1
#Michele.AR   Obama.CA   Obama.NY 
#       500        520        600 

res1[grep("Obama",names(res1))] #amount received for Obama 
#Obama.CA Obama.NY 
#     520      600 
length(res1[grep("Obama",names(res1))])
#[1] 2
A.K.
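
As an aside, the same threshold lookup can be sketched with which(..., arr.ind = TRUE) on a matrix, which treats NA comparisons as FALSE and so avoids subsetting around missing cells. This is only an alternative illustration, not the method used above; the matrix `totals` simply re-enters the example table, and the names `hits` and `above` are made up here.

```r
# The example table from the thread, entered column by column.
totals <- matrix(c(250,  20, 250,  15,
                   250, 340, 500,  45,
                   250, 300, 250, 520,
                    NA, 100,  60, 600),
                 nrow = 4,
                 dimnames = list(c("Doug", "Jennifer", "Michele", "Obama"),
                                 c("AL", "AR", "CA", "NY")))

# Row/column positions of every cell above the threshold (NA cells are skipped).
hits <- which(totals > 400, arr.ind = TRUE)

# Turn the positions into a readable candidate/state/amount table.
above <- data.frame(candidate = rownames(totals)[hits[, "row"]],
                    state     = colnames(totals)[hits[, "col"]],
                    amount    = totals[hits],
                    stringsAsFactors = FALSE)

# Rows for Obama only, and how many there are.
obama_above <- above[above$candidate == "Obama", ]
nrow(obama_above)  # 2
```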





#
The criterion is to list the states where Obama received the most in
contributions. The table shows the contributions that each presidential
candidate received in each state.

The table shown is only an example; the query should be generic enough for a
database with hundreds of candidates and dozens of states. The original
database is 450 MB. In the real database I don't know in how many states
Obama has the most donations, but in the sample it is CA and NY. Michele
receives the most in AR.

Thanks



#
Hi,

Your question is not clear.

Suppose if you want to find the highest two contributions for each candidate:

dat1<-read.table(text="
         AL  AR  CA  NY
Doug     250 250 250  NA
Jennifer  20 340 300 100
Michele  250 500 250  60
Obama     15  45 520 600
",header=TRUE,stringsAsFactors=FALSE,sep="")

res1<-unlist(lapply(split(dat1,rownames(dat1)),function(x) tail(apply(x,1,sort),2)))
nam1<-unlist(lapply(lapply(split(dat1,rownames(dat1)),function(x) tail(apply(x,1,sort),2)),function(x) dimnames(x)[1]),use.names=F)
names(res1)<-paste(names(res1),nam1,sep="_")
names(res1)<-gsub("\\d+","",names(res1))
res1
#    Doug_AR     Doug_CA Jennifer_CA Jennifer_AR  Michele_CA  Michele_AR 
#        250         250         300         340         250         500 
#   Obama_CA    Obama_NY 
#        520         600 


#Contribution for Obama
res1[grep("Obama",names(res1))]
#Obama_CA Obama_NY 
#     520      600 

A.K.



#
You're posting on Nabble, so we don't see earlier messages in the thread here.

-- Bert
On Tue, Oct 23, 2012 at 11:36 AM, noobmin <pseudovoid at hotmail.com> wrote:

  
    
#
         AL  AR  CA  NY
Doug     250 250 250  NA
Jennifer  20 340 300 100
Michele  250 500 250  60
Obama     15  45 520 600

My English is not very good, so I'll try again. I want to list ALL the states
in the country where Obama received the largest contribution. The table above
shows the total contribution received by each candidate in a given state. In
AL, Obama did not receive more than Doug. In AR, he did not receive more than
the other candidates. In CA, he received a total of $520, and since
520>300>250>=250, CA should be selected. In NY he also had the largest
contribution, $600 (600>100>60), so NY should be selected as well.

I want to do this for N presidential candidates and M states. The table
above is only an example.

Sorry again; to me it was clear. =(
Thanks
Thanks



#
On Oct 23, 2012, at 1:25 PM, noobmin wrote:

            
Perhaps:
   AL    AR    CA    NY 
FALSE FALSE  TRUE  TRUE 

Or perhaps:
[1] "CA" "NY"
David Winsemius, MD
Alameda, CA, USA
#
Hi,

Just a modification of David's method:

apply(dat1,2,function(x) names(which.max(x[!is.na(x)]))=="Obama")
#   AL    AR    CA    NY 
#FALSE FALSE  TRUE  TRUE 
names(dat1)[apply(dat1,2,function(x) names(which.max(x[!is.na(x)]))=="Obama")] 
#[1] "CA" "NY"
A.K.
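
To also get the number of states the original question asked for, the approach above could be extended roughly as follows. This is a sketch under assumptions: `totals` re-enters the example table as a matrix (as tapply would return it), the names `top` and `obama_states` are made up here, and every state is assumed to have at least one non-NA total.

```r
# The example table from the thread, entered column by column.
totals <- matrix(c(250,  20, 250,  15,
                   250, 340, 500,  45,
                   250, 300, 250, 520,
                    NA, 100,  60, 600),
                 nrow = 4,
                 dimnames = list(c("Doug", "Jennifer", "Michele", "Obama"),
                                 c("AL", "AR", "CA", "NY")))

# Name of the top candidate in each state. which.max() skips NA values
# and returns the first maximum when there is a tie (Doug in AL here).
top <- apply(totals, 2, function(x) names(which.max(x)))

# States where that top candidate is Obama, and how many there are.
obama_states <- names(top)[top == "Obama"]
obama_states          # "CA" "NY"
length(obama_states)  # 2
```

Because the candidate is matched by name rather than row position, the same lines should carry over to a table with N candidates and M states; ties go to whichever candidate appears first in the row order.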


