Hi,

I am new to R, so I would appreciate any help. I have some passenger flight data between city pairs. The way I got the data, there are multiple rows for each city pair; the number of passengers needs to be summed to get a TOTAL annual passenger count for each city pair.

So my question is: how do I create a new table (or data frame) that selectively sums

My initial thought would be to iterate through each row with the following logic:

1. If the ORIGIN_WAC and DEST_WAC pair is not in the new table, then add it to the table.
2. If the ORIGIN_WAC and DEST_WAC pair already exists, then sum the passengers (and do not add a new row).

Is this logical? If so, I think I just need some help on syntax (or do I use a script?). Thanks.

The first few rows of data look like this:

--
View this message in context: http://r.789695.n4.nabble.com/How-to-selectively-sum-rows-Beginner-question-tp3933512p3933512.html
Sent from the R help mailing list archive at Nabble.com.
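For readers following along, the two-step logic in the question can be written out literally as a loop. This is only a sketch of that idea, not a recommended approach; the toy values and the object names `flights` and `totals` are made up for illustration, and the column name PASSENGERS is an assumption.

```r
# Toy data (hypothetical values); ORIGIN_WAC and DEST_WAC identify a city pair.
flights <- data.frame(
  ORIGIN_WAC = c(91, 91, 778, 5),
  DEST_WAC   = c(778, 778, 91, 778),
  PASSENGERS = c(100, 50, 70, 30)
)

# Start with an empty result table that has the same columns.
totals <- flights[0, ]

for (i in seq_len(nrow(flights))) {
  # Which row of the result table (if any) already holds this pair?
  hit <- which(totals$ORIGIN_WAC == flights$ORIGIN_WAC[i] &
               totals$DEST_WAC   == flights$DEST_WAC[i])
  if (length(hit) == 0) {
    # Step 1: pair not seen yet, so add it as a new row.
    totals <- rbind(totals, flights[i, ])
  } else {
    # Step 2: pair already present, so add to its running sum.
    totals$PASSENGERS[hit] <- totals$PASSENGERS[hit] + flights$PASSENGERS[i]
  }
}

print(totals)
```

The logic is sound, but growing a data frame row by row is slow on large inputs, which is why grouped summation functions are usually preferred in R.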
How to selectively sum rows [Beginner question]
5 messages · asindc, jim holtman, Aaron Siirila +1 more
It would be good to follow the posting guide and at least supply a sample of the data. Most likely 'tapply' is one way of doing it:

tapply(df$passenger, list(df$orig, df$dest), sum)
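To make the tapply suggestion concrete, here is a minimal sketch on toy data. The column names `passenger`, `orig` and `dest` are the placeholder names from the reply above, and the values are invented; substitute the real column names from your own data frame.

```r
# Toy data using the placeholder column names from the reply (values are made up).
df <- data.frame(
  orig      = c("LAX", "LAX", "ICN", "SPN"),
  dest      = c("ICN", "ICN", "LAX", "ICN"),
  passenger = c(100, 50, 70, 30),
  stringsAsFactors = FALSE
)

# Sum passengers within each (orig, dest) combination.
# The result is a matrix with origins as rows and destinations as columns.
totals <- tapply(df$passenger, list(df$orig, df$dest), sum)
print(totals)
```

Note that tapply returns a cross-tabulated matrix rather than a data frame, and combinations that never occur in the data (for example ICN to ICN here) show up as NA.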
On Mon, Oct 24, 2011 at 11:27 AM, asindc <siirilaa at eastwestcenter.org> wrote:
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Sorry, I attempted to paste the sample data but it must have been stripped out when I posted. It is hopefully now listed below. tapply looks useful. I will check it out further. Here's the sample data:
flights[1:10,]

   PASSENGERS DISTANCE ORIGIN ORIGIN_CITY_NAME   ORIGIN_WAC DEST DEST_CITY_NAME     DEST_WAC YEAR
1       17266     5995 LAX    Los Angeles, CA            91 ICN  Seoul, South Korea      778 2010
2       16934     5995 LAX    Los Angeles, CA            91 ICN  Seoul, South Korea      778 2010
3       15470     5995 LAX    Los Angeles, CA            91 ICN  Seoul, South Korea      778 2010
4       13997     5995 ICN    Seoul, South Korea        778 LAX  Los Angeles, CA          91 2010
5       13738     5995 LAX    Los Angeles, CA            91 ICN  Seoul, South Korea      778 2010
6       13682     5995 LAX    Los Angeles, CA            91 ICN  Seoul, South Korea      778 2010
7       13187     5995 ICN    Seoul, South Korea        778 LAX  Los Angeles, CA          91 2010
8       13051     5995 LAX    Los Angeles, CA            91 ICN  Seoul, South Korea      778 2010
9       12761     1940 SPN    Saipan, TT                  5 ICN  Seoul, South Korea      778 2010
10      12419     5995 ICN    Seoul, South Korea        778 LAX  Los Angeles, CA          91 2010

Thanks,
Aaron

-----Original Message-----
From: jim holtman [mailto:jholtman at gmail.com]
Sent: Monday, October 24, 2011 11:58 AM
To: asindc
Cc: r-help at r-project.org
Subject: Re: [R] How to selectively sum rows [Beginner question]

It would be good to follow the posting guide and at least supply a sample of the data. Most likely 'tapply' is one way of doing it:

tapply(df$passenger, list(df$orig, df$dest), sum)
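Since the question asks for a new data frame rather than a matrix, base R's aggregate() is another option worth a look. The sketch below is not from the thread; it rebuilds only the three relevant columns of the ten sample rows posted above and sums PASSENGERS for each ORIGIN_WAC/DEST_WAC pair. The object names `flights` and `totals` are just illustrative.

```r
# The ten sample rows from the post, reduced to the columns needed here.
flights <- data.frame(
  PASSENGERS = c(17266, 16934, 15470, 13997, 13738,
                 13682, 13187, 13051, 12761, 12419),
  ORIGIN_WAC = c(91, 91, 91, 778, 91, 91, 778, 91, 5, 778),
  DEST_WAC   = c(778, 778, 778, 91, 778, 778, 91, 778, 778, 91)
)

# One output row per (ORIGIN_WAC, DEST_WAC) pair, with PASSENGERS summed.
totals <- aggregate(PASSENGERS ~ ORIGIN_WAC + DEST_WAC, data = flights, FUN = sum)
print(totals)
```

Each direction is kept as its own pair, so 91 to 778 and 778 to 91 are summed separately. Only pairs that actually occur in the data appear in the result, so there are no NA cells to clean up.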
See the count() function in the plyr package; it does fast summation.
Something like
library('plyr')
count(passengerData, c('ORIGIN_WAC', 'DEST_WAC'), 'npassengers')
HTH,
Dennis
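For anyone trying this against the sample data posted earlier in the thread, a minimal sketch follows. It assumes the plyr package is installed, and it uses the column name PASSENGERS from the sample rather than the placeholder 'npassengers'; the object names `flights` and `pair_totals` are only illustrative.

```r
library(plyr)  # assumption: plyr is installed, e.g. via install.packages("plyr")

# The ten sample rows from the thread, reduced to the columns needed here.
flights <- data.frame(
  PASSENGERS = c(17266, 16934, 15470, 13997, 13738,
                 13682, 13187, 13051, 12761, 12419),
  ORIGIN_WAC = c(91, 91, 91, 778, 91, 91, 778, 91, 5, 778),
  DEST_WAC   = c(778, 778, 778, 91, 778, 778, 91, 778, 778, 91)
)

# count() groups by the columns in 'vars'; with 'wt_var' set it sums that
# column within each group instead of counting rows. The sums are returned
# in a column named 'freq'.
pair_totals <- count(flights, vars = c("ORIGIN_WAC", "DEST_WAC"), wt_var = "PASSENGERS")
print(pair_totals)
```

The summed column comes back as `freq`, so you may want to rename it, for example with names(pair_totals)[3] <- "PASSENGERS".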
The count() function in the plyr package works beautifully. Thanks to Jim,
Rainer and Dennis for your help.
Best.
-----Original Message-----
From: Dennis Murphy [mailto:djmuser at gmail.com]
Sent: Monday, October 24, 2011 12:05 PM
To: asindc
Cc: r-help at r-project.org
Subject: Re: [R] How to selectively sum rows [Beginner question]
See the count() function in the plyr package; it does fast summation.
Something like
library('plyr')
count(passengerData, c('ORIGIN_WAC', 'DEST_WAC'), 'npassengers')
HTH,
Dennis