
creating conditional means

7 messages · Moshe Olshansky, Sherri Heck, Gabor Grothendieck

#
Hi all-

I have a dataset with the columns year, month, hour, co2 (ppm), num1 and num2:


[49,] 2006   11    0 383.3709   28   28
[50,] 2006   11    1 383.3709   28   28
[51,] 2006   11    2 383.3709   28   28
[52,] 2006   11    3 383.3709   28   28
[53,] 2006   11    4 383.3709   28   28
[54,] 2006   11    5 383.3709   28   28
[55,] 2006   11    6 383.3709   28   28
[56,] 2006   11    7 383.3709   28   28
[57,] 2006   11    8 383.3709   28   28
[58,] 2006   11    9 383.3709   27   27
[59,] 2006   11   10 383.3709   28   28

that repeats in this style for each month. I would like to compute the
mean for each hour over three-month intervals, i.e. average all the 2 pm
values for every day in March, April and May, and then do the same for
every other hour of the day.
I have been experimenting with for loops but can't get the output I want.

thanks in advance for any help-

s.heck
CU, Boulder
#
Try aggregate:


Lines <- "Year Month Hour co2 num1 num2
 2006   11    0 383.3709   28   28
 2006   11    1 383.3709   28   28
 2006   11    2 383.3709   28   28
 2006   11    3 383.3709   28   28
 2006   11    4 383.3709   28   28
 2006   11    5 383.3709   28   28
 2006   11    6 383.3709   28   28
 2006   11    7 383.3709   28   28
 2006   11    8 383.3709   28   28
 2006   11    9 383.3709   27   27
 2006   11   10 383.3709   28   28
"
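DF <- read.table(textConnection(Lines), header = TRUE)
# Mean of co2, num1 and num2 for each Year / quarter / Hour combination;
# Qtr is 1 for Jan-Mar, 2 for Apr-Jun, 3 for Jul-Sep, 4 for Oct-Dec
aggregate(DF[4:6],
   with(DF, data.frame(Year, Qtr = (Month - 1) %/% 3 + 1, Hour)),
   mean)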
On Dec 1, 2007 3:57 PM, Sherri Heck <sheck at ucar.edu> wrote:
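As a quick check of the `Qtr` expression above, the sketch below prints the quarter each month maps to and then repeats the aggregation with only the co2 column. It rebuilds the November 2006 sample as a data frame instead of reading `Lines`; selecting `DF["co2"]` and naming the grouping columns are illustrative additions, not part of the original reply.

```r
# Month-to-quarter mapping used in the reply: Jan-Mar -> 1, ..., Oct-Dec -> 4
months <- 1:12
qtr <- (months - 1) %/% 3 + 1
print(setNames(qtr, month.abb))

# Rows from the sample in the thread (November 2006, hours 0-10)
DF <- data.frame(Year = 2006, Month = 11, Hour = 0:10,
                 co2  = 383.3709,
                 num1 = c(rep(28, 9), 27, 28),
                 num2 = c(rep(28, 9), 27, 28))

# DF["co2"] is a one-column data frame, so only co2 is averaged
res <- aggregate(DF["co2"],
                 by = with(DF, list(Year = Year,
                                    Qtr  = (Month - 1) %/% 3 + 1,
                                    Hour = Hour)),
                 FUN = mean)
print(res)
```

Passing a named list to `by` gives the result columns the names Year, Qtr and Hour rather than the default Group.1, Group.2, and so on.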
#
Hi Gabor,

Thank you for your help. I think I need to clarify a bit more. What I am
trying to do is

average all the 2 pm values for March, April and May combined (for example). I hope this is clearer.

Here is a larger subset of my data set:

year, month, hour, co2(ppm), num1,num2

2006 1 0 384.2055 14 14
2006 1 1 384.0304 14 14
2006 1 2 383.9672 14 14
2006 1 3 383.8452 14 14
2006 1 4 383.8594 14 14
2006 1 5 383.7318 14 14
2006 1 6 383.6439 14 14
2006 1 7 383.7019 14 14
2006 1 8 383.7487 14 14
2006 1 9 383.8376 14 14
2006 1 10 383.8684 14 14
2006 1 11 383.8301 14 14
2006 1 12 383.8058 14 14
2006 1 13 383.9419 14 14
2006 1 14 383.7876 14 14
2006 1 15 383.7744 14 14
2006 1 16 383.8566 14 14
2006 1 17 384.1014 14 14
2006 1 18 384.1312 14 14
2006 1 19 384.1551 14 14
2006 1 20 384.099 14 14
2006 1 21 384.1408 14 14
2006 1 22 384.3637 14 14
2006 1 23 384.1491 14 14
2006 2 0 384.7082 27 27
2006 2 1 384.6139 27 27
2006 2 2 384.7453 26 26
2006 2 3 384.9224 28 28
2006 2 4 384.8581 28 28
2006 2 5 384.9208 28 28
2006 2 6 384.9086 28 28
2006 2 7 384.837 28 28
2006 2 8 384.6163 27 27
2006 2 9 384.7406 28 28
2006 2 10 384.7468 28 28
2006 2 11 384.6992 28 28
2006 2 12 384.6388 28 28
2006 2 13 384.6346 28 28
2006 2 14 384.6037 28 28
2006 2 15 384.5295 28 28
2006 2 16 384.5654 28 28
2006 2 17 384.6466 28 28
2006 2 18 384.6344 28 28
2006 2 19 384.5911 28 28
2006 2 20 384.6084 28 28
2006 2 21 384.6318 28 28
2006 2 22 384.6181 27 27
2006 2 23 384.6087 27 27


Thank you again for your assistance,

s.heck
Gabor Grothendieck wrote:
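A minimal sketch of "average all 2 pm values over a set of months", assuming lowercase column names as in the header above. `seasonal_hour_mean` is a hypothetical helper, not something from the thread. Because the posted subset only covers January and February, it is demonstrated on those two months and hour 14 rather than on March to May.

```r
# Hypothetical helper: mean co2 at one hour of the day over a set of months
seasonal_hour_mean <- function(df, months, hour) {
  keep <- df$month %in% months & df$hour == hour
  mean(df$co2[keep])
}

# A few rows from the January/February 2006 data posted above
x <- data.frame(
  year  = 2006,
  month = rep(c(1, 2), each = 3),
  hour  = rep(13:15, times = 2),
  co2   = c(383.9419, 383.7876, 383.7744,
            384.6346, 384.6037, 384.5295)
)

# For March-May this would be seasonal_hour_mean(x, 3:5, 14);
# the posted subset only has Jan and Feb, so those months are used here
jf_2pm <- seasonal_hour_mean(x, c(1, 2), 14)
print(jf_2pm)   # (383.7876 + 384.6037) / 2
```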
#
Just adjust the formula for Qtr appropriately if your quarters
are not Jan/Feb/Mar, Apr/May/Jun, Jul/Aug/Sep, Oct/Nov/Dec
as I assumed.
On Dec 1, 2007 5:21 PM, Sherri Heck <sheck at ucar.edu> wrote:
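One way to make the groups start in March, assuming seasons Mar-May, Jun-Aug, Sep-Nov and Dec-Feb, is to shift the month and wrap it with `%% 12` before dividing. `season_of` is a hypothetical name. For comparison, the unwrapped variant `(Month - 3) %/% 3 + 1`, which appears later in the thread, gives January and February the index 0 but December the index 4.

```r
# Season index with March as the first month:
# Mar-May -> 1, Jun-Aug -> 2, Sep-Nov -> 3, Dec-Feb -> 4
season_of <- function(month) ((month - 3) %% 12) %/% 3 + 1

months <- 1:12
season <- season_of(months)
print(setNames(season, month.abb))

# Without the %% 12 wrap, Jan and Feb land in group 0 but Dec in group 4,
# so the winter months are split across two groups
unwrapped <- (months - 3) %/% 3 + 1
print(setNames(unwrapped, month.abb))
```

If Year is also a grouping variable, this still combines December with January and February of the same calendar year; treating December as part of the following winter would require shifting the year as well, which this sketch does not do.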
1 day later
#
Following Gabor's suggestion, if x is your data frame, you can do:

# keep only March, April and May
y <- x[x$month %in% c(3,4,5),]
# mean of columns 4-6 (co2, num1, num2) for each hour
aggregate(y[,4:6],list(y$hour),mean)
--- Sherri Heck <sheck at ucar.edu> wrote:
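The same subset-then-aggregate idea, restricted to co2 and with a named grouping column, is sketched below. It uses six rows (hours 13 to 15 of January and February 2006) from the data posted earlier in the thread; for March to May, `c(3, 4, 5)` would replace `c(1, 2)`.

```r
x <- data.frame(
  year  = 2006,
  month = rep(c(1, 2), each = 3),
  hour  = rep(13:15, times = 2),
  co2   = c(383.9419, 383.7876, 383.7744,
            384.6346, 384.6037, 384.5295)
)

# Keep only the months of interest (Jan and Feb here, because
# that is what the posted sample covers)
y <- x[x$month %in% c(1, 2), ]

# y["co2"] restricts the result to co2; naming the grouping list
# gives the output column the name "hour" instead of "Group.1"
by_hour <- aggregate(y["co2"], by = list(hour = y$hour), FUN = mean)
print(by_hour)
```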

            
3 days later
#
Hi Gabor,

I was able to get your suggestion to work. I have been going through
the R help pages to figure out what each step actually does, because I
have similar data in which hours 2, 5, 8, 11, 14, 17 and 20 are missing.
I haven't had any luck: each "mean value" that is calculated is the
same, and I keep getting the following error:

"> DF<- read.table(textConnection(Lines), header = TRUE)
Error in read.table(textConnection(Lines), header = TRUE) :
        duplicate 'row.names' are not allowed
 >   aggregate(DF[2:4],
+    with(DF, data.frame(Year, Qtr = (Month - 3) %/% 3 + 1, Hour)),
+    mean)    #skip=hour[2,5,8,11,14]
Error in data.frame(Year, Qtr = (Month - 3)%/%3 + 1, Hour) :
        object "Year" not found
"

I am not clear why, in aggregate(DF[#:#], ...), we are subsetting
variables other than co2. I have been trying to subset just co2,
without success.
Your original suggestion is below, and a snippet of my data set is
below that. If you have any ideas, or know of a help page I may not
have found yet, that would be great (I've mostly been using the
aggregate help page).

thanks for your help-

s.heck



Lines <- "Year Month Hour co2 num1 num2
 2006   11    0 383.3709   28   28
 2006   11    1 383.3709   28   28
 2006   11    2 383.3709   28   28
 2006   11    3 383.3709   28   28
 2006   11    4 383.3709   28   28
 2006   11    5 383.3709   28   28
 2006   11    6 383.3709   28   28
 2006   11    7 383.3709   28   28
 2006   11    8 383.3709   28   28
 2006   11    9 383.3709   27   27
 2006   11   10 383.3709   28   28
"
DF <- read.table(textConnection(Lines), header = TRUE)
aggregate(DF[4:6],
   with(DF, data.frame(Year, Qtr = (Month - 1) %/% 3 + 1, Hour)),
   mean)			#skip=hour[2,5,8,11,14,17,20]???


 



Year Month Hour co2
2005    1    0    386.1600708
2005    1    1    386.823056
2005    1    3    387.1335939
2005    1    4    387.0681103
2005    1    6    387.4750983
2005    1    7    388.3398313
2005    1    9    388.7545317
2005    1    10    388.0844451
2005    1    12    386.7929627
2005    1    13    385.5569521
2005    1    15    384.5523752
2005    1    16    385.0246721
2005    1    18    385.8646669
2005    1    19    386.2182493
2005    1    21    386.4820756
2005    1    22    386.6606276
2005    2    0    386.6791667
2005    2    1    386.6597544
2005    2    3    386.5725303
2005    2    4    387.0638611
2005    2    6    387.9293508
2005    2    7    388.3778991
2005    2    9    388.3721947
2005    2    10    387.8324642
2005    2    12    386.8404892
2005    2    13    385.6770345
2005    2    15    384.4798484
2005    2    16    384.6214677
2005    2    18    384.3044105
2005    2    19    383.3018709
2005    2    21    382.5837339
2005    2    22    382.2658036
Gabor Grothendieck wrote:
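Two observations from the data in this message, with a sketch. First, every co2 value in the `Lines` sample is 383.3709, so any mean computed from that sample is also 383.3709. Second, `aggregate` only forms groups for combinations that actually occur, so missing hours such as 2 and 5 simply do not appear in the result and no explicit skipping is needed. In the four-column data above, co2 is column 4, so `DF[2:4]` would also average Month and Hour. The sketch reads a few of the posted 2005 rows with `read.table(text = ...)` (an alternative to `textConnection`) and averages co2 only.

```r
# A few rows of the 2005 data posted above; hours 2, 5, ... are absent
Lines <- "Year Month Hour co2
2005 1 0 386.1600708
2005 1 1 386.823056
2005 1 3 387.1335939
2005 1 4 387.0681103
2005 2 0 386.6791667
2005 2 1 386.6597544
2005 2 3 386.5725303
2005 2 4 387.0638611
"
# Four names in the header and four fields per row, so no row names are taken
DF <- read.table(text = Lines, header = TRUE)

# DF["co2"] averages only co2; hours that never occur produce no group
res <- aggregate(DF["co2"],
                 by = with(DF, list(Year = Year,
                                    Qtr  = (Month - 1) %/% 3 + 1,
                                    Hour = Hour)),
                 FUN = mean)
print(res)
```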
#
The error message says you have duplicate row names and that
is not allowed. Make sure you have the same number of elements
on each line of data as in the header. If each data line has one more
field than the header, then the first data item on each line will be
regarded as the row name. See ?count.fields.

The rest of your message is not clear.
On Dec 6, 2007 11:52 AM, Sherri Heck <sheck at ucar.edu> wrote:
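The row-name rule described above can be reproduced with a deliberately mismatched header. In this sketch the three-name header is constructed for illustration: `count.fields` shows one more field on each data line than in the header, so `read.table` uses the first field (the year) as the row name and stops because that value repeats. Adding the missing name to the header removes the mismatch.

```r
# Header with three names, data lines with four fields (constructed example)
Lines <- "Month Hour co2
2005 1 0 386.1600708
2005 1 1 386.823056
"
n_fields <- count.fields(textConnection(Lines))
print(n_fields)   # header line first, then each data line

# One extra field per data line: read.table takes the first field (2005)
# as the row name, and duplicated row names are an error
msg <- tryCatch(read.table(text = Lines, header = TRUE),
                error = function(e) conditionMessage(e))
print(msg)

# Naming every field in the header fixes the mismatch
Fixed <- sub("^Month", "Year Month", Lines)
DF <- read.table(text = Fixed, header = TRUE)
print(DF)
```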