monthly median in a daily dataset
On 21.12.2010 08:15, SNV Krishna wrote:
Hi Dennis, I am looking for a similar function and this post is useful. But something strange happens when I try it, which I couldn't figure out (details below). Could you or anyone help me understand why this is so?
df = data.frame(date = seq(as.Date("2010-1-1"), by = "days", length = 250))
df$value = cumsum(rnorm(250))
When I use the statement (as given in ?aggregate help file) the following error is displayed
aggregate(df$value, by = months(df$date), FUN = median)
Error in aggregate.data.frame(as.data.frame(x), ...) : 'by' must be a list
The error message is quite helpful: you need a list of all the elements you'd have after the "~" in a formula, in this case only the date:
aggregate(df$value, by = list(date = months(df$date)), FUN = median)
But it works when I use as was suggested
aggregate(value~months(date), data = df, FUN = median)
  months(date)      value
1        April 15.5721440
2       August -0.1261205
3     February -1.0230631
4      January -0.9277885
5         July -2.1890907
6         June  1.3045260
7        March 11.4126371
8          May  2.1625091
My second question: is it possible to get the median by month and year? Say I have daily data for the last five years; the above function will give me the median of January across all five years, while I want Jan-2010, Jan-2009 and so on. I hope my question is clear.
Just use Year-Month as the grouping criterion as follows:
aggregate(x=df$value, by = list(date = format(df$date, "%Y-%m")), FUN = median)
Uwe Ligges
Any assistance will be greatly appreciated and many thanks for the same.
Regards,
Krishna
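The two points from Uwe's replies (that `by` must be a list, and that a year-month key separates January 2010 from January 2009) can be sketched on simulated data. The two-year date range, the seed and the object names `by_month` and `by_ym` are assumptions made for illustration only; they do not come from the thread.

```r
# Simulated daily series covering 2009 and 2010 (made-up data).
set.seed(42)
df <- data.frame(date = seq(as.Date("2009-01-01"), by = "day", length.out = 730))
df$value <- cumsum(rnorm(nrow(df)))

# 'by' must be a list; the list name becomes the name of the grouping column.
# Grouping on month name alone pools January 2009 with January 2010.
by_month <- aggregate(df$value, by = list(month = months(df$date)), FUN = median)

# A "YYYY-MM" key keeps each calendar month of each year separate.
by_ym <- aggregate(x = df$value,
                   by = list(date = format(df$date, "%Y-%m")),
                   FUN = median)

nrow(by_month)  # one row per month name
nrow(by_ym)     # one row per year-month combination
head(by_ym)
```

A side benefit of the "%Y-%m" key is that it sorts chronologically, whereas month names sort alphabetically.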
Date: Sun, 19 Dec 2010 15:42:15 -0800
From: Dennis Murphy<djmuser at gmail.com>
To: HUXTERE<emilyhuxter at gmail.com>
Cc: r-help at r-project.org
Subject: Re: [R] monthly median in a daily dataset
Message-ID:
<AANLkTimXTJhbsE1mq4o121fEKXTf8d1pSYEEgzKKzdYu at mail.gmail.com>
Content-Type: text/plain
Hi:
There is a months() function associated with Date objects, so you should be
able to do something like
aggregate(value ~ months(date), data = data$flow$daily, FUN = median)
Here's a toy example because your data are not in a ready form:
df <- data.frame(date = seq(as.Date('2010-01-01'), by = 'days', length = 250),
                 val = rnorm(250))
aggregate(val ~ months(date), data = df, FUN = median)
  months(date)         val
1        April -0.18864817
2       August -0.16203705
3     February  0.03671700
4      January  0.04500988
5         July -0.12753151
6         June  0.09864811
7        March  0.23652105
8          May  0.25879994
9    September  0.53570764
HTH,
Dennis
On Sun, Dec 19, 2010 at 2:31 PM, HUXTERE<emilyhuxter at gmail.com> wrote:
Hello,
I have a multi-year dataset (see below) with date, a data value and a flag for the data value. I want to find the monthly median for each month in this dataset and then plot it. If anyone has suggestions they would be greatly appreciated. It should be noted that there are some dates with no values and they should be removed.
Thanks
Emily
print ( str(data$flow$daily) )
'data.frame': 16071 obs. of 3 variables:
 $ date :Class 'Date' num [1:16071] -1826 -1825 -1824 -1823 -1822 ...
 $ value: num NA NA NA NA NA NA NA NA NA NA ...
 $ flag : chr "" "" "" "" ...
NULL
520 2008-11-01 0.034
1041 2008-11-02 0.034
1562 2008-11-03 0.034
2083 2008-11-04 0.038
2604 2008-11-05 0.036
3125 2008-11-06 0.035
3646 2008-11-07 0.036
4167 2008-11-08 0.039
4688 2008-11-09 0.039
5209 2008-11-10 0.039
5730 2008-11-11 0.038
6251 2008-11-12 0.039
6772 2008-11-13 0.039
7293 2008-11-14 0.038
7814 2008-11-15 0.037
8335 2008-11-16 0.037
8855 2008-11-17 0.037
9375 2008-11-18 0.037
9895 2008-11-19 0.034 B
10415 2008-11-20 0.034 B
10935 2008-11-21 0.033 B
11455 2008-11-22 0.034 B
11975 2008-11-23 0.034 B
12495 2008-11-24 0.034 B
13016 2008-11-25 0.034 B
13537 2008-11-26 0.033 B
14058 2008-11-27 0.033 B
14579 2008-11-28 0.033 B
15068 2008-11-29 0.034 B
15546 2008-11-30 0.035 B
521 2008-12-01 0.035 B
1042 2008-12-02 0.034 B
1563 2008-12-03 0.033 B
2084 2008-12-04 0.031 B
2605 2008-12-05 0.031 B
3126 2008-12-06 0.031 B
3647 2008-12-07 0.032 B
4168 2008-12-08 0.032 B
4689 2008-12-09 0.032 B
5210 2008-12-10 0.033 B
5731 2008-12-11 0.033 B
6252 2008-12-12 0.032 B
6773 2008-12-13 0.031 B
7294 2008-12-14 0.030 B
7815 2008-12-15 0.030 B
8336 2008-12-16 0.029 B
8856 2008-12-17 0.028 B
9376 2008-12-18 0.028 B
9896 2008-12-19 0.028 B
10416 2008-12-20 0.027 B
10936 2008-12-21 0.027 B
11456 2008-12-22 0.028 B
11976 2008-12-23 0.028 B
12496 2008-12-24 0.029 B
13017 2008-12-25 0.029 B
13538 2008-12-26 0.029 B
14059 2008-12-27 0.030 B
14580 2008-12-28 0.030 B
15069 2008-12-29 0.030 B
15547 2008-12-30 0.031 B
15851 2008-12-31 0.031 B
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
------------------------------
Message: 35
Date: Mon, 20 Dec 2010 00:03:24 +0000
From: "Enrico R. Crema"<enryu_crema at yahoo.it>
To: r-help at r-project.org
Subject: [R] Time Series of Histograms
Content-Type: text/plain; charset=us-ascii
Dear List,
I have a set of distributions recorded at an equal interval of time and I
would like to plot them as series of horizontal histograms (with the x-axis
representing time, and y-axis representing the bins) since the distribution
shifts from unimodal to multimodal on several occasions. What I would like
to see is something close to a violinplot, but I do not want a kernel
density estimate...
Thanks in Advance,
Enrico
------------------------------
Message: 36
Date: Mon, 20 Dec 2010 00:21:02 +0000
From: Paolo Rossi<statmailinglists at googlemail.com>
To: r-help at r-project.org
Subject: [R] Turning a Variable into String
Message-ID:
<AANLkTi=fa+982ZNie+z-iDwURvq8Ee5zJoQ7OPXLhMK6 at mail.gmail.com>
Content-Type: text/plain
I would like to know how to turn a variable into a string. I have tried
as.symbol and as.name but it doesn't work for what I'd like to do.
Essentially, I'd like to feed the function below with two variables. This
works fine in the bit working out number of elements in each variable.
In the print(sprintf("OK with %s and %s\n", var1, var2)) line I would like
var1 and var2 to be magically substituted with a string containing the name
of var1 and name of var2.
Thanks in advance
Paolo
haveSameLength <- function(var1, var2) {
  if (length(var1) == length(var2))
  {
    print(sprintf("OK with %s and %s\n", var1, var2))
  } else {
    print("Problems!!")
  }
}
------------------------------
Message: 37
Date: Sun, 19 Dec 2010 16:30:38 -0800 (PST)
From: Phil Spector<spector at stat.berkeley.edu>
To: Paolo Rossi<statmailinglists at googlemail.com>
Cc: r-help at r-project.org
Subject: Re: [R] Turning a Variable into String
Message-ID:
<alpine.DEB.2.00.1012191627170.26060 at springer.Berkeley.EDU>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Paolo -
One way to make the function do what you want is to replace
the line
print(sprintf("OK with %s and %s\n", var1, var2))
with
cat('OK with',substitute(var1),'and',substitute(var2),'\n')
With sprintf, you'd need
print(sprintf("OK with %s and %s\n", deparse(substitute(var1)),
deparse(substitute(var2))))
but since you're just printing the string returned by sprintf, I'd
go with cat.
- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spector at stat.berkeley.edu
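Phil's suggestion can be put together into a runnable variant of the function from the question. This is only a sketch: the invisible return value and the message wording in the failure branch are additions made here for illustration, not part of the original posts.

```r
# Variant of haveSameLength() that reports the names of its arguments.
haveSameLength <- function(var1, var2) {
  # substitute() captures the unevaluated argument expression;
  # deparse() converts that expression to a character string.
  name1 <- deparse(substitute(var1))
  name2 <- deparse(substitute(var2))
  if (length(var1) == length(var2)) {
    msg <- sprintf("OK with %s and %s", name1, name2)
  } else {
    msg <- sprintf("Problems with %s and %s", name1, name2)
  }
  cat(msg, "\n")
  invisible(msg)  # returned invisibly so the caller can reuse the text
}

a <- 1:3
b <- c(4, 5, 6)
res_same <- haveSameLength(a, b)     # both have length 3
res_diff <- haveSameLength(a, 1:10)  # lengths 3 and 10
```

Using cat() for the output avoids the quoted, escaped form that print() gives for a string containing "\n".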
------------------------------
Message: 38
Date: Sun, 19 Dec 2010 19:35:28 -0500
From: Duncan Murdoch<murdoch.duncan at gmail.com>
To: Paolo Rossi<statmailinglists at googlemail.com>
Cc: r-help at r-project.org
Subject: Re: [R] Turning a Variable into String
Message-ID:<4D0EA4D0.10507 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
On 19/12/2010 7:21 PM, Paolo Rossi wrote:
I would like to know how to turn a variable into a string. I have tried as.symbol and as.name but it doesn't work for what I'd like to do.
Essentially, I'd like to feed the function below with two variables. This works fine in the bit working out number of elements in each variable.
In the print(sprintf("OK with %s and %s\n", var1, var2)) line I would like var1 and var2 to be magically substituted with a string containing the name of var1 and name of var2.
The name of var1 is var1, so I assume you mean the expression passed to your function and bound to var1. In that case, what you want is
deparse(substitute(var1))
Watch out: if the expression is really long, that can be a vector with more than one element. See ?deparse for ways to deal with that.
Duncan Murdoch
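Duncan Murdoch's warning about long expressions can be seen with a small sketch. The helper name `show_expr` and the long variable names are hypothetical; the variables are never evaluated, so they do not need to exist. Collapsing with paste() is one common way to get a single string, and ?deparse describes others.

```r
# Hypothetical helper: return the text of the expression passed as 'x'.
show_expr <- function(x) deparse(substitute(x))

# A long argument expression; deparse() splits its text into several strings
# once the default line width is exceeded.
pieces <- show_expr(c(first_long_variable_name + second_long_variable_name,
                      third_long_variable_name * fourth_long_variable_name,
                      fifth_long_variable_name - sixth_long_variable_name))
length(pieces)  # more than one element

# Trim the continuation-line indentation and join into one string.
one <- paste(trimws(pieces), collapse = " ")
length(one)     # a single string, safe to pass to sprintf("%s", ...)
```

Without the collapse step, sprintf() would be vectorised over the pieces and return one message per piece.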
------------------------------
Message: 39
Date: Sun, 19 Dec 2010 20:11:58 -0500
From: Duncan Murdoch<murdoch.duncan at gmail.com>
To: Jeff Breiwick<jeff.breiwick at noaa.gov>
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] system/system2 command
Message-ID:<4D0EAD5E.5060006 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
On 17/12/2010 4:36 PM, Jeff Breiwick wrote:
All, I had a simple function call I used to open up a dos shell running R under Win XP: system("cmd.exe", wait=FALSE, invisible=FALSE). This does not work with R 2.12.1 - I get a window that briefly flashes open but then disappears. Does anyone know the method to open a DOS command window in running R with Win XP? Thank you.
This is a new bug in 2.12.1, which I am about to fix in R-patched. The problem was that it was passing a null input stream to cmd.exe, which saw an immediate EOF, and quit. A similar thing happened in Rterm, where system("cmd") should drop into a command shell in the same window, but it would immediately exit.
Duncan Murdoch
------------------------------
Message: 40
Date: Sun, 19 Dec 2010 17:47:20 -0800
From: Dennis Murphy<djmuser at gmail.com>
Cc: r-help at r-project.org
Subject: Re: [R] Time Series of Histograms
Message-ID:
<AANLkTiknfHmEuaHp_7GirmpNKB=eMKSUeDNiE7jgh6Q_ at mail.gmail.com>
Content-Type: text/plain
Hi:
You can get a violin plot in lattice rather straightforwardly. It's easiest if time is an ordered factor, but you can also do it if time is numeric; in the latter case, the code associated with Figure 10.14 in the Lattice book provides a template to start with:
http://lmdvr.r-forge.r-project.org/figures/figures.html
To get horizontal violin plots, use time as the y variable and start by replacing panel.boxplot with panel.violin; see the help page of the latter if more specific options are required. It also contains an example using a panel function.
I don't know how you expect to get horizontal histograms without setting the time variable to be a factor. If you have enough time periods, the result will not be pretty. If you have a fairly large number of time periods, the best distributional displays are boxplots, violin plots, beanplots or some variation of that general concept.
Since neither data nor code were offered, one can only speculate so far as to what your intentions might be. A reproducible example with data and code would undoubtedly elicit more useful responses.
HTH,
Dennis
------------------------------
Message: 41
Date: Mon, 20 Dec 2010 02:04:22 +0000
To: Dennis Murphy<djmuser at gmail.com>
Cc: r-help at r-project.org
Subject: Re: [R] Time Series of Histograms
Content-Type: text/plain
Many Thanks Dennis,
The distributions are simulated ordinal data all bounded in the same upper and lower limit, and I wanted to plot how the distribution changes through time. Since the distributions are often multimodal boxplots were not useful so I made some violinplots...
My practical solution, which I'm testing right now, is to create a matrix of frequencies and then plot these as a series of horizontal barplots (after normalising each distribution), using the offset parameter to control the temporal sequence... It actually works fine, but I was wondering if there were better ways...
Enrico
------------------------------
Message: 42
Date: Sun, 19 Dec 2010 21:11:15 -0500
From: Jorge Ivan Velez<jorgeivanvelez at gmail.com>
Cc: r-help at r-project.org
Subject: Re: [R] Time Series of Histograms
Message-ID:
<AANLkTikp5Zr3_AMJ7uGeHnWRuovW1ddnja2JXPhpD6Uu at mail.gmail.com>
Content-Type: text/plain
Hi Enrico,
Is this close to what you want to do?
http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=109
HTH,
Jorge
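The frequency-matrix approach Enrico describes in Message 41 was not posted as code, so the following is only a guess at what it might look like in base graphics. The simulated data, the seven bins, the five time steps and the 0.9 scaling factor are all assumptions chosen for illustration.

```r
set.seed(1)
bins <- 1:7
n_times <- 5

# Simulated ordinal draws per time step (made-up data): two modes that
# move apart as time increases.
draws <- lapply(seq_len(n_times), function(t) {
  p <- dnorm(bins, mean = 4 - t / 2, sd = 1) + dnorm(bins, mean = 4 + t / 2, sd = 1)
  sample(bins, size = 200, replace = TRUE, prob = p)
})

# Frequency matrix: rows are bins, columns are time steps.
freq <- sapply(draws, function(x) as.vector(table(factor(x, levels = bins))))

# Normalise each column so its tallest bar spans 0.9 time units.
freq <- sweep(freq, 2, apply(freq, 2, max), "/") * 0.9

# Empty canvas: x is time, y is the bin.
plot(0, 0, type = "n", xlim = c(1, n_times + 1), ylim = c(0, length(bins)),
     xlab = "time", ylab = "bin", yaxt = "n")
axis(2, at = bins - 0.5, labels = bins)

# One horizontal barplot per time step; 'offset' shifts the bars' baseline to x = t.
for (t in seq_len(n_times)) {
  barplot(freq[, t], horiz = TRUE, space = 0, offset = t,
          add = TRUE, axes = FALSE, axisnames = FALSE)
}
```

Scaling to a little under one time unit keeps neighbouring histograms from touching; the price is that bar lengths are comparable only within a time step, not across steps.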
------------------------------
Message: 43
Date: Mon, 20 Dec 2010 13:17:59 +1100
From:<Bill.Venables at csiro.au>
To:<emilyhuxter at gmail.com>,<r-help at r-project.org>
Subject: Re: [R] monthly median in a daily dataset
Message-ID:
<1BDAE2969943D540934EE8B4EF68F95FB27A44FE07 at EXNSW-MBX03.nexus.csiro.au>
Content-Type: text/plain; charset="us-ascii"
I find this function useful for digging out months from Date objects
Month <- function(date, ...)
  factor(month.abb[as.POSIXlt(date)$mon + 1], levels = month.abb)
For this little data set below this is what it gives
with(data, tapply(value, Month(date), median, na.rm = TRUE))
  Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
   NA    NA    NA    NA    NA    NA    NA    NA    NA    NA 0.035 0.030
Here is another useful little one:
Year <- function(date, ...)
  as.POSIXlt(date)$year + 1900
So if you wanted the median by year and month you could do
with(data, tapply(value, list(Year(date), Month(date)), median, na.rm = TRUE))
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct   Nov  Dec
2008  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA 0.035 0.03
(The result is a matrix, which in this case has only one row, of course.)
See how you go.
Bill Venables.
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of HUXTERE
Sent: Monday, 20 December 2010 8:32 AM
To: r-help at r-project.org
Subject: [R] monthly median in a daily dataset
Hello,
I have a multi-year dataset (see below) with date, a data value and a flag for the data value. I want to find the monthly median for each month in this dataset and then plot it. If anyone has suggestions they would be greatly appreciated. It should be noted that there are some dates with no values and they should be removed.
Thanks
Emily
print ( str(data$flow$daily) )
'data.frame': 16071 obs. of 3 variables:
 $ date :Class 'Date' num [1:16071] -1826 -1825 -1824 -1823 -1822 ...
 $ value: num NA NA NA NA NA NA NA NA NA NA ...
 $ flag : chr "" "" "" "" ...
NULL
520 2008-11-01 0.034
1041 2008-11-02 0.034
1562 2008-11-03 0.034
2083 2008-11-04 0.038
2604 2008-11-05 0.036
3125 2008-11-06 0.035
3646 2008-11-07 0.036
4167 2008-11-08 0.039
4688 2008-11-09 0.039
5209 2008-11-10 0.039
5730 2008-11-11 0.038
6251 2008-11-12 0.039
6772 2008-11-13 0.039
7293 2008-11-14 0.038
7814 2008-11-15 0.037
8335 2008-11-16 0.037
8855 2008-11-17 0.037
9375 2008-11-18 0.037
9895 2008-11-19 0.034 B
10415 2008-11-20 0.034 B
10935 2008-11-21 0.033 B
11455 2008-11-22 0.034 B
11975 2008-11-23 0.034 B
12495 2008-11-24 0.034 B
13016 2008-11-25 0.034 B
13537 2008-11-26 0.033 B
14058 2008-11-27 0.033 B
14579 2008-11-28 0.033 B
15068 2008-11-29 0.034 B
15546 2008-11-30 0.035 B
521 2008-12-01 0.035 B
1042 2008-12-02 0.034 B
1563 2008-12-03 0.033 B
2084 2008-12-04 0.031 B
2605 2008-12-05 0.031 B
3126 2008-12-06 0.031 B
3647 2008-12-07 0.032 B
4168 2008-12-08 0.032 B
4689 2008-12-09 0.032 B
5210 2008-12-10 0.033 B
5731 2008-12-11 0.033 B
6252 2008-12-12 0.032 B
6773 2008-12-13 0.031 B
7294 2008-12-14 0.030 B
7815 2008-12-15 0.030 B
8336 2008-12-16 0.029 B
8856 2008-12-17 0.028 B
9376 2008-12-18 0.028 B
9896 2008-12-19 0.028 B
10416 2008-12-20 0.027 B
10936 2008-12-21 0.027 B
11456 2008-12-22 0.028 B
11976 2008-12-23 0.028 B
12496 2008-12-24 0.029 B
13017 2008-12-25 0.029 B
13538 2008-12-26 0.029 B
14059 2008-12-27 0.030 B
14580 2008-12-28 0.030 B
15069 2008-12-29 0.030 B
15547 2008-12-30 0.031 B
15851 2008-12-31 0.031 B
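Bill Venables's Month() and Year() helpers can be tried on a small simulated series that crosses a year boundary. The data frame `toy`, its date range and its values are made up for illustration; only the two helper definitions are taken from his message.

```r
# Helpers as given in Message 43.
Month <- function(date, ...)
  factor(month.abb[as.POSIXlt(date)$mon + 1], levels = month.abb)
Year <- function(date, ...)
  as.POSIXlt(date)$year + 1900

# Made-up daily data from November 2008 to February 2009, with two gaps.
set.seed(1)
toy <- data.frame(date = seq(as.Date("2008-11-01"), as.Date("2009-02-28"), by = "day"))
toy$value <- round(runif(nrow(toy), min = 0.02, max = 0.04), 3)
toy$value[c(3, 40)] <- NA  # dates with no value

# Year-by-month matrix of medians; na.rm = TRUE drops the missing days,
# and year/month combinations with no data at all come out as NA.
med <- with(toy, tapply(value, list(Year(date), Month(date)), median, na.rm = TRUE))
med
```

Because Month() returns a factor with all twelve levels, the columns always appear in calendar order, unlike the alphabetical order produced by months().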