[The data setting in the last email might be faulty]
Dear useRs,
I have the following dataset which represents rainfall data at a 5-minute interval from 1 May 2021 to 30 September 2021.
Hi Eliza,
It sure was:
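# Parse the 12-hour timestamps as POSIXct in GMT so they match the ISOdate() sequence below
YY$datetime<-as.POSIXct(strptime(YY$TIMESTAMP,"%Y/%m/%d %I:%M:%S %p",tz="GMT"))
# Full 5-minute grid; ISOdate() defaults to GMT (extend the end date to cover your whole record)
dt5min<-seq(ISOdate(2021,5,1,0,5),ISOdate(2021,5,31,12,55),by="5 min")
newdt<-data.frame(datetime=dt5min)
# Keep every grid time; intervals absent from YY come back as NA and are then set to zero
newyy<-merge(newdt,YY,by="datetime",all=TRUE)
newyy$RAINFALL[is.na(newyy$RAINFALL)]<-0
plot(newyy$datetime,newyy$RAINFALL)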
Jim
On Tue, Mar 1, 2022 at 2:57 PM Eliza Botto <eliza_botto at outlook.com> wrote:
[The data setting in the last email might be faulty]
Dear useRs,
I have the following dataset which represents rainfall data at a 5-minute interval from 1 May 2021 to 30 September 2021.
I wish the question below had been explained better. It is nice that an example was given, but I found it misleading.
The data shown is not flawed, and nothing in it is marked as missing, which is how the question first sounded.
What seems to be missing is entire dates. The column called Channel seems irrelevant, as it is always 30. RAINFALL is always 0.2 or 0.4. The YEAR is always 2021. So the ONLY interesting thing here seems to be TIMESTAMP.
But I am NOT convinced the dates are missing, because the times are all over the place: 10 PM, 5:40 PM, 5:20 AM and so on. There are multiple rows for the same day.
Yes, there is no info for May 1 or for May 6 and 7. I have no idea why, and how are we supposed to guess whether that means no rain or some other reason? Towards the end, what I think is the real message is shown. The suggestion is that there should be data for every five-minute period, with the missing periods interpolated.
Fair enough. Can I suggest that the data offered to us has the TIMESTAMP field as character, rather than some form of DATE/TIME that can be used in R?
Converting it or extracting some info into temporary columns might be useful here.
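As a rough sketch of that conversion, assuming the TIMESTAMP strings follow the 12-hour "%Y/%m/%d %I:%M:%S %p" layout used in Jim's reply. The sample rows, the UTC time zone and the helper column names day and hhmm are made up for illustration:

```r
# Hypothetical sample rows; the real data frame is not reproduced here
YY <- data.frame(
  TIMESTAMP = c("2021/05/02 10:00:00 PM", "2021/05/03 05:20:00 AM"),
  RAINFALL  = c(0.2, 0.4),
  stringsAsFactors = FALSE
)
# Character -> POSIXct; a fixed tz avoids daylight-saving surprises
# (parsing AM/PM with %p depends on the locale)
YY$datetime <- as.POSIXct(YY$TIMESTAMP, format = "%Y/%m/%d %I:%M:%S %p", tz = "UTC")
# Temporary helper columns extracted from the parsed time
YY$day  <- as.Date(YY$datetime, tz = "UTC")
YY$hhmm <- format(YY$datetime, "%H:%M")
str(YY)
```

Once datetime is a real date-time, comparisons, sorting and sequences all work on it directly.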
You could then create some kind of data that loops over times, starting with your start time, say midnight on the 1st, and for every 5-minute interval makes a timestamp that looks like what you need, then COMPARE it to what is in the data shown. For any that are not present, you can create a similar row with a zero in the RAINFALL field. There are oodles of ways to do that, some more straightforward than others. Or you may just make the sequence for all intervals and later, in some kind of merge, keep only the rows from the original data where there is a duplicate. Again, many ways, even in base R.
If my analysis is right, and clearly it may not be, a much better way to ask this question might be to say you have timestamped data about rainfall where the readings for every 5 minute interval with no rainfall have been omitted. How do you create records for all 5-minute intervals that are not present and merge that info with the records shown?
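One way to do that without merge() is to build the full grid first and copy the observed values in with match(). This is only a sketch with made-up rows; the names obs and grid, the UTC time zone and the one-hour range are assumptions for illustration:

```r
# Made-up observations: only the wet 5-minute intervals are recorded
obs <- data.frame(
  datetime = as.POSIXct(c("2021-05-02 00:10:00", "2021-05-02 00:35:00"), tz = "UTC"),
  RAINFALL = c(0.2, 0.4)
)
# Every 5-minute interval in the period of interest
grid <- data.frame(
  datetime = seq(as.POSIXct("2021-05-02 00:00:00", tz = "UTC"),
                 as.POSIXct("2021-05-02 00:55:00", tz = "UTC"),
                 by = "5 min")
)
# Start with zero rain everywhere, then overwrite the intervals that were observed
grid$RAINFALL <- 0
idx <- match(obs$datetime, grid$datetime)
grid$RAINFALL[idx[!is.na(idx)]] <- obs$RAINFALL[!is.na(idx)]
grid
```

Observations whose time does not fall on the grid are silently dropped here, so it is worth checking idx for NA values on real data.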
As a hint, you can make a sequence like below, with your own adjustments for starting and ending dates.
[1] "2020-05-01 00:00:00 EDT" "2020-05-01 00:05:00 EDT" "2020-05-01 00:10:00 EDT"
[4] "2020-05-01 00:15:00 EDT" "2020-05-01 00:20:00 EDT" "2020-05-01 00:25:00 EDT"
[7] "2020-05-01 00:30:00 EDT" "2020-05-01 00:35:00 EDT" "2020-05-01 00:40:00 EDT"
[10] "2020-05-01 00:45:00 EDT" "2020-05-01 00:50:00 EDT" "2020-05-01 00:55:00 EDT"
[13] "2020-05-01 01:00:00 EDT"
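The call that produced this output is not shown in the message. One way to get a sequence of this shape, assuming a US Eastern time zone (the "America/New_York" name and the length.out value are my assumptions), is roughly:

```r
# Start at midnight and step forward in 5-minute increments
start <- as.POSIXct("2020-05-01 00:00:00", tz = "America/New_York")
dt5min <- seq(start, by = "5 min", length.out = 13)
dt5min
```

Giving seq() an end time through its to argument instead of length.out would cover a whole period rather than a fixed number of steps.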
Of course you may want to know WHY you need the missing data interpolated. Some graphics programs, if properly supplied with actual dates rather than character strings, may simply skip missing records and leave room between the others. The missing ones might be treated as zero, depending on what you are doing.
-----Original Message-----
From: Eliza Botto <eliza_botto at outlook.com>
To: R-help at r-project.org <R-help at r-project.org>
Sent: Mon, Feb 28, 2022 10:52 pm
Subject: [R] setting zeros for missing interval in data
[The data setting in the last email might be faulty]
Dear useRs,
I have the following dataset which represents rainfall data at a 5-minute interval from 1 May 2021 to 30 September 2021.