Need fresh eyes to see what I'm missing
Remove all your as.integer() and as.double() coercions. They are unnecessary (unless you are preparing input for C code; also, all R non-integers are double precision) and may be the source of your problems.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
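As a rough illustration of why the coercions are unnecessary, the sketch below reads the first few lines of the posted data from an inline string (csv_text is a hypothetical stand-in for the real file) and inspects the column classes that read.csv() assigns on its own. Only clean numeric input is assumed here. If a column came back as character instead, that would suggest it holds at least one non-numeric entry.

```r
# Hypothetical stand-in for '../data/water/vel.dat': the header plus the
# first three data lines quoted in the original post.
csv_text <- "year,month,day,hour,min,fps
2016,03,03,12,00,1.74
2016,03,03,12,10,1.75
2016,03,03,12,20,1.76"

# read.csv() infers column types itself; no as.integer()/as.double() needed.
vel <- read.csv(text = csv_text, header = TRUE)

# Class of each column as inferred by read.csv().
col_classes <- sapply(vel, class)
print(col_classes)
```

With clean input the five date/time columns should be read as integer and fps as numeric, so explicit coercion adds nothing.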
On Tue, Sep 14, 2021 at 8:31 AM Eric Berger <ericjberger at gmail.com> wrote:
Before you create vel_by_month you can check vel for NAs and NaNs by

sum(is.na(vel))
sum(unlist(lapply(vel,is.nan)))

HTH,
Eric

On Tue, Sep 14, 2021 at 6:21 PM Rich Shepard <rshepard at appl-ecosys.com> wrote:
The data file begins this way:
year,month,day,hour,min,fps
2016,03,03,12,00,1.74
2016,03,03,12,10,1.75
2016,03,03,12,20,1.76
2016,03,03,12,30,1.81
2016,03,03,12,40,1.79
2016,03,03,12,50,1.75
2016,03,03,13,00,1.78
2016,03,03,13,10,1.81
The script to process it:
library('tidyverse')
vel <- read.csv('../data/water/vel.dat', header = TRUE, sep = ',',
stringsAsFactors = FALSE)
vel$year <- as.integer(vel$year)
vel$month <- as.integer(vel$month)
vel$day <- as.integer(vel$day)
vel$hour <- as.integer(vel$hour)
vel$min <- as.integer(vel$min)
vel$fps <- as.double(vel$fps, length = 6)
# use dplyr to filter() by year, month, day; summarize() to get monthly
# means
vel_by_month = vel %>%
group_by(year, month) %>%
summarize(flow = mean(fps, na.rm = TRUE))
R's display after running the script:
source('vel.R')
`summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
Warning messages:
1: In eval(ei, envir) : NAs introduced by coercion
2: In eval(ei, envir) : NAs introduced by coercion
3: In eval(ei, envir) : NAs introduced by coercion

The dataframe created by the read.csv() command:
head(vel)
  year month day hour min  fps
1 2016     3   3   12   0 1.74
2 2016     3   3   12  10 1.75
3 2016     3   3   12  20 1.76
4 2016     3   3   12  30 1.81
5 2016     3   3   12  40 1.79
6 2016     3   3   12  50 1.75

and the resulting grouping:
vel_by_month
# A tibble: 67 × 3
# Groups: year [8]
year month flow
<int> <int> <dbl>
1 0 NA NaN
2 2016 3 2.40
3 2016 4 3.00
4 2016 5 2.86
5 2016 6 2.51
6 2016 7 2.18
7 2016 8 1.89
8 2016 9 1.38
9 2016 10 1.73
10 2016 11 2.01
# … with 57 more rows
I cannot find why line 1 is there. Other data sets don't produce this
result.
TIA,
Rich
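
One possible way to track down the source of the "NAs introduced by coercion" warnings is to read every column as character and list the rows whose cells do not convert to numbers. This is only a sketch: csv_text is a hypothetical stand-in for the real file, and the malformed row in it is invented purely so the check has something to report. The actual offending line in vel.dat is not known from the post.

```r
# Hypothetical sample: lines from the posted data plus ONE INVENTED malformed
# row (the third data row). The real file's bad line, if any, is unknown.
csv_text <- "year,month,day,hour,min,fps
2016,03,03,12,00,1.74
2016,03,03,12,10,1.75
2016,03,xx,12,15,n/a
2016,03,03,12,20,1.76"

# Read everything as character so nothing is coerced silently.
raw <- read.csv(text = csv_text, header = TRUE, colClasses = "character")

# A cell is flagged when it does not convert to a number.
is_bad <- sapply(raw, function(col) is.na(suppressWarnings(as.numeric(col))))

# Data-frame row numbers with at least one flagged cell.
bad_rows <- which(rowSums(is_bad) > 0)
print(raw[bad_rows, ])
```

For the real data, replacing text = csv_text with the file path should show the rows to inspect. Note that the row numbers refer to the data frame, so the header line is not counted.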
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.