Subsetting dataframe by all factor levels
One more thing. Let's still assume you want to interpolate it yearly. The
code below assigns a name to each output as the loop runs.
for (i in levels(rainfall.year[, year_col])) {  # year_col: number of the year column
  assign(paste(i, "interpolation_output", sep = "_"),
         interpolation_function())  # stand-in for your interpolation call
}
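If dozens of separately named objects in the workspace become awkward, a named list is a common alternative to assign(). This is only a sketch: the toy rainfall.year values are invented, and interpolation_function here is a hypothetical placeholder that just averages, standing in for whatever interpolation you end up using.

```r
# Toy long-format yearly means; values are invented for illustration.
rainfall.year <- data.frame(
  name = factor(rep(c("A", "B"), times = 2)),
  year = factor(rep(c("2005", "2006"), each = 2)),
  prcp = c(0.3, 0.1, 0.3, 0.2)
)

# Hypothetical placeholder: a real version would fit a spatial interpolation.
interpolation_function <- function(d) mean(d$prcp)

# One result per year, kept in a named list instead of created with assign().
results <- lapply(split(rainfall.year, rainfall.year$year),
                  interpolation_function)

names(results)      # the years
results[["2005"]]   # output for a single year
```

The list keeps every year's output in one object, so a later step can loop over results directly.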
Cheers,
Justin
On Fri, Sep 14, 2018 at 2:03 PM Justin H. <jstnhllrd at gmail.com> wrote:
Hi Rich,

For the sake of example, here's a solution for a simple aggregation.

aggregate(rainfall, list(rainfall$name), mean)
# This aggregates all columns by station and takes their mean, leaving you
# with 58 rows. Non-numeric columns such as name come back as NA.

aggregate(rainfall[, m:n], list(rainfall$name), mean)
# In case you only want to aggregate over select columns; m and n are the
# first and last column numbers to keep.

I am assuming you want rows with every combination of year and station with their average precipitation. To aggregate it in that way you will need to create a new column that represents the year (or month/year if the data are appropriate for that resolution).
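As a rough sketch of that step, the year (or year-month) can be derived from the Date column with format() and then used as a grouping variable. The toy data frame below is invented; only the column names name, sampdate and prcp follow the str() output in the original question, and year and month are new columns introduced here.

```r
# Toy stand-in for rainfall; values are invented for illustration.
rainfall <- data.frame(
  name     = factor(rep(c("A", "B"), each = 4)),
  sampdate = as.Date(rep(c("2005-01-01", "2005-01-02",
                           "2006-01-01", "2006-01-02"), times = 2)),
  prcp     = c(0.5, 0.1, 0.2, 0.4, 0.0, 0.2, 0.3, 0.1)
)

# Derive year and year-month labels from the Date column.
rainfall$year  <- factor(format(rainfall$sampdate, "%Y"))
rainfall$month <- factor(format(rainfall$sampdate, "%Y-%m"))

# Mean precipitation for every station/year combination, already in long form.
yearly <- aggregate(prcp ~ name + year, data = rainfall, FUN = mean)

# The same idea at monthly resolution.
monthly <- aggregate(prcp ~ name + month, data = rainfall, FUN = mean)
```

The formula interface of aggregate() returns one row per station and period, so no reshaping is needed afterwards.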
rainfall.year <- with(rainfall, tapply(prcp, list(name, year), mean))
# This does the aggregation. However, you are given a "wide" result
# (stations in rows, years in columns).
rainfall.year <- data.frame(as.table(rainfall.year))
# This makes it "long", as you probably want it.
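To see what the wide and long shapes look like, here is a small sketch on invented data. It assumes a year column already exists, and it names the grouping list and uses responseName so that the long data frame gets readable column names; without those two additions the columns come out as Var1, Var2 and Freq.

```r
# Toy stand-in for rainfall with a year column; values are invented.
rainfall <- data.frame(
  name = factor(rep(c("A", "B"), each = 4)),
  year = factor(rep(c("2005", "2005", "2006", "2006"), times = 2)),
  prcp = c(0.5, 0.1, 0.2, 0.4, 0.0, 0.2, 0.3, 0.1)
)

# Wide: a matrix with one row per station and one column per year.
wide <- with(rainfall, tapply(prcp, list(name = name, year = year), mean))

# Long: one row per station/year pair, with columns name, year and prcp.
long <- as.data.frame(as.table(wide), responseName = "prcp")

wide
long
```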
A for loop option.
for (i in levels(rainfall.year[, year_col])) {
  print(i)
  # na.rm = TRUE skips station/year combinations that have no data
  print(mean(rainfall.year[rainfall.year[, year_col] == i, prcp_col], na.rm = TRUE))
}
The loop prints the mean rainfall per year, where year_col is the number
of the year column and prcp_col is the number of the precipitation column.
Try running that loop to see that it is properly looping through the
factor you want and then stick in the interpolation function.
I hope that helps!
Cheers,
Justin
On Fri, Sep 14, 2018 at 1:13 PM Rich Shepard <rshepard at appl-ecosys.com>
wrote:
I need to learn geospatial analyses in R to complement my GIS knowledge. I've just re-read the subsetting chapter in Hadley's 'Advanced R' without seeing how to create separate data frames by extracting all rows for each site name in the parent data frame in one step.

I believe that what I need to do is create a list of the factor names and feed them to a loop subsetting each to a new dataframe. Perhaps there's a better way unknown to me and I need advice, suggestions, and recommendations how to proceed.

The inclusive data frame has this structure:

str(rainfall)
'data.frame':   113569 obs. of  6 variables:
 $ name    : Factor w/ 58 levels "Blazed Alder",..: 20 20 20 20 20 20 20 ...
 $ easting : num  2370575 2370575 2370575 2370575 2370575 ...
 $ northing: num  199338 199338 199338 199338 199338 ...
 $ elev    : num  228 228 228 228 228 228 228 228 228 228 ...
 $ sampdate: Date, format: "2005-01-01" "2005-01-02" ...
 $ prcp    : num  0.59 0.08 0.1 0 0 0.02 0.05 0.1 0 0.02 ...

My goal is to use the monthly mean rainfall at each of the 58 reporting stations to interpolate/extrapolate rainfall over the entire county for selected years to show variability. The data points are not evenly distributed but clustered in more populated areas and dispersed in rural areas. My geochemical data typically are like this and I need to also learn how this distribution affects how the data are analyzed.

TIA,

Rich
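For the one-step subsetting asked about here, base R's split() is one option: it returns a named list holding one data frame per factor level. A minimal sketch on a made-up three-station stand-in for rainfall follows; the values and the names "Station B" and "Station C" are invented, and only "Blazed Alder" appears in the real data.

```r
# Toy stand-in for the rainfall data frame; values are invented.
rainfall <- data.frame(
  name     = factor(rep(c("Blazed Alder", "Station B", "Station C"), each = 4)),
  sampdate = rep(as.Date("2005-01-01") + 0:3, times = 3),
  prcp     = c(0.59, 0.08, 0.10, 0.00,
               0.20, 0.00, 0.05, 0.30,
               0.00, 0.00, 0.12, 0.40)
)

# One data frame per level of the name factor, returned as a named list.
by_station <- split(rainfall, rainfall$name)

names(by_station)               # the station names
by_station[["Blazed Alder"]]    # all rows for a single station

# Apply a function to every station without writing an explicit loop.
station_means <- sapply(by_station, function(d) mean(d$prcp))
```

Keeping the pieces in a list, rather than as 58 separate objects, makes it straightforward to run the same analysis on each station with lapply() or sapply().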
_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo