
Subsetting dataframe by all factor levels

Justin H.

Fri, Sep 14, 2018 11:10 AM

One more thing. Let's still assume you want to interpolate it yearly. The
code below assigns a name to each output as the loop runs.

for (i in levels(rainfall.year$year)) {
  # interpolation_function() stands in for whatever interpolation you run
  assign(paste(i, "interpolation_output", sep = "_"),
         interpolation_function())
}
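
If you'd rather not end up with a pile of separately named objects, the
per-year results can also be collected in a named list. This is only a
sketch: the toy rainfall.year data frame and the interpolation_function()
stub (here it just averages prcp) are made-up stand-ins, not your real data
or a real interpolation.

```r
# Toy long-format data standing in for rainfall.year (hypothetical values)
rainfall.year <- data.frame(
  name = factor(rep(c("A", "B"), times = 2)),
  year = factor(rep(c("2005", "2006"), each = 2)),
  prcp = c(0.10, 0.30, 0.20, 0.40)
)

# Placeholder for the real interpolation; here it only averages prcp
interpolation_function <- function(d) mean(d$prcp)

# One result per year, stored in a list named by year
years <- levels(rainfall.year$year)
interpolation_output <- lapply(
  setNames(years, years),
  function(y) interpolation_function(rainfall.year[rainfall.year$year == y, ])
)

interpolation_output[["2005"]]
```

A list like this can be looped over again later with lapply() or sapply(),
which tends to be easier than looking up objects created with assign().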


Cheers,
Justin

On Fri, Sep 14, 2018 at 2:03 PM Justin H. <jstnhllrd at gmail.com> wrote:

Hi Rich,

For the sake of example, here's a solution for a simple aggregation.

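aggregate(rainfall, list(rainfall$name), mean)
# This aggregates all columns by station and takes their means. You're left
# with 58 rows. Non-numeric columns such as name come back as NA with warnings.

aggregate(rainfall[, c("elev", "prcp")], list(rainfall$name), mean)
# In case you only want to aggregate over select columns (elev and prcp are
# just an example here; list whichever columns you need).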


I am assuming you want one row for every combination of year and station,
with its average precipitation. To aggregate it that way you will need to
create a new column that represents the year (or month/year if the data are
appropriate for that resolution).
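
One way to add that column, sketched on a few made-up rows that reuse the
column names from your str() output. The values and the "Station B" name are
invented for illustration, and the yearmonth column name is just my choice.

```r
# Made-up rows mimicking the structure of rainfall
rainfall <- data.frame(
  name     = factor(c("Blazed Alder", "Blazed Alder", "Station B", "Station B")),
  sampdate = as.Date(c("2005-01-01", "2006-01-01", "2005-01-01", "2005-02-01")),
  prcp     = c(0.59, 0.08, 0.10, 0.30)
)

# Year (and year-month) extracted from the Date column, stored as factors
rainfall$year      <- factor(format(rainfall$sampdate, "%Y"))
rainfall$yearmonth <- factor(format(rainfall$sampdate, "%Y-%m"))

# Mean precipitation for each station/year combination present in the data
rainfall.mean <- aggregate(prcp ~ name + year, data = rainfall, FUN = mean)
rainfall.mean
```

Swapping year for yearmonth in the formula should give monthly means per
station instead.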

rainfall.year <- with(rainfall, tapply(prcp, list(name = name, year = year), mean))
# This does the aggregation, but the result is a "wide" station-by-year matrix.

rainfall.year <- as.data.frame(as.table(rainfall.year), responseName = "prcp")
# This makes it a "long" data frame (columns name, year, prcp), as you
# probably want it.

A for loop option.

for (i in levels(rainfall.year$year)) {
  print(i)
  print(mean(rainfall.year[rainfall.year$year == i, "prcp"], na.rm = TRUE))
}

The loop prints the mean rainfall per year; na.rm = TRUE skips station/year
combinations that have no data. If you prefer to index columns by position,
use the column numbers of year and prcp in place of the names.
Try running that loop to see that it is properly looping through the
factor you want and then stick in the interpolation function.
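
If you do still want a separate data frame for each station, as in your
original question, split() should do it in one step without a loop. A small
sketch on made-up rows (the values, the "Station B" name, and the by.station
object name are invented for illustration):

```r
# Made-up rows with the same name/prcp columns as rainfall
rainfall <- data.frame(
  name = factor(c("Blazed Alder", "Blazed Alder", "Station B")),
  prcp = c(0.59, 0.08, 0.10)
)

# One data frame per factor level, returned as a named list
by.station <- split(rainfall, rainfall$name)

names(by.station)
by.station[["Blazed Alder"]]

# Apply a function to every station's data frame
sapply(by.station, function(d) mean(d$prcp))
```

Keeping the pieces in one list, rather than 58 separate objects, makes it
easy to run the same step on every station.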

I hope that helps!

Cheers,
Justin

On Fri, Sep 14, 2018 at 1:13 PM Rich Shepard <rshepard at appl-ecosys.com>
wrote:

   I need to learn geospatial analyses in R to complement my GIS knowledge.
I've just re-read the subsetting chapter in Hadley's 'Advanced R' without
seeing how to create separate data frames by extracting all rows for
each site name in the parent data frame in one step. I believe that what I
need to do is create a list of the factor names and feed them to a loop
subsetting each to a new dataframe. Perhaps there's a better way unknown to
me and I need advice, suggestions, and recommendations on how to proceed.

   The inclusive data frame has this structure:

str(rainfall)
'data.frame':   113569 obs. of  6 variables:
  $ name    : Factor w/ 58 levels "Blazed Alder",..: 20 20 20 20 20 20 20 ...
  $ easting : num  2370575 2370575 2370575 2370575 2370575 ...
  $ northing: num  199338 199338 199338 199338 199338 ...
  $ elev    : num  228 228 228 228 228 228 228 228 228 228 ...
  $ sampdate: Date, format: "2005-01-01" "2005-01-02" ...
  $ prcp    : num  0.59 0.08 0.1 0 0 0.02 0.05 0.1 0 0.02 ...

   My goal is to use the monthly mean rainfall at each of the 58 reporting
stations to interpolate/extrapolate rainfall over the entire county for
selected years to show variability. The data points are not evenly
distributed but clustered in more populated areas and dispersed in rural
areas. My geochemical data typically are like this and I need to also learn
how this distribution affects how the data are analyzed.

TIA,

Rich

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
