Skip to content
Prev 205637 / 398506 Next

mean for subset

As a novice R user, I face a similar challenge. I am almost afraid to share
with this group how I solved it. About 65 labs in our proficiency program
submit data on individual Excel spreadsheets with triple replicates. There
always are a few labs that do not complete the full set of three replicates,
and I do not want their data included in my analysis.

First, I combine all the individual spreadsheets into one large Excel
spreadsheet. The replicates are in three columns: rep1, rep2, and rep3. I
sort on each individual rep column in Excel. Then I go to both the top and
the bottom of the list. 

For example, I sort on rep1 and go to the top of the list to delete any rows
where a value for rep1 was not recorded. Then I go to the bottom of the list
and delete any rows where rep1 is text instead of a number, for example,
<0.001. I should say that the labs are instructed that they must complete
all three replicates, and they must not enter results as text. Next I repeat
the process for rep2 and rep3. 

I'll do a little more work in Excel on the large, combined table with all
the lab data. I calculate in Excel the mean, standard deviation, and
coefficient of variation for each of the three reps. Finally, I filter all
the data and delete duplicate rows. This is necessary as I sometimes
accidentally copy the same spreadsheet two times from a lab into my large
table. Finally, I save the cleaned up table in *.csv format that is easily
read into R. 

I know that R can do all of these things, but if you are just learning how
to use R it might be easier to do some initial work in Excel, or a similar
spreadsheet, before running your data through R.

I also use MS-Word's mail merge feature to generate my code. I'll get three
or four pages of code doing what I want for a single analytical test, for
example, calcium. Then I'll use the mail merge feature to generate hundreds
of pages of code with the other analytical tests (nitrogen, phosphorus,
potassium, etc.). I just copy and paste the large, merged Word document into
R. R cranks away for 30 minutes and I end up with several large tables (and
these get additional editing in Ecel) and hundreds of beautiful graphs that
would take weeks to create in Excel.

I was amazed that Word would work. I expected all of Word's special print
control codes would mess things up. I just recently received a new laptop
computer, and now I have an occassional problem with Word's "pretty print
quotes," but if you know about that problem, it is easy to fix.

Jerry Floren
Minnesota Department of Agriculture
Matthew Dowle-3 wrote: