mean for subset

As a novice R user, I face a similar challenge. I am almost afraid to share
with this group how I solved it. About 65 labs in our proficiency program
submit data on individual Excel spreadsheets with triple replicates. There
always are a few labs that do not complete the full set of three replicates,
and I do not want their data included in my analysis.

First, I combine all the individual spreadsheets into one large Excel
spreadsheet. The replicates are in three columns: rep1, rep2, and rep3. I
sort on each individual rep column in Excel. Then I go to both the top and
the bottom of the list. 

For example, I sort on rep1 and go to the top of the list to delete any rows
where a value for rep1 was not recorded. Then I go to the bottom of the list
and delete any rows where rep1 is text instead of a number, for example,
<0.001. I should say that the labs are instructed that they must complete
all three replicates, and they must not enter results as text. Next I repeat
the process for rep2 and rep3. 

I'll do a little more work in Excel on the large, combined table with all
the lab data. I calculate in Excel the mean, standard deviation, and
coefficient of variation for each of the three reps. Finally, I filter all
the data and delete duplicate rows. This is necessary as I sometimes
accidentally copy the same spreadsheet two times from a lab into my large
table. Finally, I save the cleaned up table in *.csv format that is easily
read into R. 

I know that R can do all of these things, but if you are just learning how
to use R it might be easier to do some initial work in Excel, or a similar
spreadsheet, before running your data through R.

I also use MS-Word's mail merge feature to generate my code. I'll get three
or four pages of code doing what I want for a single analytical test, for
example, calcium. Then I'll use the mail merge feature to generate hundreds
of pages of code with the other analytical tests (nitrogen, phosphorus,
potassium, etc.). I just copy and paste the large, merged Word document into
R. R cranks away for 30 minutes and I end up with several large tables (and
these get additional editing in Ecel) and hundreds of beautiful graphs that
would take weeks to create in Excel.

I was amazed that Word would work. I expected all of Word's special print
control codes would mess things up. I just recently received a new laptop
computer, and now I have an occassional problem with Word's "pretty print
quotes," but if you know about that problem, it is easy to fix.

Jerry Floren
Minnesota Department of Agriculture

mean for subset

Thread (10 messages)