help for an R automated procedures
2 messages · Gustavo Vieira, PIKAL Petr
Hi,
this is exactly what fortune("surgery") is about.
Anyway, you can save yourself a lot of headache if you start using lists for your objects.
Lists are easy to use in loops.
for (i in seq_along(some.other.list)) {
some.list[[i]] <- some.function(some.other.list[[i]])
}
The lapply/sapply functions can be useful as well:
sapply(sp1.loc1,scale)
will give you a matrix of the scaled columns, provided every column is numeric.
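The list idea can be carried one step further by splitting the whole data set into one list element per species-by-locality sample and applying the same function to each. What follows is only a minimal sketch under stated assumptions: the toy `morfo` data frame is invented, the column names `spps`, `taxoc` and `abund` are taken from the original post, and `fix_abundance` is a hypothetical helper that stands in for whatever per-sample steps are needed.

```r
# Toy stand-in for 'morfo'; the values are made up for illustration.
set.seed(1)
morfo <- data.frame(
  spps  = rep(c("sp1", "sp2"), each = 6),
  taxoc = rep(c("loc1", "loc2"), times = 6),
  abund = 5,                               # deliberately wrong counts
  v1    = rnorm(12, mean = 20, sd = 3)
)

# One list element per species-by-locality sample; drop = TRUE skips
# combinations that never occur in the data.
samples <- split(morfo, list(morfo$spps, morfo$taxoc), drop = TRUE)

# Hypothetical per-sample step: set the abundance to the number of cases.
fix_abundance <- function(d) {
  d$abund <- nrow(d)
  d
}

# Apply the same step to every sample, then reassemble a single data set.
processed <- lapply(samples, fix_abundance)
result <- do.call(rbind, processed)
rownames(result) <- NULL
```

Any further step (imputation, outlier checks, standardizing) could be added inside the helper, so it is written once instead of once per sample.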
Regards
Petr
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Gustavo Vieira
Sent: Thursday, February 28, 2013 10:53 AM
To: r-help at r-project.org
Subject: [R] help for an R automated procedures
Dear, I would like to post the following question to the r-help on
Nabble (thanks in advance for the attention, Gustavo Vieira):
Hi there.
I have a data set on hand with 5,220 cases and I'd like to automate
some procedures (but I have almost no programming knowledge). The data
has some continuous variables that are grouped by 2 others: the name of
species and the locality where they were collected. So, the samples are
defined as 'each species on each locality'. For every sample I'd like
to do multiple imputation (when applicable), test for the presence of
outliers, standardize the variables, correct some species abundances,
save individual samples to a tab-delimited text file, and assemble each
individual sample (now without NAs and outliers, with corrected abundances
and the new standardized variables) into a single data set. That task is
pretty complex to me,
since my programming knowledge is poor (and my free time to learn R
programming is sparse). Could someone help me with that (I could
provide you the data set and the script I have written to do that,
sample by sample [ouch!])?
Thanks in advance for your attention and all the best
(ghcv at hotmail.com).
[Below is an example of the code I've used to accomplish my goals,
sample by sample, which can exemplify the complexity of the procedures:
#Subsetting the data (v1-v11 are continuous "predictors"): species 1 at
#locality 1 (all data [5520 cases] are on a vector called 'morfo')
sp1.loc1<-morfo[which(spps=="sp1" & taxoc=="loc1"),] #getting only the
#observations of sp1 (species 1) at loc1 (locality 1)
str(sp1.loc1) #abundance -> 19 cases and the abundance variable
#('abund') says 18...
sp1.loc1$abund<-rep(19,19)
summary(sp1.loc1) #missing values present; abundance for sp1 at loc1
#corrected
attach(sp1.loc1)
#Dealing with NAs:
install.packages("mice", dependencies = T) #ok (R at: home & work)
library(mice)
imp <- mice(sp1.loc1)
sp1.loc1 <- complete(imp)
summary(sp1.loc1) #just checking... No more NAs!
attach(sp1.loc1)
#Detecting univariate outliers
z.crit <- qnorm(0.9999)
subset(sp1.loc1, select = id, subset = abs(scale(v1)) > z.crit)
subset(sp1.loc1, select = id, subset = abs(scale(v2)) > z.crit)
morfo[47,6]
sort(v2[taxoc=="loc1"]) #the nearest observation close to 32.00 is 25.10
sp1.loc1[,6][sp1.loc1[,6]==32.00]<-25.10
subset(sp1.loc1, select = id, subset = abs(scale(v2)) > z.crit)
#Rechecking for outliers (now, it's ok)
subset(sp1.loc1, select = id, subset = abs(scale(v3)) > z.crit)
subset(sp1.loc1, select = id, subset = abs(scale(v4)) > z.crit)
subset(sp1.loc1, select = id, subset = abs(scale(v5)) > z.crit)
subset(sp1.loc1, select = id, subset = abs(scale(v6)) > z.crit)
subset(sp1.loc1, select = id, subset = abs(scale(v7)) > z.crit)
subset(sp1.loc1, select = id, subset = abs(scale(v8)) > z.crit)
subset(sp1.loc1, select = id, subset = abs(scale(v9)) > z.crit)
subset(sp1.loc1, select = id, subset = abs(scale(v10)) > z.crit)
subset(sp1.loc1, select = id, subset = abs(scale(v11)) > z.crit)
#Standardizing variables
v1.std<-with(sp1.loc1,(scale(v1)))
v1.pad<-v1.std[,1]
v2.std<-with(sp1.loc1,(scale(v2)))
v2.pad<-v2.std[,1]
v3.std<-with(sp1.loc1,(scale(v3)))
v3.pad<-v3.std[,1]
v4.std<-with(sp1.loc1,(scale(v4)))
v4.pad<-v4.std[,1]
v5.std<-with(sp1.loc1,(scale(v5)))
v5.pad<-v5.std[,1]
v6.std<-with(sp1.loc1,(scale(v6)))
v6.pad<-v6.std[,1]
v7.std<-with(sp1.loc1,(scale(v7)))
v7.pad<-v7.std[,1]
v8.std<-with(sp1.loc1,(scale(v8)))
v8.pad<-v8.std[,1]
v9.std<-with(sp1.loc1,(scale(v9)))
v9.pad<-v9.std[,1]
v10.std<-with(sp1.loc1,(scale(v10)))
v10.pad<-v10.std[,1]
v11.std<-with(sp1.loc1,(scale(v11)))
v11.pad<-v11.std[,1]
#Joining the new standardized variables to the sp1.loc1 data set
sp1.loc1<-data.frame(sp1.loc1,v1.pad,v2.pad,v3.pad,v4.pad,v5.pad,v6.pad,
v7.pad,v8.pad,v9.pad,v10.pad,v11.pad)
attach(sp1.loc1)
write.table(sp1.loc1,"sp1.at.loc1.txt",quote=F,row.names=F,
col.names=T,sep="\t")
detach(sp1.loc1)
#Subsetting the data (v1-v11 are continuous "predictors"): species 2 at
#locality 1...]--
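The per-variable lines in the listing above (one `subset()` call and one pair of `scale()` lines for each of v1 to v11) could probably be collapsed by looping over column names. The following is a rough sketch only: the toy `sp1.loc1` data frame and its values are invented, the names `id`, `v1`, `v2` and the `.pad` suffix follow the post, and the cutoff of 2 is an assumption chosen so the tiny example flags something.

```r
# Toy sample standing in for 'sp1.loc1'; the values are invented.
sp1.loc1 <- data.frame(
  id = 1:8,
  v1 = c(10.1, 9.8, 10.4, 10.0, 9.9, 10.2, 10.3, 9.7),
  v2 = c(25.1, 24.8, 24.9, 25.0, 24.7, 25.2, 24.6, 32.0)
)
vars <- c("v1", "v2")   # for the real data this would be paste0("v", 1:11)

# Cutoff on |z|. The post uses qnorm(0.9999); a looser value is assumed
# here because |z| cannot exceed (n - 1) / sqrt(n) in a sample of size n.
z.crit <- 2

# One vector of flagged ids per variable.
flagged <- lapply(sp1.loc1[vars], function(x) {
  sp1.loc1$id[abs(as.numeric(scale(x))) > z.crit]
})

# Standardized copies of every variable, named v1.pad, v2.pad, ...
z <- lapply(sp1.loc1[vars], function(x) as.numeric(scale(x)))
sp1.loc1 <- cbind(sp1.loc1, setNames(as.data.frame(z), paste0(vars, ".pad")))

# Tab-delimited output; tempdir() keeps the example out of the working directory.
out <- file.path(tempdir(), "sp1.at.loc1.txt")
write.table(sp1.loc1, out, quote = FALSE, row.names = FALSE, sep = "\t")
```

Generating the `.pad` names from `vars` also avoids copy-and-paste slips between near-identical lines.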
"Time will tell"
--