R how to find outliers and zero mean columns?

Hi Norman,
To check whether all values of an object (say "x") fulfill a certain
condition (==0):

all(x==0)
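
One caveat, shown as a small sketch with a made-up vector: if the object contains missing values, the comparison yields NA for them, and all() can return NA rather than TRUE or FALSE. The na.rm argument of all() ignores them.

```r
x <- c(0, 0, NA)           # hypothetical vector with one missing value

all(x == 0)                # NA: the missing value might be non-zero
all(x == 0, na.rm = TRUE)  # TRUE: missing values are ignored
```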

If your object (X) is indeed a data frame, all(X==0) only tells you
about the data frame as a whole, so to get a result for each column,
apply the check column by column:

X<-data.frame(A=c(0,1:10),B=c(0,2:10,99999),
 C=c(0,-1,3:11),D=rep(0,11))
all_zeros<-function(x) return(all(x==0))
which_cols<-unlist(lapply(X,all_zeros))

If your data frame (or a subset) contains all numeric values, you can
finesse the same check for rows like this:

which_rows<-apply(as.matrix(X),1,all_zeros)

Note that lapply returns a list of logical (TRUE/FALSE) values, so
it has to be unlisted to get a vector of logical values like the one
"apply" returns.

You can then use that vector to index (subset) the original data frame
by logically inverting it with ! (NOT):

X[,!which_cols]
X[!which_rows,]
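
Put together, the steps above can be sketched as follows, using the same example data. Two things here are substitutions rather than part of the original recipe: vapply() stands in for unlist(lapply()) because it returns a logical vector directly, and drop=FALSE keeps the result a data frame even if only one column survives.

```r
X <- data.frame(A = c(0, 1:10), B = c(0, 2:10, 99999),
                C = c(0, -1, 3:11), D = rep(0, 11))
all_zeros <- function(x) all(x == 0)

# One TRUE/FALSE per column; vapply() returns a named logical vector
which_cols <- vapply(X, all_zeros, logical(1))

# One TRUE/FALSE per row, computed on the matrix form of the data frame
which_rows <- apply(as.matrix(X), 1, all_zeros)

# Drop the all-zero rows and columns in a single indexing step
X_clean <- X[!which_rows, !which_cols, drop = FALSE]
```

In this example only column D and the first row are all zeros, so X_clean keeps columns A, B and C and the remaining ten rows.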

Your "outliers" look suspiciously like the missing-value codes used by
certain statistical packages. If you know the values you are looking
for, you can do something like:

NA99999<-X==99999

and then "remove" them by replacing those values with NA:

X[NA99999]<-NA
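
As a small sketch of why the recoding matters, again on the example data: once the 99999 code is NA, summary functions can skip it with na.rm=TRUE. The value 99999 is just the code used in this example; your own data may use a different one.

```r
X <- data.frame(A = c(0, 1:10), B = c(0, 2:10, 99999),
                C = c(0, -1, 3:11), D = rep(0, 11))

mean(X$B)                # dominated by the 99999 code

X[X == 99999] <- NA      # logical-matrix indexing, as above
sum(is.na(X$B))          # how many values were recoded in column B
mean(X$B, na.rm = TRUE)  # mean of the remaining values
```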

Be aware that all these hackles (diminutive of hacks) are pretty
specific to this example. Also remember that if this is homework, your
karma has just gone down the cosmic sinkhole.

Jim
On Thu, Mar 31, 2016 at 9:56 AM, Norman Pat <normanmath1 at gmail.com> wrote: