difficult data manipulation question
Try this:

# test data
# read in header separately so R does not make column names unique
Lines <- "AAA BBB CCC DDD AAA BBB
0 2 1 2 0 0
2 3 7 6 0 1
1.5 4 9 9 6 0
1.0 6 10 11 3 3
"
DF <- read.table(textConnection(Lines), skip = 1)
names(DF) <- scan(textConnection(Lines), what = "", nlines = 1)

f <- function(x) x[which.max(colSums(DF[x] != 0))]
tapply(seq(DF), names(DF), f)
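The tapply() call above returns, for each distinct name, the position of the column to keep; it does not build the new matrix itself. A minimal sketch of the remaining step, applied directly to a matrix rather than a data frame, might look like the following. The names nonzero, keep and RESULT are made up for illustration, and the sort() step assumes the original column order should be preserved.

```r
# Matrix with duplicated column names. Matrices allow this; data.frame()
# and read.table() rename duplicates unless check.names = FALSE.
TEMP <- matrix(c(0,   2,  1,  2, 0, 0,
                 2,   3,  7,  6, 0, 1,
                 1.5, 4,  9,  9, 6, 0,
                 1.0, 6, 10, 11, 3, 3),
               nrow = 4, byrow = TRUE,
               dimnames = list(NULL, c("AAA", "BBB", "CCC", "DDD", "AAA", "BBB")))

# Count the nonzero entries in each column, by position rather than by name,
# so duplicated names cause no ambiguity.
nonzero <- colSums(TEMP != 0)

# For each distinct name, keep the position of the column with the most
# nonzero entries. which.max() returns the first maximum, so ties go to
# the leftmost column.
keep <- tapply(seq_len(ncol(TEMP)), colnames(TEMP),
               function(i) i[which.max(nonzero[i])])

# sort() restores the original column order (assumed to be wanted);
# drop = FALSE keeps a matrix even if only one column survives.
RESULT <- TEMP[, sort(as.vector(keep)), drop = FALSE]
RESULT
```

Indexing by position throughout avoids TEMP[, "AAA"], which would silently pick only the first column with that name.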
On 7/3/06, markleeds at verizon.net <markleeds at verizon.net> wrote:
hi everyone :
suppose i have a matrix in which some column names are identical so,
for example, TEMP
"AAA", "BBB", "CCC", "DDD","AAA", "BBB"
0 2 1 2 0 0
2 3 7 6 0 1
1.5 4 9 9 6 0
1.0 6 10 11 3 3
I didn't even check yet whether identical column names are allowed
in a matrix but i hope they are.
assuming that they are, then i would like to be able to take the matrix and make a new matrix with the following requirements.
1) whenever there is a unique column name, just take that column for the new matrix
2) whenever the column name is not unique, take the one
that has the most nonzero elements ( in the case of
ties, i don't care which one is picked ).
so, in this case, the resulting matrix would just be the first 4 columns.
i realize ( or at least i think ) that
sum( TEMP[,columnname] != 0 ) will give me the
number of nonzero elements in a column with the name columnname
but how to use this to deal with the non-uniqueness and solve my particular problem is beyond me. plus, i think the command will
bomb because columnname will not always be unique ?
Thanks for any help. I realize this is not a trivial problem so I really appreciate it.
Mark