help with rowsum/aggregate type functions

Try this:

aggregate(list(Number=x$Number), by=list(Gene_Name=x$Gene_Name), sum)
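
For anyone who wants to try the call above on its own, here is a minimal sketch. It assumes x is a data frame shaped like the one described in the question; the sample rows are copied from the excerpt in the original post, and the name res is only for illustration.

```r
# Sample data copied from the excerpt in the question; 'x' is assumed
# to be the poster's data frame with columns Gene_Name and Number.
x <- data.frame(
  Gene_Name = c("Zcchc8", "Zcwpw1", "Zdhhc18", "Zdhhc20",
                "Zdhhc3", "Zdhhc3", "Zeb2", "Zeb2"),
  Number    = c(6, 5, 6, 5, 6, 5, 9, 6)
)

# Sum Number within each Gene_Name. The names given inside list()
# become the column names of the result.
res <- aggregate(list(Number = x$Number),
                 by = list(Gene_Name = x$Gene_Name),
                 FUN = sum)

# res is an ordinary data frame with numbered rows and two columns.
print(res)
```

The result should already be a data frame with one row per gene, so no row-name cleanup is needed afterwards.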
Hi--

  This is a question with a trivial and obvious answer, I'm sure, but I can't seem to find it in the help files and books that I have handy.  I have a dataframe consisting of two columns, "Gene_Name," a list of gene symbols, and "Number," a numeric measure of how frequently a tag representing that gene showed up in a SAGE library.  Several of the genes are represented by multiple tags, and therefore are present more than once in the list, e.g.:

 1167     Zcchc8      6
 1168     Zcwpw1      5
 1169     Zdhhc18     6
 1170     Zdhhc20     5
 1171     Zdhhc3      6
 1172     Zdhhc3      5
 1173     Zeb2        9
 1174     Zeb2        6

  What I want is to collapse the list by gene name, such that duplicates are summed up and appear only once in the final version:

 Zcchc8      6
 Zcwpw1      5
 Zdhhc18     6
 Zdhhc20     5
 Zdhhc3     11
 Zeb2       15

  The only way I can figure out to do this is via rowsum:

 > rowsum (Number,Gene_Name)

 gives me exactly what I want, *except* that in the end, I am left with a matrix containing the Number values and with the Gene_Names used as row names (the output therefore looks exactly as printed above) -- what I want is a dataframe equivalent to the starting table, with numbered rows and separate, accessible columns containing the Gene_Name and Number values.

  I was able to put such a dataframe together manually, by cobbling together the row names of the above list with the values:

 > genes.unique <- data.frame (rownames (rowsum(Number,Gene_Name)), rowsum(Number,Gene_Name))

 but then I have to manually replace the row names of the dataframe with numbers, to get back to what I wanted in the first place.
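
 If one wants to stay with rowsum, the manual step described above can probably be folded into the data.frame call. This is only a sketch under assumptions: the data frame name x and the sample rows are illustrative (taken from the excerpt above), and row.names = NULL is used to ask data.frame for plain 1..n row numbers.

```r
# Illustrative data frame; the name 'x' and these rows are assumptions
# based on the excerpt shown in the post.
x <- data.frame(
  Gene_Name = c("Zcchc8", "Zcwpw1", "Zdhhc18", "Zdhhc20",
                "Zdhhc3", "Zdhhc3", "Zeb2", "Zeb2"),
  Number    = c(6, 5, 6, 5, 6, 5, 9, 6)
)

# Compute the grouped sums once; the result is a one-column matrix
# whose row names are the gene names.
rs <- rowsum(x$Number, x$Gene_Name)

# Move the row names into a real column and drop them from the values,
# so the data frame gets default integer row names.
genes.unique <- data.frame(
  Gene_Name = rownames(rs),
  Number    = unname(rs[, 1]),
  row.names = NULL
)

print(genes.unique)
```

 Storing the rowsum result first also avoids computing it twice.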

  I hope this makes some sort of sense.  Is there an easier way to do this?  Thanks in advance!

  Charlie Murtaugh

 =====

 L. Charles Murtaugh
 Assistant Professor

 University of Utah
 Dept. of Human Genetics
 15 N. 2030 E. Rm. 2100
 Salt Lake City, UT 84112

 tel 801-581-5958
 fax 801-581-6463
 email murtaugh at genetics.utah.edu

 ______________________________________________
 R-help at r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
See the reshape package. 

library(reshape)
# xx is your data frame; the id column must match its actual name (Gene_Name)
yy <- melt(xx, id = "Gene_Name")
cast(yy, Gene_Name ~ variable, sum)
