log2() and -min() very quick question
3 messages · Ben Ganzfried, jim holtman, PIKAL Petr
The second line is just rescaling the data with log2. It subtracts the minimum of the entire matrix (not of each row) and adds 1, so every value becomes at least 1. That matters because log2(0) is -Inf in R, and log2 of a negative number is NaN. Here is an example with sample data:
x <- matrix(runif(25, -50, 50), 5)
x
          [,1]      [,2]       [,3]       [,4]       [,5]
[1,] 29.730883  15.47239 -28.679186  47.617069 -48.692242
[2,] -4.472555 -14.68027 -37.062765  23.179251  21.556607
[3,] -8.991592 -22.97399  -2.188197 -14.327309 -39.681576
[4,] 31.087024  49.26841  42.407447  -6.852631  -5.371565
[5,] 10.493329  13.34933   9.876097 -35.178844  14.010105
# scale to log2
x <- log2(x - min(x) + 1)
x
         [,1]     [,2]     [,3]     [,4]     [,5]
[1,] 6.311487 6.026017 4.393214 6.604506 0.000000
[2,] 5.498879 5.129776 3.658723 6.187283 6.154795
[3,] 5.346980 4.739754 5.569978 5.144248 3.323466
[4,] 6.335913 6.628783 6.525124 5.420873 5.469908
[5,] 5.911346 5.978232 5.896474 3.859313 5.993275

You should see a noticeable change between the data read in and the result of the second statement.
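A minimal sketch of why the + 1 is there, using a random matrix like Jim's (the seed and the names no_offset and y are arbitrary choices for illustration). Without the + 1, the smallest entry becomes log2(0), which R returns as -Inf; with it, the smallest entry maps to 0. Because log2 is increasing, the ordering of the entries is preserved.

```r
# Random matrix in the same style as the example above; the seed is arbitrary
set.seed(1)
x <- matrix(runif(25, -50, 50), 5)

# Shift so the overall minimum becomes exactly 0, then take log2 WITHOUT the +1
no_offset <- log2(x - min(x))
print(min(no_offset))                  # -Inf: the minimum entry became log2(0)

# Shift, add 1, then take log2 (the transformation used in this thread)
y <- log2(x - min(x) + 1)
print(min(y))                          # 0: the minimum entry became log2(1)
print(all(is.finite(y)))               # TRUE: no -Inf or NaN values remain

# log2 is increasing, so the ranking of the entries is unchanged
print(identical(order(x), order(y)))   # TRUE
print(identical(dim(x), dim(y)))       # TRUE: same shape as the input
```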
On Mon, Jun 13, 2011 at 11:59 AM, Ben Ganzfried <ben.ganzfried at gmail.com> wrote:
I'm looking over some good code a post-doc in my lab wrote and trying to learn how it works. I came across the following:
rel.abundance <- as.matrix(read.delim("rel.abundance.csv", row.names=1, as.is=TRUE))
rel.abundance <- log2(rel.abundance-min(rel.abundance)+1)
I'm not sure what the second line is doing. I ran each line in R and couldn't see a noticeable difference in the output. I assume log2() takes the log base 2 of the values? I'm not clear what -min(rel.abundance) is doing either... my hunch would be that it would take the smallest value in each row?
I'd really like to figure out:
1) What's actually going on?
2) Is there a good way to run a command over a large dataset in R and better be able to tell what is going on? More specifically, when I run each line in R it looks something like this (w/ dif. values per row):
Archaea|Euryarchaeota|Methanobacteria|Methanobacteriales|Methanobacteriaceae|Methanobrevibacter,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,23,0,3,0,0,0
There are a lot of cells w/ values per row, which is one reason why I think
it is difficult to detect a pattern....
Thanks in advance!
Ben
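One plausible reason the second line seemed to change nothing, assuming the file holds non-negative counts with at least one zero (the sample row above suggests it does): min(rel.abundance) is then 0, so the line reduces to log2(rel.abundance + 1), which leaves 0 and 1 unchanged. A printout dominated by zeros therefore looks almost identical. The tiny matrix below is made up from counts visible in the sample row; it is not the real file.

```r
# Hypothetical 2 x 6 count matrix, reusing counts that appear in the sample row
counts <- matrix(c(0, 1, 0, 3, 0, 23,
                   2, 0, 0, 1, 0, 3),
                 nrow = 2, byrow = TRUE)

m <- min(counts)                 # 0, since the matrix contains zeros
logged <- log2(counts - m + 1)   # the same transformation as the second line

print(m)
print(round(logged, 3))
# 0 stays 0, 1 stays 1, 2 -> log2(3) (about 1.585),
# 3 -> 2, 23 -> log2(24) (about 4.585)

# With a minimum of 0 the transformation is just log2(counts + 1)
print(isTRUE(all.equal(logged, log2(counts + 1))))   # TRUE
```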
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Hi

r-help-bounces at r-project.org wrote on 13.06.2011 17:59:03:

From: Ben Ganzfried <ben.ganzfried at gmail.com>
Sent by: r-help-bounces at r-project.org, 13.06.2011 17:59
To: r-help at r-project.org
Subject: [R] log2() and -min() very quick question

[...]
rel.abundance <- log2(rel.abundance-min(rel.abundance)+1)
[...] I'm not clear what -min(rel.abundance) is doing either... my hunch would be that it would take the smallest value in each row?
No. If rel.abundance is a matrix, min(rel.abundance) is the overall minimum of the whole matrix:

mat <- matrix(1:12, 3, 4)
min(mat)
[1] 1

So log2(rel.abundance-min(rel.abundance)+1) subtracts the minimum value from all numbers, then adds 1 to all numbers, takes log base 2 of each number, and returns a matrix with the same dimensions as the input matrix.
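To contrast Petr's answer with the per-row reading from the question, here is a short sketch reusing his mat. The per-row version is shown only for comparison; it is a hypothetical alternative built with base R's apply() and sweep(), not what the post-doc's code does.

```r
mat <- matrix(1:12, 3, 4)

# Overall minimum of the whole matrix: a single number
print(min(mat))                  # 1

# Per-row minima, for comparison only (not what the original code computes)
row_min <- apply(mat, 1, min)
print(row_min)                   # 1 2 3

# The thread's transformation subtracts ONE value from every cell
overall <- log2(mat - min(mat) + 1)

# A per-row variant would subtract each row's own minimum instead
per_row <- log2(sweep(mat, 1, row_min) + 1)

print(overall)   # here mat - 1 + 1 == mat, so this is simply log2(mat)
print(per_row)   # first column is all 0: each row's minimum maps to 0
```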
[...]
2) Is there a good way to run a command over a large dataset in R and better be able to tell what is going on?
[...] There are a lot of cells w/ values per row, which is one reason why I think it is difficult to detect a pattern....
There are summary and structure commands, summary(data) and str(data), which can give you some overall information about your data.

Regards
Petr
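Petr's suggestion can be sketched on a made-up matrix standing in for rel.abundance (the Poisson counts, the dimnames and the name abund are assumptions for illustration). Note that summary() on a matrix summarizes each column separately, so the sketch flattens the matrix with as.vector() to get one overall summary; range(), head() and table() are additional base R functions not mentioned in the thread.

```r
# Hypothetical stand-in for rel.abundance: mostly zero counts with row/column names
set.seed(42)
abund <- matrix(rpois(60, lambda = 0.3), nrow = 6,
                dimnames = list(paste0("taxon", 1:6), paste0("sample", 1:10)))
transformed <- log2(abund - min(abund) + 1)

str(abund)                          # storage type, dimensions and dimnames
summary(as.vector(abund))           # one overall summary instead of one per column
summary(as.vector(transformed))

range(abund)                        # smallest and largest value before...
range(transformed)                  # ...and after the transformation

head(abund[, 1:5])                  # a corner of the matrix rather than every cell
head(transformed[, 1:5])

table(abund == 0)                   # how many cells are zero (TRUE) vs non-zero
```

Comparing the two range() results is usually quicker than scanning the printed matrix for changes.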