Calculate median from counts and values
Thanks Gabor and Phil. That did it.

I've used R for years for plotting and run-of-the-mill data analysis (the only kind I do), but the syntax of this language has just never clicked for me. I can't seem to advance beyond the "mostly harmless" stage. Python is rotting my brain, I guess.

Again, thanks for the tips.

David
On 5/3/05, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
On 5/3/05, David Finlayson <david.p.finlayson at gmail.com> wrote:
I am tangled up in a syntax question. I want to calculate basic statistics for a large dataset provided as counts (weights) and values, and I can't figure out an elegant way to expand the data. For example, here are the counts:
counts
  n4 n3 n2 n1 p0 p1 p2 p3 p4
1  0  0  0  1  1  3 16 55 24
2  0  0  0  0  2  8 28 47 15
3  1 17 17 13  4  5 12 24  8
...

and the values:

values
     n4 n3 n2 n1 p0  p1   p2    p3     p4
[1,] 16  8  4  2  1 0.5 0.25 0.125 0.0625

What I want for each row is something like this (shown for row 1):

  c( rep(16, 0), rep(8, 0), rep(4, 0), rep(2, 1), rep(1, 1),
     rep(0.5, 3), rep(0.25, 16), rep(0.125, 55), rep(0.0625, 24))

I am sure that this is a one-liner for an R-master, but I can't figure it out without a set of nested for loops iterating over each row in counts.
Is there supposed to be one row of values that applies to all rows of counts, or are there different rows of values for different rows of counts? Also, in your example row 3 has a different total than rows 1 and 2. Is that right?

At any rate, I will assume that there is only one row of values and many rows of counts, and that it's not necessarily true that the counts sum to the same number in each row. Then, noting that

  c(rep(4,1), rep(5,2), rep(6,3))

is the same as

  rep(4:6, 1:3)

we have:

  lapply(as.data.frame(t(counts)), rep, x = unlist(values))
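
To get from the expanded vectors to the per-row medians named in the subject line, something like the following sketch may help. It rebuilds only the three rows of counts shown in the thread (the rest are elided in the original), stores the values as a plain vector, and writes the rep() idea out explicitly rather than as the one-liner above. The names vals, expanded and medians are illustrative and do not come from the thread.

```r
# First three rows of counts as posted; the remaining rows are not shown
# in the thread, so they are not reproduced here.
bins <- c("n4", "n3", "n2", "n1", "p0", "p1", "p2", "p3", "p4")
counts <- matrix(
  c(0,  0,  0,  1, 1, 3, 16, 55, 24,
    0,  0,  0,  0, 2, 8, 28, 47, 15,
    1, 17, 17, 13, 4, 5, 12, 24,  8),
  nrow = 3, byrow = TRUE, dimnames = list(NULL, bins)
)

# Assumption: a single set of values applies to every row of counts.
vals <- c(16, 8, 4, 2, 1, 0.5, 0.25, 0.125, 0.0625)

# Expand each row: repeat each value as many times as its count.
expanded <- lapply(seq_len(nrow(counts)),
                   function(i) rep(vals, times = counts[i, ]))

# Any ordinary statistic can now be applied to each expanded vector.
medians <- sapply(expanded, median)
print(medians)
# [1] 0.125 0.125 1.000
```

The expanded vectors have lengths 100, 100 and 101, matching the row totals, so the row sums do not need to agree. For very large counts, a cumulative-sum approach that avoids expanding the data may be preferable, but that is beyond what the thread discusses.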
David Finlayson
Marine Geology & Geophysics
School of Oceanography
Box 357940
University of Washington
Seattle, WA 98195-7940 USA

Office: Marine Sciences Building, Room 112
Phone: (206) 616-9407
Web: http://students.washington.edu/dfinlays