
histogram frequency weighting

10 messages · jim holtman, Sébastien Bihorel, Gunther Höning +1 more

#
Fellow R-helpers,

Suppose we create a histogram as follows (although it could be any vector
with zeroes in it):


R> lenh <- hist(iris$Sepal.Length, br=seq(4, 8, 0.05))
R> lenh$counts
 [1]  0  0  0  0  0  1  0  3  0  1  0  4  0  2  0  5  0  6  0 10  0  9  0  4  0
[26]  1  0  6  0  7  0  6  0  8  0  7  0  3  0  6  0  6  0  4  0  9  0  7  0  5
[51]  0  2  0  8  0  3  0  4  0  1  0  1  0  3  0  1  0  1  0  0  0  1  0  4  0
[76]  0  0  1  0  0


and we wanted to apply a weighting scheme where frequencies immediately
following (and only those) empty class intervals (0) should be adjusted by
averaging them over the number of preceding empty intervals + 1, i.e. by
dividing each such frequency by that number.  For example, the first
frequency that would need to be adjusted in 'lenh' is element 6 (1), which
has 5 preceding empty intervals, so its adjusted count would be 1/6.
Similarly, the second one would be element 8 (3), which has 1 preceding
empty interval, so its adjusted count would be 3/2.  Can somebody please
provide a hint on how to implement such a weighting scheme?

I thought about some very contrived ways to accomplish this, involving
'which' and 'diff', but I sense a function might already be available to
do this efficiently.  I couldn't find relevant info in the usual channels.
Thanks in advance for any pointers.
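As a reference point for checking any vectorized version, here is a minimal sketch of the weighting scheme written as a plain loop, taken literally from the definition above. The function name `weight_loop` is hypothetical and not from any package.

```r
# Hypothetical helper: literal implementation of the weighting scheme.
# Each non-empty count is divided by (number of empty intervals
# immediately before it + 1); a count with no empty predecessor is
# divided by 1, i.e. left unchanged, and zeros stay zero.
weight_loop <- function(counts) {
  out <- as.numeric(counts)
  n_empty <- 0  # empty intervals seen since the last non-empty one
  for (i in seq_along(counts)) {
    if (counts[i] == 0) {
      n_empty <- n_empty + 1
    } else {
      out[i] <- counts[i] / (n_empty + 1)
      n_empty <- 0
    }
  }
  out
}

# First ten counts from the question
weight_loop(c(0, 0, 0, 0, 0, 1, 0, 3, 0, 1))
```

On the first ten counts this gives 1/6 for element 6, 3/2 for element 8 and 1/2 for element 10, matching the hand calculation in the question.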


Cheers,
#
I think this should do it:
 [1]  0  0  0  0  0  1  0  3  0  1  0  4  0  2  0  5  0  6  0 10  0  9  0  4  0  1  0  6  0  7  0  6  0
[34]  8  0  7  0  3  0  6  0  6  0  4  0  9  0  7  0  5  0  2  0  8  0  3  0  4  0  1  0  1  0  3  0  1
[67]  0  1  0  0  0  1  0  4  0  0  0  1  0  0
 [1] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.1666667 0.0000000 1.5000000 0.0000000 0.5000000
[11] 0.0000000 2.0000000 0.0000000 1.0000000 0.0000000 2.5000000 0.0000000 3.0000000 0.0000000 5.0000000
[21] 0.0000000 4.5000000 0.0000000 2.0000000 0.0000000 0.5000000 0.0000000 3.0000000 0.0000000 3.5000000
[31] 0.0000000 3.0000000 0.0000000 4.0000000 0.0000000 3.5000000 0.0000000 1.5000000 0.0000000 3.0000000
[41] 0.0000000 3.0000000 0.0000000 2.0000000 0.0000000 4.5000000 0.0000000 3.5000000 0.0000000 2.5000000
[51] 0.0000000 1.0000000 0.0000000 4.0000000 0.0000000 1.5000000 0.0000000 2.0000000 0.0000000 0.5000000
[61] 0.0000000 0.5000000 0.0000000 1.5000000 0.0000000 0.5000000 0.0000000 0.5000000 0.0000000 0.0000000
[71] 0.0000000 0.2500000 0.0000000 2.0000000 0.0000000 0.0000000 0.0000000 0.2500000 0.0000000 0.0000000

On 9/17/06, Sebastian P. Luque <spluque at gmail.com> wrote:
#
On Sun, 17 Sep 2006 15:12:30 -0500,
"Sebastian P. Luque" <spluque at gmail.com> wrote:
[...]
I think I found a better combination of those two functions than what I
was initially playing with:


R> lenh <- hist(iris$Sepal.Length, br=seq(4, 8, 0.05))
R> ok <- which(lenh$counts > 0)
R> lenh$counts[ok] <- lenh$counts[ok] / diff(c(0, ok))
R> lenh$counts
 [1] 0.0000 0.0000 0.0000 0.0000 0.0000 0.1667 0.0000 1.5000 0.0000 0.5000
[11] 0.0000 2.0000 0.0000 1.0000 0.0000 2.5000 0.0000 3.0000 0.0000 5.0000
[21] 0.0000 4.5000 0.0000 2.0000 0.0000 0.5000 0.0000 3.0000 0.0000 3.5000
[31] 0.0000 3.0000 0.0000 4.0000 0.0000 3.5000 0.0000 1.5000 0.0000 3.0000
[41] 0.0000 3.0000 0.0000 2.0000 0.0000 4.5000 0.0000 3.5000 0.0000 2.5000
[51] 0.0000 1.0000 0.0000 4.0000 0.0000 1.5000 0.0000 2.0000 0.0000 0.5000
[61] 0.0000 0.5000 0.0000 1.5000 0.0000 0.5000 0.0000 0.5000 0.0000 0.0000
[71] 0.0000 0.2500 0.0000 2.0000 0.0000 0.0000 0.0000 0.2500 0.0000 0.0000


It makes sense, although I'm a bit nervous about floating-point issues
that might make this fail in some cases.  Any suggestions/comments
welcome.
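On the floating-point worry: histogram counts are whole numbers, so `counts > 0` is an exact test, and `diff(c(0, ok))` operates on whole-number positions; the only floating-point step is the final division, which cannot change which elements get selected. The sketch below wraps the same two lines in a function (the name `weight_diff` is hypothetical) and exercises a few edge cases that are easy to get wrong: a non-empty first interval, trailing empty intervals, and an all-zero vector.

```r
# Hypothetical wrapper around the which()/diff() approach above
weight_diff <- function(counts) {
  out <- as.numeric(counts)
  ok <- which(counts > 0)   # positions of non-empty intervals
  # gap to the previous non-empty position = empty intervals before it + 1
  out[ok] <- counts[ok] / diff(c(0, ok))
  out
}

weight_diff(c(3, 0, 0, 6, 2))  # 3 0 0 2 2 (first element divided by 1)
weight_diff(c(0, 4, 0, 0))     # 0 2 0 0 (trailing empty intervals untouched)
weight_diff(c(0, 0, 0))        # 0 0 0 (which() is empty, nothing changes)
```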
#
On Sun, 17 Sep 2006 18:05:15 -0400,
"jim holtman" <jholtman at gmail.com> wrote:

            
[...]

Thank you Jim, the idea with 'rle' is great.  I missed your follow-up
before posting my own, with another solution, a minute ago.  I'll do some
testing with both.
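A sketch of how `rle()` can express the same rule: encode the runs of empty intervals, then divide the element right after each run of zeros by the run length + 1. This is an assumption about the approach, not necessarily the code Jim posted, and the name `weight_rle` is hypothetical.

```r
# Hypothetical rle()-based version of the weighting scheme
weight_rle <- function(counts) {
  out <- as.numeric(counts)
  r <- rle(counts == 0)        # alternating runs of empty / non-empty intervals
  ends <- cumsum(r$lengths)    # index of the last element of each run
  # zero runs that are followed by something (i.e. not at the very end)
  zero_runs <- which(r$values & ends < length(counts))
  nxt <- ends[zero_runs] + 1   # first non-empty interval after each zero run
  out[nxt] <- counts[nxt] / (r$lengths[zero_runs] + 1)
  out
}

x <- c(0, 0, 0, 0, 0, 1, 0, 3, 0, 1)
weight_rle(x)   # 1/6 at 6, 3/2 at 8, 1/2 at 10, zeros elsewhere

# Cross-check against the which()/diff() one-liner from the thread
ok <- which(x > 0)
y <- x
y[ok] <- x[ok] / diff(c(0, ok))
all.equal(weight_rle(x), y)
```

Consecutive non-empty intervals fall into a single run of `FALSE`, which is fine here because only the first element after a run of zeros is adjusted.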


Cheers,
#
Dear list,

I am trying to do the following:
I have a list of length n (SmoothList) whose elements were created by
smooth.spline.
I also have a matrix (Xarray) of x-values with n rows and m columns.
Now I want to predict the y-values.
For that, I want to take the first element of SmoothList and the first row
of Xarray, and predict the y-value for each element of that row.
Then take the second element of SmoothList and the second row of Xarray,
the third element of SmoothList and the third row of Xarray, and so on.

I tried the following:

NewValues <- function(x,LIST){predict(LIST,x)}
apply(Xarray, 2, NewValues,SmoothList)

But it doesn't work.

Could anybody help, please?

Gunther
#
Hi

There is not much information about what could be wrong. As nobody knows
your Xarray and SmoothList, it is hard to guess. You did not even show
what "does not work".
So here are a few guesses.

predict usually expects comparable data. You could also try passing the
list as a named argument:
apply(Xarray, 2, NewValues,LIST=SmoothList)

HTH
Petr
On 18 Sep 2006 at 8:05, Gunther Höning wrote:
From:           	Gunther Höning <gunther.hoening at ukmainz.de>
To:             	<r-help at stat.math.ethz.ch>
Date sent:      	Mon, 18 Sep 2006 08:05:28 +0200
Subject:        	[R] Question on apply()
Petr Pikal
petr.pikal at precheza.cz
#
Ok.
I tried this too, but it still doesn't work.
Here is some more information to try out; Xarray is just an excerpt.

x <- c(0.11,0.25,0.45,0.65,0.80,0.95,1)
Y <- matrix(c(15,83,57,111,150,168,175,37,207,142,277,375,420,437),nrow=2)

sm <- function(y,x){smooth.spline(x,y)} 
SmoothList <- apply(Y,1,sm,x)
NewValues <- function(x,LIST){predict(LIST,x)} 
Xarray <- matrix(c(0.15,0.56,0.66,0.45,0.19,0.17,0.99,0.56,0.77,0.41,0.11,0.63,0.42,0.43),nrow=2)


apply(Xarray, 2, NewValues,SmoothList)
apply(Xarray, 2, NewValues,LIST=SmoothList)
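A quick way to see what goes wrong, assuming the example objects above: `apply()` passes each column of `Xarray` together with the *whole* `SmoothList`, so `predict()` is dispatched on a plain list rather than on a `smooth.spline` fit. The sketch below rebuilds the example data so it runs on its own; the anonymous function stands in for `sm`.

```r
x <- c(0.11, 0.25, 0.45, 0.65, 0.80, 0.95, 1)
Y <- matrix(c(15, 83, 57, 111, 150, 168, 175, 37, 207, 142, 277, 375, 420, 437), nrow = 2)
SmoothList <- apply(Y, 1, function(y) smooth.spline(x, y))
Xarray <- matrix(c(0.15, 0.56, 0.66, 0.45, 0.19, 0.17, 0.99,
                   0.56, 0.77, 0.41, 0.11, 0.63, 0.42, 0.43), nrow = 2)

class(SmoothList)        # a plain "list" holding the fits
class(SmoothList[[1]])   # "smooth.spline"

# What apply(Xarray, 2, NewValues, SmoothList) effectively calls per column:
res <- try(predict(SmoothList, Xarray[, 1]), silent = TRUE)
inherits(res, "try-error")   # TRUE: no predict method for a plain list

# Selecting the fit that belongs to row 1 works
predict(SmoothList[[1]], Xarray[1, ])$y
```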



-----Original Message-----
From: Petr Pikal [mailto:petr.pikal at precheza.cz] 
Sent: Monday, 18 September 2006 08:43
To: Gunther Höning; r-help at stat.math.ethz.ch
Subject: Re: [R] Question on apply()

#
Hi

If I am correct, apply does not pick elements from SmoothList the way you
expected. Instead,

lapply(SmoothList, predict,Xarray)
or
mapply(predict,SmoothList, Xarray)

can probably give you what you want.
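To see why the `mapply()` result needs post-processing, here is a sketch (rebuilding the example data from the previous message) that inspects its shape. `mapply()` recycles `SmoothList` along the elements of `Xarray` taken in column-major order, so when `nrow(Xarray)` equals `length(SmoothList)`, fit i meets exactly the values of row i. Each call returns an `(x, y)` list, which `mapply()` simplifies into a 2-row matrix of list cells.

```r
x <- c(0.11, 0.25, 0.45, 0.65, 0.80, 0.95, 1)
Y <- matrix(c(15, 83, 57, 111, 150, 168, 175, 37, 207, 142, 277, 375, 420, 437), nrow = 2)
SmoothList <- apply(Y, 1, function(y) smooth.spline(x, y))
Xarray <- matrix(c(0.15, 0.56, 0.66, 0.45, 0.19, 0.17, 0.99,
                   0.56, 0.77, 0.41, 0.11, 0.63, 0.42, 0.43), nrow = 2)

m <- mapply(predict, SmoothList, Xarray)  # fits recycled over elements of Xarray
dim(m)        # 2 14: one (x, y) column per element of Xarray
rownames(m)   # "x" "y"

# Keep only the predicted y values and restore the shape of Xarray
Ynew <- matrix(unlist(m["y", ]), nrow = nrow(Xarray))
```

This relies on the row-to-fit pairing described above; if `Xarray` had a different number of rows than there are fits, the recycling would pair values with the wrong fits.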

HTH
Petr
On 18 Sep 2006 at 9:26, Gunther Höning wrote:
From:           	Gunther Höning <gunther.hoening at ukmainz.de>
To:             	"'Petr Pikal'" <petr.pikal at precheza.cz>,
	<r-help at stat.math.ethz.ch>
Subject:        	AW: [R] Question on apply() with more information...
Date sent:      	Mon, 18 Sep 2006 09:26:01 +0200
Petr Pikal
petr.pikal at precheza.cz
#
Hi,

I tried both ideas, but neither is what I'm looking for.
I want to avoid a for loop, because the matrix is large (1200 x 1200
entries).

With a loop I would do:

for (i in seq_along(SmoothList))
{
	Xarray[i,] <- predict(SmoothList[[i]], Xarray[i,])$y
}

Actually I want to do more than just predict a value, but that isn't
important for the initial question...

Gunther

-----Original Message-----
From: Petr Pikal [mailto:petr.pikal at precheza.cz] 
Sent: Monday, 18 September 2006 11:44
To: Gunther Höning
Cc: r-help at stat.math.ethz.ch
Subject: Re: AW: [R] Question on apply() with more information...

#
Hi

Both lapply and mapply can give you what you want, but you have to
select only the desired part.

e.g.

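spec.apply <- function(sl, x) {
  d  <- dim(x)[1]           # rows of x = number of fits in sl
  d2 <- dim(x)[2] * 2 * d   # mapply returns an (x, y) pair per element of x
  vec <- seq(2, d2, 2)      # positions of the y values (every second entry)
  Z <- as.numeric(mapply(predict, sl, x))[vec]
  dim(Z) <- c(d, length(vec) / d)
  Z
}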


but it is probably slower than a simple for loop. However, maybe some
clever use of do.call can help.
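Following up on the `do.call` idea, here is a sketch that pairs each fit with its row explicitly using `Map()` (a wrapper around `mapply()` that does not simplify) and then stacks the per-row predictions with `do.call(rbind, ...)`. It rebuilds the example data from earlier in the thread; it is not claimed to be faster than the `for` loop, only to avoid the index bookkeeping of picking out every second element.

```r
x <- c(0.11, 0.25, 0.45, 0.65, 0.80, 0.95, 1)
Y <- matrix(c(15, 83, 57, 111, 150, 168, 175, 37, 207, 142, 277, 375, 420, 437), nrow = 2)
SmoothList <- apply(Y, 1, function(y) smooth.spline(x, y))
Xarray <- matrix(c(0.15, 0.56, 0.66, 0.45, 0.19, 0.17, 0.99,
                   0.56, 0.77, 0.41, 0.11, 0.63, 0.42, 0.43), nrow = 2)

# Split the matrix into a list of its rows, in row order
rows <- split(Xarray, row(Xarray))
# Predict row i with fit i; Map() returns a list with one element per row
preds <- Map(function(fit, xr) predict(fit, xr)$y, SmoothList, rows)
# Stack the per-row predictions back into a matrix shaped like Xarray
Ynew <- do.call(rbind, preds)
```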

Best regards
Petr
On 18 Sep 2006 at 13:26, Gunther Höning wrote:
From:           	Gunther Höning <gunther.hoening at ukmainz.de>
To:             	"'Petr Pikal'" <petr.pikal at precheza.cz>
Date sent:      	Mon, 18 Sep 2006 13:26:25 +0200
Copies to:      	r-help at stat.math.ethz.ch
Subject:        	Re: [R] Question on apply() with more information...
Petr Pikal
petr.pikal at precheza.cz