What does cut(data, breaks=n) actually do?
cut(data, breaks=n) splits the range of the data into n bins of (approximately) equal width. The bins have equal width, not an equal number of observations.
The width of each bin is:
(max(data) - min(data)) / n
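A minimal R sketch of how the cut points appear to be chosen, following the Details section of ?cut: the range of the data is divided into n pieces of equal length, and then the two outer limits are moved away by 0.1% of the range so that min(x) and max(x) both fall inside an interval. The helper equal_width_breaks() is hypothetical (not part of base R), and it ignores the special case of a constant vector, which cut() handles separately.

```r
# Hypothetical helper: reproduce the break points that cut(x, breaks = n)
# uses, per the description in ?cut.
equal_width_breaks <- function(x, n) {
  rx <- range(x, na.rm = TRUE)
  dx <- diff(rx)                                   # max(x) - min(x)
  b <- seq.int(rx[1], rx[2], length.out = n + 1)   # n intervals of width dx / n
  # push the outer limits out by 0.1% of the range
  b[c(1, n + 1)] <- c(rx[1] - dx / 1000, rx[2] + dx / 1000)
  b
}

set.seed(1)          # simulated data, not the data from the session below
x <- rnorm(10)
b <- equal_width_breaks(x, 3)
diff(b)              # middle width is dx/3; the outer two are dx/3 + dx/1000

# Same bin assignment as letting cut() choose the breaks itself
identical(as.integer(cut(x, breaks = 3)), as.integer(cut(x, breaks = b)))
```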
> x=rnorm(x)
> cut(x,breaks=3)
[1] (1.79,9.97] (-6.39,1.79] (9.97,18.2] (9.97,18.2] (-6.39,1.79]
[6] (1.79,9.97] (-6.39,1.79] (1.79,9.97] (-6.39,1.79] (-6.39,1.79]
Levels: (-6.39,1.79] (1.79,9.97] (9.97,18.2]
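Because the bins have equal width rather than equal counts, the number of observations per bin can vary a lot, which matters when the bins feed an entropy estimate. A short sketch with simulated data (not the data above) that tabulates the factor returned by cut():

```r
set.seed(1)
x <- rnorm(100)
bins <- cut(x, breaks = 3)
levels(bins)   # three interval labels of the form (a,b]
table(bins)    # observations per interval; these generally differ
```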
The widths of the three intervals are then:
> 18.2-9.97
[1] 8.23
> 9.97-1.79
[1] 8.18
> 1.79+6.39
[1] 8.18
>
> (max(x)-min(x))/3
[1] 8.164187
I don't know the reason for the small differences (I am wondering about it myself).
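Two details documented in ?cut probably account for these differences. First, the outer limits are moved out by 0.1% of the range, so the first and last intervals are slightly wider than (max(x)-min(x))/n. Second, the labels are formatted with dig.lab = 3 significant digits by default, so differences between the printed limits are differences of rounded numbers. A sketch with simulated data (not the data above) that computes the exact widths and relabels the same intervals with more digits:

```r
set.seed(1)
x <- rnorm(10)
dx <- diff(range(x))

dx / 3              # exact width of the middle interval
dx / 3 + dx / 1000  # exact width of the first and last intervals

levels(cut(x, breaks = 3))                # limits rounded to 3 significant digits
levels(cut(x, breaks = 3, dig.lab = 7))   # same intervals, printed with more digits
```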
I hope it is useful.
domenico
Melissa Cline wrote:
Hello,

I'm trying to bin a quantity into 2-3 bins for calculating entropy and mutual information. One of the approaches I'm exploring is the cut() function, which is what the mutualInfo function in binDist uses. When it's called in the format cut(data, breaks=n), it somehow splits the data into n distinct bins. Can anyone tell me how cut() decides where to cut?

Thanks,
Melissa

---------------------------------------------------------------
Melissa Cline, Independent Investigator
MCD Biology, UCSC
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.