Sorry for the dumb question, but I can't work out how to do this.

Quick version: how can I re-bin a given frequency distribution using new breaks, without reference to the original data? The given distribution has integer-valued bins.

Long version: I am loading a frequency table into R from a file. The original data is very large, and it is very simple to get a frequency distribution from an SQL database, so overall this is a convenient method for me. The point is that I don't start with 'raw' data. The data looks like this...
dat
    COUNT FREQUENCY
1       1      5734
2       2      1625
3       3       793
4       4       480
5       5       294
6       6       237
7       7       205
8       8       200
9       9       123
10     10       108
11     11        90
12     12        62
13     13        60
14     14        68
15     15        64
16     16        56
17     17        68
18     18        45
19     19        38
20     20        37
21     21        29
22     22        39
23     23        35
24     24        33
25     25        36
...
148   153         5
149   156         2
150   157         3
151   158         2
152   159         2
153   162         1
154   163         3
155   164         3
156   165         2
157   166         1
158   168         2
159   169         4
160   170         1
...
354  2106         1
355  2189         1
356  2194         1
357  2217         1
358  2246         1
359  2474         1
360  2801         1
361  3697         1
362  3702         1
363  7353         1
364  8738         1
365  9442         1
366 12280         1

This is a typical 'count / frequency' distribution in biology, where low counts of a certain property are very frequent (across genomes, proteins, ecosystems, etc.), and high counts of a certain property are very rare. In the above example, a certain property occurs 12280 times with a frequency of 1, and another property occurs 9442 times with the same frequency. At the other extreme, a certain property occurs once with a frequency of 5734, and another property occurs twice with a frequency of 1625.

This kind of distribution is variously known as a "Zipf", a "power law", a "Pareto", "scale free", "heavy tailed" or an "80:20" distribution, or colloquially "the dominance of the few over the many". The term I choose is a "log linear" distribution, because that makes no assumptions about the underlying cause of the overall shape. People typically quote the curve in the form y ~ Cx^(-a).

I want to use the binning method of parameter estimation given here...

http://www.ece.uc.edu/~annexste/Courses/cs690/Zipf,%20Power-law,%20Pareto%20-%20a%20ranking%20tutorial.htm

(bin the data with exponentially increasing bin widths within the data range). But I can't work out how to re-bin my existing frequency data.

Sorry for the long question, all the best,
Dan.
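
To make the question concrete, here is a minimal sketch of the kind of re-binning being asked about, using only base R (cut() and tapply()). It is one possible approach, not necessarily the one the linked tutorial intends. The small `dat` below is a stand-in holding just the first eight rows of the table above; the base-2 breaks, the half-open [lower, upper) intervals and the names `breaks`, `rebinned`, `width` and `density` are all assumptions made for illustration.

```r
# Stand-in for the real table: only the first eight rows shown above.
dat <- data.frame(COUNT     = c(1, 2, 3, 4, 5, 6, 7, 8),
                  FREQUENCY = c(5734, 1625, 793, 480, 294, 237, 205, 200))

# Breaks with exponentially increasing widths: 1, 2, 4, 8, ...
# Base 2 is an assumption; enough powers are generated so that the
# last break lies strictly above max(COUNT).
n_pow  <- ceiling(log2(max(dat$COUNT) + 1))
breaks <- 2^(0:n_pow)

# Assign each existing integer bin to a new half-open bin [lower, upper).
bin <- cut(dat$COUNT, breaks = breaks, right = FALSE)

# Sum the existing frequencies within each new bin.
# tapply() returns NA for bins that received no rows, so set those to 0.
freq <- tapply(dat$FREQUENCY, bin, sum)
freq[is.na(freq)] <- 0

rebinned <- data.frame(lower     = head(breaks, -1),
                       upper     = tail(breaks, -1),
                       FREQUENCY = as.vector(freq))

# Bins have unequal widths, so divide by the width to get a comparable
# frequency per unit of COUNT.
rebinned$width   <- rebinned$upper - rebinned$lower
rebinned$density <- rebinned$FREQUENCY / rebinned$width

print(rebinned)
```

Because the new breaks fall on integer values and the original bins are integers, every original row lands in exactly one new bin, so the total frequency is preserved. With the full table, the sparse tail (for example between 3702 and 7353) may leave some bins empty; those get a frequency of 0 here and would need handling before taking logs.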