Dear R users,

I am using UK census data on travel to work. The authorities have provided a breakdown for each area by mode (car, bicycle etc.) and distance travelled (0 - 2 km, 2 - 5 km etc.). After processing, the data for Sheffield look like this (https://files.one.ubuntu.com/ej2VtVbJTEaelvMRlsocRg):

    dshef <- read.table("distmodesheff.csv", sep=",", header=TRUE)
    print(dshef)

         Dist  Tr Bici  Met  Pas  Foot   Bus   Car
    1     2 >  45  571  491 2125 16644  4469 13494
    2   2 - 5  80 1136 2540 4738  3659 17290 30212
    3  5 - 10 217  466 2335 3994  1041 12963 35221
    4 10 - 20 191   76  491 1333   332  2439 16322
    5 20 - 30 168    6   25  235    41   175  3711
    6 30 - 40  78    6    3  122    20    74  2179
    7 40 - 60 349    6   21  261    96   333  3501
    8    60 < 332   62  125  369   534   433  3276
    9   Other 148   40   79  905   388   622  6481

It's interesting to look at the distributions of the different transport modes:

    attach(dshef)
    rs <- rbind(Tr,Bici,Met,Pas,Foot,Bus,Car)
    barplot(rs, beside=TRUE, names=Dist, col=rainbow(7), legend=TRUE)

http://r.789695.n4.nabble.com/file/n3758198/1.png

This is brilliant, and creates output similar to that of OO calc: http://r.789695.n4.nabble.com/file/n3758198/egraphmini.jpg

However, as you can see, the pre-made categories (0 - 2 km etc.) are unevenly spaced bins within a continuous variable. This calls for a histogram (with frequency represented by the area of each bar, not its height). What I am looking for, for the vector Car for example, is something like this:

    n <- c(rep(1.5,Car[1]), rep(3,Car[2]), rep(7.7,Car[3]),
           rep(15,Car[4]), rep(25,Car[5]), rep(35,Car[6]),
           rep(50,Car[7]), rep(100,Car[8]))
    hist(n, breaks=c(0,2,5,10,20,30,40,60,200))

http://r.789695.n4.nabble.com/file/n3758198/2.png

This produces a histogram, but it's a tedious and ugly way of getting there. It also does not allow for trend-line analysis of the likely distribution of the continuous variable distance: lines(density(n)), for example, results in peaks around my arbitrary values.

Has anyone else encountered similar issues?
I've searched high and low but can find no solution other than creating a barplot with variable widths: http://r.789695.n4.nabble.com/Histogram-using-frequency-data-td827927.html

Any ideas about how to resolve this issue would be greatly appreciated. Eventually I hope to model the distribution of distances travelled in order to estimate the mean distance within each bin.

Many thanks,
Robin

--
View this message in context: http://r.789695.n4.nabble.com/Histogram-from-frequency-data-in-pre-made-bins-tp3758198p3758198.html
Sent from the R help mailing list archive at Nabble.com.
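For what it's worth, the variable-width barplot route can be kept fairly short. The following is only a sketch, not a tested solution for the full dataset: it types in the Car counts from the table above instead of reading the CSV, leaves out the "Other" row, and assumes the same arbitrary upper limit of 200 km for the open-ended "60 <" bin that the hist() call above uses. Each bar height is the share of trips in the bin divided by the bin width, so area rather than height carries the frequency.

```r
# Car counts per distance bin, copied from the table in the post
# (the "Other" row is left out because it has no distance range)
car <- c(13494, 30212, 35221, 16322, 3711, 2179, 3501, 3276)

# Bin edges in km; the upper edge of 200 for the open "60 <" bin is an assumption
breaks <- c(0, 2, 5, 10, 20, 30, 40, 60, 200)
widths <- diff(breaks)

# Density = share of trips in the bin / bin width, so the bar areas sum to 1
dens <- car / (sum(car) * widths)

# Bars of unequal width with no gaps between them; with space = 0 the first
# bar starts at x = 0, so the bar edges line up with the bin edges
barplot(dens, width = widths, space = 0, col = "lightgray",
        xlab = "Distance (km)", ylab = "Density")
axis(1, at = breaks)
```

The heights are the same densities that hist() reports for unequal breaks, but no expanded vector of made-up distances is needed.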
Histogram from frequency data in pre-made bins
3 messages · RobinLovelace
Update: I have recreated an artificial distribution using uniform random numbers:

    n <- c(runif(Car[1],0,2), runif(Car[2],2,5), runif(Car[3],5,10),
           runif(Car[4],10,20), runif(Car[5],20,30), runif(Car[6],30,40),
           runif(Car[7],40,60), runif(Car[8],60,200))

The resulting density estimate is very jumpy, but should, in theory, allow me to fit a distribution to it and then extract the bin means from a random sample drawn from the fitted distribution. Again, this is tedious and far from ideal, but I cannot see any way around it. Also, the distributions I fit to this artificial dataset shoot up to infinity as x -> 0. Any ideas, anyone?
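One alternative that avoids the artificial sample altogether is to fit a distribution to the binned counts directly by maximum likelihood, treating each count as the number of observations falling between two bin edges. The sketch below is a different technique from the runif() approach above, and it rests on assumptions that have not been checked against these data: a log-normal shape for distance, the Car counts typed in from the table, the "Other" row dropped, and the last bin treated as running from 60 km to infinity. The names negll and bin_means are made up for the illustration.

```r
# Car counts per distance bin, copied from the table in the first post
car <- c(13494, 30212, 35221, 16322, 3711, 2179, 3501, 3276)

# Bin edges in km; the last bin is treated as open-ended (assumption)
breaks <- c(0, 2, 5, 10, 20, 30, 40, 60, Inf)

# Negative log-likelihood of the binned counts under a log-normal distribution.
# par[1] is meanlog, par[2] is log(sdlog) so that sdlog stays positive.
negll <- function(par) {
  p <- diff(plnorm(breaks, meanlog = par[1], sdlog = exp(par[2])))
  -sum(car * log(pmax(p, 1e-300)))  # guard against log(0)
}

# Start from a rough guess and minimise with the default Nelder-Mead method
fit <- optim(c(log(7), 0), negll)
mu <- fit$par[1]
s  <- exp(fit$par[2])

# Mean distance within each bin: E[X | a < X <= b].
# For a log-normal, the integral of x * f(x) over (a, b] has the closed form
# exp(mu + s^2 / 2) * (F2(b) - F2(a)), where F2 is the log-normal CDF
# with meanlog = mu + s^2 and the same sdlog.
a <- breaks[-length(breaks)]
b <- breaks[-1]
partial   <- exp(mu + s^2 / 2) * (plnorm(b, mu + s^2, s) - plnorm(a, mu + s^2, s))
prob      <- plnorm(b, mu, s) - plnorm(a, mu, s)
bin_means <- partial / prob

print(round(bin_means, 2))
```

A log-normal density goes to zero as x -> 0, so this particular choice does not blow up near the origin, but whether it describes the Sheffield data well would need checking, for example by comparing prob * sum(car) with the observed counts.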
Sorry to anyone who tried but failed to download the data; the file seems not to be there. Here is a new link to it, please take a look: http://ubuntuone.com/p/1C6U/