Hello:

I'm dealing with an issue currently that I'm not sure the best way to approach. I've got a very large (10G+) dataset that I'm trying to create a histogram for. I don't seem to be able to use hist directly as I can not create an R vector of size greater than 2.2G. I considered condensing the data previous to loading it into R and just plotting the frequencies as a barplot; unfortunately, barplot does not support plotting the values according to a set of x-axis positions.

What I have is something similar to:

    ys <- c(12,3,7,22,10)
    xs <- c(1,30,35,39,60)

and I'd like the bars (ys) to appear at the positions described by xs. I can get this to work on smaller sets by filling zero values in for missing ys for the entire range of xs but in my case this would again create a vector too large for R.

Is there another way to use the two vectors to create a simulated frequency histogram? Is there a way to create a histogram object (as returned by hist) from the condensed data so that plot would handle it correctly?

Thanks in advance,
Jesse
barplot as histogram
5 messages · R. Michael Weylandt, Duncan Murdoch, Jesse Brown +1 more
Perhaps plot(xs, ys, type = "h", lwd = 3) will work? I'm not sure that a direct call to hist(..., plot = FALSE) will get around the data problems. If you type getAnywhere(hist.default) you can see the code that runs hist(): perhaps you can extract the working bits you need.

Michael
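For anyone reading along, here is a minimal sketch of the type = "h" approach using the xs and ys from the original post. The lend = "butt", ylim, xlab and ylab settings are my own additions for illustration, not part of Michael's suggestion; they square off the line ends and start the y axis at zero.

```r
# Condensed data from the original post: counts (ys) at x positions (xs).
ys <- c(12, 3, 7, 22, 10)
xs <- c(1, 30, 35, 39, 60)

# Start the y axis at zero so the bar heights are comparable (assumed choice).
ylim <- c(0, max(ys))

# type = "h" draws one vertical line per (x, y) pair, from 0 up to y.
# lwd controls the bar thickness; lend = "butt" gives flat bar ends.
plot(xs, ys, type = "h", lwd = 3, lend = "butt",
     ylim = ylim, xlab = "value", ylab = "frequency")
```

Only the distinct x positions and their counts are held in memory, so the size of the raw dataset should not matter once it has been condensed.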
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
On 04/11/2011 2:04 PM, Jesse Brown wrote:
Is there another way to use the two vectors to create a simulated frequency histogram? Is there a way to create a histogram object (as returned by hist) from the condensed data so that plot would handle it correctly?
Follow your own last suggestion. Take a small subset of your data, and calculate

    x <- hist(data, plot=FALSE)

str(x) will show you the structure of the object in x. Modify the entries to reflect your full dataset, and then plot(x) will show it.

Duncan Murdoch
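A rough sketch of what that could look like, assuming the data have been condensed outside R into positions (xs) and counts (ys) as in the original post. The bin edges in breaks are a hypothetical choice of mine, and the dummy call to hist() is only there to get a correctly structured object to overwrite.

```r
# Condensed data from the original post.
ys <- c(12, 3, 7, 22, 10)
xs <- c(1, 30, 35, 39, 60)

# Hypothetical bin edges; pick whatever suits the real data.
breaks <- seq(0, 60, by = 10)

# Sum the condensed counts within each bin; empty bins get 0.
bin    <- cut(xs, breaks, include.lowest = TRUE)
counts <- as.vector(xtabs(ys ~ bin))

# Build a template "histogram" object from two dummy values,
# then overwrite its entries with the real summary.
h <- hist(range(breaks), breaks = breaks, plot = FALSE)
h$counts  <- counts
h$density <- counts / (sum(counts) * diff(breaks))
h$mids    <- (head(breaks, -1) + tail(breaks, -1)) / 2
h$xname   <- "condensed data"

plot(h)
```

The density entry is rescaled so the bar areas sum to one, which matters only if the object is later plotted with freq = FALSE.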
I believe that plot(..., type='h') will do the trick. I had tried that earlier but forgot to play with the lwd parameter. Incidentally, I didn't know about getAnywhere(hist.default) - really handy. I was reading the code to find the details. Thanks! Jesse
On 11/05/2011 05:04 AM, Jesse Brown wrote:
I considered condensing the data previous to loading it into R and just plotting the frequencies as a barplot; unfortunately, barplot does not support plotting the values according to a set of x-axis positions.
Hi Jesse, I think that barp (plotrix) will get you out of trouble. Jim