importing large datasets in R
8 messages · gaurav singh, Wensui Liu, Duncan Murdoch +4 more
On 13-01-19 3:28 AM, gaurav singh wrote:
Hi Everyone, I am a little new to R, and the first problem I am facing is the dilemma of whether R is suitable for files of about 2 GB with slightly more than 2 million rows. When I try importing the data using read.table, it seems to take forever and I have to cancel the command. Are there any special techniques or methods which I can use, or some tricks of the game that I should keep in mind, in order to be able to do data analysis on such large files using R?
Specifying the type of each column with colClasses will speed up read.table a lot on a big file. You have a lot of data, so having a lot of memory will help. You may want to work in 64-bit R, which has access to a lot more memory than 32-bit R sees.

Duncan Murdoch
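A minimal sketch of the colClasses suggestion follows. The temporary file, the column names, and the three column types are made up for illustration and are not the original poster's data; the nrows and comment.char settings are additional hints taken from the read.table help page, not from the reply above.

```r
# Hypothetical example data written to a temporary file,
# standing in for the real 2 GB file.
tmp <- tempfile(fileext = ".csv")
example <- data.frame(id = 1:1000,
                      value = rnorm(1000),
                      label = sample(letters, 1000, replace = TRUE),
                      stringsAsFactors = FALSE)
write.table(example, tmp, sep = ",", row.names = FALSE, quote = FALSE)

# Declaring the column types up front spares read.table from guessing them.
# nrows (a mild over-estimate is fine) and comment.char = "" also help
# on large files.
big <- read.table(tmp, header = TRUE, sep = ",",
                  colClasses = c("integer", "numeric", "character"),
                  nrows = 1100, comment.char = "")
str(big)
unlink(tmp)
```

For a real file, the column types could be worked out by first reading a few hundred rows without colClasses and inspecting the result with sapply(..., class).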
I'm not sure I understand your question. It is always better to use an example:
set.seed(42)
dataN <- array(sapply(1:5, function(i) assign(paste0("data", i),
    matrix(rnorm(6), 2, 3))), c(2, 3, 5))
meanmtrx <- apply(dataN, 1:2, mean)
meanmtrx
             [,1]       [,2]       [,3]
[1,]  0.189669255  0.3368646 0.34261301
[2,] -0.009700353 -0.4676745 0.01974906

The result is a matrix, not a data frame, and certainly not "resulting data frames." What are you trying to cbind?

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of ya
Sent: Saturday, January 19, 2013 8:50 AM
To: r-help
Subject: [R] calculating mean matrix

Hi list,

Thank you very much for reading this post. I have a data frame, and I am trying to split it into several data frames using one of the columns, say, x. After I get the data frames, I plan to treat them as matrices and calculate an element-by-element mean matrix. Could anyone give me some advice on how to do it?

So far, I know that if I have several matrices, say data1, data2, data3, data4, ..., dataN, I can do it like this:

data=array(cbind(data1,data2,data3,data4,....dataN), c(2, 3, N))
# 2 refers to row number of matrix, 3 refers to column number of matrix, N refers to number of matrices to be averaged.
meanmtrx=apply(data,1:2,mean)

but I do not know how to use the resulting data frames with cbind(). Maybe there are other, better ways. Any advice is appreciated.

Thank you very much. Have a nice day.

ya
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
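The split-then-average workflow described in the question can be sketched as below. The data frame dat, its column names, and the group sizes are hypothetical; the sketch assumes every group has the same number of rows, since otherwise an element-by-element mean is not defined.

```r
# Hypothetical data frame: column x identifies the group; each group
# has 2 rows and 3 value columns.
set.seed(1)
dat <- data.frame(x = rep(c("a", "b", "c"), each = 2),
                  v1 = rnorm(6), v2 = rnorm(6), v3 = rnorm(6))

# Split on x, leaving out the grouping column, and convert each piece
# to a numeric matrix.
pieces <- lapply(split(dat[, c("v1", "v2", "v3")], dat$x), as.matrix)

# Stack the matrices into a 2 x 3 x N array, then average over the
# third dimension.
stacked <- array(unlist(pieces), dim = c(2, 3, length(pieces)))
meanmtrx <- apply(stacked, 1:2, mean)
meanmtrx
```

Because unlist() flattens each matrix column by column and array() fills in the same order, each 2 x 3 slice of stacked is one group's matrix, so no cbind() step is needed.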
Hi,
This could also be done by:
#Using Arun's example:
res <- Reduce('+', split(df, grp))/length(levels(grp))
res
#      V1    V2    V3    V4    V5    V6    V7    V8    V9   V10
#1  417.3 792.2 504.2 506.1 513.9 480.7 545.4 564.4 473.7 486.2
#2  585.8 416.6 409.5 417.8 480.1 586.4 436.1 615.1 449.8 501.2
#3  459.3 449.1 542.0 411.6 404.6 507.6 472.0 344.0 363.2 485.1
#4  591.1 448.4 482.6 464.0 554.0 374.1 567.9 450.0 477.9 488.0
#5  433.1 438.2 441.4 596.4 356.9 461.6 356.7 457.4 434.9 510.4
#6  425.7 498.3 452.0 489.4 302.8 538.1 270.6 418.6 564.1 545.8
#7  755.1 526.4 615.2 559.9 483.3 379.7 439.3 458.8 528.5 564.0
#8  599.7 579.4 473.2 585.1 508.3 643.7 432.1 587.2 547.6 506.2
#9  471.8 321.0 375.8 394.4 355.5 434.4 532.1 640.5 490.1 619.1
#10 356.6 434.3 403.9 445.0 416.2 532.8 570.9 548.9 697.9 488.8

library(plyr)
res1 <- aaply(laply(split(df, ((1:nrow(df)-1) %/% 10)+1), as.matrix), c(2,3), mean)
res1
#    X2
#X1     V1    V2    V3    V4    V5    V6    V7    V8    V9   V10
# 1  417.3 792.2 504.2 506.1 513.9 480.7 545.4 564.4 473.7 486.2
# 2  585.8 416.6 409.5 417.8 480.1 586.4 436.1 615.1 449.8 501.2
# 3  459.3 449.1 542.0 411.6 404.6 507.6 472.0 344.0 363.2 485.1
# 4  591.1 448.4 482.6 464.0 554.0 374.1 567.9 450.0 477.9 488.0
# 5  433.1 438.2 441.4 596.4 356.9 461.6 356.7 457.4 434.9 510.4
# 6  425.7 498.3 452.0 489.4 302.8 538.1 270.6 418.6 564.1 545.8
# 7  755.1 526.4 615.2 559.9 483.3 379.7 439.3 458.8 528.5 564.0
# 8  599.7 579.4 473.2 585.1 508.3 643.7 432.1 587.2 547.6 506.2
# 9  471.8 321.0 375.8 394.4 355.5 434.4 532.1 640.5 490.1 619.1
# 10 356.6 434.3 403.9 445.0 416.2 532.8 570.9 548.9 697.9 488.8

A.K.
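The df and grp objects used in the Reduce() call come from an earlier message that is not reproduced here. The sketch below substitutes made-up stand-ins (20 rows, 3 columns, 4 groups of 5 rows) so the idea can be run on its own, and cross-checks the result against the array/apply route.

```r
# Hypothetical stand-ins for df and grp; the originals are defined in an
# earlier message of the thread.
set.seed(2)
df  <- as.data.frame(matrix(sample(1:1000, 60, replace = TRUE), ncol = 3))
grp <- factor(rep(1:4, each = 5))   # 4 groups of 5 rows each

# Element-by-element sum of the per-group data frames, divided by the
# number of groups.
pieces <- split(df, grp)
res <- Reduce(`+`, pieces) / length(pieces)
res

# Cross-check against stacking the pieces into an array and averaging
# over the third dimension.
stacked <- array(unlist(lapply(pieces, as.matrix)),
                 dim = c(5, 3, length(pieces)))
all.equal(as.matrix(res), apply(stacked, 1:2, mean),
          check.attributes = FALSE)
```

The Reduce() route only works when every group has the same dimensions, because arithmetic on data frames is defined only for equally sized operands.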