Hi all,
For a project we have to process some very large CSV files (up to 40 GB).
To reduce their size and improve performance, I wanted to store
them as RData files.
Since they were too big, I decided to split each CSV and save the parts as
separate .rda files.
So far so good. Now I want to bind them all together and save them as one .rda file
again, and this is surprisingly difficult.
First I load my rda files into my environment:
load(paste(rdaoutputdir, "file1.rda", sep=""))
load(paste(rdaoutputdir, "file2.rda", sep=""))
load(paste(rdaoutputdir, "file3.rda", sep=""))
etc
Then I try to combine them into one object.
Using rbind like this gives memory allocation problems ('Error: cannot
allocate vector of size')
objectToSave <- rbind(object1, object2, object3)
Using pre-allocation gives me a factor level error. I used this code:
nextrow <- nrow(object1)+1
object1[nextrow:(nextrow+nrow(object2)-1),] <- object2
# we need to ensure unique row names
row.names(object1) <- seq_len(nrow(object1))
rm(object2)
gc()
15! warning messages:
1: In `[<-.factor`(`*tmp*`, iseq, value = structure(c(1L, ... :
invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, iseq, value = structure(c(1L, ... :
invalid factor level, NA generated
What can I do?
Regards Derk
--
View this message in context: http://r.789695.n4.nabble.com/Saving-multiple-rda-files-as-one-rda-file-tp4672041.html
Sent from the R help mailing list archive at Nabble.com.
Saving multiple rda-files as one rda-file
7 messages · Dark, PIKAL Petr, David Winsemius +1 more
2 days later
Really no one has any suggestions on this issue?
Hi
-----Original Message-----
From: r-help-bounces at r-project.org On Behalf Of Dark
Sent: Thursday, July 25, 2013 11:00 AM
To: r-help at r-project.org
Subject: Re: [R] Saving multiple rda-files as one rda-file

Really no one has any suggestions on this issue?
What issue? AFAIK you can load any number of RDA files into your workspace and save your workspace as one file. I do not see any problem.

Regards
Petr
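
A minimal sketch of the load-then-save approach described above. The temporary directory, the two toy objects, and the file name combined.rda are assumptions made up for illustration; only load() and save() come from base R. Note that this stores the parts as separate objects inside one file; it does not bind them into a single data frame.

```r
# Hypothetical setup: two small .rda parts in a temporary directory
rdaoutputdir <- tempfile("rda_")
dir.create(rdaoutputdir)
object1 <- data.frame(id = 1:2, grp = factor(c("a", "b")))
object2 <- data.frame(id = 3:4, grp = factor(c("b", "c")))
save(object1, file = file.path(rdaoutputdir, "file1.rda"))
save(object2, file = file.path(rdaoutputdir, "file2.rda"))
rm(object1, object2)

# Load every part into a dedicated environment.
# load() returns the names of the objects it restored.
parts <- new.env()
files <- list.files(rdaoutputdir, pattern = "\\.rda$", full.names = TRUE)
loaded <- unlist(lapply(files, load, envir = parts))

# Write all loaded objects into a single file
combined_file <- file.path(rdaoutputdir, "combined.rda")
save(list = loaded, envir = parts, file = combined_file)
```

Loading combined.rda later restores object1 and object2 under their original names.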
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hi,
Yes, maybe I should have been clearer about my problem. I want to append the different data frames back into one variable (rbind) and save it as one RData file.
Regards Derk
On Jul 22, 2013, at 4:18 AM, Dark wrote:
15! warning messages:
1: In `[<-.factor`(`*tmp*`, iseq, value = structure(c(1L, ... :
invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, iseq, value = structure(c(1L, ... :
invalid factor level, NA generated
The warning messages suggest that the factor levels in corresponding columns of object1, object2, and object3 are not the same.
What can I do?
You can identify which columns are factors and give the corresponding columns a common set of levels that spans all of their values. Or, depending on the contents of that factor, you could convert it to character before the rbind operation. If the levels are not particularly long (in character length), that procedure might not expand the memory footprint very much.
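
A minimal sketch of both suggestions, using toy data frames that are assumptions standing in for the real parts (the column names id and grp and the helper to_char are hypothetical). Option 1 gives every factor column the union of levels and then appends in place; option 2 converts factors to character before rbind.

```r
# Toy stand-ins for two of the loaded parts (hypothetical data)
part1 <- data.frame(id = 1:3, grp = factor(c("a", "b", "a")))
part2 <- data.frame(id = 4:5, grp = factor(c("c", "b")))

# Option 1: harmonise factor levels, then append in place
object1 <- part1
object2 <- part2
factor_cols <- names(object1)[vapply(object1, is.factor, logical(1))]
for (col in factor_cols) {
  all_levels <- union(levels(object1[[col]]), levels(object2[[col]]))
  object1[[col]] <- factor(object1[[col]], levels = all_levels)
  object2[[col]] <- factor(object2[[col]], levels = all_levels)
}
n1 <- nrow(object1)
object1[(n1 + 1):(n1 + nrow(object2)), ] <- object2

# Option 2: convert factor columns to character before rbind
to_char <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  df[is_fac] <- lapply(df[is_fac], as.character)
  df
}
combined_chr <- rbind(to_char(part1), to_char(part2))
```

With matching levels the in-place assignment no longer has to replace unknown values with NA. Whether either option fits in memory for the real files is not something this toy example can show.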
David

David Winsemius
Alameda, CA, USA
On Jul 25, 2013, at 7:17 AM, Dark wrote:
Hi, Yes maybe I should have been more clear on my problem. I want to append the different data-frames back into one variable ( rbind ) and save it as one R Data file.
Indeed. That was the operation I had in mind when I made my suggestions. Perhaps you need to create a set of toy dataframes with similar structure and then the audience can propose solutions. That's the usual process around these parts.
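
As an illustration of what such a toy example might look like, here is a sketch with two small data frames whose factor column has different levels. The column names and values are invented; the append step is the one from the original post. The handler only collects the warning text so it can be inspected afterwards.

```r
# Toy data frames with the same structure but different factor levels
object1 <- data.frame(id = 1:3, grp = factor(c("a", "b", "a")))
object2 <- data.frame(id = 4:5, grp = factor(c("c", "b")))

# Append object2 to object1 in place, as in the original post,
# collecting any warnings instead of printing them
warnings_seen <- character(0)
nextrow <- nrow(object1) + 1
withCallingHandlers(
  object1[nextrow:(nextrow + nrow(object2) - 1), ] <- object2,
  warning = function(w) {
    warnings_seen <<- c(warnings_seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(warnings_seen)
print(object1)
```

The level "c" does not exist in object1$grp, so the corresponding row ends up with NA in that column, which matches the warning reported in the question.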
David.

David Winsemius
Alameda, CA, USA