How is it possible to split a .csv file by size (in kilobytes)?
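The loop in Jim Holtman's reply below splits by line count. A size-based variant could look like the following. This is a minimal sketch, not an established function, and it makes several assumptions: the first line is a header that should be repeated in every piece, and pieces may end only at line boundaries. It also assumes no quoted field contains an embedded newline, and that lines end in a single "\n". On Windows, text-mode connections write "\r\n", so real files come out slightly larger than the count. `splitCSVBySize`, `maxKB` and `prefix` are hypothetical names.

```r
# Hypothetical helper: split inFile into pieces of at most about maxKB kilobytes.
# Each piece repeats the header; a piece only ends at a line boundary.
splitCSVBySize <- function(inFile, maxKB = 1024, prefix = "output") {
  input <- file(inFile, "r")
  on.exit(close(input))
  header <- readLines(input, n = 1)
  maxBytes <- maxKB * 1024
  headerBytes <- nchar(header, type = "bytes") + 1  # +1 assumes "\n" line endings
  outFiles <- character(0)
  out <- NULL                                       # current output connection
  outBytes <- 0                                     # bytes written to it so far

  repeat {
    batch <- readLines(input, n = 10000)            # read in batches, not all at once
    if (length(batch) == 0) break
    lineBytes <- nchar(batch, type = "bytes") + 1
    while (length(batch) > 0) {
      if (is.null(out)) {                           # open the next piece
        outName <- sprintf("%s%03d.csv", prefix, length(outFiles) + 1)
        out <- file(outName, "w")
        writeLines(header, out)
        outFiles <- c(outFiles, outName)
        outBytes <- headerBytes
      }
      # number of leading lines that still fit in the current piece
      fits <- sum(cumsum(lineBytes) <= maxBytes - outBytes)
      if (fits == 0) {
        if (outBytes > headerBytes) {               # piece is full: start another
          close(out)
          out <- NULL
          next
        }
        fits <- 1                                   # an oversized line gets its own piece
      }
      take <- seq_len(fits)
      writeLines(batch[take], out)
      outBytes <- outBytes + sum(lineBytes[take])
      batch <- batch[-take]
      lineBytes <- lineBytes[-take]
    }
  }
  if (!is.null(out)) close(out)
  outFiles
}
```

For example, `splitCSVBySize("yourLargeCSV", maxKB = 500)` would write output001.csv, output002.csv, ..., each holding whole lines and at most about 500 KB. Only one batch of 10,000 lines is held in memory at a time.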
-----Original Message-----
From: jim holtman [mailto:jholtman at gmail.com]
Sent: Tuesday, July 24, 2012 11:30 PM
To: Akkara, Antony (GE Energy, Non-GE)
Cc: r-help at r-project.org
Subject: Re: [R] ERROR : cannot allocate vector of size (in MB & GB)
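Try this:
input <- file("yourLargeCSV", "r")   # open a read connection to the big file
header <- readLines(input, n = 1)    # keep the header line for every piece
fileNo <- 1
repeat {
  myLines <- readLines(input, n = 100000)   # 100K lines / file
  if (length(myLines) == 0) break           # stop at end of file
  writeLines(c(header, myLines), sprintf("output%03d.csv", fileNo))
  fileNo <- fileNo + 1
}
close(input)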
On Tue, Jul 24, 2012 at 9:45 AM, Rantony <antony.akkara at ge.com> wrote:
Hi,
In R, I need to load a huge .csv file of about 200 MB (sometimes more
than 1 GB).
Loading it into a variable takes too much time, and afterwards, when I
cbind by groups, I get this error:
"Error: cannot allocate vector of size 82.4 Mb"
My requirement is to split the data from the huge .csv file into a
number of smaller .csv files, with the number of lines per file given
as input.
Below is my code:
-------------------------------
SplitLargeCSVToMany <- function(DataMatrix, Destination, NoOfLineToGroup)
{
  test <- data.frame(read.csv(DataMatrix))
  # create groups No.of rows
  group <- rep(1:NROW(test), each = NoOfLineToGroup)
  new.test <- cbind(test, group = group)
  new.test2 <- new.test
  new.test2[, ncol(new.test2)] <- NULL
  # now get indices to write out
  indices <- split(seq(nrow(test)), new.test[, 'group'])
  # now write out the files
  for (i in names(indices))
  {
    write.csv(new.test2[indices[[i]], ],
              file = paste(Destination, "data.", i, ".csv", sep = ""),
              row.names = FALSE)
  }
}
-----------------------------------------------------
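One likely contributor to the allocation error, offered as a diagnosis rather than a confirmed cause: `rep(1:NROW(test), each = NoOfLineToGroup)` produces NoOfLineToGroup labels for every row, i.e. NROW(test) * NoOfLineToGroup labels in total. `cbind` then silently recycles the data frame to that length instead of failing, multiplying its size. The toy sizes below (n = 10 rows, k = 4 rows per file) are made up for illustration. Adding `length.out` trims the labels to exactly one per row.

```r
# Toy sizes (hypothetical): n data rows, k rows per output file
n <- 10
k <- 4
test <- data.frame(x = seq_len(n))

# Construction used in the function above: k labels per row
badGroup <- rep(1:NROW(test), each = k)
length(badGroup)                         # 40, i.e. n * k
nrow(cbind(test, group = badGroup))      # 40: cbind recycles 'test' to match

# One label per row: 1 1 1 1 2 2 2 2 3 3
group <- rep(seq_len(ceiling(n / k)), each = k, length.out = n)
indices <- split(seq_len(n), group)      # row numbers for each output file
```

Even with the labels fixed, read.csv still loads the whole file at once, so on a 2 GB machine a line-by-line approach such as the readLines loop in the reply above is the safer choice for files near 1 GB.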
My system configuration is:
Intel Core 2 Duo
Speed: 3 GHz
2 GB RAM
OS: Windows XP (Service Pack 3)
---------------------------------------------------
Is there any way to solve this issue?
Thanks in advance,
Antony.
--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.