Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu

Mon, Jul 23, 2012 6:14 AM

Where should this be discussed, since it is definitely xts-related?  I will
gladly upload the simplified script and data files to whoever is maintaining
this part of the code.  Fortunately there is a workaround here.

-----Original Message-----
From: Joshua Ulrich [mailto:josh.m.ulrich at gmail.com] 
Sent: Monday, July 23, 2012 8:15 AM
To: David Terk
Cc: Duncan Murdoch; r-devel at r-project.org
Subject: Re: [Rd] Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu

David,

You still haven't provided a reproducible example.  As Duncan already said,
"if you don't post code that allows us to reproduce the crash, it's really
unlikely that we'll be able to fix it."

And R-devel is not the appropriate venue to discuss this if it's truly an
issue with xts/zoo.

Best,
--
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading  |  www.fosstrading.com
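
For readers following the thread, a reproducible example of the kind requested above could generate synthetic input rather than depend on private data files. The sketch below is an assumption and not code from the thread: the helper name writeSyntheticTicks, the file name, and the random price model are hypothetical. Only the file layout (one header row, then Date, Time, Close, Volume with %m/%d/%Y dates) is taken from the scan() call in parseTickData quoted below.

```r
# Hypothetical helper (not from the thread): write one synthetic tick file
# in the layout that parseTickData() reads with scan().
writeSyntheticTicks <- function(path, day = "2012-07-20", n = 1000, seed = 1) {
  set.seed(seed)
  sessionOpen <- as.POSIXct(paste(day, "09:30:00"), tz = "UTC")
  # Random second offsets within the 23400-second session, sorted;
  # duplicate timestamps are allowed, as in real tick data.
  stamps <- sessionOpen + sort(sample(0:23400, n, replace = TRUE))
  ticks <- data.frame(
    Date   = format(stamps, "%m/%d/%Y", tz = "UTC"),
    Time   = format(stamps, "%H:%M:%S", tz = "UTC"),
    Close  = round(100 + cumsum(rnorm(n, sd = 0.01)), 2),
    Volume = sample(1:10 * 100, n, replace = TRUE)
  )
  write.csv(ticks, path, row.names = FALSE, quote = FALSE)
  invisible(path)
}

# Assumed file name; any name in a scratch directory would do.
tickFile <- writeSyntheticTicks(file.path(tempdir(), "TEST_1.csv"))
```

Whether synthetic files of this shape trigger the reported hang is not known; the point is only that a script plus a generator like this can be posted in full.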

On Mon, Jul 23, 2012 at 12:41 AM, David Terk <david.terk at gmail.com> wrote:

the code which

is causing the problem it will actually run.   So I think it is safe to
assume something wrong is taking place with memory allocation.  Example.
While testing, I have been able to get to a point where the code will run.
But if I reboot the machine and try again, the code will not run.

The bug itself is happening somewhere in XTS or ZOO.  I will gladly 
upload the data files.  It is happening on the 10th data file which is 
only 225k lines in size.

Below is the simplified code.  The call to either

dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
index(dat.i) <- index(to.period(templateTimes, period=per, k=subper))

is what is causing R to hang or crash.  I have been able to replicate 
this on Windows 7 64 bit and Ubuntu 64 bit.  Seems easiest to 
consistently replicate from R Studio.

The code below will consistently replicate when the appropriate files 
are used.

parseTickDataFromDir = function(tickerDir, per, subper) {
  tickerAbsFilenames = list.files(tickerDir,full.names=T)
  tickerNames = list.files(tickerDir,full.names=F)
  tickerNames = gsub("_[a-zA-Z0-9].csv","",tickerNames)
  pb <- txtProgressBar(min = 0, max = length(tickerAbsFilenames), style = 3)

  for(i in 1:length(tickerAbsFilenames)) {
    dat.i = parseTickData(tickerAbsFilenames[i])
    dates <- unique(substr(as.character(index(dat.i)), 1,10))
    times <- rep("09:30:00", length(dates))
    openDateTimes <- strptime(paste(dates, times), "%F %H:%M:%S")
    templateTimes <- NULL

    for (j in 1:length(openDateTimes)) {
      if (is.null(templateTimes)) {
        templateTimes <- openDateTimes[j] + 0:23400
      } else {
        templateTimes <- c(templateTimes, openDateTimes[j] + 0:23400)
      }
    }

    templateTimes <- as.xts(templateTimes)
    dat.i <- merge(dat.i, templateTimes, all=T)
    if (is.na(dat.i[1])) {
      dat.i[1] <- -1
    }
    dat.i <- na.locf(dat.i)
    dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
    index(dat.i) <- index(to.period(templateTimes, period=per, k=subper))
    setTxtProgressBar(pb, i)
  }
  close(pb)
}

parseTickData <- function(inputFile) {
  DAT.list <- scan(file=inputFile, sep=",", skip=1,
                   what=list(Date="",Time="",Close=0,Volume=0), quiet=T)
  index <- as.POSIXct(paste(DAT.list$Date,DAT.list$Time),
                      format="%m/%d/%Y %H:%M:%S")
  DAT.xts <- xts(DAT.list$Close,index)
  DAT.xts <- make.index.unique(DAT.xts)
  return(DAT.xts)
}

DATTick <- parseTickDataFromDir(tickerDirSecond, "seconds",10)
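
The inner loop above grows templateTimes one session at a time. As a side note, the same per-second template can be built in one step with base R. This is a minimal sketch under assumptions: the helper name buildTemplateTimes and the sessionSeconds parameter are hypothetical, the example dates are made up, and nothing here is claimed to address the reported crash.

```r
# Hypothetical helper, not from the thread: build the per-second template
# for every session open at once instead of growing a vector in a loop.
buildTemplateTimes <- function(openDateTimes, sessionSeconds = 23400) {
  opens <- as.POSIXct(openDateTimes)
  offsets <- 0:sessionSeconds
  # Repeat each open once per offset, then add the recycled offsets.
  rep(opens, each = length(offsets)) + rep(offsets, times = length(opens))
}

# Two made-up session opens, in the same form the original code produces.
opens <- strptime(paste(c("2012-07-19", "2012-07-20"), "09:30:00"),
                  "%Y-%m-%d %H:%M:%S", tz = "UTC")
templateTimes <- buildTemplateTimes(opens)
```

Like the original loop, this yields 23401 timestamps per session (offsets 0 through 23400 inclusive).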

-----Original Message-----
From: Duncan Murdoch [mailto:murdoch.duncan at gmail.com]
Sent: Sunday, July 22, 2012 4:48 PM
To: David Terk
Cc: r-devel at r-project.org
Subject: Re: [Rd] Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu

On 12-07-22 3:54 PM, David Terk wrote:

I am reading several hundred files.  Anywhere from 50k-400k in size.
It appears that when I read these files with R 2.15.1 the process 
will hang or seg fault on the scan() call.  This does not happen on R 2.14.1.

functions.

the bug.

If you don't post code that allows us to reproduce the crash, it's 
really unlikely that we'll be able to fix it.

Duncan Murdoch



This is happening on the Precise (12.04) build of Ubuntu.

I have included everything, but the issue appears to be when 
performing the scan in the method parseTickData.

Below is the code.  Hopefully this is the right place to post.



parseTickDataFromDir = function(tickerDir, per, subper, fun) {
   tickerAbsFilenames = list.files(tickerDir,full.names=T)
   tickerNames = list.files(tickerDir,full.names=F)
   tickerNames = gsub("_[a-zA-Z0-9].csv","",tickerNames)
   pb <- txtProgressBar(min = 0, max = length(tickerAbsFilenames), style = 3)

   for(i in 1:length(tickerAbsFilenames)) {

     # Grab Raw Tick Data
     dat.i = parseTickData(tickerAbsFilenames[i])
     #Sys.sleep(1)

     # Create Template
     dates <- unique(substr(as.character(index(dat.i)), 1,10))
     times <- rep("09:30:00", length(dates))
     openDateTimes <- strptime(paste(dates, times), "%F %H:%M:%S")
     templateTimes <- NULL

     for (j in 1:length(openDateTimes)) {
       if (is.null(templateTimes)) {
         templateTimes <- openDateTimes[j] + 0:23400
       } else {
         templateTimes <- c(templateTimes, openDateTimes[j] + 0:23400)
       }
     }

     # Convert templateTimes to XTS, merge with data and convert NA's
     templateTimes <- as.xts(templateTimes)
     dat.i <- merge(dat.i, templateTimes, all=T)

     # If there is no data in the first print, we will have leading NA's.
     # So set them to -1, since we do not want these values removed by
     # to.period
     if (is.na(dat.i[1])) {
       dat.i[1] <- -1
     }

     # Fix remaining NA's
     dat.i <- na.locf(dat.i)

     # Convert to desired bucket size
     dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)

     # Always use templated index, otherwise merge fails with other symbols
     index(dat.i) <- index(to.period(templateTimes, period=per, k=subper))

     # If there was missing data at open, set close to NA
     valsToChange <- which(dat.i[,"Open"] == -1)
     if (length(valsToChange) != 0) {
       dat.i[valsToChange, "Close"] <- NA
     }

     if(i == 1) {
       DAT = fun(dat.i)
     } else {
       DAT = merge(DAT,fun(dat.i))
     }

     setTxtProgressBar(pb, i)
   }

   close(pb)
   colnames(DAT) = tickerNames
   return(DAT)
}



parseTickData <- function(inputFile) {
   DAT.list <- scan(file=inputFile, sep=",", skip=1,
                    what=list(Date="",Time="",Close=0,Volume=0), quiet=T)
   index <- as.POSIXct(paste(DAT.list$Date,DAT.list$Time),
                       format="%m/%d/%Y %H:%M:%S")
   DAT.xts <- xts(DAT.list$Close,index)
   DAT.xts <- make.index.unique(DAT.xts)
   return(DAT.xts)
}







______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
