
questions on the ff package

Hi Jens,

Thanks for your prompt and informative answers. ff is a fabulous package and
your suggestions helped me solve the problem at hand.

I need to grow each of several large matrices (about 1000 rows * 10000
columns, 1000 matrices) by one row every day. I wonder how efficient it is
to run the following command on a frequent basis.

nrow(matFF) <- nrow(matFF)+1 

As far as I know, mmap works on a file of fixed, preallocated size. I don't
know the ff implementation or how it handles file size changes. Does the
command above preallocate, say, 10% more space than the current size, so
that no large file copy is needed each time nrow is changed?



Another problem I am facing is that I have over 2000 large matrices that
need the help of ff. Suppose I have a list of 2000 ff objects. My computing
environment is 64-bit Linux with 64 GB of memory. I remember there is a limit
on the maximum number of files that can be open at once in Linux. If I need
to access each matrix, do you think I can open each matrix and leave them all
open, or do I need to close each one after it is opened and used?
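
One way to stay under a per-process open-file limit, whatever its value on a
given machine (check with `ulimit -n` in the shell), is to open each ff
object only for the duration of one access. The following is a minimal
sketch, not a statement of how ff must be used: the three small matrices
stand in for the 2000 large ones, and the helper name with_open_ff is
hypothetical. It relies on open(), close() and is.open() from the ff package.

```r
library(ff)

# Three small stand-in matrices (hypothetical sizes, for illustration only)
mats <- lapply(1:3, function(i) ff(vmode = "double", dim = c(2, 5)))

# Start from a state where no backing file is held open
invisible(lapply(mats, close))

# Hypothetical helper: open, apply a function, and always close again
with_open_ff <- function(x, fun) {
  open(x)
  on.exit(close(x))
  fun(x)
}

# Touch every matrix in turn; at most one file is open at any time
first_col_sums <- vapply(mats, function(m) {
  with_open_ff(m, function(x) sum(x[, 1]))
}, numeric(1))

still_open <- vapply(mats, is.open, logical(1))
```

Opening and closing has a cost per access, so this trades some speed for a
bounded number of open file handles.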

Thanks a lot

Jeff

-----Original Message-----
From: "Jens Oehlschlägel" [mailto:oehl_list at gmx.de] 
Sent: Wednesday, November 25, 2009 8:04 AM
To: hcen at andrew.cmu.edu
Cc: r-help at lists.R-project.org
Subject: [R] questions on the ff package

Jeff,
# This stores the data in an ff file, 
# but not the metadata in R's ff object. 
# To do the latter you need to do 
save(matFF, file="~/matFF.RData")

# Assuming that your ff file remains in the same location, 
# in a new R session you simply 
load(file="~/matFF.RData")
# and the ff file is available automagically
# You can create an ff object using your existing ff file by
matFF <- ff(filename="~/a.mat", vmode="double", dim=c(4,5))

# You can do the same at unknown file size with 
matFF <- ff(filename="~/a.mat", vmode="double")
# which gives you the length of the ff object
length(matFF)
# if you know the number of columns you can calculate the number of rows
# and give your ff object the interpretation of a matrix
dim(matFF) <- c(length(matFF)/5, 5)
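
The row calculation above can be checked before the dim is assigned. The
sketch below uses base R only and assumes the file is a flat array of 8-byte
doubles with no header; that layout is an assumption here, not something
stated in this thread. The temporary file and variable names are
illustrative.

```r
# Write 20 doubles to a temporary file as a stand-in for an existing data file
n_cols <- 5
path <- tempfile(fileext = ".mat")
writeBin(as.double(1:20), path)

# Number of stored values, assuming 8 bytes per double and no header
n_values <- file.info(path)$size / 8

# Refuse to continue if the file does not hold a whole number of rows
stopifnot(n_values %% n_cols == 0)
n_rows <- n_values / n_cols
```

If the divisibility check fails, either the assumed column count is wrong or
the file was truncated, and setting the dim would silently misalign the data.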
# there are two ways to grow a matrix by rows

# 1) you create the matrix in row-major order
matFF <- ff(1:20, dim=c(4,5), dimorder=c(2:1))
# then you require a higher number of rows
nrow(matFF) <- 6
# as you can see there are new empty rows in the file
matFF
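
If one row is added per day, the number of resize operations can be reduced
by growing the row-major matrix in chunks and tracking how many rows are
actually filled. This is only a sketch under assumptions: the chunk size of
2 and the counter name used are illustrative, and nothing here says whether
ff itself reserves spare space when nrow<- is called.

```r
library(ff)

n_cols <- 5L
chunk  <- 2L   # illustrative; a real chunk would cover many days

# Row-major layout, as in 1) above, so new rows can be added at the end
matFF <- ff(vmode = "double", dim = c(chunk, n_cols), dimorder = c(2L, 1L))
used  <- 0L    # number of rows that hold real data

for (day in 1:3) {
  # Resize only when the spare rows are exhausted
  if (used == nrow(matFF)) {
    nrow(matFF) <- nrow(matFF) + chunk
  }
  used <- used + 1L
  matFF[used, ] <- rep(day, n_cols)
}
```

Rows beyond used are spare capacity, so any computation on the data should
index matFF[seq_len(used), ] rather than the whole matrix.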

# 2) Instead of a matrix you create a ffdf data.frame
#    which you can also give more rows using nrow<-
#    An example of this is in read.table.ffdf
#    which reads a csv file in chunks and extends the 
#    number of rows in the ffdf

Jens Oehlschlägel