
Segmentation fault with large data frames and packages using rJava

Sebastian Salentin

Dear all,

I have been trying to perform machine learning and feature selection tasks
in R using several packages (e.g. mlr and FSelector).
However, when I pass larger data frames to these functions, R crashes
with a segmentation fault ("memory not mapped").

This first happened with mlr's benchmark function on data frames of
roughly 200 rows x 10,000 columns (all integer values).

I prepared a minimal working example that segfaults when calculating
information gain with the FSelector package:

library(FSelector)
# Random 0/1 data frame: 200 rows x 25,000 columns
large.df <- data.frame(replicate(25000, sample(0:1, 200, replace = TRUE)))
# Information gain of every other column with respect to column X24978
weights <- information.gain(X24978 ~ ., large.df)
print(weights)
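As a cross-check that avoids rJava entirely, information gain for these discrete 0/1 columns can also be computed in plain R as IG(y; x) = H(y) - H(y | x). The sketch below is only an illustration under stated assumptions: `entropy` and `info_gain` are hypothetical helpers, not part of FSelector, and they use natural logarithms, so their values may differ in scale from what FSelector reports. If this runs on the same 200 x 25,000 data frame without crashing, that would suggest the problem lies in the Java-backed code path rather than in the data size itself.

```r
# Hypothetical helper (not part of FSelector): Shannon entropy in nats
# of a discrete vector, estimated from observed frequencies.
entropy <- function(x) {
  p <- table(x) / length(x)
  -sum(p * log(p))
}

# Hypothetical helper: information gain of each column of `features`
# about the class vector `y`, i.e. IG(y; x) = H(y) - H(y | x), where
# H(y | x) is the class entropy within each level of x, weighted by
# the fraction of rows in that level.
info_gain <- function(features, y) {
  h_y <- entropy(y)
  vapply(features, function(x) {
    groups <- split(y, x)
    h_cond <- sum(vapply(groups,
                         function(g) length(g) / length(y) * entropy(g),
                         numeric(1)))
    h_y - h_cond
  }, numeric(1))
}

# Same random data as in the example above: 200 rows x 25,000 columns
large.df <- data.frame(replicate(25000, sample(0:1, 200, replace = TRUE)))
target <- "X24978"
weights <- info_gain(large.df[setdiff(names(large.df), target)],
                     large.df[[target]])
head(sort(weights, decreasing = TRUE))
```

Plain vectors and `vapply` are used on purpose so that nothing in the computation goes through rJava.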


I am using R 3.3.0 (64-bit) on Ubuntu 14.04.4 LTS with FSelector
v0.20 and rJava v0.9.8, on a machine with a 32-core Intel i7 and 250 GB
of RAM. Java is OpenJDK 1.7 (64-bit).

I would highly appreciate any hint on how to solve this problem.

Best
ssalentin