Hi,
I'm attempting to "glm" a formula, something that hasn't caused problems in
the past. I've used formulas of the form
formula( "dependent-variable~independent-variables" )
where the independent-variable string is of the form:
"indvar1+indvar2+...+indvarN"
Now, however, our independent-variable strings are quite long (hundreds of
variables), and R dies with an "input buffer overflow" error. I've tried
writing the code out to files and sourcing them, as well as building the
strings incrementally in R, but neither has worked. I have come
to believe there is a maximum length for character strings, some sort of
fundamental limitation. Is there such a maximum length and, if so, is there a
way I can work with long strings of the sort described above?
Thank you,
J. Wilson
string-length limitations
4 messages · Thomas Lumley, Brian Ripley, jake wilson
Thomas Lumley replied to jake wilson's message of Wed, 12 Jul 2006:
How long are the strings, and where does the error occur? (traceback()
will tell you where.)
With
fn <- function(n) formula(paste("y",paste("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",1:n,collapse="+",sep=""),sep="~"))
I can run terms(fn(500)) with no problems. This is a 15500-character
string, and it produces a terms object over a megabyte in size. This suggests
that it isn't a string problem, unless you really want formulas larger
than this.
-thomas
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle
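A minimal sketch (an editorial addition, not part of the thread) that repeats Lumley's experiment while keeping the formula text, so that its length and the number of terms can be checked directly. The names rhs, txt, n_chars and n_terms are illustrative; exact character counts and object sizes will depend on the variable-name length and the R version.

```r
# Rebuild the 500-term formula from Lumley's test, keeping the text
rhs <- paste("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", 1:500, collapse = "+", sep = "")
txt <- paste("y", rhs, sep = "~")

# formula() on a character string parses it into a formula object
f <- formula(txt)
tt <- terms(f)

n_chars <- nchar(txt)                        # length of the formula text
n_terms <- length(attr(tt, "term.labels"))   # number of right-hand-side terms
cat("characters:", n_chars, " terms:", n_terms, "\n")
print(object.size(tt), units = "Kb")         # size of the terms object
```

The string is assembled at run time with paste(), so no long quoted literal is ever typed into the source.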
On Wed, 12 Jul 2006, jake wilson wrote:
> Hi,
> I'm attempting to "glm" a formula, something that hasn't caused problems in
> the past. I've used formulas of the form
> formula( "dependent-variable~independent-variables" )
> where the independent-variable string is of the form:
> "indvar1+indvar2+...+indvarN"
Why the quotes? I think that is your problem.
> Now, however, our independent-variable strings are quite long (hundreds of variables), and R dies with an "input buffer overflow" error.
It is normal to use glm(y ~ ., data=mydata) to avoid such formulae.
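A minimal sketch of the y ~ . approach on simulated data. The data frame mydata, its column names indvar1 ... indvar200, the sample size and the default gaussian family are all assumptions for illustration only. The dot expands to every column of the data other than the response, so no long formula string is written or parsed.

```r
set.seed(1)
n <- 400   # observations (hypothetical)
p <- 200   # predictors, i.e. "hundreds of variables"

# Hypothetical data frame: 200 numeric predictors plus a response y
mydata <- as.data.frame(matrix(rnorm(n * p), nrow = n))
names(mydata) <- paste("indvar", 1:p, sep = "")
mydata$y <- rnorm(n)

# "y ~ ." regresses y on every other column of mydata
fit <- glm(y ~ ., data = mydata)
length(coef(fit))  # 201: intercept plus one coefficient per predictor
```

For a binary or count response, the appropriate family argument would be passed to glm() as usual.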
> I've tried writing the code out to files and sourcing them, as well as building the strings incrementally in R, but neither has worked. I have come to believe there is a maximum length for character strings, some sort of fundamental limitation. Is there such a maximum length and, if so, is there a way I can work with long strings of the sort described above?
The limit is 2^31 - 1 bytes, which is not relevant here.
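To illustrate how far away the 2^31 - 1 limit is, here is a sketch (not from the thread) that builds a one-million-character string at run time. Because it is constructed with paste() rather than typed as a quoted literal, the parser's input buffer never sees it.

```r
# Build a 1,000,000-character string at run time rather than typing it
s <- paste(rep("x", 1e6), collapse = "")
nchar(s)  # 1000000
```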
Your error message is coming from the parser, and suggests that it is trying to
parse a piece of text longer than MAXELTSIZE bytes. The latter depends on
the platform (unstated: do see the posting guide) and is often 8192 bytes.
So there is a limit on the length of quoted strings which can be input.
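However, what is wrong with, say,
# build "indvar1+indvar2+...+indvar1000" without typing it as a literal
tmp <- paste(paste("indvar", 1:1000, sep=""), collapse="+")
tmp <- paste("y ~", tmp)
# parse the constructed text into a formula object
form <- eval(parse(text=tmp))
?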
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
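An alternative to assembling and evaluating the formula text by hand (a sketch, not what Ripley proposed) is stats::reformulate(), which takes a character vector of term labels and a response name. The variable names indvar1 ... indvar1000 are the hypothetical ones from Ripley's example; depending on the R version, reformulate() may parse text internally, but no long quoted string literal is ever typed.

```r
# Term labels for the right-hand side (hypothetical variable names)
vars <- paste("indvar", 1:1000, sep = "")

# reformulate(termlabels, response) returns y ~ indvar1 + ... + indvar1000
form2 <- reformulate(vars, response = "y")

all.vars(form2)[1:3]       # "y" "indvar1" "indvar2"
length(all.vars(form2))    # 1001: the response plus 1000 predictors
```

The resulting formula can be passed straight to glm() together with a data argument.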
Thank you - the proposed solution (eliminating quotes) worked.
From: Prof Brian Ripley <ripley at stats.ox.ac.uk>
To: jake wilson <jake_f_wilson at hotmail.com>
CC: r-devel at r-project.org
Subject: Re: [Rd] string-length limitations
Date: Wed, 12 Jul 2006 15:38:22 +0100 (BST)