Cannot Compute Box's M (Three Days Trying...)

14 messages · William Dunlap, Morkus, Duncan Murdoch +1 more

#
It can't be this hard, right? I really need a shove in the right direction here. Been spinning wheels for three days. Cannot get past the errors.

I'm doing something wrong, obviously, since I can easily compute Box's M right there in RStudio.

But I don't see what is wrong with the coding equivalent below.

The entire code snippet is below. It fails on the call to boxM.

PLEASE HELP!!!

Thanks in advance,

-------------------------

rConnection.eval("library('biotools')");

String inputIris = "5.1,3.5,1.4,0.2,setosa\n" +
"4.9,3,1.4,0.2,setosa\n" +
"4.7,3.2,1.3,0.2,setosa\n" +
"4.6,3.1,1.5,0.2,setosa\n" +
"5,3.6,1.4,0.2,setosa\n" +
"5.4,3.9,1.7,0.4,setosa\n" +
"4.6,3.4,1.4,0.3,setosa\n" +
"5,3.4,1.5,0.2,setosa\n" +
"4.4,2.9,1.4,0.2,setosa\n" +
"4.9,3.1,1.5,0.1,setosa\n" +
"5.4,3.7,1.5,0.2,setosa\n" +
"4.8,3.4,1.6,0.2,setosa\n" +
"4.8,3,1.4,0.1,setosa\n" +
"4.3,3,1.1,0.1,setosa\n" +
"5.8,4,1.2,0.2,setosa\n" +
"5.7,4.4,1.5,0.4,setosa\n" +
"5.4,3.9,1.3,0.4,setosa\n" +
"5.1,3.5,1.4,0.3,setosa\n" +
"5.7,3.8,1.7,0.3,setosa\n" +
"5.1,3.8,1.5,0.3,setosa\n" +
"5.4,3.4,1.7,0.2,setosa\n" +
"5.1,3.7,1.5,0.4,setosa\n" +
"4.6,3.6,1,0.2,setosa\n" +
"5.1,3.3,1.7,0.5,setosa\n" +
"4.8,3.4,1.9,0.2,setosa\n" +
"5,3,1.6,0.2,setosa\n" +
"5,3.4,1.6,0.4,setosa\n" +
"5.2,3.5,1.5,0.2,setosa\n" +
"5.2,3.4,1.4,0.2,setosa\n" +
"4.7,3.2,1.6,0.2,setosa\n" +
"4.8,3.1,1.6,0.2,setosa\n" +
"5.4,3.4,1.5,0.4,setosa\n" +
"5.2,4.1,1.5,0.1,setosa\n" +
"5.5,4.2,1.4,0.2,setosa\n" +
"4.9,3.1,1.5,0.2,setosa\n" +
"5,3.2,1.2,0.2,setosa\n" +
"5.5,3.5,1.3,0.2,setosa\n" +
"4.9,3.6,1.4,0.1,setosa\n" +
"4.4,3,1.3,0.2,setosa\n" +
"5.1,3.4,1.5,0.2,setosa\n" +
"5,3.5,1.3,0.3,setosa\n" +
"4.5,2.3,1.3,0.3,setosa\n" +
"4.4,3.2,1.3,0.2,setosa\n" +
"5,3.5,1.6,0.6,setosa\n" +
"5.1,3.8,1.9,0.4,setosa\n" +
"4.8,3,1.4,0.3,setosa\n" +
"5.1,3.8,1.6,0.2,setosa\n" +
"4.6,3.2,1.4,0.2,setosa\n" +
"5.3,3.7,1.5,0.2,setosa\n" +
"5,3.3,1.4,0.2,setosa\n" +
"7,3.2,4.7,1.4,versicolor\n" +
"6.4,3.2,4.5,1.5,versicolor\n" +
"6.9,3.1,4.9,1.5,versicolor\n" +
"5.5,2.3,4,1.3,versicolor\n" +
"6.5,2.8,4.6,1.5,versicolor\n" +
"5.7,2.8,4.5,1.3,versicolor\n" +
"6.3,3.3,4.7,1.6,versicolor\n" +
"4.9,2.4,3.3,1,versicolor\n" +
"6.6,2.9,4.6,1.3,versicolor\n" +
"5.2,2.7,3.9,1.4,versicolor\n" +
"5,2,3.5,1,versicolor\n" +
"5.9,3,4.2,1.5,versicolor\n" +
"6,2.2,4,1,versicolor\n" +
"6.1,2.9,4.7,1.4,versicolor\n" +
"5.6,2.9,3.6,1.3,versicolor\n" +
"6.7,3.1,4.4,1.4,versicolor\n" +
"5.6,3,4.5,1.5,versicolor\n" +
"5.8,2.7,4.1,1,versicolor\n" +
"6.2,2.2,4.5,1.5,versicolor\n" +
"5.6,2.5,3.9,1.1,versicolor\n" +
"5.9,3.2,4.8,1.8,versicolor\n" +
"6.1,2.8,4,1.3,versicolor\n" +
"6.3,2.5,4.9,1.5,versicolor\n" +
"6.1,2.8,4.7,1.2,versicolor\n" +
"6.4,2.9,4.3,1.3,versicolor\n" +
"6.6,3,4.4,1.4,versicolor\n" +
"6.8,2.8,4.8,1.4,versicolor\n" +
"6.7,3,5,1.7,versicolor\n" +
"6,2.9,4.5,1.5,versicolor\n" +
"5.7,2.6,3.5,1,versicolor\n" +
"5.5,2.4,3.8,1.1,versicolor\n" +
"5.5,2.4,3.7,1,versicolor\n" +
"5.8,2.7,3.9,1.2,versicolor\n" +
"6,2.7,5.1,1.6,versicolor\n" +
"5.4,3,4.5,1.5,versicolor\n" +
"6,3.4,4.5,1.6,versicolor\n" +
"6.7,3.1,4.7,1.5,versicolor\n" +
"6.3,2.3,4.4,1.3,versicolor\n" +
"5.6,3,4.1,1.3,versicolor\n" +
"5.5,2.5,4,1.3,versicolor\n" +
"5.5,2.6,4.4,1.2,versicolor\n" +
"6.1,3,4.6,1.4,versicolor\n" +
"5.8,2.6,4,1.2,versicolor\n" +
"5,2.3,3.3,1,versicolor\n" +
"5.6,2.7,4.2,1.3,versicolor\n" +
"5.7,3,4.2,1.2,versicolor\n" +
"5.7,2.9,4.2,1.3,versicolor\n" +
"6.2,2.9,4.3,1.3,versicolor\n" +
"5.1,2.5,3,1.1,versicolor\n" +
"5.7,2.8,4.1,1.3,versicolor\n" +
"6.3,3.3,6,2.5,virginica\n" +
"5.8,2.7,5.1,1.9,virginica\n" +
"7.1,3,5.9,2.1,virginica\n" +
"6.3,2.9,5.6,1.8,virginica\n" +
"6.5,3,5.8,2.2,virginica\n" +
"7.6,3,6.6,2.1,virginica\n" +
"4.9,2.5,4.5,1.7,virginica\n" +
"7.3,2.9,6.3,1.8,virginica\n" +
"6.7,2.5,5.8,1.8,virginica\n" +
"7.2,3.6,6.1,2.5,virginica\n" +
"6.5,3.2,5.1,2,virginica\n" +
"6.4,2.7,5.3,1.9,virginica\n" +
"6.8,3,5.5,2.1,virginica\n" +
"5.7,2.5,5,2,virginica\n" +
"5.8,2.8,5.1,2.4,virginica\n" +
"6.4,3.2,5.3,2.3,virginica\n" +
"6.5,3,5.5,1.8,virginica\n" +
"7.7,3.8,6.7,2.2,virginica\n" +
"7.7,2.6,6.9,2.3,virginica\n" +
"6,2.2,5,1.5,virginica\n" +
"6.9,3.2,5.7,2.3,virginica\n" +
"5.6,2.8,4.9,2,virginica\n" +
"7.7,2.8,6.7,2,virginica\n" +
"6.3,2.7,4.9,1.8,virginica\n" +
"6.7,3.3,5.7,2.1,virginica\n" +
"7.2,3.2,6,1.8,virginica\n" +
"6.2,2.8,4.8,1.8,virginica\n" +
"6.1,3,4.9,1.8,virginica\n" +
"6.4,2.8,5.6,2.1,virginica\n" +
"7.2,3,5.8,1.6,virginica\n" +
"7.4,2.8,6.1,1.9,virginica\n" +
"7.9,3.8,6.4,2,virginica\n" +
"6.4,2.8,5.6,2.2,virginica\n" +
"6.3,2.8,5.1,1.5,virginica\n" +
"6.1,2.6,5.6,1.4,virginica\n" +
"7.7,3,6.1,2.3,virginica\n" +
"6.3,3.4,5.6,2.4,virginica\n" +
"6.4,3.1,5.5,1.8,virginica\n" +
"6,3,4.8,1.8,virginica\n" +
"6.9,3.1,5.4,2.1,virginica\n" +
"6.7,3.1,5.6,2.4,virginica\n" +
"6.9,3.1,5.1,2.3,virginica\n" +
"5.8,2.7,5.1,1.9,virginica\n" +
"6.8,3.2,5.9,2.3,virginica\n" +
"6.7,3.3,5.7,2.5,virginica\n" +
"6.7,3,5.2,2.3,virginica\n" +
"6.3,2.5,5,1.9,virginica\n" +
"6.5,3,5.2,2,virginica\n" +
"6.2,3.4,5.4,2.3,virginica\n" +
"5.9,3,5.1,1.8,virginica\n";

List tableRead = rConnection.eval(
"read.csv(textConnection(\"" + inputIris + "\"), header = FALSE)").asList();  // works!

double[] d1 = ((REXPVector) ((RList) tableRead).get(0)).asDoubles();
double[] d2 = ((REXPVector) ((RList) tableRead).get(1)).asDoubles();
double[] d3 = ((REXPVector) ((RList) tableRead).get(2)).asDoubles();
double[] d4 = ((REXPVector) ((RList) tableRead).get(3)).asDoubles();
String[] d5 = ((REXPVector) ((RList) tableRead).get(4)).asStrings();

// create data frame with data.
REXP myDf = REXP.createDataFrame(new RList(
new REXP[]
{
new REXPDouble(d1),
new REXPDouble(d2),
new REXPDouble(d3),
new REXPDouble(d4),
new REXPString(d5)
}));

// assign the data to a variable as was suggested.
rConnection.assign("boxMVariable", myDf);

// create a string command with that variable name.
String boxVariable = "boxM(boxMVariable [,-5], boxMVariable[,5]";

// try to execute the command...
// FAILS with org.rosuda.REngine.Rserve.RserveException: eval failed, request status: R parser: input incomplete
REXP theBoxMResult = rConnection.eval(boxVariable);    // <<<< FAILS <<<<<

sent from [ProtonMail](https://protonmail.com), Swiss-based encrypted email.
#
Does it work if you supply the closing parenthesis on the call to boxM?
The parser says the input is incomplete and a missing closing parenthesis
would cause that error.
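
The "input incomplete" failure described here can be caught on the Java side before the string is sent to R. The following is a minimal sketch, assuming a hypothetical helper class `RCommandCheck` that is not part of Rserve; it only counts brackets outside quoted strings and ignores R comments and backtick-quoted names, so it is a rough pre-flight check rather than a parser.

```java
// Hypothetical helper (not part of Rserve): rough pre-flight check for R command strings.
final class RCommandCheck {

    /**
     * Returns true when every (, [ and { outside a quoted string has a matching closer.
     * Heuristic only: R comments, raw strings and backtick-quoted names are not handled.
     */
    static boolean isBalanced(String command) {
        java.util.ArrayDeque<Character> open = new java.util.ArrayDeque<>();
        char quote = 0;                          // 0 means "not inside a string literal"
        for (int i = 0; i < command.length(); i++) {
            char c = command.charAt(i);
            if (quote != 0) {
                if (c == '\\') {
                    i++;                         // skip the escaped character
                } else if (c == quote) {
                    quote = 0;                   // the string literal ends here
                }
                continue;
            }
            switch (c) {
                case '"': case '\'':
                    quote = c;                   // a string literal starts here
                    break;
                case '(': case '[': case '{':
                    open.push(c);
                    break;
                case ')': case ']': case '}':
                    if (open.isEmpty()) return false;
                    char o = open.pop();
                    if ((c == ')' && o != '(') || (c == ']' && o != '[') || (c == '}' && o != '{')) {
                        return false;            // closer does not match the last opener
                    }
                    break;
                default:
                    break;
            }
        }
        return open.isEmpty() && quote == 0;
    }
}
```

With this sketch, `RCommandCheck.isBalanced(boxVariable)` returns false for the string in the original post, because the opening parenthesis of `boxM(` is never closed.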

// create a string command with that variable name.
String boxVariable = "boxM(boxMVariable [,-5], boxMVariable[,5]";

// try to execute the command...
// FAILS with org.rosuda.REngine.Rserve.RserveException: eval failed, request status: R parser: input incomplete
REXP theBoxMResult = rConnection.eval(boxVariable);    // <<<< FAILS <<<<<

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Oct 27, 2017 at 12:41 PM, Morkus via R-devel <r-devel at r-project.org>
wrote:

  
  
#
Just print the string you are asking R to evaluate.  It doesn't make 
any sense as an R expression.  Fix that, and things will work.

Duncan Murdoch
On 27/10/2017 3:41 PM, Morkus via R-devel wrote:
#
Hi Bill,

Thanks for catching that. However, the problem remains.

If I use the R debugging code with the rResponseObject below, I get a possibly more informative error, but it still doesn't make sense.

This is the actual error R is throwing:

Error in `[.data.frame`(boxMVariable, , -5) : undefined columns selected

Does this error make sense?

Please reply. :)

Thanks in advance.

---------------

<R Debugging code, one line substituted from earlier posting>

REXP rResponseObject = rConnection.parseAndEval("try(eval(" + boxVariable+ "),silent=TRUE)");
if (rResponseObject.inherits("try-error"))
{
System.out.println("R Serve Eval Exception : " + rResponseObject.asString());
}

#
I'm not sure what you mean. Could you please be more specific?

If I print the string, I get:  boxM(boxMVariable[, -5], boxMVariable[, 5])
#
On 28/10/2017 6:26 AM, Morkus wrote:
You were trying to eval an expression that you constructed in Java.  I 
was suggesting that before you eval it, you print it.

Right, that's what I was suggesting you do. Now you've fixed the syntax 
error, that looks okay.

If I'm reading these messages in the right order, your latest error is

   Error in `[.data.frame`(boxMVariable, , -5) : undefined columns selected

The expression there is a funny way of printing boxMVariable[,-5].  So 
now you need to figure out why it thinks you've selected undefined 
columns.  This is a little perplexing, because you're asking for all 
columns except column 5, and that works whether or not you have a column 
5.

So I'd guess there's something weird about boxMVariable.  You should ask 
R to print it, and to print str(boxMVariable), to make sure it's a 
regular dataframe containing 4 numeric columns and one factor or 
character column.

Duncan Murdoch
#
Thanks Duncan. Awesome ideas!

I think we're getting closer!

I tried what you suggested and got a possibly better error...
.
.
.
rConnection.assign("boxMVariable", myDf);

String resultBV = "str(boxMVariable)";   // your suggestion.

RESULTING ERROR:

Error in format.default(nam.ob, width = max(ncn), justify = "left") :  invalid 'width' argument

(No idea what this means).

For testing, I'm using the same standard IRIS dataset as the Box's M documentation shows in biotools:

Examples

data(iris)

boxM(iris[, -5], iris[, 5])

-------

Now, in the debugger, the built values of myDf are these:

myDf = {org.rosuda.REngine.REXPGenericVector@562} "org.rosuda.REngine.REXPGenericVector@17d99928+[5]"
  payload = {org.rosuda.REngine.RList@566} size = 5
    0 = {org.rosuda.REngine.REXPDouble@570} "org.rosuda.REngine.REXPDouble@6fffcba5[150]"
    1 = {org.rosuda.REngine.REXPDouble@571} "org.rosuda.REngine.REXPDouble@34340fab[150]"
    2 = {org.rosuda.REngine.REXPDouble@572} "org.rosuda.REngine.REXPDouble@2aafb23c[150]"
    3 = {org.rosuda.REngine.REXPDouble@573} "org.rosuda.REngine.REXPDouble@2b80d80f[150]"
    4 = {org.rosuda.REngine.REXPString@574} "org.rosuda.REngine.REXPString@3ab39c39[150]"

Does this help?

Please let me know what else I can try.

Thanks,

#
On 28/10/2017 7:12 AM, Morkus wrote:
That looks like an error occurring in the str() function.  I've never 
seen such a thing in a regular R session, so I would guess that either 
your boxMVariable object is set up in a weird way that is confusing 
str(), or your R session in Java is messed up.

This is likely to be pretty hard to debug.  As a general strategy, I'd 
try to find out exactly what is in boxMVariable first.  Since str() 
doesn't work, try printing things like

head(boxMVariable)
class(boxMVariable)
names(boxMVariable)
ncol(boxMVariable)
nrow(boxMVariable)
typeof(boxMVariable)
for (i in 1:5)
   print(typeof(boxMVariable[[i]]))

etc.

Make sure the values match what you see in a regular R session:

 > boxMVariable <- iris
 >
 > head(boxMVariable)
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
 > class(boxMVariable)
[1] "data.frame"
 > names(boxMVariable)
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"
 > ncol(boxMVariable)
[1] 5
 > nrow(boxMVariable)
[1] 150
 > typeof(boxMVariable)
[1] "list"
 > for (i in 1:5)
+   print(typeof(boxMVariable[[i]]))
[1] "double"
[1] "double"
[1] "double"
[1] "double"
[1] "integer"
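
When these checks are run through a Java connection rather than an interactive session, the printed output is not visible by default. One possible approach, sketched below, is to wrap each diagnostic expression so that R returns its printed form as a single string. The class `RDiagnostics` and its method names are hypothetical; `capture.output`, `tryCatch`, `conditionMessage`, `cat`, and `paste` are standard R functions. The `for` loop above is replaced here by `sapply(..., typeof)`, which reports the same column types in one call.

```java
// Hypothetical helper: turns diagnostic R expressions into commands that return plain text.
final class RDiagnostics {

    /** Wraps an R expression so that its printed output comes back as one string. */
    static String asText(String expression) {
        // capture.output() collects what print() would show on the console.
        // tryCatch() turns an R-side error into captured text instead of a failed eval.
        // paste(..., collapse = "\n") joins the captured lines into a single string.
        return "paste(capture.output(tryCatch(print(" + expression + "), "
             + "error = function(e) cat(\"Error:\", conditionMessage(e), \"\\n\"))), "
             + "collapse = \"\\n\")";
    }

    /** The checks suggested above, applied to one variable name, in the same order. */
    static String[] checks(String variable) {
        String[] functions = { "head", "class", "names", "ncol", "nrow", "typeof" };
        String[] commands = new String[functions.length + 1];
        for (int i = 0; i < functions.length; i++) {
            commands[i] = asText(functions[i] + "(" + variable + ")");
        }
        // Column types in one call, standing in for the for loop over boxMVariable[[i]].
        commands[functions.length] = asText("sapply(" + variable + ", typeof)");
        return commands;
    }
}
```

Each string from `RDiagnostics.checks("boxMVariable")` could then be passed to `rConnection.eval(...)` and read back with `asString()`, as the code earlier in this thread does for other results.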
#
Hey Duncan,

Hard to debug? That's an understatement. Eyes bleeding....

In any case, I tried all your suggestions. To get "integer" for the final column, I had to change the code to get integers instead of strings.

double[] d1 = ((REXPVector) ((RList) tableRead).get(0)).asDoubles();
double[] d2 = ((REXPVector) ((RList) tableRead).get(1)).asDoubles();
double[] d3 = ((REXPVector) ((RList) tableRead).get(2)).asDoubles();
double[] d4 = ((REXPVector) ((RList) tableRead).get(3)).asDoubles();
int[] d5 = ((REXPVector) ((RList) tableRead).get(4)).asIntegers();

// create data frame with data.
REXP myDf = REXP.createDataFrame(new RList(
new REXP[]
{
new REXPDouble(d1),
new REXPDouble(d2),
new REXPDouble(d3),
new REXPDouble(d4),
new REXPInteger(d5)
}));

Here are the results from the eval debug code.

head(boxMVariable) - Gives the high level 5 objects.

typeof(boxMVariable): "list"

class(boxMVariable): "data.frame"

names(boxMVariable) - String object returned (couldn't evaluate it)

ncol(boxMVariable) - 5

nrow(boxMVariable) - 150

typeof(boxMVariable)

for (i in 1:5) print(typeof(boxMVariable[[i]]))
I get:
1 -> double
2 -> double
3 -> double
4 -> double
5 -> integer

Is this problem "debug-proof"?

Does anyone out there actually use Java and R?

Sigh...

#
On 28/10/2017 8:59 AM, Morkus wrote:
The last column in iris is actually a factor.  That's stored as an 
S3-classed integer vector with an attribute listing the levels.  Using 
strings instead can cause problems in a few R functions (they want 
factors, and don't do automatic conversions), but the errors you're 
seeing seem more fundamental.

That sounds like it could be serious.  Dataframe names shouldn't be 
particularly complicated, so there shouldn't be a problem evaluating 
them.  (But maybe this is just hard in Java for some reason.  As I've 
mentioned, I'm not familiar with the R Java interface.)  If there really 
is a problem with the way the names have been constructed, that would 
explain the error in str(), and would lead to lots of other weird problems.
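
To make the factor representation mentioned above concrete (an integer vector plus a levels attribute), here is a minimal Java sketch that derives both parts from an array of labels and prints R source text that rebuilds the factor. The class `FactorSketch` and its methods are hypothetical and not part of any R/Java interface. The sketch assumes plain ASCII labels without quotes or backslashes, and it sorts levels in Java's natural string order, which can differ from R's locale-dependent sort for other inputs.

```java
// Hypothetical helper: mirrors how R stores a factor (1-based integer codes plus levels).
final class FactorSketch {

    /** Sorted distinct labels, the role played by the "levels" attribute in R. */
    static String[] levels(String[] labels) {
        return new java.util.TreeSet<>(java.util.Arrays.asList(labels)).toArray(new String[0]);
    }

    /** 1-based position of each label within the sorted levels array. */
    static int[] codes(String[] labels, String[] levels) {
        int[] codes = new int[labels.length];
        for (int i = 0; i < labels.length; i++) {
            // levels is sorted and contains every label, so the search always succeeds.
            codes[i] = java.util.Arrays.binarySearch(levels, labels[i]) + 1;
        }
        return codes;
    }

    /** R source text that rebuilds the factor from its two parts. */
    static String toRExpression(int[] codes, String[] levels) {
        StringBuilder sb = new StringBuilder("structure(c(");
        for (int i = 0; i < codes.length; i++) {
            if (i > 0) sb.append(", ");
            sb.append(codes[i]).append('L');            // the L suffix marks an R integer
        }
        sb.append("), levels = c(");
        for (int i = 0; i < levels.length; i++) {
            if (i > 0) sb.append(", ");
            sb.append('"').append(levels[i]).append('"'); // assumes no quotes or backslashes
        }
        return sb.append("), class = \"factor\")").toString();
    }
}
```

For the labels `setosa, virginica, setosa, versicolor`, the sketch yields levels `setosa, versicolor, virginica` and codes `1, 3, 1, 2`.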

Another way to look at R objects from within R is to use 
.Internal(inspect( x )).  For example,

 > .Internal(inspect(names(iris)))
@7f898ff9e2e8 16 STRSXP g0c4 [NAM(2)] (len=5, tl=0)
   @7f8992c41878 09 CHARSXP g0c2 [gp=0x61,ATT] [ASCII] [cached] "Sepal.Length"
   @7f8992c41840 09 CHARSXP g0c2 [gp=0x61] [ASCII] [cached] "Sepal.Width"
   @7f8992c41808 09 CHARSXP g0c2 [gp=0x61,ATT] [ASCII] [cached] "Petal.Length"
   @7f898ba99f78 09 CHARSXP g0c2 [gp=0x61,ATT] [ASCII] [cached] "Petal.Width"
   @7f898b9a3468 09 CHARSXP g0c1 [gp=0x61,ATT] [ASCII] [cached] "Species"

You can also look at R objects while in a debugger like gdb using the 
R_PV() function; see Writing R Extensions for details if this is 
something available to you.

I don't know anyone who does that.  It seems like a bad idea just 
because it's always easiest to do what everyone else does.

I think it's more common to call Java from R than the reverse.

Duncan Murdoch
#
Thanks Duncan. I can't tell you how helpful all your terrific replies have been.

I think the biggest surprise is that nobody appears to be using Java and R together like I'm trying to do. I suppose it shouldn't be a surprise, since there are no books on the subject and almost no technical documentation other than a few sites here and there.

-----

I originally had the "int" as the return type for the factors, but that didn't make any difference.

So, let me ask you. What I can get working is calling an R Script from Java. Literally opening the ".R" file and reading it line by line and evaluating it. That works. Is there any reason why that's not a viable way to go?

The one thing I don't know how to do is pass a parameter to an RScript from Java. Is it possible to pass a parameter to an RScript from Java? If I can pass a parameter to an RScript, then it's not static and I could use it as a "function" to call for different values.

Look forward to your reply.

#
On 29/10/2017 7:26 AM, Morkus wrote:
I can't answer this very specifically, because it depends so much on 
your circumstances.  But why bother with the file system at all? 
Presumably if you can read a string, you can construct the same string 
within your Java program (perhaps as a literal string, perhaps by 
building it from local variables).

I can't really answer that question, since I have no experience at all 
in calling R from Java.  But if you want to pass a parameter named "x" 
with value 123 from Java to R, why not just construct and evaluate the 
statement "x <- 123"?
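
The suggestion above, constructing a statement such as "x <- 123" in Java, could look like the following sketch. The class `RAssign` and its methods are hypothetical names introduced for illustration. The name check is deliberately strict so that a value supplied at run time cannot smuggle extra R code in through the variable name; it does not reject R reserved words, and string values are escaped only for backslashes and double quotes.

```java
// Hypothetical helper: builds R assignment statements such as "x <- 123".
final class RAssign {

    /** Accepts only simple R names (letters, digits, dot, underscore; starting with a letter). */
    static String checkName(String name) {
        if (!name.matches("[A-Za-z][A-Za-z0-9._]*")) {
            throw new IllegalArgumentException("not a simple R name: " + name);
        }
        return name;
    }

    /** Whole-number parameter, e.g. assign("x", 123) gives x <- 123. */
    static String assign(String name, long value) {
        return checkName(name) + " <- " + value;
    }

    /** Floating-point parameter; non-finite values are rejected to keep the sketch simple. */
    static String assign(String name, double value) {
        if (Double.isNaN(value) || Double.isInfinite(value)) {
            throw new IllegalArgumentException("non-finite value: " + value);
        }
        return checkName(name) + " <- " + value;
    }

    /** String parameter, written as a double-quoted R string literal. */
    static String assign(String name, String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return checkName(name) + " <- \"" + escaped + "\"";
    }
}
```

Evaluating the resulting statement before the script body, for example with the `rConnection.eval(...)` call already used in this thread, would make the value available to the script under that name.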

Duncan Murdoch
#
Hey Duncan,

Since Java is the #1 language and R is extremely popular, I think the most telling thing is that nobody on the "R-devel" forum (where people do "programming with R") is doing R and Java like I'm doing: calling R from Java and passing data structures.

So it appears I'm clearly pushing R somewhere it doesn't want to go. And, the boxM issue is more or less debug-proof at this moment.

Perhaps I'll need to off-shore this issue for resolution for a few hundred bucks? I'm at the point now where I probably need to just pay somebody to get this crazy BoxM thing working! :(

Thanks again,

#
The SJava package from 18 years  ago did (does) have bidirectional calls
from R to Java and Java to R. So you are not pushing
the interface somewhere it doesn't want to go.   But you are going about it
with strings and R syntax which is a much less powerful approach
than working with actual objects and function calls, and as you are finding
out, very cumbersome.   And it helps to know both R and Java well.
You might take a look at the SJava or rJava packages and see how they do
this. But you have to do your own homework first before asking questions
you can answer yourself.

Best,
 Duncan (TL)


On Sun, Oct 29, 2017 at 6:01 AM Morkus via R-devel <r-devel at r-project.org>
wrote: