
freetype 2.5.2, problem with the survival package, build R 2.15.x with gcc 4.8.x

3 messages · Hin-Tak Leung, Terry Therneau, David Winsemius

#
Here is a rather long discussion of three topics: freetype 2.5.2, a problem with 
the survival package, and building R 2.15.x with gcc 4.8.x. Please feel free to 
skip forward.

- freetype 2.5.2:

The fix to cope with one of Mac OS X's system fonts, made just before the release 
of freetype 2.5.1, caused a regression: a crash on one of Microsoft Windows' 
system fonts. So there is a 2.5.2. There are new 2.5.2 bundles for Windows & Mac 
OS X. The official win/mac binaries of R were built statically against a 
2+-year-old freetype with a few known problems. Most users should upgrade/rebuild.

http://sourceforge.net/projects/outmodedbonsai/files/R/

- problem with the survival package:

Trying to re-run a vignette to get the same result as two years ago
revealed a strange change. I bisected it down to somewhere between
r11513 and r11516 of the survival package.

-------------- r11513 --------------------
clogit(cc ~ addContr(A) + addContr(C) + addContr(A.C) + strata(set))


                    coef exp(coef) se(coef)     z      p
addContr(A)2     -0.620     0.538    0.217 -2.86 0.0043
addContr(C)2      0.482     1.620    0.217  2.22 0.0270
addContr(A.C)1-2 -0.778     0.459    0.275 -2.83 0.0047
addContr(A.C)2-1     NA        NA    0.000    NA     NA
addContr(A.C)2-2     NA        NA    0.000    NA     NA

Likelihood ratio test=26  on 3 df, p=9.49e-06  n= 13110, number of events= 3524
------------------------------------------

------------- r11516 ---------------------
clogit(cc ~ addContr(A) + addContr(C) + addContr(A.C) + strata(set))


                      coef exp(coef) se(coef)         z  p
addContr(A)2     -0.14250     0.867   110812 -1.29e-06  1
addContr(C)2      0.00525     1.005   110812  4.74e-08  1
addContr(A.C)1-2 -0.30097     0.740   110812 -2.72e-06  1
addContr(A.C)2-1 -0.47712     0.621   110812 -4.31e-06  1
addContr(A.C)2-2       NA        NA        0        NA NA

Likelihood ratio test=26  on 4 df, p=3.15e-05  n= 13110, number of events= 3524
------------------------------------------

r11514 does not build, and r11515 hogs memory badly, so the survival
package broke somewhere between r11513 and r11516. Anyway, here is the diff in
the vignette; the data etc. are in the directory above. If somebody wants to
fix this before I spend any more time on this particular matter, please feel 
free to do so.

http://sourceforge.net/projects/outmodedbonsai/files/Manuals%2C%20Overviews%20and%20Slides%20for%20talks/2013SummerCourse/practicals/with-answers/practical8_survival-clogit-diff.pdf/download

That's the one problem from David's 10 practicals which is not due to a bug in 
snpStats. Some might find it reassuring that only 3 of the 4 problems with the 
practicals are due to snpStats bugs.

http://sourceforge.net/projects/outmodedbonsai/files/Manuals%2C%20Overviews%20and%20Slides%20for%20talks/2013SummerCourse/practicals/with-answers/practical7_snpStatsBug-diff.pdf/download
http://sourceforge.net/projects/outmodedbonsai/files/Manuals%2C%20Overviews%20and%20Slides%20for%20talks/2013SummerCourse/practicals/with-answers/practical6_snpStatsBug-diff.pdf/download
http://sourceforge.net/projects/outmodedbonsai/files/Manuals%2C%20Overviews%20and%20Slides%20for%20talks/2013SummerCourse/practicals/with-answers/practical3_snpStatsBug-diff.pdf/download

- build R 2.15.x with gcc 4.8.x

I wish the R commit log for r62430 was a bit more detailed than just
"tweak needed for gcc 4.8.x". Anyway, building R 2.15.x with gcc 4.8.x
could result in segfaults in usage as innocent and essential
as running summary() on a data.frame:

--------------------------------
  *** caught segfault ***
address 0x2f8e6a00, cause 'memory not mapped'

Traceback:
  1: sort.list(y)
  2: factor(a, exclude = exclude)
  3: table(object, exclude = NULL)
  4: summary.default(X[[3L]], ...)
  5: FUN(X[[3L]], ...)
  6: lapply(X = as.list(object), FUN = summary, maxsum = maxsum, digits = 12, 
   ...)
  7: summary.data.frame(support)
...
--------------------------------

r62430 needs a bit of adapting to apply to R 2.15.x, but you get the idea.
I hope this info is useful to somebody else who is still using R 2.15.x, no 
doubt for very good reasons.
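
For anyone wanting a quick check of a freshly built R, a minimal smoke test might look like the sketch below. The data are made up (the name `support` is borrowed from the traceback, nothing else is); the third column is logical on the assumption that this is what sends summary() through table() and factor(), the calls shown in the traceback. A build affected by the problem would be expected to crash here rather than print a table.

```r
# Made-up data frame; only the name `support` comes from the traceback.
# The third column is logical so that summary.default() calls table(),
# which in turn calls factor() and sort.list().
support <- data.frame(
  id    = 1:6,
  score = c(2.5, 3.1, 4.8, 1.9, 3.3, 2.7),
  flag  = c(TRUE, FALSE, NA, TRUE, TRUE, FALSE)
)

# On a healthy build this returns a printable summary table.
s <- summary(support)
print(s)
```

Passing this check does not prove a build is sound; it only exercises the one code path reported above.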
Hin-Tak Leung wrote:
#
I was sent a copy of the data, and this is what I get on a different machine:
data=pscc)
Warning messages:
1: In fitter(X, Y, strats, offset, init, control, weights = weights,  :
   Loglik converged before variable  1,2,3,4 ; beta may be infinite.
2: In coxph(formula = Surv(rep(1, 13110L), cc) ~ addContr(A) + addContr(C) +  :
   X matrix deemed to be singular; variable 5
                      coef exp(coef) se(coef)         z  p
addContr(A)2     -0.14250     0.867   110812 -1.29e-06  1
addContr(C)2      0.00525     1.005   110812  4.74e-08  1
addContr(A.C)1-2 -0.30097     0.740   110812 -2.72e-06  1
addContr(A.C)2-1 -0.47712     0.621   110812 -4.31e-06  1
addContr(A.C)2-2       NA        NA        0        NA NA
[1] 1.932097e+02 2.700101e+01 1.624731e+01 6.049630e-15 2.031334e-15

The primary issue is that the covariate matrix is singular, having rank 3 instead of rank 5.
The coxph routine prints two warning messages saying that things are not good with the matrix. 
Warning messages should not be ignored!  The insane se(coef) values in the printed result 
are an even bigger clue that the model fit is suspect. Unfortunately, some small change in 
the iteration path or numerics has put this data set over the edge from being seen as rank 
3 (old run) to rank 4 (new run).  Moral: coxph does pretty well at detecting redundant 
variables, but if you know of some it never hurts to help the routine out by removing them 
before the fit.
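
As a rough illustration of this kind of rank deficiency, here is a base R sketch. It does not use the pscc data: the factors A and C, the column names, and the 1e-10 cutoff are all made up for the example. It builds five indicator columns loosely mimicking the five coefficients above, where the two main-effect columns are exact sums of interaction columns, and then checks the rank two ways.

```r
# Hypothetical two-level factors; NOT the pscc data from this thread.
A  <- factor(c(1, 1, 2, 2, 1, 1, 2, 2))
C  <- factor(c(1, 2, 1, 2, 1, 2, 1, 2))
AC <- interaction(A, C)   # levels "1.1", "2.1", "1.2", "2.2"

# Five indicator columns, no intercept. By construction
#   A2 = AC21 + AC22   and   C2 = AC12 + AC22,
# so only three columns are linearly independent.
X <- cbind(
  A2   = as.numeric(A == "2"),
  C2   = as.numeric(C == "2"),
  AC21 = as.numeric(AC == "2.1"),
  AC12 = as.numeric(AC == "1.2"),
  AC22 = as.numeric(AC == "2.2")
)

# Singular values: two of the five should be zero up to rounding error.
d <- svd(X)$d
print(d)

# Rank from the singular values (arbitrary illustrative cutoff) and from QR.
rank_svd <- sum(d > 1e-10 * max(d))
rank_qr  <- qr(X)$rank
cat("columns:", ncol(X), " rank (svd):", rank_svd, " rank (qr):", rank_qr, "\n")
```

Dropping the columns known to be redundant before fitting, as suggested above, leaves a full-rank matrix and takes the decision away from the numerical threshold.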

Singularity of the X matrix in a Cox model is very difficult to detect reliably; the 
current threshold is the result of long experience and experiment to give as few false 
messages as possible.  (The RMS package in particular used truncated power basis functions 
for the splines, which lead to X matrices that look almost singular numerically, but are 
not.)  Setting a slightly less stringent threshold for declaring singularity in the Cholesky 
decomposition suffices for this data set.

fit2 <- clogit(cc ~ addContr(A) + addContr(C) + addContr(A.C) + strata(set),
          data=pscc, toler.chol=1e-10)

I'll certainly add this to my list of test problems that I use to tune those constants.
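
To put the 1e-10 in the call above in context, the sketch below compares it with `.Machine$double.eps^0.75`. That expression is an assumption: it is the default for toler.chol given in the coxph.control() documentation of current survival releases, and ?coxph.control should be checked for the installed version. Only base R is used, so nothing here fits a model.

```r
# Assumed default for toler.chol, per the coxph.control() documentation in
# current survival releases; verify with ?coxph.control on your system.
default_tol  <- .Machine$double.eps^0.75

# Value passed as toler.chol in the fit2 call above.
loosened_tol <- 1e-10

# A larger tolerance means a column is declared redundant more readily.
ratio <- loosened_tol / default_tol
cat(sprintf("default %.3g, loosened %.3g, ratio %.1f\n",
            default_tol, loosened_tol, ratio))
```

Under that assumption the loosened value is larger than the default by a factor of a few tens, which fits the description of it as only slightly less stringent.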

Terry Therneau
On 12/11/2013 09:30 PM, Hin-Tak Leung wrote:
#
On Dec 11, 2013, at 7:30 PM, Hin-Tak Leung wrote:

            
First: Sorry for the blank message. Need more coffee.

Second: Does this mean that only Mac users who are still using 2.15.x need to worry about this issue?

Third: I'm reading this (and Terry's comment about singularity conditions) to mean that a numerical discrepancy between the vignette output when the code was run and what was expected was causing a segfault under some situation that I cannot quite reconstruct. Was the implication that Mac users (of 2.15.x) need to build from sources only if they want to build the survival package from source? Does this have any implications for those of us who use the survival package as the binary? (And I'm using 3.0.2, so a split answer might be needed to cover 2.15.x and the current versions separately.)