
boxplot.formula with missing values (PR#6846)

3 messages · Brian Ripley, Ramon Diaz-Uriarte

#
If a matrix has missing values in different rows, plotting it with the formula
interface can produce wrong plots. Example:


fake.data <- matrix(rep(-100:100, 4),
                    ncol = 4)

## No missing data yet: both panels should look the same
par(mfrow = c(1, 2))
boxplot(fake.data ~ col(fake.data))
abline(h = 0, lty = 2)
boxplot(as.data.frame(fake.data))
abline(h = 0, lty = 2)

##### Add the missing data
fake.data[190:200, 1] <- NA
fake.data[1:5, 3] <- NA

## But only columns 1 and 3 should change! (and in opposite directions)
par(mfrow = c(1, 2))
boxplot(fake.data ~ col(fake.data))
abline(h = 0, lty = 2)
boxplot(as.data.frame(fake.data))
abline(h = 0, lty = 2)

### The problem is that the same rows are removed from all the columns:

bp.a <- boxplot(fake.data ~ col(fake.data))
bp.df <- boxplot(as.data.frame(fake.data))

### which happens during the call to

eval(m, parent.frame())

inside boxplot.formula
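The row dropping can be reproduced without drawing anything by building the model frame directly. The call that `boxplot.formula` evaluates is a `model.frame()` call, so the sketch below calls `model.frame()` itself and passes `na.action = na.omit` explicitly rather than relying on `getOption("na.action")`. It is an approximation of that internal step, not a copy of it.

```r
# Data from the report: a 201 x 4 integer matrix, every column -100:100
fake.data <- matrix(rep(-100:100, 4), ncol = 4)
fake.data[190:200, 1] <- NA  # 11 NAs in column 1
fake.data[1:5, 3] <- NA      # 5 NAs in column 3

# Model frame built the way a formula method does; na.omit is given
# explicitly instead of taken from getOption("na.action")
mf <- model.frame(fake.data ~ col(fake.data), na.action = na.omit)

nrow(mf)  # 185: a row goes if ANY column of the matrix response is NA

# Row numbers recorded by na.omit: the union of both NA blocks
dropped <- as.integer(attr(mf, "na.action"))
dropped   # 1 2 3 4 5 190 191 ... 200
```

Because the response is a single matrix variable, `na.omit` treats each matrix row as one observation, which is why columns 2 and 4 lose rows too.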

**********************************

This happens in at least:

         _
platform i686-pc-linux-gnu
arch     i686
os       linux-gnu
system   i686, linux-gnu
status   Patched
major    1
minor    9.0
year     2004
month    05
day      02
language R

         _
platform i386-pc-linux-gnu
arch     i386
os       linux-gnu
system   i386, linux-gnu
status
major    1
minor    8.1
year     2003
month    11
day      21
language R

         _
platform i686-pc-linux-gnu
arch     i686
os       linux-gnu
system   i686, linux-gnu
status   Under development (unstable)
major    2
minor    0.0
year     2004
month    04
day      30
language R





--
Ramón Díaz-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900

http://bioinfo.cnio.es/~rdiaz
PGP KeyID: 0xE89B3462
(http://bioinfo.cnio.es/~rdiaz/0xE89B3462.asc)
#
I think this *is* the correct behaviour for a formula method. The problem
I see is that boxplot.formula does not have an na.action argument and so
you may not have realised that na.action=na.omit is the default.

Note that subset= will `remove the same rows from all columns', too.

It really is not the intention that the formula interface is used with 
matrices, and as.vector will do what I think you intended:

	boxplot(as.vector(fake.data) ~ as.vector(col(fake.data)))
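A quick check of the `as.vector` suggestion, as a sketch: `plot = FALSE` makes `boxplot()` return its statistics without drawing, so the per-group counts (`n`) and medians (row 3 of `stats`) can be inspected. In long format each cell is its own observation, so only the NA cells themselves are dropped.

```r
fake.data <- matrix(rep(-100:100, 4), ncol = 4)
fake.data[190:200, 1] <- NA
fake.data[1:5, 3] <- NA

# One value and one group label per cell: na.omit now drops single cells
bp <- boxplot(as.vector(fake.data) ~ as.vector(col(fake.data)), plot = FALSE)

bp$n           # 190 201 196 201
bp$stats[3, ]  # medians: -5.5 0.0 2.5 0.0
```

Only columns 1 and 3 move, and in opposite directions, which matches what the report expected.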

Also, setting options(na.action=na.pass) will work as you expected.
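A minimal sketch of the `options()` route, assuming a current R where the matrix formula from the report still works as it did then. `options()` returns the previous values, so the global setting can be restored right after the call; `plot = FALSE` only suppresses drawing.

```r
fake.data <- matrix(rep(-100:100, 4), ncol = 4)
fake.data[190:200, 1] <- NA
fake.data[1:5, 3] <- NA

# Temporarily make na.pass the global default; keep the old value
old <- options(na.action = na.pass)
bp <- boxplot(fake.data ~ col(fake.data), plot = FALSE)
options(old)  # restore the previous na.action

bp$n           # 190 201 196 201: NAs are dropped per column, not per row
bp$stats[3, ]  # -5.5 0.0 2.5 0.0
```

Changing the option affects every formula-based function in the session, which is why restoring it immediately is worth the extra line.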

I've added an na.action argument for R-devel.
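With that argument the choice can be made per call instead of globally. The sketch below assumes a current R in which `boxplot.formula()` accepts `na.action`, and passes `na.omit` explicitly for the comparison so the result does not depend on `getOption("na.action")`.

```r
fake.data <- matrix(rep(-100:100, 4), ncol = 4)
fake.data[190:200, 1] <- NA
fake.data[1:5, 3] <- NA

# Row-wise removal: rows 1:5 and 190:200 vanish from every column
bp.omit <- boxplot(fake.data ~ col(fake.data), na.action = na.omit,
                   plot = FALSE)
# Per-call override; options("na.action") is left untouched
bp.pass <- boxplot(fake.data ~ col(fake.data), na.action = na.pass,
                   plot = FALSE)

bp.omit$n           # 185 185 185 185
bp.omit$stats[3, ]  # -3 -3 -3 -3: all four medians shift
bp.pass$n           # 190 201 196 201
```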
On Mon, 3 May 2004 rdiaz@cnio.es wrote:

            
Well, it may not do what you expected, but the error appears to be in your
expectations.

  
    
#
Thanks for your comments. I understand this is probably the correct behavior
for a formula method, but I also think that it is not what many people
expect, and the note about NA behavior did not help me. Thus, I
appreciate your adding the na.action argument.

Thanks,

R.
On Monday 03 May 2004 13:33, Prof Brian Ripley wrote: