[Bioc-devel] Dimension (rows & columns) in the writeCDF function seems to be swapped

4 messages · Groot, Philip de, Kasper Daniel Hansen

#
Hello Karst,

You can download all required files via this link:
https://sendit.wur.nl/Download.aspx?id=9eed2afd-40a8-4770-b44b-5c0b6e55084c

Simply execute the commands in the Create_CDF.R script and everything you need will be created and installed. 

Note that a fixed "PdInfo2Cdf.R" file is also included. To create a CDF with the rows & columns NOT swapped (which triggers the error during RMA), edit the script at the commented-out lines. As shipped, the script creates a proper CDF on which the RMA normalization also runs correctly.

Thank you for your help!

Regards,

Dr. Philip de Groot Ph.D.
Bioinformatics Researcher

Wageningen University / TIFN
Nutrigenomics Consortium
Nutrition, Metabolism & Genomics Group
Division of Human Nutrition
PO Box 8129, 6700 EV Wageningen
Visiting Address: Erfelijkheidsleer: De Valk, Building 304
Dreijenweg 2, 6703 HA  Wageningen
Room: 0052a
T: +31-317-485786
F: +31-317-483342
E-mail:   Philip.deGroot at wur.nl
Internet: http://www.nutrigenomicsconsortium.nl
             http://humannutrition.wur.nl/
             https://madmax.bioinformatics.nl/
#
Your zip file does not seem to be working.

Kasper
On Thu, May 20, 2010 at 6:12 AM, Groot, Philip de <philip.degroot at wur.nl> wrote:
#
CC to Ben Bolstad, who may want to confirm a statement below, and to
the maintainer of affy, since that is where the bug is

Executive summary: there is a bug in read.affybatch having to do with
non-square microarrays (ncol and nrow were switched).  This comes from
a misunderstanding of the output of affyio::read.celfile.header which
lists dimensions as c(ncol, nrow), not as c(nrow, ncol).  I have added
names to the output of affyio::read.celfile.header and I have changed
affy::read.affybatch to use these names instead of the order of the
output.  I have committed the changes to the devel branch, but I think
release needs to be updated as well.  Version numbers have not been
bumped.  I will wait for reactions to this email first.

Long summary: To the people coming in a bit late: This is a
complicated issue Philip has reported with a pipeline taking a
platform design object, converting it into a binary CDF file using
affxparser as well as a script from the aroma.affymetrix website,
converting the cdf file into a cdf environment using makecdfenv and
finally calling rma from affy.  All of this on a non-square cel file.

Data is in the zip file
  https://sendit.wur.nl/Download.aspx?id=7becdf12-35ff-44c3-a4a7-537d0fe895a2
The main script from Philip is CreateCel.R

Note that the CDF file included above has "switched" dimensions, i.e.
Philip has modified a script to swap the dimensions.

Essentially, it all boils down to the following: read.celfile.header yields
$`CEL dimensions`
[1]  990 1190

Contrary to what one would immediately assume, this is actually
(nColumns, nRows), as stated by Ben Bolstad in another thread (and I
would agree based on a casual reading of the source code from affyio).
I also get (using affxparser, output scrubbed)
$cols
[1] 990

$rows
[1] 1190

However, the function read.affybatch in affy assumes that the
dimensions are nRows x nColumns, as seen in the source code (line
numbers from read.affybatch.R from subversion):

     71   headdetails <- read.celfile.header(as.character(filenames[[1]]))
     72   ##now we use the length
     73   dim.intensity <- headdetails[[2]]   ##dim(intensity(cel))

<SNIP>

    108   if (!sd){
    109     return(new("AffyBatch",
    110                exprs  = exprs,
    111                ##se.exprs = array(NaN, dim=dim.sd),
    112                cdfName    = cdfname,   ##cel at cdfName,
    113                phenoData  = phenoData,
    114                nrow       = dim.intensity[1],
    115                ncol       = dim.intensity[2],
    116                annotation = cleancdfname(cdfname, addcdf=FALSE),
    117                protocolData  = protocol,
    118                description= description,
    119                notes      = notes))

Clearly the current author assumed the dimensions would be nrow x ncol
and not reversed.
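
As a sketch of why this only bites on non-square arrays (plain base R,
no affy or affyio calls; the `header` list is a hand-built stand-in
using the values shown above, and the 500 x 500 array is made up):

```r
# Hand-built stand-in for the header list; the vector is in c(ncol, nrow) order.
header <- list(`CEL dimensions` = c(990, 1190))
dims <- header[["CEL dimensions"]]

# Positional reading that assumes c(nrow, ncol): this is the mistaken one.
wrong <- list(nrow = dims[1], ncol = dims[2])

# Reading that follows the c(ncol, nrow) convention.
right <- list(nrow = dims[2], ncol = dims[1])

# The total number of cells is the same either way, so the swap does not
# show up in the amount of intensity data, only in the stored dimensions.
same_total <- wrong$nrow * wrong$ncol == right$nrow * right$ncol

# For a square array (made-up size) both readings coincide, which would
# explain why the problem stays hidden on square chips.
square <- c(500, 500)
square_hides_bug <- identical(square, rev(square))
```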

Finally, an FYI to Ben: currently, the show method for AffyBatch does
_not_ report nCol x nRow as stated in another thread, as per
<SNIP>
    cat("size of arrays=", nrow(object), "x", ncol(object), " features (",
        object.size(object)%/%1024, " kb)\n", sep = "")
<SNIP>

A short-term bug fix would be to change line 73 in read.affybatch.R from
  dim.intensity <- headdetails[[2]]
to
  dim.intensity <- rev(headdetails[[2]])

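A better long-term fix would be to put names on the output of
affyio::read.celfile.header so that the order is documented, and also
to have affy::read.affybatch use these names instead of the order of
the output (as described in the executive summary above).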
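
The two fixes can be compared in a small base-R sketch. It does not
call affy or affyio; `cel_dims` is a stand-in for the header vector,
and the element names "Cols" and "Rows" are hypothetical, chosen only
for illustration rather than taken from the committed change:

```r
# Stand-in for the header vector, in c(ncol, nrow) order.
cel_dims <- c(990, 1190)

# Short-term fix: reverse once, then keep the existing positional reads.
dim.intensity <- rev(cel_dims)
nrow_rev <- dim.intensity[1]
ncol_rev <- dim.intensity[2]

# Longer-term idea: name the elements and look them up by name.
# "Cols" and "Rows" are hypothetical names used only in this sketch.
named_dims <- c(Cols = 990, Rows = 1190)
nrow_named <- named_dims[["Rows"]]
ncol_named <- named_dims[["Cols"]]

# A name-based lookup gives the same answer even if the order changes.
reordered <- named_dims[c("Rows", "Cols")]
nrow_reordered <- reordered[["Rows"]]
```

The rev() version still relies on the reader knowing the order; the
named version documents it at the point of use.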

Kasper


On Thu, May 20, 2010 at 8:30 AM, Kasper Daniel Hansen
<khansen at stat.berkeley.edu> wrote:
#
Hello all,

I am glad that the issue is resolved at last (Thanks Kasper!). Consequently, I (or the core project members) can provide CDF-files (derived from the pdInfoBuilder packages) for the hugene, mogene, ragene ST v1.0 and 1.1 arrays so that these arrays can also be analysed utilizing the "affy" library.

The question is: who takes responsibility for this? Currently, I maintain the NuGO (custom Affymetrix) arrays, and including the CDF-files (derived from the pdInfoBuilder files) for these arrays is no big deal.

However, the issue is that Affymetrix has released unsupported CDF-files for the Gene ST 1.0 arrays, and these are reflected in the CDF-environments currently available in the Bioconductor repository. Unfortunately, the different CDF-environments for the same chip do not match, for example:

Freshly installed mogene10stv1cdf library from Bioconductor:
[1] 34760

And the CDF-library as compiled by me (derived from pdInfoBuilder):
[1] 35556
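
The commands that produced the two counts are not shown above. Assuming
they are numbers of entries (probe sets) in each CDF environment, a
comparison could be sketched as follows. The two environments and their
identifiers below are toy stand-ins, not the real packages:

```r
# Toy stand-ins for two CDF environments built for the same chip.
cdf_a <- new.env()
cdf_b <- new.env()
for (id in c("ps1", "ps2", "ps3")) assign(id, matrix(0, 1, 2), envir = cdf_a)
for (id in c("ps2", "ps3", "ps4", "ps5")) assign(id, matrix(0, 1, 2), envir = cdf_b)

# Entry names in each environment (ls() returns them sorted).
ids_a <- ls(cdf_a)
ids_b <- ls(cdf_b)

# Counts, analogous in spirit to the two numbers quoted above.
n_a <- length(ids_a)
n_b <- length(ids_b)

# Which identifiers are unique to one side, and which are shared.
only_in_a <- setdiff(ids_a, ids_b)
only_in_b <- setdiff(ids_b, ids_a)
shared <- intersect(ids_a, ids_b)
```

Looking at the identifiers that differ, rather than only at the counts,
would show which entries the two versions disagree on.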

Personally, I prefer the CDF-libraries derived from the pdInfoBuilder libraries, because these exactly reflect the chips as Affymetrix intended them (utilizing the .bgp, .clf, and so on files from Affymetrix: the official support files).

So I think we need to discuss this issue too... Which CDF files will be made available, who takes responsibility, and who else needs to be included?

Regards,

Dr. Philip de Groot Ph.D.