[Bioc-devel] Base class for interaction data - expressions of interest

While I'm on this point, there's another, more subtle issue with using 
sparseMatrix(). Specifically, there's a distinction between zeros and 
missing values when considering a ContactMatrix. For example, in Hi-C 
data, a zero in the matrix means there aren't any read pairs mapping 
between the corresponding bins. A missing value means that the count for 
the bin pair is unknown, e.g., because that particular pairwise 
interaction was missing from the InteractionSet during conversion.

This difference may be important in calculating correct statistics; one 
can imagine situations where assuming all missing values are zero would 
not be appropriate. In general, I would expect that missing values would 
take up most of the matrix entries after conversion from an 
InteractionSet. sparseMatrix() doesn't seem to support using NA as the 
implicit value for entries that aren't stored; it's fixed at zero, which 
makes mathematical sense but isn't quite right for our purposes.

Now, this might not be so bad for count data, depending on how you 
counted the reads into bin pairs; converting all NA's to zeros might be 
okay in such circumstances, if the occurrence of those NA's in the first 
place was due to the lack of reads. However, if you fill the contact 
matrix with other metrics (e.g., log-FCs, average log-CPMs), assuming 
that all missing values are zero would probably be incorrect.
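
To make the difference concrete, here is a minimal base R sketch. The bin indices, log-FC values and variable names are all made up for illustration (nothing here comes from a real InteractionSet), and it uses an ordinary dense matrix rather than any existing ContactMatrix code:

```r
# Hypothetical log-fold changes for three bin pairs on a 4 x 4 bin grid.
anchor1 <- c(1, 2, 3)
anchor2 <- c(2, 3, 4)
logfc <- c(1.5, -0.5, 2.0)
nbins <- 4

# Dense matrix that starts out entirely unknown (NA), not zero.
mat <- matrix(NA_real_, nrow=nbins, ncol=nbins)
mat[cbind(anchor1, anchor2)] <- logfc

# Treat missing entries as unknown: average over the observed pairs only.
mean.observed <- mean(mat, na.rm=TRUE)

# Treat missing entries as zero, as a zero-default sparse matrix would.
zeroed <- mat
zeroed[is.na(zeroed)] <- 0
mean.zeroed <- mean(zeroed)
```

Here mean.observed is 1 while mean.zeroed is 0.1875, because the 13 unknown entries are pulled into the average as if they were measured zeros. A dense NA-filled matrix obviously won't scale to real data; it's only meant to show what gets lost when the default is fixed at zero.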

Anyway, food for thought.

- Aaron
On 16/11/15 10:31, Aaron Lun wrote: