mutually exclusive events


Hi:

I am trying to identify mutually exclusive events from the following
example:

#-------------
 dat <- read.table(text="Cluster      Gene      Mutated    not_mutated
  1             G1             1              0
  1             G2             1              0
  1             G3             0              1
  1             G4             0              1
  1             G5             1              0
  2             G1             0              1
  2             G2             1              0
  2             G3             1              0
  2             G4             0              0
  2             G5             1              0", header=TRUE, stringsAsFactors=FALSE)

 with(dat, table(Cluster, Gene, Mutated)  )
#----------------
, , Mutated = 0

       Gene
Cluster G1 G2 G3 G4 G5
      1  0  0  1  1  0
      2  1  0  0  1  0

, , Mutated = 1

       Gene
Cluster G1 G2 G3 G4 G5
      1  1  1  0  0  1
      2  0  1  1  0  1
#--------------
Or:
xtabs(Mutated ~ Cluster+Gene, data=dat)
#----------------
       Gene
Cluster G1 G2 G3 G4 G5
      1  1  1  0  0  1
      2  0  1  1  0  1

I'm a bit unclear about your goals. Are you trying to identify the genes that are mutated in only one cluster as the "G1-G3" events, and the genes that are mutated in both clusters as the "G2-G5" events?

If so, you can choose the columns of the xtabs result that have a sum of 1 for the first and the columns with a sum of 2 for the second.
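
A minimal sketch of the column-sum idea, assuming the `dat` data frame from the example above (it is rebuilt here so the snippet runs on its own). The names `tab`, `sums`, `one_cluster` and `all_clusters` are illustrative only. Genes mutated in exactly one cluster have a column sum of 1, and genes mutated in every cluster have a column sum equal to the number of clusters.

```r
# Rebuild the example data from the original post.
dat <- read.table(text = "Cluster Gene Mutated not_mutated
1 G1 1 0
1 G2 1 0
1 G3 0 1
1 G4 0 1
1 G5 1 0
2 G1 0 1
2 G2 1 0
2 G3 1 0
2 G4 0 0
2 G5 1 0", header = TRUE, stringsAsFactors = FALSE)

# Cluster-by-gene table of mutation indicators.
tab <- xtabs(Mutated ~ Cluster + Gene, data = dat)

# Number of clusters in which each gene is mutated.
sums <- colSums(tab)

# Genes mutated in exactly one cluster.
one_cluster <- names(sums)[sums == 1]

# Genes mutated in every cluster.
all_clusters <- names(sums)[sums == nrow(tab)]

one_cluster
all_clusters
```

With 8 clusters the same code applies unchanged, because `nrow(tab)` picks up the number of clusters from the table.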

In cluster 1 :  G1, G2, G5 are mutated

In cluster 2:    G2, G3, G5 are mutated.

I am interested in finding such G2-G5 events and G1-G3 events.

In total I have 8 clusters and 150 genes (1200 rows x 4 columns).

What test would be appropriate to identify such pairs?

In my naive understanding, would a Fisher's exact test give such
combinations?
It's even less clear what sort of "test" you propose. `fisher.test` is a test of association. It doesn't identify combinations.
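
To illustrate the point, here is a hedged sketch of what `fisher.test` does for a single pair of genes. The two 0/1 vectors are made-up values for 8 hypothetical clusters, not data from the post. The test takes one 2 x 2 table and returns a p-value for association in that one pair; it does not search for or list pairs.

```r
# Hypothetical 0/1 mutation status of two genes across 8 clusters
# (made-up values, for illustration only).
gene_a <- c(1, 1, 1, 1, 0, 0, 0, 0)
gene_b <- c(0, 0, 0, 0, 1, 1, 1, 1)

# 2 x 2 table of gene_a status against gene_b status, one count per cluster.
# factor(..., levels = 0:1) keeps both levels even if one never occurs.
pair_tab <- table(gene_a = factor(gene_a, levels = 0:1),
                  gene_b = factor(gene_b, levels = 0:1))

# One-sided test for negative association (odds ratio below 1),
# i.e. the two genes tending not to be mutated in the same cluster.
ft <- fisher.test(pair_tab, alternative = "less")
ft$p.value
```

Finding candidate pairs among 150 genes would mean running such a test for every pair, which also raises a multiple-testing question that the test itself does not address.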
Thanks a lot.

-Adrian

	[[alternative HTML version deleted]]
This is a plain text mailing list.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius
Alameda, CA, USA
Homework?

There is a no homework policy here.

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll
David's answer assumes a more complicated objective, but obviously we are both unclear as to what you want.  Are you trying to find out which clusters have a unique pattern of mutation? (probably all of them, with so few clusters and so many genes?)

For either objective, this is not a statistical test, but a problem of identification.  For the simpler question, create a data frame with each row being the 150 1s and 0s associated with each cluster, and use duplicated() to identify unique rows. (unique rows will return "FALSE")

Untested
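
A minimal sketch of the duplicated() suggestion, assuming the 5-gene example data from the original post in place of the real 150 genes. The names `wide` and `shared` are illustrative only. One caveat: duplicated() on its own marks only the second and later copies of a repeated row, so the sketch also scans from the end to flag every row whose pattern occurs more than once.

```r
# Rebuild the example data from the original post.
dat <- read.table(text = "Cluster Gene Mutated not_mutated
1 G1 1 0
1 G2 1 0
1 G3 0 1
1 G4 0 1
1 G5 1 0
2 G1 0 1
2 G2 1 0
2 G3 1 0
2 G4 0 0
2 G5 1 0", header = TRUE, stringsAsFactors = FALSE)

# One row per cluster, one 0/1 column per gene.
wide <- as.data.frame.matrix(xtabs(Mutated ~ Cluster + Gene, data = dat))

# TRUE for every cluster whose mutation pattern is shared with another cluster.
shared <- duplicated(wide) | duplicated(wide, fromLast = TRUE)

# Clusters with a unique pattern are the ones where shared is FALSE.
rownames(wide)[!shared]
```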

On Aug 2, 2014, at 11:41 AM, David Winsemius <dwinsemius at comcast.net> wrote:

On Aug 2, 2014, at 11:11 AM, Adrian Johnson wrote:

Don McKenzie
Research Ecologist
Pacific Wildland Fire Sciences Lab
US Forest Service

Affiliate Professor
School of Environmental and Forest Sciences
University of Washington
dmck at uw.edu


On Sat, Aug 2, 2014 at 1:11 PM, Adrian Johnson wrote:
Hi:

I am trying to identify mutually exclusive events from the following
example:

Cluster      Gene      Mutated    not-mutated
  1             G1             1              0
  1             G2             1              0
  1             G3             0              1
  1             G4             0              1
  1             G5             1              0
  2             G1             0              1
  2             G2             1              0
  2             G3             1              0
  2             G4             0              0
  2             G5             1              0

In cluster 1 :  G1, G2, G5 are mutated

In cluster 2:    G2, G3, G5 are mutated.

I am interested in finding such G2-G5 events and G1-G3 events.

In total I have 8 clusters and 150 genes (1200 rows x 4 columns).

What test would be appropriate to identify such pairs?

In my naive understanding, would a Fisher's exact test give such
combinations?

Thanks a lot.

-Adrian
I am having trouble visualizing your data. How about a sample? The
easy way is to do something like:

temp <- head(realData,10);
dput(temp);

Then cut'n'paste the output from the dput() into another email here.

But, assuming I have a bit of a grasp, you have four columns (example
only shows 3). If you have a set of columns which are 0 & 1 or FALSE
and TRUE, then you can create a "temp" column which encodes them
simply by considering them to be binary digits in a number. I.e.
tempColumn = 1 * column1 + 2 * column2 + 4*column3 + 8*column4. You
can then "group" the data by this value. All rows with the same value
are in the same "group". But I don't know what you want your output to
look like. As an aside, any value other than 0, 1, 2, 4, or 8 could be
considered invalid because it means that more than one column is TRUE,
which violates your constraint.
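
A minimal sketch of the binary-encoding idea, assuming the example data from the original post and treating its two 0/1 columns (Mutated, not_mutated) as the flag columns. The names `flags`, `code`, `groups` and `invalid` are illustrative only. Each flag column is weighted by a power of two, so rows with the same combination of flags get the same code.

```r
# Rebuild the example data from the original post.
dat <- read.table(text = "Cluster Gene Mutated not_mutated
1 G1 1 0
1 G2 1 0
1 G3 0 1
1 G4 0 1
1 G5 1 0
2 G1 0 1
2 G2 1 0
2 G3 1 0
2 G4 0 0
2 G5 1 0", header = TRUE, stringsAsFactors = FALSE)

# The 0/1 columns to encode, as a numeric matrix.
flags <- as.matrix(dat[, c("Mutated", "not_mutated")])

# Weight column k by 2^(k - 1): 1 * Mutated + 2 * not_mutated.
code <- as.vector(flags %*% 2^(seq_len(ncol(flags)) - 1))

# Group the rows by their code.
groups <- split(dat, code)

# Valid codes are 0 or a single power of two; anything else means
# more than one flag column is set in that row.
invalid <- !(code %in% c(0, 2^(seq_len(ncol(flags)) - 1)))
```

Here no row is flagged as invalid, but the row for cluster 2, gene G4 gets code 0 because both of its flag columns are 0.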
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! <><
John McKown