
help, please! matrix operations inside 3 nested loops

9 messages · R. Michael Weylandt, Berend Hasselman, Fridolin +1 more

#
hello, this is my script:

#1) read in data:
daten<-read.table('K:/Analysen/STRUCTURE/input_STRUCTURE_tab_excl_5_282_559.txt',
header=TRUE, sep="\t")
daten<-as.matrix(daten)

#2) create empty matrix:
indxind<-matrix(nrow=617, ncol=617) 
indxind[1:20,1:19]

#3) compare cells to each other, score:
for (s in 3:34) {   #walks through the matrix column by column, starting at column 3
  for (z1 in 1:617) {  #for each current column, take one row (z1)...
    for (z2 in 1:617) {  #...and compare it to another row (z2) of the current column
      if (z1!=z2) {topf<-indxind[z1,z2]
                   if (daten[2*z1-1,s]==daten[2*z2-1,s]) topf<-topf+1  #actually, 2 rows make up 1 individual,
                   if (daten[2*z1-1,s]==daten[2*z2,s]) topf<-topf+1     #therefore i compare 2 rows
                   if (daten[2*z1,s]==daten[2*z2-1,s]) topf<-topf+1     #with another 2 rows
                   if (daten[2*z1,s]==daten[2*z2,s]) topf<-topf+1
                   indxind[z1,z2]<-topf
                   indxind[z2,z1]<-topf
                  }
      #print(c(s,z1,z2,indxind[1,2])) ##counts s, z1 and z2 properly, but gives NA for indxind[1,2]
      }
    #indxind[1:5,1:5] #empty matrix
  }
  #indxind[1:5,1:5] #empty matrix
  }

#4) check:
indxind[1:5,1:5]

this results in no errors, but my matrix indxind remains empty (only NAs),
though all columns and rows are counted properly. R needs quite a while to
get through all this (there are probably smarter and faster ways to
calculate this, but i am not too deep into R and bioinformatics, and i need
to calculate this only once). could the 3 for-loops already be too
computationally intensive for adding matrix operations?

any help would be much appreciated!

thx, frido



--
View this message in context: http://r.789695.n4.nabble.com/help-please-matrix-operations-inside-3-nested-loops-tp4639592.html
Sent from the R help mailing list archive at Nabble.com.
#
On Wed, Aug 8, 2012 at 9:06 AM, Fridolin <smells_like_rock at gmx.net> wrote:
Hi Frido,

I'm afraid I get a little lost in your code, but I'd be willing to bet
we can cut the loops out entirely and speed things up.

Can you give us a "big picture" description of the algorithm you're
implementing as well as (if it's not too hard) a small reproducible
example [1]?

Note also that most of us don't use Nabble so you'll need to
explicitly quote any relevant context.

Thanks,
Michael

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
#
Hi
Regarding daten<-as.matrix(daten): if there is any column with nonnumeric
values, as.matrix() will convert all numeric values from the daten
data.frame to character values.

The above code is rather clumsy and it is difficult to understand what it
should do without extensive study. AFAIU you first set topf to NA and then
try to add 1 to topf. The result is again NA regardless of your
sophisticated z construction. Therefore you are just computing NA in each
cycle, so you cannot expect any result other than NA.

What is this? Please try to set up a small example showing what you have and
what you want to achieve. Unless you can explain better what you want, you
probably will not get a better answer.

I, however, may be proven wrong, as some clever people on this list are far
better at mind reading than I am :-)

Regards
Petr
http://www.R-project.org/posting-guide.html
#
Fridolin wrote
You should at least initialize indxind to 0 with

indxind<-matrix(0,nrow=617, ncol=617) 

because the default for matrix is to use NA for data.

Berend
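
A minimal sketch of the point made in the replies above, using a small 2x2 matrix instead of the 617x617 one (the names m_na and m_zero are made up for illustration): matrix() without a data argument fills every cell with NA, and NA + 1 stays NA, whereas a zero-filled matrix can be incremented.

```r
# matrix() with no data argument fills every cell with NA
m_na <- matrix(nrow = 2, ncol = 2)
# passing 0 as the data argument fills every cell with 0
m_zero <- matrix(0, nrow = 2, ncol = 2)

# arithmetic on NA yields NA, so a counter that starts at NA never changes
na_result <- m_na[1, 2] + 1
# a counter that starts at 0 can be incremented
zero_result <- m_zero[1, 2] + 1

print(na_result)    # NA
print(zero_result)  # 1
```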




#
thank you for your help.

my input data looks like this (tab separated):

Ind.nr.	Pop.nr.	scm266	rms1280	scm247	rms1107
1	101	305	318	222	135
1	101	305	318	231	135
2	101	305	313	999	96
2	101	305	321	999	130
3	101	305	324	231	135
3	101	305	324	231	135
4	101	305	313	230	126
4	101	305	313	230	135
6	101	305	313	231	135
6	101	305	321	231	135

it is a dataset with genetic marker alleles for single individuals.
the first row is the header, all following rows are individuals. 2 rows
count for 1 individual.
the first column is the individual's number, the second column is the number
of the population the individual comes from, and all following columns are
different genetic markers.

what i want to do with this data in R is to compare each individual with
each of the other individuals, allele-wise. there are five possibilities:
the two compared individuals share 4, 3, 2, 1 or 0 alleles of the currently
examined marker (=column). for each shared allele this pair of individuals
gets 1 scoring point. for each pair of individuals, the scoring points
are summed over all markers.
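
The scoring rule described above can be sketched for a single marker and a single pair of individuals. The helper name shared_alleles is hypothetical and not part of the original script; it simply counts how many of the four allele-to-allele comparisons match, using values taken from the sample data.

```r
# Count matching allele comparisons between two individuals at one marker.
# a and b are length-2 vectors holding the two alleles of each individual.
# (shared_alleles is an illustrative name, not part of the original script.)
shared_alleles <- function(a, b) {
  # outer() builds the 2x2 table of all pairwise comparisons; sum() counts TRUEs
  sum(outer(a, b, "=="))
}

# marker scm247, individuals 1 and 3 from the sample data
s13 <- shared_alleles(c(222, 231), c(231, 231))
# marker rms1280, individuals 1 and 2
s12 <- shared_alleles(c(318, 318), c(313, 321))
# marker scm266, any two individuals (all alleles are 305)
s_all <- shared_alleles(c(305, 305), c(305, 305))

print(c(s13, s12, s_all))  # 2 0 4
```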


my code again, modified according to your suggestions:

#1) read in data:
daten<-read.table('K:/Analysen/STRUCTURE/test.txt', header=TRUE, sep="\t")
daten<-as.data.frame(daten)

#2) create empty matrix:
indxind<-matrix(0,nrow=617, ncol=617) 
indxind[1:20,1:19]

#3) compare cells to each other, score:
#for the whole dataset: s in 3:34, z1 in 1:617, z2 in 1:617
for (s in 3:6) {   #walks through the matrix column by column, starting at column 3
  for (z1 in 1:6) {  #for each current column, take one row (z1)...
    for (z2 in 1:6) {  #...and compare it to another row (z2) of the current column
      if (z1!=z2) {topf<-indxind[z1,z2]
                   if (daten[2*z1-1,s]==daten[2*z2-1,s]) topf<-topf+1  #actually, 2 rows make up 1 individual,
                   if (daten[2*z1-1,s]==daten[2*z2,s]) topf<-topf+1     #therefore i compare 2 rows
                   if (daten[2*z1,s]==daten[2*z2-1,s]) topf<-topf+1     #with another 2 rows
                   if (daten[2*z1,s]==daten[2*z2,s]) topf<-topf+1
                   indxind[z1,z2]<-topf
                   indxind[z2,z1]<-topf
      }
      #print(c(s,z1,z2,indxind[1,2])) ##counts s, z1 and z2 properly, but always gives 8 for indxind[1,2]
    }
    #indxind[1:5,1:5] #empty matrix
  }
  #indxind[1:5,1:5] #empty matrix
}

#4) check:
indxind[1:5,1:5]



@ Michael Weylandt: i've done my best with regard to the "big picture" of my
algorithm and the small reproducible example. i hope both are sufficient.
@ Petr Pikal-3: in this case, there are only numerical values, but it's a
useful hint for my other code.
@ Petr Pikal-3 and Berend Hasselman: initializing indxind with 0's instead
of NAs helps, it fills something into indxind now. but it does the
calculation only for the first marker (column 3), afterwards i get an error:
Fehler in if (daten[2 * z1 - 1, s] == daten[2 * z2 - 1, s]) topf <- topf + :
  Fehlender Wert, wo TRUE/FALSE nötig ist
Error in if (daten[2 * z1 - 1, s] == daten[2 * z2 - 1, s]) topf <- topf + :
  Missing value where TRUE/FALSE is required
Does this have something to do with changing to daten<-as.data.frame(daten)
in line 3 (instead of as.matrix before)?
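
One possible cause of this kind of error, offered as an assumption since the thread does not show the actual fix: with the 10-row sample data, for (z1 in 1:6) lets 2*z1 reach 12, and indexing a data frame past its last row returns NA, which if() cannot evaluate. A minimal sketch with a made-up data frame df:

```r
# a made-up 10-row data frame standing in for the sample data
df <- data.frame(
  id     = rep(1:5, each = 2),
  marker = c(318, 318, 313, 321, 324, 324, 313, 313, 313, 321)
)

# row 11 does not exist; data frame indexing returns NA instead of failing
beyond <- df[11, "marker"]
print(beyond)    # NA

# if (NA) stops with an error, so isTRUE() is one way to guard a comparison
is_match <- isTRUE(df[1, "marker"] == beyond)
print(is_match)  # FALSE

# letting the loop bound follow the data avoids the out-of-range index
n_ind <- nrow(df) / 2
print(n_ind)     # 5
```

With a matrix instead of a data frame, the same out-of-range index stops with a "subscript out of bounds" error rather than returning NA, which may be why the message changed after switching to as.data.frame.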



#
SORRY!!!! it should be:


Fridolin wrote
error is gone now.... SORRY!!!



#
all problems solved. thank you for your help!
for the sake of completeness, here is my solution:
#1) read in data:
daten<-read.table('K:/Analysen/STRUCTURE/test.txt', header=TRUE, sep="\t")
daten<-as.data.frame(daten)

#2) create empty matrix:
indxind<-matrix(0,nrow=617, ncol=617) 
#indxind[1:20,1:19]

#3) compare cells to each other, score:
#for the whole dataset: s in 3:34, z1 in 1:617, z2 in 1:617
z1<-1 #running variable for rows in daten
z2<-1 #running variable for rows in daten
l1<-1 #running variable for rows in indxind
l2<-1 #running variable for columns in indxind
for (s in 3:6) {   #walks through the matrix column by column, starting at column 3
  while (z1<11) {  #for each current column, take one row (z1)...
    while (z2<11) {  #...and compare it to another row (z2) of the current column
      if (z1!=z2) {
        topf<-indxind[l1,l2]
        if (daten[z1,s]==daten[z2,s]) topf<-topf+1      #actually, 2 rows make up 1 individual,
        if (daten[z1,s]==daten[z2+1,s]) topf<-topf+1    #therefore i compare 2 rows
        if (daten[z1+1,s]==daten[z2,s]) topf<-topf+1    #with another 2 rows
        if (daten[z1+1,s]==daten[z2+1,s]) topf<-topf+1
        indxind[l1,l2]<-topf
      }
      z2<-z2+2
      l2<-l2+1
    }
    z2<-1
    l2<-1
    z1<-z1+2
    l1<-l1+1
  }
  z1<-1
  l1<-1
}

#4) check:
indxind[1:5,1:5]



#
Hi
Better to use dput(your.data) for sharing data. Anyway I am still confused 
but you probably are able to clarify things further.
In those 2 rows for one individual sometimes the genetic marker differs
[1] 222 231

What do you want to do with them?
Based on your example,
structure(list(Ind.nr. = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 6L, 
6L), Pop.nr. = c(101L, 101L, 101L, 101L, 101L, 101L, 101L, 101L, 
101L, 101L), scm266 = c(305L, 305L, 305L, 305L, 305L, 305L, 305L, 
305L, 305L, 305L), rms1280 = c(318L, 318L, 313L, 321L, 324L, 
324L, 313L, 313L, 313L, 321L), scm247 = c(222L, 231L, 999L, 999L, 
231L, 231L, 230L, 230L, 231L, 231L), rms1107 = c(135L, 135L, 
96L, 130L, 135L, 135L, 126L, 135L, 135L, 135L)), .Names = c("Ind.nr.", 
"Pop.nr.", "scm266", "rms1280", "scm247", "rms1107"), class = 
"data.frame", row.names = c(NA, 
-10L))

what is your desired result?

Regards
Petr
#
Hi
daten<-as.data.frame(daten)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
not needed, daten is already a data frame
I believe that the above loops can be simplified, maybe by changing your
daten to a three-dimensional array or some clever **ply construction, but if
your loops work it is probably not worth the effort.

Regards
Petr
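
One way the three-dimensional array idea could look, as a hedged sketch rather than the approach actually used in this thread. It rebuilds the sample data inline, reshapes the marker columns into an array indexed as alleles[allele, individual, marker], and scores each pair of individuals once. The names alleles, pair_score and score are made up for illustration.

```r
# sample data from the thread (only the marker columns matter for the scoring)
daten <- data.frame(
  Ind.nr. = rep(c(1, 2, 3, 4, 6), each = 2),
  Pop.nr. = 101,
  scm266  = 305,
  rms1280 = c(318, 318, 313, 321, 324, 324, 313, 313, 313, 321),
  scm247  = c(222, 231, 999, 999, 231, 231, 230, 230, 231, 231),
  rms1107 = c(135, 135, 96, 130, 135, 135, 126, 135, 135, 135)
)

markers <- as.matrix(daten[, -(1:2)])   # drop the two id columns
n_ind <- nrow(markers) / 2              # 2 rows per individual

# alleles[k, i, m] is allele k (1 or 2) of individual i at marker m
alleles <- array(markers, dim = c(2, n_ind, ncol(markers)))

# total matching allele comparisons between individuals i and j over all markers
pair_score <- function(i, j) {
  sum(alleles[1, i, ] == alleles[1, j, ]) +
    sum(alleles[1, i, ] == alleles[2, j, ]) +
    sum(alleles[2, i, ] == alleles[1, j, ]) +
    sum(alleles[2, i, ] == alleles[2, j, ])
}

pairs <- combn(n_ind, 2)                # each unordered pair exactly once
score <- matrix(0, n_ind, n_ind)
score[t(pairs)] <- apply(pairs, 2, function(p) pair_score(p[1], p[2]))
score <- score + t(score)               # mirror into the lower triangle

print(score)
```

For the full dataset only the data-reading step would change; n_ind and the number of markers are derived from the data, so no loop bounds need to be hard-coded.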