hello, this is my script:
#1) read in data:
daten<-read.table('K:/Analysen/STRUCTURE/input_STRUCTURE_tab_excl_5_282_559.txt',
header=TRUE, sep="\t")
daten<-as.matrix(daten)
#2) create empty matrix:
indxind<-matrix(nrow=617, ncol=617)
indxind[1:20,1:19]
#3) compare cells to each other, score:
for (s in 3:34) {            #walks through the matrix column by column, starting at column 3
  for (z1 in 1:617) {        #for each current column, take one row (z1)...
    for (z2 in 1:617) {      #...and compare it to another row (z2) of the current column
      if (z1!=z2) {
        topf<-indxind[z1,z2]
        #actually, 2 rows make up 1 individual, therefore i compare 2 rows with another 2 rows
        if (daten[2*z1-1,s]==daten[2*z2-1,s]) topf<-topf+1
        if (daten[2*z1-1,s]==daten[2*z2,s]) topf<-topf+1
        if (daten[2*z1,s]==daten[2*z2-1,s]) topf<-topf+1
        if (daten[2*z1,s]==daten[2*z2,s]) topf<-topf+1
        indxind[z1,z2]<-topf
        indxind[z2,z1]<-topf
      }
      #print(c(s,z1,z2,indxind[1,2])) ##counts s, z1 and z2 properly, but gives NA for indxind[1,2]
    }
    #indxind[1:5,1:5] #empty matrix
  }
  #indxind[1:5,1:5] #empty matrix
}
#4) check:
indxind[1:5,1:5]
this gives no errors, but my matrix indxind remains empty (only NAs), though all columns and rows are counted properly. R needs quite a while to get through all this (there are probably smarter and faster ways to calculate this, but i am not too deep into R and bioinformatics, and i need to calculate this only once). could the 3 for-loops already be too computationally intense for adding matrix operations?
any help would be much appreciated!
thx, frido
--
View this message in context: http://r.789695.n4.nabble.com/help-please-matrix-operations-inside-3-nested-loops-tp4639592.html
Sent from the R help mailing list archive at Nabble.com.
help, please! matrix operations inside 3 nested loops
9 messages · R. Michael Weylandt, Berend Hasselman, Fridolin +1 more
On Wed, Aug 8, 2012 at 9:06 AM, Fridolin <smells_like_rock at gmx.net> wrote:
> [original post quoted in full; snipped]
Hi Frido,

I'm afraid I get a little lost in your code, but I'd be willing to bet we can cut the loops out entirely and speed things up. Can you give us a "big picture" description of the algorithm you're implementing as well as (if it's not too hard) a small reproducible example [1]?

Note also that most of us don't use Nabble so you'll need to explicitly quote any relevant context.

Thanks,
Michael

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
Hi

> hello, this is my script:
> #1) read in data:
> daten<-read.table('K:/Analysen/STRUCTURE/input_STRUCTURE_tab_excl_5_282_559.txt', header=TRUE, sep="\t")
> daten<-as.matrix(daten)

If there is any column with non-numeric values, as.matrix() will convert all numeric values from the daten data frame to character values.
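The coercion described above can be checked on a toy example. This is only a sketch: `df_mixed` and `df_num` are made-up data frames, not part of the data in this thread.

```r
# Hypothetical toy data: one numeric column plus one character column
df_mixed <- data.frame(marker = c(305, 318), label = c("a", "b"),
                       stringsAsFactors = FALSE)
m_mixed <- as.matrix(df_mixed)
mixed_is_character <- is.character(m_mixed)   # the numbers became strings too

# Hypothetical toy data: numeric columns only
df_num <- data.frame(marker1 = c(305, 318), marker2 = c(222, 231))
m_num <- as.matrix(df_num)
num_is_numeric <- is.numeric(m_num)           # stays numeric

print(c(mixed_is_character = mixed_is_character,
        num_is_numeric = num_is_numeric))
```

Checking `str(daten)` or `is.numeric(daten)` right after the conversion is a cheap way to catch this before running long loops.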
> #2) create empty matrix:
> indxind<-matrix(nrow=617, ncol=617)
> [...]
> if (z1!=z2) {topf<-indxind[z1,z2]
> if (daten[2*z1-1,s]==daten[2*z2-1,s]) topf<-topf+1
> [...]
> indxind[z1,z2]<-topf
> indxind[z2,z1]<-topf
> }

The above code is rather clumsy, and it is difficult to understand what it is supposed to do without extensive study. AFAIU you first set topf to NA and then try to add 1 to topf. The result is again NA regardless of your sophisticated z construction. Therefore you are just computing NA in each cycle, so you cannot expect any result other than NA.
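The NA behaviour described above can be reproduced in a few lines. A minimal sketch; the variable names other than `topf` are illustrative only.

```r
# A counter that starts as NA never becomes a number
topf <- NA
topf <- topf + 1
topf_is_na <- is.na(topf)              # TRUE: NA + 1 is NA

# The same propagation happens in sums unless NAs are dropped explicitly
total_with_na    <- sum(c(1, NA, 1))
total_without_na <- sum(c(1, NA, 1), na.rm = TRUE)

print(list(topf_is_na = topf_is_na,
           total_with_na = total_with_na,
           total_without_na = total_without_na))
```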
> this results no errors, but my matrix indxind remains empty (only NAs).
> [...]
> could the 3 for-loops already be too computationally intense for adding matrix operations?

What is this? Please try to set up a small example showing what you have and what you want to achieve. Unless you can explain better what you want, you probably will not get a better answer. I, however, may be proven wrong, as some clever people on this list are far better at mind reading than I am :-)

Regards
Petr
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Fridolin wrote
> #2) create empty matrix:
> indxind<-matrix(nrow=617, ncol=617)
> indxind[1:20,1:19]

You should at least initialize indxind to 0 with

indxind<-matrix(0,nrow=617, ncol=617)

because the default for matrix is to use NA for data.

Berend
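The difference between the two initializations can be seen on a small matrix. A sketch with made-up names (`empty`, `zeroed`) and a 3 x 3 size chosen only for illustration.

```r
empty  <- matrix(nrow = 3, ncol = 3)      # no data argument: every cell is NA
zeroed <- matrix(0, nrow = 3, ncol = 3)   # explicit starting value of 0

all_na_before <- all(is.na(empty))
empty_type    <- typeof(empty)            # the default NA is a logical NA

# Incrementing a cell only works when it starts from a number
empty[1, 2]  <- empty[1, 2] + 1
zeroed[1, 2] <- zeroed[1, 2] + 1

print(empty)
print(zeroed)
```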
thank you for your help.
my input data looks like this (tab separated):
Ind.nr.  Pop.nr.  scm266  rms1280  scm247  rms1107
1        101      305     318      222     135
1        101      305     318      231     135
2        101      305     313      999     96
2        101      305     321      999     130
3        101      305     324      231     135
3        101      305     324      231     135
4        101      305     313      230     126
4        101      305     313      230     135
6        101      305     313      231     135
6        101      305     321      231     135
it is a dataset with genetic marker alleles for single individuals. the first row is the header, all following rows are individuals. 2 rows count for 1 individual. the first column is the individual's number, the second column is the number of the population the individual comes from, and all following columns are different genetic markers.
what i want to do with this data in R is to compare each individual with each of the other individuals, allele-wise. there are five possibilities: the two compared individuals share 4, 3, 2, 1 or 0 alleles of the currently examined marker (= column). for each shared allele this pair of individuals shall get 1 scoring point. for each pair of individuals, all scoring points shall be summed over all markers.
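The scoring rule described above could also be written without explicit loops over individuals. The following is only a sketch, not code from the thread: it retypes the ten example rows, assumes that the two rows of an individual are always adjacent, and uses outer() to run the four allele comparisons for all pairs at once. The names `a1`, `a2` and `score` are made up for this example.

```r
# Toy data retyped from the example rows above
daten <- data.frame(
  Ind.nr. = c(1, 1, 2, 2, 3, 3, 4, 4, 6, 6),
  Pop.nr. = 101,
  scm266  = 305,
  rms1280 = c(318, 318, 313, 321, 324, 324, 313, 313, 313, 321),
  scm247  = c(222, 231, 999, 999, 231, 231, 230, 230, 231, 231),
  rms1107 = c(135, 135, 96, 130, 135, 135, 126, 135, 135, 135)
)

n_ind <- nrow(daten) / 2
# Assumption: odd rows hold the first allele, even rows the second allele
a1 <- unname(as.matrix(daten[seq(1, nrow(daten), by = 2), -(1:2)]))
a2 <- unname(as.matrix(daten[seq(2, nrow(daten), by = 2), -(1:2)]))

score <- matrix(0, nrow = n_ind, ncol = n_ind)
for (m in seq_len(ncol(a1))) {          # one pass per marker column
  score <- score +
    outer(a1[, m], a1[, m], "==") +     # allele 1 vs allele 1
    outer(a1[, m], a2[, m], "==") +     # allele 1 vs allele 2
    outer(a2[, m], a1[, m], "==") +     # allele 2 vs allele 1
    outer(a2[, m], a2[, m], "==")       # allele 2 vs allele 2
}
diag(score) <- 0                        # an individual is not compared with itself

print(score)
```

Each pair is scored once per marker here, so a pair can gain at most 4 points per marker. Rows and columns of `score` follow the order of individuals in the file, not the values in `Ind.nr.`.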
my code again, modified according to your suggestions:
#1) read in data:
daten<-read.table('K:/Analysen/STRUCTURE/test.txt', header=TRUE, sep="\t")
daten<-as.data.frame(daten)
#2) create empty matrix:
indxind<-matrix(0,nrow=617, ncol=617)
indxind[1:20,1:19]
#3) compare cells to each other, score:
#for the whole dataset: s in 3:34, z1 in 1:617, z2 in 1:617
for (s in 3:6) {           #walks through the matrix column by column, starting at column 3
  for (z1 in 1:6) {        #for each current column, take one row (z1)...
    for (z2 in 1:6) {      #...and compare it to another row (z2) of the current column
      if (z1!=z2) {
        topf<-indxind[z1,z2]
        #actually, 2 rows make up 1 individual, therefore i compare 2 rows with another 2 rows
        if (daten[2*z1-1,s]==daten[2*z2-1,s]) topf<-topf+1
        if (daten[2*z1-1,s]==daten[2*z2,s]) topf<-topf+1
        if (daten[2*z1,s]==daten[2*z2-1,s]) topf<-topf+1
        if (daten[2*z1,s]==daten[2*z2,s]) topf<-topf+1
        indxind[z1,z2]<-topf
        indxind[z2,z1]<-topf
      }
      #print(c(s,z1,z2,indxind[1,2])) ##counts s, z1 and z2 properly, but always gives 8 for indxind[1,2]
    }
    #indxind[1:5,1:5] #empty matrix
  }
  #indxind[1:5,1:5] #empty matrix
}
#4) check:
indxind[1:5,1:5]
@ Michael Weylandt: i've done my best with regard to the "big picture" of my algorithm and the small reproducible example. i hope both are sufficient.
@ Petr Pikal-3: in this case, there are only numerical values, but it's a useful hint for my other code.
@ Petr Pikal-3 and Berend Hasselman: initializing indxind with 0's instead of NAs helps, it fills something into indxind now. but it does the calculation only for the first marker (column 3), afterwards i get an error:
Fehler in if (daten[2 * z1 - 1, s] == daten[2 * z2 - 1, s]) topf <- topf + :
  Fehlender Wert, wo TRUE/FALSE nötig ist
in English:
Error in if (daten[2 * z1 - 1, s] == daten[2 * z2 - 1, s]) topf <- topf + :
  missing value where TRUE/FALSE needed
does this have something to do with changing to daten<-as.data.frame(daten) in line 3 (instead of as.matrix before)?
SORRY!!!! it should be:

for (s in 3:6) {           #walks through the matrix column by column, starting at column 3
  for (z1 in 1:5) {        #for each current column, take one row (z1)...
    for (z2 in 1:5) {      #...and compare it to another row (z2) of the current column

error is gone now.... SORRY!!!
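The error and its fix are consistent with the loop bounds rather than with the switch to a data frame: the test file has 10 rows (5 individuals), so z1 = 6 asks for rows 11 and 12. A minimal sketch of that situation, using a made-up two-column stand-in for `daten`:

```r
# Hypothetical stand-in for the test data: 10 rows = 5 individuals
daten <- data.frame(Ind.nr. = rep(1:5, each = 2), scm266 = 305)

z1 <- 6
row_wanted <- 2 * z1 - 1                  # 11, but nrow(daten) is 10
value <- daten[row_wanted, "scm266"]      # a data frame returns NA for a missing row
value_is_na <- is.na(value)

# Using that NA as an if() condition is what raises the error
msg <- tryCatch(
  if (value == daten[1, "scm266"]) "equal" else "different",
  error = function(e) conditionMessage(e)
)
print(msg)

# isTRUE() is one way to make such a comparison safe against NA
safe_result <- isTRUE(value == daten[1, "scm266"])
print(safe_result)
```

A matrix behaves differently for the same index: `as.matrix(daten)[11, 2]` stops with a "subscript out of bounds" error instead of returning NA.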
all problems solved. thank you for your help!
for the sake of completeness, here my solution:
#1) read in data:
daten<-read.table('K:/Analysen/STRUCTURE/test.txt', header=TRUE, sep="\t")
daten<-as.data.frame(daten)
#2) create empty matrix:
indxind<-matrix(0,nrow=617, ncol=617)
#indxind[1:20,1:19]
#3) compare cells to each other, score:
#for the whole dataset: s in 3:34, z1 in 1:617, z2 in 1:617
z1<-1 #running variable for rows in daten
z2<-1 #running variable for rows in daten
l1<-1 #running variable for rows in indxind
l2<-1 #running variable for columns in indxind
for (s in 3:6) {            #walks through the matrix column by column, starting at column 3
  while (z1<11) {           #for each current column, take one row (z1)...
    while (z2<11) {         #...and compare it to another row (z2) of the current column
      if (z1!=z2) {
        topf<-indxind[l1,l2]
        #actually, 2 rows make up 1 individual, therefore i compare 2 rows with another 2 rows
        if (daten[z1,s]==daten[z2,s]) topf<-topf+1
        if (daten[z1,s]==daten[z2+1,s]) topf<-topf+1
        if (daten[z1+1,s]==daten[z2,s]) topf<-topf+1
        if (daten[z1+1,s]==daten[z2+1,s]) topf<-topf+1
        indxind[l1,l2]<-topf
      }
      z2<-z2+2
      l2<-l2+1
    }
    z2<-1
    l2<-1
    z1<-z1+2
    l1<-l1+1
  }
  z1<-1
  l1<-1
}
#4) check:
indxind[1:5,1:5]
Hi

> thank you for your help.
> my input data looks like this (tab separated):
> Ind.nr. Pop.nr. scm266 rms1280 scm247 rms1107
> 1 101 305 318 222 135
> [...]

Better to use dput(your.data) for sharing data. Anyway, I am still confused, but you are probably able to clarify things further.
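What dput() buys the reader can be shown with a round trip. A minimal sketch using a made-up four-row data frame; `test_small`, `txt` and `rebuilt` are illustrative names.

```r
# Hypothetical cut-down version of the example data
test_small <- data.frame(Ind.nr. = c(1L, 1L, 2L, 2L),
                         scm247  = c(222L, 231L, 999L, 999L))

# This is the text one would paste into a message
txt <- capture.output(dput(test_small))
cat(txt, sep = "\n")

# A reader can turn the pasted text straight back into the same object
rebuilt <- eval(parse(text = txt))
round_trip_ok <- identical(rebuilt, test_small)
print(round_trip_ok)
```

Unlike a pasted table, the dput() text keeps column types and names exactly, so nobody has to guess separators or headers.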
> it is a dataset with genetic marker alleles for single individuals. the first row is the header, all following rows are individuals. 2 rows count for 1 individual. first colum is the individual's number, second colum is the number for the population the individual comes from, and all following colums are different genetic markers.

In those 2 rows for one individual the genetic marker sometimes differs:

test[1:2, "scm247"]
[1] 222 231

What do you want to do with them?

> what i want to do with this data in R, is to compare one individual with each of the other individuals, allele-wise. there are five possibilities: the two compared individuals share 4,3,2,1,0 alleles of the currently examined marker (=colum). for each shared allele this pair of individuals shall get 1 scoring point. for each pair of individuals, all scoring points shall be summarized over all markers.

Based on your example,
dput(test)
structure(list(Ind.nr. = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 6L,
6L), Pop.nr. = c(101L, 101L, 101L, 101L, 101L, 101L, 101L, 101L,
101L, 101L), scm266 = c(305L, 305L, 305L, 305L, 305L, 305L, 305L,
305L, 305L, 305L), rms1280 = c(318L, 318L, 313L, 321L, 324L,
324L, 313L, 313L, 313L, 321L), scm247 = c(222L, 231L, 999L, 999L,
231L, 231L, 230L, 230L, 231L, 231L), rms1107 = c(135L, 135L,
96L, 130L, 135L, 135L, 126L, 135L, 135L, 135L)), .Names = c("Ind.nr.",
"Pop.nr.", "scm266", "rms1280", "scm247", "rms1107"), class =
"data.frame", row.names = c(NA,
-10L))
what is your desired result?
Regards
Petr
Hi

> all problems solved. thank you for your help!
> for the sake of completeness, here my solution:
> #1) read in data:
> daten<-read.table('K:/Analysen/STRUCTURE/test.txt', header=TRUE, sep="\t")
> daten<-as.data.frame(daten)
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^ not needed, daten is already a data frame
> [rest of the script quoted unchanged; snipped]

I believe that the above cycles can be simplified, maybe by changing your daten to a three-dimensional array or by some clever **ply construction, but if your loops work it is probably not worth the effort.

Regards
Petr
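One possible reading of the three-dimensional array idea is sketched below. It is not code from the thread: it retypes the example rows, assumes the two rows of each individual are adjacent, and uses Vectorize() (a wrapper around mapply) as the "ply" step. The names `arr`, `pair_score` and `score` are made up.

```r
# Toy data retyped from the example rows posted earlier in the thread
daten <- data.frame(
  Ind.nr. = c(1, 1, 2, 2, 3, 3, 4, 4, 6, 6),
  Pop.nr. = 101,
  scm266  = 305,
  rms1280 = c(318, 318, 313, 321, 324, 324, 313, 313, 313, 321),
  scm247  = c(222, 231, 999, 999, 231, 231, 230, 230, 231, 231),
  rms1107 = c(135, 135, 96, 130, 135, 135, 126, 135, 135, 135)
)

alleles <- as.matrix(daten[, -(1:2)])     # drop Ind.nr. and Pop.nr.
n_ind   <- nrow(alleles) / 2
n_mark  <- ncol(alleles)

# arr[a, i, m] is allele a (1 or 2) of individual i at marker m
arr <- array(alleles, dim = c(2, n_ind, n_mark))

# Score for one pair: four allele comparisons, summed over all markers
pair_score <- function(i, j) {
  if (i == j) return(0)
  total <- 0
  for (a in 1:2) {
    for (b in 1:2) {
      total <- total + sum(arr[a, i, ] == arr[b, j, ])
    }
  }
  total
}

score <- outer(seq_len(n_ind), seq_len(n_ind), Vectorize(pair_score))
print(score)
```

This still evaluates every pair one at a time, so it is mainly a readability change; for a one-off run on 617 individuals the posted loops may be just as practical.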