calculate the frequencies of the Amino Acids

9 messages · che, Jorge Ivan Velez, David Winsemius +1 more

che
#
Could someone please help me sort this out? I am trying to write R code
for calculating the frequencies of the amino acids in 9 different sequences.
I want the code to read the sequences from an external text file, and I used the
following code to do so:
x<-read.table("sequence.txt",header=FALSE)

Then I defined a vector of the 20 amino acids as follows:
AA<-c('A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y')
I am using the following code to calculate the frequencies:

frequency<-function(X)
{
y<-rep(0,20)
for(j in 1:nchar(as.character(x$V1[i]))){
for(i in 1:9){

	res<-which(AA==substr(x$V1[i],j,j))
	y[res]=y[res]+1
	}
	}
return(y)
}

But this code is not working: it reads only one sequence. I don't
know why the loop is not working for "i", which is supposed to run over the nine
rows of the file sequence.txt. The sequence.txt file is attached to this
message.

cheers 
http://n4.nabble.com/file/n997072/sequence.txt sequence.txt
#
On Jan 1, 2010, at 11:59 PM, che wrote:

            
# At this point you are referencing "i", but it is not yet being iterated and might not even exist.
# Did you mean "j"?
# It might also be safer to use seq_along().
# Is that really working for even one sequence? Without an "x" sequence I cannot test, but it "looks wrong".
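A minimal sketch of the loop-order point, not the poster's actual fix: the outer loop has to define "i" before nchar() uses it, and seq_along()/seq_len() avoid the 1:0 pitfall on empty input. The name `seqs`, the function name `count_all`, and the two toy sequences are made up for illustration and are not taken from sequence.txt. Like the original y<-rep(0,20), this pools the counts over all sequences.

```r
AA <- c('A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y')

# Hypothetical toy data standing in for as.character(x$V1)
seqs <- c("ACDA", "GGH")

count_all <- function(seqs, alphabet) {
  y <- rep(0, length(alphabet))
  for (i in seq_along(seqs)) {             # "i" is defined here ...
    for (j in seq_len(nchar(seqs[i]))) {   # ... before nchar() needs it
      res <- which(alphabet == substr(seqs[i], j, j))
      y[res] <- y[res] + 1                 # no-op if the letter is not in the alphabet
    }
  }
  names(y) <- alphabet
  y
}

total <- count_all(seqs, AA)
```

With data read by read.table(), wrapping the column in as.character() first is a safe habit, since it may have been read as a factor (the default before R 4.0.0).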
#
On Jan 2, 2010, at 12:26 AM, David Winsemius wrote:

            
Further thoughts: If I understand how you are doing this, that  
structure would only be large enough for one string's results.

(Seems like this process would be easier if you used table() instead  
of for-loops.)
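A small sketch of the table() idea, under the assumption that a fixed set of 20 columns is wanted: converting the letters to a factor with levels = AA makes table() report a zero for amino acids that do not occur. The string `s` is a made-up example, not one of the sequences from the attached file.

```r
AA <- c('A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y')

s <- "ACDAAG"                                 # hypothetical sequence
chars <- strsplit(s, "")[[1]]                 # split into single letters
counts <- table(factor(chars, levels = AA))   # always 20 entries, zeros included
```

Because every result then has the same length and order, the counts of several sequences can be stacked into a matrix without any name matching.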
che
#
I know it would be better to ask R to make the data, but I need to work with
this particular file, because it is data for some amino acids and I can't
change it. So I need R to go through the sequences one by one and
give me the count of each letter in each sequence. I am quite
confused about using "i" and "j", and about how to iterate over both of them and make
them work together. I attached sequence.txt to my original
message, and I will attach it here again just in case. Thanks for your help.
http://n4.nabble.com/file/n997087/sequence.txt sequence.txt
#
On Jan 2, 2010, at 12:55 AM, che wrote:

            
Sorry. I did not read to the very end. My apologies; hopefully the following
one-liner will make up for my dereliction of attention.
After copy-pasting the sequences from a browser window to a character
object, "seqnc", I then processed it:

 > seqlines <- readLines(textConnection(seqnc))

# Then for the first sequence:

 > table(strsplit(seqlines[1], vector())  )

  A  D  E  F  G  I  K  L  M  N  P  Q  R  S  T  V  W  Y
21 25 28 27 24 34 39 31 11 20 16 10 17 25 22 33  3 15

# For "mass production": the names that resulted from my first effort were a bit
# unwieldy (> 200 characters long), so I unnamed it:

unname( sapply(seqlines, function(x) table(strsplit(x, vector() ) ) )  )

[[1]]

  A  D  E  F  G  I  K  L  M  N  P  Q  R  S  T  V  W  Y
21 25 28 27 24 34 39 31 11 20 16 10 17 25 22 33  3 15

[[2]]

  A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
34  5 15 25  6 35  7 24 23 32  9 12 15 10 17 14 13 36  2 13

[[3]]

  A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
33  5 17 24  7 36  7 24 24 32  9 13 14  9 17 12 14 36  2 12

[[4]]

  A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
33  5 16 25  5 35  6 24 23 33  8 12 15  9 17 17 12 35  2 15

[[5]]

  A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
33  4 15  6 21 30  3 19 23 22  8  8  8 14 17 14 12 24  5 12

[[6]]

  A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
30  3 13  4 16 22  2 17 16 17  6  6  7 11 15 11 12 18  3 11

[[7]]

  A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
39  5 21  8 22 39  2 23 29 25 10  8  7 13 22 14 21 25  7 16

[[8]]

  A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
34  4 17  6 19 30  2 20 24 21  8  7  7 12 17 14 16 21  5 14

[[9]]

  A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
35  4 17  6 18 31  3 20 23 21  8  7  7 12 18 12 17 21  5 13

[[10]]

A
5
che
#
Thanks very much, the code is working perfectly. But I hope you guys can
help me do the same thing using a loop structure; I want to know
whether I am doing it right. I want to use loops to scan each sequence
from the file sequence.txt (the file is attached) and get the frequency of
each amino acid. I wrote the following code so far and then stopped, as I got
confused, especially since I am a complete beginner in R
http://n4.nabble.com/file/n997581/sequence.txt sequence.txt :
x<-read.table("sequence.txt",header=FALSE)
AA<-c('A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y')

test<-nchar(as.character(x$V1[i]))
frequency<-function(X)
{
y<-rep(0,20)
for(j in 1:test){
for(i in 1:nrow(x)){
	res<-which(AA==substr(x$V1[i],j,j))
	y[res]=y[res]+1
	}
	}
return(y)
}
So how do I fix this code? How do I bring the "i" and the "j" to life in
order to start the indexing? Sorry for bothering you guys.
#
On Jan 3, 2010, at 12:28 AM, che wrote:

            
I earlier pointed out that such a structure would be inadequate to  
hold the tabulation of more than one sequence. You probably need a  
matrix of "width" = 20 and "depth" = the number of your sequences.
... and here you will need to index y[ , ] with both the proper row  
and column.
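One way the matrix suggestion could look, as a sketch only: one row per sequence, one column per amino acid, with y indexed by both the sequence number and the matched column. The toy vector `seqs` is hypothetical; with the real file it would be something like as.character(x$V1).

```r
AA <- c('A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y')

# Hypothetical toy data standing in for the nine sequences
seqs <- c("ACDA", "GGH")

# "width" = 20 amino acids, "depth" = number of sequences
y <- matrix(0, nrow = length(seqs), ncol = length(AA),
            dimnames = list(NULL, AA))

for (i in seq_along(seqs)) {               # rows: one per sequence
  for (j in seq_len(nchar(seqs[i]))) {     # positions within sequence i
    res <- which(AA == substr(seqs[i], j, j))
    y[i, res] <- y[i, res] + 1             # row = sequence, column = amino acid
  }
}
```

Each row of y then holds the counts for one sequence, and rowSums(y) should equal the sequence lengths as long as every letter is in AA.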