calculate the frequencies of the amino acids
On Jan 2, 2010, at 12:55 AM, che wrote:
I know it would be better to ask R to generate the data, but I need to process this particular file, because it holds data for some amino acids that I can't change. So I need R to go through the sequences one by one and give me the count of each letter in each sequence. I am quite confused about using "i" and "j", and about how to iterate over both of them so that the loops work together. I attached sequence.txt to my original message, and I attach it again here just in case. Thanks for your help. http://n4.nabble.com/file/n997087/sequence.txt sequence.txt
Sorry. I did not read to the very end. My apologies, hopefully the following oneliner will make up for my dereliction of attention.
che wrote:
May someone please help me sort this out? I am trying to write R code to calculate the frequencies of the amino acids in 9 different sequences. I want the code to read the sequences from an external text file, and I used the following code to do so:
x<-read.table("sequence.txt",header=FALSE)
Then I defined a vector of the 20 amino acids as follows:
AA<-c('A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y')
I am using the following code to calculate the frequencies:
After copy-pasting the sequences from a browser window into a character object, "seqnc", I then processed it:

> seqlines <- readLines(textConnection(seqnc))

# Then for the first sequence:
> table(strsplit(seqlines[1], vector()) )

 A  D  E  F  G  I  K  L  M  N  P  Q  R  S  T  V  W  Y
21 25 28 27 24 34 39 31 11 20 16 10 17 25 22 33  3 15

# For "mass production":
The names that resulted from my first effort were a bit unwieldy (more than 200 characters long), so I unnamed it:

> unname( sapply(seqlines, function(x) table(strsplit(x, vector() ) ) ) )
[[1]]

 A  D  E  F  G  I  K  L  M  N  P  Q  R  S  T  V  W  Y
21 25 28 27 24 34 39 31 11 20 16 10 17 25 22 33  3 15

[[2]]

 A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
34  5 15 25  6 35  7 24 23 32  9 12 15 10 17 14 13 36  2 13

[[3]]

 A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
33  5 17 24  7 36  7 24 24 32  9 13 14  9 17 12 14 36  2 12

[[4]]

 A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
33  5 16 25  5 35  6 24 23 33  8 12 15  9 17 17 12 35  2 15

[[5]]

 A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
33  4 15  6 21 30  3 19 23 22  8  8  8 14 17 14 12 24  5 12

[[6]]

 A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
30  3 13  4 16 22  2 17 16 17  6  6  7 11 15 11 12 18  3 11

[[7]]

 A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
39  5 21  8 22 39  2 23 29 25 10  8  7 13 22 14 21 25  7 16

[[8]]

 A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
34  4 17  6 19 30  2 20 24 21  8  7  7 12 17 14 16 21  5 14

[[9]]

 A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
35  4 17  6 18 31  3 20 23 21  8  7  7 12 18 12 17 21  5 13

[[10]]

A
5
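Note that table() only reports letters that actually occur, which is why the first sequence above yields 18 columns while the others yield 20. If a fixed set of 20 columns is wanted, one possible approach is to convert the split characters to a factor whose levels are the AA vector from the original post. This is only a minimal sketch: residue_table is a hypothetical helper name, and the toy strings are invented for illustration, not taken from sequence.txt.

```r
# The 20 amino-acid letters, as defined in the original post.
AA <- c('A','C','D','E','F','G','H','I','K','L',
        'M','N','P','Q','R','S','T','V','W','Y')

# residue_table() is a hypothetical helper, not part of any package.
# seqs: character vector (or factor), one sequence per element.
# Returns a matrix: one row per sequence, one column per letter in `alphabet`.
residue_table <- function(seqs, alphabet = AA) {
  chars <- strsplit(as.character(seqs), "")  # split into single characters
  # factor() with explicit levels keeps zero counts for absent letters;
  # characters outside `alphabet` become NA and are not counted by table().
  t(sapply(chars, function(ch) table(factor(ch, levels = alphabet))))
}

# Toy input, invented for illustration only.
toy <- c("ACDA", "WYY")
m <- residue_table(toy)
print(m)
```

With the real data this would presumably be called as residue_table(seqlines). Dividing each row by its total, for example m / rowSums(m), would turn the counts into relative frequencies.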
David.
>>
>> frequency<-function(X)
>> {
>>   y<-rep(0,20)
>>   for(j in 1:nchar(as.character(x$V1[i]))){
>>     for(i in 1:9){
>>       res<-which(AA==substr(x$V1[i],j,j))
>>       y[res]=y[res]+1
>>     }
>>   }
>>   return(y)
>> }
>>
>> But this code is not working: it reads only one sequence. I don't
>> know why the loop over "i", which is supposed to read the nine rows
>> of sequence.txt, is not working. The sequence.txt file is attached
>> to this message.
>>
>> cheers
>> http://n4.nabble.com/file/n997072/sequence.txt sequence.txt
>>
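On the loop itself, three things in the quoted function look like probable causes. The outer loop uses "i" before the inner loop has defined it, so the length comes from whatever "i" happens to exist in the workspace. A single vector y is shared by all nine rows, so only one set of counts can come back. The argument X is never used, since the body refers to the global x. A minimal sketch of a corrected loop follows, assuming one count row per sequence is wanted; count_residues is a hypothetical name, and the toy input is invented for illustration.

```r
# The 20 amino-acid letters, as defined in the original post.
AA <- c('A','C','D','E','F','G','H','I','K','L',
        'M','N','P','Q','R','S','T','V','W','Y')

# count_residues() is a hypothetical name, not from the thread.
# seqs: character vector (or factor), one sequence per element.
# Returns an integer matrix: one row per sequence, one column per letter.
count_residues <- function(seqs, alphabet = AA) {
  seqs <- as.character(seqs)  # read.table() may have produced a factor
  counts <- matrix(0L, nrow = length(seqs), ncol = length(alphabet),
                   dimnames = list(NULL, alphabet))
  for (i in seq_along(seqs)) {            # outer loop: sequences (rows)
    for (j in seq_len(nchar(seqs[i]))) {  # inner loop: positions in sequence i
      res <- match(substr(seqs[i], j, j), alphabet)
      # Characters that are not in `alphabet` are skipped.
      if (!is.na(res)) counts[i, res] <- counts[i, res] + 1L
    }
  }
  counts
}

# Toy input, invented for illustration only ("X" is deliberately not in AA).
toy <- c("ACDA", "WYYX")
print(count_residues(toy))
```

With the data frame from the original post this would presumably be called as count_residues(x$V1). Using seq_len() and seq_along() instead of 1:n avoids the loop running backwards over c(1, 0) when a sequence is empty.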