motif search - R-help | R Mailing Lists

Thu, Dec 11, 2008 8:37 AM

Dear Alessia,

You may try this:

#
# Load the seqinr package:
#
   library(seqinr)
#
# An example FASTA file that ships with seqinr; it contains
# the complete genome sequence of Chlamydia trachomatis:
#
   fastafile <- system.file("sequences/ct.fasta", package = "seqinr")
#
# Import the sequence as a single string of characters
# (read.fasta returns a list, so take its first element):
#
   myseq <- read.fasta(fastafile, as.string = TRUE)[[1]]
   nchar(myseq) # 1042519, that is, a sequence of about 1 Mb
#
# Look for motif "atatatat", with possible overlap:
#
   words.pos("atatatat", myseq, extended = TRUE)
#
# This returns the positions where the motif is found, that
# is: 236501 236503 283987 687083 792792 792794
#
   substr(myseq, 236501, 236501 + 8)
#
# Should be (9 characters: the 8-base motif plus the next base)
# [1] "atatatata"
#
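
If you want to see how a search "with possible overlap" can work without seqinr, here is a minimal base-R sketch. It is only an illustration under my own assumptions, not how words.pos is implemented: find_motif is a hypothetical helper name, the toy sequence is made up, and the motif is assumed to contain no regular-expression metacharacters.

```r
# Hypothetical helper (not part of seqinr): start positions of every
# occurrence of `motif` in `text`, overlapping occurrences included.
# Assumes `motif` has no regex metacharacters (plain letters only).
find_motif <- function(motif, text) {
  # A zero-width lookahead matches at each start position without
  # consuming characters, so overlapping hits are all reported.
  hits <- gregexpr(paste0("(?=", motif, ")"), text, perl = TRUE)[[1]]
  if (hits[1] == -1) integer(0) else as.integer(hits)
}

# Toy sequence, for illustration only
toy <- "ggatatatatatcc"
find_motif("atatatat", toy)
# [1] 3 5
```

The two hits are two bases apart, the same pattern as 236501 and 236503 above: a run of "at" repeats longer than the motif contains it more than once.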

HTH,

Jean

Jean R. Lobry            (lobry at biomserv.univ-lyon1.fr)
Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - LYON I,
43 Bd 11/11/1918, F-69622 VILLEURBANNE CEDEX, FRANCE
allo  : +33 472 43 27 56     fax    : +33 472 43 13 88
http://pbil.univ-lyon1.fr/members/lobry/