Burt table from word frequency list

The usual approach is to count co-occurrences within a fixed number of words
of each other. A typical window runs from 5 words before to 5 words after a
given word. So for each word in the document, you look for occurrences of all
other words at offsets -5 -4 -3 -2 -1 1 2 3 4 5 (offset 0 is the word itself).
Depending on the language and the question being asked, certain words may be
excluded.
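
As a rough illustration of the windowed count described above, here is a
minimal R sketch. The function name cooccur_window and its arguments (tokens,
window, stopwords) are made up for this example and do not come from any
package. It also assumes excluded words are dropped before the window is
applied, so distances are measured over the remaining words; that is one
possible choice, not the only one.

```r
# Sketch: symmetric co-occurrence counts within +/- `window` words.
# All names here are illustrative, not part of an existing package.
cooccur_window <- function(tokens, window = 5, stopwords = character(0)) {
  tokens <- tokens[!tokens %in% stopwords]   # assumption: exclude words first
  vocab  <- sort(unique(tokens))
  counts <- matrix(0, length(vocab), length(vocab),
                   dimnames = list(vocab, vocab))
  n <- length(tokens)
  for (i in seq_len(n)) {
    lo <- max(1, i - window)
    hi <- min(n, i + window)
    for (j in setdiff(lo:hi, i)) {           # skip offset 0 (the word itself)
      counts[tokens[i], tokens[j]] <- counts[tokens[i], tokens[j]] + 1
    }
  }
  counts
}

# Toy input, invented for the example.
text   <- "the cat sat on the mat and the dog sat on the rug"
tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
tokens <- tokens[nzchar(tokens)]
m <- cooccur_window(tokens, window = 2, stopwords = c("the", "on", "and"))
```

The double loop is slow on a large corpus; it is written this way only to
keep the window logic visible.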

This is not a simple function! I don't know if anyone has written a package
for this type of analysis, but with over 2000 packages floating around you
might get lucky.

Murray M Cooper, Ph.D.
Richland Statistics
9800 N 24th St
Richland, MI, USA 49083
Mail: richstat at earthlink.net

----- Original Message ----- 
From: "Ted Harding" <Ted.Harding at manchester.ac.uk>
To: "Joan-Josep Vallbé" <Pep.Vallbe at uab.cat>
Cc: <r-help at r-project.org>
Sent: Sunday, March 29, 2009 2:46 PM
Subject: Re: [R] Burt table from word frequency list
On 29-Mar-09 16:32:11, Joan-Josep Vallbé wrote:
You will have to think about what you are doing. As Duncan said,
you need "counts of pairs of words" or, more precisely, of
co-occurrence. But co-occurrence within what?

Adjacent?
Within the same sentence?
Within the same paragraph?
Within the same chapter?
Within the same document (if your corpus incorporates several
  documents)?
Within documents by the same author?
  If so, then is there an additional classification by
  individual document?

Etc., etc., etc.

In short, what is the structure of your corpus, and how do
you wish this to be represented in the Burt table?
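
To make the choice of unit concrete, here is a small R sketch that treats each
element of a list as one unit (a sentence, paragraph or document, whichever is
chosen) and counts how many units contain both words of a pair. The data and
the names docs, Z and B are invented for illustration. The result is a
symmetric cross-product of a presence indicator matrix, in the spirit of a
Burt table; whether it is the table needed here depends on the answers to the
questions above.

```r
# Sketch: co-occurrence counted per unit rather than per word window.
# `docs` is made-up data; each element is the word list of one unit.
docs <- list(
  d1 = c("cat", "sat", "mat"),
  d2 = c("dog", "sat", "rug"),
  d3 = c("cat", "dog")
)
vocab <- sort(unique(unlist(docs)))

# Indicator matrix: one row per unit, one column per word, 1 if present.
Z <- do.call(rbind, lapply(docs, function(d) as.integer(vocab %in% d)))
colnames(Z) <- vocab

# B[a, b] = number of units containing both a and b;
# the diagonal holds the number of units containing each word.
B <- crossprod(Z)
```

Changing what goes into each element of docs (sentences instead of documents,
say) changes the table without changing the code.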

Hoping this helps to move you forward,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 29-Mar-09                                       Time: 18:46:40
------------------------------ XFMail ------------------------------

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.