Burt table from word frequency list
5 messages · Joan-Josep Vallbé, Duncan Murdoch, (Ted Harding) +1 more
On 29/03/2009 7:02 AM, Joan-Josep Vallbé wrote:
Dear all, I have a word frequency list from a corpus (say, in .csv), where the first column is a word and the second is the occurrence frequency of that word in the corpus. Is it possible to obtain a Burt table (a table crossing all words with each other, i.e., where rows and columns are the words) from that frequency list with R? I'm exploring the "ca" package but I'm not able to solve this detail.
No, because you don't have any information on that. You only have marginal counts. You need counts of pairs of words (from the original corpus, or already summarized). Duncan Murdoch
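The sketch below illustrates Duncan's point. It is written in Python rather than R, purely for illustration, and it rests on two assumptions: the toy corpora are invented, and "same sentence" is used as the co-occurrence unit. The two corpora have identical word-frequency lists but different word-pair counts. The frequency list alone therefore cannot determine the crossed words-by-words table.

```python
from collections import Counter
from itertools import combinations

def word_frequencies(sentences):
    """Marginal counts: how often each word occurs in the whole corpus."""
    return Counter(w for s in sentences for w in s)

def sentence_pairs(sentences):
    """Joint counts: unordered pairs of distinct words that share a sentence."""
    pairs = Counter()
    for s in sentences:
        for a, b in combinations(sorted(set(s)), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical toy corpora: same words, same frequencies, different pairings.
corpus_a = [["red", "wine"], ["white", "bread"]]
corpus_b = [["red", "bread"], ["white", "wine"]]

print(word_frequencies(corpus_a) == word_frequencies(corpus_b))  # True
print(sentence_pairs(corpus_a) == sentence_pairs(corpus_b))      # False
```

Both corpora give every word a frequency of 1. Only the joint counts reveal that "red" goes with "wine" in one corpus and with "bread" in the other.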
Ok, thank you. And is there any function to get the table directly from the original corpus? best, joan-josep vallbé
On 29-Mar-09 16:32:11, Joan-Josep Vallbé wrote:
Ok, thank you. And is there any function to get the table directly from the original corpus? best, joan-josep vallbé
You will have to think about what you are doing. As Duncan said, you need "counts of pairs of words" or, more precisely, of co-occurrence. But co-occurrence within what? Adjacent? Within the same sentence? Within the same paragraph? Within the same chapter? Within the same document (if your corpus incorporates several documents)? Within documents by the same author? If so, then is there an additional classification by individual document? Etc., etc., etc. In short, what is the structure of your corpus, and how do you wish this to be represented in the Burt table? Hoping this helps to move you forward, Ted.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-------------------------------------------------------------------- E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk> Fax-to-email: +44 (0)870 094 0861 Date: 29-Mar-09 Time: 18:46:40 ------------------------------ XFMail ------------------------------
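Once a unit has been chosen, one way to build the words-by-words table is as the cross-product Z'Z of a unit-by-word incidence matrix Z. This is the same construction that gives a Burt table in multiple correspondence analysis; in R, crossprod(Z) computes t(Z) %*% Z. The following is a minimal Python sketch under several assumptions:

- units are sentences, split naively on "." and whitespace;
- the function name burt_table is hypothetical;
- only word presence is recorded.

A strict Burt table would also carry an "absent" category for each word. Keeping presence only matches the words-by-words table described in the original question.

```python
def burt_table(units):
    """Words-by-words table B = Z'Z for a list of units (each a list of tokens).

    Z is a unit-by-word 0/1 incidence matrix: Z[u][w] = 1 if word w occurs
    at least once in unit u. B[i][j] is then the number of units containing
    both word i and word j; the diagonal B[i][i] counts units containing word i.
    """
    vocab = sorted({w for unit in units for w in unit})
    index = {w: k for k, w in enumerate(vocab)}
    n = len(vocab)

    Z = []
    for unit in units:
        row = [0] * n
        for w in unit:
            row[index[w]] = 1  # presence only; repeats within a unit are not counted
        Z.append(row)

    # Cross-product of the incidence matrix with itself.
    B = [[sum(z[i] * z[j] for z in Z) for j in range(n)] for i in range(n)]
    return vocab, B

# Assumption: the unit is the sentence, split naively on "." and whitespace.
text = "the cat sat. the dog sat. a cat ran."
sentences = [s.split() for s in text.split(".") if s.strip()]
vocab, B = burt_table(sentences)
for word, row in zip(vocab, B):
    print(word, row)
```

Passing paragraphs or whole documents as the units gives a different table. This is Ted's point: the unit has to be fixed before the table is even defined.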
The usual approach is to count co-occurrences within a fixed number of words of each other. A typical window runs from 5 words before to 5 words after a given word. So for each word in the document, you look for occurrences of all other words at positions -5 to +5 relative to it (position 0 being the word itself). Depending on the language and the question being asked, certain words may be excluded. This is not a simple function! I don't know whether anyone has written a package for this type of analysis, but with over 2000 packages floating around you might get lucky.
Murray M Cooper, Ph.D. Richland Statistics 9800 N 24th St Richland, MI, USA 49083 Mail: richstat at earthlink.net
----- Original Message ----- From: "Ted Harding" <Ted.Harding at manchester.ac.uk> To: "Joan-Josep Vallbé" <Pep.Vallbe at uab.cat> Cc: <r-help at r-project.org> Sent: Sunday, March 29, 2009 2:46 PM Subject: Re: [R] Burt table from word frequency list
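Below is a minimal sketch of the windowed count Murray describes, again in Python for illustration only. It makes the following assumptions:

- the function names window_cooccurrence and to_table are hypothetical;
- the corpus is already tokenized into one list of words;
- the window defaults to the 5 words on either side that he mentions;
- the exclusion set stands in for whatever stop-word list suits the language and question.

```python
from collections import Counter

def window_cooccurrence(tokens, window=5, exclude=frozenset()):
    """Count ordered pairs (w, c) where c occurs within `window` positions of w.

    Excluded words (e.g. a hypothetical stop-word list) are dropped before
    positions are counted, so the window spans the remaining words.
    """
    toks = [t for t in tokens if t not in exclude]
    counts = Counter()
    for i, w in enumerate(toks):
        lo = max(0, i - window)
        hi = min(len(toks), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip position 0, the word itself
                counts[(w, toks[j])] += 1
    return counts

def to_table(counts):
    """Turn pair counts into a square words-by-words table (rows = columns = words)."""
    vocab = sorted({w for pair in counts for w in pair})
    return vocab, [[counts[(r, c)] for c in vocab] for r in vocab]

tokens = "the cat sat on the mat".split()
vocab, table = to_table(window_cooccurrence(tokens, window=1))
for word, row in zip(vocab, table):
    print(word, row)
```

Dropping excluded words before scanning means that a window of 5 covers the next five kept words. Keeping the original positions and skipping excluded words only when counting would be an equally defensible choice. Because the window is symmetric, the resulting table is symmetric as well.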