grouping followed by finding frequent patterns in R - R-help

Sun, Mar 10, 2013 7:55 AM

1. Please cc the list, as I have here, unless your comments are off topic.

2. Use dput() (?dput) to include **small** amounts of data in your
message, as attachments are generally stripped from r-help.

3. I have no experience with itemsets or the arules package, but a
quick glance at the docs there said that your data argument must be in
a specific form coercible into an S4 "transactions" class. I suspect
that neither your initial data frame nor the list deriving from split
is, but maybe someone familiar with the package can tell you for sure.
That's why you need to cc to the list.
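
Points 2 and 3 can be illustrated with a minimal sketch. It assumes a small made-up data frame `d` with the CIN and TRN_TYP columns from the original post; the values are illustrative only, not the poster's data.

```r
# Hypothetical sample in the shape described in the original post
d <- data.frame(CIN     = c(9079954, 9079954, 9079954, 12441087, 12441087),
                TRN_TYP = c(1, 2, 2, 19, 14))

# dput() prints R code that recreates the object;
# that text can be pasted straight into a message to the list
dput(d)

# split() returns a plain named list, not an S4 "transactions" object
baskets <- split(d$TRN_TYP, d$CIN)
class(baskets)   # "list"
isS4(baskets)    # FALSE
```

Whether such a list can be coerced to "transactions" depends on what it contains, which is worth checking with str(baskets) before handing it to the package.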

-- Bert

On Sun, Mar 10, 2013 at 7:04 AM, Dhiman Biswas <crazydhimu at gmail.com> wrote:

Dear Bert,

My intention is to mine frequent itemsets of TRN_TYP for individual CIN out
of that data.
But the problem is using eclat after splitting gives the following error:

Error in eclat(list) : internal error in trio library

PS: I have attached my dataset.


On Sat, Mar 9, 2013 at 8:27 PM, Bert Gunter <gunter.berton at gene.com> wrote:

I **suggest** that you explain what you wish to accomplish using a
reproducible example rather than telling us what packages you think
you should use. I believe you are making things too complicated; e.g.
what do you mean by "frequent patterns"?  Moreover, "basket format" is
rather unclear -- and may well be unnecessary. But using lists, it
could be simply accomplished by

?split  ## as in
the_list <- with(yourdata, split(TRN_TYP, CIN))

or possibly

the_list <- with(yourdata, tapply(TRN_TYP, CIN, FUN = table))

Of course, these may be irrelevant and useless, but without knowing
your purpose ...?
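
To show what the two suggestions return, here is a small sketch using the column names from the original post (CIN and TRN_TYP). The data frame `yourdata` and its values are made up for illustration.

```r
# Hypothetical data: two CIN values, with one repeated TRN_TYP for the first
yourdata <- data.frame(CIN     = c(101, 101, 101, 202, 202),
                       TRN_TYP = c(4, 5, 4, 7, 9))

# split(): one vector of TRN_TYP values per CIN, repeats kept
the_list <- with(yourdata, split(TRN_TYP, CIN))
the_list[["101"]]

# tapply() with table(): one frequency table of TRN_TYP per CIN
counts <- with(yourdata, tapply(TRN_TYP, CIN, FUN = table))
counts[["101"]]
```

The first form keeps every occurrence in order, which is the "basket" shape; the second collapses each group to counts per TRN_TYP.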

-- Bert

On Sat, Mar 9, 2013 at 4:37 AM, Dhiman Biswas <crazydhimu at gmail.com> wrote:

I have data in the following form:
CIN TRN_TYP
9079954    1
9079954    2
9079954    3
9079954    4
9079954    5
9079954    4
9079954    5
9079954    6
9079954    7
9079954    8
9079954    9
9079954    9
.                    .
.                    .
.                    .
There are 100 distinct CIN values (9079954, 12441087, 15246633, ...), each
with its respective TRN_TYP values.

First of all, I want this data grouped into basket format:
9079954   1, 2, 3, 4, 5, ....
12441087  19, 14, 21, 3, 7, ...
.
.
.
and then apply eclat from the arules package to find frequent patterns.

1) I ran the following code:
file<-read.csv("D:/R/Practice/Data_Input_NUM.csv")
file <- file[!duplicated(file),]
eclat(split(file$TRN_TYP,file$CIN))

but it gave me the following error:
Error in asMethod(object) : can not coerce list with transactions with
duplicated items
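
One likely reason for this error: `!duplicated(file)` only drops rows that are identical across all columns, so rows that share CIN and TRN_TYP but differ elsewhere survive, and the baskets still contain repeated items. A minimal sketch with made-up data (the `OTHER` column is hypothetical and stands in for the file's remaining columns):

```r
# Hypothetical data: the first two rows share CIN and TRN_TYP but differ in OTHER
file <- data.frame(CIN     = c(9079954, 9079954, 9079954, 12441087),
                   TRN_TYP = c(4, 4, 5, 19),
                   OTHER   = c("a", "b", "c", "d"))

# Whole-row de-duplication removes nothing here
file <- file[!duplicated(file), ]
nrow(file)   # 4

# De-duplicate within each basket instead
baskets <- lapply(split(file$TRN_TYP, file$CIN), unique)
baskets
```

After this step no basket holds the same TRN_TYP twice, which is what the error message complains about.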

2) I ran this code:
file<-read.csv("D:/R/Practice/Data_Input_NUM.csv")
# Data_Input_NUM has many other columns, so select only CIN and TRN_TYP
file_new<-file[,c(3,6)]
file_new <- file_new[!duplicated(file_new),]
eclat(split(file_new$TRN_TYP,file_new$CIN))

but again:
Error in eclat(split(file_new$TRN_TYP, file_new$CIN)) :
  internal error in trio library
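
The thread does not establish what triggers this internal error. One way to narrow it down is to build the transactions object explicitly and inspect it before calling eclat. The sketch below uses made-up data and treats two things as assumptions: that items are better passed as character labels than as numbers, and that a support of 0.5 suits the toy data. The arules part only runs if the package is installed.

```r
# Hypothetical two-column data in the shape of file_new
file_new <- data.frame(CIN     = c(9079954, 9079954, 9079954, 12441087, 12441087),
                       TRN_TYP = c(1, 2, 2, 2, 19))
file_new <- file_new[!duplicated(file_new), ]

# Items as character labels (an assumption, see above), one basket per CIN
baskets <- split(as.character(file_new$TRN_TYP), file_new$CIN)
str(baskets)

if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  trans <- as(baskets, "transactions")   # explicit coercion, so it can be inspected
  print(summary(trans))
  itemsets <- eclat(trans, parameter = list(support = 0.5))
  inspect(itemsets)
}
```

If the coercion step itself fails, the problem lies in the shape of the list rather than in eclat.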

PLEASE HELP


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Bert Gunter
Genentech Nonclinical Biostatistics
