Hi,
I'm new to R and trying to do some simple analysis. I have a data set with
about 88000 transactions and I want to perform a simple support count
analysis of an itemset which is, say, not a complete transaction but a subset
of a transaction.
For example,
{A,B,D} is a transaction and I want to find the support of {A,B} even though it
never occurs as only A,B in the entire set.
To do this I needed to create a new itemsets class and then use the support
function, but somehow the answers never seem to tally.
Thanks in advance
Srinivas
--
View this message in context: http://r.789695.n4.nabble.com/Support-Counting-tp3424730p3424730.html
Sent from the R help mailing list archive at Nabble.com.
Support Counting
4 messages · Petr Savicky, psombe
On Mon, Apr 04, 2011 at 01:11:37AM -0500, psombe wrote:
Hi,
I'm new to R and trying to do some simple analysis. I have a data set with
about 88000 transactions and I want to perform a simple support count
analysis of an itemset which is, say, not a complete transaction but a subset
of a transaction.
For example,
{A,B,D} is a transaction and I want to find the support of {A,B} even though it
never occurs as only A,B in the entire set.
To do this I needed to create a new itemsets class and then use the support
function, but somehow the answers never seem to tally.
Hi.
The answer depends on the representation of the data set. Can you
describe the representation?
A possible representation of a data set for itemsets counting is a matrix
of 0/1. Using this representation, computing the support may be done
as follows.
db <- matrix(0, nrow=5, ncol=5, dimnames=list(NULL, LETTERS[1:5]))
db[1, c("A", "B", "D")] <- 1
db[2, c("A", "B")] <- 1
db[3, c("A", "D", "E")] <- 1
db[4, c("B", "C", "D")] <- 1
db[5, c("A", "B", "C")] <- 1
db
     A B C D E
[1,] 1 1 0 1 0
[2,] 1 1 0 0 0
[3,] 1 0 0 1 1
[4,] 0 1 1 1 0
[5,] 1 1 1 0 0
itemset <- c("A", "B")
# for each transaction, whether it contains c("A", "B")
rowSums(db[, itemset]) == length(itemset)
[1]  TRUE  TRUE FALSE FALSE  TRUE
# the number of transactions containing c("A", "B")
sum(rowSums(db[, itemset]) == length(itemset))
[1] 3
Hope this helps.
Petr Savicky.
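The 0/1 matrix approach above can be wrapped in a small function. This is only a sketch: the name support_count is hypothetical and does not come from the thread, and it assumes the data fit in a dense matrix with item names as column names. Adding drop = FALSE keeps the subset a matrix, so the same code also works for an itemset with a single item.

```r
# Hypothetical helper (not from the thread): support of an itemset
# in a 0/1 transaction matrix whose columns are named after the items.
support_count <- function(db, itemset) {
  # drop = FALSE keeps a matrix even when itemset has only one item
  hits <- rowSums(db[, itemset, drop = FALSE]) == length(itemset)
  c(absolute = sum(hits), relative = mean(hits))
}

# Same toy data as in the reply above
db <- matrix(0, nrow = 5, ncol = 5, dimnames = list(NULL, LETTERS[1:5]))
db[1, c("A", "B", "D")] <- 1
db[2, c("A", "B")] <- 1
db[3, c("A", "D", "E")] <- 1
db[4, c("B", "C", "D")] <- 1
db[5, c("A", "B", "C")] <- 1

support_count(db, c("A", "B"))  # 3 of the 5 transactions contain both A and B
support_count(db, "E")          # single-item itemset, 1 of 5 transactions
```

With 88000 transactions and 16000 items a dense matrix of this kind would be large, so this is probably better suited to small data or to a subset of the columns.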
1 day later
Well, I'm using the "arules" package and I'm trying to use the support command. My data is read from a file using the "read.transactions" command, and a line of data looks something like this. There are about 88000 rows and 16000 different items.
inspect(dset[3])
items
1 {33,
34,
35}
inspect(dset[1])
items
1 {0, 1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21, 22, 23, 24,
25, 26, 27, 28, 29, 3, 4, 5, 6, 7, 8, 9}
So in order to use support I have to make an object of class "itemsets", and
I'm kind of struggling with the "new" command.
I made an object of class itemsets by first creating a presence/absence
matrix, and with something like 16000 items this is really rather tedious. I
wonder if there is a better way.
# Currently I'm doing this
avec = array(dim=400)  # dim goes up to the largest item number I'm concerned with
avec[1:400] = 0
avec[27] = 1
avec[63] = 1  # and so on for all the items I want
amat = matrix(data = avec, ncol = 400)
aset = as(amat, "transactions")  # coerce the matrix to the transactions class
Then, say my data is "dat", I can use
support(aset, dat)
[1] 0.001406470
There has to be a better way.
Thanks once again
On Tue, Apr 05, 2011 at 08:43:34AM -0500, psombe wrote:
Well, I'm using the "arules" package and I'm trying to use the support command.
Hi. R-help can provide help for some of the frequently used CRAN packages,
but not for all; there are too many of them. It is not clear whether anyone
on R-help uses "arules". One of my students is using Eclat for association
rules directly, but not from R. I am using R, but not for association rules.
Try to determine whether your question is indeed specific to "arules". If
the question can be formulated without "arules", it has a good chance of
being answered here. Otherwise, send a query to the package maintainer.
Package maintainers usually welcome feedback.
My data is read from a file using the "read.transactions" command, and a line of data looks something like this. There are about 88000 rows and 16000 different items.
inspect(dset[3])
items
1 {33,
34,
35}
inspect(dset[1])
items
1 {0, 1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21, 22, 23, 24,
25, 26, 27, 28, 29, 3, 4, 5, 6, 7, 8, 9}
So in order to use support I have to make an object of class "itemsets", and
I'm kind of struggling with the "new" command.
I made an object of class itemsets by first creating a presence/absence
matrix, and with something like 16000 items this is really rather tedious. I
wonder if there is a better way.
# Currently I'm doing this
avec = array(dim=400)  # dim goes up to the largest item number I'm concerned with
avec[1:400] = 0
avec[27] = 1
avec[63] = 1  # and so on for all the items I want
amat = matrix(data = avec, ncol = 400)
Up to here, this may be simplified if the required indices
are stored in a vector, say, "indices". For example:
indices <- c(3, 5, 6, 10)
avec <- array(0, dim=14)
avec[indices] <- 1
amat <- rbind(avec)
or
amat <- matrix(0, nrow=1, ncol=14)
amat[1, indices] <- 1
amat
# output shown for the rbind() version; the matrix() version labels the row [1,]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
avec    0    0    1    0    1    1    0    0    0     1     0     0     0     0
Hope this helps.
Petr Savicky.
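For readers who stay within "arules", the hand-built presence/absence matrix discussed above can probably be avoided. The following is a sketch, not something proposed in the thread: it assumes the installed version of arules provides encode(), itemLabels() and a support() method that accepts an itemMatrix, and it uses a small made-up transaction list in place of the real 88000-row data.

```r
library(arules)

# Toy transactions standing in for the real data (hypothetical example data)
trans_list <- list(c("A", "B", "D"), c("A", "B"), c("A", "D", "E"),
                   c("B", "C", "D"), c("A", "B", "C"))
dat <- as(trans_list, "transactions")

# Encode the itemset of interest against the item labels of the data,
# so that no 0/1 vector has to be filled in by hand
query <- encode(list(c("A", "B")), itemLabels = itemLabels(dat))

support(query, dat)                     # relative support of {A,B}
support(query, dat, type = "absolute")  # number of transactions containing {A,B}
```

For data read with read.transactions the items would be referred to by their labels as strings, for example c("27", "63"). The inspect() output earlier in the thread lists the items as 0, 1, 10, 11, ..., which suggests the labels are ordered as text, so item "27" is not necessarily the 27th column of a presence/absence matrix. Whether that explains the mismatch reported in the first message is only a guess.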