I have a matrix of claims for year 1 that I get simply by

    claims <- read.csv(file = "Claims.csv")
    qq1 <- claims[claims$Year == "Y1", ]

I have MemberID and ProviderID for every claim in qq1; both are integers.

An example of the type of question that I want to answer is: how many times does ProviderID 345 appear together with MemberID 23 in the table qq1?

In order to answer these questions for every possible ProviderID and every possible MemberID, I would like to have a matrix whose first column is MemberID, with every MemberID in qq1 appearing only once, and whose other columns hold the number of appearances of ProviderID == i for every i that has sum(qq1$ProviderID == i) > 0.

My question is whether there is a simple way to do it in R.

Thanks in advance,
Uri

--
View this message in context: http://r.789695.n4.nabble.com/How-to-build-a-matrix-of-number-of-appearance-tp3643248p3643248.html
Sent from the R help mailing list archive at Nabble.com.
How to build a matrix of number of appearance?
6 messages · UriB, David Winsemius, jim holtman
On Jul 4, 2011, at 5:48 AM, UriB wrote:
I have a matrix of claims for year 1 that I get simply by

    claims <- read.csv(file = "Claims.csv")
    qq1 <- claims[claims$Year == "Y1", ]

I have MemberID and ProviderID for every claim in qq1; both are integers.

An example of the type of question that I want to answer is: how many times does ProviderID 345 appear together with MemberID 23 in the table qq1?

In order to answer these questions for every possible ProviderID and every possible MemberID, I would like to have a matrix whose first column is MemberID, with every MemberID in qq1 appearing only once, and whose other columns hold the number of appearances of ProviderID == i for every i that has sum(qq1$ProviderID == i) > 0.

My question is whether there is a simple way to do it in R.
A really quick way of finding this would be:

    as.data.frame(xtabs(~ ProviderID + MemberID, data = qq1))
David Winsemius, MD
West Hartford, CT
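For readers following along, here is a minimal sketch of how the xtabs() suggestion relates to the matrix layout asked for in the question. The toy qq1 below is made up for illustration (the real Claims.csv is not shown in the thread), and the names wide, wide_df and long are only placeholders.

```r
# Toy stand-in for qq1 (hypothetical values, not from the thread)
qq1 <- data.frame(
  MemberID   = c(23L, 23L, 23L, 41L, 41L, 57L),
  ProviderID = c(345L, 345L, 12L, 345L, 12L, 12L)
)

# Wide form: one row per MemberID, one column per ProviderID that occurs
wide <- xtabs(~ MemberID + ProviderID, data = qq1)

# Look up a single pair by its labels (dimnames are character strings)
n_23_345 <- wide["23", "345"]

# Same counts as a data frame with MemberID as the first column
wide_df <- data.frame(
  MemberID = as.integer(rownames(wide)),
  as.data.frame.matrix(wide),
  check.names = FALSE
)

# Long form, as in the reply above: one row per (ProviderID, MemberID) pair
long <- as.data.frame(xtabs(~ ProviderID + MemberID, data = qq1))
```

The wide table holds a cell for every MemberID/ProviderID combination, including the zero counts, so its size grows with the product of the two sets of IDs.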
Here is another way:
> xx <- data.frame(P = sample(5, 100, TRUE), M = sample(5, 100, TRUE), id = 1:100)
> require(data.table)
> xx <- data.table(xx)  # convert to data.table
> count <- xx[
+     , list(count = length(id))
+     , by = list(M, P)
+ ]
> str(count)
Classes 'data.table' and 'data.frame': 24 obs. of 3 variables:
 $ M    : int 1 1 1 1 1 2 2 2 2 2 ...
 $ P    : int 1 2 3 4 5 1 2 3 4 5 ...
 $ count: int 5 4 3 2 9 3 3 6 3 7 ...
> count
 M P count
 1 1     5
 1 2     4
 1 3     3
 1 4     2
 1 5     9
 2 1     3
 2 2     3
 2 3     6
 2 4     3
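For anyone without data.table installed, a rough base R equivalent of the grouped count above can be sketched with aggregate(). This is a swapped-in technique, not the code from the reply; the seed is arbitrary and only makes the toy data repeatable.

```r
set.seed(1)  # arbitrary seed, only to make the toy data repeatable
xx <- data.frame(P = sample(5, 100, TRUE), M = sample(5, 100, TRUE), id = 1:100)

# Count rows per observed (M, P) pair; pairs that never occur get no row
count <- aggregate(id ~ M + P, data = xx, FUN = length)

# Rename the counted column and sort to match the layout shown above
names(count)[names(count) == "id"] <- "count"
count <- count[order(count$M, count$P), ]
```

Like the data.table version, this long layout stores only the combinations that actually occur, which matters when most MemberID/ProviderID pairs have no claims.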
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Thanks for your reply.

Note that I guess there are many ProviderIDs, and I get the error "cannot allocate vector of size 2.1 Gb" (I can use the same trick for most of the other fields).

Is there a way to do the same only for ProviderIDs with relatively high frequency?

--
View this message in context: http://r.789695.n4.nabble.com/How-to-build-a-matrix-of-number-of-appearance-tp3643248p3645550.html
Sent from the R help mailing list archive at Nabble.com.
On Jul 5, 2011, at 5:45 AM, UriB wrote:
Thanks for your reply. Note that I guess there are many ProviderIDs, and I get the error "cannot allocate vector of size 2.1 Gb"
What code?
(I can use the same trick for most of the other fields). Is there a way to do the same only for ProviderIDs with relatively high frequency?
You are posting to a mailing list from a non-official web mirror/interface. Those of us using this list with mail clients cannot tell who you are responding to and what code is throwing an error without opening up a browser and following the link below (and, speaking from prior failed efforts at figuring out context on Nabble, maybe not even then). Get with the program. Read the Posting Guide. As the sign says:
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
If you persist in posting to R-help, then ... learn to include context.
-- View this message in context: http://r.789695.n4.nabble.com/How-to-build-a-matrix-of-number-of-appearance-tp3643248p3645550.html Sent from the R help mailing list archive at Nabble.com.
David Winsemius, MD
West Hartford, CT
Provide some more information about the size of the data and the number of different ID combinations. I have found that in some cases like this, using the 'sqldf' package helps, since it can deal with a large number of combinations.
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
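On the follow-up question about keeping only ProviderIDs with relatively high frequency, one possible approach is to count claims per provider first and subset qq1 before cross-tabulating. The sketch below is an illustration in base R, not something posted in the thread; the toy qq1, the cutoff min_claims, and the names provider_freq, frequent, qq1_freq and counts are all assumptions.

```r
# Toy stand-in for qq1 (hypothetical values, not from the thread)
qq1 <- data.frame(
  MemberID   = c(1L, 1L, 2L, 2L, 3L, 3L, 4L),
  ProviderID = c(10L, 10L, 10L, 20L, 20L, 30L, 40L)
)

min_claims <- 2  # hypothetical cutoff; pick one that suits the data

# Claims per provider, then the IDs that reach the cutoff
provider_freq <- table(qq1$ProviderID)
frequent <- as.integer(names(provider_freq)[as.vector(provider_freq) >= min_claims])

# Keep only claims from frequent providers before cross-tabulating
qq1_freq <- qq1[qq1$ProviderID %in% frequent, ]
counts <- xtabs(~ MemberID + ProviderID, data = qq1_freq)
```

Members whose claims all belong to dropped providers (MemberID 4 in the toy data) disappear from the table. If the filtered table is still too large, xtabs() also accepts sparse = TRUE, which returns a sparse matrix and may avoid allocating the zero cells.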