correlating rows of two differently-sized data frames in R
Hi Jen, It's generally best to keep cc'ing R-help so others can lend a hand when I step away from my computer:
On Thu, Aug 9, 2012 at 11:49 AM, Jennifer Hobbs <jenachobbs at gmail.com> wrote:
Hi Michael - thanks for the advice - I did find merge() just after posting but I'm having difficulty with using it. I've loaded both datasets; then I tried
CombinedData<-merge(MethyData1,ExprData1)
but when I looked at CombinedData, I found there was no actual data in it:
str(CombinedData)
'data.frame': 0 obs. of 20 variables
Take a look at ?merge.data.frame in particular, since there are many different forms of merge. Your original post suggests you may want to set all = TRUE and by = "Location".
Hope that helps,
Michael
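To illustrate the suggestion above, here is a minimal sketch on made-up toy data (the frames `methy` and `expr`, their values, and the column names `Sample1`/`Sample2` are assumptions, not the poster's real data). When `by` is not given, merge() keys on every column name the two frames share, which would explain a result with 0 observations if no row agrees on all shared columns at once. Keying on Location alone pairs the rows as intended.

```r
# Toy stand-ins for the two data sets; names and values are invented.
methy <- data.frame(
  Location = c("A", "A", "A", "B"),
  Part     = c(1, 2, 3, 1),
  Sample1  = c(0.10, 0.20, 0.30, 0.40),
  Sample2  = c(0.50, 0.60, 0.70, 0.80)
)
expr <- data.frame(
  Location = c("A", "B", "C"),
  Sample1  = c(5.1, 6.2, 7.3),
  Sample2  = c(8.4, 9.5, 10.6)
)

# Default: by = intersect(names(x), names(y)), so Location, Sample1 and
# Sample2 are all used as keys. No row matches on all three here.
default_merge <- merge(methy, expr)
nrow(default_merge)  # 0

# Keying on Location only: each methy row is paired with its expr row.
# suffixes tells the clashing Sample columns apart.
by_location <- merge(methy, expr, by = "Location",
                     suffixes = c(".methy", ".expr"))

# all = TRUE also keeps Location C, which has no partner in methy;
# its methy columns are filled with NA.
outer_merge <- merge(methy, expr, by = "Location", all = TRUE,
                     suffixes = c(".methy", ".expr"))
```

Whether all = TRUE is wanted depends on whether unmatched locations should be kept; for correlations, the rows with NA would probably need dropping again.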
I thought this might be due to the fact that my column names, as well as the row names, in both data sets were the same, so I renamed the column names in ExprData1 and tried again:
colnames(ExprData1)<-NewExprNames
merge(ExprData1,MethyData1)
Error: cannot allocate vector of size 4.2 Gb
In addition: Warning messages:
1: In expand.grid(seq_len(nx), seq_len(ny)) :
  Reached total allocation of 8055Mb: see help(memory.size)
2: In expand.grid(seq_len(nx), seq_len(ny)) :
  Reached total allocation of 8055Mb: see help(memory.size)
3: In expand.grid(seq_len(nx), seq_len(ny)) :
  Reached total allocation of 8055Mb: see help(memory.size)
4: In expand.grid(seq_len(nx), seq_len(ny)) :
  Reached total allocation of 8055Mb: see help(memory.size)
I was surprised about this, as I'm using a 64-bit computer and it's managed
You'll also need to be using a 64-bit build of R. Merging is pretty memory-expensive, so if you're right on the edge of what R can handle you might have to look into a more specialized solution (such as an SQL backend).
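A likely reason for the allocation error, sketched on hypothetical toy frames (`x`, `y` and their columns are invented for illustration): once the two frames share no column names at all, merge() has nothing to match on and returns every row of one frame paired with every row of the other. The expand.grid() call in the warnings is that pairing step, and its size is the product of the two row counts.

```r
# Hypothetical toy frames with no column names in common.
x <- data.frame(a = 1:3)
y <- data.frame(b = 1:4)

# With no shared names, by is empty and merge() builds the full
# Cartesian product of the rows.
crossed <- merge(x, y)
nrow(crossed)  # nrow(x) * nrow(y) = 12

# The row count grows multiplicatively: two frames of 10,000 rows each
# (an illustrative figure, not the poster's sizes) would give 1e8 rows.
10000 * 10000
```

So renaming all the columns avoids the accidental keys but removes the intended key as well; leaving the Location column with the same name in both frames and passing it via `by` keeps the merge at a manageable size.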
to deal with much larger data sets before now (I know that's not the only criterion, but my understanding of computers isn't extensive). I had previously run up against a memory problem because I hadn't transposed my data (I thought I was looking at columns, the computer was looking at rows), so I tried transposing both data sets and merging again, but I ended up with another empty data frame:
tED1<-t(ExprData1)
tMD1<-t(MethyData1)
CombineData<-merge(tED1,tMD1)
str(CombineData)
'data.frame': 0 obs. of 152247 variables:
This is where I'm stuck. Any advice would be hugely appreciated!
Jen

On Thu, Aug 9, 2012 at 5:28 PM, R. Michael Weylandt <michael.weylandt at gmail.com> wrote:
Perhaps load them both and ?merge can show you the way.
Michael

On Thu, Aug 9, 2012 at 9:54 AM, JenniferH <jenachobbs at gmail.com> wrote:
Hello everyone,

I have two sets of data, with the following structure:

DataSet1
Location  Part  Sample 1  Sample 2
A         1     value     value
A         2     value     value
A         3     value     value
B         1     value     value

DataSet2
Location  Sample 1  Sample 2
A         value     value
B         value     value
C         value     value

I would like to look at the correlations between DataSet1 and DataSet2, such that each row in Location A from DataSet1 is paired with the Location A row from DataSet2, and so forth.

So far, my only ideas involve trying to copy-paste each of the rows in DataSet2 the number of times each occurs in DataSet1 on a spreadsheet before loading the sets into R; however, as I have approaching 8000 rows in DataSet2, this is clearly not a workable solution!

I'm sure there's a simple solution to this, so I'm sorry if this seems like a really silly question.

Thanks for your help!
Jen

--
View this message in context: http://r.789695.n4.nabble.com/correlating-rows-of-two-differently-sized-data-frames-in-R-tp4639774.html
Sent from the R help mailing list archive at Nabble.com.
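For the pairing described in the original question, one possible approach is sketched below on invented data (the frame names, values, and the three `Sample` columns are assumptions; it also assumes both data sets measure the same samples under the same column names). It merges on Location and then computes one correlation per paired row across the samples. This is a sketch of one way to do it, not necessarily what the posters ended up with.

```r
# Invented toy data shaped like DataSet1 and DataSet2 in the question.
methy <- data.frame(
  Location = c("A", "A", "A", "B"),
  Part     = c(1, 2, 3, 1),
  Sample1  = c(0.1, 0.4, 0.9, 0.3),
  Sample2  = c(0.2, 0.5, 0.7, 0.6),
  Sample3  = c(0.3, 0.9, 0.2, 0.8)
)
expr <- data.frame(
  Location = c("A", "B", "C"),
  Sample1  = c(1.0, 2.0, 3.0),
  Sample2  = c(2.0, 1.0, 5.0),
  Sample3  = c(4.0, 3.0, 4.0)
)
samples <- c("Sample1", "Sample2", "Sample3")

# Pair each methy row with the expr row for the same Location.
# Location C has no partner in methy and is dropped (all = FALSE).
paired <- merge(methy, expr, by = "Location",
                suffixes = c(".methy", ".expr"))

# Pull the two sets of sample columns out as matrices, in matching order.
m <- as.matrix(paired[, paste0(samples, ".methy")])
e <- as.matrix(paired[, paste0(samples, ".expr")])

# One Pearson correlation per paired row, taken across the samples.
paired$r <- vapply(seq_len(nrow(paired)),
                   function(i) cor(m[i, ], e[i, ]),
                   numeric(1))

paired[, c("Location", "Part", "r")]
```

With only two samples every correlation would be exactly 1 or -1, so the toy data uses three; real data would need enough samples for the coefficients to mean something.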