
Help... Organizing multiple spreadsheets data into a huge R data structure!

2 messages · John Wong, Duncan Murdoch

#
Hello R users,

I am relatively new to R, and I hope some of you can offer me some
suggestions on how to organize my data in R using some of the more
advanced data-structuring techniques. Here's my scenario:

I have a data set of 50 participants (each with condition and
demographic data). Each participant performed 2x16 trials. For each
trial, there is specific information about the trial (i.e. errors
and timing) and a large spreadsheet-like data set with headers. I
have to extract data from each spreadsheet-like data set according to
the information about the specific trial, and then group them according
to trial nature in the 2x16 structure. Then I can further analyse them
according to the demographic data, grouping the 50 participants.

1. I have no idea what the best way is to organize this data set in R
so that it is as efficient as possible to analyse.
50 (demographic data set) X  2 (phase) X 16 (trials of varied nature)
X Trial Data set + Trial Online Recording Physiological Data Set
Spreadsheet (in text format)
2. I don't have a clear idea on how to manage this data structure in
R. Can somebody point me to the corresponding R resource / examples so
that I can read and try it out on my data set?

I am in a hurry with my project, but no one in my cohort here is
particularly proficient in R...
Thanks a LOT...!

- John
#
On 15/09/2008 12:27 PM, John Wong wrote:
Generally the easiest format to use in R is a dataframe, with one row 
per observation. In your case this would be something like:

participant phase trial trialdata spreadsheetrow spreadsheetcolumn observation
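
A minimal sketch of that single-dataframe layout, built from made-up
values. The sizes (2 participants, 2 phases, 3 trials, a 4-row by
2-column recording per trial), the channel names "hr" and "gsr", and
the random numbers are all assumptions for illustration; only the
column names come from the list above.

```r
# Hypothetical small design; the real data would be 50 x 2 x 16.
long <- expand.grid(
  spreadsheetcolumn = c("hr", "gsr"),  # assumed recording channels
  spreadsheetrow    = 1:4,             # assumed rows per recording
  trial             = 1:3,
  phase             = 1:2,
  participant       = 1:2,
  stringsAsFactors  = FALSE
)

# One trial-level value (standing in for e.g. an error count),
# repeated on every row that belongs to the same trial.
set.seed(1)
trial_id <- interaction(long$participant, long$phase, long$trial,
                        drop = TRUE)
value_by_trial <- rpois(nlevels(trial_id), lambda = 1)
long$trialdata <- value_by_trial[as.integer(trial_id)]

# One simulated measurement per row: this is the "observation".
long$observation <- rnorm(nrow(long))

# Order the columns as in the layout above.
long <- long[, c("participant", "phase", "trial", "trialdata",
                 "spreadsheetrow", "spreadsheetcolumn", "observation")]
```

With the data in this shape, a per-trial summary can be computed
directly, for example with
aggregate(observation ~ participant + phase + trial + spreadsheetcolumn,
data = long, FUN = mean).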

This is repetitive (you repeat the trialdata for every observation in 
the spreadsheet); if that's a problem, I'd split it into two dataframes, 
one for the trial data, one for the spreadsheet data:

participant phase trial trialdata

participant phase trial spreadsheetrow spreadsheetcolumn observation

This makes more sense from a database point of view, but it can be 
harder to work with in R, if you want to use the trialdata when 
analyzing the spreadsheet data.
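
As a rough sketch of the two-dataframe variant, the trial information
can be brought back in with merge() on the shared key columns whenever
it is needed. The sizes, the channel names "hr" and "gsr", and the
placeholder values below are assumptions for illustration, not the real
data.

```r
# Trial-level table: one row per participant x phase x trial.
trials <- expand.grid(trial = 1:3, phase = 1:2, participant = 1:2)
trials$trialdata <- seq_len(nrow(trials))  # placeholder trial information

# Recording table: many rows per trial, no trial information repeated.
recordings <- expand.grid(
  spreadsheetcolumn = c("hr", "gsr"),  # assumed recording channels
  spreadsheetrow    = 1:4,             # assumed rows per recording
  trial             = 1:3,
  phase             = 1:2,
  participant       = 1:2,
  stringsAsFactors  = FALSE
)
set.seed(1)
recordings$observation <- rnorm(nrow(recordings))

# Join the two tables on the columns that identify a trial.
combined <- merge(recordings, trials,
                  by = c("participant", "phase", "trial"))
```

The result has one row per recording row, with trialdata attached, so
it has the same shape as the single-dataframe layout; the cost is the
extra merge() step each time the trial information is needed.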

Duncan Murdoch

and the second would have as many of those as are necess