
Having some Trouble Data Structures

Hi,
Maybe this helps.

dat1<-data.frame(ID=formatC(0001:0010,width=4,flag="0"),No_of_Effectors=rep(3,10))
dat1<-within(dat1,{ID<-as.character(ID)})
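#each element is one string of 3 comma-separated random numbers (collapse belongs inside paste)
list1<-lapply(1:nrow(dat1),function(x) paste(sample(1:10000,3,replace=TRUE),collapse=","))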

dat2<-data.frame(dat1,do.call(rbind,lapply(lapply(1:nrow(dat1),function(x) sample(1:10000,3,replace=TRUE)),function(x) paste(x,collapse=","))))
colnames(dat2)[3]<-"Effectors"
dat2
#     ID No_of_Effectors      Effectors
#1  0001               3 4759,8109,7997
#2  0002               3 2649,9496,9167
#3  0003               3 4229,3282,6235
#4  0004               3 5388,3088,6420
#5  0005               3 5602,5981,4749
#6  0006               3 4971,6956,5913
#7  0007               3  4999,9465,799
#8  0008               3  8419,4346,266
#9  0009               3 9329,8819,4011
#10 0010               3 5817,8729,6499
dat3<-within(dat2,{Effectors<-as.character(Effectors)})

#converting back the Effector column to numeric 3 columns
res<-do.call(rbind,lapply(strsplit(dat3[,3],","),function(x) as.numeric(x)))
head(res)
#     [,1] [,2] [,3]
#[1,] 4759 8109 7997
#[2,] 2649 9496 9167
#[3,] 4229 3282 6235
#[4,] 5388 3088 6420
#[5,] 5602 5981 4749
#[6,] 4971 6956 5913
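
If the number of effectors differs between individuals, a list column might be simpler than pasting the values into one string. This is only a sketch: the names n_eff and pathogens, and the range of 1 to 5 effectors, are assumptions made up for illustration.

```r
set.seed(1)  # only so the sketch is reproducible
n_eff <- sample(1:5, 10, replace = TRUE)  # assumed: 1 to 5 effectors per individual
pathogens <- data.frame(ID = formatC(1:10, width = 4, flag = "0"),
                        No_of_Effectors = n_eff,
                        stringsAsFactors = FALSE)
# each "cell" of Effectors holds a whole numeric vector
pathogens$Effectors <- lapply(n_eff, function(n) sample(1:10000, n, replace = TRUE))
pathogens$Effectors[[1]]             # effectors of individual 0001
sapply(pathogens$Effectors, length)  # matches pathogens$No_of_Effectors
```

Because the values stay numeric, no strsplit()/as.numeric() round trip is needed to get them back.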


A.K.





----- Original Message -----
From: Benjamin Ward (ENV) <B.Ward at uea.ac.uk>
To: "r-help at r-project.org" <r-help at r-project.org>
Cc: 
Sent: Sunday, October 28, 2012 5:32 AM
Subject: [R] Having some Trouble Data Structures

Hi All,

I'm trying to run a simulation of host-pathogen evolution based around individuals.
What I need to have is a dataframe or table of some description - describing all the individuals of a pathogen population (so far I've implemented this as a matrix):

        ID      No_of_Effectors      Effectors (Sequences)
  [1,]  0001    3                    ## 3 Random Numbers ##

There will be many such rows for many individuals. They have something called effectors, the number of which is randomly generated, so say you get 3 in the No_of_Effectors column. Then I make R generate 3 numbers from between 1 and 10,000, this gives me three numerical representations of genes. These numbers will be compared to a similar data structure of the host individuals who have their immune genes with similar numbers.

My problem is that obviously I can't stick 3 numbers in one "cell" of the matrix (I've tried) :

Pathogen_Individuals[1,3] <- c(345,567,678)
Error in Pathogen_Individuals[1, 3] <- c(345, 567, 678) :
  number of items to replace is not a multiple of replacement length

In future I'm also going to have more variables such as whether a gene is expressed. Such information may require a matrix in itself - something like:


        Effector ID      Sequence         Expressed?
  [1,]  0001             345,567,678      1 (or 0).

Is there a way, then, that I can put more than one value in a cell, like a list of values, or a way to put objects in a cell of a data frame, matrix or table etc.? Almost an inception deal - data structures nested in a data structure? If I search for things like "insert list into matrix" I get results about how to turn one into another, which is not what I think I need to be doing.
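
One way to avoid nesting altogether might be a "long" table with one row per effector instead of one row per individual, so that per-gene variables such as expression become ordinary columns. A sketch with made-up values; the column names are assumptions, not taken from the original model.

```r
effectors <- data.frame(
  Individual_ID = rep(c("0001", "0002"), times = c(3, 2)),
  Sequence      = c(345, 567, 678, 1200, 9876),
  Expressed     = c(1, 0, 1, 1, 1),
  stringsAsFactors = FALSE
)
# all effectors of one individual
effectors[effectors$Individual_ID == "0001", ]
# expressed effectors per individual, as a list of numeric vectors
expressed <- with(effectors[effectors$Expressed == 1, ],
                  split(Sequence, Individual_ID))
expressed[["0001"]]
```

The number of effectors per individual can then be recovered with table(effectors$Individual_ID) rather than stored separately.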

I have been considering having several data structures not nested in each other, something like creating a new matrix object for every individual with the name Effectors_[Individual_ID] and somehow getting my simulation loops to operate on those objects, but I find it hard to see how to tell R that all of those matrices are to be included in an operation, as you can with all rows of a data frame, for example, using for loops.
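
Rather than one separately named object per individual, the per-individual matrices could be kept in a single named list, which lapply() or sapply() can then loop over. A sketch only: make_effectors and the two-column layout are hypothetical.

```r
# hypothetical helper: n effectors with a random sequence number and an expression flag
make_effectors <- function(n) {
  cbind(Sequence  = sample(1:10000, n, replace = TRUE),
        Expressed = sample(0:1, n, replace = TRUE))
}
ids <- formatC(1:3, width = 4, flag = "0")
Effectors <- lapply(c(3, 2, 4), make_effectors)  # one matrix per individual
names(Effectors) <- ids
Effectors[["0002"]]                                   # matrix for one individual
sapply(Effectors, nrow)                               # number of effectors each
sapply(Effectors, function(m) sum(m[, "Expressed"]))  # expressed count each
```

Any operation written for one matrix can be applied to every individual this way, without building object names by hand.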
This is strange for me because this model was written in a macro-code for another program which handles data in a different format and layout to R.

My problem, I think, is that each individual in the model has many variables - in this case representations of genes. So I'm having trouble getting my head around this.

Hopefully someone more experienced will be able to offer advice or a solution, it will be very appreciated.

Many Thanks,
Ben Ward (ENV, UEA & The Sainsbury Lab, JIC).

P.S. I have searched previous queries to the list, and I'm not sure, but this may be useful or relevant:


Have you thought of using a list?
$a
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10

$b
[1] 1 2 3 4 5
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
[1] 1 2 3 4 5
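
The printed output above is consistent with a list built roughly as follows. The commands from the earlier post are not shown here, so the object name x is a guess:

```r
x <- list(a = matrix(1:10, nrow = 2), b = 1:5)
x    # prints the $a and $b components
x$a  # the 2 x 5 matrix on its own
x$b  # the vector 1 2 3 4 5 on its own
```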

oliveoil and yarn datasets have been mentioned.





    [[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.