problem applying the same function twice
You said your data only had 14000 rows, which really isn't many. How many possible combinations do you have, and how many do you need to add?

On Tue, Mar 10, 2015 at 4:35 PM, Curtis Burkhalter
<curtisburkhalter at gmail.com> wrote:
Sarah,

This strategy works great for this small dataset, but when I attempt your method with my data set I reach the maximum allowable memory allocation and the operation just stalls and then stops completely before it is finished. Do you know of a way around this?

Thanks

On Tue, Mar 10, 2015 at 2:04 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
Hi,
I didn't work through your code, because it looked overly complicated.
Here's a more general approach that does what you appear to want:
# use dput() to provide reproducible data please!
comAn <- structure(list(animals = c("bird", "bird", "bird", "bird",
"bird",
"bird", "dog", "dog", "dog", "dog", "dog", "dog", "cat", "cat",
"cat", "cat"), animalYears = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L), animalMass = c(29L, 48L, 36L,
20L, 34L, 34L, 21L, 28L, 25L, 35L, 18L, 11L, 46L, 33L, 48L, 21L
)), .Names = c("animals", "animalYears", "animalMass"), class =
"data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16"))
# add reps to comAn
# assumes comAn is already sorted on animals, animalYears
comAn$reps <- unlist(sapply(rle(do.call("paste",
comAn[,1:2]))$lengths, seq_len))
# create full set of combinations
outgrid <- expand.grid(animals=unique(comAn$animals),
animalYears=unique(comAn$animalYears), reps=unique(comAn$reps),
stringsAsFactors=FALSE)
# combine with comAn
comAn.full <- merge(outgrid, comAn, all.x=TRUE)
comAn.full
   animals animalYears reps animalMass
1     bird           1    1         29
2     bird           1    2         48
3     bird           1    3         36
4     bird           2    1         20
5     bird           2    2         34
6     bird           2    3         34
7      cat           1    1         46
8      cat           1    2         33
9      cat           1    3         48
10     cat           2    1         21
11     cat           2    2         NA
12     cat           2    3         NA
13     dog           1    1         21
14     dog           1    2         28
15     dog           1    3         25
16     dog           2    1         35
17     dog           2    2         18
18     dog           2    3         11
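The reps step above assumes comAn is already sorted on animals and animalYears. If that can't be guaranteed, one possible order-independent alternative is ave() with seq_along, which numbers the rows within each group wherever they appear. This is only a minimal sketch on the toy data from this thread; the shuffled copy is made up purely to show that row order no longer matters.

```r
# Toy data from this thread, rebuilt by hand so the example is self-contained.
comAn <- data.frame(
  animals     = rep(c("bird", "dog", "cat"), times = c(6, 6, 4)),
  animalYears = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L,
                  1L, 1L, 1L, 2L),
  animalMass  = c(29L, 48L, 36L, 20L, 34L, 34L, 21L, 28L, 25L, 35L,
                  18L, 11L, 46L, 33L, 48L, 21L),
  stringsAsFactors = FALSE
)

# rle()-based numbering from the message above (needs sorted input)
reps_rle <- unlist(sapply(rle(do.call("paste", comAn[, 1:2]))$lengths,
                          seq_len))

# ave()-based numbering: counts 1, 2, 3, ... within each
# animals x animalYears group, in whatever order the rows occur
reps_ave <- ave(seq_len(nrow(comAn)), comAn$animals, comAn$animalYears,
                FUN = seq_along)

# On sorted data the two approaches agree
all(reps_rle == reps_ave)

# Illustration only: the same call still works after shuffling the rows
set.seed(1)
shuffled <- comAn[sample(nrow(comAn)), ]
shuffled$reps <- ave(seq_len(nrow(shuffled)), shuffled$animals,
                     shuffled$animalYears, FUN = seq_along)
```

The rest of the approach (expand.grid() followed by merge()) is unchanged; only the way reps is computed differs.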
On Tue, Mar 10, 2015 at 3:43 PM, Curtis Burkhalter <curtisburkhalter at gmail.com> wrote:
Hey everyone,

I've written a function that adds NAs to a dataframe where data is missing. It seems to work great if I only need to run it once, but if I run it two times in a row I run into problems. I've created a workable example to explain what I mean and why I would do this.

In my dataframe there are areas where I need to add two rows of NAs (b/c I need to have 3 animal x year combos and for cat in year 2 I only have one), so I thought that I'd just run my code twice using the function in the code below. Everything works great when I run it the first time, but when I run it again it says that the value returned to the list 'x' is of length 0. I don't understand why the function works the first time around and adds an NA to the 'animalMass' column, but won't do it again.

I've used print(str(dataframe)) to see if there is a change in class or type when the function runs through the original dataframe, and there is for 'animalYears', but I just convert it back before rerunning the function a second time.

Any thoughts on this would be greatly appreciated b/c the actual dataframe I have to input into WinBUGS is 14000x12, so it's not a trivial thing to just add in an NA here or there.
comAn
animals animalYears animalMass
1 bird 1 29
2 bird 1 48
3 bird 1 36
4 bird 2 20
5 bird 2 34
6 bird 2 34
7 dog 1 21
8 dog 1 28
9 dog 1 25
10 dog 2 35
11 dog 2 18
12 dog 2 11
13 cat 1 46
14 cat 1 33
15 cat 1 48
16 cat 2 21
So every animal has 3 measurements per year, except for the cat in year two, which has only 1. I run the code below and get:
#combs defines the different combinations of
#animals and animalYears
combs<-paste(comAn$animals,comAn$animalYears,sep=':')
#counts defines how long the different combinations are
counts<-ave(1:nrow(comAn),combs,FUN=length)
#missing defines the combs that have fewer than 2 rows and puts them in
#the data frame missing
missing<-data.frame(vals=combs[counts<2],count=counts[counts<2])
genRows<-function(dat){
vals<-strsplit(dat[1],':')[[1]]
#not sure why dat[2] is being converted to a string
newRows<-2-as.numeric(dat[2])
newDf<-data.frame(animals=rep(vals[1],newRows),
animalYears=rep(vals[2],newRows),
animalMass=rep(NA,newRows))
return(newDf)
}
x<-apply(missing,1,genRows)
comAn=rbind(comAn,
do.call(rbind,x))
comAn
animals animalYears animalMass
1 bird 1 29
2 bird 1 48
3 bird 1 36
4 bird 2 20
5 bird 2 34
6 bird 2 34
7 dog 1 21
8 dog 1 28
9 dog 1 25
10 dog 2 35
11 dog 2 18
12 dog 2 11
13 cat 1 46
14 cat 1 33
15 cat 1 48
16 cat 2 21
17 cat 2 <NA>
So far so good, but then I adjust the code so that it reads (**notice the change in the specification in 'missing' to counts<3**):
#combs defines the different combinations of
#animals and animalYears
combs<-paste(comAn$animals,comAn$animalYears,sep=':')
#counts defines how long the different combinations are
counts<-ave(1:nrow(comAn),combs,FUN=length)
#missing defines the combs that have fewer than 3 rows and puts them in
#the data frame missing
missing<-data.frame(vals=combs[counts<3],count=counts[counts<3])
genRows<-function(dat){
vals<-strsplit(dat[1],':')[[1]]
#not sure why dat[2] is being converted to a string
newRows<-2-as.numeric(dat[2])
newDf<-data.frame(animals=rep(vals[1],newRows),
animalYears=rep(vals[2],newRows),
animalMass=rep(NA,newRows))
return(newDf)
}
x<-apply(missing,1,genRows)
comAn=rbind(comAn,
do.call(rbind,x))
The result for 'x' then reads:
x
[[1]]
[1] animals     animalYears animalMass
<0 rows> (or 0-length row.names)

Any thoughts on why it might be doing this instead of adding an additional row to get the result:
comAn
animals animalYears animalMass
1 bird 1 29
2 bird 1 48
3 bird 1 36
4 bird 2 20
5 bird 2 34
6 bird 2 34
7 dog 1 21
8 dog 1 28
9 dog 1 25
10 dog 2 35
11 dog 2 18
12 dog 2 11
13 cat 1 46
14 cat 1 33
15 cat 1 48
16 cat 2 21
17 cat 2 <NA>
18 cat 2 <NA>

Thanks
--
Curtis Burkhalter
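A likely explanation for the zero-row result: genRows() always computes newRows as 2 minus the current count. On the second pass 'cat:2' already has 2 rows, so newRows is 2 - 2 = 0 and rep(..., 0) builds an empty data frame. The string conversion noted in the comment comes from apply(), which turns the data frame into a character matrix before calling the function. The sketch below reproduces the zero-row case and then shows one way to make the target count a parameter. padGroups() is a hypothetical helper written for this example, not a function from any package, and it only pads animal x year combinations that already occur in the data.

```r
# Toy data from this thread, rebuilt by hand so the example is self-contained.
comAn <- data.frame(
  animals     = rep(c("bird", "dog", "cat"), times = c(6, 6, 4)),
  animalYears = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L,
                  1L, 1L, 1L, 2L),
  animalMass  = c(29L, 48L, 36L, 20L, 34L, 34L, 21L, 28L, 25L, 35L,
                  18L, 11L, 46L, 33L, 48L, 21L),
  stringsAsFactors = FALSE
)

# Why the second pass adds nothing: the count is already 2
newRows <- 2 - as.numeric("2")
emptyDf <- data.frame(animals = rep("cat", newRows),
                      animalMass = rep(NA, newRows))
nrow(emptyDf)  # 0

# Hypothetical helper: append NA rows so that every observed
# combination of the key columns ends up with `target` rows.
padGroups <- function(dat, target, keys = c("animals", "animalYears")) {
  # current number of rows per observed key combination
  counts <- aggregate(data.frame(n = rep(1L, nrow(dat))),
                      by = dat[keys], FUN = sum)
  short <- counts[counts$n < target, , drop = FALSE]
  if (nrow(short) == 0L) return(dat)  # nothing to pad
  # repeat each short group's keys once per missing row
  idx <- rep(seq_len(nrow(short)), times = target - short$n)
  pad <- short[idx, keys, drop = FALSE]
  rownames(pad) <- NULL
  # every non-key column is NA in the padding rows
  for (col in setdiff(names(dat), keys)) pad[[col]] <- NA
  out <- rbind(dat, pad[names(dat)])
  rownames(out) <- NULL
  out
}

comAn.pad <- padGroups(comAn, target = 3)
nrow(comAn.pad)  # 18: cat in year 2 now has 3 rows, two of them NA
```

Because the number of rows to add is target minus the current count, a single call fills the gap, and running it a second time returns the data unchanged. It also builds only the missing rows rather than a full grid, and the key columns keep their original types, so animalYears does not need converting back.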