Hi all,
I have a data frame, test.., that I would like to convert into the list test_
shown below, but I am unsure how to do this efficiently. I can do it in a for
loop, but my data set is huge and it takes forever. So, again, how do I go
from test.. to test_ below?
#Data frame
test.. <- data.frame(Apples = c(1,3,0,0,1), Pears = c(0,0,1,0,2), Beans =
c(1,2,1,0,0))
#list - my desired outcome
test_ <- list("1" = c("Apples","Beans"),
"2" = c("Apples","Apples","Apples","Beans","Beans"),
"3" = c("Pears","Beans"),
"4" = c(NULL),
"5" = c("Apples","Pears","Pears"))
Thanks
Josh
--
View this message in context: http://r.789695.n4.nabble.com/Probably-a-good-use-for-apply-tp4631883.html
Sent from the R help mailing list archive at Nabble.com.
Probably a good use for apply
7 messages · jim holtman, Jim Lemon, LCOG1 +2 more
try this:
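test.. <- data.frame(Apples = c(1,3,0,0,1), Pears = c(0,0,1,0,2), Beans =
c(1,2,1,0,0))
# simplify = FALSE makes sapply() always return a list; without it, a row whose
# counts are all equal yields a vector or matrix, which do.call() rejects
lapply(seq(nrow(test..)), function(.row){
do.call(c, sapply(names(test..), function(.col){
rep(.col, test..[[.col]][.row])
}, simplify = FALSE))
})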
[[1]]
Apples Beans
"Apples" "Beans"
[[2]]
Apples1 Apples2 Apples3 Beans1 Beans2
"Apples" "Apples" "Apples" "Beans" "Beans"
[[3]]
Pears Beans
"Pears" "Beans"
[[4]]
character(0)
[[5]]
Apples Pears1 Pears2
"Apples" "Pears" "Pears"
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
On 05/31/2012 10:50 AM, LCOG1 wrote:
Hi all,
I Have a data frame test.. that I would like to convert into a list below
test_ but am unsure how to efficiently do this. I can do it in a for loop
but my data set is huge and it takes forever. Wondering how I can do this
more efficiently. So again how to I go from test.. to test_ below?
#Data frame
test..<- data.frame(Apples = c(1,3,0,0,1), Pears = c(0,0,1,0,2), Beans =
c(1,2,1,0,0))
#list - my desired outcome
test_<- list("1" = c("Apples","Beans"),
"2" = c("Apples","Apples","Apples","Beans","Beans"),
"3" = c("Pears","Beans"),
"4" = c(NULL),
"5" = c("Apples","Pears","Pears"))
Hi Josh,
How about this?
test..
  Apples Pears Beans
1      1     0     1
2      3     0     2
3      0     1     1
4      0     0     0
5      1     2     0
indices2names<-function(x,xnames) return(rep(xnames,x))
apply(as.matrix(test..),1,indices2names,names(test..))
[[1]]
[1] "Apples" "Beans"
[[2]]
[1] "Apples" "Apples" "Apples" "Beans" "Beans"
[[3]]
[1] "Pears" "Beans"
[[4]]
character(0)
[[5]]
[1] "Apples" "Pears" "Pears"
Jim
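Both replies above still call an R function once per row. For a very large data frame, a sketch that avoids the per-row calls is shown below. It is not from the thread: it expands all the counts with a single rep() and then groups the labels by row with split(). The names m, counts, labels, rows and result are illustrative only. It also avoids one pitfall of the apply() approach, which returns a matrix instead of a list whenever every row happens to produce the same number of names.

```r
test.. <- data.frame(Apples = c(1,3,0,0,1), Pears = c(0,0,1,0,2),
                     Beans = c(1,2,1,0,0))

m <- as.matrix(test..)

# Counts in row-major order: row 1 (Apples, Pears, Beans), then row 2, ...
counts <- as.vector(t(m))

# Column names and row indices in the same row-major order,
# each repeated as many times as its count
labels <- rep(rep(colnames(m), times = nrow(m)), times = counts)
rows   <- rep(rep(seq_len(nrow(m)), each = ncol(m)), times = counts)

# Fixing the factor levels keeps rows whose counts are all zero as character(0)
result <- split(labels, factor(rows, levels = seq_len(nrow(m))))
```

The result is a list named "1" to "5"; element "4" is character(0) rather than NULL, as in the two replies above.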
This is great, thank you. I think I am getting the hang of some of the apply functions. I am stuck again, however. I have the list test_ below and would like to apply the sample function using each element of each vector as the probability, returning a TRUE or FALSE; ultimately I will sum the TRUEs by vector.
test_<- list(a=c(.85,.10),b=c(.99,.05))
#Write a function to sample based on labor force participation rates to determine presence of workers in household
sampleWorker <- function(x) return(sample(c(TRUE,FALSE),x, replace = TRUE, prob = c(x, 1-x)))
IsWorker.Hh_ <- lapply(test_, sampleWorker)
I am doing something wrong with the setup, because I am getting an error about specifying probabilities incorrectly. The result I am looking for is for IsWorker_ to be the following (assuming the .85 and .99 probabilities 'win' from each vector and the lower values do not):
IsWorker_
$a
[1] TRUE
$b
[1] TRUE
but ultimately I will need to sum the TRUEs for each vector:
IsWorker_
$a
[1] 1
$b
[1] 1
Thanks
Josh
Hi,
On Thu, May 31, 2012 at 1:08 PM, LCOG1 <jroll at lcog.org> wrote:
This is great thank you. I think I am getting the hang of some of the apply functions. I am stuck again however. I have list test_ below and would like to apply the sample function using each element of each vector as the probability and return a TRUE or FALSE that I will ultimately sum the TRUES by vector.
test_<- list(a=c(.85,.10),b=c(.99,.05))
#Write a function to sample based on labor force participation rates to determine presence of workers in household
sampleWorker <- function(x) return(sample(c(TRUE,FALSE),x, replace = TRUE, prob = c(x, 1-x)))
Your first problem is that sampleWorker() doesn't run with a single component of test_, so it can't possibly run in an apply statement.
Please reread ?sample - the second argument is the size of the desired sample, but what you are passing is a non-integer vector of length 2. What do you actually want this to be?
Then for prob, you're passing c(x, 1-x), but x is again a non-integer vector of length 2, so that results in a vector of length 4, which is longer than the number of options sample() is choosing from.
Do you perhaps want to pass only a single probability at a time? But even then you need to resolve the size problem.
Sarah
Sarah Goslee http://www.functionaldiversity.org
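The two problems described in the reply above can be reproduced in a few lines. This is only a sketch for illustration; draw_one is a hypothetical helper that is not part of the thread, and it assumes the intent is one TRUE/FALSE draw per probability.

```r
x <- c(0.85, 0.10)            # one component of test_

# prob has four entries, but sample() is choosing between only two values
prob <- c(x, 1 - x)
length(prob)

# The mismatch makes sample() stop with an error; capture it instead of failing
res <- tryCatch(
  sample(c(TRUE, FALSE), size = 2, replace = TRUE, prob = prob),
  error = function(e) e
)

# Hypothetical helper: a single draw for a single probability p
draw_one <- function(p) sample(c(TRUE, FALSE), size = 1, prob = c(p, 1 - p))

set.seed(1)
draws <- vapply(x, draw_one, logical(1))   # one logical per probability
```

Passing one probability at a time keeps prob at length 2, matching the two choices, and a fixed size = 1 resolves the size problem.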
Yes, you are correct. I need to change my sample number specification to the number of elements in the vector. So the sampleWorker function should be:
sampleWorker <- function(x) return(sample(c(TRUE,FALSE),length(x), replace = TRUE, prob = c(x, 1-x)))
So this is where I get a little confused with using apply functions. Isn't x each element of each vector? So in the sample data I provide there are 4 x's, and each would be put into the sampleWorker function using lapply.
#sample data
test_<- list(a=c(.85,.10),b=c(.99,.05))
To show what I want without using a list of vectors, and instead just a single one, see below:
IsWorker.Hh_ <- lapply(c(.9,.1) , sampleWorker)
#Returns:
[[1]]
[1] TRUE
[[2]]
[1] FALSE
Now I just need to run through each vector of the list I specify, in this case test_. Then I need to sum the TRUEs for each vector. So again, if we assume the test_ data would result in a single TRUE for each vector (because of the .85 and .99 probabilities), the result would be
IsWorker_
$a
[1] 1
$b
[1] 1
Perhaps lapply isn't the right tool? I have seen a couple of comments on the list that say the plyr package is easy to figure out but you lose out on speed, and that is my issue right now. I can do what I need to do using some for loops, but it's way, way too slow. Any guidance is appreciated.
Thanks guys
Josh
-----Original Message-----
From: Sarah Goslee [mailto:sarah.goslee at gmail.com]
Sent: Thursday, May 31, 2012 1:35 PM
To: ROLL Josh F
Cc: r-help at r-project.org
Subject: Re: [R] Probably a good use for apply
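On the last question: lapply() hands each list element to the function as a whole, so with test_ the argument x is a length-2 vector such as c(.85, .10), not a single probability. One possible sketch, not taken from the thread, draws one uniform random number per probability and counts how many fall below it, which is equivalent to one TRUE/FALSE draw per probability. countWorkers and the other names below are hypothetical.

```r
test_ <- list(a = c(.85, .10), b = c(.99, .05))

# Hypothetical helper: one Bernoulli draw per probability in p,
# returning the number of TRUEs
countWorkers <- function(p) sum(runif(length(p)) < p)

set.seed(42)
IsWorker_ <- lapply(test_, countWorkers)

# Variant for very large lists: draw everything at once, then count per element
p_all <- unlist(test_, use.names = FALSE)
group <- rep(seq_along(test_), lengths(test_))   # list element of each probability
draws <- runif(length(p_all)) < p_all
counts <- tabulate(group[draws], nbins = length(test_))
names(counts) <- names(test_)
```

The second variant calls runif() only once, which may help when the list has many short vectors; whether it is fast enough for a given data set would need to be timed.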