Probably a good use for apply

7 messages · jim holtman, Jim Lemon, LCOG1 +2 more

#
Hi all,
  I have a data frame, test.., that I would like to convert into the list
test_ shown below, but I am unsure how to do this efficiently.  I can do it
in a for loop, but my data set is huge and it takes forever.  So, again, how
do I go from test.. to test_ below?
#Data frame
test.. <- data.frame(Apples = c(1,3,0,0,1), Pears = c(0,0,1,0,2), Beans =
c(1,2,1,0,0))

#list - my desired outcome
test_ <- list("1" = c("Apples","Beans"),
			  "2" = c("Apples","Apples","Apples","Beans","Beans"),
			  "3" = c("Pears","Beans"),
			  "4" = c(NULL),
			  "5" = c("Apples","Pears","Pears"))

Thanks

Josh

--
View this message in context: http://r.789695.n4.nabble.com/Probably-a-good-use-for-apply-tp4631883.html
Sent from the R help mailing list archive at Nabble.com.
#
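try this:

> test.. <- data.frame(Apples = c(1,3,0,0,1), Pears = c(0,0,1,0,2), Beans =
+ c(1,2,1,0,0))
> lapply(seq_len(nrow(test..)), function(.row){
+     do.call(c, sapply(names(test..), function(.col){
+         rep(.col, test..[[.col]][.row])
+     }))
+ })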
[[1]]
  Apples    Beans
"Apples"  "Beans"

[[2]]
 Apples1  Apples2  Apples3   Beans1   Beans2
"Apples" "Apples" "Apples"  "Beans"  "Beans"

[[3]]
  Pears   Beans
"Pears" "Beans"

[[4]]
character(0)

[[5]]
  Apples   Pears1   Pears2
"Apples"  "Pears"  "Pears"
On Wed, May 30, 2012 at 8:50 PM, LCOG1 <jroll at lcog.org> wrote:

  
    
#
On 05/31/2012 10:50 AM, LCOG1 wrote:
Hi Josh,
How about this?

test..
   Apples Pears Beans
1      1     0     1
2      3     0     2
3      0     1     1
4      0     0     0
5      1     2     0
indices2names<-function(x,xnames) return(rep(xnames,x))
apply(as.matrix(test..),1,indices2names,names(test..))
[[1]]
[1] "Apples" "Beans"

[[2]]
[1] "Apples" "Apples" "Apples" "Beans"  "Beans"

[[3]]
[1] "Pears" "Beans"

[[4]]
character(0)

[[5]]
[1] "Apples" "Pears"  "Pears"

Jim
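
One caveat with the apply() call above: apply() returns a list only when the
per-row results differ in length. If every row happened to produce the same
number of names, it would simplify the result to a matrix. Below is a minimal
sketch of a variant that always returns a list and does not call an R function
once per row, which may help on a large data set (worth timing on real data).
It assumes every count is a non-negative whole number; the names m, counts,
labels, rows and result are illustrative and not from the thread.

```r
test.. <- data.frame(Apples = c(1,3,0,0,1), Pears = c(0,0,1,0,2),
                     Beans = c(1,2,1,0,0))

m <- as.matrix(test..)

# Flatten the counts row by row: row 1's columns first, then row 2's, ...
counts <- as.vector(t(m))

# Column name and row number belonging to each count, in the same order
labels <- rep(colnames(m), times = nrow(m))
rows   <- rep(seq_len(nrow(m)), each = ncol(m))

# Repeat each name by its count, then split by row. The factor lists every
# row number as a level so that rows whose counts are all zero are kept.
result <- split(rep(labels, times = counts),
                factor(rep(rows, times = counts), levels = seq_len(nrow(m))))
```

The list is named "1" to "5" like the desired test_, except that element "4"
is character(0) rather than NULL.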
#
This is great, thank you.  I think I am getting the hang of some of the apply
functions.  I am stuck again, however.  I have the list test_ below and would
like to apply the sample function using each element of each vector as the
probability, returning TRUE or FALSE.  Ultimately I will sum the TRUEs
by vector.

test_ <- list(a=c(.85,.10), b=c(.99,.05))
# Write a function to sample based on labor force participation rates to
# determine presence of workers in household
sampleWorker <- function(x) return(sample(c(TRUE,FALSE), x, replace = TRUE,
                                          prob = c(x, 1-x)))
IsWorker.Hh_ <- lapply(test_, sampleWorker)

I am doing something wrong with the setup because I am getting an error
about specifying probabilities incorrectly.

The result I am looking for is for IsWorker.Hh_ to be the following (assuming
the .85 and .99 probabilities 'win' from each vector and the lower values do
not):
$a
[1] TRUE
$b
[1] TRUE

but ultimately I will need to sum the TRUEs for each vector
$a
[1] 1
$b
[1] 1

   
Thanks 

Josh

--
View this message in context: http://r.789695.n4.nabble.com/Probably-a-good-use-for-apply-tp4631883p4631974.html
#
Hi,
On Thu, May 31, 2012 at 1:08 PM, LCOG1 <jroll at lcog.org> wrote:
Your first problem is that sampleWorker() doesn't run with a single
component of test_ so it can't possibly run in an apply statement.

Please reread ?sample - the second argument is the size of the desired
sample, but what you are passing is a non-integer vector of length 2.
What do you actually want this to be?

Then for prob, you're passing c(x, 1-x), but x is again a non-integer
vector of length 2, so that results in a vector of length 4, which is
longer than the number of options sample() is choosing from.

Do you perhaps want to pass only a single probability at a time? But
even then you need to resolve the size problem.

Sarah
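
One way to address both points is to draw a single value per probability, so
that size is 1 and prob has exactly two entries. The following is only a
sketch under that assumption; sampleOne and Workers.Hh_ are hypothetical
names that do not appear in the thread.

```r
# One draw for a single probability p (a scalar between 0 and 1):
# TRUE with probability p, FALSE with probability 1 - p
sampleOne <- function(p) {
  sample(c(TRUE, FALSE), size = 1, prob = c(p, 1 - p))
}

test_ <- list(a = c(.85, .10), b = c(.99, .05))

# The outer lapply() walks the list; the inner vapply() walks each
# probability within one vector and returns a logical vector
IsWorker.Hh_ <- lapply(test_, function(probs) vapply(probs, sampleOne, logical(1)))

# Number of TRUEs per vector
Workers.Hh_ <- lapply(IsWorker.Hh_, sum)
```

The draws are random, so the counts vary from run to run; call set.seed()
first if reproducible results are needed.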

  
    
#
Yes, you are correct.  I need to change my sample size specification to the number of elements in the vector.

So sampleWorker function should be:

sampleWorker <- function(x) return(sample(c(TRUE,FALSE),length(x), replace = TRUE, prob = c(x, 1-x)))

So this is where I get a little confused with using the apply functions.  Isn't x each element of each vector?  In the sample data I provide there are 4 x's, and each would be put into the sampleWorker function by lapply.
#sample data 
test_<- list(a=c(.85,.10),b=c(.99,.05))
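
On the question of what x is: lapply() over a list calls the function once per list element, so with test_ the function is called twice and each x is a whole vector of length 2, not a single probability. A short sketch to check this (lengthSeen_ and probLength_ are hypothetical names used only for illustration):

```r
test_ <- list(a = c(.85, .10), b = c(.99, .05))

# lapply() calls the function once per list element, so x is a whole vector
lengthSeen_ <- lapply(test_, function(x) length(x))

# c(x, 1 - x) therefore has four entries, while sample() is choosing
# between only two values (TRUE and FALSE)
probLength_ <- lapply(test_, function(x) length(c(x, 1 - x)))
```

By contrast, lapply(c(.9,.1), sampleWorker) iterates over a plain numeric vector, so there each x really is a single number.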

To show what I want using a single vector instead of a list of vectors, see below:

IsWorker.Hh_ <- lapply(c(.9,.1) , sampleWorker)
#Returns:
[[1]]
[1] TRUE

[[2]]
[1] FALSE

Now I just need to run through each vector of the list I specify, in this case test_, and then sum the TRUEs for each vector.  So again, if we assume the test_ data would result in a single TRUE for each vector (because of the .85 and .99 probabilities), the result would be:
$a
[1] 1
$b
[1] 1

Perhaps lapply isn't the right tool?  I have seen a couple of comments on the list saying that the plyr package is easy to figure out but that you lose out on speed, and speed is my issue right now.  I can do what I need with some for loops, but it's far too slow.  Any guidance is appreciated.  Thanks, guys.

Josh
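
Since speed is the concern, one possible approach is to avoid per-element function calls altogether: flatten the list, make all the draws in one vectorized step, and then count the TRUEs by group. This is a sketch, not a tested solution for the real data. It assumes test_ is a named list of probabilities between 0 and 1, and it swaps sample() for a comparison against runif(), which gives TRUE with probability p. The names probs, group, isWorker and workers are illustrative.

```r
test_ <- list(a = c(.85, .10), b = c(.99, .05))

# All probabilities in one vector, plus the name of the vector each came from
probs <- unlist(test_, use.names = FALSE)
group <- rep(names(test_), times = lengths(test_))

# One draw per probability: TRUE with probability probs[i]
isWorker <- runif(length(probs)) < probs

# Count the TRUEs within each original vector, keeping the list's order
workers <- tapply(isWorker, factor(group, levels = names(test_)), sum)
```

Here workers is a named vector of counts with one entry per element of test_; as.list(workers) converts it back to a list if that shape is needed.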



-----Original Message-----
From: Sarah Goslee [mailto:sarah.goslee at gmail.com] 
Sent: Thursday, May 31, 2012 1:35 PM
To: ROLL Josh F
Cc: r-help at r-project.org
Subject: Re: [R] Probably a good use for apply

--
Sarah Goslee
http://www.functionaldiversity.org