subsetting question

7 messages · MacQueen, Don, Vin Cheng, Duncan Murdoch +1 more

#
Assuming datums is a vector of the unique dates in Date... perhaps
  datums <- sort(unique(dataset1$Date))

I usually set it up like this

for (i in seq_along(datums)) {

  crnt.date <- datums[i]
  tmpdat <- subset(dataset1, Date==crnt.date)
  cat(i, format(crnt.date), 'dim(tmpdat)',dim(tmpdat),'\n\n')

 ## use tmpdat for the multiple actions

}

The extra step of creating a subset helps one check that everything is
working as expected. It has no noticeable effect on performance with
datasets of the size I normally work with.
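
A minimal self-contained sketch of the loop above, for anyone who wants to run it. The toy `dataset1`, its `value` column, and the `row.counts` vector are hypothetical stand-ins, not part of the original data:

```r
# Hypothetical toy data standing in for dataset1
dataset1 <- data.frame(
  Date  = as.Date(c("2015-05-18", "2015-05-18", "2015-05-19", "2015-05-20")),
  value = c(1, 2, 3, 4)
)

datums <- sort(unique(dataset1$Date))

# One slot per date, filled inside the loop
row.counts <- integer(length(datums))

# seq_along() also behaves sensibly when datums is empty
for (i in seq_along(datums)) {
  crnt.date <- datums[i]
  tmpdat <- subset(dataset1, Date == crnt.date)
  row.counts[i] <- nrow(tmpdat)   # stand-in for "the multiple actions"
}

names(row.counts) <- format(datums)
row.counts
```

Because every date gets its own slot before the loop starts, a date with an empty subset would still leave a value (here 0) in its position.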
#
Hi,
 
I'm trying to pick out the rows in a dataframe with SPCLORatingValue > 16 and sum the Wgt values that correspond to this condition.  There are 100 dataframes in a list.
 
Some of the dataframes won't have any rows that have this condition SPCLORatingValue>16 and therefore no corresponding weight.  
 
My problem is that I need to have a corresponding value for each dataframe in the list - so 100 values. 
 
If dataframe 44 doesn't have any SPCLORatingValue>16, then I end up getting a vector that's 99 long instead of 100, which puts value 45 into 44's slot and so on.
 
Is there either an if/else statement or argument I can place into subset to put a 0 for the data frames that don't have SPCLORatingValue>16?
 
GenEval[18,1:100] <- t(summaryBy(Wgt.sum~as.numeric(.id),data=subset(ldply(Generation,function(x) summaryBy(Wgt ~ SPCLORatingValue, data=x, FUN=c(sum))),SPCLORatingValue>16),FUN=c(sum),order=FALSE))
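
The shrinking result described above can be reproduced in base R with a small sketch. The list `frames`, the `id` column, and the toy values are hypothetical; the point is only that filtering rows before aggregating removes any group that has no qualifying rows:

```r
# Hypothetical list of three data frames; B has no rating above 16
frames <- list(
  A = data.frame(Wgt = c(1, 2),   SPCLORatingValue = c(17, 10)),
  B = data.frame(Wgt = c(4, 8),   SPCLORatingValue = c(12, 15)),
  C = data.frame(Wgt = c(16, 32), SPCLORatingValue = c(19, 20))
)

# Stack the frames with an id column recording where each row came from
stacked <- data.frame(
  id = rep(names(frames), times = vapply(frames, nrow, integer(1))),
  do.call(rbind, frames),
  stringsAsFactors = FALSE
)

# Subsetting first drops every row of B, so B never reaches the aggregation
kept <- subset(stacked, SPCLORatingValue > 16)
sums <- tapply(kept$Wgt, kept$id, sum)

length(frames)  # number of inputs
length(sums)    # number of results: one fewer, B is missing
```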
 
Any help or guidance would be greatly appreciated!
Many Thanks,
Vince
#
On 20/05/2015 7:13 PM, Vin Cheng wrote:
The summaryBy function is not in base R.  There's a function with that
name in the doBy package; is that the one you're using?

You don't say how to do the grouping, and I can't read your code to
figure it out, but this code will do what you want with suitable inputs:

by(df, group, function(subset) with(subset, sum(Wgt[SPCLORatingValue > 16])))

where df is your dataframe, and group is a variable that defines the groups.
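
A small runnable sketch of that by() call, assuming a hypothetical stacked data frame `df` with a `group` column (the values are made up for illustration):

```r
# Hypothetical stacked data: group "b" has no rating above 16
df <- data.frame(
  group            = c("a", "a", "b", "b", "c"),
  Wgt              = c(1, 2, 4, 8, 16),
  SPCLORatingValue = c(15, 17, 10, 12, 20)
)

# One sum per group; an empty selection sums to 0, so "b" keeps its slot
res <- by(df, df$group,
          function(grp) with(grp, sum(Wgt[SPCLORatingValue > 16])))

# Flatten the "by" object into a plain named numeric vector
totals <- setNames(as.numeric(res), names(res))
totals
```

The filter is applied inside the per-group function rather than before grouping, which is why no group is lost.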

Duncan Murdoch
#
Can you show a small self-contained example of your data and expected
results?
I tried to make one and your expression returned a single number in a
1 by 1 matrix.

library(doBy)
library(plyr)  # provides ldply()
Generation <- list(
   data.frame(Wgt=c(1,2,4), SPCLORatingValue=c(10,11,12)),
   data.frame(Wgt=c(8,16), SPCLORatingValue=c(15,17)),
   data.frame(Wgt=c(32,64), SPCLORatingValue=c(19,20)))
t(summaryBy(Wgt.sum ~ as.numeric(.id),
            data=subset(ldply(Generation, function(x)
                              summaryBy(Wgt ~ SPCLORatingValue, data=x, FUN=c(sum))),
                        SPCLORatingValue>16),
            FUN=c(sum), order=FALSE))
#              1
#Wgt.sum.sum 112
str(.Last.value)
# num [1, 1] 112
# - attr(*, "dimnames")=List of 2
#  ..$ : chr "Wgt.sum.sum"
#  ..$ : chr "1"

Two ways of dealing with the problem you verbally described are
(a) determine which elements of the input you can process (e.g., which
have some values>16) and use subscripting on both the left and right
side of the assignment operator to put the results in the right place.
E.g.,
    x <- c(-1, 1, 2)
    ok <- x>0
    x[ok] <- log(x[ok])
(b) make your function handle any case so you don't have to do any
subsetting on either side.  In your case it may be easy since
sum(zeroLongNumericVector) is 0. In other cases you may want to use ifelse,
as in
   x <- c(-1, 1, 2)
   x <- ifelse(x>0, log(x), x)
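
Both approaches might look like this when applied to a list of data frames. This is a sketch on the toy `Generation` list from above; `ok`, `resA`, and `resB` are names introduced only for illustration:

```r
Generation <- list(
  data.frame(Wgt = c(1, 2, 4), SPCLORatingValue = c(10, 11, 12)),
  data.frame(Wgt = c(8, 16),   SPCLORatingValue = c(15, 17)),
  data.frame(Wgt = c(32, 64),  SPCLORatingValue = c(19, 20))
)

# (a) find the elements that can be processed, then subscript on both sides
ok <- vapply(Generation, function(x) any(x$SPCLORatingValue > 16), logical(1))
resA <- numeric(length(Generation))   # starts as all zeros
resA[ok] <- vapply(Generation[ok],
                   function(x) sum(x$Wgt[x$SPCLORatingValue > 16]),
                   numeric(1))

# (b) let the function handle every case: sum() of a zero-length vector is 0
resB <- vapply(Generation,
               function(x) sum(x$Wgt[x$SPCLORatingValue > 16]),
               numeric(1))

resA
resB
```

Either way the result has one entry per input data frame, so nothing shifts into the wrong slot.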



Bill Dunlap
TIBCO Software
wdunlap tibco.com
#
Thanks William/Duncan!
 
Duncan - Yes - I am using the doBy package.
 
Running this line on the sample data below gives weights for V5, V44, and V2.  Ideally I would like 0's for V8 and V10 in the output.
 
So it would look like:
e<-structure(matrix(c("V5", "0.008714910", "V8", "0", "V10", "0", "V44", "0.004357455", "V2", "0.008714910"),nrow = 2))
 
 
This is as far as I've gotten by subsetting and summing:
a<-t(summaryBy(Wgt.sum~as.numeric(.id),data=subset(ldply(c,function(x) summaryBy(Wgt ~ SPCLORatingValue, data=x, FUN=c(sum))),SPCLORatingValue>16),FUN=c(sum),order=FALSE))
 
All help/guidance is much appreciated!  Thanks Vince!
 
Sample data example:
c<-structure(list(V5 = structure(list(WgtBand = c(2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2), Wgt = c(0.00435745520833333, 0.00435745520833333, 
0.00435745520833333, 0.00435745520833333, 0.00435745520833333, 
0.00435745520833333, 0.00435745520833333, 0.00435745520833333, 
0.00435745520833333, 0.00435745520833333, 0.00435745520833333
), SPCLORatingValue = c(11L, 15L, 14L, 15L, 14L, 14L, 16L, 19L, 
13L, 17L, 11L)), .Names = c("WgtBand", "Wgt", "SPCLORatingValue"
), row.names = 12:22, class = "data.frame"), V8 = structure(list(
    WgtBand = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), Wgt = c(0.00435745520833333, 
    0.00435745520833333, 0.00435745520833333, 0.00435745520833333, 
    0.00435745520833333, 0.00435745520833333, 0.00435745520833333, 
    0.00435745520833333, 0.00435745520833333, 0.00435745520833333, 
    0.00435745520833333), SPCLORatingValue = c(14L, 15L, 15L, 
    12L, 15L, 12L, 13L, 15L, 14L, 15L, 14L)), .Names = c("WgtBand", 
"Wgt", "SPCLORatingValue"), row.names = 12:22, class = "data.frame"), 
    V10 = structure(list(WgtBand = c(2, 2, 2, 2, 2, 2, 2, 2, 
    2, 2, 2), Wgt = c(0.00435745520833333, 0.00435745520833333, 
    0.00435745520833333, 0.00435745520833333, 0.00435745520833333, 
    0.00435745520833333, 0.00435745520833333, 0.00435745520833333, 
    0.00435745520833333, 0.00435745520833333, 0.00435745520833333
    ), SPCLORatingValue = c(15L, 13L, 14L, 14L, 13L, 13L, 13L, 
    15L, 15L, 13L, 14L)), .Names = c("WgtBand", "Wgt", "SPCLORatingValue"
    ), row.names = 12:22, class = "data.frame"), V44 = structure(list(
        WgtBand = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), Wgt = c(0.00435745520833333, 
        0.00435745520833333, 0.00435745520833333, 0.00435745520833333, 
        0.00435745520833333, 0.00435745520833333, 0.00435745520833333, 
        0.00435745520833333, 0.00435745520833333, 0.00435745520833333, 
        0.00435745520833333), SPCLORatingValue = c(13L, 14L, 
        16L, 15L, 14L, 14L, 18L, 13L, 16L, 15L, 11L)), .Names = c("WgtBand", 
    "Wgt", "SPCLORatingValue"), row.names = 12:22, class = "data.frame"), 
    V2 = structure(list(WgtBand = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 
    2, 2), Wgt = c(0.00435745520833333, 0.00435745520833333, 
    0.00435745520833333, 0.00435745520833333, 0.00435745520833333, 
    0.00435745520833333, 0.00435745520833333, 0.00435745520833333, 
    0.00435745520833333, 0.00435745520833333, 0.00435745520833333
    ), SPCLORatingValue = c(13L, 14L, 15L, 15L, 15L, 14L, 12L, 
    16L, 17L, 15L, 19L)), .Names = c("WgtBand", "Wgt", "SPCLORatingValue"
    ), row.names = 12:22, class = "data.frame")), .Names = c("V5", 
"V8", "V10", "V44", "V2"))
 
From: wdunlap at tibco.com
Date: Wed, 20 May 2015 22:12:01 -0700
Subject: Re: [R] Subset and 0 replace?
To: newrnewbie at hotmail.com
CC: r-help at r-project.org

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
I renamed your 'c' to be 'toyData' and your 'e' to be 'desiredResult'.  Do you
want the following, which uses only base R code?

vapply(toyData,
       FUN=function(V)with(V, sum(Wgt[SPCLORatingValue>16])),
       FUN.VALUE=0)
#         V5          V8         V10         V44          V2
#0.008714910 0.000000000 0.000000000 0.004357455 0.008714910

It is what is in your desired result but in a more useful format (e.g., numbers
instead of character strings for the sums).

desiredResult
#     [,1]          [,2] [,3]  [,4]          [,5]
#[1,] "V5"          "V8" "V10" "V44"         "V2"
#[2,] "0.008714910" "0"  "0"   "0.004357455" "0.008714910"
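
A sketch of how such a named vector of sums might be written back into one row of a results matrix like the GenEval from the original question. The small `toyData` list and the dimensions of `GenEval` here are hypothetical, chosen only to keep the example short:

```r
# Hypothetical named list of data frames; V8 has no rating above 16
toyData <- list(
  V5  = data.frame(Wgt = c(0.5, 0.25), SPCLORatingValue = c(19L, 17L)),
  V8  = data.frame(Wgt = c(0.5, 0.25), SPCLORatingValue = c(14L, 15L)),
  V44 = data.frame(Wgt = c(0.5, 0.25), SPCLORatingValue = c(18L, 13L))
)

# Hypothetical results matrix: one column per data frame
GenEval <- matrix(NA_real_, nrow = 20, ncol = length(toyData),
                  dimnames = list(NULL, names(toyData)))

# vapply() returns exactly one number per list element, zeros included,
# so the assignment always has the right length
GenEval[18, ] <- vapply(toyData,
                        FUN = function(V) with(V, sum(Wgt[SPCLORatingValue > 16])),
                        FUN.VALUE = 0)

GenEval[18, ]
```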


Bill Dunlap
TIBCO Software
wdunlap tibco.com
#
This is perfect!  Thanks William!!!

Vince