a replace for subset

5 messages · James C. Whanger, Elahe chalabi, Jeff Newmiller

#
Hi, 
I have a data set (mydata), part of which looks like the following:


'data.frame':   36190 obs. of 16 variables:
$ RE      : int  38 41 11 67 30 18 38 41 41 30 ...
$ LU      : int  4200 3330 530 4500 3000 1790 4700 3400 3640 4000 ...
$ COUNTRY : Factor w/ 4 levels "DE","FR","JP", "FR" ...
$ Light   : Factor w/ 2 levels "ON","OFF","ON", ...
$ OR      : Factor w/ 2 levels "S","T","S", ...
$ PAT     : Factor w/ 3 levels "low", "high", "middle", ...


Now I want to plot RE vs LU with ggplot2 for every possible combination of the factors. I know how to subset the data, but is there a shorter way? For example, I want one plot of RE vs LU for (COUNTRY=FR, Light=OFF, OR=S, PAT=low), another for (COUNTRY=FR, Light=ON, OR=S, PAT=high), and so on. As you can see, subsetting each case by hand is time consuming. Is there another way?
Thank you for any help. 
Elahe
#
Would facet_wrap or facet_grid give you what you want?
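
For readers who have not used these functions, here is a minimal sketch of the faceting idea. It is not part of the original reply: `demo` is a hypothetical stand-in for mydata that reuses the column names from the question, with made-up values. facet_grid can cross several factors on the rows and on the columns, so each combination gets its own panel in a single figure.

```r
library(ggplot2)

# Hypothetical stand-in for mydata; values are made up, column names
# follow the question.
set.seed(1)
demo <- data.frame(
  RE      = sample(5:50, 200, TRUE),
  LU      = sample(1500:4500, 200),
  COUNTRY = factor(sample(c("DE", "FR", "JP"), 200, TRUE)),
  Light   = factor(sample(c("ON", "OFF"), 200, TRUE)),
  OR      = factor(sample(c("S", "T"), 200, TRUE)),
  PAT     = factor(sample(c("low", "high", "middle"), 200, TRUE))
)

# Rows are COUNTRY crossed with Light, columns are OR crossed with PAT,
# so every combination of the four factors is drawn in its own panel.
p <- ggplot(demo, aes(x = RE, y = LU)) +
  geom_point() +
  facet_grid(COUNTRY + Light ~ OR + PAT)
print(p)
```

With many levels the panels get small, which is the trade-off against producing one plot per subset.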

On Sat, Apr 16, 2016 at 8:45 AM, ch.elahe via R-help <r-help at r-project.org>
wrote:

  
    
#
Thank you James. The problem with my kind of data is that there can be many possible subsets, and therefore many plots, and I want to generate them automatically. facet_wrap does not give me all the possible cases.
On Saturday, April 16, 2016 6:01 AM, James C. Whanger <james.whanger at gmail.com> wrote:
Would facet_wrap or facet_grid give you what you want?
#
Use the split function to automatically create a list of pre-subsetted 
data frames, and then generate your output however you wish to. For 
example (using Jim Lemon's sample data generator):

library(ggplot2)

mydata <- data.frame( RE = sample( 5:50, 100, TRUE)
                     , LU = sample( 1500:4500, 100 )
                     , COUNTRY = factor( sample( c( "DE","FR","JP","AU")
                                               , 100
                                               , TRUE
                                               )
                                       )
                     , Light = factor( sample( c( "ON", "OFF" )
                                             , 100
                                             , TRUE
                                             )
                                     )
                     , OR = factor( sample( c( "S", "T" )
                                          , 100
                                          , TRUE
                                          )
                                  )
                     , PAT = factor( sample( c( "low", "high", "middle" )
                                           ,100
                                           ,TRUE
                                           )
                                   )
                     )
# split wants you to specify a list of columns to create unique
# groups by;
# data frames are lists of columns;
# data frame indexing lets you specify a subset of columns
mydataList0 <- split( mydata
                     , mydata[ , c( "COUNTRY", "Light" ) ]
                     )
# you should use the str() function frequently in an interactive
# fashion to help you understand the data you are working with:
str( mydataList0 )

# if you try to specify a single column as a subset of columns,
# R will by default forget the "list of" aspect... to keep it, use 
# drop=FALSE
mydataList <- split( mydata
                    , mydata[ , c( "COUNTRY" ), drop = FALSE ]
                    )

# I happen to like packing information into a single plot where possible.
# Since you did not provide a minimal reproducible example, I cannot
# tell whether this will work for you. You can use some variant of 
# mydataList0 if you don't like this approach.
for ( idx in seq_along( mydataList ) ) {
     print( ggplot( mydataList[[ idx ]], aes( x=RE, y=LU, shape=Light ) ) +
             geom_point() +
             facet_grid( PAT ~ OR ) +
             ggtitle( paste( "Country ="
                           , mydataList[[ idx ]][1,"COUNTRY"]))
     )
}
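
If one plot per combination of all four factors is wanted, as in the original question, the same split idea extends to more grouping columns. The following is only a sketch under stated assumptions and not part of the reply above: `demo`, `by_cols`, `groups`, and the file-name pattern in the comment are hypothetical names. Passing `drop = TRUE` asks split() to omit combinations that have no rows, so the loop never meets an empty data frame.

```r
# Hypothetical stand-in for mydata; values are made up.
set.seed(42)
demo <- data.frame(
  RE      = sample(5:50, 100, TRUE),
  LU      = sample(1500:4500, 100),
  COUNTRY = factor(sample(c("DE", "FR", "JP", "AU"), 100, TRUE)),
  Light   = factor(sample(c("ON", "OFF"), 100, TRUE)),
  OR      = factor(sample(c("S", "T"), 100, TRUE)),
  PAT     = factor(sample(c("low", "high", "middle"), 100, TRUE))
)

# Group by every combination of the four factors; drop = TRUE removes
# combinations that do not occur in the data.
by_cols <- c("COUNTRY", "Light", "OR", "PAT")
groups  <- split(demo, demo[, by_cols], drop = TRUE)

# Each list element is named after its combination, with the factor
# levels joined by "." in the order given in by_cols.
length(groups)
head(names(groups))

# Plot each subset if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  for (nm in names(groups)) {
    p <- ggplot(groups[[nm]], aes(x = RE, y = LU)) +
      geom_point() +
      ggtitle(nm)
    print(p)
    # To write files instead (the file-name pattern is an assumption):
    # ggsave(paste0("RE_vs_LU_", nm, ".png"), p)
  }
}
```

Using the list names for the titles avoids reading the label back out of the first row of each subset.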

For future reference, the Posting Guide mentions several good practices 
for asking questions online that will help you understand your own problem 
better as well as making it easier for us to provide answers.
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
#
On Sat, 16 Apr 2016, ch.elahe via R-help wrote:

            
Not true.

It may get cramped, but it does give you all of the cases listed in the 
levels of the factor you plot with. If you specify a vector of character 
strings then ggplot will automatically convert it to a factor using only 
those cases present in the data. You can use the "factor" function to 
specify how you want the data represented more precisely than this 
automatic conversion will represent it. Read ?factor.

Note that if you have missing cases throughout, you may encounter 
difficulties plotting some graphs due to not having any data.
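
The difference between the automatic conversion and explicitly specified levels can be seen in base R alone. This is a small sketch with made-up values (`pat_chr`, `auto`, and `full` are hypothetical names), in which the "middle" case never occurs in the data:

```r
# Made-up character data in which the "middle" case never occurs.
pat_chr <- c("low", "high", "low", "high")

# Automatic conversion keeps only the values that are present,
# in sorted order.
auto <- factor(pat_chr)
levels(auto)

# Explicit levels keep the missing case and fix the order.
full <- factor(pat_chr, levels = c("low", "middle", "high"))
levels(full)

# The unused level is still counted, with a frequency of zero.
table(full)
```

Whether an empty level actually gets its own panel may also depend on the `drop` argument of facet_wrap() and facet_grid(); this is worth checking in ?facet_wrap for the installed ggplot2 version.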
[...]