how can I convert a long to wide matrix?

7 messages · Marna Wagley, Jim Lemon, Jeff Newmiller

#
Hi R users,
I am trying to convert a long matrix to a wide one. I have an example and would
like to get the following table (FinalData1):


FinalData1
         B1    B2
id_X   "A"   "B"
id_Y   "A"   "B"

but I got the following table instead:

FinalData1
     B1  B2
id_X "A" "A"
id_Y "A" "B"

The code and the example data I used are given below. Are there any
suggestions for fixing the problem?


dat <- structure(list(ID = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("id_X",
"id_Y"), class = "factor"), EventDate = structure(c(4L, 5L, 2L,
3L, 1L), .Label = c("9/15/16", "9/15/17", "9/7/16", "9/8/16",
"9/9/16"), class = "factor"), timeGroup = structure(c(1L, 1L,
2L, 1L, 2L), .Label = c("B1", "B2"), class = "factor"), SITE = structure(c(1L,
1L, 2L, 1L, 2L), .Label = c("A", "B"), class = "factor")), .Names = c("ID",
"EventDate", "timeGroup", "SITE"), class = "data.frame", row.names = c(NA,
-5L))


tmp <- split(dat, dat$ID)
tmp1 <- do.call(rbind, lapply(tmp, function(dat){
  tb <- table(dat$timeGroup)
  idx <- which(tb>0)
  tb1 <- replace(tb, idx, as.character(dat$SITE))
}))

tmp1
FinalData <- print(tmp1, quote=FALSE)
#
Hi Marna,
Try this:

library(prettyR)
stretch_df(dat,idvar="ID",to.stretch=c("EventDate","SITE"))

Jim
On Wed, May 2, 2018 at 8:24 AM, Marna Wagley <marna.wagley at gmail.com> wrote:
#
Hi Jim,
Thank you very much for your suggestion. I tried it, but it gave me three
sites. I actually have only two sites, "id_X" and "id_Y". In fact,
"A" is repeated twice for "id_X". When a value is repeated, I would like to
keep only the first of the repeated values.

dat <- structure(list(ID = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("id_X",
"id_Y"), class = "factor"), EventDate = structure(c(4L, 5L, 2L,
3L, 1L), .Label = c("9/15/16", "9/15/17", "9/7/16", "9/8/16",
"9/9/16"), class = "factor"), timeGroup = structure(c(1L, 1L,
2L, 1L, 2L), .Label = c("B1", "B2"), class = "factor"), SITE = structure(c(1L,
1L, 2L, 1L, 2L), .Label = c("A", "B"), class = "factor")), .Names = c("ID",
"EventDate", "timeGroup", "SITE"), class = "data.frame", row.names = c(NA,
-5L))

library(prettyR)

stretch_df(dat,idvar="ID",to.stretch=c("EventDate","SITE"))

ID timeGroup EventDate_1 EventDate_2 EventDate_3 SITE_1 SITE_2 SITE_3
1 id_X        B1      9/8/16      9/9/16     9/15/17      A      A      B
2 id_Y        B1      9/7/16     9/15/16        <NA>      A      B   <NA>

Basically, I am looking for a table like the following:

ID timeGroup EventDate_1 EventDate_2 EventDate_3 SITE_1 SITE_2
1 id_X        B1      9/8/16      9/9/16     9/15/17      A      B
2 id_Y        B1      9/7/16     9/15/16        <NA>      A      B

Thanks
On Tue, May 1, 2018 at 3:32 PM, Jim Lemon <drjimlemon at gmail.com> wrote:
#
Hi Marna,
I think this is due to having three rows for id_X and only two for
id_Y. The function creates a data frame with enough columns to hold
the greatest number of values for any ID. Notice that the
SITE_n columns contain three values for id_X (A, A, B) and two for
id_Y (A, B, then NA), as there was no third occasion of measurement for the
latter. Even though there are only two _values_ for SITE, there must
be enough space for three. In your desired output, SITE for the second
occasion of measurement is wrong (it should be "A"), and for the third
occasion it is unknown. Even if there were only one value for SITE in
the original data frame, it should be repeated for the correct number
of observations. I think you may be mixing up case ID with location of
observation.
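
The point about column counts can be checked on the example data by counting rows per ID; the largest count is what determines how many SITE_n and EventDate_n columns are needed. This is only an illustrative sketch: the vector `ID` is retyped from the posted data and `n_per_id` is a name introduced here.

```r
# ID column retyped from the example data (assumed to match the posted dput)
ID <- c("id_X", "id_X", "id_X", "id_Y", "id_Y")

# Number of observations per ID
n_per_id <- table(ID)
n_per_id

# The ID with the most rows sets the number of stretched columns
max(n_per_id)
```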

Jim
On Wed, May 2, 2018 at 8:48 AM, Marna Wagley <marna.wagley at gmail.com> wrote:
#
Hi Jim,
The data set is correct. I took two readings from "SITE A" within a
short time interval, so I want to keep only the first value when values are
repeated within the same "timeGroup".
This is the table I want:

FinalData1
         B1    B2
id_X   "A"   "B"
id_Y   "A"   "B"

thanks,
On Tue, May 1, 2018 at 4:05 PM, Jim Lemon <drjimlemon at gmail.com> wrote:
#
Hi Marna,
This is a condition that the function cannot handle. It would be
possible to reformat the result based on the time intervals, but the
stretch_df function doesn't try to interpret the values, just
stretches them out to a wide format.
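
One possible way to do that reduction before widening, sketched in base R, is to keep only the first row for each ID/timeGroup pair and then fill a character matrix. It assumes the rows are already in date order within each ID, as they are in the posted data; `first_rows`, `ids` and `grps` are names made up for this sketch, and the data are retyped as plain character columns rather than factors. For what it is worth, the original attempt appears to go wrong because replace() is given all three SITE values for id_X (A, A, B) but only two cells to fill, so the first two values are used.

```r
# Example data retyped as character columns (assumed equivalent to the dput)
dat <- data.frame(
  ID        = c("id_X", "id_X", "id_X", "id_Y", "id_Y"),
  EventDate = c("9/8/16", "9/9/16", "9/15/17", "9/7/16", "9/15/16"),
  timeGroup = c("B1", "B1", "B2", "B1", "B2"),
  SITE      = c("A", "A", "B", "A", "B"),
  stringsAsFactors = FALSE
)

# Keep the first row of each ID/timeGroup combination
first_rows <- dat[!duplicated(dat[c("ID", "timeGroup")]), ]

# Empty character matrix with one row per ID and one column per timeGroup
ids  <- unique(dat$ID)
grps <- unique(dat$timeGroup)
FinalData1 <- matrix(NA_character_, nrow = length(ids), ncol = length(grps),
                     dimnames = list(ids, grps))

# Fill cells by (row name, column name) pairs
FinalData1[cbind(first_rows$ID, first_rows$timeGroup)] <- first_rows$SITE
FinalData1
```

Any ID/timeGroup combination that never occurs in the data stays NA in the result.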

Jim
On Wed, May 2, 2018 at 9:16 AM, Marna Wagley <marna.wagley at gmail.com> wrote:
#
Here is a stab in the dark. I agree with Jim that the description of the 
problem is hard to follow. The original posting being in HTML format did 
not help.

#########
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(tidyr)

# indenting was just a side-effect of me cleaning up the HTML mess
dat <- structure( list( ID = structure( c( 1L, 1L, 1L, 2L, 2L)
                                       , .Label = c("id_X","id_Y")
                                       , class = "factor"
                                       )
                       , EventDate = structure( c( 4L, 5L, 2L
                                                 , 3L, 1L )
                                              , .Label = c( "9/15/16"
                                                          , "9/15/17"
                                                          , "9/7/16"
                                                          , "9/8/16"
                                                          , "9/9/16"
                                                          )
                                              , class = "factor"
                                              )
                       , timeGroup = structure( c( 1L, 1L, 2L, 1L, 2L)
                                              , .Label = c("B1", "B2")
                                              , class = "factor"
                                              )
                       , SITE = structure( c( 1L, 1L, 2L, 1L, 2L)
                                         , .Label = c("A", "B" )
                                         , class = "factor"
                                         )
                       )
                 , .Names = c( "ID", "EventDate"
                             , "timeGroup", "SITE")
                 , class = "data.frame"
                 , row.names = c(NA, -5L)
                 )
dat2 <- (   dat
         %>% mutate( EventDate = as.Date( as.character( EventDate )
                                        , format = "%m/%d/%y"
                                        )
                   )
         %>% arrange( ID, timeGroup, EventDate )
         %>% group_by( ID, timeGroup )
         %>% top_n( 1, EventDate )
         %>% ungroup
         )
dat2
#> # A tibble: 4 x 4
#>   ID    EventDate  timeGroup SITE
#>   <fct> <date>     <fct>     <fct>
#> 1 id_X  2016-09-09 B1        A
#> 2 id_X  2017-09-15 B2        B
#> 3 id_Y  2016-09-07 B1        A
#> 4 id_Y  2016-09-15 B2        B
dat3a <- (   dat2
          %>% mutate( timeGroup = paste( "EventDate"
                                       , timeGroup
                                       , sep="_"
                                       )
                    )
          %>% select( ID, timeGroup, EventDate )
          %>% spread( timeGroup, EventDate )
          )
dat3a
#> # A tibble: 2 x 3
#>   ID    EventDate_B1 EventDate_B2
#>   <fct> <date>       <date>
#> 1 id_X  2016-09-09   2017-09-15
#> 2 id_Y  2016-09-07   2016-09-15
dat3b <- (   dat2
          %>% mutate( timeGroup = paste( "SITE"
                                       , timeGroup
                                       , sep = "_"
                                       )
                    )
          %>% select( ID, timeGroup, SITE )
          %>% spread( timeGroup, SITE )
          )
dat3b
#> # A tibble: 2 x 3
#>   ID    SITE_B1 SITE_B2
#>   <fct> <fct>   <fct>
#> 1 id_X  A       B
#> 2 id_Y  A       B
dat4 <- (   dat3a
         %>% left_join( dat3b, by = "ID" ) )
dat4
#> # A tibble: 2 x 5
#>   ID    EventDate_B1 EventDate_B2 SITE_B1 SITE_B2
#>   <fct> <date>       <date>       <fct>   <fct>
#> 1 id_X  2016-09-09   2017-09-15   A       B
#> 2 id_Y  2016-09-07   2016-09-15   A       B
#########
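
Note that top_n( 1, EventDate ) keeps the latest date in each group (2016-09-09 for id_X in B1 above). If the earliest reading in each ID/timeGroup is wanted instead, as the earlier messages suggest, a variant along the following lines might work. It is a sketch only: the data are retyped as character columns, and `first_obs` and `wide` are names introduced here.

```r
library(dplyr)
library(tidyr)

# Example data retyped as character columns (assumed equivalent to the dput)
dat <- data.frame(
  ID        = c("id_X", "id_X", "id_X", "id_Y", "id_Y"),
  EventDate = c("9/8/16", "9/9/16", "9/15/17", "9/7/16", "9/15/16"),
  timeGroup = c("B1", "B1", "B2", "B1", "B2"),
  SITE      = c("A", "A", "B", "A", "B"),
  stringsAsFactors = FALSE
)

# Keep the earliest reading within each ID/timeGroup
first_obs <- dat %>%
  mutate(EventDate = as.Date(EventDate, format = "%m/%d/%y")) %>%
  group_by(ID, timeGroup) %>%
  filter(EventDate == min(EventDate)) %>%
  ungroup()

# One row per ID, one SITE column per timeGroup
wide <- first_obs %>%
  select(ID, timeGroup, SITE) %>%
  spread(timeGroup, SITE)
wide
```

If two readings in a group share the same earliest date, filter() keeps both and spread() will then fail on the duplicate keys, so such ties would need to be resolved first.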
On Wed, 2 May 2018, Jim Lemon wrote:
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k