Skip to content

Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

10 messages · Mark Sharp, Daniel Nordlund, Muhuri, Pradip (SAMHSA/CBHSQ) +1 more

#
Hello,



The example data frame in the reproducible code below has 5 columns (1 column for id and 4 columns for dates), and there are 7 observations.  I would like to insert the most recent date from those 4 date columns into a new column (oiddate) using the mutate() function in the dplyr package.   I am getting correct results (NA in the new column) if a given row has all NA's in the four columns.  However, the issue is that the date value inserted into the new column (oidflag) is incorrect for 5 of the remaining 6 rows (with a non-NA value in at least 1 of the four columns).



I would appreciate receiving your help toward resolving the issue.  Please see the R console and the R script (reproducible example)below.



Thanks in advance.



Pradip





######  from the console ########

print (data2)

  id    mrjdate    cocdate    inhdate    haldate    oidflag

1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2011-11-04

2  2       <NA>       <NA>       <NA>       <NA>       <NA>

3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-11-04

4  4 2007-10-10       <NA>       <NA>       <NA> 2011-11-04

5  5 2006-09-01 2005-08-10       <NA>       <NA> 2011-11-04

6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-11-04

7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04





##################  Reproducible code and data #####################################



library(dplyr)

library(lubridate)

library(zoo)

# data object - description of the



temp <- "id  mrjdate cocdate inhdate haldate

1     2004-11-04 2008-07-18 2005-07-07 2007-11-07

2             NA         NA         NA         NA

3     2009-10-24         NA 2011-10-13         NA

4     2007-10-10         NA         NA         NA

5     2006-09-01 2005-08-10         NA         NA

6     2007-09-04 2011-10-05         NA         NA

7     2005-10-25         NA         NA 2011-11-04"



# read the data object



data1 <- read.table(textConnection(temp),

                    colClasses=c("character", "Date", "Date", "Date", "Date"),

                    header=TRUE, as.is=TRUE

                    )

# create a new column



data2 <- mutate(data1,

                oidflag= ifelse(is.na(mrjdate) & is.na(cocdate) & is.na(inhdate)  & is.na(haldate), NA,

                                  max(mrjdate, cocdate, inhdate, haldate,na.rm=TRUE )

                                )

                )



# convert to date

data2$oidflag = as.Date(data2$oidflag, origin="1970-01-01")



# print records



print (data2)





Pradip K. Muhuri, PhD

SAMHSA/CBHSQ

1 Choke Cherry Road, Room 2-1071

Rockville, MD 20857

Tel: 240-276-1070

Fax: 240-276-1260
#
Pradip,

mutate() works on the entire column as a vector so that you find the maximum of the entire data set.

I am almost certain there is some nice way to handle this, but the sapply() function is a standard approach.

max() does not want a dataframe thus the use of unlist().

Using your definition of data1:

data3 <- data1
data3$oidflag <- as.Date(sapply(seq_along(data3$id), function(row) {
  if (all(is.na(unlist(data1[row, -1])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
  }
  max_d}),
  origin = "1970-01-01")

data3
  id    mrjdate    cocdate    inhdate    haldate    oidflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04



R. Mark Sharp, Ph.D.
Director of Primate Records Database
Southwest National Primate Research Center
Texas Biomedical Research Institute
P.O. Box 760549
San Antonio, TX 78245-0549
Telephone: (210)258-9476
e-mail: msharp at TxBiomed.org





NOTICE:  This E-Mail (including attachments) is confidential and may be legally privileged.  It is covered by the Electronic Communications Privacy Act, 18 U.S.C.2510-2521.  If you are not the intended recipient, you are hereby notified that any retention, dissemination, distribution or copying of this communication is strictly prohibited.  Please reply to the sender that you have received this message in error, then delete it.
#
On 11/8/2014 8:40 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
I am not familiar with the mutate() function from dplyr, but you can get 
your wanted results as follows:

data2 <- within(data1, oidflag <- apply(data1[,-1], 1, max, na.rm=TRUE))


Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA
#
Hi Dan,

Thank you so much for sending me your code that provides me desired results. But, I don't understand  why I am getting the follow warning message, In FUN(newX[, i], ...) : no non-missing arguments, returning NA. Any thoughts?

Regards,

Pradip



data2x <- within(data1, oidflag <- apply(data1[,-1], 1, max, na.rm=TRUE))

Warning message:
In FUN(newX[, i], ...) : no non-missing arguments, returning NA
id    mrjdate    cocdate    inhdate    haldate    oidflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04


Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Daniel Nordlund
Sent: Sunday, November 09, 2014 5:33 AM
To: r-help at r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)
On 11/8/2014 8:40 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
I am not familiar with the mutate() function from dplyr, but you can get your wanted results as follows:

data2 <- within(data1, oidflag <- apply(data1[,-1], 1, max, na.rm=TRUE))


Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
Hi Mark,

Your code has also given me the results I expected.  Thank you so much for your help.

Regards,

Pradip

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260


-----Original Message-----
From: Mark Sharp [mailto:msharp at TxBiomed.org] 
Sent: Sunday, November 09, 2014 3:01 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help at r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

Pradip,

mutate() works on the entire column as a vector so that you find the maximum of the entire data set.

I am almost certain there is some nice way to handle this, but the sapply() function is a standard approach.

max() does not want a dataframe thus the use of unlist().

Using your definition of data1:

data3 <- data1
data3$oidflag <- as.Date(sapply(seq_along(data3$id), function(row) {
  if (all(is.na(unlist(data1[row, -1])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
  }
  max_d}),
  origin = "1970-01-01")

data3
  id    mrjdate    cocdate    inhdate    haldate    oidflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04



R. Mark Sharp, Ph.D.
Director of Primate Records Database
Southwest National Primate Research Center Texas Biomedical Research Institute P.O. Box 760549 San Antonio, TX 78245-0549
Telephone: (210)258-9476
e-mail: msharp at TxBiomed.org





NOTICE:  This E-Mail (including attachments) is confidential and may be legally privileged.  It is covered by the Electronic Communications Privacy Act, 18 U.S.C.2510-2521.  If you are not the intended recipient, you are hereby notified that any retention, dissemination, distribution or copying of this communication is strictly prohibited.  Please reply to the sender that you have received this message in error, then delete it.
#
You could try

library(dplyr)
data1 %>% 

      rowwise() %>%
       mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
                                       na.rm=TRUE), origin='1970-01-01'))
Source: local data frame [7 x 6]
Groups: <by row>

id    mrjdate    cocdate    inhdate    haldate    oldflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04

A.K.
On Saturday, November 8, 2014 11:42 PM, "Muhuri, Pradip (SAMHSA/CBHSQ)" <Pradip.Muhuri at samhsa.hhs.gov> wrote:
Hello,



The example data frame in the reproducible code below has 5 columns (1 column for id and 4 columns for dates), and there are 7 observations.  I would like to insert the most recent date from those 4 date columns into a new column (oiddate) using the mutate() function in the dplyr package.   I am getting correct results (NA in the new column) if a given row has all NA's in the four columns.  However, the issue is that the date value inserted into the new column (oidflag) is incorrect for 5 of the remaining 6 rows (with a non-NA value in at least 1 of the four columns).



I would appreciate receiving your help toward resolving the issue.  Please see the R console and the R script (reproducible example)below.



Thanks in advance.



Pradip





######  from the console ########

print (data2)

  id    mrjdate    cocdate    inhdate    haldate    oidflag

1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2011-11-04

2  2       <NA>       <NA>       <NA>       <NA>       <NA>

3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-11-04

4  4 2007-10-10       <NA>       <NA>       <NA> 2011-11-04

5  5 2006-09-01 2005-08-10       <NA>       <NA> 2011-11-04

6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-11-04

7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04





##################  Reproducible code and data #####################################



library(dplyr)

library(lubridate)

library(zoo)

# data object - description of the



temp <- "id  mrjdate cocdate inhdate haldate

1     2004-11-04 2008-07-18 2005-07-07 2007-11-07

2             NA         NA         NA         NA

3     2009-10-24         NA 2011-10-13         NA

4     2007-10-10         NA         NA         NA

5     2006-09-01 2005-08-10         NA         NA

6     2007-09-04 2011-10-05         NA         NA

7     2005-10-25         NA         NA 2011-11-04"



# read the data object



data1 <- read.table(textConnection(temp),

                    colClasses=c("character", "Date", "Date", "Date", "Date"),

                    header=TRUE, as.is=TRUE

                    )

# create a new column



data2 <- mutate(data1,

                oidflag= ifelse(is.na(mrjdate) & is.na(cocdate) & is.na(inhdate)  & is.na(haldate), NA,

                                  max(mrjdate, cocdate, inhdate, haldate,na.rm=TRUE )

                                )

                )



# convert to date

data2$oidflag = as.Date(data2$oidflag, origin="1970-01-01")



# print records



print (data2)





Pradip K. Muhuri, PhD

SAMHSA/CBHSQ

1 Choke Cherry Road, Room 2-1071

Rockville, MD 20857

Tel: 240-276-1070

Fax: 240-276-1260






______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
Dear Arun,

Thank you so much for sending me the dplyr/mutate() solution to my code.    But,  I am getting the following warning message.  Any suggestions on how to avoid this message?

Pradip

Warning message:
In max(13081, NA_real_, NA_real_, 15282, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf


#################################################################
data1 %>% 
+   
+   rowwise() %>%
+   mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
+                              na.rm=TRUE), origin='1970-01-01'))
Source: local data frame [7 x 6]
Groups: <by row>

  id    mrjdate    cocdate    inhdate    haldate    oldflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
Warning message:
In max(13081, NA_real_, NA_real_, 15282, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf


Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-----Original Message-----
From: arun [mailto:smartpink111 at yahoo.com] 
Sent: Sunday, November 09, 2014 7:00 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ); r-help at r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

You could try

library(dplyr)
data1 %>% 

      rowwise() %>%
       mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
                                       na.rm=TRUE), origin='1970-01-01'))
Source: local data frame [7 x 6]
Groups: <by row>

id    mrjdate    cocdate    inhdate    haldate    oldflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04

A.K.
On Saturday, November 8, 2014 11:42 PM, "Muhuri, Pradip (SAMHSA/CBHSQ)" <Pradip.Muhuri at samhsa.hhs.gov> wrote:
Hello,



The example data frame in the reproducible code below has 5 columns (1 column for id and 4 columns for dates), and there are 7 observations.  I would like to insert the most recent date from those 4 date columns into a new column (oiddate) using the mutate() function in the dplyr package.   I am getting correct results (NA in the new column) if a given row has all NA's in the four columns.  However, the issue is that the date value inserted into the new column (oidflag) is incorrect for 5 of the remaining 6 rows (with a non-NA value in at least 1 of the four columns).



I would appreciate receiving your help toward resolving the issue.  Please see the R console and the R script (reproducible example)below.



Thanks in advance.



Pradip





######  from the console ########

print (data2)

  id    mrjdate    cocdate    inhdate    haldate    oidflag

1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2011-11-04

2  2       <NA>       <NA>       <NA>       <NA>       <NA>

3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-11-04

4  4 2007-10-10       <NA>       <NA>       <NA> 2011-11-04

5  5 2006-09-01 2005-08-10       <NA>       <NA> 2011-11-04

6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-11-04

7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04





##################  Reproducible code and data #####################################



library(dplyr)

library(lubridate)

library(zoo)

# data object - description of the



temp <- "id  mrjdate cocdate inhdate haldate

1     2004-11-04 2008-07-18 2005-07-07 2007-11-07

2             NA         NA         NA         NA

3     2009-10-24         NA 2011-10-13         NA

4     2007-10-10         NA         NA         NA

5     2006-09-01 2005-08-10         NA         NA

6     2007-09-04 2011-10-05         NA         NA

7     2005-10-25         NA         NA 2011-11-04"



# read the data object



data1 <- read.table(textConnection(temp),

                    colClasses=c("character", "Date", "Date", "Date", "Date"),

                    header=TRUE, as.is=TRUE

                    )

# create a new column



data2 <- mutate(data1,

                oidflag= ifelse(is.na(mrjdate) & is.na(cocdate) & is.na(inhdate)  & is.na(haldate), NA,

                                  max(mrjdate, cocdate, inhdate, haldate,na.rm=TRUE )

                                )

                )



# convert to date

data2$oidflag = as.Date(data2$oidflag, origin="1970-01-01")



# print records



print (data2)





Pradip K. Muhuri, PhD

SAMHSA/CBHSQ

1 Choke Cherry Road, Room 2-1071

Rockville, MD 20857

Tel: 240-276-1070

Fax: 240-276-1260






______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
Dear Pradip,
The minimum and maximum of a numeric empty set are ?+Inf? and
        ?-Inf? 

One of the rows in your dataset is all `NAs.`  I am not sure you want to keep that row with all NAs.  You could remove it and run the code or keep it and run with that warning.

data1 <- data1[rowSums(is.na(data1[,-1]))!=4,]

data1 %>% 

      rowwise()%>%
      mutate(oldflag= as.Date(max(mrjdate, cocdate, inhdate, haldate, na.rm=TRUE), origin='1970-01-01')


A.K.
On Sunday, November 9, 2014 9:16 AM, "Muhuri, Pradip (SAMHSA/CBHSQ)" <Pradip.Muhuri at samhsa.hhs.gov> wrote:
Dear Arun,

Thank you so much for sending me the dplyr/mutate() solution to my code.    But,  I am getting the following warning message.  Any suggestions on how to avoid this message?

Pradip

Warning message:
In max(13081, NA_real_, NA_real_, 15282, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf


#################################################################
data1 %>% 
+  
+   rowwise() %>%
+   mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
+                              na.rm=TRUE), origin='1970-01-01'))
Source: local data frame [7 x 6]
Groups: <by row>

  id    mrjdate    cocdate    inhdate    haldate    oldflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
Warning message:
In max(13081, NA_real_, NA_real_, 15282, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf


Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260


-----Original Message-----

Sent: Sunday, November 09, 2014 7:00 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ); r-help at r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

You could try

library(dplyr)
data1 %>% 

      rowwise() %>%
       mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
                                       na.rm=TRUE), origin='1970-01-01'))
Source: local data frame [7 x 6]
Groups: <by row>

id    mrjdate    cocdate    inhdate    haldate    oldflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04

A.K.
On Saturday, November 8, 2014 11:42 PM, "Muhuri, Pradip (SAMHSA/CBHSQ)" <Pradip.Muhuri at samhsa.hhs.gov> wrote:
Hello,



The example data frame in the reproducible code below has 5 columns (1 column for id and 4 columns for dates), and there are 7 observations.  I would like to insert the most recent date from those 4 date columns into a new column (oiddate) using the mutate() function in the dplyr package.   I am getting correct results (NA in the new column) if a given row has all NA's in the four columns.  However, the issue is that the date value inserted into the new column (oidflag) is incorrect for 5 of the remaining 6 rows (with a non-NA value in at least 1 of the four columns).



I would appreciate receiving your help toward resolving the issue.  Please see the R console and the R script (reproducible example)below.



Thanks in advance.



Pradip





######  from the console ########

print (data2)

  id    mrjdate    cocdate    inhdate    haldate    oidflag

1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2011-11-04

2  2       <NA>       <NA>       <NA>       <NA>       <NA>

3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-11-04

4  4 2007-10-10       <NA>       <NA>       <NA> 2011-11-04

5  5 2006-09-01 2005-08-10       <NA>       <NA> 2011-11-04

6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-11-04

7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04





##################  Reproducible code and data #####################################



library(dplyr)

library(lubridate)

library(zoo)

# data object - description of the



temp <- "id  mrjdate cocdate inhdate haldate

1     2004-11-04 2008-07-18 2005-07-07 2007-11-07

2             NA         NA         NA         NA

3     2009-10-24         NA 2011-10-13         NA

4     2007-10-10         NA         NA         NA

5     2006-09-01 2005-08-10         NA         NA

6     2007-09-04 2011-10-05         NA         NA

7     2005-10-25         NA         NA 2011-11-04"



# read the data object



data1 <- read.table(textConnection(temp),

                    colClasses=c("character", "Date", "Date", "Date", "Date"),

                    header=TRUE, as.is=TRUE

                    )

# create a new column



data2 <- mutate(data1,

                oidflag= ifelse(is.na(mrjdate) & is.na(cocdate) & is.na(inhdate)  & is.na(haldate), NA,

                                  max(mrjdate, cocdate, inhdate, haldate,na.rm=TRUE )

                                )

                )



# convert to date

data2$oidflag = as.Date(data2$oidflag, origin="1970-01-01")



# print records



print (data2)





Pradip K. Muhuri, PhD

SAMHSA/CBHSQ

1 Choke Cherry Road, Room 2-1071

Rockville, MD 20857

Tel: 240-276-1070

Fax: 240-276-1260






______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
On 11/9/2014 3:05 AM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
It means what it says.  In this case, for id=2 there are no non-missing 
values.  Since, na.rm was set to true, it is just warning you that since 
there was nothing left to get the max of, it is returning NA.

Dan
#
Hi Arun and Dennis,

This is just an FYI.

You're right - In one row, there are all NA's in  the four  "date" columns.  I have tested below the "TRUEness" of the condition Arun has set.

is.logical(data1[rowSums(is.na(data1[,-1]))!=4,])
[1] FALSE

All these 3 approaches below provide the exact same results.

# Approach 1 (suggested by Arun): The code gives the expected results, but with a warning message.
data1 %>% 

   rowwise() %>%
   mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
                             na.rm=TRUE), origin='1970-01-01'))

# Approach 2: This code (suggested by Dan) does not provide now a warning message although it provided such message earlier.
data2x <- within(data1, oidflag <- apply(data1[,-1], 1, max, na.rm=TRUE))


# Approach 2: This code (suggested by Mark) does not provide a warning message
data2 <- data1
data2$oidflag <- as.Date(sapply(seq_along(data2$id), function(row) {
  if (all(is.na(unlist(data1[row, -1])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
  }
  max_d}),
  origin = "1970-01-01")


##########################  ends here ################

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-----Original Message-----
From: arun [mailto:smartpink111 at yahoo.com] 
Sent: Sunday, November 09, 2014 10:18 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ); r-help at r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)



Dear Pradip,