Message-ID: <20200821141522.GC68582@posteo.no>
Date: 2020-08-21T14:15:22Z
From: Rasmus Liland
Subject: filter() question
In-Reply-To: <2a3500a0-5862-3288-3b92-932dcbb20083@lisse.NA>

On 2020-08-21 13:45 +0200, Dr Eberhard Lisse wrote:
| 
| Eric, Rasmus,
| 
| thank you very much,
| 
| 	 ALLPAP  %>%
| 		 group_by(Provider) %>%
| 		 mutate( minDt=min(CollectionDate),
| 			 maxDt=max(CollectionDate)) %>%
| 		 summarize( minDt = min(minDt),
| 			 maxDt = max(maxDt), .groups="keep" ) %>%
| 		 ungroup() %>%
| 		 mutate(MAX_MIN_DATE = max(minDt),
| 			 MIN_MAX_DATE = min(maxDt)) %>%
| 		 distinct(MAX_MIN_DATE, MIN_MAX_DATE)
| 
| gives me
| 
| 	 # A tibble: 1 x 2
| 		MAX_MIN_DATE MIN_MAX_DATE
| 		<chr>        <chr>       
| 	 1 2010-02-05   2019-08-30  
| 
| which is correct, and what I wanted.
| 
| This is so cool :-)-O

Dear Eberhard,

handling Dates is a bit tricky in base 
R, but as long as they are characters in 
ISO format (YYYY-MM-DD), like in your 
example, min() and max() order them 
correctly.  So I made this example based 
on Eric's example:

	set.seed(3)
	size <- 20
	x <- as.Date("2016-11-03") + 
	  sample(
	    0:30, 
	    size, 
	    replace=TRUE)
	provider <- paste("Dr", 
	  sample(
	    LETTERS[1:3],
	    size,
	    replace=TRUE))
	lDf <- data.frame(
	  Provider=provider,
	  CollectionDate=x,
	  stringsAsFactors=FALSE)
	
	Provider <- sort(unique(lDf$Provider))
	a <- t(sapply(Provider, function(provider, lDf) {
	    cd <- lDf[
	      lDf$Provider==provider,
	      "CollectionDate"]
	    c("Provider"=provider,
	      as.character(c(
	        "u"=min(cd),
	        "v"=max(cd))))
	  }, lDf=lDf))
	a

which yields

	     Provider u            v
	Dr A "Dr A"   "2016-11-06" "2016-12-01"
	Dr B "Dr B"   "2016-11-07" "2016-12-03"
	Dr C "Dr C"   "2016-11-04" "2016-11-12"
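
The as.character() call in there matters: 
c() starting with a character string 
does not format a Date, it falls back to 
the underlying day count.  A minimal 
sketch of that, assuming ISO formatted 
dates (the variable names d, mixed, 
fixed, d_chr and same_range are made up 
for illustration):

```r
# A single Date, same format as in the example above
d <- as.Date("2016-11-06")

# c() dispatches on its first argument (a character string here),
# so the Date is coerced from its internal day count, not formatted
mixed <- c("Dr A", d)

# Converting first keeps the readable ISO string
fixed <- c("Dr A", as.character(d))

# ISO strings sort like the Dates they represent,
# so min()/max() on characters agree with min()/max() on Dates
d_chr <- c("2016-11-06", "2016-12-01", "2016-11-04")
same_range <- identical(
  c(min(d_chr), max(d_chr)),
  as.character(c(min(as.Date(d_chr)), max(as.Date(d_chr)))))

print(mixed)
print(fixed)
print(same_range)
```

This only holds for zero-padded 
year-month-day strings; something like 
"6/11/2016" would not sort correctly as 
characters.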

Before I did that, I thought about doing 
something with reshape2, but I could not 
come up with anything good.

If you want to work with tibbles in that 
tidyverse thing, which probably can more 
easily work with Dates, rbinding tibbles 
together apparently works:

	a <- lapply(Provider, function(provider, lDf) {
	    cd <- lDf[
	      lDf$Provider==provider,
	      "CollectionDate"]
	    dplyr::tibble(
	      "Provider"=provider,
	      "u"=min(cd),
	      "v"=max(cd))
	  }, lDf=lDf)
	a <- do.call(rbind, a)
	a

which yields

	# A tibble: 3 x 3
	  Provider u          v
	  <chr>    <date>     <date>
	1 Dr A     2016-11-06 2016-12-01
	2 Dr B     2016-11-07 2016-12-03
	3 Dr C     2016-11-04 2016-11-12

Best,
Rasmus
