Skip to content

filter() question

7 messages · Dr Eberhard W Lisse, Rasmus Liland, Eric Berger +1 more

#
Hi,

I have a small test sample with lab reports (PAP smears) from a number
of different providers.  These have Collection Dates and the relevant 
columns glimpse() something like this:

$ Provider       <chr> "Dr C", "Dr D", "Dr C", "Dr D"
$ CollectionDate <chr> "2016-11-03", "2016-11-02", "2016-11-03", "2016-11-03"


I am looking to find (filter) the reports which were collected in the
time period common to all providers?

Something like 

	 the largest First Common CollectionDate 
and 
	 the smallest Last Common CollectionDate

How would I do that? 

I can of course do this "manually", ie collect all Providers and their 
first and last Collection dates and then find the Common First and Last 
one, but wonder if there is an elegant way of doing this :-)-O



greetings, el
#
On 2020-08-21 09:03 +0200, Dr Eberhard Lisse wrote:
Dear Eberhard,

Is each report in a csv file with those 
two columns, and you want to unify them 
into a dataframe with CollectionDate 
along the rows, and other details for 
each provider along the columns?  This 
can be done with various apply calls and 
reshape.  Can you please subset some 
more example data here using dput.  It 
makes it so much easier.

/Rasmus

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20200821/c8448814/attachment.sig>
#
Hi Eberhard,
Here is one possibility using dplyr.

library(dplyr)
set.seed(3)

## set up some fake data
dtV <- as.Date("2020-08-01") + 0:4
x <- sample(dtV,20,repl=TRUE)
provider <- sample(LETTERS[1:3],20,repl=TRUE)
lDf <- data.frame(Provider=provider,CollectionDate=x,stringsAsFactors=FALSE)

## get min/max date for each provider
a <- lDf %>% dplyr::group_by( Provider ) %>%
  dplyr::mutate( minDt=min(CollectionDate), maxDt=max(CollectionDate)) %>%
  dplyr::summarize( u = min(minDt), v = max(maxDt) )

## get the common interval
c(max(a$u), min(a$v))

# [1] "2020-08-02" "2020-08-04"

HTH,
Eric
On Fri, Aug 21, 2020 at 12:34 PM Rasmus Liland <jral at posteo.no> wrote:

            

  
  
#
Eric, Rasmus,

thank you very much,

	 ALLPAP  %>%
		 group_by(Provider) %>%
		 mutate( minDt=min(CollectionDate),
			 maxDt=max(CollectionDate)) %>%
		 summarize( minDt = min(minDt),
			 maxDt = max(maxDt), .groups="keep" ) %>%
		 ungroup() %>%
		 mutate(MAX_MIN_DATE = max(minDt),
			 MIN_MAX_DATE = min(maxDt)) %>%
		 distinct(MAX_MIN_DATE, MIN_MAX_DATE)

gives me

	 # A tibble: 1 x 2
		MAX_MIN_DATE MIN_MAX_DATE
		<chr>        <chr>       
	 1 2010-02-05   2019-08-30  

which is correct, and what I wanted.

This is so cool :-)-O

el
On 21/08/2020 11:41, Eric Berger wrote:
[...]
[...]
[...]
-- 
Dr. Eberhard W. Lisse   \         /       Obstetrician & Gynaecologist 
el at lisse.NA             / *      |  Telephone: +264 81 124 6733 (cell)
PO Box 8421 Bachbrecht  \      /  If this email is signed with GPG/PGP
10007, Namibia           ;____/ Sect 20 of Act No. 4 of 2019 may apply
#
Using mutate followed by summarise in this case is completely unnecessary.

a <- (   lDf
     %>% dplyr::group_by( Provider )
     %>% dplyr::summarise( u = min( CollectionDate )
                         ,, v = max( CollectionDate )
                         )
     )
On August 21, 2020 2:41:26 AM PDT, Eric Berger <ericjberger at gmail.com> wrote:

  
    
#
On 2020-08-21 13:45 +0200, Dr Eberhard Lisse wrote:
| 
| Eric, Rasmus,
| 
| thank you very much,
| 
| 	 ALLPAP  %>%
| 		 group_by(Provider) %>%
| 		 mutate( minDt=min(CollectionDate),
| 			 maxDt=max(CollectionDate)) %>%
| 		 summarize( minDt = min(minDt),
| 			 maxDt = max(maxDt), .groups="keep" ) %>%
| 		 ungroup() %>%
| 		 mutate(MAX_MIN_DATE = max(minDt),
| 			 MIN_MAX_DATE = min(maxDt)) %>%
| 		 distinct(MAX_MIN_DATE, MIN_MAX_DATE)
| 
| gives me
| 
| 	 # A tibble: 1 x 2
| 		MAX_MIN_DATE MIN_MAX_DATE
| 		<chr>        <chr>       
| 	 1 2010-02-05   2019-08-30  
| 
| which is correct, and what I wanted.
| 
| This is so cool :-)-O

Dear Eberhard,

handling Dates is a bit tricky in normal 
R, but as long as they are characters, 
like in your example there, everything 
is fine.  So I made this example based 
on Eric's example:

	set.seed(3)
	size <- 20
	x <- as.Date("2016-11-03") + 
	  sample(
	    0:30, 
	    size, 
	    repl=TRUE)
	provider <- paste("Dr", 
	  sample(
	    LETTERS[1:3],
	    size,
	    repl=TRUE))
	lDf <- data.frame(
	  Provider=provider,
	  CollectionDate=x,
	  stringsAsFactors=FALSE)
	
	Provider <- sort(unique(lDf$Provider))
	a <- t(sapply(Provider, function(provider, lDf) {
	    cd <- lDf[
	      lDf$Provider==provider,
	      "CollectionDate"]
	    c("Provider"=provider,
	      as.character(c(
	        "u"=min(cd),
	        "v"=max(cd))))
	  }, lDf=lDf))
	a

which yields

	     Provider u            v
	Dr A "Dr A"   "2016-11-06" "2016-12-01"
	Dr B "Dr B"   "2016-11-07" "2016-12-03"
	Dr C "Dr C"   "2016-11-04" "2016-11-12"

Before I did that, I thought about doing 
something with reshape2, but I could not 
come up with something good.

If you want to work with tibbles in that 
tidyverse thing, which probably can more 
easily work with Dates, rbinding tibbles 
together apparently works:

	a <- lapply(Provider, function(provider, lDf) {
	    cd <- lDf[
	      lDf$Provider==provider,
	      "CollectionDate"]
	    dplyr::tibble(
	      "Provider"=provider,
	      "u"=min(cd),
	      "v"=max(cd))
	  }, lDf=lDf)
	a <- do.call(rbind, a)
	a

which yields

	# A tibble: 3 x 3
	  Provider u          v
	  <chr>    <date>     <date>
	1 Dr A     2016-11-06 2016-12-01
	2 Dr B     2016-11-07 2016-12-03
	3 Dr C     2016-11-04 2016-11-12

Best,
Rasmus

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20200821/189ef2f6/attachment.sig>
#
Rasmus,

thank you,

I am an elderly Gynecologist, dabbling a little, ie exactly the
clientele for which the tidyverse "thingy" was developed :-)-O. 

In addition I like readable code so I later understand what I was trying
to do :-)-O

el
On 2020-08-21 16:15 , Rasmus Liland wrote:
[...]
[...]