Hi Eberhard,
Here is one possibility using dplyr.
library(dplyr)
set.seed(3)
## set up some fake data
dtV <- as.Date("2020-08-01") + 0:4
x <- sample(dtV,20,replace=TRUE)
provider <- sample(LETTERS[1:3],20,replace=TRUE)
lDf <- data.frame(Provider=provider,CollectionDate=x,stringsAsFactors=FALSE)
## get min/max date for each provider
a <- lDf %>% dplyr::group_by( Provider ) %>%
dplyr::mutate( minDt=min(CollectionDate), maxDt=max(CollectionDate)) %>%
dplyr::summarize( u = min(minDt), v = max(maxDt) )
## get the common interval
c(max(a$u), min(a$v))
# [1] "2020-08-02" "2020-08-04"
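To get from the interval to the filtered reports, something along the following lines might work. This is only a sketch: the `reports` data frame and its values are made up for illustration, and it assumes CollectionDate arrives as character (as in the glimpse() output in the question), so it is converted with as.Date() first.

```r
library(dplyr)

## hypothetical sample in the shape described in the question (dates as character)
reports <- data.frame(
  Provider = c("Dr C", "Dr C", "Dr C", "Dr D", "Dr D", "Dr D"),
  CollectionDate = c("2016-11-01", "2016-11-03", "2016-11-05",
                     "2016-11-02", "2016-11-03", "2016-11-07"),
  stringsAsFactors = FALSE
)

## convert the character column to Date before comparing
reports <- reports %>% mutate(CollectionDate = as.Date(CollectionDate))

## first and last collection date per provider
rng <- reports %>%
  group_by(Provider) %>%
  summarize(first = min(CollectionDate), last = max(CollectionDate))

## common interval: the latest first date and the earliest last date
common <- c(max(rng$first), min(rng$last))

## keep only the reports collected inside the common interval
inCommon <- reports %>%
  filter(CollectionDate >= common[1], CollectionDate <= common[2])
```

If common[1] turns out to be later than common[2], the providers have no period in common and the filter returns no rows.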
HTH,
Eric
On Fri, Aug 21, 2020 at 12:34 PM Rasmus Liland <jral at posteo.no> wrote:
On 2020-08-21 09:03 +0200, Dr Eberhard Lisse wrote:
Hi,
I have a small test sample with lab
reports (PAP smears) from a number of
different providers. These have
Collection Dates and the relevant
columns glimpse() something like
this:
$ Provider <chr> "Dr C", "Dr D", "Dr C", "Dr D"
$ CollectionDate <chr> "2016-11-03", "2016-11-02", "2016-11-03",
I am looking to find (filter) the
reports which were collected in the
time period common to all providers.
Something like
the largest First Common CollectionDate
and
the smallest Last Common CollectionDate
How would I do that?
I can of course do this "manually", i.e.
collect all Providers and their first
and last Collection dates and then
find the Common First and Last one,
but wonder if there is an elegant way
of doing this :-)-O
Dear Eberhard,
Is each report in a csv file with those
two columns, and you want to unify them
into a dataframe with CollectionDate
along the rows, and other details for
each provider along the columns? This
can be done with various apply calls and
reshape. Could you please post some
more example data here using dput()? It
makes answering so much easier.
/Rasmus
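
For reference, a base R version of the same idea could look roughly like the sketch below, with dput() used at the end to produce a copy-pasteable sample for the list. The `reports` data frame is invented for illustration, and the split()/lapply() route is just one of several possible apply-style approaches.

```r
## hypothetical sample; the real data would come from the lab reports
reports <- data.frame(
  Provider = c("Dr C", "Dr C", "Dr C", "Dr D", "Dr D", "Dr D"),
  CollectionDate = c("2016-11-01", "2016-11-03", "2016-11-05",
                     "2016-11-02", "2016-11-03", "2016-11-07"),
  stringsAsFactors = FALSE
)
reports$CollectionDate <- as.Date(reports$CollectionDate)

## one vector of dates per provider
byProv <- split(reports$CollectionDate, reports$Provider)

## first and last date per provider; do.call(c, ...) keeps the Date class
firstDt <- do.call(c, unname(lapply(byProv, min)))
lastDt <- do.call(c, unname(lapply(byProv, max)))

## common interval and the reports that fall inside it
common <- c(max(firstDt), min(lastDt))
inCommon <- reports[reports$CollectionDate >= common[1] &
                    reports$CollectionDate <= common[2], ]

## print a small sample in a form that can be pasted into an email
dput(head(reports, 4))
```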