Coding a new variable based on criteria in a dataset

3 messages · RaoulD, Ben Bolker, Hadley Wickham

#
Hi,

I'm a bit stuck and need some help with R code to code a variable F_R based
on a combination of conditions. 

The first condition would code F_R as "F" and would be based on the
min(Date) and min(Time) for each combination of UniqueID & Reason. The
second condition would code the variable as "R", as it would cover the rest
of the data that don't meet the first condition.

For example: for "UID 1" & "Reason 1" the first record would be coded "F"
and the 4th record would be coded "R". 

   UniqueID   Reason       Date  Time
1     UID 1 Reason 1 19/12/2010 15:00
2     UID 1 Reason 2 19/12/2010 16:00
3     UID 1 Reason 3 19/12/2010 16:30
4     UID 1 Reason 1 20/12/2010 08:00
5     UID 1 Reason 2 20/12/2010 10:01
6     UID 1 Reason 3 20/12/2010 11:30
7     UID 1 Reason 1 21/12/2010 12:45
8     UID 1 Reason 2 21/12/2010 18:44
9     UID 1 Reason 3 21/12/2010 19:29
10    UID 2 Reason 1 19/12/2010 17:00
11    UID 2 Reason 2 19/12/2010 18:00
12    UID 2 Reason 3 19/12/2010 18:10
13    UID 2 Reason 1 20/12/2010 13:00
14    UID 2 Reason 2 20/12/2010 13:30
15    UID 2 Reason 3 20/12/2010 16:15

Is a loop the most efficient way to do this, or is there some pre-existing
function that can help me with this? The sample dataset is the one shown
above.

Thanks in advance,
Raoul
#
RaoulD <raoul.t.dsouza <at> gmail.com> writes:
It isn't quite convenient to read the posted data into R
(if it was originally tab-separated, that formatting got lost), but
ddply from the plyr package is good for this: something like (untested)

  library(plyr)
  d <- ddply(data, .(UniqueID, Reason),
             function(x) {
               ## sort by date/time so the earliest record comes first
               x <- x[order(as.Date(as.character(x$Date), "%d/%m/%Y"), x$Time), ]
               x$F_R <- c("F", rep("R", nrow(x) - 1))
               x
             })
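
For anyone who wants to try this, here is one way the posted sample could be rebuilt by hand and sorted before grouping. It is only a sketch: it assumes Date is day/month/year and Time is 24-hour HH:MM, and the DateTime column is a hypothetical helper that is not in the original data.

```r
## Sample data from the question, typed in by hand (column names as posted).
data <- data.frame(
  UniqueID = rep(c("UID 1", "UID 2"), times = c(9, 6)),
  Reason   = rep(paste("Reason", 1:3), times = 5),
  Date     = c(rep(c("19/12/2010", "20/12/2010", "21/12/2010"), each = 3),
               rep(c("19/12/2010", "20/12/2010"), each = 3)),
  Time     = c("15:00", "16:00", "16:30", "08:00", "10:01", "11:30",
               "12:45", "18:44", "19:29", "17:00", "18:00", "18:10",
               "13:00", "13:30", "16:15"),
  stringsAsFactors = FALSE
)

## Combine Date and Time into a single sortable timestamp
## (DateTime is a hypothetical helper column).
data$DateTime <- as.POSIXct(paste(data$Date, data$Time),
                            format = "%d/%m/%Y %H:%M", tz = "UTC")

## Order rows so the earliest record of each UniqueID/Reason pair comes first.
data <- data[order(data$UniqueID, data$Reason, data$DateTime), ]
```

With the rows ordered this way, the first row of every UniqueID/Reason group is the one that should be coded "F".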
[snip]
1 day later
#
Or a little more succinctly:

d <- ddply(data, c("UniqueID", "Reason"), transform,
           F_R = c("F", rep("R", length(Date) - 1)))

Hadley
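
If plyr is not available, something similar can probably be done in base R with ave(). The following is a rough sketch on a small subset of the posted rows, not a tested replacement for the ddply calls above. It assumes day/month/year dates and HH:MM times; stamp, key and pos are hypothetical helper names. Unlike the ddply versions, it does not need the rows to be sorted first.

```r
## A small subset of the posted data, enough to show the idea.
data <- data.frame(
  UniqueID = c("UID 1", "UID 1", "UID 1", "UID 2", "UID 2"),
  Reason   = c("Reason 1", "Reason 2", "Reason 1", "Reason 1", "Reason 1"),
  Date     = c("19/12/2010", "19/12/2010", "20/12/2010",
               "19/12/2010", "20/12/2010"),
  Time     = c("15:00", "16:00", "08:00", "17:00", "13:00"),
  stringsAsFactors = FALSE
)

## Numeric timestamp built from Date and Time (hypothetical helper).
stamp <- as.numeric(as.POSIXct(paste(data$Date, data$Time),
                               format = "%d/%m/%Y %H:%M", tz = "UTC"))

## One grouping key per UniqueID/Reason combination.
key <- paste(data$UniqueID, data$Reason)

## ave() applies FUN within each group and returns a vector aligned with
## the original rows; here it gives each row's position in time order.
pos <- ave(stamp, key, FUN = function(s) rank(s, ties.method = "first"))

## The earliest record in each group is "F", everything else is "R".
data$F_R <- ifelse(pos == 1, "F", "R")
```

Using rank() with ties.method = "first" means exactly one row per group is coded "F", even if two records share the same date and time.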