Help finding first value in a BY group

Fri, Mar 15, 2013 11:56 AM

Hi,

There is a potential gotcha with the approach of using head(..., 1) in each of the solutions that Arun has below, which is the assumption that the data is sorted, as is the case in the example data. It seems reasonable to consider that the real data at hand may not be entered in order or presorted.

If the data is not sorted (switching the order of the two K2 related entries):

Period <- c(1, 2, 3, 1, 2, 3, 4, 2, 1)
Forecast <- c(99, 103, 128, 63, 69, 72, 75, 201, 207)
SKU <- c("A1","A1","A1","X4","X4","X4","X4","K2","K2")

PeriodSKUForecast <- data.frame(Period, SKU, Forecast)

Period SKU Forecast
1      1  A1       99
2      2  A1      103
3      3  A1      128
4      1  X4       63
5      2  X4       69
6      3  X4       72
7      4  X4       75
8      2  K2      201
9      1  K2      207

A1  K2  X4 
 99 201  63

SKU Forecast
1  A1       99
2  K2      201
3  X4       63


Note that the wrong value for K2 is returned.

You would either have to pre-sort the data frame before using these approaches:

NewDF <- PeriodSKUForecast[with(PeriodSKUForecast, order(SKU, Period)), ]

Period SKU Forecast
1      1  A1       99
2      2  A1      103
3      3  A1      128
9      1  K2      207
8      2  K2      201
4      1  X4       63
5      2  X4       69
6      3  X4       72
7      4  X4       75

A1  K2  X4 
 99 207  63 


Or consider an approach that does not depend upon the sort order, but which subsets based upon the minimum value of Period for each SKU:

do.call(rbind, lapply(split(PeriodSKUForecast, PeriodSKUForecast$SKU), 
                      function(x) x[which.min(x$Period), ]))
   Period SKU Forecast
A1      1  A1       99
K2      1  K2      207
X4      1  X4       63

or remove the Period column if you don't want it:

function(x) x[which.min(x$Period), -1]))
   SKU Forecast
A1  A1       99
K2  K2      207
X4  X4       63



Regards,

Marc Schwartz

On Mar 15, 2013, at 12:37 PM, arun <smartpink111 at yahoo.com> wrote:

Help finding first value in a BY group

Thread (4 messages)