Skip to content

Excluding fixed number of rows from calculation while summarizing using ddply() function.

9 messages · siddu479, arun, Jeff Newmiller

#
Hello All,

   I have a .csv file( contents shown) below, where I need to calculate
mean(for example) for only the rows highlighted in bold. (i.e. in this
example case I need to exclude the first row and last row(N=1) for each
"*StepNo*" column)

Unique,StepNo,Data1,Data2    #In actual file I have 100 columns and nearly
millions of rows.
A,1,4,5           #Exclude this 1st row for this "StepNo" and "Unique"
combination.
*A,1,5,6 *
A,1,7,8           #Exclude this last row for this "StepNo" and "Unique"
combination.
A,2,9,10         #Exclude this row because this 1st row for this "StepNo"
and "Unique" combination. 
*A,2,45,25*
A,2,10,11      #Exclude this last row for this "StepNo" and "Unique"
combination.
B,2,34,12      #Exclude this 1st row for this "StepNo" and "Unique"
combination. 
*B,2,5,6
B,2,7,8*
B,2,6,7           #Exclude this last row for this "StepNo" and "Unique"
combination.
B,3,1,2           #Exclude this 1st row for this "StepNo" and "Unique"
combination.
*B,3,3,4*
B,3,4,5          #Exclude this last row for this "StepNo" and "Unique"
combination.

My existing code to calculate mean* for all rows* is 
dat <- read.csv("aboveinput.csv", header=T) #Loading Input file
library("plyr")   
*result <- ddply(dat, .(Unique,StepNo), numcolwise(mean))*   # Calculating
mean for each Unique and StepNo combination and summarizing the results.

*I need to modify the above script to exclude some "N number of rows at the
start as well as at the end of each StepNo"*
Something like result <- ddply(dat, .(Unique,StepNo),numcolwise(mean(head n
rows excluded, tail n rows excluded in each StepNo)))  #Just a skeleton
script.

Please revert to me if my question is not clear.







-----
Sidda
Business Analyst Lead
Applied Materials Inc.

--
View this message in context: http://r.789695.n4.nabble.com/Excluding-fixed-number-of-rows-from-calculation-while-summarizing-using-ddply-function-tp4648406.html
Sent from the R help mailing list archive at Nabble.com.
#
dat1<-read.table(text="
Unique,StepNo,Data1,Data2
A,1,4,5?? 
A,1,5,6
A,1,7,8?? 
A,2,9,10? 
A,2,45,25
A,2,10,11 
B,2,34,12 
B,2,5,6
B,2,7,8
B,2,6,7? 
B,3,1,2? 
B,3,3,4
B,3,4,5? 
",sep=",",header=TRUE,stringsAsFactors=FALSE)


dat2<-ddply(dat1,.(Unique,StepNo),function(x) x[c(1,nrow(x)),])

? dat1$newcoldat1<-TRUE
?dat2$newcoldat2<-TRUE
?dat3<-merge(dat1,dat2,all=TRUE)
dat4<-dat3[is.na(dat3$newcoldat2),1:4]

dat4
#?? Unique StepNo Data1 Data2
#2?????? A????? 1???? 5???? 6
#6?????? A????? 2??? 45??? 25
#7?????? B????? 2???? 5???? 6
#9?????? B????? 2???? 7???? 8
#12????? B????? 3???? 3???? 4

?ddply(dat4,.(Unique,StepNo),numcolwise(mean))
#? Unique StepNo Data1 Data2
#1????? A????? 1???? 5???? 6
#2????? A????? 2??? 45??? 25
#3????? B????? 2???? 6???? 7
#4????? B????? 3???? 3???? 4

A.K.

----- Original Message -----
From: siddu479 <onlyfordigitalstuff at gmail.com>
To: r-help at r-project.org
Cc: 
Sent: Sunday, November 4, 2012 9:40 AM
Subject: [R] Excluding fixed number of rows from calculation while summarizing using ddply() function.

Hello All,

?  I have a .csv file( contents shown) below, where I need to calculate
mean(for example) for only the rows highlighted in bold. (i.e. in this
example case I need to exclude the first row and last row(N=1) for each
"*StepNo*" column)

Unique,StepNo,Data1,Data2? ? #In actual file I have 100 columns and nearly
millions of rows.
A,1,4,5? ? ? ? ?  #Exclude this 1st row for this "StepNo" and "Unique"
combination.
*A,1,5,6 *
A,1,7,8? ? ? ? ?  #Exclude this last row for this "StepNo" and "Unique"
combination.
A,2,9,10? ? ? ?  #Exclude this row because this 1st row for this "StepNo"
and "Unique" combination. 
*A,2,45,25*
A,2,10,11? ? ? #Exclude this last row for this "StepNo" and "Unique"
combination.
B,2,34,12? ? ? #Exclude this 1st row for this "StepNo" and "Unique"
combination. 
*B,2,5,6
B,2,7,8*
B,2,6,7? ? ? ? ?  #Exclude this last row for this "StepNo" and "Unique"
combination.
B,3,1,2? ? ? ? ?  #Exclude this 1st row for this "StepNo" and "Unique"
combination.
*B,3,3,4*
B,3,4,5? ? ? ? ? #Exclude this last row for this "StepNo" and "Unique"
combination.

My existing code to calculate mean* for all rows* is 
dat <- read.csv("aboveinput.csv", header=T) #Loading Input file
library("plyr")? 
*result <- ddply(dat, .(Unique,StepNo), numcolwise(mean))*?  # Calculating
mean for each Unique and StepNo combination and summarizing the results.

*I need to modify the above script to exclude some "N number of rows at the
start as well as at the end of each StepNo"*
Something like result <- ddply(dat, .(Unique,StepNo),numcolwise(mean(head n
rows excluded, tail n rows excluded in each StepNo)))? #Just a skeleton
script.

Please revert to me if my question is not clear.







-----
Sidda
Business Analyst Lead
Applied Materials Inc.

--
View this message in context: http://r.789695.n4.nabble.com/Excluding-fixed-number-of-rows-from-calculation-while-summarizing-using-ddply-function-tp4648406.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
Hi,
One more way:
dat1<-read.table(text="
Unique,StepNo,Data1,Data2
A,1,4,5?? 
A,1,5,6
A,1,7,8?? 
A,2,9,10? 
A,2,45,25
A,2,10,11 
B,2,34,12 
B,2,5,6
B,2,7,8
B,2,6,7? 
B,3,1,2? 
B,3,3,4
B,3,4,5? 
",sep=",",header=TRUE,stringsAsFactors=FALSE)

?dat2<-dat1[!(!duplicated(dat1[,1:2])|!duplicated(dat1[,1:2],fromLast=TRUE)),]
library(plyr)

ddply(dat2,.(Unique,StepNo),numcolwise(mean))
#? Unique StepNo Data1 Data2
#1????? A????? 1???? 5???? 6
#2????? A????? 2??? 45??? 25
#3????? B????? 2???? 6???? 7
#4????? B????? 3???? 3???? 4
A.K.


----- Original Message -----
From: siddu479 <onlyfordigitalstuff at gmail.com>
To: r-help at r-project.org
Cc: 
Sent: Sunday, November 4, 2012 9:40 AM
Subject: [R] Excluding fixed number of rows from calculation while summarizing using ddply() function.

Hello All,

?  I have a .csv file( contents shown) below, where I need to calculate
mean(for example) for only the rows highlighted in bold. (i.e. in this
example case I need to exclude the first row and last row(N=1) for each
"*StepNo*" column)

Unique,StepNo,Data1,Data2? ? #In actual file I have 100 columns and nearly
millions of rows.
A,1,4,5? ? ? ? ?  #Exclude this 1st row for this "StepNo" and "Unique"
combination.
*A,1,5,6 *
A,1,7,8? ? ? ? ?  #Exclude this last row for this "StepNo" and "Unique"
combination.
A,2,9,10? ? ? ?  #Exclude this row because this 1st row for this "StepNo"
and "Unique" combination. 
*A,2,45,25*
A,2,10,11? ? ? #Exclude this last row for this "StepNo" and "Unique"
combination.
B,2,34,12? ? ? #Exclude this 1st row for this "StepNo" and "Unique"
combination. 
*B,2,5,6
B,2,7,8*
B,2,6,7? ? ? ? ?  #Exclude this last row for this "StepNo" and "Unique"
combination.
B,3,1,2? ? ? ? ?  #Exclude this 1st row for this "StepNo" and "Unique"
combination.
*B,3,3,4*
B,3,4,5? ? ? ? ? #Exclude this last row for this "StepNo" and "Unique"
combination.

My existing code to calculate mean* for all rows* is 
dat <- read.csv("aboveinput.csv", header=T) #Loading Input file
library("plyr")? 
*result <- ddply(dat, .(Unique,StepNo), numcolwise(mean))*?  # Calculating
mean for each Unique and StepNo combination and summarizing the results.

*I need to modify the above script to exclude some "N number of rows at the
start as well as at the end of each StepNo"*
Something like result <- ddply(dat, .(Unique,StepNo),numcolwise(mean(head n
rows excluded, tail n rows excluded in each StepNo)))? #Just a skeleton
script.

Please revert to me if my question is not clear.







-----
Sidda
Business Analyst Lead
Applied Materials Inc.

--
View this message in context: http://r.789695.n4.nabble.com/Excluding-fixed-number-of-rows-from-calculation-while-summarizing-using-ddply-function-tp4648406.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
Hi Arun,

   Thanks for your reply but your script is removing only one row( first row
and last row) for each Unique and StepNo combination and calculating mean
for the rest of rows.
For below data , your script removing the #'s rows perfectly.
But in reality I may need to ignore *say first 10 rows and last 20 rows for
each Unique and StepNo combination.
* for statistics calculation.
 Unique StepNo Data1 Data2
1      A      1     4     5      #Your script removing this row
successfully.
2      A      1     5     6
3      A      1     7     8    
4      A      1     3     4      
5      A      1     1     1      #Your script removing this row
successfully.
6      B      1     2     4      #Your script removing this row
successfully.
7      B      1     3     5   
8      B      1     4     5
9      B      1     5     6
10    B      1     6     7      #Your script removing this row successfully.

Can you modify your script to get my requirement like below (making it
generic, here *N=2*, removing first 2 lines and last 2 lines.. *sometimes I
may have two numbers N1 & N2 (no.of rows need to be removed from and top and
bottom respectively*)

 Unique StepNo Data1 Data2
1      A      1     4     5      #Ignore this
2      A      1     5     6      #Ignore this
3      A      1     7     8    
4      A      1     3     4      #Ignore this
5      A      1     1     1      #Ignore this
6      B      1     2     4      #Ignore this
7      B      1     3     5      #Ignore this
8      B      1     4     5
9      B      1     5     6     #Ignore this
10    B      1     6     7     #Ignore this

and then calculate the statistics using ddply.

I hope my problem statement is much clear now.







-----
Sidda
Business Analyst Lead
Applied Materials Inc.

--
View this message in context: http://r.789695.n4.nabble.com/Excluding-fixed-number-of-rows-from-calculation-while-summarizing-using-ddply-function-tp4648406p4648447.html
Sent from the R help mailing list archive at Nabble.com.
#
HI Siddu,

Sorry, I looked only at your first statement:
"in this example case I need to exclude the first row and last row(N=1)"

You can try this: 
dat1<-read.table(text="
Unique StepNo Data1 Data2
1????? A????? 1??? 4??? 5???? 
2????? A????? 1??? 5??? 6???? 
3????? A????? 1??? 7??? 8?? 
4????? A????? 1??? 3??? 4???? 
5????? A????? 1??? 1??? 1???? 
6????? B????? 1??? 2??? 4????? 
7????? B????? 1??? 3??? 5???? 
8????? B????? 1??? 4??? 5
9????? B????? 1??? 5??? 6??? 
10??? B????? 1??? 6??? 7?? 
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat2<-ddply(dat1,.(Unique,StepNo), head,2)
?dat3<-ddply(dat1,.(Unique,StepNo), tail,2)
?dat1$newcoldat1<-TRUE
?dat2$newcoldat2<-TRUE
?dat3$newcoldat3<-TRUE
dat4<-merge(merge(dat1,dat2,all=TRUE),dat3,all=TRUE)
?dat5<-dat4[is.na(dat4$newcoldat2) &is.na(dat4$newcoldat3),1:4] 
?ddply(dat5,.(Unique,StepNo),numcolwise(mean)) #not required here as there were only one row for A and B.
#? Unique StepNo Data1 Data2
#1????? A????? 1???? 7???? 8
#2????? B????? 1???? 4???? 5

?
A.K.



----- Original Message -----
From: siddu479 <onlyfordigitalstuff at gmail.com>
To: r-help at r-project.org
Cc: 
Sent: Sunday, November 4, 2012 10:36 PM
Subject: Re: [R] Excluding fixed number of rows from calculation while summarizing using ddply() function.

Hi Arun,

?  Thanks for your reply but your script is removing only one row( first row
and last row) for each Unique and StepNo combination and calculating mean
for the rest of rows.
For below data , your script removing the #'s rows perfectly.
But in reality I may need to ignore *say first 10 rows and last 20 rows for
each Unique and StepNo combination.
* for statistics calculation.
Unique StepNo Data1 Data2
1? ? ? A? ? ? 1? ?  4? ?  5? ? ? #Your script removing this row
successfully.
2? ? ? A? ? ? 1? ?  5? ?  6
3? ? ? A? ? ? 1? ?  7? ?  8? ? 
4? ? ? A? ? ? 1? ?  3? ?  4? ? ? 
5? ? ? A? ? ? 1? ?  1? ?  1? ? ? #Your script removing this row
successfully.
6? ? ? B? ? ? 1? ?  2? ?  4? ? ? #Your script removing this row
successfully.
7? ? ? B? ? ? 1? ?  3? ?  5? 
8? ? ? B? ? ? 1? ?  4? ?  5
9? ? ? B? ? ? 1? ?  5? ?  6
10? ? B? ? ? 1? ?  6? ?  7? ? ? #Your script removing this row successfully.

Can you modify your script to get my requirement like below (making it
generic, here *N=2*, removing first 2 lines and last 2 lines.. *sometimes I
may have two numbers N1 & N2 (no.of rows need to be removed from and top and
bottom respectively*)

Unique StepNo Data1 Data2
1? ? ? A? ? ? 1? ?  4? ?  5? ? ? #Ignore this
2? ? ? A? ? ? 1? ?  5? ?  6? ? ? #Ignore this
3? ? ? A? ? ? 1? ?  7? ?  8? ? 
4? ? ? A? ? ? 1? ?  3? ?  4? ? ? #Ignore this
5? ? ? A? ? ? 1? ?  1? ?  1? ? ? #Ignore this
6? ? ? B? ? ? 1? ?  2? ?  4? ? ? #Ignore this
7? ? ? B? ? ? 1? ?  3? ?  5? ? ? #Ignore this
8? ? ? B? ? ? 1? ?  4? ?  5
9? ? ? B? ? ? 1? ?  5? ?  6? ?  #Ignore this
10? ? B? ? ? 1? ?  6? ?  7? ?  #Ignore this

and then calculate the statistics using ddply.

I hope my problem statement is much clear now.







-----
Sidda
Business Analyst Lead
Applied Materials Inc.

--
View this message in context: http://r.789695.n4.nabble.com/Excluding-fixed-number-of-rows-from-calculation-while-summarizing-using-ddply-function-tp4648406p4648447.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
Hi,
You can also try this:

dat1<-read.table(text="
Unique StepNo Data1 Data2
1????? A????? 1??? 4??? 5??? 
2????? A????? 1??? 5??? 6??? 
3????? A????? 1??? 7??? 8? 
4????? A????? 1??? 3??? 4??? 
5????? A????? 1??? 1??? 1??? 
6????? B????? 1??? 2??? 4???? 
7????? B????? 1??? 3??? 5??? 
8????? B????? 1??? 4??? 5
9????? B????? 1??? 5??? 6?? 
10??? B????? 1??? 6??? 7? 
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat2<-ddply(dat1,.(Unique,StepNo), head,2)
?dat3<-ddply(dat1,.(Unique,StepNo), tail,2)
??? fun1 <- function(x,y,z,...){
???? xp <- do.call("paste", x)
??? yp <- do.call("paste", y)
??? zp <-do.call("paste",z)??? 
???? x[! ((xp %in% yp) | (xp %in% zp)), ]
?}

?res<-fun1(dat1,dat2,dat3)
res
#? Unique StepNo Data1 Data2
#3????? A????? 1???? 7???? 8
#8????? B????? 1???? 4???? 5

ddply(res,.(Unique,StepNo),numcolwise(mean)) 


#or
dat2<-do.call(rbind,lapply(split(dat1,list(dat1$Unique,dat1$StepNo)),function(x) do.call(rbind,list(head(x,2),tail(x,2)))))
fun2 <- function(x,y,...){
???? xp <- do.call("paste", x)
??? yp <- do.call("paste", y)
???? x[! xp %in% yp, ]
?}

?res2<-fun2(dat1,dat2)
?res2
#? Unique StepNo Data1 Data2
#3????? A????? 1???? 7???? 8
#8????? B????? 1???? 4???? 5


ddply(res2,.(Unique,StepNo),numcolwise(mean)) 

A.K.




----- Original Message -----
From: siddu479 <onlyfordigitalstuff at gmail.com>
To: r-help at r-project.org
Cc: 
Sent: Sunday, November 4, 2012 10:36 PM
Subject: Re: [R] Excluding fixed number of rows from calculation while summarizing using ddply() function.

Hi Arun,

?  Thanks for your reply but your script is removing only one row( first row
and last row) for each Unique and StepNo combination and calculating mean
for the rest of rows.
For below data , your script removing the #'s rows perfectly.
But in reality I may need to ignore *say first 10 rows and last 20 rows for
each Unique and StepNo combination.
* for statistics calculation.
Unique StepNo Data1 Data2
1? ? ? A? ? ? 1? ?  4? ?  5? ? ? #Your script removing this row
successfully.
2? ? ? A? ? ? 1? ?  5? ?  6
3? ? ? A? ? ? 1? ?  7? ?  8? ? 
4? ? ? A? ? ? 1? ?  3? ?  4? ? ? 
5? ? ? A? ? ? 1? ?  1? ?  1? ? ? #Your script removing this row
successfully.
6? ? ? B? ? ? 1? ?  2? ?  4? ? ? #Your script removing this row
successfully.
7? ? ? B? ? ? 1? ?  3? ?  5? 
8? ? ? B? ? ? 1? ?  4? ?  5
9? ? ? B? ? ? 1? ?  5? ?  6
10? ? B? ? ? 1? ?  6? ?  7? ? ? #Your script removing this row successfully.

Can you modify your script to get my requirement like below (making it
generic, here *N=2*, removing first 2 lines and last 2 lines.. *sometimes I
may have two numbers N1 & N2 (no.of rows need to be removed from and top and
bottom respectively*)

Unique StepNo Data1 Data2
1? ? ? A? ? ? 1? ?  4? ?  5? ? ? #Ignore this
2? ? ? A? ? ? 1? ?  5? ?  6? ? ? #Ignore this
3? ? ? A? ? ? 1? ?  7? ?  8? ? 
4? ? ? A? ? ? 1? ?  3? ?  4? ? ? #Ignore this
5? ? ? A? ? ? 1? ?  1? ?  1? ? ? #Ignore this
6? ? ? B? ? ? 1? ?  2? ?  4? ? ? #Ignore this
7? ? ? B? ? ? 1? ?  3? ?  5? ? ? #Ignore this
8? ? ? B? ? ? 1? ?  4? ?  5
9? ? ? B? ? ? 1? ?  5? ?  6? ?  #Ignore this
10? ? B? ? ? 1? ?  6? ?  7? ?  #Ignore this

and then calculate the statistics using ddply.

I hope my problem statement is much clear now.







-----
Sidda
Business Analyst Lead
Applied Materials Inc.

--
View this message in context: http://r.789695.n4.nabble.com/Excluding-fixed-number-of-rows-from-calculation-while-summarizing-using-ddply-function-tp4648406p4648447.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
This is not a free do-my-work-for-me forum, this is a help-you-learn-R forum. If you study the solution you have been given until you understand how it works, and combine that with an understanding of indexing from the Introduction to R document that comes with R, you should be able to complete your code the way you want it. Don't abuse the generosity of your helpers.
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.
siddu479 <onlyfordigitalstuff at gmail.com> wrote:

            
#
Thanks Arun.. I appreciate your time.. in making the generic script.

I  believe that this is not a very trivial question..






-----
Sidda
Business Analyst Lead
Applied Materials Inc.

--
View this message in context: http://r.789695.n4.nabble.com/Excluding-fixed-number-of-rows-from-calculation-while-summarizing-using-ddply-function-tp4648406p4648546.html
Sent from the R help mailing list archive at Nabble.com.
#
Hi Jeff,

  First of all I am not a good programmer in R.
I have a 2 line code in shell script using awk, head, tail and for loop
combination that accomplish this task.
So I thought of using R itself to do this task but I don't have any idea how
to do it in R.
Any way I will try to learn my own where ever possible and don't keep any
free do my work commitments.





-----
Sidda
Business Analyst Lead
Applied Materials Inc.

--
View this message in context: http://r.789695.n4.nabble.com/Excluding-fixed-number-of-rows-from-calculation-while-summarizing-using-ddply-function-tp4648406p4648547.html
Sent from the R help mailing list archive at Nabble.com.