Hello All,
I have a .csv file( contents shown) below, where I need to calculate
mean(for example) for only the rows highlighted in bold. (i.e. in this
example case I need to exclude the first row and last row(N=1) for each
"*StepNo*" column)
Unique,StepNo,Data1,Data2 #In actual file I have 100 columns and nearly
millions of rows.
A,1,4,5 #Exclude this 1st row for this "StepNo" and "Unique"
combination.
*A,1,5,6 *
A,1,7,8 #Exclude this last row for this "StepNo" and "Unique"
combination.
A,2,9,10 #Exclude this row because this 1st row for this "StepNo"
and "Unique" combination.
*A,2,45,25*
A,2,10,11 #Exclude this last row for this "StepNo" and "Unique"
combination.
B,2,34,12 #Exclude this 1st row for this "StepNo" and "Unique"
combination.
*B,2,5,6
B,2,7,8*
B,2,6,7 #Exclude this last row for this "StepNo" and "Unique"
combination.
B,3,1,2 #Exclude this 1st row for this "StepNo" and "Unique"
combination.
*B,3,3,4*
B,3,4,5 #Exclude this last row for this "StepNo" and "Unique"
combination.
My existing code to calculate mean* for all rows* is
dat <- read.csv("aboveinput.csv", header=T) #Loading Input file
library("plyr")
*result <- ddply(dat, .(Unique,StepNo), numcolwise(mean))* # Calculating
mean for each Unique and StepNo combination and summarizing the results.
*I need to modify the above script to exclude some "N number of rows at the
start as well as at the end of each StepNo"*
Something like result <- ddply(dat, .(Unique,StepNo),numcolwise(mean(head n
rows excluded, tail n rows excluded in each StepNo))) #Just a skeleton
script.
Please revert to me if my question is not clear.
-----
Sidda
Business Analyst Lead
Applied Materials Inc.
--
View this message in context: http://r.789695.n4.nabble.com/Excluding-fixed-number-of-rows-from-calculation-while-summarizing-using-ddply-function-tp4648406.html
Sent from the R help mailing list archive at Nabble.com.
Excluding fixed number of rows from calculation while summarizing using ddply() function.
9 messages · siddu479, arun, Jeff Newmiller
dat1<-read.table(text="
Unique,StepNo,Data1,Data2
A,1,4,5??
A,1,5,6
A,1,7,8??
A,2,9,10?
A,2,45,25
A,2,10,11
B,2,34,12
B,2,5,6
B,2,7,8
B,2,6,7?
B,3,1,2?
B,3,3,4
B,3,4,5?
",sep=",",header=TRUE,stringsAsFactors=FALSE)
dat2<-ddply(dat1,.(Unique,StepNo),function(x) x[c(1,nrow(x)),])
? dat1$newcoldat1<-TRUE
?dat2$newcoldat2<-TRUE
?dat3<-merge(dat1,dat2,all=TRUE)
dat4<-dat3[is.na(dat3$newcoldat2),1:4]
dat4
#?? Unique StepNo Data1 Data2
#2?????? A????? 1???? 5???? 6
#6?????? A????? 2??? 45??? 25
#7?????? B????? 2???? 5???? 6
#9?????? B????? 2???? 7???? 8
#12????? B????? 3???? 3???? 4
?ddply(dat4,.(Unique,StepNo),numcolwise(mean))
#? Unique StepNo Data1 Data2
#1????? A????? 1???? 5???? 6
#2????? A????? 2??? 45??? 25
#3????? B????? 2???? 6???? 7
#4????? B????? 3???? 3???? 4
A.K.
----- Original Message -----
From: siddu479 <onlyfordigitalstuff at gmail.com>
To: r-help at r-project.org
Cc:
Sent: Sunday, November 4, 2012 9:40 AM
Subject: [R] Excluding fixed number of rows from calculation while summarizing using ddply() function.
Hello All,
? I have a .csv file( contents shown) below, where I need to calculate
mean(for example) for only the rows highlighted in bold. (i.e. in this
example case I need to exclude the first row and last row(N=1) for each
"*StepNo*" column)
Unique,StepNo,Data1,Data2? ? #In actual file I have 100 columns and nearly
millions of rows.
A,1,4,5? ? ? ? ? #Exclude this 1st row for this "StepNo" and "Unique"
combination.
*A,1,5,6 *
A,1,7,8? ? ? ? ? #Exclude this last row for this "StepNo" and "Unique"
combination.
A,2,9,10? ? ? ? #Exclude this row because this 1st row for this "StepNo"
and "Unique" combination.
*A,2,45,25*
A,2,10,11? ? ? #Exclude this last row for this "StepNo" and "Unique"
combination.
B,2,34,12? ? ? #Exclude this 1st row for this "StepNo" and "Unique"
combination.
*B,2,5,6
B,2,7,8*
B,2,6,7? ? ? ? ? #Exclude this last row for this "StepNo" and "Unique"
combination.
B,3,1,2? ? ? ? ? #Exclude this 1st row for this "StepNo" and "Unique"
combination.
*B,3,3,4*
B,3,4,5? ? ? ? ? #Exclude this last row for this "StepNo" and "Unique"
combination.
My existing code to calculate mean* for all rows* is
dat <- read.csv("aboveinput.csv", header=T) #Loading Input file
library("plyr")?
*result <- ddply(dat, .(Unique,StepNo), numcolwise(mean))*? # Calculating
mean for each Unique and StepNo combination and summarizing the results.
*I need to modify the above script to exclude some "N number of rows at the
start as well as at the end of each StepNo"*
Something like result <- ddply(dat, .(Unique,StepNo),numcolwise(mean(head n
rows excluded, tail n rows excluded in each StepNo)))? #Just a skeleton
script.
Please revert to me if my question is not clear.
-----
Sidda
Business Analyst Lead
Applied Materials Inc.
--
View this message in context: http://r.789695.n4.nabble.com/Excluding-fixed-number-of-rows-from-calculation-while-summarizing-using-ddply-function-tp4648406.html
Sent from the R help mailing list archive at Nabble.com.
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hi,
One more way:
dat1<-read.table(text="
Unique,StepNo,Data1,Data2
A,1,4,5??
A,1,5,6
A,1,7,8??
A,2,9,10?
A,2,45,25
A,2,10,11
B,2,34,12
B,2,5,6
B,2,7,8
B,2,6,7?
B,3,1,2?
B,3,3,4
B,3,4,5?
",sep=",",header=TRUE,stringsAsFactors=FALSE)
?dat2<-dat1[!(!duplicated(dat1[,1:2])|!duplicated(dat1[,1:2],fromLast=TRUE)),]
library(plyr)
ddply(dat2,.(Unique,StepNo),numcolwise(mean))
#? Unique StepNo Data1 Data2
#1????? A????? 1???? 5???? 6
#2????? A????? 2??? 45??? 25
#3????? B????? 2???? 6???? 7
#4????? B????? 3???? 3???? 4
A.K.
----- Original Message -----
From: siddu479 <onlyfordigitalstuff at gmail.com>
To: r-help at r-project.org
Cc:
Sent: Sunday, November 4, 2012 9:40 AM
Subject: [R] Excluding fixed number of rows from calculation while summarizing using ddply() function.
Hello All,
? I have a .csv file( contents shown) below, where I need to calculate
mean(for example) for only the rows highlighted in bold. (i.e. in this
example case I need to exclude the first row and last row(N=1) for each
"*StepNo*" column)
Unique,StepNo,Data1,Data2? ? #In actual file I have 100 columns and nearly
millions of rows.
A,1,4,5? ? ? ? ? #Exclude this 1st row for this "StepNo" and "Unique"
combination.
*A,1,5,6 *
A,1,7,8? ? ? ? ? #Exclude this last row for this "StepNo" and "Unique"
combination.
A,2,9,10? ? ? ? #Exclude this row because this 1st row for this "StepNo"
and "Unique" combination.
*A,2,45,25*
A,2,10,11? ? ? #Exclude this last row for this "StepNo" and "Unique"
combination.
B,2,34,12? ? ? #Exclude this 1st row for this "StepNo" and "Unique"
combination.
*B,2,5,6
B,2,7,8*
B,2,6,7? ? ? ? ? #Exclude this last row for this "StepNo" and "Unique"
combination.
B,3,1,2? ? ? ? ? #Exclude this 1st row for this "StepNo" and "Unique"
combination.
*B,3,3,4*
B,3,4,5? ? ? ? ? #Exclude this last row for this "StepNo" and "Unique"
combination.
My existing code to calculate mean* for all rows* is
dat <- read.csv("aboveinput.csv", header=T) #Loading Input file
library("plyr")?
*result <- ddply(dat, .(Unique,StepNo), numcolwise(mean))*? # Calculating
mean for each Unique and StepNo combination and summarizing the results.
*I need to modify the above script to exclude some "N number of rows at the
start as well as at the end of each StepNo"*
Something like result <- ddply(dat, .(Unique,StepNo),numcolwise(mean(head n
rows excluded, tail n rows excluded in each StepNo)))? #Just a skeleton
script.
Please revert to me if my question is not clear.
-----
Sidda
Business Analyst Lead
Applied Materials Inc.
--
View this message in context: http://r.789695.n4.nabble.com/Excluding-fixed-number-of-rows-from-calculation-while-summarizing-using-ddply-function-tp4648406.html
Sent from the R help mailing list archive at Nabble.com.
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hi Arun, Thanks for your reply but your script is removing only one row( first row and last row) for each Unique and StepNo combination and calculating mean for the rest of rows. For below data , your script removing the #'s rows perfectly. But in reality I may need to ignore *say first 10 rows and last 20 rows for each Unique and StepNo combination. * for statistics calculation. Unique StepNo Data1 Data2 1 A 1 4 5 #Your script removing this row successfully. 2 A 1 5 6 3 A 1 7 8 4 A 1 3 4 5 A 1 1 1 #Your script removing this row successfully. 6 B 1 2 4 #Your script removing this row successfully. 7 B 1 3 5 8 B 1 4 5 9 B 1 5 6 10 B 1 6 7 #Your script removing this row successfully. Can you modify your script to get my requirement like below (making it generic, here *N=2*, removing first 2 lines and last 2 lines.. *sometimes I may have two numbers N1 & N2 (no.of rows need to be removed from and top and bottom respectively*) Unique StepNo Data1 Data2 1 A 1 4 5 #Ignore this 2 A 1 5 6 #Ignore this 3 A 1 7 8 4 A 1 3 4 #Ignore this 5 A 1 1 1 #Ignore this 6 B 1 2 4 #Ignore this 7 B 1 3 5 #Ignore this 8 B 1 4 5 9 B 1 5 6 #Ignore this 10 B 1 6 7 #Ignore this and then calculate the statistics using ddply. I hope my problem statement is much clear now. ----- Sidda Business Analyst Lead Applied Materials Inc. -- View this message in context: http://r.789695.n4.nabble.com/Excluding-fixed-number-of-rows-from-calculation-while-summarizing-using-ddply-function-tp4648406p4648447.html Sent from the R help mailing list archive at Nabble.com.
HI Siddu, Sorry, I looked only at your first statement: "in this example case I need to exclude the first row and last row(N=1)" You can try this: dat1<-read.table(text=" Unique StepNo Data1 Data2 1????? A????? 1??? 4??? 5???? 2????? A????? 1??? 5??? 6???? 3????? A????? 1??? 7??? 8?? 4????? A????? 1??? 3??? 4???? 5????? A????? 1??? 1??? 1???? 6????? B????? 1??? 2??? 4????? 7????? B????? 1??? 3??? 5???? 8????? B????? 1??? 4??? 5 9????? B????? 1??? 5??? 6??? 10??? B????? 1??? 6??? 7?? ",sep="",header=TRUE,stringsAsFactors=FALSE) dat2<-ddply(dat1,.(Unique,StepNo), head,2) ?dat3<-ddply(dat1,.(Unique,StepNo), tail,2) ?dat1$newcoldat1<-TRUE ?dat2$newcoldat2<-TRUE ?dat3$newcoldat3<-TRUE dat4<-merge(merge(dat1,dat2,all=TRUE),dat3,all=TRUE) ?dat5<-dat4[is.na(dat4$newcoldat2) &is.na(dat4$newcoldat3),1:4] ?ddply(dat5,.(Unique,StepNo),numcolwise(mean)) #not required here as there were only one row for A and B. #? Unique StepNo Data1 Data2 #1????? A????? 1???? 7???? 8 #2????? B????? 1???? 4???? 5 ? A.K. ----- Original Message ----- From: siddu479 <onlyfordigitalstuff at gmail.com> To: r-help at r-project.org Cc: Sent: Sunday, November 4, 2012 10:36 PM Subject: Re: [R] Excluding fixed number of rows from calculation while summarizing using ddply() function. Hi Arun, ? Thanks for your reply but your script is removing only one row( first row and last row) for each Unique and StepNo combination and calculating mean for the rest of rows. For below data , your script removing the #'s rows perfectly. But in reality I may need to ignore *say first 10 rows and last 20 rows for each Unique and StepNo combination. * for statistics calculation. Unique StepNo Data1 Data2 1? ? ? A? ? ? 1? ? 4? ? 5? ? ? #Your script removing this row successfully. 2? ? ? A? ? ? 1? ? 5? ? 6 3? ? ? A? ? ? 1? ? 7? ? 8? ? 4? ? ? A? ? ? 1? ? 3? ? 4? ? ? 5? ? ? A? ? ? 1? ? 1? ? 1? ? ? #Your script removing this row successfully. 6? ? ? B? ? ? 1? ? 2? ? 4? ? ? #Your script removing this row successfully. 7? ? ? B? ? ? 1? ? 3? ? 5? 8? ? ? B? ? ? 1? ? 4? ? 5 9? ? ? B? ? ? 1? ? 5? ? 6 10? ? B? ? ? 1? ? 6? ? 7? ? ? #Your script removing this row successfully. Can you modify your script to get my requirement like below (making it generic, here *N=2*, removing first 2 lines and last 2 lines.. *sometimes I may have two numbers N1 & N2 (no.of rows need to be removed from and top and bottom respectively*) Unique StepNo Data1 Data2 1? ? ? A? ? ? 1? ? 4? ? 5? ? ? #Ignore this 2? ? ? A? ? ? 1? ? 5? ? 6? ? ? #Ignore this 3? ? ? A? ? ? 1? ? 7? ? 8? ? 4? ? ? A? ? ? 1? ? 3? ? 4? ? ? #Ignore this 5? ? ? A? ? ? 1? ? 1? ? 1? ? ? #Ignore this 6? ? ? B? ? ? 1? ? 2? ? 4? ? ? #Ignore this 7? ? ? B? ? ? 1? ? 3? ? 5? ? ? #Ignore this 8? ? ? B? ? ? 1? ? 4? ? 5 9? ? ? B? ? ? 1? ? 5? ? 6? ? #Ignore this 10? ? B? ? ? 1? ? 6? ? 7? ? #Ignore this and then calculate the statistics using ddply. I hope my problem statement is much clear now. ----- Sidda Business Analyst Lead Applied Materials Inc. -- View this message in context: http://r.789695.n4.nabble.com/Excluding-fixed-number-of-rows-from-calculation-while-summarizing-using-ddply-function-tp4648406p4648447.html Sent from the R help mailing list archive at Nabble.com. ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Hi,
You can also try this:
dat1<-read.table(text="
Unique StepNo Data1 Data2
1????? A????? 1??? 4??? 5???
2????? A????? 1??? 5??? 6???
3????? A????? 1??? 7??? 8?
4????? A????? 1??? 3??? 4???
5????? A????? 1??? 1??? 1???
6????? B????? 1??? 2??? 4????
7????? B????? 1??? 3??? 5???
8????? B????? 1??? 4??? 5
9????? B????? 1??? 5??? 6??
10??? B????? 1??? 6??? 7?
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat2<-ddply(dat1,.(Unique,StepNo), head,2)
?dat3<-ddply(dat1,.(Unique,StepNo), tail,2)
??? fun1 <- function(x,y,z,...){
???? xp <- do.call("paste", x)
??? yp <- do.call("paste", y)
??? zp <-do.call("paste",z)???
???? x[! ((xp %in% yp) | (xp %in% zp)), ]
?}
?res<-fun1(dat1,dat2,dat3)
res
#? Unique StepNo Data1 Data2
#3????? A????? 1???? 7???? 8
#8????? B????? 1???? 4???? 5
ddply(res,.(Unique,StepNo),numcolwise(mean))
#or
dat2<-do.call(rbind,lapply(split(dat1,list(dat1$Unique,dat1$StepNo)),function(x) do.call(rbind,list(head(x,2),tail(x,2)))))
fun2 <- function(x,y,...){
???? xp <- do.call("paste", x)
??? yp <- do.call("paste", y)
???? x[! xp %in% yp, ]
?}
?res2<-fun2(dat1,dat2)
?res2
#? Unique StepNo Data1 Data2
#3????? A????? 1???? 7???? 8
#8????? B????? 1???? 4???? 5
ddply(res2,.(Unique,StepNo),numcolwise(mean))
A.K.
----- Original Message -----
From: siddu479 <onlyfordigitalstuff at gmail.com>
To: r-help at r-project.org
Cc:
Sent: Sunday, November 4, 2012 10:36 PM
Subject: Re: [R] Excluding fixed number of rows from calculation while summarizing using ddply() function.
Hi Arun,
? Thanks for your reply but your script is removing only one row( first row
and last row) for each Unique and StepNo combination and calculating mean
for the rest of rows.
For below data , your script removing the #'s rows perfectly.
But in reality I may need to ignore *say first 10 rows and last 20 rows for
each Unique and StepNo combination.
* for statistics calculation.
Unique StepNo Data1 Data2
1? ? ? A? ? ? 1? ? 4? ? 5? ? ? #Your script removing this row
successfully.
2? ? ? A? ? ? 1? ? 5? ? 6
3? ? ? A? ? ? 1? ? 7? ? 8? ?
4? ? ? A? ? ? 1? ? 3? ? 4? ? ?
5? ? ? A? ? ? 1? ? 1? ? 1? ? ? #Your script removing this row
successfully.
6? ? ? B? ? ? 1? ? 2? ? 4? ? ? #Your script removing this row
successfully.
7? ? ? B? ? ? 1? ? 3? ? 5?
8? ? ? B? ? ? 1? ? 4? ? 5
9? ? ? B? ? ? 1? ? 5? ? 6
10? ? B? ? ? 1? ? 6? ? 7? ? ? #Your script removing this row successfully.
Can you modify your script to get my requirement like below (making it
generic, here *N=2*, removing first 2 lines and last 2 lines.. *sometimes I
may have two numbers N1 & N2 (no.of rows need to be removed from and top and
bottom respectively*)
Unique StepNo Data1 Data2
1? ? ? A? ? ? 1? ? 4? ? 5? ? ? #Ignore this
2? ? ? A? ? ? 1? ? 5? ? 6? ? ? #Ignore this
3? ? ? A? ? ? 1? ? 7? ? 8? ?
4? ? ? A? ? ? 1? ? 3? ? 4? ? ? #Ignore this
5? ? ? A? ? ? 1? ? 1? ? 1? ? ? #Ignore this
6? ? ? B? ? ? 1? ? 2? ? 4? ? ? #Ignore this
7? ? ? B? ? ? 1? ? 3? ? 5? ? ? #Ignore this
8? ? ? B? ? ? 1? ? 4? ? 5
9? ? ? B? ? ? 1? ? 5? ? 6? ? #Ignore this
10? ? B? ? ? 1? ? 6? ? 7? ? #Ignore this
and then calculate the statistics using ddply.
I hope my problem statement is much clear now.
-----
Sidda
Business Analyst Lead
Applied Materials Inc.
--
View this message in context: http://r.789695.n4.nabble.com/Excluding-fixed-number-of-rows-from-calculation-while-summarizing-using-ddply-function-tp4648406p4648447.html
Sent from the R help mailing list archive at Nabble.com.
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
This is not a free do-my-work-for-me forum, this is a help-you-learn-R forum. If you study the solution you have been given until you understand how it works, and combine that with an understanding of indexing from the Introduction to R document that comes with R, you should be able to complete your code the way you want it. Don't abuse the generosity of your helpers.
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
siddu479 <onlyfordigitalstuff at gmail.com> wrote:
Hi Arun, Thanks for your reply but your script is removing only one row( first row and last row) for each Unique and StepNo combination and calculating mean for the rest of rows. For below data , your script removing the #'s rows perfectly. But in reality I may need to ignore *say first 10 rows and last 20 rows for each Unique and StepNo combination. * for statistics calculation. Unique StepNo Data1 Data2 1 A 1 4 5 #Your script removing this row successfully. 2 A 1 5 6 3 A 1 7 8 4 A 1 3 4 5 A 1 1 1 #Your script removing this row successfully. 6 B 1 2 4 #Your script removing this row successfully. 7 B 1 3 5 8 B 1 4 5 9 B 1 5 6 10 B 1 6 7 #Your script removing this row successfully. Can you modify your script to get my requirement like below (making it generic, here *N=2*, removing first 2 lines and last 2 lines.. *sometimes I may have two numbers N1 & N2 (no.of rows need to be removed from and top and bottom respectively*) Unique StepNo Data1 Data2 1 A 1 4 5 #Ignore this 2 A 1 5 6 #Ignore this 3 A 1 7 8 4 A 1 3 4 #Ignore this 5 A 1 1 1 #Ignore this 6 B 1 2 4 #Ignore this 7 B 1 3 5 #Ignore this 8 B 1 4 5 9 B 1 5 6 #Ignore this 10 B 1 6 7 #Ignore this and then calculate the statistics using ddply. I hope my problem statement is much clear now. ----- Sidda Business Analyst Lead Applied Materials Inc. -- View this message in context: http://r.789695.n4.nabble.com/Excluding-fixed-number-of-rows-from-calculation-while-summarizing-using-ddply-function-tp4648406p4648447.html Sent from the R help mailing list archive at Nabble.com.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Thanks Arun.. I appreciate your time.. in making the generic script. I believe that this is not a very trivial question.. ----- Sidda Business Analyst Lead Applied Materials Inc. -- View this message in context: http://r.789695.n4.nabble.com/Excluding-fixed-number-of-rows-from-calculation-while-summarizing-using-ddply-function-tp4648406p4648546.html Sent from the R help mailing list archive at Nabble.com.
Hi Jeff, First of all I am not a good programmer in R. I have a 2 line code in shell script using awk, head, tail and for loop combination that accomplish this task. So I thought of using R itself to do this task but I don't have any idea how to do it in R. Any way I will try to learn my own where ever possible and don't keep any free do my work commitments. ----- Sidda Business Analyst Lead Applied Materials Inc. -- View this message in context: http://r.789695.n4.nabble.com/Excluding-fixed-number-of-rows-from-calculation-while-summarizing-using-ddply-function-tp4648406p4648547.html Sent from the R help mailing list archive at Nabble.com.