Formatting data for bootstrapping for confidence intervals

4 messages · Paul Wennekes, Rui Barradas, arun

#
Hi all,

I'm new to R, so this may be obvious to some.
I've been trying to figure this out for a while. I have a dataset "events"
that looks something like this:

Area  NAME  DATE     X  Xn  Y
1     X     1/10/10  1  1   0
1     Y     1/11/10  0  0   1
1     X     1/12/10  1  0   0
1     X     1/12/10  1  0   0
1     X     1/12/10  1  0   0
2     X     2/12/10  1  1   0
2     X     2/12/10  1  0   0
2     Y     2/12/10  0  0   1
2     X     2/13/10  1  0   0
2     X     2/13/10  1  0   0
2     X     2/13/10  1  0   0
2     X     2/14/10  1  0   0
2     X     2/14/10  1  0   0
2     X     2/14/10  1  1   0
2     X     2/14/10  1  0   0
3     X     7/27/11  1  0   0
3     X     7/27/11  1  1   0
3     X     7/27/11  1  0   0
3     X     7/28/11  1  0   0
3     X     7/28/11  1  1   0
3     X     7/28/11  1  0   0
3     X     7/28/11  1  0   0
3     Y     7/28/11  0  0   1
3     X     7/28/11  1  0   0
3     X     7/28/11  1  1   0
3     Y     7/28/11  0  0   1
3     X     7/28/11  1  0   0
3     X     7/29/11  1  0   0
3     X     7/29/11  1  0   0
3     X     7/29/11  1  1   0

X and Y are events. Every row represents a single event happening, with a 1
indicating which one happens at that time. Xn indicates X happening at
night. I want to bootstrap these events over days but I think I need to
summarize them first, i.e. get something that looks like this:

Area  DATE     X  Xn  Y
1     1/10/10  1  1   0
1     1/11/10  0  0   1
1     1/12/10  3  0   0
2     2/12/10  2  1   1
etc.

and then for each Area, bootstrap the data over the days. Any ideas? I've
tried using the 'reshape' package but I don't know how to sum over parts of
the columns as defined by the DATE values...

Many thanks in advance!



--
View this message in context: http://r.789695.n4.nabble.com/Formatting-data-for-bootstrapping-for-confidence-intervals-tp4645860.html
Sent from the R help mailing list archive at Nabble.com.
#
Hello,

To aggregate the data, use the function aggregate() (yes, it exists).

with(dat, aggregate(cbind(X, Xn, Y), list(Area, DATE), FUN = sum))
# output
  Group.1 Group.2 X Xn Y
1       1 1/10/10 1  1 0
2       1 1/11/10 0  0 1
3       1 1/12/10 3  0 0
4       2 2/12/10 2  1 1
5       2 2/13/10 3  0 0
6       2 2/14/10 4  1 0
7       3 7/27/11 3  1 0
8       3 7/28/11 7  2 2
9       3 7/29/11 3  1 0

And take a look at package boot. Maybe you'll find something there.
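
For the "bootstrap over the days" step, here is a minimal base R sketch, not a definitive method. It assumes the statistic of interest is the mean daily count of X within each Area, which the original post does not specify, and it uses a simple percentile interval. The names daily, boot_days, R and level are made up for illustration; the numbers are the daily totals shown above.

```r
# Daily totals from the aggregate() output above (one row per Area and DATE).
daily <- data.frame(
  Area = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  DATE = c("1/10/10", "1/11/10", "1/12/10", "2/12/10", "2/13/10",
           "2/14/10", "7/27/11", "7/28/11", "7/29/11"),
  X  = c(1, 0, 3, 2, 3, 4, 3, 7, 3),
  Xn = c(1, 0, 0, 1, 0, 1, 1, 2, 1),
  Y  = c(0, 1, 0, 1, 0, 0, 0, 2, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: resample the days of one Area with replacement and
# return a percentile interval for the mean daily count of one column.
# The choice of statistic (mean per day) is an assumption.
boot_days <- function(d, column = "X", R = 2000, level = 0.95) {
  n <- nrow(d)
  stats <- replicate(R, {
    idx <- sample.int(n, n, replace = TRUE)  # draw n days with replacement
    mean(d[[column]][idx])
  })
  alpha <- (1 - level) / 2
  ci <- quantile(stats, c(alpha, 1 - alpha), names = FALSE)
  c(estimate = mean(d[[column]]), lower = ci[1], upper = ci[2])
}

set.seed(1)  # only to make the sketch reproducible
# One row per Area, with columns estimate, lower, upper.
ci_by_area <- t(sapply(split(daily, daily$Area), boot_days))
ci_by_area
```

With only three days per Area, as in the example data, such an interval is very crude; the real data presumably has more days. The same resampling can also be handed to boot() from the boot package, whose statistic argument takes a function of the data and a vector of resampled indices.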

Hope this helps,

Rui Barradas


On 11-10-2012 16:55, Paul Wennekes wrote:
#
Hi,
Try this:

dat1 <- read.table(text="
Area    NAME    DATE    X    Xn    Y
1       X       1/10/10 1    1     0
1       Y       1/11/10 0    0     1
1       X       1/12/10 1    0     0
1       X       1/12/10 1    0     0
1       X       1/12/10 1    0     0
2       X       2/12/10 1    1     0
2       X       2/12/10 1    0     0
2       Y       2/12/10 0    0     1
2       X       2/13/10 1    0     0
2       X       2/13/10 1    0     0
2       X       2/13/10 1    0     0
2       X       2/14/10 1    0     0
2       X       2/14/10 1    0     0
2       X       2/14/10 1    1     0
2       X       2/14/10 1    0     0
3       X       7/27/11 1    0     0
3       X       7/27/11 1    1     0
3       X       7/27/11 1    0     0
3       X       7/28/11 1    0     0
3       X       7/28/11 1    1     0
3       X       7/28/11 1    0     0
3       X       7/28/11 1    0     0
3       Y       7/28/11 0    0     1
3       X       7/28/11 1    0     0
3       X       7/28/11 1    1     0
3       Y       7/28/11 0    0     1
3       X       7/28/11 1    0     0
3       X       7/29/11 1    0     0
3       X       7/29/11 1    0     0
3       X       7/29/11 1    1     0
", sep="", header=TRUE, stringsAsFactors=FALSE)

# You can use aggregate(), ddply() from library(plyr), or library(data.table)
library(data.table)
dat2 <- data.table(dat1)
# Sum the three event columns within each Area/DATE group
dat2[, list(X=sum(X), Xn=sum(Xn), Y=sum(Y)), list(Area, DATE)]
#   Area    DATE X Xn Y
#1:    1 1/10/10 1  1 0
#2:    1 1/11/10 0  0 1
#3:    1 1/12/10 3  0 0
#4:    2 2/12/10 2  1 1
#5:    2 2/13/10 3  0 0
#6:    2 2/14/10 4  1 0
#7:    3 7/27/11 3  1 0
#8:    3 7/28/11 7  2 2
#9:    3 7/29/11 3  1 0
library(plyr)
# Same summary with plyr: apply sum column-wise to X, Xn and Y per group
ddply(dat1, .(Area, DATE), colwise(sum, c("X", "Xn", "Y")))
#  Area    DATE X Xn Y
#1    1 1/10/10 1  1 0
#2    1 1/11/10 0  0 1
#3    1 1/12/10 3  0 0
#4    2 2/12/10 2  1 1
#5    2 2/13/10 3  0 0
#6    2 2/14/10 4  1 0
#7    3 7/27/11 3  1 0
#8    3 7/28/11 7  2 2
#9    3 7/29/11 3  1 0
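
The comment above also mentions aggregate() as a base R option. A possible sketch of that variant uses the formula interface, which keeps the grouping column names (Area, DATE) instead of Group.1 and Group.2. The data frame events here is a small stand-in built from the first eight rows of the example, and the name daily is only illustrative.

```r
# Stand-in for the event-level data: the first eight rows of the example.
events <- data.frame(
  Area = c(1, 1, 1, 1, 1, 2, 2, 2),
  NAME = c("X", "Y", "X", "X", "X", "X", "X", "Y"),
  DATE = c("1/10/10", "1/11/10", "1/12/10", "1/12/10", "1/12/10",
           "2/12/10", "2/12/10", "2/12/10"),
  X  = c(1, 0, 1, 1, 1, 1, 1, 0),
  Xn = c(1, 0, 0, 0, 0, 1, 0, 0),
  Y  = c(0, 1, 0, 0, 0, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Sum X, Xn and Y within each combination of Area and DATE.
daily <- aggregate(cbind(X, Xn, Y) ~ Area + DATE, data = events, FUN = sum)

# DATE is a character column here, so this ordering is alphabetical,
# which happens to match the chronological order in the example.
daily <- daily[order(daily$Area, daily$DATE), ]
daily
```

On these eight rows the result should match the first four rows of the tables above.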

A.K.


----- Original Message -----
From: Paul Wennekes <paul.wennekes at evobio.eu>
To: r-help at r-project.org
Cc: 
Sent: Thursday, October 11, 2012 11:55 AM
Subject: [R] Formatting data for bootstrapping  for confidence intervals

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.