Hi All,
I was trying to estimate standard error (SE) for the proportion value using
some kind of randomization process (bootstrapping or jackknifing) in R, but
I could not figure it out.
Is there any way to generate SE for the proportion?
The example of the data and the code I am using is attached for your
reference. I would like to generate the value of proportion with a SE using
a 1000 times randomization.
dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L,
17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("id1",
"id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
"id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
"id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class = "data.frame",
row.names = c(NA,
-19L))
daT<-data.frame(dat %>%
mutate(Time1.but.not.in.Time2 = case_when(
Time1 %in% "1" & Time2 %in% "0" ~ "1"),
Time2.but.not.in.Time1 = case_when(
Time1 %in% "0" & Time2 %in% "1" ~ "1"),
BothTimes = case_when(
Time1 %in% "1" & Time2 %in% "1" ~ "1")))
daT
summary(daT)
cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
"BothTimes")
daT[cols.num] <- sapply(daT[cols.num],as.numeric)
summary(daT)
ProportionValue<-sum(daT$BothTimes, na.rm=T)/sum(daT$Time1, na.rm=T)
ProportionValue
standard error??
How to generate SE for the proportion value using a randomization process in R?
9 messages · Rui Barradas, Marna Wagley
Hello,
Something like this, using base package boot?
library(boot)
bootprop <- function(data, index){
d <- data[index, ]
sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
}
R <- 1e3
set.seed(2020)
b <- boot(daT, bootprop, R)
b
b$t0 # original
sd(b$t) # bootstrapped estimate of the SE of the sample prop.
hist(b$t, freq = FALSE)
Hope this helps,
Rui Barradas
?s 21:57 de 22/01/21, Marna Wagley escreveu:
Hi All,
I was trying to estimate standard error (SE) for the proportion value using
some kind of randomization process (bootstrapping or jackknifing) in R, but
I could not figure it out.
Is there any way to generate SE for the proportion?
The example of the data and the code I am using is attached for your
reference. I would like to generate the value of proportion with a SE using
a 1000 times randomization.
dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L,
17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("id1",
"id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
"id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
"id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class = "data.frame",
row.names = c(NA,
-19L))
daT<-data.frame(dat %>%
mutate(Time1.but.not.in.Time2 = case_when(
Time1 %in% "1" & Time2 %in% "0" ~ "1"),
Time2.but.not.in.Time1 = case_when(
Time1 %in% "0" & Time2 %in% "1" ~ "1"),
BothTimes = case_when(
Time1 %in% "1" & Time2 %in% "1" ~ "1")))
daT
summary(daT)
cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
"BothTimes")
daT[cols.num] <- sapply(daT[cols.num],as.numeric)
summary(daT)
ProportionValue<-sum(daT$BothTimes, na.rm=T)/sum(daT$Time1, na.rm=T)
ProportionValue
standard error??
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Dear Rui,
I was wondering whether we have to square root of SD to find SE, right?
bootprop <- function(data, index){
d <- data[index, ]
sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
}
R <- 1e3
set.seed(2020)
b <- boot(daT, bootprop, R)
b
b$t0 # original
sd(b$t) # bootstrapped estimate of the SE of the sample prop.
sd(b$t)/sqrt(1000)
pandit*(1-pandit)
hist(b$t, freq = FALSE)
On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas <ruipbarradas at sapo.pt> wrote:
Hello,
Something like this, using base package boot?
library(boot)
bootprop <- function(data, index){
d <- data[index, ]
sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
}
R <- 1e3
set.seed(2020)
b <- boot(daT, bootprop, R)
b
b$t0 # original
sd(b$t) # bootstrapped estimate of the SE of the sample prop.
hist(b$t, freq = FALSE)
Hope this helps,
Rui Barradas
?s 21:57 de 22/01/21, Marna Wagley escreveu:
Hi All, I was trying to estimate standard error (SE) for the proportion value
using
some kind of randomization process (bootstrapping or jackknifing) in R,
but
I could not figure it out. Is there any way to generate SE for the proportion? The example of the data and the code I am using is attached for your reference. I would like to generate the value of proportion with a SE
using
a 1000 times randomization. dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label =
c("id1",
"id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
"id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
"id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class = "data.frame",
row.names = c(NA,
-19L))
daT<-data.frame(dat %>%
mutate(Time1.but.not.in.Time2 = case_when(
Time1 %in% "1" & Time2 %in% "0" ~ "1"),
Time2.but.not.in.Time1 = case_when(
Time1 %in% "0" & Time2 %in% "1" ~ "1"),
BothTimes = case_when(
Time1 %in% "1" & Time2 %in% "1" ~ "1")))
daT
summary(daT)
cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
"BothTimes")
daT[cols.num] <- sapply(daT[cols.num],as.numeric)
summary(daT)
ProportionValue<-sum(daT$BothTimes, na.rm=T)/sum(daT$Time1, na.rm=T)
ProportionValue
standard error??
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Hello, Inline. ?s 07:47 de 23/01/21, Marna Wagley escreveu:
Dear Rui, I was wondering whether we have to square root of SD to find SE, right?
No, we don't. var already divides by n, don't divide again. This is the code, that can be seen by running the function name at a command line. sd #function (x, na.rm = FALSE) #sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x), # na.rm = na.rm)) #<bytecode: 0x55f3ce900848> #<environment: namespace:stats>
bootprop <- function(data, index){
? ?d <- data[index, ]
? ?sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
}
R <- 1e3
set.seed(2020)
b <- boot(daT, bootprop, R)
b
b$t0? ? ?# original
sd(b$t)? # bootstrapped estimate of the SE of the sample prop.
sd(b$t)/sqrt(1000)
pandit*(1-pandit)
hist(b$t, freq = FALSE)
Try plotting the normal densities for both cases, the red line is
clearly wrong.
f <- function(x, xbar, s){
dnorm(x, mean = xbar, sd = s)
}
hist(b$t, freq = FALSE)
curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col = "blue",
add = TRUE)
curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0, to = 1, col =
"red", add = TRUE)
Hope this helps,
Rui Barradas
On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas <ruipbarradas at sapo.pt
<mailto:ruipbarradas at sapo.pt>> wrote:
Hello,
Something like this, using base package boot?
library(boot)
bootprop <- function(data, index){
? ?d <- data[index, ]
? ?sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
}
R <- 1e3
set.seed(2020)
b <- boot(daT, bootprop, R)
b
b$t0? ? ?# original
sd(b$t)? # bootstrapped estimate of the SE of the sample prop.
hist(b$t, freq = FALSE)
Hope this helps,
Rui Barradas
?s 21:57 de 22/01/21, Marna Wagley escreveu:
> Hi All,
> I was trying to estimate standard error (SE) for the proportion
value using
> some kind of randomization process (bootstrapping or jackknifing)
in R, but
> I could not figure it out.
>
> Is there any way to generate SE for the proportion?
>
> The example of the data and the code I am using is attached for your
> reference. I would like to generate the value of proportion with
a SE using
> a 1000 times randomization.
>
> dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L,
> 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label
= c("id1",
> "id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
> "id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
> "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
> 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
> 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
> 1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class =
"data.frame",
> row.names = c(NA,
> -19L))
> daT<-data.frame(dat %>%
>? ? mutate(Time1.but.not.in.Time2 = case_when(
>? ? ? ? ? ? ? Time1 %in% "1" & Time2 %in% "0"? ~ "1"),
> Time2.but.not.in.Time1 = case_when(
>? ? ? ? ? ? ? Time1 %in% "0" & Time2 %in% "1"? ~ "1"),
>? ?BothTimes = case_when(
>? ? ? ? ? ? ? Time1 %in% "1" & Time2 %in% "1"? ~ "1")))
>? ?daT
>? ?summary(daT)
>
> cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
> "BothTimes")
> daT[cols.num] <- sapply(daT[cols.num],as.numeric)
> summary(daT)
> ProportionValue<-sum(daT$BothTimes, na.rm=T)/sum(daT$Time1, na.rm=T)
> ProportionValue
> standard error??
>
>? ? ? ?[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org <mailto:R-help at r-project.org> mailing list
-- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
Yes Rui, I can see we don't need to divide by square root of sample size. The example is great to understand it. Thank you. Marna
On Sat, Jan 23, 2021 at 12:28 AM Rui Barradas <ruipbarradas at sapo.pt> wrote:
Hello, Inline. ?s 07:47 de 23/01/21, Marna Wagley escreveu:
Dear Rui, I was wondering whether we have to square root of SD to find SE, right?
No, we don't. var already divides by n, don't divide again. This is the code, that can be seen by running the function name at a command line. sd #function (x, na.rm = FALSE) #sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x), # na.rm = na.rm)) #<bytecode: 0x55f3ce900848> #<environment: namespace:stats>
bootprop <- function(data, index){
d <- data[index, ]
sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
}
R <- 1e3
set.seed(2020)
b <- boot(daT, bootprop, R)
b
b$t0 # original
sd(b$t) # bootstrapped estimate of the SE of the sample prop.
sd(b$t)/sqrt(1000)
pandit*(1-pandit)
hist(b$t, freq = FALSE)
Try plotting the normal densities for both cases, the red line is
clearly wrong.
f <- function(x, xbar, s){
dnorm(x, mean = xbar, sd = s)
}
hist(b$t, freq = FALSE)
curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col = "blue",
add = TRUE)
curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0, to = 1, col =
"red", add = TRUE)
Hope this helps,
Rui Barradas
On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas <ruipbarradas at sapo.pt
<mailto:ruipbarradas at sapo.pt>> wrote:
Hello,
Something like this, using base package boot?
library(boot)
bootprop <- function(data, index){
d <- data[index, ]
sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm =
TRUE)
}
R <- 1e3
set.seed(2020)
b <- boot(daT, bootprop, R)
b
b$t0 # original
sd(b$t) # bootstrapped estimate of the SE of the sample prop.
hist(b$t, freq = FALSE)
Hope this helps,
Rui Barradas
?s 21:57 de 22/01/21, Marna Wagley escreveu:
> Hi All,
> I was trying to estimate standard error (SE) for the proportion
value using
> some kind of randomization process (bootstrapping or jackknifing)
in R, but
> I could not figure it out.
>
> Is there any way to generate SE for the proportion?
>
> The example of the data and the code I am using is attached for
your
> reference. I would like to generate the value of proportion with
a SE using
> a 1000 times randomization.
>
> dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L,
16L,
> 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label
= c("id1",
> "id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
> "id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
> "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
> 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
> 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
> 1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class =
"data.frame",
> row.names = c(NA,
> -19L))
> daT<-data.frame(dat %>%
> mutate(Time1.but.not.in.Time2 = case_when(
> Time1 %in% "1" & Time2 %in% "0" ~ "1"),
> Time2.but.not.in.Time1 = case_when(
> Time1 %in% "0" & Time2 %in% "1" ~ "1"),
> BothTimes = case_when(
> Time1 %in% "1" & Time2 %in% "1" ~ "1")))
> daT
> summary(daT)
>
> cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
> "BothTimes")
> daT[cols.num] <- sapply(daT[cols.num],as.numeric)
> summary(daT)
> ProportionValue<-sum(daT$BothTimes, na.rm=T)/sum(daT$Time1,
na.rm=T)
> ProportionValue
> standard error??
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org <mailto:R-help at r-project.org> mailing list
-- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
5 days later
Hi Rui,
I am sorry for asking you several questions.
In the given example, randomizations (reshuffle) were done 1000 times, and
its 1000 proportion values (results) are stored and it can be seen using
b$t; but I was wondering how the table was randomized (which rows have been
missed/or repeated in each randomizing procedure?).
Is there any way we can see the randomized table and its associated
results? Here in this example, I randomized (or bootstrapped) the table
into three times (R=3) so I would like to store these three tables and look
at them later to know which rows were repeated/missed. Is there any
possibility?
The example data and the code is given below.
Thank you for your help.
####
library(boot)
dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L,
17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("id1",
"id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
"id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
"id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class = "data.frame",
row.names = c(NA,
-19L))
daT<-data.frame(dat %>%
mutate(Time1.but.not.in.Time2 = case_when(
Time1 %in% "1" & Time2 %in% "0" ~ "1"),
Time2.but.not.in.Time1 = case_when(
Time1 %in% "0" & Time2 %in% "1" ~ "1"),
BothTimes = case_when(
Time1 %in% "1" & Time2 %in% "1" ~ "1")))
cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
"BothTimes")
daT[cols.num] <- sapply(daT[cols.num],as.numeric)
summary(daT)
bootprop <- function(data, index){
d <- data[index, ]
sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
}
R <- 3
set.seed(2020)
b <- boot(daT, bootprop, R)
b
b$t0 # original
b$t
sd(b$t) # bootstrapped estimate of the SE of the sample prop.
hist(b$t, freq = FALSE)
str(b)
b$data
b$seed
b$sim
b$strata
################
On Sat, Jan 23, 2021 at 12:36 AM Marna Wagley <marna.wagley at gmail.com>
wrote:
Yes Rui, I can see we don't need to divide by square root of sample size. The example is great to understand it. Thank you. Marna On Sat, Jan 23, 2021 at 12:28 AM Rui Barradas <ruipbarradas at sapo.pt> wrote:
Hello, Inline. ?s 07:47 de 23/01/21, Marna Wagley escreveu:
Dear Rui, I was wondering whether we have to square root of SD to find SE, right?
No, we don't. var already divides by n, don't divide again. This is the code, that can be seen by running the function name at a command line. sd #function (x, na.rm = FALSE) #sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x), # na.rm = na.rm)) #<bytecode: 0x55f3ce900848> #<environment: namespace:stats>
bootprop <- function(data, index){
d <- data[index, ]
sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
}
R <- 1e3
set.seed(2020)
b <- boot(daT, bootprop, R)
b
b$t0 # original
sd(b$t) # bootstrapped estimate of the SE of the sample prop.
sd(b$t)/sqrt(1000)
pandit*(1-pandit)
hist(b$t, freq = FALSE)
Try plotting the normal densities for both cases, the red line is
clearly wrong.
f <- function(x, xbar, s){
dnorm(x, mean = xbar, sd = s)
}
hist(b$t, freq = FALSE)
curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col = "blue",
add = TRUE)
curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0, to = 1, col =
"red", add = TRUE)
Hope this helps,
Rui Barradas
On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas <ruipbarradas at sapo.pt
<mailto:ruipbarradas at sapo.pt>> wrote:
Hello,
Something like this, using base package boot?
library(boot)
bootprop <- function(data, index){
d <- data[index, ]
sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm =
TRUE)
}
R <- 1e3
set.seed(2020)
b <- boot(daT, bootprop, R)
b
b$t0 # original
sd(b$t) # bootstrapped estimate of the SE of the sample prop.
hist(b$t, freq = FALSE)
Hope this helps,
Rui Barradas
?s 21:57 de 22/01/21, Marna Wagley escreveu:
> Hi All,
> I was trying to estimate standard error (SE) for the proportion
value using
> some kind of randomization process (bootstrapping or jackknifing)
in R, but
> I could not figure it out.
>
> Is there any way to generate SE for the proportion?
>
> The example of the data and the code I am using is attached for
your
> reference. I would like to generate the value of proportion with
a SE using
> a 1000 times randomization.
>
> dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L,
16L,
> 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label
= c("id1",
> "id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
> "id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
> "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
> 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 =
c(1L,
> 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
> 1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class =
"data.frame",
> row.names = c(NA,
> -19L))
> daT<-data.frame(dat %>%
> mutate(Time1.but.not.in.Time2 = case_when(
> Time1 %in% "1" & Time2 %in% "0" ~ "1"),
> Time2.but.not.in.Time1 = case_when(
> Time1 %in% "0" & Time2 %in% "1" ~ "1"),
> BothTimes = case_when(
> Time1 %in% "1" & Time2 %in% "1" ~ "1")))
> daT
> summary(daT)
>
> cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
> "BothTimes")
> daT[cols.num] <- sapply(daT[cols.num],as.numeric)
> summary(daT)
> ProportionValue<-sum(daT$BothTimes, na.rm=T)/sum(daT$Time1,
na.rm=T)
> ProportionValue
> standard error??
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org <mailto:R-help at r-project.org> mailing list
-- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible
code.
>
Hello,
I don't know why you would need to see the indices but rewrite the
function bootprop as
bootprop_ind <- function(data, index){
d <- data[index, ]
#sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
index
}
and call in the same way. It will now return a matrix of indices with R
= 1000 rows and 19 columns.
Hope this helps,
Rui Barradas
?s 19:29 de 28/01/21, Marna Wagley escreveu:
Hi Rui,
I am sorry for asking you several questions.
In the given example, randomizations (reshuffle) were done 1000 times,
and its 1000 proportion values (results) are stored and it can be seen
using b$t; but I was wondering how the table was randomized (which rows
have been missed/or repeated in each randomizing procedure?).
Is there any way we can see the randomized table and its associated
results? Here in this example, I randomized (or bootstrapped) the table
into three times (R=3) so I would like to store these three tables and
look at them later to know which rows were repeated/missed. Is there any
possibility?
The example data and the code is given below.
Thank you for your help.
####
library(boot)
dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L,
17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("id1",
"id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
"id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
"id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class = "data.frame",
row.names = c(NA,
-19L))
daT<-data.frame(dat %>%
? mutate(Time1.but.not.in.Time2 = case_when(
? ? ? ? ? ? Time1 %in% "1" & Time2 %in% "0" ?~ "1"),
Time2.but.not.in.Time1 = case_when(
? ? ? ? ? ? Time1 %in% "0" & Time2 %in% "1" ?~ "1"),
?BothTimes = case_when(
? ? ? ? ? ? Time1 %in% "1" & Time2 %in% "1" ?~ "1")))
cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
"BothTimes")
daT[cols.num] <- sapply(daT[cols.num],as.numeric)
summary(daT)
bootprop <- function(data, index){
? ?d <- data[index, ]
? ?sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
}
R <- 3
set.seed(2020)
b <- boot(daT, bootprop, R)
b
b$t0? ? ?# original
b$t
sd(b$t)? # bootstrapped estimate of the SE of the sample prop.
hist(b$t, freq = FALSE)
str(b)
b$data
b$seed
b$sim
b$strata
################
On Sat, Jan 23, 2021 at 12:36 AM Marna Wagley <marna.wagley at gmail.com
<mailto:marna.wagley at gmail.com>> wrote:
Yes Rui, I can see we don't need to divide by square root of sample
size. The example is great to understand it.
Thank you.
Marna
On Sat, Jan 23, 2021 at 12:28 AM Rui Barradas <ruipbarradas at sapo.pt
<mailto:ruipbarradas at sapo.pt>> wrote:
Hello,
Inline.
?s 07:47 de 23/01/21, Marna Wagley escreveu:
> Dear Rui,
> I was wondering whether we have to square root of SD to find
SE, right?
No, we don't. var already divides by n, don't divide again.
This is the code, that can be seen by running the function name
at a
command line.
sd
#function (x, na.rm = FALSE)
#sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
#? ? na.rm = na.rm))
#<bytecode: 0x55f3ce900848>
#<environment: namespace:stats>
>
> bootprop <- function(data, index){
>? ? ?d <- data[index, ]
>? ? ?sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]],
na.rm = TRUE)
> }
>
> R <- 1e3
> set.seed(2020)
> b <- boot(daT, bootprop, R)
> b
> b$t0? ? ?# original
> sd(b$t)? # bootstrapped estimate of the SE of the sample prop.
> sd(b$t)/sqrt(1000)
> pandit*(1-pandit)
>
> hist(b$t, freq = FALSE)
Try plotting the normal densities for both cases, the red line is
clearly wrong.
f <- function(x, xbar, s){
? ?dnorm(x, mean = xbar, sd = s)
}
hist(b$t, freq = FALSE)
curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col =
"blue",
add = TRUE)
curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0, to = 1,
col =
"red", add = TRUE)
Hope this helps,
Rui Barradas
>
>
>
>
> On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas
<ruipbarradas at sapo.pt <mailto:ruipbarradas at sapo.pt>
> <mailto:ruipbarradas at sapo.pt <mailto:ruipbarradas at sapo.pt>>>
wrote:
>
>? ? ?Hello,
>
>? ? ?Something like this, using base package boot?
>
>
>? ? ?library(boot)
>
>? ? ?bootprop <- function(data, index){
>? ? ? ? ?d <- data[index, ]
>? ? ? ? ?sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]],
na.rm = TRUE)
>? ? ?}
>
>? ? ?R <- 1e3
>? ? ?set.seed(2020)
>? ? ?b <- boot(daT, bootprop, R)
>? ? ?b
>? ? ?b$t0? ? ?# original
>? ? ?sd(b$t)? # bootstrapped estimate of the SE of the sample
prop.
>? ? ?hist(b$t, freq = FALSE)
>
>
>? ? ?Hope this helps,
>
>? ? ?Rui Barradas
>
>? ? ??s 21:57 de 22/01/21, Marna Wagley escreveu:
>? ? ? > Hi All,
>? ? ? > I was trying to estimate standard error (SE) for the
proportion
>? ? ?value using
>? ? ? > some kind of randomization process (bootstrapping or
jackknifing)
>? ? ?in R, but
>? ? ? > I could not figure it out.
>? ? ? >
>? ? ? > Is there any way to generate SE for the proportion?
>? ? ? >
>? ? ? > The example of the data and the code I am using is
attached for your
>? ? ? > reference. I would like to generate the value of
proportion with
>? ? ?a SE using
>? ? ? > a 1000 times randomization.
>? ? ? >
>? ? ? > dat<-structure(list(Sample = structure(c(1L, 12L, 13L,
14L, 15L, 16L,
>? ? ? > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
11L), .Label
>? ? ?= c("id1",
>? ? ? > "id10", "id11", "id12", "id13", "id14", "id15",
"id16", "id17",
>? ? ? > "id18", "id19", "Id2", "id3", "id4", "id5", "id6",
"id7", "id8",
>? ? ? > "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L,
0L, 0L,
>? ? ? > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L),
Time2 = c(1L,
>? ? ? > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L,
0L, 1L, 0L,
>? ? ? > 1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class =
>? ? ?"data.frame",
>? ? ? > row.names = c(NA,
>? ? ? > -19L))
>? ? ? > daT<-data.frame(dat %>%
>? ? ? >? ? mutate(Time1.but.not.in.Time2 = case_when(
>? ? ? >? ? ? ? ? ? ? Time1 %in% "1" & Time2 %in% "0"? ~ "1"),
>? ? ? > Time2.but.not.in.Time1 = case_when(
>? ? ? >? ? ? ? ? ? ? Time1 %in% "0" & Time2 %in% "1"? ~ "1"),
>? ? ? >? ?BothTimes = case_when(
>? ? ? >? ? ? ? ? ? ? Time1 %in% "1" & Time2 %in% "1"? ~ "1")))
>? ? ? >? ?daT
>? ? ? >? ?summary(daT)
>? ? ? >
>? ? ? > cols.num <-
c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
>? ? ? > "BothTimes")
>? ? ? > daT[cols.num] <- sapply(daT[cols.num],as.numeric)
>? ? ? > summary(daT)
>? ? ? > ProportionValue<-sum(daT$BothTimes,
na.rm=T)/sum(daT$Time1, na.rm=T)
>? ? ? > ProportionValue
>? ? ? > standard error??
>? ? ? >
>? ? ? >? ? ? ?[[alternative HTML version deleted]]
>? ? ? >
>? ? ? > ______________________________________________
>? ? ? > R-help at r-project.org <mailto:R-help at r-project.org>
<mailto:R-help at r-project.org <mailto:R-help at r-project.org>>
mailing list
>? ? ?-- To UNSUBSCRIBE and more, see
>? ? ? > https://stat.ethz.ch/mailman/listinfo/r-help
>? ? ? > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
>? ? ? > and provide commented, minimal, self-contained,
reproducible code.
>? ? ? >
>
Thank you Rui, This is great. How about the following? SimilatedData<-boot.array(b, indices=T) seems it is giving the rows ID which are used in the calculation, isn't it?
On Thu, Jan 28, 2021 at 12:21 PM Rui Barradas <ruipbarradas at sapo.pt> wrote:
Hello,
I don't know why you would need to see the indices but rewrite the
function bootprop as
bootprop_ind <- function(data, index){
d <- data[index, ]
#sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
index
}
and call in the same way. It will now return a matrix of indices with R
= 1000 rows and 19 columns.
Hope this helps,
Rui Barradas
?s 19:29 de 28/01/21, Marna Wagley escreveu:
Hi Rui, I am sorry for asking you several questions. In the given example, randomizations (reshuffle) were done 1000 times, and its 1000 proportion values (results) are stored and it can be seen using b$t; but I was wondering how the table was randomized (which rows have been missed/or repeated in each randomizing procedure?). Is there any way we can see the randomized table and its associated results? Here in this example, I randomized (or bootstrapped) the table into three times (R=3) so I would like to store these three tables and look at them later to know which rows were repeated/missed. Is there any possibility? The example data and the code is given below. Thank you for your help. #### library(boot) dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label =
c("id1",
"id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
"id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
"id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class = "data.frame",
row.names = c(NA,
-19L))
daT<-data.frame(dat %>%
mutate(Time1.but.not.in.Time2 = case_when(
Time1 %in% "1" & Time2 %in% "0" ~ "1"),
Time2.but.not.in.Time1 = case_when(
Time1 %in% "0" & Time2 %in% "1" ~ "1"),
BothTimes = case_when(
Time1 %in% "1" & Time2 %in% "1" ~ "1")))
cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
"BothTimes")
daT[cols.num] <- sapply(daT[cols.num],as.numeric)
summary(daT)
bootprop <- function(data, index){
d <- data[index, ]
sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
}
R <- 3
set.seed(2020)
b <- boot(daT, bootprop, R)
b
b$t0 # original
b$t
sd(b$t) # bootstrapped estimate of the SE of the sample prop.
hist(b$t, freq = FALSE)
str(b)
b$data
b$seed
b$sim
b$strata
################
On Sat, Jan 23, 2021 at 12:36 AM Marna Wagley <marna.wagley at gmail.com
<mailto:marna.wagley at gmail.com>> wrote:
Yes Rui, I can see we don't need to divide by square root of sample
size. The example is great to understand it.
Thank you.
Marna
On Sat, Jan 23, 2021 at 12:28 AM Rui Barradas <ruipbarradas at sapo.pt
<mailto:ruipbarradas at sapo.pt>> wrote:
Hello,
Inline.
?s 07:47 de 23/01/21, Marna Wagley escreveu:
> Dear Rui,
> I was wondering whether we have to square root of SD to find
SE, right?
No, we don't. var already divides by n, don't divide again.
This is the code, that can be seen by running the function name
at a
command line.
sd
#function (x, na.rm = FALSE)
#sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
# na.rm = na.rm))
#<bytecode: 0x55f3ce900848>
#<environment: namespace:stats>
>
> bootprop <- function(data, index){
> d <- data[index, ]
> sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]],
na.rm = TRUE)
> }
>
> R <- 1e3
> set.seed(2020)
> b <- boot(daT, bootprop, R)
> b
> b$t0 # original
> sd(b$t) # bootstrapped estimate of the SE of the sample prop.
> sd(b$t)/sqrt(1000)
> pandit*(1-pandit)
>
> hist(b$t, freq = FALSE)
Try plotting the normal densities for both cases, the red line is
clearly wrong.
f <- function(x, xbar, s){
dnorm(x, mean = xbar, sd = s)
}
hist(b$t, freq = FALSE)
curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col =
"blue",
add = TRUE)
curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0, to = 1,
col =
"red", add = TRUE)
Hope this helps,
Rui Barradas
>
>
>
>
> On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas
<ruipbarradas at sapo.pt <mailto:ruipbarradas at sapo.pt>
> <mailto:ruipbarradas at sapo.pt <mailto:ruipbarradas at sapo.pt>>>
wrote:
>
> Hello,
>
> Something like this, using base package boot?
>
>
> library(boot)
>
> bootprop <- function(data, index){
> d <- data[index, ]
> sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]],
na.rm = TRUE)
> }
>
> R <- 1e3
> set.seed(2020)
> b <- boot(daT, bootprop, R)
> b
> b$t0 # original
> sd(b$t) # bootstrapped estimate of the SE of the sample
prop.
> hist(b$t, freq = FALSE)
>
>
> Hope this helps,
>
> Rui Barradas
>
> ?s 21:57 de 22/01/21, Marna Wagley escreveu:
> > Hi All,
> > I was trying to estimate standard error (SE) for the
proportion
> value using
> > some kind of randomization process (bootstrapping or
jackknifing)
> in R, but
> > I could not figure it out.
> >
> > Is there any way to generate SE for the proportion?
> >
> > The example of the data and the code I am using is
attached for your
> > reference. I would like to generate the value of
proportion with
> a SE using
> > a 1000 times randomization.
> >
> > dat<-structure(list(Sample = structure(c(1L, 12L, 13L,
14L, 15L, 16L,
> > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
11L), .Label
> = c("id1",
> > "id10", "id11", "id12", "id13", "id14", "id15",
"id16", "id17",
> > "id18", "id19", "Id2", "id3", "id4", "id5", "id6",
"id7", "id8",
> > "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L,
0L, 0L,
> > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L),
Time2 = c(1L,
> > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L,
0L, 1L, 0L,
> > 1L, 1L)), .Names = c("Sample", "Time1", "Time2"),
class =
> "data.frame",
> > row.names = c(NA,
> > -19L))
> > daT<-data.frame(dat %>%
> > mutate(Time1.but.not.in.Time2 = case_when(
> > Time1 %in% "1" & Time2 %in% "0" ~ "1"),
> > Time2.but.not.in.Time1 = case_when(
> > Time1 %in% "0" & Time2 %in% "1" ~ "1"),
> > BothTimes = case_when(
> > Time1 %in% "1" & Time2 %in% "1" ~ "1")))
> > daT
> > summary(daT)
> >
> > cols.num <-
c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
> > "BothTimes")
> > daT[cols.num] <- sapply(daT[cols.num],as.numeric)
> > summary(daT)
> > ProportionValue<-sum(daT$BothTimes,
na.rm=T)/sum(daT$Time1, na.rm=T)
> > ProportionValue
> > standard error??
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org <mailto:R-help at r-project.org>
<mailto:R-help at r-project.org <mailto:R-help at r-project.org>>
mailing list
> -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide
> > and provide commented, minimal, self-contained,
reproducible code.
> >
>
Hello, Yes, sorry for my previous post, I had forgotten about boot.array. That's a much better solution for your problem. Rui Barradas ?s 20:29 de 28/01/21, Marna Wagley escreveu:
Thank you Rui,
This is great. How about the following?
SimilatedData<-boot.array(b, indices=T)
seems it is giving the rows ID which are used in the calculation, isn't it?
On Thu, Jan 28, 2021 at 12:21 PM Rui Barradas <ruipbarradas at sapo.pt
<mailto:ruipbarradas at sapo.pt>> wrote:
Hello,
I don't know why you would need to see the indices but rewrite the
function bootprop as
bootprop_ind <- function(data, index){
? ?d <- data[index, ]
? ?#sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm = TRUE)
? ?index
}
and call in the same way. It will now return a matrix of indices with R
= 1000 rows and 19 columns.
Hope this helps,
Rui Barradas
?s 19:29 de 28/01/21, Marna Wagley escreveu:
> Hi Rui,
> I am sorry for asking you several questions.
>
> In the given example, randomizations (reshuffle) were done 1000
times,
> and its 1000 proportion values (results) are stored and it can be
seen
> using b$t; but I was wondering how the table was randomized
(which rows
> have been missed/or repeated in each randomizing procedure?).
>
> Is there any way we can see the randomized table and its associated
> results? Here in this example, I randomized (or bootstrapped) the
table
> into three times (R=3) so I would like to store these three
tables and
> look at them later to know which rows were repeated/missed. Is
there any
> possibility?
> The example data and the code is given below.
>
> Thank you for your help.
>
> ####
> library(boot)
> dat<-structure(list(Sample = structure(c(1L, 12L, 13L, 14L, 15L, 16L,
> 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label
= c("id1",
> "id10", "id11", "id12", "id13", "id14", "id15", "id16", "id17",
> "id18", "id19", "Id2", "id3", "id4", "id5", "id6", "id7", "id8",
> "id9"), class = "factor"), Time1 = c(0L, 1L, 1L, 1L, 0L, 0L,
> 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), Time2 = c(1L,
> 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
> 1L, 1L)), .Names = c("Sample", "Time1", "Time2"), class =
"data.frame",
> row.names = c(NA,
> -19L))
> daT<-data.frame(dat %>%
>? ? mutate(Time1.but.not.in.Time2 = case_when(
>? ? ? ? ? ? ? Time1 %in% "1" & Time2 %in% "0" ?~ "1"),
> Time2.but.not.in.Time1 = case_when(
>? ? ? ? ? ? ? Time1 %in% "0" & Time2 %in% "1" ?~ "1"),
>? ?BothTimes = case_when(
>? ? ? ? ? ? ? Time1 %in% "1" & Time2 %in% "1" ?~ "1")))
> cols.num <- c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
> "BothTimes")
> daT[cols.num] <- sapply(daT[cols.num],as.numeric)
> summary(daT)
>
> bootprop <- function(data, index){
>? ? ?d <- data[index, ]
>? ? ?sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]], na.rm =
TRUE)
> }
>
> R <- 3
> set.seed(2020)
> b <- boot(daT, bootprop, R)
> b
> b$t0? ? ?# original
> b$t
> sd(b$t)? # bootstrapped estimate of the SE of the sample prop.
> hist(b$t, freq = FALSE)
>
> str(b)
> b$data
> b$seed
> b$sim
> b$strata
> ################
>
>
> On Sat, Jan 23, 2021 at 12:36 AM Marna Wagley
<marna.wagley at gmail.com <mailto:marna.wagley at gmail.com>
> <mailto:marna.wagley at gmail.com <mailto:marna.wagley at gmail.com>>>
wrote:
>
>? ? ?Yes Rui, I can see we don't need to divide by square root of
sample
>? ? ?size. The example is great to understand it.
>? ? ?Thank you.
>? ? ?Marna
>
>
>? ? ?On Sat, Jan 23, 2021 at 12:28 AM Rui Barradas
<ruipbarradas at sapo.pt <mailto:ruipbarradas at sapo.pt>
>? ? ?<mailto:ruipbarradas at sapo.pt <mailto:ruipbarradas at sapo.pt>>>
wrote:
>
>? ? ? ? ?Hello,
>
>? ? ? ? ?Inline.
>
>? ? ? ? ??s 07:47 de 23/01/21, Marna Wagley escreveu:
>? ? ? ? ? > Dear Rui,
>? ? ? ? ? > I was wondering whether we have to square root of SD
to find
>? ? ? ? ?SE, right?
>
>? ? ? ? ?No, we don't. var already divides by n, don't divide again.
>? ? ? ? ?This is the code, that can be seen by running the
function name
>? ? ? ? ?at a
>? ? ? ? ?command line.
>
>
>? ? ? ? ?sd
>? ? ? ? ?#function (x, na.rm = FALSE)
>? ? ? ? ?#sqrt(var(if (is.vector(x) || is.factor(x)) x else
as.double(x),
>? ? ? ? ?#? ? na.rm = na.rm))
>? ? ? ? ?#<bytecode: 0x55f3ce900848>
>? ? ? ? ?#<environment: namespace:stats>
>
>
>
>? ? ? ? ? >
>? ? ? ? ? > bootprop <- function(data, index){
>? ? ? ? ? >? ? ?d <- data[index, ]
>? ? ? ? ? >? ? ?sum(d[["BothTimes"]], na.rm = TRUE)/sum(d[["Time1"]],
>? ? ? ? ?na.rm = TRUE)
>? ? ? ? ? > }
>? ? ? ? ? >
>? ? ? ? ? > R <- 1e3
>? ? ? ? ? > set.seed(2020)
>? ? ? ? ? > b <- boot(daT, bootprop, R)
>? ? ? ? ? > b
>? ? ? ? ? > b$t0? ? ?# original
>? ? ? ? ? > sd(b$t)? # bootstrapped estimate of the SE of the
sample prop.
>? ? ? ? ? > sd(b$t)/sqrt(1000)
>? ? ? ? ? > pandit*(1-pandit)
>? ? ? ? ? >
>? ? ? ? ? > hist(b$t, freq = FALSE)
>
>
>? ? ? ? ?Try plotting the normal densities for both cases, the red
line is
>? ? ? ? ?clearly wrong.
>
>
>? ? ? ? ?f <- function(x, xbar, s){
>? ? ? ? ? ? ?dnorm(x, mean = xbar, sd = s)
>? ? ? ? ?}
>
>? ? ? ? ?hist(b$t, freq = FALSE)
>? ? ? ? ?curve(f(x, xbar = b$t0, s = sd(b$t)), from = 0, to = 1, col =
>? ? ? ? ?"blue",
>? ? ? ? ?add = TRUE)
>? ? ? ? ?curve(f(x, xbar = b$t0, s = sd(b$t)/sqrt(R)), from = 0,
to = 1,
>? ? ? ? ?col =
>? ? ? ? ?"red", add = TRUE)
>
>
>? ? ? ? ?Hope this helps,
>
>? ? ? ? ?Rui Barradas
>
>? ? ? ? ? >
>? ? ? ? ? >
>? ? ? ? ? >
>? ? ? ? ? >
>? ? ? ? ? > On Fri, Jan 22, 2021 at 3:07 PM Rui Barradas
>? ? ? ? ?<ruipbarradas at sapo.pt <mailto:ruipbarradas at sapo.pt>
<mailto:ruipbarradas at sapo.pt <mailto:ruipbarradas at sapo.pt>>
>? ? ? ? ? > <mailto:ruipbarradas at sapo.pt
<mailto:ruipbarradas at sapo.pt> <mailto:ruipbarradas at sapo.pt
<mailto:ruipbarradas at sapo.pt>>>>
>? ? ? ? ?wrote:
>? ? ? ? ? >
>? ? ? ? ? >? ? ?Hello,
>? ? ? ? ? >
>? ? ? ? ? >? ? ?Something like this, using base package boot?
>? ? ? ? ? >
>? ? ? ? ? >
>? ? ? ? ? >? ? ?library(boot)
>? ? ? ? ? >
>? ? ? ? ? >? ? ?bootprop <- function(data, index){
>? ? ? ? ? >? ? ? ? ?d <- data[index, ]
>? ? ? ? ? >? ? ? ? ?sum(d[["BothTimes"]], na.rm =
TRUE)/sum(d[["Time1"]],
>? ? ? ? ?na.rm = TRUE)
>? ? ? ? ? >? ? ?}
>? ? ? ? ? >
>? ? ? ? ? >? ? ?R <- 1e3
>? ? ? ? ? >? ? ?set.seed(2020)
>? ? ? ? ? >? ? ?b <- boot(daT, bootprop, R)
>? ? ? ? ? >? ? ?b
>? ? ? ? ? >? ? ?b$t0? ? ?# original
>? ? ? ? ? >? ? ?sd(b$t)? # bootstrapped estimate of the SE of the
sample
>? ? ? ? ?prop.
>? ? ? ? ? >? ? ?hist(b$t, freq = FALSE)
>? ? ? ? ? >
>? ? ? ? ? >
>? ? ? ? ? >? ? ?Hope this helps,
>? ? ? ? ? >
>? ? ? ? ? >? ? ?Rui Barradas
>? ? ? ? ? >
>? ? ? ? ? >? ? ??s 21:57 de 22/01/21, Marna Wagley escreveu:
>? ? ? ? ? >? ? ? > Hi All,
>? ? ? ? ? >? ? ? > I was trying to estimate standard error (SE)
for the
>? ? ? ? ?proportion
>? ? ? ? ? >? ? ?value using
>? ? ? ? ? >? ? ? > some kind of randomization process
(bootstrapping or
>? ? ? ? ?jackknifing)
>? ? ? ? ? >? ? ?in R, but
>? ? ? ? ? >? ? ? > I could not figure it out.
>? ? ? ? ? >? ? ? >
>? ? ? ? ? >? ? ? > Is there any way to generate SE for the proportion?
>? ? ? ? ? >? ? ? >
>? ? ? ? ? >? ? ? > The example of the data and the code I am using is
>? ? ? ? ?attached for your
>? ? ? ? ? >? ? ? > reference. I would like to generate the value of
>? ? ? ? ?proportion with
>? ? ? ? ? >? ? ?a SE using
>? ? ? ? ? >? ? ? > a 1000 times randomization.
>? ? ? ? ? >? ? ? >
>? ? ? ? ? >? ? ? > dat<-structure(list(Sample = structure(c(1L,
12L, 13L,
>? ? ? ? ?14L, 15L, 16L,
>? ? ? ? ? >? ? ? > 17L, 18L, 19L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
>? ? ? ? ?11L), .Label
>? ? ? ? ? >? ? ?= c("id1",
>? ? ? ? ? >? ? ? > "id10", "id11", "id12", "id13", "id14", "id15",
>? ? ? ? ?"id16", "id17",
>? ? ? ? ? >? ? ? > "id18", "id19", "Id2", "id3", "id4", "id5", "id6",
>? ? ? ? ?"id7", "id8",
>? ? ? ? ? >? ? ? > "id9"), class = "factor"), Time1 = c(0L, 1L,
1L, 1L,
>? ? ? ? ?0L, 0L,
>? ? ? ? ? >? ? ? > 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L,
0L),
>? ? ? ? ?Time2 = c(1L,
>? ? ? ? ? >? ? ? > 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L,
>? ? ? ? ?0L, 1L, 0L,
>? ? ? ? ? >? ? ? > 1L, 1L)), .Names = c("Sample", "Time1",
"Time2"), class =
>? ? ? ? ? >? ? ?"data.frame",
>? ? ? ? ? >? ? ? > row.names = c(NA,
>? ? ? ? ? >? ? ? > -19L))
>? ? ? ? ? >? ? ? > daT<-data.frame(dat %>%
>? ? ? ? ? >? ? ? >? ? mutate(Time1.but.not.in.Time2 = case_when(
>? ? ? ? ? >? ? ? >? ? ? ? ? ? ? Time1 %in% "1" & Time2 %in% "0"? ~
"1"),
>? ? ? ? ? >? ? ? > Time2.but.not.in.Time1 = case_when(
>? ? ? ? ? >? ? ? >? ? ? ? ? ? ? Time1 %in% "0" & Time2 %in% "1"? ~
"1"),
>? ? ? ? ? >? ? ? >? ?BothTimes = case_when(
>? ? ? ? ? >? ? ? >? ? ? ? ? ? ? Time1 %in% "1" & Time2 %in% "1"? ~
"1")))
>? ? ? ? ? >? ? ? >? ?daT
>? ? ? ? ? >? ? ? >? ?summary(daT)
>? ? ? ? ? >? ? ? >
>? ? ? ? ? >? ? ? > cols.num <-
>? ? ? ? ?c("Time1.but.not.in.Time2","Time2.but.not.in.Time1",
>? ? ? ? ? >? ? ? > "BothTimes")
>? ? ? ? ? >? ? ? > daT[cols.num] <- sapply(daT[cols.num],as.numeric)
>? ? ? ? ? >? ? ? > summary(daT)
>? ? ? ? ? >? ? ? > ProportionValue<-sum(daT$BothTimes,
>? ? ? ? ?na.rm=T)/sum(daT$Time1, na.rm=T)
>? ? ? ? ? >? ? ? > ProportionValue
>? ? ? ? ? >? ? ? > standard error??
>? ? ? ? ? >? ? ? >
>? ? ? ? ? >? ? ? >? ? ? ?[[alternative HTML version deleted]]
>? ? ? ? ? >? ? ? >
>? ? ? ? ? >? ? ? > ______________________________________________
>? ? ? ? ? >? ? ? > R-help at r-project.org
<mailto:R-help at r-project.org> <mailto:R-help at r-project.org
<mailto:R-help at r-project.org>>
>? ? ? ? ?<mailto:R-help at r-project.org
<mailto:R-help at r-project.org> <mailto:R-help at r-project.org
<mailto:R-help at r-project.org>>>
>? ? ? ? ?mailing list
>? ? ? ? ? >? ? ?-- To UNSUBSCRIBE and more, see
>? ? ? ? ? >? ? ? > https://stat.ethz.ch/mailman/listinfo/r-help
>? ? ? ? ? >? ? ? > PLEASE do read the posting guide
>? ? ? ? ? > http://www.R-project.org/posting-guide.html
>? ? ? ? ? >? ? ? > and provide commented, minimal, self-contained,
>? ? ? ? ?reproducible code.
>? ? ? ? ? >? ? ? >
>? ? ? ? ? >
>