Plotting graph for Missing values
12 messages · Shreyasee, jim holtman, Bart Joosen +1 more
What does your data look like? You could use 'split' and then examine the data in each range to count the number missing. I would need some actual data to suggest a solution.
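A minimal sketch of the 'split' idea, under stated assumptions: the toy vectors dos and patientinformation1 below are hypothetical stand-ins for the real columns, and an empty string "" is assumed to mark a missing value (as in the original loop's test patientinformation1[j]=="").

```r
# Hypothetical toy data: month labels and one variable where "" marks a missing entry
dos <- c("May-06", "May-06", "May-06", "Jun-06", "Jun-06")
patientinformation1 <- c("a", "", "b", "", "")

# split() partitions the variable into one piece per month label
by_month <- split(patientinformation1, dos)

# Count the empty strings in each piece
n_missing <- sapply(by_month, function(x) sum(x == ""))
print(n_missing)
```

The months come out in alphabetical order of their labels, not calendar order, because split() turns the labels into a factor.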
On Sun, Jan 25, 2009 at 8:30 PM, Shreyasee <shreyasee.pradhan at gmail.com> wrote:
Hi,
I have imported a dataset into R.
I want to calculate the percentage of missing values for each month (May 2006 to March 2007) for each variable.
To begin with, I tried the following code:
for(i in 1:length(dos))
  for(j in 1:length(patientinformation1))
    if(dos[i]=="May-06" && patientinformation1[j]=="")
      a <- j+1
a
The above code was written to calculate the number of missing values for May 2006, but I am not getting the correct results.
Can anybody help me?
Thanks,
Shreyasee
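The loop above nests i and j, so it compares every dos entry with every patientinformation1 entry instead of pairing the two columns row by row, and a <- j+1 overwrites a rather than counting. A minimal corrected sketch, assuming (hypothetically) that the columns are parallel vectors and that "" marks a missing value:

```r
# Hypothetical toy data: dos holds month labels, "" marks a missing value
dos <- c("May-06", "May-06", "Jun-06", "May-06")
patientinformation1 <- c("", "x", "", "")

# Loop version: one index, so dos[i] is paired with patientinformation1[i]
count <- 0
for (i in seq_along(dos)) {
  if (dos[i] == "May-06" && patientinformation1[i] == "") {
    count <- count + 1  # increment the counter instead of overwriting it
  }
}
print(count)

# Vectorized equivalent: TRUE values are summed as 1s
count_vec <- sum(dos == "May-06" & patientinformation1 == "")
print(count_vec)
```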
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
Here is an example of how you might approach it:
dos <- seq(as.Date('2006-05-01'), as.Date('2007-03-31'), by='1 day')
pat1 <- rbinom(length(dos), 1, .5) # generate some data
# partition by month and then list out the number of zero values (missing)
tapply(pat1, format(dos, "%Y%m"), function(x) sum(x==0))
200605 200606 200607 200608 200609 200610 200611 200612 200701 200702 200703
21 22 16 18 16 15 16 17 14 16 13
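Since the question asks for percentages rather than counts, the same tapply call can return the share of missing values per month. A minimal sketch reusing the simulated data above (0 standing for "missing" as in the example; set.seed is added only so the simulation is repeatable):

```r
set.seed(1)  # makes the simulated data repeatable
dos <- seq(as.Date('2006-05-01'), as.Date('2007-03-31'), by = '1 day')
pat1 <- rbinom(length(dos), 1, .5)  # simulated data; 0 stands for "missing"

# mean() of a logical vector is the proportion of TRUE values
pct_missing <- tapply(pat1, format(dos, "%Y%m"), function(x) 100 * mean(x == 0))
print(round(pct_missing, 1))
```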
On Sun, Jan 25, 2009 at 8:51 PM, Shreyasee <shreyasee.pradhan at gmail.com> wrote:
Hi Jim, The dataset has 4 variables (dos, patientinformation1, patientinformation2, patientinformation3). The dos variable holds the months (May 2006 to March 2007) in which the surgeries were performed. I need to calculate the percentage of missing values for each variable (patientinformation1, patientinformation2, patientinformation3) for each month, and I need a common script that does this for every variable. Thanks, Shreyasee
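One way to get a single script for all three variables is to wrap the per-month tapply in sapply over the columns, which gives a months-by-variables table. This is a sketch only: the small data frame ds is hypothetical, dos is assumed to already be a Date, and "" is assumed to mark a missing value.

```r
# Hypothetical data frame standing in for the real dataset; "" marks a missing value
ds <- data.frame(
  dos = as.Date(c("2006-05-03", "2006-05-17", "2006-06-02", "2006-06-20")),
  patientinformation1 = c("", "a", "b", "c"),
  patientinformation2 = c("x", "", "", "y"),
  patientinformation3 = c("p", "q", "r", ""),
  stringsAsFactors = FALSE
)

month <- format(ds$dos, "%Y%m")
vars <- c("patientinformation1", "patientinformation2", "patientinformation3")

# One tapply per variable; sapply binds the results into a months-by-variables matrix
pct <- sapply(ds[vars], function(v) tapply(v, month, function(x) 100 * mean(x == "")))
print(pct)
```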
You can save the output of tapply and then repeat the calculation for each of the variables. The results can then be used to plot the graphs.
On Sun, Jan 25, 2009 at 9:38 PM, Shreyasee <shreyasee.pradhan at gmail.com> wrote:
Hi Jim, I need to calculate the missing values in the variable "patientinformation1" for the period May 2006 to March 2007 and then plot the percentage of missing values over these months. This has to be done for each variable. The code you provided calculates the missing values for the months variable, am I right? I need to calculate them for all the variables for each month. Thanks, Shreyasee
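Following the suggestion to save the tapply output for each variable and plot it, here is a minimal sketch on simulated data only: pat1, pat2 and pat3 are hypothetical variables in which 0 plays the role of "missing", as in the tapply example in this thread. matplot draws one line per column.

```r
set.seed(42)  # simulated data only
dos <- seq(as.Date('2006-05-01'), as.Date('2007-03-31'), by = '1 day')
month <- format(dos, "%Y%m")

# Three simulated variables in which 0 plays the role of "missing"
sim <- data.frame(pat1 = rbinom(length(dos), 1, .5),
                  pat2 = rbinom(length(dos), 1, .3),
                  pat3 = rbinom(length(dos), 1, .8))

# Save the tapply result for each variable: rows are months, columns are variables
pct <- sapply(sim, function(v) tapply(v, month, function(x) 100 * mean(x == 0)))

# One line per variable across the months, with the month keys on the x axis
matplot(seq_len(nrow(pct)), pct, type = "b", pch = 1:3, lty = 1,
        xaxt = "n", xlab = "Month", ylab = "% missing")
axis(1, at = seq_len(nrow(pct)), labels = rownames(pct), las = 2)
legend("topright", legend = colnames(pct), pch = 1:3, col = 1:3, lty = 1)
```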
Run str(dos) and str(patientinformation1). They must be the same length for the command to work: there must be a one-to-one match of the data.
On Sun, Jan 25, 2009 at 10:23 PM, Shreyasee <shreyasee.pradhan at gmail.com> wrote:
Hi Jim, I tried the code you provided. In the command "pat1 <- rbinom(length(dos), 1, .5) # generate some data" I put the "patientinformation1" variable in place of "dos", and then I ran the "tapply" command, but it gives me the following error:
Error in tapply(pat1, format(dos, "%Y%m"), function(x) sum(x == 0)) : arguments must have same length
Thanks, Shreyasee
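The error means the vector being summarized and the grouping vector differ in length. A minimal sketch of checking this before calling tapply; the scenario is assumed for illustration: a 335-day simulated dos left over in the workspace, and a hypothetical short patientinformation1.

```r
# A leftover simulated date vector (335 days) still sitting in the workspace
dos <- seq(as.Date('2006-05-01'), as.Date('2007-03-31'), by = '1 day')
# Hypothetical stand-in for the imported column, with a different length
patientinformation1 <- c("a", "", "b", "")

# Compare lengths before calling tapply
print(c(dos = length(dos), patientinformation1 = length(patientinformation1)))

# tapply refuses mismatched inputs; capture the message instead of stopping
msg <- tryCatch(
  tapply(patientinformation1, format(dos, "%Y%m"), function(x) sum(x == "")),
  error = function(e) conditionMessage(e)
)
print(msg)
```

If the lengths differ, the grouping vector most likely comes from a different object than the data, so it is worth checking with ls() which objects are in the workspace.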
From your original posting:
I tried the code which you provided. In place of "dos" in command "pat1 <- rbinom(length(dos), 1, .5) # generate some data" I added "patientinformation1" variable and then I gave the command for "tapply" but it's giving me the following error: Error in tapply(pat1, format(dos, "%Y%m"), function(x) sum(x == 0)) : arguments must have same length
I would say that "pat1" and "dos" were not of the same length. Check your code and objects to verify this; that is what the error message is saying. You said you added the "patientinformation1" variable, but it does not seem to appear in the error message.
On Sun, Jan 25, 2009 at 11:48 PM, Shreyasee <shreyasee.pradhan at gmail.com> wrote:
Hi Jim, I ran the following code:
ds <- read.csv(file="D:/Shreyasee laptop data/ASC Dataset/Subset of the ASC Dataset.csv", header=TRUE)
attach(ds)
str(dos)
I am getting the following message:
Factor w/ 12 levels "0000-00-00","6-Aug",..: 6 6 6 6 6 6 6 6 6 6 ...
Thanks, Shreyasee
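If empty cells are what "missing" means in this file (an assumption based on the == "" test in the original loop), read.csv can turn them into NA at import time through its na.strings argument, after which is.na() does the counting. A sketch on hypothetical CSV text standing in for the real file:

```r
# Hypothetical CSV text standing in for the real file; empty fields mark missing values
csv <- "dos,patientinformation1,patientinformation2
May-06,a,
May-06,,x
Jun-06,b,y"

# na.strings = c("", "NA") converts empty fields into NA while reading
ds <- read.csv(text = csv, header = TRUE, na.strings = c("", "NA"))
str(ds)

# colMeans of a logical matrix gives the fraction missing per column (overall, not per month)
print(100 * colMeans(is.na(ds[-1])))
```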
I added "patientinformation1" variable and then I gave the command for "tapply" but its giving me the following error: Error in tapply(pat1, format(dos, "%Y%m"), function(x) sum(x == 0)) : arguments must have same length
It seems you added patientinformation1 but still used pat1 in the tapply call. Bart
View this message in context: http://www.nabble.com/Plotting-graph-for-Missing-values-tp21659322p21666790.html Sent from the R help mailing list archive at Nabble.com.
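Following Bart's point, the real column has to go into tapply in place of the simulated pat1. Because dos here already holds month labels, it can serve directly as the grouping vector. A sketch with hypothetical data: dos is assumed to be a factor of "Mon-YY" labels and "" is assumed to mark a missing value.

```r
# Hypothetical imported data: dos read as a factor of month labels, "" marks missing
ds <- data.frame(
  dos = factor(c("May-06", "May-06", "Jun-06", "Jun-06", "Jun-06")),
  patientinformation1 = c("", "a", "b", "", ""),
  stringsAsFactors = FALSE
)

# Pass the real column (not the simulated pat1) and group by the month labels themselves
n_missing <- with(ds, tapply(patientinformation1, dos, function(x) sum(x == "")))
print(n_missing)
```

The groups follow the factor's levels, which are alphabetical here, so "Jun-06" comes before "May-06". Converting dos to a Date would give calendar order.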
Hi Jim
r-help-bounces at r-project.org wrote on 26.01.2009 15:44:32:
From your original posting:
I tried the code which u provided. In place of "dos" in command "pat1 <- rbinom(length(dos), 1, .5) # generate some data" I added "patientinformation1" variable and then I gave the command for "tapply" but its giving me the following error: Error in tapply(pat1, format(dos, "%Y%m"), function(x) sum(x == 0)) : arguments must have same length
I would say that "pat1" and "dos" were not of the same length. Check your code and objects to verify this; that is what the error message is saying. You said you added the "patientinformation1" variable, but it does not seem to appear in the error message.
You are really patient. I presume Shreyasee does not know much about data structures and function use in R. It could probably help a lot if s/he looked into some basic documents such as An Introduction to R. If I understand correctly, what was done is

pat1 <- rbinom(length(patientinformation1), 1, .5)

which does not make much sense, as it generates artificial data as well. Most probably there is also a version of "dos" in memory that was constructed while testing your code and has length 335. This could result in the mentioned error
Error in tapply(pat1, format(dos, "%Y%m"), function(x) sum(x == 0)) : arguments must have same length
Then note
ds <- read.csv(file="D:/Shreyasee laptop data/ASC Dataset/Subset of the ASC Dataset.csv", header=TRUE)
attach(ds)
str(dos)
If str(ds) is issued, it would reveal what kind of data s/he has. Also, format(dos, ...) would not work as intended, because dos is a factor, not a Date.
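A sketch of converting such a factor to Date, under a loud assumption: the valid labels are taken to look like "May-06" (abbreviated month, two-digit year), as the original loop compared against "May-06". Labels that do not fit, such as the "0000-00-00" and "6-Aug" levels shown by str(), come out as NA and need separate handling. The %b code matches month names in the current locale, so an English or C locale is assumed.

```r
# Hypothetical factor mimicking the imported dos: "Mon-YY" labels plus odd entries
dos <- factor(c("May-06", "Jun-06", "0000-00-00", "6-Aug", "Mar-07"))

# Convert the factor labels to character first, then parse "01-Mon-YY" as a date
dos_date <- as.Date(paste0("01-", as.character(dos)), format = "%d-%b-%y")
print(dos_date)

# Labels that do not fit the assumed pattern come out as NA
print(as.character(dos)[is.na(dos_date)])

# Once dos is a Date, format() yields sortable month keys such as "200605"
print(format(dos_date, "%Y%m"))
```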
str(dos)
I am getting the following message: Factor w/ 12 levels "0000-00-00","6-Aug",..: 6 6 6 6 6 6 6 6 6 6 ...
If it were a Date,

aggregate(ds[,-1], list(format(ds$dos, "%Y%m")), function(x) sum(x==0))

   Group.1 pat1 pat2
1   200605   12   16
2   200606   20   18
3   200607   12   13
4   200608   18   15
5   200609   18   11
6   200610   17   15
7   200611   19   17
8   200612   14   15
9   200701   14   18
10  200702   13   13
11  200703   16   19

could do the trick, if the patientinformation variables had the structure you anticipate, which is not the case, judging from the original code:
for(i in 1:length(dos))
  for(j in 1:length(patientinformation1))
    if(dos[i]=="May-06" && patientinformation1[j]=="")
      a <- j+1
Well, if Shreyasee manages to convert dos to Date mode (which will not be straightforward given the awkward structure of "dos"), then something like

aggregate(ds[,-1], list(format(ds$dos, "%Y%m")), function(x) sum(x==""))

could do the trick.
Regards
Petr
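Petr's aggregate approach can return percentages as well as counts. A minimal sketch on a hypothetical data frame, assuming dos has already been converted to Date and "" marks a missing value:

```r
# Hypothetical data frame: dos already converted to Date, "" marks a missing value
ds <- data.frame(
  dos = as.Date(c("2006-05-03", "2006-05-17", "2006-06-02", "2006-06-20")),
  patientinformation1 = c("", "a", "b", "c"),
  patientinformation2 = c("x", "", "", "y"),
  stringsAsFactors = FALSE
)

# Counts of empty strings per month, as in the aggregate call above
counts <- aggregate(ds[, -1], list(month = format(ds$dos, "%Y%m")),
                    function(x) sum(x == ""))
# Percentages instead of counts: mean() of a logical vector is a proportion
pcts <- aggregate(ds[, -1], list(month = format(ds$dos, "%Y%m")),
                  function(x) 100 * mean(x == ""))
print(counts)
print(pcts)
```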