Remove missings (quick question)
4 messages · Eiko Fried, Bert Gunter, Marc Schwartz
On Nov 9, 2012, at 10:50 AM, Eiko Fried <torvon at gmail.com> wrote:
A colleague wrote the following syntax for me:
D = read.csv("x.csv")
## Convert -999 to NA
for (k in 1:dim(D)[2]) {
  I = which(D[,k]==-999)
  if (length(I) > 0) {
    D[I,k] = NA
  }
}
The dataset has many missing values. I am running several regressions on
this dataset, and want to ensure every regression has the same subjects.
Thus I want to drop subjects listwise for dependent variables y1-y9 and
covariates x1-x5 (if data is missing on ANY of these variables, drop
subject).
How would I do this after running the syntax above?
Thank you
Modify the initial read.csv() call to:
D <- read.csv("x.csv", na.strings = "-999")
That will convert all -999 values to NAs upon import, so that you don't have to post-process the data.
See ?read.csv for more info.
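As a small illustration of the na.strings approach, the sketch below reads a made-up three-row CSV from an inline string rather than from x.csv; the column names (id, y1, x1) and the values are invented for the example. Because na.strings replaces the default of "NA" rather than adding to it, both "NA" and "-999" are listed here, on the assumption that the real file might contain either.

```r
# Hypothetical miniature of x.csv, supplied inline so the example is self-contained
csv_text <- "id,y1,x1
1,3.2,-999
2,-999,1.5
3,4.1,NA"

# na.strings overrides the default ("NA"), so list every missing-value code
D <- read.csv(text = csv_text, na.strings = c("NA", "-999"))

# Count missing values per column to confirm the conversion
colSums(is.na(D))
```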
Once that is done, R's default behavior is to remove observations with any missing data (e.g., NA values) when using modeling functions. Or you can pre-process using:
D.New <- na.omit(D)
and then use D.New for all of your subsequent analyses. See ?na.omit.
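Note that na.omit(D) drops a row if any column of D is missing, which may remove more subjects than the question asks for. A minimal sketch of restricting the check to y1-y9 and x1-x5 with complete.cases() follows. It assumes the columns are literally named y1 ... y9 and x1 ... x5; the toy data frame, including the extra column `other`, is invented for illustration.

```r
# Toy data: the variable names follow the question; the values are made up
set.seed(1)
vars <- c(paste0("y", 1:9), paste0("x", 1:5))
D <- as.data.frame(matrix(rnorm(6 * length(vars)), nrow = 6,
                          dimnames = list(NULL, vars)))
D$other <- c(NA, 1, 2, NA, 4, 5)  # a column that is NOT part of the models
D$y3[2] <- NA                     # subject 2 is missing a dependent variable
D$x5[5] <- NA                     # subject 5 is missing a covariate

# Keep only the rows that are complete on the analysis variables
keep <- complete.cases(D[, vars])
D.New <- D[keep, ]

nrow(D.New)       # subjects 2 and 5 dropped; 1 and 4 kept despite NA in 'other'
nrow(na.omit(D))  # na.omit() on the whole data frame also drops subjects 1 and 4
```

Fitting every regression on D.New then gives each model the same set of subjects.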
Regards,
Marc Schwartz
Marc et al.:
On Fri, Nov 9, 2012 at 9:05 AM, Marc Schwartz <marc_schwartz at me.com> wrote:
Once that is done, R's default behavior is to remove observations with any missing data (e.g., NA values) when using modeling functions.

This appears to be false. From ?lme (nlme package, nlme_3.1-105, R 2.15.2):
"na.action a function that indicates what should happen when the data contain NAs. The default action (na.fail) causes lme to print an error message and terminate if there are any incomplete observations."
Frankly, I doubt that there is any uniformity for practically any modeling options across the vast array of "modeling functions" in R and (even recommended?) packages.
Cheers,
Bert

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
On Nov 9, 2012, at 11:23 AM, Bert Gunter <gunter.berton at gene.com> wrote:
This appears to be false. [...] Frankly, I doubt that there is any uniformity for practically any modeling options across the vast array of "modeling functions" in R and (even recommended?) packages.

Good point, Bert. That's what I get for over-generalizing... :-)
Thanks,
Marc
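Since the default na.action differs between modeling functions, one cautious option is to check it rather than assume it. The sketch below uses lm() from base R on invented data: lm() takes its default from options("na.action"), which is "na.omit" in a factory-fresh session, while passing na.action = na.fail makes the call stop on incomplete rows, the behavior that ?lme describes as its default.

```r
# Invented data with one missing response and one missing covariate
d <- data.frame(y = c(1.0, 2.1, NA, 3.9, 5.2, 6.1),
                x = c(1, 2, 3, NA, 5, 6))

# lm() falls back on options("na.action"); the factory-fresh setting is "na.omit"
getOption("na.action")

fit <- lm(y ~ x, data = d)
nobs(fit)  # number of rows actually used after incomplete cases were dropped

# Requesting na.fail turns silent row-dropping into an error
res <- tryCatch(lm(y ~ x, data = d, na.action = na.fail),
                error = function(e) conditionMessage(e))
res
```

Comparing nobs() across fitted models is a quick way to confirm that they were all estimated on the same number of subjects.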