NA rows appeared in data.frame
Hi

You assigned NA to some values of one variable in a 150-row data frame, so there are no "mysterious" NA rows in your data. If you want to select anything based on a column containing NA values, you have to perform the selection using which (as Rui suggested). This is documented in the help page, although not very comprehensibly (maybe an example added to the help page would be useful).

-----
NAs in indexing

When extracting, a numerical, logical or character NA index picks an unknown element and so returns NA in the corresponding element of a logical, integer, numeric, complex or character result, and NULL for a list. (It returns 00 for a raw result.)
-----

I believe there is a reason for this behaviour: you compare 2 to NA, and NA basically means "I do not know". So the value could be 2, and therefore the rows with NA are returned as well. If I am wrong, I hope R gurus will correct me.

You said you want to remove rows with NA values, therefore I suggested the complete.cases function. After this you end up with an object stripped of the rows with NA values, so with fewer rows.

I would be rather cautious with the word "erroneous". I remember the old days when Excel treated empty cells as zeros and gave "erroneous" calculations, but I believe that was pretty sensible from an accountant's point of view, as an empty cell means 0. In almost all cases, analysis in R gives you correct results; you just need to tell R how to apply a function to an object with NA values.
mean(t1$Petal.Width)
[1] NA
mean(t1$Petal.Width, na.rm=TRUE)
[1] 1.147101
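To make the quoted help-page passage concrete, here is a minimal sketch in base R. The vector x and the data frame df are made up for illustration and are not part of the original example.

```r
# Hypothetical toy data: one missing value among four
x <- c(1.8, 2.0, NA, 2.0)

# Comparing with NA gives NA ("unknown"), not FALSE
x == 2.0
# [1] FALSE  TRUE    NA  TRUE

# A logical NA in the index picks an "unknown" element,
# so the result carries an NA in that position
x[x == 2.0]
# [1]  2 NA  2

# The same rule applied to the rows of a data frame returns
# a row filled with NA for every NA in the index
df <- data.frame(id = 1:4, x = x)
sel <- df[df$x == 2.0, ]
nrow(sel)   # 3 rows: two real matches plus one all-NA row
nrow(df)    # still 4; the original data frame is untouched
```

The all-NA row exists only in the result of the selection, not in df itself.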
Cheers Petr
-----Original Message-----
From: Ernest Han <ernest.hec at gmail.com>
Sent: Wednesday, January 16, 2019 3:27 AM
To: PIKAL Petr <petr.pikal at precheza.cz>
Cc: r-help at r-project.org
Subject: Re: [R] NA rows appeared in data.frame

Dear Rui and Petr,

Thank you for taking the time and effort to help. Rui's solution is an effective workaround that lets me continue to work with the data. However, the appearance of these NA rows (with NA rownames) is clearly erroneous (possibly a bug in R base code). What I am interested in is a solution that removes these NA rows. The reasons are: (1) prior to the NA assignment, one does not need to test for NA values; (2) sometimes these NA values are needed as part of the data to indicate missing data.
t1[t1$Petal.Width==1.8, "Petal.Width"] <- NA
Petr's solution is also not apt in my case, because it removes 12 rows that have NA values in "Petal.Width". I would like a solution that keeps the 150 rows, but not the mysterious 12 rows with all NA values in all columns.
Now I am puzzled. What do you really want?
With your example and my suggestion you get
t1 <- iris
t1[t1$Petal.Width==1.8, "Petal.Width"] <- NA
t2 <- t1[!is.na(t1$Petal.Width),]
t2[t2$Petal.Width == 2.0, ]
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
111          6.5         3.2          5.1           2 virginica
114          5.7         2.5          5.0           2 virginica
122          5.6         2.8          4.9           2 virginica
123          7.7         2.8          6.7           2 virginica
132          7.9         3.8          6.4           2 virginica
148          6.5         3.0          5.2           2 virginica
dim(t2)
[1] 138 5
dim(t1)
[1] 150 5
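If the goal is to keep all 150 rows (including the 12 with NA in Petal.Width) and only avoid the all-NA rows in the selection, the following sketch shows three base R ways to select without first dropping anything. The object names by_which, by_subset and by_in are chosen here for illustration only.

```r
t1 <- iris
t1[t1$Petal.Width == 1.8, "Petal.Width"] <- NA

# which() returns only the positions that are TRUE, so NA is dropped
by_which <- t1[which(t1$Petal.Width == 2.0), ]

# subset() treats a missing value in the condition as FALSE
by_subset <- subset(t1, Petal.Width == 2.0)

# %in% returns TRUE or FALSE, never NA
by_in <- t1[t1$Petal.Width %in% 2.0, ]

nrow(by_which)               # 6 matching rows, no NA rows
nrow(t1)                     # 150: t1 keeps every row
sum(is.na(t1$Petal.Width))   # 12 NA values remain as data
```

All three leave t1 unchanged; they differ only in how the NA entries of the comparison are handled during selection.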
Once again, I appreciate your suggestions and I am hoping that this 'erroneous' behaviour has a fix.

Cheers,
Ernest

On Mon, Jan 14, 2019 at 4:25 PM PIKAL Petr <petr.pikal at precheza.cz> wrote:
Hi

If you want to remove rows with NA values from your data you could use ?complete.cases or

t2 <- t1[!is.na(t1$Petal.Width),]

Cheers
Petr
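For comparison, a small sketch of the two suggestions side by side, using the same iris example. The names keep_any and keep_pw are illustrative. complete.cases() checks every column, while the is.na() test looks at Petal.Width only; in this example they agree because Petal.Width is the only column that contains NA.

```r
t1 <- iris
t1[t1$Petal.Width == 1.8, "Petal.Width"] <- NA

keep_any <- complete.cases(t1)        # FALSE for a row with NA in any column
keep_pw  <- !is.na(t1$Petal.Width)    # FALSE only where Petal.Width is NA

t2 <- t1[keep_any, ]
dim(t2)
# [1] 138   5

# Same rows are kept here, since no other column has missing values
identical(keep_any, keep_pw)
```

Either way, the 12 rows are removed from t2 for good, which suits an analysis that cannot use incomplete rows but not a case where the NA values must stay in the data.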
-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Rui Barradas
Sent: Saturday, January 12, 2019 12:55 PM
To: Ernest Han <ernest.hec at gmail.com>; r-help at r-project.org
Subject: Re: [R] NA rows appeared in data.frame

Hello,

You have to test for NA. Some (12) of the values of t1$Petal.Width are NA, therefore t1$Petal.Width == 2.0 alone returns 12 NA values.

t1[t1$Petal.Width == 2.0 & !is.na(t1$Petal.Width == 2.0), ]

Or use which(t1$Petal.Width == 2.0).

t1[which(t1$Petal.Width == 2.0), ]

Hope this helps,

Rui Barradas

At 08:23 on 12/01/2019, Ernest Han wrote:
Dear All,

After replacing some values in a data.frame, NA rows have appeared and cannot be removed. I have googled this issue and found that several people have encountered it. Solutions on Stack Overflow seem to provide workarounds but do not remove the rows from the data.frame.

Therefore, I am turning to experts in this community for help. The code is as follows:
t1 <- iris
t1[t1$Petal.Width==1.8, "Petal.Width"] <- NA
t1[t1$Petal.Width == 2.0, ]
      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
NA              NA          NA           NA          NA      <NA>
NA.1            NA          NA           NA          NA      <NA>
NA.2            NA          NA           NA          NA      <NA>
NA.3            NA          NA           NA          NA      <NA>
111            6.5         3.2          5.1           2 virginica
114            5.7         2.5          5.0           2 virginica
NA.4            NA          NA           NA          NA      <NA>
122            5.6         2.8          4.9           2 virginica
123            7.7         2.8          6.7           2 virginica
NA.5            NA          NA           NA          NA      <NA>
NA.6            NA          NA           NA          NA      <NA>
NA.7            NA          NA           NA          NA      <NA>
NA.8            NA          NA           NA          NA      <NA>
132            7.9         3.8          6.4           2 virginica
NA.9            NA          NA           NA          NA      <NA>
NA.10           NA          NA           NA          NA      <NA>
148            6.5         3.0          5.2           2 virginica
NA.11           NA          NA           NA          NA      <NA>

## Twelve values were replaced, twelve NA rows appeared.

### MISC INFO ###
sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin16.5.0 (64-bit)
Running under: macOS 10.14.2

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.0 tools_3.4.0
Sys.getlocale()
[1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"
Thank you, Ernest
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Personal data: Information about the processing and protection of business partners' personal data is available on the website: https://www.precheza.cz/en/personal-data-protection-principles/
Confidentiality: This email and any documents attached to it may be confidential and are subject to the legally binding disclaimer: https://www.precheza.cz/en/01-disclaimer/