
Replace "" with NA.

4 messages · vikram ranga, Marc Schwartz, jim holtman +1 more

#
Dear All,

I am a bit stuck on a problem: replacing "" with NA.
I have a big data set, but here is a toy example:

test<-data.frame(
test1=c("","Hi","Hello"),
test2=c("Hi","","Bye"),
test3=c("Hello","",""))

With data like the above, I can change every "" to NA with this code:

for(i in 1:3){          # loop over columns
  for(j in 1:3){        # loop over rows
    if(test[j,i]==""){  # cell is an empty string
      test[j,i]=NA
    }
  }
}

But the problem arises if the data frame already contains NA in some places:

test<-data.frame(
test1=c("","Hi","Hello"),
test2=c("Hi",NA,"Bye"),
test3=c("Hello","",""))

The loop above does not work on this data frame: NA == "" evaluates to NA
rather than TRUE or FALSE, so if() stops with the error "missing value where
TRUE/FALSE needed".
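A minimal sketch of why the loop breaks and one way to keep it working: guard the comparison with !is.na() so that if() always receives TRUE or FALSE. This assumes plain character columns (stringsAsFactors = FALSE, an assumption of this sketch; R 3.0.2 would create factors by default, which the same loop also handles).

```r
# Comparing NA with anything yields NA, not TRUE or FALSE
print(NA == "")   # [1] NA
# so if (NA == "") { ... } stops with "missing value where TRUE/FALSE needed"

test <- data.frame(
  test1 = c("", "Hi", "Hello"),
  test2 = c("Hi", NA, "Bye"),
  test3 = c("Hello", "", ""),
  stringsAsFactors = FALSE  # assumption: keep plain character columns
)

# Same double loop, but && only evaluates the == test when the cell is not NA,
# so the if() condition is always a single TRUE or FALSE
for (i in seq_along(test)) {          # columns
  for (j in seq_len(nrow(test))) {    # rows
    if (!is.na(test[j, i]) && test[j, i] == "") {
      test[j, i] <- NA
    }
  }
}
print(test)
```

The explicit loop is fine for a toy example, but the vectorised answers in this thread avoid touching each cell one at a time on a big data set.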

Can anyone provide some help?

My sessionInfo is:
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252    LC_MONETARY=English_India.1252
[4] LC_NUMERIC=C                   LC_TIME=English_India.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] RColorBrewer_1.0-5 plotrix_3.5-2      foreign_0.8-57     splancs_2.01-34    spatstat_1.34-0
 [6] polyclip_1.1-0     tensor_1.5         abind_1.4-0        deldir_0.1-1       mgcv_1.7-26
[11] nlme_3.1-111       xlsx_0.5.1         xlsxjars_0.5.0     rJava_0.9-4        ggplot2_0.9.3.1
[16] rgdal_0.8-11       rgeos_0.3-2        maptools_0.8-27    sp_1.0-14

loaded via a namespace (and not attached):
 [1] colorspace_1.2-4 dichromat_2.0-0  digest_0.6.3     grid_3.0.2       gtable_0.1.2
 [6] labeling_0.2     lattice_0.20-23  MASS_7.3-29      Matrix_1.0-14    munsell_0.4.2
[11] plyr_1.8         proto_0.3-10     reshape2_1.2.2   scales_0.2.3     stringr_0.6.2
[16] tcltk_3.0.2      tools_3.0.2
#
On Jan 6, 2014, at 5:57 AM, vikram ranga <babuawara at gmail.com> wrote:

            
<snip>


See ?is.na. Besides testing for NA values, its replacement form, is.na<-, is the canonical way to set values to NA:
test1 test2 test3
1          Hi Hello
2    Hi            
3 Hello   Bye 


# Where test == "", replace with NA
is.na(test) <- test == ""
test1 test2 test3
1  <NA>    Hi Hello
2    Hi  <NA>  <NA>
3 Hello   Bye  <NA>
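A self-contained sketch of the is.na<- approach, run on the second data frame (the one that already contains an NA). To keep the logical index itself free of NA, this sketch builds it with %in%, which returns FALSE rather than NA for missing values; that choice, and stringsAsFactors = FALSE, are assumptions of this sketch rather than part of the reply above.

```r
test <- data.frame(
  test1 = c("", "Hi", "Hello"),
  test2 = c("Hi", NA, "Bye"),
  test3 = c("Hello", "", ""),
  stringsAsFactors = FALSE  # assumption: plain character columns
)

# col %in% "" is TRUE only for empty strings and FALSE (never NA) for NA,
# so 'blank' is a pure TRUE/FALSE matrix with the same dimensions as test
blank <- vapply(test, function(col) col %in% "", logical(nrow(test)))

# is.na<- sets the cells selected by the logical matrix to NA
is.na(test) <- blank
print(test)
```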


Regards,

Marc Schwartz
#
Try this:

> test<-data.frame(
+ test1=c("","Hi","Hello"),
+ test2=c("Hi",NA,"Bye"),
+ test3=c("Hello","",""))
> test
test1 test2 test3
1          Hi Hello
2    Hi  <NA>
3 Hello   Bye
+     x[!is.na(x) & x == ''] <- NA
+     x
+ })
test1 test2 test3
1  <NA>    Hi Hello
2    Hi  <NA>  <NA>
3 Hello   Bye  <NA>
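The archived message is missing the line that opens the function call. A plausible self-contained version, assuming the function is applied column by column with lapply() and the result assigned back with test[] <- (an assumption, not necessarily the exact call used above):

```r
test <- data.frame(
  test1 = c("", "Hi", "Hello"),
  test2 = c("Hi", NA, "Bye"),
  test3 = c("Hello", "", ""),
  stringsAsFactors = FALSE  # assumption: plain character columns
)

# Apply the NA-safe replacement to every column.
# !is.na(x) & x == '' is FALSE (not NA) for missing cells, because FALSE & NA is FALSE.
# Assigning into test[] keeps test a data.frame instead of turning it into a list.
test[] <- lapply(test, function(x) {
  x[!is.na(x) & x == ''] <- NA
  x
})
print(test)
```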
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
On Mon, Jan 6, 2014 at 6:57 AM, vikram ranga <babuawara at gmail.com> wrote:
#
Hi,
Try:


test[test=="" & !is.na(test)] <- NA
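The one-liner can be wrapped in a small helper and checked against both toy data frames from the question. The name blank_to_na is hypothetical, and stringsAsFactors = FALSE is an assumption of this sketch.

```r
# Hypothetical helper wrapping the one-liner above
blank_to_na <- function(df) {
  # df == "" and !is.na(df) are logical matrices; & gives FALSE where a cell is NA,
  # so the index used for the assignment contains no NA
  df[df == "" & !is.na(df)] <- NA
  df
}

with_blanks <- data.frame(
  test1 = c("", "Hi", "Hello"),
  test2 = c("Hi", "", "Bye"),
  test3 = c("Hello", "", ""),
  stringsAsFactors = FALSE
)
with_na <- data.frame(
  test1 = c("", "Hi", "Hello"),
  test2 = c("Hi", NA, "Bye"),
  test3 = c("Hello", "", ""),
  stringsAsFactors = FALSE
)

r1 <- blank_to_na(with_blanks)
r2 <- blank_to_na(with_na)
print(r1)
print(r2)
```

Both inputs end up with the same four NA cells, whether they started as "" or as NA.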
A.K.
On Monday, January 6, 2014 7:51 AM, vikram ranga <babuawara at gmail.com> wrote:

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.