How to globally convert NaN to NA in dataframe?

11 messages · Luigi Marongiu, PIKAL Petr, Andrew Simmons +1 more

#
Hello,
I have some NaN values in some elements of a dataframe that I would
like to convert to NA.
The command `df1$col[is.nan(df1$col)] <- NA` works one column at a time.
Is there a way to replace all instances across the whole dataframe at
once?
I have seen from
https://stackoverflow.com/questions/18142117/how-to-replace-nan-value-with-zero-in-a-huge-data-frame/18143097#18143097
that one could use:
```

is.nan.data.frame <- function(x)
do.call(cbind, lapply(x, is.nan))

data123[is.nan(data123)] <- 0
```
replacing 0 with NA, but I got
```
str(df)
```
when modifying my dataframe df.
What would be the correct syntax?
Thank you
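
For reference, a minimal sketch of how the Stack Overflow snippet quoted above can be adapted to assign NA instead of 0. The data frame `df1` and its columns are made up for illustration; the approach relies on `is.nan()` dispatching to the user-defined `is.nan.data.frame` method.

```r
# Hypothetical example data; the column names are assumptions for illustration
df1 <- data.frame(a = c(1, NaN, 3), b = c(NaN, 5, 6))

# S3 method so that is.nan() accepts a whole data frame:
# it returns a logical matrix with one column per data frame column
is.nan.data.frame <- function(x)
  do.call(cbind, lapply(x, is.nan))

# Replace every NaN with NA in a single assignment
df1[is.nan(df1)] <- NA

df1
```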
#
Hi

what about

data[sapply(data, is.nan)] <- NA

Cheers
Petr
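
A small sketch of why this one-liner works, using a made-up data frame: `sapply()` applies `is.nan()` to each column and, because the columns all have the same length, simplifies the result to a logical matrix that can be used to index the data frame.

```r
# Hypothetical example data
data <- data.frame(a = c(1, NaN, 3), b = c(NaN, 5, 6))

# Logical matrix with the same shape as 'data'; TRUE marks the NaN cells
mask <- sapply(data, is.nan)
mask

# Matrix indexing on the data frame replaces only the marked cells
data[mask] <- NA
data
```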
#
Hello,


I would use something like:


x <- c(1:5, NaN) |> sample(100, replace = TRUE) |> matrix(10, 10) |>
as.data.frame()
x[] <- lapply(x, function(xx) {
    xx[is.nan(xx)] <- NA_real_
    xx
})


This prevents attributes from being changed in 'x', but accomplishes the
same thing as you have above. I hope this helps!

On Thu, Sep 2, 2021 at 9:19 AM Luigi Marongiu <marongiu.luigi at gmail.com>
wrote:

  
  
#
`data[sapply(data, is.nan)] <- NA` is a nice compact command, but I
still get NaN when using the summary function. For instance, one of the
columns gives:
```
Min.   : NA
1st Qu.: NA
Median : NA
Mean   :NaN
3rd Qu.: NA
Max.   : NA
NA's   :110
```
I tried to implement the second solution but:
```
df <- lapply(x, function(xx) {
  xx[is.nan(xx)] <- NA
})
List of 1
 $ sd_ef_rash_loc___palm: logi NA
```
What am I getting wrong?
Thanks
On Thu, Sep 2, 2021 at 3:30 PM Andrew Simmons <akwsimmo at gmail.com> wrote:

  
    
#
You removed the second line, 'xx', from the function. Put it back and it
should work.
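
To illustrate the point with made-up data: an R function returns the value of its last expression, and the value of an assignment is its right-hand side. Without the trailing 'xx', each column therefore collapses to a single NA.

```r
# Hypothetical example data
x <- data.frame(a = c(1, NaN, 3), b = c(NaN, 5, 6))

# Last expression is the assignment, so the function returns NA
broken <- lapply(x, function(xx) {
  xx[is.nan(xx)] <- NA
})
str(broken)

# Last expression is the modified vector, so the whole column is returned
fixed <- lapply(x, function(xx) {
  xx[is.nan(xx)] <- NA
  xx
})
str(fixed)
```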
On Thu, Sep 2, 2021, 09:45 Luigi Marongiu <marongiu.luigi at gmail.com> wrote:

            

  
  
#
Sorry,
still I don't get it:
```
[1] 302 626
+   xx[is.nan(xx)] <- NA
+   xx
+ })
NULL
```
On Thu, Sep 2, 2021 at 3:47 PM Andrew Simmons <akwsimmo at gmail.com> wrote:

  
    
#
It seems like you might've missed one more thing: you need the brackets
next to 'x' to get it to work.


x[] <- lapply(x, function(xx) {
    xx[is.nan(xx)] <- NA_real_
    xx
})

is different from

x <- lapply(x, function(xx) {
    xx[is.nan(xx)] <- NA_real_
    xx
})
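
A short sketch of the difference, on made-up data. The helper name `nan_to_na` is only for illustration. Assigning into `x[]` fills the existing data frame column by column, whereas plain `x <-` replaces it with the bare list that `lapply()` returns.

```r
# Hypothetical example data
x <- data.frame(a = c(1, NaN, 3), b = c(NaN, 5, 6))

# Illustrative helper: replace NaN by NA and return the whole vector
nan_to_na <- function(xx) {
  xx[is.nan(xx)] <- NA_real_
  xx
}

# Without brackets the result is a plain list, no longer a data frame
as_list <- lapply(x, nan_to_na)
class(as_list)

# With brackets the columns are replaced in place and 'x' stays a data frame
x[] <- lapply(x, nan_to_na)
class(x)
```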

Also, if all of your data is numeric, it might be better to convert to a
matrix before doing your calculations. For example:

x <- as.matrix(x)
x[is.nan(x)] <- NA_real_

I'd also suggest this same solution for the other question you posted,

x[x == 0] <- NA
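
A minimal sketch of the matrix approach on made-up data. It assumes every column is numeric: with mixed column types, `as.matrix()` produces a character matrix, and `is.nan()` is FALSE for every character element.

```r
# Hypothetical all-numeric example data
x <- data.frame(a = c(1, NaN, 0), b = c(NaN, 2, 0))

m <- as.matrix(x)
m[is.nan(m)] <- NA_real_   # NaN -> NA
m[m == 0] <- NA            # 0 -> NA (NA entries in the index are skipped)
m

# With a non-numeric column the matrix is character and nothing is matched
mixed <- as.matrix(data.frame(a = c(1, NaN), b = c("u", "v")))
is.character(mixed)
any(is.nan(mixed))
```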

On Thu, Sep 2, 2021 at 10:01 AM Luigi Marongiu <marongiu.luigi at gmail.com>
wrote:

  
  
#
Thank you!
On Thu, Sep 2, 2021 at 4:17 PM Andrew Simmons <akwsimmo at gmail.com> wrote:

  
    
#
Hi Luigi.

Weird. But maybe it is the desired behaviour of summary when calculating
the mean of a numeric column full of NAs.

See example

dat <- data.frame(x=rep(NA, 110), y=rep(1, 110), z= rnorm(110))

# change all values in second column to NA
dat[,2] <- NA
# change some of them to NaN
dat[5:6, 2:3] <- 0/0

# see summary
summary(dat)
    x                 y             z          
 Mode:logical   Min.   : NA   Min.   :-1.9798  
 NA's:110       1st Qu.: NA   1st Qu.:-0.4729  
                Median : NA   Median : 0.1745  
                Mean   :NaN   Mean   : 0.1856  
                3rd Qu.: NA   3rd Qu.: 0.8017  
                Max.   : NA   Max.   : 2.5075  
                NA's   :110   NA's   :2        

# change NaN values to NA
dat[sapply(dat, is.nan)] <- NA
*************************

# summary is the same
summary(dat)
    x                 y             z          
 Mode:logical   Min.   : NA   Min.   :-1.9798  
 NA's:110       1st Qu.: NA   1st Qu.:-0.4729  
                Median : NA   Median : 0.1745  
                Mean   :NaN   Mean   : 0.1856  
                3rd Qu.: NA   3rd Qu.: 0.8017  
                Max.   : NA   Max.   : 2.5075  
                NA's   :110   NA's   :2        

# but there are no NaN values in the data
dat[1:10,]
    x  y          z
1  NA NA -0.9148696
2  NA NA  0.7110570
3  NA NA -0.1901676
4  NA NA  0.5900650
5  NA NA         NA
6  NA NA         NA
7  NA NA  0.7987658
8  NA NA -0.5225229
9  NA NA  0.7673103
10 NA NA -0.5263897

So my "nice compact command"
dat[sapply(dat, is.nan)] <- NA

works as expected, but summary still reports the mean as NaN.

Cheers
Petr
#
Fair enough, I'll check the actual data to see if there are indeed any
NaN (which there should not be, since the data are categories, not
generated by math).
Thanks!
On Fri, Sep 3, 2021 at 8:26 AM PIKAL Petr <petr.pikal at precheza.cz> wrote:

  
    
#
Yes, even
Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
     NA      NA      NA     NaN      NA      NA       1 

which is presumably because the mean is an empty sum (= 0) divided by a zero count, and 0/0 = NaN.

Notice also the difference between
[1] NA
[1] NaN
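
As an illustration of the empty-sum explanation (a sketch on a made-up vector, not a reconstruction of the commands above): once the NAs are dropped, nothing is left to average, so the mean becomes 0/0.

```r
# Hypothetical all-NA numeric vector
y <- rep(NA_real_, 5)

mean(y)                 # NA: missing values propagate
mean(y, na.rm = TRUE)   # NaN: mean of zero remaining values

# The same arithmetic written out by hand
sum(numeric(0)) / length(numeric(0))

# summary() drops the NAs before computing the mean
summary(y)[["Mean"]]
```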