Message-ID: <CAGxFJbQFB66wp5T3SaoqvnL6T7uHcEjAyJA7yupESfc9rtuD9g@mail.gmail.com>
Date: 2018-12-09T21:47:00Z
From: Bert Gunter
Subject: Spark DataFrame: replace NULL cell by NA
In-Reply-To: <CALJKBv_DrtA_SsY3HhvFWhj4GChWk9DfzbDYu3pTq1ootRYw2Q@mail.gmail.com>

"... if("factor" %in% class(x)) x <- as.character(x) ## since ifelse won't work with factors"
Nonsense!

> x <- factor(c("a","", "b"))
> x
[1] a   b
Levels:  a b

> levels(x)
[1] ""  "a" "b"

> x <- factor(ifelse(x==(""),NA,x))
> x
[1] 2    <NA> 3
Levels: 2 3
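
Note that in the session above ifelse() does run on a factor, but the labels "a" and "b" come back as the integer codes 2 and 3. As a minimal sketch, assuming the goal is to keep the original labels, here are two base-R alternatives; the variable names codes, y and z are only illustrative and not from the original thread.

```r
# Base R only; x is the same example factor as in the session above.
x <- factor(c("a", "", "b"))

# ifelse() accepts a factor, but the "no" branch contributes the
# underlying integer codes, so the labels are lost.
codes <- ifelse(x == "", NA, x)

# Option 1: go through character so the labels survive,
# then rebuild the factor.
y <- factor(ifelse(x == "", NA, as.character(x)))

# Option 2: mark the empty entries as missing in place,
# then drop the now-unused "" level.
z <- x
is.na(z) <- z == ""
z <- droplevels(z)

print(codes)
print(y)
print(z)
```

Both options leave a factor whose only levels are "a" and "b". Option 2 avoids the round trip through character, which may matter for long vectors.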


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Sun, Dec 9, 2018 at 1:07 PM Karim Mezhoud <kmezhoud at gmail.com> wrote:

> Dear All,
> ## function to replace empty cells with NA
> empty_as_na <- function(x){
>   if("factor" %in% class(x)) x <- as.character(x) ## since ifelse won't work with factors
>   ifelse(as.character(x)!="", x, NA)
> }
>
> ## connect to spark local
> sc <- spark_connect(master = "local")
> # load an example data frame that has empty cells (needs the cgdsr package)
> clinicalData <- cgdsr::getClinicalData(cgds, "gbm_tcga_pub_all")
> ## copy to spark
> clinicalData_tbl <- dplyr::copy_to(sc, clinicalData, overwrite = TRUE)
>
> # This works
> clinicalData %>% mutate_all(funs(empty_as_na))
> # This does not work
> clinicalData_tbl %>% mutate_all(funs(empty_as_na))
> Thanks,
> Karim
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>