Message-ID: <CAGxFJbTrtkL-SQ5Gc+EiBYs+xuiyobuxDzVQzbJjr=YdmAzyTg@mail.gmail.com>
Date: 2021-09-19T23:18:40Z
From: Bert Gunter
Subject: how to remove factors from whole dataframe?
In-Reply-To: <01a801d7ad9b$373ade60$a5b09b20$@verizon.net>
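You do not understand factors. There is no "base type" that can be recovered: a factor stores only integer codes plus a character vector of level labels, not the values it was built from.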
> f <- factor(c(5.1, 6.2), labels = c("whoa","baby"))
> f
[1] whoa baby
Levels: whoa baby
> unclass(f)
[1] 1 2
attr(,"levels")
[1] "whoa" "baby"
> typeof(f)
[1] "integer"
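
To make that concrete, here is a minimal sketch that reuses the same f. It assumes nothing beyond base R; the names codes and labs are introduced here only for illustration. All that can be pulled back out of the factor are the integer codes and the labels.

```r
f <- factor(c(5.1, 6.2), labels = c("whoa", "baby"))

codes <- as.integer(f)    # the underlying integer codes
labs  <- as.character(f)  # the level label of each element

codes
labs

# The doubles used to build f are not stored anywhere: converting the
# labels to numeric gives NA (with a coercion warning), not 5.1 and 6.2.
suppressWarnings(as.numeric(labs))
```

If the labels had been left at their defaults ("5.1" and "6.2"), as.numeric(as.character(f)) would parse them back into numbers, but that is parsing the labels, not recovering a stored type.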
Bert Gunter
"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Sun, Sep 19, 2021 at 2:15 PM Avi Gross via R-help
<r-help at r-project.org> wrote:
>
> Glad we have solutions BUT I note that the more abstract question is how to convert any columns that are factors to their base type and that may well NOT be character. They can be integers or doubles or complex or Boolean and maybe even raw.
>
> So undoing factorization may require using something like typeof() to get the base type and then depending on what final type you have, you may have to do things like as.integer(as.character(the_factor)) to get it as an integer and for a logical, as.logical(factor(c(TRUE, TRUE, FALSE, TRUE, FALSE))) and so on.
>
> This seems like a fairly basic need so I wonder if anyone has already done it. I can see a fairly straightforward way to build a string and use eval and I suspect others might use something else like do.call() and yet others use multiple if statements or a case_when or something
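
For what it's worth, this probably does not need eval() or a chain of if statements. A minimal sketch, assuming it is acceptable to guess each column's type from its labels: type.convert() from the utils package does that guessing. The helper name unfactor and the example data frame d are hypothetical, made up here for illustration.

```r
# Hypothetical helper (not from this thread): convert every factor column
# to whatever type its labels parse as, leaving other columns untouched.
unfactor <- function(d) {
  is_fac <- vapply(d, is.factor, logical(1))
  d[is_fac] <- lapply(d[is_fac], function(x) {
    # as.is = TRUE keeps non-numeric, non-logical labels as character
    type.convert(as.character(x), as.is = TRUE)
  })
  d
}

# Made-up example data with factors whose labels look like different types
d <- data.frame(
  n = factor(c("10", "20", "30")),
  x = factor(c("1.5", "2.5", "3.5")),
  b = factor(c(TRUE, FALSE, TRUE)),
  s = factor(c("a", "b", "c"))
)

sapply(unfactor(d), class)
```

The type is inferred from the labels alone, so a factor whose labels merely look numeric will also come back as a number.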
>
>
>
>
> -----Original Message-----
> From: R-help <r-help-bounces at r-project.org> On Behalf Of Luigi Marongiu
> Sent: Sunday, September 19, 2021 4:43 PM
> To: Rui Barradas <ruipbarradas at sapo.pt>
> Cc: r-help <r-help at r-project.org>
> Subject: Re: [R] how to remove factors from whole dataframe?
>
> Awesome, thanks!
>
> On Sun, Sep 19, 2021 at 4:22 PM Rui Barradas <ruipbarradas at sapo.pt> wrote:
> >
> > Hello,
> >
> > Using Jim's lapply(., is.factor) but simplified, you could do
> >
> >
> > df1 <- df
> > i <- sapply(df1, is.factor)
> > df1[i] <- lapply(df1[i], as.character)
> >
> >
> > A one-liner that modifies df itself, rather than the copy df1, is
> >
> >
> > df[sapply(df, is.factor)] <- lapply(df[sapply(df, is.factor)],
> > as.character)
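
An alternative idiom, offered as a sketch rather than as anything proposed in this thread, avoids calling sapply() twice by assigning into df[] so the data frame structure is kept. The data are the example from the original post.

```r
df <- data.frame(region  = factor(c("A", "B", "C", "D", "E")),
                 sales   = c(13, 16, 22, 27, 34),
                 country = factor(c("a", "b", "c", "d", "e")))

# df[] <- ... replaces the columns in place and keeps df a data frame;
# factor columns become character, everything else is returned unchanged.
df[] <- lapply(df, function(x) if (is.factor(x)) as.character(x) else x)

sapply(df, class)
```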
> >
> >
> > Hope this helps,
> >
> > Rui Barradas
> >
> > At 11:03 on 19/09/21, Luigi Marongiu wrote:
> > > Thank you Jim, but I obtain:
> > > ```
> > >> str(df)
> > > 'data.frame': 5 obs. of 3 variables:
> > > $ region : Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
> > > $ sales : num 13 16 22 27 34
> > > $ country: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
> > >> df1<-df[,!unlist(lapply(df,is.factor))]
> > >> str(df1)
> > > num [1:5] 13 16 22 27 34
> > >> df1
> > > [1] 13 16 22 27 34
> > > ```
> > > I was expecting
> > > ```
> > > str(df)
> > > 'data.frame': 5 obs. of 3 variables:
> > > $ region : char "A","B","C","D",..: 1 2 3 4 5
> > > $ sales : num 13 16 22 27 34
> > > > $ country: char "a","b","c","d",..: 1 2 3 4 5
> > > > ```
> > >
> > > On Sun, Sep 19, 2021 at 11:37 AM Jim Lemon <drjimlemon at gmail.com> wrote:
> > >>
> > >> Hi Luigi,
> > >> It's easy:
> > >>
> > >> df1<-df[,!unlist(lapply(df,is.factor))]
> > >>
> > >> _except_ when there is only one column left, as in your example. In
> > >> that case, you will have to coerce the resulting vector back into a
> > >> one column data frame.
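
The single-column case can probably be handled without coercing anything back, by passing drop = FALSE when subsetting. A minimal sketch on the data from the original post; the name keep is introduced here for illustration. Note that this removes the factor columns rather than converting them.

```r
df <- data.frame(region  = factor(c("A", "B", "C", "D", "E")),
                 sales   = c(13, 16, 22, 27, 34),
                 country = factor(c("a", "b", "c", "d", "e")))

keep <- !vapply(df, is.factor, logical(1))  # TRUE for non-factor columns

# drop = FALSE stops R from simplifying a one-column result to a vector
df1 <- df[, keep, drop = FALSE]

str(df1)
```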
> > >>
> > >> Jim
> > >>
> > >> On Sun, Sep 19, 2021 at 6:18 PM Luigi Marongiu <marongiu.luigi at gmail.com> wrote:
> > >>>
> > >>> Hello,
> > > >>> I would like to remove factors from all the columns of a dataframe.
> > > >>> I can do it one column at a time with
> > > >>> ```
> > > >>> df <- data.frame(region=factor(c('A', 'B', 'C', 'D', 'E')),
> > > >>> sales = c(13, 16, 22, 27, 34),
> > > >>> country=factor(c('a', 'b', 'c', 'd', 'e')))
> > > >>>
> > > >>> df$region <- droplevels(df$region)
> > > >>> ```
> > >>>
> > >>> What is the syntax to remove all factors at once (from all columns)?
> > >>> For this does not work:
> > >>> ```
> > >>>> str(df)
> > >>> 'data.frame': 5 obs. of 3 variables:
> > >>> $ region : Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
> > >>> $ sales : num 13 16 22 27 34
> > >>> $ country: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
> > >>>> df = droplevels(df)
> > >>>> str(df)
> > >>> 'data.frame': 5 obs. of 3 variables:
> > >>> $ region : Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
> > >>> $ sales : num 13 16 22 27 34
> > > >>> $ country: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
> > > >>> ```
> > >>> Thank you
> > >>>
> > >>> ______________________________________________
> > >>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > >>> https://stat.ethz.ch/mailman/listinfo/r-help
> > >>> PLEASE do read the posting guide
> > >>> http://www.R-project.org/posting-guide.html
> > >>> and provide commented, minimal, self-contained, reproducible code.
> > >
> > >
> > >
>
>
>
> --
> Best regards,
> Luigi
>