Hi all, I have a data frame with 34 columns and 1534 observations. Some columns contain strings with spaces, and I want to remove the spaces. Is there a function that removes whitespace from the entire data frame? I use gsub, but I would need some function to automate this. Thank you very much in advance,
function for remove white space
6 messages · José Luis Cañadas, Rolf Turner, William Michels +1 more
On 22/02/17 12:51, José Luis Aguilar wrote:
Hi all, I have a data frame with 34 columns and 1534 observations. Some columns contain strings with spaces, and I want to remove the spaces. Is there a function that removes whitespace from the entire data frame? I use gsub, but I would need some function to automate this.
Something like
X <- as.data.frame(lapply(X,function(x){gsub(" ","",x)}))
Untested, since you provide no reproducible example (despite being told
by the posting guide to do so).
I do not know what my idea will do to numeric columns or to factors.
However it should give you at least a start.
cheers,
Rolf Turner
--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
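A small check of what the one-liner above does to non-character columns may help here. This is only a sketch: the data frame X below is made up for illustration, and stringsAsFactors = FALSE is passed explicitly so the result does not depend on the R version. Since gsub() coerces its input with as.character(), every column should come back as character:

```r
# Hypothetical example data: one integer, one character, one factor column
X <- data.frame(Int  = 1:3,
                Char = c("a a", " b", "c "),
                Fac  = factor(c("x x", "y", "x x")),
                stringsAsFactors = FALSE)

# The one-liner from the message above, applied to every column
Y <- as.data.frame(lapply(X, function(x) gsub(" ", "", x)),
                   stringsAsFactors = FALSE)

sapply(X, class)  # integer, character, factor
sapply(Y, class)  # all three columns are now character
Y$Char            # "aa" "b" "c"
```

So the spaces are removed, but the integer and factor columns lose their original classes, which is the concern raised in the message.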
Hi José (and Rolf), It's not entirely clear what type of 'whitespace' you're referring to, but if you're using read.table() or read.csv() to create your data frame in the first place, setting 'strip.white = TRUE' will remove leading and trailing whitespace 'from unquoted character fields (numeric fields are always stripped).'
?read.table ?read.csv
Cheers, Bill
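To illustrate the strip.white suggestion, here is a minimal sketch. The CSV text is invented for the example and is read through the text argument of read.csv() instead of a file. Note that strip.white only trims leading and trailing whitespace; spaces inside a field are left alone:

```r
# Hypothetical CSV content with padded, unquoted character fields
csv_text <- "id,name,city\n1,  Ana  ,  New York \n2, Luis,Madrid  \n"

# Default behaviour: padding in character fields is kept
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# strip.white = TRUE trims leading and trailing whitespace on import
clean <- read.csv(text = csv_text, strip.white = TRUE,
                  stringsAsFactors = FALSE)

clean$name  # "Ana" "Luis"
clean$city  # "New York" "Madrid" (the inner space is preserved)

# For data already in memory, trimws() does the same trimming per column
trimmed_name <- trimws(raw$name)
```

If the goal is to delete every space, including the ones inside a field, strip.white alone will not do it and a gsub-based approach is still needed.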
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Thanks for your answer. I'm using read.csv. Sent from my iPad
Try the following function to apply gsub to all character or factor
columns of a data.frame (and maintain the class of all columns):
gsubDataFrame <- function(pattern, replacement, x, ...) {
    stopifnot(is.data.frame(x))
    for(i in seq_len(ncol(x))) {
        if (is.character(x[[i]])) {
            x[[i]] <- gsub(pattern, replacement, x[[i]], ...)
        } else if (is.factor(x[[i]])) {
            levels(x[[i]]) <- gsub(pattern, replacement, levels(x[[i]]), ...)
        } # else do nothing for numeric or other column types
    }
    x
}
E.g.,
> d <- data.frame(stringsAsFactors = FALSE,
+      Int=1:5,
+      Char=c("a a", "baa", "a a ", " aa", "b a a"),
+      Fac=factor(c("x x", "yxx", "x x ", " xx", "y x x")))
> str(d)
'data.frame':   5 obs. of  3 variables:
 $ Int : int  1 2 3 4 5
 $ Char: chr  "a a" "baa" "a a " " aa" ...
 $ Fac : Factor w/ 5 levels " xx","x x","x x ",..: 2 5 3 1 4
> str(gsubDataFrame(" ", "", d)) # delete spaces, use "[[:space:]]" for whitespace
'data.frame':   5 obs. of  3 variables:
 $ Int : int  1 2 3 4 5
 $ Char: chr  "aa" "baa" "aa" "aa" ...
 $ Fac : Factor w/ 2 levels "xx","yxx": 1 2 1 1 2

Bill Dunlap
TIBCO Software
wdunlap tibco.com
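The comment about "[[:space:]]" can be tried out with a short sketch. The function is repeated from the message above so that the block runs on its own; the data frame and the trimming pattern are assumptions added for illustration and are not part of the original post. The second call only trims leading and trailing whitespace, which may be closer to what is wanted when inner spaces are meaningful:

```r
# gsubDataFrame as posted in the thread, repeated to keep this block self-contained
gsubDataFrame <- function(pattern, replacement, x, ...) {
    stopifnot(is.data.frame(x))
    for (i in seq_len(ncol(x))) {
        if (is.character(x[[i]])) {
            x[[i]] <- gsub(pattern, replacement, x[[i]], ...)
        } else if (is.factor(x[[i]])) {
            levels(x[[i]]) <- gsub(pattern, replacement, levels(x[[i]]), ...)
        } # other column types are left untouched
    }
    x
}

# Hypothetical data with tabs as well as spaces
d2 <- data.frame(Int  = 1:3,
                 Char = c(" a\tb ", "c  d", "e "),
                 Fac  = factor(c(" x y", "x y ", "z")),
                 stringsAsFactors = FALSE)

# Remove every whitespace character (spaces, tabs, newlines)
all_removed <- gsubDataFrame("[[:space:]]", "", d2)
all_removed$Char         # "ab" "cd" "e"
levels(all_removed$Fac)  # "xy" "z"

# Trim only leading and trailing whitespace, keeping inner whitespace
trimmed <- gsubDataFrame("^[[:space:]]+|[[:space:]]+$", "", d2)
levels(trimmed$Fac)      # "x y" "z"
```

In both calls the integer column keeps its class, and factor levels that become identical after the replacement are merged into one level.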
Oh, thank you so much!! It's perfect!! Sent from my iPhone