An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120228/804d5982/attachment.pl>
Cleaning up messy Excel data
6 messages · Noah Silverman, Robert Baer, jim holtman +2 more
-----Original Message----- From: Noah Silverman Sent: Tuesday, February 28, 2012 3:27 PM To: r-help Subject: [R] Cleaning up messy Excel data Unfortunately, some data I need to work with was delivered in a rather messy Excel file. I want to import into R and clean up some things so that I can do my analysis. Pulling in a CSV from Excel is the easy part. My current challenge is dealing with some text mixed in the values. i.e. 118 5.7 <2.0 3.7 Since this column in Excel has a "<2.0" value, then R reads the column as a factor with levels. Ideally, I want to convert it a normal vector of scalars and code code the "<2.0" as 0. Can anyone suggest an easy way to do this? -------------------------------------- ?as.character will show you how to change the "factor" column into a character column. Then, you can replace text using any of a number of procedures. see for example ?gsub finally, you can use as.numeric if you want numbers. "Coding" is best done in the context of factors, so you might want to consider where replacing <2 with NA is more appropriate than replacing with 0. In this end, the choice might be context sensitive. Rob -------------------------------- ------------------------------------------ Robert W. Baer, Ph.D. Professor of Physiology Kirksville College of Osteopathic Medicine A. T. Still University of Health Sciences 800 W. Jefferson St. Kirksville, MO 63501 660-626-2322 FAX 660-626-2965
First of all when reading in the CSV file, use 'as.is = TRUE' to
prevent the changing to factors.
Now that things are character in that column, you can use some pattern
expressions (gsub, regex, ...) to search for and change your data.
E.g.,
sub("<.*", "0", yourCol)
should do it for you.
On Tue, Feb 28, 2012 at 4:27 PM, Noah Silverman <noahsilverman at ucla.edu> wrote:
Unfortunately, some data I need to work with was delivered in a rather messy Excel file. ?I want to import into R and clean up some things so that I can do my analysis. ?Pulling in a CSV from Excel is the easy part. My current challenge is dealing with some text mixed in the values. i.e. ? 118 ? 5.7 ? <2.0 ?3.7 Since this column in Excel has a "<2.0" value, then R reads the column as a factor with levels. ?Ideally, I want to convert it a normal vector of scalars and code code the "<2.0" as 0. Can anyone suggest an easy way to do this? Thanks! -- Noah Silverman UCLA Department of Statistics 8117 Math Sciences Building Los Angeles, CA 90095 ? ? ? ?[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it.
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120228/04d6754c/attachment.pl>
Just replace that value with zero. If you provide some reproducible code I could probably give you a solution. ?dput good luck, Stephen
On 02/28/2012 03:27 PM, Noah Silverman wrote:
Unfortunately, some data I need to work with was delivered in a rather messy Excel file. I want to import into R and clean up some things so that I can do my analysis. Pulling in a CSV from Excel is the easy part. My current challenge is dealing with some text mixed in the values. i.e. 118 5.7<2.0 3.7 Since this column in Excel has a "<2.0" value, then R reads the column as a factor with levels. Ideally, I want to convert it a normal vector of scalars and code code the "<2.0" as 0. Can anyone suggest an easy way to do this? Thanks! -- Noah Silverman UCLA Department of Statistics 8117 Math Sciences Building Los Angeles, CA 90095 [[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Stephen Sefick ************************************************** Auburn University Biological Sciences 331 Funchess Hall Auburn, Alabama 36849 ************************************************** sas0025 at auburn.edu http://www.auburn.edu/~sas0025 ************************************************** Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis "A big computer, a complex algorithm and a long time does not equal science." -Robert Gentleman
(mydata <- as.factor(c("1","2","3", ">2", "5", ">2")))
str(mydata)
newdata <- as.character(mydata)
newdata[newdata==">2"] <- 0
newdata <- as.numeric(newdata)
str(newdata)
We really need to keep Excel (and other spreadsheets) out of peoples hands.
John Kane
Kingston ON Canada
-----Original Message----- From: noahsilverman at ucla.edu Sent: Tue, 28 Feb 2012 13:27:13 -0800 To: r-help at r-project.org Subject: [R] Cleaning up messy Excel data Unfortunately, some data I need to work with was delivered in a rather messy Excel file. I want to import into R and clean up some things so that I can do my analysis. Pulling in a CSV from Excel is the easy part. My current challenge is dealing with some text mixed in the values. i.e. 118 5.7 <2.0 3.7 Since this column in Excel has a "<2.0" value, then R reads the column as a factor with levels. Ideally, I want to convert it a normal vector of scalars and code code the "<2.0" as 0. Can anyone suggest an easy way to do this? Thanks! -- Noah Silverman UCLA Department of Statistics 8117 Math Sciences Building Los Angeles, CA 90095 [[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
____________________________________________________________ FREE ONLINE PHOTOSHARING - Share your photos online with your friends and family! Visit http://www.inbox.com/photosharing to find out more!