Skip to content

Cleaning up messy Excel data

6 messages · Noah Silverman, Robert Baer, jim holtman +2 more

#
-----Original Message----- 
From: Noah Silverman
Sent: Tuesday, February 28, 2012 3:27 PM
To: r-help
Subject: [R] Cleaning up messy Excel data

Unfortunately, some data I need to work with was delivered in a rather messy 
Excel file.  I want to import into R and clean up some things so that I can 
do my analysis.  Pulling in a CSV from Excel is the easy part.

My current challenge is dealing with some text mixed in the values.
i.e.   118   5.7   <2.0  3.7

Since this column in Excel has a "<2.0" value, then R reads the column as a 
factor with levels.  Ideally, I want to convert it a normal vector of 
scalars and code code the "<2.0" as 0.

Can anyone suggest an easy way to do this?
--------------------------------------
?as.character
will show you how to change the "factor" column into a character column. 
Then, you can replace text using any of a number of procedures.
see for example
?gsub

finally, you can use as.numeric if you want numbers.  "Coding" is best done 
in the context of factors, so you might want to consider where replacing  <2 
with NA is more appropriate than replacing with 0.  In this end, the choice 
might be context sensitive.

Rob
--------------------------------
------------------------------------------
Robert W. Baer, Ph.D.
Professor of Physiology
Kirksville College of Osteopathic Medicine
A. T. Still University of Health Sciences
800 W. Jefferson St.
Kirksville, MO 63501
660-626-2322
FAX 660-626-2965
#
First of all when reading in the CSV file, use 'as.is = TRUE' to
prevent the changing to factors.

Now that things are character in that column, you can use some pattern
expressions (gsub, regex, ...) to search for and change your data.
E.g.,

sub("<.*", "0", yourCol)

should do it for you.
On Tue, Feb 28, 2012 at 4:27 PM, Noah Silverman <noahsilverman at ucla.edu> wrote:

  
    
#
Just replace that value with zero.  If you provide some reproducible 
code I could probably give you a solution.
?dput
good luck,

Stephen
On 02/28/2012 03:27 PM, Noah Silverman wrote:

  
    
#
(mydata <- as.factor(c("1","2","3", ">2", "5", ">2")))
str(mydata)

newdata <- as.character(mydata)

newdata[newdata==">2"] <- 0
newdata <- as.numeric(newdata)
str(newdata)

We really need to keep Excel (and other spreadsheets) out of peoples hands.

John Kane
Kingston ON Canada
____________________________________________________________
FREE ONLINE PHOTOSHARING - Share your photos online with your friends and family!
Visit http://www.inbox.com/photosharing to find out more!