
Cleaning up messy Excel data

8 messages · Rolf Turner, jim holtman, Greg Snow +2 more

#
On 01/03/12 04:43, John Kane wrote:
Amen, bro'!!!

     cheers,

         Rolf Turner
#
But there are some important reasons to use Excel.  In my work there
are a lot of people that I have to send the equivalent of a data.frame
to who want to look at the data and possibly slice/dice the data
differently and then send back to me updates.  These folks do not know
how to use R, but do have Microsoft Office installed on their
computers and know how to use the different products.

I have been very successful in conveying what I am doing for them by
communicating via Excel spreadsheets.  It is also an important medium
in dealing with some international companies who provide data via
Excel and expect responses back via Excel.

When dealing with data in a tabular form, Excel does provide a way for
a majority of the people I work with to understand the data.  Yes,
there are problems with some of the ways that people use Excel, and
yes I have had to invest time in scrubbing some of the data that I get
from them, but if I did not, then I would probably not have a job
working for them.  I use R exclusively for the analysis that I do, but
find it convenient to use Excel to provide a communication mechanism
to the majority of the non-R users that I have to deal with.  It is a
convenient "work-around" because I would never get them to invest the
time to learn R.

So in the real world there is a need for Excel and we are not going to
cause it to go away; we have to learn how to live with it, and from my
standpoint, it has definitely benefited me in being able to
communicate with my users and continuing to provide them with results
that they are happy with.  They refer to letting me work my "magic" on
the data; all they know is they see the result via Excel and in the
background R is doing the heavy lifting that they do not have to know
about.
On Wed, Feb 29, 2012 at 4:41 PM, Rolf Turner <rolf.turner at xtra.co.nz> wrote:

  
    
1 day later
#
Try sending your clients a data set (data frame, table, etc) as an MS
Access data table instead.  They can still view the data as a table,
but will have to go to much more effort to mess up the data; more
likely they will do proper edits without messing anything up (mixing
characters in with numbers, having more sexes than your biology teacher
told you about, adding extra lines at top or bottom that make reading
back into R more difficult, etc.)
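
The kinds of damage listed above can be checked for mechanically before a
returned sheet is read back in.  What follows is only a minimal sketch, in
Python with the standard library, and it assumes the sheet has been saved as
CSV.  The column names (id, weight, sex), the allowed sex codes, and the
sample data are all hypothetical, made up for illustration.

```python
import csv
import io

ALLOWED_SEX = {"M", "F"}  # hypothetical coding scheme, not from the thread


def find_problems(csv_text, header=("id", "weight", "sex")):
    """Return (line_number, message) pairs for rows that look damaged."""
    problems = []
    rows = list(csv.reader(io.StringIO(csv_text)))
    # Locate the real header row; anything above it is stray text.
    try:
        start = next(
            i for i, row in enumerate(rows)
            if tuple(cell.strip() for cell in row) == header
        )
    except StopIteration:
        return [(0, "header row not found")]
    for i in range(start):
        problems.append((i + 1, "extra line above header"))
    # Check every row below the header, using 1-based line numbers.
    for line_no, row in enumerate(rows[start + 1:], start=start + 2):
        if len(row) != len(header):
            problems.append((line_no, "wrong number of columns"))
            continue
        record = dict(zip(header, (cell.strip() for cell in row)))
        try:
            float(record["weight"])
        except ValueError:
            problems.append((line_no, f"non-numeric weight: {record['weight']!r}"))
        if record["sex"] not in ALLOWED_SEX:
            problems.append((line_no, f"unexpected sex code: {record['sex']!r}"))
    return problems


# Made-up example of a sheet that came back edited.
SAMPLE = """Updated by the warehouse team,,
id,weight,sex
1,70.5,M
2,68kg,F
3,81.0,male
Total,,
"""

if __name__ == "__main__":
    for line_no, message in find_problems(SAMPLE):
        print(line_no, message)
```

On the sample this flags the stray title line, the "68kg" entry, the "male"
code, and the trailing "Total" row.  It reports problems only; deciding how
to repair them is left to whoever knows the data.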

I have had a few clients that I talked into using MS Access from the
start to enter their data.  There was often a bit of resistance at
first, but once they tried it and went through the process of
designing the database up front they ended up thanking me and believed
that the entire data entry process was easier and quicker than if they
had used Excel as they originally planned.

Access is still part of MS office, so they don't need to learn R or in
any way break their chains from being prisoners of Bill, but they will
be more productive in more ways than just interfacing with you.

Access (like databases in general) forces you to plan things out and do
the correct thing from the start.  It is possible to do the right thing
in Excel, but Excel does not encourage (let alone force) you to do the
right thing, and it makes it easy to do the wrong thing.
On Thu, Mar 1, 2012 at 6:15 AM, jim holtman <jholtman at gmail.com> wrote:

  
    
#
Unfortunately, a lot of people who use MS Office don't have or know how 
to use MS Access. Where I work now (as in the past) I have to tie 
someone to their chair, give them a few pokes with the cattle prod and 
then show them that a CSV file will load straight into Excel before I 
can convince them that they can use such a heretical data format. You 
don't want to know what I have to do to convince them that they can view 
my listings in HTML.

Jim

PS - Always give them a _copy_ of the CSV file.
On 03/03/2012 10:41 AM, Greg Snow wrote:
#
On 03/03/12 12:41, Greg Snow wrote:
<SNIP>
<SNIP>

Fortune!

     cheers,

         Rolf Turner
#
Unfortunately they only know how to use Excel and Word.  They are not
folks who use a computer every day.  Many of them run factories or
warehouses and asking them to use something like Access would not
happen in my lifetime (I have retired twice already).

I don't have any problems with them "messing" up the data that I send
them; they are pretty good about making changes within the context of
the spreadsheet.  The other issue is that I am working with people in
twenty different locations spread across the US, so I might be able to
get one of them to use Access (there is one I know that uses it), but
that leaves 19 other people I would not be able to communicate with.

The other thing is that I use Excel myself to slice/dice data
since there are things that are easier in Excel than R (believe it or
not).  There are a number of tools I keep in my toolkit, and R is
probably the most important, but I have not thrown the rest of them
away since they still serve a purpose.

So if you can come up with a way to get 20 diverse groups, who are not
computer literate, to change over in a couple of days from Excel to
Access let me know.  BTW, I tried to use Access once and gave it up
because it was not as intuitive as some other tools and did not give
me any more capability than the ones I was using.  So I know I would
have a problem in convincing others to make the change just so they
could communicate with me, while they still had to use Excel for most
of their other interfaces.

This is the real world where you have to learn how to adapt to your
environment and make the best of it.  So you just have to learn that
Excel can be your friend (or at least not your enemy) and can serve a
very useful purpose in getting your ideas across to other people.
On Fri, Mar 2, 2012 at 6:41 PM, Greg Snow <538280 at gmail.com> wrote:

  
    
#
Sometimes we adapt to our environment, sometimes we adapt our
environment to us. I like fortune(108).

I actually was suggesting that you add a tool to your toolbox, not limit it.

In my experience (and I don't expect everyone else's to match) data
manipulation that seems easier in Excel than R is only easier until
the client comes back and wants me to redo the whole analysis with one
typo fixed.  Then rerunning the script in R (or Perl or other tool) is
a lot easier than trying to remember where all I clicked, dragged,
selected, etc.
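
As a rough illustration of that point, here is a minimal sketch in Python
(standard library only) of an analysis kept as a script, so that a corrected
file only needs to be fed through it again.  The column names (site, units),
the figures, and the typo being fixed are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def totals_by_site(csv_text):
    """Sum the 'units' column per 'site' (both column names are hypothetical)."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        site = row["site"].strip().title()  # normalise stray spaces and case
        totals[site] += float(row["units"])
    return dict(totals)


# Made-up data: the first delivery contains a typo (4500 instead of 450).
ORIGINAL = "site,units\nDenver,120\ndenver ,30\nBoston,4500\n"
# The client sends the same file back with that one value fixed.
CORRECTED = ORIGINAL.replace("4500", "450")

if __name__ == "__main__":
    print(totals_by_site(ORIGINAL))
    print(totals_by_site(CORRECTED))  # same steps, rerun on the corrected file
```

Every cleaning step lives in the function, so nothing has to be remembered
or redone by hand when the input changes.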

I do use Excel for some things (though I would be happy to find other
tools for that if it were possible to expunge Excel from the earth)
and Word (I actually like using R2wd to send tables and graphs to Word
that I can then give to clients who just want to be able to copy and
paste them to something else).  I just think that many of the tasks
that many people use Excel for would be better served with a better
tool.

If someone reading this decides to put some more thought into a
project up front and actually design a database rather than
letting it evolve into some monstrosity in Excel, and that decision
saves them some later grief, then the world will be a slightly
better place.
On Fri, Mar 2, 2012 at 6:04 PM, jim holtman <jholtman at gmail.com> wrote:
--
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com
#
Seconded 

John Kane
Kingston ON Canada