Skip to content

[OT] (slightly) - OpenOffice Calc and text files

10 messages · Schwab,Wilhelm K, Albyn Jones, Peter Langfelder +4 more

#
Hello all,

I had a very strange looking problem that turned out to be due to unexpected (by me at least) format changes to one of my data files.  We have a small lab study in which each run is represented by a row in a tab-delimited file; each row identifies a repetition of the experiment and associates it with some subjective measurements and times from our notes that get used to index another file with lots of automatically collected data.  In short, nothing shocking.

In a moment of weakness, I opened the file using (I think it's version 3.2) of OpenOffice Calc to edit something that I had mangled when I first entered it, saved it (apparently the mistake), and reran my analysis code.  The results were goofy, and the problem was in my code that runs before R ever sees the data.  That code was confused by things that I would like to ensure don't happen again, and I suspect that some of you might have thoughts on it.

The problems specifically:

(1) OO seems to be a little stingy about producing tab-delimited text; there is stuff online about using the csv and editing the filter and folks (presumably like us) saying that it deserves to be a separate option.

(2) Dates that I had formatted as YYYY got chopped to YY (did we not learn anything last time?<g>) and times that I had formatted in 24 hours ended up AM/PM.

Have any of you found a nice (or at least predictable) way to use OO Calc to edit files like this?  If it insists on thinking for me, I wish it would think in 24 hour time and 4 digit years :)  I work on Linux, so Excel is off the table, but another spreadsheet or text editor would be a viable option, as would configuration changes to Calc.

Bill
#
How about emacs?

albyn
On Wed, Oct 13, 2010 at 01:13:03PM -0400, Schwab,Wilhelm K wrote:

  
    
#
On Wed, Oct 13, 2010 at 10:13 AM, Schwab,Wilhelm K
<bschwab at anest.ufl.edu> wrote:
No idea about Calc, I use it regularly but only to view files (and
that mostly csv, not tab-delinited). The most primitive solution is to
use a plain text editor such as vi that will save everything as it
loaded it except for what you change. The second most primitive idea
(or maybe not so primitive after all) is to read the table into R and
manually fix it there such as table$column[row] = "ABCD" (this is my
favorite way of changing things :)). The third most primitive idea
which I have actually never used but which may be viable is to load it
into R and use the fix() function that pulls up a rather primitive but
functional data editor.

Peter
#
On Oct 13, 2010, at 12:13 PM, Schwab,Wilhelm K wrote:

            
I don't use OpenOffice (soon to be LibreOffice) much these days, but one of the things that you can try, is when you go to save the file as a CSV and edit the filter, there is an option there "Save cell content as shown". If that is checked, then any cell formatting that has been applied, either by default or by your actions, will be retained in the exported data. If that is unchecked, then the 'raw' data is exported to the file. I just tried it here (on OSX) and with the option checked, the years were exported with the default two digits. The years were exported with four digits with the box unchecked.

Unfortunately, I had no joy with a time field. The AM/PM formatting was retained with the box checked or unchecked.
On Linux, a spreadsheet based alternative would be Gnumeric (http://projects.gnome.org/gnumeric/) and of course, there is always Emacs, which I have now used on Windows, Linux and OSX.

HTH,

Marc Schwartz
#
Albyn,

I'll look into it.  In fact, I have a small book on it that I bought in my very early days of using Linux.  I quickly found TeX Maker (for the obvious), Code::Blocks for C/C++ and I would not have started the move without a working Smalltalk (http://pharo-project.org/home).

For editing data files, I really just want something that shows data in an understandable grid and does not do weird stuff thinking it's being helpful.

Bill
#
On Wed, Oct 13, 2010 at 10:13 AM, Schwab,Wilhelm K
<bschwab at anest.ufl.edu> wrote:
It is, but it is doable (you can manually edit the extension to .txt
and edit the field and then choose tab or a _few_ other options that
your heart desires.  Importing should be similar.
The "general" cell format can be quite convenient, but usually seems
one of the most awful creations in both Excel and Calc.  Rants aside,
try forcing the cell format (I like text because it generally just
treats it asis then).

  
    
#
Peter,

vi is *really* primitive =:0  R is a little late because I tend to do shape changes prior to invoking R.  However, I could load tweak and re-save and then bring R back into it later.  I never would have thought of it.  Thanks!

Bill
#
On Oct 13, 2010, at 1:13 PM, Schwab,Wilhelm K wrote:

            
You have been little stingy yourself about describing what you did. I  
see no specifics about the actual data used as input nor the specific  
operations. I just opened an OO.o Calc workbook and dropped a  
character vector, "1969-12-31 23:59:50" copied from help(POSIXct) into  
a2. I then copied it to a3 and formatted it to be in the precanned  
format, MM/DD/YYYY HH:MM:SS , noticed that it had not been interpreted  
as a data-time vlaue at all so entered =TODAY()+TIME(13;0;0) in a4 and  
=TIME(13;0;0) in a5, formated to a user specified custom time format  
of YYYY-MM-DD HH:MM:SS


Copied a5 to c1:c5
  saved to a text-csv file specifying the field separator as tab and  
the text-delimiter as '"' and got:

""time"		1899-12-30 13:00:00
"1969-12-31 23:59:50"		1899-12-30 13:00:00
"1969-12-31 23:59:50"		1899-12-30 13:00:00
2010-10-13 13:00:00		1899-12-30 13:00:00
1899-12-30 13:00:00		1899-12-30 13:00:00


This handling of dates and times does not seem particularly difficult  
to elicit andseems to represent dates in YYYY and times in "military  
time".
I didn't do anything I thought was out of the ordinary and so cannot  
reproduce your problem. (This was on a Mac, but OO.o is probably going  
to behave the same across *NIX cultures.)
#
----------------------------------------
Probably instead of guessing and seeing how various things react, you
could go get a utility like octal dump or open in an editor that
has a hex mode and see what happened. This could be anything- crlf convention,
someone turned it to unicode, etc. On linux or cygwin I think you have
"od" available. Then of course, if you know what R likes, you can use
sed to fix it...
#
I know *what* happened (Calc reformatted the data in ways I did not want or expect).   It is not end-of-line conventions; they reformatted the data leaving the structure intact.  As to why/how, that could depend on the sequence of operations, so I thought to ask here to see if you had collectively either found something specific to do or to avoid. 

Gnumeric is now freshly installed and will get some testing; if I don't care for it, I'll look more at emacs.  I don't ask much of a spreadsheet (show/edit a grid and maybe hide/show columns for complex data sets), but it would be nice if it did not reformat everything every time I open a file :(

So far, gnumeric successfully opened a file; I will be a little less trusting when it comes to saving one.  Thanks!!

Bill