
how can I import op.gz files with read.csv or otherwise

4 messages · Bernd Dittmann, arun, John Kane +1 more

#
Hi,

Maybe this link helps:
http://stackoverflow.com/questions/5764499/decompress-gz-file-using-r
A.K.




----- Original Message -----
From: herr dittmann <herrdittmann at yahoo.co.uk>
To: "r-help at r-project.org" <r-help at r-project.org>
Cc: 
Sent: Friday, December 21, 2012 9:51 AM
Subject: [R] how can I import op.gz files with read.csv or otherwise

Dear R-users,

I am struggling to directly read an "op.gz" file into R. NOAA kindly provides daily weather data on their FTP server for download.
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.15.1

Here is the data set in question:
x <- read.csv(file="ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2012/285880-99999-2012.op.gz", skip = 1, sep = "")

and str() returns incomprehensible gibberish:
'data.frame':   70 obs. of  6 variables:
?$ X4?tYd...??8.?WD...??.??.X.QT?VP..????????????????????????????????????????????????????????????????????????????????????????????????????????????????????? : Factor w/ 70 levels "\005~4?d???E\031y\020???\035J???I???\021?B??R{?Ykv\035`???a\017Z\021?sP?e?h??""| __truncated__,..: 44 13 56 64 28 23 67 3 2 33 ...
?$ X.?oM??T..??_...??g?7.?..T?.??.?...5?J...???j.Q?..?.e???Gm????g..a...??.J..?.?s?.??.?.?klsUD?.?..?U...u1.z?.W?..x...3._..E.?.?ZD.?o??.dv?....?k.C.y...8h: Factor w/ 41 levels "","\025?\016\v??i;?4??\002\001P??\0025????{????=??4&?w\\\\Q??????\"hn???I???b*??\035b\b>6??$W!??R=?\022?Pq?[?j\004$T??3?*??%??N??\"| __truncated__,..: 1 1 1 39 1 1 1 8 1 5 ...
?$ X.i?.y??..?2.h..??7.?J.3k..jLm...Q..uY?J?.K.zkU.8.??..Y.7.3...???A.?.3??..Z..5...??.??????????????????????????????????????????????????????????????????? : Factor w/ 29 levels "","\001q?^+n?1",..: 1 1 1 4 1 1 1 14 1 6 ...
?$ X.?Fd.m.?..v.?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????? : Factor w/ 17 levels "","??SL\a????\035d)??$?Z??????J?????B6?\006%5l[???\025a\024??+gT+3",..: 1 1 1 16 1 1 1 1 1 1 ...
?$ ?A..E.?JkEZ.??.?.?.......?.z..?.z..?..??.???g.????????????????????????????????????????????????????????????????????????????????????????????????????????? : Factor w/ 8 levels "","\001S\177?\017iS??i??#\017\"Ug?:i??\016p?\031U??D""| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
?$ X..o??nP?o?W?j.??..B..???.Q??...????? ?


While I can open and read the op.gz file manually in a text editor, the file imported with read.csv() or read.table() is simply unreadable.
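A likely explanation: when read.csv() is given an ftp:// URL it reads through a url() connection, which passes the gzip-compressed bytes through without decompressing them, so every "field" is binary noise. The sketch below reproduces this offline with a small made-up file (its name and contents are hypothetical, not NOAA data): the first two bytes are the gzip magic number, and wrapping the connection in gzcon() gives readable text. For the real file the same idea would be readLines(gzcon(url(...))), which needs network access and is not run here.

```r
# Create a small, made-up gzip file so the example runs offline.
path <- tempfile(fileext = ".op.gz")
gz_out <- gzfile(path, open = "wt")
writeLines(c("STN WBAN YEARMODA TEMP",
             "1 2 20120101 12.3",
             "1 2 20120102 11.8"), gz_out)
close(gz_out)

# Reading the raw bytes shows the gzip header, not text.
raw_con <- file(path, open = "rb")
magic <- readBin(raw_con, what = "raw", n = 2)
close(raw_con)
print(magic)  # the bytes 1f 8b mark a gzip stream

# gzcon() wraps a binary connection and decompresses it on the fly;
# with network access, gzcon(url(URL)) should behave the same way.
con <- gzcon(file(path, open = "rb"))
txt <- readLines(con)
close(con)
print(txt)
```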

What is the best way to get this done? Any pointers, suggestions or ideas are most welcome!

Thanks in advance!

Bernd



______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
Try downloading it and decompressing it:

  url <- "ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2012/285880-99999-2012.op.gz"
  dest <- "/home/john/rdata/weather.op.gz"
  # mode = "wb" keeps the compressed bytes intact on Windows
  download.file(url, dest, mode = "wb")
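The code above stops after the download, so here is one way the "decompress it" step could look in base R. This is a sketch: the compressed file is a tiny made-up stand-in generated locally so the block runs without network access, and the paths are temporary files rather than the real download.

```r
# Stand-in for the downloaded file: a tiny made-up gzip file.
gz_path <- tempfile(fileext = ".op.gz")
out <- gzfile(gz_path, open = "wt")
writeLines(c("A B C", "1 2 3", "4 5 6"), out)
close(out)

# Decompress: read the text through gzfile() and write it back uncompressed.
plain_path <- sub("\\.gz$", "", gz_path)
inp <- gzfile(gz_path, open = "rt")
writeLines(readLines(inp), plain_path)
close(inp)

# The plain file can now be opened in an editor or read with read.table().
dat <- read.table(plain_path, header = TRUE)
print(dat)

# For a local file, read.table() can also be pointed at the .gz path
# directly, because file() detects gzip compression in text-mode reading.
dat2 <- read.table(gz_path, header = TRUE)
```

The explicit decompression step is only needed if you want a plain-text copy to inspect; for reading into R, the direct read of the local .gz path is usually enough.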

However, it does not look like a nicely formatted file, and you may have to do some cleanup in a text editor, or perhaps load it into a spreadsheet, before you read it into R.

I tried the method from the link arun provided and it did not work. It looks like the headers and data are not consistent.
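One way to check the inconsistency between headers and data is count.fields(), which reports how many whitespace-separated fields each line has. The sketch below uses a made-up file whose header line has fewer names than the data lines, a hypothetical stand-in for the mismatch described; with the real file you would open a gzfile() connection on its local path instead.

```r
# Made-up gzip file: the header has 3 names, but each data line has 4 fields.
path <- tempfile(fileext = ".op.gz")
con <- gzfile(path, open = "wt")
writeLines(c("STN DATE TEMP",
             "1 20120101 12.3 24",
             "1 20120102 11.8 24"), con)
close(con)

# count.fields() reads from an open connection; sep = "" means any whitespace.
con <- gzfile(path, open = "rt")
n <- count.fields(con, sep = "")
close(con)

print(n)         # one count per line: header first, then the data lines
print(table(n))  # more than one distinct count signals a header/data mismatch
```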
John Kane
Kingston ON Canada
#
Hello,

It can be read using readLines(). I've renamed url to URL because url() is also a base R function.
I've also changed dest.

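URL <- "ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2012/285880-99999-2012.op.gz"
dest <- "weather.op.gz"
download.file(URL, dest, mode = "wb")  # binary mode, so the .gz is not corrupted on Windows
gz <- gzfile(dest, open = "rt")        # text-mode connection that decompresses while reading
x <- readLines(gz)
close(gz)
x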


As you say, the headers and data are not consistent; it seems some column
headers are missing.
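Once the lines are in a character vector, one possible next step is to skip the incomplete header and parse the data lines with read.table(text = ...), supplying the column names by hand. The lines and the names below (stn, date, temp, count) are made up for illustration; the real names would have to come from NOAA's documentation of the file format.

```r
# Made-up stand-in for the character vector returned by readLines(gz).
x <- c("STN DATE TEMP",
       "1 20120101 12.3 24",
       "1 20120102 11.8 24")

# Drop the header line and parse the rest; col.names are hypothetical
# placeholders for the names missing from the header.
dat <- read.table(text = x[-1], header = FALSE,
                  col.names = c("stn", "date", "temp", "count"))
str(dat)
```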

Hope this helps,

Rui Barradas

On 21-12-2012 17:44, John Kane wrote: