Skip to content

[ncdf4] error converting GEIA data to netCDF

13 messages · Roy Mendelssohn - NOAA Federal, Tom Roche, David W. Pierce +1 more

#
summary: I can successfully ncvar_put(...) data to a file, but when I
try to ncvar_get(...) the same data, I get
How to fix or debug?

details:

R code @

https://github.com/TomRoche/GEIA_to_NetCDF

successfully (if crudely) uses R packages={ncdf4, maps, fields} to

+ extract data from a GEIA-distributed datafile

https://github.com/downloads/TomRoche/GEIA_to_netCDF/N2OOC90Y.1A

+ display the data (mostly successfully--the map's legend has problems
  which I'll attack later)

https://github.com/downloads/TomRoche/GEIA_to_netCDF/output.1.png

+ create a netCDF file using the data read from the GEIA file. (At
  least, after nc_sync(netcdf.file), the file `ncdump -h`s properly.)

However, I can only *put* the data to the netCDF file:
When I try to *pull* the data *from* the netCDF I created,
I get
And I get the same error if I try the minor variation(s)
+   nc=netcdf.file,
+ #  varid=emis.var,
+   varid=emis.var.name,
+   # read all the data
+   start=rep(1, emis.var$ndims),
+   count=c(-1, -1, 1))

But the data itself appears to be OK--at least, it virtualizes
properly (above). So I'm thinking I must just be missing something
simple, and hoping Someone Out There with fresh eyeballs can point to
my error(s).

(And, in case you're wondering:

- I'm not just ncvar_put'ing the data for the exercise: I want the
  GEIA data in netCDF format for subsequent use.

- I tried to find the GEIA data distributed in netCDF format, and
  asked around, but have gotten no responses.

) TIA, Tom Roche <Tom_Roche at pobox.com>
#
If I had to guess, you did not call nc_close or nc_sync.  In netcdf, the data is actually really written until that is called.

-Roy
On Aug 27, 2012, at 8:31 PM, Tom Roche wrote:

            
**********************
"The contents of this message do not reflect any position of the U.S. Government or NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
1352 Lighthouse Avenue
Pacific Grove, CA 93950-2097

e-mail: Roy.Mendelssohn at noaa.gov (Note new e-mail address)
voice: (831)-648-9029
fax: (831)-648-8440
www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected" 
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.
#
Just looked at your code, and you do call nc_sync.  I am wondering if it works if you call nc_close, reopen and then write. Perhaps the nc_sync is sufficent when you have been in define mode in the nc_create call.

-Roy
On Aug 27, 2012, at 8:31 PM, Tom Roche wrote:

            
**********************
"The contents of this message do not reflect any position of the U.S. Government or NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
1352 Lighthouse Avenue
Pacific Grove, CA 93950-2097

e-mail: Roy.Mendelssohn at noaa.gov (Note new e-mail address)
voice: (831)-648-9029
fax: (831)-648-8440
www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected" 
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.
#
Hello,

The following works fine for me:

 > nc <- nc_open("~/GEIA_N2O_oceanic.nc")
 > emi_n2o <- ncvar_get(nc, 'emi_n2o', start=c(1,1,1), count=c(-1,-1,1))

Regards,
Pascal



Le 28/08/2012 12:31, Tom Roche a ?crit :
#
Tom Roche Mon, 27 Aug 2012 23:31:23 -0400
in the one and only datavar
David W. Pierce Mon, 27 Aug 2012 21:35:35 -0700
So nc_sync is not enough--good to know. But ...

Unfortunately I do that (nc_close then nc_open) in the latest code @

https://github.com/TomRoche/GEIA_to_NetCDF

(direct link to relevant file=

https://github.com/TomRoche/GEIA_to_netCDF/blob/master/GEIA.to.netCDF.r

) but no fix--same error. Any other suggestions?

your assistance is appreciated, Tom Roche <Tom_Roche at pobox.com>
#
Hi Tom:

Sorry for top-posting.  My bad etiquette for the day.  The order that you do things when you create nercdf files matters a lot.  Here is a section from the help for ncdf4:
And here is the relevant section from your code:
You can't write until all dimensions have been defined, and all variables defined.

HTH,

-Roy
On Aug 28, 2012, at 4:29 AM, Tom Roche wrote:

            
**********************
"The contents of this message do not reflect any position of the U.S. Government or NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
1352 Lighthouse Avenue
Pacific Grove, CA 93950-2097

e-mail: Roy.Mendelssohn at noaa.gov (Note new e-mail address)
voice: (831)-648-9029
fax: (831)-648-8440
www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected" 
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.
#
Roy Mendelssohn Tue, 28 Aug 2012 09:07:32 -0700
But in fact, that's only a part of the code ... which omits the prior
dimension and variable definitions :-( See the current version @

https://github.com/TomRoche/GEIA_to_netCDF/blob/c380c0a28dc8c71dbf0c2ba18130a2439a4fe089/GEIA.to.netCDF.r

I've also attached that (quoted) following my .sig, with the top-most
constant and function declarations removed for brevity. The dimension
definitions are prefixed with '1', the variable definition is prefixed
with '2'.

HTH, Tom Roche <Tom_Roche at pobox.com>
current GEIA.to.netCDF.r code block follows to end of post------------
# code----------------------------------------------------------------

 > # process input
 > library(maps) # on tlrPanP5 as well as clusters

 > # input file path
 > GEIA.emis.txt.fp <- sprintf('%s/%s', GEIA.emis.txt.dir, GEIA.emis.txt.fn)
 > # columns are grid#, mass
 > GEIA.emis.mx <-
 >   as.matrix(read.table(GEIA.emis.txt.fp, skip=GEIA.emis.txt.n.header))
 > # mask zeros? no, use NA for non-ocean areas
 > # GEIA.emis.mx[GEIA.emis.mx == 0] <- NA
 > # <simple input check>
 > dim(GEIA.emis.mx) ## [1] 36143 2
 > # start debug
 > GEIA.emis.mx.rows <- dim(GEIA.emis.mx)[1]
 > if (GEIA.emis.mx.rows > GEIA.emis.grids.dim) {
 >   cat(sprintf('ERROR: %s: GEIA.emis.mx.rows=%.0d > GEIA.emis.grids.dim=%.0d\n',
 >     this.fn, GEIA.emis.mx.rows, GEIA.emis.grids.dim))
 > } else {
 >   cat(sprintf('debug: %s: GEIA.emis.mx.rows < GEIA.emis.grids.dim\n',
 >     this.fn))
 > }
 > # end debug
 > # </simple input check>

 > global.emis.vec <-
 >   create.global.emissions.vector(
 >     GEIA.emis.grids.dim, GEIA.emis.mx.rows, GEIA.emis.mx)

 > # <visual input check>
 > # Need sorted lat and lon vectors: we know what those are a priori
 > # Add 0.5 since grid centers
 > lon.vec <- 0.5 +
 >   seq(from=grid.lon.degree.start, by=grid.lon.degree.per.cell, length.out=GEIA.emis.lon.dim)
 > lat.vec <- 0.5 +
 >   seq(from=grid.lat.degree.start, by=grid.lat.degree.per.cell, length.out=GEIA.emis.lat.dim)

 > # Create emissions matrix corresponding to those dimensional vectors
 > # (i.e., global.emis.mx is the "projection" of global.emis.vec)
 > # First, create empty global.emis.mx? No, fill from global.emis.vec.
 > # Fill using byrow=T? or "bycol" == byrow=FALSE? (row=lat)
 > # I assigned (using lon.lat.vec.to.grid.index)
 > # "grid indices" (global.emis.vec.index values)
 > # "lon-majorly" (i.e., iterate over lats before incrementing lon),
 > # so we want to fill byrow=FALSE ... BUT,
 > # that will "fill from top" (i.e., starting @ 90N) and
 > # we want to "fill from bottom" (i.e., starting @ 90S) ...
 > # global.emis.mx <- matrix(
 > # global.emis.vec, nrow=GEIA.emis.lat.dim, ncol=GEIA.emis.lon.dim,
 > # # so flip/reverse rows/latitudes when done
 > # byrow=FALSE)[GEIA.emis.lat.dim:1,]

 > # NO: I cannot just fill global.emis.mx from global.emis.vec:
 > # latter's/GEIA's grid numbering system ensures 1000 lons per lat!
 > # Which overflows the "real-spatial" global.emis.mx :-(
 > # So I need to fill global.emis.mx using a for loop to decode the grid indices :-(
 > # (but at least I can fill in whatever direction I want :-)
 > global.emis.mx <- matrix(
 >   rep(NA, GEIA.emis.grids.dim), nrow=GEIA.emis.lat.dim, ncol=GEIA.emis.lon.dim)

 > # 1: works if subsequently transposed: TODO: FIXME
 > for (i.lon in 1:GEIA.emis.lon.dim) {
 >   for (i.lat in 1:GEIA.emis.lat.dim) {
 > # 2: fails with 'dimensions of z are not length(x)(-1) times length(y)(-1)'
 > # for (i.lat in 1:GEIA.emis.lat.dim) {
 > # for (i.lon in 1:GEIA.emis.lon.dim) {
 > # 3: fails with 'dimensions of z are not length(x)(-1) times length(y)(-1)'
 > # for (i.lon in GEIA.emis.lon.dim:1) {
 > # for (i.lat in GEIA.emis.lat.dim:1) {
 > # 4: fails with 'dimensions of z are not length(x)(-1) times length(y)(-1)'
 > # for (i.lat in GEIA.emis.lat.dim:1) {
 > # for (i.lon in GEIA.emis.lon.dim:1) {
 >     lon <- lon.vec[i.lon]
 >     lat <- lat.vec[i.lat]
 >     GEIA.emis.grid.val <-
 >       global.emis.vec[
 >         lon.lat.vec.to.grid.index(c(lon, lat),
 >           n.lon=GEIA.emis.lon.dim, n.lat=GEIA.emis.lat.dim)]
 >     if (!is.na(GEIA.emis.grid.val)) {
 >       if (is.na(global.emis.mx[i.lat, i.lon])) {
 >         global.emis.mx[i.lat, i.lon] <- GEIA.emis.grid.val
 > # start debug
 > # cat(sprintf(
 > # 'debug: %s: writing val=%f to global.emis.mx[%.0f,%.0f] for grid center=[%f,%f]\n',
 > # this.fn, GEIA.emis.grid.val, i.lon, i.lat, lon, lat))
 > # end debug
 >       } else {
 >         # error if overwriting presumably-previously-written non-NA!
 >         cat(sprintf(
 >           'ERROR: %s: overwriting val=%f with val=%f at global.emis.mx[%.0f,%.0f] for grid center=[%f,%f]\n',
 >           this.fn, global.emis.mx[i.lat, i.lon], GEIA.emis.grid.val,
 >           i.lon, i.lat, lon, lat))
 >         return # TODO: abend
 >       } # end testing target != NA (thus not previously written)
 >     } # end testing source != NA (don't write if is.na(lookup)
 >   } # end for loop over lats
 > } # end for loop over lons

 > # Now draw the damn thing
 > # 1: TODO: FIXME: why do I need to transpose global.emis.mx?
 > image(lon.vec, lat.vec, t(global.emis.mx))
 > # 2,3,4: how it should work ?!?
 > # image(lon.vec, lat.vec, global.emis.mx)
 > map(add=TRUE)
 > # </visual input check>

 > # write output to netCDF
 > library(ncdf4)

 > # output file path (not currently used by package=ncdf4)
 > netcdf.fp <- sprintf('%s/%s', netcdf.dir, netcdf.fn)

1> # create dimensions and dimensional variables
1> time.vec <- c(0) # annual value, corresponding to no specific year
1> time.dim <- ncdim_def(
1>   name=time.var.name,
1>   units=time.var.units,
1>   vals=time.vec,
1>   unlim=TRUE,
1>   create_dimvar=TRUE,
1>   calendar=time.var.calendar,
1>   longname=time.var.long_name)

1> lon.dim <- ncdim_def(
1>   name=lon.var.name,
1>   units=lon.var.units,
1>   vals=lon.vec,
1>   unlim=FALSE,
1>   create_dimvar=TRUE,
1>   longname=lon.var.long_name)

1> lat.dim <- ncdim_def(
1>   name=lat.var.name,
1>   units=lat.var.units,
1>   vals=lat.vec,
1>   unlim=FALSE,
1>   create_dimvar=TRUE,
1>   longname=lat.var.long_name)

2> # create data variable (as container--can't put data until we have a file)
2> emis.var <- ncvar_def(
2>   name=emis.var.name,
2>   units=emis.var.units,
2> # dim=c(time.dim, lat.dim, lon.dim),
2> # dim=list(time.dim, lat.dim, lon.dim),
2> # note dim order desired for result=var(time, lat, lon)
2>   dim=list(lon.dim, lat.dim, time.dim),
2>   missval=as.double(emis.var._FillValue),
2>   longname=emis.var.long_name,
2>   prec="double")

 > # get current time for creation_date
 > # system(intern=TRUE) -> return char vector, one member per output line)
 > netcdf.timestamp <- system('date', intern=TRUE)

 > # create netCDF file
 > netcdf.file <- nc_create(
 >   filename=netcdf.fn,
 > # vars=list(emis.var),
 > # verbose=TRUE)
 >   vars=list(emis.var))

 > # Write data to data variable: gotta have file first.
 > # Gotta convert 2d global.emis.mx[lat,lon] to 3d global.emis.arr[time,lat,lon]
 > # Do this before adding _FillValue to prevent:
 > # > Error in R_nc4_put_vara_double: NetCDF: Not a valid data type or _FillValue type mismatch
 > ## global.emis.arr <- global.emis.mx
 > ## dim(global.emis.arr) <- c(1, dim(global.emis.mx))
 > ## global.emis.arr[1,,] <- global.emis.mx

 > # Note
 > # * global.emis.mx[lat,lon]
 > # * datavar needs [lon, lat, time] (with time *last*)

 > ncvar_put(
 >   nc=netcdf.file,
 >   varid=emis.var,
 > # vals=global.emis.arr,
 >   vals=t(global.emis.mx),
 > # start=rep.int(1, length(dim(global.emis.arr))),
 >   start=c(1, 1, 1),
 > # count=dim(global.emis.arr))
 >   count=c(-1,-1, 1)) # -1 -> all data

 > # Write netCDF attributes
 > # Note: can't pass *.dim as varid, even though these are coordinate vars :-(

 > # add datavar attributes
 > ncatt_put(
 >   nc=netcdf.file,
 > # varid=lon.var,
 >   varid=lon.var.name,
 >   attname="comment",
 >   attval=lon.var.comment,
 >   prec="text")

 > ncatt_put(
 >   nc=netcdf.file,
 > # varid=lat.var,
 >   varid=lat.var.name,
 >   attname="comment",
 >   attval=lat.var.comment,
 >   prec="text")

 > # put _FillValue after putting data!
 > ncatt_put(
 >   nc=netcdf.file,
 >   varid=emis.var,
 >   attname="_FillValue",
 >   attval=emis.var._FillValue,
 >   prec="float") # why is "emi_n2o:missing_value = -999."?

 > # add global attributes (varid=0)
 > ncatt_put(
 >   nc=netcdf.file,
 >   varid=0,
 >   attname="creation_date",
 >   attval=netcdf.timestamp,
 >   prec="text")

 > ncatt_put(
 >   nc=netcdf.file,
 >   varid=0,
 >   attname="source_file",
 >   attval=netcdf.source_file,
 >   prec="text")

 > ncatt_put(
 >   nc=netcdf.file,
 >   varid=0,
 >   attname="Conventions",
 >   attval=netcdf.Conventions,
 >   prec="text")

 > # flush to file (there may not be data on disk before this point)
 > # nc_sync(netcdf.file) # so we don't hafta reopen the file, below
 > # Nope: per David W. Pierce Mon, 27 Aug 2012 21:35:35 -0700, ncsync is not enough
 > nc_close(netcdf.file)
 > nc_open(netcdf.fn,
 >         write=FALSE, # will only read below
 >         readunlim=TRUE) # it's a small file

 > # <simple output check>
 > system(sprintf('ls -alth %s', netcdf.fp))
 > system(sprintf('ncdump -h %s', netcdf.fp))
 > # </simple output check>

 > # <visual output check>
 > # TODO: do plot-related refactoring! allow to work with projects={ioapi, this}
 > # <copied from plotLayersForTimestep.r>
 > library(fields)
 > # double-sprintf-ing to set precision by constant: cool or brittle?
 > stats.precision <- 3 # sigdigs to use for min, median, max of obs
 > stat.str <- sprintf('%%.%ig', stats.precision)
 > # use these in function=subtitle.stats as sprintf inputs
 > max.str <- sprintf('max=%s', stat.str)
 > med.str <- sprintf('med=%s', stat.str)
 > min.str <- sprintf('min=%s', stat.str)
 > # </copied from plotLayersForTimestep.r>

 > # Get the data out of the datavar, to test reusability
 > # target.data <- emis.var[,,1] # fails, with
 > # > Error in emis.var[, , 1] : incorrect number of dimensions
 > target.data <- ncvar_get(
 >   nc=netcdf.file,
 > # varid=emis.var,
 >   varid=emis.var.name,
 >   # read all the data
 > # start=rep(1, emis.var$ndims),
 >   start=c(1, 1, 1),
 > # count=rep(-1, emis.var$ndims))
 >   count=c(-1, -1, 1))
 > # MAJOR: all of the above fail with
 > # > Error in if (nc$var[[li]]$hasAddOffset) addOffset = nc$var[[li]]$addOffset else addOffset = 0 :
 > # > argument is of length zero

 > # Note that, if just using the raw data, the following plot code works.
 > target.data <- t(global.emis.mx)
 > # <simple output check/>
 > dim(target.data) # n.lon, n.lat

 > # <copied from windowEmissions.r>
 > palette.vec <- c("grey","purple","deepskyblue2","green","yellow","orange","red","brown")
 > colors <- colorRampPalette(palette.vec)
 > probabilities.vec <- seq(0, 1, 1.0/(length(palette.vec) - 1))
 > # </copied from windowEmissions.r>

 > # <copy/mod from plotLayersForTimestep.r>
 > plot.layer(target.data,
 >   title=netcdf.title,
 >   subtitle=subtitle.stats(target.data),
 >   x.centers=lon.vec,
 >   y.centers=lat.vec,
 >   q.vec=probabilities.vec,
 >   colors=colors)
 > # </copy/mod from plotLayersForTimestep.r>
 > map(add=TRUE)
 > # </visual output check>

 > # teardown
 > dev.off()
 > nc_close(netcdf.file)
#
On Aug 28, 2012, at 12:24 PM, Tom Roche wrote:

            
**********************
"The contents of this message do not reflect any position of the U.S. Government or NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
1352 Lighthouse Avenue
Pacific Grove, CA 93950-2097

e-mail: Roy.Mendelssohn at noaa.gov (Note new e-mail address)
voice: (831)-648-9029
fax: (831)-648-8440
www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected" 
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.
#
Pascal Oettli: MERCI BEAUCOUP! (though I would have thanked you
earlier if I hadn't had to dig through the r-help digest first :-)

Tom Roche Mon, 27 Aug 2012 23:31:23 -0400
https://stat.ethz.ch/pipermail/r-help/2012-August/322576.html
* > nc <- nc_open("~/GEIA_N2O_oceanic.nc")
And that appears to have been the problem, since when I

-nc_open(netcdf.fp,
-  write=FALSE,
-  readunlim=TRUE)
+netcdf.file <- nc_open(netcdf.fp,
+  write=FALSE,
+  readunlim=TRUE)

I don't get the error when I subsequently ncvar_get, and the code @

https://github.com/TomRoche/GEIA_to_NetCDF

now works.

Dave Pierce: I assert that the current error is waaay too subtle:

- I don't get an error when I nc_open without assigning.

- I don't get an error when I ncvar_put to that file's datavar.

- I only get an error when I ncvar_get from the datavar.

- Nothing about the error text (IMHO) would lead one to the fix.

(Note also that neither ncvar_put or nc_close appear to require
assignment, which is probably what made me think I could nc_open
without assignment.)

Can ncdf4 be made to fail more helpfully? E.g., to fail immediately on
nc_open without assignment?

thanks again! Tom Roche <Tom_Roche at pobox.com>
#
De rien. You're welcome.

Pascal


Le 12/08/29 5:23, Tom Roche a ?crit :
#
Tom Roche Tue, Aug 28, 2012 at 4:23 PM
David W. Pierce Tue, 28 Aug 2012 13:23:53 -0700
TIA.
Oh, it's worse than that :-) Drawing on my career (and many colleagues')
as a producer and consumer of software, I have come to formulate
Roche's 3 Laws of Software Adoption:

1, A software project becomes alive when it has some community of users.

2. If the intelligence of individual users can always be quantified,
   there is some PDF characterizing the collective intelligence
   of the project's users at every point in its life.

3. At any time t when that project's user community is increasing,
   new users tend to add to the LHS of the intelligence PDF at t.

(OK, 2 definitions and 1 law :-) FWIW, Tom Roche <Tom_Roche at pobox.com>