
read function for LAS data

11 messages · Michael Sumner, Alex Mandel, Etienne Bellemare Racine +1 more

#
Hello,

I'm looking for interest in this functionality and eventually working
it into a package.

I don't actually use LAS data much, and only have one example file so
I'm hoping others who do or know others who would be interested can
help. I have previously posted this to r-spatial-devel.

R's support for the "raw" type provides a very neat way of reading
data from binary files such as LAS files. These files have a header
containing metadata that specifies how to read the bulk data in the
file. Essentially, you can read large sets of records at once into a
raw matrix with readBin, then use R's indexing tools to extract the
relevant bytes for each element of the record. This means the bulk
read is fast, and processing is also vectorized, since slices of the
raw matrix can be passed back to readBin.

I've done this in a simplistic way in the attached functions

  o "publicHeaderDescription" - returns a list built to hold the
      header metadata

  o "readLAS" - reads the data from file
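The idea behind the header read can be sketched by pulling
fixed-offset fields straight out of the raw header bytes. The offsets
below (94 for header size, 105 for point record length, 107 for the
point count; all 0-based, all little-endian) are from my reading of
the LAS 1.0 specification, demonstrated here on a fabricated header
rather than a real file:

```r
## fabricate a 227-byte LAS 1.0 header with three fields poked in
## (0-based byte offset k maps to R indices (k+1):(k+width))
hdr <- raw(227)
hdr[95:96]   <- writeBin(227L,     raw(), size = 2, endian = "little")
hdr[106:107] <- writeBin(28L,      raw(), size = 2, endian = "little")
hdr[108:111] <- writeBin(1922632L, raw(), size = 4, endian = "little")

## readBin only supports signed for 4-byte integers, so the point
## count is read signed (fine for counts below 2^31)
header_size <- readBin(hdr[95:96],   "integer", size = 2,
                       endian = "little", signed = FALSE)
rec_len     <- readBin(hdr[106:107], "integer", size = 2,
                       endian = "little", signed = FALSE)
n_records   <- readBin(hdr[108:111], "integer", size = 4,
                       endian = "little")
c(header_size, rec_len, n_records)  # 227 28 1922632
```

In the real function the header block would come from
readBin(con, "raw", n = 227) and the offsets/widths would be driven by
the list that publicHeaderDescription returns.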


It works for a LAS 1.0 file that I have (see lasinfo summary below)
and returns a matrix of record values.  Certainly it works fast and
efficiently enough for smallish files. I really don't have experience
with the likely range of sizes of these files.

The remaining work I see involves:

 - write as a "chunked" read so large files can be processed in parts,
    not as one big slurp
 - return only desired record values, and / or subset of records
 - extract all components of the records, and generalize for LAS 1.1, 1.2
 - re-organize the header read and arrangement (what I've done was
    easy enough for me to understand, but it's a bit of a mess)
 (- no doubt many more general improvements I haven't thought of)

To use:

source("http://staff.acecrc.org.au/~mdsumner/las/readLAS.R")

## I cannot provide LAS data, this is just an illustration

f <- "lfile.las"

file.info(f)$size/1e6
## [1] 53.83393 Mb

system.time({
  d <- readLAS(f)
})
# user  system elapsed
#   1.01    0.21    1.21

dim(d)
# [1] 1922632       5

colnames(d)
#[1] "x"         "y"         "z"         "gpstime"   "intensity"

Optionally the list of header components (some extracted as actual
data, some left in their "raw" form) can be returned instead of data
using "returnHeaderOnly=TRUE".

###
## example: simple plot with rgl
library(rgl)
scl <- function(x) (x - min(x))/diff(range(x))
plot3d(d[,1:3], col = topo.colors(256)[scl(d[,5]) * 255 + 1])

I hope this is of interest.

Cheers, Mike.




lasinfo output on "lfile.las"
---------------------------------------------------------
 Header Summary
---------------------------------------------------------
 File Name: lfile.las
 Version:                    1.0
 Source ID:                  0
 Reserved:                   0
 Project ID/GUID:           '00000000-0000-0000-0000-000000000000'
 System Identifier:         ''
 Generating Software:       'TerraScan'
 File Creation Day/Year:    0/0
 Header Size                227
 Offset to Point Data       229
 Number Var. Length Records 0
 Point Data Format          1
 Point Data Record Length   28
 Number of Point Records    1922632
 Number of Points by Return 1572923 324305 24659 740 5
 Scale Factor X Y Z         0.01 0.01 0.01
##  etc...
#
On 06/03/2010 07:54 PM, Michael Sumner wrote:
I think there are people who would use it. You might want to have a look
at http://liblas.org/ (some of the same people who do gdal/ogr work).
Wrapping this library might be a good approach. There are example files
available too.

Thanks,
Alex
#
Thanks Alex, I will eventually post this to a broader audience.

I've used liblas and lastools, but the aim here is for a pure R
implementation that is built directly from the LAS specification
without 3rd party tools.

The R code already works quite well to extract x/y/z/time/intensity,
it just needs some extra work to tidy up and generalize things and
ensure that very big datasets can be read.

Cheers, Mike.
On Sat, Jun 5, 2010 at 6:07 AM, Alex Mandel <tech_dev at wildintellect.com> wrote:
#
This is interesting, I'll try your code on my lidar files in the next 
few days.

  2010-06-04 22:36, Michael Sumner wrote:
What might be of interest in using liblas is that it provides support 
for many las versions and they plan to provide support for some versions 
to come (conditional to funding) so having an R binding might be of 
interest here. They are also working on the integration of a spatial 
index which would allow easier handling of large files. I must say I 
don't know how hard writing a wrapper for R might be for that particular 
tool.

The other issue I see here is that R loads the whole file into memory,
so while small files are manageable, that might not be so easy with
(standard) larger ones, don't you think? Did you try the RSAGA
package? There is a module for loading las files, but again, I don't
know how it manages memory. I guess it could be possible to use
something like the ff package to handle bigger files, but that's just
off the top of my head.

Etienne
#
Hello,
That was certainly true of the version of code I posted, but writing a
more flexible version is not difficult, and actually less difficult
than I expected. I've implemented arguments to "skip" rows and read
"nrows" at a time, so there is the beginning of a wrapper around the
core read for building in more flexibility.

(I was thinking of including subsetting of various kinds which really
makes it more complicated, and the appropriate level to handle that is
in a wrapper to this function).

I've updated the R source on my site, and here's a new example. This
should be considered as a rough working draft, the details can be
hidden in the final suite of functions. My chunk/rows handling is
pretty awkward, and may have bugs for particular record numbers.

Any testing you can provide would be greatly appreciated.


# new version with "skip" and "nrows" arguments
source("http://staff.acecrc.org.au/~mdsumner/las/readLAS.R")

f <- "lfile.las"

## get just the header
hd <- readLAS(f, returnHeaderOnly = TRUE)

numrows <- hd$`Number of point records`
## [1] 1922632

## read in chunks, and pass to DB or ff, or subset by sampling, etc..
rowskip <- 0
chunk <- 1e5
rowsleft <- numrows

system.time({

## keep track of how many rows we skip, and how many are left
for (i in 1:ceiling(numrows / chunk)) {
    if (rowsleft < chunk) chunk <- rowsleft
    if (chunk < 1) break
    d <- readLAS(f, skip = rowskip, nrows = chunk)
    rowskip <- rowskip + chunk
    rowsleft <- rowsleft - chunk
}

})

#   user  system elapsed
#   1.10    0.55    1.64
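One of the options the loop comment mentions, subsetting by sampling,
might look like this hypothetical helper (sample_chunk is my own name,
not part of readLAS): each chunk would be filtered as it's read, and
the kept rows bound together with rbind at the end. Demonstrated on a
fake chunk, since I can't distribute LAS data:

```r
## keep a random ~1% of the rows of a chunk, so an arbitrarily large
## file reduces to a manageable in-memory sample
sample_chunk <- function(d, frac = 0.01) {
    d[runif(nrow(d)) < frac, , drop = FALSE]
}

## inside the chunk loop above this would be:
##   keep[[i]] <- sample_chunk(readLAS(f, skip = rowskip, nrows = chunk))
## and afterwards:
##   d_sample <- do.call(rbind, keep)

## toy demonstration with a fake 1e5-row chunk
set.seed(42)
d <- matrix(rnorm(1e5 * 5), ncol = 5,
            dimnames = list(NULL, c("x", "y", "z", "gpstime", "intensity")))
nrow(sample_chunk(d))  # roughly 1000
```

The same shape of loop would work for handing chunks to an ff object
or a database table instead of sampling them.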







On Sun, Jun 6, 2010 at 8:09 PM, Etienne Bellemare Racine
<etiennebr at gmail.com> wrote:
#
On Jun 6, 2010, at 5:09 AM, Etienne Bellemare Racine wrote:

and controllable point caching, reprojection on the fly (when linked with GDAL), coordinate system description, and LAS 1.0-1.3 support (no waveform for 1.3, though). I would note that, contrary to what the LAS specification says, there is plenty of software out in the wild that doesn't write LAS files properly. This causes big hassles, but it's libLAS' job to ensure that it can read everything, so we bend over backwards to make things work while doing our best to write valid files.

I maintain an LAS sample library at http://liblas.org/samples. They vary based on format types and the software that wrote them, as this is what is most interesting to libLAS. They are free for non-commercial use.

I understand the desire to not have external dependencies, which is the impetus for efforts like this.  It is unfortunate that distribution and platform issues make duplication of efforts like these economical in the short term.

Howard
#
Thanks Etienne.

On Tue, Jun 8, 2010 at 12:31 AM, Etienne Bellemare Racine
<etiennebr at gmail.com> wrote:
#
Thanks Howard, do you know of people working with LAS data in R, or
any efforts to link to liblas? I'd be keen to get involved eventually,
but it's not an area I have particular strengths. I've noticed the "R"
data format driver in GDAL (for reading/writing simple arrays in R's
save() format), so there must be some in that community using R in a
fairly intimate way.

BTW, I don't want this example to be seen as some kind of snub to the
efforts of projects like liblas, I'm really interested in the fact
that it *can* be done - not necessarily pushing that it should be
done. The impetus here was really my interest in alternative methods,
not just avoiding dependencies.

Tools like this can be helpful for basic data rescue when a full
solution is not at hand. There are endless varieties of raw binary
formats out there and very few have a committed team like liblas or
GDAL working to help.

Cheers, Mike.
On Tue, Jun 8, 2010 at 6:17 AM, Howard Butler <hobu.inc at gmail.com> wrote:
#
On Jun 7, 2010, at 7:41 PM, Michael Sumner wrote:
I didn't take it as a snub, but rather I was firing back with a warning about how deep this LAS rabbit hole actually goes :) After first reading the spec, I thought, "oh, a nice simple binary format, I can write code for that." It seems everyone else thought the same as well, and we all came back with different answers. Over time, I hope the bigger vendors pick up libLAS internally (one or two large commercial vendors and a number of mid-level ones now use libLAS), so the variations in data start to go away. I've given them every possible reason to take it up too -- BSD license, loads of features, multiple APIs, multiple platforms, not slow.
#
Thanks Howard, I appreciate the perspective.
Heh - yes indeed this is an attractive pit.

Cheers, Mike.
On Tue, Jun 8, 2010 at 11:51 AM, Howard Butler <hobu.inc at gmail.com> wrote: