Turning LiDAR Data into Actionable Insight

Default rendering of DSM in QGIS on top of OS StreetView

My pitch for Open Data Camp 3 is to demonstrate my work with LiDAR data, showing how it can be used to provide insights that improve efficiency in a variety of business sectors.

I really enjoyed the last Open Data Camp in Manchester, with many fascinating sessions, in particular Andrew Newman's thought-provoking pitch "Let go of the O – it is just data". However, for me, the highlight was Christopher Gutteridge's pitch and demonstration "Lidar 4 3D – What would you do with it?", which was an eye-opener as to the opportunities presented by this release of data.

A couple of weeks before ODCamp, the Environment Agency released the largest UK open data archive published to date: LiDAR aerial data captured at 2m, 1m, 50cm and 25cm resolution, with varying coverage and time points, in addition to a composite (aggregated) layer. In total, 11TB of data was released for England.

The data was published as ASCII Grid Digital Elevation Models (DEMs), compatible with popular GIS systems, in two forms: a Digital Surface Model (DSM), which includes buildings, vehicles and trees, and a Digital Terrain Model (DTM), in which these objects have been filtered out by software to estimate the ground elevation at each point.

Christopher’s talk fired my imagination as to the possibility of using LiDAR to derive building insight to solve problems in the insurance, telecoms and utilities sectors by estimating building metrics without the need to send a surveyor on site.

The first step was to download all the LiDAR data for my local area in Chester (10km grid square SJ46) and explore it in QGIS. Using a simple Python script, I then derived a Digital Height Model (DHM), the net difference between the DSM and DTM data for the same point, and mapped it.
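A minimal sketch of such a derivation, assuming the DSM and DTM tiles are ESRI ASCII grids with matching extents; the function names and nodata value are mine, not the original script's:

```python
import numpy as np

NODATA = -9999.0  # missing value, as declared in the ASCII grid header


def read_ascii_grid(path):
    """Read an ESRI ASCII grid, skipping the six header lines
    (ncols, nrows, xllcorner, yllcorner, cellsize, NODATA_value)."""
    return np.loadtxt(path, skiprows=6)


def derive_dhm(dsm, dtm, nodata=NODATA):
    """Digital Height Model = DSM - DTM for the same cell,
    keeping nodata wherever either input is missing."""
    dhm = dsm - dtm
    dhm[(dsm == nodata) | (dtm == nodata)] = nodata
    return dhm
```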

Derived DHM of Chester Cathedral and surrounding area rendered in QGIS

Further research showed that the DHM is very useful as a threshold filtering tool to remove smaller objects such as cars, street furniture and small trees, leaving the more interesting buildings and other large objects to classify.
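The threshold filter then reduces to a simple mask; the 2m cut-off here is illustrative, not a value from the original work:

```python
import numpy as np

NODATA = -9999.0


def filter_small_objects(dhm, min_height=2.0, nodata=NODATA):
    """Blank out cells whose net height falls below the threshold,
    removing cars, street furniture and small trees from the DHM."""
    out = dhm.copy()
    out[(out != nodata) & (out < min_height)] = nodata
    return out
```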

Having completed this research, it was clear that LiDAR offers huge potential to business. However, the biggest challenge was the sheer size of the data and the format in which it is supplied.

Storing it in a spatial database was one option, but a test on the SJ area showed it would have required 110TB of storage to hold both the 1m DSM and DTM Composite data for England with indexes, so I set about designing my own solution, which would:

  • Run on out-of-the-box commodity hardware.
  • Use lossless compression.
  • Be fast enough to facilitate bulk queries in batch on large datasets, for example Code Point Open or (if licensed) OS AddressBase products at property level.
  • Facilitate queries with standard spatial objects of point, buffer, polygon and linestring.
  • Generate DEM data or raster images on the fly.
  • Be generic enough to handle other DEM and raster data sources.

The solution:

  • Convert the DEMs into binary arrays and compress with zlib in Python to store as 1km squares.
  • Use a MySQL database to hold the metadata relating to the collection, including data type, resolution and missing value. It also holds a field list so DEMs can be combined. Origin coordinates are not needed, as these are implied by the file name.
  • Use Ordnance Survey’s 1km grid system for the compressed file name as supplied.
  • Create a folder hierarchy on major grid squares.
  • Implement caching for batch operations (retain most recently accessed 1km squares).
  • Create a Python class to act as API to the repository.
  • Create functions for point and rectangle queries.
  • Create a function to filter a rectangle on a polygon boundary; it was quicker to fetch the polygon's minimum bounding rectangle (MBR) and filter it than to call the API for individual points.
  • Machine spec:
    • Intel Core i7-4790K 4.00GHz 8MB S1150
    • 32 GB 2400MHz DDR3 RAM
    • 512GB SSD (OS), 1TB SSD (LiDAR)
    • Ubuntu Mate Linux 15.10
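The core storage step, compressing each 1km DEM tile and naming it by its OS grid reference, can be sketched as follows; the file layout, names and float32 encoding are illustrative assumptions, not the author's exact code:

```python
import os
import zlib

import numpy as np


def store_tile(arr, grid_ref, root="lidar_repo"):
    """Store a 1km DEM tile as a zlib-compressed binary array.
    The OS grid reference (e.g. 'SJ4066') names the file, so the tile
    origin is implied and no coordinates need storing; tiles are
    grouped into folders by 100km major square."""
    folder = os.path.join(root, grid_ref[:2])
    os.makedirs(folder, exist_ok=True)
    payload = zlib.compress(arr.astype(np.float32).tobytes())
    with open(os.path.join(folder, grid_ref + ".bin"), "wb") as f:
        f.write(payload)


def load_tile(grid_ref, shape, root="lidar_repo"):
    """Decompress a stored tile back into a numpy array."""
    path = os.path.join(root, grid_ref[:2], grid_ref + ".bin")
    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
    return np.frombuffer(raw, dtype=np.float32).reshape(shape)
```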

This approach allows multiple DEMs to be combined, so in the case of the 1m Composite LiDAR data, I combined the DTM and DSM for the same grid square in a single array as pairs. In the small number of cases where one DEM existed, but the other didn’t, I filled it with the missing value.
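The pairing can be sketched with a hypothetical helper that stacks the two layers and applies the missing-value fill described above:

```python
import numpy as np

NODATA = -9999.0


def combine_pair(dtm, dsm, nodata=NODATA):
    """Store the DTM and DSM for the same 1km square as (dtm, dsm)
    pairs along a new last axis; if one layer is absent it is filled
    with the missing value."""
    if dtm is None:
        dtm = np.full_like(dsm, nodata)
    if dsm is None:
        dsm = np.full_like(dtm, nodata)
    return np.stack([dtm, dsm], axis=-1)
```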

In England, this technique compressed 1.3TB of raw DEM ASCII grid data to 202GB binary compressed.

The first test I did was to estimate the height of 1 Canada Square, the Canary Wharf tower, using the postcode centroid from Code Point, which came out as 236.634994507 mOAD at the highest point and 9.81000041962 mOAD at ground level (226.824994087m net height). The official height is 235m according to the CAA.

I calculated a number of estimates for building heights in Chester, at different points, and compared these with known attributes from the planning department of my local council, Cheshire West and Chester, then performed an ANOVA to test accuracy, which came out at 98.25% within 25cm on residential properties.

Following the release of Natural Resources Wales' equivalent LiDAR data, I created a separate repository for it. In dealing with overlapping squares, the software first queries the country the point is in, then the other, combining results on the fly. I did consider merging the two sources, but observed some slight differences between them, so decided to combine on the fly instead.
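The cross-border fallback amounts to a simple two-step query; the repository objects here are hypothetical stand-ins for the API class described earlier:

```python
NODATA = -9999.0


def query_point(easting, northing, primary, secondary, nodata=NODATA):
    """Query the repository for the country the point falls in first,
    then fall back to the other repository where the value is missing,
    combining the two sources on the fly."""
    value = primary.point(easting, northing)
    if value == nodata:
        value = secondary.point(easting, northing)
    return value
```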

A polygon, or rectangle query returns a list with missing values omitted and DHM calculated. When rendered as CSV, the list looks like:


And looks like this when visualised:

Result of LiDAR polygon query on Cadastral Polygon

When processing bulk data in batch, the input file should be pre-sorted on geometry, but my experience is that sorting on postcode is sufficient with a reasonably sized cache. Running the software on Code Point Open, sorted on postcode, yielded an average rate of 1,800 point queries per second, so the whole file of 1.7 million records was tagged in less than 20 minutes.
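The per-square cache can be sketched with `functools.lru_cache`; the class and loader below are illustrative, not the author's implementation:

```python
from functools import lru_cache


class TileCache:
    """Retain the most recently used 1km squares, so that batch input
    sorted on postcode mostly hits tiles already in memory."""

    def __init__(self, loader, maxsize=64):
        # loader(grid_ref) -> 2D array of elevations for that square
        self._load = lru_cache(maxsize=maxsize)(loader)

    def point(self, grid_ref, row, col):
        """Return the elevation at (row, col) within the given square."""
        return self._load(grid_ref)[row][col]

    def stats(self):
        """(hits, misses) so cache effectiveness can be monitored."""
        info = self._load.cache_info()
        return info.hits, info.misses
```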

Ordnance Survey used Open Data Camp 1 in Winchester to launch their excellent Open Map Local and Open Roads products, which represented the first significant game changer in open data: the release of the first small-scale vector data under the OGL.

There is a generalised buildings layer in OS's Open Map Local product which is good enough for most purposes, as it provides a rough outline of each building. However, the generalisation algorithm often misses extensions and combines detached buildings, as can be seen below:

LiDAR versus OS Open Map Local, Garden Lane, Chester

This research led to investigating the use of LiDAR to improve the accuracy of the OS buildings layer which, when combined with Land Registry Cadastral Polygons (note these are not fully open data, but can be used internally within an organisation), yielded an opportunity to estimate more accurate building footprints. For example, being able to identify when one half of a semi-detached house has been extended, while the other remains as original.

By using filtering, edge detection and vectorisation algorithms on the LiDAR data, it can be compared with the OS building layer and the output used to slice and edit the OS layer.

Identifying bungalows is important for insurers and knowing building gutter line height is of interest to telecommunications providers installing overhead lines to properties. By having these parameters in advance, a telecoms engineer can despatch a cherry picker beforehand if necessary, eliminating a wasted trip.

Another application is to classify roof shape. Insurers are concerned about flat-roof extensions, while solar panel companies wish to target households with suitable roofs. To do this, I use a slicing technique to calculate the building footprint at net-height intervals. Ultimately, I wish to generate a set of vector layers showing outlines at predetermined net heights, which can be stored in a spatial database. This is the easiest way of classifying roof shape and the proportion of flat roof area. When calculating net height in this way, the reference point is fixed as the DTM value at the centroid of the object in question.
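The slicing idea can be sketched as follows; the heuristic reading of the output (a footprint that stays near 100% up to the eaves then drops sharply suggests a flat roof, a steady taper a pitched one) is my illustration, not the author's stated classifier:

```python
import numpy as np

NODATA = -9999.0


def slice_footprints(dhm, intervals, nodata=NODATA):
    """For each net-height threshold, return the proportion of the
    building footprint still present at or above that height."""
    footprint = (dhm != nodata) & (dhm > 0)
    total = footprint.sum()
    return {h: float(((dhm != nodata) & (dhm >= h)).sum()) / total
            for h in intervals}
```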

A test against a sample of known data again showed bungalow identification accuracy of around 96% although the software cannot (yet!) distinguish between regular and dormer bungalows.

For recreation, LiDAR has also been used by enthusiasts to generate scenarios for computer games such as Minecraft and OpenTTD.

OpenTTD scenario of Chester and Liverpool from LiDAR DTM

The storage and query engine works with any form of gridded data and I have successfully loaded OS StreetView allowing small area local maps to be generated on the fly or rough land use estimates taken by sampling the map colours in an area.
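The colour-sampling idea can be sketched as a frequency count against a palette of known map colours; the palette values and class names below are illustrative assumptions:

```python
from collections import Counter


def land_use_sample(pixels, palette):
    """Rough land-use proportions for an area, by matching sampled
    raster map colours (RGB tuples) against a palette of known
    classes; unmatched colours are reported as 'unknown'."""
    counts = Counter(palette.get(px, "unknown") for px in pixels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}
```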

There are many more future enhancements I have identified:

  • Large tree detection near buildings (of interest to insurers)
  • Comparing LiDAR from adjacent years to identify unauthorised building work (of interest to councils)
  • Road carriageway width estimation by sample slicing using Open Roads (pavements can be detected by filtering in a certain range)
  • Investigate use of GPU computing (e.g. NVIDIA CUDA which I use in other work) to speed up batch processing

In summary, LiDAR data is a very valuable asset with many applications. I achieved my objective of making the data more accessible and combining it with other spatial data, gaining insights previously only available through expensive licensed products.

The author, John Murray. Photo by Anne Murray

Image credits

John Murray photo, courtesy of Anne Murray.

All other maps and graphics produced using OS and Environment Agency data.
© Crown copyright and database right 2016
Licensed under the Open Government Licence v3.0