MapInfo Pro

Expand all | Collapse all

Denmark DTM – Building a trillion cell terrain raster

  • 1.  Denmark DTM – Building a trillion cell terrain raster

    Pitney Bowes
    Posted 09-12-2019 19:27
    Edited by Sam Roberts 09-13-2019 17:12

    This article contains data from The Danish Agency for Data Supply and Efficiency, The Danish Elevation Model, August 2019.
    The data was downloaded from the Danish Agency for Data Supply and Efficiency in July 2019.
    More information (in English) is available at  https://eng.sdfe.dk/product-and-services/the-danish-elevation-model-dk-dem/


    Introduction

    Denmark is a low-lying country with significant areas below sea level protected by man-made dykes and sea walls. For those who don't know where Denmark is, or think that Denmark and the Netherlands are the same place - Google Maps to the rescue.

    Denmark - sourced from Google Maps

    The image below shows the extent of dyke development in the South West of Denmark adjacent to the Wadden Sea. It is not surprising, therefore, that the Danes want to get a thorough understanding of their terrain and topography. To that end, they have flown at least two LiDAR surveys over the entire country to create high-resolution digital terrain models. I was given access to the most recent survey data which can be downloaded via FTP.

    Wadden Sea Dyke - sourced from Google Images


    In 2015, MapInfo acquired data from an earlier LiDAR survey of Denmark and we created a DTM raster in MRR format. This was a useful exercise at the time but the raster has a number of problems which I hoped to rectify using the new survey data in 2019.

    Both DTM (the ground surface) and DSM (the first return surface including trees, buildings etc.) rasters are available for download in TIFF format. The LiDAR data is also available for download at the same site, although I have not tried to download or process any of this data. Towards the end of June 2019, I downloaded the DTM rasters - a total of 668 ZIP archives, each containing up to 100 TIFF rasters. The total download size was 548 GB. The TIFF rasters have a horizontal resolution of 0.4 metres (40 centimetres) and with a size of 2500 columns by 2500 rows and provide coverage of one square kilometre each. They use the ETRS TM Zone 32, Northern Hemisphere (ETRS89) projection. The RL data is stored in metres as 32-bit floating-point real numbers which are able to represent data accurate to less than a millimetre vertically.

    My goal was to join all of these TIFF rasters back together into a single raster in MRR format. I wanted to remove any offshore data and I wanted to minimise the size of the final raster to make it as portable as possible. This article describes how I went about it.

    Processing challenges

    With a dataset of this size, the first challenge is disk space. I have a laptop with a small SSD and a slow HDD. I do not have room on either of these drives for the downloaded data or any of the downstream raster products. So, just about all processing had to be done from source external HDD to destination external HDD, using the SSD for temporary data whenever possible. Needless to say, this does not provide high performance and the first lesson is – get yourself a big, fast SSD. Secondly, I performed this processing on a laptop with a quad-core CPU. Whilst it is possible to work with huge rasters on modest hardware, do yourself a favour and get a decent desktop workstation.

    The next issue is how to efficiently extract all the TIFF files from the ZIP archives so that they can be processed. The TIFF files use no compression, so once they are extracted from the ZIP archive they explode in size. Consequently, it was necessary to process one archive at a time and clean up all the waste products as I proceeded.

    I decided to merge all the TIFF files in each of the 668 ZIP archives into a "naked" MRR – leaving me with 668 MRR raster files to work with thereafter. A "naked" MRR is an MRR that does not include an overview data pyramid or statistics. Computing statistics and building an overview pyramid takes time and consumes disk space, so if you are only going to use the rasters as intermediate products in a processing pipeline it can be advantageous to omit these processing tasks.

    To achieve this I use "RasterProcessor". This is a stand-alone program that taps into the raster processing engine and, following a script supplied in XML, executes processing operations. At one stage we toyed with exposing this in MapInfo Pro Advanced, but it was withdrawn. It is available in the Raster API – look for an API called "MIR_ExecuteProcess". My script, shown below, copies the ZIP file to a scratch directory, extracts the TIFFs, merges them together using the undocumented "Join" operation, moves the naked MRR back to external HDD and then deletes all the rubbish. After running this I can forget about all the TIFFs – instead, I have 668 naked MRR rasters containing all the unmodified source data.

    <?xml version="1.0" encoding="UTF-8"?>
    <RasterProcessing>
    
    	<RasterOperationList Primary="True">
    		<GatherFiles Mode="OneByOne">
    			<TargetDir>F:\Data\OpenSource\Denmark\DTM_2019\Downloads\</TargetDir>
    			<TargetName>*.zip</TargetName>
    			<OpListName>ProcessZIP</OpListName>
    			<Parallel>False,8</Parallel>
    		</GatherFiles>
    	</RasterOperationList>
    	
    	<RasterOperationList Name="GatherAndJoin">
    		<GatherFiles Mode="GroupByAll">
    			<TargetDir>C:\Temp\DanishDTM\Working\</TargetDir>
    			<TargetName>*.tif</TargetName>
    			<OpListName>JoinTIFF</OpListName>
    		</GatherFiles>
    	</RasterOperationList>
    	
    	<RasterOperationList Name="GatherDeleteTIFF">
    		<GatherFiles Mode="OneByOne">
    			<TargetDir>C:\Temp\DanishDTM\Working\</TargetDir>
    			<TargetName>*.tif</TargetName>
    			<OpListName>DeleteFile</OpListName>
    			<Parallel>False</Parallel>
    		</GatherFiles>
    	</RasterOperationList>
    	
    	<RasterOperationList Name="GatherDeleteMD5">
    		<GatherFiles Mode="OneByOne">
    			<TargetDir>C:\Temp\DanishDTM\Working\</TargetDir>
    			<TargetName>*.md5</TargetName>
    			<OpListName>DeleteFile</OpListName>
    			<Parallel>False</Parallel>
    		</GatherFiles>
    	</RasterOperationList>
    	
    	<RasterOperationList Name="ProcessZIP">
    		<ExecuteShellCmd>
    			<CommandData Command="Copy" Target="%FULLPATHNAME%" Destination="C:\Temp\DanishDTM\Working\%FULLNAME%"/>
    			<CommandData Command="Execute" Executable="C:/Program Files/7-Zip/7z.exe" Target="C:\Temp\DanishDTM\Working\%FULLNAME%" Parameters="e -y"/>
    			<CommandData Command="Delete" Target="C:\Temp\DanishDTM\Working\%FULLNAME%"/>
    		</ExecuteShellCmd>
    		<ExecuteOpList OpListName="GatherDeleteMD5"/>
    		<ExecuteOpList OpListName="GatherAndJoin"/>
    		<ExecuteOpList OpListName="GatherDeleteTIFF"/>
    		<ExecuteShellCmd>
    			<CommandData Command="Move" Target="C:\Temp\DanishDTM\Working\%NAME%.mrr" Destination="F:\Data\OpenSource\Denmark\DTM_2019\NakedMRRTiles\%NAME%.mrr"/>
    		</ExecuteShellCmd>
    	</RasterOperationList>
    
    	<RasterOperationList Name="DeleteFile">
    		<ExecuteShellCmd>
    			<CommandData Command="Delete" Target="%FULLPATHNAME%"/>
    		</ExecuteShellCmd>
    	</RasterOperationList>
    	
    	<RasterOperationList Name="JoinTIFF">
    		<Join>
    			<DestinationFilename>C:\Temp\DanishDTM\Working\%NAME%.mrr</DestinationFilename>
    			<Format>MI_MRR</Format>
    			<InvalidToEmpty>True</InvalidToEmpty>
    			<CellCombineRule>Last</CellCombineRule>
    			<RasterInfo>
    				<Name>DTM</Name>
    				<UnderviewMapSize>32,32</UnderviewMapSize>
    				<UnderviewTileSize>256,256</UnderviewTileSize>
    				<BaseMapSize>32,32</BaseMapSize>
    				<BaseTileSize>1024,1024</BaseTileSize>
    				<OverviewMapSize>32,32</OverviewMapSize>
    				<OverviewTileSize>1024,1024</OverviewTileSize>
    				<FieldInfo>
    					<Name>DTM</Name>
    					<Type>Continuous</Type>
    					<Compression>LZMA,3</Compression>
    					<OverviewCellCoverage>Coverage_Half</OverviewCellCoverage>
    					<ValidFlagPerBand>True</ValidFlagPerBand>
    					<BandInfo>
    						<Name>RL</Name>
    						<DataType>REAL4</DataType>
    						<StoreDataType>REAL4</StoreDataType>
    						<RestrictDecimals>False</RestrictDecimals>
    						<Clip>False</Clip>
    						<Transform>False</Transform>
    						<PredEncode>False</PredEncode>
    						<NullValueType>Mask</NullValueType>
    						<DiscreteValue>False</DiscreteValue>
    					</BandInfo>
    				</FieldInfo>
    			</RasterInfo>
    			<Finalisation>
    				<BuildOverviews>None</BuildOverviews>
    				<ComputeStatistics>None</ComputeStatistics>
    				<StatisticsLevel>None</StatisticsLevel>
    				<CreateTABFile>False</CreateTABFile>
    			</Finalisation>
    		</Join>
    	</RasterOperationList>
    </RasterProcessing>


    The next step is to merge these MRR files together. I did this twice – once to produce an MRR with statistics and overviews so I could visualise the raster, and once to produce another "naked" MRR which I would use for further processing. To perform the merge I used the "Join" operation via RasterProcessor, as before. Here is what the raster looks like once you join all the TIFF tiles together.

    Denmark DTM - Joined TIFF files


    The completed raster revealed the following problems –

    • Oceans. Where a tile intersects the coast, the cells over the ocean are valid and set to zero. So the raster contains billions of useless cells filled with zero. These cells skew the distribution statistics and provide no useful information. About 15% of all the cells are zero, and I wanted to remove them. The picture below shows the typical extent of ocean cells in the raster.

    Ocean data shown in blue
    • Patches. I discovered four TIFF tiles that contained data that was clearly not valid. I guess the LiDAR processing on these tiles went wrong and the failures were not detected. It was not easy to find these tiles – they are needles in a haystack. Two I found by eye, the remainder I found later by comparing this raster to the raster made in 2015, displaying the differential using virtual rasters. The picture below shows a bad tile - it contains data, but that data is clearly no good.

    A bad tile
    • Invalid Cells. The statistics indicated that there were a number of cells that had invalid RL values. The DTM range in Denmark is between about -8 and +173 metres. But lurking within this trillion cell raster were cells with values much lower and higher than these limits. Some of this data is valid – for example, there are some quarries or excavations that are substantially below sea level – but the majority of these cells must surely be invalid.


    Removing Oceans

    I want to remove all the cells that are "ocean" and I know they all have a value of zero. You can use a calculator operation to hunt down and remove cells but this has the undesirable side effect of removing all cells with a value of zero, regardless of whether they are the ocean or not. In this dataset, there are many zero value cells that are landlocked and should not be removed. I wanted a better approach.

    An alternative approach is to edit the raster by "flood filling" the areas of ocean and replacing the valid cells with invalid cells. This has the advantage that it searches for contiguous regions of the ocean and will not touch zero value cells that are not connected to the ocean. The first problem with a flood fill is that it has to be initiated somewhere (at a cell in the raster) - and to edit this raster would require many separate initiation points. These initiation points need to be located somehow. Secondly, the flood fill algorithm is recursive which is a euphemism for "can consume an unpredictable amount of resources". Acting upon a trillion cell raster, a recursive algorithm has plenty of scope to consume all your processing resources - and fail.

    The algorithm I developed works in two stages. Firstly, I process the raster to identify cells that are valid, cells that are invalid, cells that are ocean and cells that are zero but not ocean. This uses a flood fill algorithm that self-initiates and is designed to keep recursion levels very low. The output of this is another raster (at the same resolution as the original raster) where the types of cells are clearly identified. As this "mask" raster will be used in the second stage of processing, I can manually edit it before using it to eliminate the ocean. The final ocean mask is shown below.

    Ocean mask


    Manual editing was required to remove areas where the algorithm decided zero value cells were landlocked, but the human eye can determine that they are not. For example, this can occur when bridges are not removed from the LiDAR and two bridges cross a neck of the ocean. I used a raster editing test harness to manually and visually correct these issues in the mask raster. In the second stage, I then "applied" the mask to the DTM raster and removed all the cells that I had identified as the ocean. The picture below shows a typical error in the mask where two bridges cross a neck of ocean and in between the algorithm incorrectly identifies landlocked zero value cells. I corrected these manually.

    Mask error


    Correcting Patches

    For posterity, these are the UTM coordinate ranges of the patches that look to contain invalid data.

    (523000, 6142000) - (523800, 6143000), (523000, 6118000) - (523800, 6119000), (623000, 6133000) - (625000, 6134000), (533000, 6182000) - (534000, 6183000)

    I repaired these areas at the end of the processing by using the Merge operation in MapInfo Pro Advanced and taking advantage of its ability to edit an existing raster. I clipped out these areas from the older 2015 DTM raster and then inserted those four small rasters into the large DTM raster.

    Treating Invalid Cells

    A very small percentage of the cells in the raster – about 90 thousand out of a trillion - have invalid cell values. We know this from the raster statistics which report the minimum and maximum values of the raster. Finding those cells in a trillion cell raster is no easy thing to do. You cannot do it by visual inspection.

    I decided to find out where these bad cells are. To do so, I executed another Join operation on the naked MRR's but this time using data conditioning to reject all data between -7m and 170m RL. I was left with just those cells that are outside that RL range. As I expected the cells to be isolated and difficult to spot, I used a different method to generate the overview pyramid which ensures that any cells in the base level remain present in the overview levels. That way, you can see the rogue cells from far away and zoom in on them.

    I found that much of the data between -20 and -7m RL is likely to be legitimate. It includes some known broad areas that are quite low lying and other features like quarries and excavations that can be significantly below sea level. Similarly, some of the RL's up to about 173m are also legitimate high ground. But much of the rogue data is clearly illegitimate and wrong. I think much of it probably originates from a single bad LiDAR reading that then gets turned into a bad area of cells as it gets incorporated into the surface triangulation. Then again, some areas seem to have systematically bad data.

    I addressed these invalid cells by simply clipping them out using data conditioning. I rejected all cells below -25m RL and above 200m RL. This leaves behinds holes in the raster, which I did not attempt to fill or rectify. The image below shows a typical cluster of cells where the cell values are negative and invalid. The second image shows a region of cells with invalid high values - perhaps caused by surface water?

    Bad data - lows
    Bad data - high


    Keeping a lid on it

    The original TIFF files, stored in ZIP files, amounted to 548 GB. My goal is to end up with a smaller and more portable dataset and to achieve this I use the MRR format. Remember that an MRR will contain an overview pyramid which increases the amount of data in storage by at least 33%, so you start from behind.

    The primary way you can keep raster files small is to use compression. In this case, I used the lossless LZMA codec (level 9) to compress the cell data inside the MRR. Apart from compression, you can take advantage of the rich variety of data types MRR supports as well as the scaling and transformation properties for bands. You can also choose to sacrifice vertical resolution if you deem it to be appropriate. I chose to transform the data from 4-byte floating-point to 2-byte signed integer using a transform that reduced the vertical resolution of the data to 10 millimetres. As the data is being stored in the MRR as integers, I was then able to use predictive encoding. This transformation sacrifices no information but makes the data more compressible and at the end of the day you get a smaller file. With a horizontal resolution of 0.4 metres and a vertical resolution of 10 millimetres the final MRR file, including the overview pyramid, is 164 GB. I think that is a good result. The final DTM raster is shown below.

    Final DTM


    Summary


    By removing the ocean we eliminated billions of useless cells from the raster and this results in more useful statistics. Better statistics mean better rendering and better analysis. By patching the areas that were clearly bad and removing cells with crazy RL's, I made some attempt to improve the overall quality of the raster.

    The processing operations used – principally the Join operation and the automated flood edit operation – have now been tested in the heat of battle and may find their way into future releases of MapInfo Pro Advanced.

    Now that we have two rasters of Denmark (the first from 2015 and the second from 2019), we will be able to make comparisons to see what has changed over that time. To do this, I plan to use virtual rasters and I might report on that analysis in a future article.

    I have attached a DTM of Denmark that you can download, extracted from the 0.4-metre resolution raster. It is simply one of the overview resolution levels (level 6), extracted and written to a new raster. The horizontal resolution is 25.6 metres and the vertical resolution is 10 centimetres.



    ------------------------------
    Sam Roberts
    Engineer, MapInfo Pro Advanced (Raster)
    Australia
    ------------------------------

    Attachment(s)



  • 2.  RE: Denmark DTM – Building a trillion cell terrain raster

    Posted 09-16-2019 08:02
    Great article, Sam. Thanks for sharing.

    Maybe I should consider adding support for the MIR_ExecuteProcess in the MapInfo Raster Tool, that I work on.

    ------------------------------
    Peter Horsbøll Møller
    Pitney Bowes
    ------------------------------



  • 3.  RE: Denmark DTM – Building a trillion cell terrain raster

    Posted 09-19-2019 06:09
    Nice article Sam. Pushing the limits!

    ------------------------------
    - Stevemann4@gmail.com
    Knowledge Community Shared Account
    ------------------------------