MapInfo Pro

Limits -in converting ecw/tif to mrr?

  • 1.  Limits -in converting ecw/tif to mrr?

    Posted 07-03-2019 01:40
    I am having some issues in converting large (~200GB) ecw/tif files to mrr.

    It's probably not a space or memory issue - here's the machine spec. There's 2TB free on the drive I was writing to.


    I had about 500 tiles (1 km x 1 km, 10 cm resolution) to merge. This worked and created an MRR of 170 GB at ZIP compression level 5.

    Then I added about 1200 tiles at 20 cm, and the process hung after about half a day, having created a 200 GB file.

    I also tried to merge the 20cm tiles by themselves but this didn't work either - it crashed after the file grew to about 200GB. I also attempted to convert the merged 65GB ecw (Dimensions X: 465536 Y: 656977 Bands: 3) to an mrr by itself but this also failed.

    The outcome I need is a multi-resolution file that will cover an area of about 8000 sq m with some 10cm, some 20cm and some 80cm imagery. Does anyone have any suggestions for doing this in MapInfo Pro Advanced v17?

    Is this possible? If not, I guess the only option is to split it into 2 or 3 zones, but I would like to understand whether others have done this and whether I am simply doing something wrong.

    If the only option is to split/grid it, are there any options in the Raster merge workflow to do this automatically?

    Regards,

    ------------------------------
    George Corea
    Mangoesmapping
    ------------------------------


  • 2.  RE: Limits -in converting ecw/tif to mrr?

    Pitney Bowes
    Posted 07-05-2019 00:12

    Hi George,

    You have touched on many issues with this post so I will take the opportunity to make some comments.

    MapInfo Version

    Make sure you are using version 17.03. We are going to update this to 17.04 very soon, to fix some critical bugs. One of these relates to merging ArcASCII format rasters.

    Memory

    You have a powerful desktop PC with an 8 core / 16 thread processor and 64 GB of RAM. To make sure the software is accessing the RAM, go to the backstage area and open the MapInfo Pro Raster preferences dialog. On the "Memory and performance" page set the "Memory Cache Size" appropriately. This setting controls the size of the raster tile cache – where all raster data is stored in memory. In general, it is best to set this as high as possible.

    The more raster data you can fit into memory, the less swapping of data to and from the HDD/SSD. However, if the rasters are simply too large to fit into memory then the usefulness of the cache plummets. The strategies for determining which tiles to swap in and out become diabolical – the best possible strategy in many raster applications is the worst possible strategy in general! So having lots of memory is good, but in your scenario (where your rasters are too large to fit into memory) it is not actually helping you as much as you might imagine.

    Also, not all memory consumption is tracked and so there will be memory consumed over and above the memory set aside for the tile cache. In some scenarios, this can be significant, and you are triggering one of them. The ECW/JPEG2000 driver uses a 3rd party library to read these files, and that library uses its own memory cache. The size of this cache can be significant and we do not exercise control over it. Additional unmanaged memory consumption like this could push your total memory consumption for the process into dangerous territory and you could run out of memory and crash the process. For this reason, you might choose to dial back the "Memory Cache Size" setting a notch, to provide some headroom for other memory consumption.

    Also, the size of the cache is determined by looking at your total system memory, so if you are going to run a long processing operation that uses significant resources, it is a good idea to leave your machine alone and not run other processes at the same time.

    Parallel Processing

    There are some options for controlling parallel processing in the preferences dialog, but they refer to running multiple operations in parallel. If you run an operation (like Reprojection for example) then it will use multiple threads to take advantage of multi-core and hyper-threaded CPUs or multi-processor architectures. Some operations can do this very efficiently, others less so or not at all. Some customers actually complain about operations using all their CPU time so we generally limit the number of threads. In version 17.03 it is limited to 8. I have already increased this limit to 16 for the next major version. So if you monitor CPU usage on your 8 C / 16 TH PC you probably won't see it go above 50%. Note that threads running in a Core run about 4-5 times faster than a Hyper-thread. So when Windows reports 50% it really means more like 80% actual compute capacity.
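
    The 50% versus 80% figure can be checked with a little arithmetic. This is only a rough sketch: it takes 4.5 as a midpoint of the 4-5x ratio quoted above, assumes two hardware threads per core, and assumes the 8 worker threads each land on their own physical core. The function name is made up for illustration.

```python
def effective_capacity_used(cores, busy_threads, core_to_ht_ratio=4.5):
    """Fraction of real compute capacity in use.

    Assumes each core offers one full-speed thread plus one hyper-thread
    that is `core_to_ht_ratio` times slower (an assumed model, not a
    measured one).
    """
    # Total capacity: one unit per core plus a fraction for its hyper-thread.
    total = cores * (1 + 1 / core_to_ht_ratio)
    # Busy threads fill the full-speed slots first, then the hyper-threads.
    primary = min(busy_threads, cores)
    secondary = max(busy_threads - cores, 0)
    used = primary + secondary / core_to_ht_ratio
    return used / total


reported = 8 / 16                        # what Task Manager would show
actual = effective_capacity_used(8, 8)   # 8 threads on an 8 C / 16 TH CPU
print(f"reported {reported:.0%}, effective about {actual:.0%}")
```

    With a ratio of 4.5 the effective figure comes out a little over 80%, which is in line with the estimate above.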

    Compression

    Lossless or lossy? Speed (of compression) or size (of the file on disk)? When you are building very large rasters you really need to think hard about it and make an informed choice. It can mean days of processing time and hundreds of gigabytes. Note that all compression codecs will decompress very quickly, so that is generally not an issue. But the performance when compressing is a different matter.

    As a rule, you should always compress raster data when you store it in an MRR. You mention you used ZIP level 5. For imagery, I advise you to use either PNG (lossless) or JPEG (lossy). ZIP is not the best choice – it is neither very fast nor very good. I never use it for anything. PNG generally gives better compression, although it is not fast. JPEG gives much better compression and if you choose a low compression level (0 – 3) then the loss of information and noise introduced will be very small.

    Converting ECW to MRR

    As a rule, don't do it. ECW, JPEG2000 and MrSID use lossy wavelet compression codecs. (They do support lossless compression as well but I think this is rarely used). When you convert an ECW to an MRR you push your data through two lossy compression operations – firstly to convert the source image to an ECW and secondly to convert the ECW to MRR. (I assume you will use the JPEG codec in the MRR. If you don't then the size of the raster will increase by anywhere from 2 – 100 X). Also, the second compression operation is harder because it has to compress the high frequency noise you introduced into the imagery in the first phase. In the end you have an image with two overlapping noise signatures.

    It's not ideal. If possible, it is better to leave these kinds of rasters as they are. But in your scenario, that may not be possible.
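
    To put a number on the size increase for the 65GB ECW in the original post, here is a quick estimate of its uncompressed size. It assumes 8 bits per band, which is typical for RGB imagery but is not stated in the thread.

```python
def raw_size_bytes(width, height, bands, bytes_per_sample=1):
    """Uncompressed size of a raster, ignoring overviews and headers."""
    return width * height * bands * bytes_per_sample


ecw_bytes = 65e9  # size of the merged ECW on disk, from the original post
raw = raw_size_bytes(465536, 656977, 3)  # dimensions from the original post

print(f"uncompressed: {raw / 1e9:.0f} GB")
print(f"ratio to ECW: {raw / ecw_bytes:.1f}x")
```

    Under that assumption the raw data is a little over 0.9 TB, roughly 14 times the size of the ECW, so the choice of codec in the MRR matters a great deal.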

    Merging rasters

    I like to differentiate between three kinds of merging operations – "Join", "Merge" and "Stitch".

    A "Join" takes a set of rasters that are all in the same coordinate system, have the same cell size and are cell aligned and joins them together into a single raster. There is no interpolation, no levelling, little control over how overlapping data is treated (as there may be little or no overlapping data). Data is copied without modification.

    A "Merge" allows rasters to have different coordinate systems, different cell size etc. Interpolation will be used to populate the output raster cells and strategies are used to deal with overlapping data. This is what we offer in MapInfo Pro Advanced and you can also use it to "Join".

    A "Stitch" takes this a step further to adjust the levels between source rasters to ensure there are no level shifts at raster boundaries. It may also use a variety of feathering techniques to blend data at the edges of overlapping rasters to ensure there are no visible seams between them. For imagery, it may adjust source raster brightness and contrast to ensure this is consistent across the final raster.

    The rule I follow is to "try to minimise harm". In this context "harm" refers to introducing noise into the raster (via lossy compression) or changing raster values by using interpolation to effect a change of resolution, projection or position.

    You have imagery at 10 / 20 / 80 cm resolution. It sounds like you could "Join" these source rasters into three rasters – one for each resolution. If you can do so without interpolation and you can use a lossless compression codec, then so far you have minimised harm. To go further and to merge these three rasters into a single raster will probably do some harm by introducing interpolation. Maybe, that's just what you need to do.

    When you merge rasters with different resolutions then you generally want to preserve the highest resolution data. This can increase the size of the merged raster – in your case all the 20 and 80 cm data would be interpolated at 10 cm resolution and introduce a 4X and 64X increase in data volume respectively. In the "Merge" operation, when you output to an MRR, you can take advantage of the multi-resolution capability of the MRR format to mitigate this problem. It will create a raster at 10 cm resolution, but where a tile only contains 20cm data it will double the cell size and not interpolate down to 10 cm. Same for the 80 cm data except it is 8X. I can give further details on this if you need.
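
    The 4X and 64X figures follow from the cell size ratio squared. A small sketch of the arithmetic (the function name is just for illustration):

```python
def inflation_factor(source_cm, target_cm):
    """How many target cells are needed to cover one source cell."""
    per_axis = source_cm / target_cm  # e.g. 20 cm -> 10 cm is 2 per axis
    return per_axis ** 2              # two axes, so square it


for source_cm in (10, 20, 80):
    factor = inflation_factor(source_cm, 10)
    print(f"{source_cm} cm resampled to 10 cm: {factor:.0f}x the cells")
```

    This is the cost the multi-resolution MRR avoids, by keeping the 20 cm and 80 cm tiles at 2X and 8X the base cell size instead of interpolating them down to 10 cm.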

    An alternative solution - one that I prefer - is to keep the 10/20/80 cm rasters separate and use a virtual raster to merge them on the fly. This minimises storage requirements (no duplication or interpolation etc.) and also minimises processing time (because you do not have to run a merge) and minimises harm. The virtual raster puts the source rasters together on the fly as you need them, whether that be for rendering or processing. I have mentioned virtual rasters in this thread for a similar application.

    So, why not just use a virtual raster to merge all the original rasters on the fly? This is also a good approach and in cases like yours could be the best possible approach because your rasters are in ECW format and that's where they ought to stay. Unfortunately, the total number of source rasters is a problem. One ceiling is file handles – on Windows they are a finite resource and so you can only open so many rasters at once. The other ceiling is efficiency – you can have too many source rasters for the virtual raster engine to handle efficiently. Note that the GDAL virtual raster (VRT) has a solution for the file handle problem, but only at the expense of performance. On balance, I would say that you have too many source rasters and you do need to merge.

    Final comments

    I have seen another report of a client trying to convert a large ECW to an MRR and observing a crash. It is a random thing. Quite probably the issue is in the 3rd party code, so there may be nothing we can do about it. It usually works no problem, but sometimes it fails. I have never seen it fail myself.

    I would try to merge the 10 / 20 / 80 cm data into three separate MRRs first. Then try to merge the three together.

    Don't use ZIP compression for imagery. Use PNG or JPEG (level 0 to 3).

    It may help to split the job up so that you are merging a smaller number of source rasters, and then merge those rasters into the final raster. But the Merge operation is capable of processing many source rasters. We have thrown 50,000 TIFF files at it before and it has worked correctly.
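
    If you do split the job up, the grouping itself is easy to script. The sketch below only plans the batches; the merges themselves would still be run in MapInfo Pro Advanced. The tile names and the batch size of 100 are made up for illustration.

```python
from pathlib import Path


def batches(items, size):
    """Yield consecutive groups of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical tile list standing in for the ~1200 20 cm source files.
tiles = [Path(f"tile_{i:04d}.tif") for i in range(1200)]

groups = list(batches(tiles, 100))  # batch size chosen arbitrarily
for number, group in enumerate(groups, start=1):
    print(f"intermediate merge {number}: {group[0].name} .. {group[-1].name}")
```

    Each group would be merged into an intermediate raster, and the intermediates then merged into the final one.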

    If you want to try the MapInfo Virtual Raster concept I can help you with it. In 17.03 it is an unofficial feature and subject to change.

    In case of emergency, we can break the glass. If you have problems with Merge crashing, there is an unofficial alternative "Join" operation which I can help you access. Also, there is another approach that may be "technically possible". In MapInfo Pro Advanced there is an operation called "Export Image". This uses the new raster rendering engine to render a raster out to an image - which can be an MRR. That doesn't help you, but what we may be able to do is to write a rendering algorithm (XML file) that renders all the original source rasters, and spit that out to an MRR. I have started posting about this hidden rendering algorithm capability here.

    Thanks for getting in touch - you are pushing the software towards its limits. I hope this advice helps. I am in your time zone so contact me at sam.roberts@pb.com if you want to try some of the other ideas I have mentioned or if you have any other questions.

    Regards,

    Sam



    ------------------------------
    Sam Roberts
    Engineer, MapInfo Pro Advanced (Raster)
    Australia
    ------------------------------



  • 3.  RE: Limits -in converting ecw/tif to mrr?

    Posted 07-07-2019 20:17
    Thanks for the detailed response. It's very valuable background to dealing with mrr operations.

    I'm splitting the work into three, one per resolution, to get 10cm, 20cm and 80cm MRRs, and then I am planning on layering them in a workspace for upload into Spectrum. This enables us to access each one separately and also have a best available imagery base map.

    Do you know what the .pprc and .ghx files that get created are for? They are named differently from the .tab and .mrr: the MRR is imagery_10cm.mrr but the .ghx is imagery_10cm.mrr.ghx.

    I used JPEG at compression level 3, which brought a 179 GB ZIP-compressed file down to 56 GB.

    Regards,

    ------------------------------
    George Corea
    Mangoesmapping
    ------------------------------



  • 4.  RE: Limits -in converting ecw/tif to mrr?

    Pitney Bowes
    Posted 07-07-2019 20:39
    An MRR is supposed to contain "base resolution" data as well as "overviews" which are data levels stored at a progressively lower resolution. When you display an MRR we grab data from a resolution level that is appropriate for the scale of the view, which means you can display rasters of any size in constant time. We also generate "underviews" on the fly which are interpolated to higher resolution from the base level.
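
    As a rough illustration of how a resolution level can be picked for a view, here is a sketch that assumes each overview level doubles the cell size of the one below it. That halving pyramid is a common convention and an assumption here, not a statement of how the MRR format lays out its levels. The function names are hypothetical.

```python
import math


def overview_level(base_cell_cm, view_cell_cm):
    """Coarsest level whose cells are no larger than one screen pixel.

    Level 0 is the base resolution. Each level above it is assumed to
    double the cell size (an assumption for this sketch).
    """
    if view_cell_cm <= base_cell_cm:
        return 0  # zoomed in past base resolution; "underviews" territory
    return int(math.floor(math.log2(view_cell_cm / base_cell_cm)))


def level_cell_size(base_cell_cm, level):
    """Cell size at a given level under the doubling assumption."""
    return base_cell_cm * 2 ** level


for view_cm in (5, 10, 25, 100, 1000):
    level = overview_level(10, view_cm)
    print(f"screen pixel = {view_cm} cm -> level {level} "
          f"({level_cell_size(10, level)} cm cells)")
```

    Because the chosen level always has roughly one cell per screen pixel, the amount of data read depends on the size of the window rather than the size of the raster, which is why display time stays constant.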

    A .pprc file is actually an MRR file that just contains the overviews. We generate these for rasters that do not have overviews already - like TIFF files and other formats. If you are seeing a .pprc file associated with an MRR file (using the naming convention you mentioned) then it means you have generated an MRR without overviews. This is what I call a "naked" MRR. In general, you should not create naked MRRs.

    Unfortunately, we have made it very easy to create naked MRRs by accident. When you run any processing operation there will be a section of properties at the bottom of the dialog under "Output Settings". In here there will be a checkbox labelled "Display Output File". This not only controls display, it also silently turns on or off the generation of the overviews. Always tick this box "on" so you get overviews in the MRR. I have been begging for this functionality to be decoupled from this checkbox for years.

    If you have a naked MRR then the first time you open the MRR to view it, the .pprc file will be generated. This might take hours... Also, when you ship a raster, you should also ship the .pprc file, if it has one. Otherwise, it will be generated at the other end the first time you open it.

    The .ghx file is an XML file. If you display the raster in MapInfo Pro then the .ghx file will contain the rendering information. It might also contain statistics. For an MRR, the statistics will not be in the .ghx - they are stored in the MRR file.

    ------------------------------
    Sam Roberts
    Engineer, MapInfo Pro Advanced (Raster)
    Australia
    ------------------------------