Data Sandbox


What do you do with old data?

  • 1.  What do you do with old data?

    Pitney Bowes
    Posted 05-19-2019 16:45

    I was recently asked if we could provide historical risk information for an insurance application.  Typically the GIS data we provide is a static representation of the real world.  For example, every year we release a data set that represents US coastlines.  Users of this data get a new set of coastlines that they typically just drop into their systems, overwriting the ones they previously had.  Insurers may keep old coastlines in an archive for historical rating processes, but mortgage underwriters and others who use the coastline data just replace their current version with the most recent release.

    What's missing in the replacement process I just described is that it doesn't allow us to capture and summarize the changes in data over time.  If the coastline has moved inward 100 feet due to hurricane storm surge, we don't capture that change in the latest data set.  And unless we keep the old coastline on the system and do some form of change detection, we can miss it.  Similarly, when a post code is split into two new postal delivery areas it can be hard to track which old polygon has been split.  We often deliver these types of changes in change log files and release notes, but it is hard for users to integrate all of the new information without going through each change feature by feature.

    The really big advantage that is missed by not handling old data is the identification of trends.  If, for example, an insurer were able to determine that a particular part of the coastline is receding in a given direction, they could price policies accordingly.  We have captured trends in demographic and geodemographic data but not so much with physical features.  In the demographic and geodemographic data the trends are captured as statistical attributes.  Physical features are tougher because we don't tend to describe a coastline as moving 100 feet north-west in an attribute.  Trend information can help companies plan for oncoming changes.  For example, knowing where the optical fiber is buried can help planners know where to deploy 5G resources.

    One solution is to just supply the old data sets and let people figure out how to identify changing trends in the data themselves.  But that is a lot of data to keep around, and what methods will be used to identify changes?  Another is to pick trends and capture them with each release of the data.  This approach would save storage space and file management time.  Some data sets that we deliver actually have the temporal information included.  Our weather data, for example, has the date and time of each weather event recorded as attributes and records events from 1995.  Each cell tower record can have the date it was built or modified captured as an attribute.  But changes to physical features are harder to identify and describe.  Keeping records of the changes from one release to the next is fine, but what if we want to see changes over a period of years that might span several data releases?  Our PSAP boundary data has shown a steady decrease in the number of boundaries over recent years.  Are there easier ways to identify these trends and represent them in the datasets that we provide?

    One part of the solution is to have unique IDs for each geographic feature in the data.  If we can get the IDs to be persistent then we can quickly identify features that have changed over time.
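
    As a rough illustration of why persistent IDs help, here is a minimal Python sketch that compares two releases keyed by feature ID.  Everything in it is an assumption made for the example: the function name diff_releases, the IDs, and the WKT-style geometry strings are hypothetical and do not describe any actual product schema.

```python
from typing import Dict, List, NamedTuple


class ReleaseDiff(NamedTuple):
    added: List[str]    # IDs present only in the newer release
    removed: List[str]  # IDs present only in the older release
    changed: List[str]  # IDs in both releases whose value differs


def diff_releases(old: Dict[str, str], new: Dict[str, str]) -> ReleaseDiff:
    """Compare two releases keyed by a persistent feature ID.

    Each value is any comparable representation of the feature
    (here a geometry string); equality is a plain string comparison.
    """
    old_ids, new_ids = set(old), set(new)
    added = sorted(new_ids - old_ids)
    removed = sorted(old_ids - new_ids)
    changed = sorted(i for i in old_ids & new_ids if old[i] != new[i])
    return ReleaseDiff(added, removed, changed)


# Hypothetical releases: F2 moved, F3 was dropped, F4 is new.
release_2018 = {
    "F1": "LINESTRING (0 0, 10 0)",
    "F2": "LINESTRING (0 5, 10 5)",
    "F3": "LINESTRING (0 9, 10 9)",
}
release_2019 = {
    "F1": "LINESTRING (0 0, 10 0)",
    "F2": "LINESTRING (0 4, 10 4)",
    "F4": "LINESTRING (0 7, 10 7)",
}

diff = diff_releases(release_2018, release_2019)
print(diff)
```

    The comparison only works if an ID means the same feature in both releases.  A plain string comparison would also flag cosmetic differences such as coordinate precision, so a real change detector would likely need a geometric tolerance.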



    ------------------------------
    Lamont Norman
    Product Manager - GeoEnrichment and Telco
    Pitney Bowes
    Boulder, CO
    ------------------------------


  • 2.  RE: What do you do with old data?

    Pitney Bowes
    Posted 05-20-2019 10:08
    One way to analyze the trends would be using R to pull out the stats you're interested in and creating some graphics (box plot, map, etc.) to display them in a way that's easily understood. However, to write the code to do that you need to have a general idea of what you're looking for; if the data you have doesn't include an intuitive way to identify what could be "trendworthy", it would be a tough exercise. Maybe even more difficult would be hosting all of that data to pull the stats out of. My computer definitely can't handle that much data. Definitely a good idea to have unique IDs for features so that we can more easily track changes/trends.
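
    To make the "pull out the stats" step concrete, here is a minimal sketch of one such statistic, written in Python rather than R.  It fits a least-squares line to a feature count per release year, so the slope is the average change per year across several releases.  The function name linear_trend and the counts are hypothetical, chosen only to illustrate the calculation; they are not real product figures.

```python
from typing import Dict


def linear_trend(counts_by_year: Dict[int, int]) -> float:
    """Return the least-squares slope: average change in count per year."""
    years = sorted(counts_by_year)
    n = len(years)
    if n < 2:
        raise ValueError("need at least two releases to estimate a trend")
    mean_x = sum(years) / n
    mean_y = sum(counts_by_year[y] for y in years) / n
    # Slope = covariance(year, count) / variance(year)
    sxy = sum((y - mean_x) * (counts_by_year[y] - mean_y) for y in years)
    sxx = sum((y - mean_x) ** 2 for y in years)
    return sxy / sxx


# Hypothetical number of boundaries in four consecutive releases.
boundary_counts = {2015: 1000, 2016: 990, 2017: 980, 2018: 970}

slope = linear_trend(boundary_counts)
print(f"average change per year: {slope:+.1f}")
```

    Only one small number per release is needed here, so the full history would not have to sit on one machine as long as each release is summarized when it ships.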

    ------------------------------
    Briana Brown
    Data Product Management & Marketing
    New York, NY
    ------------------------------



  • 3.  RE: What do you do with old data?

    Pitney Bowes
    Posted 05-20-2019 14:15
    One of the challenges of time series analysis like this is consistency in data. You have to know about consistency in data capture, sources, methodologies, processing and so on. Any change in these can affect the comparisons you make across historic data.

    ------------------------------
    Andy Bell
    Director
    Global Data Product Management
    Pitney Bowes Software & Data
    Leeds, UK
    ------------------------------



  • 4.  RE: What do you do with old data?

    Pitney Bowes
    Posted 05-20-2019 14:54
    Agreed.  That is why I keep thinking about a modeled solution as the consistency and quality of the data diminishes.  We do, however, have some consistently good quality data for the last 15 years in several cases.  We'd have to assign IDs to the older data records and work out a way to keep the IDs from changing.

    ------------------------------
    Lamont Norman
    PITNEY BOWES SOFTWARE
    BOULDER CO
    ------------------------------