Data Sandbox

Measuring the Maturity of Datasets

  • 1.  Measuring the Maturity of Datasets

    Pitney Bowes
    Posted 04-10-2019 16:18

    For the past couple of weeks I've been investigating ways to measure the relative maturity of POI datasets from country to country, and am looking for some input from the community… Would you expect to find a correlation between the number of POIs and other types of information, like population, or the length of the street network, or something else? Would you expect this correlation to be different for developed countries vs. developing countries? And does anyone have thoughts on what measurable properties of a POI dataset should be considered when assessing the maturity of the dataset?



    ------------------------------
    Colleen Reed
    Knowledge Community Shared Account
    ------------------------------


  • 2.  RE: Measuring the Maturity of Datasets

    Pitney Bowes
    Posted 04-16-2019 15:39
    I think the measures could change depending on the type of POI you are evaluating. Perhaps there's a correlation between clusters of hotels and airports within a certain driving distance. How about an expected higher percentage of grocery stores in suburban neighborhoods? Will you find more pharmacies near retirement communities?

    In some scenarios population density could be a good marker, in both developed and developing countries. We have found that there's an expected number and type of neighborhood coverage based on population density. For example, for a city of size X with a density of Y we'd expect 70% of the population to be within a named neighborhood region, whereas for one of size A with a density of B we'd expect 90% of the population to be within a named neighborhood region. For POIs, perhaps a city of size X with density Y has Z POIs in a given geography (this may vary by country, region, etc.).
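    As a rough illustration of the "city of size X has Z POIs" idea, here is a minimal sketch that fits a log-log line of POI count against population over a set of reference cities, then compares a city's observed count with what the fit predicts. The function names and the choice of population as the only predictor are assumptions for illustration; density or region could be added as further predictors, and any input figures would have to come from your own data.

```python
import math


def fit_loglog(populations, poi_counts):
    """Least-squares fit of log10(poi_count) = a + b * log10(population)."""
    xs = [math.log10(p) for p in populations]
    ys = [math.log10(c) for c in poi_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def expected_pois(population, a, b):
    """POI count the fitted line predicts for a given population."""
    return 10 ** (a + b * math.log10(population))


def coverage_ratio(population, observed_pois, a, b):
    """Observed / expected; values well below 1 hint at thin coverage."""
    return observed_pois / expected_pois(population, a, b)
```

    A ratio well below 1 only flags a city as worth a closer look; it does not by itself prove the dataset is incomplete, since the reference cities may not be representative.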


    ------------------------------
    Cecily Herzig
    PITNEY BOWES SOFTWARE, INC
    Maitland FL
    ------------------------------



  • 3.  RE: Measuring the Maturity of Datasets

    Pitney Bowes
    Posted 04-18-2019 09:53

    This is an interesting question. Maturity will depend largely on where you are looking from and what your intended use case is. You could expect POI data to be richer in a country whose data universe is more developed, i.e. one with more data product types available: full postcodes, certain demographics, streets, a full set of boundaries, mobile trace, etc. But maturity will also be measured on the completeness and coverage of a country's POI dataset, on the quality of the geocoding available in the country, and on currency, i.e. how quickly new stores and closed stores get reflected in the data. Completeness will always be a hard one to measure, as the real world is constantly changing and reliable real-world counts are often hard to come by.



    ------------------------------
    James Morgan
    UK Data Development Manager
    Pitney Bowes
    Henley-on-Thames, UK
    ------------------------------



  • 4.  RE: Measuring the Maturity of Datasets

    Pitney Bowes
    Posted 04-19-2019 08:23

    Thanks Cecily and James. I am trying to assess the relative POI dataset maturity for ~150 countries. Comparisons to the real-world population of POIs would be ideal but this information is not readily available. For each dataset, I have looked at total POI counts relative to each country's population, length of street network, land area, and population density, and am not seeing a clear trend.
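    One way to check for a trend across ~150 countries without assuming it is linear is a rank correlation. The sketch below computes Spearman's rank correlation between POI counts and one candidate variable (population, street length, land area, and so on) using only the standard library. The function names are illustrative assumptions, and the input lists would need to be in the same country order.

```python
import math


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))
```

    Running this separately per POI category, or separately for urban and rural areas, may show a relationship that the all-POI totals hide.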

    I suspect I will need to break down each POI dataset and look at a subset of POI categories, as well as the POI distribution within urban vs. rural areas.



    ------------------------------
    Colleen Reed
    Knowledge Community Shared Account
    ------------------------------



  • 5.  RE: Measuring the Maturity of Datasets

    Pitney Bowes
    Posted 05-16-2019 17:42
    Hi Colleen - here are a couple of possible approaches to measuring completeness, though I don't know if there are any reliable rules of thumb (# of POIs per capita or per mile):

    One approach could be to measure brand (McDonald's, Macy's, Exxon, etc.) completeness by country, as that information is more readily available than it is for 'Mom-and-Pop' businesses. It's free or relatively inexpensive to get listings of brands (for example, all the McDonald's in Canada) and then compare that to your target POI dataset. You could do that for some key brands and get some reasonable macro-completeness estimations at a country or state level without a lot of fuss. Using Amazon Mechanical Turk you can also do larger samples including Mom-and-Pop locations without too much difficulty.
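    The brand comparison could be sketched along these lines: count how many POIs in the target dataset carry each brand name and divide by the number of locations the brand's own listing reports. This is a count-based macro estimate only; it does not match individual locations. The function names and the simple name normalization are assumptions for illustration, and real data would likely need more careful brand matching.

```python
from collections import Counter


def normalize_brand(name):
    """Lowercase and drop punctuation/spaces so "McDonald's" matches "MCDONALDS"."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def brand_completeness(poi_names, reference_counts):
    """Per-brand share of reference locations present in the POI dataset.

    poi_names: brand/name field of every POI in one country's dataset.
    reference_counts: {brand: number of locations per the brand's own listing}.
    """
    found = Counter(normalize_brand(name) for name in poi_names)
    completeness = {}
    for brand, reference in reference_counts.items():
        if reference <= 0:
            continue  # no reference figure to compare against
        matched = found[normalize_brand(brand)]
        # Cap at the reference count so duplicates cannot push the ratio above 1.
        completeness[brand] = min(matched, reference) / reference
    return completeness
```

    Averaging the per-brand ratios for a fixed basket of brands would give one macro-completeness figure per country, with the caveat that a brand absent from a country has to be excluded rather than scored as zero.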

    Another approach would be to do a survey of selected geographies, either using field visits in reality (for example, using a service like Gigwalk or Field Agent) or by comparing to a proxy reality (for example, using recently collected street-level imagery). That would allow an assessment of micro-completeness, and if you spatially distributed your samples sufficiently you would get a sense of the quality as well.
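    For a sampled survey like this, the result is a proportion (POIs verified on the ground that also appear in the dataset), so it helps to report an interval rather than a single number. Below is a minimal sketch using the Wilson score interval; the function name and the default z value of 1.96 (roughly 95% confidence) are choices made for illustration, and it assumes the sampled locations are reasonably independent.

```python
import math


def wilson_interval(found_in_dataset, verified_total, z=1.96):
    """Wilson score interval for the share of verified POIs found in the dataset."""
    if verified_total <= 0:
        raise ValueError("verified_total must be positive")
    n = verified_total
    p = found_in_dataset / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width
```

    With small samples the interval will be wide, which is a useful reminder of how many field checks a per-country estimate actually needs.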

    One thing to watch out for with POIs is that they are quite volatile in reality (stores open, go out of business, move, and change names with dizzying frequency), so the datasets can decay rapidly and your measurements will likewise get stale quickly.

    Thanks
    Tom

    ------------------------------
    Tom Gilligan
    Pitney Bowes
    White River Junction, VT, USA
    ------------------------------



  • 6.  RE: Measuring the Maturity of Datasets

    Pitney Bowes
    Posted 05-18-2019 10:26
    Hi Tom - thanks for proposing a few additional approaches. I very much appreciate this, as I'm still working to figure this one out. One of the bigger challenges is that I am trying to develop a scale that shows relative maturity of POI data from country to country, and the data covers ~150 countries. Because of this, I have to consider the practicality and repeatability when developing the scale.

    I am now looking at a subset of POIs (for example, using only business POIs instead of the full set, which includes things like landmarks - and I may need to further segment the data by business category), and experimenting with weighting the data based on the maturity of the street network in each country...
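    To make the weighting idea concrete, here is one possible sketch, not the method actually used: business POIs per kilometer of street, multiplied by a street-network maturity factor between 0 and 1, then rescaled across countries to a 0-1 relative scale. The maturity factor, the multiplication, and the min-max rescaling are all assumptions for illustration; the reasoning is that an incomplete street network understates street length and so inflates POIs per kilometer.

```python
def weighted_density(business_pois, street_km, street_maturity):
    """Business POIs per street km, discounted by street-network maturity.

    street_maturity is a hypothetical factor in (0, 1]; 1 means the street
    network is considered complete.
    """
    if street_km <= 0:
        raise ValueError("street_km must be positive")
    if not 0 < street_maturity <= 1:
        raise ValueError("street_maturity must be in (0, 1]")
    return (business_pois / street_km) * street_maturity


def relative_scale(scores):
    """Min-max rescale {country: score} so the lowest is 0 and the highest is 1."""
    lowest = min(scores.values())
    highest = max(scores.values())
    if highest == lowest:
        return {country: 0.0 for country in scores}
    return {
        country: (value - lowest) / (highest - lowest)
        for country, value in scores.items()
    }
```

    Because every step is a simple formula over per-country totals, the scale can be recomputed each time the datasets are refreshed, which helps with the repeatability concern.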

    ------------------------------
    Colleen Reed
    Pitney Bowes
    White River Junction VT
    ------------------------------