Combining Datasets with pbKey

    Pitney Bowes
    Posted 11-12-2018 09:27
    Edited by David Bokor 11-14-2018 08:01
    It's now easier than ever before to collect data and extract value out of it. Let's say you have a customer list of thousands or even millions. You know you can offer these customers more if you could get to know them better. But how do you get started?

    The first step is collecting data. Besides Pitney Bowes, there are several sources of good data available (e.g. MLS, Black Knight, CoreLogic, Experian, Acxiom). Most of these datasets are based on addresses: for a specific address they may provide dozens of attributes. Demographics, property attribution, risk attributes, and similar types of information can help you get a well-rounded picture of your customer. Addresses are a logical choice since it's often easier to understand a household than an individual.

    The next step is combining these datasets to build a single record for each address that has all the variables we need. We'll quickly find that since this data came from different vendors, none of the addresses match exactly: one dataset says "210 Main Street" but another says "210 Main St.". We need to do a little extra work to match these properly.

    Traditionally, we've used a geocoder to standardize addresses. This can get you most of the way there but there is still a major problem to overcome. What do these addresses have in common?

    9251 MESA DRIVE, HOUSTON, TX, 77028-1664
    9251 MESA ROAD, HOUSTON, TX, 77028-1657

    Both are, in fact, the same address. How can we match up addresses if they don't even look alike? Of course, I wouldn't be writing this if I didn't have a solution. Pitney Bowes has done the hard work of creating a unique ID, the pbKey, for over 190 million addresses in the country. This key does not change over time. If the house is torn down to build two smaller houses, that key is retired. In our scenario above, both addresses are matched to the same pbKey: P0000M3K6X29

    Hopefully you can already see how this can make it easier to join these datasets. We run all the addresses from each dataset through the geocoder to get the pbKey for each address. We join all the datasets based on this pbKey value and create a single, master record for each address. We can even do this in Hadoop where millions of addresses can be processed in just minutes.
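
The join step can be sketched in a few lines of Python. This is only an illustration, not Pitney Bowes code: it assumes every record has already been run through the geocoder and carries its result in a field named `pbKey`, and the function name `merge_by_pbkey`, the attribute names, and the sample values are all made up for the example.

```python
from collections import defaultdict


def merge_by_pbkey(*datasets):
    """Build one master record per pbKey from several lists of records."""
    master = defaultdict(dict)
    for dataset in datasets:
        for record in dataset:
            key = record["pbKey"]
            for field, value in record.items():
                # Skip the join key and the vendor-specific address spelling.
                if field in ("pbKey", "address"):
                    continue
                # On a field-name clash, the later dataset wins.
                master[key][field] = value
    return dict(master)


# Made-up sample rows; attribute names and values are illustrative only.
demographics = [
    {"pbKey": "P0000M3K6X29",
     "address": "9251 MESA DRIVE, HOUSTON, TX, 77028-1664",
     "household_income": 85000},
]
property_data = [
    {"pbKey": "P0000M3K6X29",
     "address": "9251 MESA ROAD, HOUSTON, TX, 77028-1657",
     "year_built": 1972},
]

master = merge_by_pbkey(demographics, property_data)
print(master)
```

Because the two rows share a pbKey, they collapse into a single master record even though their address strings differ. At larger scale the same idea is an ordinary key join in whatever engine holds the data.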

    Not only can we understand our current customers better, we can use what we know about them to find new customers with similar attributes. Suppose our best customers are baby boomers making under $100k per year. We can send a mailing that targets similar households in other neighborhoods. Now that all this information is in one place, it's much easier to unlock the value and grow your business.

    ------------------------------
    David Bokor
    Distinguished Engineer
    Pitney Bowes
    Troy NY
    ------------------------------