This blog entry builds on a previous posting which gives an introduction to downloading and setting up Enterprise Designer and using the Query Spatial Data Stage.
The Find Nearest Stage is found within the Location Intelligence sub-folder in Enterprise Designer.
As you might expect, this stage locates the points of interest that are nearest to a given location. These can be points (a cell phone tower), lines (a road), or polygons (a park or playing field). The data you use can contain any of these types of geometries. The nearest here is calculated “as the crow flies” meaning it is a straight line between points as opposed to driving distance.
Find Nearest Stage
Here we dig deeper into the Find Nearest stage and take a look at an example of finding the nearest fire stations to a set of points. This could be used in the insurance industry where the points represent properties and the distance from the closest fire station is used to determine the fire insurance risk of each property. Other examples could be nearest bank ATMs to homes, restaurants within certain distances of hotels, and so on.
Finding the Fire Stations closest to an address
In this example I’m going to walk through how to use the Find Nearest stage to identify the three closest fire stations to each home that has an insurance policy. There are 16,832 house addresses on my list that need to be calculated in the Find Nearest stage and I am using the Pitney Bowes US Fire Station Data (Fire_Station_Bundle_2017_03) which has 47,150 records.
We’ll look at how to use Spatial Manager to make a connection to the data and then create named tables for use in Enterprise Designer. I’ll then set up a data flow to query for the closest fire stations, test and run the flow, and then display the results.
Step 1: Make the data available to the Spectrum Spatial machine
I need to make sure the data I want to work with is accessible to the Spectrum server. This could be on a share drive, or physically copied to the server (which would bring the best performance).
Step 2: Access my data through Spatial Manager by creating a connection
Once the datasets are on the machine where Spectrum is installed, you need to create a connection to the data in Spatial Manager. Named connections are stored in the repository. This is a way for Spatial Manager to keep track of the location of the data.
To create a connection:
- Go to Create > Connection from the main menu in Spatial Manager.
- In the Create Connection page, type in a name for the Connection and browse to the folder containing the data on the file system.
In this example I am creating file data source connections using file-based tables, but a database can also be used.
- Click Create to set up the Named Connection.
Once this is created, Spatial Manager can access the tables within the connection. You may have one connection per data set, or you may have one connection with all the data underneath. This is something that needs to be decided when setting up the data for use in Spectrum Spatial.
Full details can be found in the product documentation.
Step 3: Create a Named Table to use the data in Enterprise Designer
Now a connection is created, I need to create two named tables to allow the data to be used in Enterprise Designer.
To create a Named Table:
- Go to Create > Table from the main menu in Spatial Manager.
- In the Create Table page, choose the Connection you just created, and you will see the Connection location and Source Folder.
- Choose the dataset you want to use and create the Named Table.
- Optionally choose Settings for your table.
- Provide the name of the table and browse to the folder in the Repository where you want to store your named table.
- Click Create, and your table is ready to use.
We can take a look at the data in Spatial Manager to confirm. You can view a map preview of the data and display the first few rows:
And you can see an overview of the data in the Resource Info tab:
Here is more detailed information about creating Named Tables in the product documentation.
Step 4: Set up the dataflow
Now we have our data available as Named Tables in Spatial Manager, we can create our dataflow. This blog posting has details on how to launch Enterprise Designer and create a flow.
We will create a flow composed of four stages:
- Read Spatial Data – Accesses the data from the named tables to use as input for the Find Nearest stage. It reads the insurance policies one at a time, to search against the fire stations and find the closest ones in the next stage.
- Find Nearest – Calculates the fire stations closest to the policy holders using the data inputted in the Read Spatial Data stage.
- Sorter – Organizes the results of the Find Nearest calculation.
- Write to File – Outputs the results of the sorted calculation to a file. You should create a blank file (I’m using csv format) to capture the results.
Here is the flow:
Step 5: Configure the stages
a. Configure Read Spatial Data
As we want to know the closest fire stations to our full list of policy holders, we will use all the records from the US_Policies table. We can copy the path to the data from the Resource Info tab in Spatial Manager (circled in the screenshot) and paste it into the MISQL query. On clicking Verify you can see the column names and check you are using the correct dataset.
b. Configure Find Nearest
Browse to the US_Firestations named table because this is the dataset we want to use for the distance calculations against the policy holders. The stage will automatically suggest the X and Y columns from the table.
In the Settings tab:
- Set the number of returned records to 3 as this is the number of firestations I want returned. This could be any number but the larger the number the longer the calculation will take.
- Optionally limit the search distance. I am leaving this blank as I don’t need to filter by distance, but again this could be used to enhance performance in a large dataset.
- Keep the default setting to return the Distance for each record as this will be part of the results.
- I am not returning the Count or Index fields as I will be returning 3 records for each input value.
In the Output tab select any columns from the Fire Stations data that you want to see in the results table. I’m choosing some address information.
c. Configure the Sorter
The sorter stage allows you to order the results in the output file.
We want the results ordered by Policy Number, and the fire stations listed with the closest first.
d. Configure Write to File
First browse to the csv file you set up earlier to capture the results. Add the properties about the file such as Delimited record type, US-ASCII etc. Check the box to make the first row of results the header.
In the Fields tab you have the option of picking fields from the two named tables to return in the results and selecting the order they should appear in the output file.
As you add the fields, they are populated in the Sample area underneath so you get an idea of what the output file will look like.
Step 6: Process the flow
Once the stages are configured, use the Validate tool and fix any mistakes found. Then Run the flow.
You will notice that although the status is “running”, the totals are only updated periodically during the processing until the job finishes. Once completed the status will change to “succeeded”. This particular case of running the 47,150 fire station records against the list of 16,832 policy holders took 51 seconds and created a list of 50,496 results.
Step 7: Have a look at the results
Here we can see the list of policy holders, with entries per policy for the three nearest fire stations listed in ascending order.
I have added highlighting to a selection from the output csv to make it easier to see the groupings of 3 fire stations per policy holder. Due to the nature of the data some of the fire stations have the same “Name” because they are in the same area or department, but you can see from the address that they are indeed different physical stations.
Thank you for taking the time to read this post. Please leave your comments and questions below. I’d love to hear your suggestions for more topics to cover in the future. As always, you can start your own conversation about Find Nearest, or anything else for Spectrum Spatial in the community.