Improved US Census Data Coverage Through Address Geocoding

The DOTS Address Geocode International (AGI) web service has been updated to return US census information from the latest data released by the US Census Bureau. Additionally, the service now returns this data more often than before.

While the main focus of our geocoding services is exactly that, geocoding, we appreciate the value that additional information such as census data can offer. However, when working with disparate sets of data, we are unfortunately not always able to return every piece of information. Coverage in one dataset may have holes where another does not, and combining datasets isn’t always possible due to large differences in schema and format. As a result, the previous version of the AGI web service was unable to return census information as often as we would have liked. We are proud to say that we have completely restructured how we import and query the census data so that we can now return this information much more often.

Working with the US Census Data

The US Census Bureau is in charge of collecting, producing and organizing the decennial census data in the US. While the data is collected every decade, it does take the bureau about a year to start releasing the various large sets of data that they have processed. Not all data sets are released at once, and the US Census Bureau does not stop releasing data just after that first year. They continue to release periodic updates to the data set each year. The 2010 census data, for example, continued to receive updates up until late 2019.

Previously, we would painstakingly import the US Census data by associating the spatial geocoordinate data with an address that is closely tied to the high-quality United States Postal Service (USPS) address data that we leverage from our address validation services, such as DOTS Address Validation 3. We were very proud of this method because it helped ensure address accuracy as well as precision. Using this approach, we were able to achieve census data coverage on approximately 94% of all USPS Delivery Point Verified (DPV) addresses. While 94% may be considered a solid A as far as grades in general go, you may be thinking, “why not 100% coverage?” Well, we’re glad you asked.

Census Data Coverage

While there are many reasons why 100% coverage was not possible, it boils down to the fact that mailing addresses and physical addresses are not the same thing. Not all physical addresses are mailable and, conversely, not all USPS mailing addresses can be accurately represented by a US Census address counterpart. Take PO Boxes, for example: a PO Box is a valid mailing address, but it does not correspond to a physical residence.

Another example of how mailing addresses and residential addresses differ is ZIP codes. ZIP codes are a product of the USPS, created for the purpose of delivering mail. The US Census Bureau, on the other hand, aggregates ZIPs for statistical purposes, and therefore there is no one-to-one relationship between USPS ZIP codes and the US Census ZIP Code Tabulation Areas (ZCTAs).

While not all addresses are residential, business addresses being one example, we were still able to identify when they fell within the scope of census elements such as census tracts and census blocks. This helped boost census coverage, but overall the previous method of importing and processing the census data was cumbersome, time-consuming and filled with challenging hurdles that limited its usage.

Focused on Geocoding

Relying on USPS address data works well for ensuring accurate address matching during geocoding, but it hinders census data coverage. Therefore, we made the decision to change the way we work with the US census data in a way that emphasizes coverage while maintaining a high level of accuracy. This is achieved by remaining focused on geocoding and disassociating the US Census data from the USPS data. Instead of matching on addresses as we previously did, the US census data is now spatially queried using the geocoded latitude and longitude coordinates. By disconnecting the census data from the address data, we are free to query the data at any time with our resulting coordinates.
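To illustrate the general idea of a spatial query, here is a minimal Python sketch of a point-in-polygon lookup using the standard ray-casting test. It is not the AGI service’s implementation. The tract identifiers, the rectangular boundaries and the function names are all hypothetical and exist only for illustration; real census boundaries are far more detailed, and a production system would typically rely on a spatial index rather than a linear scan.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, ring: List[Point]) -> bool:
    """Ray-casting test: cast a ray east from the point and count edge crossings."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical tract boundaries (made-up identifiers and simple rectangles).
TRACTS: Dict[str, List[Point]] = {
    "tract-A": [(-120.0, 34.0), (-119.0, 34.0), (-119.0, 35.0), (-120.0, 35.0)],
    "tract-B": [(-119.0, 34.0), (-118.0, 34.0), (-118.0, 35.0), (-119.0, 35.0)],
}


def find_tract(latitude: float, longitude: float,
               tracts: Dict[str, List[Point]] = TRACTS) -> Optional[str]:
    """Return the id of the first tract containing the coordinate, or None."""
    for tract_id, ring in tracts.items():
        if point_in_polygon((longitude, latitude), ring):
            return tract_id
    return None
```

Because the lookup depends only on a coordinate pair, it works the same way whether or not the coordinate came from a USPS-deliverable address, which is the property the new approach relies on.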

This means that we can return US Census information for just about any set of applicable coordinates, with the key word here being applicable. For instance, coordinates for a US military base on foreign soil would not be expected to return census information. While military bases are technically considered US soil, they are not within the purview of the US Census data. That’s not to say that military personnel living abroad are unaccounted for when it comes to the census. The Census Bureau uses administrative data from the Department of Defense (DOD) to count them, and any dependents living with them overseas, at their home state of record in the United States.

Foreign soil aside, there are other reasons why census data may not always be returned, one of which is the resolution level of the geocode. Not all places can be geocoded to a level of precision where it is reasonable and accurate to return census data using latitude and longitude coordinates. For example, if an input can only be geocoded to the city level, then a centroid coordinate point for that city will be returned. Using that coordinate point to return a census tract and block would be misleading, since the result would describe the centroid rather than the intended location. This is why the Address Geocode International (AGI) service will only return census information when instructed to in the search parameters and when it is appropriate.
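The decision described above can be sketched as a simple guard. This is only an illustration under stated assumptions: the resolution levels, the STREET threshold, the country-code check and the function name are hypothetical and are not taken from the AGI service’s actual parameters or response fields.

```python
from enum import IntEnum


class Resolution(IntEnum):
    """Hypothetical geocode resolution levels, ordered from coarse to fine."""
    COUNTRY = 1
    ADMIN_AREA = 2
    LOCALITY = 3   # e.g. a city centroid
    STREET = 4
    PREMISE = 5


# Assumed threshold: anything coarser than street level is treated as too imprecise.
MIN_CENSUS_RESOLUTION = Resolution.STREET


def should_return_census(requested: bool, country_code: str,
                         resolution: Resolution) -> bool:
    """Return census data only when asked for, in the US, and precise enough."""
    return (
        requested
        and country_code.upper() == "US"
        and resolution >= MIN_CENSUS_RESOLUTION
    )
```

With these assumptions, a premise-level US result qualifies, while a city centroid, a non-US result or a request that did not ask for census data does not.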

Overall, census data coverage has increased and we are now able to return census information more often than before, making the service more powerful, precise and accurate.