Address Deduplication Using USPS Barcodes
When are two addresses actually the same? And when can you remove one of them from your contact database?
The answer isn’t as simple as it sounds. Suppose you have two addresses as follows: 429 East Figueroa Street, Apartment 1, Santa Barbara, California, 93101 versus 429 E Figueroa St Apt 1, Santa Barbara, CA 93101. Or suppose that, for only one of the two, the street address and the apartment number are on separate lines. Simple text or line-by-line comparisons aren’t going to work in either case.
However, the United States Postal Service (USPS) can come to the rescue here, thanks to its standards for delivery barcodes.
What is a barcode?
Barcodes are unique identifiers assigned to each deliverable address by the USPS. Each address is assigned a two-digit code between 00 and 99; combined with the ZIP+4 of the address, that code uniquely identifies the delivery point. The complete barcode consists of the ZIP+4, the two-digit code identifying the premises, and a checksum digit that lets barcode sorters verify that the ZIP, ZIP+4, and delivery point code are correct.
Barcode Example: 931011445011
|ZIP+4||Delivery Point Code||Checksum Digit|
|931011445||01||1|
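The structure described above can be checked in a few lines of code. The following is a minimal sketch, not an official USPS routine: the function names are made up for illustration, and it assumes the check digit follows the usual mod-10 rule, in which all twelve digits (check digit included) sum to a multiple of 10.

```python
def parse_delivery_point_barcode(barcode: str) -> dict:
    """Split a 12-digit delivery point barcode into its parts."""
    digits = barcode.strip()
    if len(digits) != 12 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected 12 digits, got {barcode!r}")
    return {
        "zip5": digits[:5],             # 5-digit ZIP code
        "plus4": digits[5:9],           # +4 extension
        "delivery_point": digits[9:11], # two-digit delivery point code
        "check_digit": digits[11],      # checksum digit
    }


def checksum_is_valid(barcode: str) -> bool:
    """Assumed rule: all 12 digits must sum to a multiple of 10."""
    parse_delivery_point_barcode(barcode)  # raises on malformed input
    return sum(int(d) for d in barcode.strip()) % 10 == 0


parts = parse_delivery_point_barcode("931011445011")
print(parts)
print(checksum_is_valid("931011445011"))
```

For the example barcode, the digits sum to 30, so the check passes under this rule.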
How barcodes help you clean up duplicates
In short, barcodes can be leveraged to help identify duplicate records in your address database. The uniqueness of the barcode helps to solve the age-old problem of identifying duplicate data. Let’s go back to the example we mentioned above:
|Address A||Address B|
|429 East Figueroa Street, Apartment 1||429 E Figueroa St Apt 1|
|Santa Barbara, California, 93101||Santa Barbara, CA 93101|
On the surface, these addresses seem very similar. They would both be deemed deliverable by the USPS despite their spelling differences. On one hand, you have Address A spelling out “East”, “Street”, “Apartment”, and “California”. On the other hand, Address B abbreviates these same fields. If you were to address an envelope with either of the spellings, it would reach the same destination.
As a human, looking at the two addresses above, it is easy to figure out that these two addresses are really the same delivery point. As a developer, however, figuring out that the two are the same is a nightmare without some sort of unique identifier. You would break these addresses into their component parts – address, address2, city, state, and zip – and then compare each field for Address A versus Address B.
If you came across any field that didn’t match up perfectly, you would assume the addresses were different and handle them accordingly. At this point it is easy to see that this approach is inadequate and would lead to the misidentification of the Address A/B example above. And even if you tried to write a smarter program, you would quickly discover that this is a complex problem involving fuzzy matching, distance algorithms, and various other string comparison algorithms. If only there were a unique identifier that could be assigned to an address…
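To make the failure concrete, here is a minimal sketch of the field-by-field comparison described above. The `Address` class and `mismatched_fields` helper are hypothetical names used only for illustration, and the way Address A and Address B are split into fields is one plausible parse, not the only one.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Address:
    address: str
    address2: str
    city: str
    state: str
    zip: str


def mismatched_fields(a: Address, b: Address) -> list[str]:
    """Return the names of fields that differ, ignoring case and outer whitespace."""
    return [
        f.name
        for f in fields(Address)
        if getattr(a, f.name).strip().casefold()
        != getattr(b, f.name).strip().casefold()
    ]


address_a = Address("429 East Figueroa Street", "Apartment 1",
                    "Santa Barbara", "California", "93101")
address_b = Address("429 E Figueroa St Apt 1", "",
                    "Santa Barbara", "CA", "93101")

# Any non-empty result means the naive approach treats the two as different.
print(mismatched_fields(address_a, address_b))
```

Three of the five fields disagree, so a strict comparison reports two distinct addresses even though both point to the same delivery point.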
This is where Service Objects’ DOTS Address Validation products shine. On top of the validation of each input, every deliverable address is matched up with its USPS barcode. With these barcodes in hand, it is easy to compare two addresses without having to worry about spelling or standardization differences.
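Once every record carries its barcode, deduplication reduces to grouping on a single key. The sketch below assumes the barcode has already been obtained from a validation step and stored under a hypothetical `barcode` field; it does not call any Service Objects API, and the sample barcode value is simply reused from the example earlier for illustration.

```python
def dedupe_by_barcode(records):
    """Keep the first record seen for each barcode.

    Records without a barcode (for example, addresses that could not be
    validated) are kept as-is, because there is no key to compare them on.
    """
    kept = {}
    no_barcode = []
    for record in records:
        barcode = record.get("barcode")
        if not barcode:
            no_barcode.append(record)
        elif barcode not in kept:
            kept[barcode] = record
    return list(kept.values()) + no_barcode


contacts = [
    {"id": 1,
     "address": "429 East Figueroa Street, Apartment 1, Santa Barbara, California, 93101",
     "barcode": "931011445011"},  # illustrative value
    {"id": 2,
     "address": "429 E Figueroa St Apt 1, Santa Barbara, CA 93101",
     "barcode": "931011445011"},  # same delivery point, different spelling
    {"id": 3, "address": "an address that failed validation", "barcode": None},
]

unique = dedupe_by_barcode(contacts)
print([c["id"] for c in unique])
```

Keeping the first record per barcode is only one possible policy; a real cleanup might instead merge fields or keep the most recently updated record.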
Mailing address input:
Service Objects’ return with barcode:
Detecting duplicate mailing addresses using the address’ USPS barcode is a simple, elegant solution to a complicated problem. If you’d like to try any of our address services, sign up for a free trial key and get your first 500 transactions free.