Address Detective – A Deep Dive

Our DOTS Address Detective service is an address-based service looking to help clients clean up the most challenging of addresses. It works well as a secondary check to our DOTS Address Validation – US service, but is also capable of performing the same duties and can be used as a standalone service.

Address Detective is a utility service intended to house operations outside of the scope of normal address validation that can help in a number of different ways. This article will take a detailed look into its operations.

We will explore the three main operations of Address Detective:

  • The FindAddress operation uses extra datasets to link names and phone numbers to a very messy address to solve problems such as a missing house number.
  • FindAddressLines is a helper operation that can assist in cases where the user might not know what pieces of information they have, or perhaps these pieces are out of order.
  • Finally, FindOutlyingAddresses is an operation that aggregates alternative datasets to identify good addresses that are not within the USPS dataset.

First, however, let’s look at an address validation capability that serves as the foundation for Address Detective: the GetBestMatches operation of our DOTS Address Validation service.

Address Validation – The GetBestMatches Operation

It is impossible to talk about Address Detective without a brief dive into our Address Validation service. At its core, Address Detective builds off of our industry leading GetBestMatches operation, which uses a USPS CASS certified engine to validate, standardize, correct and append informational data points to US based addresses. Its response contains a corrected and/or standardized address consisting of Address, City, State and Zip that can be saved to a CRM or database, or set up for a piece of mail.

This response also contains several other outputs. One is a list of address fragments that breaks the address into its parts, in case a user needs a specific piece such as the apartment number or the street name. DPV and DPV Notes describe the address in more detail, from its deliverability to whether it is vacant, returning mail, or a business or a residence. Corrections indicate anything that might have changed in the address from the input to the response, such as a city change, zip code change or street suffix change. A full list of possible outputs and descriptions can be found in the Operations section of our Developer Guide, under GetBestMatches.

GetBestMatches does have some ability to fix messy addresses. Some of these changes happen within the CASS engine and some outside of it. In the case of an address that has been changed outside of the CASS engine, there is a flag called IsCass that will be set to false to indicate this. Small changes are accepted by us, and these normally find their way into the next iteration of the CASS engine.

The most important thing for the Address Validation service, however, is that any address returned – whether CASS validated or not – is 100% accurate. So, rigorous testing is always done on all sides, and more likely than not drastic or dangerous inconsistencies will cause an address to fail validation. In addition, the dataset is still strictly linked to the USPS dataset.

These are the reasons that Address Detective exists. The operations below will explain how Address Detective can go beyond the capabilities of Address Validation.

Address Detective – FindAddress

FindAddress is the primary operation in Address Detective. With a reasonably clean USPS known address, it functions identically to GetBestMatches, returning a response object so similar that it is essentially interchangeable with that service. This makes it easy for clients to potentially build in a failover and call FindAddress using almost exactly the same procedure they used with the GetBestMatches operation.
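As a rough illustration of that failover pattern, here is a minimal Python sketch. It is not the actual service client: the function names, the dictionary fields and the two stub validators are hypothetical stand-ins, and a real integration would replace the stubs with calls to GetBestMatches and FindAddress.

```python
from typing import Callable, Optional

# Hypothetical shape: a validator takes an address dict and returns a
# result dict, or None when the address could not be validated.
Validator = Callable[[dict], Optional[dict]]


def validate_with_failover(
    address: dict, primary: Validator, fallback: Validator
) -> Optional[dict]:
    """Try the primary validator; call the fallback only if it fails."""
    result = primary(address)
    if result is not None:
        return result
    return fallback(address)


# Stubs standing in for the real service calls (assumptions, not the API).
def fake_get_best_matches(address: dict) -> Optional[dict]:
    # Pretend that only addresses starting with a house number validate.
    if address["Address"][:1].isdigit():
        return dict(address, source="GetBestMatches")
    return None


def fake_find_address(address: dict) -> Optional[dict]:
    # Pretend the fallback always finds something.
    return dict(address, source="FindAddress")


clean = {"Address": "821 N Milpas St", "City": "Santa Barbara",
         "State": "CA", "Zip": "93103"}
messy = {"Address": "Milpas Street", "City": "Santa Barbara",
         "State": "CA", "Zip": "93103"}

clean_result = validate_with_failover(clean, fake_get_best_matches, fake_find_address)
messy_result = validate_with_failover(messy, fake_get_best_matches, fake_find_address)
```

Because the two responses are described as nearly interchangeable, the calling code can treat either result the same way and only needs to record which operation produced it.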

FindAddress helps in cases where the input is either too different from the actual address or incomplete. It has more leeway to make changes than GetBestMatches, and it can also use personal names, business names and/or phone numbers to cross-check against alternate datasets and make more drastic changes.

Take this address for example:

Taco Bell

821 N Milpas St

Santa Barbara, CA 93103

That will validate normally but if you only had something like:

Milpas Street

Santa Barbara, CA 93103

You would not be able to validate this address on its own in an address validation service. However, by using extra pieces of information such as Taco Bell or (805) 962-1114 and cross-checking these against other datasets, we are able to identify that 821 N Milpas is a good candidate. In this case, FindAddress is able to move forward, correct and validate the address, and return the same kind of response as if the clean 821 N Milpas Street address had been given in the first place.

In some cases, even messier addresses will continue to be fixable. For example:

Milpaaas Str

Santa Bar

CF 93103 

This very ‘messy’ address could still be validated using the extra data points. The MatchRate score provided gives some indication of risk for addresses that need to be drastically changed, and this last example would have a lower MatchRate than the previous one.

FindAddress will continue to evolve with newer algorithms and datasets. Currently, it still tries to take a very messy address and find a clean USPS valid address from that. In the future, it should also be able to validate addresses outside of the USPS dataset as FindOutlyingAddresses does (more on that later).

Address Detective – FindAddressLines

FindAddressLines is a good operation to use if you are not sure which address components are which. This commonly happens when data was sloppily collected, stored incorrectly in a database, or corrupted when transferring from one system to another. Address portions can be combined into one field or split up into their own. Instead of the normal Address, City, State, Zip Code paradigm, FindAddressLines allows for up to 10 generic address lines where data can be added in any order. The operation will analyze the components and identify the best candidates for a valid address.

For example, if you had:

Line1: Service Objects

Line2: C/O John Doe

Line3: Floor 5

Line4: 27 E Cota St

Line5: Ste 500

Line6: Santa Barbara

Line7: CA

Line8: 93101

It would correctly identify, validate and return a normal response for:

27 E Cota St STE 500

Santa Barbara, CA 93101

The response looks very similar to that returned by both FindAddress and GetBestMatches. If you were to reverse those lines, the operation still identifies the correct address and returns a good final result:

Line1: 93101

Line2: CA

Line3: Santa Barbara

Line4: STE 500

Line5: 27 E Cota St

Line6: Floor 5

Line7: C/O John Doe

Line8: Service Objects

The operation will start to error out as data elements that cannot be properly identified are added. For example, if 123 Fake Street were added into the mix ahead of 27 E Cota St, it would be identified as a candidate street address, and validation would fail because it is not a real address. At this point, the given data is deemed too dangerous to try to find a good result for.
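To make the idea of order-independent lines concrete, here is a toy Python sketch. It is emphatically not how FindAddressLines works internally: the regular expressions, the function name guess_components and the "exactly one candidate" rule are assumptions chosen only to show why line order does not matter and why a second street-like line makes the input ambiguous.

```python
import re

# Toy patterns, assumed for illustration only; real addresses are far messier.
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")
STATE_RE = re.compile(r"^[A-Z]{2}$")
STREET_RE = re.compile(
    r"^\d+\s+\S.*\b(St|Street|Ave|Avenue|Rd|Road|Blvd|Dr|Ln)\.?$", re.IGNORECASE
)
UNIT_RE = re.compile(r"^(Ste|Suite|Apt|Unit)\s+\w+$", re.IGNORECASE)

MAX_LINES = 10  # the operation accepts up to 10 generic address lines


def guess_components(lines):
    """Sort unordered lines into street/unit/state/zip, or raise ValueError."""
    if len(lines) > MAX_LINES:
        raise ValueError("too many address lines")
    found = {"street": [], "unit": [], "state": [], "zip": []}
    for line in lines:
        text = line.strip()
        if ZIP_RE.match(text):
            found["zip"].append(text)
        elif STATE_RE.match(text):
            found["state"].append(text)
        elif STREET_RE.match(text):
            found["street"].append(text)
        elif UNIT_RE.match(text):
            found["unit"].append(text)
        # Anything else (names, C/O lines, city) stays unclassified here.
    for key in ("street", "state", "zip"):
        if len(found[key]) != 1:
            raise ValueError(
                f"expected exactly one {key} candidate, got {len(found[key])}"
            )
    if len(found["unit"]) > 1:
        raise ValueError("more than one unit candidate")
    return {
        "street": found["street"][0],
        "unit": found["unit"][0] if found["unit"] else None,
        "state": found["state"][0],
        "zip": found["zip"][0],
    }


lines = ["Service Objects", "C/O John Doe", "Floor 5", "27 E Cota St",
         "Ste 500", "Santa Barbara", "CA", "93101"]

forward = guess_components(lines)
backward = guess_components(list(reversed(lines)))
```

Both calls pick out the same components regardless of order, while prepending 123 Fake Street produces two street candidates and raises an error rather than guessing.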

That said, this operation solves a problem that is common for our users. Databases and CRMs often get populated with bad data points, especially if a service such as Address Validation was not used on the front end to initially clean and parse the addresses. Extra pieces of information like C/O John Doe and Floor 5 that are not important to the validation of the address can actually confuse things if they are stored with the address. Trying to deal with these data points without the help of Address Validation can easily lead to a corruption of data.

Address Detective – FindOutlyingAddresses

FindOutlyingAddresses has the same basic core purpose as the first two operations: find and validate the given address. However, it has a very different response from the previous two operations. The addresses it serves are addresses that are not found in the USPS dataset. They are pulled from datasets aggregated from many different data sources.

Throughout the United States are pockets of areas that do not receive direct mail, such as extremely rural farm houses that would not be cost effective for the USPS to service, or even towns like Summerland and Avalon in California that are General Delivery areas. Mail goes to a central post office instead of being delivered to a door. This means that USPS does not need to service them and may not track their addresses.

Other companies like FedEx and UPS may still deliver to these locations, so it is important to know if the addresses are good. It may also be important for non-shipping reasons to know if a location is likely to be good: for example, a fraudster may make up an address to get past a website checkpoint, or a valid user might be blocked because they are in a location that is less well known. Knowing more about the location helps identify both cases.

FindOutlyingAddresses helps to solve these problems by dipping into datasets outside the USPS to identify these challenging locations. The data is not as complete nor necessarily as authoritative as USPS data, so the response is a best attempt validation and standardization of the given address. Address, City, State and Postal Code are returned. Level indicates how close we were able to get to the desired address.

The possibilities for a Level response are:

  • USPS
  • Premise
  • PremiseInterpolated
  • Street
  • PostalCode
  • City
  • State
  • NotFound

This is one of the most important parts of the operation, as it gives some insight into the likelihood that the address exists. USPS or Premise indicate that the actual address was found in the primary dataset or at least identified as good in one of the aggregated datasets. PremiseInterpolated suggests we did not find the address but know other addresses around that one are good. Street means we have identified the street as existing, and so on.
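One way a client might act on Level is sketched below in Python. The level names come from the list above, but the numeric ordering, the function name decide and the accept/review/reject thresholds are assumptions made for illustration; each business would pick its own cutoffs.

```python
from enum import IntEnum


class Level(IntEnum):
    """Level values in the order listed above, most precise first."""
    USPS = 0
    Premise = 1
    PremiseInterpolated = 2
    Street = 3
    PostalCode = 4
    City = 5
    State = 6
    NotFound = 7


def decide(level_name: str) -> str:
    """Map a Level string to a hypothetical handling decision."""
    level = Level[level_name]  # raises KeyError for an unknown level name
    if level <= Level.Premise:
        return "accept"   # the exact address was found in some dataset
    if level <= Level.Street:
        return "review"   # neighbors or the street exist, the premise is unconfirmed
    return "reject"       # only a coarse area (or nothing) was matched


decisions = {name: decide(name) for name in Level.__members__}
```

Keeping the levels in an ordered enum means a policy change is a one-line threshold edit rather than a rewrite of string comparisons.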

This kind of response provides obvious value even if the true address cannot be identified. Notes and InformationComponents allow extra information to be returned about the address; however, these are mostly future expansion fields at the moment. Two possible InformationComponents are CountyName and CountyFIPS, which return some county information about the address.

This operation has the best direct synergy with our Address Validation service, as it can be a direct call after a failed address call. It does not need new pieces of information and it is not a result of corrupted starting data.

Hopefully, this blog gives a deeper understanding of the operations currently available in DOTS Address Detective. We look forward to continuing to enhance its capabilities to solve even more challenging address problems, as well as adding new operations to solve problems still to be identified.

Please contact us if you want to learn more about the service.