In a previous blog, we discussed the benefits of using DOTS Address Detective – International to detect a contact’s country. This blog will discuss some of the challenges surrounding country detection in more detail, as well as provide an overview of how we determine the best country from your data.
When trying to append a country to a contact, we have four main components to examine:
- Address
- Phone
- IP Address
- Email
Each component must be carefully evaluated on its own merit before it can be used to help identify a country for the contact.
The Address component may represent a contact’s physical location or mailable address. It is the most diverse and complex of all the components. International addresses do not follow a singular format, language or standard. Each country has its own set of rules and standards, which can also make the storage of international addresses problematic for US-centric CRMs.
This also means that it is common for a contact’s address to be incorrect and/or incomplete. Additionally, some businesses are not interested in capturing a mailable address and only wish to store a contact’s region. Depending on who is entering the contact address and how it is being stored, it would not be unreasonable to expect this data to be flawed in more ways than one.
Knowing the country is critical to processing most addresses. It determines the address format, which is needed to identify individual address elements, which in turn are needed to identify a locality, postal code or region. With that said, our sophisticated data-driven algorithms are not dependent on completeness and allow for a wide variety of formats and languages.
If you think you can identify a country’s address, take our fun, short Country Quiz.
Similar to the DOTS Address Validation International service, the address component consists of Address Lines 1-8, Locality, Admin Area and Postal Code. The address can be entered entirely in lines 1-8 or in combination with the Locality, Admin Area and Postal Code fields. Address line order does not matter, and common mistakes like putting an address value into the wrong address field are detected and handled.
Not all countries follow the US city-state pairing format or the equivalent locality and admin area pairing. Many international addresses do not include an admin area, which can make country detection difficult because localities around the world often share names. Take Venice, for example: localities with that name can be found in three different countries.
If no other address information besides the name Venice was made available, one would be left having to choose between these three countries. However, by making use of other contact data such as a phone number, IP address and/or email, the service can cross reference various datasets to better determine which country is the best match. Then again, if the locality was entered as Venezia, the Italian endonym for Venice, there would be less ambiguity and the country Italy would be the clear choice.
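The narrowing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `GAZETTEER` table, its country assignments and the `candidate_countries` helper are hypothetical and are not the service's actual data or API.

```python
# Hypothetical, hand-made gazetteer: locality name -> ISO country codes.
# The entries are illustrative assumptions, not the service's dataset.
GAZETTEER = {
    "venice": {"IT", "US", "CA"},
    "venezia": {"IT"},
}


def candidate_countries(locality, other_hints=()):
    """Return the countries a locality could belong to.

    other_hints is an iterable of country sets derived from other contact
    data (phone, IP address, email). A hint is applied only when it overlaps
    the current candidates, so a contradictory hint never empties the result.
    """
    candidates = set(GAZETTEER.get(locality.strip().lower(), set()))
    for hint in other_hints:
        narrowed = candidates & set(hint)
        if narrowed:
            candidates = narrowed
    return candidates
```

With only the name Venice, all listed countries remain candidates. Adding a phone hint of `{"US", "CA"}` removes one of them, while the endonym Venezia resolves to a single country on its own.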
The phone component consists of a contact’s phone number(s). The format of a phone number is dictated by its country’s numbering plan. Some countries have their own numbering plan, while others share one. The USA and Canada, for example, share the North American Numbering Plan (NANP), whereas the UK and its crown dependencies share the UK National Telephone Numbering Plan. Most countries conform to the E.164 International Telecommunication Numbering Plan, which is published by the International Telecommunication Union (ITU).
The E.164 Numbering Plan
The E.164 currently provides five number structures (numbering plans) for international phone numbers:
- International ITU-T E.164-number for geographic areas.
- International ITU-T E.164-number for global services.
- International ITU-T E.164-number for Networks.
- International ITU-T E.164-number for groups of countries.
- International ITU-T E.164-number for trials.
Each structure has its own set of rules and requirements, but telephone numbers that conform to E.164, in general, will adhere to the following:
- The recommended maximum length for a telephone number is 15 digits.
- Telephone numbers will begin with a Country Code (CC).
- Telephone numbers will not include prefixes and suffixes.
Country calling codes are published by the Telecommunication Standardization Bureau (TSB). Depending on which E.164 structure is being used, the country code (CC) may vary from 1 to 3 digits or may be fixed at 3 digits. Country codes are followed by the destination number in accordance with the E.164 numbering plan. When storing a country code or an international (E.164) number, the number is commonly prefixed with a plus symbol (+) to indicate that when dialing the number, one must first dial the appropriate international call prefix to complete the call.
International call prefixes (also known as call out codes, dial out codes, exit codes or international access codes) are used to make a call from one country to another. The Prefix is dialed before the country code (CC) and the destination telephone number. Prefixes are not a part of the E.164 numbering plan and it is recommended to not include them as they can interfere with country code identification.
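As a rough sketch of the rules above, the following Python function reduces a stored number to its bare `+<digits>` form and applies the general E.164 checks. The function name `normalize_e164`, the short list of exit codes and the extension pattern are assumptions made for illustration. It checks the shape of a number only and does not verify that the country code is actually assigned.

```python
import re


def normalize_e164(raw, exit_codes=("011", "00")):
    """Return '+<digits>' if raw has the general E.164 shape, else None."""
    # Drop a trailing extension (suffix) such as 'Ext. 123' or 'x123'.
    main = re.sub(r"(?i)\s*(?:ext\.?|x)\s*\d+\s*$", "", raw.strip())
    has_plus = main.startswith("+")
    digits = re.sub(r"\D", "", main)

    if not has_plus:
        # Without '+', accept the number only if it starts with a known
        # international call prefix, which is then removed.
        for code in exit_codes:
            if digits.startswith(code):
                digits = digits[len(code):]
                break
        else:
            return None  # no way to tell whether a country code is present

    # No country code begins with 0, and the maximum length is 15 digits.
    if not digits or digits[0] == "0" or len(digits) > 15:
        return None
    return "+" + digits
```

For example, `normalize_e164("011 44 123 456 7890")` and `normalize_e164("+44 123 456 7890 Ext. 123")` both yield `+441234567890`, while a bare national number such as `123 456 7890` is rejected because nothing marks where a country code would be.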
Making the Call
Suppose you have a contact in the UK with the following number saved in your CRM, ‘+44 123 456 7890 Ext. 123’, and you wanted to call this person from within the USA. To call them, you would dial 011441234567890, and then after you have been successfully connected you would next dial your contact’s extension of 123.
The table below shows how the prefix and suffix are not a part of an international number.
|Prefix|Country Code|Destination Number|Suffix|
|---|---|---|---|
|011|44|123 456 7890|Ext. 123|
Now suppose that you wanted to call this contact again, but this time you are in Sweden and not in the USA. Instead of dialing the 011 prefix, which is shared by all countries in the North American Numbering Plan (NANP), you would dial 00 which is the prefix used by many countries in Europe.
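The two calls above differ only in the prefix, which a short Python sketch makes explicit. The `EXIT_CODES` table is a small illustrative subset and `dial_string` is a hypothetical helper. The extension is assumed to be stored separately and dialed after the call connects.

```python
import re

# Small illustrative subset of international call prefixes by calling country.
EXIT_CODES = {"US": "011", "CA": "011", "SE": "00"}


def dial_string(e164_number, calling_from):
    """Build the digits to dial for a '+'-prefixed number from a given country."""
    number = e164_number.strip()
    if not number.startswith("+"):
        raise ValueError("expected a number stored with a leading '+'")
    digits = re.sub(r"\D", "", number)  # country code + destination number
    return EXIT_CODES[calling_from] + digits
```

Here `dial_string("+44 123 456 7890", "US")` gives `011441234567890`, and the same stored number dialed from Sweden gives `00441234567890`.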
At Service Objects, we understand that not all phone numbers will conform to an E.164 numbering plan and that many numbers will have missing country codes, which is why our services make use of a wide variety of datasets and are flexible enough to intelligently identify a country.
IP address component
Not all companies capture a contact’s IP address, but when they do, they are most likely capturing it via the web form the contact used to submit their information. The location associated with a captured IP address is often that of the IP’s registered owner, so if the contact filled out a web form from their home computer, then it is likely that the IP belongs to their Internet Service Provider (ISP). If they filled it out from their office computer, then the IP address may belong to the business or to the business’s ISP. IP based geolocation systems will commonly return a general location for the owner of the IP, which in most cases is the end user’s ISP.
There is often a misconception that IP based geolocation services will always return an end user’s exact location, for example, that the IP address assigned to a smartphone can alone be used to pinpoint and track the phone’s exact location. This is simply not true. In most cases, IP based geolocation services will return the city and/or the metropolitan area where the IP address is commonly served. Subscribers will generally be located within the serviceable area of their ISP, and so the IP based location can be used with confidence to identify the region of the end user.
Identifying anonymous users
If a contact used a Virtual Private Network (VPN) or a proxy connection, such as the Tor network, when filling out a form, then the end user’s true IP address was masked and was not captured. Some users will make use of methods such as these to try to remain anonymous and prevent others from capturing their true IP address. These methods are not only used to mask a user’s true location, but they can also be used to make a user appear to be from somewhere they are not. This is commonly done to circumvent region-locked sites and services. However, not all VPN and proxy connections are used for this purpose. Many businesses make use of VPN and proxy connections to connect their employees, sites and services from various regions, including remote employees.
A service like DOTS IP Address Validation is capable of identifying proxy related IP addresses as well as IP addresses associated with malicious activity. By leveraging this data, the country detection algorithm can determine if the IP is trustworthy and if the IP based location is genuine.
The email address component uses the contact email address to identify where in the world the mail servers are located. The location of the mail server should not be confused with the location of the mail sender; after all, one of the benefits of email is that you can send and receive it from just about anywhere an internet connection is available. This means that a contact may not necessarily be anywhere near where the mail server is located and could potentially reside in an entirely different country. It’s also worth noting that the domain name, including the Top Level Domain (TLD), can be misleading.
For example, let’s suppose we have an email with a domain that consists of Spanish words and the TLD country code for Spain (ES), like: ejemplo@una_palabra_espanola.es
While the above example email address may appear to belong to a contact in Spain, the company could instead be hosted or even located in another country, such as the USA. Another possibility is that the company is located in one country and has its email handled by a provider in another. It is quite common for businesses to outsource email duties to specialized email providers.
Some domains have mail servers located in multiple countries and regions and are not tied to a single location. So, email addresses alone cannot be used to accurately and confidently identify a contact’s country, as doing so would be too far-reaching. However, the country or countries for the email component can be used in some cases to help identify a single country when used in combination with other contact data.
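To illustrate why the domain is only a weak signal, here is a minimal Python sketch that reads a country hint from the TLD alone. The `CCTLD_TO_COUNTRY` table is a tiny illustrative subset and `email_country_hint` is a hypothetical helper. It does not look up mail server locations, which is what the email component described above actually relies on.

```python
# Tiny illustrative subset of country-code TLDs mapped to ISO country codes.
CCTLD_TO_COUNTRY = {"es": "ES", "uk": "GB", "de": "DE", "se": "SE"}


def email_country_hint(email):
    """Return a country suggested by the email's TLD, or None.

    The result is a hint only: the mailbox owner, the company and the mail
    servers may each be in a different country.
    """
    domain = email.rsplit("@", 1)[-1].strip().lower()
    tld = domain.rsplit(".", 1)[-1]
    return CCTLD_TO_COUNTRY.get(tld)
```

The example address with the ES domain returns `ES`, and an address on a generic TLD such as `.com` returns `None`. In both cases the result would need to be weighed against the other contact data before being trusted.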
Which country is best
As you can see, each contact component is carefully analyzed to the point where a country may be singled out for each one, but the next step is to determine which country best represents the overall contact. By taking the countries related to each component, carefully weighing their relevance and cross-examining them, we can in many cases successfully identify the single country that best exemplifies the contact.
As previously mentioned, contact components like the Address and Email can result in more than one country. The country detection algorithm takes all possible countries into account, so even though a single component may not have a clear country winner, a best match can be found between all the components. Some components have a stronger influence than others. For example, the IP address and email address components do not have as much influence as the address and phone components since they are not always directly related to where a contact resides.
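One simple way to picture this weighing is a weighted vote, sketched below in Python. The weights, the `best_country` function and the tie handling are assumptions chosen for illustration. The service's actual algorithm and weights are not described here.

```python
from collections import defaultdict

# Assumed weights: address and phone count for more than IP and email.
WEIGHTS = {"address": 3.0, "phone": 3.0, "ip": 1.0, "email": 1.0}


def best_country(candidates):
    """Pick the best overall country from per-component candidate sets.

    candidates maps a component name to a set of ISO country codes.
    Each component splits its weight evenly across its candidates, so an
    ambiguous component contributes less to any single country.
    Returns None when there is no data or the top score is tied.
    """
    scores = defaultdict(float)
    for component, countries in candidates.items():
        if not countries:
            continue
        share = WEIGHTS[component] / len(countries)
        for country in countries:
            scores[country] += share

    if not scores:
        return None
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # no clear winner
    return ranked[0][0]
```

For instance, an address that matches three countries, a phone number that matches two of them and an IP address that matches one of those two would converge on that one country, even though a lone email hint points elsewhere.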
In general, the more complete the contact information is, the more the country detection algorithm will have to work with when choosing the best overall country. However, even when only a few contact components are available, the service will still be able to make do with the information it receives.