Enhancing Data Quality: What’s New in Lead Validation – International

DOTS Lead Validation – International is a constantly evolving service that uses Name, Email, Address, Phone Number, IP Address and, optionally, Business Name to validate customers, contacts, leads and businesses, giving companies insight into the people and data coming into their systems. Quite a few interesting new features have gone into our latest build, and we wanted to share them.

More Detailed Scoring for Matching Data

Lead Validation – International has recently enhanced its algorithms for associating names with emails and phone numbers, significantly expanding its ability to identify matches across a wider spectrum of cases. This development is important because it bolsters the reliability of our data-matching scores. Notably, it strengthens our ability to confirm matching data points, especially when they involve well-established email addresses and phone numbers.

One key improvement lies in the enhanced handling of partial name matches and abbreviation variations – for example, giving at least partial points for “Will B” matching to “W Bartely” or something like “William Bartely” matching to the email “”. Input names, names identified in emails, and names returned in phone contact responses can have many different variations, and it is important to look at them from all angles when considering the quality and accuracy of a lead as a whole.
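To make the idea of partial credit concrete, here is a minimal Python sketch. It is not the service's actual algorithm: the function names, the prefix rule and the 0.5 partial weight are all assumptions chosen for illustration.

```python
def name_part_score(a: str, b: str) -> float:
    """Score two name parts: 1.0 if equal, 0.5 if one is a prefix
    (for example an initial or nickname stem) of the other, else 0.0.
    The weights are hypothetical."""
    a = a.lower().strip(".")
    b = b.lower().strip(".")
    if not a or not b:
        return 0.0
    if a == b:
        return 1.0
    if a.startswith(b) or b.startswith(a):
        return 0.5
    return 0.0


def partial_name_score(input_name: str, candidate_name: str) -> float:
    """Average the first-name and last-name part scores.
    Assumes both names have at least a first and a last part."""
    in_parts = input_name.split()
    cand_parts = candidate_name.split()
    if len(in_parts) < 2 or len(cand_parts) < 2:
        return 0.0
    first = name_part_score(in_parts[0], cand_parts[0])
    last = name_part_score(in_parts[-1], cand_parts[-1])
    return (first + last) / 2


print(partial_name_score("Will B", "W Bartely"))
print(partial_name_score("William Bartely", "William Bartely"))
```

Under these assumed weights, “Will B” against “W Bartely” earns partial credit on both name parts rather than being rejected outright, while an exact match earns full credit.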

With the improvements made to our matching algorithms, we’ve also introduced new scoring parameters to provide a more nuanced evaluation. In the past, points were awarded solely for a successful match. Now, our scoring mechanism incorporates two distinct components: points for the match itself and additional points for matching with a known reliable component, such as an email or phone number. For instance, in addition to a generic note like “IsNameEmailMatch,” you might encounter a more specific one like “IsNameGoodEmailMatch,” indicating a higher level of confidence in the match’s quality and therefore higher Quality and Confidence scores.
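The two-part scoring described above can be sketched in Python as follows. The point values are invented for illustration, and the sketch assumes the more specific note is returned in place of the generic one; the service may behave differently on both counts.

```python
from typing import List, Tuple

# Hypothetical point values; the real weights are not published here.
MATCH_POINTS = 10
GOOD_COMPONENT_BONUS = 5


def name_email_match_points(is_match: bool, email_is_known_good: bool) -> Tuple[int, List[str]]:
    """Return (points, notes) for a name-to-email comparison."""
    if not is_match:
        return 0, []
    if email_is_known_good:
        # Points for the match itself plus a bonus for a reliable email.
        return MATCH_POINTS + GOOD_COMPONENT_BONUS, ["IsNameGoodEmailMatch"]
    # Points for the match alone.
    return MATCH_POINTS, ["IsNameEmailMatch"]


print(name_email_match_points(True, True))
print(name_email_match_points(True, False))
```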

Detecting Garbage Data in a Business Name

The main new addition in this build is a helper tool for identifying bad data entering the service. We have been investigating ways to look at personal and business names for garbage data and are starting to apply those features to our services. It’s relatively easy to give thumbs up to good data but can be very difficult to differentiate bad data from unknown data.

Whether dealing with personal or business names, it is impossible to be familiar with every single name, especially when it comes to business names, which can often be highly unstructured and diverse. To assess data quality, we rely on our internal data and matching results. However, in cases where we lack sufficient data or cannot find matches, providing a definitive assessment becomes more challenging. It’s also not practical to assign a negative score to a lead simply because we lack information about the business. In the past, for unknown data, we would have to say we do not know and rely on other bad data points to help differentiate the lead. This build gives us a lot more opportunities to take away points for business names that look more questionable.

While the focus of this discussion is on Business Names, Lead Validation – International has already benefitted from enhancements to our DOTS Name Validation service, which has recently had numerous updates addressing the detection of garbage names. We have been working to detect consonant and vowel combinations that are unlikely to exist. When we improve our base services, Lead Validation – International automatically gets most of the benefits of these improvements. In this case, Lead Validation – International becomes better at differentiating between unknown names and bad names.

Business Names can be trickier than personal names, since they can be more freeform or even take the form of abbreviations. For example, “Nxgx” would not be a person’s name, but it could very well be the abbreviation for a business name (although we would still consider it unlikely). Analysis of Business Names is never going to be an exact science. However, we are often dealing with probabilities, and if we are comfortable saying that something has a very high probability of being bad and a very low probability of being good, it is safe to mark it as risky.

Business Names are run through an algorithm that performs a number of unique tests to come up with a risk score indicating the suggested risk level of the given name. The tests we run include checking for word combinations that do not make sense together in a business name, identifying high-risk words, flagging risky combinations of consonants, vowels and special characters, and spotting repeated patterns of characters that seem risky.
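A few of these test types can be illustrated with a small Python sketch. This is not the production algorithm: the word list, the thresholds and the point values are assumptions, and the word-combination test is left out because it would need a real vocabulary.

```python
import re

# Hypothetical list of high-risk words, for illustration only.
HIGH_RISK_WORDS = {"test", "asdf", "none"}


def business_name_risk_score(name: str) -> int:
    """Add up points from a few simple heuristics; higher means riskier."""
    lowered = name.lower()
    words = re.findall(r"[a-z]+", lowered)
    score = 0
    # High-risk words.
    if any(word in HIGH_RISK_WORDS for word in words):
        score += 2
    # Words of four or more letters containing no vowels at all.
    if any(len(word) >= 4 and not re.search(r"[aeiouy]", word) for word in words):
        score += 1
    # The same character repeated four or more times in a row.
    if re.search(r"(.)\1{3,}", lowered):
        score += 2
    # A run of three or more special characters.
    if re.search(r"[^a-z0-9\s]{3,}", lowered):
        score += 1
    return score


for candidate in ["Acme Plumbing", "Nxgx", "aaaaaa", "Test!!!! Co"]:
    print(candidate, business_name_risk_score(candidate))
```

In this sketch “Nxgx” picks up only a small score, reflecting that it is unlikely but still plausible as an abbreviation, while names that trip several heuristics accumulate a higher score.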

Internally, there are many levels of risk, but if the name is determined to be risky, the result will include either an IsRiskyBusinessNameHigh or an IsRiskyBusinessNameLow note in the business component. The risk level can be reduced with matching data. For example, if a business name is deemed high risk but matches a good email or phone number, the risk level may be reduced to low or even removed altogether. If the risk level is reduced, a business note of RiskMitigatedWithMatchingData will be returned.
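The note logic can be sketched in Python as shown below. The note names come from the service, but the rule that matching data lowers the risk by exactly one step (high to low, low to removed) is an assumption made for this example; the text above only says the level may be reduced or removed.

```python
from typing import List, Optional


def business_risk_notes(risk_level: Optional[str], matches_good_contact: bool) -> List[str]:
    """risk_level is 'high', 'low' or None (name not considered risky)."""
    if risk_level is None:
        return []
    final_level = risk_level
    if matches_good_contact:
        # Assumed rule: matching data steps the risk down one level.
        final_level = "low" if risk_level == "high" else None
    notes = []
    if final_level == "high":
        notes.append("IsRiskyBusinessNameHigh")
    elif final_level == "low":
        notes.append("IsRiskyBusinessNameLow")
    if final_level != risk_level:
        notes.append("RiskMitigatedWithMatchingData")
    return notes


print(business_risk_notes("high", False))
print(business_risk_notes("high", True))
print(business_risk_notes("low", True))
```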

New Authoritative Active/Inactive Phone Line Detection

An important new feature in this build is the addition of new authoritative active and inactive flags, primarily for US mobile phones. We expect to know whether a number is connected or disconnected about 70% of the time, and can add or subtract points accordingly to produce a more authoritative component score. The key new flag is “IsDisconnected.”

Connected numbers receive small bonuses, but for a number to be considered very strong we would still like to see matching data. On the other hand, a disconnected number can be given large penalties, as it is unlikely to be usable. This update is crucial because it provides additional indicators for assessing the viability of the phone number. Additional tests are also performed, and if their results look good, it is still possible to get a decent score. Checking for active and inactive numbers relies on our authoritative sources, such as major phone carriers, maintaining up-to-date data; otherwise, results can be impacted.
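As a rough Python sketch of the asymmetry described above: the point values and the status strings are assumptions, and only the IsDisconnected flag name comes from the service. A status of None stands for the cases where the line state is not known.

```python
from typing import List, Optional, Tuple

# Hypothetical values: a small bonus and a large penalty.
CONNECTED_BONUS = 5
DISCONNECTED_PENALTY = -40


def phone_line_adjustment(status: Optional[str]) -> Tuple[int, List[str]]:
    """Return (score adjustment, notes) for a phone line status of
    'connected', 'disconnected' or None (unknown)."""
    if status == "connected":
        return CONNECTED_BONUS, []
    if status == "disconnected":
        return DISCONNECTED_PENALTY, ["IsDisconnected"]
    # Unknown status: no adjustment, other tests decide the score.
    return 0, []


print(phone_line_adjustment("disconnected"))
```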

A Re-introduction to High-Risk Countries

This feature was introduced in an earlier build, but it is worth revisiting because many clients are not yet taking advantage of some of its special capabilities. It allows us to apply penalties to certain data points when they are tied to higher-risk countries. For example, a good contact from Nigeria is still likely a higher-risk contact than a good contact from Sweden.

Risk can be associated with the IP Address, physical location address, phone number or overall lead. Users can expect to see notes tied to the various components such as IsHighRiskCountryLead, IsHighRiskCountryAddress or IsMediumRiskCountryPhone. A small penalty is assigned for a medium-risk country and a moderate penalty is assigned for a high-risk country. Good matching data can still overcome any penalties assigned, so just being a lead in a high-risk country is not guaranteed to lead to a low score. With custom test types, clients can assign smaller or larger penalties here as well.
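The relationship between risk level, penalty and note can be sketched in Python like this. The penalty sizes are invented, and the note-naming pattern is inferred from the three example notes above, so other combinations may not exist in the service under exactly these names.

```python
from typing import Optional, Tuple

# Hypothetical penalties: small for medium risk, moderate for high risk.
RISK_PENALTY = {"medium": -5, "high": -15}


def country_risk(component: str, risk_level: Optional[str]) -> Tuple[int, Optional[str]]:
    """component is 'Lead', 'Address' or 'Phone'; risk_level is
    'medium', 'high' or None. Returns (penalty, note)."""
    if risk_level not in RISK_PENALTY:
        return 0, None
    # Note name built from the pattern seen in the documented examples.
    note = f"Is{risk_level.capitalize()}RiskCountry{component}"
    return RISK_PENALTY[risk_level], note


print(country_risk("Phone", "medium"))
print(country_risk("Lead", "high"))
```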

In addition to the country risk factors, clients can submit custom lists of countries they want to either Reject (massive penalties) or Accept (ignore the normal risk penalties). Reject, or Disqualified, countries might be countries that the user will not do business with regardless of risk factors. Disqualified countries can receive extra penalties on top of the normal risk penalties. Accept countries might be countries the user does not want to assign risk penalties to, such as countries they commonly do business with and where they are willing to overlook the risk.
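A minimal Python sketch of how such lists could modify a penalty follows. The function name, the use of two-letter country codes and the size of the reject penalty are assumptions for illustration, not the service's interface.

```python
from typing import Set

# Hypothetical "massive" penalty for disqualified countries.
REJECT_PENALTY = -100


def apply_country_lists(country: str, base_penalty: int,
                        reject: Set[str], accept: Set[str]) -> int:
    """Adjust a country risk penalty using client-supplied lists."""
    if country in reject:
        # Disqualified: extra penalty on top of the normal risk penalty.
        return base_penalty + REJECT_PENALTY
    if country in accept:
        # Accepted: the normal risk penalty is ignored.
        return 0
    return base_penalty


reject_list = {"XX"}   # placeholder codes, not real recommendations
accept_list = {"YY"}
print(apply_country_lists("XX", -15, reject_list, accept_list))
print(apply_country_lists("YY", -15, reject_list, accept_list))
print(apply_country_lists("ZZ", -15, reject_list, accept_list))
```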

Lead Validation – International is a continually adapting service. Please feel free to contact us to learn more about how this service might help you and your business.