There is a glaring need in the business world for a simpler way to unlock the value of data.
The velocity and unrelenting proliferation of all types of data, from an ever-increasing number of sources, continue to grow. Enterprises struggle to harness that data to realize value and achieve the ultimate goal of monetizing their business data through bold, data-driven actions.
The key to enabling this kind of ambitious action, however, does not come from the sheer abundance of data. Instead, it comes from the linkage of all the right data to create a clear, complete and timely picture of the circumstances, customers or business issues you are trying to engage.
This linkage is what truly unlocks the power and potential of the data to be converted directly into business value and outcomes. It is not without its challenges, however, and it requires some key steps.
Step 1: Get Your Data into “Usable Form”
A Precisely report on data integrity trends illustrates the scale of the problem: A typical enterprise has, on average, 27 data sources currently integrated.
Of hundreds of C-level data executives polled, 82% said that data quality concerns are “very” or “quite” challenging, while 74% lack integration technology or services. Many data pundits define Big Data as “huge, overwhelming, and uncontrollable amounts of information.”
And numerous business surveys by top analyst and consulting groups conclude that only about a quarter to a third of businesses across all industries feel they have mastered the data necessary to monetize its intrinsic value. Some see this as a way for businesses to differentiate themselves, and they are not wrong.
I see this as a situation where the state of the technology is insufficient for the magnitude of the problem. More accurately, it is the combination of technology and skills, otherwise thought of as capabilities, that is not up to the task. The technologies for solving this problem exist, but there are too few sufficiently skilled people who know how to work these tools to solve the market problem. Let the sheer size and reach of the problem sink in and you will quickly conclude that this is a situation ripe for disruption.
Until disruption happens, however, what is to be done? How can any businessperson initiate the right activities to get their data into a usable form? Like many problems, this one breaks down into three key points:
- The data must be translated into one language, a common lexicon. Each system that produces data often has a proprietary lexicon that creates a unique “Babel” for each organization.
- The various indicia or PII elements must be standardized so Bob, Robert and Bobby are all recognized as the same name. This must be done for ALL PII elements, not just first names, as some vendors do.
- Finally, it’s up to algorithms based in AI (the discipline formerly known as statistics before the Big Data hype) to probabilistically bring together the data records from all the sources into a group that can be used to construct that Golden Record of full, complete, accurate and timely data about the entities you are focused on, be those manufacturing parts, brand customers or financial entities.
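The three points above can be sketched, in miniature, with a toy probabilistic matcher. Everything here is an illustrative assumption, not the author's or any vendor's actual method: the record fields, the tiny nickname lexicon, the per-field weights and the 0.8 link threshold are all made up for the example.

```python
from difflib import SequenceMatcher

# Hypothetical records from two source systems; field names are illustrative.
rec_a = {"name": "Robert Smith", "email": "bob.smith@example.com", "zip": "02482"}
rec_b = {"name": "Bob Smith",    "email": "bob.smith@example.com", "zip": "02482"}

# Tiny illustrative lexicon mapping nicknames to a canonical form.
NICKNAMES = {"bob": "robert", "bobby": "robert"}

def normalize_name(name: str) -> str:
    """Lowercase the name and replace known nicknames with their canonical form."""
    parts = name.lower().split()
    return " ".join(NICKNAMES.get(p, p) for p in parts)

def match_score(a: dict, b: dict) -> float:
    """Weighted average of per-field similarities; weights are assumptions."""
    name_sim = SequenceMatcher(
        None, normalize_name(a["name"]), normalize_name(b["name"])
    ).ratio()
    email_sim = 1.0 if a["email"] == b["email"] else 0.0
    zip_sim = 1.0 if a["zip"] == b["zip"] else 0.0
    return 0.5 * name_sim + 0.3 * email_sim + 0.2 * zip_sim

score = match_score(rec_a, rec_b)
linked = score >= 0.8  # threshold chosen purely for illustration
```

In this sketch the two records normalize to the same name and share email and zip, so they score high enough to be grouped into one Golden Record. Real implementations use far richer normalization, weighting and blocking strategies.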
Step 2: Standardize Your Influx of Data
The sheer number of data sources, each with its own formatting rules and lexicons is one key barrier to linking data to make it intelligible and valuable throughout the enterprise.
When bringing data together from multiple sources, there are many details to sort through to make sure each source is speaking the same language, in a figurative sense. Zip codes, for example, are text fields with numeric characters, and a leading ‘0’ – as in Massachusetts, 02482 – tends to create chaos in spreadsheets and other data stores that auto-detect the field type.
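The zip code pitfall is easy to demonstrate. This minimal sketch (the CSV content is invented for illustration) shows that keeping the field as text preserves the leading zero, while numeric coercion, which type-detecting spreadsheets do automatically, destroys it:

```python
import csv
import io

# A tiny illustrative CSV; the zip column must stay text for the zero to survive.
raw = "name,zip\nJane Doe,02482\n"

rows = list(csv.DictReader(io.StringIO(raw)))
zip_code = rows[0]["zip"]   # csv leaves this as the string "02482"

# Coercing to a number, as auto-detecting tools do, silently loses the zero:
as_number = int(zip_code)   # 2482 -- the Massachusetts prefix is gone
```

The safe rule is to declare such columns as text explicitly on ingest rather than letting the destination guess the type.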
Reaching consensus on business definitions provides another example. In the insurance industry, for instance, different regions, waiting periods, policy start dates and other variables often mean different sources have a different understanding for something as basic as what constitutes a sale. All these details need to fly under the same flag for business data to have the same meaning between sources, departments, channels and processes.
Data standardization is the second data quality issue that often confounds organizations dealing with a tsunami of incoming data. Names and addresses with seemingly infinite nuances are the classic example. Simply matching a Jon Smith with a Jonathan Smith, when the names may or may not represent different people, can have several negative implications downstream. This is true whether in marketing or any department that deals with sensitive PII. Data standardization entails putting all data into a common format so that it can be consistently and properly compared.
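A common-format step for addresses might look like the following sketch. The abbreviation table and cleanup rules are deliberately tiny, illustrative assumptions; production address standardization relies on full postal reference standards rather than a hand-rolled dictionary.

```python
# Tiny illustrative table of street-suffix abbreviations.
ABBREV = {"street": "st", "avenue": "ave", "road": "rd"}

def standardize_address(addr: str) -> str:
    """Lowercase, strip punctuation, and normalize known suffixes."""
    tokens = addr.lower().replace(".", "").replace(",", " ").split()
    return " ".join(ABBREV.get(t, t) for t in tokens)

# Two differently formatted inputs reduce to one comparable form:
a = standardize_address("123 Main Street, Wellesley")
b = standardize_address("123 Main St. Wellesley")
```

Once both variants collapse to the same canonical string, a downstream matcher can compare them consistently instead of treating them as distinct addresses.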
Harmonizing different formats and lexicons from various sources, and standardizing names, addresses, emails and other information lead to the final, critical data quality step – the actual linkage of the data.
In most situations where there is a consumer and a brand – marketing, for instance – linking data means matching all data about a person to a record, and then knowing that person’s history, interests, likely behaviors, utilization and preferences so when the person appears on a physical or digital channel (store, website, call center, social, etc.) the brand is best able to deliver relevant content. Identity resolution essentially scores how close, or how far, records may be to one another – and determines if there should be a link.
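Scoring "how close, or how far, records may be" usually ends in a decision rule. One common pattern, sketched here with invented cutoffs, is a two-threshold band: link confidently above an upper bound, reject below a lower bound, and route the ambiguous middle to review.

```python
# Assumed cutoffs, purely for illustration.
UPPER = 0.9
LOWER = 0.6

def link_decision(score: float) -> str:
    """Map a similarity score to a link decision using two thresholds."""
    if score >= UPPER:
        return "link"      # confident match
    if score >= LOWER:
        return "review"    # ambiguous pair, send to human review
    return "no-link"       # confident non-match
```

The middle band is what keeps a Jon Smith from being silently fused with a different Jonathan Smith: uncertain pairs are surfaced rather than auto-linked.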
Step 3: Structure Your Data to Support Decision-Making and Analysis
The proliferation of business data means that, more often than not, the business is no longer able to rely on a hard match key such as a social security or account number to know a person’s identity.
Such initiatives have sprung up over the years, such as OpenID 2.0, but they never last, and legislation usually kills them eventually. Similarly, companies that do identity resolution often rely on reference files, which is too easy: they are incented to overmatch because it boosts revenue, and eventually the sources of reference data dry up.
Statistical (AI) methods are needed to wade through the endless number of ways people can be represented. Importantly, methods often differ based on the business need for the data. A marketing attribution use case, for instance, will likely accept a less stringent or looser match than a use case that involves regulatory compliance or healthcare PII.
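The point that match stringency should follow the business need can be made concrete with a per-use-case threshold table. The use-case names and numeric values here are assumptions for illustration only:

```python
# Illustrative match thresholds keyed by use case; values are assumptions.
THRESHOLDS = {
    "marketing_attribution": 0.70,  # a looser match is acceptable
    "regulatory_compliance": 0.95,  # a much stricter match is required
}

def should_link(score: float, use_case: str) -> bool:
    """Decide whether a candidate pair links, given the use case's tolerance."""
    return score >= THRESHOLDS[use_case]
```

The same candidate pair can thus link for marketing purposes while failing the bar for compliance or healthcare PII, which is exactly the behavior the business needs.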
Whatever the intended use case, data must be put into a structure that supports decision-making and analytics, and be properly governed for as long as required; data is never static, it is a flow. More an art than a science, data linkage requires considerable skill to do at the level a business needs to successfully leverage and monetize its data.
Even when a solution is available, it often exacts a compromise that puts companies into risky situations. One compromise is to send data across the internet to managed services, SaaS or traditionally hosted technologies, which puts the data at risk of exposure even with encryption (China recently surpassed the US with the world’s most powerful supercomputer). The other is to rely on an IT-centered approach to data quality and governance that, while it may keep data in-house, creates bottlenecks from data and process silos, conflicting project priorities, time-consuming configuration and resource shortages.
Take Back Control of Your Data
Complete data transparency and trust in data quality are possible when data quality and identity resolution steps are completed at the moment of data ingest (or within milliseconds), creating an unassailable unified record of a customer or other entity of value to an organization.
Complexity is shed when identities are resolved in real time, when data is harmonized and perfected without having to leave the security perimeter, and when accurate data linkage is completed on a company’s own first-party business and customer data within the enterprise database, wherever it exists.
Far too many companies struggle with data, or do not have it completely under control or where it needs to be to monetize the greatest asset they possess. There is opportunity for disruption for companies that step in to fill this glaring need. The need for data to be consistent over time, and to represent the dimensions of the business in the ways the business expects, is too important to let the opportunity pass. The time for a solution is now.
About the Author:
George Corugedo is CTO of Redpoint Global