Someone asked me the other day what the business case was for classifying data. For "data pros," or anyone who has worked with data to perform analytics or produce business intelligence reports, this may seem like a silly question. However, I've found that a large number of people don't really understand the importance of data classification.
So when does classification become important? It becomes important when data leaves the application that gave it context. For many of our customers this occurs when data leaves the transactional, ERP-type system and is moved into a data warehouse or a data lake, whose purpose is to provide access to data from multiple sources.
Traditionally, this movement from a transactional system to a more generally accessible repository came with a level of curation. Prior to the concept of the “Data Lake,” data was moved into the data warehouse with the goal of making it the “single source of truth.” This often involved significant levels of data stewardship and curation to reconcile conflicting versions of “truth.” With the growing awareness and adoption of analytics, a stable concept of “truth” has become elusive. The right data for an analyst is context driven, and at times highly variable. The Data Lake construct addresses this issue by allowing all data to be loaded so that the user can determine what data to use based on the decision context at the time. Data classification is what makes that possible. Well-classified data can be discovered, analyzed, accessed and integrated into a user’s context based on the classification labels that have been exposed to the user in the Data Asset catalog. From this perspective, classification is foundational for driving value out of data in the areas of analytics, business intelligence, operational efficiency, and compliance.
Indeed, in the big data space, classification is foundational for analytics, machine learning, the application of higher-level logic, and (way up the maturity curve) for building artificial intelligence capabilities. As a foundational building block for AI, classification is an interesting topic, although for many it is too far removed from today’s problems. However, as the foundation for making data discoverable, understandable, accessible and able to be integrated into downstream applications, it is highly relevant to today’s challenges, almost regardless of where your current capabilities stand. For this reason, any data governance shop should include in its planning a workstream that seeks to evolve classification capabilities.
Consider the following use cases:
Business Intelligence: Marketers seeking to report on price sensitivity are comparing prices quoted, prices invoiced, and prices paid net of discount. Data across all of the ERP or transactional systems in use must be classified such that the BI Team is assured that every field marked as “Price” holds the correct type of price.
Marketing Analysis: Your customer 360° program seeks to understand external factors that may have influenced pricing and the discounts provided. Which customers are related to the prices referenced above? What kind of customers are they (industry, buying frequency, average purchase, …)? How can I correlate those with external events (elections, new regulation, natural disasters, …)? All of this analysis is supported by data that is classified to reflect the types of queries that may occur.
Operational Efficiency: Your COO wants to ensure that the acquisition process is fully optimized, and seeks to benchmark operations using the SCOR (Supply Chain Operations Reference) Model. The Operations Team downloads the 250 SCOR performance metrics and seeks to map those to the relevant data. Classification supports the ability to find the right data and map it to the data specified in the SCOR Model.
Compliance & Risk Management: The Compliance Team must produce a catalog of all data that is personal information and organize it by business unit. Data classification that attaches labels identifying the business unit and the type of data drives what data is included in that report.
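To make the compliance use case concrete, here is a minimal sketch of how classification labels could drive that report. Everything in it is an assumption made for illustration: the `CatalogEntry` structure, the label values, the system and field names, and the `fields_by_business_unit` helper are all hypothetical and do not reflect any particular catalog product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One classified field in a hypothetical data asset catalog."""
    source_system: str
    field_name: str
    business_unit: str     # classification label: owning business unit
    data_type_label: str   # classification label: kind of data held


# Illustrative entries only; all names and labels are made up.
CATALOG = [
    CatalogEntry("erp_emea", "cust_email", "Sales", "personal_information"),
    CatalogEntry("erp_emea", "inv_amt", "Finance", "price_invoiced"),
    CatalogEntry("crm", "contact_phone", "Sales", "personal_information"),
    CatalogEntry("hr_system", "employee_ssn", "Human Resources", "personal_information"),
    CatalogEntry("erp_apac", "quote_amt", "Sales", "price_quoted"),
]


def fields_by_business_unit(catalog, data_type_label):
    """Group the fields carrying a given data-type label by business unit."""
    report = defaultdict(list)
    for entry in catalog:
        if entry.data_type_label == data_type_label:
            report[entry.business_unit].append(
                f"{entry.source_system}.{entry.field_name}"
            )
    # Sort units and fields so the report is stable from run to run.
    return {unit: sorted(fields) for unit, fields in sorted(report.items())}


if __name__ == "__main__":
    report = fields_by_business_unit(CATALOG, "personal_information")
    for unit, fields in report.items():
        print(f"{unit}: {', '.join(fields)}")
```

The same lookup with a label such as `price_quoted` or `price_invoiced` is how the BI Team in the first use case could confirm which kind of price a field actually holds. The report can only be as complete as the labels attached to the data.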
Bottom line, if classification is not something that you have thought about, consider putting a plan together. It is the key to releasing the value of your data, and fully leveraging data as an asset.