Executive summary.


There’s a lot of buzz around data catalogs right now, and a growing number of solutions from a growing number of vendors. What exactly is a data catalog? And how do you avoid getting lost while selecting the right one? This guide walks through the basics of what a catalog is and how it works, what business challenges it can help solve, and how to avoid common pitfalls and choose the right catalog for your needs.

What is a data catalog and how does it work?


There are many misconceptions about what a data catalog is and what it can do. So, what is it? In a nutshell, a data catalog is a place that shows what data assets you have and where they are located. You might be asking, what is a data asset? It is any entity (e.g., a report, database, or website) that contains data. How does a data catalog work? How does a catalog help organizations get a handle on their data and, more importantly, use it to make decisions and drive business value? The next section shows a simple diagram that outlines how a data catalog solution can work to deliver business outcomes.
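To make the definition concrete, here is a minimal sketch of a catalog as an inventory of data assets and their locations. It is an illustration only, not how any particular product works: the `DataAsset` and `DataCatalog` classes, the asset names, and the example locations are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataAsset:
    """One catalog entry: what the asset is and where it lives."""
    name: str
    kind: str        # e.g. "report", "database", "website"
    location: str    # hypothetical connection string or URL
    tags: frozenset = field(default_factory=frozenset)


class DataCatalog:
    """Hypothetical in-memory catalog keyed by asset name."""

    def __init__(self):
        self._assets = {}

    def register(self, asset):
        self._assets[asset.name] = asset

    def search(self, keyword):
        """Answer "what data do I have?" by matching names or tags."""
        kw = keyword.lower()
        matches = (
            a for a in self._assets.values()
            if kw in a.name.lower() or kw in {t.lower() for t in a.tags}
        )
        return sorted(matches, key=lambda a: a.name)

    def locate(self, name):
        """Answer "where is it?" for a single asset."""
        return self._assets[name].location


catalog = DataCatalog()
catalog.register(DataAsset("customer_orders", "database",
                           "postgres://sales-db/orders",
                           frozenset({"customer", "sales"})))
catalog.register(DataAsset("churn_report", "report",
                           "https://bi.example.com/reports/churn",
                           frozenset({"customer"})))
catalog.register(DataAsset("web_traffic", "website",
                           "https://www.example.com/analytics",
                           frozenset({"marketing"})))

for asset in catalog.search("customer"):
    print(asset.name, "->", asset.location)
```

A real catalog would populate these entries automatically from the underlying systems rather than by hand; the sketch only shows the shape of the information being tracked.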

How does an optimal data catalog work?

Here are the 5 stages showing how a data catalog can deliver on the business outcome "I want to delight my customers":

[Figure: Data Catalog Full Process]

Do I need a data catalog?


With the tremendous growth in the volume of data, increased access to multiple data sources, and new compliance regulations, organizations are working to “get a handle” on their enterprise-wide data. They must be able to answer the questions:

  • What data do I have?
  • Where is it?
  • Is it trusted?

As a result, data catalog solutions have gone from being a “nice to have” to a “must have” in the arsenal of data governance capabilities. In a recent research report, Data Catalogs Are the New Black in Data Management and Analytics [1], Gartner reports that demand for data catalogs is soaring as organizations struggle to inventory their distributed data assets to facilitate data monetization and conform to regulations.

How do you know if you need a data catalog?

If you find yourself saying the following, you may need a data catalog (or data catalog + governance) solution:

“I NEED BETTER ANALYTICS”

Many organizations are asking how they can get more value from analytics and have better visibility into their data. The introduction of IoT and digital transformation initiatives has resulted in an abundance of data. Now organizations need to find the available data and confirm that it can be trusted so it can be used for decision making.

“I’VE INVESTED IN B.I., BUT IS THE REPORTING DATA CORRECT?”

There has been a surge in investment in B.I. software. Locating the right data for analysis and reporting is a challenge that must be solved when implementing B.I. Some organizations are able to locate their data but cannot identify its source to confirm that it’s valid. Others are finding conflicting results between two different reports.

“HELP! MY DATA LAKE HAS BECOME A DATA SWAMP.”

Your data lake seemed to be the answer to all of your problems. However, business stakeholders are now unable to get the information they need from it. No one is certain what data exists in the lake or how to access it.

“WILL I PASS A GDPR AUDIT? IF NEEDED, CAN I PROVIDE A CUSTOMER WITH ALL OF THEIR PERSONAL DATA?”

There’s a lot of concern around GDPR and growing scrutiny around consumer privacy. If a customer asks to exercise a data subject right, such as the right to be forgotten, can you quickly locate all of their personal data and fulfill the request?
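As a rough illustration of why this matters, the sketch below assumes a catalog in which each asset has already been tagged with the fields that hold personal data. The entries, field names, and the `personal_data_locations` helper are hypothetical; the point is only that a tagged inventory turns "where is the personal data?" into a simple lookup.

```python
# Hypothetical catalog entries: each asset records where it lives and
# which of its fields have been tagged as personal data.
CATALOG = [
    {"asset": "crm_contacts", "location": "crm-db.contacts",
     "personal_fields": ["email", "phone"]},
    {"asset": "order_history", "location": "warehouse.orders",
     "personal_fields": ["shipping_address"]},
    {"asset": "product_prices", "location": "warehouse.prices",
     "personal_fields": []},
]


def personal_data_locations(catalog):
    """Return {location: [fields]} for every asset holding personal data."""
    return {
        entry["location"]: list(entry["personal_fields"])
        for entry in catalog
        if entry["personal_fields"]
    }


locations = personal_data_locations(CATALOG)
for location, fields in locations.items():
    print(f"{location}: {', '.join(fields)}")
```

Finding the locations is only the first step; actually retrieving or erasing one customer's records still has to happen in each source system.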

“HOW DO I PREPARE MY ORGANIZATION FOR A.I.?”

As A.I. moves into the mainstream, organizations are finding that identifying the right data to inform the algorithm is critical. This applies both to the input data and to the features of the data itself, including tags, metadata, and user data. The first step in this process is therefore to discover and catalog the data.

In all of these cases, there is a common thread. Organizations must be able to answer “What data do we have and where is it?” But they also need to understand how it connects to their enterprise metadata, and more importantly to their business outcomes. 

As organizations flock to the most popular solutions, take heed of Gartner’s advice. Gartner cautions that organizations must take the time to find the “right” solution and make sure that it can be aligned with organizational initiatives. As stated in the recent Gartner research [1]: “Data catalog projects will fall short of their full potential if data and analytics leaders don’t link them to broader data management needs.” See Pitfall #2.

What is the typical implementation timeline and how do I avoid pitfalls?


Data catalogs should be easy to implement within a few weeks to a few months. However, there are a few reasons why companies might experience more painful, time-consuming projects. If you have done your due diligence and selected a data catalog that is cloud-based, “on the stack” and aligned with your enterprise information management (EIM) and enterprise metadata management strategies, then it should be smooth sailing. However, if you have decided on a catalog that requires upfront customization, specific hardware or a team of specialized developers, then you might be looking at a costly project.

PITFALL #1 DON’T TAKE A VENDOR’S WORD FOR IT.

Vendors want to sell you their solution, so weaknesses and limitations are sometimes glossed over. It is your job to make sure that you aren’t falling for “market-tecture”. When deciding on a catalog, check popular review sites like Gartner Peer Insights, speak with analysts and make sure you ask references about implementation.

PITFALL #2 DON’T BE SHORTSIGHTED.

According to Gartner, companies should “Avoid data catalogs that do not have the ability to scale out beyond tactical use case requirements and connect to the broader enterprise metadata management and EIM initiatives.” [1] Some companies are choosing data catalogs based on a single, tactical use case, such as inventorying the data in their data lakes. It’s important to understand that deploying a catalog for one tool or use case will improve data usability, trust and shareability ONLY for that specific tool. This ultimately creates the need for a data catalog of all the data catalogs in your architecture, which is not the way to enable effective monetization in the long term. Before selecting a data catalog for one specific use case, make sure that you have evaluated options that span use cases and are connected to your broader EIM needs.

PITFALL #3 DON’T ASSUME THAT EVERY CATALOG IS USABLE BY EVERYBODY.

Some catalogs are built for more technically minded users who work in SQL. These catalogs have some high-tech capabilities and provide a full picture of the technical lineage and provenance of every bit of data in the ecosystem. Others are built for business users who don’t care about SQL or technical lineage, but want to see the data that matters for the initiative they care about in a user-friendly way. Who is going to be using your catalog, and for what reason? Make sure that you don’t try to force your business users into being IT coding experts. This could cause serious issues with adoption and ROI.

How do I choose the best data catalog?


It’s important to spend the time up front to identify what functionality is important to your organization. You might find that different groups have different needs. Having this list defined when you start your search will help ensure you’re selecting the right solution. At a bare minimum, data catalogs should be able to:

  • Discover what data is available
  • Identify where it is located
  • Provide information on whether that data can be trusted

Once you’ve checked the box on that basic functionality, there are a few other things you should consider to ensure your catalog can be used to add business value in the future:

  • Will it provide real-time integration with your data sources, so that you are continuously populating the data catalog with the data that is critical to you?
  • Is it easy to use?
  • Can it search all of your databases, whether on-premises or in the cloud?
  • Will you be able to connect your data assets directly to organizational goals and initiatives so that you can see and measure how data drives your business?
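One way to apply the criteria above when comparing candidates is a simple weighted checklist. The sketch below is only an illustration under stated assumptions: the requirement names paraphrase the criteria in this section, while the weights, the `score` function, and the two example catalogs are hypothetical and should be replaced with your own priorities.

```python
# Hypothetical weights (higher = more important to your organization).
REQUIREMENTS = {
    "discovers_available_data": 3,
    "identifies_location": 3,
    "indicates_trust": 3,
    "realtime_integration": 2,
    "easy_to_use": 2,
    "searches_onprem_and_cloud": 2,
    "links_assets_to_business_goals": 1,
}

# The bare-minimum capabilities: a candidate missing any of these is rejected.
MUST_HAVES = {"discovers_available_data", "identifies_location", "indicates_trust"}


def score(capabilities):
    """Return the weighted score, or None if a must-have is missing."""
    if not MUST_HAVES <= capabilities:
        return None
    return sum(weight for req, weight in REQUIREMENTS.items()
               if req in capabilities)


# Hypothetical candidates and the capabilities each one claims.
candidates = {
    "Catalog A": {"discovers_available_data", "identifies_location",
                  "indicates_trust", "easy_to_use"},
    "Catalog B": {"discovers_available_data", "identifies_location",
                  "realtime_integration", "easy_to_use"},
}

results = {name: score(caps) for name, caps in candidates.items()}
for name, result in results.items():
    print(name, "rejected" if result is None else result)
```

Treating the three minimum capabilities as a hard filter, rather than just high weights, keeps a candidate from compensating for a missing basic with a long list of extras.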

Comparing data catalog solutions? Use this checklist to determine your critical needs and communicate them to vendors.


Where can I learn more?


Recent research reports outline what to look for in a data catalog and governance solution, and how all of the vendors stack up:

[1] Data Catalogs Are the New Black in Data Management and Analytics, by Ehtisham Zaidi, Guido De Simoni, Roxane Edjlali and Alan D. Duncan, Gartner, December 13, 2017.

Forrester Wave for Data Governance, Stewardship and Discovery Software, by Henry Peyret with Alex Cullen, Alex Kramer and Sam Bartlett, June 26, 2017.

About DATUM

DATUM drives decision integrity across any enterprise, empowering organizations to discover the right data and make the right decisions faster.


By focusing on what data matters and why, DATUM's proven data governance and stewardship platform, Information Value Management®, delivers business value insights. Today, Fortune 500 companies trust DATUM as the data governance system of record to improve operational efficiency, deliver greater analytical insights and simplify compliance and regulatory reporting.

DATUM was named a Leader in The Forrester Wave™: Data Governance, Stewardship and Discovery Providers 2017 and in Gartner's 2017 Metadata Management Magic Quadrant. DATUM has also received the top score in the Bloor Research Data Governance Market Update report.

Request a demo.

Ask us how our data catalog helps you find, understand and use your data.

[Photo: Cameron Ogden]
ABOUT THE AUTHOR

Cam's passion is building software products that generate significant value for teams and companies. Cam is responsible for driving product strategy, road-mapping, and execution across all DATUM customers and partners.

What can your data do for you?

A data catalog is just the beginning. What will you accomplish with better access to your data?

