Catalog projects defined transversely, across the whole organization, have taken too long, cost a lot of money, and produced no financial value.
A lot has been said about the benefits of controlling your data as an organization, whether the data originates inside the organization or arrives from external sources.
The proliferation of data sources, resulting from the many data systems inside organizations and outside them (social media, suppliers’ portals, and more), requires the establishment of a data catalog. The data catalog is a repository containing all of the organization’s meta-data, with the purpose of helping data consumers (clients, analysts, data scientists, etc.) search for and find the most relevant and reliable data source for their needs.
To better illustrate this point, imagine a big library in which the books are not organized and there is no index for finding a particular book. A library like this cannot function. Unfortunately, this is the situation in most organizations: there is a partial index, but it’s out of date, and sometimes it doesn’t cover all the data or sources.
In light of the need for data management and its benefits, it is necessary to examine why the majority of data catalog projects have not succeeded.
Whose data is this?
One of the most important issues to clarify is who is in charge of the data. Is it IT, or the users themselves, who feed the data and use the different programs? An example helps clarify the question: if a program is missing the addresses of several clients, who is in charge of that? The end user. But if invalid default values were set in another program, or database columns were defined incorrectly, IT is responsible for that. So, when it comes to data quality, each party is responsible for its own mistakes.
In order to build the catalog in the right way, every group of users has to be fully represented.
Integration components
In order to build and maintain a data catalog, we need four components:
- A meta-data loading mechanism that captures the data structure.
- A simple and modern user interface for entering the business context of each column.
- A robust search engine that locates the relevant data source and states its data quality level using automated Data Profiling.
- A friendly user interface that presents the Data Lineage of every data item.
Technically, the majority of existing data catalog-building tools do support these four components to a certain level. The reasons data catalog implementations fail therefore lie elsewhere…
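As a rough illustration of how these four components relate to one another, here is a minimal sketch in Python. It is not taken from any particular catalog product: the names `Column`, `CatalogEntry`, `load_metadata`, `profile`, and `search`, the sample source `crm.clients`, and the choice of "share of non-empty values" as the quality measure are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    dtype: str
    business_context: str = ""  # component 2: entered by a business user


@dataclass
class CatalogEntry:
    source: str
    columns: list[Column]
    upstream: list[str] = field(default_factory=list)  # component 4: lineage
    completeness: dict[str, float] = field(default_factory=dict)  # profiling result


def load_metadata(source, rows, upstream=()):
    """Component 1: derive column names and types from the first sample row."""
    columns = [Column(name, type(value).__name__) for name, value in rows[0].items()]
    return CatalogEntry(source, columns, list(upstream))


def profile(entry, rows):
    """Component 3 (profiling): share of non-empty values per column."""
    for col in entry.columns:
        filled = sum(1 for r in rows if r.get(col.name) not in (None, ""))
        entry.completeness[col.name] = filled / len(rows)


def search(catalog, term):
    """Component 3 (search): match the term against column name or business context."""
    term = term.lower()
    hits = []
    for entry in catalog:
        for col in entry.columns:
            if term in col.name.lower() or term in col.business_context.lower():
                hits.append((entry.source, col.name, entry.completeness.get(col.name)))
    return hits


# Hypothetical sample data: two of the four clients have no address.
rows = [
    {"client_id": 1, "address": "12 Main St"},
    {"client_id": 2, "address": ""},
    {"client_id": 3, "address": "7 Oak Ave"},
    {"client_id": 4, "address": None},
]
entry = load_metadata("crm.clients", rows, upstream=["web_signup_form"])
entry.columns[1].business_context = "Postal address used for client invoices"
profile(entry, rows)
results = search([entry], "invoice")
print(results)  # [('crm.clients', 'address', 0.5)]
```

The search result carries the quality figure alongside the source, so a data consumer sees at a glance that only half of the addresses in this hypothetical source are filled in. A real tool would profile far more than completeness and would read lineage from the systems themselves.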
Defining the scale of the project and its focus – SOW
A transverse definition of a catalog project will not work. We have learned that from dozens of failed projects: they took far too long, cost a lot of money, and did not produce any financial value.
A data catalog project’s success requires defining the project’s scale, focusing the project, and making sure it solves an acute business problem, so that a specific unit in the organization gets valuable results. After a successful project, the catalog can be extended to another unit according to priority and cost considerations.
Defining the client or clients
Owner
- An owner must be defined at the executive management level: the business user who will accompany the project and review its results quantitatively.
- The client must be committed to participating in the establishment and maintenance process, including allocating personnel to the catalog and to supporting ongoing change.
- The client must commit to allocating personnel for data optimization when needed.
- The new software must have users and clients (data analysts) who will look for and retrieve the relevant data. Note that whoever builds the catalog and fixes malfunctions is not necessarily the consumer.
IT
Most clients that define the project as an IT-centered project will find themselves without a budget, unable to present an ROI, and without any results or users.
- Defining the project as a sub-project for each department, topic, or area in the organization, with quantitative success measures.
- Building an organized methodology for integration based on cumulative experience and projects done elsewhere.
- Quick delivery of results following a defined schedule, expanding the catalog every six months by loading more data sources.
- Enabling Self Service data retrieval for the end user (data analyst), while making sure the process is continuous and has a suitable permissions mechanism.
- Allocating resources for a business support center, not necessarily a technical one, that helps the organization’s users with data quality and search.
Both sides need to understand that the process is ongoing and has no end because changes will continue flowing, both in data structures and data sources.
In conclusion
Successful data catalog projects do exist. The approach we recommend is to train IT and the business client together: on the tools, on defining success measures, and on the workflow and methodology, involving relevant job holders such as data owners, data stewards, and more. It’s important to limit the project to a specific topic with specific, measurable success criteria and, from that point, to expand the project to other areas.
The bottom line: an organization that wishes to get the most out of its data must be ready for a change in which the business customer is an active partner in data quality and in the catalog, in service of the organization’s changing business goals.
The writer is Yossi Rodrik, CEO of Aqurate