Many government agencies and corporations are currently examining the best metadata management tools in the marketplace to decide which of these tools, if any, meet the requirements for their metadata management solutions.

Often these same organizations want to know what types of functionality and features they should be looking for in this tool category.  Unfortunately, this question becomes very complicated, as each tool vendor puts its own “marketing spin” on which functions and features are the most advantageous.  This leaves the consumer with a very difficult task, especially when it seems that none of the vendors’ tools fully fits the requirements of the metadata management solution.  At EWSolutions, we have several clients who share these concerns about the tools in the market.

Although I have no plans to start a software company, I would like to take this opportunity to play software designer and present the key functionality of my optimal metadata tool.

One of the challenges with this exercise is that metadata functionality has a great deal of depth and breadth.  Therefore, to categorize our tool’s functionality, I will use the six major components of a managed metadata environment (MME):

  • Meta Data Sourcing Layer
  • Meta Data Integration Layer
  • Meta Data Repository
  • Meta Data Management Layer
  • Meta Data Marts
  • Meta Data Delivery Layer

I will now walk through each of these MME components and describe the key functionality that my optimal metadata tool would contain.

Definition and Importance of Metadata Management

Metadata management is the process of collecting, organizing, storing, and maintaining metadata, which is data that describes other data. It is a crucial aspect of data management, as it enables organizations to understand the context, structure, and meaning of their data assets. Effective metadata management is essential for data governance, data quality, and data discovery, as it provides a centralized system for cataloging, tracking, and analyzing data lineage, data relationships, and data usage patterns.

By implementing robust metadata management practices, organizations can ensure that their data assets are well-documented and easily accessible. This not only enhances data quality but also supports data governance initiatives by providing clear visibility into data lineage and usage. Furthermore, effective metadata management facilitates data discovery, enabling users to quickly locate and utilize the data they need for decision-making and analysis. In essence, metadata management is the backbone of a well-organized and efficient data management strategy.

Types of Metadata

There are several types of metadata, each serving a unique purpose in the management and utilization of data assets:

  • Descriptive Metadata: This type of metadata provides information about the data itself, such as the title, author, and date created. It helps users understand the content and context of the data, making it easier to locate and use.
  • Structural Metadata: Structural metadata describes the organization and structure of the data, including file format, size, and relationships between data elements. This type of metadata is crucial for understanding how data is stored and accessed.
  • Administrative Metadata: Administrative metadata provides information about the management and preservation of the data. This includes access permissions, storage location, and retention policies. It ensures that data is properly managed and maintained over its lifecycle.
  • Technical Metadata: Technical metadata includes details about the technical aspects of the data, such as file format, compression algorithm, and hardware/software requirements. This metadata is essential for ensuring that data can be properly processed and utilized by various systems and applications.

Understanding these different types of metadata is essential for effective metadata management, as each type plays a critical role in the overall data management process.
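
To make these four categories concrete, here is a minimal Python sketch of how they might be grouped for a single data asset. The class and field names are hypothetical, chosen only to mirror the examples above, not taken from any standard:

```python
from dataclasses import dataclass, field

@dataclass
class AssetMetadata:
    """Hypothetical record grouping the four metadata types for one data asset."""
    # Descriptive metadata: content and context of the data
    title: str
    author: str
    date_created: str
    # Structural metadata: organization, format, and relationships
    file_format: str
    size_bytes: int
    related_assets: list = field(default_factory=list)
    # Administrative metadata: management and preservation
    access_permissions: str = "restricted"
    storage_location: str = ""
    retention_policy: str = ""
    # Technical metadata: processing requirements
    compression: str = "none"
    software_required: str = ""

customer_file = AssetMetadata(
    title="Customer Master", author="Data Stewardship Team",
    date_created="2004-01-15", file_format="CSV", size_bytes=2_400_000,
    storage_location="//fileserver/crm/customers.csv",
    retention_policy="7 years", software_required="any CSV reader",
)
print(customer_file.title, customer_file.retention_policy)
```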

Meta Data Sourcing & Meta Data Integration Layers

For simplicity’s sake, I will discuss this “dream” tool’s functionality for the metadata sourcing and metadata integration layers together.  The goal of these two layers is to extract the metadata from its source, integrate it where necessary, and bring it into the metadata repository.

Platform Flexibility

The metadata sourcing technology must be able to work with mainframe applications, with distributed systems, with files on a network (databases, flat files, spreadsheets, etc.), and with remote locations.  These functions must be able to run in each of these environments so that the metadata can be brought into the repository.

Pre-built Bridges

Many of the current metadata integration tools come with a series of pre-built metadata integration bridges.  The optimal metadata tool would also have these pre-built bridges.  Where our optimal tool would differ from the vendor tools is that it would have bridges to all of the major relational database management systems (e.g., Oracle, DB2, SQL Server), the most common vendor packages, several code parsers (COBOL, JCL, C++, SQL, XML, etc.), the key data modeling tools (ERwin, ER/Studio, PowerDesigner, etc.), the top ETL (extraction, transformation and load) tools (e.g., Informatica) and the major front-end tools (e.g., Business Objects, Cognos, etc.).

As much as possible, I would want my metadata tool to utilize XML (Extensible Markup Language) as the transport mechanism for the metadata.  While XML cannot directly interface with all metadata sources, it would cover a great number of them.  These metadata bridges would not just bring metadata from its source and load it into the repository; they would be bi-directional, allowing metadata to be extracted from the metadata repository and brought back into the source tool.  Lastly, these metadata bridges would not just be extraction processes, but would also have the ability to act as “pointers” to where the metadata is located.  This distributed metadata capability is very important for a repository to have.
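
To illustrate the XML-as-transport idea, the sketch below shows a toy bi-directional bridge: it parses a small, hypothetical metadata document into an in-memory “repository” and can regenerate the XML from it. The element names and document structure are invented for the example; real bridges would target the vendor formats listed above:

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata extract; real bridges would emit vendor-specific XML.
SOURCE_XML = """
<metadata source="Oracle" schema="SALES">
  <table name="CUSTOMER">
    <column name="CUST_ID" type="NUMBER(10)"/>
    <column name="CUST_NAME" type="VARCHAR2(100)"/>
  </table>
</metadata>
"""

def load_bridge(xml_text, repository):
    """Inbound direction: parse the XML and catalog each column."""
    root = ET.fromstring(xml_text)
    for table in root.iter("table"):
        for column in table.iter("column"):
            key = (root.get("schema"), table.get("name"), column.get("name"))
            repository[key] = {"type": column.get("type"),
                               "source": root.get("source")}

def export_bridge(repository, schema, table):
    """Outbound direction: rebuild XML for one table from the repository."""
    root = ET.Element("metadata", schema=schema)
    tbl = ET.SubElement(root, "table", name=table)
    for (s, t, col), attrs in repository.items():
        if s == schema and t == table:
            ET.SubElement(tbl, "column", name=col, type=attrs["type"])
    return ET.tostring(root, encoding="unicode")

repo = {}
load_bridge(SOURCE_XML, repo)
print(export_bridge(repo, "SALES", "CUSTOMER"))
```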

Error Checking & Restart

Any high-quality metadata tool would have extensive error-checking capability built into the sourcing and integration layers.  Metadata in an MME, like data in a data warehouse, must be of high quality or it will have little value.  This error-checking facility would examine the metadata as it is read, flag any errors, and capture statistics on the errors the process encounters (meta metadata). In addition, the tool would support multiple error levels for the metadata.

For example, it would give the tool administrator the ability to configure the action taken based on the error that occurred in the process.  Should the metadata be:

  • Flagged with an informational message
  • Flagged as an error and not loaded into the repository
  • Flagged as a critical error, stopping the entire metadata integration process

Also, this process would have “check points” that allow the tool administrator to restart the process. These check points would be placed at the proper locations to ensure that the process can be restarted with the least impact on the metadata itself and on its source locations.
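
A minimal sketch of how the configurable error levels and checkpoint restart could fit together appears below. The level names, actions, and checkpoint file are assumptions made for illustration, not a prescription:

```python
import json, os

# Hypothetical administrator configuration: the action taken per error level.
ACTIONS = {"INFO": "flag", "ERROR": "flag_and_skip", "CRITICAL": "stop"}
CHECKPOINT_FILE = "integration.checkpoint"

def validate(record):
    """Toy validation: return an error level, or None if the record is clean."""
    if not record.get("name"):
        return "CRITICAL"
    if not record.get("definition"):
        return "ERROR"
    return None

def run_integration(records, repository, stats):
    """Load records into the repository, resuming from the last checkpoint."""
    start = 0
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            start = json.load(f)["next"]
    for i in range(start, len(records)):
        level = validate(records[i])
        if level:
            stats[level] = stats.get(level, 0) + 1        # meta metadata
            if ACTIONS[level] == "stop":
                with open(CHECKPOINT_FILE, "w") as f:     # checkpoint for restart
                    json.dump({"next": i}, f)
                raise RuntimeError(f"critical error at record {i}; fix and rerun")
            if ACTIONS[level] == "flag_and_skip":
                continue                                  # flagged, not loaded
        repository.append(records[i])
    if os.path.exists(CHECKPOINT_FILE):
        os.remove(CHECKPOINT_FILE)                        # clean finish
```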

Meta Data Repository

The metadata repository component is the physical database that persistently catalogs and stores the actual metadata.  The repository and its corresponding meta model comprise the backbone of the MME. Therefore, in listing out the optimal metadata tool’s functionality, I will pay special attention to the design and implementation of the meta model.

Advancing Metadata Repository Design for Enhanced Data Utilization

A central metadata repository forms the foundation of any robust metadata management platform, enabling the efficient cataloging of metadata and streamlining data discovery. To ensure maximum utility, the repository should integrate seamlessly with virtually any metadata provider, allowing organizations to harvest and catalog metadata across diverse sources. Incorporating a data dictionary into the repository helps classify and organize key data elements, improving the accessibility and usage of the organization’s data assets. Proper metadata repository design also facilitates data profiling, data mapping, and data classification, empowering data and analytics teams to maintain a clear view of reference data and sensitive data. By adopting tools with advanced metadata management capabilities, organizations can support data-driven decision-making and enforce data governance policies effectively, ensuring compliance and improved data quality.

Meta Model Design

A meta model is a physical database schema for metadata.  Anytime an MME is implemented, there are integration processes that must be custom-built to bring metadata into the repository. Therefore, a good meta model needs to be understandable to the repository developers working with it.

As a result, the meta model should not be designed in a highly abstracted, object-oriented manner. Instead, mixing classic relational modeling with structured object-oriented design is the preferable approach to designing a meta model.

When a highly cryptic (abstracted) object-oriented design is used for the construction of the meta model, it becomes unwieldy and difficult for IT developers to work with. The possible exception to this guideline is an abstracted object-oriented model with relational views built on top of it that allow for read/write/update capabilities. These views must be understandable and fully extendable.

Meta Model Implementation

The metadata repository must not be housed in a proprietary database management system.  Instead, it should be stored on any of the major open relational database platforms (e.g. SQL Server, Oracle, DB2) so that standard SQL can be used with the repository.
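
As a small illustration of the “standard SQL on an open platform” point, and of the developer-friendly relational views mentioned earlier, the following sketch builds a two-table fragment of a meta model with a view over it. Python’s built-in sqlite3 is used here only as a stand-in for one of the major platforms, and all table, column, and view names are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for Oracle, DB2, or SQL Server
conn.executescript("""
CREATE TABLE md_table (
    table_id   INTEGER PRIMARY KEY,
    schema_nm  TEXT NOT NULL,
    table_nm   TEXT NOT NULL
);
CREATE TABLE md_column (
    column_id  INTEGER PRIMARY KEY,
    table_id   INTEGER NOT NULL REFERENCES md_table(table_id),
    column_nm  TEXT NOT NULL,
    data_type  TEXT,
    definition TEXT
);
-- A developer-friendly relational view over the meta model.
CREATE VIEW v_column_catalog AS
SELECT t.schema_nm, t.table_nm, c.column_nm, c.data_type, c.definition
FROM md_table t JOIN md_column c ON c.table_id = t.table_id;
""")
conn.execute("INSERT INTO md_table VALUES (1, 'SALES', 'CUSTOMER')")
conn.execute("INSERT INTO md_column VALUES "
             "(1, 1, 'CUST_ID', 'NUMBER(10)', 'Unique customer key')")
for row in conn.execute("SELECT * FROM v_column_catalog"):
    print(row)  # standard SQL works against the repository
```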

Semantic Taxonomy

Many government agencies’ and large corporations’ IT departments are looking to define an enterprise-level classification/definition scheme for their data.  This semantic taxonomy would provide these organizations with the ability to classify their data in order to identify data and process redundancies in their IT environment.  Therefore, the optimal metadata tool would provide the capabilities to capture, maintain, and publish a semantic taxonomy for the metadata in the repository.
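
To show how a captured taxonomy could expose redundancy, here is a small hypothetical sketch: physical data elements from different systems are classified against taxonomy terms, and any term with more than one mapped element becomes a redundancy candidate. The terms and element names are invented for the example:

```python
from collections import defaultdict

# Hypothetical classifications: (taxonomy term, physical data element).
classifications = [
    ("Party.Customer.Identifier", "CRM.CUSTOMER.CUST_ID"),
    ("Party.Customer.Identifier", "BILLING.ACCT.CUSTOMER_NO"),
    ("Party.Customer.Name",       "CRM.CUSTOMER.CUST_NAME"),
]

# Group the physical elements under each taxonomy term.
by_term = defaultdict(list)
for term, element in classifications:
    by_term[term].append(element)

# Any term mapped to multiple elements is a candidate data redundancy.
for term, elements in by_term.items():
    if len(elements) > 1:
        print(f"Redundancy candidate under '{term}': {elements}")
```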

Meta Data Management Layer

The purpose of the metadata management layer is to provide the systematic management of the metadata repository and the other MME components.  This layer provides many functions (see Figure 1: Meta Data Management Layer), including:

  • Archiving – archives the metadata within the repository
  • Backup – backs up the metadata on a scheduled basis
  • Database Modifications – allows for extending the repository
  • Database Tuning – provides the classic tuning of the database holding the meta model
  • Environment Management – provides the processes that allow the repository administrator to manage and migrate between the different versions/installs of the metadata repository
  • Job Scheduling – manages both the event-based and trigger-based metadata integration processes
  • Purging – handles the definition of the criteria required to define the MME purging requirements
  • Recovery – provides a recovery process tightly tied into the backup and archiving facilities of the repository
  • Security Processes – provide the functionality to define security restrictions from an individual and group perspective
  • Versioning – metadata is historical, so the tool would need to version the metadata by the date/time of its entry into the MME (a small sketch follows Figure 1)

Figure 1 – Metadata Management Layer functions
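
Versioning, the last function in the list above, lends itself to a simple illustration. The sketch below is a minimal, hypothetical example (the class and sample values are invented) of keeping every historical value of a metadata attribute keyed by its date/time of entry into the MME:

```python
from datetime import datetime

class VersionedAttribute:
    """Keeps the full history of a metadata value, keyed by entry timestamp."""
    def __init__(self):
        self.history = []          # list of (timestamp, value), oldest first

    def set(self, value, when=None):
        """Record a new version at its date/time of entry into the MME."""
        self.history.append((when or datetime.now(), value))

    def as_of(self, when):
        """Return the value that was in effect at a given point in time."""
        current = None
        for ts, value in self.history:
            if ts <= when:
                current = value
        return current

definition = VersionedAttribute()
definition.set("A customer is anyone with an open account.", datetime(2003, 5, 1))
definition.set("A customer is anyone who purchased in 24 months.", datetime(2004, 2, 1))
print(definition.as_of(datetime(2003, 12, 31)))  # the 2003 definition
```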

The optimal metadata tool would also have very good documentation on all of its components, processes, and functions.  Interestingly enough, too many of the current metadata vendors neglect to provide good documentation with their tools. If a company wants to be taken seriously in the metadata arena, it must “eat its own dog food”.

Enhancing Metadata Management with Collaborative Features

Effective metadata management tools play a pivotal role in fostering collaboration and ensuring compliance within organizations. Features such as data lineage tracking, data cataloging, and business glossary integration enable teams to work cohesively by providing clear insights into the origin, transformation, and usage of data. By offering functionalities like active metadata management and data quality monitoring, these tools empower data teams to maintain consistency across enterprise data catalogs, ensure adherence to data governance policies, and support the organization’s broader data governance strategy. Additionally, solutions like Oracle Enterprise Metadata Management (OEMM) and Informatica Metadata Management stand out for their advanced automation and AI-driven capabilities, which enhance metadata’s accessibility and usability across business units.

Data Discovery and Cataloging

Data discovery and cataloging are critical components of metadata management. Data discovery involves identifying and locating data assets within an organization, while data cataloging involves creating a centralized repository of metadata that describes these data assets. A data catalog provides a single source of truth for data assets, making it easier for users to find, understand, and access the data they need.

A well-implemented data catalog enhances data discovery by providing detailed descriptions, classifications, and relationships of data assets. This centralized repository not only improves data accessibility but also supports data governance by ensuring that data is consistently documented and managed. By leveraging data cataloging tools, organizations can streamline their data management processes, improve data quality, and facilitate better decision-making.
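
The core behavior of a data catalog, a single searchable source of truth over asset descriptions, can be sketched in a few lines. The entries, fields, and tags below are illustrative only:

```python
# Hypothetical catalog entries: asset name -> description and classification tags.
catalog = {
    "CRM.CUSTOMER": {"description": "Master list of customers", "tags": ["party", "pii"]},
    "SALES.ORDERS": {"description": "Order transactions",       "tags": ["transaction"]},
    "HR.EMPLOYEES": {"description": "Employee records",         "tags": ["party", "pii"]},
}

def discover(keyword):
    """Data discovery: return assets whose description or tags match the keyword."""
    keyword = keyword.lower()
    return [name for name, entry in catalog.items()
            if keyword in entry["description"].lower() or keyword in entry["tags"]]

print(discover("pii"))  # ['CRM.CUSTOMER', 'HR.EMPLOYEES']
```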

Data Lineage and Impact Analysis

Data lineage and impact analysis are essential for understanding how data is created, transformed, and consumed within an organization. Data lineage involves tracking the flow of data from its source to its various transformations and consumption points, providing a clear view of the data’s journey through the organization. This visibility is crucial for ensuring data integrity and compliance with data governance policies.

Impact analysis, on the other hand, involves analyzing the potential impact of changes to data sources or processes on downstream applications and users. By understanding the dependencies and relationships between data elements, organizations can assess the risks and implications of proposed changes, ensuring that they make informed decisions about data management and governance. Effective data lineage and impact analysis enable organizations to maintain data quality, support data governance initiatives, and enhance overall data management practices.
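
Both capabilities reduce to walking a dependency graph: lineage looks upstream from an element to its sources, while impact analysis looks downstream to everything the element feeds. A minimal sketch, with hypothetical data-flow edges:

```python
from collections import deque

# Hypothetical data-flow edges: source element -> elements it feeds.
flows = {
    "SRC.ORDERS.AMT":    ["STAGE.ORDERS.AMT"],
    "STAGE.ORDERS.AMT":  ["DW.FACT_SALES.AMT"],
    "DW.FACT_SALES.AMT": ["MART.REVENUE.AMT", "RPT.MONTHLY_SALES"],
}

def downstream(element):
    """Impact analysis: everything transitively fed by `element` (BFS)."""
    seen, queue = set(), deque([element])
    while queue:
        for target in flows.get(queue.popleft(), []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

def upstream(element):
    """Lineage: everything that transitively feeds `element`."""
    reverse = {}
    for src, targets in flows.items():
        for t in targets:
            reverse.setdefault(t, []).append(src)
    seen, queue = set(), deque([element])
    while queue:
        for source in reverse.get(queue.popleft(), []):
            if source not in seen:
                seen.add(source)
                queue.append(source)
    return seen

print(downstream("STAGE.ORDERS.AMT"))  # impact of changing the staging column
print(upstream("RPT.MONTHLY_SALES"))   # lineage of the report
```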

Meta Data Delivery Layer

The metadata delivery layer is responsible for delivering metadata from the repository to the end users and to any applications or tools that require metadata feeds.

Web Enabled

A Java-based, web-enabled, thin-client front end has become the industry standard for presenting information to end users, and it is certainly the best approach for an MME. This architecture provides the greatest degree of flexibility and a lower TCO (total cost of ownership) for implementation, and the web browser paradigm is widely understood by most end users within an organization.  This web-enabled front end would be fully configurable.  For example, I may want to control which options my users can select, or I may want to put my company’s logo in the upper right-hand corner of the end-user screen.

Pre-built Reports

Impact analyses are technical-metadata-driven reports that help an IT department assess the impact of a potential change to its IT applications (see Figure 2: “Impact Analysis: Column Analysis for a Bank” for an example).  Impact analyses come in an almost infinite number of variations; certainly, the optimal metadata tool would provide dozens of these reports, pre-built and completely configurable.  In addition, the tool would be able to “push” these pre-built reports, and any custom-built reports, to specific users’ or groups of users’ desktops, or even to their email addresses.  These pushed reports could be configured for release based on an event trigger or on a scheduled basis.

Figure 2: Impact Analysis: Column Analysis for a Bank

Website Meta Data Entry

Most enterprise metadata repositories provide their business users with a web-based front end so that data stewards can enter metadata directly into the repository. This front-end capability would be fully integrated into the MME and would be able to write back to the metadata repository.  In addition, this entry point would not only allow metadata to be written to the repository; it would also allow relationship constraints and drop-down boxes to be fully integrated into the end-user front end.  Moreover, many of these business-metadata entry/update screens would be pre-built and fully configurable, allowing the repository administrator to modify them as required.  The ability to use the web front end to write back to the repository is a feature that is lacking in many of today’s metadata tools.

Publish Graphics

The optimal metadata tool would also have the ability to publish graphics to its web front end.  Users would then be able to click on the metadata attributes within these graphics for metadata drill-down, drill-up, drill-through and drill-across. For example, a physical data model could be published to the website. As IT developers look at this data model, they would be able to click on any of the columns within the physical model to view its associated metadata.  This is another weakness in many of the major metadata tools on the market.

Metadata Marts

A metadata mart is a database structure, usually sourced from a metadata repository, which is designed for a homogeneous metadata user group (see Figure 3: “Metadata Marts”).  “Homogeneous metadata user group” is a fancy term for a group of users with similar needs.

Figure 3: Metadata Marts

This tool would come with pre-built metadata marts for a few of the more complex and resource-intensive impact analyses.  In addition, it would have metadata marts for each of the significant industry standards, such as the Common Warehouse Metamodel (CWM), Dublin Core, and ISO/IEC 11179.

Evaluating Metadata Management Tools

When evaluating metadata management tools, organizations should consider several key factors to ensure they select a solution that meets their specific needs and requirements:

  • Data Discovery and Cataloging Capabilities: The tool should offer robust data discovery and cataloging features, enabling users to easily locate and understand their data assets.
  • Data Lineage and Impact Analysis Capabilities: The ability to track data lineage and perform impact analysis is crucial for maintaining data integrity and supporting data governance.
  • Data Governance and Compliance Features: The tool should support data governance initiatives by providing features for policy enforcement, compliance monitoring, and data stewardship.
  • Data Quality and Profiling Capabilities: Ensuring data quality is essential, so the tool should include features for data profiling, validation, and cleansing.
  • Integration with Existing Data Management Systems and Tools: The tool should seamlessly integrate with the organization’s existing data management infrastructure, including databases, ETL tools, and BI platforms.
  • Scalability and Performance: The tool should be able to handle the organization’s current and future data volumes and performance requirements.
  • User Experience and Accessibility: A user-friendly interface and accessibility features are important for ensuring that all users can effectively utilize the tool.
  • Customization and Flexibility: The tool should offer customization options to meet the organization’s specific needs and workflows.
  • Collaboration and Knowledge-Sharing Capabilities: Features that support collaboration and knowledge sharing among users can enhance the overall effectiveness of the tool.
  • Vendor Support and Community: Reliable vendor support and an active user community can provide valuable resources and assistance.
  • Total Cost of Ownership (TCO): Organizations should consider the overall cost of the tool, including licensing, implementation, and maintenance costs.

By carefully evaluating these factors, organizations can select a metadata management tool that enables them to effectively manage their data assets, support data governance initiatives, and enhance their overall data management strategy.

Establishing a Data Governance Framework

Building a robust data governance framework involves defining roles, policies, and processes that align with organizational structures and objectives. Key steps include:

  1. Leadership and Oversight: Assign a chief data officer to oversee governance initiatives, supported by a committee responsible for approving foundational policies and managing data-related matters.
  2. Data Ownership Identification: Identify data owners across business units to ensure accountability and accurate data management throughout the enterprise.
  3. Stakeholder Training: Implement training programs for business users, ensuring they understand data governance terminology, master data management practices, and how to use governance tools like data catalogs effectively.
  4. Policy Communication: Regularly communicate progress to stakeholders, emphasizing the benefits of governance, such as improved data quality and enhanced data security.
  5. Dispute Resolution: Develop and enforce procedures for addressing conflicts related to data usage, ensuring adherence to internal standards and compliance regulations.

This structured approach ensures that the data governance framework supports risk management, data integrity, and collaboration across business systems, enabling a data-driven culture.

Conclusion

The optimal metadata management tool described in this article is achievable.  Organizations need such a product to manage their metadata effectively, to give meaning to their data, and to derive value from this asset.  Software development organizations can use this article as the basis for developing a product that satisfies these metadata management requirements.