Introduction
Almost every corporation and government agency has already built, is in the process of building, or is looking to build a Managed Metadata Environment (MME), either as part of a metadata solution or within an enterprise data management initiative. Many organizations, however, are making fundamental mistakes. An enterprise may build many metadata repositories, or “islands of metadata”, that are not linked together and as a result provide far less value than they could (see “Where’s my metadata architecture?” sidebar).
Metadata is information about the physical data, technical and business processes, data rules and constraints, and logical and physical structures of the data, as used by an organization. These descriptive tags describe data, concepts, and the relationships between the data and concepts.
Let’s take a quick metadata management quiz. What is the most common form of metadata architecture? It is likely that most of you will answer, “centralized”; but the real answer is “bad architecture”. Most metadata repository architectures are built the same way data warehouse architectures were built: badly. The data warehouse architecture issue resulted in many Global 2000 companies rebuilding their data warehousing applications, sometimes from the ground up. Many of the metadata repositories under development or already in use need to be completely rebuilt.
Where’s my metadata architecture?
At EWSolutions one of our clients is a large pharmaceutical company. Since knowledge is the lifeblood of any pharmaceutical company, these types of firms tend to have very large metadata requirements and staffs. This company had decided to have a “Metadata Day” and asked me to come on-site and give a keynote address to kick the day off. Between 60 and 80 people attended “Metadata Day”. After the keynote address came a series of workshops. We counted 4 separate metadata repositories in production and 3 other separate new metadata repository initiatives starting up, a classic “islands of metadata” problem. This is not an approach that leads to long-term positive results. None of these islands is linked to the others, and much of the most valuable metadata functionality comes from the relationships that the metadata has with itself. For example, it is highly valuable to view a physical column name (technical metadata) and then drill across to the business definition (business metadata) of that physical column name.
MME Overview
The Managed Metadata Environment (MME) represents the architectural components, people, and processes required to properly and systematically gather, retain, and disseminate metadata throughout the enterprise. The MME encapsulates the concepts of metadata repositories, catalogs, data dictionaries, and any other term that people have used to refer to the systematic management of metadata. Some people mistakenly describe an MME as a data warehouse for metadata. In actuality, an MME is an operational system and as such is architected very differently from a data warehouse.
Companies that are looking to truly and efficiently manage metadata from an enterprise perspective need to have a fully functional MME. It is important to note that a company should not try to store all of its metadata in an MME, just as the company would not try to store all of its data in a data warehouse. Without the MME’s components, it is very difficult to effectively manage metadata in a large organization.
The six components of the MME, shown in Figure 1, are:
Metadata sourcing layer
Metadata integration layer
Metadata repository
Metadata management layer
Metadata marts
Metadata delivery layer
Figure 1: Managed Metadata Environment
An MME can follow a centralized, decentralized, or distributed architecture approach:
Centralized architecture offers a single, uniform, and consistent meta model that mandates the schema for defining and organizing the various metadata stored in a global metadata repository. This allows for a consolidated approach to administering and sharing metadata across the enterprise.
Decentralized architecture creates a uniform and consistent meta model that mandates the schema for defining and organizing a global subset of metadata to be stored in a global metadata repository and in the designated shared metadata elements that appear in local metadata repositories. All metadata that is shared and re-used among the various local repositories must first go through the global repository, but sharing and access to the local metadata are independent of the global repository.
Distributed architecture includes several disjointed and autonomous metadata repositories that have their own meta models to dictate their internal metadata content and organization, with each repository solely responsible for the sharing and administration of its metadata. The global metadata repository will not hold metadata that appears in the local repositories; instead it will have pointers to the metadata in the local repositories and metadata on how to access it. (1) At EWSolutions we have built MMEs that use each of these three architectural approaches, and some implementations combine these techniques in one MME.
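The pointer mechanism of the distributed approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Pointer` and `GlobalRepository`, the `crm` local repository, and the sample entries are all hypothetical and do not come from any product or from the book.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass
class Pointer:
    """Hypothetical reference to metadata that lives in a local repository."""
    repository: str
    key: str


class GlobalRepository:
    """Holds some metadata directly and only pointers to the rest."""

    def __init__(self, local_readers: Dict[str, Callable[[str], str]]):
        self._entries: Dict[str, Union[str, Pointer]] = {}
        self._local_readers = local_readers

    def store(self, name: str, value: str) -> None:
        # Centralized style: the metadata itself is kept globally.
        self._entries[name] = value

    def store_pointer(self, name: str, repository: str, key: str) -> None:
        # Distributed style: keep only where the metadata is and how to reach it.
        self._entries[name] = Pointer(repository, key)

    def lookup(self, name: str) -> str:
        entry = self._entries[name]
        if isinstance(entry, Pointer):
            # Resolved at request time, so the local repository stays authoritative.
            return self._local_readers[entry.repository](entry.key)
        return entry


# Assumed local repository, represented here by a plain dictionary.
crm_local = {"cust_nm": "Customer legal name"}

repo = GlobalRepository({"crm": crm_local.__getitem__})
repo.store("order_id", "Unique order identifier")
repo.store_pointer("customer_name", "crm", "cust_nm")
```

Because `lookup` follows the pointer when the metadata is requested, a change made in the local repository is visible through the global one without any reload. Mixing `store` and `store_pointer` in one repository corresponds to the combined implementations mentioned above.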
MME Ratings: Understanding Their Importance and Methodology
The acronym MME is also used in credit analysis, where it refers to mid-market evaluation ratings rather than to a Managed Metadata Environment. Those ratings, which are derived using a specific methodology, gauge a mid-market company’s ability to fulfill its financial obligations while considering critical business factors.
In that context, MME ratings apply exclusively to companies with annual revenues below €1.5 billion, focusing on their capital structure, financial policies, liquidity, and management governance. A firm’s overall competitive position, coupled with country and industry risks, heavily influences its rating. Although they don’t include outlooks, ratings can be adjusted or placed on CreditWatch based on dynamic market conditions.
Organizations with durable revenue growth beyond the eligibility threshold may see their MME ratings withdrawn. These ratings also extend to corporate entities and debt instruments. They are a separate subject from the metadata architecture described in the rest of this article and should not be confused with it.
Metadata Sourcing Layer
The Metadata Sourcing Layer is the first component of the MME architecture. The purpose of the Metadata Sourcing Layer is to extract metadata from its source and to send it into the Metadata Integration Layer or directly into the metadata repository (see Figure 2). Some metadata will be accessed by the MME through the use of pointers (distributed) that will present the metadata to the end user at the time that it is requested. The pointers are managed by the Metadata Sourcing Layer and stored in the Metadata Repository.
Figure 2: Metadata Sourcing Layer
It is best to send the extracted metadata to the same hardware location as the Metadata Repository. Often metadata architects incorrectly build metadata integration processes on the platform that the metadata is sourced from (other than record selection, which is acceptable). This merging of the metadata sourcing layer with the metadata integration layer is a common mistake that causes a whole host of issues.
As sources of metadata are changed and added (and they will), the metadata integration process is negatively affected. When the metadata sourcing layer is separated from the metadata integration layer only the metadata sourcing layer is affected by this type of change. By keeping all of the metadata together on the target platform, the metadata architect can adapt the integration processes much more easily.
Keeping the metadata sourcing layer separate from the metadata integration layer also provides a tidy backup and restart point. Metadata loading errors typically happen in the integration layer. Without a separate sourcing layer, if an error occurred the architect would have to go back to the source of the metadata and re-read it. This can cause a number of problems. If the source of metadata has been updated, it may become out of sync with some of the other sources of metadata that it integrates with. In addition, the metadata source may currently be in use, and this processing could affect the performance of the metadata source. The golden rule of metadata extraction is:
Never have multiple processes extracting the same metadata from the same metadata source
When this rule is broken, the timeliness and consequently the accuracy of the metadata can be compromised. For example, suppose that you have built one metadata extraction process (Process #1) that reads physical attribute names from a modeling tool’s tables to load a target entity in the meta model table that contains physical attribute names. You also built a second process (Process #2) to read and load attribute domain values. It is possible that the attribute table in the modeling tool has been changed between the running of Process #1 and Process #2. This situation would cause the metadata to be out of sync and lead to metadata loading errors.
This situation can also cause unnecessary delays in the loading of the metadata when metadata sources have limited availability or batch windows. For example, if you were reading database logs from your enterprise resource planning (ERP) system, you would not want to run multiple extraction processes on these logs, since they most likely have a limited batch window available. While this situation does not happen often, there is no reason to build unnecessary flaws into your metadata architecture.
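The golden rule can be illustrated with a minimal Python sketch in which the source is read exactly once into a staging copy and both load processes work from that copy. The table contents and function names are hypothetical, and a real sourcing layer would stage to files or tables rather than to an in-memory dictionary.

```python
import copy

# Hypothetical modeling-tool table: attribute name -> domain values.
modeling_tool_attributes = {
    "CUST_STATUS": ["ACTIVE", "INACTIVE"],
    "ORDER_TYPE": ["WEB", "STORE"],
}


def extract_snapshot(source):
    """Read the source exactly once and return an independent staged copy."""
    return copy.deepcopy(source)


def load_attribute_names(snapshot):
    """Process #1: physical attribute names, read from the staged copy."""
    return sorted(snapshot)


def load_domain_values(snapshot):
    """Process #2: attribute domain values, read from the same staged copy."""
    return {name: list(values) for name, values in snapshot.items()}


staged = extract_snapshot(modeling_tool_attributes)

# The source changes between the two load processes...
modeling_tool_attributes["SHIP_METHOD"] = ["AIR", "GROUND"]

# ...but both processes see the same staged state, so they stay in sync.
names = load_attribute_names(staged)
domains = load_domain_values(staged)
```

The staged copy also serves as the restart point: if a load fails, it can be rerun from `staged` without touching the source again.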
The number and variety of metadata sources will vary greatly based on the business requirements of your MME. Though there are sources of metadata that many companies commonly use, I have never seen two metadata repositories that have exactly the same metadata sources. Have you ever seen two data warehouses with exactly the same source information? Following are the most common metadata sources:
Software tools
End users
Documents and spreadsheets
Messaging and transactions
Applications
Web sites and E-commerce
Third parties
Metadata Integration Layer
The metadata integration layer (Figure 3) takes the various sources of metadata, integrates them through a metadata integration process, and loads them into the metadata repository. This approach differs slightly from the common techniques used to load data into a data warehouse, as the data warehouse clearly separates the transformation (what we call integration) process from the load process. In an MME, these steps are combined because the volume of metadata is far smaller than the volume of data in a data warehouse. However, it is crucial to address the challenges in the metadata integration processes to ensure accuracy and efficiency.
As a general rule, an MME holds between 5 and 20 gigabytes of metadata. However, as MMEs begin to target data-audit-related metadata, storage can grow into the 20-75 gigabyte range, and over the next few years you will see some MMEs reach the terabyte level.
Figure 3: Metadata Integration Layer
The specific steps in this process depend on whether you are building a custom process or if you are using a metadata integration tool to assist your effort. If you decide to use a metadata integration tool, the specific tool selection can also greatly affect this process.
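For a custom-built process, the combined integrate-and-load step might look like the following Python sketch. It is a simplified illustration, not a description of any integration tool: the source names, the field mappings, and the `column_metadata` table are assumptions made for the example, and SQLite stands in for the repository database.

```python
import sqlite3

# Hypothetical staged extracts; each source uses its own field names.
staged_sources = {
    "modeling_tool": [{"col": "cust_nm", "type": "VARCHAR(80)"}],
    "spreadsheet": [{"Column Name": "ORDER_DT", "Data Type": "date"}],
}

# Assumed mapping from each source's field names to the repository's names.
FIELD_MAPS = {
    "modeling_tool": {"col": "column_phys_name", "type": "data_type"},
    "spreadsheet": {"Column Name": "column_phys_name", "Data Type": "data_type"},
}


def integrate_and_load(conn, staged):
    """Standardize each staged record and load it in the same pass."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS column_metadata ("
        "column_phys_name TEXT PRIMARY KEY, data_type TEXT, source TEXT)"
    )
    for source, records in staged.items():
        mapping = FIELD_MAPS[source]
        for record in records:
            # Integration: rename fields and normalize case.
            row = {mapping[k]: str(v).upper() for k, v in record.items()}
            # Load: write straight to the repository, with no separate load job.
            conn.execute(
                "INSERT OR REPLACE INTO column_metadata VALUES (?, ?, ?)",
                (row["column_phys_name"], row["data_type"], source),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
integrate_and_load(conn, staged_sources)
rows = conn.execute(
    "SELECT column_phys_name, data_type FROM column_metadata "
    "ORDER BY column_phys_name"
).fetchall()
```

Combining the two steps is reasonable here only because the volumes are small; a data warehouse load of the same shape would normally stage the transformed rows first.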
Metadata Repository
A metadata repository is a fancy name for a database designed to gather, retain, and disseminate metadata. The metadata repository (Figure 4) is responsible for the cataloging and persistent physical storage of the metadata, including the metadata stored within various applications.
Figure 4: Metadata Repository
The Metadata Repository should be generic, integrated, current, and historical. Generic means that the physical meta model stores metadata by metadata subject area rather than in an application-specific way.
For example, a generic meta model (a model of the metadata concepts) will have an attribute named “DATABASE_PHYS_NAME” that will hold the physical database names within the company. A meta model that is application-specific would name this same attribute “ORACLE_PHYS_NAME”. The problem with application-specific meta models is that metadata subject areas change. To return to our example, today Oracle may be our company’s database standard. Tomorrow we may switch the standard to SQL Server for cost or compatibility advantages. This situation would cause needless additional changes to the physical meta model. (2)
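A small Python sketch, using SQLite in place of the repository database, shows why the generic form absorbs this kind of change. The attribute name DATABASE_PHYS_NAME comes from the example above; the table name, the DBMS_PRODUCT column, and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Generic meta model: the DBMS product is data, not part of the column name.
conn.execute(
    "CREATE TABLE database_metadata ("
    "DATABASE_PHYS_NAME TEXT PRIMARY KEY, "
    "DBMS_PRODUCT TEXT NOT NULL)"  # hypothetical companion attribute
)
conn.executemany(
    "INSERT INTO database_metadata VALUES (?, ?)",
    [("SALES_DW", "Oracle"), ("CRM_PROD", "Oracle")],
)

# Switching the standard for one database is a row update,
# not a change to the physical meta model.
conn.execute(
    "UPDATE database_metadata SET DBMS_PRODUCT = 'SQL Server' "
    "WHERE DATABASE_PHYS_NAME = 'CRM_PROD'"
)

products = dict(
    conn.execute("SELECT DATABASE_PHYS_NAME, DBMS_PRODUCT FROM database_metadata")
)
```

With an ORACLE_PHYS_NAME column, the same switch would require renaming the column or adding a parallel one, along with every process and report that references it.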
A Metadata Repository also provides an integrated view of the enterprise’s major metadata subject areas. The repository should allow the user to view all entities within the company, and not just entities loaded in Oracle or entities that are just in the customer relationship management (CRM) applications.
Third, the metadata repository contains current and future metadata, meaning that the metadata is periodically updated to reflect the current and future technical and business environment. Keep in mind that a metadata repository is constantly updated and it needs to be, in order to be truly valuable.
Lastly, metadata repositories are historical. A good repository will hold historical views of the metadata, even as it changes over time. This allows a corporation to understand how its business has changed over time. This is especially critical if the MME is supporting an application that contains historical data, like a data warehouse or a CRM application. For example, suppose the business metadata definition for “customer” is “anyone that has purchased a product from our company within one of our stores or through our catalog”. A year later, a new distribution channel is added to the strategy: the company constructs a Web site to allow customers to order its products. At that point, the business metadata definition for customer would be modified to “anyone that has purchased a product from our company within one of our stores, through our mail order catalog or through the web”.
A good metadata repository stores both of these definitions because they both have validity, depending on what data you are analyzing (and the age of that data).
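One common way to keep both definitions is to store each with an effective date range and to query by the age of the data being analyzed. The Python sketch below assumes this approach; the table layout, the function name, and the dates are hypothetical, since the example above gives no actual dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE business_definition ("
    "term TEXT, definition TEXT, effective_from TEXT, effective_to TEXT)"
)
conn.executemany(
    "INSERT INTO business_definition VALUES (?, ?, ?, ?)",
    [
        # Dates are illustrative only.
        ("customer",
         "anyone that has purchased a product from our company within one of "
         "our stores or through our catalog",
         "2003-01-01", "2004-01-01"),
        ("customer",
         "anyone that has purchased a product from our company within one of "
         "our stores, through our mail order catalog or through the web",
         "2004-01-01", None),  # NULL end date marks the current definition
    ],
)


def definition_as_of(conn, term, as_of):
    """Return the definition that was in force on the given ISO date."""
    row = conn.execute(
        "SELECT definition FROM business_definition "
        "WHERE term = ? AND effective_from <= ? "
        "AND (effective_to IS NULL OR ? < effective_to)",
        (term, as_of, as_of),
    ).fetchone()
    return row[0] if row else None


old_definition = definition_as_of(conn, "customer", "2003-06-15")
new_definition = definition_as_of(conn, "customer", "2004-06-15")
```

ISO-formatted date strings compare correctly as text, which keeps the sketch free of date parsing. An analyst looking at older sales data would use `old_definition`, while current reports would use `new_definition`.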
Finally, it is strongly recommended that you implement your Metadata Repository component on an open, relational database platform, as opposed to a custom-designed, proprietary database engine.
Metadata Management Layer
The Metadata Management Layer provides systematic management of the metadata repository and the other MME components (see Figure 5). As with other layers, the approach to this component differs greatly depending on whether a metadata integration tool is used or the entire MME is custom built. If an enterprise metadata integration tool is used for the construction of the MME, then a metadata management interface is most likely built within the product. This is almost always the case; however, if it is not built into the product, then you would be doing a custom build. The Metadata Management Layer performs the following functions:
Archive
Backup
Database modifications
Database tuning
Environment management
Job scheduling
Load statistics
Purging
Query statistics
Query and report generation
Recovery
Security processes
Source mapping and movement
User interface management
Versioning
Figure 5: Metadata Management Layer
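In a custom build, several of these functions reduce to small bookkeeping routines around the load jobs. As one example, load statistics could be captured with a wrapper like the Python sketch below. The function name, the `load_statistics` table, and the job names are hypothetical, and a production version would also log the error details and schedule the jobs.

```python
import sqlite3
import time


def run_load_with_statistics(conn, job_name, load_fn):
    """Run one load job and record its row count, duration, and outcome."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS load_statistics ("
        "job_name TEXT, rows_loaded INTEGER, seconds REAL, status TEXT)"
    )
    started = time.perf_counter()
    try:
        rows_loaded = load_fn()  # load_fn is expected to return a row count
        status = "OK"
    except Exception:
        rows_loaded = 0
        status = "FAILED"
    elapsed = time.perf_counter() - started
    conn.execute(
        "INSERT INTO load_statistics VALUES (?, ?, ?, ?)",
        (job_name, rows_loaded, elapsed, status),
    )
    conn.commit()
    return status


conn = sqlite3.connect(":memory:")
run_load_with_statistics(conn, "load_columns", lambda: 42)
run_load_with_statistics(conn, "load_broken", lambda: 1 / 0)

stats = conn.execute(
    "SELECT job_name, rows_loaded, status FROM load_statistics ORDER BY job_name"
).fetchall()
```

Keeping the statistics in the same database as the repository lets the query and report generation function expose them without a separate store.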
Metadata Marts
A Metadata Mart is a database structure, usually sourced from a Metadata Repository, designed for a homogeneous metadata user group (see Figure 6). “Homogeneous metadata user group” is a fancy term for a group of users with similar needs.
Figure 6: Metadata Marts
There are two reasons why an MME may need to have metadata marts. First, a particular metadata user community may require metadata organized in a manner other than what is in the Metadata Repository component.
Second, an MME with a larger user base often experiences performance problems because of the number of table joins that are required for the metadata reports. In these situations it is best to create one or more metadata marts targeted specifically to meet those users’ needs. The Metadata Marts will not experience the performance degradation because they will be modeled multi-dimensionally.
In addition, a separate metadata mart provides a buffer layer between the end users and the Metadata Repository. This allows routine maintenance, upgrades, and backup and recovery of the repository without affecting the availability of the Metadata Mart.
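The idea of a mart sourced from the repository can be sketched in Python with SQLite. The sketch is deliberately simplified: it builds a single flattened table rather than a full dimensional model, and all table names, column names, and sample rows are assumptions made for the example.

```python
import sqlite3

repo = sqlite3.connect(":memory:")
repo.executescript(
    """
    -- Hypothetical normalized repository tables.
    CREATE TABLE table_meta (table_id INTEGER PRIMARY KEY, table_phys_name TEXT);
    CREATE TABLE column_meta (
        column_id INTEGER PRIMARY KEY,
        table_id INTEGER,
        column_phys_name TEXT,
        business_term_id INTEGER
    );
    CREATE TABLE business_term (
        business_term_id INTEGER PRIMARY KEY,
        term TEXT,
        definition TEXT
    );
    INSERT INTO table_meta VALUES (1, 'CUST');
    INSERT INTO column_meta VALUES (10, 1, 'CUST_NM', 100);
    INSERT INTO business_term VALUES
        (100, 'Customer name', 'Legal name of the customer');

    -- The mart pays for the joins once, at build time.
    CREATE TABLE mart_column_glossary AS
    SELECT t.table_phys_name, c.column_phys_name, b.term, b.definition
    FROM column_meta c
    JOIN table_meta t ON t.table_id = c.table_id
    JOIN business_term b ON b.business_term_id = c.business_term_id;
    """
)

# End users query the mart without joins.
row = repo.execute(
    "SELECT term, definition FROM mart_column_glossary WHERE column_phys_name = ?",
    ("CUST_NM",),
).fetchone()
```

The final query is the drill-across from a physical column name to its business definition, answered from one table. In practice the mart would usually live in its own database so that repository maintenance does not interrupt it.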
Metadata Delivery Layer
The Metadata Delivery Layer is the sixth and final component of the MME architecture. It delivers the metadata from the Metadata Repository to the end users and any applications or tools that require metadata feeds to them (Figure 7). (3)
Figure 7: Metadata Delivery Layer
The most common targets that require metadata from the MME are:
Applications
Data warehouses and data marts
End users (business and technical)
Messaging and transactions
Metadata marts
Software tools
Third parties
Web sites and e-commerce
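For targets that consume a feed rather than query the repository directly, delivery can be as simple as selecting the relevant metadata and serializing it. The Python sketch below assumes a JSON feed filtered by audience; the row layout, the `audience` field, and the function name are hypothetical.

```python
import json

# Hypothetical rows as they might be read from the repository.
repository_rows = [
    {"column_phys_name": "CUST_NM",
     "definition": "Legal name of the customer",
     "audience": "business"},
    {"column_phys_name": "CUST_NM",
     "data_type": "VARCHAR(80)",
     "audience": "technical"},
]


def build_feed(rows, audience):
    """Select the rows meant for one audience and serialize them as JSON."""
    selected = [
        {key: value for key, value in row.items() if key != "audience"}
        for row in rows
        if row["audience"] == audience
    ]
    return json.dumps({"audience": audience, "metadata": selected}, sort_keys=True)


business_feed = json.loads(build_feed(repository_rows, "business"))
technical_feed = json.loads(build_feed(repository_rows, "technical"))
```

The same function could feed a web page, a messaging queue, or a third party; only the transport around the JSON string would differ.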
Best Practices for MME
Implementing and maintaining a Managed Metadata Environment (MME) effectively requires adherence to best practices that ensure the accuracy, consistency, and reliability of metadata across the enterprise. Here are some key best practices to consider:
Establish a Clear Metadata Strategy and Governance Framework:
A well-defined metadata strategy and governance framework is crucial. This ensures that metadata is properly managed, maintained, and aligned with the organization’s objectives. Data governance policies should cover metadata standards, roles, responsibilities, and processes.
Implement a Universal Metadata Model:
A universal metadata model is essential for accommodating diverse metadata sources and user requirements. This model should be flexible enough to integrate various types of metadata while maintaining consistency and coherence.
Use a Scalable and Flexible Metadata Repository Architecture:
The metadata repository architecture should be designed to scale and adapt to the growing volume of metadata. It should be flexible enough to handle different types of metadata and support future expansions.
Develop a Comprehensive Metadata Integration Process:
A robust metadata integration process is vital for handling multiple sources of metadata. This process should ensure data quality and consistency, integrating metadata seamlessly into the repository.
Implement a Metadata Transformation Layer:
The metadata transformation layer plays a crucial role in converting metadata into a standardized format. This standardization facilitates easier integration and analysis, ensuring that metadata from different sources can be effectively combined.
Leverage Data Warehouse Architectures:
Proven data warehouse technologies, such as open relational database platforms and multi-dimensional modeling for metadata marts, can provide a robust infrastructure for storing and managing metadata. Keep in mind, however, that the MME itself is an operational system and is architected differently from a data warehouse.
Avoid Multiple Processes Extracting the Same Metadata:
To prevent metadata loading errors and inconsistencies, it is essential to avoid having multiple processes extracting the same metadata from the same source. This practice ensures the timeliness and accuracy of the metadata.
Implement a Global Metadata Repository:
A global metadata repository that integrates local metadata repositories provides a unified view of metadata across the enterprise. This approach ensures that metadata is accessible and consistent, supporting better decision-making.
Ensure Security, Flexibility, and Adaptability:
The MME must be secure to protect sensitive metadata. It should also be flexible and adaptable to changing business requirements and user needs, ensuring long-term viability and effectiveness.
By following these best practices, organizations can efficiently manage metadata, ensuring it remains a valuable asset that supports business processes and decision-making.
MME Cost-Effectiveness and Case Studies
A Managed Metadata Environment (MME) can offer significant cost savings and operational benefits to an organization. Here are some insights into its cost-effectiveness and real-world case studies:
Reduction in Manual Metadata Management Costs: Implementing an MME can substantially reduce the costs associated with manual metadata management. Automation and systematic processes improve the accuracy and consistency of metadata, minimizing the need for manual intervention.
Improved Business Process Efficiency: A well-designed MME enhances the efficiency of business processes. By providing a centralized platform for metadata management, it reduces the time and effort required to manage and access metadata, leading to faster and more informed decision-making.
Centralized Platform for Metadata Management: An MME consolidates metadata management into a single, centralized platform. This reduces the need for multiple, disparate systems and applications, streamlining operations and reducing overhead costs.
Enhanced Metadata Quality and Consistency: By improving the quality and consistency of metadata, an MME reduces the risk of errors and inconsistencies. This leads to more reliable data and better business outcomes.
Case Studies Demonstrating Cost Savings and Benefits: Numerous case studies highlight the tangible benefits of implementing an MME. For instance, a large financial institution reported a 30% reduction in metadata management costs and a significant improvement in data quality after deploying an MME. Another case study from a healthcare provider showed enhanced decision-making capabilities and reduced operational costs due to improved metadata integration processes.
Conclusion
Professionals who have built an enterprise metadata repository realize that it is much more than a database that holds metadata and pointers to metadata; it is an entire environment. The purpose of this article is to illustrate the major architectural components of that Managed Metadata Environment.
This article is adapted from the book “Universal Metadata Models” by David Marco & Michael Jennings, © John Wiley & Sons (2004)
(1) See Chapter 7 of “Building and Managing the Metadata Repository” (David Marco, Wiley 2000) for a more detailed walkthrough of these approaches.
(2) See Chapters 4 – 8 of “Universal Metadata Models” (David Marco & Michael Jennings, Wiley 2004) to see various physical meta models.
(3) See Chapter 10 of “Building and Managing the Metadata Repository” (David Marco, Wiley 2000) for a detailed discussion on metadata consumers and metadata delivery.