Data management is a discipline that cares deeply about semantics and ontologies, so many may find it ironic that there is often confusion about what a business glossary and a data catalog are. Even some data veterans use the terms interchangeably, but they are not the same.
According to the Data Management Body of Knowledge (DMBoK), business glossaries contain agreed-upon definitions of business terms and relate these to data. In contrast, a data catalog references the location of an enterprise’s data elements, often providing other important details such as technical metadata, lineage, or both.
Business glossaries host business terms. Data catalogs host data elements. But what are the differences between these two artifacts? It all depends on the types of metadata you are collecting.
Business Terms: The what and why
Business terms provide business context for the data that is most important to an organization. Some companies think that having more business terms is better; the number of definitions might even be a performance metric for their data governance teams. However, defining data for the sake of defining data is unproductive, to say the least.
That is why smarter organizations prioritize definitions that they recognize as unique to their business processes. For example, how net revenue is calculated will vary depending on factors like functions, business processes, location, legal entities and currency. It is up to the data stewardship teams to define what “net revenue” means within their organization. For this reason, companies should not ingest thousands of data elements from their data sources and call the result a business glossary.
Data Elements: The where and how
On the other hand, the ISO/IEC 2382 standard defines data elements as units of data that are considered in context to be indivisible. Data elements can be stored in databases, exchanged in messages and manipulated by software programs. As mentioned in the Open Data Element Framework, a data element could be a database column, a cell in a spreadsheet, an RDF triple, an XML attribute or atomic element, a program variable or array element, or a JSON value. A data element can be of any size, from a single bit containing a binary value to a large file of unstructured data.
The metadata collected for data elements is mostly technical in nature, with attributes such as “table name”, “data type” and “decimal precision” among the most common in catalogs across industries. Data catalogs are generally less concerned with business meaning and focus more on where and how the data is stored and accessed.
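As a rough illustration, a catalog entry for a single data element might carry little more than its location and a few technical attributes. The sketch below is only one possible shape; the class name, field names and sample values are hypothetical and not taken from any particular catalog product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataElement:
    """Technical metadata for one physical data element (hypothetical shape)."""
    system: str                              # system of record, e.g. a database name
    table_name: str
    column_name: str
    data_type: str
    decimal_precision: Optional[int] = None  # only meaningful for numeric types

    def location(self) -> str:
        # Answers the "where": a fully qualified path to the element.
        return f"{self.system}.{self.table_name}.{self.column_name}"


# Hypothetical example entry; none of these names come from a real system.
net_revenue_column = DataElement(
    system="finance_db",
    table_name="GL_BALANCES",
    column_name="NET_REVENUE_AMT",
    data_type="NUMBER",
    decimal_precision=2,
)

print(net_revenue_column.location())
```

Note that nothing in this record says what net revenue means to the business; that context belongs to the business term.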
Modeling the Relationship between Business Terms and Data Elements
Now that we have described key aspects of both business terms and data elements, we can address some of the most common problems for organizations that are struggling to define their governance models.
Defining Metadata Attributes
As we’ve previously discussed, it is critical to organize metadata by “levels of abstraction”. The following analogy may clarify this. Data models come in three main types: conceptual, logical and physical. Each type highlights different aspects of the data model. Coincidentally, the same three types can also be used to differentiate metadata types, as defined by the table below.
These divisions can also be assigned to the different personas that operationalize a governance framework:
Business stewards are closer to business terms, helping document and update the business meaning behind a particular piece of data.
Enterprise architects define metadata for the logical data attribute that the data represents, including relevant information such as its parent data entity.
Technical data stewards and perhaps data custodians are more involved with the physical “layer”, ensuring that the data’s location is identified and relevant technical metadata is accounted for.
It is important to mention that some metadata attributes lie in a grey area. A typical example is the PII (personally identifiable information) flag: a case can be made for attaching it to either construct. Ultimately, the decision a company makes depends on which level has tightly governed metadata, how that metadata is made visible to the broader organization, and its reporting and analytics practices.
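One way to make these divisions concrete is to record, for each metadata attribute, the layer it belongs to and the persona who maintains it. The following is a minimal sketch under assumed names: the attribute list, the `assign` helper and the pairing of each layer with a construct are illustrative choices, and the PII flag is deliberately left out of the defaults because its placement is an organizational decision.

```python
from enum import Enum


class Layer(Enum):
    # Assumed pairing of each layer with the construct that holds its metadata.
    CONCEPTUAL = "business term"
    LOGICAL = "data attribute"
    PHYSICAL = "data element"


# Hypothetical default assignment of metadata attributes to layers.
ATTRIBUTE_LAYER = {
    "definition": Layer.CONCEPTUAL,
    "parent_data_entity": Layer.LOGICAL,
    "table_name": Layer.PHYSICAL,
    "data_type": Layer.PHYSICAL,
    "decimal_precision": Layer.PHYSICAL,
}

# Persona responsible for each layer, following the roles described above.
OWNER = {
    Layer.CONCEPTUAL: "business steward",
    Layer.LOGICAL: "enterprise architect",
    Layer.PHYSICAL: "technical data steward",
}


def assign(attribute, overrides=None):
    """Return (layer, owner) for an attribute.

    Grey-area attributes such as a PII flag have no default and must be
    placed explicitly through `overrides`.
    """
    layers = {**ATTRIBUTE_LAYER, **(overrides or {})}
    layer = layers[attribute]  # raises KeyError if nobody has decided yet
    return layer, OWNER[layer]


print(assign("table_name"))
# One organization might govern the PII flag on the business term...
print(assign("pii_flag", {"pii_flag": Layer.CONCEPTUAL}))
# ...while another keeps it with the physical data element.
print(assign("pii_flag", {"pii_flag": Layer.PHYSICAL}))
```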
Relating Business Terms and Data Elements
Perhaps the most common argument against maintaining both a business glossary and a data catalog is the suggestion of just simply collecting all the metadata in a single construct. To quote American journalist H. L. Mencken, “Every complex problem has a solution which is simple, direct, plausible — and wrong.”
To illustrate why this is generally a bad idea, consider the following example. A company stores phone numbers. Prospect and client phone numbers are stored in Salesforce, while employee phone numbers are saved in an Oracle database. Both data elements are shown as they relate to their system of record, but how do we relate them to each other?
The “simple” flat approach would be to grab all of that metadata and stick it into a single “phone” construct, which would look something like this.
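Expressed as code, such a flat construct might resemble the following sketch. Every key and value here is hypothetical and chosen only to illustrate the shape; the point is that business and technical metadata for every source sit side by side in one record.

```python
# A single flat "phone" construct mixing business and technical metadata.
# All keys and values are illustrative assumptions, not a real schema.
phone = {
    "definition": "A number used to reach a person by telephone.",
    "salesforce_object": "Contact",
    "salesforce_field": "Phone",
    "salesforce_data_type": "Phone",
    "oracle_table": "EMPLOYEES",
    "oracle_column": "PHONE_NUMBER",
    "oracle_data_type": "VARCHAR2(20)",
}


def systems(flat_construct):
    """Recover the source systems by parsing key prefixes."""
    return sorted(
        {key.split("_", 1)[0] for key in flat_construct if key != "definition"}
    )


print(systems(phone))
```

Even answering a basic question such as "which systems hold phone data?" requires parsing key names, and every new source means adding another set of prefixed keys to the same record.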
This is most definitely not scalable. The model would fall apart the moment another data element for phone is discovered, and browsing and reporting on data by column-table hierarchy would be complicated. A much more scalable alternative is to use a separate construct to capture the semantic relationship between data elements. In the diagram below, even if we add a new data element (supplier phones stored in another database), we can easily relate it to the Phone business term.
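The linked model can be sketched in a few lines. This is a minimal illustration under assumed names, not a reference design: the `BusinessTerm` and `DataElement` classes, the `link` method and the supplier database details are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class DataElement:
    """Physical metadata only; it knows nothing about business terms."""
    system: str
    table: str
    column: str


@dataclass
class BusinessTerm:
    """Business meaning, plus links to the data elements that implement it."""
    name: str
    definition: str
    data_elements: List[DataElement] = field(default_factory=list)

    def link(self, element: DataElement) -> None:
        self.data_elements.append(element)


phone = BusinessTerm("Phone", "A number used to reach a person by telephone.")
phone.link(DataElement("Salesforce", "Contact", "Phone"))
phone.link(DataElement("Oracle", "EMPLOYEES", "PHONE_NUMBER"))

# A newly discovered source is just one more link; the term's definition
# and the existing elements stay untouched.
phone.link(DataElement("SupplierDB", "SUPPLIERS", "PHONE"))

for element in phone.data_elements:
    print(f"{phone.name} -> {element.system}.{element.table}.{element.column}")
```

Because each data element carries only its own technical metadata, it can still be browsed by system and table, while the business term remains the single place where the meaning of "phone" is governed.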
Real metadata systems do not stop at the data-element or business-term level. The diagram may get more complex as business rules, policies, data domains and other metadata are introduced. Hence the need to address the conceptual, logical and physical layers of metadata the same way software engineers approach modular software design: with low coupling and high cohesion. That is, keep data assets linked but independent of each other, with each construct containing only the cohesive metadata that makes sense to govern in the same place.
The Bottom Line
Now that we’ve walked through the difference between business terms and data elements, it’s important to remember that modeling the relationship between the two will vary by organization. This can be due to your organization’s structure, your data management maturity, your reporting practices, and the types of metadata you are collecting. A knowledgeable partner can help you determine how best to group your metadata and link it from the semantic layer to the technical implementation in your specific context.