For an enterprise to carry out its functions, it needs an ecosystem of business applications, data platforms to store and manage the data, and reporting solutions to provide a view into how the enterprise is performing.
Large enterprises with multiple strategic business focus areas need many such applications. As is often seen, over the years the enterprise landscape becomes so tangled that it is hard to say which application and which data store does what.
Various reasons can be attributed to such a state: lack of enterprise-wide data standards, minimal metadata management processes, inadequate data quality and data governance measures, unclear data archival policies and processes, and so on.
To overcome this problematic situation, enterprise information management is needed as an organization-wide discipline.
Enterprise Information Management (EIM) is a set of data management initiatives to manage, monitor, protect, and enhance the information needs of all the stakeholders in the enterprise. In other words, EIM lays down foundational components and appropriate policies to deliver the right data at the right place at the right time to the right users.
Figure 2-1 lists these foundational components and describes the roles they play in the overall business and IT environment of any organization. The goal is management of information, data, and content to meet the needs of the business.
The entire framework of EIM has to exist in a collaborative business and IT environment.
EIM in a small company or in a startup may not require the same approach and rigor as EIM in a large, highly mature, or advanced enterprise. The interactions between the components will vary from industry to industry and will be largely governed by business priorities; following a one-size-fits-all approach to EIM implementation may amount to overkill in many situations. But in general, the following are key components any data-driven enterprise must pay attention to.
- Business Model: This component reflects how your organization operates to accomplish its goals. Are you metrics driven? Are you heavily outsourced, or do you do everything in-house? Do you have a wide ecosystem of partners and suppliers, or do you transact with only a few? Are your governance controls and accountability measures centralized, decentralized, or federated? The manner in which you get your business objectives successfully implemented down to the lowest levels is your business model.
- Information Management and Usage: A key expectation from an EIM program is to make sure that data and content are managed properly and efficiently, and that they benefit the business without extra risk.
EIM by definition covers all enterprise information, including reports, forms, catalogs, web pages, databases, and spreadsheets: in short, all enterprise-related structured and unstructured data. All enterprise content may be valuable, and all enterprise content can pose risk. Thus enterprise information should be treated as an asset.
- Enterprise Technology and Architecture: Every enterprise has a defined set of technology and architectures upon which business applications are developed and deployed. Although technology and architecture are largely under the IT department’s purview, business requirements and priorities often dictate which technology and architecture to follow. For example, if the company’s business is primarily through online applications, then the enterprise technologies and architectures will have a heavy footprint of web-centric technology and architectures. If the company decides it would like to interact with its customers through mobile channels, then you need to make provisions for mobility as well. The choice of technologies and architectures also reflects the type of industry the business belongs to. For example, in the financial services industry, where data security and privacy are of utmost concern, it is normal for companies to invest in only a few enterprise-scale platforms, whereas for the retail industry such measures may not be required. So, you will see a plethora of technologies and architectures, including open source systems. The extent to which organizations deploy various technologies and architectures is also a component of EIM.
- Organization and Culture: Who is responsible for managing your data? Is it business or IT or both? If you want your enterprise data to be treated as an asset, you need to define an owner for it. You will need to implement positions and accountabilities for the information being managed. You cannot manage inventory without a manager, and you cannot tackle information management without someone accountable for accuracy and availability. EIM helps in establishing a data-driven culture within the enterprise. Roles like data stewards further facilitate the data-driven culture, where everyone in your organization, from the CxO level to the lowest level, uses data to make informed decisions as opposed to gut-feel decisions.
- Business Applications: How data is used is directly proportional to the value of the data. If you are managing your data as an asset, then the only way to know if that asset has value is to understand how it is used, where it is used, and what impact it is having on the business.
Your transactional applications, operational applications, and decision support applications are all considered to be business applications. You don’t just go on creating various types of business applications blindly. The company’s business priorities and road maps serve as a critical input to define what kind of business applications need to be built and when. These inputs are then fed into the EIM program to determine what technology and architectures are required, how they will be governed, who will use them, and so on.
- Enterprise Data Model and Data Stores: Enterprise business applications can’t run by themselves, so they will need data models and data stores. It is not uncommon to find numerous data models and data stores in an enterprise setup. Too many data models and data stores can cause severe challenges to the enterprise IT infrastructure and make it inefficient; at the same time, too few data models and data stores will put the company at risk of not being able to run its business optimally. A balance needs to be achieved, and EIM helps in defining policies, standards, and procedures to bring some sanity to the enterprise functioning.
- Information Lifecycle Management: Data and content have a lifecycle. They are created through transactions and interactions and are used for business-specific purposes; they are changed and manipulated following business-specific rules; they are read and analyzed across the enterprise; and they finally reach a stage where they must be archived for later reference or purged, having attained a “use by” state.
EIM defines the data policies and procedures for data usage and thus balances the conflict of retiring data versus the cost and risk of keeping data forever.
Information lifecycle management, if properly defined, also helps in addressing the following common questions:
- What data is needed, and for how long?
- How can my business determine which data is most valuable? Are we sure about the quality of the data in the organization?
- How long should we store this “important” data?
- What are the cost implications of collecting everything and storing it forever? Is it even legal to store data in perpetuity?
- Who is going to go back multiple years and begin conducting new analysis on really old data?
- I don’t understand the definitions of data elements. Where will I find the metadata information?
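Questions such as "what data is needed, and for how long?" are typically answered by a retention policy. The following is a minimal sketch of how such a policy might be expressed and applied. The data classes (`transaction`, `web_log`), the retention periods, and the function name `lifecycle_action` are all hypothetical and chosen only for illustration; real periods depend on business needs and on the regulations that apply to your industry.

```python
from datetime import date

# Hypothetical retention policy (illustrative numbers only):
# how long a record stays in active use, and how long it is then
# kept in an archive before it may be purged.
RETENTION = {
    "transaction": {"active_days": 365, "archive_days": 6 * 365},
    "web_log": {"active_days": 90, "archive_days": 0},
}


def lifecycle_action(data_class, created, today):
    """Return 'keep', 'archive', or 'purge' for a record of the given class."""
    policy = RETENTION[data_class]
    age_days = (today - created).days
    if age_days <= policy["active_days"]:
        return "keep"
    if age_days <= policy["active_days"] + policy["archive_days"]:
        return "archive"
    return "purge"


today = date(2023, 6, 1)
print(lifecycle_action("transaction", date(2023, 1, 1), today))
print(lifecycle_action("transaction", date(2020, 1, 1), today))
print(lifecycle_action("web_log", date(2023, 1, 1), today))
```

Keeping the policy in one table, separate from the code that applies it, is one way to make the "how long" decision visible and reviewable by both business and IT.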
There are several important considerations around data quality, metadata management, and master data management that need to be taken into account under the purview of information lifecycle management.
A key component of EIM is to establish data lineage (where data came from, who touched it, and where and how it is used) and data traceability (how it is manipulated, who manipulated it, where it is stored, and when it should be archived and/or purged).
This function of EIM is extremely valuable for any enterprise. Its absence creates data silos and unmanageable growth of data in the enterprise. In short, you need to know the full lineage, definitions, and rules that go with each type of data.
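To make lineage concrete, here is a minimal sketch of how lineage events might be recorded and queried. The `LineageEvent` structure, the dataset names, and the helper functions are hypothetical; they assume each dataset has at most one upstream source, which is a simplification of real lineage graphs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LineageEvent:
    dataset: str                  # dataset that was created or changed
    action: str                   # e.g. "created", "transformed"
    actor: str                    # system or person that touched the data
    source: Optional[str] = None  # upstream dataset, if any


# Hypothetical event log for illustration.
EVENTS = [
    LineageEvent("crm.customers", "created", "crm_app"),
    LineageEvent("staging.customers", "transformed", "etl_job", "crm.customers"),
    LineageEvent("reports.customer_counts", "transformed", "bi_tool", "staging.customers"),
]


def trace_origin(dataset, events):
    """Follow source links upstream: where did this data come from?"""
    parents = {e.dataset: e.source for e in events if e.source is not None}
    chain = [dataset]
    # Stop at a dataset with no recorded source, or if a cycle is detected.
    while chain[-1] in parents and parents[chain[-1]] not in chain:
        chain.append(parents[chain[-1]])
    return chain


def who_touched(dataset, events):
    """List the actors that created or changed the dataset."""
    return [e.actor for e in events if e.dataset == dataset]


print(trace_origin("reports.customer_counts", EVENTS))
print(who_touched("staging.customers", EVENTS))
```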
A lack of appropriate data that hampers business decision making is something an enterprise can live with; poor data quality that leads to bad business decisions is not acceptable at all.
Therefore, monitoring and controlling the quality of data across the enterprise is of utmost importance. But how do we monitor the quality of data? Using metrics, of course. That means we need a process for defining data quality (DQ) metrics. Below is a high-level approach to defining DQ metrics that your EIM program should follow:
- Define measurable characteristics for data quality. Examples are: state of completeness, validity, consistency, timeliness, and accuracy that make data appropriate for a specific use.
- Monitor the totality of features and characteristics of data that define their ability to satisfy a given purpose.
- Review the processes and technologies involved in ensuring the conformance of data values to business requirements and acceptance criteria.
The end result is a set of measurement processes that associate data quality scores against each business critical data entity. These scores help in quantifying conformance to data quality expectations. Scores that do not meet the specified acceptability thresholds indicate non-conformance.
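As an illustration of scoring against thresholds, the sketch below computes two of the characteristics mentioned above, completeness and validity, for a small customer data set. The sample records, the validity rules, the threshold values, and all function names are assumptions made for this example; a real program would define them per business-critical entity.

```python
# Hypothetical sample records for a "customer" entity.
CUSTOMERS = [
    {"id": 1, "email": "ana@example.com", "country": "US"},
    {"id": 2, "email": None, "country": "US"},
    {"id": 3, "email": "not-an-email", "country": "XX"},
    {"id": 4, "email": "li@example.com", "country": "IN"},
]

VALID_COUNTRIES = {"US", "IN", "DE"}  # assumed reference data

# Assumed acceptability thresholds for each metric.
THRESHOLDS = {
    "email_completeness": 0.90,
    "email_validity": 0.60,
    "country_validity": 0.70,
}


def completeness(records, field):
    """Share of records in which the field is populated."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def validity(records, field, is_valid):
    """Share of populated values that satisfy a business rule."""
    values = [r[field] for r in records if r.get(field) not in (None, "")]
    if not values:
        return 0.0
    return sum(1 for v in values if is_valid(v)) / len(values)


def score_entity(records, thresholds):
    """Return {metric: (score, conforms)} for the entity."""
    scores = {
        "email_completeness": completeness(records, "email"),
        # Deliberately naive rule, used only for illustration.
        "email_validity": validity(records, "email", lambda v: "@" in v),
        "country_validity": validity(
            records, "country", lambda v: v in VALID_COUNTRIES
        ),
    }
    return {name: (s, s >= thresholds[name]) for name, s in scores.items()}


for metric, (score, conforms) in score_entity(CUSTOMERS, THRESHOLDS).items():
    status = "ok" if conforms else "NON-CONFORMANT"
    print(f"{metric}: {score:.2f} {status}")
```

With these sample values, email completeness falls below its assumed threshold and would be flagged as non-conformant, while the two validity metrics pass.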
Closely associated with data quality is the concept of master data management (MDM). MDM comprises a set of processes, governance, policies, standards, and tools that consistently define and manage the master data (i.e. non-transactional data entities) of an organization (which may include reference data).
MDM has the objective of providing processes for collecting, aggregating, matching, consolidating, quality-assuring, persisting, and distributing such data throughout an organization to ensure consistency and control in the ongoing maintenance and application use of this information. A data element, used in various applications, is likely to mean different things in each of them. For example, organizations find it difficult to agree on the definition of very important entities like customer or supplier.
At a basic level, MDM seeks to ensure that an organization does not use multiple (potentially inconsistent) versions of the same master data in different parts of its operations, which can occur in large organizations.
A common example of poor MDM is the scenario of a bank at which a customer has taken out a mortgage and the bank begins to send mortgage solicitations to that customer, ignoring the fact that the person already has a mortgage account relationship with the bank. This happens because the customer information used by the marketing section within the bank lacks integration with the customer information used by the customer services section of the bank.
Data quality measures provide a means to fix data-related issues that already exist in the organization, whereas MDM, if implemented properly, prevents data-quality-related issues from arising in the first place.
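The bank scenario can be sketched in a few lines. The example below consolidates customer records from two hypothetical sources (marketing and customer services) into a single master view and then uses that view to decide who should receive a mortgage solicitation. The record layouts and the matching rule (normalized name plus date of birth) are assumptions for illustration; production MDM tools use far more robust matching.

```python
# Hypothetical source records; note the inconsistent name formatting.
MARKETING = [
    {"name": "Jane  Doe", "dob": "1980-04-02"},
    {"name": "Raj Patel", "dob": "1975-11-19"},
]
CUSTOMER_SERVICES = [
    {"name": "jane doe", "dob": "1980-04-02", "products": ["mortgage"]},
]


def match_key(record):
    """Naive match key: lowercased, whitespace-normalized name plus date of birth."""
    name = " ".join(record["name"].lower().split())
    return (name, record["dob"])


def build_master(*sources):
    """Consolidate records from all sources into one entry per customer."""
    master = {}
    for source in sources:
        for record in source:
            key = match_key(record)
            entry = master.setdefault(
                key, {"name": key[0], "dob": key[1], "products": set()}
            )
            entry["products"].update(record.get("products", []))
    return master


def mortgage_prospects(master):
    """Customers who do not already hold a mortgage."""
    return sorted(
        e["name"] for e in master.values() if "mortgage" not in e["products"]
    )


master = build_master(MARKETING, CUSTOMER_SERVICES)
print(mortgage_prospects(master))
```

Because both sections of the bank now read from the same consolidated record, the existing mortgage holder drops out of the solicitation list.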
Metadata management deals with the softer side of the data-related issues, but it is one of the key enablers within the purview of information lifecycle management. The simplest definition of metadata is “data about data.” In other words, metadata can be thought of as a label that provides a definition, description, and context for data.
Common examples include relational table definitions and flat file layouts. More detailed examples of metadata include conceptual and logical data models. A famous quote, sometimes referred to as “Segal’s Law,” states that: “A man with one watch knows what time it is. A man with two watches is never sure.”
When it comes to the metrics used to make (or explain) critical business decisions, it is not surprising to witness the “we have too many watches” phenomenon as the primary cause of the confusion surrounding the (often conflicting) answers to common business questions, such as:
- How many customers do we have?
- How many products did we sell?
- How much revenue did we generate?
Therefore, another example of metadata is providing clear definitions of what the terms “customers,” “products,” and “revenue” actually mean.
Metadata is one of the most overlooked aspects of data management, and yet it is the most difficult initiative to implement.
Metadata can potentially encompass many levels, from a single data element in a database to a more complex entity such as customer, which will be a composite of other elements and/or entities.
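One simple way to picture these levels is a business glossary in which each entry carries a definition and an owner, and composite entries point to the elements they are built from. The sketch below is illustrative only: the `MetadataEntry` structure, the definitions, and the owners are hypothetical and are not taken from any particular metadata tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MetadataEntry:
    name: str
    definition: str
    owner: str
    components: List[str] = field(default_factory=list)  # sub-elements, if composite


# Hypothetical glossary: one composite entity and two simple elements.
GLOSSARY = {
    "customer": MetadataEntry(
        "customer",
        "A party with at least one active account.",
        "Sales Operations",
        components=["customer_id", "account_status"],
    ),
    "customer_id": MetadataEntry(
        "customer_id", "Unique identifier assigned at onboarding.", "IT"
    ),
    "account_status": MetadataEntry(
        "account_status", "One of: active, dormant, closed.", "Finance"
    ),
}


def describe(name, glossary, depth=0):
    """Return the definition of an entry and, indented, of its components."""
    entry = glossary[name]
    lines = ["  " * depth + f"{entry.name}: {entry.definition}"]
    for component in entry.components:
        lines.extend(describe(component, glossary, depth + 1))
    return lines


print("\n".join(describe("customer", GLOSSARY)))
```

With a single agreed definition of "customer" on record, a question such as "How many customers do we have?" has one answer rather than one per department.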
- Regulations and Compliance: Irrespective of which industry your company belongs to, regulatory risk and compliance is of utmost concern. In some industries like financial services and health care, meeting regulatory requirements is of the highest order; whereas other industries may not be exposed to such strict compliance rules. EIM helps you address the regulatory risk that goes with data.
- Governance: Governance is primarily a means to ensure the investments you are making in your business and IT are sustainable. Governance ensures that data standards are perpetuated, that data models and data stores do not mushroom across the enterprise, and that roles like data stewards are effective in resolving data-related conflicts that arise within business silos. Most importantly, governance, if enforced in the right spirit, helps manage your data growth and cost impact optimally.
As you can see, there are many components in the EIM framework that must interact with each other in a well-orchestrated manner. So far, we have discussed data mostly in a generic sense that includes all possible types of data and all possible types of data sources (internal as well as external).
EIM is at a framework level and does not necessarily anticipate what needs to be done when you are dealing with different kinds of data, especially when we refer to big data characteristics like volume, velocity, and variety.
There are several challenges (some new, and some old but with magnified impact) when we start looking at the finer details of big data and how they affect the EIM framework.
Does this mean we will need a radically different approach for the enterprise information management framework?
This post is an excerpt from the book “Big Data Imperatives – Enterprise Big Data Warehouse, BI Implementations and Analytics,” published by Apress and available on Amazon.