Thursday, January 16, 2014

Commentary: The Case of the Missing Enterprise Data Warehouse

Symptom: You have 100 reports that are unreliably generated for you and the reported numbers conflict from report to report.

Root Cause:  Your company has no Enterprise Wide Data Warehouse.

How Did that Happen?   Here is the Typical Evolving Scenario: 
You or your management commissions someone to provide a specific report.  That report was quickly put together and placed on a server under someone's office desk.  Rinse and Repeat...   Eventually you realize you have a series of reports that give conflicting numbers.  So you have someone investigate why and you get the following answer:  You have 100 Reports coming from 10 Reporting servers under 10 different office desks across 3 office sites.   There are 20 different Extraction, Transformation, and Loading Pipelines all with a different codebase and with different logic.  Looking even deeper you may find that the ETL Pipeline's error handling, late arriving data processes, cleaning processes, data enrichment processes, the data conformation processes, and data variance checks all vary in design or just plain absent.   Then the real bad news hits you:  What you thought were the same measures in different reports are not the same.  They technically have different  meanings even though the have the same name.

No Data Warehouse Diagram:

Hidden Costs in Not Having a Enterprise Wide Data Warehouse:
  • Increased Development Costs for maintaining multiple and fragmented codebases doing same thing.
  • Increased Operations and Development Costs in troubleshooting late or missing reports and erroneous numbers.
  • Increased Business Risks in data landing in the hands of the wrong people due to your report servers not being properly secured.  This results in these servers being hacked or just plain walking out the door.
  • Decreased Business Opportunities and Optimization caused by relying on in accurate and inconsistent reports.
Solution:  Build a Enterprise Wide Data Warehouse by providing the following.
  • Conformed Dimension Definitions 
  • Conformed Measure Definitions
  • Conformed Atomic Level Fact Definitions
  • Managed Dimensional Data
  • Documented Data SLAs going In and Out of the Warehouse
  • Documented Data Lineage
  • Symmetrical ETL Pipeline Process Management
  • Uniform handling late arriving data
  • Uniform Error Handling
  • Uniform Variance Checks
  • Uniform Security Policies and Management
  • Shared Codebase and Framework
  • Shared Environment
  • Uniform backup and replication of data
  • Uniform Data Life Cycle Management:  Data Archival and Deletion as it ages
Traditional Data Warehouse Diagram:

No comments: