Wednesday, January 08, 2014

Baseline Conceptual Models: Pipeline Configuration Model

 
Pipeline Configuration Model:  Pipelines in a data warehouse have many scheduled jobs that process data from raw logs to atomic facts to final reports.  Each process depends on one or more input datasets and can produce one or more output datasets.  Where a process executes depends on where its data is located.  This model depends on the Data & Schema Model and the Warehouse Shared Configuration Model.
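
As an illustration only (a minimal sketch; the class and field names below are my assumptions, not part of the model), a pipeline process configuration might record its input datasets, its output datasets, and derive an execution location from where those datasets live:

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class DatasetRef:
        # Points at an entry described by the Data & Schema Model (name is an assumption).
        name: str
        location: str  # e.g. the cluster or storage system where the dataset lives

    @dataclass
    class PipelineProcess:
        name: str
        inputs: List[DatasetRef]    # one or more input datasets
        outputs: List[DatasetRef]   # one or more output datasets

        def execution_location(self) -> str:
            # Where the process runs is driven by where its input data is located.
            locations = {d.location for d in self.inputs}
            if len(locations) == 1:
                return locations.pop()
            raise ValueError("inputs span multiple locations; location must be chosen explicitly")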

Types of Pipelines:  There are different pipelines for the different stages of the data (a small illustrative sketch follows the list).
  • 3rd Party Dataset Pipelines are extract, minor-transformation, and load processes that pull data from external companies.
  • Raw Log Pipelines are extract, minor-transformation, and load processes that pull data provided by other systems within the company.  Examples are Website Logs, Sales or Purchase Orders, and Inventory Movement.
  • Master Data Domain Pipelines are extract, minor-transformation, and load processes that pull from a Master Data Service or collect data from across multiple company systems.
  • Dimension Pipelines are transformation processes that transform Master Data Domains into dimensions.
  • Staging Pipelines are transformation processes that cleanse, enhance, or conform data.
  • Atomic Fact Pipelines are transformation processes that transform logs into atomic facts.
  • Aggregated Fact Pipelines are transformation processes that transform atomic facts into aggregated facts.
  • Report Pipelines are transformation processes that transform atomic or aggregated facts into reports.
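
As a sketch only (the enum name and string values are assumptions), these stages could be captured as a simple type field in the pipeline configuration:

    from enum import Enum

    class PipelineType(Enum):
        THIRD_PARTY_DATASET = "3rd_party_dataset"
        RAW_LOG = "raw_log"
        MASTER_DATA_DOMAIN = "master_data_domain"
        DIMENSION = "dimension"
        STAGING = "staging"
        ATOMIC_FACT = "atomic_fact"
        AGGREGATED_FACT = "aggregated_fact"
        REPORT = "report"
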
Note:  The Pipeline Data entity describes the input and output data going into or out of a process and is important for tracking coarse-grained data lineage.
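
For example (a hedged sketch; the record structure, function, and dataset names below are made up for illustration), coarse-grained lineage can be recovered by walking Pipeline Data records from an output dataset back to everything that fed it:

    # Pipeline Data records: (process name, input dataset names, output dataset names)
    PIPELINE_DATA = [
        ("load_web_logs", ["web_logs_raw"], ["web_logs"]),
        ("build_atomic_page_views", ["web_logs", "page_dimension"], ["page_views_atomic"]),
        ("aggregate_daily_page_views", ["page_views_atomic"], ["page_views_daily"]),
    ]

    def upstream_datasets(dataset, records=PIPELINE_DATA):
        """Return every dataset that feeds the given dataset (coarse-grained lineage)."""
        lineage = set()
        for process, inputs, outputs in records:
            if dataset in outputs:
                for inp in inputs:
                    lineage.add(inp)
                    lineage |= upstream_datasets(inp, records)
        return lineage

    print(upstream_datasets("page_views_daily"))
    # e.g. {'page_views_atomic', 'web_logs', 'page_dimension', 'web_logs_raw'}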

Please see (Baseline Conceptual Models Commentary) for further details on how these conceptual models are intended to be used.
