Data Glass: Technology: ARTIST MDM Technology Selection

Monday, February 21, 2022

This is Part II, continuing from the previous post.

This is an exploration of how I might implement a master data management (MDM) and release service. There are many ways, technologies, and platforms that could be used to implement this. I chose AWS as the cloud service provider and Databricks for server orchestration, rather than going deep into EKS or Kubernetes, for reasons of team efficiency. Refactoring from Databricks to EKS or Kubernetes in production can be taken up later, when the budget and the size of the team allow.

Ultimate Technical Goal

Support 20 Billion Episodes & Movies
Support 100 concurrent curators
Support Localization of text
Support Change Control
Support Data Ticket tracking
Support HITL (Human in the Loop)
Data Structure to be flexible and expandable

Approach

Use standard services so it's easy to find intermediate-level developers
Keep it simple & maintainable

Teams

Data Engineering - Pipeline, Warehouse, Containers
Video Prep - Video Capture, Pre-Processing, Video Security and Storage
Data Science - ML & NLP packages
UI/UX Dev - ARTIST Website, HITL, Data Ticketing, and Data Collection UI

Preliminary Technology Choice

Use S3 buckets
Organize data using S3 folders and naming conventions
Store raw pre-ingested data as JSON documents
Use Aurora transactional databases

MDM Editor Database
Data Ticketing and Tracking Database

Databricks

Server orchestration
Job Management
Machine Learning
Data Pulls from 3rd Party APIs

ECS/ECR/ELB/Fargate for Data Collection (MDM UI) Website
Use Lambda for internal API
Use JSON to store schema, data entry rules, UI presentation, and editing layouts

React Material UI
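
To make the idea of JSON-driven schema and data entry rules a bit more concrete, here is a minimal sketch of how a rule document might be checked against a record before it is saved. The schema layout, the field names (title, season_number, runtime_minutes), and the validate function are all hypothetical illustrations of mine, not a settled design; a real implementation would more likely lean on an established JSON Schema validator.

```python
import json
from typing import List

# Hypothetical rule document; in practice this could live in S3 or Aurora.
SCHEMA_JSON = """
{
  "entity": "episode",
  "fields": [
    {"name": "title", "type": "string", "required": true, "max_length": 200},
    {"name": "season_number", "type": "integer", "required": false, "min": 0},
    {"name": "runtime_minutes", "type": "integer", "required": true, "min": 1}
  ]
}
"""

# Maps the type names used in the rule document to Python types.
_TYPES = {"string": str, "integer": int}


def validate(record: dict, schema: dict) -> List[str]:
    """Return a list of human-readable rule violations (empty if the record passes)."""
    errors = []
    for field in schema["fields"]:
        name = field["name"]
        value = record.get(name)
        if value is None:
            if field.get("required", False):
                errors.append(f"{name}: required")
            continue
        expected = _TYPES[field["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"{name}: expected {field['type']}")
            continue
        if "max_length" in field and len(value) > field["max_length"]:
            errors.append(f"{name}: longer than {field['max_length']} characters")
        if "min" in field and value < field["min"]:
            errors.append(f"{name}: below minimum {field['min']}")
    return errors


if __name__ == "__main__":
    schema = json.loads(SCHEMA_JSON)
    print(validate({"title": "Pilot", "runtime_minutes": 42}, schema))
    print(validate({"season_number": -1}, schema))
```

Because the rules are plain data, the same document could in principle drive both the editor UI (which fields to show, which are required) and server-side checks in the internal API, which is the main reason I like keeping them in JSON.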

AWS OpenSearch

Connector to Aurora for indexing searches
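
As an illustration of organizing raw data with S3 folders and naming conventions, the sketch below builds an object key for a raw, pre-ingested JSON document. The key layout (raw/source/entity type/ingest date/record id) and the function names are assumptions of mine for illustration only; the actual convention would need to be agreed with the Data Engineering team. It deliberately stops short of calling S3, so it runs without any AWS credentials.

```python
import json
import re
from datetime import date

# Anything outside lowercase letters, digits, underscore, and hyphen is replaced.
_UNSAFE = re.compile(r"[^a-z0-9_-]+")


def _slug(value: str) -> str:
    """Lowercase a value and replace unsafe characters so it is a clean key segment."""
    return _UNSAFE.sub("-", value.strip().lower()).strip("-")


def build_raw_key(source: str, entity_type: str, record_id: str, ingest_date: date) -> str:
    """Build a hypothetical S3 key: raw/<source>/<entity>/ingest_date=<date>/<id>.json"""
    return (
        f"raw/{_slug(source)}/{_slug(entity_type)}/"
        f"ingest_date={ingest_date.isoformat()}/{_slug(record_id)}.json"
    )


def to_json_bytes(document: dict) -> bytes:
    """Serialize a raw document as UTF-8 JSON, keeping localized text readable."""
    return json.dumps(document, ensure_ascii=False, sort_keys=True).encode("utf-8")


if __name__ == "__main__":
    key = build_raw_key("Vendor A", "Episode", "EP-0001", date(2022, 2, 21))
    body = to_json_bytes({"title": "Café Society", "runtime_minutes": 42})
    print(key)
    print(body.decode("utf-8"))
```

Partitioning by ingest date in the key is one common way to keep later Databricks jobs from scanning the whole bucket, though whether it suits this workload is something I would want to test.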

Topological Diagram

Please refer to my previous post on MDM design for details on the data and process models.
