Following the merger of two operating entities, the Group's data infrastructure was fragmented, unobservable, and unable to support a unified business.
The two entities ran independent source systems (SQL Servers, REST APIs, payroll platforms, and separate ticketing infrastructure) with no shared data model or centralized warehouse.
Real-time ticket volumes, daily sales, and promotion redemptions were unavailable in consolidated form across US markets. The most data-intensive systems, each carrying up to 550 GB, were running for 2–3 hours per pipeline cycle with no observability into whether they completed. Daily sales reporting added another couple of hours.
Snowpipe had no orchestration, no deterministic completion tracking, and no table-level metadata. Failures were silent; root cause analysis was slow and entirely manual.
Legacy SQL Server infrastructure buckled under post-merger analytical load. Reports were stale and distrusted. No lineage meant teams couldn't trace numbers or diagnose anomalies.
Inferenz rebuilt the Group’s data platform from the ground up, replacing Snowpipe with a fully orchestrated Apache Airflow (MWAA) framework and centralizing all data into a governed Snowflake warehouse.
A config-driven ingestion framework was built so each source system’s tables, load type, and schedule are defined in configuration. No DAG code changes are required to onboard a new source, cutting integration time to 4–6 weeks.
The framework accommodated a roughly 36× range in source system size, from a 4-table, 15 GB system to systems with 100+ tables carrying 550 GB, while handling each through the same configuration-driven path.
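The config-driven approach described above can be illustrated with a minimal sketch. It assumes each source system is declared as a dictionary of tables, load types, and a cron schedule. The system names, table names, schedules, and the `build_ingestion_tasks` helper are all hypothetical and are not taken from the actual framework.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical configuration: every name and schedule here is illustrative only.
SOURCE_CONFIG = {
    "ticketing_small": {
        "schedule": "0 2 * * *",
        "tables": [
            {"name": "tickets", "load_type": "incremental",
             "watermark_column": "updated_at"},
            {"name": "venues", "load_type": "full"},
        ],
    },
    "sales_large": {
        "schedule": "0 3 * * *",
        "tables": [
            {"name": "daily_sales", "load_type": "incremental",
             "watermark_column": "sale_date"},
        ],
    },
}

VALID_LOAD_TYPES = {"full", "incremental"}


@dataclass(frozen=True)
class IngestionTask:
    """One table load, fully described by configuration."""
    system: str
    table: str
    load_type: str
    schedule: str
    watermark_column: Optional[str] = None


def build_ingestion_tasks(config: dict) -> list:
    """Expand the configuration into one task per table.

    Onboarding a new source means adding a config entry; this function
    does not change.
    """
    tasks = []
    for system, spec in config.items():
        for table in spec["tables"]:
            load_type = table["load_type"]
            if load_type not in VALID_LOAD_TYPES:
                raise ValueError(f"{system}.{table['name']}: unknown load type {load_type!r}")
            if load_type == "incremental" and "watermark_column" not in table:
                raise ValueError(f"{system}.{table['name']}: incremental load needs a watermark column")
            tasks.append(
                IngestionTask(
                    system=system,
                    table=table["name"],
                    load_type=load_type,
                    schedule=spec["schedule"],
                    watermark_column=table.get("watermark_column"),
                )
            )
    return tasks


tasks = build_ingestion_tasks(SOURCE_CONFIG)
```

In this sketch a 4-table system and a 100+ table system differ only in the length of their `tables` list, which is one way the same code path could serve sources of very different sizes.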
Dynamic dbt DAG generation ensured all transformation pipelines follow a consistent, auto-generated execution pattern. Per-system concurrency isolation eliminated pipeline collisions, enabling safe parallel processing across all sources.
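As a rough illustration of dynamic generation with per-system isolation, the sketch below builds an identical run-then-test plan for every source system and assigns each system its own concurrency pool. It is plain Python rather than Airflow code. The pool naming scheme, the `tag:<system>` dbt selector convention, and the `build_dbt_plan` helper are assumptions made for the example. In Airflow, the `pool` value could be passed to an operator's `pool` argument so that one system's tasks only compete with each other for slots.

```python
def build_dbt_plan(systems, slots_per_system=1):
    """Generate the same dbt execution pattern for every source system.

    Each system gets its own pool, so a long-running system cannot take
    slots away from another system's tasks.
    """
    plan = {}
    for system in systems:
        pool = f"dbt_{system}_pool"  # hypothetical naming convention
        run_id = f"{system}__dbt_run"
        test_id = f"{system}__dbt_test"
        plan[system] = {
            "pool": pool,
            "slots": slots_per_system,
            "steps": [
                {
                    "task_id": run_id,
                    # Assumes dbt models are tagged with their source system.
                    "command": f"dbt run --select tag:{system}",
                    "pool": pool,
                    "upstream": [],
                },
                {
                    "task_id": test_id,
                    "command": f"dbt test --select tag:{system}",
                    "pool": pool,
                    "upstream": [run_id],  # tests run only after the models build
                },
            ],
        }
    return plan


plan = build_dbt_plan(["ticketing", "sales", "promotions"])
```

Because every system's steps come from the same generator, adding a system changes the input list and not the pattern, and the separate pools allow the systems to run in parallel without contending for the same slots.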
A comprehensive audit layer captures every run: status, row counts, load type, and watermark per table. Ticketing, sales, and promotion data were unified in a Snowflake Consumption Layer purpose-built for Power BI reporting.
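The audit fields listed above (status, row count, load type, and watermark per table) could be modeled along the following lines. This is an in-memory sketch only. The `AuditRecord` and `AuditLog` names, the field layout, and the rule that only successful runs advance the watermark are assumptions, and a real audit layer would persist these rows in a warehouse table.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    """One row per table per run."""
    run_id: str
    system: str
    table: str
    status: str               # "success" or "failed"
    row_count: int
    load_type: str            # "full" or "incremental"
    watermark: Optional[str]  # highest value loaded, if any
    recorded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class AuditLog:
    """Append-only log that makes completion and failures explicit."""

    def __init__(self):
        self._records = []

    def record(self, rec: AuditRecord) -> None:
        self._records.append(rec)

    def last_watermark(self, system: str, table: str) -> Optional[str]:
        # Only successful runs advance the watermark, so a failed load
        # is retried from the last known-good position.
        for rec in reversed(self._records):
            if (rec.system, rec.table) == (system, table) and rec.status == "success":
                if rec.watermark is not None:
                    return rec.watermark
        return None

    def failures(self, run_id: str) -> list:
        return [r for r in self._records if r.run_id == run_id and r.status == "failed"]


log = AuditLog()
log.record(AuditRecord("run-1", "ticketing", "tickets", "success", 1200, "incremental", "2024-01-01"))
log.record(AuditRecord("run-2", "ticketing", "tickets", "failed", 0, "incremental", None))
```

After these two runs, `log.last_watermark("ticketing", "tickets")` still points at the value from `run-1`, and `log.failures("run-2")` surfaces the failed table instead of leaving the failure silent.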
Daily pipeline run time cut from 2–3 hours to ~30–50 minutes via parallel dbt execution with per-system isolation.
Unified view of ticketing, daily sales, and promotion redemption data from 12 source systems across US markets, consolidated for the first time.
Config-update-only onboarding across 12 source systems and 600+ tables, with integration within 4–6 weeks.
Table-level status, row counts, load type, and watermark captured across tracked tables on every run.