Automating Ingestion for Visitor Records via Config-Driven Pipelines

INDUSTRY

  • Entertainment & Theme Parks

TECH STACK

  • Apache Airflow (MWAA) · Snowflake · dbt
  • Amazon S3 · Power BI
  • Terraform

SCOPE OF WORK

  • Post-merger data platform modernization: two entities into one warehouse
  • Airflow-orchestrated ingestion replacing unmanaged Snowpipe
  • Unified ticketing, sales, and promotion visibility across US markets
  • Config-driven, zero-code source onboarding framework
  • Full audit layer: run status, row counts, watermarks per table
  • SLA-aligned Power BI Consumption Layer with daily pipeline health reporting

Key Highlights


Unified Post-Merger Data Platform

Converged two siloed data ecosystems, spanning separate SQL Servers, ticketing platforms, APIs, and payroll systems across 12 source systems, into a single Snowflake warehouse, giving the business one trusted source of truth.

Quick Pipeline Execution

Parallel dbt runs with per-system isolation cut daily pipeline time from ~1 hour to ~10 minutes. New source integration time dropped to 4–6 weeks, with no code changes needed.

100% Sales & Promotion Visibility

Ticketing, daily sales, and promotion redemption data, previously invisible across geographies, consolidated into a single Consumption Layer powering SLA-compliant Power BI dashboards.

Challenges

Following the merger of two operating entities, the Group's data infrastructure was fragmented, unobservable, and unable to support a unified business.

Data Fragmentation Across Merged Entities

Two independent source ecosystems: SQL Servers, REST APIs, payroll platforms, and separate ticketing infrastructure, with no shared data model and no centralized warehouse.

No Visibility into Ticketing and Sales

Real-time ticket volumes, daily sales, and promotion redemptions were unavailable in consolidated form across US markets. The most data-intensive systems, each carrying up to 550 GB, were running for 2–3 hours per pipeline cycle with no observability into whether they completed. Daily sales reporting added another couple of hours.

Unmanaged Snowpipe with No Observability

Snowpipe had no orchestration, no deterministic completion tracking, and no table-level metadata. Failures were silent; root cause analysis was slow and entirely manual.

Reporting Bottlenecks and Zero Lineage

Legacy SQL Server infrastructure buckled under post-merger analytical load. Reports were stale and distrusted. No lineage meant teams couldn't trace numbers or diagnose anomalies.

Our Solution

Inferenz rebuilt the Group’s data platform from the ground up, replacing Snowpipe with a fully orchestrated Apache Airflow (MWAA) framework and centralizing all data into a governed Snowflake warehouse.

A config-driven ingestion framework was built so each source system’s tables, load type, and schedule are defined in configuration. No DAG code changes are required to onboard a new source, cutting integration time to 4–6 weeks.
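As an illustrative sketch only (the actual framework is not shown in this case study), a config-driven design like the one described might declare each source's tables, load type, and schedule as data, then expand that config into ingestion tasks. All field and function names here are assumptions:

```python
# Hypothetical config entry: onboarding a new source means adding a block
# like this, not writing DAG code. Field names are illustrative.
SOURCES = {
    "ticketing_east": {
        "schedule": "0 5 * * *",  # daily at 05:00
        "tables": {
            "ticket_sales": {"load_type": "incremental",
                             "watermark_column": "updated_at"},
            "promotions":   {"load_type": "full"},
        },
    },
}

def build_task_specs(sources):
    """Expand the source config into one ingestion task spec per table."""
    specs = []
    for system, cfg in sources.items():
        for table, tcfg in cfg["tables"].items():
            specs.append({
                "task_id": f"ingest__{system}__{table}",
                "schedule": cfg["schedule"],
                "load_type": tcfg["load_type"],
                # Incremental loads carry a watermark column; full loads do not.
                "watermark_column": tcfg.get("watermark_column"),
            })
    return specs
```

Because the task list is derived entirely from configuration, adding a source system is a config change that the orchestrator picks up on its next parse.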

It accommodated a 36× range in source system size, from a 4-table, 15 GB system to 100+ table systems carrying 550 GB, while treating each consistently.

Dynamic dbt DAG generation ensured all transformation pipelines follow a consistent, auto-generated execution pattern. Per-system concurrency isolation eliminated pipeline collisions, enabling safe parallel processing across all sources.

A comprehensive audit layer captures every run: status, row counts, load type, and watermark per table. Ticketing, sales, and promotion data were unified in a Snowflake Consumption Layer purpose-built for Power BI reporting.
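The shape of such an audit record can be sketched as follows; column names and the helper itself are illustrative assumptions, not the case study's actual schema:

```python
from datetime import datetime, timezone

def audit_record(system, table, load_type, row_count, watermark,
                 status="SUCCESS"):
    """Build one audit-layer row per table load.

    Written once per table per run, this gives deterministic completion
    tracking: a missing or FAILED row is immediately visible, and the
    stored high watermark drives the next incremental load.
    """
    return {
        "system": system,
        "table_name": table,
        "load_type": load_type,          # full vs incremental
        "row_count": row_count,          # rows loaded this run
        "high_watermark": watermark,     # max value of the watermark column
        "status": status,                # SUCCESS / FAILED
        "run_ts": datetime.now(timezone.utc).isoformat(),
    }
```

Persisting these rows to a Snowflake audit table is what turns silent failures into queryable facts: pipeline health dashboards and root-cause analysis both read from the same record.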

Impact Delivered

~75% Faster

Daily pipeline run time cut from 2–3 hours to ~30–50 minutes via parallel dbt execution with per-system isolation.

100% Visibility

Unified view of 12 source systems across ticketing, daily sales, and promotion redemption across US markets, consolidated for the first time.

3 TB Managed

3 TB across 12 source systems and 600+ tables, managed through a config-update-only framework, with new sources onboarded within 4–6 weeks.

Full Audit Trail

Table-level status, row counts, load type, and watermark captured across tracked tables on every run.
