Automating Data Ingestion Across 12 Source Systems for a Nationwide Entertainment Operator

Automating Data Ingestion Across 12 Source Systems for a Nationwide Entertainment Operator

Client Overview

  • 34+

    Parks merged

  • ~50M

    Annual visitors

  • $3.1B+

    Revenue in 2025

INDUSTRY

  • Hi-Tech/ Entertainment and theme parks

TECH STACK

  • Data platform
    • Snowflake
    • dbt
    • Amazon S3
  • Orchestration
    • Apache Airflow (MWAA)
  • Visualization
    • Power BI
  • Infrastructure and IaC
    • Terraform

Executive Summary

Following the merger of two operating entities into a single group of multiple parks, the organization’s data infrastructure was fragmented across separate SQL Servers, ticketing platforms, payroll systems, and APIs with no shared data model and no centralized warehouse. Pipeline cycles ran for two to three hours with no observability, sales and promotion data was unavailable in consolidated form, and Snowpipe had no orchestration or failure detection. Inferenz rebuilt the data platform from the ground up on Snowflake, replaced Snowpipe with a fully orchestrated Apache Airflow framework, and delivered a config-driven engine that onboards new source systems in four to six weeks with no code changes required.

Challenges

Following the merger of two operating entities, the group's data infrastructure was fragmented, unobservable, and unable to support a unified business.

01

Data fragmentation across merged entities

Two independent source systems, including SQL Servers, REST APIs, payroll platforms, and separate ticketing infrastructure, with no shared data model or centralized warehouse. Twelve source systems ranging from 15 GB to 550 GB had no unified foundation.

02

No visibility into ticketing and sales

Real-time ticket volumes, daily sales, and promotion redemptions were unavailable in consolidated form across US markets. The most data-intensive systems ran for two to three hours per pipeline cycle with no observability into whether they completed successfully.

03

Unmanaged Snowpipe with no observability

Snowpipe had no orchestration, no deterministic completion tracking, and no table-level metadata. Failures were silent and root cause analysis was slow and entirely manual, with no way to trace which tables had loaded or why runs had stalled.

04

Reporting bottlenecks and zero lineage

Legacy SQL Server infrastructure buckled under post-merger analytical load. Reports were stale and distrusted. With no lineage in place, teams could not trace numbers back to source or diagnose anomalies when figures conflicted across systems.

Our Solution

Inferenz rebuilt the group's data platform from the ground up, replacing Snowpipe with a fully orchestrated Apache Airflow framework and centralizing all data into a governed Snowflake warehouse.

Config-Driven ingestion with Airflow orchestration

A config-driven framework defines each source system's tables, load type, and schedule in configuration rather than code: no DAG changes required to onboard a new source, cutting integration time to four to six weeks regardless of system size, from a 4-table 15 GB system to 100+ table systems carrying 550 GB. Unmanaged Snowpipe was replaced with a fully orchestrated Apache Airflow pipeline on MWAA, making every run deterministic, tracked, and observable, with structured run status and table-level alerting replacing silent failures.

Dynamic dbt DAG generation with concurrency isolation

Dynamic dbt DAG generation standardized the transformation layer across all 12 systems with a consistent, auto-generated execution pattern, removing manual maintenance of individual transformation logic. Per-system concurrency isolation eliminated pipeline collisions, enabling safe parallel processing, daily dbt pipeline run time dropped from approximately one hour to approximately 10 minutes as a direct result.

Audit layer and observability

A comprehensive audit layer captures every run across 50 to 60 tracked tables: status, row counts, load type, and watermark per table, giving teams full visibility into what loaded, when, and whether it completed within SLA, replacing complete operational blindness.

Consumption layer for Power BI

Ticketing, sales, and promotion data were unified in a Snowflake Consumption Layer purpose-built for Power BI, with SLA-compliant dashboards surfacing daily pipeline health, sales performance, and promotion redemption data across all US markets in consolidated form for the first time.

Impact Delivered

~75%

Faster daily pipeline

Daily pipeline run time reduced from 2 to 3 hours to 30 to 50 minutes via parallel dbt execution with per-system isolation.

100%

Sales and promotion visibility

Ticketing, daily sales, and promotion redemption data consolidated into a single consumption layer across all US markets for the first time.

3 TB

Managed via config only

12 source systems and 600+ tables onboarded and governed through configuration updates alone, with no code changes required for new sources.

Full

Audit trail per run

Table-level status, row counts, load type, and watermark captured across 50 to 60 tracked tables on every pipeline run.

Let’s create something truly remarkable & intelligent!

Whether you’re starting with data modernization or exploring AI copilots, we’re here to help.

Contact Us