Business Case
When two leading park operators merged, each of the 34 venues still ran its own SQL Server. Data teams faced three problems:
- Source Fragmentation: 2,700 tables lived on separate park servers, so teams pulled numbers from different snapshots
- Schema Drift: New or dropped columns crashed hard-coded jobs, triggering late-night fixes
- Zero Audit Trail: No run-level log captured row counts or errors, so failures stayed hidden until reports looked wrong
Our Solution
We built a config-driven pipeline that gathers data from all 34 park servers into one Snowflake warehouse.
- Central Config Table: All table mappings, S3 paths, and load flags now sit in a single Snowflake table that analysts can edit (see the config-table sketch after this list)
- Object-Level Refresh Tracking: Start time, end time, status, and duration are captured for every table on every run (sketched below alongside row-count verification)
- Row-Count Verification: Source and target totals are compared after each load; any mismatch triggers an alert (same sketch below)
- Dynamic COPY Procedure: A stored procedure builds a COPY INTO command for each table and writes the outcome to an audit log (see the COPY sketch below)
- Airflow-Orchestrated Loads: Python tasks move files from SQL Server to S3, call the dynamic COPY, then archive each batch (see the DAG sketch below)
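To make the config-driven approach concrete, here is a minimal sketch of what the central config table could look like, created through the Snowflake Python connector. The table name ETL_CONFIG, its columns, and the connection details are illustrative assumptions, not the production schema.

```python
# Hypothetical sketch: creating the central config table via the Snowflake
# Python connector. Names and connection details are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="XXXX", user="etl_user", password="...",
    warehouse="ETL_WH", database="ETL", schema="CONTROL",
)

conn.cursor().execute("""
    CREATE TABLE IF NOT EXISTS ETL_CONFIG (
        SOURCE_SERVER  STRING,   -- park SQL Server host
        SOURCE_TABLE   STRING,   -- fully qualified source table
        TARGET_TABLE   STRING,   -- Snowflake target table
        S3_PREFIX      STRING,   -- staging path for extracted files
        FORMAT_NAME    STRING,   -- named file format used by COPY INTO
        LOAD_FLAG      BOOLEAN   -- analysts toggle tables on or off here
    )
""")
```

Because the pipeline reads this table at runtime, onboarding a new table or park is a row insert rather than a code change.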
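Refresh tracking and row-count verification can be pictured as one helper that wraps each load. REFRESH_LOG and its columns are assumed names; the real pipeline records the same fields (start, end, duration, row counts, status) in its own audit table.

```python
# Illustrative sketch of object-level refresh tracking plus row-count
# verification. sqlserver_cur is a source cursor (e.g. pyodbc); sf_cur is a
# Snowflake cursor. Table and column names are assumptions.
from datetime import datetime, timezone

def verify_and_log(sf_cur, sqlserver_cur, cfg):
    started = datetime.now(timezone.utc)

    # Count rows on both sides of the load.
    sqlserver_cur.execute(f"SELECT COUNT(*) FROM {cfg['SOURCE_TABLE']}")
    source_rows = sqlserver_cur.fetchone()[0]
    sf_cur.execute(f"SELECT COUNT(*) FROM {cfg['TARGET_TABLE']}")
    target_rows = sf_cur.fetchone()[0]

    status = "SUCCESS" if source_rows == target_rows else "MISMATCH"
    ended = datetime.now(timezone.utc)

    # Write one audit row per table per run.
    sf_cur.execute(
        "INSERT INTO REFRESH_LOG (TABLE_NAME, START_TS, END_TS, DURATION_SEC,"
        " SOURCE_ROWS, TARGET_ROWS, STATUS) VALUES (%s, %s, %s, %s, %s, %s, %s)",
        (cfg["TARGET_TABLE"], started, ended,
         (ended - started).total_seconds(), source_rows, target_rows, status),
    )

    if status == "MISMATCH":
        raise ValueError(
            f"Row-count mismatch for {cfg['TARGET_TABLE']}: "
            f"{source_rows} source vs {target_rows} target"
        )
```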
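The dynamic COPY step itself lives in a Snowflake stored procedure; the Python sketch below only illustrates the idea of assembling a COPY INTO statement from a config row and recording the outcome. The stage name ETL_STAGE and the COPY_AUDIT table are assumptions.

```python
# Sketch of the dynamic COPY logic, approximated in Python for readability.
def copy_into_target(sf_cur, cfg, batch_path):
    # Build the COPY INTO command from the config row for this table.
    copy_sql = (
        f"COPY INTO {cfg['TARGET_TABLE']} "
        f"FROM @ETL_STAGE/{batch_path} "
        f"FILE_FORMAT = (FORMAT_NAME = '{cfg['FORMAT_NAME']}') "
        "ON_ERROR = 'ABORT_STATEMENT'"
    )
    try:
        sf_cur.execute(copy_sql)
        result = sf_cur.fetchone()       # COPY INTO returns a load summary row
        status, detail = "LOADED", str(result)
    except Exception as exc:             # log the failure, then re-raise
        status, detail = "FAILED", str(exc)
        raise
    finally:
        # Record the outcome for every attempt, successful or not.
        sf_cur.execute(
            "INSERT INTO COPY_AUDIT (TABLE_NAME, COPY_SQL, STATUS, DETAIL)"
            " VALUES (%s, %s, %s, %s)",
            (cfg["TARGET_TABLE"], copy_sql, status, detail),
        )
```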
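Finally, a hypothetical Airflow DAG (2.4 or later) showing the per-batch flow: extract from SQL Server to S3, invoke the dynamic COPY, then archive the batch. The DAG id, schedule, and task callables are placeholders.

```python
# Hypothetical Airflow sketch of the extract -> load -> archive sequence.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_to_s3(**context):
    ...  # pull rows from the park SQL Server and write files to the S3 prefix

def run_dynamic_copy(**context):
    ...  # call the Snowflake stored procedure that issues COPY INTO per table

def archive_batch(**context):
    ...  # move processed files to an archive prefix in S3

with DAG(
    dag_id="park_to_snowflake_load",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_to_s3", python_callable=extract_to_s3)
    load = PythonOperator(task_id="run_dynamic_copy", python_callable=run_dynamic_copy)
    archive = PythonOperator(task_id="archive_batch", python_callable=archive_batch)

    extract >> load >> archive
```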