The client, a global telecommunications company operating 23M+ public WiFi hotspots and serving 31.5M broadband customers, needed real-time insight across its AWS infrastructure and partner tools without risking live services. The existing monitoring and data architecture presented three core operational challenges, compounded by a strict no-downtime constraint:
Operations teams relied on periodic batch reports rather than live dashboards. The lag between reports meant leaders could not spot issues early or act on them in time, leaving blind spots in the network’s real-time performance.
Results from monitoring and analytics workflows were transferred manually to downstream systems and audit stores. This manual work introduced errors, created bottlenecks, and made it difficult to maintain clean, consistent records at scale.
Without automated, priority-routed alerting, critical issues were not reaching the right teams fast enough. Detection-to-action time was too long, increasing the risk of prolonged service disruptions across the company’s massive network footprint.
Any modernization of the data and monitoring infrastructure had to be executed without disrupting live services. The scale of operations (64M institutions on the network footprint) meant even brief downtime could have an outsized impact.
Inferenz built a cloud warehouse and price-elasticity engine in 100 days, delivering real-time operational insight across the client’s AWS and partner tools without risking live services. The solution consisted of the following components:
Automated Pipelines
S3, Glue Crawler, Athena, Step Functions, and Lambda run end-to-end with zero manual hops. This replaced fragmented batch processes with a seamless, fully automated data flow that delivers results directly to monitoring systems and an audit store.
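To make the "zero manual hops" flow concrete, here is a minimal sketch of how such a pipeline could be expressed as an AWS Step Functions state machine, written as a Python dict in Amazon States Language. The case study does not publish its actual definition, so the crawler, workgroup, bucket, query, table, and function names below are hypothetical placeholders. The sketch starts a Glue crawler through the AWS SDK service integration, polls until the crawler is READY, runs the Athena query with the `.sync` integration so the workflow waits for it to finish, and hands the query execution ID to a Lambda function.

```python
import json

# Hypothetical names: substitute your own crawler, workgroup, bucket, and function.
CRAWLER = "raw-metrics-crawler"
WORKGROUP = "primary"
RESULTS = "s3://example-bucket/athena-results/"
TRANSFORM_FN = "transform-athena-results"
QUERY = "SELECT region, error_rate FROM metrics.hotspot_health"  # hypothetical table


def build_pipeline_definition() -> dict:
    """Return an ASL definition: crawl -> poll crawler -> Athena query -> Lambda."""
    return {
        "Comment": "Sketch: refresh catalog, query with Athena, hand results to Lambda",
        "StartAt": "StartCrawler",
        "States": {
            "StartCrawler": {
                "Type": "Task",
                "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
                "Parameters": {"Name": CRAWLER},
                "Next": "WaitForCrawler",
            },
            "WaitForCrawler": {"Type": "Wait", "Seconds": 30, "Next": "GetCrawler"},
            "GetCrawler": {
                "Type": "Task",
                "Resource": "arn:aws:states:::aws-sdk:glue:getCrawler",
                "Parameters": {"Name": CRAWLER},
                # Keep only the crawler state so the Choice below can test it.
                "ResultSelector": {"State.$": "$.Crawler.State"},
                "Next": "IsCrawlerReady",
            },
            "IsCrawlerReady": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.State", "StringEquals": "READY", "Next": "RunQuery"}
                ],
                "Default": "WaitForCrawler",
            },
            "RunQuery": {
                "Type": "Task",
                # .sync makes Step Functions wait until the query completes.
                "Resource": "arn:aws:states:::athena:startQueryExecution.sync",
                "Parameters": {
                    "QueryString": QUERY,
                    "WorkGroup": WORKGROUP,
                    "ResultConfiguration": {"OutputLocation": RESULTS},
                },
                "Next": "Transform",
            },
            "Transform": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {
                    "FunctionName": TRANSFORM_FN,
                    "Payload": {
                        "QueryExecutionId.$": "$.QueryExecution.QueryExecutionId"
                    },
                },
                "End": True,
            },
        },
    }


def check_transitions(defn: dict) -> list:
    """Return any Next/Default targets that do not name an existing state."""
    states = defn["States"]
    missing = []
    for state in states.values():
        targets = [state.get("Next"), state.get("Default")]
        targets += [c.get("Next") for c in state.get("Choices", [])]
        missing += [t for t in targets if t is not None and t not in states]
    return missing


if __name__ == "__main__":
    print(json.dumps(build_pipeline_definition(), indent=2))
```

A production definition would also need Retry/Catch blocks and a check of the crawler's last-run status, since a failed crawl also returns the crawler to READY.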
Comprehensive Monitoring & Alerts
Metrics and logs flow to CloudWatch, ELK, and Datadog, providing multi-layered observability across the infrastructure. CloudWatch Alarms route priority notifications to Slack, giving operations teams immediate visibility into issues as they emerge.
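One common way to get CloudWatch alarms into Slack is to point the alarm action at an SNS topic and subscribe a small Lambda function that posts to a Slack incoming webhook (AWS Chatbot is an alternative). The sketch below assumes that setup. The P1/P2 naming convention, the webhook URLs, and the helper names are hypothetical; only the SNS event shape and the CloudWatch alarm fields (`AlarmName`, `NewStateValue`, `NewStateReason`) follow AWS's message formats.

```python
import json
import urllib.request

# Hypothetical routing table: one Slack incoming-webhook URL per priority.
WEBHOOKS = {
    "P1": "https://hooks.slack.com/services/EXAMPLE/P1",
    "P2": "https://hooks.slack.com/services/EXAMPLE/P2",
}
DEFAULT_PRIORITY = "P2"


def priority_of(alarm_name: str) -> str:
    """Assumed convention: alarm names start with 'P1-' or 'P2-'."""
    prefix = alarm_name.split("-", 1)[0]
    return prefix if prefix in WEBHOOKS else DEFAULT_PRIORITY


def build_slack_message(sns_record: dict) -> tuple:
    """Turn one SNS record carrying a CloudWatch alarm into (webhook_url, payload)."""
    alarm = json.loads(sns_record["Sns"]["Message"])
    name = alarm["AlarmName"]
    priority = priority_of(name)
    text = f"[{priority}] {name} is {alarm['NewStateValue']}\n{alarm.get('NewStateReason', '')}"
    return WEBHOOKS[priority], {"text": text}


def post(url: str, payload: dict) -> None:
    """POST the payload as JSON to a Slack incoming webhook."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        resp.read()


def handler(event, context):
    """Lambda entry point subscribed to the alarm's SNS topic."""
    for record in event["Records"]:
        url, payload = build_slack_message(record)
        post(url, payload)
```

Routing by webhook rather than by message content keeps the priority decision in one place, so adding a P0 channel is a one-line change to the table.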
Integrated Notifications
EventBridge, SNS, and SQS coordinate retries and decoupled alert handling, ensuring that critical notifications reach the right teams reliably. This architecture eliminated single points of failure in the alerting chain and supported graceful degradation under load.
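The retry behavior described above can be sketched with an SQS-triggered Lambda that uses partial batch responses: messages that fail are reported back, become visible again for another attempt, and move to a dead-letter queue once the queue's `maxReceiveCount` is exceeded. This assumes EventBridge rules or SNS topics feed the queue and that `ReportBatchItemFailures` is enabled on the event source mapping; the `deliver` function and the `team` field are hypothetical stand-ins for the real downstream notification.

```python
import json


def unwrap(body: str) -> dict:
    """SQS messages fed by SNS carry an SNS envelope unless raw delivery is on."""
    msg = json.loads(body)
    if msg.get("Type") == "Notification" and "Message" in msg:
        return json.loads(msg["Message"])
    return msg  # e.g. an EventBridge event delivered straight to the queue


def deliver(alert: dict) -> None:
    """Hypothetical downstream call (pager, Slack, ticketing). Raises on failure."""
    if "team" not in alert:
        raise ValueError("alert has no owning team")


def handler(event, context=None):
    """Process a batch and report only the failed messages for retry.

    Successful messages are deleted by Lambda; failed ones reappear after the
    visibility timeout and eventually land in the DLQ after maxReceiveCount.
    """
    failures = []
    for record in event["Records"]:
        try:
            deliver(unwrap(record["body"]))
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Reporting failures per message means one bad alert does not force the whole batch to be retried, which is what keeps the chain degrading gracefully under load.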
Dynamic Lambda Functions
Purpose-built Lambda functions transform Athena query outputs and stream the results to Datadog and S3 in near real-time. This enabled live dashboards to replace static batch reports, giving leaders the ability to see issues early and act faster.
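A minimal sketch of the transform step, assuming the function reads results through Athena's `GetQueryResults` API: it flattens the returned `ResultSet` into row dicts, builds a gauge payload shaped like Datadog's v2 metric series intake, and serializes the same rows as newline-delimited JSON for the S3 copy. The column names, metric name, and tags are hypothetical, and the Datadog payload shape should be checked against Datadog's current API documentation. The HTTP submission and the S3 upload are left out so the transformation can be tested on its own.

```python
import json
import time


def rows_from_result_set(result_set: dict) -> list:
    """Convert an Athena GetQueryResults ResultSet into a list of dicts.

    For SELECT queries the first row of the first page repeats the column
    names, so it is skipped. NULL cells have no VarCharValue and become None.
    """
    columns = [c["Name"] for c in result_set["ResultSetMetadata"]["ColumnInfo"]]
    return [
        {col: cell.get("VarCharValue") for col, cell in zip(columns, row["Data"])}
        for row in result_set["Rows"][1:]
    ]


def to_datadog_series(rows, metric, value_col, tag_cols, ts=None) -> dict:
    """Build a gauge payload modeled on Datadog's v2 series intake format."""
    ts = int(ts if ts is not None else time.time())
    series = []
    for row in rows:
        if row.get(value_col) is None:
            continue  # skip NULL measurements rather than sending zeros
        series.append({
            "metric": metric,
            "type": 3,  # 3 = gauge in the v2 intake enum
            "points": [{"timestamp": ts, "value": float(row[value_col])}],
            "tags": [f"{c}:{row[c]}" for c in tag_cols if row.get(c) is not None],
        })
    return {"series": series}


def to_ndjson(rows) -> str:
    """Newline-delimited JSON, one object per row, for the S3 audit copy."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in rows)
```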
CI/CD & Reusability
GitHub hosts generic payload templates, and GitHub Actions automates consistent deployments across environments. This standardized approach reduced deployment friction, minimized configuration drift, and ensured repeatable, reliable releases.
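The generic-template idea can be illustrated with a small renderer that fills `${name}` placeholders per environment, so every environment is deployed from the same checked-in template and only the variable values differ. The template fields, environment table, and function names are hypothetical. In a GitHub Actions workflow, a job (for example, one matrix entry per environment) could call something like `render_for` before deploying, and a missing value fails the run instead of shipping a half-filled payload.

```python
from string import Template

# Hypothetical generic template kept in the repo; ${...} fields vary per environment.
TEMPLATE = {
    "source": "${service}",
    "env": "${env}",
    "target": {"queue_url": "${queue_url}"},
    "priority": "${priority}",
}

# Hypothetical per-environment values, e.g. loaded from a checked-in config file.
ENVIRONMENTS = {
    "staging": {"env": "staging", "queue_url": "https://sqs.example/staging-alerts"},
    "prod": {"env": "prod", "queue_url": "https://sqs.example/prod-alerts"},
}


def render(node, values):
    """Recursively substitute ${name} placeholders; a missing value raises KeyError."""
    if isinstance(node, dict):
        return {k: render(v, values) for k, v in node.items()}
    if isinstance(node, list):
        return [render(v, values) for v in node]
    if isinstance(node, str):
        return Template(node).substitute(values)
    return node


def render_for(env_name: str, **overrides) -> dict:
    """Render the shared template for one environment, with optional overrides."""
    values = {**ENVIRONMENTS[env_name], **overrides}
    return render(TEMPLATE, values)
```

Substituting into individual string values, rather than into one JSON string, avoids breaking the payload when a value happens to contain quotes.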
Athena workflows were migrated while dashboards stayed fully available, with no downtime or data gaps.
Automated transfers and dashboards replaced manual handoffs across monitoring and audit workflows.
Slack-routed priority alerts cut detection-to-action time across operations teams.
End-to-end cloud warehouse and monitoring platform, built and deployed in 100 days.