Building a Predictive Risk Analytics and Early Alert System for a Global Life Sciences Company

Client Overview

  • 170+ consumer health and pharma brands
  • 138+ years in operation
  • 80+ countries with active operations

INDUSTRY

  • Healthcare / Pharmaceutical and Life Sciences

TECH STACK

  • Clinical Data Source
    • MIMIC-IV (1M+ patient records)
  • Data Engineering
    • PostgreSQL
    • SQL
  • ML Development
    • Python
    • Decision Tree
    • Random Forest models
  • Cloud Infrastructure
    • AWS

Executive Summary

The client needed to identify patients at risk of Acute Respiratory Distress Syndrome (ARDS) before onset, with a particular focus on ARDS activation triggered by the COVID-19 Delta variant. The challenge was significant: within a dataset of 1 million patient records sourced from MIMIC-IV, only 10% of cases were ARDS-positive, creating a severe class imbalance that caused standard models to overlook the patients who needed intervention most. Inferenz deployed the MIMIC-IV dataset on a PostgreSQL server on AWS, then extracted and preprocessed emergency department (ED) and ICU parameters including heart rate, temperature, PaO2, systolic and diastolic blood pressure (SBP, DBP), and FiO2. On that foundation, the team built Decision Tree and Random Forest classification models to predict ARDS onset.

Challenges

Detecting ARDS onset early required solving four compounding problems: disconnected clinical data sources with no unified structure, unreliable patient labelling across datasets, a class distribution that made standard modelling approaches clinically unusable, and an open clinical question about ARDS activation in Delta-variant patients.

01

Source Mapping Across Disconnected Clinical Systems

Emergency department and ICU feeds used different formats and identifiers for the same parameters, making it impossible to trace a single metric across systems with confidence. Vital signs, lab values, and respiratory parameters such as PaO2 and FiO2 were recorded inconsistently, requiring significant preprocessing before any modelling could begin.
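
As an illustration of the kind of source mapping described here, the sketch below maps differently named readings onto one canonical parameter name. The source labels, the `CANONICAL_NAMES` table, and the `harmonise` helper are all hypothetical; the actual identifiers in the ED and ICU feeds are not given in this case study.

```python
# Hypothetical source labels; the real ED and ICU feeds may name these differently.
CANONICAL_NAMES = {
    "hr": "heart_rate",
    "heart rate": "heart_rate",
    "temp": "temperature",
    "temperature": "temperature",
    "pao2": "pao2",
    "po2 (arterial)": "pao2",
    "sbp": "sbp",
    "systolic bp": "sbp",
    "dbp": "dbp",
    "diastolic bp": "dbp",
    "fio2": "fio2",
    "inspired o2 fraction": "fio2",
}


def harmonise(record):
    """Map one raw reading onto a canonical parameter name, or return None if unknown."""
    key = record["label"].strip().lower()
    name = CANONICAL_NAMES.get(key)
    if name is None:
        return None  # unmapped labels are left for manual review rather than guessed
    return {
        "patient_id": record["patient_id"],
        "parameter": name,
        "value": float(record["value"]),
    }


# The same metric arriving from two systems under different labels and types.
ed_reading = {"patient_id": 1, "label": "HR", "value": "92"}
icu_reading = {"patient_id": 1, "label": "Heart Rate", "value": 95}
harmonised = [harmonise(ed_reading), harmonise(icu_reading)]
```

Returning `None` for unknown labels, instead of guessing, keeps unmapped parameters visible so they can be reviewed before modelling.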

02

Cohort Labelling Without a Shared Definition of ARDS Positive

There was no consistent rule for tagging a patient as ARDS-positive across units and studies within the MIMIC-IV dataset. Label definitions varied by clinical context, creating noisy training targets that made model setup unreliable. Without a standardized cohort labelling approach, any model trained on this data would produce inconsistent and untrustworthy predictions.
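
The case study does not state which labelling rule was finally adopted. As one possible example of a standardised rule, the sketch below applies only the oxygenation criterion of the Berlin definition (PaO2/FiO2 at or below 300 mmHg). The full Berlin definition also includes timing, imaging, and ventilation criteria, which are omitted here; the function names are hypothetical.

```python
BERLIN_PF_THRESHOLD = 300.0  # mmHg; oxygenation cut-off from the Berlin definition


def pf_ratio(pao2_mmhg, fio2_fraction):
    """PaO2/FiO2 ratio, with FiO2 given as a fraction between 0.21 and 1.0."""
    if not 0.21 <= fio2_fraction <= 1.0:
        raise ValueError("FiO2 must be a fraction between 0.21 and 1.0")
    return pao2_mmhg / fio2_fraction


def label_ards(pao2_mmhg, fio2_fraction):
    """Assumed labelling rule: 1 if the P/F ratio is at or below the threshold, else 0."""
    return int(pf_ratio(pao2_mmhg, fio2_fraction) <= BERLIN_PF_THRESHOLD)


# PaO2 of 80 mmHg on 50% oxygen gives a ratio of 160, so the label is positive.
hypoxaemic = label_ards(80.0, 0.5)
# PaO2 of 95 mmHg on room air gives a ratio above 300, so the label is negative.
normal = label_ards(95.0, 0.21)
```

Applying a single rule like this to every record is what removes the unit-by-unit variation in labels, whichever clinical definition is chosen.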

03

Severe Class Imbalance in the ARDS Dataset

Of the 1 million patient records in the dataset, only 10% were ARDS-positive; the remaining 90% were negative cases. This extreme imbalance caused standard classification models to overwhelmingly predict the safe majority, systematically missing the 100,000 ARDS cases that represented the entire clinical purpose of the engagement.
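
The arithmetic behind this problem can be shown with a short sketch. It uses only the figures quoted above and a hypothetical "model" that predicts negative for every patient; no real model output is involved.

```python
n_total = 1_000_000
n_positive = n_total // 10  # 10% ARDS-positive, per the figures above
n_negative = n_total - n_positive

# A trivial classifier that labels every patient as ARDS-negative.
true_positives = 0
false_negatives = n_positive
true_negatives = n_negative

accuracy = (true_positives + true_negatives) / n_total
recall = true_positives / (true_positives + false_negatives)

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")
# prints: accuracy=0.90, recall=0.00
```

A headline accuracy of 90% with zero recall on the positive class is why accuracy alone is a misleading metric for this dataset.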

04

Understanding ARDS Activation From the COVID-19 Delta Variant

A specific clinical question driving the engagement was why the COVID-19 Delta variant was triggering ARDS at a higher rate than prior variants. This required the model to identify which parameter combinations were most predictive of ARDS in Delta-affected patients specifically.

Our Solution

Inferenz took a phased approach: a clean data foundation first, feature engineering second, and classification model development third, all deployed within the client's own AWS infrastructure.

Built a queryable clinical data foundation

Deployed the MIMIC-IV dataset (1M+ patient records) on a PostgreSQL server within AWS, then generated tables and extracted ED and ICU parameters (heart rate, temperature, PaO2, SBP, DBP, FiO2) via SQL, structured around the ARDS problem definition.
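
A minimal sketch of this kind of SQL extraction follows. It assumes a simplified, hypothetical `readings` table in long format and uses an in-memory SQLite database purely so the example runs standalone; the project itself used PostgreSQL, and the real MIMIC-IV schema is considerably richer. The conditional-aggregation pattern shown is standard SQL and works in both engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical long-format table: one row per reading.
conn.execute("CREATE TABLE readings (stay_id INTEGER, parameter TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        (1, "heart_rate", 90.0), (1, "heart_rate", 110.0),
        (1, "pao2", 75.0), (1, "fio2", 0.5),
        (2, "heart_rate", 70.0), (2, "pao2", 95.0), (2, "fio2", 0.21),
    ],
)

# Pivot the readings into one row per stay with one column per parameter.
PIVOT_SQL = """
SELECT stay_id,
       AVG(CASE WHEN parameter = 'heart_rate' THEN value END) AS mean_heart_rate,
       MIN(CASE WHEN parameter = 'pao2' THEN value END)       AS min_pao2,
       MAX(CASE WHEN parameter = 'fio2' THEN value END)       AS max_fio2
FROM readings
GROUP BY stay_id
ORDER BY stay_id
"""
rows = conn.execute(PIVOT_SQL).fetchall()
conn.close()
```

The choice of aggregates (mean heart rate, lowest PaO2, highest FiO2) is illustrative only; which summaries best serve the ARDS problem definition is a clinical decision.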

Preprocessed and analysed clinical signals

Cleaned and visualised all parameters in Python to understand data distribution and identify the signals most predictive of ARDS onset before any model development began.

Engineered features and built classification models

Derived predictive features from vitals, labs, and historical ICU/ED readings, then built and evaluated Decision Tree and Random Forest models, specifically addressing the 10/90 class imbalance to ensure the minority ARDS-positive class was accurately identified.
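
The case study does not say which technique was used to address the imbalance. Class weighting is one common option for tree-based models, sketched below in plain Python under that assumption. The formula, n_samples / (n_classes * class_count), is the heuristic scikit-learn documents for `class_weight="balanced"`; the `balanced_class_weights` helper itself is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# A small stand-in with the same 10/90 split as the dataset described above.
labels = [1] * 100 + [0] * 900
weights = balanced_class_weights(labels)
# The positive class gets weight 5.0 and the negative class about 0.56,
# so each missed ARDS case costs nine times as much as a missed negative.
```

With weights like these, a classifier can no longer minimise its training loss by predicting the majority class for every patient.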

Validated the full pipeline in the client's environment

Deployed all database, SQL, and Python components on the client's AWS server and tested the end-to-end pipeline within their own infrastructure, ready for production handoff.

Impact Delivered

6-Hour

Earlier risk alerts

Real-time AI scoring flags ARDS patient decline hours earlier, giving clinical teams the lead time to intervene before onset becomes critical.

25%

Drop in emergency admissions

Predictive onset classification reduced unplanned emergency admissions, lowering the rate of avoidable escalations.

1-Day

Shorter average ICU stay

Faster ARDS detection enabled quicker treatment decisions, reducing critical-care time and freeing up ICU capacity.

1M

Patient records classified end to end

Processed and classified the full MIMIC-IV dataset of 1M+ records, accurately identifying the 100,000 ARDS-positive cases despite a 10/90 class imbalance.
