170+ consumer health and pharma brands
138+ years in operation
80+ countries with active operations
The client needed to identify patients at risk of Acute Respiratory Distress Syndrome (ARDS) before onset, with a particular focus on ARDS activation triggered by the COVID-19 Delta variant. The challenge was significant: within a dataset of 1 million patient records sourced from MIMIC-IV, only 10% of cases were ARDS-positive, creating a severe class imbalance that caused standard models to overlook the patients who needed intervention most. Inferenz accessed and deployed the MIMIC-IV dataset on a PostgreSQL server on AWS, extracted and preprocessed emergency department and ICU parameters including heart rate, temperature, PaO2, SBP, DBP, and FiO2, and built Decision Tree and Random Forest classification models to predict ARDS onset.
Detecting ARDS onset early required solving three compounding problems: disconnected clinical data sources with no unified structure, unreliable patient labelling across datasets, and a class distribution that made standard modelling approaches clinically unusable.
Emergency department and ICU feeds used different formats and identifiers for the same parameters, making it impossible to trace a single metric across systems with confidence. Vital signs, lab values, and respiratory parameters such as PaO2 and FiO2 were recorded inconsistently, requiring significant preprocessing before any modelling could begin.
There was no consistent rule for tagging a patient as ARDS-positive across units and studies within the MIMIC-IV dataset. Label definitions varied by clinical context, creating noisy training targets that made model setup unreliable. Without a standardized cohort labelling approach, any model trained on this data would produce inconsistent and untrustworthy predictions.
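A standardized labelling rule can be as simple as one explicit function applied to every record, whichever unit or study it came from. The sketch below is illustrative only: the source does not state which rule Inferenz used. It assumes a label based on the PaO2/FiO2 ratio with a 300 mmHg cut-off (the oxygenation criterion of the Berlin definition of ARDS) and ignores the timing, imaging, and PEEP criteria a full clinical definition would need. The names `pf_ratio` and `label_ards` are hypothetical.

```python
from typing import Optional

# Assumed cut-off: the Berlin definition's oxygenation criterion (PaO2/FiO2 <= 300 mmHg).
PF_THRESHOLD_MMHG = 300.0


def pf_ratio(pao2_mmhg: float, fio2: float) -> float:
    """Return PaO2/FiO2, accepting FiO2 as a fraction (0.21-1.0) or a percentage (21-100)."""
    if fio2 > 1.0:  # values above 1 are treated as percentages
        fio2 = fio2 / 100.0
    if not 0.21 <= fio2 <= 1.0:
        raise ValueError(f"implausible FiO2: {fio2}")
    return pao2_mmhg / fio2


def label_ards(pao2_mmhg: Optional[float], fio2: Optional[float]) -> Optional[int]:
    """Return 1 (positive), 0 (negative), or None when a required input is missing."""
    if pao2_mmhg is None or fio2 is None:
        return None  # leave unlabelled rather than guess
    return 1 if pf_ratio(pao2_mmhg, fio2) <= PF_THRESHOLD_MMHG else 0
```

Returning `None` for incomplete records keeps missing data out of the training targets instead of silently counting it as negative.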
Of the 1 million patient records in the dataset, only 10% were ARDS-positive; the remaining 90% were negative cases. This extreme imbalance caused standard classification models to overwhelmingly predict the negative majority, systematically missing the 100,000 ARDS cases that represented the entire clinical purpose of the engagement.
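The arithmetic behind this failure mode is worth spelling out. Using the figures above, a hypothetical model that labels every patient as negative still reaches 90% accuracy while finding none of the ARDS cases, which is why accuracy alone is a misleading metric here. The helper name `confusion_metrics` is an illustration, not part of the engagement.

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


TOTAL_RECORDS = 1_000_000
POSITIVES = 100_000  # 10% ARDS-positive
NEGATIVES = TOTAL_RECORDS - POSITIVES

# A model that always predicts "negative": every positive becomes a false negative.
majority_baseline = confusion_metrics(tp=0, fp=0, fn=POSITIVES, tn=NEGATIVES)
print(majority_baseline)  # {'accuracy': 0.9, 'recall': 0.0}
```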
A specific clinical question driving the engagement was why the COVID-19 Delta variant was triggering ARDS at a higher rate than prior variants. This required the model to identify which parameter combinations were most predictive of ARDS in Delta-affected patients specifically.
Inferenz took a phased approach: a clean data foundation first, feature engineering second, and classification model development third, all deployed within the client's own AWS infrastructure.
Deployed the MIMIC-IV dataset (1M+ patient records) on a PostgreSQL server within AWS, then generated tables and extracted ED and ICU parameters — heart rate, temperature, PaO2, SBP, DBP, FiO2 — via SQL, structured around the ARDS problem definition.
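The extraction step can be sketched as a SQL pivot that turns long-format readings into one row of parameters per stay. This is a minimal illustration under stated assumptions: the `readings` table and its `stay_id`, `param`, and `value` columns are hypothetical and do not reflect the real MIMIC-IV schema, and an in-memory SQLite database stands in for the PostgreSQL server so the example is self-contained. The conditional-aggregation query itself is standard SQL that PostgreSQL also accepts.

```python
import sqlite3

# Parameters named in the case study; the column names are assumptions.
PARAMS = ("heart_rate", "temperature", "pao2", "sbp", "dbp", "fio2")


def build_pivot_sql(table: str = "readings") -> str:
    """Build a query that averages each parameter per stay (NULL when never recorded)."""
    cols = ",\n       ".join(
        f"AVG(CASE WHEN param = '{p}' THEN value END) AS {p}" for p in PARAMS
    )
    return (
        f"SELECT stay_id,\n       {cols}\n"
        f"FROM {table}\nGROUP BY stay_id\nORDER BY stay_id"
    )


def extract_features(conn: sqlite3.Connection) -> list:
    """Run the pivot query and return one dict per stay."""
    cur = conn.execute(build_pivot_sql())
    names = [d[0] for d in cur.description]
    return [dict(zip(names, row)) for row in cur.fetchall()]


# Tiny made-up sample, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (stay_id INTEGER, param TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        (1, "heart_rate", 88.0), (1, "heart_rate", 92.0),
        (1, "pao2", 70.0), (1, "fio2", 0.5),
        (2, "heart_rate", 110.0), (2, "sbp", 95.0),
    ],
)
rows = extract_features(conn)
```

Against a PostgreSQL server the same query string would be run through a PostgreSQL driver and cursor rather than `sqlite3`.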

Cleaned and visualised all parameters in Python to understand data distribution and identify the signals most predictive of ARDS onset before any model development began.

Derived predictive features from vitals, labs, and historical ICU/ED readings, then built and evaluated Decision Tree and Random Forest models — specifically addressing the 10/90 class imbalance to ensure the minority ARDS-positive class was accurately identified.
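The source does not say which technique was used to address the imbalance. One common option is class weighting, where each class is weighted inversely to its frequency as n_samples / (n_classes * class_count); this is the same heuristic scikit-learn applies when `class_weight="balanced"` is set on its decision tree and random forest classifiers. The sketch below computes those weights with the standard library only; `balanced_class_weights` is a hypothetical helper and the 10/90 sample is synthetic.

```python
from collections import Counter


def balanced_class_weights(labels: list) -> dict:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Synthetic labels with the same 10/90 split described in the text.
labels = [1] * 10 + [0] * 90
weights = balanced_class_weights(labels)

# Each class now contributes the same total weight to the training loss.
total_positive_weight = weights[1] * labels.count(1)
total_negative_weight = weights[0] * labels.count(0)
```

With this weighting a missed ARDS-positive record costs nine times as much as a misclassified negative one, so the model can no longer score well by predicting the majority class. A dictionary like `weights` can also be passed directly as the `class_weight` argument of a scikit-learn classifier.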

Deployed all database, SQL, and Python components on the client's AWS server and tested the end-to-end pipeline within their own infrastructure, ready for production handoff.





Earlier risk alerts
Real-time AI scoring flags ARDS patient decline hours ahead, giving clinical teams the lead time to intervene before onset becomes critical.
Drop in emergency admissions
Predictive onset classification reduced unplanned emergency admissions and lowered the rate of avoidable escalations.
Shorter average ICU stay
Faster ARDS detection enabled quicker treatment decisions, reducing critical-care time and freeing up ICU capacity.
Patient records classified end to end
Processed and classified the full MIMIC-IV dataset of 1M+ records, including 100,000 ARDS-positive cases, accurately despite a 10/90 class imbalance.