Deep Technical Dive

Hospital Resource Optimization

Intelligent hospital operations system that predicts patient influx and optimizes floor and medicine allocation using a Conv1D forecaster, Maximum Entropy inverse reinforcement learning (IRL), and Proximal Policy Optimization (PPO).

PyTorch, scikit-learn, Stable Baselines3, Gymnasium, FastAPI, Pandas, NumPy

Repository: Private

Code repository and demo are private due to project constraints.

Problem

Hospitals frequently face operational issues due to unpredictable patient arrivals, which can cause overcrowded wards, medicine shortages, and delayed treatment.

Solution

Built a two-stage decision system: (1) a multi-task Conv1D model predicts the probability of a patient surge and the expected influx, and (2) an IRL + PPO optimizer recommends resource allocation actions only when the predicted surge probability exceeds a threshold (0.70).
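The gating between the two stages can be sketched as follows. This is a minimal illustration, not the project's code. The 0.70 threshold comes from this write-up. The names `Forecast`, `Decision`, `decide` and the `optimizer` callable are hypothetical. The optimizer stands in for whatever the IRL + PPO policy returns.

```python
from dataclasses import dataclass
from typing import Callable, Optional

SURGE_THRESHOLD = 0.70  # gate value stated in the architecture notes


@dataclass
class Forecast:
    surge_probability: float  # stage 1 classification output, in [0, 1]
    expected_influx: float    # stage 1 regression output (patients)


@dataclass
class Decision:
    forecast: Forecast
    optimizer_triggered: bool
    allocation: Optional[dict]  # None when the gate stays closed


def decide(forecast: Forecast,
           optimizer: Callable[[Forecast], dict],
           threshold: float = SURGE_THRESHOLD) -> Decision:
    """Run stage 2 only when stage 1 reports a surge probability above the threshold."""
    if forecast.surge_probability > threshold:
        return Decision(forecast, True, optimizer(forecast))
    # Low-risk forecast: skip the optimizer entirely.
    return Decision(forecast, False, None)
```

The strict `>` mirrors the "> 0.70" rule, so a forecast of exactly 0.70 does not trigger the optimizer. Passing the optimizer in as a callable keeps the prediction and policy modules decoupled.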

System Architecture

• Historical hospital data ingestion
• Patient influx prediction via a multi-task Conv1D model (surge classification + influx regression)
• Threshold gate: if surge probability > 0.70, trigger optimizer
• Maximum Entropy IRL reward learning from expert demonstrations
• PPO policy optimization for resource actions
• FastAPI service returning recommended floor and medicine allocation
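Maximum Entropy IRL reward learning can be illustrated on a small tabular problem. This NumPy sketch uses the soft-value-iteration (maximum causal entropy) form. A backward pass computes a stochastic policy under the current reward. A forward pass computes expected state visitation frequencies. The reward weights then move toward the expert's feature counts.

It rests on several assumptions that are not from the project:

• a known transition tensor `P[s, a, s']`
• a reward linear in per-state features (the project extracted state-action features; this simplifies to states)
• expert demonstrations given as equal-length state sequences

The function names are hypothetical.

```python
import numpy as np


def soft_policy(P, reward, horizon):
    """Finite-horizon soft value iteration; returns pi[t, s, a]."""
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)  # value after the final step
    policies = np.zeros((horizon, n_states, n_actions))
    for t in reversed(range(horizon)):
        Q = reward[:, None] + P @ V  # (S, A): r(s) + E[V(s')]
        Q_max = Q.max(axis=1, keepdims=True)
        # Numerically stable log-sum-exp over actions gives the soft value.
        V = (Q_max + np.log(np.exp(Q - Q_max).sum(axis=1, keepdims=True))).ravel()
        policies[t] = np.exp(Q - V[:, None])  # softmax over actions
    return policies


def expected_svf(P, policies, start_dist):
    """Expected state visitation counts summed over the horizon."""
    D = start_dist.copy()
    svf = np.zeros_like(start_dist)
    for t in range(policies.shape[0]):
        svf += D
        # D_{t+1}(s') = sum_{s,a} D_t(s) * pi_t(a|s) * P(s'|s,a)
        D = np.einsum("s,sa,sap->p", D, policies[t], P)
    return svf


def maxent_irl(P, features, expert_trajs, lr=0.1, iters=200):
    """Gradient ascent on theta for a reward r(s) = features[s] @ theta."""
    n_states = P.shape[0]
    horizon = len(expert_trajs[0])
    start = np.zeros(n_states)
    for traj in expert_trajs:
        start[traj[0]] += 1
    start /= len(expert_trajs)
    # Average feature counts along the expert trajectories.
    expert_fc = np.mean([features[traj].sum(axis=0) for traj in expert_trajs], axis=0)
    theta = np.zeros(features.shape[1])
    for _ in range(iters):
        reward = features @ theta
        svf = expected_svf(P, soft_policy(P, reward, horizon), start)
        theta += lr * (expert_fc - svf @ features)  # expert minus expected counts
    return theta, features @ theta
```

With one-hot state features this learns one reward value per state. In a hospital setting, the learned reward would replace a hand-designed one when training the PPO agent.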

Implementation

• Prepared a temporal dataset with timestamps, current patient counts, NEWS (National Early Warning Score) acuity scores, and admission statistics.
• Trained the multi-task Conv1D predictor in PyTorch (Adam optimizer, learning rate 0.001, 20 epochs, batch size 32).
• Collected expert allocation demonstrations and extracted state-action features for IRL.
• Learned a reward function with Maximum Entropy IRL and trained a PPO agent on the learned reward in a Gymnasium environment.
• Integrated both components into a FastAPI prediction endpoint that routes requests to the optimizer only when the surge threshold is exceeded.
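The multi-task predictor and its training setup can be sketched in PyTorch. The optimizer settings match those listed above: Adam, learning rate 0.001, 20 epochs, batch size 32.

Everything else is an assumption for illustration:

• the input shape `(batch, window, n_features)`
• the two-layer Conv1D trunk with average pooling over time
• the hidden size and kernel size
• summing BCE and MSE losses with equal weight
• the names `MultiTaskConv1D` and `train`

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


class MultiTaskConv1D(nn.Module):
    """Shared Conv1D trunk with a surge-classification head and an influx-regression head."""

    def __init__(self, n_features: int, hidden: int = 32):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv1d(n_features, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool over the time axis
            nn.Flatten(),
        )
        self.surge_head = nn.Linear(hidden, 1)   # logit of surge probability
        self.influx_head = nn.Linear(hidden, 1)  # expected patient influx

    def forward(self, x):
        # x: (batch, window, n_features); Conv1d expects (batch, channels, time)
        h = self.trunk(x.transpose(1, 2))
        return self.surge_head(h).squeeze(-1), self.influx_head(h).squeeze(-1)


def train(model, x, surge, influx, epochs=20, batch_size=32, lr=1e-3, influx_weight=1.0):
    """Joint training; returns the mean loss per epoch."""
    loader = DataLoader(TensorDataset(x, surge, influx), batch_size=batch_size, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    bce, mse = nn.BCEWithLogitsLoss(), nn.MSELoss()
    history = []
    for _ in range(epochs):
        total = 0.0
        for xb, sb, ib in loader:
            logit, pred = model(xb)
            loss = bce(logit, sb) + influx_weight * mse(pred, ib)
            opt.zero_grad()
            loss.backward()
            opt.step()
            total += loss.item() * len(xb)
        history.append(total / len(x))
    return history
```

At inference time, `torch.sigmoid(logit)` gives the surge probability that is compared against the 0.70 gate. `BCEWithLogitsLoss` is used instead of a sigmoid followed by `BCELoss` because it is more numerically stable.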

Results

• Delivered proactive decision support by flagging likely high-influx periods before wards become overloaded.
• Generated actionable recommendations, such as which floor to prioritize and how much medicine to reallocate.
• Demonstrated a practical combination of forecasting, IRL, and RL in a healthcare operations use case.
• Improved operational reliability by separating the prediction and action stages, so the optimizer runs only on high-risk forecasts.

Lessons Learned

• A two-stage architecture improves reliability by preventing unnecessary optimizer triggers.
• Learning from expert decisions is effective when explicit reward design is difficult.
• RL can model complex, dynamic hospital operations better than static allocation rules.
• Deployment-oriented ML projects need clean API boundaries between prediction and policy modules.

Future Improvements

• Multi-step forecasting horizons for patient demand
• Larger expert datasets for IRL reward quality
• EHR integration for richer state representation
• Explainable decision dashboards for administrators
• Continuous online learning from live hospital streams