Senior Site Reliability Engineer

Heidi Health Ltd • London, England • Posted July 01, 2026

Position Overview

The Role

This role sits in the core Platform/SRE team that owns production. You’ll work directly on incident response, on-call, system reliability, and day-to-day operations for Heidi’s platform.

What you’ll do

Participate in on-call and incident response: Respond to production incidents, contribute to service restoration, and support clear communication during incidents. Over time, take increasing responsibility for leading incidents end-to-end.
Improve operational reliability: Identify recurring issues and reliability risks, and drive fixes through better alerting, automation, system changes, or process improvements.
Own parts of the production environment: Operate and improve Kubernetes clusters, cloud infrastructure, and core platform services, with growing ownership as familiarity increases.
Strengthen observability: Improve dashboards, alerts, logs, and traces...