Senior Site Reliability Engineer
Heidi Health Ltd • london, england • Posted July 01, 2026
Position Overview
The Role
This role sits in the core Platform/SRE team that owns production. You’ll work directly on incident response, on-call, system reliability, and day-to-day operations for Heidi’s platform.
What you’ll do
Participate in on-call and incident response: Respond to production incidents, contribute to service restoration, and support clear communication during incidents. Over time, take increasing responsibility for leading incidents end-to-end.
Improve operational reliability: Identify recurring issues and reliability risks, and drive fixes through better alerting, automation, system changes, or process improvements.
Own parts of the production environment: Operate and improve Kubernetes clusters, cloud infrastructure, and core platform services, with growing ownership as familiarity increases.
Strengthen observability: Improve dashboards, alerts, logs, and traces...