Site Reliability Engineer (SRE)

  • Randstad Technologies Recruitment
  • Jun 20, 2026
Contractor Telecommunications

Job Description

Job Title: Lead Site Reliability Engineer (SRE) - Observability

Location: Reading, UK / Hybrid & Remote Options



About the Role

We are looking for a Lead SRE to design, scale, and operate the large-scale observability systems that keep our global services online and performant. You will join an autonomous team of software engineers focused on solving complex data infrastructure challenges.



Key Responsibilities

  • Scale Prometheus metrics infrastructure to handle 100+ million active series.

  • Operate large Elasticsearch clusters holding 2,000+ TB of data.

  • Grow high-throughput Kafka data pipelines processing hundreds of thousands of events per second.

  • Build custom alerting workflows and self-service APIs for internal engineering teams.

  • Provision cloud and private infrastructure using Terraform.



Requirements

  • 5+ years operating medium-to-large distributed systems on Linux VMs or bare-metal machines.

  • 2+ years developing in Go, Python, Ruby, Scala, or Bash.

  • Hands-on experience with Prometheus/Thanos/Cortex, Kafka, the ELK stack, Ansible, or Consul.

  • Comfortable diving into unfamiliar codebases and participating in an on-call rotation.

Randstad Technologies is acting as an Employment Business in relation to this vacancy.