Site Reliability Engineer (OpenShift) - 6-Month Contract - Hybrid (London or Sheffield)
We are seeking a skilled OpenShift Site Reliability Engineer (SRE) to join our team on a 6-month contract. In this role, you will ensure the reliability, availability, and performance of our OpenShift-based virtual and container platforms, with a strong focus on automation. You will collaborate with cross-functional teams, including Applications, Hardware, and Network, to develop secure and scalable service architectures using cloud-native technologies.
Key Responsibilities
- Ensure reliability, availability, and performance of OpenShift-based virtual and container platforms.
- Collaborate with Applications, Hardware, and Network teams to deliver secure, resilient services.
- Develop automation to prevent outages using shell scripting, YAML, Ruby, Python, and Go.
- Establish and enforce SRE best practices through platform constraints and high-fidelity system modelling.
- Participate in an on-call rotation to support critical services.
What You Will Ideally Bring
- Hands-on experience with OpenShift virtualization and Kubernetes administration.
- Strong understanding of distributed systems and common failure domains.
- Experience managing production services on Red Hat, Windows, and ESXi platforms.
- Strong knowledge of Linux systems and networking fundamentals.
- Experience with monitoring, logging, alerting, and observability tools (e.g., OpenTelemetry, Prometheus, Grafana, Splunk).
- Proficiency in Python, shell, Go, and similar scripting or programming languages, plus infrastructure-as-code tooling such as Terraform.
- Familiarity with CI/CD tools such as Jenkins or GitLab CI.
- Understanding of containerization (Docker) and microservices architecture.
Contract Details
- Duration: 6 months (with potential extension)
- Day Rate: Up to £500 per day (Inside IR35)
- Location: London or Sheffield (hybrid)
- Start Date: ASAP