Site Reliability Engineer (Remote)

  • Rullion Ltd
  • Dec 09, 2025
  • Contractor, Telecommunications

Job Description

Key Responsibilities:

  • Design, implement, and maintain scalable, highly available infrastructure and services.
  • Develop automation scripts and tools to improve system reliability and operational efficiency.
  • Monitor and troubleshoot system performance, identifying and resolving issues to minimise downtime.
  • Implement and maintain CI/CD pipelines to support efficient software delivery.
  • Develop and enforce best practices for security, monitoring, and incident management.
  • Collaborate with development teams to enhance application performance and stability.
  • Create detailed documentation and conduct post-incident reviews to identify root causes and implement long-term solutions.

Essential Skills and Experience:

  • Proven experience in Site Reliability Engineering, DevOps, or similar roles.
  • Strong understanding of cloud platforms (AWS, Azure, or GCP) and containerisation technologies (Kubernetes, Docker).
  • Proficiency in scripting languages such as Python, Bash, or Go.
  • Hands-on experience with monitoring and observability tools such as Prometheus, Grafana, and the ELK stack.
  • Familiarity with infrastructure-as-code tools such as Terraform or Ansible.
  • Solid understanding of networking concepts and system security best practices.
  • Excellent problem-solving skills and a passion for automation and continuous improvement.

Desirable:

  • Certifications in cloud platforms or DevOps tools.
  • Experience with large-scale distributed systems.

This role offers the opportunity to work on mission-critical projects in a fast-paced and collaborative environment, driving innovation and reliability in our technology ecosystem.

Rullion celebrates and supports diversity and is committed to ensuring equal opportunities for both employees and applicants.