Site Reliability Engineer, Leeds, West Yorkshire
Client: Ranger Technical Resources
Location: Leeds, West Yorkshire, United Kingdom
Job Category: Other
EU work permit required: Yes
Posted: 31.05.2025
Expiry Date: 15.07.2025
Job Description:
Position: Site Reliability Engineer #2494
Position Summary:
Our partner, an innovative PaaS company specializing in remote monitoring and network management solutions, is looking for a Site Reliability Engineer to help ensure the reliability, scalability, and performance of critical infrastructure and applications. The role involves building and maintaining highly available systems, supporting CI/CD pipelines, and collaborating with development, DevOps, and other teams to maintain high uptime, security, and user experience standards for millions of endpoints.
Experience and Education:
- Bachelor's or higher degree in Computer Science, Information Systems, Information Technology, or a related field, or equivalent experience.
- 7+ years of experience in Site Reliability Engineering, DevOps, Infrastructure, or related roles.
- Deep understanding of AWS and its services.
- Strong Linux administration and troubleshooting skills.
- Experience with implementing and managing CI/CD pipelines and Infrastructure as Code (IaC).
- Experience with monitoring and observability tools to proactively manage system health.
Skills and Strengths:
- AWS (Amazon Web Services)
- Auto Scaling
- Fargate
- Route 53
- Observability tools (New Relic, Datadog, Splunk)
- Scripting and automation (Ansible, Bash, Python, Go)
- CI/CD
Primary Job Responsibilities:
- Design and support EC2/ECS/EKS/Fargate environments for high availability and fault tolerance.
- Implement advanced AWS features (Route 53, ALB/NLB, multi-region setups) to ensure global reliability.
- Maintain and optimize CI/CD pipelines and deployment processes for efficient software delivery.
- Collaborate with Development, QA, and DevOps teams to incorporate best practices into build and release processes.
- Implement and enhance monitoring tools to proactively detect and resolve system issues.
- Manage Linux-based servers and applications, ensuring stability, performance, and security.
- Implement containerization solutions for scalability and efficiency.
- Apply security best practices across AWS environments to ensure compliance and safeguard infrastructure.
- Develop automated incident response and self-healing solutions to minimize downtime.
- Diagnose and resolve infrastructure, networking, and performance issues.
- Design and maintain backup, failover, and disaster recovery strategies.
- Create real-time monitoring dashboards and alerting systems.
- Work with development teams to optimize infrastructure costs while maintaining high performance.