
Enable job alerts via email!
Generate a tailored resume in minutes
Land an interview and earn more. Learn more
A top technology service provider is seeking a skilled Site Reliability Engineer to enhance system reliability and manage cloud infrastructures. Candidates should have over 6 years of experience in site reliability, with solid expertise in AWS or Azure and strong Linux administration skills. Responsibilities include implementing SLIs and SLOs, incident management, and optimizing CI/CD pipelines. This role emphasizes automation and collaboration with development teams to improve system performance and reliability.
๐๐ผ๐ฟ๐ฒ ๐ฆ๐ธ๐ถ๐น๐น๐ ๐ช๐ฒโ๐ฟ๐ฒ ๐๐ผ๐ผ๐ธ๐ถ๐ป๐ด ๐๐ผ๐ฟ (๐ ๐๐๐-๐๐ฎ๐๐ฒ)
6+ years of hands-on experience as a ๐ฆ๐ถ๐๐ฒ ๐ฅ๐ฒ๐น๐ถ๐ฎ๐ฏ๐ถ๐น๐ถ๐๐ ๐๐ป๐ด๐ถ๐ป๐ฒ๐ฒ๐ฟ (๐ฆ๐ฅ๐).
Define, implement, and manage ๐ฆ๐๐๐, ๐ฆ๐๐ข๐, ๐ฎ๐ป๐ฑ ๐๐ฟ๐ฟ๐ผ๐ฟ ๐๐๐ฑ๐ด๐ฒ๐๐ across critical systems
Design and maintain highly reliable, scalable, and fault-tolerant production environments
Drive ๐๐ผ๐ถ๐น ๐ฟ๐ฒ๐ฑ๐๐ฐ๐๐ถ๐ผ๐ป, automation, and self-healing systems using ๐ฃ๐๐๐ต๐ผ๐ป /๐๐ฎ๐๐ต
Strong ๐๐ถ๐ป๐๐ ๐๐๐๐๐ฒ๐บ ๐ฎ๐ฑ๐บ๐ถ๐ป๐ถ๐๐๐ฟ๐ฎ๐๐ถ๐ผ๐ป experience in production environments
Manage and operate container platforms such as ๐๐๐ฏ๐ฒ๐ฟ๐ป๐ฒ๐๐ฒ๐ ๐ผ๐ฟ ๐ข๐ฝ๐ฒ๐ป๐ฆ๐ต๐ถ๐ณ๐
Implement ๐๐ป๐ณ๐ฟ๐ฎ๐๐๐ฟ๐๐ฐ๐๐๐ฟ๐ฒ ๐ฎ๐ ๐๐ผ๐ฑ๐ฒ (๐๐ฎ๐) using ๐ง๐ฒ๐ฟ๐ฟ๐ฎ๐ณ๐ผ๐ฟ๐บ/๐๐ป๐๐ถ๐ฏ๐น๐ฒ
Build, maintain, and optimize ๐๐/๐๐ ๐ฝ๐ถ๐ฝ๐ฒ๐น๐ถ๐ป๐ฒ๐ using ๐๐ถ๐๐๐ฎ๐ฏ, ๐๐ฒ๐ป๐ธ๐ถ๐ป๐, ๐ผ๐ฟ ๐๐๐๐ฟ๐ฒ ๐๐ฒ๐๐ข๐ฝ๐
Implement safe deployment strategies including ๐๐น๐๐ฒ-๐๐ฟ๐ฒ๐ฒ๐ป, ๐๐ฎ๐ป๐ฎ๐ฟ๐, ๐ฎ๐ป๐ฑ ๐ฅ๐ผ๐น๐น๐ถ๐ป๐ด ๐ฑ๐ฒ๐ฝ๐น๐ผ๐๐บ๐ฒ๐ป๐๐
Hands-on experience with ๐๐ช๐ฆ, ๐๐๐๐ฟ๐ฒ, ๐ผ๐ฟ ๐ฃ๐ฟ๐ถ๐๐ฎ๐๐ฒ ๐๐น๐ผ๐๐ฑ infrastructure
Own incident management, on-call rotations, post-incident reviews, and ๐ฅ๐ผ๐ผ๐ ๐๐ฎ๐๐๐ฒ ๐๐ป๐ฎ๐น๐๐๐ถ๐ (๐ฅ๐๐)
Implement and manage ๐ผ๐ฏ๐๐ฒ๐ฟ๐๐ฎ๐ฏ๐ถ๐น๐ถ๐๐ ๐ฎ๐ป๐ฑ ๐บ๐ผ๐ป๐ถ๐๐ผ๐ฟ๐ถ๐ป๐ด using Prom๐ฒ๐๐ต๐ฒ๐๐, ๐๐ฟ๐ฎ๐ณ๐ฎ๐ป๐ฎ, ๐๐๐/๐๐๐ ๐๐๐ฎ๐ฐ๐ธ
Reduce alert noise and improve alert quality to avoid alert fatigue
Work closely with development and platform teams to improve system reliability
Ensure systems comply with ๐๐ฒ๐ฐ๐๐ฟ๐ถ๐๐, ๐ด๐ผ๐๐ฒ๐ฟ๐ป๐ฎ๐ป๐ฐ๐ฒ, ๐ฎ๐ป๐ฑ ๐ฐ๐ผ๐บ๐ฝ๐น๐ถ๐ฎ๐ป๐ฐ๐ฒ ๐๐๐ฎ๐ป๐ฑ๐ฎ๐ฟ๐ฑ๐ (๐๐ฆ๐ข, ๐ก๐๐ฆ๐ง, ๐๐ง๐๐)
Experience supporting ๐ต๐ถ๐ด๐ต-๐ฎ๐๐ฎ๐ถ๐น๐ฎ๐ฏ๐ถ๐น๐ถ๐๐, ๐บ๐ถ๐๐๐ถ๐ผ๐ป-๐ฐ๐ฟ๐ถ๐๐ถ๐ฐ๐ฎ๐น ๐๐๐๐๐ฒ๐บ๐ in production
๐ฃ๐ฟ๐ฒ๐ณ๐ฒ๐ฟ๐ฟ๐ฒ๐ฑ / ๐๐ผ๐ผ๐ฑ ๐๐ผ ๐๐ฎ๐๐ฒ
Experience in ๐น๐ฎ๐ฟ๐ด๐ฒ-๐๐ฐ๐ฎ๐น๐ฒ ๐ด๐ผ๐๐ฒ๐ฟ๐ป๐บ๐ฒ๐ป๐ ๐ผ๐ฟ ๐ฟ๐ฒ๐ด๐๐น๐ฎ๐๐ฒ๐ฑ ๐ฒ๐ป๐๐ถ๐ฟ๐ผ๐ป๐บ๐ฒ๐ป๐๐
Exposure to ๐บ๐ถ๐ฐ๐ฟ๐ผ๐๐ฒ๐ฟ๐๐ถ๐ฐ๐ฒ๐-๐ฏ๐ฎ๐๐ฒ๐ฑ ๐ฎ๐ฟ๐ฐ๐ต๐ถ๐๐ฒ๐ฐ๐๐๐ฟ๐ฒ๐
Familiarity with ๐ฐ๐ต๐ฎ๐ผ๐ ๐๐ฒ๐๐๐ถ๐ป๐ด ๐๐ผ๐ผ๐น๐ and advanced resilience frameworks
Strong ๐ฑ๐ผ๐ฐ๐๐บ๐ฒ๐ป๐๐ฎ๐๐ถ๐ผ๐ป ๐ฎ๐ป๐ฑ ๐๐๐ฎ๐ธ๐ฒ๐ต๐ผ๐น๐ฑ๐ฒ๐ฟ ๐ฐ๐ผ๐บ๐บ๐๐ป๐ถ๐ฐ๐ฎ๐๐ถ๐ผ๐ป skills