Site Reliability Engineer, India

Evertz Microsystems Limited

Canada

Hybrid

CAD 100,000 - 130,000

Full time

Today

Job summary

A leading media technology company is seeking a Site Reliability Engineer in Canada to improve the reliability of its AWS-hosted services. Responsibilities include debugging incidents, automating processes, and collaborating on reliability strategies. Ideal candidates will have 4 to 6 years of experience, proficiency in at least one programming language, and experience with monitoring tools. The role offers a remote/hybrid work mode and a team environment focused on continuous improvement.

Qualifications

  • 4 to 6+ years of experience in site reliability engineering.
  • Experience designing and building production-quality automation or tools.
  • Experience translating SLOs/SLIs into actionable improvements.

Responsibilities

  • Debug incidents and drive improvements to the SaaS platform.
  • Automate processes to reduce toil and improve efficiency.
  • Collaborate on reliability, monitoring, and incident response.

Skills

  • Hands-on experience with production infrastructure
  • Proficiency in programming languages (Python, Java, Rust)
  • Experience with monitoring and observability tools (Datadog, CloudWatch)
  • Excellent analytical skills
  • Experience with AWS technologies
  • Foundation in Linux systems administration
  • Familiarity with CI/CD pipelines

Education

Degree in Computer Science and Information Technology

Tools

  • Terraform
  • CloudFormation
  • Jenkins

Job description

We’re looking for highly motivated, passionate site reliability engineers to join our growing team. At evertz.io, our teams are building services used by major players in broadcast and media. Our services are hosted in AWS with a Serverless First mindset. As part of this role, you will help harden our multi-tenant SaaS platform. You will use best-in-class observability tooling to debug incidents and identify and implement improvements to ensure reliability. You will automate processes and build tools to reduce toil.

Responsibilities

  • Debug incidents and drive improvements to the multi-tenant SaaS platform using observability tooling.
  • Translate SLOs and SLIs into actionable reliability improvements.
  • Automate processes and build tooling to reduce toil and improve efficiency.
  • Collaborate with cross-functional teams on reliability, monitoring, and incident response.

Skills and experience you will bring

  • At least 3 years of hands-on experience managing critical, high-availability production infrastructure with proven reliability and uptime improvements.
  • Proficient in at least one programming language (such as Python, Java, or Rust), with experience designing and building production-quality automation or tools.
  • At least 3 years working with monitoring, log aggregation, and observability platforms (e.g., Datadog, CloudWatch, Honeycomb, Splunk, New Relic) and using data-driven insights to resolve issues.
  • Excellent analytical skills to understand end-to-end use cases, map system flows, debug complex issues, and anticipate failure points.
  • Experience translating SLOs/SLIs into actionable improvements; strong focus on reliability, monitoring, and observability.
  • At least 3 years with cloud technologies, particularly AWS (CloudFormation, Lambda, DynamoDB, SQS, SNS, EC2, S3, AWS CLI, Boto3).
  • Solid foundation in Linux systems administration, networking, and security.
  • Familiarity with CI/CD pipelines (e.g., Jenkins, AWS CodePipeline).

Additional skills and experience that will make you stand out

  • Experience architecting and deploying serverless applications in cloud environments.
  • Experience with infrastructure-as-code tools like Terraform or CloudFormation for reproducible environments.
  • Participation in production on-call rotations and incident management.
  • Expertise in performance optimization for core AWS services (Lambda, DynamoDB, API Gateway, SQS, EventBridge, EC2).
  • Experience supporting systems with frequent, high-velocity deployments.
  • Familiarity with security compliance frameworks (e.g., OWASP, ISO, CSA, PCI) and hands-on experience with threat assessment and remediation.
  • Experience with security practices, including penetration testing, threat modeling, and use of security tools.
  • Experience with advanced deployment strategies (canary, A/B testing, blue/green, etc.).
  • Hands-on experience with chaos engineering to improve fault tolerance.
  • Track record of championing reliability, continuous improvement, and operational excellence.

Experience and working arrangement

Experience: 4 to 6+ years

Education: Degree in Computer Science and Information Technology

Work mode: Remote/Hybrid

Office hours: 1 pm to 9 pm IST

The Team

The evertz.io Engineering Team builds next-generation systems for content management and distribution in the Media and Entertainment industry. Our technology stack includes a serverless microservice architecture on AWS with Python, Rust, and Java; a UI built with Angular, TypeScript, and NgRx; and CI/CD built on AWS, Jenkins, Nexus, Bazel, and our release-management application. The team collaborates across regions in an agile, low-bureaucracy environment, with opportunities for growth, mentorship, and continued learning. The team emphasizes trust, openness, and inclusivity.

Please note, this email address will respond only to privacy concerns. When you apply to a job on this site, your personal data will be collected by Evertz Microsystems Ltd, located in Burlington, Ontario, Canada, and processed for recruitment-related activities in accordance with our privacy policy. Your data may be processed under applicable data protection laws. A complete privacy policy is available at https://evertz.com/contact/privacy/.

Your personal data will be retained as long as necessary to evaluate your application. You may have rights regarding access, correction, erasure, and restriction of processing under applicable data protection laws. For regional rights, consult the privacy policy or contact the data protection officer via privacy@evertz.com.
