Senior Site Reliability Engineer

Menlo Ventures

Canada

On-site

CAD 90,000 - 120,000

Full time

Yesterday

Job summary

A leading AI platform company is looking for a Senior Site Reliability Engineer to ensure high availability of services and optimize system performance. The role involves developing Kubernetes resources and designing cost-effective infrastructure solutions. Candidates need a BS/BA in Computer Science and expertise with cloud providers like AWS or GCP. Strong interpersonal skills and experience with CI/CD pipelines are essential. This position offers opportunities to address engineering challenges in a collaborative environment.

Qualifications

  • Good knowledge of cloud providers (AWS, GCP or similar).
  • Solid understanding of web and networking principles (HTTP, TLS, DNS, etc.).
  • Strong interpersonal skills working with teams across different time zones and regions.

Responsibilities

  • Ensure the smooth operation and high availability of core services.
  • Monitor system performance and implement optimizations.
  • Develop Kubernetes resources for seamless deployments.
  • Design and implement scalable infrastructure solutions.
  • Partner with teams to solve engineering challenges.

Skills

Kubernetes
AWS
Infrastructure as Code
CI/CD pipelines
Networking principles
Interpersonal skills

Education

BS/BA in Computer Science or related degree

Tools

Terraform
GitHub Actions
ArgoCD
Atlantis

Job description

Senior Site Reliability Engineer

About the Company

Clarifai is a leading compute orchestration AI platform specializing in computer vision and generative AI. We empower organizations to transform unstructured image, video, text, and audio data into actionable insights, significantly faster and more accurately than manual processes. Founded in 2013 by Matt Zeiler, Ph.D., Clarifai has been at the forefront of AI innovation since taking the top five places in the 2013 ImageNet Challenge. Our diverse, globally distributed team operates across the United States, Canada, Estonia, Argentina, and India.

We have secured $100M in funding, including a $60M Series C round, backed by industry leaders such as Menlo Ventures, Union Square Ventures, Lux Capital, NEA, LDV Capital, Corazon Capital, Google Ventures, NVIDIA, Qualcomm, and Osage.

Clarifai is proud to be an equal-opportunity workplace committed to building and maintaining a diverse and inclusive team.

Your Impact

Clarifai’s platform is a Kubernetes-native distributed system that requires the orchestration of many components. Efficiently serving and training large neural networks presents unique design and infrastructure challenges.

You will be critical to solving these challenges both in the cloud and in on-premise environments. You will also be responsible for our broader cloud infrastructure and for our development tools and environments.

The Opportunity

  • Ensure the smooth operation and high availability of Clarifai's core services
  • Monitor system performance, identify bottlenecks, and implement optimizations to enhance reliability and efficiency
  • Develop Kubernetes resources and custom tooling for seamless cloud and on-premise deployments
  • Design and implement scalable, secure, and cost-effective infrastructure solutions
  • Partner with teams across the organization to identify and solve engineering challenges

Requirements

  • BS/BA in Computer Science or related degree
  • Good knowledge of cloud providers (AWS, GCP or similar)
  • Expertise with Kubernetes (EKS, GKE, self-hosted) and Infrastructure as Code using Terraform and Helm
  • Solid understanding of web and networking principles (HTTP, TLS, DNS, etc.)
  • Experience with CI/CD pipelines using tools such as GitHub Actions, ArgoCD, and Atlantis
  • Strong interpersonal skills working with teams across different time zones and regions

Great to Have

  • Knowledge of basic microservice architecture principles
  • Familiarity with security best practices for cloud-based systems
  • Experience with relational databases, message queues, and key-value stores
  • Experience writing Python, Go, or any other popular programming language
  • Familiarity with any RPC framework
  • Experience developing and building custom Kubernetes operators