Senior Site Reliability Engineer (Kubernetes)
We're looking for a Senior Site Reliability Engineer (Kubernetes) to join our Infrastructure team at Supermetrics.
Location: Canada (fully remote)
Role: Permanent, full-time. This role includes participation in an on-call rotation.
Onboarding: As part of your onboarding, we expect you to spend 2-3 weeks at our HQ in Helsinki (we'll arrange your travel).
In this role, you'll:
- Raise the team's level of Kubernetes expertise by mentoring, guiding, and supporting your teammates and others in our Engineering organization as they work with managed Kubernetes clusters across cloud providers.
- Operate the platform that enables thousands of businesses worldwide to use our SaaS products, defining SLAs and SLOs and driving the automation that ensures we meet them.
- Use your expertise in containers, Kubernetes, databases, and automation to streamline our operations and improve our infrastructure.
Your day-to-day work and responsibilities will include:
- Writing Terraform configurations and modules that bootstrap Kubernetes clusters, and reviewing pull requests from other contributors, to keep our modules reusable and well-defined.
- Writing, maintaining, and improving our internal tooling (for example, in Golang) to make the platform easier for engineering teams to use.
- Developing and maintaining Helm charts for internal deployments and third-party software.
- Responding to incidents in our production environment.
- Supporting our pre-sales team in answering potential customers' questions about our architecture and data security.
- Reviewing architecture changes that introduce new databases and weighing in on their pros and cons.
- Rewriting a GitHub Action to improve how we deploy to Kubernetes using GitOps.
- Troubleshooting and resolving technical issues as they arise.
- Participating in our on-call rotation to provide support, respond to incidents, and handle questions from internal users.
Technologies you'll be working with:
- Argo CD, Helmfile, Helm, External Secrets, cert-manager, NGINX, Contour
- Terraform
- Cloudflare (CDN, DNS), Aiven, Redis Co.
- GitHub Cloud and GitHub Enterprise
- PHP, Golang
Requirements:
- 4+ years of experience in Site Reliability Engineering, Platform Engineering, or related roles.
- Strong understanding of containers and experience operating Kubernetes clusters at scale.
- Experience operating databases in production.
- Proficient in database concepts with hands-on experience in both relational and NoSQL databases.
- In-depth knowledge of Linux systems and Terraform.
- Experience with AWS and/or GCP.
- Solid understanding of modern observability practices and tools.
- An automation mindset and the ability to automate repetitive tasks with scripting languages such as Python or Bash.
- A collaborative, team-player attitude.
- Willingness to take part in on-call rotations outside business hours.
- Good communication skills, particularly in writing (documentation and PRs).
- Strong problem-solving skills with a passion for tools, technologies, and challenges in this space.
Nice to have:
- A developer background with the ability to write CLIs and other tools in Go, Python, or Rust.
- A security mindset and experience implementing security best practices.
- Experience in creating and managing Helm charts.
- Expert knowledge of CI/CD systems and experience developing and maintaining GitHub Actions.
Recruitment process:
- Screening call with the recruiter.
- Team interview.
- Final chat with our CIO.
Benefits we offer:
- Competitive compensation package, including equity.
- Excellent work equipment and a home office allowance for remote workers.
- Health care benefits and leisure time insurance.
- An annual personal learning budget of 1,000 euros.
- Sports and wellbeing allowance.
Does this sound like your next adventure? Apply now! We'll fill the role as soon as we find the right person.
Join us on our mission to make data a marketing superpower.