
Senior Site Reliability Engineer

Bold Commerce

Toronto

Remote

CAD 125,000 - 150,000

Full time

Yesterday

Job summary

A leading e-commerce technology company is seeking a Senior Site Reliability Engineer to design, build, and maintain the systems supporting its SaaS infrastructure. Responsibilities include optimizing monitoring, automating deployments, and ensuring system reliability. The ideal candidate has extensive experience in a SaaS/cloud environment and strong skills in Linux systems and cloud platforms. The role offers competitive compensation and flexible remote work.

Benefits

  • Competitive compensation
  • Employer Paid Health & Dental Benefits
  • Flexible work hours
  • Annual Bonus Program
  • Competitive paid vacation days

Qualifications

  • 7+ years of experience in SRE or a similar role within a SaaS/cloud environment.
  • Proficient in at least one programming language (e.g., Python, Go, Ruby).
  • Solid grasp of networking and incident management.

Responsibilities

  • Design and maintain highly available, fault-tolerant infrastructure.
  • Develop and optimize monitoring and incident response processes.
  • Automate deployment and configuration tasks.

Skills

  • Linux/Unix systems knowledge
  • Shell scripting
  • Collaboration and communication skills
  • Experience with GitOps
  • Cloud platforms (GCP/AWS/Azure)
  • Container orchestration (Docker, Kubernetes)
  • Monitoring tools (Prometheus, Grafana)
  • Trust and relationship building

Education

Bachelor’s or Master’s degree in Computer Science or related field

Tools

  • Ansible
  • Terraform

Job description

Who is Bold Commerce?

Bold Commerce powers personalized checkout experiences for leading omnichannel retailers and direct-to-consumer brands.

As a leader in the composable commerce space, Bold makes checkout better, boosting profitability by enabling personalized, customer-specific checkout flows designed to increase the Checkout Power Trio of conversion, average order value (AOV), and lifetime value (LTV), not just conversion. Built with a composable and headless architecture, Bold Checkout fits with any commerce stack, making it easy to overcome platform limitations. Leading omnichannel retailers like Harry Rosen and Staples Canada trust Bold Checkout with their business.

Named one of Built In Austin’s Best Places to Work, Canada’s Top Employers for Young People, and Manitoba’s Top Employers, we're a dynamic team that truly cares about building the future of ecommerce. We live by the BUILDERS Code, a shared set of practices, beliefs, and values that help shape this remote-first company.

Founded in 2012, with team members (Builders) located throughout Canada and the U.S., and backed by investors like OMERS Ventures, WhiteCap Venture Partners, and Round13 Capital, Bold is leading the way to a better, composable ecommerce future.

About the role

Bold is looking for a Senior Site Reliability Engineer (SRE) to design, build, and maintain the systems and tools that support our SaaS infrastructure. You’ll play a key role in ensuring our platforms are reliable, scalable, and performant. Working closely with developers, product managers, and IT operations, you’ll help shape robust solutions that align with our service-level objectives (SLOs) and deliver value to our merchants.

What you’ll do

  • Design and maintain highly available, fault-tolerant infrastructure to support our SaaS products
  • Develop and optimize monitoring, alerting, and incident response processes
  • Improve system performance through capacity planning, load testing, and performance tuning
  • Automate deployment and configuration tasks using infrastructure-as-code practices
  • Partner with development teams to enhance software reliability through efficient CI/CD pipelines and release management
  • Conduct root cause analysis and post-incident reviews to drive continuous improvement
  • Contribute to the architecture of performance monitoring systems and train teams on reliability best practices
  • Organize and manage execution of planned projects
  • Balance speed and stability in product delivery while upholding well-defined SLOs

What we’re looking for

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
  • 7+ years of experience in SRE or a similar role within a SaaS/cloud environment
  • Strong Linux/Unix systems knowledge and shell scripting
  • Experience with GitOps (ArgoCD is a plus)
  • Proficient in at least one programming language (e.g. Python, Go, Ruby) and familiar with tools like Ansible, Terraform, or similar
  • Hands-on experience with cloud platforms (GCP preferred; AWS/Azure also relevant) and container orchestration (Docker, Kubernetes)
  • Solid grasp of networking, monitoring (e.g. Prometheus, Grafana, OpenTelemetry), and incident management
  • Strong collaboration and communication skills, with a focus on documentation and cross-functional partnership
  • Trusted team player who builds relationships and cultivates a culture of reliability
  • Flexibility in working hours to participate in an on-call rotation and occasional scheduled maintenance

Our investment in YOU!

Benefits designed to support your well-being and happiness:

  • Competitive compensation that reflects your experience and skills
  • Employer Paid Health & Dental Benefits, Virtual Care, & Disability top-up - starting day 1!
  • Virtual mental health and EAP platform for support anytime
  • Annual Health Benefit ($1,000 per year) to help you thrive!
  • Working remotely - anywhere in Canada & the United States!
  • Employee Options to help you grow with us!
  • Flexible work hours
  • Annual Bonus Program aligned to your Job Level
  • Competitive paid vacation days (starting at 3 weeks)
  • Employer Paid Employee & Family Assistance Program (EFAP)