Site Reliability Engineer

Botpress

Montreal

On-site

CAD 90,000 - 130,000

Full time

9 days ago

Job summary

Botpress, a leading AI start-up in Montreal, is seeking a Site Reliability Engineer to enhance the stability and scalability of its cloud services. In this hands-on role, you'll work with innovative technologies and be a key player in shaping how AI is integrated into business logic, ensuring high performance and reliability across its platforms. Join a passionate team and contribute to the future of enterprise AI with excellent benefits and a vibrant office culture.

Benefits

4 weeks of vacation
Paid sick and parental leave
Comprehensive health, dental, and life insurance
Funding for education and skills improvement
Fully-stocked fridge and cupboard
Your own desk
Vibrant office community with weekly socials

Qualifications

  • 3+ years in SRE, DevOps, or infrastructure engineering roles.
  • Deep experience with AWS cloud infrastructure and services.
  • Proficient in CI/CD tools and automation scripting.

Responsibilities

  • Architect and maintain scalable infrastructure.
  • Design and optimize CI/CD pipelines.
  • Improve observability with monitoring and alerting.

Skills

AWS
Linux systems
CI/CD
Infrastructure as code
Incident management
Observability
Communication

Tools

Terraform
Docker
Kubernetes
Datadog
Grafana
Prometheus

Job description

Help bring AI agents to companies worldwide.

Over the next decade, autonomous agents will redefine how we work.

Botpress allows companies to build and deploy advanced AI agents that move beyond conversation into real business logic.

Our product works today and at scale, across industries, regions, and limitless use cases.

As the 3rd fastest-growing B2B AI start-up worldwide, we're at the forefront of the AI revolution, providing the most widely used platform for sophisticated AI agents.

The work ahead is ambitious. The opportunity is rare. We take a deliberate approach to growth: product-led, capital-efficient, and highly focused.

If you want to build foundational technology for one of the most meaningful platform shifts in software, we're looking for top talent to join us.

Key Highlights:

  • Over 1 million AI agents and chatbots deployed
  • 700,000+ platform users
  • Trusted by 35% of Fortune 500 companies
  • 7 years of expertise in AI solutions

About the Role

We're hiring a Site Reliability Engineer to help ensure the stability, scalability, and security of our platform. You'll be part of the product team, owning the systems that keep our services resilient and performant under real-world loads.

This is a hands-on engineering role focused on infrastructure reliability and operational excellence. You'll architect and maintain the cloud systems (e.g. AWS) that power Botpress, with a strong focus on observability, uptime, and automation.

You'll collaborate closely with engineers to refine how we ship, monitor, and operate software, always with an eye toward reducing risk and improving speed. Part of this role will involve making the platform available to users in additional regions.

Responsibilities

  • Architect and maintain scalable infrastructure
  • Design and optimize CI/CD pipelines to ensure smooth delivery of changes
  • Improve observability through advanced monitoring, logging, and alerting
  • Own incident response and support the engineering team in diagnosing and resolving issues
  • Build systems that increase platform reliability, resiliency, and uptime
  • Enforce security best practices across environments and workflows
  • Manage infrastructure as code using tools like Terraform or Pulumi
  • Document operational procedures, disaster recovery plans, and system runbooks


Requirements

  • 3+ years in SRE, DevOps, or infrastructure engineering roles
  • Deep experience with AWS cloud infrastructure and services (ECS, S3, Lambda, RDS)
  • Comfortable with Linux systems, containerization, and orchestration (e.g. Docker, Kubernetes)
  • Proficient in CI/CD tools, infrastructure-as-code, and automation scripting
  • Familiar with incident management and site reliability principles
  • Experience with observability stacks such as Datadog, Grafana, or Prometheus
  • Strong communicator and collaborator across technical teams
  • Calm and systematic under pressure when production issues arise
  • Bonus: Previous experience in a fast-paced startup or SaaS environment


About Botpress

As a fast-growing Series A start-up, we run a lean, innovative operation that relies on AI for maximum business impact. At Botpress, everyone is an owner, bringing their unique perspective and talents.

Our teams are talented and passionate. We intentionally hire people who are eager and hungry to learn and grow throughout their careers.

You'll be on a team that's not just adapting to the AI revolution, but leading it. Joining our team means changing the future of enterprise AI and building technology that will define the next era of business automation.

Benefits

  • Work at one of Canada's fastest-growing AI start-ups
  • Work with a talented and passionate team
  • 4 weeks of vacation
  • Paid sick and parental leave
  • Comprehensive health, dental, vision, travel, and life insurance
  • Funding for education and skills improvement
  • Fully-stocked fridge and cupboard - we take snacks seriously
  • Your own desk - no 'hot-desk'-style sign-up systems
  • A vibrant office community, including weekly socials
