Site Reliability Engineer

Krila Consultancy & Recruitment

Ottawa

On-site

CAD 85,000 - 110,000

Full time

2 days ago

Job summary

A forward-thinking consultancy seeks a skilled Site Reliability Engineer to maintain cloud and edge infrastructure. Based in Ottawa, the role focuses on the reliability of AWS services, automation, and on-site client installations. Ideal candidates have experience managing production environments and improving operational efficiency, and work collaboratively.

Qualifications

  • At least 3 years as an SRE or DevOps engineer supporting production AWS environments.
  • Proven expertise in Datadog (APM, Infrastructure, Logs).
  • Ability to communicate effectively with a focus on collaboration.

Responsibilities

  • Ensure highly available, fault-tolerant AWS services.
  • Build and maintain Datadog dashboards, monitors, and alerts.
  • Lead customer installations configuring edge devices.

Skills

AWS services
Linux administration
Scripting (Bash, Python, Go)
Datadog
Communication

Education

Bachelor's degree in Computer Science or related field

Tools

Freshdesk
Jira
IP cameras

Job description

Site Reliability Engineer
Location: Onsite – Kanata, Ontario

About Our Client

Imagine a startup delivering real-time data insights that empower businesses to make smarter, faster decisions. Backed by one of the world’s top tech groups, we blend cutting-edge technology with deep expertise to help companies stay agile and ahead of the curve. With the strength of a powerhouse behind us, we drive innovation and create transformative solutions for today’s dynamic markets.

Edge Signal provides a full-fledged edge computing platform powering computer-vision applications across Retail, Hospitality and Warehousing. They run entirely on AWS, ingesting and analyzing data from massive fleets of on-premise devices, with Datadog monitoring.

We’re looking for an experienced Site Reliability Engineer to keep their cloud and edge infrastructure running flawlessly—and to help their customers get up and running smoothly.

This position is based at their head office in Kanata, Ottawa, reporting to the Director of Technology.

What You’ll Do
Operations
  • Ensure highly available, fault-tolerant AWS services (auto-scaling, disaster recovery, capacity planning).

  • Build and maintain Datadog dashboards, monitors and alerts for cloud resources and edge devices; author runbooks and automation scripts for incident response.

  • Develop tooling to provision, update and health-check thousands of edge devices; ingest device telemetry into Datadog for unified observability.

  • Automate routine ops tasks (onboarding steps, incident remediation) using shell, Python, etc.

Onboarding
  • Lead customer installations by configuring IP cameras, NVRs, and Edge Signal agents on-site.

  • Guide network, security and firmware setups to ensure seamless data flow from device to cloud.

Support
  • Triage and resolve Freshdesk tickets; conduct root-cause analysis and drive timely closure.

  • Convert complex issues into Jira epics/stories and collaborate with product teams to ship fixes.

Compliance
  • Manage AWS IAM (users, roles, policies, SSO) and enforce security best practices.

  • Monitor and optimize AWS spend—set budgets, report usage and recommend cost-savings strategies.

  • Integrate secrets management, vulnerability scanning and other compliance controls.


Qualifications
  • A Bachelor's degree in Computer Science or a related engineering field is required.

  • At least 3 years as an SRE or DevOps engineer supporting production AWS environments.

  • Proven expertise in Datadog (APM, Infrastructure, Logs, Synthetic checks)

  • Strong Linux administration skills and proficient scripting ability (Bash, Python, or Go)

  • Experience with AWS IAM, SSO, Control Tower, cost-management tools, and billing dashboards

  • Excellent communicator with a bias toward collaboration and customer empathy

Bonus Points
  • Prior work with edge computing or IoT device fleets

  • Experience configuring IP cameras, RTSP streams, and NVR systems

  • Freshdesk and Jira administration experience

  • AWS DevOps or Solutions Architect certification
