Site Reliability Engineer

Krila Consultancy

Ottawa

On-site

CAD 80,000 - 120,000

Full time

12 days ago

Job summary

A leading consultancy is seeking a Site Reliability Engineer to ensure the flawless operation of AWS and edge infrastructure. This role involves developing monitoring tools, leading customer setups, and managing AWS security. The ideal candidate will have extensive AWS experience, skills in Datadog, and a background in Linux administration. Join a dynamic team that blends technology with innovative solutions in a growing market.

Qualifications

  • 3+ years as an SRE or DevOps engineer supporting production AWS environments.
  • Proven expertise in Datadog (APM, Infrastructure, Logs).
  • Strong Linux administration skills and proficient scripting ability.

Responsibilities

  • Ensure highly available, fault-tolerant AWS services.
  • Build and maintain Datadog dashboards, monitors, and alerts.
  • Lead customer installations, ensuring seamless data flow.

Skills

AWS
Datadog
Linux Administration
Scripting (Bash, Python, Go)
Communication

Education

Bachelor's degree in Computer Science or related field

Tools

Freshdesk
Jira

Job description

Site Reliability Engineer
Location: On-site, Kanata, Ontario

About Our Client

Imagine a startup delivering real-time data insights that empower businesses to make smarter, faster decisions. Backed by one of the world’s top tech groups, we blend cutting-edge technology with deep expertise to help companies stay agile and ahead of the curve. With the strength of a powerhouse behind us, we drive innovation and create transformative solutions for today’s dynamic markets.

Edge Signal provides a full-fledged edge computing platform powering computer-vision applications across Retail, Hospitality and Warehousing. They run entirely on AWS, ingesting and analyzing data from massive fleets of on-premises devices, with Datadog for monitoring.

We’re looking for an experienced Site Reliability Engineer to keep their cloud and edge infrastructure running flawlessly—and to help their customers get up and running smoothly.

This position is based at their head office in Kanata, Ottawa, reporting to the Director of Technology.

What You’ll Do
Operations
  • Ensure highly available, fault-tolerant AWS services (auto-scaling, disaster recovery, capacity planning).

  • Build and maintain Datadog dashboards, monitors and alerts for cloud resources and edge devices; author runbooks and automation scripts for incident response.

  • Develop tooling to provision, update and health-check thousands of edge devices; ingest device telemetry into Datadog for unified observability.

  • Automate routine ops tasks (onboarding steps, incident remediation) using shell, Python, etc.

Onboarding
  • Lead customer installations by configuring IP cameras, NVRs, and Edge Signal agents on-site.

  • Guide network, security and firmware setups to ensure seamless data flow from device to cloud.

Support
  • Triage and resolve Freshdesk tickets; conduct root-cause analysis and drive timely closure.

  • Convert complex issues into Jira epics/stories and collaborate with product teams to ship fixes.

Compliance
  • Manage AWS IAM (users, roles, policies, SSO) and enforce security best practices.

  • Monitor and optimize AWS spend—set budgets, report usage and recommend cost-savings strategies.

  • Integrate secrets management, vulnerability scanning and other compliance controls.


Qualifications
  • A Bachelor's degree or higher in Computer Science or a related engineering field is required.

  • 3+ years as an SRE or DevOps engineer supporting production AWS environments.

  • Proven expertise in Datadog (APM, Infrastructure, Logs, Synthetic checks)

  • Strong Linux administration skills and proficient scripting ability (Bash, Python, or Go)

  • Experience with AWS IAM, SSO, Control Tower, cost-management tools, and billing dashboards

  • Excellent communicator with a bias toward collaboration and customer empathy

Bonus Points
  • Prior work with edge computing or IoT device fleets

  • Experience configuring IP cameras, RTSP streams, and NVR systems

  • Freshdesk and Jira administration experience

  • AWS DevOps or Solutions Architect certification
