
Senior Site Reliability Engineer - Remote

Kablamo Pty Ltd

Toronto

Remote

CAD 100,000 - 130,000

Full time

Today

Job summary

A fast-growing cloud digital product development company in Toronto is seeking a Sr. Site Reliability Engineer to enhance their AWS infrastructure. This role involves developing automated solutions, ensuring system reliability, and collaborating with development teams. Ideal candidates will have extensive experience in SRE or DevOps, strong problem-solving skills, and familiarity with AWS services. Join a diverse team committed to innovation and customer satisfaction.

Benefits

Remote first with a downtown Toronto office available
Work abroad for up to 3 weeks per year
Career growth
Online rewards platform
Paid birthday leave
Anniversary bonus
Referral bonus
Parental Leave top up
Employee Assistance Program
Swag

Qualifications

  • 5+ years’ experience in an SRE or DevOps role.
  • Deep understanding of system architecture and design principles.

Responsibilities

  • Contribute to the design and maintenance of AWS infrastructure.
  • Develop automated solutions for operational aspects.

Skills

Problem Solving
Critical Thinking
Communication
Proactivity

Education

Bachelor’s degree in computer science

Tools

AWS CloudWatch
Datadog
Grafana
Prometheus
Jira Service Management

Job description

Kablamo is a fast-growing cloud digital product development company. Founded in 2017 in Australia, the business has grown quickly over the last several years, including the expansion of the team to Canada in 2021. We are proud to have assembled an amazing list of customers, including some of the best-known enterprise and government organizations in Australia and Canada. We’re looking to further accelerate our growth in both markets, and we’re seeking a Sr. Site Reliability Engineer to help us support bringing new products to market.

Kablamo is proud to be an Advanced AWS Consulting partner, and we have recently been recognised as a global leader in designing and building cloud-based data and AI/ML solutions. At the 2021 AWS Global Public Sector conference, Kablamo won the award for “Most Innovative AI/ML Solution” for our work building bushfire prediction data platforms in Australia – we were selected from more than 1,800 AWS global partners.

The Role

As we expand our Product Care offering, we are looking for a Sr. Site Reliability Engineer (SRE) to help us build our capability and deliver insights from massive-scale data in real time. The Sr. SRE is responsible for developing automated solutions for operational aspects such as on-call monitoring, performance and capacity planning, and disaster response. The role will complement our ongoing development teams, with a focus on continuous delivery and infrastructure automation.

As the bridge between development and operations, you will be our primary escalation point across key customer accounts.

Key Responsibilities:

  • Contribute to the design, implementation, and maintenance of our AWS infrastructure
  • Anticipate production issues proactively: assess risks and mitigate them, planning contingencies and countermeasures in advance
  • Restore systems to a steady state by quickly investigating and fixing performance, stability, and scalability issues, so that Kablamo meets its SLA and SLO requirements
  • Ensure that the underlying infrastructure runs smoothly and that systems and tools work as expected, assessing risks and either mitigating them or planning appropriate contingencies and countermeasures in advance
  • Develop or implement visual tools for technical and business teams to observe system health, and support the Technical Account Manager in reporting on reliability metrics
  • Use automation tools to solve problems, writing code to automate processes such as analysing logs and testing production environments
  • Work with the engineering and/or development team to identify recurring problems that can be resolved through automation
  • Enhance the performance, efficiency, and monitoring of software development processes
  • Act on system incidents: as the SRE, you are a key contact in incident response and resolution, including active collaboration in any PIRs/post-mortems
  • Collaborate closely with product developers to ensure that the designed solution meets non-functional requirements such as availability, performance, security, and maintainability, and work with the development team to define fields for logging and tracing
  • Advocate for reliability against competing priorities
  • Help prepare activities for production release, including facilitating training and enablement of client technical teams and/or attending appropriate meetings (Technical Working Groups, Architecture Review Boards, Change Advisory Boards)

Required skills and experience:

  • 5+ years’ experience in an SRE or DevOps role
  • Deep understanding of system architecture and design principles
  • Ability to think critically, solve problems, and perform well under pressure
  • Troubleshooting experience, with the ability to communicate clearly with customers or the engineering team
  • A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
  • Experience with AWS and its services (Serverless, Deployment Tools, Networking, Containerization, Security, Cost Management)
  • Familiarity with tools such as AWS CloudWatch, Datadog, Grafana, Prometheus, Scalyr, PagerDuty, OpsGenie, Jira Service Management
  • Ability to work cross-functionally with support engineering, development teams and/or client vendors to deliver sound outcomes and suggest system improvements
  • Understanding of security requirements and their implications, and the ability to conform to applicable security frameworks
  • An in-depth knowledge of version control
  • Experience with production rollback
  • Knowledge of fundamental network concepts and protocols
  • A good understanding of DevOps concepts and best practices including Infrastructure-as-Code

Bonus Points for:

  • Bachelor’s degree in computer science or other similar technical qualification
  • AWS Associate and/or Professional Level Certifications
  • Strong grasp of networking, security, and reliability fundamentals
  • Solid understanding of Agile methodologies and practices
  • Experience in a Lead SRE role

Hiring Process:

  • 30-min intro chat with our TA team
  • 1-hr Technical interview
  • 1-hr Final Interview
  • References
  • Offer!

Why Work at Kablamo?

Our Culture

We recognise that a diverse and inclusive workplace enables greater innovation and produces benefits including improved performance, improved employee happiness and wellbeing, and superior outcomes for our customers. We attribute our success to all our unique and charismatic Kablamites. Through our fortnightly back to base and our debate Thunderdomes, we enable our Kablamites to provide feedback, share ideas, challenge the status quo, and constructively challenge each other on technical matters.

The PERKS!!!

  • Remote first with a downtown Toronto office available
  • Work abroad for up to 3 weeks per year (some restrictions apply)
  • Career growth (we really do promote from within!)
  • Online rewards platform
  • Paid birthday leave
  • Anniversary bonus
  • Referral bonus
  • Parental Leave top up
  • Employee Assistance Program
  • Swag

Kablamo is a proud equal opportunity employer. We make our hiring decisions solely based on your skills and experience, as well as the perspectives and value you can bring to our team. Kablamo believes that diversity is vital to provide the best service to our clients and we are committed to fostering a varied and inclusive work environment. Every effort to accommodate candidates for accessibility will be made upon request. Information received related to accommodations will be addressed confidentially.

Kablamo would like to thank all candidates for their interest; however, only qualified applicants will be shortlisted.

Company Overview

Are you interested in joining one of Australia’s best cloud product development companies? Our team uses cutting-edge cloud technology to design and build digital products and data platforms that deliver transformational change. We’re helping our customers to build digital solutions to manage bushfire risk, perform genomics research on deadly diseases, launch new fintech, deliver millions of hours of media content to viewers, rethink welfare programs for disadvantaged communities, and much more. At the 2021 AWS Global Public Sector conference, Kablamo won the global award for “Most Innovative AI/ML Solution” – we were selected from more than 1,800 AWS global partners! The AWS award was for Kablamo’s work with Victoria’s Department of Environment, Land, Water & Planning to help them predict and manage bushfire risk for the State of Victoria.
