Site Reliability Engineer III

BetterCloud

Toronto

On-site

CAD 100,000 - 126,000

Full time

7 days ago

Job summary

A leading technology company in Toronto is seeking a Site Reliability Engineer to optimize operational efficiency and improve client infrastructure. You will collaborate with teams to develop software solutions and contribute to the resilience of client systems. The ideal candidate has a BS degree in Computer Science or equivalent, 4+ years of experience in programming and cloud services, particularly AWS and Google Cloud. The role offers a competitive salary range of CAD 100K to CAD 126K.

Qualifications

Minimum 4 years of experience programming in at least two of Python, Java, C#, or Go.
Experience with continuous integration and build tools.
Deep knowledge of cloud services and IaC tools.

Responsibilities

Operate and maintain solutions for customer infrastructure.
Provide Root Cause Analysis for outages and incidents.
Identify opportunities for improvement in client systems.

Skills

Python

Java

Kubernetes

Terraform

Amazon Web Services

Google Cloud Platform

Education

BS degree in Computer Science or related technical discipline

Tools

Jenkins

Helm

Who we are looking for:

You will work alongside the quality, infrastructure, and analytics teams to build and ship new features related to our data products. We value practical software experience in addition to computer science fundamentals and training. The technologies you are familiar with are less important to us than your ability to solve complex software problems and apply software engineering best practices. As a Site Reliability Engineer at ACV Auctions you will develop, write, and modify code. You will work alongside software and production engineers to build and ship new features that optimize operational efficiency and drive growth.

What you will do:

Operate, maintain, and administer solutions that contribute to the operational efficiency, availability, and visibility of customer infrastructure.
Plan maintenance activity, design documentation, and standard procedures.
Provide Root Cause Analysis reports for outages and incidents (ITIL Problem Management).
Observe and provide feedback on the current state of the client’s infrastructure, and identify opportunities to improve resiliency, reduce the occurrence of incidents and automate repetitive administrative and operational tasks.
Contribute to, improve and maintain team documentation about client systems and infrastructure, procedures, policies and schedules.
Gather and document information about client environments through audit activities and analyze the information to identify opportunities for improvement and application of best practices.
Work collaboratively with teammates to contribute to the continuous improvement of our working culture.
Act as a technology leader for clients, as well as drive client discussions on technology road maps.
Participate in an on-call rotation in an escalation capacity.
Perform additional duties as assigned.

What you will need:

BS degree in Computer Science OR a related technical discipline OR equivalent practical experience.
Minimum 4 years of experience programming in at least two of the following: Python, Java, C#, or Go.
Minimum 4 years of experience working with continuous integration and build tools such as Jenkins, including building deployment pipelines.
Experience building/managing infrastructure deployments on Amazon Web Services and/or Google Cloud Platform.
Deep knowledge of day-to-day tools and how they work, including deployments, Kubernetes (k8s), monitoring systems, and testing tools.
Highly proficient with version control systems, including trunk-based development, multiple-release planning, cherry-picking, and rebasing.
Self-sufficient debugger who can identify and solve complex problems in code.
Deep understanding of major data structures (arrays, dictionaries, strings).
Strong problem-solving skills, including the ability to independently diagnose and resolve issues. This involves approaching challenges with confidence and resourcefulness, and using your expertise to explore solutions thoroughly.
Familiarity with IaC tools such as Terraform and deployment tools such as Helm.
Experience building, maintaining, and scaling Kubernetes clusters for production workloads.

Compensation

$100,000.00 CAD - $126,000.00 CAD annually. Please note that final compensation will be determined based upon the applicant's relevant experience, skillset, location, business needs, market demands, and other factors as permitted by law.

No immigration or work visa sponsorship will be provided for this position.