Senior Software Reliability Engineer (Production Health) - open to remote across ANZ

Canva

United States

Remote

USD 80,000 - 150,000

Full time

30+ days ago

Job summary

Canva is seeking a skilled engineer to improve reliability across its services. In this role, you will design and implement processes, tools, and automation that improve service reliability, working closely with product engineering teams. You will also investigate incidents and propose solutions to future-proof the infrastructure. You'll join a culture that values reliability and creativity, where your contributions will help shape the future of a leading design platform. If you thrive in a collaborative environment and are passionate about technology, this role could be a great fit.

Benefits

Equity packages
Inclusive parental leave policy
Annual Vibe & Thrive allowance
Flexible leave options

Qualifications

  • 5+ years of experience in developing complex, distributed web applications.
  • Advanced coding proficiency in Python, Java, or GoLang.

Responsibilities

  • Designing processes and tools to improve service reliability.
  • Investigating production incidents and applying learnings.

Skills

Python
Java
GoLang
Object Oriented Programming
Problem Solving
Communication Skills

Tools

Terraform
AWS
Snowflake
Mode Analytics
Looker

Job description

Join the team redefining how the world experiences design.

Thanks for stopping by. We know job hunting can be a little time consuming and you're probably keen to find out what's on offer, so we'll get straight to the point.

Where and how you can work

Our flagship campus is in Sydney. We also have a campus in Melbourne and co-working spaces in Brisbane, Perth and Adelaide. But you have a choice in where and how you work: we trust our Canvanauts to choose the balance that empowers them and their teams to achieve their goals.

What you’d be doing in this role

As Canva scales, change continues to be part of our DNA. This role is focused on:

  • Designing and implementing processes, tools, automation, and libraries that service teams can use to improve the reliability of the services they own.
  • Working with product engineering teams to ensure reliability best practices and tools are rolled out in every service across the organization.
  • Fostering a culture within the Engineering org that puts reliability first and establishes processes and policies that drive reliability within product engineering teams.
  • Investigating production incidents in depth and applying the learnings to code.
  • Researching, developing, and justifying, through design docs, the best choices for the tools and processes that will shape the future of reliability at Canva.
  • Proposing new approaches and solutions to ensure we future-proof Canva’s distributed cloud infrastructure as we scale.
  • Participating in design meetings, hiring interviews, and code reviews.

You're probably a match if

  • You have advanced coding proficiency in Python/Java/GoLang and strong Object Oriented Programming fundamentals.
  • You have 5+ years of commercial experience developing complex, distributed web applications.
  • You have experience diagnosing and addressing issues across the “full stack”, including front-end code, back-end services, network/infrastructure, and the data layer.
  • You have a solid understanding of observability principles, such as metrics, logs, tracing, synthetic testing, query construction, dashboarding, and alerting.
  • You have experience guiding others in the principles of incident review, investigation, and remedial activity.
  • You have disciplined coding practices, experience with code reviews and pull requests, and a creative and conceptual problem-solving approach.
  • You have strong communication and team collaboration skills, both written and verbal.

Nice to have, not required!

  • Experience in Java. Our platform and infrastructure tooling is primarily built with Python, Go, and Terraform.
  • Experience working with microservice architectures in large containerised, distributed cloud environments (ideally AWS).
  • Experience working with data warehouse, analytics, and reporting tools such as Snowflake, Mode Analytics, and Looker.

About the Group

The Reliability Platform Group is responsible for providing the tools and processes to scale reliability across all Canva services. Our teams work together, and with other groups, to deliver preventive and detective tooling, processes, and best practices that uplift Canva’s reliability.

This role sits within the Production Health team, whose focus is on providing tools and guidance for Canva’s engineering teams to measure and maintain their systems’ reliability.

What's in it for you?

Achieving our crazy big goals motivates us to work hard, but you'll experience lots of moments of magic, connectivity, and fun woven throughout life at Canva, too. We also offer a range of benefits to set you up for every success in and outside of work.

  • Equity packages - we want our success to be yours too.
  • Inclusive parental leave policy that supports all parents & carers.
  • An annual Vibe & Thrive allowance to support your wellbeing, social connection, office setup & more.
  • Flexible leave options that empower you to be a force for good, take time to recharge, and support you personally.

Check out lifeatcanva.com for more info.

Other stuff to know

We make hiring decisions based on your experience, skills, and passion, as well as how you can enhance Canva and our culture. When you apply, please tell us the pronouns you use and any reasonable adjustments you may need during the interview process.

We celebrate all types of skills and backgrounds at Canva, so even if you don’t feel your skills quite match what’s listed above, we still want to hear from you!

Please note that interviews are conducted virtually.