Senior Site Reliability Engineer, Fleet - REMOTE within Canada

Meraki, LLC

Canada

Remote

CAD 80,000 - 100,000

Full time

30+ days ago

Job summary

Join a forward-thinking cloud-managed IT company as a Senior Site Reliability Engineer. In this dynamic role, you'll ensure the stability and efficiency of a vast infrastructure while collaborating with global teams. Your expertise in automation, troubleshooting, and cloud environments will be pivotal in optimizing performance and scaling operations. This innovative firm fosters a culture of diversity and collaboration, empowering you to make impactful contributions. If you're passionate about driving technology forward and enjoy working in a flexible, inclusive environment, this opportunity is perfect for you.

Qualifications

  • 5+ years in SRE, DevOps, or similar roles in large-scale cloud environments.
  • Strong expertise in Ansible, Ruby, and Linux systems administration.

Responsibilities

  • Develop automation code for cloud maintenance using Ansible and Ruby.
  • Debug complex failure scenarios to ensure high availability and reliability.

Skills

Site Reliability Engineering
Ansible
Ruby
Linux systems
GitLab CI
CI/CD pipelines
Distributed systems troubleshooting
Cloud providers (AWS, GCP)
Monitoring and observability tools
Disaster recovery strategies

Tools

RSpec
Automated tools for compliance

Job description

Cisco will observe our annual year-end shutdown from December 24 to January 5. During this period, we will not conduct candidate interviews or respond to job applications. Normal interview processes and application responses will resume after January 6.

Cisco Meraki, a division of Cisco Networking, is a cloud-managed IT company and leader in cloud-controlled Wi-Fi, routing, and security. Our intuitive platform enables organizations of all sizes to deliver customer and employee experiences at scale. To provide best-in-class technologies to our customers, we’ve created an unrivaled company culture for our employees. One where diverse backgrounds, perspectives, and experiences shape our work and fuel our evolution. One that is collaborative, flexible, and inclusive and provides employees with the autonomy to develop technology that’s accessible and secure for everyone.

We are seeking a Senior Site Reliability Engineer (SRE) to join our SRE Fleet team, which is responsible for the stability, scalability, and efficiency of our infrastructure. You will play a critical role in maintaining and improving a fleet of more than 2,000 machines across a global cloud environment. This role is highly collaborative and involves working closely with engineering and SRE teams in the UK and San Francisco to scale and optimize our infrastructure.

Responsibilities
  • Develop and maintain automation code for cloud maintenance processes using Ansible and Ruby.
  • Debug and resolve complex failure scenarios across large-scale systems, ensuring high availability and reliability.
  • Design, implement, and optimize GitLab CI pipelines to streamline deployment and testing workflows.
  • Collaborate with engineering teams to identify and address performance bottlenecks and scaling challenges.
  • Proactively troubleshoot issues across the fleet, using a deep understanding of Linux systems and networking.
  • Contribute to the creation of robust unit tests and infrastructure testing suites with RSpec.
  • Participate in collaborative projects to improve infrastructure efficiency, scalability, and observability.
  • Work cross-functionally with teams in different time zones, fostering a culture of shared ownership and reliability.
  • Develop and maintain automated tools for collecting infrastructure data to support compliance requirements.
  • Streamline compliance processes by reducing manual overhead through automation.

You are an ideal candidate if you have:
  • 5+ years of experience in Site Reliability Engineering, DevOps, or a similar role in large-scale cloud environments.
  • Strong expertise in:
      • Ansible for infrastructure automation.
      • Ruby programming and testing frameworks like RSpec.
      • Linux systems administration and troubleshooting.
      • CI/CD pipelines, particularly GitLab CI.
  • Demonstrated experience troubleshooting and debugging in complex distributed systems.
  • Experience managing and optimizing fleets of thousands of machines.
  • Excellent collaboration skills and the ability to work effectively across teams in multiple time zones.
  • Passion for automation, scalability, and infrastructure as code.
  • Familiarity with cloud providers (AWS, GCP, or similar).
  • Knowledge of monitoring and observability tools.
  • Experience with disaster recovery and high availability strategies.

At Cisco Meraki, we’re challenging the status quo with the power of diversity, inclusion, and collaboration. When we connect different perspectives, we can imagine new possibilities, inspire innovation, and release the full potential of our people. We’re building an employee experience that includes appreciation, belonging, growth, and purpose for everyone.
