Senior Site Reliability Expert (Retail)

Lightspeed Commerce

Toronto

On-site

CAD 100,000 - 150,000

Full time

6 days ago

Job summary

Join a dynamic team at Lightspeed Commerce as a Senior Site Reliability Expert in Toronto. You will be tasked with enhancing the reliability, scalability, and observability of our Retail Platform while working with technologies such as Kubernetes and Terraform. This role allows for significant professional growth in an inclusive culture focused on innovation and collaboration.

Benefits

Equity for all employees
Flexible paid time off
Health insurance
Pension plan contributions
Health and wellness benefit
Parental leave assistance
Mental health support services
Training opportunities for career development
Fully stocked kitchen
Happy hours with team

Qualifications

  • Comfortable coordinating multi-team projects.
  • Analytical mindset to drive technical decisions.
  • Good understanding of SLAs/SLOs.

Responsibilities

  • Design and manage Kubernetes clusters, ensuring reliability and scalability.
  • Act as an incident lead during service disruptions.
  • Advocate for best practices in Infrastructure as Code and DevOps.

Skills

Scalability
Reliability
Observability
Cloud services optimization
Agile development
Incident response

Tools

Kubernetes
Terraform
CI/CD (CircleCI, Jenkins)
Scripting languages (Bash, Python)

Job description

Are you actively seeking a new opportunity, or simply exploring the market? Either way, you might have just found the right place!

We’re looking for a Senior SRE to join our Lightspeed Retail group in North America, a team responsible for the infrastructure and developer experience of multiple POS systems. The team’s mission is to provide a stable, reliable, and efficient system to our retailers.

Our team is also dedicated to designing, building, and operating the infrastructure that powers Lightspeed Retail. This platform supports the entire software delivery lifecycle, from CI/CD pipelines to highly available and scalable production environments.

What you’ll be responsible for:

As a member of the Site Reliability Expert team, your responsibilities include:
  • Being an active member of the Retail Platform team, responsible for the observability, scalability, and reliability of the Retail Platform.
  • Designing and implementing Kubernetes clusters for various use cases, ensuring scalability, reliability, and security.
  • Configuring and managing Kubernetes clusters, including nodes, networking, and storage.
  • Performing updates to multi-platform Kubernetes clusters in critical production environments.
  • Acting as both a subject matter expert and an incident lead during the incident response process.
  • Initiating and contributing to the continuous improvement of our software delivery processes and practices in a multi-location, multidisciplinary team, to empower and accelerate product development.
  • Obsessing over reliability and helping teams deliver reliable software.
  • Adhering to and advocating for best practices, including Infrastructure as Code, monitoring, high availability, disaster recovery, security, and DevOps methodologies.
  • Providing timely assistance and remediation during critical situations and production incidents to help resolve service problems (you will be on call for periods of time).

What you’ll be bringing to the team:

  • A passion for scalability, reliability, and observability, and a desire to share that passion with others in a positive, solutions-oriented way
  • Comfort leading projects that require coordination and collaboration with other development teams to reach a common goal
  • A desire to quickly grow your ability to champion process changes in pursuit of the SRE mandate
  • A proven track record of driving the optimization of cloud services, including, but not limited to, data pipelines, storage, databases, caching layers, CPU cores, and memory
  • An understanding of different types of SLAs/SLOs and of resource contracts such as reserved instances and savings plans
  • An analytical mindset: you live by the metrics, deeply understand data, and use it to drive technical decisions
  • A good understanding of Agile development and continuous delivery best practices, as well as software engineering tools, processes, methods, and testing
  • Experience as the primary owner of customer-facing, zero-downtime production environments, using the following toolsets:
    • CI/CD pipelines (CircleCI, Jenkins, GitHub, ArgoCD, Helm)
    • Infrastructure as Code (Terraform)
    • Programming or scripting languages (Bash, Python, Ruby, Java, Golang, etc.)

Who you are:

  • You are a problem solver who does not shy away from complexity and thinks critically
  • You have a strong will to learn, grow and get out of your comfort zone
  • You have great energy and passion for technology
  • You can express yourself flawlessly in English
  • You have strong interpersonal skills
  • You are a team player and a bar raiser

What's in it for you:

  • Join a growing team and help us move to the next level
  • Amazing benefits & perks, including equity for all Lightspeeders
  • Constant development of both your skill-set and business acumen with limitless growth opportunities
  • Lots of autonomy and a flexible work culture
  • Innovation time to explore and learn at work
  • Shaping the company by joining cultural & technical committees
  • Tons of growth opportunities into technical or people management roles
  • Opportunity to join a fast-paced, high-growth company
  • Opportunity to learn, expand your skill set, forge wonderful relationships and make your mark within the diverse and inclusive Lightspeed family, a true Canadian tech success story

And enjoy a range of benefits that will keep you happy, healthy and (not) hungry.

  • Lightspeed equity scheme (we are all owners).
  • Flexible paid time off and remote work policies.
  • Health insurance.
  • Contributions to your pension plan (RRSP).
  • Health and wellness benefit of $500 per year.
  • Paid leave and assistance for new parents.
  • Mental health online platform and counseling & coaching services.
  • Training opportunities to grow your skills and career.
  • Fully stocked kitchen (hot and cold beverages, meals served).
  • Happy hours to build your relationships with colleagues after work.

To all recruitment agencies: Lightspeed does not accept unsolicited agency resumes. If we have not directly engaged your company in writing to supply candidates for a specific vacancy, Lightspeed will not be responsible for any fees related to unsolicited resumes.

Lightspeed is a proud equal opportunity employer and we are committed to creating an inclusive and barrier-free workplace. Lightspeed welcomes and encourages applications from people with disabilities. Accommodations are available on request for candidates taking part in all aspects of the selection process.

Where to from here?

Obviously, this has to be mutually beneficial: we want you to step into a role you love, and we want to offer you a place you’re proud to come to every day. For a glimpse into our world, check out our career page here.

Lightspeed is building communities through commerce, and we need people from all backgrounds and lived experiences to do that. We were founded in 2005 in Montreal’s Gay Village, and our original members were all part of the LGBTQ+ community. The ethos of our business has been about inclusion from the very beginning, and we strive to provide a workplace where everyone belongs.

Who we are:

Powering the businesses that are the backbone of the global economy, Lightspeed's one-stop commerce platform helps merchants innovate to simplify, scale, and provide exceptional customer experiences. Our cloud commerce solution transforms and unifies online and physical operations, multichannel sales, expansion to new locations, global payments, financial solutions, and connection to supplier networks.

Founded in Montréal, Canada, in 2005, Lightspeed is dual-listed on the New York Stock Exchange (NYSE: LSPD) and the Toronto Stock Exchange (TSX: LSPD). With teams across North America, Europe, and Asia Pacific, the company serves retail, hospitality, and golf businesses in over 100 countries.

How would you rate your proficiency with Terraform for managing cloud infrastructure?

Have you implemented cost optimization strategies in cloud infrastructure?

Which of the following best describes your hands-on experience with Kubernetes clusters in production environments? (Select all that apply)

I have designed Kubernetes clusters from scratch for production workloads

I have configured and managed networking, storage, and node pools in Kubernetes

I have performed in-place upgrades or rolling updates on live, multi-platform clusters (e.g., EKS, GKE, AKS)

I have implemented security best practices (RBAC, NetworkPolicies, Secrets management, etc.)

I have built and maintained Infrastructure as Code for Kubernetes environments (e.g., with Terraform, Helm, ArgoCD)

I have not worked directly with Kubernetes in a production environment

Do you have experience in Helm charts?

What languages do you speak fluently?

Are you legally eligible to work in the country where this role is based? If so, do you require, now or in the future, visa sponsorship from the Company in order to remain legally eligible to work in this country?
