Cloud Site Reliability Engineer

Solace Corporation

Ottawa

Hybrid

CAD 80,000 - 120,000

Full time

22 days ago

Job summary

Solace Corporation is seeking a Cloud Site Reliability Engineer to maintain the reliability of its SaaS offerings. The ideal candidate will manage daily operations across cloud platforms like AWS and Azure, ensure service health, improve infrastructure, and resolve incidents efficiently. If you are a technical problem solver with experience in cloud environments, we encourage you to apply and join a company that values diversity and work-life balance.

Benefits

Work-life balance
Hybrid work model
Top-notch training programs
Social, fun environment

Qualifications

  • Hands-on experience with public cloud providers such as AWS, Azure, and GCP.
  • Expertise in debugging production alerts.
  • Programming skills in Groovy, Python, or Go.

Responsibilities

  • Ensure Solace Cloud Services are healthy and reliable.
  • Contribute to production operations efficiency.
  • Handle production incidents in multi-cloud environments.

Skills

Cloud Networking
System-Level Debugging
Site Reliability Engineering
Customer-Facing Support
Monitoring Tools

Education

Certified Kubernetes Administrator
Certified Cloud Administrator (AWS, Azure, or GCP)

Tools

Terraform
Datadog
Kibana
Prometheus
AWS EKS
Azure AKS
GCP GKE

Job description

Harnessing the Power of Data, Together.

Solace helps companies connect and integrate all of their assets through the power of event-driven architecture. Our technology makes it easy to unlock data silos and capture events occurring across large enterprises; stream information about those events everywhere it needs to be in real-time; and give the apps, AI agents, and people who receive it the power to immediately react with decisive actions and smart decisions.

Many of the world’s biggest companies trust Solace to modernize their IT infrastructure by embracing trends like AI, cloud, and IoT so they can create exceptional experiences for their customers, partners, and employees.

So, the next time you drive a car, order furniture online, fly in a plane, or check your bank balance on your phone, your positive experience could be a direct result of our technology—and your hard work!

Overview

This position is for a Cloud Site Reliability Engineer. You will be responsible for the daily operations of Solace Cloud, our market-leading SaaS offering, across leading cloud providers and platforms such as Amazon Web Services, Microsoft Azure, Google Cloud Platform, and Kubernetes.

What You Will Do:

  1. Ensure that the Solace Cloud Services are healthy and reliable, and that SLAs are being met.
  2. Improve our infrastructure tooling, observability, and automation.
  3. Contribute to making production operations more efficient and less error-prone.
  4. Handle production incidents in multi-cloud environments according to industry-standard incident management processes.
  5. Process service and provisioning requests from customers.
  6. Work directly with customers to identify, troubleshoot, and resolve operational issues.
  7. Use expert debugging knowledge in Linux and Kubernetes to detect malicious activity.
  8. Participate in the on-call rotation and provide 12x7 off-hours support.

Ideally, You Will:

  • Be highly technical, excited by technology, and eager to stay up-to-date in a rapidly evolving environment.
  • Have experience in cloud networking solutions.
  • Be knowledgeable in debugging at a system level and resolving incidents in complex cloud-based environments.
  • Have experience in site reliability engineering and incident response.
  • Be a strong communicator who can articulate complex technical issues clearly and concisely, and communicate effectively with customers.
  • Have experience in SaaS operations and customer-facing technical support.

Required Skills:

  • Hands-on experience with the services and features of public cloud providers (AWS, Azure, GCP).
  • Hands-on experience with cloud Kubernetes infrastructure platforms such as AWS EKS, Azure AKS, and GCP GKE.
  • Hands-on experience with monitoring tools like Datadog, Kibana, and Prometheus.
  • Experience with infrastructure automation using Terraform or CloudFormation.
  • Expertise in debugging production alerts.
  • Expert-level understanding of Linux operating systems.
  • Programming skills in Groovy, Python, or Go.
  • Certified Kubernetes Administrator.
  • Certified Cloud Administrator (AWS, Azure, or GCP).

Why You’ll Want to Join Us at Solace

  • Work with some of the smartest individuals in the business.
  • We value work-life balance and loving what you do.
  • Hybrid work model to promote inclusivity.
  • We live by our values: craftsmanship, trust, courage, freedom, momentum, humility, and human experience.
  • Top-notch training programs.
  • Our stellar customer lineup!
  • Social, fun environment.
  • Top-ranked employer on Glassdoor.

We understand that experience varies. Not sure you meet all requirements? We still want to hear from you! Your unique experience could be exactly what we’re looking for.

At Solace, we believe diversity and inclusion drive innovation and growth. We strive to create an enriching and safe workplace where you can be yourself. If you want to do the best work of your career and feel supported, we encourage you to join us!

Accommodations are available upon request for anyone participating in the hiring process. Let us know how we can help! We thank all candidates for their interest; however, only those selected for further steps will be contacted.
