
Staff Storage Engineer

HRB

Canada

Remote

CAD 90,000 - 120,000

Full time

2 days ago

Job summary

A leading company seeks a Staff Operations Engineer specializing in Ceph Storage for its Cloud Platform division. This role involves designing and managing scalable storage solutions and optimizing systems with cutting-edge technologies, and it requires strong collaboration and technical skills. Ideal candidates will have extensive experience with open-source distributed storage and automation tools, along with a focus on operational excellence.

Qualifications

  • Experience with open-source distributed storage solutions like Ceph.
  • Familiarity with Infrastructure as Code (IaC) and configuration management tools.
  • Strong skills in capacity planning and disaster recovery.

Responsibilities

  • Design and build a scalable storage layer for global operation.
  • Maintain automation for logging and monitoring of storage systems.
  • Take ownership of projects, fostering accountability and trust.

Skills

Collaboration
Adaptability
Problem Solving
Scalability
Reliability

Tools

Ceph
Ansible
Terraform
Salt
Puppet
Hadoop
Spark
Kafka

Job description

We’re looking for a Staff Operations Engineer – Ceph Storage to support our storage team in the Cloud Platform division. Our scale spans the globe, with transactions happening 24x7 across our data centers. Every second, millions of requests are evaluated across our exchange. To achieve our mission, global efficiency and reliability are crucial, as every millisecond counts in our business.

What We’re Looking For:

  • Facilitator: Ability to relay information and ideas effectively within and across teams. While technical skills are vital, your ability to collaborate is equally important.
  • Adaptable: Capable of keeping up with fast-paced industry changes and prioritizing tasks amid competing scopes and timelines.
  • Technical: Strong foundation in Operations, with experience solving complex problems, building solutions such as CI/CD pipelines and real-time monitoring, and handling production issues.
  • Rigorous: Experience designing and managing massive, globally distributed systems that handle billions of transactions daily. Your approach should be thorough, scalable, and reliable.

Here’s What You’ll Be Doing:

  • Design, build, and operate a highly scalable, performant, and resilient storage layer on a global scale.
  • Develop and maintain automation for logging, monitoring, and maintenance of the storage layer.
  • Work with technologies such as Hadoop, Spark, Aerospike, and Kafka to enhance and optimize systems.
  • Participate in complex security system designs and mentor junior team members.
  • Take ownership of large projects and components as a senior contributor.
  • Champion process and procedure improvements within the team and division.
  • Influence the team’s direction, fostering accountability, trust, and goal focus.
  • Promote company values internally and externally.

Here’s What You Need:

  • Experience building, maintaining, and troubleshooting open-source distributed storage solutions like Ceph and storage orchestrators such as Rook, in an automated and large-scale environment.
  • Experience with Infrastructure as Code (IaC) and configuration management tools like Salt, Ansible, Puppet, or Terraform.
  • Experience with storage-level replication technologies.
  • Strong skills in capacity planning, disaster recovery, and monitoring.