Senior Software Engineer - Ceph

Boson AI

Toronto

On-site

CAD 150,000 - 250,000

Full time

4 days ago

Job summary

Boson AI, a startup focused on generative AI, is searching for a Senior Software Engineer with expertise in Ceph to manage storage for its deep learning datacenter in Toronto. The role involves integrating storage with high-performance computing infrastructure, working with NVIDIA H100 and A100 GPUs, over 25PB of disk, and over 5PB of flash storage.

Qualifications

  • Experience with Ceph is mandatory.
  • Strong background in maintaining large storage arrays.
  • Proficiency in at least one programming language (e.g., Python).

Responsibilities

  • Design, manage and maintain large storage arrays.
  • Integrate storage systems with Deep Learning infrastructure.
  • Configure and automate on-premises Linux-based systems.

Skills

Problem Solving
Learning New Tools
Maintaining Ceph Clusters
High Performance Computing
Clean Code Practices

Tools

Ceph
Slurm
MAAS
Kubernetes
NVIDIA deepops

Job description

This range is provided by Boson AI. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more.

Base pay range

CA$150,000.00/yr - CA$250,000.00/yr

Boson AI is a startup building large language tools for everyone to use. Our founders (Alex Smola and Mu Li) and a team of Deep Learning, Optimization, NLP, AutoML and Statistics scientists and engineers are working on high-quality generative AI models for language, audio, and entertainment.

About The Role

We are looking for a Senior Software Engineer with deep expertise in managing Ceph for our deep learning datacenter in Toronto. The ideal candidate has strong problem-solving skills and the ability to learn new tools. Experience with Slurm, MAAS, InfiniBand, NVIDIA deepops, Layer 3 networking and related tools is a big plus. You should be comfortable performing some amount of hardware configuration.

You will have the opportunity to work with NVIDIA H100 and A100 GPUs, over 25PB of disk and over 5PB of flash storage, Terabit networking and hundreds of computers. You will be responsible for deploying and operating Ceph and integrating it with a broad range of infrastructure technologies and hardware systems.

You MUST have prior Ceph experience in order to qualify for the job. If you don't, please don't spam the ATS.

A day in the life:

  • Design, manage and maintain large storage arrays
  • Integrate them with Deep Learning infrastructure
  • Support troubleshooting for MAAS, Slurm and Kubernetes as needed
  • Configure and automate on-premises Linux-based systems at scale using infrastructure-as-code practices
  • Learn about new tools and deploy them


You might be a great fit if you have:

  • Strong background in maintaining Ceph clusters
  • Experience with high performance computing is highly desirable
  • Experience with on-premises Data Center operations and technologies
  • Experience in managing a large hardware cluster
  • Proficiency in at least one programming language (e.g. Python) and ability to write clean, maintainable code
  • Experience managing firmware and system updates, e.g. on Supermicro systems


The ability to solve problems and to learn new techniques is key.

  • Seniority level: Not Applicable
  • Employment type: Full-time
  • Job function: Engineering and Information Technology
  • Industries: Research Services
