An innovative startup is seeking a Senior Software Engineer with expertise in Ceph management to support its deep learning datacenter in Toronto. In this role, you will work with cutting-edge NVIDIA GPUs and large-scale storage systems, ensuring seamless integration with infrastructure technologies. Your responsibilities will include designing and maintaining large storage arrays, troubleshooting a range of tools, and automating Linux-based systems. If you have strong problem-solving skills and a passion for learning new technologies, this is an exciting opportunity to contribute to high-quality generative AI models in a dynamic environment.
Boson AI is a startup building large language tools for everyone to use. Our founders, Alex Smola and Mu Li, together with a team of Deep Learning, Optimization, NLP, AutoML, and Statistics scientists and engineers, are working on high-quality generative AI models for language, audio, and entertainment.
About The Role
We are looking for a Senior Software Engineer with deep expertise in managing Ceph for our deep learning datacenter in Toronto. The ideal candidate has strong problem-solving skills and the ability to learn new tools. Experience with Slurm, MAAS, InfiniBand, NVIDIA DeepOps, Layer 3 networking, and related tools is a big plus. You should be comfortable performing some hardware configuration.
You will have the opportunity to work with NVIDIA H100 and A100 GPUs, over 25 PB of disk and over 5 PB of flash storage, terabit networking, and hundreds of computers. You will be responsible for deploying and operating Ceph and integrating it with a broad range of infrastructure technologies and hardware systems.
You MUST have prior Ceph experience to qualify for this job. If you don't, please don't spam the applicant tracking system (ATS).
A day in the life:
You might be a great fit if you have:
$150,000 - $250,000 a year
The ability to solve problems and to learn new techniques is key.