An innovative startup is seeking a Senior Software Engineer to manage Ceph for their deep learning datacenter. This role offers the chance to work with cutting-edge NVIDIA GPUs and large-scale storage systems. The ideal candidate will have a strong background in Ceph management, high-performance computing, and a knack for problem-solving. You'll be responsible for designing and maintaining storage solutions, integrating them with deep learning infrastructure, and automating Linux systems using infrastructure-as-code practices. Join a dynamic team at the forefront of generative AI and contribute to groundbreaking projects.
Boson AI is a startup building large language tools for everyone to use. Our founders (Alex Smola and Mu Li) and a team of Deep Learning, Optimization, NLP, AutoML, and Statistics scientists and engineers are working on high-quality generative AI models for language, audio, and entertainment.
We are looking for a Senior Software Engineer with deep expertise in managing Ceph for our deep learning datacenter in Toronto. The ideal candidate should have strong problem-solving skills and the ability to learn new tools. Experience with Slurm, MAAS, InfiniBand, NVIDIA DeepOps, Layer 3 networking, and related tools is a big plus. Comfort with hardware configuration is also important.
You will have the opportunity to work with NVIDIA H100 and A100 GPUs, over 25 PB of disk and over 5 PB of flash storage, terabit networking, and hundreds of computers. Your responsibilities will include deploying and operating Ceph and integrating it with a broad range of infrastructure technologies and hardware systems.