Huawei Canada has an immediate permanent opening for a Principal Engineer.
About the team:
Established in 2014, the Distributed Scheduling and Data Engine Lab is Huawei Cloud's technical innovation center in Canada. The lab researches and develops advanced cloud technologies and supports the productization and iterative optimization of its technical achievements. Current research areas include cloud-native databases, infrastructure resource scheduling and prediction, cloud-native middleware, media engines, and user experience studies. The lab fosters a robust technical environment, allowing collaboration with industry experts to create a highly competitive cloud platform. Our team has an immediate permanent opening for a Principal Software Engineer.
About the job:
Integrate AI frameworks with cloud infrastructure to optimize end-to-end architecture for AI inference and fine-tuning scenarios. Focus on improving the observability, reliability, and performance of AI services.
Collaborate with team members to design and develop concept prototypes. Conduct validation of optimization strategies to ensure effectiveness.
Work closely with the product team to support the development of prototypes, taking into account the constraints and requirements of the product's current state.
About the ideal candidate:
5 years of software development experience, with a minimum of 2 years of experience in AI infrastructure-related platform R&D for fine-tuning or inference, including but not limited to development of AI workload profiling tools, vLLM or SGLang development, and infrastructure-level troubleshooting and root cause analysis.
Proficiency in Golang or Rust. Must be able to write clean, efficient, and high-quality code from scratch.
In-depth understanding of AI technologies and familiarity with the module interactions involved in AI model training and inference.
Proficient in Kubernetes or Ray, with practical experience in developing services based on these platforms.
Strong understanding of cloud services and platforms such as AWS and Azure.
Highly analytical, with strong problem-solving skills and the ability to address complex technical challenges effectively.
Self-driven, with a proven ability to learn quickly and take initiative.
Master's or Ph.D. degree in Computer Science, Engineering, or a related field, or equivalent practical experience.