A rapid-growth technology firm in Toronto is seeking a Staff LLMOps Engineer to lead the design and optimization of large language model infrastructure in the cloud. The ideal candidate has over six years of DevOps experience and expertise in deploying LLMs in cloud environments. Responsibilities include architecting deployment pipelines and ensuring high-performance AI applications. The offer includes a competitive salary and equity.
Location: Downtown Toronto
Hybrid: 4 days in office
Ready to build what powers the next generation of AI?
We’re looking for a Staff LLMOps Engineer to lead the design, deployment, and optimization of large language model (LLM) infrastructure in the cloud.
You’ll be the driving force behind taking trained models from lab to production—scaling efficiently across multi-GPU clusters and pushing the boundaries of inference performance for enterprise-grade AI applications.
If you thrive at the intersection of AI, cloud engineering, and systems optimization, this is your chance to shape the future of large-scale model serving in a high-impact environment.
What you’ll do:
Architect and operationalize LLM deployment pipelines on AWS and Kubernetes (EKS).
Build and scale multi-GPU inference infrastructure for low latency, high availability, and cost efficiency.
Optimize inference using frameworks like vLLM, SGLang, and DeepSpeed-Inference.
Implement advanced serving techniques: continuous batching, speculative decoding, KV-cache management, and distributed scheduling.
Collaborate with AI researchers to convert model training outputs into production-grade APIs and services.
Establish observability and monitoring for latency, throughput, GPU utilization, and failure recovery.
Automate provisioning, scaling, and upgrades using Terraform and CI/CD pipelines.
Ensure compliance, security, and efficiency in multi-tenant LLM hosting for enterprise clients.
What you bring:
6+ years in DevOps, ML infrastructure, or cloud platform engineering.
2+ years of direct experience deploying and optimizing LLMs or large-scale ML models.
Expertise with GPU-accelerated inference and distributed serving environments.
Deep familiarity with cloud-native architectures (AWS, GCP, Azure) and Kubernetes.
Strong foundation in Python, Bash, and IaC (Terraform).
Experience integrating monitoring tools (Prometheus, Grafana, Datadog) for performance visibility.
Passion for building robust, scalable, and secure AI systems.
Why join us:
Lead and own mission-critical AI infrastructure at a fast-scaling startup.
Work alongside world-class engineers, data scientists, and innovators.
Competitive salary + meaningful equity in a company redefining applied AI.
A culture built on innovation, technical depth, and impact—your work truly matters.