Work Location: Toronto, Ontario, Canada
Hours: 37.5
Line of Business: Analytics, Insights, & Artificial Intelligence
Pay Details: $120,000 - $153,500 CAD
The pay details posted reflect a temporary market premium specific to this role that is reassessed annually. TD is committed to providing fair and equitable compensation opportunities to all colleagues.
Job Description
We are looking for experienced AI and ML Engineers who have worked under tight deadlines and on challenging tasks. The ideal candidate is a strong coder with solid AI and ML engineering experience, as well as expertise in data engineering, machine learning system design, and AI/MLOps.
Key Responsibilities
- Gen AI: Develop and deploy scalable production Gen AI systems and applications.
- Predictive ML: Develop and deploy batch and real-time model inference pipelines to production, and perform end-to-end integration testing.
- Model Serving Framework: Develop an in-house model serving framework or integrate an open-source model serving framework with the enterprise AI and data platform.
- ML System Design: Architect scalable machine learning and Gen AI systems that integrate with existing AI and data platforms and infrastructure, focusing on automation, operational efficiency, and reliability.
- Data Analysis & Processing: Perform data analysis, data preprocessing, and feature engineering on large, complex structured and unstructured datasets for machine learning models and AI applications.
- Model Deployment & Monitoring: Build and deploy model inference, ground truth, and model monitoring pipelines to the production environment. Continuously monitor production model performance and system performance.
- Automation: Build CI/CD pipelines to automate model deployment, deployment validation, model performance monitoring, and model retraining.
- Research: Stay up to date with the latest advancements in AI/ML technologies and apply them to improve existing ML systems or develop new systems and solutions.
- Technical Leadership: Provide technical expertise with a focus on efficiency, reliability, scalability, and security; includes planning, evaluating, recommending, designing, operationalizing, and supporting solutions in compliance with enterprise and industry standards.
- Collaboration: Work with the AI/ML platform team, machine learning scientists, product owners, and business partners to gather use case requirements and implement technical solutions for production AI/ML models and applications.
Job Requirements
Required Technical Qualifications
- Undergraduate degree required; advanced technical degree (e.g., math, physics, engineering, finance, or computer science) preferred, along with progressive project work experience.
- 2+ years of extensive programming experience and 1+ year of experience building production machine learning systems.
- Solid knowledge of applied Machine Learning, Deep Learning, and Large Language Models.
- Solid experience with developing MLOps/AIOps CI/CD pipelines for deploying AI/ML models.
- Solid experience with RAG, Agentic AI, LLM fine-tuning, LLM serving, and end-to-end GenAI application development, deployment, and production.
- Solid cloud experience with Azure or AWS, including cloud AI/ML services and tooling such as Databricks, Kubernetes, Docker and container orchestration, Azure Machine Learning, and Azure Data Factory.
- Strong experience with PySpark for big data processing and PyTorch for deep learning model serving.
- Expert coder with Python, Java, or Scala.
- Practical expertise in performance tuning, bottleneck analysis, and troubleshooting.
Preferred Qualifications
- Knowledge of cloud engineering.
- Self-motivated, with a demonstrated ability to take independent action to deliver results.
- Highly developed critical thinking, analytical, and problem-solving skills.
- Strong verbal and written communication skills, with the ability to work effectively across teams.