Test Engineer - AI and LLMs

Architech Solutions Consulting Services Inc.

Toronto

On-site

CAD 100,000 - 125,000

Full time

4 days ago

Job summary

A leading technology solutions company seeks a Test Engineer specializing in AI and LLM evaluation. In this role, you'll leverage your software development and automation expertise to ensure the quality and reliability of advanced AI applications within a telecom environment. Join a diverse team committed to innovation and drive the future of technology with your analytical skills.

Qualifications

  • 3-5 years of experience in SDET or QA automation.
  • Strong hands-on programming experience in Python.
  • Experience with test automation frameworks.

Responsibilities

  • Design and execute automated evaluation suites for AI/LLM components.
  • Define comprehensive test strategies for evaluating LLM outputs.
  • Collaborate with developers and data scientists on AI models.

Skills

Analytical Thinking
Problem Solving
Automation
Communication

Education

Bachelor's degree in Computer Science or related field

Tools

Python
Pytest

Job description

Posted 2 weeks ago

This range is provided by Architech. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more.

Base pay range

CA$45.00/hr - CA$60.00/hr

Architech is a Toronto-based software company with 20 years of experience in creating technology solutions for clients across North America. We leverage the latest cloud technology and hire top talent to modernize applications so that businesses can succeed in today’s digital world.

Our Dream Team has its main hub in Toronto and extends across Canada and to Kraków, Poland. Our team consists of over 100 certified technical experts in our Product, Design, Engineering, and Delivery disciplines. Our values drive our culture of success: Think Big, Be Open & Collaborate, Never Fail a Client, Grow Our People, Do the Right Thing, and Embrace Change.

Be Open & Collaborate: Our Culture Says It All

You’ll work closely with a diverse, tight-knit group of creative and talented people who are passionate about technology, software, and solutions. You’ll work in a collaborative and supportive environment, and you’ll grow your existing skills while keeping up with technology trends.

Who We Are

We’re passionate about creating an environment where every team member feels empowered to share their unique point of view. We celebrate diverse talents and encourage our teammates to share their whole selves – because our greatest source of inspiration is each other, and we believe diversity drives innovation.

To be inclusive, we must be intentional. We have taken a multi-pillar approach to D&I at Architech, including Listening & Learning, Being an Ally, and Accountability.

In 2020 we launched our first Diversity & Inclusion survey. While we are always striving for more equal representation, we are very proud of our results:

  • 49% of our people were born in countries other than where our offices are located. Our team members collectively speak 19 different languages, and 59% of our people speak more than one language.
  • In the past year Architech has increased the number of women in our technology function by 200%. We strive to do even better as our multi-year strategic plan unfolds.
  • We analyzed salaries by gender of persons in the same role and are delighted to report a 0% gender pay gap in our delivery and technology roles!

What Our People Say

“Employees of different backgrounds interact well within our company” – 97% of employees agree

“Architech respects individuals and values their differences” – 96% of employees agree

Welcome to Architech.

Test Engineer - AI and LLMs

We are seeking a highly motivated Test Engineer - AI and LLM Evaluation with a strong software development background and a passion for ensuring the quality and reliability of cutting-edge AI applications. This is not a traditional QA role. We need an engineer experienced in automation who understands software development principles and the nuances of evaluating Generative AI systems, particularly those leveraging Large Language Models (LLMs). You will be integral to testing AI-driven solutions in a telecom-focused environment, rigorously evaluating the quality, reliability, performance, safety, and fairness of applications built on LLMs, RAG pipelines, and other AI models.

If you are an analytical thinker, a meticulous problem solver, and a fast learner eager to work at the forefront of AI evaluation, this role is for you!

Key Responsibilities

  • Design, develop, and execute automated evaluation suites and test cases specifically targeting AI/LLM components, focusing on aspects like response quality, factual accuracy, safety, and task completion.
  • Implement and manage batch testing processes using curated datasets to assess model performance, identify regressions, and benchmark different model versions or prompts.
  • Develop, maintain, and enhance test and evaluation frameworks using libraries such as Promptflow, DeepEval, Ragas, and similar LLM evaluation tools.
  • Define and implement comprehensive test strategies to evaluate LLM outputs for accuracy, relevance, coherence, safety (toxicity, bias), hallucinations, and consistency, using automated metrics and, where needed, qualitative review processes.
  • Collaborate closely with developers, data scientists, and prompt engineers to understand model behavior and to identify edge cases, potential biases, and failure modes in AI models and agents.
  • Test and validate components of Retrieval-Augmented Generation (RAG) pipelines, including retriever performance, chunking strategies, and generator quality.
  • Evaluate the end-to-end functionality and performance of AI-driven workflows within telecom applications against defined benchmarks.
  • Continuously research and improve testing methodologies and metrics for AI/LLM applications, incorporating industry best practices in automated evaluation and validation.
  • Document evaluation results and findings, providing actionable feedback to development teams to enhance AI model robustness, reliability, and overall quality.
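
To give a sense of the batch testing and regression checks described above, here is a minimal sketch in plain Python. It is an illustration only, not Architech's framework: the `GoldenCase` structure, the substring-based `fact_coverage` metric, the two stand-in model functions, and the 0.02 tolerance are all hypothetical assumptions. A real suite would call the deployed model and would likely use richer metrics from an LLM evaluation library.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldenCase:
    """One curated example: a prompt plus facts the answer must mention."""
    prompt: str
    required_facts: list[str]


def fact_coverage(answer: str, required_facts: list[str]) -> float:
    """Fraction of required facts found in the answer (case-insensitive substring match)."""
    if not required_facts:
        return 1.0
    text = answer.lower()
    hits = sum(1 for fact in required_facts if fact.lower() in text)
    return hits / len(required_facts)


def run_batch(model: Callable[[str], str], cases: list[GoldenCase]) -> float:
    """Run every case through the model and return the mean coverage score."""
    if not cases:
        raise ValueError("the golden dataset must contain at least one case")
    scores = [fact_coverage(model(c.prompt), c.required_facts) for c in cases]
    return sum(scores) / len(scores)


def has_regressed(baseline: float, candidate: float, tolerance: float = 0.02) -> bool:
    """Flag a regression when the candidate falls below the baseline by more than the tolerance."""
    return candidate < baseline - tolerance


# Hypothetical stand-ins for two model versions (assumption: real code would call an API).
def model_v1(prompt: str) -> str:
    return "Your plan includes 20 GB of data and unlimited calls."


def model_v2(prompt: str) -> str:
    return "Your plan includes unlimited calls."


# A tiny hypothetical golden dataset for a telecom-style question.
GOLDEN = [
    GoldenCase("What does my plan include?", ["20 GB", "unlimited calls"]),
]

baseline_score = run_batch(model_v1, GOLDEN)
candidate_score = run_batch(model_v2, GOLDEN)
print(f"baseline={baseline_score:.2f} candidate={candidate_score:.2f} "
      f"regressed={has_regressed(baseline_score, candidate_score)}")
```

The same functions could be wrapped in Pytest test cases so that a CI pipeline fails when a new model version or prompt drops below the stored baseline.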

Required Skills & Qualifications

  • 3-5 years of experience in software development, SDET (Software Development Engineer in Test), or QA automation, with a demonstrable focus on backend systems, APIs, or complex data pipelines.
  • Strong hands-on programming experience in Python is essential.
  • Proven experience with test automation frameworks and libraries (e.g., Pytest).
  • Solid understanding of AI/ML concepts, particularly LLMs, Generative AI, prompt engineering, vector databases, RAG architectures, and principles of LLM safety and ethical AI testing.
  • Experience or strong familiarity with LLM evaluation metrics and methodologies (e.g., ROUGE, BLEU, BERTScore, F1, precision, recall, faithfulness, relevance).
  • Familiarity with API testing (e.g., testing RESTful APIs used by AI services) and tools (e.g., Postman, requests library).
  • Experience with version control systems (e.g., Git) and CI/CD pipelines (e.g., Jenkins, GitLab CI, GitHub Actions).
  • Strong analytical skills and a meticulous, problem-solving mindset.
  • Excellent communication skills and the ability to articulate complex technical issues clearly.
  • A quick learner who can rapidly adapt to evolving AI technologies and evaluation techniques.
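
As an illustration of the precision, recall, and F1 metrics named above, the following minimal sketch scores a model answer against a reference answer by token overlap. The function name `token_f1` and the whitespace tokenization are assumptions made for this example; production evaluations typically normalize text more carefully or rely on an established metrics library.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> tuple[float, float, float]:
    """Return (precision, recall, F1) based on the token overlap of two strings."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # Multiset intersection counts each shared token at most as often as it
    # appears in both the prediction and the reference.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0

    precision = overlap / len(pred_tokens)  # share of predicted tokens that are correct
    recall = overlap / len(ref_tokens)      # share of reference tokens that were produced
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


precision, recall, f1 = token_f1("the cat sat", "the cat sat down")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here the prediction contains only correct tokens but misses one reference token, so precision is higher than recall and F1 falls between the two.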

Preferred Qualifications

  • Direct hands-on experience using LLM evaluation frameworks like Promptflow, DeepEval, Ragas, LangSmith, or similar.
  • Experience with or exposure to LLM red teaming tools and techniques (e.g., Garak, PyRIT, Giskard, manual adversarial prompt crafting) is a significant advantage.
  • Experience developing and managing datasets for testing and evaluation (e.g., 'golden datasets', adversarial examples).
  • Familiarity with data handling and manipulation libraries in Python (e.g., Pandas, NumPy).
  • Knowledge of AI ethics, fairness, and bias testing methodologies beyond basic safety checks.
  • Experience with cloud platforms (AWS, GCP, Azure), particularly services related to AI/ML.
  • Experience working in the telecom sector.
  • Experience with UI test automation (e.g., Selenium, Playwright) for testing applications integrating AI features is a plus, but not the primary focus of this role.

Architech is an equal opportunity employer committed to diversity. Should you require any accommodations prior to or during the interview process, please let us know. We strongly encourage applications from racialized people, people with disabilities, people from gender and sexually diverse communities, and/or people with intersectional identities.

  • Seniority level: Mid-Senior level
  • Employment type: Contract
  • Job function: Quality Assurance, Management, and Engineering
  • Industries: IT Services and IT Consulting
