GOV AI Quality Engineer | LLM | NLP

SCIENTEC CONSULTING PTE. LTD.

Singapore

On-site

SGD 60,000 - 80,000

Full time

8 days ago

Job summary

A consulting firm in Singapore is seeking an AI Quality Engineer to ensure the accuracy and performance of Large Language Models (LLMs) in GenAI applications. The role involves identifying inaccuracies and hallucinations, developing automated test scripts, and collaborating with AI teams to improve model behaviour. Candidates need strong experience in LLM testing, proficiency in Python, and familiarity with NLP testing methodologies. Working hours are Monday to Friday, and remuneration is up to $9,000 plus AWS.

Qualifications

Experience testing LLMs for chatbots and conversational AI.
Strong Python skills for writing test scripts.
Ability to document and track issues.

Responsibilities

Design and execute test cases for LLM accuracy.
Detect and analyse hallucinations in outputs.
Develop automated test scripts to streamline regression testing.
Conduct functional and non-functional testing.
Evaluate model output quality using NLP metrics.
Collaborate with AI engineers to improve model behaviour.
Perform regression testing after updates.
Maintain structured documentation for testing.
Use issue tracking tools to report bugs.
Apply LLM and NLP knowledge to support QA coverage.

Skills

Testing LLMs (e.g., GPT, BERT)
Test automation (PyTest, custom AI frameworks)
Accuracy evaluation methods for NLP
AI/NLP testing methodologies
Python programming
Issue tracking using Jira
Problem-solving skills

Tools

Jira
AWS
GCP
Azure

AI Quality Engineer (LLM/NLP)

Working Hours: Mon-Fri
Location: Central
Remuneration: Up to $9,000 + AWS (Annual Wage Supplement)

Job Summary

We are seeking an AI Quality Engineer to evaluate and ensure the accuracy, reliability, and performance of Large Language Models (LLMs) used in GenAI applications such as chatbots, classification tools, and RAG systems. The role focuses on identifying hallucinations, validating model behaviour, and supporting improvements through structured testing and collaboration.

Key Responsibilities

Design and execute test cases to assess LLM accuracy, relevance, and contextual correctness.
Detect and analyse hallucinations or fabricated outputs, and document them clearly.
Develop automated test scripts (Python, PyTest or similar) to streamline LLM regression testing.
Conduct functional and non-functional testing, including performance and stress tests for LLM-based systems.
Evaluate model output quality using NLP metrics and business-specific correctness rules.
Collaborate with AI engineers, data scientists, and product teams to improve model behaviour based on test findings.
Perform regression testing after fine-tuning, retraining, or system updates to ensure no degradation in accuracy.
Maintain structured documentation: test plans, test cases, test logs, and issue reports.
Use issue tracking tools (e.g., Jira) to report and track LLM-related bugs and inconsistencies.
Apply knowledge of LLMs, NLP concepts, and cloud-based AI environments (AWS/GCP/Azure preferred) to support comprehensive QA coverage.
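
For candidates who want a concrete picture of the regression and hallucination testing described above, here is a minimal sketch in Python. It is an illustration only, not the team's actual test suite: fake_model, unsupported_numbers, and the sample cases are hypothetical names and data, and the check (flagging numbers in an answer that do not appear in the source context) is just one simple business-style correctness rule. The test functions follow the naming convention that PyTest collects automatically.

```python
import re

# Hypothetical stand-in for the system under test. A real suite would call
# the deployed chatbot or RAG endpoint instead of returning canned answers.
def fake_model(question, context):
    canned = {
        "What is the filing deadline?": "The filing deadline is 18 April.",
        "Who approves refunds?": "Refunds are approved by the finance team within 30 days.",
    }
    return canned.get(question, "I don't know.")


def unsupported_numbers(answer, context):
    """Return the numbers that appear in the answer but not in the context."""
    def numbers(text):
        return set(re.findall(r"\d+(?:\.\d+)?", text))
    return numbers(answer) - numbers(context)


# Illustrative regression cases: (question, source context).
GROUNDED_CASES = [
    ("What is the filing deadline?", "Tax returns must be filed by 18 April."),
]
HALLUCINATION_CASES = [
    # The context never mentions "30 days", so the answer invents a detail.
    ("Who approves refunds?", "Refunds are approved by the finance team."),
]


def test_grounded_answers_pass():
    for question, context in GROUNDED_CASES:
        answer = fake_model(question, context)
        assert unsupported_numbers(answer, context) == set()


def test_fabricated_numbers_are_flagged():
    for question, context in HALLUCINATION_CASES:
        answer = fake_model(question, context)
        assert unsupported_numbers(answer, context) != set()
```

Re-running the same cases after fine-tuning, retraining, or a system update is what turns a check like this into a regression test; any newly failing case can then be logged in an issue tracker such as Jira.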

Requirements

Experience testing LLMs (e.g., GPT, BERT) for chatbots and conversational AI.
Proficiency in test automation (PyTest, custom AI frameworks) to detect inaccuracies and hallucinations.
Familiarity with accuracy evaluation methods for high-stakes NLP applications.
Understanding of AI/NLP testing methodologies, including hallucination and relevance testing.
Strong Python skills for writing test scripts and analysing model outputs.
Ability to document and track issues using tools like Jira.
Strong problem-solving skills to propose improvements and reduce hallucinations.
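
As one example of the accuracy evaluation methods mentioned above, the sketch below computes a token-level F1 score between a model answer and a reference answer, a metric commonly used for question answering. This is an assumption about the kind of metric that might be relevant, not a statement of what the team uses; the function name token_f1 and the sample strings are hypothetical, and tokenisation is simplified to lowercased whitespace splitting.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Token-overlap F1 between a predicted answer and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it occurs in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(token_f1("Paris is the capital", "paris is the capital"))
    print(token_f1("The capital is Paris", "Paris"))
    print(token_f1("London", "Paris"))
```

Overlap metrics like this reward matching wording rather than factual correctness, so in high-stakes applications they are usually paired with rule-based or human review.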

By submitting your resume, you consent to the collection, use, and disclosure of your personal information per ScienTec’s Privacy Policy (scientecconsulting.com/privacy-policy).

Aloysius Tan Sheng Rong - R22110441

ScienTec Consulting Pte Ltd - 11C5781
