Automation Quality Engineer (GenAI) - SPVL

SCIENTEC CONSULTING PTE. LTD.

Singapore

Hybrid

SGD 70,000 - 90,000

Full time

Yesterday

Job summary

A leading consulting firm in Singapore seeks a skilled Gen AI Quality Engineer to join their government sector team. You will design and execute tests for Large Language Models, focusing on accuracy and quality assurance in AI applications. The role involves implementing automated testing and collaborating with development teams to identify and resolve issues related to model performance. Ideal candidates will have strong experience with test automation tools, Python programming, and a robust understanding of LLM testing methodologies.

Qualifications

Strong experience in testing Large Language Models (LLMs) like GPT and BERT.
Expertise in test automation tools for catching inaccuracies.
Familiarity with testing accuracy in high-stakes applications.
Understanding of AI testing methodologies for NLP tasks.
Strong programming skills in Python for developing test scripts.
Ability to report issues using tracking tools effectively.

Responsibilities

Design and execute test cases for LLM integrated applications.
Identify and report hallucinations during testing.
Assess the accuracy of model outputs in various contexts.
Implement automated testing for common use cases.
Evaluate the LLM's functionality and perform non-functional testing.
Document bugs and collaborate with teams for resolution.
Ensure model updates do not introduce regressions.
Maintain test documentation focused on LLM-specific concerns.

Skills

Testing Large Language Models (LLMs)

Test automation tools (Selenium, PyTest)

Accuracy testing methodologies

AI testing methodologies

Programming in Python

Bug reporting and tracking

We are looking for a skilled and detail-oriented Gen AI Quality Engineer to join our team in the government sector!

Opportunity to be involved in testing LLMs integrated into Gen AI applications such as a RAG chatbot
Work location: Punggol (hybrid work arrangement)
Candidates with experience in automation tools using Python are welcome to apply

Key Responsibilities:

LLM Testing & Validation: Design and execute comprehensive test cases to evaluate the accuracy, reliability, and performance of LLMs integrated into Gen AI applications. This includes verifying that the model responses are relevant, contextually appropriate, and factually correct.
Hallucination Identification: Focus on detecting hallucinations where the model produces false or fabricated information, ensuring these are promptly reported and addressed by the development team. You will help refine the models to reduce these occurrences.
Accuracy & Quality Assurance: Assess the accuracy of model outputs, especially in high-precision contexts like chatbot conversations or film classification, ensuring that LLMs produce responses that are both relevant and correct according to predefined business logic.
Test Automation for LLMs: Implement automated testing for common use cases, edge cases, and regression tests, especially focusing on cases that tend to trigger hallucinations or inaccuracies in the model’s responses.
Functional & Non-Functional Testing: Evaluate the LLM's functionality in different scenarios to check if it meets the functional requirements. Also, perform non-functional testing like performance, load, and stress tests to assess the scalability of LLMs when handling high loads or multiple queries.
Bug Reporting & Issue Resolution: Identify and document bugs related to hallucinations, inaccurate outputs, or unexpected model behaviors. Work closely with data scientists and developers to resolve issues, refine models, and ensure quality.
Cross-Team Collaboration: Collaborate with AI researchers, engineers, and product teams to understand the nuances of model training, improve the models based on feedback, and suggest improvements based on test findings.
Regression Testing: Ensure that model updates, fine-tuning, or new training data do not introduce regressions or increase hallucinations and inaccuracies. Perform retesting of fixed issues and reassess model accuracy after updates.
LLM-Specific Test Documentation: Maintain thorough test documentation, including test plans, test cases, test logs, and issue reports focused on LLM-specific concerns like hallucinations and inaccuracies.

Required Skills & Qualifications:

Experience with LLMs: Strong experience in testing Large Language Models (LLMs), including models like GPT, BERT, and others used for chatbots and other conversational AI applications.
Test Automation Skills: Expertise in test automation tools (e.g., Selenium, PyTest, custom AI test frameworks) for automating LLM tests, especially those designed to catch inaccuracies or hallucinations in model outputs.
Accuracy Testing: Familiarity with methods and strategies to test accuracy in language models, particularly in high-stakes or mission-critical applications like chatbots and film classification.
AI Testing Methodologies: Understanding of AI-specific testing methodologies, including how to measure and test for accuracy, hallucinations, and relevant responses in natural language processing (NLP) tasks.
Programming Skills: Strong programming skills in languages like Python, particularly for developing test scripts and analyzing the results of LLMs.
Bug Reporting & Tracking: Ability to report issues related to hallucinations, accuracy, and other model-specific issues effectively using issue tracking tools (e.g., Jira).

By submitting your resume, you consent to the collection, use, and disclosure of your personal information per ScienTec’s Privacy Policy (scientecconsulting.com/privacy-policy).
Contact you about potential opportunities.
Delete personal data not required at this application stage.
To withdraw consent, email dpo@scientecconsulting.com.
All applications will be processed in strict confidence. Only shortlisted candidates will be contacted.
