Working Hours: Monday – Thursday (8.30am – 6pm), Friday (8.30am – 5.30pm) (Hybrid working arrangement)
Working Location: Central
Salary Package: Basic + AWS
We are looking for a skilled and detail-oriented Gen AI Quality Engineer to join our team. In this role, you will be responsible for testing Large Language Models (LLMs) integrated into Gen AI applications such as a RAG chatbot and a film classification system.
Key Responsibilities
- Design and execute comprehensive test cases to evaluate the accuracy, reliability, and performance of LLMs integrated into Gen AI applications. This includes verifying that the model responses are relevant, contextually appropriate, and factually correct.
- Detect hallucinations, where the model produces false or fabricated information, and ensure these are promptly reported and addressed by the development team. You will help refine the models to reduce such occurrences.
- Assess the accuracy of model outputs, especially in high-precision contexts such as chatbot conversations or film classification, ensuring that LLMs produce responses that are both relevant and correct according to predefined business logic.
- Implement automated tests covering common use cases, edge cases, and regressions, focusing especially on inputs that tend to trigger hallucinations or inaccuracies in the model's responses.
- Evaluate the LLM's functionality across different scenarios to verify that it meets functional requirements, and perform non-functional testing such as performance, load, and stress tests to assess how LLMs scale when handling high loads or concurrent queries.
- Identify and document bugs related to hallucinations, inaccurate outputs, or unexpected model behaviors. Work closely with data scientists and developers to resolve issues, refine models, and ensure quality.
- Collaborate with AI researchers, engineers, and product teams to understand the nuances of model training, provide feedback for model improvement, and suggest enhancements based on test findings.
- Ensure that model updates, fine-tuning, or new training data do not introduce regressions or increase hallucinations and inaccuracies. Retest fixed issues and reassess model accuracy after each update.
- Maintain thorough test documentation, including test plans, test cases, test logs, and issue reports focused on LLM-specific concerns like hallucinations and inaccuracies.
Requirements
- Bachelor’s degree in Computer Science, Information Systems, or related field.
- Minimum 4 years of QA experience, with hands‑on exposure to LLM and Gen AI testing.