We are looking for a skilled and detail-oriented Gen AI Quality Engineer to join our team in the government sector!
- Opportunity to be involved in testing LLMs integrated into Gen AI applications such as a RAG chatbot
- Work location: Punggol (hybrid work arrangement)
- Candidates with experience in Python-based automation tools are welcome to apply
Key Responsibilities:
- LLM Testing & Validation: Design and execute comprehensive test cases to evaluate the accuracy, reliability, and performance of LLMs integrated into Gen AI applications. This includes verifying that the model responses are relevant, contextually appropriate, and factually correct.
- Hallucination Identification: Focus on detecting hallucinations where the model produces false or fabricated information, ensuring these are promptly reported and addressed by the development team. You will help refine the models to reduce these occurrences.
- Accuracy & Quality Assurance: Assess the accuracy of model outputs, especially in high-precision contexts like chatbot conversations or film classification, ensuring that LLMs produce responses that are both relevant and correct according to predefined business logic.
- Test Automation for LLMs: Implement automated tests covering common use cases, edge cases, and regressions, focusing especially on cases that tend to trigger hallucinations or inaccuracies in the model’s responses.
- Functional & Non-Functional Testing: Evaluate the LLM's functionality in different scenarios to check if it meets the functional requirements. Also, perform non-functional testing like performance, load, and stress tests to assess the scalability of LLMs when handling high loads or multiple queries.
- Bug Reporting & Issue Resolution: Identify and document bugs related to hallucinations, inaccurate outputs, or unexpected model behaviors. Work closely with data scientists and developers to resolve issues, refine models, and ensure quality.
- Cross-Team Collaboration: Collaborate with AI researchers, engineers, and product teams to understand the nuances of model training and to suggest model improvements based on feedback and test findings.
- Regression Testing: Ensure that model updates, fine-tuning, or new training data do not introduce regressions or increase hallucinations and inaccuracies. Perform retesting of fixed issues and reassess model accuracy after updates.
- LLM-Specific Test Documentation: Maintain thorough test documentation, including test plans, test cases, test logs, and issue reports focused on LLM-specific concerns like hallucinations and inaccuracies.
Required Skills & Qualifications:
- Experience with LLMs: Strong experience in testing Large Language Models (LLMs), including models like GPT, BERT, and others used for chatbots and other conversational AI applications.
- Test Automation Skills: Expertise in test automation tools (e.g., Selenium, PyTest, custom AI test frameworks) for automating LLM tests, especially those designed to catch inaccuracies or hallucinations in model outputs.
- Accuracy Testing: Familiarity with methods and strategies to test accuracy in language models, particularly in high-stakes or mission-critical applications like chatbots and film classification.
- AI Testing Methodologies: Understanding of AI-specific testing methodologies, including how to measure and test for accuracy, hallucinations, and response relevance in natural language processing (NLP) tasks.
- Programming Skills: Strong programming skills in languages like Python, particularly for developing test scripts and analyzing the results of LLMs.
- Bug Reporting & Tracking: Ability to report issues related to hallucinations, accuracy, and other model-specific issues effectively using issue tracking tools (e.g., Jira).
By submitting your resume, you consent to the collection, use, and disclosure of your personal information per ScienTec’s Privacy Policy (scientecconsulting.com/privacy-policy). This authorizes us to:
- Contact you about potential opportunities.
- Delete personal data not required at this application stage.
To withdraw consent, email dpo@scientecconsulting.com.
All applications will be processed in strict confidence. Only shortlisted candidates will be contacted.