PAY: $70-80/hour W2. Our company offers our consultants a suite of benefits after a qualification period, including health, vision, dental, life, and disability insurance.
100% remote role, no expectation of onsite work
W2 Candidates only
6+ month contract role
Manager Notes
- This role will be a blend of manual and automated testing, automating wherever possible. The team uses Playwright (similar to Selenium), Python with pytest, and Jest. Open to other toolsets.
- Seeking someone who can lead and strategize QA testing rather than simply execute it.
- Role will develop a comprehensive testing strategy. Some knowledge of or experience with GenAI is ideal. Open to functions that involve creativity around testing methods. Legal/HR were mentioned as potentially interesting domains, involving sensitive, protected datasets.
- Formulating the approach as well as executing it and diving into code, as a true QA expert would, is ideal.
- From a risk / GenAI perspective, an understanding of bias checking for the GenAI tool.
Description:
- As the QA lead for LLM testing, you will define and execute the technical vision and strategy for AI controls and testing.
- Responsibilities will include continuous monitoring, evaluation, and reporting of LLM features to ensure compliance with internal standards, best practices, and external regulations.
- Play a key role in risk assessment and mitigation, guiding the responsible development and deployment of LLMs.
- Will design and implement test cases for LLM governance and development, enabling your team to define features and mitigate risks.
- Develop tools, automation strategies, and data pipelines to support scalable LLM management.
- Create standardized reporting templates for both technical and senior leadership audiences, ensuring clear communication of results.
- Work will involve close collaboration with tool owners and senior management to present findings, assess risk implications, and propose enhancements to AI tools.
Responsibilities:
- Lead QA efforts for the platform, focusing on LLM output testing to ensure reliability, accuracy, and performance
- Develop and maintain comprehensive testing strategies, including semantic similarity, Q&A validation, claims verification, LLM judge evaluations, and metrics like ROUGE
- Collaborate with engineering, product, and data science teams to define testing requirements, thresholds, and standards
- Design and implement robust test cases aligned with business goals and user needs
- Write and maintain automated tests in Python using frameworks like pytest (prior experience with Opik is not required)
- Monitor and improve test stability to support application changes
- Establish and track QA KPIs, such as test coverage and stability, to measure and communicate platform quality
- Stay updated on industry best practices for GenAI/LLM testing and integrate them into QA processes
Qualifications:
- Strong experience in writing and maintaining Python code
- Familiarity with testing LLM outputs, including semantic similarity, Q&A validation, claims verification, LLM judges, and evaluation metrics like ROUGE
- Experience with automated testing tools (e.g., pytest); willingness to learn Opik if unfamiliar
- Proven ability to design and implement test strategies for complex systems
Who We Are:
The Fountain Group is a nationwide staffing firm with over 80 Fortune 100-500 clients. Since 2001, TFG has maintained a consistent standard of excellence, and our work is broadly recognized every year through numerous industry performance awards. Our success is a team effort.
Browse our website below for additional information on our company.
The Fountain Group
3407 W Martin Luther King Jr. Dr. Tampa, FL 33607
“We work in Life Sciences, Clinical, Engineering, IT, and more. Above all, we specialize in people.”