Our team is made up of talented individuals who are passionate about LLMs and about ensuring Apple services are at their best. As part of the Human-Centered AI team, you'll play a central role in enhancing the user experience.
Responsibilities include:
- Collaborate with Engineering, Product, Research, Operations, and Editorial teams to evaluate the algorithms and AI models powering various features and to identify opportunities for improvement.
- Build data products (feature datasets, analyses, models, etc.) and scalable tools (typically in Python or Scala) to drive hypothesis generation and support collaborative decision-making with our partner teams in engineering and product management.
- Create structured evaluations to assess the quality of AI-generated responses, ensuring they align with company standards and customer expectations.
- Design evaluation tasks and write guidelines; identify a suitable data annotation platform to run evaluations at scale.
- Implement metrics to measure the effectiveness and accuracy of models to ensure they meet performance standards.
- Establish data quality thresholds and report on metrics and insights to inform business decisions about features.
- Monitor LLM performance in production environments through human evaluations, identifying trends and raising alerts when quality degrades.
- Perform detailed failure analysis to understand model weaknesses and identify areas for improvement, offering actionable insights to engineers.
- Maintain high standards for data quality and continuously enhance processes based on both quantitative and qualitative feedback.
Minimum Qualifications
- Experience with machine learning concepts, including model evaluation metrics, and data analysis. Proven data analysis expertise using SQL, Python, and Tableau to deliver actionable insights
- Experience with Large Language Models and evaluation techniques
- Fluency in English (reading, writing, and comprehension) to partner with international teams
- Fluency in a language other than English (reading, writing, and comprehension) to support the language-specific market
- Expert linguistic and cultural skills for a non-English market to accurately represent the user experience in early development cycles
Preferred Qualifications
- Ability to analyze complex issues and, with keen attention to detail, identify potential problems in LLM outputs to improve quality
- Effective collaboration with cross-functional teams to define ML/LLM evaluation requirements
- Experience crafting, conducting, analyzing, and interpreting experiments and investigations
- Excellent communication skills
- Flexibility to work early-morning or late-night shift patterns is required