This position is part of a project with one of the foundational LLM companies. The goal is to help them enhance their Large Language Models.
One way we help these companies improve their models is by providing them with high-quality proprietary data. This data serves two main purposes: first, as a basis for fine-tuning their models, and second, as an evaluation set for benchmarking their own models against competitors'.
For example, in the case of Agent Completion (AC) data generation, your task will be to simulate high-quality multi-turn conversations between a user and a smart assistant that uses function-calling tools to accomplish the user's goals. You will craft these dialogues by playing both the assistant and the user, simulating tool use where necessary to guide the assistant through complex decision-making and real-world reasoning scenarios.
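As a concrete illustration, one such dialogue can be represented as a list of role-tagged turns. The sketch below is a hypothetical schema: the role names, the `tool_call`/`tool` field layout, and the `get_weather` tool are illustrative assumptions, not the project's actual data format.

```python
import json

# Illustrative only: the roles, field names, and the get_weather tool
# are assumptions sketching what one simulated AC dialogue might look like.
dialogue = [
    {"role": "user",
     "content": "Should I bring an umbrella to Berlin tomorrow?"},
    # The assistant decides it needs external data and emits a tool call
    # instead of a direct answer.
    {"role": "assistant",
     "content": None,
     "tool_call": {"name": "get_weather",
                   "arguments": {"city": "Berlin", "day": "tomorrow"}}},
    # The data author plays the tool, supplying a plausible simulated result.
    {"role": "tool",
     "name": "get_weather",
     "content": json.dumps({"forecast": "rain", "chance": 0.8})},
    # The final assistant turn grounds its answer in the tool result.
    {"role": "assistant",
     "content": "Yes, there is an 80% chance of rain, so bring an umbrella."},
]

print(len(dialogue))  # total number of turns in this example
```

Writing such a dialogue means authoring every turn yourself: the user's request, the assistant's decision to call a tool, the simulated tool output, and the assistant's final grounded response.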